| Type: | Package | 
| Title: | Count Words and Characters in R Markdown and Jupyter Notebooks | 
| Version: | 0.3.1 | 
| Date: | 2025-05-20 | 
| Description: | Computes word, character, and non-whitespace character counts in R Markdown documents and Jupyter notebooks, with or without code chunks. Returns results as a data frame. | 
| Imports: | jsonlite, knitr, rstudioapi | 
| Suggests: | testthat | 
| License: | GPL-3 | 
| URL: | https://github.com/sigbertklinke/rmdwc | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-05-20 10:43:26 UTC; sigbert | 
| Author: | Sigbert Klinke [aut, cre] | 
| Maintainer: | Sigbert Klinke <sigbert@hu-berlin.de> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-05-20 12:00:02 UTC | 
Count text elements in Jupyter Notebook files
Description
This function extracts text from specific cell types (e.g., markdown) in one or more .ipynb files
and counts the number of characters, words, and lines. It optionally excludes certain patterns (e.g., code fences).
It then passes the extracted text to rmdcount(), which performs the counting.
Usage
ipynbcount(
  files,
  celltype = c("markdown"),
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n",
  exclude = "```\\{.*?```"
)
Arguments
| files | character: vector of paths to .ipynb files | 
| celltype | character: vector indicating which cell types to include (default is "markdown") | 
| space | character: pattern to split a text at spaces (default: "[[:space:]]") | 
| word | character: pattern to split a text at word boundaries (default: "[[:space:]]+") | 
| line | character: pattern to split lines (default: "\n") | 
| exclude | character: pattern to exclude text parts, e.g. code chunks (default: "```\\{.*?```") | 
Details
This function assumes that the notebook files are valid JSON and contain a list of cells under the cells field.
It writes the extracted content to a temporary file so that the rmdcount() logic can be reused.
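The extraction step can be pictured with the minimal base-R sketch below. It is not the rmdwc source code: it assumes the notebook has already been parsed into a nested list, as jsonlite::fromJSON(path, simplifyVector = FALSE) would return it, and that each cell carries the nbformat fields cell_type and source. The helper name extract_cells() and the blank line used to separate cells are assumptions made for illustration only.

```r
# Hypothetical sketch of the ipynbcount() extraction step (not the package code).
# A notebook as jsonlite::fromJSON(path, simplifyVector = FALSE) might return it;
# built by hand here so the example runs without a file.
nb <- list(cells = list(
  list(cell_type = "markdown", source = list("# Analysis\n", "Some text.")),
  list(cell_type = "code",     source = list("x <- 1\n", "print(x)")),
  list(cell_type = "markdown", source = list("More text."))
))

# Hypothetical helper: keep cells of the requested types and join their text
extract_cells <- function(nb, celltype = "markdown") {
  keep <- Filter(function(cell) cell$cell_type %in% celltype, nb$cells)
  # "source" holds the lines of a cell; unlist() also copes with a single string
  txt <- vapply(keep,
                function(cell) paste(unlist(cell$source), collapse = ""),
                character(1))
  # assumption: cells are separated by a blank line
  paste(txt, collapse = "\n\n")
}

md      <- extract_cells(nb)                          # markdown cells only
md_code <- extract_cells(nb, c("markdown", "code"))   # markdown and code cells

# Write the text to a temporary file, which could then be passed to rmdcount()
tmp <- tempfile(fileext = ".Rmd")
writeLines(md, tmp)
```

With a real notebook one would start from the parsed file instead of the hand-built list; the filtering on cell_type is what the celltype argument controls.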
Value
A data frame with counts of characters, words, and lines for each file. Additional columns include file (base name) and path (directory).
Examples
file <- system.file('ipynb/example_data_analysis.ipynb', package="rmdwc")
ipynbcount(file)                                   # without code
ipynbcount(file, celltype=c("markdown", "code"))   # with code
Word, character and non-whitespace character count
Description
rmdcount counts lines, words, bytes, characters and non-whitespace characters in R Markdown files, excluding code chunks.
txtcount counts lines, words, bytes, characters and non-whitespace characters in plain text files.
Note that the counts may differ slightly from those of Unix wc and LibreOffice, because they depend on how a line, a word and a character are defined.
Usage
rmdcount(
  files = NULL,
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n",
  exclude = "```\\{.*?```"
)
txtcount(
  files = NULL,
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n"
)
Arguments
| files | character: file name(s) | 
| space | character: pattern to split a text at spaces (default: "[[:space:]]") | 
| word | character: pattern to split a text at word boundaries (default: "[[:space:]]+") | 
| line | character: pattern to split lines (default: "\n") | 
| exclude | character: pattern to exclude text parts, e.g. code chunks (default: "```\\{.*?```") | 
Details
We define:
- Line: the number of lines. This differs from Unix wc -l, since wc counts the number of newline characters.
- Word: a character or a sequence of characters delimited by white space. However, a "word" is in general a fuzzy concept; for example, is "3.141593" a word? Therefore different programs may count differently. For more details, see the discussion of the LibreOffice bug "Word count gives wrong results - Another Example", Comment 5.
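Both definitions can be checked directly in base R. The sketch below, using made-up strings, compares the line rule with a count of newline characters (what wc -l reports), and shows how a hypothetical alternative word pattern that also splits at punctuation changes the result for "3.141593". The alternative pattern is an assumption for illustration, not an option of the package.

```r
txt <- "first line\nsecond line"   # note: no trailing newline

# Line rule used here: number of pieces between "\n"
n_lines <- length(strsplit(txt, "\n")[[1]])

# What wc -l reports: the number of newline characters
n_newlines <- nchar(gsub("[^\n]", "", txt))

pi_txt <- "Pi is 3.141593."
# Word rule used here: split at runs of white space, so "3.141593." is one word
n_words_ws <- length(strsplit(pi_txt, "[[:space:]]+")[[1]])
# Hypothetical alternative: split at anything that is not a letter or digit
n_words_alnum <- length(strsplit(pi_txt, "[^[:alnum:]]+")[[1]])
```

For these inputs the line rule gives 2 while the newline count is 1, and the two word rules give 3 and 4.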
The following approach is used to detect lines, bytes, words, characters and non-whitespace characters; each count is the length of the result of the corresponding expression:
- lines: strsplit(rmd, line)[[1]] with line='\n'
- bytes: charToRaw(rmd)
- words: strsplit(rmd, word)[[1]] with word='[[:space:]]+'
- characters: strsplit(rmd, '')[[1]]
- non-whitespace characters: strsplit(gsub(space, '', rmd), '')[[1]] with space='[[:space:]]'
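Put together, these rules amount to something like the following base-R sketch. It is an illustration, not the rmdwc source code: the function name count_text() is hypothetical, and it simply returns a plain list with the five counts.

```r
# Hypothetical re-implementation of the counting rules listed above
count_text <- function(rmd,
                       space = "[[:space:]]",
                       word  = "[[:space:]]+",
                       line  = "\n") {
  list(
    lines = length(strsplit(rmd, line)[[1]]),
    bytes = length(charToRaw(rmd)),
    words = length(strsplit(rmd, word)[[1]]),
    chars = length(strsplit(rmd, "")[[1]]),
    # remove all white space, then count the remaining characters
    nonws = length(strsplit(gsub(space, "", rmd), "")[[1]])
  )
}

res <- count_text("Hello world.\nSecond line here.")
str(res)

# Bytes and characters differ for non-ASCII text in UTF-8:
# "\u00fc" and "\u00df" each take two bytes
utf <- count_text("Gr\u00fc\u00dfe")
```

One consequence of applying strsplit() literally: a text that starts with white space yields an empty first piece, which the word rule then counts as a word.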
If rmdcount is used, code chunks are first deleted with gsub(exclude, '', rmd), by default gsub('```\\{.*?```', '', rmd), before counting; txtcount does not delete them.
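The effect of the default exclude pattern can be seen on a small made-up R Markdown string. This base-R sketch applies the same gsub() call and then the word rule [[:space:]]+; it relies on R's default (TRE) regular expressions, in which . also matches a newline, so a chunk spanning several lines is removed as a whole.

```r
rmd <- "Some text.\n\n```{r}\nx <- 1\n```\n\nMore text."

# Default exclude pattern: from "```{" up to the next "```" (non-greedy)
stripped <- gsub("```\\{.*?```", "", rmd)

# Word counts with and without the code chunk
words_with_code    <- length(strsplit(rmd, "[[:space:]]+")[[1]])
words_without_code <- length(strsplit(stripped, "[[:space:]]+")[[1]])
```

Here the chunk contributes five whitespace-delimited pieces (the two fences and the three tokens of x <- 1), so the counts are 9 with the chunk and 4 without it.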
Value
a data frame with the following elements:
- file: basename of the file
- lines: number of lines
- words: number of words
- bytes: number of bytes
- chars: number of characters
- nonws: number of non-whitespace characters
- path: path of the file
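The shape of the returned data frame, one row per file, can be mimicked in base R as below. This is a sketch with a hypothetical helper count_file() that applies the rules from Details to two small temporary files; it is not the package implementation.

```r
# Hypothetical helper: one row of counts for a single file
count_file <- function(f) {
  txt <- readChar(f, file.info(f)$size)   # whole file as one string
  data.frame(
    file  = basename(f),
    lines = length(strsplit(txt, "\n")[[1]]),
    words = length(strsplit(txt, "[[:space:]]+")[[1]]),
    bytes = length(charToRaw(txt)),
    chars = length(strsplit(txt, "")[[1]]),
    nonws = length(strsplit(gsub("[[:space:]]", "", txt), "")[[1]]),
    path  = dirname(f)
  )
}

# Two small example files in the session's temporary directory
f1 <- file.path(tempdir(), "a.txt")
f2 <- file.path(tempdir(), "b.txt")
writeChar("one two\nthree\n", f1, eos = NULL)   # eos = NULL: no trailing nul byte
writeChar("Hello world.", f2, eos = NULL)

# Stack the per-file rows into one data frame
counts <- do.call(rbind, lapply(c(f1, f2), count_file))
counts
```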
Examples
# count excluding code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
rmdcount(files)
# count including code chunks
txtcount(files) # or rmdcount(files, exclude='')
# count for a set of R Markdown docs
files <- list.files(path=system.file('rmarkdown', package="rmdwc"),
                    pattern="\\.Rmd$", full.names=TRUE)
rmdcount(files)
# use of rmdcount() in an R Markdown document
if (interactive()) {
  files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
  file.edit(files) # SAVE(!) the file and knit it 
}
# count including code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
txtcount(files)
rmdcountAddin
Description
Applies rmdcount to the R Markdown document currently open in the RStudio editor.
Usage
rmdcountAddin()
Value
nothing
Examples
if (interactive()) rmdcountAddin()
Word, character and non-whitespace character count for a text
Description
Counts words, characters and non-whitespace characters in a string. It is used by rmdcount; see the details there.
Usage
rmdwcl(rmd, space = "[[:space:]]", word = "[[:space:]]+", line = "\n")
Arguments
| rmd | character: R Markdown document as a single string | 
| space | character: pattern to split a text at spaces (default: "[[:space:]]") | 
| word | character: pattern to split a text at word boundaries (default: "[[:space:]]+") | 
| line | character: pattern to split lines (default: "\n") | 
Value
a list
Examples
file  <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
fcont <- readChar(file, file.info(file)$size)
rmdwcl(fcont)