Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
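The lookup-then-fallback idea described above can be illustrated with a small, language-agnostic sketch, written here in Python. This is not the package's implementation. The lookup table, the vocabulary, the `##` piece markers, and the function names `wordpiece` and `tokenize` are all hypothetical, and the fallback shown is plain greedy longest-match wordpiece rather than the modified variant the package uses.

```python
from typing import Dict, List, Set

# Hypothetical toy data for illustration only (not from morphemepiece.data).
LOOKUP: Dict[str, List[str]] = {
    "running": ["run", "##ing"],
    "surprisingly": ["surprise", "##ing", "##ly"],
}
VOCAB: Set[str] = {"walk", "talk", "##ed", "##s", "[UNK]"}


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Plain greedy longest-match-first wordpiece (an assumed stand-in
    for the package's modified fallback algorithm)."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str,
             lookup: Dict[str, List[str]] = LOOKUP,
             vocab: Set[str] = VOCAB) -> List[str]:
    """Use the lookup table when the word is present, else fall back."""
    tokens: List[str] = []
    for word in text.lower().split():
        if word in lookup:
            tokens.extend(lookup[word])
        else:
            tokens.extend(wordpiece(word, vocab))
    return tokens


if __name__ == "__main__":
    print(tokenize("Surprisingly running walked"))
```

Note that a lookup entry need not be a literal substring split of the word ("surprisingly" maps to "surprise" plus suffixes in this toy table), which is something a purely substring-based wordpiece pass cannot produce.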
| Version: | 1.2.3 | 
| Imports: | dlr (≥ 1.0.0), fastmatch, magrittr, memoise (≥ 2.0.0), morphemepiece.data, piecemaker (≥ 1.0.0), purrr (≥ 0.3.4), readr, rlang, stringr (≥ 1.4.0) | 
| Suggests: | dplyr, fs, ggplot2, here, knitr, remotes, rmarkdown, testthat (≥ 3.0.0), utils | 
| Published: | 2022-04-16 | 
| DOI: | 10.32614/CRAN.package.morphemepiece | 
| Author: | Jonathan Bratt | 
| Maintainer: | Jonathan Bratt <jonathan.bratt at macmillan.com> | 
| BugReports: | https://github.com/macmillancontentscience/morphemepiece/issues | 
| License: | Apache License (≥ 2) | 
| URL: | https://github.com/macmillancontentscience/morphemepiece | 
| NeedsCompilation: | no | 
| Materials: | README, NEWS | 
| CRAN checks: | morphemepiece results | 
| Reference manual: | morphemepiece.html, morphemepiece.pdf | 
| Vignettes: | Testing the fall-through algorithm (source, R code); Generating a Vocabulary and Lookup (source, R code) | 
| Package source: | morphemepiece_1.2.3.tar.gz | 
| Windows binaries: | r-devel: morphemepiece_1.2.3.zip, r-release: morphemepiece_1.2.3.zip, r-oldrel: morphemepiece_1.2.3.zip | 
| macOS binaries: | r-release (arm64): morphemepiece_1.2.3.tgz, r-oldrel (arm64): morphemepiece_1.2.3.tgz, r-release (x86_64): morphemepiece_1.2.3.tgz, r-oldrel (x86_64): morphemepiece_1.2.3.tgz | 
| Old sources: | morphemepiece archive | 
Please use the canonical form https://CRAN.R-project.org/package=morphemepiece to link to this page.