Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is extremely fast at both training new vocabularies and tokenizing text.
| Version: | 0.2.1 | 
| Depends: | R (≥ 4.2.0) | 
| Imports: | R6, cli | 
| Suggests: | rmarkdown, testthat (≥ 3.0.0), hfhub (≥ 0.1.1), withr | 
| Published: | 2025-09-30 | 
| DOI: | 10.32614/CRAN.package.tok | 
| Author: | Daniel Falbel [aut, cre], Regouby Christophe [ctb], Posit [cph] | 
| Maintainer: | Daniel Falbel <daniel at posit.co> | 
| BugReports: | https://github.com/mlverse/tok/issues | 
| License: | MIT + file LICENSE | 
| URL: | https://github.com/mlverse/tok | 
| NeedsCompilation: | yes | 
| SystemRequirements: | Cargo (Rust's package manager), rustc >= 1.75 | 
| Materials: | README, NEWS | 
| CRAN checks: | tok results | 
| Reference manual: | tok.html , tok.pdf | 
| Package source: | tok_0.2.1.tar.gz | 
| Windows binaries: | r-devel: tok_0.2.1.zip, r-release: tok_0.2.1.zip, r-oldrel: tok_0.2.1.zip | 
| macOS binaries: | r-release (arm64): tok_0.2.1.tgz, r-oldrel (arm64): tok_0.2.1.tgz, r-release (x86_64): tok_0.2.1.tgz, r-oldrel (x86_64): tok_0.2.1.tgz | 
| Old sources: | tok archive | 
Please use the canonical form https://CRAN.R-project.org/package=tok to link to this page.