1Cademy - Different standards for tokenization

Learn Before

Tokenization

Different standards for tokenization

Word tokenization: Penn Treebank tokenization; NLTK
Character tokenization
Subword tokenization: byte-pair encoding (BPE); the WordPiece algorithm with MaxMatch decoding; SentencePiece
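
To make the three levels concrete, here is a minimal sketch in plain Python. It is illustrative only and does not reproduce the Penn Treebank rules, NLTK, or SentencePiece. The function names (`word_tokenize`, `char_tokenize`, `max_match`, `learn_bpe`), the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions chosen for this example.

```python
import re
from collections import Counter


def word_tokenize(text):
    # Rough word-level split: runs of word characters, or single punctuation
    # marks. Real Penn Treebank tokenization has many more rules
    # (for example, it splits clitics such as "n't").
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokenize(text):
    # Character-level tokenization: every character is its own token.
    return list(text)


def max_match(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first segmentation of a single word.
    # Non-initial pieces are looked up with a "##" prefix (an assumed convention).
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry covers this position: give up on the word.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def learn_bpe(words, num_merges):
    # Toy BPE training: repeatedly merge the most frequent adjacent symbol pair.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


# Hypothetical toy vocabulary for the MaxMatch demonstration.
toy_vocab = {"token", "##ization", "un", "##related"}

print(word_tokenize("Tokenization matters, a lot."))
print(char_tokenize("token"))
print(max_match("tokenization", toy_vocab))
print(learn_bpe(["low", "low", "lower"], 2))
```

The same input can therefore yield very different token sequences depending on the level chosen: word tokens keep whole words, character tokens keep the vocabulary tiny, and subword methods sit in between by splitting rare words into frequent pieces.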

Updated 2026-05-25

Contributors are:

Claude Opus

Jing Cao

Who are from:

Claude

University of Michigan - Ann Arbor

Tags

Data Science

D2L

Dive into Deep Learning @ D2L
