Question 1

How does tokenization work?

Accepted Answer

Tokenization splits text into tokens using a fixed algorithm and vocabulary. Subword methods such as BPE (byte-pair encoding) build that vocabulary from a training corpus: starting from individual characters (or bytes), the algorithm repeatedly merges the most frequent adjacent pair of symbols into a new symbol until the vocabulary reaches a target size. At inference time, the tokenizer applies the learned merges to the input, so a word that is not in the vocabulary as a whole is split into smaller pieces that are.
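The merge loop can be illustrated with a minimal sketch. This is a toy version under stated assumptions, not how any particular library implements BPE: the function names (`train_bpe`, `encode`, and the helpers) are made up for illustration, words are split on whitespace, and ties between equally frequent pairs are broken by first occurrence.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn an ordered list of merges from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)
```

For example, after `merges = train_bpe("low low low lower lowest", 2)`, calling `encode("lowly", merges)` splits the unseen word into the learned piece `low` followed by the single characters `l` and `y`.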

Question 2

What is the difference between word-level and subword tokenization?

Accepted Answer

Word-level tokenization splits text into whole words, which requires a large vocabulary and handles rare or compound words poorly: any word missing from the vocabulary becomes an unknown token. Subword tokenization breaks words into smaller units, which often resemble prefixes, suffixes, or other character groups, so a smaller vocabulary can still cover unseen words. Modern LLMs generally use subword methods for this efficiency and flexibility.
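A small sketch may make the contrast concrete. The vocabularies below are invented for illustration, and the subword function uses greedy longest-match segmentation, which is a simplification and not the exact procedure of BPE or any specific tokenizer.

```python
# Hypothetical toy vocabularies, chosen only for this example.
WORD_VOCAB = {"token", "is", "small"}
SUBWORD_VOCAB = {"token", "ization", "is", "small"}


def word_level(text, vocab, unk="<unk>"):
    """Map each whitespace-separated word to itself, or to `unk` if unseen."""
    return [w if w in vocab else unk for w in text.lower().split()]


def subword_greedy(text, vocab, unk="<unk>"):
    """Split each word into the longest vocabulary pieces, left to right."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest remaining substring first, then shrink it.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # No piece matched at this position: emit unk for one character.
                tokens.append(unk)
                i += 1
    return tokens
```

With these vocabularies, `word_level("tokenization is small", WORD_VOCAB)` loses the first word to `<unk>`, while `subword_greedy("tokenization is small", SUBWORD_VOCAB)` recovers it as `token` plus `ization`.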

Question 3

How does token count affect model usage?

Accepted Answer

Token count drives computational cost and determines how much of a model's context window an input uses. Each model has a maximum token limit (e.g., 4,096 tokens for the original GPT-3.5 Turbo); input that exceeds it has to be truncated or split into chunks. API pricing is often per token, so longer token sequences cost more. A tokenizer that represents the same text in fewer tokens saves resources and leaves room for longer conversations or documents.
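The arithmetic behind chunking and per-token pricing can be sketched as follows. The helper names are hypothetical, and the price used in the usage note is a placeholder, not a real rate for any provider.

```python
def chunk_tokens(token_ids, max_tokens):
    """Split a token sequence into consecutive chunks of at most `max_tokens`."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [token_ids[i:i + max_tokens]
            for i in range(0, len(token_ids), max_tokens)]


def estimate_cost(num_tokens, price_per_1k_tokens):
    """Estimate cost when pricing is quoted per 1,000 tokens (assumed scheme)."""
    return num_tokens / 1000 * price_per_1k_tokens
```

For a 10,000-token document and a 4,096-token limit, `chunk_tokens(list(range(10000)), 4096)` yields three chunks of 4,096, 4,096, and 1,808 tokens. At a placeholder price of 0.002 per 1,000 tokens, `estimate_cost(10000, 0.002)` comes to about 0.02.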

Token / Tokenization
