Question 1

How does the context window work?

Accepted Answer

The context window is the span of tokens the model's attention mechanism can take into account at once. For each token, attention computes a weight against the other tokens in the window (in decoder-only models, only against the tokens that precede it), which lets the model draw on earlier tokens when generating later ones. The window size is fixed for a given model and is usually set by the maximum sequence length the model was trained on.
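As a rough illustration of the weighting described above, here is a minimal sketch of causal scaled dot-product attention in plain Python. The function name `causal_attention_weights` and the toy vectors are made up for this example; real models use learned query and key projections, many attention heads, and optimized tensor libraries.

```python
import math


def causal_attention_weights(queries, keys):
    """Return one row of attention weights per token.

    queries, keys: lists of equal-length float vectors, one per token.
    Row i holds the weights token i places on tokens 0..i. Later
    tokens get weight 0.0, which is the causal mask.
    """
    dim = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot-product score against every visible token (0..i).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Pad with zeros for the tokens that come after position i.
        row = [e / total for e in exps] + [0.0] * (len(keys) - i - 1)
        weights.append(row)
    return weights


# Three toy tokens with 2-dimensional vectors (illustrative values only).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(vectors, vectors)
```

Each row sums to 1, and the first token can only attend to itself. A token outside the window has no row or column in this matrix at all, which is why the model cannot reference it.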

Question 2

What happens when input exceeds the context window?

Accepted Answer

When input exceeds the context window, the outcome depends on the API or application serving the model. Some reject the request with an error, while others truncate the input, commonly by dropping the oldest tokens at the beginning of the prompt, sometimes without warning. Either way, the model never sees the dropped tokens, so in long conversations or documents early information becomes inaccessible.
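The behaviors described above can be sketched as a small helper. This is an illustration under assumptions, not any particular provider's implementation: `fit_to_window` and its `strategy` names are hypothetical, and tokens are represented as a plain list of integers.

```python
def fit_to_window(tokens, max_tokens, strategy="drop_oldest"):
    """Fit a token list into a window of max_tokens (hypothetical helper).

    strategy:
      "drop_oldest"  keep the most recent tokens (truncate the beginning)
      "drop_newest"  keep the earliest tokens (drop the excess at the end)
      "error"        refuse input that does not fit
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if len(tokens) <= max_tokens:
        return list(tokens)
    if strategy == "drop_oldest":
        return list(tokens[-max_tokens:])
    if strategy == "drop_newest":
        return list(tokens[:max_tokens])
    if strategy == "error":
        raise ValueError(
            f"input has {len(tokens)} tokens, window holds {max_tokens}"
        )
    raise ValueError(f"unknown strategy: {strategy!r}")


conversation = list(range(10))  # stand-in for 10 token ids
recent_only = fit_to_window(conversation, 4)
earliest_only = fit_to_window(conversation, 4, strategy="drop_newest")
```

With `drop_oldest`, tokens 0 through 5 are gone before the model runs, which is the silent loss of early context the answer warns about. Choosing `error` makes the overflow visible to the caller instead.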

Question 3

How does context window size differ between models?

Accepted Answer

Context window sizes vary significantly across models, from 2,048 tokens in older models such as GPT-3 to 128,000 tokens in GPT-4 Turbo and 200,000 tokens in Claude 3. Larger windows require more memory and computation, because standard self-attention compares every token with every other token and its cost therefore grows roughly with the square of the sequence length. As a result, large windows are often reserved for high-end models. Specialized models, such as those for code or long documents, may prioritize larger windows, while smaller models trade context length for efficiency.
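To give a feel for why larger windows cost more, the following back-of-the-envelope sketch counts pairwise attention scores for two window sizes. It assumes standard full self-attention, counted per head and per layer; `attention_score_count` is a name invented for this example, and optimized implementations avoid storing the whole matrix even though the amount of computation still grows with it.

```python
def attention_score_count(seq_len):
    """Number of token-pair scores in one full self-attention matrix."""
    return seq_len * seq_len


small = attention_score_count(2_048)     # older 2,048-token window
large = attention_score_count(128_000)   # 128,000-token window
ratio = large / small

print(f"{small:,} scores vs {large:,} scores ({ratio:,.2f}x)")
```

The window is 62.5 times longer, but the number of pairwise scores grows by the square of that factor.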

Context Window
