How AI works · what it can hold in mind
Context window
30-second gist
The context window is everything an AI can "see" at once: your current message, the whole chat history, plus any documents you've pasted in. It has a hard limit. When you go past it, the earliest parts of the conversation quietly drop off.
This is why your fifty-message chat about a holiday plan eventually "forgets" the budget you mentioned in message three. The AI isn't being lazy; that early message has simply fallen outside the window.
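The drop-off works roughly like this sketch: keep the newest messages that fit, discard the oldest. (The function name and the word-count token estimate are illustrative; real chat apps use a model-specific tokenizer and smarter trimming.)

```python
# Sketch: how a chat app might trim history to fit a context window.
# Token counts are approximated by word counts here for simplicity.

def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit; drop the oldest first."""
    kept = []
    used = 0
    for msg in reversed(messages):      # walk newest-to-oldest
        cost = len(msg.split())         # crude token estimate
        if used + cost > max_tokens:
            break                       # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

chat = [
    "Budget is 2000 euros total.",      # the detail from "message three"
    "Let's fly into Lisbon.",
    "Three nights in Porto after.",
    "What was our budget again?",
]
# With a tiny window, the earliest message silently falls out:
print(fit_to_window(chat, max_tokens=12))
```

Notice that the budget message is gone from the output, yet nothing signalled the loss. That silence is exactly what you experience in a long chat.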
If you want more
How big is the window in 2026?
It varies a lot by model. The biggest commercial models now hold roughly a million tokens — the equivalent of several full-length novels. Anthropic's top Claude models (Opus, Sonnet) and Google's Gemini 2.5 Pro all sit around this mark in early 2026. OpenAI's GPT-5 is in the same ballpark; the older GPT-4o capped at 128,000 tokens.
Older or smaller models (including some you'll meet on free tiers) cap out at 8,000 to 200,000 tokens — much less. If you've ever pasted a long PDF and the AI suddenly forgot the first half, that's what hit you.
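A quick way to guess whether a paste will fit is the common rule of thumb that English text averages about four characters per token. This is a rough heuristic, not an exact figure for any particular model:

```python
# Sketch: a rough "will this fit?" check before pasting a long document.
# The 4-characters-per-token ratio is a rule of thumb for English text.

def estimated_tokens(text):
    return len(text) // 4

def fits(text, window_tokens):
    return estimated_tokens(text) <= window_tokens

novel = "x" * 600_000            # roughly a full-length novel in characters
print(estimated_tokens(novel))   # about 150,000 tokens
print(fits(novel, 128_000))      # too big for a GPT-4o-sized window
print(fits(novel, 1_000_000))    # comfortable in a million-token window
```

So the same novel that overflows a 128,000-token window fits with room to spare in a million-token one, which is why the model you pick matters for long documents.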
What this looks like in practice
- Long chats slowly lose the original goal you stated.
- Q&A over a long document misses details buried in the middle or past the cutoff.
- Pasting in a whole codebase often drops the files in the middle.
- The fix is usually: start a fresh chat and re-paste only what matters.
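That last fix can be pictured as code: instead of carrying fifty messages forward, you restate only the facts that still matter in a fresh prompt. (The helper and its wording are a made-up illustration, not a feature of any chat product.)

```python
# Sketch: the "fresh chat" fix. Distil the old conversation down to the
# facts that matter, then start over with just those plus your question.

key_facts = [
    "Budget: 2000 euros total",
    "Flying into Lisbon, then three nights in Porto",
]

def fresh_prompt(facts, question):
    """Build a compact prompt: a recap of key facts, then the question."""
    recap = "\n".join(f"- {f}" for f in facts)
    return f"Context from our earlier planning:\n{recap}\n\n{question}"

print(fresh_prompt(key_facts, "Given the budget, can we add a day trip?"))
```

A few lines of recap use a tiny fraction of the window, so everything you restate is guaranteed to be "in view" for the model.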