TLDR:

The context window of an LLM is the maximum amount of text, measured in tokens, that it can attend to at once, counting both the input prompt and the generated output. Context windows have grown dramatically: from 1,024 tokens in GPT-2 and 2,048 in GPT-3 to 200K-2M+ in current frontier models, enabling fundamentally new applications.

Token Mechanics

Tokens are the units of text an LLM processes. One token corresponds to roughly 0.75 English words on average, though the exact ratio depends on the tokenizer, and each model family uses its own. A 200,000-token context window therefore holds approximately 150,000 English words, or roughly 300 pages at 500 words per page. LLM APIs typically charge per token, often at different rates for input and output tokens.

Context window expansion has been accompanied by efficiency improvements: newer architectures (Mamba and other state-space models, hybrid models) and attention variants (sliding window attention, sparse attention) reduce the compute cost of long contexts, which grows quadratically with sequence length under standard full attention.
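
The token arithmetic in this section can be sketched in a few lines of Python. This is a rough illustration, not a substitute for the model's real tokenizer: the function names are hypothetical, the 500 words-per-page figure is an assumption chosen to match the 300-page estimate, and the prices in the usage note are placeholders rather than any provider's actual rates.

```python
# Rule-of-thumb conversions only; real counts depend on the model's tokenizer.
WORDS_PER_TOKEN = 0.75  # average for English text, as stated in this section
WORDS_PER_PAGE = 500    # assumption: a fairly dense book page


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate number of tokens needed for `words` English words."""
    return round(words / WORDS_PER_TOKEN)


def tokens_to_pages(tokens: int) -> float:
    """Approximate page count, under the WORDS_PER_PAGE assumption."""
    return tokens_to_words(tokens) / WORDS_PER_PAGE


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


def fits_in_window(prompt_tokens: int, max_output_tokens: int,
                   context_window: int) -> bool:
    """The prompt and the generated output share the same window."""
    return prompt_tokens + max_output_tokens <= context_window
```

For example, tokens_to_words(200_000) gives 150,000 words and tokens_to_pages(200_000) gives 300 pages. With placeholder prices of 3.00 per million input tokens and 15.00 per million output tokens, request_cost(100_000, 1_000, 3.0, 15.0) comes to about 0.315 per request, most of it spent on the input side.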

Applications Enabled by Long Context

Large context windows enable: full-document analysis (entire contracts, codebases, or books processed in one prompt), retrieval-augmented generation (RAG) with extensive retrieved context, long-running agent conversations that keep their full history, multi-document reasoning (comparing and synthesizing across many sources), and few-shot learning with many in-prompt examples. Some use cases that previously required complex RAG architectures can now be solved with a simple long-context prompt.
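
One way to decide between a simple long-context prompt and a retrieval pipeline is to check whether the material fits in the window at all. The sketch below is illustrative only: plan_prompt and estimate_tokens are hypothetical helpers, and the token estimate uses the 0.75 words-per-token rule of thumb instead of a real tokenizer, so a production version should count tokens with the target model's own tokenizer.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 0.75 English words per token."""
    return math.ceil(len(text.split()) / 0.75)


def plan_prompt(documents: list[str], question: str,
                context_window: int, reserved_output_tokens: int) -> tuple[str, int]:
    """Return ("long_context" or "retrieval", estimated document tokens).

    The budget for documents is whatever remains of the window after
    reserving room for the generated answer and for the question itself.
    """
    budget = context_window - reserved_output_tokens - estimate_tokens(question)
    needed = sum(estimate_tokens(doc) for doc in documents)
    if needed <= budget:
        return "long_context", needed  # everything fits in one prompt
    return "retrieval", needed         # too large: select relevant chunks first


if __name__ == "__main__":
    docs = ["word " * 300, "word " * 600]  # stand-ins for real documents
    print(plan_prompt(docs, "What changed?", context_window=2000,
                      reserved_output_tokens=500))
```

Fitting in the window is a necessary condition, not a sufficient one: a prompt that fits may still be too slow or too expensive for the use case.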

Limitations and Trade-offs

Long context comes with real trade-offs: cost (API charges grow in proportion to the tokens sent, so naively filling the window on every request gets expensive), latency (longer inputs take longer to process), the “lost in the middle” problem (LLMs may attend less effectively to information in the middle of very long contexts), and quality variation (performance can degrade as inputs approach the advertised maximum context length). For most production applications, retrieval-augmented approaches that supply focused, relevant context typically outperform pure long-context approaches on both cost and quality. Founders should evaluate these trade-offs empirically against their specific use case.
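
One commonly suggested mitigation for the “lost in the middle” problem is to order retrieved chunks so that the most relevant ones sit at the start and end of the prompt, leaving the weakest in the middle. The sketch below assumes the chunks arrive already sorted from most to least relevant; reorder_for_edges is a hypothetical helper name, and whether this ordering actually helps should be measured on the specific model and task.

```python
def reorder_for_edges(chunks_by_relevance: list[str]) -> list[str]:
    """Place the most relevant chunks at the edges of the prompt.

    Input is sorted from most to least relevant. Chunks are dealt
    alternately to the front and the back, so the least relevant
    chunks end up in the middle of the returned list.
    """
    front: list[str] = []
    back: list[str] = []
    for index, chunk in enumerate(chunks_by_relevance):
        if index % 2 == 0:
            front.append(chunk)  # ranks 1, 3, 5, ... fill in from the start
        else:
            back.append(chunk)   # ranks 2, 4, 6, ... fill in from the end
    return front + back[::-1]


if __name__ == "__main__":
    ranked = ["A", "B", "C", "D", "E"]  # "A" is the most relevant chunk
    print(reorder_for_edges(ranked))    # ['A', 'C', 'E', 'D', 'B']
```

In the example, the two strongest chunks ("A" and "B") land in the first and last positions, while the weakest ("E") lands in the middle.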