Context Profiles
Understanding the 7 context profiles, cache defeat, and how to control context size.
Overview
Context size simulates where you are in a real coding session. When Claude Code opens a file, reads 2,000 lines, edits functions, runs tests, and reads errors - that's 5+ LLM round-trips with 40–100K token contexts growing each turn.
Each profile pads requests with realistic agentic swarm content: system prompts with tool schemas, prior conversation turns, file contents, tool call results, and error traces.
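To make the shape concrete, a padded request in a mid-sized profile might look roughly like the following (illustrative only: a generic chat-completions layout with invented file names and placeholders, not asb's exact padding):

```python
padded_request = {
    "model": "MODEL",
    "messages": [
        {"role": "system",    "content": "<system prompt + tool schemas>"},
        {"role": "user",      "content": "Fix the failing tests in auth/session.py"},
        {"role": "assistant", "content": "<tool call: read auth/session.py>"},
        {"role": "tool",      "content": "<~2,000 lines of file contents>"},
        {"role": "assistant", "content": "<tool call: run the test suite>"},
        {"role": "tool",      "content": "<pytest error trace>"},
        # ...more prior turns until the profile's token budget is reached...
        {"role": "user",      "content": "<the actual benchmark prompt>"},
    ],
}
```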
The 7 Profiles
| Profile | Tokens | What it simulates |
|---|---|---|
| fresh | ~6K | Just opened the project |
| short | ~20K | A few turns in |
| medium | ~40K | Active coding session |
| long | ~70K | Deep multi-file work |
| full | ~100K | Extended session |
| xl | ~200K | Very large context |
| xxl | ~400K | Maximum context window |
| realistic | Mixed | Sweeps fresh → full (default) - simulates a full session lifecycle |
The default realistic sweep runs fresh → short → medium → long → full, simulating a full session lifecycle. Use --context-profile xl or xxl for long-context models.
Usage
```bash
# Simulate a deep coding session (70K context)
asb speed -e URL -m MODEL --context-profile long

# Long-context models: test at 200K or 400K
asb speed -e URL -m MODEL --context-profile xl

# Exact token count
asb speed -e URL -m MODEL --context-tokens 50000

# Default sweep (fresh → full)
asb speed -e URL -m MODEL
```
Model Context Window
Use --model-context-length to tell the benchmark your model's maximum context window. Any profiles that exceed it are automatically skipped.
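The skipping logic amounts to a simple budget check. The sketch below is illustrative Python (the token budgets come from the table above; the function name is invented, not asb's API):

```python
# Approximate token budget per profile (from the table above).
PROFILE_TOKENS = {
    "fresh": 6_000, "short": 20_000, "medium": 40_000,
    "long": 70_000, "full": 100_000, "xl": 200_000, "xxl": 400_000,
}

def profiles_to_run(model_context_length: int) -> list[str]:
    """Keep only the profiles whose padded context fits in the model's window."""
    return [name for name, budget in PROFILE_TOKENS.items()
            if budget <= model_context_length]

print(profiles_to_run(128_000))  # ['fresh', 'short', 'medium', 'long', 'full']
```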
```bash
# Model supports up to 128K - xl and xxl are skipped
asb speed -e URL -m MODEL --suite full --model-context-length 128000

# Model supports 400K - run everything including xxl
asb speed -e URL -m MODEL --context-profile xxl --model-context-length 400000
```
Prefix Cache Poisoning
LLM inference engines cache the KV state of common prefixes so repeated requests skip prefill. This makes benchmarks look artificially fast - you're measuring cache hits, not real inference.
AgenticSwarmBench defeats the prefix cache using space doubling: it finds isolated single spaces in the request context and randomly doubles some of them. This shifts BPE token boundaries (" word" becomes "  word", which tokenizes differently) and invalidates the KV cache from that point forward - without adding any artificial content the model can see.
This mimics what actually happens in real coding sessions: when an agent edits a file mid-conversation, the context changes from the edit point onward, breaking the cache naturally.
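A minimal Python sketch of the space-doubling idea, assuming the request context is a plain string; the function name, doubling rate, and seeding are illustrative, not asb's actual implementation:

```python
import random

def double_spaces(text: str, seed: int, rate: float = 0.05) -> str:
    """Randomly double isolated single spaces to shift BPE token boundaries.

    The words the model sees are unchanged, but from the first doubled space
    onward the token sequence differs, so the server's prefix/KV cache misses.
    """
    rng = random.Random(seed)
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        is_isolated_space = (
            ch == " "
            and (i == 0 or text[i - 1] != " ")
            and (i + 1 == len(text) or text[i + 1] != " ")
        )
        if is_isolated_space and rng.random() < rate:
            out.append(" ")  # " word" -> "  word"
    return "".join(out)
```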
- asb speed: each request gets a unique space-doubling pattern seeded by task ID, user ID, and timestamp.
- asb replay: finds the longest common prefix across all tasks (typically the system prompt), preserves it so the server can cache it, then applies space-doubling only after that prefix. Different repetitions get different patterns.
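The replay behavior can be sketched as follows, reusing the hypothetical double_spaces helper above (function names and seeding are illustrative):

```python
def longest_common_prefix_len(contexts: list[str]) -> int:
    """Length of the prefix shared by every task context (typically the system prompt)."""
    if not contexts:
        return 0
    shortest = min(contexts, key=len)
    for i, ch in enumerate(shortest):
        if any(c[i] != ch for c in contexts):
            return i
    return len(shortest)

def poison_after_shared_prefix(contexts: list[str], repetition: int) -> list[str]:
    """Leave the shared prefix byte-identical (cacheable); space-double only the tail."""
    n = longest_common_prefix_len(contexts)
    return [
        context[:n] + double_spaces(context[n:], seed=1000 * task_idx + repetition)
        for task_idx, context in enumerate(contexts)
    ]
```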
Cache Modes
Both asb speed and asb replay accept --cache-mode with three options.
| Mode | What it does |
|---|---|
| allcold | Every request defeats the KV cache via space-doubling. Measures true cold-start latency. Default for asb speed. |
| allwarm | No poisoning - requests arrive as-is and the server can cache freely. Measures best-case latency. |
| realistic | Preserves the shared prefix (system prompt) so it can be cached; poisons only the unique per-user portion. Default for asb replay. |
```bash
asb speed -e URL -m MODEL                                      # allcold (default for speed)
asb speed -e URL -m MODEL --cache-mode realistic               # runs allcold then allwarm
asb speed -e URL -m MODEL --cache-mode allwarm                 # best-case cached numbers
asb replay -e URL -m MODEL -w scenario                         # realistic (default for replay)
asb replay -e URL -m MODEL -w scenario --cache-mode allwarm
```
--cache-mode realistic on asb speed runs each scenario twice (allcold then allwarm) and reports both. Anthropic charges 10× less for cached tokens ($0.30 vs $3.00/M), so knowing your cache hit rate matters.
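As a rough illustration of why the hit rate matters, here is a back-of-the-envelope blended input cost using those two prices (a sketch; the helper name is invented and your provider's pricing may differ):

```python
def blended_input_cost_per_mtok(cache_hit_rate: float,
                                cached: float = 0.30,
                                uncached: float = 3.00) -> float:
    """Blended $/M input tokens for a given fraction of cache-hit tokens."""
    return cache_hit_rate * cached + (1.0 - cache_hit_rate) * uncached

print(blended_input_cost_per_mtok(0.0))  # $3.00/M - allcold numbers
print(blended_input_cost_per_mtok(0.8))  # $0.84/M - 80% of input tokens cached
```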