# Record & Replay
Capture real coding sessions and replay them against any endpoint.
## Why Record & Replay?
This is the most valuable way to benchmark. Synthetic load tells you what an endpoint can do in theory. Record/replay tells you what it actually does with your traffic. Record a real coding session once, then replay that exact sequence of requests against any endpoint, hardware config, or model - same context, same token counts, same multi-turn patterns.
Why this matters: agentic sessions have a unique shape. Context starts small and grows unpredictably. Some turns are tiny follow-ups; others dump 20K tokens of file contents. Synthetic benchmarks can approximate this, but a recording captures the real thing.
## `asb record` - Capture a Session
Starts a recording proxy between your agent and your LLM endpoint. Every request/response pair is saved as a JSONL line.
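The on-disk schema is asb's own, but conceptually each line pairs an OpenAI-format request with the response it received - something like this (field names illustrative; pretty-printed here, stored as a single line on disk):

```json
{
  "request": {
    "model": "your-model",
    "messages": [
      {"role": "user", "content": "Fix the failing test in utils.py"}
    ]
  },
  "response": {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 2048, "completion_tokens": 310}
  }
}
```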
### Record with an OpenAI-compatible upstream

```bash
asb record \
  -e http://your-gpu-server:8000 \
  -m your-model
```

### Record with Anthropic (auto-detected from URL)

```bash
asb record \
  -e https://api.anthropic.com \
  -m claude-sonnet-4-20250514 \
  -k $ANTHROPIC_API_KEY \
  --api-key-header x-api-key \
  -o my-session.jsonl
```

### Custom output file and port

```bash
asb record \
  -e http://your-gpu-server:8000 \
  -m your-model \
  -o my-session.jsonl \
  -P 9000
```

## Point Your Agent at the Proxy
Once the recording proxy is running, point your agent at it:
```bash
ANTHROPIC_BASE_URL=http://localhost:19000 claude
```

Stop recording with Ctrl+C when done.
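To sanity-check the proxy before committing to a long session, you can also send a single request by hand. A minimal sketch using Python's requests library, assuming the proxy exposes the standard Anthropic Messages route at `/v1/messages`:

```python
import requests

# Send one Anthropic-format request through the recording proxy.
# The proxy forwards it upstream and appends the pair to the JSONL file.
resp = requests.post(
    "http://localhost:19000/v1/messages",
    headers={
        "x-api-key": "dummy",  # forwarded upstream; use a real key if required
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "your-model",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code, resp.json())
```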
## Upstream Modes
The recorder supports two upstream modes:
### OpenAI-compatible (default)
Translates Anthropic Messages API → OpenAI format before forwarding.
### Anthropic passthrough
Forwards requests natively to Anthropic's API - no translation, full fidelity. Auto-detected when the endpoint is `api.anthropic.com`, or set explicitly with `--upstream-api anthropic`.
Both modes save the scenario in OpenAI format for replay.
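As a rough mental model of what the default mode's translation involves (an illustrative sketch, not asb's actual translator), the Anthropic-to-OpenAI mapping looks something like this:

```python
def anthropic_to_openai(body: dict) -> dict:
    """Sketch of Anthropic Messages -> OpenAI chat translation.

    Illustrative only; a real translator also has to handle streaming,
    tool use, and non-text content blocks.
    """
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    for msg in body["messages"]:
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten the text ones.
        if isinstance(content, list):
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body["max_tokens"],  # required in Anthropic requests
    }
```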
## `asb replay` - Replay Against Any Endpoint
Take a recorded scenario and replay it against a different endpoint, hardware, or configuration. Requests are grouped by context size, and the replay reports the same metrics as `asb speed` - decode tok/s (streaming speed after first token), prefill tok/s (input processing rate), TTFT, ITL, and aggregate throughput - but using your real traffic instead of synthetic padding.
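For reference, these metrics are simple functions of per-token arrival times. A minimal sketch of the definitions (illustrative field names, not asb's internals):

```python
def streaming_metrics(request_start: float, token_times: list[float],
                      prompt_tokens: int) -> dict:
    """Compute the replay metrics from one streamed response.

    token_times are absolute arrival times of each output token.
    Illustrative of the definitions only.
    """
    ttft = token_times[0] - request_start             # time to first token
    decode_window = token_times[-1] - token_times[0]  # streaming after first token
    decode_tps = (len(token_times) - 1) / decode_window if decode_window else 0.0
    # Inter-token latency: gaps between consecutive output tokens.
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    # Prefill rate: approximation that treats TTFT as pure input processing.
    prefill_tps = prompt_tokens / ttft
    return {
        "ttft_s": ttft,
        "decode_tok_s": decode_tps,
        "prefill_tok_s": prefill_tps,
        "mean_itl_s": sum(itl) / len(itl) if itl else 0.0,
    }
```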
### Replay a single session against a new endpoint

```bash
asb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w my-session.jsonl
```

### Replay a scenario directory with a schedule

```bash
asb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w ./scenarios/my-scenario/ \
  --repetitions 3 --max-concurrent 5 --policy sequential
```

### Generate a full report

```bash
asb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w my-session.jsonl \
  -o report.md
```

### Preview without sending requests

```bash
asb replay -e URL -m MODEL -w session.jsonl --dry-run
```

## Scheduling
Control how tasks execute with `--repetitions`, `--max-concurrent`, and `--policy`. Available policies: `round_robin`, `sequential`, `random`.
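The exact semantics live in asb's scheduler, but one plausible reading of the three names (an assumption, purely illustrative) is:

```python
import random

def order_runs(tasks: list[str], repetitions: int, policy: str) -> list[str]:
    """Illustrative guess at how the policies might order task runs."""
    if policy == "sequential":
        # All repetitions of one task before starting the next.
        return [t for t in tasks for _ in range(repetitions)]
    if policy == "round_robin":
        # One pass over every task, then the next pass, and so on.
        return [t for _ in range(repetitions) for t in tasks]
    # policy == "random": shuffle all runs together.
    runs = [t for t in tasks for _ in range(repetitions)]
    random.shuffle(runs)
    return runs
```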
## Cache Mode
Replay's default is `--cache-mode realistic`: it preserves the shared prefix (typically the system prompt) so the server can KV-cache it, but poisons each user's unique context so it can't be cached. Use `allwarm` for the optimistic all-cached upper bound, or `allcold` to defeat caching entirely.
```bash
asb replay -e URL -m MODEL -w scenario                        # realistic (default)
asb replay -e URL -m MODEL -w scenario --cache-mode allwarm   # optimistic upper bound
asb replay -e URL -m MODEL -w scenario --cache-mode allcold   # defeat cache entirely
```

See Prefix cache poisoning for how the space-doubling mechanism works.
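The linked page has the details, but the intuition behind realistic mode is easy to sketch: keep the shared prefix byte-identical so the server's prefix cache can hit on it, then perturb the unique remainder. Roughly (illustrative, not asb's implementation):

```python
def poison_unique_suffix(prompt: str, shared_prefix_len: int) -> str:
    """Sketch of the idea behind --cache-mode realistic.

    The shared prefix stays byte-identical, so the server's prefix
    cache can still hit on it. Doubling a space early in the unique
    remainder inserts an extra token, which shifts the position of
    everything after it, so cached KV blocks past that point no
    longer match.
    """
    prefix = prompt[:shared_prefix_len]            # cacheable as-is
    unique = prompt[shared_prefix_len:]
    return prefix + unique.replace(" ", "  ", 1)   # doubled space poisons the rest
```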
## Slicing Scenarios
Real sessions grow from small contexts to large ones. `--slice-tokens N` replays requests from the start until cumulative prompt tokens reach N - preserving the natural context growth while capping how much you send through the endpoint.
```bash
asb replay -e URL -m MODEL -w session.jsonl --slice-tokens 1000000
```

Useful for targeting specific model context limits or keeping replay costs down.
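The slicing rule itself is simple to state. A minimal sketch, assuming each recorded request carries a prompt-token count (field name illustrative):

```python
def slice_by_tokens(requests: list[dict], budget: int) -> list[dict]:
    """Keep requests from the start of the session until cumulative
    prompt tokens reach the budget (sketch of --slice-tokens N)."""
    kept, total = [], 0
    for req in requests:
        total += req["prompt_tokens"]  # illustrative field name
        if total > budget:             # whether the crossing request is
            break                      # included is up to asb; dropped here
        kept.append(req)
    return kept
```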
## Record CLI Flags
| Flag | Description |
|---|---|
| `-e, --endpoint` | Upstream LLM endpoint URL |
| `-m, --model` | Model name |
| `-k, --api-key` | API key for the upstream endpoint |
| `--api-key-header` | Custom API key header name |
| `-o, --output` | Output JSONL file path |
| `-P, --port` | Proxy listen port (default: 19000) |
| `--upstream-api` | Force upstream API type (`openai` or `anthropic`) |
## Replay CLI Flags
| Flag | Description |
|---|---|
| `-e, --endpoint` | Target endpoint URL |
| `-m, --model` | Model name |
| `-w, --scenario` | JSONL scenario file path or scenario directory |
| `-o, --output` | Report output path |
| `--cache-mode` | `realistic` (default), `allwarm`, or `allcold` |
| `--repetitions` | Number of times to replay each task |
| `--max-concurrent` | Maximum in-flight requests |
| `--policy` | Execution policy: `round_robin`, `sequential`, or `random` |
| `--slice-tokens` | Stop replaying after N cumulative prompt tokens |
| `--dry-run` | Preview without sending requests |