CLI Modes
AgenticSwarmBench has five modes, ranging from quick speed tests to full agentic session recording and replay.
Overview
Each mode targets a different dimension of LLM serving performance under agentic swarm workloads. Use speed for inference benchmarking, eval for correctness, agent for real session measurement, and record / replay for capturing and replaying your own workloads.
asb speed
Inference speed under agentic load
Sends streaming requests with realistic agentic swarm context (system prompts, tool schemas, file contents, conversation history) directly to any OpenAI-compatible endpoint.
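The streaming metrics typically reported by this kind of benchmark can be derived from per-token arrival timestamps. A minimal sketch in Python (the type and function names here are illustrative, not AgenticSwarmBench's API):

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps (seconds) collected while consuming one streaming response."""
    sent_at: float            # when the request was dispatched
    token_times: list[float]  # arrival time of each streamed token


def ttft(t: StreamTiming) -> float:
    """Time to first token: latency before any output appears."""
    return t.token_times[0] - t.sent_at


def tpot(t: StreamTiming) -> float:
    """Mean time per output token, measured between consecutive tokens."""
    spans = [b - a for a, b in zip(t.token_times, t.token_times[1:])]
    return sum(spans) / len(spans)


def output_tps(t: StreamTiming) -> float:
    """Output throughput in tokens/second over the whole stream."""
    return len(t.token_times) / (t.token_times[-1] - t.sent_at)
```

For a stream whose four tokens arrive at 0.5 s, 0.6 s, 0.7 s, and 0.8 s after dispatch, this yields a TTFT of 0.5 s, a mean TPOT of 0.1 s, and 5 tokens/s.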
Key Metrics
Examples
Default realistic sweep
asb speed -e http://localhost:8000 -m my-model
Specific concurrency and context
asb speed -e http://localhost:8000 -m my-model -u 32 -p long
Fixed token count stress test
asb speed -e http://localhost:8000 -m my-model -c 50000 -u 16
Measure prefix cache impact
asb speed -e http://localhost:8000 -m my-model --cache-mode both
JSON output for CI/CD
asb speed -e http://localhost:8000 -m my-model --format json -o results.json
asb eval
Code correctness validation
Sends agentic swarm tasks and validates the generated code at three levels: syntax (does it parse?), execution (does it run?), and functional (does it produce correct output?).
Key Metrics
Examples
Syntax validation
asb eval -e http://localhost:8000 -m my-model -t p1-p25 -v syntax
Execution validation
asb eval -e http://localhost:8000 -m my-model -t p1-p25 -v execution
Functional validation
asb eval -e http://localhost:8000 -m my-model -t p1-p25 -v functional
asb agent
Full agentic session benchmark via recording proxy
Runs a recording proxy between a real agent (like Claude Code) and your endpoint, measuring actual multi-turn agentic sessions. The proxy translates Anthropic Messages API → OpenAI Chat Completions API and records per-request timing.
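The core of the Anthropic Messages → OpenAI Chat Completions translation can be sketched for the basic request fields. This handles only the system prompt, text content blocks, and a few common parameters; real proxies must also translate tool use and streaming events:

```python
def anthropic_to_openai(body: dict) -> dict:
    """Translate an Anthropic Messages API request body into an
    OpenAI Chat Completions request body (core fields only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    for m in body["messages"]:
        content = m["content"]
        # Anthropic content may be a list of blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(
                b["text"] for b in content if b.get("type") == "text"
            )
        messages.append({"role": m["role"], "content": content})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```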
Key Metrics
Examples
Run agent benchmark
asb agent \
-e http://localhost:8000 \
-m my-model \
  -t p1-p10
asb record
Capture real coding sessions as JSONL workloads
Starts a recording proxy between your agent and your LLM endpoint. Every request/response pair is saved as a JSONL line, which you can later replay against any endpoint.
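The one-line-per-exchange format is plain JSON Lines, so workloads are easy to inspect and post-process. A minimal sketch of writing and reading such a file; the field names (`ts`, `latency_s`, `request`, `response`) are illustrative assumptions, since the actual schema is defined by AgenticSwarmBench:

```python
import json
import time


def append_record(path: str, request: dict, response: dict,
                  latency_s: float) -> None:
    """Append one request/response pair as a single JSONL line.
    Field names here are illustrative, not the tool's actual schema."""
    line = {"ts": time.time(), "latency_s": latency_s,
            "request": request, "response": response}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(line) + "\n")


def load_workload(path: str) -> list[dict]:
    """Read a recorded workload back, one dict per JSONL line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```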
Key Metrics
Examples
Record with OpenAI-compatible upstream
asb record \
-e http://your-gpu-server:8000 \
  -m your-model
Record with Anthropic
asb record \
-e https://api.anthropic.com \
-m claude-sonnet-4-20250514 \
-k $ANTHROPIC_API_KEY \
--api-key-header x-api-key \
  -o my-session.jsonl
asb replay
Replay captured workloads against any endpoint
Takes a recorded workload and replays it against a different endpoint, hardware, or configuration. Requests are grouped by context size, and replay produces the same metrics as speed mode.
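Grouping by context size lets the report show, for example, how throughput degrades on long-context requests. A sketch of the idea; the bucket boundaries and the `prompt_tokens` field are assumptions for illustration, not AgenticSwarmBench's actual grouping:

```python
def context_bucket(prompt_tokens: int) -> str:
    """Assign a request to a context-size bucket (illustrative boundaries)."""
    if prompt_tokens < 4_000:
        return "short"
    if prompt_tokens < 32_000:
        return "medium"
    return "long"


def group_by_context(requests: list[dict]) -> dict[str, list[dict]]:
    """Group recorded requests so metrics can be reported per bucket."""
    groups: dict[str, list[dict]] = {}
    for r in requests:
        groups.setdefault(context_bucket(r["prompt_tokens"]), []).append(r)
    return groups
```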
Key Metrics
Examples
Replay against a new endpoint
asb replay \
-e http://new-server:8000 \
-m my-model \
  -w my-session.jsonl
Replay with report
asb replay \
-e http://new-server:8000 \
-m my-model \
-w my-session.jsonl \
  -o report.md
Helper Commands
asb list-tasks - Browse Available Tasks
asb list-tasks                         # Show all 110 tasks
asb list-tasks -t trivial              # Filter by tier
asb list-tasks --tags typescript,rust  # Filter by language
asb list-workloads - Browse Built-in Workloads
asb list-workloads --format json
asb compare - Compare Two Runs
asb compare --baseline a.json --candidate b.json -o comparison.md
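The comparison amounts to computing per-metric deltas between two result files. A minimal sketch of that computation; the metric names (`ttft_ms`, `output_tps`) and flat-JSON layout are assumptions for illustration, and `asb compare` should be used for real reports:

```python
import json


def compare_runs(baseline_path: str, candidate_path: str,
                 keys: tuple[str, ...] = ("ttft_ms", "output_tps")) -> dict:
    """Percentage change (candidate vs. baseline) for each shared metric.
    Metric names and file layout here are illustrative assumptions."""
    with open(baseline_path, encoding="utf-8") as f:
        base = json.load(f)
    with open(candidate_path, encoding="utf-8") as f:
        cand = json.load(f)
    return {k: 100.0 * (cand[k] - base[k]) / base[k]
            for k in keys if k in base and k in cand}
```

A 200 ms → 150 ms TTFT shows up as -25%, and 40 → 50 tokens/s as +25%.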