About ASB
What AgenticSwarmBench Is
AgenticSwarmBench (ASB) is an open-source inference performance benchmark purpose-built for agentic swarm scenarios - the kind of LLM request patterns that Claude Code, Cursor, Windsurf, and Copilot generate in practice.
It measures how fast your serving stack handles growing multi-turn contexts (6K to 400K tokens) packed with tool schemas, file contents, and error traces, with many agents hitting the endpoint concurrently. No existing benchmark tests these specific access patterns.
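For illustration, a single request in such a session might look roughly like the payload below when sent to an OpenAI-compatible endpoint. This is a hand-written example only; the model name, tool, and file contents are made up and are not ASB's actual task format.

```python
# Hand-written illustration of the request shape ASB simulates; the model,
# tool, and file contents are hypothetical, not taken from ASB's task files.
request = {
    "model": "your-model",
    "stream": True,
    "tools": [  # tool schemas resent on every turn
        {"type": "function",
         "function": {"name": "read_file",
                      "parameters": {"type": "object",
                                     "properties": {"path": {"type": "string"}}}}},
    ],
    "messages": [
        {"role": "system", "content": "You are a coding agent working in this repo."},
        {"role": "user", "content": "Fix the failing test in utils.py."},
        {"role": "assistant", "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "read_file", "arguments": '{"path": "utils.py"}'}}]},
        {"role": "tool", "tool_call_id": "call_1",
         "content": "<file contents, often tens of kilobytes>"},
        # ...more turns of edits, re-runs, and error traces; the context keeps
        # growing toward the 6K-400K token range ASB sweeps.
    ],
}
```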
ASB produces a clear verdict - 🟢 GOOD, 🟡 MARGINAL, or 🔴 POOR - answering one question: "Is this endpoint good enough for agentic swarm?"
Built by SwarmOne
ASB is created and maintained by SwarmOne - the AI-native cloud for agentic scenarios. SwarmOne provides optimized infrastructure for running agentic swarms at scale, and ASB was born from the need to rigorously benchmark that infrastructure.
Project Architecture
    agentic-swarm-bench/
    ├── agentic_swarm_bench/
    │   ├── cli.py       # Click CLI: record | replay | speed | agent | eval
    │   ├── config.py    # Config: CLI > env > YAML > defaults
    │   ├── scenarios/   # Recording proxy, replay engine, schedule, poisoning
    │   ├── tasks/       # 110 agentic tasks (P1-P110) + codebase context
    │   ├── runner/      # Speed, eval, and agent run loops
    │   ├── proxy/       # FastAPI proxy: Anthropic <-> OpenAI translation
    │   ├── metrics/     # TTFT, tok/s, ITL, reasoning tokens, stats
    │   └── report/      # Markdown reports: verdicts, grades, charts
    ├── skill/
    │   └── SKILL.md     # Claude Code optimization skill
    └── tests/           # Test suite
Key Features
- Record & replay: capture real coding sessions as JSONL and replay them against any endpoint (see the replay sketch after this list)
- 110 agentic tasks across 6 difficulty tiers (trivial → expert + multi-language)
- 7 context profiles simulating real session growth (6K → 400K tokens)
- 5 CLI modes: record, replay, speed, agent, eval (experimental)
- Prefix cache poisoning via space-doubling for true cold-start measurements (see the space-doubling sketch after this list)
- Three cache modes: allcold, allwarm, realistic (shared prefix preserved)
- Reasoning token detection (DeepSeek R1, o3, Claude Extended Thinking) - see the streaming sketch after this list
- Automated verdict system with per-metric grading and ASCII charts
- Docker support for reproducible benchmarking
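To make the record & replay feature concrete, here is a minimal sketch of a replay loop. The JSONL field names (`t_offset`, `request`) and the use of `httpx` are assumptions for the example; ASB's real recording proxy and replay engine live under `scenarios/` and define their own schema.

```python
import json
import time

import httpx  # assumed HTTP client for this sketch

# Minimal sketch of the replay idea. The JSONL field names ("t_offset",
# "request") are hypothetical, not ASB's actual capture schema.
def replay(capture_path: str, endpoint: str) -> None:
    start = time.monotonic()
    with open(capture_path) as f, httpx.Client(timeout=None) as client:
        for line in f:
            rec = json.loads(line)                    # one captured request per line
            delay = rec["t_offset"] - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)                     # keep the original request schedule
            client.post(f"{endpoint}/v1/chat/completions", json=rec["request"])
```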
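The space-doubling trick behind prefix cache poisoning is simple enough to show directly. This is the concept only, not ASB's implementation:

```python
def poison_prefix(prompt: str) -> str:
    """Concept sketch of space-doubling (not ASB's implementation).

    Doubling every space keeps the text readable but changes the token
    sequence, so a previously cached KV prefix no longer matches and the
    request is processed as a true cold start.
    """
    return prompt.replace(" ", "  ")
```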
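Reasoning token detection largely comes down to reading the right delta field from the stream. A sketch, assuming dict-decoded chunks from an OpenAI-compatible streaming response; the `reasoning_content` field name is backend-dependent (DeepSeek-style reasoning parsers use it), and this is not ASB's code:

```python
# Sketch only, not ASB's code. Assumes dict-decoded chunks from an
# OpenAI-compatible streaming response.
def split_stream_text(chunks):
    reasoning, visible = [], []
    for chunk in chunks:                               # chat.completion.chunk events
        delta = chunk["choices"][0].get("delta", {})
        reasoning.append(delta.get("reasoning_content") or "")
        visible.append(delta.get("content") or "")
    return "".join(reasoning), "".join(visible)
```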
Claude Code Optimization Skill
The repo includes a Claude Code skill (skill/SKILL.md) that turns Claude Code into an automated deployment optimizer. Point it at your serving stack and it will:
- Run asb speed to establish a baseline
- Read the verdict and key findings
- Identify the bottleneck (prefill-bound, decode-bound, scheduling, or context scaling)
- Tweak one deployment knob (tensor parallelism, batch size, chunked prefill, etc.)
- Re-run and compare - repeat until targets are met or 5 iterations show no improvement (see the loop sketch below)
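A conceptual sketch of that loop (not the skill's actual implementation; the two callables are hypothetical placeholders for report parsing and knob changes):

```python
import subprocess
from typing import Callable, Tuple

# Conceptual sketch of the optimize loop the skill follows, not the skill itself.
# The caller supplies two project-specific callables (hypothetical, not part of ASB):
#   read_report()       -> (verdict, score) parsed from the latest ASB report
#   apply_knob_change() -> tweak exactly one deployment knob (TP degree, batch size, ...)
def optimize(read_report: Callable[[], Tuple[str, float]],
             apply_knob_change: Callable[[], None],
             max_stalls: int = 5) -> None:
    best, stalls = None, 0
    while stalls < max_stalls:                        # stop once 5 iterations show no improvement
        subprocess.run(["asb", "speed"], check=True)  # benchmark the current deployment
        verdict, score = read_report()
        if verdict == "GOOD":                         # target met
            return
        if best is None or score > best:
            best, stalls = score, 0
        else:
            stalls += 1
        apply_knob_change()                           # change one thing at a time, then re-measure
```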
Add the skill to Claude Code, then ask: "Optimize my vLLM deployment at http://localhost:8000 for agentic scenarios."
License
AgenticSwarmBench is open source under the Apache 2.0 License. Free to use, modify, and distribute.
How to Cite
If you use ASB in research or publications, please cite:
    @software{agenticswarmbench2026,
      title  = {AgenticSwarmBench},
      author = {SwarmOne},
      url    = {https://github.com/SwarmOne/agentic-swarm-bench},
      year   = {2026},
      note   = {Open-source benchmark for LLM inference under agentic swarm scenarios}
    }