Getting Started

Install AgenticSwarmBench and run your first benchmark in under 2 minutes.

Installation

AgenticSwarmBench is available on PyPI. Requires Python 3.9+. uv is recommended, but pip works too.

uv pip install agentic-swarm-bench             # with uv (recommended)
pip install agentic-swarm-bench                # or with pip

For proxy support (required for asb agent, asb record, and Anthropic passthrough):

uv pip install "agentic-swarm-bench[proxy]"

Record a Real Session, Then Replay It Anywhere

The fastest way to get meaningful numbers: record what you actually do with your agent, then replay that exact session against any endpoint.

1. Start the recording proxy

asb record -e http://your-gpu-server:8000 -m your-model

2. Point your agent at the proxy (listens on localhost:19000)

ANTHROPIC_BASE_URL=http://localhost:19000 claude

3. Do your normal work, then press Ctrl+C when done. Replay the session anywhere:

asb replay \
  -e http://new-server:8000 \
  -m my-model \
  -w my-session.jsonl

This captures your real context patterns, real token counts, and real multi-turn behavior, then lets you A/B test endpoints with your actual workload. See Record & Replay for the full workflow.

Run a Synthetic Speed Test

If you don't have a recording yet, asb speed generates realistic agentic context synthetically and sweeps context sizes and concurrency levels.

asb speed \
  --endpoint http://localhost:8000 \
  --model my-model \
  --suite quick

asb is the short alias. agentic-swarm-bench also works.

Full Suite with Report

Sweep all context sizes (6K → 400K) and concurrency levels. Generate a Markdown report with verdicts, charts, and recommendations.

asb speed \
  --endpoint http://localhost:8000 \
  --model my-model \
  --suite full \
  --output report.md

Endpoint URL Handling

Pass any URL. If it doesn't end with /v1/chat/completions, the path is appended automatically. Both of these work:

asb speed -e http://localhost:8000 -m my-model
asb speed -e https://api.example.com/v1/chat/completions -m my-model
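
The appending rule described above can be sketched as follows. This is an illustration of the documented behavior, not asb's actual implementation, and normalize_endpoint is a hypothetical name:

```python
def normalize_endpoint(url: str) -> str:
    """Append the chat completions path unless the URL already ends with it."""
    suffix = "/v1/chat/completions"
    trimmed = url.rstrip("/")
    if trimmed.endswith(suffix):
        return url  # already a full path; leave untouched
    return trimmed + suffix
```

So a bare host:port and a fully qualified URL both resolve to the same request path.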

Authentication

By default, --api-key is sent as Authorization: Bearer <key>. If your endpoint uses a different header:

asb speed -e URL -m MODEL -k MY_KEY --api-key-header X-API-Key
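
The header logic described above might look like the sketch below. Whether the Bearer prefix is kept for custom header names isn't specified here, so this sketch assumes the raw key is sent; auth_headers is a hypothetical name, not asb's API:

```python
def auth_headers(api_key: str, header: str = "Authorization") -> dict:
    """Build the request auth header.

    Default: Authorization: Bearer <key>. A custom header name
    (e.g. X-API-Key) is assumed to carry the raw key.
    """
    if header == "Authorization":
        return {"Authorization": f"Bearer {api_key}"}
    return {header: api_key}
```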

Dry Run

Preview exactly what will be sent to the endpoint without making any requests. Useful for validating configuration.

asb speed -e URL -m MODEL --dry-run

Note: some inference endpoints don't return detailed error messages on failure. Use --dry-run to validate your configuration before running a full benchmark.

Docker Quickstart

Run without installing Python. Results are written to your host through the mounted volume:

docker run --rm -v $(pwd)/results:/results \
  swarmone/agentic-swarm-bench speed \
  --endpoint http://host.docker.internal:8000 \
  --model my-model \
  --suite quick \
  --output /results/report.md

Use host.docker.internal to reach services running on your host machine from inside the container.

Next Steps