# Contributing
How to contribute tasks, scenarios, and code to AgenticSwarmBench.
## Development Setup
Clone the repo. uv is recommended, but pip works too.
```bash
git clone https://github.com/swarmone/agentic-swarm-bench.git
cd agentic-swarm-bench
```

With uv (recommended):

```bash
uv sync --all-extras
uv run pytest tests/ -v
```

Or with pip:

```bash
pip install -e ".[dev,proxy]"
make test
```

## Development Commands
| Command | Description |
|---|---|
| make lint | Check code style |
| make format | Auto-format code |
| make test | Run the full test suite |
| uv run pytest tests/ -v | Run tests via uv |
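While iterating on a change it is usually enough to run only the tests related to it. pytest's standard `-k` keyword filter works through `uv run` as well; `registry` below is only an illustrative keyword, not a required test name.

```bash
uv run pytest tests/ -v -k "registry"
```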
## Adding Tasks

Tasks are defined in `agentic_swarm_bench/tasks/tasks.json`. Each task has:

`tasks.json` (single entry):

```json
{
"id": "P111",
"tier": "medium",
"prompt": "Build a REST API endpoint that...",
"tags": ["python", "api", "fastapi"],
"max_output_tokens": 2048
}
```

| Field | Description |
|---|---|
| id | Unique ID (P1 through P110+) |
| tier | Difficulty: trivial, easy, medium, hard, expert |
| prompt | The agentic swarm task description |
| tags | Categorization tags (language, domain) |
| max_output_tokens | Token limit for the response |
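Before opening a PR it is worth sanity-checking the edited file. The snippet below is only a sketch: it assumes `jq` is installed and that `tasks.json` is a top-level JSON array of task objects; adjust the filters if the actual layout differs.

```bash
# File still parses as JSON after the edit
jq empty agentic_swarm_bench/tasks/tasks.json

# Duplicate ids (should print nothing)
jq -r '.[].id' agentic_swarm_bench/tasks/tasks.json | sort | uniq -d

# Tiers in use (should stay within: trivial, easy, medium, hard, expert)
jq -r '.[].tier' agentic_swarm_bench/tasks/tasks.json | sort -u
```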
## Adding Scenarios
Record a real session and contribute it as a built-in scenario:
1. Record a session with `asb record`.
2. Place the JSONL file in `agentic_swarm_bench/scenarios/data/` (a quick validity check before committing helps; see the sketch after this list).
3. Register it in `scenarios/registry.py`.
4. Open a PR with a description of the session and what it tests.
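A quick pre-commit check on the recording catches truncated or malformed lines. This is a sketch only: it assumes `jq` is installed, and `my-session.jsonl` is a placeholder for your file name.

```bash
# Every line of the JSONL stream must parse as JSON
jq empty agentic_swarm_bench/scenarios/data/my-session.jsonl

# Number of recorded events
wc -l agentic_swarm_bench/scenarios/data/my-session.jsonl
```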
## Project Architecture

Project structure:

```text
agentic-swarm-bench/
  agentic_swarm_bench/
    cli.py                 # Click CLI (asb record | replay | speed | agent | eval | ...)
    config.py              # Config: CLI > env > YAML > defaults
    scenarios/
      recorder.py          # Recording proxy: captures real sessions as JSONL
      player.py            # Replay engine: replays scenarios against any endpoint
      registry.py          # Load/list/resolve scenarios (file path or built-in name)
      schedule.py          # Execution schedule: repetitions, concurrency, ordering
      poison.py            # Prefix-cache poisoning: breaks KV cache between reps
      data/                # Built-in scenario directories
    tasks/
      tasks.json           # 110 agentic swarm tasks, P1-P110
      registry.py          # Load/filter tasks by tier, range, tags, language
    context/
      codebase_context.py  # Tool schemas, file contents, conversation turns
    runner/
      direct.py            # Speed mode: direct endpoint benchmark
      eval_runner.py       # Eval mode: code correctness validation
      claude_code.py       # Agent mode: Claude Code orchestration through proxy
    proxy/
      server.py            # Agent-mode proxy (FastAPI) - Anthropic <-> OpenAI
      padding.py           # Context padding for proxy mode
      translators.py       # API format translation
    metrics/
      collector.py         # Per-request metrics: TTFT, tok/s, ITL, thinking tokens
      stats.py             # Statistical analysis (p50, p95, p99, distributions)
    report/
      markdown.py          # Verdict, insights, grades, ASCII charts
    skill/
      SKILL.md             # Claude Code optimization skill
```

## PR Guidelines
- Run `make test` and `make lint` before submitting (the one-liner below chains them)
- Add tests for new features
- Keep commits focused: one feature or fix per PR
- Update documentation if you change CLI flags or behavior
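In practice the pre-submission checks reduce to one command chain, using the Make targets from the Development Commands table above:

```bash
make format && make lint && make test
```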
## License
AgenticSwarmBench is released under the Apache 2.0 license. See LICENSE for details.