# Leaderboard
Real benchmark results from the AgenticSwarmBench suite. Every row is a reproducible configuration tested under agentic swarm scenarios, reporting TTFT, throughput, and latency across 6K to 400K token contexts.
8 entries
| # | Entry | Backend | Model | Scenario | TTFT | Tok/s | TTFT (long ctx) | Tok/s (long ctx) | Rating | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | OpenAI API - GPT-5.4 (Replay: markdown-note-app) | API | GPT-5.4 | markdown-note-app | 1.4s | 63 | 1.6s | 65 | MARGINAL | 2026-04-14 |
| 🥈 | Anthropic API - Claude Opus 4.6 (Replay: markdown-note-app) | API | Claude Opus 4.6 | markdown-note-app | 1.7s | 84 | 2.0s | 87 | MARGINAL | 2026-04-14 |
| 🥉 | xAI API - Grok 4.20 (Replay: markdown-note-app) | API | Grok 4.20 | markdown-note-app | 759ms | 165 | 822ms | 132 | MARGINAL | 2026-04-14 |
| 4 | Google API - Gemini 3.1 Pro (Replay: markdown-note-app) | API | Gemini 3.1 Pro | markdown-note-app | 30.9s | 206 | 6.2s | 137 | POOR | 2026-04-14 |
| 5 | DeepSeek API - DeepSeek V3.2 (Replay: markdown-note-app) | API | DeepSeek V3.2 | markdown-note-app | 1.3s | 32 | 1.5s | 32 | MARGINAL | 2026-04-14 |
| 6 | Mistral API - Mistral Large 3 (Replay: markdown-note-app) | API | Mistral Large 3 | markdown-note-app | 461ms | 71 | - | - | GOOD | 2026-04-14 |
| 7 | Groq API - GPT-OSS 120B (Replay: markdown-note-app) | Groq | GPT-OSS 120B | markdown-note-app | 215ms | 384 | - | - | GOOD | 2026-04-14 |
| 8 | Together API - GLM-5.1 (Replay: markdown-note-app) | Together | GLM-5.1 | markdown-note-app | 1.4s | 182 | 2.2s | 58 | MARGINAL | 2026-04-14 |
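The throughput and TTFT columns are related under one common convention: decode throughput is tokens emitted per second of wall-clock time *after* the first token arrives. A minimal sketch of that relationship, with hypothetical numbers (not taken from the table above):

```python
def throughput_toks_per_s(tokens_generated: int, wall_clock_s: float, ttft_s: float) -> float:
    """Decode throughput: tokens emitted per second after the first token.

    One common convention; some harnesses divide by total wall-clock time
    instead, which yields a slightly lower figure.
    """
    decode_time = wall_clock_s - ttft_s
    if decode_time <= 0:
        raise ValueError("wall clock time must exceed TTFT")
    return tokens_generated / decode_time

# Hypothetical run: 1000 tokens over 17.4s wall clock with a 1.4s TTFT.
print(throughput_toks_per_s(1000, 17.4, 1.4))  # 62.5 tok/s
```

Whether a given entry's tok/s excludes TTFT depends on the harness; the AgenticSwarmBench JSON report is the authoritative source for how each figure was computed.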
## Submit Your Results
Got a serving stack, model, or hardware combo that's not listed? Run the benchmark and submit your results via pull request. Every entry requires a reproducible configuration and raw metrics.
1. Install: `pip install agentic-swarm-bench`
2. Run: `asb speed --endpoint YOUR_ENDPOINT --model YOUR_MODEL --suite full`
3. Submit the JSON report as a PR to `data/leaderboard/`
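Before opening the PR, it can help to sanity-check that the report carries the fields the table needs. The field names below are illustrative assumptions, not the actual `asb` report schema; check them against a real report before relying on this:

```python
import json

# Assumed top-level fields, mirroring the leaderboard columns.
# The real asb JSON schema may use different names.
REQUIRED_FIELDS = {"backend", "model", "scenario", "ttft_s", "throughput_tok_s"}

def missing_fields(path: str) -> list[str]:
    """Return the sorted list of required top-level fields absent from a report."""
    with open(path) as f:
        report = json.load(f)
    return sorted(REQUIRED_FIELDS - report.keys())
```

An empty return value means the report at least has the expected shape; anything else lists what to add before submitting.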