# Leaderboard
Real benchmark results from the AgenticSwarmBench suite. Every row is a reproducible configuration tested under agentic swarm scenarios, reporting TTFT, throughput, and latency across 6K to 400K token contexts.
8 entries
| # | Entry | Backend | Model | Scenario | TTFT | Tok/s | TTFT (long ctx) | Tok/s (long ctx) | Rating | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | OpenAI API - GPT-5.4 (Replay: markdown-note-app) | API | GPT-5.4 | markdown-note-app | 1.4s | 63 | 1.6s | 65 | MARGINAL | 2026-04-14 |
| 🥈 | Anthropic API - Claude Opus 4.6 (Replay: markdown-note-app) | API | Claude Opus 4.6 | markdown-note-app | 1.7s | 84 | 2.0s | 87 | MARGINAL | 2026-04-14 |
| 🥉 | xAI API - Grok 4.20 (Replay: markdown-note-app) | API | Grok 4.20 | markdown-note-app | 759ms | 165 | 822ms | 132 | MARGINAL | 2026-04-14 |
| 4 | Google API - Gemini 3.1 Pro (Replay: markdown-note-app) | API | Gemini 3.1 Pro | markdown-note-app | 30.9s | 206 | 6.2s | 137 | POOR | 2026-04-14 |
| 5 | DeepSeek API - DeepSeek V3.2 (Replay: markdown-note-app) | API | DeepSeek V3.2 | markdown-note-app | 1.3s | 32 | 1.5s | 32 | MARGINAL | 2026-04-14 |
| 6 | Mistral API - Mistral Large 3 (Replay: markdown-note-app) | API | Mistral Large 3 | markdown-note-app | 461ms | 71 | - | - | GOOD | 2026-04-14 |
| 7 | Groq API - GPT-OSS 120B (Replay: markdown-note-app) | Groq | GPT-OSS 120B | markdown-note-app | 215ms | 384 | - | - | GOOD | 2026-04-14 |
| 8 | Together API - GLM-5.1 (Replay: markdown-note-app) | Together | GLM-5.1 | markdown-note-app | 1.4s | 182 | 2.2s | 58 | MARGINAL | 2026-04-14 |
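The throughput and TTFT columns are related under one common convention: decode throughput is tokens emitted per second of wall-clock time *after* the first token arrives. A minimal sketch of that relationship, with hypothetical numbers (not taken from the table above):

```python
def throughput_toks_per_s(tokens_generated: int, wall_clock_s: float, ttft_s: float) -> float:
    """Decode throughput: tokens emitted per second after the first token.

    One common convention; some harnesses divide by total wall-clock time
    instead, which yields a slightly lower figure.
    """
    decode_time = wall_clock_s - ttft_s
    if decode_time <= 0:
        raise ValueError("wall clock time must exceed TTFT")
    return tokens_generated / decode_time

# Hypothetical run: 1000 tokens over 17.4s wall clock with a 1.4s TTFT.
print(throughput_toks_per_s(1000, 17.4, 1.4))  # 62.5 tok/s
```

Whether a given entry's tok/s excludes TTFT depends on the harness; the AgenticSwarmBench JSON report is the authoritative source for how each figure was computed.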
## Submit Your Results
Got a serving stack, model, or hardware combo that's not listed? Run the benchmark and submit your results via pull request. Every entry requires a reproducible configuration and raw metrics.
1. Install: `pip install agentic-swarm-bench`
2. Run: `asb speed --endpoint YOUR_ENDPOINT --model YOUR_MODEL --suite full`
3. Submit the JSON report as a PR to `data/leaderboard/`
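Before opening the PR, it can help to sanity-check that the report carries the fields the table needs. The field names below are illustrative assumptions, not the actual `asb` report schema; check them against a real report before relying on this:

```python
import json

# Assumed top-level fields, mirroring the leaderboard columns.
# The real asb JSON schema may use different names.
REQUIRED_FIELDS = {"backend", "model", "scenario", "ttft_s", "throughput_tok_s"}

def missing_fields(path: str) -> list[str]:
    """Return the sorted list of required top-level fields absent from a report."""
    with open(path) as f:
        report = json.load(f)
    return sorted(REQUIRED_FIELDS - report.keys())
```

An empty return value means the report at least has the expected shape; anything else lists what to add before submitting.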