Leaderboard

Real benchmark results from the AgenticSwarmBench suite. Every row is a reproducible configuration tested under agentic swarm scenarios: TTFT, throughput, and latency measured from 6K up to 400K token contexts.
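The two headline metrics can be derived from token arrival timestamps on a streaming response. A minimal sketch of that calculation follows; the function name and report shape are illustrative only and are not AgenticSwarmBench internals:

```python
# Derive TTFT and decode throughput from streamed token arrival times.
# Illustrative sketch -- not the actual AgenticSwarmBench implementation.

def speed_metrics(request_start: float, token_times: list[float]) -> dict:
    """token_times: monotonic timestamps (seconds) of each streamed token."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start          # time to first token
    gen_window = token_times[-1] - token_times[0]  # decode-phase duration
    # Throughput over the decode phase; 0.0 for single-token replies.
    tok_per_s = (len(token_times) - 1) / gen_window if gen_window > 0 else 0.0
    return {"ttft_s": round(ttft, 3), "tok_per_s": round(tok_per_s, 1)}

# Example: 64 tokens, first at t=1.4s, then one every 16 ms (~62.5 tok/s)
times = [1.4 + 0.016 * i for i in range(64)]
print(speed_metrics(0.0, times))
```

Note that throughput is measured over the decode window only, so a long TTFT does not drag down the tok/s figure.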

8 entries
| # | Entry | Type | Model | Scenario | TTFT (short ctx) | Tok/s (short ctx) | TTFT (long ctx) | Tok/s (long ctx) | Rating | Date |
|---|-------|------|-------|----------|------------------|-------------------|-----------------|------------------|--------|------|
| 🥇 | OpenAI API | API | GPT-5.4 | markdown-note-app | 1.4s | 63 | 1.6s | 65 | MARGINAL | 2026-04-14 |
| 🥈 | Anthropic API | API | Claude Opus 4.6 | markdown-note-app | 1.7s | 84 | 2.0s | 87 | MARGINAL | 2026-04-14 |
| 🥉 | xAI API | API | Grok 4.20 | markdown-note-app | 759ms | 165 | 822ms | 132 | MARGINAL | 2026-04-14 |
| 4 | Google API | API | Gemini 3.1 Pro | markdown-note-app | 30.9s | 20 | 66.2s | 137 | POOR | 2026-04-14 |
| 5 | DeepSeek API | API | DeepSeek V3.2 | markdown-note-app | 1.3s | 32 | 1.5s | 32 | MARGINAL | 2026-04-14 |
| 6 | Mistral API | API | Mistral Large 3 | markdown-note-app | 461ms | 71 | - | - | GOOD | 2026-04-14 |
| 7 | Groq API | Groq | GPT-OSS 120B | markdown-note-app | 215ms | 384 | - | - | GOOD | 2026-04-14 |
| 8 | Together API | Together | GLM-5.1 | markdown-note-app | 1.4s | 182 | 2.2s | 58 | MARGINAL | 2026-04-14 |

Submit Your Results

Have a serving stack, model, or hardware combo that's not listed? Run the benchmark and submit your results via pull request. Every entry requires a reproducible configuration and raw metrics.

1. Install: `pip install agentic-swarm-bench`

2. Run: `asb speed --endpoint YOUR_ENDPOINT --model YOUR_MODEL --suite full`

3. Submit the JSON report as a PR to `data/leaderboard/`
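A submitted report might look something like the sketch below, using the Mistral row from the table above. The field names and the output filename are hypothetical; the actual schema is whatever `asb speed` emits, so use the tool's own JSON output rather than hand-writing one:

```python
# Hypothetical leaderboard entry -- field names and filename are
# illustrative, NOT the official AgenticSwarmBench report schema.
import json

entry = {
    "provider": "Mistral API",
    "model": "Mistral Large 3",
    "scenario": "markdown-note-app",
    "metrics": {"ttft_ms": 461, "tok_per_s": 71},
    "rating": "GOOD",
    "date": "2026-04-14",
}

# The file would live under data/leaderboard/ in the PR.
with open("mistral-large-3.markdown-note-app.json", "w") as f:
    json.dump(entry, f, indent=2)
```

Keeping one JSON file per provider/model/scenario combination makes each leaderboard row independently reviewable in the pull request.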