Ollama Cloud Benchmark – Latest Run

Generated 2026-05-13 13:16 UTC

Model              Size  Context  Quant  TTFT (s)     TPS (tok/s)     Error
deepseek-v4-flash  158B  1048576  FP8    0.59 (3.33)  37.94 (30.78)
deepseek-v4-pro    1T    1048576  FP8    9.96 (3.53)  13.18 (30.01)
glm-5.1            756B  202752   FP8    3.14 (1.73)  15.32 (31.26)
gemma4:31b         32B   262144   BF16   0.39 (1.13)  127.90 (48.92)
kimi-k2.6          1T    262144   INT4   n/a (1.85)   n/a (39.66)     503 Service Unavailable
minimax-m2.5       230B  196608   FP8    0.35 (1.11)  37.25 (35.20)

TTFT = time to first token, in seconds; TPS = generation throughput, in tokens per second.
kimi-k2.6 error: Server error '503 Service Unavailable' for url 'https://ollama.com/api/chat'. For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503

Raw data: results.json | History: history.json (GitHub run 444)
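The page does not state how TTFT and TPS are measured. One plausible approach, sketched here under stated assumptions, is to timestamp each chunk of a streaming /api/chat response: TTFT is the delay until the first non-empty content chunk, and TPS comes from the eval_count and eval_duration (nanoseconds) fields that the Ollama API reports in the final `done` chunk, falling back to counting content chunks over wall-clock decode time. The names `summarize_stream` and `StreamStats` and the `(arrival_time, chunk)` input shape are hypothetical; the demo uses synthetic chunks, so no network call is made.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class StreamStats:
    ttft_s: Optional[float]  # seconds from request start to first content chunk
    tps: Optional[float]     # generated tokens per second


def summarize_stream(start: float, chunks: Iterable[Tuple[float, dict]]) -> StreamStats:
    """Compute TTFT and TPS from (arrival_time, chunk) pairs.

    Assumption: chunks are shaped like Ollama's streaming /api/chat objects,
    {"message": {"content": "..."}, "done": False}, ending with a chunk that
    has "done": True plus "eval_count" and "eval_duration" (nanoseconds).
    """
    first_token_t: Optional[float] = None
    last_token_t: Optional[float] = None
    content_chunks = 0
    final: Optional[dict] = None

    for t, chunk in chunks:
        text = chunk.get("message", {}).get("content", "")
        if text:
            content_chunks += 1
            last_token_t = t
            if first_token_t is None:
                first_token_t = t
        if chunk.get("done"):
            final = chunk

    ttft = None if first_token_t is None else first_token_t - start

    tps: Optional[float] = None
    if final and final.get("eval_count") and final.get("eval_duration"):
        # Server-reported decode throughput: tokens / duration in seconds.
        tps = final["eval_count"] / (final["eval_duration"] / 1e9)
    elif first_token_t is not None and last_token_t is not None and last_token_t > first_token_t:
        # Fallback approximation: treat each content chunk as one token and
        # time the chunks that arrive after the first one.
        tps = (content_chunks - 1) / (last_token_t - first_token_t)

    return StreamStats(ttft_s=ttft, tps=tps)


if __name__ == "__main__":
    # Synthetic stream: request sent at t=10.0 s, first token at t=10.59 s.
    demo = [
        (10.59, {"message": {"content": "Hello"}, "done": False}),
        (10.62, {"message": {"content": " world"}, "done": False}),
        (10.65, {"message": {"content": ""}, "done": True,
                 "eval_count": 3, "eval_duration": 80_000_000}),
    ]
    stats = summarize_stream(start=10.0, chunks=demo)
    print(f"TTFT {stats.ttft_s:.2f} s, TPS {stats.tps:.2f}")  # prints: TTFT 0.59 s, TPS 37.50
```

Preferring the server-reported counters avoids assuming one token per streamed chunk, which does not always hold; the chunk-count fallback is only a rough estimate.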


Free Account Model Availability

Tested 2026-05-13 05:14 UTC: 24 of 39 models available.

Models tested (39):
cogito-2.1:671b
deepseek-v3.1:671b, deepseek-v3.2, deepseek-v4-flash, deepseek-v4-pro
devstral-2:123b, devstral-small-2:24b
gemini-3-flash-preview
gemma3:12b, gemma3:27b, gemma3:4b, gemma4:31b
glm-4.6, glm-4.7, glm-5, glm-5.1
gpt-oss:120b, gpt-oss:20b
kimi-k2-thinking, kimi-k2.5, kimi-k2.6, kimi-k2:1t
minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7
ministral-3:14b, ministral-3:3b, ministral-3:8b, mistral-large-3:675b
nemotron-3-nano:30b, nemotron-3-super
qwen3-coder-next, qwen3-coder:480b, qwen3-next:80b, qwen3-vl:235b, qwen3-vl:235b-instruct, qwen3.5:397b
rnj-1:8b
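The availability summary reports 24 of 39 models usable on a free account (about 62%). A hypothetical way to produce such a line is to send one minimal request per model and classify the HTTP status. The rule below (only a 200 response counts as available) and the helper name `summarize_availability` are assumptions, not the page's documented method. Probe results are passed in as a dict, with placeholder model names, so the sketch runs without network access.

```python
from typing import Dict, List, Tuple


def summarize_availability(statuses: Dict[str, int]) -> Tuple[List[str], List[str], str]:
    """Split models into available/unavailable by HTTP status code.

    Assumption: a 200 response to a minimal chat request means the model is
    usable on a free account; any other status (e.g. 401, 403, 503) does not.
    """
    available = sorted(m for m, code in statuses.items() if code == 200)
    unavailable = sorted(m for m, code in statuses.items() if code != 200)
    # Guard against an empty probe set to avoid division by zero.
    pct = 100 * len(available) / len(statuses) if statuses else 0.0
    summary = f"{len(available)} / {len(statuses)} available ({pct:.1f}%)"
    return available, unavailable, summary


if __name__ == "__main__":
    # Placeholder probe results, not data from this page.
    probes = {"model-a": 200, "model-b": 200, "model-c": 503}
    avail, unavail, line = summarize_availability(probes)
    print(line)     # prints: 2 / 3 available (66.7%)
    print(unavail)  # prints: ['model-c']
```

Treating every non-200 status as unavailable is deliberately conservative: a transient 503, like the one recorded for kimi-k2.6 in the benchmark table, would count against a model for that run.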