Ollama Cloud Benchmark – Latest Run

Generated 2026-05-13 15:30 UTC

Model              Size  Context  Quant  TTFT (s)      TPS            Error
deepseek-v4-flash  158B  1048576  FP8    31.42 (3.74)  17.29 (29.26)
deepseek-v4-pro    1T    1048576  FP8    8.12 (3.49)   17.38 (29.79)
glm-5.1            756B  202752   FP8    0.97 (1.53)   28.12 (30.53)
gemma4:31b         32B   262144   BF16   0.58 (1.04)   41.66 (51.91)
kimi-k2.6          1T    262144   INT4   30.34 (1.84)  8.20 (39.95)
minimax-m2.5       230B  196608   FP8    18.34 (1.19)  16.71 (35.50)

[Per-model trend charts omitted: TTFT on a 1 s to 10 s axis and TPS on a 10 to 100 axis, over the previous 24 h.]

Raw data: results.json | History: history.json (GitHub run 446)
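
For readers unfamiliar with the TTFT and TPS columns, here is a minimal sketch of how the two metrics could be derived from a streamed response. It is not the benchmark's own code. The definitions are assumptions: TTFT is taken as the time from request start to the first streamed chunk, and TPS as the number of chunks after the first divided by the time elapsed since the first chunk. The names `measure_stream` and `StreamStats` are hypothetical.

```python
import time
from typing import Callable, Iterable, NamedTuple, Optional


class StreamStats(NamedTuple):
    ttft_s: Optional[float]  # seconds from start to first chunk; None if nothing arrived
    tps: Optional[float]     # chunks per second after the first chunk; None if undefined
    tokens: int              # number of chunks received


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamStats:
    """Consume a token stream and time it (assumed metric definitions, see lead-in)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first chunk
        last = now
        count += 1

    if first is None:
        return StreamStats(None, None, 0)

    ttft = first - start
    decode_time = last - first
    # Throughput is only defined once at least two chunks have arrived.
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else None
    return StreamStats(ttft, tps, count)


if __name__ == "__main__":
    # Deterministic demo with a fake clock: start at 0.0 s, chunks at 2.0, 2.5, 3.0 s.
    ticks = iter([0.0, 2.0, 2.5, 3.0])
    print(measure_stream(["a", "b", "c"], clock=lambda: next(ticks)))
```

In a real measurement the iterable would be the streaming response from the model endpoint, and `start` should be taken immediately before the request is sent. Streamed chunks do not always correspond to single tokens, so a token count reported by the server, where available, would give a more accurate TPS.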


Free Account Model Availability

Tested 2026-05-13 05:14 UTC — 24 / 39 available

Models tested (39):

cogito-2.1:671b deepseek-v3.1:671b deepseek-v3.2 deepseek-v4-flash deepseek-v4-pro
devstral-2:123b devstral-small-2:24b gemini-3-flash-preview gemma3:12b gemma3:27b
gemma3:4b gemma4:31b glm-4.6 glm-4.7 glm-5 glm-5.1 gpt-oss:120b gpt-oss:20b
kimi-k2-thinking kimi-k2.5 kimi-k2.6 kimi-k2:1t minimax-m2 minimax-m2.1 minimax-m2.5
minimax-m2.7 ministral-3:14b ministral-3:3b ministral-3:8b mistral-large-3:675b
nemotron-3-nano:30b nemotron-3-super qwen3-coder-next qwen3-coder:480b qwen3-next:80b
qwen3-vl:235b qwen3-vl:235b-instruct qwen3.5:397b rnj-1:8b