Vision Model Leaderboard

Intelligence vs. Efficiency Trade-offs

The latest benchmarks reveal a clear bifurcation in the vision model landscape. Gemini 3 Pro Preview has set a new high-water mark for intelligence with an 80% score on MMMU Pro. However, that score comes at a steep latency cost: its time to first token (TTFT) exceeds 21 seconds.

For production environments requiring speed, Gemini 2.5 Flash and Llama 4 Maverick dominate the "Golden Quadrant" of high throughput and low cost. Both offer exceptional throughput (>120 tokens/sec) and very low image input pricing ($0.20-$0.40 per 1k images), making them ideal for high-volume vision tasks.
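The "Golden Quadrant" test can be sketched as a simple filter. The thresholds below come from the figures quoted above (more than 120 tokens/sec, at most $0.40 per 1k images); the `VisionModel` class, the `in_golden_quadrant` function, and the three candidate rows are hypothetical placeholders for illustration, not leaderboard entries.

```python
from dataclasses import dataclass

# Thresholds taken from the text: >120 tokens/sec and at most $0.40 per 1k images.
MIN_TOKENS_PER_SEC = 120.0
MAX_PRICE_PER_1K_IMAGES = 0.40


@dataclass(frozen=True)
class VisionModel:
    """Hypothetical record type; field names are assumptions."""
    name: str
    tokens_per_sec: float
    price_per_1k_images: float  # USD


def in_golden_quadrant(
    model: VisionModel,
    min_tps: float = MIN_TOKENS_PER_SEC,
    max_price: float = MAX_PRICE_PER_1K_IMAGES,
) -> bool:
    """True when the model is both fast enough and cheap enough."""
    return model.tokens_per_sec > min_tps and model.price_per_1k_images <= max_price


# Placeholder rows with made-up numbers, NOT real benchmark results.
candidates = [
    VisionModel("fast-cheap-example", 150.0, 0.30),
    VisionModel("fast-expensive-example", 150.0, 4.00),
    VisionModel("slow-cheap-example", 60.0, 0.30),
]

golden = [m.name for m in candidates if in_golden_quadrant(m)]
print(golden)
```

Only the first placeholder passes both checks. In practice the thresholds would likely be tuned to the workload rather than fixed at these values.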

Legacy and high-reasoning models like Claude Opus 4.5 and Claude 4 Sonnet remain expensive options. At $4.00+ per 1k images, they cost at least ten times the $0.20-$0.40 charged at the efficient end of the market, a significant price disparity compared to the newer, optimized Flash and Qwen architectures.
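To make the price gap concrete, here is a minimal sketch of the arithmetic. The per-1k-image prices are the ones quoted in this section; the monthly volume of one million images and the helper name `image_cost_usd` are assumptions chosen for illustration. Because the premium figure is quoted as "$4.00+", the premium cost and the ratios are lower bounds.

```python
def image_cost_usd(num_images: int, price_per_1k_images: float) -> float:
    """Image input cost in USD for a given volume and per-1k-image price."""
    return num_images / 1000 * price_per_1k_images


# Hypothetical workload; not a figure from the leaderboard.
MONTHLY_IMAGES = 1_000_000

# Prices per 1k images as quoted in the text.
prices = {
    "optimized tier, low end": 0.20,
    "optimized tier, high end": 0.40,
    "premium tier, lower bound": 4.00,
}

costs = {label: image_cost_usd(MONTHLY_IMAGES, p) for label, p in prices.items()}
for label, cost in costs.items():
    print(f"{label}: ${cost:,.2f} per month")

# Minimum price multiples of the premium tier over the optimized tier.
ratio_vs_high_end = prices["premium tier, lower bound"] / prices["optimized tier, high end"]
ratio_vs_low_end = prices["premium tier, lower bound"] / prices["optimized tier, low end"]
print(f"premium is at least {ratio_vs_high_end:.0f}x to {ratio_vs_low_end:.0f}x more expensive")
```

This covers image input pricing only; text token charges, which the section does not quote, would be added on top.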

Highest Intelligence: 80% on MMMU Pro (Gemini 3 Pro)

Fastest Speed: 180 t/s (Gemini 2.5 Flash)

Best Latency: 0.42s TTFT (Apriel-v1.5)

Best Price: $0.20 per 1k images (Flash/Qwen)