Text-to-Video Arena • Dec 2025

Text-to-Video Models

Comparing frontier AI capabilities in video generation. Analysis based on 103,163+ human preference votes.

The Battle for Video Supremacy

As of December 2025, the text-to-video landscape is a close contest between established giants and agile newcomers. Google's Veo 3.1 series holds the top spot with an ELO of 1386, edging out OpenAI's Sora 2 Pro.

A significant trend is the rise of audio-integrated video generation. The top 3 models (Veo 3.1 Fast, Veo 3.1, and Veo 3 Fast) all feature native audio capabilities, suggesting that users heavily prioritize multimodal coherence in their evaluations.

Proprietary models dominate the top 15, but the open-weight ecosystem is represented by Alibaba's Wan 2.5 (Rank 7) and Mochi-v1 (Rank 23). Wan 2.5 reaches an ELO of 1305, 81 points behind the leader, which keeps it competitive with closed-source alternatives.
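To put the gap between the 1386 and 1305 ratings in perspective, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic formula with base 10 and a 400-point scale. The arena's actual rating method is not stated here, so the function name `expected_score` and the `scale` parameter are illustrative assumptions, not the leaderboard's implementation.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of votes for A over B under the classic Elo model.

    Assumption: base-10 logistic curve with a 400-point scale, which is the
    textbook Elo convention and may differ from the arena's own method.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the text: the top-ranked model and Wan 2.5.
top_rating = 1386
wan_25_rating = 1305

p_top = expected_score(top_rating, wan_25_rating)
print(f"Expected preference rate for the higher-rated model: {p_top:.2f}")
print(f"Expected preference rate for the lower-rated model: {1 - p_top:.2f}")
```

Under these assumptions, an 81-point gap corresponds to the higher-rated model being preferred in roughly 61% of direct comparisons, which is a clear but not overwhelming edge.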

#1 Model: Veo 3.1, Google (Fast Audio)
Highest ELO: 1,386 (+145 / -9 Confidence)
Total Votes: 103k+ (Crowdsourced Evaluation)
Top Open Model: Wan 2.5, Alibaba (Rank 7)
[Charts: Top 5 Models (ELO Score); Win Rate vs Average; Organization Presence (Top 20)]
Rank Model Organization ELO Score Votes License