Text-to-Video Arena Leaderboard

The Battle for Video Supremacy

As of December 2025, the text-to-video landscape is a close contest between established giants and agile newcomers. Google's Veo 3.1 series holds the top spot with an Elo rating of 1386, edging out OpenAI's Sora 2 Pro.

A significant trend is the rise of audio-integrated video generation. The top three models (Veo 3.1 Fast, Veo 3.1, and Veo 3 Fast) all generate audio natively, which suggests that voters weigh multimodal coherence heavily in their evaluations.

While proprietary models dominate the top 15, Alibaba's Wan 2.5 (Rank 7) and Mochi-v1 (Rank 23) represent the open-weight ecosystem. Wan 2.5 reaches an Elo rating of 1305, which keeps it competitive with closed-source alternatives.
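To put the gap between 1386 and 1305 in perspective, a rating difference can be converted into an expected head-to-head preference rate. The sketch below assumes the classic Elo logistic formula with a 400-point scale. The leaderboard does not state its exact rating method, so the function name `expected_score`, the scale, and the resulting figure are illustrative assumptions rather than the arena's own calculation.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of votes for A over B under the standard Elo logistic model.

    Assumption: the arena's ratings behave like classic Elo ratings with a
    400-point scale. The source does not confirm this.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the text above.
veo_rating = 1386   # top-ranked Veo 3.1 entry
wan_rating = 1305   # Wan 2.5, Rank 7

p_veo = expected_score(veo_rating, wan_rating)
p_wan = expected_score(wan_rating, veo_rating)

print(f"Expected preference for the top model: {p_veo:.1%}")
print(f"Expected preference for Wan 2.5:       {p_wan:.1%}")
```

Under these assumptions, an 81-point gap corresponds to the top model being preferred in roughly 61% of direct comparisons. That is a clear edge but far from a sweep, which is consistent with calling Wan 2.5 competitive.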

#1 Model: Veo 3.1, Google (Fast Audio)

Highest Elo: 1,386 (+145 / -9 Confidence)

Total Votes: 103k+ (Crowdsourced Evaluation)

Top Open Model: Wan 2.5, Alibaba (Rank 7)