Comparing frontier AI capabilities in video generation. Analysis based on 103,163+ human preference votes.
As of December 2025, the Text-to-Video landscape is witnessing a fierce rivalry between established giants and agile newcomers. Google's Veo 3.1 series has claimed the top spot with an ELO of 1386, edging out OpenAI's Sora 2 Pro.
A significant trend is the rise of audio-integrated video generation. The top 3 models (Veo 3.1 Fast, Veo 3.1, and Veo 3 Fast) all feature native audio capabilities, suggesting that users heavily prioritize multimodal coherence in their evaluations.
While proprietary models dominate the top 15, Alibaba's Wan 2.5 (Rank 7) and Mochi-v1 (Rank 23) represent the open-weight ecosystem. Wan 2.5's ELO score of 1305 shows that it is competitive with closed-source alternatives.
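To give a feel for what these score gaps mean, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo formula with base 10 and a 400-point scale; the leaderboard's exact rating method is not stated here, so treat the result as an approximation. The function name `expected_win_rate` is illustrative only.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of votes for A over B under the standard Elo model.

    Assumption: base-10 logistic curve with a 400-point scale. The
    leaderboard may use a different variant.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the text: the top Veo 3.1 entry and Wan 2.5.
top_score = 1386
wan_score = 1305

p = expected_win_rate(top_score, wan_score)
print(f"Expected preference rate for the higher-rated model: {p:.1%}")
```

Under these assumptions, an 81-point gap corresponds to the higher-rated model being preferred in roughly 61% of pairwise votes, so a model ranked several places lower still wins a substantial share of comparisons.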
| Rank | Model | Organization | ELO Score |
|---|---|---|---|