Wan2.1-I2V-14B-720P (Turbo) API, Deployment, Pricing

Wan-AI/Wan2.1-I2V-14B-720P-Turbo

Wan2.1-I2V-14B-720P-Turbo is the TeaCache-accelerated version of the Wan2.1-I2V-14B-720P model, reducing single-video generation time by 30%. Wan2.1-I2V-14B-720P is an open-source image-to-video generation model in the Wan2.1 video foundation model suite. This 14B-parameter model generates 720P high-definition video and, after thousands of rounds of human evaluation, reaches state-of-the-art performance. It uses a diffusion transformer architecture and strengthens generation through a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction. The model also understands and processes both Chinese and English text, supporting video generation tasks in either language.

API Usage

cURL

curl --request POST \
  --url https://api.siliconflow.com/v1/video/submit \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "Wan-AI/Wan2.1-I2V-14B-720P-Turbo"
}'
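
The same request can be sketched in Python. This is a minimal illustration that only builds the request object using the URL, headers, and `model` field from the cURL example above; it does not send anything. The helper name `build_submit_request` and its `extra_fields` parameter are hypothetical. The example body contains only `model`, so any other fields an image-to-video job needs (such as a prompt or a source image) are assumptions to confirm against the API documentation.

```python
import json
import urllib.request
from typing import Optional

# Endpoint and model ID copied from the cURL example.
API_URL = "https://api.siliconflow.com/v1/video/submit"
MODEL_ID = "Wan-AI/Wan2.1-I2V-14B-720P-Turbo"


def build_submit_request(token: str, extra_fields: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example.

    `extra_fields` is a hypothetical hook for additional body fields;
    the field names the API accepts are not shown on this page.
    """
    payload = {"model": MODEL_ID}
    if extra_fields:
        payload.update(extra_fields)
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_submit_request("<token>")
    # Inspect the request before sending it.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To submit the job, pass the returned object to `urllib.request.urlopen` with a real token in place of `<token>`. The shape of the response is not described here, so parse it according to the API documentation.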

Details

Model Provider: Wan

Type: video

Sub Type: image-to-video

Publish Time: Apr 22, 2025

Price: 0.21 / Video

Wan2.2-T2V-A14B

Wan2.2-T2V-A14B, released by Alibaba, is the industry's first open-source video generation model with a Mixture-of-Experts (MoE) architecture. It focuses on text-to-video (T2V) generation and produces 5-second videos at both 480P and 720P resolutions. The MoE architecture expands total model capacity while keeping inference costs nearly unchanged: a high-noise expert handles the overall layout in the early denoising stages, and a low-noise expert refines video details in the later stages. Wan2.2 also incorporates meticulously curated aesthetic data with detailed labels for lighting, composition, and color, allowing more precise and controllable generation of cinematic styles. Compared to its predecessor, the model was trained on significantly larger datasets, which notably improves its generalization across motion, semantics, and aesthetics and enables better handling of complex dynamic effects.

MoE, 27B

Wan2.2-I2V-A14B (image-to-video)

Wan2.2-I2V-A14B is one of the industry's first open-source image-to-video generation models featuring a Mixture-of-Experts (MoE) architecture, released by Alibaba's AI initiative, Wan-AI. The model transforms a static image into a smooth, natural video sequence based on a text prompt. Its key innovation is the MoE architecture, which uses a high-noise expert for the initial video layout and a low-noise expert to refine details in later stages, improving model performance without increasing inference costs. Compared to its predecessors, Wan2.2 was trained on a significantly larger dataset, which notably improves its handling of complex motion, aesthetics, and semantics and yields more stable videos with fewer unrealistic camera movements.

MoE, 27B

Wan2.1-T2V-14B (Turbo) (text-to-video)

Wan2.1-T2V-14B-Turbo is the TeaCache-accelerated version of the Wan2.1-T2V-14B model, reducing single-video generation time by 30%. The Wan2.1-T2V-14B model has established state-of-the-art performance benchmarks among both open-source and closed-source models and generates high-quality visual content with significant dynamic effects. It is the first video model capable of generating both Chinese and English text within videos, and it supports video generation at 480P and 720P resolutions. The model adopts a diffusion transformer architecture and strengthens its generative capabilities through a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction.

14B

Wan2.1-I2V-14B-720P (image-to-video)

Wan2.1-I2V-14B-720P is an open-source image-to-video generation model in the Wan2.1 video foundation model suite. This 14B-parameter model generates 720P high-definition video and, after thousands of rounds of human evaluation, reaches state-of-the-art performance. It uses a diffusion transformer architecture and strengthens generation through a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction. The model also understands and processes both Chinese and English text, supporting video generation tasks in either language.

14B, Img2Video

Wan2.1-T2V-14B (text-to-video)

Wan2.1-T2V-14B is an open-source text-to-video generation model. This 14B-parameter model has established state-of-the-art performance benchmarks among both open-source and closed-source models and generates high-quality visual content with significant dynamic effects. It is the first video model capable of generating both Chinese and English text within videos, and it supports video generation at 480P and 720P resolutions. The model adopts a diffusion transformer architecture and strengthens its generative capabilities through a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction.

14B

