Wan2.1-T2V-14B API, Fine-Tuning, Deployment

Wan-AI/Wan2.1-T2V-14B

Wan2.1-T2V-14B is an open-source text-to-video generation model. The 14B model is reported to set state-of-the-art benchmark results against both open-source and closed-source models, generating high-quality video with pronounced motion. It is described as the only video model that can generate both Chinese and English text within its output, and it supports video generation at 480P and 720P resolutions. The model uses a diffusion transformer architecture, with its generative capability strengthened by a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction.

API Usage

cURL

curl --request POST \
  --url https://api.siliconflow.com/v1/video/submit \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "Wan-AI/Wan2.1-T2V-14B"
}'
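The same request can be sketched in Python using only the standard library. The endpoint, headers, and `model` field come from the cURL example above. The `prompt` field is an assumption, since this page does not show the full request body, and `build_submit_request` is a hypothetical helper name. Check both against the provider's API reference before relying on them.

```python
import json
import urllib.request

# Endpoint taken from the cURL example on this page.
API_URL = "https://api.siliconflow.com/v1/video/submit"


def build_submit_request(token, prompt, model="Wan-AI/Wan2.1-T2V-14B"):
    """Build (but do not send) a POST request that submits a video job.

    ASSUMPTION: the "prompt" field name is not shown on this page; it is
    included here only to illustrate where the text description would go.
    """
    payload = {
        "model": model,    # model ID as listed on this page
        "prompt": prompt,  # assumed field name, see docstring
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_submit_request("<token>", "A paper boat drifting down a rainy street")
    # Inspect the request locally; nothing is sent over the network here.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually submit the job, pass the request to `urllib.request.urlopen(req)` with a real token. The response format is not described on this page, so parse it according to the provider's documentation.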

Details

Model Provider: Wan

Type: video

Sub Type: text-to-video

Publish Time: Feb 22, 2025

Price: 0.29 / Video

Wan2.1-I2V-14B-720P

Wan2.1-I2V-14B-720P is an open-source image-to-video generation model, part of the Wan2.1 video foundation model suite. This 14B model generates 720P high-definition video and, based on thousands of rounds of human evaluation, is reported to reach state-of-the-art performance. It uses a diffusion transformer architecture, with generation quality strengthened by a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction. The model also understands and processes both Chinese and English text.

14B, Img2Video

image-to-video

Wan2.1-I2V-14B-720P (Turbo)

Wan2.1-I2V-14B-720P-Turbo is the TeaCache-accelerated version of the Wan2.1-I2V-14B-720P model, reducing the generation time for a single video by 30%. The underlying Wan2.1-I2V-14B-720P is an open-source image-to-video generation model, part of the Wan2.1 video foundation model suite. This 14B model generates 720P high-definition video and, based on thousands of rounds of human evaluation, is reported to reach state-of-the-art performance. It uses a diffusion transformer architecture, with generation quality strengthened by a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction. The model also understands and processes both Chinese and English text.

14B, Img2Video

text-to-video

Wan2.1-T2V-14B (Turbo)

Wan2.1-T2V-14B-T is the TeaCache-accelerated version of the Wan2.1-T2V-14B model, reducing the generation time for a single video by 30%. The Wan2.1-T2V-14B model is reported to set state-of-the-art benchmark results against both open-source and closed-source models, generating high-quality video with pronounced motion. It is described as the only video model that can generate both Chinese and English text within its output, and it supports video generation at 480P and 720P resolutions. The model uses a diffusion transformer architecture, with its generative capability strengthened by a spatiotemporal variational autoencoder (VAE), scalable training strategies, and large-scale data construction.

14B
