Qwen2.5-VL-72B-Instruct
About Qwen2.5-VL-72B-Instruct
Qwen2.5-VL is a vision-language model in the Qwen2.5 series that shows significant enhancements in several aspects: it has strong visual understanding capabilities, recognizing common objects while analyzing texts, charts, and layouts in images; it functions as a visual agent capable of reasoning and dynamically directing tools; it can comprehend videos over 1 hour long and capture key events; it accurately localizes objects in images by generating bounding boxes or points; and it supports structured outputs for scanned data like invoices and forms. The model demonstrates excellent performance across various benchmarks including image, video, and agent tasks
Available Serverless
Run queries immediately, pay only for usage
$
0.59
/
$
0.59
Per 1M Tokens (input/output)
Metadata
Specification
State
Available
Architecture
Vision-Language Model (VLM) with a Streamlined and Efficient Vision Encoder (ViT with window attention, SwiGLU, RMSNorm) aligned with the Qwen2.5 LLM structure. Features include Dynamic Resolution and Frame Rate Training for video understanding, mRoPE for temporal sequence and speed, and YaRN for long text context length extrapolation.
Calibrated
No
Mixture of Experts
No
Total Parameters
72B
Activated Parameters
72B
Reasoning
No
Precision
FP8
Context length
131K
Max Tokens
4K
Supported Functionality
Serverless
Supported
Serverless LoRA
Not supported
Fine-tuning
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported
JSON Mode
Not supported
Structured Outputs
Not supported
Tools
Not supported
Fim Completion
Not supported
Chat Prefix Completion
Supported
Compare with Other Models
See how this model stacks up against others.

Qwen
chat
Qwen3-VL-32B-Instruct
Release on: Oct 21, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.2
/ M Tokens
Output:
$
0.6
/ M Tokens

Qwen
chat
Qwen3-VL-32B-Thinking
Release on: Oct 21, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.2
/ M Tokens
Output:
$
1.5
/ M Tokens

Qwen
chat
Qwen3-VL-8B-Instruct
Release on: Oct 15, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.18
/ M Tokens
Output:
$
0.68
/ M Tokens

Qwen
chat
Qwen3-VL-8B-Thinking
Release on: Oct 15, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.18
/ M Tokens
Output:
$
2.0
/ M Tokens

Qwen
chat
Qwen3-VL-235B-A22B-Instruct
Release on: Oct 4, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.3
/ M Tokens
Output:
$
1.5
/ M Tokens

Qwen
chat
Qwen3-VL-235B-A22B-Thinking
Release on: Oct 4, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.45
/ M Tokens
Output:
$
3.5
/ M Tokens

Qwen
chat
Qwen3-VL-30B-A3B-Instruct
Release on: Oct 5, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.29
/ M Tokens
Output:
$
1.0
/ M Tokens

Qwen
chat
Qwen3-VL-30B-A3B-Thinking
Release on: Oct 11, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.29
/ M Tokens
Output:
$
1.0
/ M Tokens

Qwen
image-to-video
Wan2.2-I2V-A14B
Release on: Aug 13, 2025
$
0.29
/ Video
