Qwen2.5-VL-72B-Instruct
Qwen/Qwen2.5-VL-72B-Instruct
Qwen2.5-VL is a vision-language model in the Qwen2.5 series with improvements in several areas. It recognizes common objects and analyzes text, charts, and layouts in images. It acts as a visual agent that can reason and dynamically direct tools. It can comprehend videos more than one hour long and capture their key events. It localizes objects in images accurately by generating bounding boxes or points, and it supports structured outputs for scanned data such as invoices and forms. The model performs strongly across image, video, and agent benchmarks.

Details
Model Provider: Qwen2.5
Type: text
Sub Type: chat
Size: 72B
Publish Time: Jan 28, 2025
Input Price: $0.59 / M Tokens
Output Price: $0.59 / M Tokens
Context length: 131072 tokens
Tags: 72B, 128K
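
The pricing and context-length figures in the details can be turned into a quick budget check. The following is a minimal sketch, not an official calculator: the function names `estimate_cost` and `fits_context` are hypothetical, and it assumes that input and output tokens share the same context window and that billing is strictly linear in token count.

```python
# Figures copied from the listing; everything else is illustrative.
INPUT_PRICE_PER_M = 0.59    # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.59   # USD per million output tokens
CONTEXT_LENGTH = 131072     # tokens (128 * 1024, matching the "128K" tag)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and completion tokens count against one shared window."""
    return input_tokens + output_tokens <= CONTEXT_LENGTH


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
    print(fits_context(100_000, 2_000))
```

Because the input and output prices are identical here, the cost depends only on the total token count; the two rates are kept separate so the sketch still works if they diverge.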