Models

Products

Pricing

Docs

Blog

About

Contact

Back to Models

Qwen2.5-VL-72B-Instruct

Qwen/Qwen2.5-VL-72B-Instruct

Qwen2.5-VL is a vision-language model in the Qwen2.5 series that shows significant enhancements in several aspects: it has strong visual understanding capabilities, recognizing common objects while analyzing texts, charts, and layouts in images; it functions as a visual agent capable of reasoning and dynamically directing tools; it can comprehend videos over 1 hour long and capture key events; it accurately localizes objects in images by generating bounding boxes or points; and it supports structured outputs for scanned data like invoices and forms. The model demonstrates excellent performance across various benchmarks including image, video, and agent tasks

API Usage

cURL

Python

JavaScript

curl --request POST \
  --url https://api.ap.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "Qwen/Qwen2.5-VL-72B-Instruct",
  "stream": false,
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.7,
  "top_k": 50,
  "frequency_penalty": 0.5,
  "n": 1,
  "stop": []
}'

Details

Model Provider

Qwen2.5

Type

text

Sub Type

chat

Size

72

Publish Time

Jan 28, 2025

Input Price

$

0.59

/ M Tokens

Output Price

$

0.59

/ M Tokens

Context length

131072

Tags

72B,128K

Open in Playround

API Reference

Ready to accelerate your AI development?

Ready to accelerate your AI development?

PAGES

MODELS

PRODUCTS

© 2025 SiliconFlow Technology PTE. LTD.

·

PAGES

MODELS

PRODUCTS

© 2025 SiliconFlow Technology PTE. LTD.

·

PAGES

MODELS

PRODUCTS

© 2025 SiliconFlow Technology PTE. LTD.

·