Models

Products

Pricing

Docs

Blog

About

Contact

🎉 Kimi-K3 is available on SiliconFlow. Try it NOW.

Models

Qwen2.5-VL-32B-Instruct

Qwen2.5-VL-32B-Instruct

API Reference

About Qwen2.5-VL-32B-Instruct

Qwen2.5-VL-32B-Instruct is a multimodal large language model released by the Qwen team, part of the Qwen2.5-VL series. This model is not only proficient in recognizing common objects but is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. It acts as a visual agent that can reason and dynamically direct tools, capable of computer and phone use. Additionally, the model can accurately localize objects in images, and generate structured outputs for data like invoices and tables. Compared to its predecessor Qwen2-VL, this version has enhanced mathematical and problem-solving abilities through reinforcement learning, with response styles adjusted to better align with human preferences

Use Case

Explore how Qwen2.5-VL-32B-Instruct's multimodal intelligence and agentic capabilities solve complex visual and analytical challenges.

Document Data Extraction

Automate data extraction from invoices, forms, and reports, structuring information for efficient processing.

Use Case Example:

"Extracted vendor, item, and total amounts from thousands of scanned invoices, populating a database and cutting manual entry time by 80%."

Visual UI Automation

Automate complex interactions on web or mobile apps by visually understanding layouts and directing actions.

Use Case Example:

"An AI agent navigated an e-commerce site, added items, and completed checkout, adapting to UI changes for robust automation."

Video Event Detection

Analyze long video streams to detect specific events, objects, or activities with precise timestamps and summaries.

Use Case Example:

"Monitored security footage, pinpointing unauthorized access instances and generating alerts with relevant video clips."

Interactive STEM Learning

Provide step-by-step solutions for problems in textbooks, diagrams, or handwritten notes, enhancing STEM education.

Use Case Example:

"Solved a challenging physics problem by analyzing a diagram and equations, providing a detailed, step-by-step derivation."

Metadata

Create on

Mar 24, 2025

License

APACHE-2.0

Provider

Qwen

HuggingFace

Qwen2.5-VL-32B-Instruct

Specification

State

Deprecated

Architecture

Multimodal Transformer

Calibrated

Yes

Mixture of Experts

Total Parameters

32B

Activated Parameters

32B

Reasoning

Precision

FP8

Context length

131K

Max Tokens

131K

Compare with Other Models

See how this model stacks up against others.

Qwen

chat

Qwen3-VL-32B-Instruct

Release on: Oct 21, 2025

Total Context:

262K

Max output:

262K

Input:

0.2

/ M Tokens

Output:

0.6

/ M Tokens

Qwen

chat

Qwen3-VL-32B-Thinking

Release on: Oct 21, 2025

Total Context:

262K

Max output:

262K

Input:

0.2

/ M Tokens

Output:

1.5

/ M Tokens

Qwen

chat

Qwen3-VL-8B-Instruct

Release on: Oct 15, 2025

Total Context:

262K

Max output:

262K

Input:

0.18

/ M Tokens

Output:

0.68

/ M Tokens

Qwen

chat

Qwen3-VL-8B-Thinking

Release on: Oct 15, 2025

Total Context:

262K

Max output:

262K

Input:

0.18

/ M Tokens

Output:

2.0

/ M Tokens

Qwen

chat

Qwen3-VL-235B-A22B-Instruct

Release on: Oct 4, 2025

Total Context:

262K

Max output:

262K

Input:

0.3

/ M Tokens

Output:

1.5

/ M Tokens

Qwen

chat

Qwen3-VL-235B-A22B-Thinking

Release on: Oct 4, 2025

Total Context:

262K

Max output:

262K

Input:

0.45

/ M Tokens

Output:

3.5

/ M Tokens

Qwen

chat

Qwen3-VL-30B-A3B-Instruct

Release on: Oct 5, 2025

Total Context:

262K

Max output:

262K

Input:

0.29

/ M Tokens

Output:

1.0

/ M Tokens

Qwen

chat

Qwen3-VL-30B-A3B-Thinking

Release on: Oct 11, 2025

Total Context:

262K

Max output:

262K

Input:

0.29

/ M Tokens

Output:

1.0

/ M Tokens

Qwen

image-to-video

Wan2.2-I2V-A14B

Release on: Aug 13, 2025

0.29

/ Video

Ready to accelerate your AI development?

Ready to accelerate your AI development?

Ready to accelerate your AI development?

PAGES

MODELS

PRODUCTS

PAGES

MODELS

PRODUCTS

PAGES

MODELS

PRODUCTS