Qwen3-VL-32B Now on SiliconFlow: Flagship-Level Intelligence with Dense-Model Efficiency

Oct 28, 2025

Table of Contents

  • Key Features & Benchmark Performance

  • Real-world Application Scenarios

  • Get Started Immediately

TL;DR: Qwen3-VL-32B, the latest addition to the Qwen3-VL family, is now available on SiliconFlow. With just 32B dense parameters, it achieves flagship-level multimodal reasoning and comprehension, outperforming GPT-5 mini and Claude 4 Sonnet while delivering faster responses, lower cost, and an outstanding balance of efficiency and performance. Start building today via SiliconFlow's OpenAI/Anthropic-compatible API and unlock flagship-level intelligence with dense-model efficiency.

Building on the success of the Qwen3-VL-235B and Qwen3-VL-8B models already available on SiliconFlow, Qwen3-VL-32B further enriches the Qwen3-VL family, completing coverage of vision-language understanding scenarios from lightweight to flagship-level. Despite using only 32B parameters, it achieves performance comparable to models as large as 235B, and even surpasses them on benchmarks like OSWorld, showcasing remarkable efficiency and reasoning strength.

Through SiliconFlow's Qwen3-VL-32B API, you can expect:

  • Cost-effective Pricing

  • Two Model Variants:

    • Instruct: delivers faster responses and more stable execution, ideal for dialogue and tool-calling tasks.

    • Thinking: enhances long-chain reasoning and complex visual understanding, capable of "seeing and thinking" through challenging multimodal problems.

  • 262K Context Window: enables seamless processing of lengthy documents and multi-turn conversations.
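To get an intuition for what a 262K-token window can hold, here is a back-of-envelope estimate. It assumes 262K means 262,144 tokens and uses the common rough heuristic of ~4 characters per English token; the Qwen3-VL tokenizer's actual ratio will differ, so treat the numbers as orders of magnitude only.

```python
# Rough estimate of how much plain English text fits in a 262K-token context.
# Both ratios below are heuristics, not properties of the Qwen3-VL tokenizer.
CONTEXT_TOKENS = 262_144     # assuming 262K = 256 * 1024 tokens
CHARS_PER_TOKEN = 4          # common rough heuristic for English prose
CHARS_PER_PAGE = 3_000       # ~500 words per printed page

chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # total characters that fit
pages = chars / CHARS_PER_PAGE             # equivalent printed pages

print(f"~{chars:,} characters, roughly {pages:.0f} pages of text")
```

In other words, a single request can carry on the order of a few hundred pages of documents alongside the conversation history.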

Whether you're exploring visual reasoning, document analysis, or multimodal agent development, SiliconFlow's Qwen3-VL-32B API makes it effortless to bring flagship-level multimodal intelligence into real-world applications.

Key Features & Benchmark Performance

The Qwen3-VL series powers multimodal intelligence across tasks, from visual understanding and content generation to reasoning and creative work, making seeing and understanding the world lighter, faster, and smarter.

Building on this foundation, the Qwen3-VL-32B series reaches new heights on both multimodal and pure-text benchmarks, combining dense-model efficiency with flagship-grade performance:

  • Multimodal Performance: Qwen3-VL-32B excels in STEM reasoning, VQA, OCR, video understanding, and agentic tasks, consistently outperforming GPT-5 mini and Claude 4 Sonnet across key categories.

  • Ranks #1 on OSWorld: highlighting its ability to "see, reason, and act" across complex visual-agentic tasks.

  • Textual & Reasoning Performance: Qwen3-VL-32B also demonstrates outstanding pure-text reasoning, with robust performance in language understanding and logical inference.




As of today, SiliconFlow offers a complete lineup of Qwen3-VL models, featuring:

  • Dense models: Qwen3-VL-8B and Qwen3-VL-32B

  • MoE models: Qwen3-VL-30B-A3B and Qwen3-VL-235B-A22B

Each model is available in both Instruct and Thinking variants, so developers can flexibly access the corresponding API services and choose the right balance of performance, efficiency, and reasoning depth.
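The lineup maps onto API model IDs in a regular way, sketched by the small helper below. Only `Qwen/Qwen3-VL-32B-Thinking` appears in the Get Started example on this page; the other ID strings are assumed to follow the same `Qwen/<model>-<variant>` naming pattern and should be verified against the SiliconFlow model catalog.

```python
# Illustrative helper for choosing a Qwen3-VL model ID on SiliconFlow.
# Only "Qwen/Qwen3-VL-32B-Thinking" is confirmed by this page; the other
# IDs are assumptions based on the same naming pattern.
SIZES = {
    "8B": "Qwen3-VL-8B",            # dense
    "32B": "Qwen3-VL-32B",          # dense
    "30B-A3B": "Qwen3-VL-30B-A3B",  # MoE
    "235B-A22B": "Qwen3-VL-235B-A22B",  # MoE
}

def pick_model(size: str, deep_reasoning: bool) -> str:
    """Thinking for long-chain reasoning; Instruct for fast dialogue and tool use."""
    variant = "Thinking" if deep_reasoning else "Instruct"
    return f"Qwen/{SIZES[size]}-{variant}"

print(pick_model("32B", deep_reasoning=True))
```

A pattern like this keeps the performance/latency trade-off a one-line configuration change in application code.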


Real-world Application Scenarios

Built for both developers and researchers, Qwen3-VL-32B unlocks new possibilities across multimodal AI applications:

  • Video Comprehension & Analysis: identify actions, summarize scenes, and track temporal dynamics in long videos for automation or media intelligence.

  • Visual Reasoning & STEM Tasks: interpret diagrams, scientific charts, and complex math problems with contextual reasoning, ideal for education, research, and technical documentation.

  • Multimodal Agents: connect perception and reasoning to build intelligent assistants capable of understanding images, analyzing data, and taking contextual actions.

  • Document & OCR Understanding: extract and summarize key information from scanned documents, receipts, or handwritten notes with high precision.


Get Started Immediately

  1. Explore: Try Qwen3-VL-32B in the SiliconFlow playground.

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Multimodal request: one image plus a text prompt in a single user turn.
payload = {
    "model": "Qwen/Qwen3-VL-32B-Thinking",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://sf-maas.s3.us-east-1.amazonaws.com/images/recufyDh5zjKVl.png"}
                },
                {
                    "type": "text",
                    "text": "what's this?"
                }
            ]
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
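Because the endpoint is OpenAI-compatible, the same request can also be made with the official `openai` Python SDK by overriding its base URL. A minimal sketch (the `SILICONFLOW_API_KEY` environment-variable name is our convention for this example, not an official one):

```python
import os

# Same endpoint as the requests example; the SDK only needs the base URL swapped.
BASE_URL = "https://api.siliconflow.com/v1"
# SILICONFLOW_API_KEY is an assumed env-var name; use whichever variable
# holds your SiliconFlow token.
api_key = os.environ.get("SILICONFLOW_API_KEY")

if api_key:
    from openai import OpenAI  # pip install openai

    client = OpenAI(api_key=api_key, base_url=BASE_URL)
    response = client.chat.completions.create(
        model="Qwen/Qwen3-VL-32B-Thinking",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://sf-maas.s3.us-east-1.amazonaws.com/images/recufyDh5zjKVl.png"}},
                {"type": "text", "text": "what's this?"},
            ],
        }],
    )
    print(response.choices[0].message.content)
else:
    print("Set SILICONFLOW_API_KEY to run this example.")
```

Pointing an existing OpenAI-SDK codebase at SiliconFlow this way avoids rewriting request and response handling.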

Unlock long-context multimodal reasoning and agentic intelligence, all accessible through SiliconFlow's API!

Ready to accelerate your AI development?
