Qwen3-VL-235B-A22B-Instruct
About Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B-Instruct is a 235B parameters Mixture-of-Experts (MoE) vision-language model, with 22B activated parameters. It is an instruction-tuned version of Qwen3-VL-235B-A22B and is aligned for chat applications.
Explore how Qwen3-VL-235B-A22B-Instruct's advanced vision-language capabilities and multimodal reasoning can solve complex, real-world problems.
AI UI Automation
Automate complex UI tasks across web and mobile applications by visually understanding interfaces and executing actions.
Use Case Example:
"Automatically navigates a new e-commerce website, adds items to cart, and completes checkout by interpreting visual cues and interacting with UI elements, without explicit API calls."
Visual Code Generation
Transform visual designs (sketches, mockups, or video demonstrations) directly into functional web components or diagrams.
Use Case Example:
"Converts a hand-drawn wireframe of a web page into responsive HTML/CSS/JS code, including interactive elements, significantly accelerating front-end development workflows."
Advanced Video Analytics
Analyze lengthy video footage for specific events, objects, or actions, generating detailed summaries and insights with second-level indexing.
Use Case Example:
"Processes an 8-hour security camera feed, identifying all instances of unauthorized access, tracking specific individuals, and generating a timestamped report with visual evidence."
Multimodal Document AI
Extract, analyze, and reason over information from complex, visually rich documents, including scanned images, reports, and engineering schematics.
Use Case Example:
"Parses a multi-page engineering blueprint, extracting component lists, identifying spatial relationships between parts, and flagging potential design inconsistencies based on visual and textual data."
Spatial Reasoning for Robotics
Enable AI systems to understand and interact with physical environments by accurately perceiving object positions, orientations, and spatial relationships.
Use Case Example:
"Guides a robotic arm to precisely pick and place irregularly shaped objects from a cluttered bin, adapting to varying viewpoints and partial occlusions in real-time."
Metadata
Specification
State
Deprecated
Architecture
Mixture of Experts
Calibrated
Yes
Mixture of Experts
Yes
Total Parameters
235B
Activated Parameters
22B
Reasoning
No
Precision
FP8
Context length
262K
Max Tokens
262K
Compare with Other Models
See how this model stacks up against others.

Qwen
chat
Qwen3.6-35B-A3B
Release on: Apr 17, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.2
/ M Tokens
Output:
$
1.6
/ M Tokens

Qwen
chat
Qwen3.6-27B
Release on: Apr 23, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.3
/ M Tokens
Output:
$
3.2
/ M Tokens

Qwen
chat
Qwen3.5-397B-A17B
Release on: Apr 24, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.39
/ M Tokens
Output:
$
2.34
/ M Tokens

Qwen
chat
Qwen3.5-122B-A10B
Release on: Apr 24, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.26
/ M Tokens
Output:
$
2.08
/ M Tokens

Qwen
chat
Qwen3.5-35B-A3B
Release on: Feb 25, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.24
/ M Tokens
Output:
$
1.8
/ M Tokens

Qwen
chat
Qwen3.5-27B
Release on: Apr 24, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.25
/ M Tokens
Output:
$
2.0
/ M Tokens

Qwen
chat
Qwen3.5-9B
Release on: Apr 24, 2026
Total Context:
262K
Max output:
262K
Input:
$
0.1
/ M Tokens
Output:
$
0.15
/ M Tokens

Qwen
chat
Qwen3-VL-32B-Instruct
Release on: Oct 21, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.2
/ M Tokens
Output:
$
0.6
/ M Tokens

Qwen
chat
Qwen3-VL-32B-Thinking
Release on: Oct 21, 2025
Total Context:
262K
Max output:
262K
Input:
$
0.2
/ M Tokens
Output:
$
1.5
/ M Tokens
