GLM-4.6V
About GLM-4.6V
GLM-4.6V achieves SOTA (State-of-the-Art) accuracy in visual understanding among models of the same parameter scale. For the first time, it natively integrates Function Call capabilities into the visual model architecture, bridging the gap between "Visual Perception" and "Executable Action." This provides a unified technical foundation for multimodal Agents in real-world business scenarios. Additionally, the visual context window has been expanded to 128k, supporting long video stream processing and high-resolution multi-image analysis.
Explore how GLM-4.6V's advanced visual understanding and function calling capabilities can solve complex, real-world problems.
Visual Scientific Data Analysis
Interpret complex scientific images, charts, and video streams to extract insights, validate experiments, and generate visual summaries.
Use Case Example:
"Analyzed microscopy video of cell division, identifying anomalies and generating a time-series chart, accelerating research into cellular dynamics."
UI/UX Code Generation & Editing
Generate pixel-accurate HTML/CSS from design mockups or screenshots, then refine and edit the UI using natural language commands.
Use Case Example:
"Replicated a complex dashboard UI from a Figma screenshot into clean React components, then adjusted button styles via text command, saving hours of frontend development."
Multimodal Financial Intelligence
Process diverse financial documents閳ユ敃canned reports, market charts, video briefings閳ユ敄o identify trends, assess risks, and execute data retrieval actions.
Use Case Example:
"Interpreted a company's annual report (PDF with charts), cross-referenced with live stock charts via a function call, and summarized investment opportunities."
Agentic Visual System Audits
Audit complex systems by visually inspecting interfaces, logs, and schematics, identifying vulnerabilities, and triggering automated remediation actions via function calls.
Use Case Example:
"Audited a web application's security by visually inspecting network traffic graphs and UI elements, then used a function call to flag a potential XSS vulnerability in the WAF."
Metadata
Specification
State
Deprecated
Architecture
Multimodal MoE
Calibrated
Yes
Mixture of Experts
Yes
Total Parameters
106B
Activated Parameters
106B
Reasoning
No
Precision
FP8
Context length
131K
Max Tokens
131K
Compare with Other Models
See how this model stacks up against others.

Z.ai
chat
GLM-5.1
Release on: Apr 3, 2026
Total Context:
205K
Max output:
131K
Input:
$
1.4
/ M Tokens
Output:
$
4.4
/ M Tokens

Z.ai
chat
GLM-5V-Turbo
Release on: Mar 30, 2026
Total Context:
205K
Max output:
131K
Input:
$
1.2
/ M Tokens
Output:
$
4.0
/ M Tokens

Z.ai
chat
GLM-5
Release on: Feb 12, 2026
Total Context:
205K
Max output:
131K
Input:
$
0.95
/ M Tokens
Output:
$
2.55
/ M Tokens

Z.ai
chat
GLM-4.7
Release on: Dec 23, 2025
Total Context:
205K
Max output:
205K
Input:
$
0.42
/ M Tokens
Output:
$
2.2
/ M Tokens

Z.ai
chat
GLM-4.6V
Release on: Dec 8, 2025
Total Context:
131K
Max output:
131K
Input:
$
0.3
/ M Tokens
Output:
$
0.9
/ M Tokens

Z.ai
chat
GLM-4.6
Release on: Oct 4, 2025
Total Context:
205K
Max output:
205K
Input:
$
0.39
/ M Tokens
Output:
$
1.9
/ M Tokens

Z.ai
chat
GLM-4.5-Air
Release on: Jul 28, 2025
Total Context:
131K
Max output:
131K
Input:
$
0.14
/ M Tokens
Output:
$
0.86
/ M Tokens

Z.ai
chat
GLM-4.5V
Release on: Aug 13, 2025
Total Context:
66K
Max output:
66K
Input:
$
0.14
/ M Tokens
Output:
$
0.86
/ M Tokens

Z.ai
chat
GLM-4.1V-9B-Thinking
Release on: Jul 4, 2025
Total Context:
66K
Max output:
66K
Input:
$
0.035
/ M Tokens
Output:
$
0.14
/ M Tokens
