GLM-4.5V API, Fine-Tuning, Deployment
zai-org/GLM-4.5V
GLM-4.5V is the latest-generation vision-language model (VLM) released by Zhipu AI. It is built on the flagship text model GLM-4.5-Air, which has 106B total parameters and 12B active parameters, and uses a Mixture-of-Experts (MoE) architecture to achieve strong performance at a lower inference cost. Technically, GLM-4.5V follows the lineage of GLM-4.1V-Thinking and introduces innovations such as 3D rotated positional encoding (3D-RoPE), significantly enhancing its perception and reasoning over 3D spatial relationships. Through optimization across the pre-training, supervised fine-tuning, and reinforcement learning phases, the model can process diverse visual content such as images, videos, and long documents, achieving state-of-the-art performance among open-source models of its scale on 41 public multimodal benchmarks. Additionally, the model offers a "Thinking Mode" switch, letting users choose between quick responses and deep reasoning to balance efficiency and effectiveness.
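As a sketch of how a multimodal request to the model might look, the snippet below builds a chat-completions request body in the OpenAI-compatible style, with a toggle for the Thinking Mode described above. The endpoint shape, the `thinking` field name, and the image URL are illustrative assumptions, not confirmed parameter names from the provider's API reference.

```python
# Hypothetical GLM-4.5V chat request payload (OpenAI-compatible style).
# Field names below are assumptions for illustration; consult the
# provider's API documentation for the authoritative schema.
payload = {
    "model": "glm-4.5v",
    "messages": [
        {
            "role": "user",
            "content": [
                # An image part and a text part in one user turn.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text",
                 "text": "Describe this image."},
            ],
        }
    ],
}

def set_thinking(body: dict, enabled: bool) -> dict:
    """Attach a (hypothetical) Thinking Mode switch to the request body."""
    body["thinking"] = {"type": "enabled" if enabled else "disabled"}
    return body

set_thinking(payload, True)
print(payload["model"], payload["thinking"]["type"])
```

The payload would then be POSTed to the provider's chat-completions endpoint with an API key; only the request construction is shown here, so the snippet runs without network access.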
Details
Model Provider: zai
Type: multimodal
Sub Type: chat
Size: 106B
Publish Time: Aug 13, 2025
Input Price: $0.14 / M tokens
Output Price: $0.86 / M tokens
Context Length: 66K
Tags: Reasoning, MoE, 106B, 66K
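Given the listed per-million-token prices, the cost of a request can be estimated with simple arithmetic; the helper below is a minimal sketch using the input and output prices from this listing.

```python
# Rough cost estimate from the listed prices:
# $0.14 per 1M input tokens, $0.86 per 1M output tokens.
INPUT_PRICE_USD = 0.14
OUTPUT_PRICE_USD = 0.86

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_USD
            + output_tokens / 1_000_000 * OUTPUT_PRICE_USD)

# Example: 100K input tokens and 10K output tokens.
cost = estimate_cost(100_000, 10_000)
print(round(cost, 4))  # 0.0226
```

Note that actual billing may also count image tokens toward the input total, depending on how the provider tokenizes visual content.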