
Z.ai
Text Generation
GLM-5.1
GLM-5.1 is Z.ai's next-generation flagship model built for agentic engineering. It is designed to run continuously for hours or even longer, refining its strategy as it works—the longer it runs, the better the results....
Total Context: 205K
Max output: 131K
Input: $1.4 / M Tokens
Cached Input: $ text / M Tokens
Output: $4.4 / M Tokens

Z.ai
Text Generation
GLM-5V-Turbo
GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....
Total Context: 205K
Max output: 131K
Input: $1.2 / M Tokens
Cached Input: $ text / M Tokens
Output: $4.0 / M Tokens

Z.ai
Text Generation
GLM-5
GLM-5 is a next-generation open-source model for complex systems engineering and long-horizon agentic tasks, scaled to ~744B sparse parameters (~40B active) with ~28.5T pretraining tokens. It integrates DeepSeek Sparse Attention (DSA) to retain long-context capacity while reducing inference cost, and leverages the “slime” asynchronous RL stack to deliver strong performance in reasoning, coding, and agentic benchmarks....
Total Context: 205K
Max output: 131K
Input: $0.95 / M Tokens
Cached Input: $ text / M Tokens
Output: $2.55 / M Tokens

Z.ai
Text Generation
GLM-4.7
GLM-4.7 is Zhipu’s new-generation flagship model, with 355B total parameters and 32B activated parameters, delivering comprehensive upgrades in general conversation, reasoning, and agent capabilities. Responses are more concise and natural; writing feels more immersive; tool-call instructions are followed more reliably; and the front-end polish of artifacts and agentic coding—along with long-horizon task completion efficiency—has been further improved....
Total Context: 205K
Max output: 205K
Input: $0.42 / M Tokens
Cached Input: $ text / M Tokens
Output: $2.2 / M Tokens

Z.ai
Text Generation
GLM-4.6V
GLM-4.6V achieves SOTA (State-of-the-Art) accuracy in visual understanding among models of the same parameter scale. For the first time, it natively integrates Function Call capabilities into the visual model architecture, bridging the gap between "Visual Perception" and "Executable Action." This provides a unified technical foundation for multimodal Agents in real-world business scenarios. Additionally, the visual context window has been expanded to 128k, supporting long video stream processing and high-resolution multi-image analysis....
Total Context: 131K
Max output: 131K
Input: $0.3 / M Tokens
Cached Input: $ text / M Tokens
Output: $0.9 / M Tokens

Z.ai
Text Generation
GLM-4.6
Compared with GLM-4.5, GLM-4.6 brings several key improvements, including a longer context window (expanded to 200K tokens), superior coding performance, advanced reasoning, more capable agents, and refined writing....
Total Context: 205K
Max output: 205K
Input: $0.39 / M Tokens
Cached Input: $ text / M Tokens
Output: $1.9 / M Tokens

Z.ai
Text Generation
GLM-4.5-Air
The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. It is also a hybrid reasoning model that provides both thinking and non-thinking modes....
Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Cached Input: $ text / M Tokens
Output: $0.86 / M Tokens
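
The per-million-token prices listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official calculator: the `PRICES` table and the `estimate_cost` helper are hypothetical names introduced here for illustration, the figures are copied from the Input and Output lines of this listing, and cached-input pricing is ignored because the listing does not show those values.

```python
# Hypothetical helper: estimate request cost from the listed prices.
# Each entry is (input USD per 1M tokens, output USD per 1M tokens),
# copied from the listing above. Cached-input pricing is not modeled.
PRICES = {
    "GLM-5.1": (1.4, 4.4),
    "GLM-5V-Turbo": (1.2, 4.0),
    "GLM-5": (0.95, 2.55),
    "GLM-4.7": (0.42, 2.2),
    "GLM-4.6V": (0.3, 0.9),
    "GLM-4.6": (0.39, 1.9),
    "GLM-4.5-Air": (0.14, 0.86),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES[model]  # KeyError for unlisted models
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # 100K input tokens and 20K output tokens on GLM-5.1
    cost = estimate_cost("GLM-5.1", 100_000, 20_000)
    print(f"{cost:.4f}")  # prints 0.2280
```

Because output tokens are priced several times higher than input tokens for every model listed, long generations tend to dominate the total, so an estimate like this is most sensitive to the output token count.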

