State-of-the-Art AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds.

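The catalog itself does not include a code sample, so the following is a minimal sketch of what a chat-completion call to one of the listed models might look like. The base URL, endpoint path, header, and environment-variable names are placeholders for illustration, not the provider's documented API; substitute the values from your own account and the model page.

    import os
    import requests

    # Hypothetical endpoint and payload shape (OpenAI-style chat completions);
    # replace API_BASE and the header/field names with the provider's documented values.
    API_BASE = "https://api.example.com/v1"          # placeholder base URL
    API_KEY = os.environ["MODEL_LIBRARY_API_KEY"]    # placeholder env var name

    resp = requests.post(
        f"{API_BASE}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "glm-4.7",                      # model IDs may differ from display names
            "messages": [{"role": "user", "content": "Summarize GLM-4.7 in one sentence."}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])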

Z.ai · Text Generation
GLM-4.7
Released on: Dec 23, 2025

GLM-4.7 is Zhipu’s new-generation flagship model, with 355B total parameters and 32B activated parameters, delivering comprehensive upgrades in general conversation, reasoning, and agent capabilities. Responses are more concise and natural; writing feels more immersive; tool-call instructions are followed more reliably; and the front-end polish of artifacts and agentic coding, along with long-horizon task-completion efficiency, has been further improved.

Total context: 200K · Max output: 131K
Input: $0.6 / M tokens · Output: $2.2 / M tokens

Z.ai · Text Generation
GLM-4.6V
Released on: Dec 8, 2025

GLM-4.6V achieves state-of-the-art (SOTA) accuracy in visual understanding among models of the same parameter scale. For the first time, it natively integrates Function Call capabilities into the visual model architecture, bridging the gap between "visual perception" and "executable action" and providing a unified technical foundation for multimodal agents in real-world business scenarios. The visual context window has also been expanded to 128K, supporting long video stream processing and high-resolution multi-image analysis.

Total context: 131K · Max output: 131K
Input: $0.3 / M tokens · Output: $0.9 / M tokens
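Because the card highlights native Function Call support inside a visual model, the sketch below shows one way such a request might be shaped: an image message plus a tool definition in a single OpenAI-style payload. The endpoint, field names, model ID, and tool schema are assumptions for illustration, not the provider's documented contract.

    import os
    import requests

    API_BASE = "https://api.example.com/v1"        # placeholder, as in the earlier sketch
    API_KEY = os.environ["MODEL_LIBRARY_API_KEY"]  # placeholder env var

    payload = {
        "model": "glm-4.6v",                       # illustrative model ID
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png"}},
                {"type": "text", "text": "If CPU usage is above 90%, open a ticket."},
            ],
        }],
        # A single tool the model may decide to call based on what it sees in the image.
        "tools": [{
            "type": "function",
            "function": {
                "name": "open_ticket",
                "description": "File an operations ticket",
                "parameters": {
                    "type": "object",
                    "properties": {"summary": {"type": "string"}},
                    "required": ["summary"],
                },
            },
        }],
    }

    resp = requests.post(f"{API_BASE}/chat/completions",
                         headers={"Authorization": f"Bearer {API_KEY}"},
                         json=payload, timeout=60)
    resp.raise_for_status()
    # A tool call, if any, typically arrives alongside (or instead of) text content.
    print(resp.json()["choices"][0]["message"])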

Z.ai · Text Generation
GLM-4.6
Released on: Oct 4, 2025

Compared with GLM-4.5, GLM-4.6 brings several key improvements: a longer context window expanded to 200K tokens, superior coding performance, advanced reasoning, more capable agents, and refined writing.

Total context: 205K · Max output: 205K
Input: $0.39 / M tokens · Output: $1.9 / M tokens

Z.ai · Text Generation
GLM-4.5
Released on: Jul 28, 2025

The GLM-4.5 series models are foundation models designed for intelligent agents, unifying reasoning, coding, and agent capabilities to meet the complex demands of agent applications. GLM-4.5 has 355 billion total parameters with 32 billion active parameters and provides two modes: thinking and non-thinking.

Total context: 131K · Max output: 131K
Input: $0.4 / M tokens · Output: $2.0 / M tokens
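The card notes that GLM-4.5 offers thinking and non-thinking modes. The fragment below sketches how such a switch might be expressed as an extra request field; the parameter name and accepted values are assumptions, so check the model page for the actual toggle.

    def build_request(prompt: str, think: bool) -> dict:
        """Build a chat-completion payload with a hypothetical 'thinking' switch."""
        return {
            "model": "glm-4.5",                              # illustrative model ID
            "messages": [{"role": "user", "content": prompt}],
            # Assumed field: hybrid-reasoning APIs typically accept some per-request
            # flag to enable or disable the deliberate "thinking" pass.
            "thinking": {"type": "enabled" if think else "disabled"},
        }

    # Non-thinking for quick chat turns, thinking for hard reasoning problems.
    fast = build_request("Give me a one-line summary of RAG.", think=False)
    deep = build_request("Prove that the sum of two odd integers is even.", think=True)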

Z.ai · Text Generation
GLM-4.5-Air
Released on: Jul 28, 2025

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5-Air adopts a more compact design, with 106 billion total parameters and 12 billion active parameters. It is also a hybrid reasoning model, providing both thinking and non-thinking modes.

Total context: 131K · Max output: 131K
Input: $0.14 / M tokens · Output: $0.86 / M tokens

Z.ai · Text Generation
GLM-4.5V
Released on: Aug 13, 2025

As part of the GLM-V family of models, GLM-4.5V is based on ZhipuAI’s foundation model GLM-4.5-Air and achieves SOTA performance on tasks such as image, video, and document understanding, as well as GUI agent operations.

Total context: 66K · Max output: 66K
Input: $0.14 / M tokens · Output: $0.86 / M tokens

Z.ai · Text Generation
GLM-4.1V-9B-Thinking
Released on: Jul 4, 2025

GLM-4.1V-9B-Thinking is an open-source vision-language model (VLM) jointly released by Zhipu AI and Tsinghua University's KEG lab, designed to advance general-purpose multimodal reasoning. Built on the GLM-4-9B-0414 foundation model, it introduces a 'thinking paradigm' and leverages Reinforcement Learning with Curriculum Sampling (RLCS) to significantly enhance its capabilities on complex tasks. At 9B parameters, it achieves state-of-the-art performance among models of a similar size, and on 18 benchmarks it matches or even surpasses the much larger 72B-parameter Qwen-2.5-VL-72B. The model excels across a diverse range of tasks, including STEM problem-solving, video understanding, and long-document understanding, and it can handle images at resolutions up to 4K with arbitrary aspect ratios.

Total context: 66K · Max output: 66K
Input: $0.035 / M tokens · Output: $0.14 / M tokens

Z.ai · Text Generation
GLM-Z1-32B-0414
Released on: Apr 18, 2025

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. It was developed from GLM-4-32B-0414 through cold start and extended reinforcement learning, with further training on mathematics, code, and logic tasks. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical ability and the capacity to solve complex tasks. During training, the team also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities. Despite having only 32B parameters, its performance on certain tasks is comparable to the 671B-parameter DeepSeek-R1. Evaluations on benchmarks such as AIME 24/25, LiveCodeBench, and GPQA show strong mathematical reasoning and support for a wide range of complex tasks.

Total context: 131K · Max output: 131K
Input: $0.14 / M tokens · Output: $0.57 / M tokens

Z.ai · Text Generation
GLM-4-32B-0414
Released on: Apr 18, 2025

GLM-4-32B-0414 is a new-generation model in the GLM family with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T tokens of high-quality data, including a large amount of reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement learning extensions. In post-training, in addition to human preference alignment for dialogue scenarios, the team strengthened instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, reinforcing the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, artifact generation, function calling, search-based Q&A, and report generation. On several benchmarks its performance approaches or even exceeds that of larger models such as GPT-4o and DeepSeek-V3-0324 (671B).

Total context: 33K · Max output: 33K
Input: $0.27 / M tokens · Output: $0.27 / M tokens

Z.ai · Text Generation
GLM-Z1-9B-0414
Released on: Apr 18, 2025

GLM-Z1-9B-0414 is a small model in the GLM series with only 9 billion parameters that maintains the open-source tradition while showcasing surprising capabilities. Despite its smaller scale, it still delivers excellent performance on mathematical reasoning and general tasks, with overall performance at a leading level among open-source models of the same size. The research team applied the same techniques used for the larger models to train this 9B model. It achieves an excellent balance between efficiency and effectiveness, especially in resource-constrained scenarios, making it a powerful option for users seeking lightweight deployment. The model features deep thinking capabilities and can handle long contexts through YaRN, making it particularly suitable for applications that need mathematical reasoning with limited computational resources.

Total context: 131K · Max output: 131K
Input: $0.086 / M tokens · Output: $0.086 / M tokens
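The card mentions that long contexts are handled via YaRN. For self-hosted deployments, YaRN is usually enabled through a rope_scaling entry in the checkpoint's config.json; the snippet below patches such a file. The key names follow the common Hugging Face convention and the scaling values are illustrative assumptions, so follow the model card's own long-context instructions before deploying.

    import json

    # Illustrative YaRN rope-scaling entry; exact keys and values depend on the
    # model card and the serving framework, so verify them before use.
    yarn_cfg = {
        "type": "yarn",
        "factor": 4.0,                              # context multiplier (assumed value)
        "original_max_position_embeddings": 32768,  # base context length (assumed value)
    }

    path = "GLM-Z1-9B-0414/config.json"             # local checkpoint path (example)
    with open(path) as f:
        cfg = json.load(f)

    cfg["rope_scaling"] = yarn_cfg

    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)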

Z.ai · Text Generation
GLM-4-9B-0414
Released on: Apr 18, 2025

GLM-4-9B-0414 is a small model in the GLM series with 9 billion parameters. It inherits the technical characteristics of the GLM-4-32B series while offering a more lightweight deployment option. Despite its smaller scale, it still demonstrates excellent capabilities in code generation, web design, SVG graphics generation, and search-based writing tasks. It also supports function calling, allowing it to invoke external tools to extend its range of capabilities. The model strikes a good balance between efficiency and effectiveness in resource-constrained scenarios, providing a powerful option for users who need to deploy AI models under limited computational resources. Like the other models in the series, GLM-4-9B-0414 shows competitive performance across a range of benchmarks.

Total context: 33K · Max output: 33K
Input: $0.086 / M tokens · Output: $0.086 / M tokens

Ready to accelerate your AI development?
