GLM-4.5 Now Available on SiliconFlow: Open-Source SOTA Model for Reasoning, Code, and Agentic Applications

Jul 28, 2025

Today, we're excited to bring GLM-4.5 and GLM-4.5-Air, Z.ai's latest flagship model series, to the SiliconFlow platform. This model series marks a significant milestone in AGI development by natively unifying reasoning, coding, and agentic capabilities in a single model, meeting the increasingly complex requirements of fast-growing agentic applications.

Whether you're tackling full-stack development projects, sophisticated code refactoring, or building autonomous agent systems, GLM-4.5 provides the advanced functionality and reliability that intelligent agentic applications demand. This powerful addition to our model catalog empowers developers to push the boundaries of what's possible in intelligent automation and complex problem-solving scenarios.

With SiliconFlow's GLM-4.5 API, you can expect:

Cost-Effective Pricing: GLM-4.5 at $0.5/M tokens (input) and $2/M tokens (output); GLM-4.5-Air at $0.14/M tokens (input) and $0.86/M tokens (output).
Extended Context Window: 128K context window for complex tasks.
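
To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from its token counts. The prices are copied from the list above; the function name and the example token counts are illustrative assumptions, not part of the SiliconFlow API.

```python
# USD per million tokens, copied from the pricing list above.
PRICES = {
    "GLM-4.5": {"input": 0.50, "output": 2.00},
    "GLM-4.5-Air": {"input": 0.14, "output": 0.86},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 100_000, 2_000):.4f}")
```

Under these assumptions, a 100,000-token prompt with a 2,000-token reply comes to about $0.054 on GLM-4.5 and about $0.016 on GLM-4.5-Air.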

Key Capabilities & Benchmark Performance

The GLM-4.5 model series now available on SiliconFlow features the following key capabilities:

SOTA Performance: Delivers state-of-the-art results among open-source models in reasoning, code generation and agentic capabilities, with industry-leading performance in real-world code agent evaluations.
MoE Architecture: GLM-4.5 has 355B total/32B active parameters, while GLM-4.5-Air adopts a compact design with 106B total/12B active parameters. Both leverage the Mixture of Experts design for optimal efficiency.
Hybrid Inference: Both provide thinking mode for complex tasks and non-thinking mode for immediate responses.
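
As a rough illustration of what the Mixture of Experts figures mean, the sketch below computes the share of parameters that are active for each token, using only the totals quoted above. The helper name is hypothetical.

```python
# (total parameters, active parameters) in billions, from the list above.
MODELS = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used per token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models activate roughly a tenth of their weights per token, which is where the efficiency of the MoE design comes from.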

To evaluate GLM-4.5's general capabilities, Z.ai selected 12 representative benchmarks spanning three core domains: reasoning (MMLU Pro, AIME 24, MATH 500, SciCode, GPQA, HLE, LiveCodeBench), coding (SWE-Bench Verified, Terminal-Bench), and agentic capabilities (TAU-Bench, BFCL v3, BrowseComp).

Across these comprehensive metrics, GLM-4.5 demonstrates outstanding performance:

Global Ranking: Ranks 3rd globally across all models on the 12 comprehensive benchmarks with a score of 63.2, just behind Grok-4 (63.6) and ahead of Claude 4 Opus (60.9).
Open-Source Champion: Top-performing model in the open-source category.
Technical Domains: Demonstrates excellence across mathematical reasoning, scientific problem-solving, code generation, agent workflows, and complex task execution.

What Makes GLM-4.5 So Powerful

Advanced Training Pipeline

Z.ai developed GLM-4.5 using a sophisticated three-stage process:

Pre-training: 15 trillion tokens of general-purpose data for foundational capabilities.
Domain-specific training: 8 trillion tokens focused on code, reasoning, and agent tasks.
Reinforcement learning: Enhanced performance across reasoning, coding, and agent workflows.

Superior Parameter Efficiency

Through Pareto Frontier analysis, GLM-4.5 demonstrates exceptional efficiency:

Optimal scaling: Superior performance relative to models of comparable scale.
Efficiency leadership: Achieves optimal efficiency on the performance-scale trade-off boundary.
Resource advantage: Half the parameters of DeepSeek-R1, one-third of Kimi-K2.
Cost benefits: Higher parameter efficiency translates to faster inference and lower operational costs.
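
The size comparison above can be checked with simple arithmetic. This sketch assumes total parameter counts of 671B for DeepSeek-R1 and 1,000B (1T) for Kimi-K2; those two figures are not stated in this post and are taken here as assumptions from the models' public descriptions.

```python
GLM_45_TOTAL_B = 355  # from the MoE Architecture item earlier in this post

# Assumed total parameter counts in billions (not stated in this post).
OTHER_MODELS_TOTAL_B = {
    "DeepSeek-R1": 671,
    "Kimi-K2": 1000,
}


def size_ratio(model_b: float, reference_b: float) -> float:
    """Size of one model as a fraction of a reference model."""
    return model_b / reference_b


for name, total_b in OTHER_MODELS_TOTAL_B.items():
    ratio = size_ratio(GLM_45_TOTAL_B, total_b)
    print(f"GLM-4.5 is {ratio:.0%} the size of {name}")
```

With those assumed figures, the ratios come out close to one half and one third respectively.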

Real-World Performance

Beyond benchmark evaluations, GLM-4.5's practical capabilities have been rigorously tested in real-world coding scenarios:

Agentic Coding Evaluation

Independent evaluation of GLM-4.5's agentic coding capabilities was conducted using Claude Code across 52 diverse coding tasks, including frontend development, tool creation, data analysis, testing, and algorithm implementation.

Competitive Results:

vs. Kimi-K2: 53.9% win rate in head-to-head comparisons.
vs. Qwen3-Coder: 80.8% success rate, demonstrating clear superiority.
vs. Claude-4-Sonnet: Competitive performance, though further optimization remains possible.
Tool calling accuracy: Leading 90.6% success rate, surpassing Claude-4-Sonnet (89.5%), Kimi-K2 (86.2%), and Qwen3-Coder (77.1%).
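
To put the head-to-head percentages in terms of tasks, the sketch below converts them to approximate task counts. It assumes each percentage is computed over all 52 tasks, which this post does not state explicitly, so the counts are an estimate rather than reported data.

```python
TOTAL_TASKS = 52  # number of coding tasks in the evaluation


def implied_tasks(rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Approximate number of tasks behind a reported percentage."""
    return round(rate_percent / 100 * total)


for opponent, rate in {"Kimi-K2": 53.9, "Qwen3-Coder": 80.8}.items():
    print(f"vs. {opponent}: roughly {implied_tasks(rate)} of {TOTAL_TASKS} tasks")
```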

Real Application Scenarios

GLM-4.5's capabilities extend beyond benchmarks into practical development scenarios, demonstrating versatility across multiple domains through real-world implementations.

Interactive Artifact Creation

GLM-4.5 creates sophisticated standalone artifacts—from interactive mini-games to physics simulations—across HTML, SVG, Python and other formats, delivering superior user experiences for advanced agentic coding applications.

Slides Creation

Leveraging GLM-4.5's powerful agentic tool usage and HTML coding capabilities, the model-native PPT/Poster agent autonomously searches the web, retrieves images, and creates slides from simple requests or uploaded documents.

Full-Stack Web Development

GLM-4.5 excels in both frontend and backend development for modern web applications. Users can create entire websites with just a few words, then effortlessly add features through multi-turn dialogue, making the coding process smooth and enjoyable.
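
As a minimal sketch of the multi-turn pattern, the conversation history can be kept as a growing list of messages and sent in full with each request. This assumes the role/content message format used in the API example in this post, plus the "assistant" role from the OpenAI chat format; the helper name, prompts, and placeholder reply are illustrative only.

```python
def add_turn(history: list, role: str, content: str) -> list:
    """Return a new history with one more message appended (hypothetical helper)."""
    return history + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "Create a landing page for a bakery.")
# Placeholder for the model's first reply; a real reply would come from the API.
history = add_turn(history, "assistant", "<html>first version of the page</html>")
history = add_turn(history, "user", "Add a contact form below the menu.")

# The full history is sent each time so the model sees the earlier turns.
payload = {"model": "zai-org/GLM-4.5", "messages": history}
print(len(payload["messages"]), "messages in the request")
```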

These real-world scenarios demonstrate GLM-4.5's practical utility in professional development workflows, from rapid prototyping to complete application delivery.

Get Started Immediately

Explore: Try GLM-4.5 & GLM-4.5-Air in the SiliconFlow playground.
Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "zai-org/GLM-4.5",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a story"
        }
    ],
    "top_p": 0.95,
    "temperature": 0.6
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

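# Send the request; the 60-second timeout is an arbitrary choice for this example.
response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # raise an exception on HTTP errors such as an invalid token

print(response.text)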
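
The request above prints the raw response body. Since the API is described as OpenAI-compatible, the reply text can most likely be pulled out as sketched below. This assumes the response follows the OpenAI chat completions layout (choices, then message, then content); the sample body is made up for illustration and is not real API output.

```python
import json


def extract_reply(response_body: str) -> str:
    """Return the assistant's text from a chat completions response body.

    Assumes the OpenAI-style layout: choices[0].message.content.
    """
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


# Made-up sample body in the assumed layout, standing in for response.text.
sample_body = json.dumps({
    "choices": [
        {"message": {"role": "assistant", "content": "Once upon a time..."}}
    ]
})

print(extract_reply(sample_body))
```

In the example above, calling extract_reply(response.text) would replace the final print statement; check the SiliconFlow API documentation for the exact response fields.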
