GLM-4.5V API, Fine-Tuning, Deployment
zai-org/GLM-4.5V
GLM-4.5V is the latest-generation vision-language model (VLM) released by Zhipu AI. It is built on the flagship text model GLM-4.5-Air, which has 106B total parameters and 12B active parameters, and uses a Mixture-of-Experts (MoE) architecture to achieve strong performance at a lower inference cost. Technically, GLM-4.5V follows the lineage of GLM-4.1V-Thinking and introduces innovations such as 3D rotated positional encoding (3D-RoPE), significantly enhancing its perception and reasoning over 3D spatial relationships. Through optimization across the pre-training, supervised fine-tuning, and reinforcement learning phases, the model can process diverse visual content such as images, videos, and long documents, achieving state-of-the-art performance among open-source models of its scale on 41 public multimodal benchmarks. Additionally, the model offers a "Thinking Mode" switch, letting users choose between quick responses and deep reasoning to balance efficiency and effectiveness.
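As a sketch of how a multimodal request to the model might look, the snippet below builds a chat-completions request body in the OpenAI-compatible style, with a toggle for the Thinking Mode described above. The endpoint shape, the `thinking` field name, and the image URL are illustrative assumptions, not confirmed parameter names from the provider's API reference.

```python
# Hypothetical GLM-4.5V chat request payload (OpenAI-compatible style).
# Field names below are assumptions for illustration; consult the
# provider's API documentation for the authoritative schema.
payload = {
    "model": "glm-4.5v",
    "messages": [
        {
            "role": "user",
            "content": [
                # An image part and a text part in one user turn.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text",
                 "text": "Describe this image."},
            ],
        }
    ],
}

def set_thinking(body: dict, enabled: bool) -> dict:
    """Attach a (hypothetical) Thinking Mode switch to the request body."""
    body["thinking"] = {"type": "enabled" if enabled else "disabled"}
    return body

set_thinking(payload, True)
print(payload["model"], payload["thinking"]["type"])
```

The payload would then be POSTed to the provider's chat-completions endpoint with an API key; only the request construction is shown here, so the snippet runs without network access.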
Details
Model Provider: zai
Type: multimodal
Sub Type: chat
Size: 106B
Publish Time: Aug 13, 2025
Input Price: $0.14 / M tokens
Output Price: $0.86 / M tokens
Context Length: 66K
Tags: Reasoning, MoE, 106B, 66K
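Given the listed per-million-token prices, the cost of a request can be estimated with simple arithmetic; the helper below is a minimal sketch using the input and output prices from this listing.

```python
# Rough cost estimate from the listed prices:
# $0.14 per 1M input tokens, $0.86 per 1M output tokens.
INPUT_PRICE_USD = 0.14
OUTPUT_PRICE_USD = 0.86

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_USD
            + output_tokens / 1_000_000 * OUTPUT_PRICE_USD)

# Example: 100K input tokens and 10K output tokens.
cost = estimate_cost(100_000, 10_000)
print(round(cost, 4))  # 0.0226
```

Note that actual billing may also count image tokens toward the input total, depending on how the provider tokenizes visual content.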