GLM-4.1V-9B-Thinking

THUDM/GLM-4.1V-9B-Thinking

About GLM-4.1V-9B-Thinking

GLM-4.1V-9B-Thinking is an open-source vision-language model (VLM) jointly released by Zhipu AI and Tsinghua University's KEG lab to advance general-purpose multimodal reasoning. Built on the GLM-4-9B-0414 foundation model, it introduces a 'thinking paradigm' and uses Reinforcement Learning with Curriculum Sampling (RLCS) to strengthen performance on complex tasks. At 9B parameters, it achieves state-of-the-art results among models of comparable size, and it matches or even surpasses the much larger 72B-parameter Qwen2.5-VL-72B on 18 benchmarks. The model handles a diverse range of tasks, including STEM problem-solving, video understanding, and long-document understanding, and it accepts images at resolutions up to 4K with arbitrary aspect ratios.

Available Serverless

Run queries immediately, pay only for usage

$0.035 / $0.14 per 1M tokens (input/output)
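
As a quick illustration of the pricing above, the sketch below estimates the cost of a single request from its input and output token counts. The per-million-token rates come from this listing; the token counts and the helper name are hypothetical.

    # Per-token rates from the listing above:
    # $0.035 per 1M input tokens, $0.14 per 1M output tokens.
    INPUT_PRICE_PER_M = 0.035
    OUTPUT_PRICE_PER_M = 0.14

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        """Rough cost estimate (USD) for one request at the listed serverless rates."""
        return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
             + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

    # Example: a 4,000-token prompt (e.g. an image plus a question)
    # and a 1,000-token thinking/answer completion.
    print(f"${estimate_cost(4_000, 1_000):.6f}")  # ~$0.000280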

Metadata

Created on: Jul 4, 2025
License: MIT
Provider: Z.ai

Specification

State: Available

Architecture

Calibrated: No
Mixture of Experts: No
Total Parameters: 9B
Activated Parameters: 9B
Reasoning: No
Precision: FP8
Context length: 66K
Max Tokens: 66K

Supported Functionality

Serverless: Supported
Serverless LoRA: Not supported
Fine-tuning: Not supported
Embeddings: Not supported
Rerankers: Not supported
Image input: Supported
JSON Mode: Not supported
Structured Outputs: Not supported
Tools: Not supported
FIM Completion: Not supported
Chat Prefix Completion: Not supported
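
Since image input is supported serverlessly while tools, JSON mode, and structured outputs are not, requests are limited to plain multimodal chat. Below is a minimal sketch of such a request against an OpenAI-compatible chat-completions endpoint; the base URL, model identifier, and environment variable name are assumptions, not confirmed by this listing.

    import os
    from openai import OpenAI

    # Assumed: the provider exposes an OpenAI-compatible endpoint and accepts the
    # Hugging Face repo path as the model identifier. Adjust both for your account.
    client = OpenAI(
        base_url="https://api.example-provider.com/v1",   # hypothetical base URL
        api_key=os.environ["PROVIDER_API_KEY"],           # hypothetical env var
    )

    response = client.chat.completions.create(
        model="THUDM/GLM-4.1V-9B-Thinking",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is shown in this image?"},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/diagram.png"}},
                ],
            }
        ],
        max_tokens=1024,   # completion budget; the listed context length is 66K
    )

    print(response.choices[0].message.content)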

Model FAQs: Usage, Deployment

Learn how to use, fine-tune, and deploy this model with ease.
