THUDM/GLM-4.1V-9B-Thinking
THUDM/GLM-4.1V-9B-Thinking
GLM-4.1V-9B-Thinking is an open-source Vision-Language Model (VLM) jointly released by Zhipu AI and Tsinghua University's KEG lab, designed to advance general-purpose multimodal reasoning. Built upon the GLM-4-9B-0414 foundation model, it introduces a 'thinking paradigm' and leverages Reinforcement Learning with Curriculum Sampling (RLCS) to significantly enhance its capabilities in complex tasks. As a 9B-parameter model, it achieves state-of-the-art performance among models of a similar size, and its performance is comparable to or even surpasses the much larger 72B-parameter Qwen-2.5-VL-72B on 18 different benchmarks. The model excels in a diverse range of tasks, including STEM problem-solving, video understanding, and long document understanding, and it can handle images with resolutions up to 4K and arbitrary aspect ratios
Details
Model Provider
THUDM
Type
text
Sub Type
chat
Size
9B
Publish Time
Jul 4, 2025
Input Price
$
0.035
/ M Tokens
Output Price
$
0.14
/ M Tokens
Context length
64K
Tags
9B,64K