Models

Products

Pricing

Docs

Blog

About

Contact

🎉 MiniMax-M2 Now on SiliconFlow: Frontier-Style Coding and Agentic Intelligence

State-of-the-Art

AI Models

Access a comprehensive library of cutting-edge AI models for LLMs, image, video, and audio generation, all through our high-performance inference API.

Get Started

Contact Sales

All

LLM

Vision

Image

Video

Audio

Reranker

Embedding

MiniMaxAI

chat

MiniMax-M2

Release on: Oct 28, 2025

MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever...

Total Context:

197K

Max output:

131K

Input:

0.3

/ M Tokens

Output:

1.2

/ M Tokens

DeepSeek

chat

DeepSeek-V3.2-Exp

Release on: Oct 10, 2025

DeepSeek-V3.2-Exp is an experimental version of DeepSeek model, built on V3.1-Terminus. It debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context....

Total Context:

164K

Max output:

164K

Input:

0.27

/ M Tokens

Output:

0.41

/ M Tokens

DeepSeek

chat

DeepSeek-V3.1-Terminus

Release on: Sep 29, 2025

DeepSeek-V3.1-Terminus is an updated version built on V3.1’s strengths while addressing key user feedback. It improves in language consistency, reducing instances of mixed Chinese-English text and occasional abnormal characters. And also upgrades in stronger Code Agent & Search Agent performance....

Total Context:

164K

Max output:

164K

Input:

0.27

/ M Tokens

Output:

1.0

/ M Tokens

DeepSeek

chat

DeepSeek-V3.1

Release on: Aug 25, 2025

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly....

Total Context:

164K

Max output:

164K

Input:

0.27

/ M Tokens

Output:

1.0

/ M Tokens

Qwen

chat

Qwen3-VL-32B-Instruct

Release on: Oct 21, 2025

Qwen3-VL is the vision-language model in the Qwen3 series, achieving state-of-the-art(SOTA)performance on various vision-language(VL)benchmarks. The model supports high-resolution image inputs up to the megapixel level and possesses strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a powerful language foundation, enabling it to understand and execute complex instructions....

Total Context:

262K

Max output:

262K

Input:

0.2

/ M Tokens

Output:

0.6

/ M Tokens

Qwen

chat

Qwen3-VL-32B-Thinking

Release on: Oct 21, 2025

Qwen3-VL-Thinking is a version of the Qwen3-VL series specially optimized for complex visual reasoning tasks. It incorporates a "Thinking Mode" , enabling it to generate detailed intermediate reasoning steps (Chain-of-Thought) before providing a final answer. This design significantly enhances the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis....

Total Context:

262K

Max output:

262K

Input:

0.2

/ M Tokens

Output:

1.5

/ M Tokens

Qwen

chat

Qwen3-VL-8B-Instruct

Release on: Oct 15, 2025

Qwen3-VL-8B-Instruct is the vision-language model of the Qwen3 series, demonstrates strong capabilities in general visual understanding, visual-centric dialogue, and multilingual text recognition in images. ...

Total Context:

262K

Max output:

262K

Input:

0.18

/ M Tokens

Output:

0.68

/ M Tokens

Qwen

chat

Qwen3-VL-8B-Thinking

Release on: Oct 15, 2025

Qwen3-VL-8B-Thinking is a vision-language model from the Qwen3 series, optimized for scenarios requiring complex reasoning. In this Thinking mode, the model performs step-by-step thinking and reasoning before providing the final answer....

Total Context:

262K

Max output:

262K

Input:

0.18

/ M Tokens

Output:

2.0

/ M Tokens

Qwen

chat

Qwen3-VL-235B-A22B-Instruct

Release on: Oct 4, 2025

Qwen3-VL-235B-A22B-Instruct is a 235B parameters Mixture-of-Experts (MoE) vision-language model, with 22B activated parameters. It is an instruction-tuned version of Qwen3-VL-235B-A22B and is aligned for chat applications. ...

Total Context:

262K

Max output:

262K

Input:

0.3

/ M Tokens

Output:

1.5

/ M Tokens

Qwen

chat

Qwen3-VL-235B-A22B-Thinking

Release on: Oct 4, 2025

Qwen3-VL-235B-A22B-Thinking is one of the Qwen3-VL series models, a reasoning-enhanced Thinking edition that achieves state-of-the-art (SOTA) results across many multimodal reasoning benchmarks, excelling in STEM, math, causal analysis, and logical, evidence-based answers. It features a Mixture-of-Experts (MoE) architecture with 235B total parameters and 22B active parameters. ...

Total Context:

262K

Max output:

262K

Input:

0.45

/ M Tokens

Output:

3.5

/ M Tokens

Qwen

chat

Qwen3-VL-30B-A3B-Instruct

Release on: Oct 5, 2025

Qwen3-VL series delivers superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions....

Total Context:

262K

Max output:

262K

Input:

0.29

/ M Tokens

Output:

1.0

/ M Tokens

Qwen

chat

Qwen3-VL-30B-A3B-Thinking

Release on: Oct 11, 2025

Total Context:

262K

Max output:

262K

Input:

0.29

/ M Tokens

Output:

1.0

/ M Tokens

DeepSeek

chat

DeepSeek-R1

Release on: May 28, 2025

DeepSeek-R1-0528 is an upgraded model shows significant improvements in handling complex reasoning tasks，also offers a reduced hallucination rate, enhanced support for function calling, and better experience for vibe coding. It achieves performance comparable to O3 and Gemini 2.5 Pro....

Total Context:

164K

Max output:

164K

Input:

0.5

/ M Tokens

Output:

2.18

/ M Tokens

DeepSeek

chat

DeepSeek-V3

Release on: Dec 26, 2024

DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects, including major boost in reasoning performance, stronger front-end development skills and smarter tool-use capabilities....

Total Context:

164K

Max output:

164K

Input:

0.25

/ M Tokens

Output:

1.0

/ M Tokens

Moonshot AI

chat

Kimi-K2-Instruct-0905

Release on: Sep 8, 2025

Kimi K2-Instruct-0905, a state-of-the-art mixture-of-experts (MoE) language model, is the latest, most capable version of Kimi K2. Key Features include enhanced coding capabilities, esp. front-end & tool-calling, context length extended to 256k tokens, and improved integration with various agent scaffolds....

Total Context:

262K

Max output:

262K

Input:

0.4

/ M Tokens

Output:

2.0

/ M Tokens

OpenAI

chat

gpt-oss-120b

Release on: Aug 13, 2025

The gpt-oss series is OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b is for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X)....

Total Context:

131K

Max output:

Input:

0.05

/ M Tokens

Output:

0.45

/ M Tokens

OpenAI

chat

gpt-oss-20b

Release on: Aug 13, 2025

The gpt-oss series is OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is for lower latency, and local or specialized use cases....

Total Context:

131K

Max output:

Input:

0.04

/ M Tokens

Output:

0.18

/ M Tokens

Z.ai

chat

GLM-4.6

Release on: Oct 4, 2025

Compared with GLM-4.5, GLM-4.6 brings several key improvements， including longer context window expanded to 200K tokens, superior coding performance, advanced reasoning, more capable agents, and refined writing....

Total Context:

205K

Max output:

205K

Input:

0.5

/ M Tokens

Output:

1.9

/ M Tokens

Z.ai

chat

GLM-4.5

Release on: Jul 28, 2025

The GLM-4.5 series models are foundation models designed for intelligent agents, by unifying reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, providing two modes: thinking and non-thinking....

Total Context:

131K

Max output:

131K

Input:

0.4

/ M Tokens

Output:

2.0

/ M Tokens

Z.ai

chat

GLM-4.5-Air

Release on: Jul 28, 2025

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. It’s also a hybrid reasoning model providing both thinking and non-thinking mode. ...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.86

/ M Tokens

Z.ai

chat

GLM-4.5V

Release on: Aug 13, 2025

As a part of the GLM-V family of models, GLM-4.5V is based on ZhipuAI’s foundation model GLM-4.5-Air, achieving SOTA performance on tasks such as image, video, and document understanding, as well as GUI agent operations....

Total Context:

66K

Max output:

66K

Input:

0.14

/ M Tokens

Output:

0.86

/ M Tokens

inclusionAI

chat

Ring-1T

Release on: Oct 14, 2025

Ring-1T is an open-source, trillion-parameter thinking model released by the Bailing team. Built upon the Ling 2.0 architecture and the Ling-1T-base foundation model, it features 1 trillion total parameters with 50 billion activated parameters and supports a context window of up to 131K tokens. The model's deep reasoning and natural language inference capabilities have been significantly enhanced through large-scale verifiable reward reinforcement learning (RLVR), combined with the self-developed icepop reinforcement learning stabilization method and the efficient ASystem RL framework. Ring-1T achieves leading open-source performance on challenging reasoning benchmarks, including math competitions (e.g., IMO 2025), code generation (e.g., ICPC World Finals 2025), and logical reasoning...

Total Context:

131K

Max output:

131K

Input:

0.57

/ M Tokens

Output:

2.28

/ M Tokens

inclusionAI

chat

Ling-1T

Release on: Oct 11, 2025

Ling-1T is the first flagship non-thinking model in the Ling 2.0 series, featuring 1 trillion total parameters with ≈ 50 billion active parameters per token. Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of efficient reasoning and scalable cognition. Pre-trained on 20 trillion+ high-quality, reasoning-dense tokens, Ling-1T-base supports up to 131K context length and adopts an evolutionary chain-of-thought (Evo-CoT) process across mid-training and post-training. This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve state-of-the-art performance on multiple complex reasoning benchmarks—balancing accuracy and efficiency...

Total Context:

131K

Max output:

131K

Input:

0.57

/ M Tokens

Output:

2.28

/ M Tokens

inclusionAI

chat

Ling-flash-2.0

Release on: Sep 18, 2025

Ling-flash-2.0 is a language model from inclusionAI with a total of 100 billion parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to or even exceeding that of 40B-level dense models and other larger MoE models, but with a significantly smaller active parameter count. The model represents a strategy focused on achieving high performance and efficiency through extreme architectural design and training methods...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

inclusionAI

chat

Ling-mini-2.0

Release on: Sep 10, 2025

Ling-mini-2.0 is a small yet high-performance large language model built on the MoE architecture. It has 16B total parameters, but only 1.4B are activated per token (non-embedding 789M), enabling extremely fast generation. Thanks to the efficient MoE design and large-scale high-quality training data, despite having only 1.4B activated parameters, Ling-mini-2.0 still delivers top-tier downstream task performance comparable to sub-10B dense LLMs and even larger MoE models...

Total Context:

131K

Max output:

131K

Input:

0.07

/ M Tokens

Output:

0.28

/ M Tokens

inclusionAI

chat

Ring-flash-2.0

Release on: Sep 29, 2025

Ring-flash-2.0 is a high-performance thinking model, deeply optimized based on Ling-flash-2.0-base. It is a Mixture-of-Experts (MoE) model with a total of 100B parameters, but only 6.1B are activated per inference. The model leverages the independently developed 'icepop' algorithm to address the training instability challenges in reinforcement learning (RL) for MoE LLMs, enabling continuous improvement of its complex reasoning capabilities throughout extended RL training cycles. Ring-flash-2.0 demonstrates significant breakthroughs across challenging benchmarks, including math competitions, code generation, and logical reasoning. Its performance surpasses that of SOTA dense models under 40B parameters and rivals larger open-weight MoE models and closed-source high-performance thinking model APIs. More surprisingly, although Ring-flash-2.0 is primarily designed for complex reasoning, it also shows strong capabilities in creative writing. Thanks to its efficient architecture, it achieves high-speed inference, significantly reducing inference costs for thinking models in high-concurrency scenarios...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Tencent

chat

Hunyuan-MT-7B

Release on: Sep 18, 2025

The Hunyuan Translation Model consists of a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. Hunyuan-MT-7B is a lightweight translation model with 7 billion parameters used to translate source text into the target language. The model supports mutual translation among 33 languages, including five ethnic minority languages in China. In the WMT25 machine translation competition, Hunyuan-MT-7B won first place in 30 out of the 31 language categories it participated in, demonstrating its outstanding translation capabilities. For translation tasks, Tencent Hunyuan proposed a comprehensive training framework covering pre-training, supervised fine-tuning, translation enhancement, and ensemble refinement, achieving state-of-the-art performance among models of a similar scale. The model is computationally efficient and easy to deploy, making it suitable for various application scenarios...

Total Context:

33K

Max output:

33K

Input:

0.0

/ M Tokens

Output:

0.0

/ M Tokens

Qwen

chat

Qwen3-Next-80B-A3B-Instruct

Release on: Sep 18, 2025

Qwen3-Next-80B-A3B-Instruct is a next-generation foundation model released by Alibaba's Qwen team. It is built on the new Qwen3-Next architecture, designed for ultimate training and inference efficiency. The model incorporates innovative features such as a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention), a High-Sparsity Mixture-of-Experts (MoE) structure, and various stability optimizations. As an 80-billion-parameter sparse model, it activates only about 3 billion parameters per token during inference, which significantly reduces computational costs and delivers over 10 times higher throughput than the Qwen3-32B model for long-context tasks exceeding 32K tokens. This is an instruction-tuned version optimized for general-purpose tasks and does not support 'thinking' mode. In terms of performance, it is comparable to Qwen's flagship model, Qwen3-235B, on certain benchmarks, showing significant advantages in ultra-long-context scenarios...

Total Context:

262K

Max output:

262K

Input:

0.14

/ M Tokens

Output:

1.4

/ M Tokens

Qwen

chat

Qwen3-Next-80B-A3B-Thinking

Release on: Sep 25, 2025

Qwen3-Next-80B-A3B-Thinking is a next-generation foundation model from Alibaba's Qwen team, specifically designed for complex reasoning tasks. It is built on the innovative Qwen3-Next architecture, which combines a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention) with a High-Sparsity Mixture-of-Experts (MoE) structure to achieve ultimate training and inference efficiency. As an 80-billion-parameter sparse model, it activates only about 3 billion parameters during inference, significantly reducing computational costs and delivering over 10 times higher throughput than the Qwen3-32B model on long-context tasks exceeding 32K tokens. This 'Thinking' version is optimized for demanding multi-step problems like mathematical proofs, code synthesis, logical analysis, and agentic planning, and it outputs structured 'thinking' traces by default. In terms of performance, it surpasses more costly models like Qwen3-32B-Thinking and has outperformed Gemini-2.5-Flash-Thinking on multiple benchmarks...

Total Context:

262K

Max output:

262K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Qwen

chat

Qwen3-Omni-30B-A3B-Captioner

Release on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Captioner is a Vision-Language Model (VLM) from Alibaba's Qwen team, part of the Qwen3 series. It is specifically designed for generating high-quality, detailed, and accurate image captions. Based on a 30B total parameter Mixture of Experts (MoE) architecture, the model can deeply understand image content and translate it into rich, natural language text...

Total Context:

66K

Max output:

66K

Input:

0.1

/ M Tokens

Output:

0.4

/ M Tokens

Qwen

chat

Qwen3-Omni-30B-A3B-Instruct

Release on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Instruct is a member of the latest Qwen3 series from Alibaba's Qwen team. It is a Mixture of Experts (MoE) model with 30 billion total parameters and 3 billion active parameters, which effectively reduces inference costs while maintaining powerful performance. The model was trained on high-quality, multi-source, and multilingual data, demonstrating excellent performance in basic capabilities such as multilingual dialogue, as well as in code, math...

Total Context:

66K

Max output:

66K

Input:

0.1

/ M Tokens

Output:

0.4

/ M Tokens

Qwen

chat

Qwen3-Omni-30B-A3B-Thinking

Release on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Thinking is the core "Thinker" component within the Qwen3-Omni omni-modal model's "Thinker-Talker" architecture. It is specifically designed to process multimodal inputs, including text, audio, images, and video, and to execute complex chain-of-thought reasoning. As the reasoning brain of the system, this model unifies all inputs into a common representational space for understanding and analysis, but its output is text-only. This design allows it to excel at solving complex problems that require deep thought and cross-modal understanding, such as mathematical problems presented in images, making it key to the powerful cognitive abilities of the entire Qwen3-Omni architecture...

Total Context:

66K

Max output:

66K

Input:

0.1

/ M Tokens

Output:

0.4

/ M Tokens

Qwen

chat

Qwen3-Coder-480B-A35B-Instruct

Release on: Jul 31, 2025

Qwen3-Coder-480B-A35B-Instruct is the most agentic code model released by Alibaba to date. It is a Mixture-of-Experts (MoE) model with 480 billion total parameters and 35 billion activated parameters, balancing efficiency and performance. The model natively supports a 256K (approximately 262,144) token context length, which can be extended up to 1 million tokens using extrapolation methods like YaRN, enabling it to handle repository-scale codebases and complex programming tasks. Qwen3-Coder is specifically designed for agentic coding workflows, where it not only generates code but also autonomously interacts with developer tools and environments to solve complex problems. It has achieved state-of-the-art results among open models on various coding and agentic benchmarks, with performance comparable to leading models like Claude Sonnet 4. Alongside the model, Alibaba has also open-sourced Qwen Code, a command-line tool designed to fully unleash its powerful agentic coding capabilities...

Total Context:

262K

Max output:

262K

Input:

0.25

/ M Tokens

Output:

1.0

/ M Tokens

Qwen

chat

Qwen3-Coder-30B-A3B-Instruct

Release on: Aug 1, 2025

Qwen3-Coder-30B-A3B-Instruct is a code model from the Qwen3 series developed by Alibaba's Qwen team. As a streamlined and optimized model, it maintains impressive performance and efficiency while focusing on enhanced coding capabilities. It demonstrates significant performance advantages among open-source models on complex tasks such as Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. The model natively supports a long context of 256K tokens, which can be extended up to 1M tokens, enabling better repository-scale understanding and processing. Furthermore, it provides robust agentic coding support for platforms like Qwen Code and CLINE, featuring a specially designed function call format...

Total Context:

262K

Max output:

262K

Input:

0.07

/ M Tokens

Output:

0.28

/ M Tokens

Qwen

chat

Qwen3-30B-A3B-Instruct-2507

Release on: Jul 30, 2025

Qwen3-30B-A3B-Instruct-2507 is the updated version of the Qwen3-30B-A3B non-thinking mode. It is a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion activated parameters. This version features key enhancements, including significant improvements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and offers markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Furthermore, its capabilities in long-context understanding have been enhanced to 256K. This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output...

Total Context:

262K

Max output:

262K

Input:

0.09

/ M Tokens

Output:

0.3

/ M Tokens

Qwen

chat

Qwen3-30B-A3B-Thinking-2507

Release on: Jul 31, 2025

Qwen3-30B-A3B-Thinking-2507 is the latest thinking model in the Qwen3 series, released by Alibaba's Qwen team. As a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion active parameters, it is focused on enhancing capabilities for complex tasks. The model demonstrates significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise. It also shows markedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences. The model natively supports a 256K long-context understanding capability, which can be extended to 1 million tokens. This version is specifically designed for ‘thinking mode’ to tackle highly complex problems through step-by-step reasoning and also excels in agentic capabilities...

Total Context:

262K

Max output:

131K

Input:

0.09

/ M Tokens

Output:

0.3

/ M Tokens

Qwen

chat

Qwen3-235B-A22B-Instruct-2507

Release on: Jul 23, 2025

Qwen3-235B-A22B-Instruct-2507 is a flagship Mixture-of-Experts (MoE) large language model from the Qwen3 series, developed by Alibaba Cloud's Qwen team. The model has a total of 235 billion parameters, with 22 billion activated per forward pass. It was released as an updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant enhancements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. Additionally, the model provides substantial gains in long-tail knowledge coverage across multiple languages and shows markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Notably, it natively supports an extensive 256K (262,144 tokens) context window, which enhances its capabilities for long-context understanding. This version exclusively supports the non-thinking mode and does not generate <think> blocks, aiming to deliver more efficient and precise responses for tasks like direct Q&A and knowledge retrieval...

Total Context:

262K

Max output:

262K

Input:

0.09

/ M Tokens

Output:

0.6

/ M Tokens

Qwen

chat

Qwen3-235B-A22B-Thinking-2507

Release on: Jul 28, 2025

Qwen3-235B-A22B-Thinking-2507 is a member of the Qwen3 large language model series developed by Alibaba's Qwen team, specializing in highly complex reasoning tasks. The model is built on a Mixture-of-Experts (MoE) architecture, with 235 billion total parameters and approximately 22 billion activated parameters per token, which enhances computational efficiency while maintaining powerful performance. As a dedicated 'thinking' model, it demonstrates significantly improved performance on tasks requiring human expertise, such as logical reasoning, mathematics, science, coding, and academic benchmarks, achieving state-of-the-art results among open-source thinking models. Furthermore, the model features enhanced general capabilities like instruction following, tool usage, and text generation, and it natively supports a 256K long-context understanding capability, making it ideal for scenarios that require deep reasoning and processing of long documents...

Total Context:

262K

Max output:

262K

Input:

0.13

/ M Tokens

Output:

0.6

/ M Tokens

StepFun

chat

step3

Release on: Aug 6, 2025

Step3 is a cutting-edge multimodal reasoning model from StepFun. It is built on a Mixture-of-Experts (MoE) architecture with 321B total parameters and 38B active parameters. The model is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision-language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators. During pretraining, Step3 processed over 20T text tokens and 4T image-text mixed tokens, spanning more than ten languages. The model has achieved state-of-the-art performance for open-source models on various benchmarks, including math, code, and multimodality...

Total Context:

66K

Max output:

66K

Input:

0.57

/ M Tokens

Output:

1.42

/ M Tokens

ByteDance

chat

Seed-OSS-36B-Instruct

Release on: Sep 4, 2025

Seed-OSS is a series of open-source large language models developed by the ByteDance Seed team, designed for powerful long-context processing, reasoning, agent capabilities, and general-purpose abilities. Within this series, Seed-OSS-36B-Instruct is an instruction-tuned model with 36 billion parameters that natively supports an ultra-long context length, enabling it to process massive documents or complex codebases in a single pass. The model is specially optimized for reasoning, code generation, and agent tasks (such as tool use), while maintaining balanced and excellent general-purpose capabilities. A key feature of this model is the ‘Thinking Budget’ function, which allows users to flexibly adjust the reasoning length as needed, thereby effectively improving inference efficiency in practical applications...

Total Context:

262K

Max output:

262K

Input:

0.21

/ M Tokens

Output:

0.57

/ M Tokens

Z.ai

chat

GLM-4.1V-9B-Thinking

Release on: Jul 4, 2025

GLM-4.1V-9B-Thinking is an open-source Vision-Language Model (VLM) jointly released by Zhipu AI and Tsinghua University's KEG lab, designed to advance general-purpose multimodal reasoning. Built upon the GLM-4-9B-0414 foundation model, it introduces a 'thinking paradigm' and leverages Reinforcement Learning with Curriculum Sampling (RLCS) to significantly enhance its capabilities in complex tasks. As a 9B-parameter model, it achieves state-of-the-art performance among models of a similar size, and its performance is comparable to or even surpasses the much larger 72B-parameter Qwen-2.5-VL-72B on 18 different benchmarks. The model excels in a diverse range of tasks, including STEM problem-solving, video understanding, and long document understanding, and it can handle images with resolutions up to 4K and arbitrary aspect ratios...

Total Context:

66K

Max output:

66K

Input:

0.035

/ M Tokens

Output:

0.14

/ M Tokens

BAIDU

chat

ERNIE-4.5-300B-A47B

Release on: Jul 2, 2025

ERNIE-4.5-300B-A47B is a large language model developed by Baidu based on a Mixture-of-Experts (MoE) architecture. The model has a total of 300 billion parameters, but only activates 47 billion parameters per token during inference, thus balancing powerful performance with computational efficiency. As one of the core models in the ERNIE 4.5 series, it is trained on the PaddlePaddle deep learning framework and demonstrates outstanding capabilities in tasks such as text understanding, generation, reasoning, and coding. The model utilizes an innovative multimodal heterogeneous MoE pre-training method, which effectively enhances its overall abilities through joint training on text and visual modalities, showing prominent results in instruction following and world knowledge memorization. Baidu has open-sourced this model along with others in the series to promote the research and application of AI technology...

Total Context:

131K

Max output:

131K

Input:

0.28

/ M Tokens

Output:

1.1

/ M Tokens

Tencent

chat

Hunyuan-A13B-Instruct

Release on: Jun 30, 2025

Hunyuan-A13B-Instruct activates only 13 B of its 80 B parameters, yet matches much larger LLMs on mainstream benchmarks. It offers hybrid reasoning: low-latency “fast” mode or high-precision “slow” mode, switchable per call. Native 256 K-token context lets it digest book-length documents without degradation. Agent skills are tuned for BFCL-v3, τ-Bench and C3-Bench leadership, making it an excellent autonomous assistant backbone. Grouped Query Attention plus multi-format quantization delivers memory-light, GPU-efficient inference for real-world deployment, with built-in multilingual support and robust safety alignment for enterprise-grade applications....

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Moonshot AI

chat

Kimi-K2-Instruct

Release on: Jul 13, 2025

Kimi K2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models...

Total Context:

131K

Max output:

131K

Input:

0.58

/ M Tokens

Output:

2.29

/ M Tokens

Moonshot AI

chat

Kimi-Dev-72B

Release on: Jun 19, 2025

Kimi-Dev-72B is a new open-source coding large language model achieving 60.4% on SWE-bench Verified, setting a state-of-the-art result among open-source models. Optimized through large-scale reinforcement learning, it autonomously patches real codebases in Docker and earns rewards only when full test suites pass. This ensures the model delivers correct, robust, and practical solutions aligned with real-world software engineering standards...

Total Context:

131K

Max output:

131K

Input:

0.29

/ M Tokens

Output:

1.15

/ M Tokens

MiniMaxAI

chat

MiniMax-M1-80k

Release on: Jun 17, 2025

MiniMax-M1 is a open-weight, large-scale hybrid-attention reasoning model with 456 B parameters and 45.9 B activated per token. It natively supports 1 M-token context, lightning attention enabling 75% FLOPs savings vs DeepSeek R1 at 100 K tokens, and leverages a MoE architecture. Efficient RL training with CISPO and hybrid design yields state-of-the-art performance on long-input reasoning and real-world software engineering tasks....

Total Context:

131K

Max output:

131K

Input:

0.55

/ M Tokens

Output:

2.2

/ M Tokens

Qwen

chat

Qwen3-30B-A3B

Release on: Apr 30, 2025

Qwen3-30B-A3B is the latest large language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 30.5B total parameters and 3.3B activated parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, superior human preference alignment in creative writing, role-playing, and multi-turn dialogues. The model excels in agent capabilities for precise integration with external tools and supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context:

131K

Max output:

131K

Input:

0.09

/ M Tokens

Output:

0.45

/ M Tokens

Qwen

chat

Qwen3-32B

Release on: Apr 30, 2025

Qwen3-32B is the latest large language model in the Qwen series with 32.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Qwen

chat

Qwen3-14B

Release on: Apr 30, 2025

Qwen3-14B is the latest large language model in the Qwen series with 14.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context:

131K

Max output:

131K

Input:

0.07

/ M Tokens

Output:

0.28

/ M Tokens

Qwen

chat

Qwen3-8B

Release on: Apr 30, 2025

Qwen3-8B is the latest large language model in the Qwen series with 8.2B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context:

131K

Max output:

131K

Input:

0.06

/ M Tokens

Output:

0.06

/ M Tokens

Qwen

reranker

Qwen3-Reranker-8B

Release on: Jun 6, 2025

Qwen3-Reranker-8B is the 8-billion parameter text reranking model from the Qwen3 series. It is designed to refine and improve the quality of search results by accurately re-ordering documents based on their relevance to a query. Built on the powerful Qwen3 foundational models, it excels in understanding long-text with a 32k context length and supports over 100 languages. The Qwen3-Reranker-8B model is part of a flexible series that offers state-of-the-art performance in various text and code retrieval scenarios...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Qwen

embedding

Qwen3-Embedding-8B

Release on: Jun 6, 2025

Qwen3-Embedding-8B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 8B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 4096. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It ranks No.1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58) and demonstrates state-of-the-art performance across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 4096) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Qwen

reranker

Qwen3-Reranker-4B

Release on: Jun 6, 2025

Qwen3-Reranker-4B is a powerful text reranking model from the Qwen3 series, featuring 4 billion parameters. It is engineered to significantly improve the relevance of search results by re-ordering an initial list of documents based on a query. This model inherits the core strengths of its Qwen3 foundation, including exceptional understanding of long-text (up to 32k context length) and robust capabilities across more than 100 languages. According to benchmarks, the Qwen3-Reranker-4B model demonstrates superior performance in various text and code retrieval evaluations...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Qwen

embedding

Qwen3-Embedding-4B

Release on: Jun 6, 2025

Qwen3-Embedding-4B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 4B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 2560. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves excellent performance on the MTEB multilingual leaderboard (score 69.45) and demonstrates outstanding results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 2560) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, providing an optimal balance between efficiency and effectiveness...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Qwen

reranker

Qwen3-Reranker-0.6B

Release on: Jun 6, 2025

Qwen3-Reranker-0.6B is a text reranking model from the Qwen3 series. It is specifically designed to refine the results from initial retrieval systems by re-ordering documents based on their relevance to a given query. With 0.6 billion parameters and a context length of 32k, this model leverages the strong multilingual (supporting over 100 languages), long-text understanding, and reasoning capabilities of its Qwen3 foundation. Evaluation results show that Qwen3-Reranker-0.6B achieves strong performance across various text retrieval benchmarks, including MTEB-R, CMTEB-R, and MLDR...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Qwen

embedding

Qwen3-Embedding-0.6B

Release on: Jun 6, 2025

Qwen3-Embedding-0.6B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 0.6B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 1024. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves strong performance on the MTEB multilingual leaderboard (score 64.33) and demonstrates excellent results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 1024) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, making it an ideal choice for applications prioritizing both efficiency and effectiveness...

Total Context:

33K

Max output:

Input:

/ M Tokens

Output:

/ M Tokens

Z.ai

chat

GLM-Z1-32B-0414

Release on: Apr 18, 2025

GLM-Z1-32B-0414 is a reasoning model with deep thinking capabilities. This model was developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During the training process, the team also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities. Despite having only 32B parameters, its performance on certain tasks is comparable to DeepSeek-R1 with 671B parameters. Through evaluations on benchmarks such as AIME 24/25, LiveCodeBench, and GPQA, the model demonstrates strong mathematical reasoning abilities and can support solutions for a wider range of complex tasks...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Z.ai

chat

GLM-4-32B-0414

Release on: Apr 18, 2025

GLM-4-32B-0414 is a new generation model in the GLM family with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, the team enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation. On several benchmarks, its performance approaches or even exceeds that of larger models like GPT-4o and DeepSeek-V3-0324 (671B)...

Total Context:

33K

Max output:

33K

Input:

0.27

/ M Tokens

Output:

0.27

/ M Tokens

Z.ai

chat

GLM-Z1-9B-0414

Release on: Apr 18, 2025

GLM-Z1-9B-0414 is a small-sized model in the GLM series with only 9 billion parameters that maintains the open-source tradition while showcasing surprising capabilities. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent performance in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. The research team employed the same series of techniques used for larger models to train this 9B model. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment. The model features deep thinking capabilities and can handle long contexts through YaRN technology, making it particularly suitable for applications requiring mathematical reasoning abilities with limited computational resources...

Total Context:

131K

Max output:

131K

Input:

0.086

/ M Tokens

Output:

0.086

/ M Tokens

Z.ai

chat

GLM-4-9B-0414

Release on: Apr 18, 2025

GLM-4-9B-0414 is a small-sized model in the GLM series with 9 billion parameters. This model inherits the technical characteristics of the GLM-4-32B series but offers a more lightweight deployment option. Despite its smaller scale, GLM-4-9B-0414 still demonstrates excellent capabilities in code generation, web design, SVG graphics generation, and search-based writing tasks. The model also supports function calling features, allowing it to invoke external tools to extend its range of capabilities. The model shows a good balance between efficiency and effectiveness in resource-constrained scenarios, providing a powerful option for users who need to deploy AI models under limited computational resources. Like other models in the same series, GLM-4-9B-0414 also demonstrates competitive performance in various benchmark tests...

Total Context:

33K

Max output:

33K

Input:

0.086

/ M Tokens

Output:

0.086

/ M Tokens

Qwen

chat

Qwen2.5-VL-32B-Instruct

Release on: Mar 24, 2025

Qwen2.5-VL-32B-Instruct is a multimodal large language model released by the Qwen team, part of the Qwen2.5-VL series. This model is not only proficient in recognizing common objects but is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. It acts as a visual agent that can reason and dynamically direct tools, capable of computer and phone use. Additionally, the model can accurately localize objects in images, and generate structured outputs for data like invoices and tables. Compared to its predecessor Qwen2-VL, this version has enhanced mathematical and problem-solving abilities through reinforcement learning, with response styles adjusted to better align with human preferences...

Total Context:

131K

Max output:

131K

Input:

0.27

/ M Tokens

Output:

0.27

/ M Tokens

Qwen

chat

Qwen3-235B-A22B

Release on: Apr 30, 2025

Qwen3-235B-A22B is the latest large language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 235B total parameters and 22B activated parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, superior human preference alignment in creative writing, role-playing, and multi-turn dialogues. The model excels in agent capabilities for precise integration with external tools and supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context:

131K

Max output:

131K

Input:

0.35

/ M Tokens

Output:

1.42

/ M Tokens

Qwen

chat

QwQ-32B

Release on: Mar 6, 2025

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. The model incorporates technologies like RoPE, SwiGLU, RMSNorm, and Attention QKV bias, with 64 layers and 40 Q attention heads (8 for KV in GQA architecture)...

Total Context:

131K

Max output:

131K

Input:

0.15

/ M Tokens

Output:

0.58

/ M Tokens

Qwen

chat

Qwen2.5-VL-72B-Instruct

Release on: Jan 28, 2025

Qwen2.5-VL is a vision-language model in the Qwen2.5 series that shows significant enhancements in several aspects: it has strong visual understanding capabilities, recognizing common objects while analyzing texts, charts, and layouts in images; it functions as a visual agent capable of reasoning and dynamically directing tools; it can comprehend videos over 1 hour long and capture key events; it accurately localizes objects in images by generating bounding boxes or points; and it supports structured outputs for scanned data like invoices and forms. The model demonstrates excellent performance across various benchmarks including image, video, and agent tasks...

Total Context:

131K

Max output:

Input:

0.59

/ M Tokens

Output:

0.59

/ M Tokens

Qwen

chat

Qwen2.5-VL-7B-Instruct

Release on: Jan 28, 2025

Qwen2.5-VL is a new member of the Qwen series, equipped with powerful visual comprehension capabilities. It can analyze text, charts, and layouts within images, understand long videos, and capture events. It is capable of reasoning, manipulating tools, supporting multi-format object localization, and generating structured outputs. The model has been optimized for dynamic resolution and frame rate training in video understanding, and has improved the efficiency of the visual encoder....

Total Context:

33K

Max output:

Input:

0.05

/ M Tokens

Output:

0.05

/ M Tokens

DeepSeek

chat

DeepSeek-R1-Distill-Qwen-32B

Release on: Jan 20, 2025

DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates exceptional performance across mathematics, programming, and reasoning tasks. It achieved impressive results in various benchmarks including AIME 2024, MATH-500, and GPQA Diamond, with a notable 94.3% accuracy on MATH-500, showcasing its strong mathematical reasoning capabilities...

Total Context:

131K

Max output:

131K

Input:

0.18

/ M Tokens

Output:

0.18

/ M Tokens

DeepSeek

chat

DeepSeek-R1-Distill-Qwen-14B

Release on: Jan 20, 2025

DeepSeek-R1-Distill-Qwen-14B is a distilled model based on Qwen2.5-14B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates strong reasoning capabilities. It achieved impressive results across various benchmarks, including 93.9% accuracy on MATH-500, 69.7% pass rate on AIME 2024, and a rating of 1481 on CodeForces, showcasing its powerful abilities in mathematics and programming tasks...

Total Context:

131K

Max output:

131K

Input:

0.1

/ M Tokens

Output:

0.1

/ M Tokens

DeepSeek

chat

DeepSeek-R1-Distill-Qwen-7B

Release on: Jan 20, 2025

DeepSeek-R1-Distill-Qwen-7B is a distilled model based on Qwen2.5-Math-7B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates strong reasoning capabilities. It achieved impressive results across various benchmarks, including 92.8% accuracy on MATH-500, 55.5% pass rate on AIME 2024, and a rating of 1189 on CodeForces, showing remarkable mathematical and programming abilities for a 7B-scale model...

Total Context:

33K

Max output:

16K

Input:

0.05

/ M Tokens

Output:

0.05

/ M Tokens

Qwen

chat

Qwen2.5-Coder-32B-Instruct

Release on: Nov 11, 2024

Qwen2.5-Coder-32B-Instruct is a code-specific large language model developed based on Qwen2.5. The model has undergone training on 5.5 trillion tokens, achieving significant improvements in code generation, code reasoning, and code repair. It is currently the most advanced open-source code language model, with coding capabilities comparable to GPT-4. Not only has the model enhanced coding abilities, but it also maintains strengths in mathematics and general capabilities, and supports long text processing....

Total Context:

33K

Max output:

Input:

0.18

/ M Tokens

Output:

0.18

/ M Tokens

Qwen

chat

Qwen2.5-72B-Instruct-128K

Release on: Sep 18, 2024

Qwen2.5-72B-Instruct is one of the latest large language models series released by Alibaba Cloud. This 72B model demonstrates significant improvements in areas such as coding and mathematics. It supports a context length of up to 128K tokens. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It has shown notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format....

Total Context:

131K

Max output:

Input:

0.59

/ M Tokens

Output:

0.59

/ M Tokens

DeepSeek

chat

deepseek-vl2

Release on: Dec 13, 2024

DeepSeek-VL2 is a mixed-expert (MoE) vision-language model developed based on DeepSeekMoE-27B, employing a sparse-activated MoE architecture to achieve superior performance with only 4.5B active parameters. The model excels in various tasks including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Compared to existing open-source dense models and MoE-based models, it demonstrates competitive or state-of-the-art performance using the same or fewer active parameters....

Total Context:

Max output:

Input:

0.15

/ M Tokens

Output:

0.15

/ M Tokens

Qwen

chat

Qwen2.5-72B-Instruct

Release on: Sep 18, 2024

Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in following instructions, understanding structured data, and generating structured outputs, particularly in JSON format....

Total Context:

33K

Max output:

Input:

0.59

/ M Tokens

Output:

0.59

/ M Tokens

Qwen

chat

Qwen2.5-32B-Instruct

Release on: Sep 19, 2024

Qwen2.5-32B-Instruct is one of the latest large language models series released by Alibaba Cloud. This 32B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multi-language support, covering over 29 languages, including Chinese, English, and others. It shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format....

Total Context:

33K

Max output:

Input:

0.18

/ M Tokens

Output:

0.18

/ M Tokens

Qwen

chat

Qwen2.5-14B-Instruct

Release on: Sep 18, 2024

Qwen2.5-14B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 14B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multi-language support, covering over 29 languages, including Chinese and English. It has shown notable advancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format....

Total Context:

33K

Max output:

Input:

0.1

/ M Tokens

Output:

0.1

/ M Tokens

Qwen

chat

Qwen2.5-7B-Instruct

Release on: Sep 18, 2024

Qwen2.5-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. The model shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly JSON....

Total Context:

33K

Max output:

Input:

0.05

/ M Tokens

Output:

0.05

/ M Tokens

Meta Llama

chat

Meta-Llama-3.1-8B-Instruct

Release on: Apr 23, 2025

Meta Llama 3.1 is a family of multilingual large language models developed by Meta, featuring pretrained and instruction-tuned variants in 8B, 70B, and 405B parameter sizes. This 8B instruction-tuned model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks. The model was trained on over 15 trillion tokens of publicly available data, using techniques like supervised fine-tuning and reinforcement learning with human feedback to enhance helpfulness and safety. Llama 3.1 supports text and code generation, with a knowledge cutoff of December 2023...

Total Context:

33K

Max output:

Input:

0.06

/ M Tokens

Output:

0.06

/ M Tokens

Ready to accelerate your AI development?