
Qwen3-Next-80B-A3B-Thinking API, Deployment, Pricing
Qwen/Qwen3-Next-80B-A3B-Thinking
Qwen3-Next-80B-A3B-Thinking is a next-generation foundation model from Alibaba's Qwen team, designed specifically for complex reasoning tasks. It is built on the Qwen3-Next architecture, which combines a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention) with a High-Sparsity Mixture-of-Experts (MoE) structure to achieve exceptional training and inference efficiency. Although the sparse model has 80 billion parameters in total, it activates only about 3 billion during inference, significantly reducing computational cost and delivering over 10x the throughput of Qwen3-32B on long-context tasks exceeding 32K tokens. This 'Thinking' variant is optimized for demanding multi-step problems such as mathematical proofs, code synthesis, logical analysis, and agentic planning, and it outputs structured 'thinking' traces by default. On multiple benchmarks it surpasses the more costly Qwen3-32B-Thinking and has outperformed Gemini-2.5-Flash-Thinking.
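To make the sparse-activation idea concrete, here is a minimal, illustrative top-k MoE routing layer in PyTorch. It is a sketch of the general technique only, not the actual Qwen3-Next implementation; the dimensions, expert count, and top-k value are arbitrary illustrative choices.

```python
# Illustrative high-sparsity MoE routing: each token is sent to a small
# top-k subset of experts, so only a fraction of the total parameters
# is active per forward pass. NOT the real Qwen3-Next layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=32, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        # Router scores every expert, but only the top-k run per token.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = torch.topk(probs, self.top_k)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
y = layer(torch.randn(8, 64))  # only 2 of 32 experts run per token
```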
API Usage
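The listing does not include a request example, so below is a minimal sketch assuming an OpenAI-compatible chat-completions endpoint. The base URL and API key are placeholders, and the reasoning-trace handling is an assumption: some servers return the trace as a separate field, while others inline it in <think>...</think> tags as parsed here.

```python
# Hypothetical sketch: call Qwen3-Next-80B-A3B-Thinking through an
# OpenAI-compatible endpoint. Base URL, key, and exact model ID depend
# on your provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Thinking",
    messages=[
        {"role": "user",
         "content": "Prove that the sum of two even integers is even."}
    ],
    max_tokens=4096,
)

content = response.choices[0].message.content or ""

# The Thinking variant emits its reasoning trace before the final answer.
# If the server inlines it in <think>...</think> tags, split it off:
if "</think>" in content:
    thinking, answer = content.split("</think>", 1)
    thinking = thinking.replace("<think>", "").strip()
else:
    thinking, answer = "", content

print("Reasoning trace:\n", thinking)
print("Final answer:\n", answer.strip())
```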
Details
Model Provider: Qwen
Type: text
Sub Type: chat
Size: 80B
Publish Time: Sep 25, 2025
Input Price: $0.14 / M Tokens
Output Price: $0.57 / M Tokens
Context Length: 262K
Tags: MoE, 80B, 262K
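Given the listed rates, a request's cost can be estimated directly from its token counts. A small sketch using those rates; the example token counts are hypothetical:

```python
# Estimate request cost at the listed rates
# ($0.14 / M input tokens, $0.57 / M output tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.57

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the request cost in dollars."""
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 10,000-token thinking+answer
# response costs 2000*0.14/1e6 + 10000*0.57/1e6 ≈ $0.006.
print(f"${estimate_cost(2_000, 10_000):.4f}")
```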
Compare with Other Models
See how this model stacks up against others.

Qwen · chat
Qwen2.5-72B-Instruct
Released: Sep 18, 2024
Qwen2.5-72B-Instruct is one of the latest large language models released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. It also offers multilingual support, covering over 29 languages, including Chinese and English, and shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 33K | Max output: 4K
Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

Qwen · chat
Qwen2.5-7B-Instruct
Released: Sep 18, 2024
Qwen2.5-7B-Instruct is one of the latest large language models released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. It also offers multilingual support, covering over 29 languages, including Chinese and English, and shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 33K | Max output: 4K
Input: $0.05 / M Tokens | Output: $0.05 / M Tokens

Qwen · chat
Qwen2.5-14B-Instruct
Released: Sep 18, 2024
Qwen2.5-14B-Instruct is one of the latest large language models released by Alibaba Cloud. This 14B model demonstrates significant improvements in areas such as coding and mathematics. It also offers multilingual support, covering over 29 languages, including Chinese and English, and shows notable advancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 33K | Max output: 4K
Input: $0.10 / M Tokens | Output: $0.10 / M Tokens

Qwen · chat
Qwen2.5-32B-Instruct
Released: Sep 19, 2024
Qwen2.5-32B-Instruct is one of the latest large language models released by Alibaba Cloud. This 32B model demonstrates significant improvements in areas such as coding and mathematics. It also offers multilingual support, covering over 29 languages, including Chinese and English, and shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 33K | Max output: 4K
Input: $0.18 / M Tokens | Output: $0.18 / M Tokens

Qwen · chat
Qwen2.5-72B-Instruct-128K
Released: Sep 18, 2024
Qwen2.5-72B-Instruct-128K is one of the latest large language models released by Alibaba Cloud. This 72B model demonstrates significant improvements in areas such as coding and mathematics and supports a context length of up to 128K tokens. It also offers multilingual support, covering over 29 languages, including Chinese and English, and shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 131K | Max output: 4K
Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

Qwen · chat
Qwen2.5-Coder-32B-Instruct
Released: Nov 11, 2024
Qwen2.5-Coder-32B-Instruct is a code-specific large language model built on Qwen2.5. Trained on 5.5 trillion tokens, it achieves significant improvements in code generation, code reasoning, and code repair, and is currently the most advanced open-source code language model, with coding capabilities comparable to GPT-4. Beyond its enhanced coding abilities, it maintains strengths in mathematics and general capabilities and supports long-text processing...
Total Context: 33K | Max output: 4K
Input: $0.18 / M Tokens | Output: $0.18 / M Tokens

DeepSeek · chat
DeepSeek-VL2
Released: Dec 13, 2024
DeepSeek-VL2 is a Mixture-of-Experts (MoE) vision-language model built on DeepSeekMoE-27B, employing a sparsely activated MoE architecture to achieve superior performance with only 4.5B active parameters. The model excels at tasks including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Compared to existing open-source dense and MoE-based models, it demonstrates competitive or state-of-the-art performance with the same or fewer active parameters...
Total Context: 4K | Max output: 4K
Input: $0.15 / M Tokens | Output: $0.15 / M Tokens

Qwen · chat
Qwen2.5-VL-72B-Instruct
Released: Jan 28, 2025
Qwen2.5-VL is a vision-language model in the Qwen2.5 series with significant enhancements in several areas: strong visual understanding, recognizing common objects while analyzing text, charts, and layouts in images; operation as a visual agent capable of reasoning and dynamically directing tool use; comprehension of videos over an hour long, capturing key events; accurate object localization in images via bounding boxes or points; and structured outputs for scanned data such as invoices and forms. The model performs strongly across image, video, and agent benchmarks...
Total Context: 131K | Max output: 4K
Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

DeepSeek · chat
DeepSeek-V3
Released: Dec 26, 2024
The new version of DeepSeek-V3 (DeepSeek-V3-0324) uses the same base model as the previous DeepSeek-V3-1226, with improvements made only to the post-training methods. It incorporates reinforcement learning techniques from the DeepSeek-R1 training process, significantly enhancing performance on reasoning tasks, and has achieved scores surpassing GPT-4.5 on math- and coding-related evaluation sets. The model also shows notable improvements in tool invocation, role-playing, and casual conversation...
Total Context: 164K | Max output: 164K
Input: $0.27 / M Tokens | Output: $1.13 / M Tokens

DeepSeek · chat
DeepSeek-R1
Released: May 28, 2025
DeepSeek-R1-0528 is a reasoning model powered by reinforcement learning (RL) that addresses issues of repetition and poor readability. Before RL, DeepSeek-R1 incorporated cold-start data to further optimize its reasoning performance. It achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks, and its carefully designed training methods have enhanced overall effectiveness...
Total Context: 164K | Max output: 164K
Input: $0.50 / M Tokens | Output: $2.18 / M Tokens

DeepSeek · chat
DeepSeek-R1-Distill-Qwen-14B
Released: Jan 20, 2025
DeepSeek-R1-Distill-Qwen-14B is a distilled model based on Qwen2.5-14B, fine-tuned on 800k curated samples generated by DeepSeek-R1. It demonstrates strong reasoning capabilities, achieving 93.9% accuracy on MATH-500, a 69.7% pass rate on AIME 2024, and a 1481 rating on CodeForces, showcasing its strength in mathematics and programming tasks...
Total Context: 131K | Max output: 131K
Input: $0.10 / M Tokens | Output: $0.10 / M Tokens

DeepSeek · chat
DeepSeek-R1-Distill-Qwen-32B
Released: Jan 20, 2025
DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B, fine-tuned on 800k curated samples generated by DeepSeek-R1. It demonstrates exceptional performance across mathematics, programming, and reasoning tasks, with impressive results on benchmarks including AIME 2024, MATH-500, and GPQA Diamond, notably 94.3% accuracy on MATH-500...
Total Context: 131K | Max output: 131K
Input: $0.18 / M Tokens | Output: $0.18 / M Tokens
Model FAQs: Usage, Deployment
Learn how to use, fine-tune, and deploy this model with ease.