DeepSeek
Text Generation
DeepSeek-V3.2
Released on: Dec 4, 2025
DeepSeek-V3.2 is a model that combines high computational efficiency with strong reasoning and agent performance. Its approach rests on three key technical advances: DeepSeek Sparse Attention (DSA), an efficient attention mechanism that substantially reduces computational complexity while preserving model quality, optimized for long-context scenarios; a scalable reinforcement learning framework, which enables performance comparable to GPT-5, with reasoning on par with Gemini-3.0-Pro in its high-compute variant; and a large-scale agentic task synthesis pipeline that integrates reasoning into tool-use scenarios, improving compliance and generalization in complex interactive environments. The model achieved gold-medal performance at the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).
Total Context:
164K
Max output:
164K
Input:
$
0.27
/ M Tokens
Output:
$
0.42
/ M Tokens
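The per-million-token rates above translate directly into per-request cost. A minimal sketch of that arithmetic, using the listed DeepSeek-V3.2 rates ($0.27/M input, $0.42/M output); the function name and interface are illustrative, not a provider API:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.27, output_rate: float = 0.42) -> float:
    """Return the USD cost of one request, given token counts and
    per-million-token rates (defaults match the listed V3.2 pricing)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: a 12,000-token prompt with a 3,000-token completion.
cost = estimate_cost(12_000, 3_000)
print(f"${cost:.6f}")  # → $0.004500
```

The same function covers the other models in this list by passing their rates explicitly, e.g. `estimate_cost(n_in, n_out, 0.50, 2.18)` for DeepSeek-R1.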
DeepSeek
Text Generation
DeepSeek-V3.2-Exp
Released on: Oct 10, 2025
DeepSeek-V3.2-Exp is an experimental version of the DeepSeek model, built on V3.1-Terminus. It debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training and inference on long contexts.
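The listing does not specify how DSA selects keys, so the following is only a generic top-k sparse-attention sketch in NumPy, not DeepSeek's actual algorithm. It illustrates the core idea sparse attention relies on: each query attends to a small subset of keys instead of all of them, shrinking the per-query softmax from length L to a constant k:

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """q, k, v: (L, d) arrays. Each query row attends only to its
    top_k highest-scoring keys; all other weights are zeroed out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (L, L) scaled dot products
    # Mask everything except the top_k scores per row to -inf.
    drop = np.argpartition(scores, -top_k, axis=-1)[:, :-top_k]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)
    # Softmax over the surviving entries (exp(-inf) == 0).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
L, d = 16, 8
q, k, v = rng.normal(size=(3, L, d))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

In a real system the expensive part, scoring every key before masking, is replaced by a cheap selection stage; this sketch keeps the dense scores only to stay short and self-contained.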
Total Context:
164K
Max output:
164K
Input:
$
0.27
/ M Tokens
Output:
$
0.41
/ M Tokens
DeepSeek
Text Generation
DeepSeek-V3.1-Terminus
Released on: Sep 29, 2025
DeepSeek-V3.1-Terminus is an updated version that builds on V3.1's strengths while addressing key user feedback. It improves language consistency, reducing instances of mixed Chinese-English text and occasional abnormal characters, and delivers stronger Code Agent and Search Agent performance.
Total Context:
164K
Max output:
164K
Input:
$
0.27
/ M Tokens
Output:
$
1.00
/ M Tokens
DeepSeek
Text Generation
DeepSeek-V3.1
Released on: Aug 25, 2025
DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes. Through post-training optimization, its performance on tool use and agent tasks has significantly improved. DeepSeek-V3.1-Think matches the answer quality of DeepSeek-R1-0528 while responding more quickly.
Total Context:
164K
Max output:
164K
Input:
$
0.27
/ M Tokens
Output:
$
1.00
/ M Tokens
DeepSeek
Text Generation
DeepSeek-V3
Released on: Dec 26, 2024
DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key areas, including a major boost in reasoning performance, stronger front-end development skills, and smarter tool-use capabilities.
Total Context:
164K
Max output:
164K
Input:
$
0.25
/ M Tokens
Output:
$
1.00
/ M Tokens
DeepSeek
Text Generation
DeepSeek-R1
Released on: May 28, 2025
DeepSeek-R1-0528 is an upgraded model that shows significant improvements in handling complex reasoning tasks. It also offers a reduced hallucination rate, enhanced support for function calling, and a better vibe-coding experience. It achieves performance comparable to o3 and Gemini 2.5 Pro.
Total Context:
164K
Max output:
164K
Input:
$
0.50
/ M Tokens
Output:
$
2.18
/ M Tokens
DeepSeek
Text Generation
DeepSeek-R1-Distill-Qwen-32B
Released on: Jan 20, 2025
DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B. The model was fine-tuned on 800k curated samples generated by DeepSeek-R1 and demonstrates exceptional performance across mathematics, programming, and reasoning tasks. It achieved impressive results on various benchmarks including AIME 2024, MATH-500, and GPQA Diamond, with a notable 94.3% accuracy on MATH-500, demonstrating strong mathematical reasoning capabilities.
Total Context:
131K
Max output:
131K
Input:
$
0.18
/ M Tokens
Output:
$
0.18
/ M Tokens
DeepSeek
Text Generation
DeepSeek-R1-Distill-Qwen-14B
Released on: Jan 20, 2025
DeepSeek-R1-Distill-Qwen-14B is a distilled model based on Qwen2.5-14B. The model was fine-tuned on 800k curated samples generated by DeepSeek-R1 and demonstrates strong reasoning capabilities. It achieved impressive results across various benchmarks, including 93.9% accuracy on MATH-500, a 69.7% pass rate on AIME 2024, and a CodeForces rating of 1481, demonstrating powerful abilities in mathematics and programming tasks.
Total Context:
131K
Max output:
131K
Input:
$
0.10
/ M Tokens
Output:
$
0.10
/ M Tokens
DeepSeek
Text Generation
DeepSeek-R1-Distill-Qwen-7B
Released on: Jan 20, 2025
DeepSeek-R1-Distill-Qwen-7B is a distilled model based on Qwen2.5-Math-7B. The model was fine-tuned on 800k curated samples generated by DeepSeek-R1 and demonstrates strong reasoning capabilities. It achieved impressive results across various benchmarks, including 92.8% accuracy on MATH-500, a 55.5% pass rate on AIME 2024, and a CodeForces rating of 1189, showing remarkable mathematical and programming abilities for a 7B-scale model.
Total Context:
33K
Max output:
16K
Input:
$
0.05
/ M Tokens
Output:
$
0.05
/ M Tokens
DeepSeek
Text Generation
deepseek-vl2
Released on: Dec 13, 2024
DeepSeek-VL2 is a mixture-of-experts (MoE) vision-language model built on DeepSeekMoE-27B, employing a sparsely activated MoE architecture to achieve strong performance with only 4.5B active parameters. The model excels at tasks including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Compared to existing open-source dense models and MoE-based models, it demonstrates competitive or state-of-the-art performance with the same or fewer active parameters.
Total Context:
4K
Max output:
4K
Input:
$
0.15
/ M Tokens
Output:
$
0.15
/ M Tokens

