Ring-flash-2.0 API, Deployment, Pricing

inclusionAI/Ring-flash-2.0

Ring-flash-2.0 is a high-performance thinking model, deeply optimized from Ling-flash-2.0-base. It is a Mixture-of-Experts (MoE) model with 100B total parameters, of which only 6.1B are activated per inference. The model leverages the independently developed 'icepop' algorithm to address training-instability challenges in reinforcement learning (RL) for MoE LLMs, enabling continuous improvement of its complex-reasoning capabilities throughout extended RL training cycles. Ring-flash-2.0 achieves significant breakthroughs on challenging benchmarks, including math competitions, code generation, and logical reasoning. Its performance surpasses that of SOTA dense models under 40B parameters and rivals larger open-weight MoE models and closed-source high-performance thinking-model APIs. Notably, although Ring-flash-2.0 is designed primarily for complex reasoning, it also shows strong creative-writing capabilities. Thanks to its efficient architecture, it delivers high-speed inference, significantly reducing inference costs for thinking models in high-concurrency scenarios.
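The model is exposed through an OpenAI-compatible chat completions API. Below is a minimal request sketch; the base URL and the SILICONFLOW_API_KEY environment variable are assumptions, so substitute your provider's actual values:

```python
# Minimal sketch: calling Ring-flash-2.0 via an OpenAI-compatible
# chat completions endpoint. base_url and the API-key env var are
# assumptions; replace them with your provider's values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.com/v1",   # assumed endpoint
    api_key=os.environ["SILICONFLOW_API_KEY"],   # assumed env var
)

response = client.chat.completions.create(
    model="inclusionAI/Ring-flash-2.0",
    messages=[
        {"role": "user", "content": "Prove that the product of two odd integers is odd."},
    ],
    max_tokens=4096,  # leave headroom for the model's reasoning trace
)
print(response.choices[0].message.content)
```

Since Ring-flash-2.0 is a thinking model, it typically emits a long reasoning trace before the final answer, so budget max_tokens accordingly.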

Details

Model Provider: inclusionAI
Type: text
Sub Type: chat
Size: 100B (6.1B activated per inference)
Publish Time: Sep 25, 2025
Input Price: $0.14 / M Tokens
Output Price: $0.57 / M Tokens
Context Length: 131K
Tags: MoE, 100B, 131K
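Pricing is linear in token counts, so per-request cost is easy to estimate from the figures above. A minimal sketch of the arithmetic (the request sizes in the example are hypothetical):

```python
# Estimate per-request cost from the listed Ring-flash-2.0 prices:
# $0.14 per million input tokens, $0.57 per million output tokens.
INPUT_PRICE_PER_M_TOKENS = 0.14
OUTPUT_PRICE_PER_M_TOKENS = 0.57

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M_TOKENS
            + output_tokens * OUTPUT_PRICE_PER_M_TOKENS) / 1_000_000

# Hypothetical request: a 2,000-token prompt plus an 8,000-token
# reasoning trace -> (2000 * 0.14 + 8000 * 0.57) / 1e6 = $0.00484
print(f"${estimate_cost_usd(2_000, 8_000):.5f}")
```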

Compare with Other Models

See how this model stacks up against others.

Qwen2.5-72B-Instruct (Qwen, chat)
Release on: Sep 18, 2024

Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in following instructions, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K | Max output: 4K | Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

Qwen2.5-7B-Instruct (Qwen, chat)
Release on: Sep 18, 2024

Qwen2.5-7B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. The model shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly JSON...

Total Context: 33K | Max output: 4K | Input: $0.05 / M Tokens | Output: $0.05 / M Tokens

Qwen2.5-14B-Instruct (Qwen, chat)
Release on: Sep 18, 2024

Qwen2.5-14B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 14B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It has shown notable advancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K | Max output: 4K | Input: $0.10 / M Tokens | Output: $0.10 / M Tokens

Qwen2.5-32B-Instruct (Qwen, chat)
Release on: Sep 19, 2024

Qwen2.5-32B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 32B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K | Max output: 4K | Input: $0.18 / M Tokens | Output: $0.18 / M Tokens

Qwen2.5-72B-Instruct-128K (Qwen, chat)
Release on: Sep 18, 2024

Qwen2.5-72B-Instruct is one of the latest large language model series released by Alibaba Cloud. This 72B model demonstrates significant improvements in areas such as coding and mathematics. It supports a context length of up to 128K tokens. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It has shown notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 131K | Max output: 4K | Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

Qwen2.5-Coder-32B-Instruct (Qwen, chat)
Release on: Nov 11, 2024

Qwen2.5-Coder-32B-Instruct is a code-specific large language model developed based on Qwen2.5. The model has undergone training on 5.5 trillion tokens, achieving significant improvements in code generation, code reasoning, and code repair. It is currently the most advanced open-source code language model, with coding capabilities comparable to GPT-4. Not only has the model enhanced coding abilities, but it also maintains strengths in mathematics and general capabilities, and supports long text processing...

Total Context: 33K | Max output: 4K | Input: $0.18 / M Tokens | Output: $0.18 / M Tokens

DeepSeek-VL2 (DeepSeek, chat)
Release on: Dec 13, 2024

DeepSeek-VL2 is a Mixture-of-Experts (MoE) vision-language model developed based on DeepSeekMoE-27B, employing a sparsely activated MoE architecture to achieve superior performance with only 4.5B active parameters. The model excels in various tasks including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Compared to existing open-source dense models and MoE-based models, it demonstrates competitive or state-of-the-art performance using the same or fewer active parameters...

Total Context: 4K | Max output: 4K | Input: $0.15 / M Tokens | Output: $0.15 / M Tokens

Qwen2.5-VL-72B-Instruct (Qwen, chat)
Release on: Jan 28, 2025

Qwen2.5-VL is a vision-language model in the Qwen2.5 series that shows significant enhancements in several aspects: it has strong visual understanding capabilities, recognizing common objects while analyzing texts, charts, and layouts in images; it functions as a visual agent capable of reasoning and dynamically directing tools; it can comprehend videos over 1 hour long and capture key events; it accurately localizes objects in images by generating bounding boxes or points; and it supports structured outputs for scanned data like invoices and forms. The model demonstrates excellent performance across various benchmarks including image, video, and agent tasks...

Total Context: 131K | Max output: 4K | Input: $0.59 / M Tokens | Output: $0.59 / M Tokens

DeepSeek-V3 (DeepSeek, chat)
Release on: Dec 26, 2024

The new version of DeepSeek-V3 (DeepSeek-V3-0324) utilizes the same base model as the previous DeepSeek-V3-1226, with improvements made only to the post-training methods. The new V3 model incorporates reinforcement learning techniques from the training process of the DeepSeek-R1 model, significantly enhancing its performance on reasoning tasks. It has achieved scores surpassing GPT-4.5 on evaluation sets related to mathematics and coding. Additionally, the model has seen notable improvements in tool invocation, role-playing, and casual conversation capabilities...

Total Context: 164K | Max output: 164K | Input: $0.27 / M Tokens | Output: $1.13 / M Tokens

DeepSeek-R1 (DeepSeek, chat)
Release on: May 28, 2025

DeepSeek-R1-0528 is a reasoning model powered by reinforcement learning (RL) that addresses the issues of repetition and readability. Prior to RL, DeepSeek-R1 incorporated cold-start data to further optimize its reasoning performance. It achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks, and through carefully designed training methods, it has enhanced overall effectiveness...

Total Context: 164K | Max output: 164K | Input: $0.50 / M Tokens | Output: $2.18 / M Tokens

DeepSeek-R1-Distill-Qwen-14B (DeepSeek, chat)
Release on: Jan 20, 2025

DeepSeek-R1-Distill-Qwen-14B is a distilled model based on Qwen2.5-14B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates strong reasoning capabilities. It achieved impressive results across various benchmarks, including 93.9% accuracy on MATH-500, a 69.7% pass rate on AIME 2024, and a rating of 1481 on CodeForces, showcasing its powerful abilities in mathematics and programming tasks...

Total Context: 131K | Max output: 131K | Input: $0.10 / M Tokens | Output: $0.10 / M Tokens

DeepSeek-R1-Distill-Qwen-32B (DeepSeek, chat)
Release on: Jan 20, 2025

DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates exceptional performance across mathematics, programming, and reasoning tasks. It achieved impressive results in various benchmarks including AIME 2024, MATH-500, and GPQA Diamond, with a notable 94.3% accuracy on MATH-500, showcasing its strong mathematical reasoning capabilities...

Total Context: 131K | Max output: 131K | Input: $0.18 / M Tokens | Output: $0.18 / M Tokens

Model FAQs: Usage, Deployment

Learn how to use, fine-tune, and deploy this model with ease.
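For self-hosted deployment, the sketch below uses vLLM's offline inference API. It assumes the installed vLLM build supports the Ring-flash-2.0 architecture (check the model card for version requirements); the parallelism and length settings are illustrative, not tuned:

```python
# Sketch: local deployment with vLLM's offline API. Assumes vLLM
# support for this architecture; settings below are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="inclusionAI/Ring-flash-2.0",
    trust_remote_code=True,    # MoE checkpoints often ship custom code
    tensor_parallel_size=4,    # assumption: adjust to your GPU count
    max_model_len=131072,      # matches the 131K context listed above
)

params = SamplingParams(temperature=0.6, max_tokens=2048)
outputs = llm.generate(
    ["Summarize the trade-offs of sparse MoE inference in three sentences."],
    params,
)
print(outputs[0].outputs[0].text)
```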

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.