
inclusionAI
Text Generation
Ring-flash-2.0
Released on: Sep 29, 2025
Ring-flash-2.0 is a high-performance thinking model, deeply optimized from Ling-flash-2.0-base. It is a Mixture-of-Experts (MoE) model with 100B total parameters, of which only 6.1B are activated per token. The model uses the independently developed 'icepop' algorithm to address training instability in reinforcement learning (RL) for MoE LLMs, allowing its complex reasoning capabilities to keep improving over extended RL training cycles. Ring-flash-2.0 achieves significant breakthroughs on challenging benchmarks spanning math competitions, code generation, and logical reasoning. Its performance surpasses that of SOTA dense models under 40B parameters and rivals larger open-weight MoE models and closed-source thinking-model APIs. Notably, although Ring-flash-2.0 is designed primarily for complex reasoning, it also shows strong creative-writing ability. Thanks to its efficient architecture, it delivers high-speed inference, significantly reducing the cost of serving thinking models in high-concurrency scenarios...
Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens
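As a quick illustration of how a thinking model like this is typically queried, here is a minimal sketch assuming an OpenAI-compatible chat-completions endpoint (e.g., a vLLM deployment); the base URL, API key handling, and model identifier are illustrative assumptions, not documented values.

```python
# Minimal sketch: querying Ring-flash-2.0 via an OpenAI-compatible endpoint.
# The base_url and model id below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="inclusionAI/Ring-flash-2.0",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even."}
    ],
    max_tokens=4096,  # thinking models can emit long reasoning traces
)
print(response.choices[0].message.content)
```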

inclusionAI
Text Generation
Ling-flash-2.0
Released on: Sep 18, 2025
Ling-flash-2.0 is a language model from inclusionAI with 100 billion total parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model, aiming to deliver performance comparable to, or even exceeding, that of 40B-level dense models and larger MoE models with a significantly smaller active parameter count. The model reflects a strategy of pursuing high performance and efficiency through aggressive architectural design and training methods...
Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens
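To make the efficiency claim concrete, here is a back-of-the-envelope sketch comparing activated and total parameters, using only the figures quoted in this card:

```python
# Back-of-the-envelope view of the MoE efficiency claim, using the
# parameter counts quoted in this card.
total_params = 100e9       # total parameters
active_params = 6.1e9      # parameters activated per token
active_non_embed = 4.8e9   # non-embedding activated parameters

print(f"activation ratio: {active_params / total_params:.1%}")          # ~6.1%
print(f"non-embedding share: {active_non_embed / active_params:.1%}")   # ~78.7%
```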

inclusionAI
Text Generation
Ling-mini-2.0
Released on: Sep 10, 2025
Ling-mini-2.0 is a small yet high-performance large language model built on the MoE architecture. It has 16B total parameters, of which only 1.4B are activated per token (789M non-embedding), enabling extremely fast generation. Thanks to its efficient MoE design and large-scale, high-quality training data, Ling-mini-2.0 delivers top-tier downstream task performance comparable to sub-10B dense LLMs, and even to larger MoE models, despite activating only 1.4B parameters...
Total Context: 131K
Max output: 131K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens
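Since per-token prices are listed above, a short sketch shows how a per-request cost estimate follows from them (the token counts in the example are made up):

```python
# Per-request cost estimate from the listed Ling-mini-2.0 prices:
# $0.07 per 1M input tokens, $0.28 per 1M output tokens.
INPUT_PRICE_PER_M = 0.07   # USD / 1M input tokens (from the card)
OUTPUT_PRICE_PER_M = 0.28  # USD / 1M output tokens (from the card)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${estimate_cost(2_000, 500):.6f}")  # -> $0.000280
```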

