🎉 MiniMax-M2.5 is available on SiliconFlow. Try it NOW.

Models

Products

Pricing

Docs

Blog

About

Contact

State-of-the-Art

AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

State-of-the-Art

AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

State-of-the-Art

AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

All

Featured

LLM

Vision

Image

Video

Audio

Serverless

inclusionAI

inclusionAI

Text Generation

Ling-flash-2.0

Release on: Sep 18, 2025

Ling-flash-2.0 is a language model from inclusionAI with a total of 100 billion parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to or even exceeding that of 40B-level dense models and other larger MoE models, but with a significantly smaller active parameter count. The model represents a strategy focused on achieving high performance and efficiency through extreme architectural design and training methods...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

inclusionAI

Text Generation

Ling-mini-2.0

Release on: Sep 10, 2025

Ling-mini-2.0 is a small yet high-performance large language model built on the MoE architecture. It has 16B total parameters, but only 1.4B are activated per token (non-embedding 789M), enabling extremely fast generation. Thanks to the efficient MoE design and large-scale high-quality training data, despite having only 1.4B activated parameters, Ling-mini-2.0 still delivers top-tier downstream task performance comparable to sub-10B dense LLMs and even larger MoE models...

Total Context:

131K

Max output:

131K

Input:

0.07

/ M Tokens

Output:

0.28

/ M Tokens

inclusionAI

Text Generation

Ring-flash-2.0

Release on: Sep 29, 2025

Ring-flash-2.0 is a high-performance thinking model, deeply optimized based on Ling-flash-2.0-base. It is a Mixture-of-Experts (MoE) model with a total of 100B parameters, but only 6.1B are activated per inference. The model leverages the independently developed 'icepop' algorithm to address the training instability challenges in reinforcement learning (RL) for MoE LLMs, enabling continuous improvement of its complex reasoning capabilities throughout extended RL training cycles. Ring-flash-2.0 demonstrates significant breakthroughs across challenging benchmarks, including math competitions, code generation, and logical reasoning. Its performance surpasses that of SOTA dense models under 40B parameters and rivals larger open-weight MoE models and closed-source high-performance thinking model APIs. More surprisingly, although Ring-flash-2.0 is primarily designed for complex reasoning, it also shows strong capabilities in creative writing. Thanks to its efficient architecture, it achieves high-speed inference, significantly reducing inference costs for thinking models in high-concurrency scenarios...

Total Context:

131K

Max output:

131K

Input:

0.14

/ M Tokens

Output:

0.57

/ M Tokens

Ready to accelerate your AI development?