Ling-flash-2.0 Now on SiliconFlow: Flagship MoE Model Delivering SOTA Reasoning and High Efficiency

Sep 23, 2025


TL;DR: Ling-flash-2.0 is now available on SiliconFlow: Ant Group inclusionAI's flagship MoE language model that combines SOTA reasoning with high efficiency. With 100B total parameters but only 6.1B activated, it delivers performance competitive with ~40B dense models and offers a 131K context window. Perfect for complex reasoning, coding, and frontend development. Empower your business and workflows at a budget-friendly cost through our API services.


SiliconFlow is excited to bring you Ling-flash-2.0, the third MoE model built on the Ling 2.0 architecture. Following the success of Ling-mini-2.0 and Ring-mini-2.0, this release marks a step forward in combining efficiency with reasoning ability. Pre-trained on over 20T high-quality tokens and refined through multi-stage supervised fine-tuning and reinforcement learning, Ling-flash-2.0 pairs an advanced MoE design with real-world versatility, making it a powerful choice for complex reasoning, coding, and industry-specific applications.


Through SiliconFlow's Ling-flash-2.0 API, you can expect:


  • Cost-Effective Pricing: $0.14/M tokens (input) and $0.57/M tokens (output); see the cost sketch after this list.

  • Efficient MoE Design: MoE architecture with 100B total parameters and only 6.1B activated (4.8B non-embedding).

  • Extended Context Window: a 131K-token context window for long documents and complex multi-step tasks.

  • Advanced Capabilities: SOTA performance in reasoning, coding, math, and domain tasks such as finance and healthcare.
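
As a quick illustration of the pricing above, the sketch below estimates the dollar cost of a single call from its token counts. The prices are the ones quoted in the list; the token counts in the example are hypothetical.

import requests  # not needed here; shown for consistency with the API example below

# Prices quoted above: $0.14 per 1M input tokens, $0.57 per 1M output tokens.
INPUT_PRICE = 0.14 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.57 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one Ling-flash-2.0 call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A long-context call near the 131K window with a 2K-token answer:
print(f"${request_cost(120_000, 2_000):.4f}")  # -> $0.0179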


Why Ling-flash-2.0 Matters


Ling-flash-2.0 consistently delivers strong performance across knowledge-intensive, mathematical, coding, logical, and domain-specific tasks such as finance and healthcare. It is also highly competitive in more open-ended applications, including creative writing.


Crucially, Ling-flash-2.0 not only outperforms dense models under 40B parameters, such as Qwen3-32B-Non-Thinking and Seed-OSS-36B-Instruct (think budget=0), but also remains competitive with larger MoE peers such as Hunyuan-80B-A13B-Instruct and GPT-OSS-120B (low), all while maintaining clear cost and efficiency advantages.


| Benchmark | Ling-flash-2.0 | Qwen3-32B-Non-Thinking | Seed-OSS-36B-Instruct (think budget=0) | Hunyuan-80B-A13B-Instruct | GPT-OSS-120B (low) |
|---|---|---|---|---|---|
| GPQA-Diamond | 🥇 68.1 | 56.2 | 52.0 | 61.8 | 63.4 |
| MMLU-PRO | 🥇 77.1 | 69.2 | 73.2 | 65.0 | 74.1 |
| AIME 2025 | 🥇 56.6 | 23.1 | 15.0 | 22.6 | 51.9 |
| Omni-MATH | 🥇 53.4 | 33.8 | 29.7 | 39.4 | 42.3 |
| KOR-Bench | 68.8 | 57.0 | 44.2 | 47.6 | 73.1 |
| ARC-Prize | 🥇 24.6 | 3.3 | 4.4 | 0.1 | 10.7 |
| LiveCodeBench v6 | 🥇 51.38 | 31.5 | 30.7 | 25.8 | 42.7 |
| CodeForces-Elo | 🥇 1600 | 678 | 605 | 683 | 1520 |
| OptMATH | 🥇 39.76 | 15.51 | 14.61 | 2.86 | 26.96 |
| HealthBench | 46.17 | 43.0 | 36.9 | 30.0 | 56.4 |
| FinanceReasoning | 81.59 | 78.5 | 78.1 | 64.3 | 83.8 |
| Creative Writing V3 | 🥇 85.17 | 77.57 | 82.17 | 59.69 | 79.09 |


What Makes Ling-flash-2.0 So Efficient


Ling-flash-2.0 is built on Ling Scaling Laws and uses a 1/32 activation-ratio MoE architecture. Instead of brute-force scaling, it introduces a series of design refinements — from expert granularity and shared-expert ratio to balanced attention, smarter routing strategies, Multi-Token Prediction, QK-Norm, and Partial-RoPE.


Together, these innovations allow the model to deliver the power of ~40B dense models with only 6.1B active parameters, achieving 7× efficiency gains over equivalent dense architectures.
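
To make the activation-ratio idea concrete, here is a toy top-k MoE routing sketch in Python. The shapes, the plain numpy router, and k=1 of 32 experts are illustrative assumptions matching the 1/32 ratio described above, not the actual Ling-flash-2.0 implementation: each token is routed to only a few experts, so only a small fraction of the layer's parameters participate in any forward pass.

import numpy as np

def moe_layer(x, experts, router_w, k):
    """x: (d,) token vector; experts: list of (d, d) expert weight matrices;
    router_w: (E, d) router weights; k: experts activated per token."""
    logits = router_w @ x                       # score every expert for this token
    top_k = np.argsort(logits)[-k:]             # keep only the k best-scoring experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                        # softmax over the selected experts
    # Only k of E expert matrices are ever multiplied: that is the efficiency win.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top_k))

rng = np.random.default_rng(0)
d, E, k = 64, 32, 1                             # 1/32 activation ratio, as in the post
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(E)]
router_w = rng.standard_normal((E, d)) / np.sqrt(d)
y = moe_layer(rng.standard_normal(d), experts, router_w, k)
print(f"output shape: {y.shape}; experts activated per token: {k}/{E}")

In the real model the same principle holds at scale: the router touches only ~6.1B of the 100B parameters per token, which is where the claimed 7× efficiency gain over a comparable dense model comes from.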




Real Performance on SiliconFlow


This demo shows the real-world performance of Ling-flash-2.0 in the SiliconFlow Playground. Given the straightforward prompt "Write the complete code for a Snake game", the model rapidly generates a fully functional implementation, showcasing how it integrates reasoning, coding expertise, and practical problem-solving in real time.


[Demo: Ling-flash-2.0 generating a playable Snake game in the SiliconFlow Playground]


Get Started Immediately


  1. Explore: Try Ling-flash-2.0 in the SiliconFlow playground.

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.


import requests

# SiliconFlow chat completions endpoint (OpenAI-compatible)
url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "thinking_budget": 4096,  # cap on the model's reasoning tokens
    "top_p": 0.7,
    "model": "inclusionAI/Ling-flash-2.0",
    "messages": [
        {
            "content": "I have 4 apples. I give 2 to my friend. How many apples do we have now?",
            "role": "user"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
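
Because the API is OpenAI-compatible, the same call can also be made through the official openai Python SDK by pointing base_url at the endpoint above. A minimal sketch, reusing the model name and parameters from the requests example (extra_body is the SDK's standard way to forward provider-specific fields such as thinking_budget):

from openai import OpenAI

client = OpenAI(
    api_key="<token>",  # your SiliconFlow API key
    base_url="https://api.siliconflow.com/v1",
)

response = client.chat.completions.create(
    model="inclusionAI/Ling-flash-2.0",
    messages=[{
        "role": "user",
        "content": "I have 4 apples. I give 2 to my friend. How many apples do we have now?",
    }],
    top_p=0.7,
    extra_body={"thinking_budget": 4096},  # provider-specific parameter
)
print(response.choices[0].message.content)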


Try Ling-flash-2.0 now on SiliconFlow and feel the difference that speed makes.


Business or Sales Inquiries →

Join our Discord community now →

Follow us on X for the latest updates →

Explore all available models on SiliconFlow →



Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.