Ling-flash-2.0 Now on SiliconFlow: Flagship MoE Model Delivering SOTA Reasoning and High Efficiency

Sep 23, 2025


TL;DR: Ling-flash-2.0 is now available on SiliconFlow: Ant Group inclusionAI's flagship MoE language model that combines SOTA reasoning with high efficiency. With 100B total parameters but only 6.1B activated, it delivers performance competitive with ~40B dense models and offers a 131K context window. Perfect for complex reasoning, coding, and frontend development. Empower your business and workflows at a budget-friendly cost through our API services.


SiliconFlow is excited to bring you Ling-flash-2.0, the third MoE model built on the Ling 2.0 architecture. Following the success of Ling-mini-2.0 and Ring-mini-2.0, this release marks a step forward in combining efficiency with reasoning ability. Pre-trained on over 20T high-quality tokens and refined through multi-stage supervised fine-tuning and reinforcement learning, Ling-flash-2.0 pairs an advanced MoE design with real-world versatility, making it a powerful choice for complex reasoning, coding, and industry-specific applications.


Through SiliconFlow's Ling-flash-2.0 API, you can expect:


  • Cost-Effective Pricing: $0.14/M tokens (input) and $0.57/M tokens (output); see the cost sketch after this list.

  • Efficient MoE Design: MoE architecture with 100B total parameters and only 6.1B activated (4.8B non-embedding).

  • Extended Context Window: a 131K-token context window for long documents and complex multi-step tasks.

  • Advanced Capabilities: SOTA performance in reasoning, coding, math, and domain tasks such as finance and healthcare.
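
As a quick illustration of the pricing above, the sketch below estimates the dollar cost of a single call from its token counts. The prices are the ones quoted in the list; the token counts in the example are hypothetical.

import requests  # not needed here; shown for consistency with the API example below

# Prices quoted above: $0.14 per 1M input tokens, $0.57 per 1M output tokens.
INPUT_PRICE = 0.14 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.57 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one Ling-flash-2.0 call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A long-context call near the 131K window with a 2K-token answer:
print(f"${request_cost(120_000, 2_000):.4f}")  # -> $0.0179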


Why Ling-flash-2.0 Matters


Ling-flash-2.0 consistently delivers strong performance across knowledge-intensive, mathematical, coding, logical, and domain-specific tasks such as finance and healthcare. It is also highly competitive in more open-ended applications, including creative writing.


Crucially, Ling-flash-2.0 not only outperforms dense models under 40B parameters, such as Qwen3-32B-Non-Thinking and Seed-OSS-36B-Instruct (think budget=0), but also remains competitive with larger MoE peers such as Hunyuan-80B-A13B-Instruct and GPT-OSS-120B (low), all while maintaining clear cost and efficiency advantages.


| Benchmark | Ling-flash-2.0 | Qwen3-32B-Non-Thinking | Seed-OSS-36B-Instruct (think budget=0) | Hunyuan-80B-A13B-Instruct | GPT-OSS-120B (low) |
|---|---|---|---|---|---|
| GPQA-Diamond | 🥇 68.1 | 56.2 | 52.0 | 61.8 | 63.4 |
| MMLU-PRO | 🥇 77.1 | 69.2 | 73.2 | 65.0 | 74.1 |
| AIME 2025 | 🥇 56.6 | 23.1 | 15.0 | 22.6 | 51.9 |
| Omni-MATH | 🥇 53.4 | 33.8 | 29.7 | 39.4 | 42.3 |
| KOR-Bench | 68.8 | 57.0 | 44.2 | 47.6 | 73.1 |
| ARC-Prize | 🥇 24.6 | 3.3 | 4.4 | 0.1 | 10.7 |
| LiveCodeBench v6 | 🥇 51.38 | 31.5 | 30.7 | 25.8 | 42.7 |
| CodeForces-Elo | 🥇 1600 | 678 | 605 | 683 | 1520 |
| OptMATH | 🥇 39.76 | 15.51 | 14.61 | 2.86 | 26.96 |
| HealthBench | 46.17 | 43.0 | 36.9 | 30.0 | 56.4 |
| FinanceReasoning | 81.59 | 78.5 | 78.1 | 64.3 | 83.8 |
| Creative Writing V3 | 🥇 85.17 | 77.57 | 82.17 | 59.69 | 79.09 |


What Makes Ling-flash-2.0 So Efficient


Ling-flash-2.0 is built on Ling Scaling Laws and uses a 1/32 activation-ratio MoE architecture. Instead of brute-force scaling, it introduces a series of design refinements — from expert granularity and shared-expert ratio to balanced attention, smarter routing strategies, Multi-Token Prediction, QK-Norm, and Partial-RoPE.


Together, these innovations allow the model to deliver the power of ~40B dense models with only 6.1B active parameters, achieving 7× efficiency gains over equivalent dense architectures.
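
To make the activation-ratio idea concrete, here is a toy top-k MoE routing sketch in Python. The shapes, the plain numpy router, and k=1 of 32 experts are illustrative assumptions matching the 1/32 ratio described above, not the actual Ling-flash-2.0 implementation: each token is routed to only a few experts, so only a small fraction of the layer's parameters participate in any forward pass.

import numpy as np

def moe_layer(x, experts, router_w, k):
    """x: (d,) token vector; experts: list of (d, d) expert weight matrices;
    router_w: (E, d) router weights; k: experts activated per token."""
    logits = router_w @ x                       # score every expert for this token
    top_k = np.argsort(logits)[-k:]             # keep only the k best-scoring experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                        # softmax over the selected experts
    # Only k of E expert matrices are ever multiplied: that is the efficiency win.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top_k))

rng = np.random.default_rng(0)
d, E, k = 64, 32, 1                             # 1/32 activation ratio, as in the post
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(E)]
router_w = rng.standard_normal((E, d)) / np.sqrt(d)
y = moe_layer(rng.standard_normal(d), experts, router_w, k)
print(f"output shape: {y.shape}; experts activated per token: {k}/{E}")

In the real model the same principle holds at scale: the router touches only ~6.1B of the 100B parameters per token, which is where the claimed 7× efficiency gain over a comparable dense model comes from.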




Real Performance on SiliconFlow


This demo shows the real-world performance of Ling-flash-2.0 in the SiliconFlow Playground. Given the straightforward prompt "Write the complete code for a Snake game", the model rapidly generates a fully functional implementation, showcasing how it integrates reasoning, coding expertise, and practical problem-solving in real time.


[Demo: Ling-flash-2.0 generating a playable Snake game in the SiliconFlow Playground]


Get Started Immediately


  1. Explore: Try Ling-flash-2.0 in the SiliconFlow playground.

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.


import requests

# SiliconFlow chat completions endpoint (OpenAI-compatible)
url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "thinking_budget": 4096,  # cap on the model's reasoning tokens
    "top_p": 0.7,
    "model": "inclusionAI/Ling-flash-2.0",
    "messages": [
        {
            "content": "I have 4 apples. I give 2 to my friend. How many apples do we have now?",
            "role": "user"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
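
Because the API is OpenAI-compatible, the same call can also be made through the official openai Python SDK by pointing base_url at the endpoint above. A minimal sketch, reusing the model name and parameters from the requests example (extra_body is the SDK's standard way to forward provider-specific fields such as thinking_budget):

from openai import OpenAI

client = OpenAI(
    api_key="<token>",  # your SiliconFlow API key
    base_url="https://api.siliconflow.com/v1",
)

response = client.chat.completions.create(
    model="inclusionAI/Ling-flash-2.0",
    messages=[{
        "role": "user",
        "content": "I have 4 apples. I give 2 to my friend. How many apples do we have now?",
    }],
    top_p=0.7,
    extra_body={"thinking_budget": 4096},  # provider-specific parameter
)
print(response.choices[0].message.content)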


Try Ling-flash-2.0 now on SiliconFlow and feel the difference that speed makes.


Business or Sales Inquiries →

Join our Discord community now →

Follow us on X for the latest updates →

Explore all available models on SiliconFlow →



Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.