Ling-flash-2.0 API, Deployment, Pricing

inclusionAI/Ling-flash-2.0

Ling-flash-2.0 is a language model from inclusionAI with 100 billion total parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). Part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to, or even exceeding, that of 40B-class dense models and larger MoE models, while activating a significantly smaller number of parameters. The model reflects a strategy of pursuing high performance and efficiency through aggressive architectural design and training methods.

API Usage

curl --request POST \
  --url https://api.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "thinking_budget": 4096,
  "top_p": 0.7,
  "model": "inclusionAI/Ling-flash-2.0",
  "messages": [
    {
      "content": "I have 4 apples. I give 2 to my friend. How many apples do we have now?",
      "role": "user"
    }
  ]
}'
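The same request can be issued from Python using only the standard library. This is a minimal sketch mirroring the curl call above; the `SILICONFLOW_API_KEY` environment variable and the `build_payload` helper are illustrative assumptions, not part of the official SDK:

```python
import json
import os
import urllib.request

API_URL = "https://api.siliconflow.com/v1/chat/completions"
MODEL = "inclusionAI/Ling-flash-2.0"

def build_payload(prompt: str, thinking_budget: int = 4096, top_p: float = 0.7) -> dict:
    """Mirror the JSON body of the curl example above."""
    return {
        "model": MODEL,
        "thinking_budget": thinking_budget,
        "top_p": top_p,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> dict:
    """POST the request; expects SILICONFLOW_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['SILICONFLOW_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

# reply = ask("I have 4 apples. I give 2 to my friend. How many apples do we have now?")
# print(reply["choices"][0]["message"]["content"])
```

The response follows the standard chat-completions shape, so the assistant's text lives under `choices[0].message.content`.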

Details

Model Provider: inclusionAI
Type: text
Sub Type: chat
Size: 100B
Publish Time: Sep 18, 2025
Input Price: $0.14 / M Tokens
Output Price: $0.57 / M Tokens
Context Length: 131K
Tags: MoE, 106B, A6B, 131K
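At the listed rates, per-request cost follows directly from the token counts. A quick sketch (the helper function and example token counts are illustrative, not from the listing):

```python
INPUT_PRICE_PER_M = 0.14   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.57  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply:
cost = request_cost(2_000, 500)
# 2000 * 0.14/1e6 + 500 * 0.57/1e6 = 0.00028 + 0.000285 = 0.000565 USD
```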


© 2025 SiliconFlow Technology PTE. LTD.
