Ling-mini-2.0 API, Deployment, Pricing

inclusionAI/Ling-mini-2.0

Ling-mini-2.0 is a small yet high-performance large language model built on a Mixture-of-Experts (MoE) architecture. It has 16B total parameters, but only 1.4B are activated per token (789M non-embedding), enabling extremely fast generation. Thanks to the efficient MoE design and large-scale, high-quality training data, Ling-mini-2.0 delivers top-tier downstream task performance comparable to sub-10B dense LLMs and even larger MoE models.
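
Beyond the hosted API below, the model can also be served with open-source tooling. The following is a minimal local-inference sketch, assuming the weights are published on Hugging Face under the same inclusionAI/Ling-mini-2.0 identifier and load through transformers; this page itself only documents the hosted endpoint.

# Minimal local-inference sketch; assumes the checkpoint is available on
# Hugging Face under this identifier and is loadable by transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-mini-2.0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

messages = [{"role": "user", "content": "Tell me a story"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))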

API Usage

curl --request POST \
  --url https://api.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "inclusionAI/Ling-mini-2.0",
  "thinking_budget": 4096,
  "top_p": 0.7,
  "messages": [
    {
      "content": "Tell me a story",
      "role": "user"
    }
  ]
}'
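
The same request from Python with the requests library, as a minimal sketch: the endpoint, model name, and parameters mirror the curl call above, while the SILICONFLOW_API_KEY environment variable and the OpenAI-style response shape are assumptions, not taken from this page.

import os
import requests

# Same endpoint and payload as the curl example above.
url = "https://api.siliconflow.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['SILICONFLOW_API_KEY']}",  # assumed env var name
    "Content-Type": "application/json",
}
payload = {
    "model": "inclusionAI/Ling-mini-2.0",
    "thinking_budget": 4096,
    "top_p": 0.7,
    "messages": [{"role": "user", "content": "Tell me a story"}],
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
# Assumes an OpenAI-style chat completions response body.
print(response.json()["choices"][0]["message"]["content"])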

Details

Model Provider: inclusionAI
Type: text
Sub Type: chat
Size: 16B
Publish Time: Sep 10, 2025
Input Price: $0.07 / M Tokens
Output Price: $0.29 / M Tokens
Context Length: 131K
Tags: MoE, 16B, 131K
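
As a quick illustration of how the per-million-token prices above combine for a single request, here is a small hypothetical helper; the prices come from the table, while the function name and example token counts are illustrative only.

def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = 0.07,
                      output_price_per_m: float = 0.29) -> float:
    """Estimate request cost in USD from the per-million-token prices above."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

# Example: 10,000 input tokens and 2,000 output tokens
# -> 0.01 * $0.07 + 0.002 * $0.29 = $0.0007 + $0.00058 = $0.00128
print(f"${estimate_cost_usd(10_000, 2_000):.5f}")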

Compare with Other Models

See how this model stacks up against others.

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.