Hunyuan-A13B-Instruct

tencent/Hunyuan-A13B-Instruct

Hunyuan-A13B-Instruct activates only 13B of its 80B parameters per token, yet matches much larger LLMs on mainstream benchmarks. It offers hybrid reasoning: a low-latency “fast” mode or a high-precision “slow” thinking mode, switchable per call. A native 256K-token context lets it digest book-length documents without degradation (this endpoint exposes a 131,072-token context, as listed under Details below). Its agent skills lead on BFCL-v3, τ-Bench, and C3-Bench, making it an excellent backbone for autonomous assistants. Grouped Query Attention plus multi-format quantization delivers memory-light, GPU-efficient inference for real-world deployment, with built-in multilingual support and robust safety alignment for enterprise-grade applications.

API Usage

curl --request POST \
  --url https://api.ap.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tencent/Hunyuan-A13B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "tell me a story"
      }
    ],
    "stream": false,
    "max_tokens": 512,
    "enable_thinking": true,
    "thinking_budget": 4096,
    "min_p": 0.05,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,
    "stop": []
  }'
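
The request above runs in the high-precision “slow” mode ("enable_thinking": true), with "thinking_budget" capping the reasoning tokens. As a minimal sketch of the low-latency “fast” mode mentioned in the overview, assuming that setting "enable_thinking" to false skips the reasoning phase on this endpoint, the same call can also stream the reply:

curl --request POST \
  --url https://api.ap.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tencent/Hunyuan-A13B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "tell me a story"
      }
    ],
    "stream": true,
    "max_tokens": 512,
    "enable_thinking": false,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50
  }'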

Details

Model Provider: hunyuan
Type: text
Sub Type: chat
Size: 80B
Publish Time: Jun 30, 2025
Input Price: $0.14 / M Tokens
Output Price: $0.57 / M Tokens
Context length: 131072
Tags: Reasoning, MoE, 80B, 128K
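
As a rough cost illustration at the listed per-million-token rates, a single call that consumes 100,000 input tokens and produces 2,000 output tokens costs about 0.1 × $0.14 + 0.002 × $0.57 ≈ $0.0151.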

© 2025 SiliconFlow Technology PTE. LTD.