Hunyuan-A13B-Instruct Now Available on SiliconFlow

Jun 30, 2025

The Tencent Hunyuan AI team announced the release of Hunyuan-A13B-Instruct, an open-source large language model (LLM) now available on the SiliconFlow platform.

Built on a fine-grained Mixture-of-Experts (MoE) architecture, the model efficiently scales 80B total parameters with only 13B active parameters, achieving state-of-the-art performance across multiple benchmarks—particularly in mathematics, science, agent domains, and more.

SiliconFlow supports:

  • Extended Context: Default 128K token context window (256K available upon request).

  • Cost-Optimized Pricing: 0.14/M tokens (input) and 0.57/M tokens (output).

Why Hunyuan-A13B-Instruct Matters

  • Compact yet Powerful: With only 13 billion active parameters (out of a total of 80 billion), the model delivers competitive performance on a wide range of benchmark tasks, rivaling much larger models.

  • Hybrid Reasoning Support: Supports both fast and slow thinking modes, allowing users to flexibly choose according to their needs.

  • Ultra-Long Context Understanding: Natively supports a 256K context window, maintaining stable performance on long-text tasks.

  • Enhanced Agent Capabilities: Optimized for agent tasks, achieving leading results on benchmarks such as BFCL-v3, τ-Bench and C3-Bench.

  • Efficient Inference: Utilizes Grouped Query Attention (GQA) and supports multiple quantization formats, enabling highly efficient inference.
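The hybrid fast/slow thinking modes are selected per request. As a minimal sketch of how the two modes might be toggled (the `enable_thinking` field name is an assumption; `thinking_budget` appears in the API example below — consult the SiliconFlow API documentation for the exact parameters your account supports):

```python
# Hedged sketch: build per-request kwargs to toggle fast vs. slow thinking.
# `enable_thinking` is an assumed field name; `thinking_budget` follows the
# streaming example in this post. Verify both against the SiliconFlow docs.
def build_request(prompt: str, slow_thinking: bool, budget: int = 1024) -> dict:
    kwargs = {
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }
    if slow_thinking:
        # Slow mode: allocate a token budget for the model's reasoning phase.
        kwargs["extra_body"] = {"thinking_budget": budget}
    else:
        # Fast mode: skip the reasoning phase for quicker, cheaper replies.
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs

# The resulting dict is unpacked into an OpenAI-compatible client call:
# client.chat.completions.create(**build_request("Explain MoE.", slow_thinking=True))
```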

Quick Start

Try the Hunyuan-A13B-Instruct model directly on the SiliconFlow playground.

Quick Access to API

The following Python example demonstrates how to invoke the Hunyuan-A13B-Instruct model through SiliconFlow's OpenAI-compatible API endpoint. For the full specification, please refer to the SiliconFlow API documentation.

from openai import OpenAI

url = 'https://api.ap.siliconflow.com/v1/'
api_key = 'your_api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Round 1: send a request with streaming output
content = ""
reasoning_content = ""
messages = [
    {"role": "user", "content": "How do you implement a binary search algorithm in Python with detailed comments?"}
]
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096,
    extra_body={
        "thinking_budget": 1024  # Token budget for the slow-thinking phase
    }
)
# Receive and accumulate the streamed response chunk by chunk
for chunk in response:
    if not chunk.choices:
        continue  # Some chunks (e.g. usage reports) carry no choices
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is a SiliconFlow extension field; guard with getattr
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content

# Round 2: continue the conversation with the accumulated answer
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content

print(content)

Hunyuan-A13B-Instruct is an ideal choice for researchers and developers seeking high performance. Whether for academic research, cost-effective AI solution development, or innovative application exploration, this model provides a robust foundation for advancement.

Start building with Hunyuan-A13B-Instruct today at SiliconFlow!

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.
