Hunyuan-A13B-Instruct Is Now Available on SiliconFlow

June 30, 2025


The Tencent Hunyuan AI team has announced Hunyuan-A13B-Instruct, an open-source large language model (LLM) that is now live on the SiliconFlow platform.

Built on a fine-grained Mixture-of-Experts (MoE) architecture, the model scales efficiently to 80B total parameters while activating only 13B per inference, delivering state-of-the-art performance across multiple benchmarks, particularly in mathematics, science, and agentic tasks.

SiliconFlow supports:

  • Extended context: a 128K-token context window by default (256K available on request).

  • Cost-optimized pricing: $0.14 per million input tokens and $0.57 per million output tokens.
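As a rough illustration of these rates, the per-request cost can be estimated with a short Python helper. The prices are taken from this announcement; the token counts in the example are purely illustrative.

```python
# Per-million-token rates quoted in this announcement (USD)
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.57

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt with a 4K-token completion
print(round(estimate_cost_usd(100_000, 4_000), 6))  # → 0.01628
```

Even a request that nearly fills the default 128K context stays in the cent range at these rates.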

Why Does Hunyuan-A13B-Instruct Matter?

  • Compact yet powerful: with only 13B active parameters (80B total), the model delivers competitive results across a wide range of benchmark tasks, rivaling much larger models.

  • Hybrid reasoning support: offers both fast and slow thinking modes, letting users flexibly choose based on their needs.

  • Ultra-long context understanding: natively supports a 256K context window, maintaining stable performance on long-text tasks.

  • Enhanced agentic capabilities: optimized for agent tasks, achieving leading results on benchmarks such as BFCL-v3, τ-Bench, and C3-Bench.

  • Efficient inference: uses Grouped Query Attention (GQA) and supports multiple quantization formats for efficient inference.

Getting Started

Try the Hunyuan-A13B-Instruct model directly on the SiliconFlow model playground.

Quick API Access

The following Python example demonstrates how to call the Hunyuan-A13B-Instruct model via SiliconFlow's API endpoint. For full specifications, see the SiliconFlow API documentation.

from openai import OpenAI

url = 'https://api.siliconflow.com/v1/'
api_key = 'your_api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output
content = ""
reasoning_content = ""
messages = [
    {"role": "user", "content": "How do you implement a binary search algorithm in Python with detailed comments?"}
]
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096,
    extra_body={
        "thinking_budget": 1024
    }
)
# Stream and accumulate the answer and the model's reasoning trace
for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is a provider-specific field; guard in case it is absent
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content

# Round 2: continue the conversation using the accumulated answer
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True
)
# Consume the second streamed response as well
for chunk in response:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
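The two-round pattern above (append the assistant's accumulated reply, then the next user turn) can be factored into a small helper. Note that `extend_conversation` is a hypothetical name introduced here for illustration, not part of any SiliconFlow SDK.

```python
def extend_conversation(messages, assistant_reply, next_user_message):
    """Return a new message list with the assistant's accumulated reply
    and the next user turn appended, leaving earlier rounds untouched."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [{"role": "user", "content": "Explain binary search."}]
history = extend_conversation(history, "Binary search halves the range...", "Continue")
# history now holds three turns: user, assistant, user
```

Returning a new list rather than mutating in place makes it easy to branch a conversation from any earlier round.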

Hunyuan-A13B-Instruct is an ideal choice for researchers and developers seeking high performance. Whether for academic research, cost-effective AI solution development, or exploration of innovative applications, the model provides a solid foundation for progress.

Start building with Hunyuan-A13B-Instruct on SiliconFlow today!

Ready to accelerate your AI development?
