Hunyuan-A13B-Instruct Is Now Available on SiliconFlow

June 30, 2025

Why Does Hunyuan-A13B-Instruct Matter?

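Small but powerful: With only 13 billion active parameters out of 80 billion total, it delivers competitive performance across a wide range of benchmark tasks and holds its own against much larger models.
Hybrid reasoning support: It supports both fast-thinking and slow-thinking modes, so users can choose between them as their needs dictate.
Ultra-long context understanding: It natively supports a 256K context window and maintains stable performance on long-text tasks.
Enhanced agent capabilities: It is optimized for agent tasks and achieves leading results on benchmarks such as BFCL-v3, τ-Bench, and C3-Bench.
Efficient inference: It uses Grouped Query Attention (GQA) and supports multiple quantization formats, which makes inference highly efficient.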
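
To give a rough sense of why Grouped Query Attention helps at a 256K context, here is a minimal back-of-the-envelope sketch of key/value cache size. The layer count, head counts, and head dimension below are hypothetical values chosen for illustration, not Hunyuan-A13B's actual configuration, and 256K is assumed to mean 256 × 1024 tokens stored in 16-bit precision.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for a single sequence."""
    # The factor 2 accounts for the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical dimensions for illustration only; NOT the model's real config.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32   # multi-head attention keeps one KV head per query head
NUM_KV_HEADS = 8       # GQA shares each KV head across a group of query heads
HEAD_DIM = 128
SEQ_LEN = 256 * 1024   # assumed token count for a 256K context window

mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA KV cache: {mha_bytes / 2**30:.1f} GiB")
print(f"GQA KV cache: {gqa_bytes / 2**30:.1f} GiB")
print(f"Reduction factor: {mha_bytes / gqa_bytes:.0f}x")
```

Under these assumed numbers the cache shrinks in proportion to the ratio of query heads to KV heads; the real savings depend on the model's actual architecture.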

Quick Start

Try the Hunyuan-A13B-Instruct model directly in the SiliconFlow Playground.

Quick Access to the API

The following Python example shows how to call the Hunyuan-A13B-Instruct model through SiliconFlow's API endpoint. For more detailed specifications, see the SiliconFlow API documentation.

from openai import OpenAI

url = 'https://api.siliconflow.com/v1/'
api_key = 'your_api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output
content = ""
reasoning_content = ""
messages = [
    {"role": "user", "content": "How do you implement a binary search algorithm in Python with detailed comments?"}
]
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096,
    extra_body={
        "thinking_budget": 1024
    }
)
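# Receive the response incrementally and accumulate both output streams
for chunk in response:
    if not chunk.choices:  # A chunk may arrive without choices; skip it
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is not a standard OpenAI field, so read it defensively
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content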

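# Round 2: send the first answer back as context and ask the model to continue
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=messages,
    stream=True
)
# Print the second-round answer as it streams in
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)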
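
If you find yourself repeating the accumulation loop for every round, it can be factored into a small helper. The sketch below is one possible shape, not part of the SiliconFlow SDK: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta` layout used in the example above so the helper can be tried without an API key.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate answer text and reasoning text from streamed chunks."""
    content_parts = []
    reasoning_parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:
            content_parts.append(text)
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
    return "".join(content_parts), "".join(reasoning_parts)


def fake_chunk(content=None, reasoning_content=None):
    """Build a stand-in chunk with the same attribute layout (demo only)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline demo: reasoning deltas first, an empty chunk, then answer deltas.
demo_stream = [
    fake_chunk(reasoning_content="Plan: "),
    fake_chunk(reasoning_content="halve the range."),
    SimpleNamespace(choices=[]),
    fake_chunk(content="def binary_search"),
    fake_chunk(content="(items, target): ..."),
]
answer, reasoning = collect_stream(demo_stream)
print(answer)
print(reasoning)
```

With a real client, the same helper could be called as `content, reasoning_content = collect_stream(response)` after each `client.chat.completions.create(..., stream=True)` call.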