Seed-OSS-36B-Instruct Now Available on SiliconFlow: Smarter AI That Thinks on Demand

September 5, 2025

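SiliconFlow is pleased to add Seed-OSS-36B-Instruct to our model catalog. This revolutionary open-source model from ByteDance puts control over AI reasoning in users' hands. With its flexible thinking budget, users can tune the reasoning depth for each task, while strong reasoning capabilities and agentic intelligence deliver excellent problem-solving performance.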

With SiliconFlow's Seed-OSS-36B-Instruct API, you can expect:

Competitive pricing: Seed-OSS-36B-Instruct costs $0.21 per million input tokens and $0.57 per million output tokens.
262K context window: lets users handle complex, long-input tasks smoothly.
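For budgeting, the per-million-token prices above can be turned into a rough per-request estimate. The following is a minimal sketch: the helper name `estimate_cost_usd` and the example token counts are illustrative assumptions, and it assumes that any thinking tokens that are billed are already included in the output count. Actual billing may differ.

```python
# Prices quoted in the announcement, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.21
OUTPUT_PRICE_PER_M = 0.57


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 200,000-token document that produces a 2,000-token answer.
print(f"${estimate_cost_usd(200_000, 2_000):.4f}")  # $0.0431
```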

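Why Seed-OSS Matters

Many open-source models feel like a black box: you cannot control how much the AI thinks, long documents quickly hit the context limit, and costs rise unpredictably with task complexity. Seed-OSS-36B-Instruct changes this: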

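Flexible thinking budget control: Users can adjust the reasoning length to match task complexity, balancing accuracy, efficiency, and cost. The budget can be set to a multiple of 512 tokens (use 0 for an immediate, direct response), which lets developers control performance across deployment scenarios. This is especially useful for applications such as customer support or autonomous agents.
Native long context: Rather than being retrofitted for long inputs as some other models are, Seed-OSS was trained natively with a context length of up to 512K. As a result, performance stays more stable and consistent even with very large inputs.
Advanced reasoning and agentic intelligence: Specifically optimized for complex reasoning tasks while maintaining balanced general capabilities, the model performs well in agentic workflows such as tool use, multi-step problem solving, and issue resolution.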
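Because budgets are expressed in multiples of 512 tokens, an application that computes a budget dynamically may want to snap it to a valid value first. The sketch below is one way to do that; the helper name `normalize_thinking_budget` is hypothetical, and rounding up (rather than down) is a design choice made here, not something the model requires.

```python
BUDGET_STEP = 512  # the announcement states budgets are multiples of 512 tokens


def normalize_thinking_budget(requested_tokens: int) -> int:
    """Round a requested budget up to the nearest multiple of 512 (0 stays 0)."""
    if requested_tokens < 0:
        raise ValueError("budget must be non-negative")
    # Ceiling division, then scale back to a whole number of 512-token steps.
    return -(-requested_tokens // BUDGET_STEP) * BUDGET_STEP


for requested in (0, 300, 512, 1500):
    print(requested, "->", normalize_thinking_budget(requested))
# 0 -> 0
# 300 -> 512
# 512 -> 512
# 1500 -> 1536
```

Rounding up keeps the model from getting less reasoning room than requested; rounding down would be the more cost-conservative alternative.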

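In addition, Seed-OSS-36B-Instruct matches or exceeds top open-source models in its class, including Qwen3-30B-A3B-Thinking-2507, Qwen3-32B, and OAI-OSS-20B, on many of the benchmarks below, and performs strongly across math, coding, reasoning, agentic, and long-context tasks.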

Benchmark	Seed-OSS-36B-Instruct	Qwen3-30B-A3B-Thinking-2507	Qwen3-32B	OAI-OSS-20B	Gemma3-27B
Knowledge
MMLU-Pro	🥇82.7	81.9	81.8	76.2	67.5
MMLU	🥇87.4	86.9	86.2	81.7	76.9
GPQA-D	71.4	71.4	66.7	72.2	42.4
Math
AIME24	91.7	87.7	82.7	92.7	-
AIME25	84.7	81.3	73.3	90.3	-
Reasoning
HLE	10.1	8.7	6.9	12.7	-
Coding
LiveCodeBench v6	🥇67.4	60.3	53.4	63.8	-
Agent
TAU1-Retail	🥇70.4	58.7	40.9	54.8	-
SWE-Bench Verified	🥇47	39.7	23.4	60.7	-
Long Context
RULER (128K)	🥇94.6	94.5	77.5	78.7	-

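Real-World Use Cases

How does the thinking budget work in practice? When you set a thinking budget, the model operates transparently. The following example uses a thinking budget of 512: during reasoning, the model periodically triggers self-reflection to estimate how much of the budget it has consumed and how much remains, and it delivers the final response once the budget is exhausted or the reasoning concludes.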

<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted)
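The tagged trace above is easy to process programmatically, for example to log how the budget was consumed or to show only the final answer to end users. The following is a minimal sketch that assumes the raw model output contains the `<seed:think>` and `<seed:cot_budget_reflect>` tags exactly as shown; whether a given serving endpoint returns them inline is not confirmed here. The function name `parse_seed_output` is hypothetical.

```python
import re

# Matches the periodic budget checkpoints shown in the trace above.
REFLECT_RE = re.compile(
    r"<seed:cot_budget_reflect>I have used (\d+) tokens, "
    r"and there are (\d+) tokens remaining for use\.</seed:cot_budget_reflect>"
)
# Matches the whole reasoning section.
THINK_RE = re.compile(r"<seed:think>(.*?)</seed:think>", re.DOTALL)


def parse_seed_output(text: str) -> dict:
    """Split raw output into (used, remaining) checkpoints and the final answer."""
    think = THINK_RE.search(text)
    reasoning = think.group(1) if think else ""
    checkpoints = [(int(u), int(r)) for u, r in REFLECT_RE.findall(reasoning)]
    # Everything after the closing think tag is treated as the final answer.
    answer = text[think.end():].strip() if think else text.strip()
    return {"checkpoints": checkpoints, "answer": answer}


sample = (
    "<seed:think>\n"
    "Got it, let's try to solve this problem step by step.\n"
    "<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens "
    "remaining for use.</seed:cot_budget_reflect>\n"
    "Using the power rule, ...\n"
    "<seed:cot_budget_reflect>I have exhausted my token budget, and now I will "
    "start answering the question.</seed:cot_budget_reflect>\n"
    "</seed:think>\n"
    "To solve the problem, we start by ..."
)
result = parse_seed_output(sample)
print(result["checkpoints"])  # [(129, 383)]
print(result["answer"])       # To solve the problem, we start by ...
```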

This controllable reasoning, combined with advanced agentic capabilities, opens up powerful use cases:

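Adaptive customer support:
Match AI reasoning to query complexity: instant responses for common questions, in-depth analysis for technical issues. Maintain service quality across simple and complex customer interactions while keeping costs under control.
Enterprise document intelligence:
Extract information from and analyze long documents such as compliance manuals, contract packages, or regulatory frameworks. Work across multiple related documents while preserving the contextual connections between them.
Intelligent development workflows:
Use a zero thinking budget for quick syntax checks and the full reasoning capacity for comprehensive architecture reviews. Process entire codebases in a single session instead of isolated code snippets.
Global operations:
Deploy consistent AI support across all international markets with native multilingual capabilities. Support cross-jurisdictional research, culturally adapted insights, and regional market analysis in a unified workflow.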
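The scenarios above amount to choosing a thinking budget per task type. The sketch below shows one way an application might route requests. The task names and budget values are illustrative only, and the request field `thinking_budget` is an assumption used as a placeholder: check the SiliconFlow API documentation for the actual parameter that controls the budget for this model.

```python
# Hypothetical tiers: task names and budgets are illustrative, not recommendations.
# All values are multiples of 512, with 0 meaning an immediate, direct response.
BUDGET_BY_TASK = {
    "faq": 0,
    "syntax_check": 0,
    "technical_support": 2048,
    "architecture_review": 8192,
}
DEFAULT_BUDGET = 512  # fallback for task types not listed above


def build_payload(task_type: str, user_message: str) -> dict:
    """Build a chat request body with a budget chosen by task type."""
    budget = BUDGET_BY_TASK.get(task_type, DEFAULT_BUDGET)
    return {
        "model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
        "messages": [{"role": "user", "content": user_message}],
        # ASSUMPTION: "thinking_budget" is a placeholder field name; the real
        # parameter may be named or structured differently.
        "thinking_budget": budget,
    }


print(build_payload("faq", "What are your opening hours?")["thinking_budget"])
print(build_payload("architecture_review", "Review this design.")["thinking_budget"])
```

Keeping the mapping in one table makes it straightforward to tune budgets as cost and quality data accumulate.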

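Whether you are optimizing customer support efficiency, processing large document repositories, streamlining development workflows, or scaling global operations, the model adapts to your specific needs while maintaining transparency and cost predictability.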

Get Started Now

Explore: Try Seed-OSS-36B-Instruct in the SiliconFlow model playground.
Integrate: Use our OpenAI-compatible API. See the SiliconFlow API documentation for the complete API specification.

import requests

# Chat completions endpoint of the OpenAI-compatible API.
url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "tell me a story"
        }
    ]
}
headers = {
    # Replace <token> with your API key.
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

# Send the request and print the raw JSON response body.
response = requests.post(url, json=payload, headers=headers)

print(response.text)
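Printing `response.text` shows the raw JSON. Since the API is described as OpenAI-compatible, the reply text would normally sit under `choices[0].message.content`. The sketch below assumes that response shape and uses a hand-written sample dictionary rather than a real API response; the helper name `extract_reply` is hypothetical.

```python
def extract_reply(response_json: dict) -> str:
    """Return the assistant's text from an OpenAI-style chat completion body."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the OpenAI chat completion shape (not real API output).
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Once upon a time..."}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 4, "total_tokens": 9},
}
print(extract_reply(sample_response))  # Once upon a time...
```

With a live call, the same helper would be applied to `response.json()` after checking the HTTP status.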
