Seed-OSS-36B-Instruct Now Available On SiliconFlow: Smarter AI That Thinks On-Demand

Sep 5, 2025


TL;DR: Try ByteDance's Seed-OSS-36B-Instruct on SiliconFlow today - get smarter reasoning via controllable thinking budgets, premium-quality results at an affordable price, and a production-ready API for seamless deployment and scaling.

SiliconFlow is excited to bring Seed-OSS-36B-Instruct to our model catalog - ByteDance's revolutionary open-source model that puts AI reasoning control in your hands. With its Flexible Thinking Budget, users can precisely adjust reasoning depth for each task, while enhanced reasoning capabilities and agentic intelligence deliver exceptional problem-solving performance.

With SiliconFlow's Seed-OSS-36B-Instruct API, you can expect:

  • Competitive Pricing: Seed-OSS-36B-Instruct costs $0.21/M tokens (input) and $0.57/M tokens (output) - see the quick cost sketch below.

  • 262K Context Window Support: Handle long documents or entire codebases in a single request.
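
For a rough sense of scale, here is a minimal cost sketch based on the rates above (the token counts are illustrative, not benchmark figures):

# Back-of-the-envelope pricing for a single request at SiliconFlow's listed rates.
INPUT_PRICE_PER_M = 0.21   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.57  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a long-document request that stays within the 262K context window.
print(f"${estimate_cost(200_000, 8_000):.4f}")  # ~ $0.0466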


Why Seed-OSS Matters

Most open-source models feel like a black box: you can't control how much the AI thinks, long documents quickly hit context limits, and costs scale unpredictably with task complexity. Seed-OSS-36B-Instruct changes that:

  • Flexible Control of Thinking Budget: Adjust reasoning length to match task complexity, balancing accuracy, efficiency, and cost. Budgets are set in multiples of 512 tokens (0 gives an instant, direct response), giving developers control over performance across deployment scenarios - particularly useful for applications like customer support or autonomous agents.

  • Native Long Context: Unlike models that retrofit long context after training, Seed-OSS is trained natively with up to 512K of context, so it delivers more stable and consistent performance even on massive inputs.

  • Advanced Reasoning & Agentic Intelligence: Specifically optimized for complex reasoning tasks while maintaining balanced general capabilities, with exceptional performance in agentic workflows such as tool use, multi-step problem-solving, and issue resolution.


Furthermore, Seed-OSS-36B-Instruct matches or exceeds the performance of top-tier open-source models in its class - including Qwen3-30B-A3B-Thinking-2507, Qwen3-32B, and OAI-OSS-20B - across mathematics, coding, reasoning, agent tasks, and long-context processing.


| Benchmark | Seed-OSS-36B-Instruct | Qwen3-30B-A3B-Thinking-2507 | Qwen3-32B | OAI-OSS-20B | Gemma3-27B |
| --- | --- | --- | --- | --- | --- |
| Knowledge | | | | | |
| MMLU-Pro | 🥇82.7 | 81.9 | 81.8 | 76.2 | 67.5 |
| MMLU | 🥇87.4 | 86.9 | 86.2 | 81.7 | 76.9 |
| GPQA-D | 71.4 | 71.4 | 66.7 | 72.2 | 42.4 |
| Math | | | | | |
| AIME24 | 91.7 | 87.7 | 82.7 | 92.7 | - |
| AIME25 | 84.7 | 81.3 | 73.3 | 90.3 | - |
| Reasoning | | | | | |
| HLE | 10.1 | 8.7 | 6.9 | 12.7 | - |
| Coding | | | | | |
| LiveCodeBench v6 | 🥇67.4 | 60.3 | 53.4 | 63.8 | - |
| Agent | | | | | |
| TAU1-Retail | 🥇70.4 | 58.7 | 40.9 | 54.8 | - |
| SWE-Bench Verified | 🥇47 | 39.7 | 23.4 | 60.7 | - |
| Long Context | | | | | |
| RULER (128K) | 🥇94.6 | 94.5 | 77.5 | 78.7 | - |


Real-World Application Scenarios

How does the Thinking Budget work in practice? When you set a thinking budget, the model operates with full transparency. Here is an example with the thinking budget set to 512: during reasoning, the model periodically triggers self-reflection to estimate the consumed and remaining budget, then delivers the final response once the budget is exhausted or the reasoning concludes.

<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted)
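
For reference, a request that caps reasoning at 512 tokens might look like the sketch below. Passing thinking_budget through chat_template_kwargs mirrors the model's open-source chat template; this is an assumption on our part, so confirm the exact field name in the SiliconFlow API documentation.

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
    "messages": [
        {"role": "user", "content": "Differentiate f(x) = x^3 * ln(x)."}
    ],
    # Assumed parameter: the open-source chat template exposes a `thinking_budget`
    # knob (multiples of 512; 0 = answer directly with no visible reasoning).
    # Check the SiliconFlow API documentation for the exact field name.
    "chat_template_kwargs": {"thinking_budget": 512}
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json()["choices"][0]["message"]["content"])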


This controllable reasoning, combined with advanced agentic capabilities, opens up powerful use cases:

  • Adaptive Customer Support:

    Scale AI reasoning based on query complexity: instant responses for FAQs, deep analysis for technical issues. Control costs while maintaining service quality across simple and complex customer interactions.


  • Enterprise Document Intelligence:

    Support information extraction and analysis from long documents like compliance manuals, contract bundles, or regulatory frameworks. Work across multiple related documents while preserving context connections.


  • Smart Development Workflows:

    Quick syntax checks with zero thinking budget, comprehensive architecture reviews with full reasoning power. Handle entire codebases in single sessions rather than isolated code snippets.


  • Global Operations:

    Deploy consistent AI assistance across international markets with native multilingual capabilities. Support cross-jurisdictional research, cultural adaptation insights, and regional market analysis within unified workflows.


Whether you're optimizing customer support efficiency, processing massive document libraries, streamlining development workflows, or scaling global operations, this model adapts to your specific needs while maintaining transparency and cost predictability.


Get Started Immediately

  1. Explore: Try Seed-OSS-36B-Instruct in the SiliconFlow playground.

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

import requests

# SiliconFlow's OpenAI-compatible chat completions endpoint
url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "tell me a story"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
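
Since the endpoint is OpenAI-compatible, the official openai Python SDK works as well - point base_url at SiliconFlow and keep the same model name (a minimal sketch):

from openai import OpenAI

# Same endpoint as the requests example above, accessed through the OpenAI SDK.
client = OpenAI(api_key="<token>", base_url="https://api.siliconflow.com/v1")

response = client.chat.completions.create(
    model="ByteDance-Seed/Seed-OSS-36B-Instruct",
    messages=[{"role": "user", "content": "tell me a story"}],
)
print(response.choices[0].message.content)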

Start building with Seed-OSS-36B-Instruct on SiliconFlow today - gain precise AI control and optimize costs intelligently!

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.