Seed-OSS-36B-Instruct Now Available on SiliconFlow: Smart AI That Thinks on Demand

2025/09/05

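TL;DR: Try ByteDance's Seed-OSS-36B-Instruct on SiliconFlow today: smart reasoning with a controllable thinking budget, high-quality results at an affordable price, and a production-ready API for seamless deployment and scaling.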

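SiliconFlow is pleased to add Seed-OSS-36B-Instruct to its model catalog. ByteDance's revolutionary open-source model puts control over AI reasoning in the user's hands. With a flexible thinking budget, users can tune the reasoning depth for each task, while enhanced reasoning and agentic intelligence deliver strong problem-solving performance.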

With SiliconFlow's Seed-OSS-36B-Instruct API, you can expect:

Competitive pricing: Seed-OSS-36B-Instruct costs $0.21/M tokens (input) and $0.57/M tokens (output).
262K context window support: lets users handle complex tasks smoothly.
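
To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the two rates quoted above. The function name and the example workload are illustrative, and the sketch assumes every generated token (including any reasoning tokens) is billed at the output rate, which this post does not state explicitly.

```python
# Prices quoted in this post, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.21
OUTPUT_PRICE_PER_M = 0.57


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 200,000-token document and a 2,000-token answer.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

Check your actual usage and billing details in the SiliconFlow console rather than relying on this estimate.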

Why Seed-OSS Matters

Most open-source models often feel like black boxes: you cannot control how much the AI thinks, long documents quickly hit context limits, and costs grow unpredictably with task complexity. Seed-OSS-36B-Instruct changes that:

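Flexible control of thinking budget: Users can adjust the reasoning length to match task complexity, balancing accuracy, efficiency, and cost. Budgets are set in multiples of 512 tokens (or 0 for an instant direct response), giving developers control over performance across different deployment scenarios. This is especially well suited to applications such as customer support and autonomous agents.
Native long context: Rather than being bolted on afterward as in other models, long context is built in. Seed-OSS is trained with contexts of up to 512K tokens, so it delivers more stable, consistent performance even on very large inputs.
Advanced reasoning and agentic intelligence: Specifically optimized for complex reasoning tasks, it delivers outstanding performance in agentic workflows such as tool use and multi-step problem solving while maintaining general capabilities.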
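
Because budgets are set in multiples of 512 tokens, a small helper can normalize an arbitrary request before it is used. The sketch below is one possible approach, not part of any SiliconFlow or ByteDance SDK: the function name is hypothetical, and rounding up (rather than down) is a design choice made here so that a task never gets less budget than was asked for.

```python
BUDGET_STEP = 512  # granularity described in this post


def normalize_thinking_budget(requested_tokens: int) -> int:
    """Round a requested budget up to the next multiple of 512.

    A request of 0 is kept as 0, which the post describes as an
    instant direct response with no reasoning phase.
    """
    if requested_tokens < 0:
        raise ValueError("thinking budget must be non-negative")
    # Ceiling division without floats; 0 stays 0.
    return -(-requested_tokens // BUDGET_STEP) * BUDGET_STEP


if __name__ == "__main__":
    for requested in (0, 300, 512, 2000):
        print(requested, "->", normalize_thinking_budget(requested))
```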

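In addition, Seed-OSS-36B-Instruct matches or exceeds leading open-source models in its class, including Qwen3-30B-A3B-Thinking-2507, Qwen3-32B, and OAI-OSS-20B, on many benchmarks. It posts the top score on several benchmarks spanning knowledge, coding, agentic tasks, and long-context processing, and remains competitive in math and reasoning.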

Benchmark	Seed-OSS-36B-Instruct	Qwen3-30B-A3B-Thinking-2507	Qwen3-32B	OAI-OSS-20B	Gemma3-27B
Knowledge
MMLU-Pro	🥇82.7	81.9	81.8	76.2	67.5
MMLU	🥇87.4	86.9	86.2	81.7	76.9
GPQA-D	71.4	71.4	66.7	72.2	42.4
Math
AIME24	91.7	87.7	82.7	92.7	-
AIME25	84.7	81.3	73.3	90.3	-
Reasoning
HLE	10.1	8.7	6.9	12.7	-
Coding
LiveCodeBench v6	🥇67.4	60.3	53.4	63.8	-
Agent
TAU1-Retail	🥇70.4	58.7	40.9	54.8	-
SWE-Bench Verified	🥇47	39.7	23.4	60.7	-
Long Context
RULER (128K)	🥇94.6	94.5	77.5	78.7	-

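Real-World Application Scenarios

How does the thinking budget work in practice? Consider an example with the thinking budget set to 512: during reasoning, the model periodically triggers self-reflection to estimate how much of the budget it has consumed and how much remains, and it returns the final response once the budget is exhausted or the reasoning completes.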

<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted)
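
If the raw output reaches your application in the tagged form shown above, the budget checkpoints can be pulled out with a couple of regular expressions. This is a minimal sketch that assumes the tags arrive inline in the text exactly as in the excerpt; an API may instead return reasoning in a separate field, in which case only the checkpoint parsing applies. The function names and the shortened sample string are illustrative.

```python
import re

# Matches the progress notes shown in the excerpt above.
REFLECT_RE = re.compile(
    r"<seed:cot_budget_reflect>I have used (\d+) tokens, and there are "
    r"(\d+) tokens remaining for use\.</seed:cot_budget_reflect>"
)
THINK_RE = re.compile(r"<seed:think>(.*?)</seed:think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' if there is no think block."""
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()


def budget_checkpoints(reasoning: str) -> list[tuple[int, int]]:
    """Return (used, remaining) token pairs in the order they were reported."""
    return [(int(used), int(left)) for used, left in REFLECT_RE.findall(reasoning)]


# Shortened version of the excerpt above, used only for demonstration.
SAMPLE = (
    "<seed:think>\n"
    "Got it, let's try to solve this problem step by step.\n"
    "<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens "
    "remaining for use.</seed:cot_budget_reflect>\n"
    "Using the power rule, ...\n"
    "<seed:cot_budget_reflect>I have exhausted my token budget, and now I will "
    "start answering the question.</seed:cot_budget_reflect>\n"
    "</seed:think>\n"
    "To solve the problem, we start by using the properties of logarithms."
)

if __name__ == "__main__":
    reasoning, answer = split_reasoning(SAMPLE)
    print(budget_checkpoints(reasoning))
    print(answer)
```

The final "exhausted" note carries no numbers, so it is intentionally not reported as a checkpoint.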

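This controllable reasoning, combined with advanced agentic capabilities, opens up powerful use cases:

Adaptive customer support:
Scale AI reasoning from instant responses for FAQs to deep analysis for technical issues. Keep costs under control while maintaining service quality across both simple and complex customer interactions.
Enterprise document intelligence:
Extract and analyze information from long documents such as compliance manuals, contract bundles, and regulatory frameworks, while maintaining contextual connections across multiple related documents.
Smart development workflows:
Run quick syntax checks with a zero thinking budget and comprehensive architecture reviews with full reasoning. Handle an entire codebase in a single session rather than fragmented code snippets.
Global operations:
Deploy consistent AI assistance across international markets with native multilingual capabilities. Support cross-jurisdictional research, cultural adaptation insights, and regional market analysis within a unified workflow.

Whether you are streamlining customer support, processing vast document libraries, slimming down development workflows, or expanding global operations, this model adapts to your specific needs while maintaining transparency and cost predictability.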
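
One way to apply this in an application is a small lookup table that assigns a thinking budget to each kind of task. The sketch below is purely illustrative: the tier names and the budget values are hypothetical choices, picked only to follow the multiples-of-512 rule and the zero-budget fast path described in this post.

```python
BUDGET_STEP = 512

# Hypothetical task tiers mapped to thinking budgets (0 = direct response).
BUDGET_BY_TIER = {
    "faq": 0,
    "syntax_check": 0,
    "technical_issue": 2048,
    "architecture_review": 8192,
}
DEFAULT_BUDGET = 512  # fallback for tiers not listed above


def budget_for(tier: str) -> int:
    """Look up the thinking budget for a task tier, validating its granularity."""
    budget = BUDGET_BY_TIER.get(tier, DEFAULT_BUDGET)
    if budget < 0 or budget % BUDGET_STEP != 0:
        raise ValueError(f"budget for {tier!r} must be a non-negative multiple of 512")
    return budget


if __name__ == "__main__":
    for tier in ("faq", "technical_issue", "unknown_tier"):
        print(tier, "->", budget_for(tier))
```

Keeping the mapping in one place makes it easy to tune budgets per workload as you observe real accuracy and cost.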

Get Started Now

Explore: Try Seed-OSS-36B-Instruct in the SiliconFlow Playground.
Integrate: Use the OpenAI-compatible API. See the SiliconFlow API documentation for the full API specification.

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Request body in the OpenAI-compatible chat completions format.
payload = {
    "model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "tell me a story"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # raise on HTTP errors instead of printing an error body

print(response.text)
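
The request above prints the raw JSON body. Since the API is described as OpenAI-compatible, the assistant's reply would normally sit under `choices[0].message.content`; the sketch below extracts it under that assumption. The helper name and the sample body are made up for illustration, so check the SiliconFlow API documentation for the exact response schema.

```python
import json


def extract_reply(response_body: str) -> str:
    """Return the first assistant message from a chat completions response body."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed response body for demonstration only.
    sample_body = json.dumps(
        {"choices": [{"message": {"role": "assistant", "content": "Once upon a time..."}}]}
    )
    print(extract_reply(sample_body))
```

With the earlier snippet, this would be called as `extract_reply(response.text)`.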
