OpenAI's gpt-oss Now Live on SiliconFlow: Designed for Agentic Workflows, Advanced Reasoning and Tool Use

Aug 19, 2025


SiliconFlow is excited to announce the launch of gpt-oss-120B and gpt-oss-20B — state-of-the-art open-weight language models now available on our platform. Built on a MoE architecture, gpt-oss-120B has 117 billion parameters with 5.1 billion activated per token, while gpt-oss-20B has 21 billion parameters, activating 3.6 billion per token.

Trained with reinforcement learning techniques inspired by OpenAI's advanced internal models (including o3), gpt-oss is built for agentic workflows with exceptional instruction following, tool use such as web search and Python code execution, and configurable reasoning effort, enabling both complex reasoning and lower-latency outputs.

Whether you're building complex reasoning pipelines, enabling sophisticated tool use or deploying large-scale AI services, gpt-oss on SiliconFlow delivers the flexibility and power to accelerate innovation — backed by our fully optimized deployment and production-ready API service.

With SiliconFlow's gpt-oss API, you can expect:

  • Cost-Effective Pricing (a quick cost sketch follows this list):

    • gpt-oss-120b: $0.09/M tokens (input) and $0.45/M tokens (output);

    • gpt-oss-20b: $0.04/M tokens (input) and $0.18/M tokens (output).

  • Extended Context Window: 131K context window for complex tasks.
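
For a quick sense of what this pricing means in practice, here is a minimal cost sketch using the gpt-oss-120b rates listed above (the token counts are illustrative, not a real billing calculation):

import_tokens = None  # no imports needed; plain arithmetic

# Back-of-the-envelope cost for gpt-oss-120b at the listed rates.
INPUT_PRICE = 0.09 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 0.45 / 1_000_000   # dollars per output token

input_tokens, output_tokens = 250_000, 50_000   # illustrative volumes
cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"${cost:.4f}")  # $0.0450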

Key Capabilities & Benchmark Performance

OpenAI's gpt-oss models on SiliconFlow offer versatile capabilities to adapt to a wide range of AI tasks:

  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.

  • Full chain-of-thought: Provides complete access to the model's reasoning process, facilitating easier debugging and greater trust in outputs.

  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.

  • Agentic capabilities: Use the models' native capabilities for function calling, web browsing, Python code execution and Structured Outputs (see the function-calling sketch after this list).
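
To make the agentic side concrete, here is a minimal function-calling sketch against SiliconFlow's OpenAI-compatible endpoint. It assumes the standard OpenAI "tools" request format is accepted; get_weather is a hypothetical tool defined only for illustration:

import requests

# Minimal function-calling sketch (get_weather is a hypothetical tool).
url = "https://api.siliconflow.com/v1/chat/completions"
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What's the weather in Singapore?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}
response = requests.post(url, json=payload, headers=headers)
# On success, the assistant message should contain a tool_call naming
# get_weather together with its JSON arguments.
print(response.json()["choices"][0]["message"])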

Also, gpt-oss-120b and gpt-oss-20b have been evaluated across standard academic benchmarks to measure their capabilities in coding, competition math, health, and agentic tool use, compared with other OpenAI reasoning models, including o3, o3‑mini, and o4-mini:

  • gpt-oss-120b outperforms OpenAI o3‑mini and matches or exceeds OpenAI o4-mini on competition coding (Codeforces), general problem solving (MMLU and HLE) and tool calling (TauBench). It also outperforms o4-mini on health-related queries (HealthBench) and competition mathematics (AIME 2024 & 2025).

  • gpt-oss-20b matches or exceeds OpenAI o3‑mini on these same evals, despite its small size, even outperforming it on competition mathematics and health.

| Category | Benchmark | gpt-oss-120b | gpt-oss-20b | OpenAI o3-mini | OpenAI o4-mini |
| --- | --- | --- | --- | --- | --- |
| Coding | Codeforces | 2622 | 2516 | 2073 (without tools) | 2719 |
| Tool use | TauBench | 🥇 67.8 | 54.8 | – | 65.6 |
| Health | HealthBench | 🥇 57.6 | 42.5 | 37.8 | 50.1 |
| Reasoning & factuality | AIME 2024 & 2025 | 96.6 / 97.9 | 96 / 98.7 | 87.3 / 86.5 | 98.7 / 99.5 |
| | MMLU | 90 | 85.3 | 87 | 93 |
| | HLE | 🥇 19 | 17.3 | 13.4 (without tools) | 17.7 |
| | GPQA-Diamond | 80.1 | 71.5 | 77 | 81.4 |


With these features and competitive benchmark performance, gpt-oss offers developers an optimal balance of capability and cost-effectiveness.

Technical Highlights of gpt-oss

Building on these capabilities and benchmark results, the technical foundation of gpt-oss combines cutting-edge architecture with advanced training methodologies to deliver high performance:

Advanced Training & Architecture:

  • Trained using OpenAI's most advanced pre-training and post-training techniques, emphasizing reasoning, efficiency and real-world usability.

  • Built on a Transformer backbone with mixture-of-experts (MoE), gpt-oss-120b activates 5.1B parameters per token (117B total), and gpt-oss-20b activates 3.6B (21B total); a toy routing sketch follows this list.

  • The models employ alternating dense and locally banded sparse attention, grouped multi-query attention (group size 8), and Rotary Positional Embedding (RoPE), supporting context lengths up to 128k tokens.

  • Training data focuses on English text in STEM, coding and general knowledge, tokenized with the open-sourced o200k_harmony tokenizer.
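
To make the active-parameter arithmetic concrete, here is a toy top-k MoE routing sketch in PyTorch. It is illustrative only (dimensions, expert counts and routing details are made up and far smaller than gpt-oss): each token passes through just top_k of n_experts feed-forward blocks, which is how a 117B-parameter model can activate only 5.1B parameters per token.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (not the gpt-oss implementation)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)          # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask][:, k:k + 1] * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])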

Post-Training & Reasoning:

  • Following pre-training, the models undergo supervised fine-tuning and a high-compute reinforcement learning stage to align with the OpenAI Model Spec.

  • This process enhances chain-of-thought (CoT) reasoning and tool-use capabilities, supporting configurable reasoning effort (low, medium, and high) so developers can balance latency and performance via system prompts, as sketched below.
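
As a concrete illustration, here is a minimal sketch of selecting a reasoning effort through the system prompt. It assumes the gpt-oss "Reasoning: low|medium|high" system-prompt convention and reuses the endpoint from the quick-start example below:

import requests

# Minimal sketch: request high reasoning effort via the system prompt.
# Assumes gpt-oss's "Reasoning: low|medium|high" system-prompt convention.
url = "https://api.siliconflow.com/v1/chat/completions"
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
}
headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}
print(requests.post(url, json=payload, headers=headers).json())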

Get Started Immediately

  1. Explore: Try gpt-oss in the SiliconFlow playground.

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

import requests

# SiliconFlow's OpenAI-compatible chat completions endpoint
url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "openai/gpt-oss-20b",
    "max_tokens": 512,
    "enable_thinking": True,   # include the model's thinking output
    "thinking_budget": 4096,   # cap on thinking tokens
    "min_p": 0.05,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,
    "messages": [
        {
            "role": "user",
            "content": "how are you today"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
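
Because the endpoint is OpenAI-compatible, the same request also works through the official OpenAI Python SDK; a minimal sketch, assuming the base URL and model name used above:

from openai import OpenAI

# Same request via the OpenAI Python SDK, pointed at SiliconFlow's base URL.
client = OpenAI(api_key="<token>", base_url="https://api.siliconflow.com/v1")
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    max_tokens=512,
    messages=[{"role": "user", "content": "how are you today"}],
)
print(response.choices[0].message.content)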

Start building with gpt-oss via SiliconFlow's high-performance API today!

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.