gpt-oss-20b API, Fine-Tuning, Deployment

openai/gpt-oss-20b

gpt-oss-20b is OpenAI’s lightweight open-weight model with roughly 21B total parameters (3.6B active per token), built on a mixture-of-experts (MoE) architecture with MXFP4 quantization so it can run locally on devices with 16 GB of VRAM. It delivers results comparable to o3-mini on reasoning, math, and health benchmarks, and supports chain-of-thought output, tool use, and deployment through frameworks such as Transformers, vLLM, and Ollama.
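
The description above mentions local deployment through Transformers, vLLM, and Ollama. Below is a minimal local-inference sketch using the Hugging Face Transformers chat pipeline; it assumes the openai/gpt-oss-20b checkpoint is pulled from the Hugging Face Hub and a GPU with roughly 16 GB of VRAM is available.

# Minimal local-inference sketch with Hugging Face Transformers
# (assumes the openai/gpt-oss-20b checkpoint and a ~16 GB VRAM GPU).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # keep the shipped quantized weights where supported
    device_map="auto",    # place layers on the available GPU(s)
)

# Chat-style input: a list of role/content messages.
messages = [
    {"role": "user", "content": "Explain mixture-of-experts models in two sentences."},
]

result = generator(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])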

API Usage

curl --request POST \
  --url https://api.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "openai/gpt-oss-20b",
  "max_tokens": 512,
  "enable_thinking": true,
  "thinking_budget": 4096,
  "min_p": 0.05,
  "temperature": 0.7,
  "top_p": 0.7,
  "top_k": 50,
  "frequency_penalty": 0.5,
  "n": 1,
  "messages": [
    {
      "content": "how are you today",
      "role": "user"
    }
  ]
}'
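
Because the endpoint uses the standard /v1/chat/completions path, the same request can usually be sent with the official openai Python SDK by overriding base_url. The sketch below assumes the API key is stored in a SILICONFLOW_API_KEY environment variable and passes the provider-specific fields (enable_thinking, thinking_budget, min_p, top_k) through extra_body.

# Equivalent request via the openai Python SDK against the OpenAI-compatible endpoint.
# Sketch only; assumes SILICONFLOW_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.com/v1",
    api_key=os.environ["SILICONFLOW_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    max_tokens=512,
    temperature=0.7,
    top_p=0.7,
    frequency_penalty=0.5,
    n=1,
    messages=[{"role": "user", "content": "how are you today"}],
    # Provider-specific sampling/reasoning fields go through extra_body.
    extra_body={
        "enable_thinking": True,
        "thinking_budget": 4096,
        "min_p": 0.05,
        "top_k": 50,
    },
)

print(response.choices[0].message.content)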

Details

Model Provider: openai
Type: text
Sub Type: chat
Size: 20B
Publish Time: Aug 13, 2025
Input Price: $0.04 / M Tokens
Output Price: $0.18 / M Tokens
Context Length: 131K
Tags: MoE, 20B, 131K
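
As a quick illustration of the per-token pricing above, a request's cost can be estimated from its token counts; the counts in the snippet below are hypothetical.

# Rough cost estimate for one request at the listed rates
# ($0.04 / M input tokens, $0.18 / M output tokens); token counts are made up.
input_tokens, output_tokens = 2_000, 500
cost = input_tokens / 1e6 * 0.04 + output_tokens / 1e6 * 0.18
print(f"${cost:.6f}")  # ≈ $0.000170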

© 2025 SiliconFlow Technology PTE. LTD.