Qwen3-30B-A3B-Instruct-2507 API, Deployment, Pricing

Qwen/Qwen3-30B-A3B-Instruct-2507

Qwen3-30B-A3B-Instruct-2507 is an updated version of Qwen3-30B-A3B in non-thinking mode. It is a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion activated parameters. The update brings significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Its long-context understanding has been extended to 256K tokens. The model supports only non-thinking mode and does not generate `<think></think>` blocks in its output.

API Usage

cURL

curl --request POST \
  --url https://api.siliconflow.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
  "min_p": 0.05,
  "temperature": 0.7,
  "top_p": 0.7,
  "top_k": 50,
  "messages": [
    {
      "content": "Hello, how are you?",
      "role": "user"
    }
  ]
}'
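
For readers who prefer Python, the same request can be sketched with only the standard library. This is a minimal sketch, not an official client: the endpoint, model name, and sampling parameters are copied from the cURL example above, while the helper names `build_request` and `send` are hypothetical and introduced here for illustration. The response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request

# Values copied from the cURL example above.
API_URL = "https://api.siliconflow.com/v1/chat/completions"
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"


def build_request(token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example.

    `build_request` is a hypothetical helper name, not part of any SDK.
    """
    payload = {
        "model": MODEL,
        "min_p": 0.05,
        "temperature": 0.7,
        "top_p": 0.7,
        "top_k": 50,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a valid API token and network access):
#   reply = send(build_request("<token>", "Hello, how are you?"))
#   print(json.dumps(reply, indent=2))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.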

Details

Model Provider: Qwen

Type: text

Sub Type: chat

Size: 30B

Publish Time: Jul 30, 2025

Input Price: 0.09 / M Tokens

Output Price: 0.3 / M Tokens

Context length: 262K
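
To make the pricing concrete, the following sketch estimates the cost of a single request from the listed per-million-token prices. It is an illustration under stated assumptions: the page does not name a currency, so results are in the same unstated unit as the listed prices; the function names are hypothetical; and the context check assumes that prompt and generated tokens share one window of 256 × 1024 = 262,144 tokens, which matches the listed "262K" but is not confirmed by the page.

```python
# Prices from the Details section, per million tokens (currency not stated on the page).
INPUT_PRICE_PER_M = 0.09
OUTPUT_PRICE_PER_M = 0.3

# 256K tokens expressed exactly; shown as "262K" in the Details section.
CONTEXT_LIMIT = 256 * 1024  # 262,144 tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated price of one request in the page's price unit."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and output tokens share a single context window."""
    return input_tokens + output_tokens <= CONTEXT_LIMIT


# Example: a 200,000-token prompt with a 50,000-token reply.
cost = estimate_cost(200_000, 50_000)
print(f"estimated cost: {cost:.3f} per request")
print("fits in context:", fits_context(200_000, 50_000))
```

Because output tokens are priced at more than three times the input rate, long generations dominate the bill sooner than long prompts do.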

tencent/Hunyuan-MT-7B

The Hunyuan Translation Model consists of a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. Hunyuan-MT-7B is a lightweight translation model with 7 billion parameters that translates source text into a target language. It supports mutual translation among 33 languages, including five ethnic minority languages in China. In the WMT25 machine translation competition, Hunyuan-MT-7B won first place in 30 of the 31 language categories it entered. For translation tasks, Tencent Hunyuan proposed a comprehensive training framework covering pre-training, supervised fine-tuning, translation enhancement, and ensemble refinement, achieving state-of-the-art performance among models of a similar scale. The model is computationally efficient and easy to deploy, making it suitable for a range of application scenarios.

Qwen/Qwen3-Next-80B-A3B-Instruct

Qwen3-Next-80B-A3B-Instruct is a next-generation foundation model released by Alibaba's Qwen team. It is built on the new Qwen3-Next architecture, which is designed for maximum training and inference efficiency. The model incorporates a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention), a High-Sparsity Mixture-of-Experts (MoE) structure, and various stability optimizations. As an 80-billion-parameter sparse model, it activates only about 3 billion parameters per token during inference, which significantly reduces computational costs and delivers over 10 times higher throughput than the Qwen3-32B model for long-context tasks exceeding 32K tokens. This is an instruction-tuned version optimized for general-purpose tasks and does not support 'thinking' mode. On certain benchmarks its performance is comparable to Qwen's flagship model, Qwen3-235B, and it shows significant advantages in ultra-long-context scenarios.

inclusionAI/Ling-flash-2.0

Ling-flash-2.0 is a language model from inclusionAI with a total of 100 billion parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to or even exceeding that of 40B-level dense models and other larger MoE models, but with a significantly smaller active parameter count. The model represents a strategy focused on achieving high performance and efficiency through extreme architectural design and training methods.

inclusionAI/Ling-mini-2.0

Ling-mini-2.0 is a small yet high-performance large language model built on the MoE architecture. It has 16B total parameters, but only 1.4B are activated per token (789M non-embedding), enabling extremely fast generation. Thanks to its efficient MoE design and large-scale, high-quality training data, Ling-mini-2.0 delivers top-tier downstream task performance comparable to sub-10B dense LLMs and even larger MoE models.

moonshotai/Kimi-K2-Instruct-0905

Kimi K2-Instruct-0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art Mixture-of-Experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Key features include enhanced agentic coding intelligence, with significant improvements on public benchmarks and real-world coding agent tasks, and an improved frontend coding experience, with advances in both the aesthetics and practicality of frontend programming.

ByteDance-Seed/Seed-OSS-36B-Instruct

Seed-OSS is a series of open-source large language models developed by the ByteDance Seed team, designed for powerful long-context processing, reasoning, agent capabilities, and general-purpose abilities. Within this series, Seed-OSS-36B-Instruct is an instruction-tuned model with 36 billion parameters that natively supports an ultra-long context length, enabling it to process massive documents or complex codebases in a single pass. The model is specially optimized for reasoning, code generation, and agent tasks (such as tool use), while maintaining balanced general-purpose capabilities. A key feature of this model is the ‘Thinking Budget’ function, which lets users adjust the reasoning length as needed and thereby improve inference efficiency in practical applications.

deepseek-ai/DeepSeek-V3.1

DeepSeek-V3.1 is a hybrid large language model released by DeepSeek AI, featuring significant upgrades over its predecessor. A key innovation is the integration of both a 'Thinking Mode' for deliberative, chain-of-thought reasoning and a 'Non-thinking Mode' for direct responses, which can be switched via the chat template to suit various tasks. The model's capabilities in tool use and agent tasks have been substantially improved through post-training optimization, enabling better support for external search tools and complex multi-step instructions. DeepSeek-V3.1 is post-trained on top of the DeepSeek-V3.1-Base model, which underwent a two-phase long-context extension with a vastly expanded dataset, enhancing its ability to process long documents and codebases. As an open-source model, DeepSeek-V3.1 demonstrates performance comparable to leading closed-source models on various benchmarks, particularly in coding, math, and reasoning. Its Mixture-of-Experts (MoE) architecture keeps a large total parameter count while reducing inference costs.

zai-org/GLM-4.5V

GLM-4.5V is the latest-generation vision-language model (VLM) released by Zhipu AI. The model is built upon the flagship text model GLM-4.5-Air, which has 106B total parameters and 12B active parameters, and it uses a Mixture-of-Experts (MoE) architecture to achieve superior performance at a lower inference cost. Technically, GLM-4.5V follows the lineage of GLM-4.1V-Thinking and introduces innovations like 3D Rotary Positional Encoding (3D-RoPE), significantly enhancing its perception and reasoning abilities for 3D spatial relationships. Through optimization across the pre-training, supervised fine-tuning, and reinforcement learning phases, the model can process diverse visual content such as images, videos, and long documents, achieving state-of-the-art performance among open-source models of its scale on 41 public multimodal benchmarks. Additionally, the model features a 'Thinking Mode' switch, allowing users to choose between quick responses and deep reasoning to balance efficiency and effectiveness.

openai/gpt-oss-20b

gpt-oss-20b is OpenAI’s lightweight open-weight model with ~21B parameters (3.6B active), built on an MoE architecture and MXFP4 quantization to run locally on 16 GB VRAM devices. It matches o3-mini in reasoning, math, and health tasks, supporting CoT, tool use, and deployment via frameworks like Transformers, vLLM, and Ollama.

openai/gpt-oss-120b

gpt-oss-120b is OpenAI’s open-weight large language model with ~117B parameters (5.1B active), using a Mixture-of-Experts (MoE) design and MXFP4 quantization to run on a single 80 GB GPU. It delivers o4-mini-level or better performance in reasoning, coding, health, and math benchmarks, with full Chain-of-Thought (CoT), tool use, and Apache 2.0-licensed commercial deployment support.

stepfun-ai/step3

Step3 is a cutting-edge multimodal reasoning model from StepFun. It is built on a Mixture-of-Experts (MoE) architecture with 321B total parameters and 38B active parameters. The model is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision-language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators. During pretraining, Step3 processed over 20T text tokens and 4T image-text mixed tokens, spanning more than ten languages. The model has achieved state-of-the-art performance for open-source models on various benchmarks, including math, code, and multimodality.

Qwen/Qwen3-Coder-30B-A3B-Instruct

Qwen3-Coder-30B-A3B-Instruct is a code model from the Qwen3 series developed by Alibaba's Qwen team. As a streamlined and optimized model, it maintains impressive performance and efficiency while focusing on enhanced coding capabilities. It demonstrates significant performance advantages among open-source models on complex tasks such as Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. The model natively supports a long context of 256K tokens, which can be extended up to 1M tokens, enabling better repository-scale understanding and processing. It also provides robust agentic coding support for platforms like Qwen Code and CLINE, featuring a specially designed function call format.

Model FAQs: Usage, Deployment

Learn how to use, fine-tune, and deploy this model with ease.

What is the Qwen/Qwen3-30B-A3B-Instruct-2507 model, and what are its core capabilities and technical specifications?

In which business scenarios does Qwen/Qwen3-30B-A3B-Instruct-2507 perform well? Which industries or applications is it suitable for?

How can the performance and effectiveness of Qwen/Qwen3-30B-A3B-Instruct-2507 be optimized in actual business use?

Compared with other models, when should Qwen/Qwen3-30B-A3B-Instruct-2507 be selected?

What are SiliconFlow's key strengths in AI serverless deployment for Qwen/Qwen3-30B-A3B-Instruct-2507?

What makes SiliconFlow the top platform for Qwen/Qwen3-30B-A3B-Instruct-2507 API?
