Qwen3.5 Series Now on SiliconFlow: Multimodal AI Excellence


TL;DR: The Qwen3.5 series, representing Alibaba's latest advancement in foundation model development, is now available on SiliconFlow, featuring five powerful models ranging from 9B to 397B parameters. These next-generation multimodal models deliver breakthrough performance in reasoning, coding, multilingual understanding, and vision-language tasks with exceptional cost-efficiency. Start building with SiliconFlow's API today to supercharge your AI workflow.

Key Features & Benchmark Performance

The release of Qwen3.5 brings five distinct models—Qwen3.5-9B, Qwen3.5-27B, Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-397B-A17B—each optimized for different performance and efficiency requirements.

These models represent a significant leap forward in multimodal AI, integrating breakthroughs in architectural efficiency, reinforcement learning scale, and native multimodal capabilities. Whether you're building intelligent agents, complex reasoning systems, or multilingual applications, the Qwen3.5 series delivers unprecedented capability and efficiency across the entire spectrum of AI workloads.

Qwen3.5 Highlights

| Model | Archetype | Core Strengths |
| --- | --- | --- |
| Qwen3.5-397B-A17B | Flagship | Industry-leading reasoning, vision & language; unmatched multilingual support |
| Qwen3.5-122B-A10B | Versatile | Elite instruction-following, agentic workflows, coding & vision |
| Qwen3.5-35B-A3B | Efficient | Peak cost-efficiency; strong multilingual & VL capabilities |
| Qwen3.5-27B | Dense | Consistent dense performance; superior coding & vision |
| Qwen3.5-9B | Compact | Edge/constrained deployment; impressive long-context & vision |

Qwen3.5 features the following enhancements:

  • Unified Vision-Language Foundation: Early-fusion multimodal training reaches parity with Qwen3 on text tasks while surpassing Qwen3-VL across reasoning, code, agents, and vision.

  • Efficient Hybrid Architecture: Gated Delta Networks + sparse MoE deliver high-throughput inference with minimal latency and cost.

  • Scalable RL Generalization: Million-agent RL with progressive task complexity ensures robust real-world adaptability.

  • Global Linguistic Coverage: 201 supported languages and dialects enable culturally nuanced, inclusive deployment.

  • Next-Gen Training Infra: Near-lossless multimodal training efficiency (vs. text-only) and async RL for massive agent-environment orchestration.
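To make the sparse-MoE idea behind these efficiency claims concrete, here is a toy top-k router. This is an illustrative sketch only, not Qwen3.5's actual architecture; the layer size, expert count, and top-2 choice are assumptions for the example.

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Toy sparse MoE layer: route a token to its top-k experts.

    x:       (d,) token hidden state
    gate_w:  (n_experts, d) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = gate_w @ x                # router score per expert
    top = np.argsort(logits)[-k:]      # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Only k of n_experts actually run: this is where sparse MoE saves compute.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)) / np.sqrt(d): W @ x
           for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
x = rng.normal(size=d)
y = topk_moe(x, gate_w, experts, k=2)
print(y.shape)
```

The "A3B"/"A10B"/"A17B" suffixes in the model names reflect the same principle: only a small set of active parameters fires per token.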

Performance Showcase: Qwen3.5-397B-A17B

Real-World Applications

  • Intelligent Coding Assistants: Build advanced code generation, debugging, and refactoring tools with state-of-the-art performance on SWE-bench and coding benchmarks.

  • Multimodal Content Analysis: Process and understand documents, images, and videos simultaneously for applications in education, healthcare, legal document review, and media analysis.

  • Multilingual AI Systems: Deploy truly global applications with robust support for 201 languages and dialects, from customer service chatbots to content localization platforms.

  • Autonomous AI Agents: Create sophisticated agents with tool-calling capabilities for web browsing, data analysis, task automation, and complex workflow orchestration.
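Agentic tool calling through an OpenAI-compatible endpoint is typically expressed as a `tools` array in the request. The sketch below shows the shape of such a request; the `get_weather` function and its schema are hypothetical examples, not part of the Qwen3.5 release.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "Qwen/Qwen3.5-397B-A17B",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry with the function name and JSON-encoded arguments, which your agent loop executes and feeds back as a `tool` message.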

Get Started Immediately

  1. Explore: Try the Qwen3.5 models in the SiliconFlow playground:

  2. Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

Python Example for Qwen3.5-397B-A17B API Usage:

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "Qwen/Qwen3.5-397B-A17B",
    "stream": False,
    "max_tokens": 4096,
    "enable_thinking": True,  # enable the model's reasoning mode
    "temperature": 0.7,
    "messages": [
        {
            "role": "user",
            "content": "How many r's are in the word 'strawberry'?"
        }
    ]
}
headers = {
    "Authorization": "Bearer <token>",  # replace <token> with your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
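The endpoint returns an OpenAI-style chat-completions JSON body, so a small helper can pull out the assistant's reply. The sample response below is a hand-written illustration of that shape, not captured API output.

```python
def extract_reply(body: dict) -> str:
    """Return the assistant message text from a chat-completions response."""
    return body["choices"][0]["message"]["content"]

# Hand-written sample in the standard chat-completions response shape.
sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant",
                     "content": "There are 3 r's in 'strawberry'."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 12},
}

print(extract_reply(sample))
```

In the real script above, you would call `extract_reply(response.json())` instead of `response.text` to get just the answer string.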
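With `"stream": True` in the payload, the same endpoint sends server-sent events, one `data:` line per token chunk, ending with a `[DONE]` sentinel. The parser below runs on a hand-written sample stream in that standard format; real chunk contents will differ.

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from chat-completions SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Hand-written sample stream in the standard streaming format.
sample_stream = [
    'data: {"choices":[{"delta":{"role":"assistant","content":"There are "}}]}',
    'data: {"choices":[{"delta":{"content":"3 r\'s."}}]}',
    "data: [DONE]",
]

print("".join(parse_sse_chunks(sample_stream)))
```

For a live request you would pass `stream=True` to `requests.post` and iterate over `response.iter_lines(decode_unicode=True)` instead of the sample list.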

Business or Sales Inquiries →

Join our Discord community now →

Follow us on X for the latest updates →

Explore all available models on SiliconFlow →

Ready to accelerate your AI development?