
Qwen
Text Generation
Qwen3.6-35B-A3B
Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per token, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...
Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $1.6 / M Tokens
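The per-million-token prices on this page translate into a per-request cost by simple proportion. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, it assumes billing is linear in token counts, and it ignores cached-input pricing because no cached price is shown here. The figures are the Qwen3.6-35B-A3B prices listed above.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the USD cost of one request from per-million-token prices.

    Assumption: cost scales linearly with token counts, with no minimum
    charge and no cached-input discount.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Qwen3.6-35B-A3B as listed: $0.2 input and $1.6 output per million tokens.
cost = estimate_cost_usd(50_000, 2_000, 0.2, 1.6)
print(f"{cost:.4f}")  # prints 0.0132
```

Because output tokens are priced several times higher than input tokens for most models listed here, long generations tend to dominate the bill even when prompts are large.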

Qwen
Text Generation
Qwen3.6-27B
Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...
Total Context: 262K
Max output: 262K
Input: $0.3 / M Tokens
Output: $3.2 / M Tokens

Qwen
Text Generation
Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports a 256K context length, extensible to approximately 1M tokens, and offers support for 201 languages, unified vision-language understanding, tool calling, and a reasoning (thinking) mode...
Total Context: 262K
Max output: 262K
Input: $0.39 / M Tokens
Output: $2.34 / M Tokens

Qwen
Text Generation
Qwen3.5-122B-A10B
Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...
Total Context: 262K
Max output: 262K
Input: $0.26 / M Tokens
Output: $2.08 / M Tokens

Qwen
Text Generation
Qwen3.5-35B-A3B
Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...
Total Context: 262K
Max output: 262K
Input: $0.24 / M Tokens
Output: $1.8 / M Tokens

Qwen
Text Generation
Qwen3.5-27B
Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...
Total Context: 262K
Max output: 262K
Input: $0.25 / M Tokens
Output: $2.0 / M Tokens

Qwen
Text Generation
Qwen3.5-9B
Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...
Total Context: 262K
Max output: 262K
Input: $0.1 / M Tokens
Output: $0.15 / M Tokens

Qwen
Text Generation
Qwen3-VL-32B-Instruct
Qwen3-VL is the vision-language model in the Qwen3 series, achieving state-of-the-art (SOTA) performance on various vision-language (VL) benchmarks. The model supports high-resolution image inputs up to the megapixel level and possesses strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a powerful language foundation, enabling it to understand and execute complex instructions...
Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $0.6 / M Tokens

Qwen
Text Generation
Qwen3-VL-32B-Thinking
Qwen3-VL-Thinking is a version of the Qwen3-VL series specially optimized for complex visual reasoning tasks. It incorporates a "Thinking Mode", enabling it to generate detailed intermediate reasoning steps (Chain-of-Thought) before providing a final answer. This design significantly enhances the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis...
Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $1.5 / M Tokens

Qwen
Text Generation
Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct is a vision-language model in the Qwen3 series that demonstrates strong capabilities in general visual understanding, visual-centric dialogue, and multilingual text recognition in images...
Total Context: 262K
Max output: 262K
Input: $0.18 / M Tokens
Output: $0.68 / M Tokens

Qwen
Text Generation
Qwen3-VL-30B-A3B-Instruct
The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions...
Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Qwen
Text Generation
Qwen3-VL-30B-A3B-Thinking
The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions...
Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Qwen
Image-to-Video
Wan2.2-I2V-A14B
$0.29 / Video


Qwen
Text-to-Video
Wan2.2-T2V-A14B
$0.29 / Video


Qwen
Text-to-Image
Qwen-Image
$0.02 / Image


Qwen
Text-to-Image
Qwen-Image-Edit
$0.04 / Image


Qwen
Text Generation
Qwen3-Coder-480B-A35B
Qwen3-Coder-480B-A35B-Instruct is the most agentic code model released by Alibaba to date. It is a Mixture-of-Experts (MoE) model with 480 billion total parameters and 35 billion activated parameters, balancing efficiency and performance. The model natively supports a 256K (262,144) token context length, which can be extended up to 1 million tokens using extrapolation methods like YaRN, enabling it to handle repository-scale codebases and complex programming tasks. Qwen3-Coder is specifically designed for agentic coding workflows, where it not only generates code but also autonomously interacts with developer tools and environments to solve complex problems. It has achieved state-of-the-art results among open models on various coding and agentic benchmarks, with performance comparable to leading models like Claude Sonnet 4. Alongside the model, Alibaba has also open-sourced Qwen Code, a command-line tool designed to make full use of its agentic coding capabilities...
Total Context: 262K
Max output: 262K
Input: $0.25 / M Tokens
Output: $1.0 / M Tokens

Qwen
Text Generation
Qwen3-Coder-30B-A3B-Instruct
Qwen3-Coder-30B-A3B-Instruct is a code model from the Qwen3 series developed by Alibaba's Qwen team. As a streamlined and optimized model, it maintains impressive performance and efficiency while focusing on enhanced coding capabilities. It demonstrates significant performance advantages among open-source models on complex tasks such as Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. The model natively supports a long context of 256K tokens, which can be extended up to 1M tokens, enabling better repository-scale understanding and processing. Furthermore, it provides robust agentic coding support for platforms like Qwen Code and CLINE, featuring a specially designed function call format...
Total Context: 262K
Max output: 262K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen
Text Generation
Qwen3-30B-A3B-Instruct-2507
Qwen3-30B-A3B-Instruct-2507 is the updated version of the Qwen3-30B-A3B non-thinking mode. It is a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion activated parameters. This version features key enhancements, including significant improvements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and offers markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Furthermore, its long-context understanding has been extended to 256K tokens. This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output...
Total Context: 262K
Max output: 262K
Input: $0.09 / M Tokens
Output: $0.3 / M Tokens

Qwen
Text Generation
Qwen3-32B
Qwen3-32B is the latest large language model in the Qwen series with 32.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...
Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Qwen
Text Generation
Qwen3-14B
Qwen3-14B is the latest large language model in the Qwen series with 14.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...
Total Context: 131K
Max output: 131K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen
Text Generation
Qwen3-8B
Qwen3-8B is the latest large language model in the Qwen series with 8.2B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...
Total Context: 131K
Max output: 131K
Input: $0.06 / M Tokens
Output: $0.06 / M Tokens

Qwen
Reranker
Qwen3-Reranker-8B
Qwen3-Reranker-8B is the 8-billion parameter text reranking model from the Qwen3 series. It is designed to refine and improve the quality of search results by accurately re-ordering documents based on their relevance to a query. Built on the powerful Qwen3 foundational models, it excels in understanding long-text with a 32k context length and supports over 100 languages. The Qwen3-Reranker-8B model is part of a flexible series that offers state-of-the-art performance in various text and code retrieval scenarios...
$0.04 / M Tokens

Qwen
Embedding
Qwen3-Embedding-8B
Qwen3-Embedding-8B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 8B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 4096. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It ranks No.1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58) and demonstrates state-of-the-art performance across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 4096) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios...
Input: $0.04 / M Tokens
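The description above mentions flexible vector dimensions (32 to 4096) but does not say how a smaller dimension is obtained. One common approach, assumed here and not confirmed by this page, is Matryoshka-style truncation: keep the leading components of the full embedding and rescale to unit length so cosine similarity still behaves as expected. The helper name `truncate_embedding` is hypothetical, and some serving APIs may instead accept a dimension parameter directly.

```python
import math


def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of an embedding and L2-normalize them.

    Assumption: the embedding model was trained so that vector prefixes
    remain meaningful (Matryoshka-style); this page does not confirm that.
    """
    if not 1 <= dim <= len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}, got {dim}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a real 4096-dimensional embedding.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
print(short)
```

Smaller vectors reduce index size and search latency at some cost in retrieval quality, so the chosen dimension is worth validating on a representative query set.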

Qwen
Embedding
Qwen3-Embedding-4B
Qwen3-Embedding-4B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 4B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 2560. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves excellent performance on the MTEB multilingual leaderboard (score 69.45) and demonstrates outstanding results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 2560) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, providing an optimal balance between efficiency and effectiveness...
Input: $0.02 / M Tokens

Qwen
Reranker
Qwen3-Reranker-0.6B
Qwen3-Reranker-0.6B is a text reranking model from the Qwen3 series. It is specifically designed to refine the results from initial retrieval systems by re-ordering documents based on their relevance to a given query. With 0.6 billion parameters and a context length of 32k, this model leverages the strong multilingual (supporting over 100 languages), long-text understanding, and reasoning capabilities of its Qwen3 foundation. Evaluation results show that Qwen3-Reranker-0.6B achieves strong performance across various text retrieval benchmarks, including MTEB-R, CMTEB-R, and MLDR...
$0.01 / M Tokens

Qwen
Embedding
Qwen3-Embedding-0.6B
Qwen3-Embedding-0.6B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 0.6B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 1024. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves strong performance on the MTEB multilingual leaderboard (score 64.33) and demonstrates excellent results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 1024) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, making it an ideal choice for applications prioritizing both efficiency and effectiveness...
Input: $0.01 / M Tokens

Qwen
Text Generation
Qwen2.5-72B-Instruct
Qwen2.5-72B-Instruct is part of the Qwen2.5 series of large language models released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in following instructions, understanding structured data, and generating structured outputs, particularly in JSON format...
Total Context: 33K
Max output: 4K
Input: $0.59 / M Tokens
Output: $0.59 / M Tokens

Qwen
Text Generation
Qwen2.5-7B-Instruct
Qwen2.5-7B-Instruct is part of the Qwen2.5 series of large language models released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly JSON...
Total Context: 33K
Max output: 4K
Input: $0.05 / M Tokens
Output: $0.05 / M Tokens

