State-of-the-Art

AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

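All of the models below are served through a single HTTP API. As a quick orientation, here is a minimal sketch of a chat-completions call in Python; it assumes an OpenAI-compatible endpoint, and the base URL, API-key variable, and route are placeholders rather than this platform's documented values.

```python
# Minimal sketch of calling a hosted model through one HTTP API.
# Assumes an OpenAI-compatible chat-completions route; the base URL,
# API-key variable, and model ID format are placeholders.
import os
import requests

BASE_URL = "https://api.example.com/v1"  # hypothetical endpoint

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "Qwen/Qwen3-32B",
        "messages": [{"role": "user", "content": "Say hello in five languages."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```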

Qwen

Text Generation

Qwen3-VL-32B-Instruct

Released on: Oct 21, 2025

Qwen3-VL is the vision-language model in the Qwen3 series, achieving state-of-the-art (SOTA) performance on various vision-language (VL) benchmarks. The model supports high-resolution image inputs up to the megapixel level and possesses strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a powerful language foundation, enabling it to understand and execute complex instructions...

Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $0.6 / M Tokens
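With per-token pricing like the above, request cost is simple arithmetic. A worked example at Qwen3-VL-32B-Instruct's listed rates (the token counts are hypothetical):

```python
# Cost of one hypothetical request to Qwen3-VL-32B-Instruct at the listed
# rates: $0.2 per million input tokens, $0.6 per million output tokens.
input_tokens, output_tokens = 50_000, 4_000
cost = input_tokens / 1e6 * 0.20 + output_tokens / 1e6 * 0.60
print(f"${cost:.4f}")  # $0.0124
```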

Qwen

Text Generation

Qwen3-VL-32B-Thinking

Released on: Oct 21, 2025

Qwen3-VL-Thinking is a version of the Qwen3-VL series specially optimized for complex visual reasoning tasks. It incorporates a "Thinking Mode", enabling it to generate detailed intermediate reasoning steps (Chain-of-Thought) before providing a final answer. This design significantly enhances the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis...

Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Released on: Oct 15, 2025

Qwen3-VL-8B-Instruct is a vision-language model in the Qwen3 series that demonstrates strong capabilities in general visual understanding, visual-centric dialogue, and multilingual text recognition in images...

Total Context: 262K
Max output: 262K
Input: $0.18 / M Tokens
Output: $0.68 / M Tokens
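Vision-language cards like this one accept images alongside text. Below is a sketch of a multimodal message, assuming the widely used OpenAI-style content-parts format; the image URL and exact field names are illustrative, so check the provider's docs.

```python
# Sketch of a multimodal chat payload for a Qwen3-VL model, assuming
# OpenAI-style "image_url" content parts. POST it to the same
# chat-completions route as the text-only sketch above.
payload = {
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/invoice.png"}},  # placeholder image
            {"type": "text", "text": "Extract the total amount and the due date."},
        ],
    }],
}
```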

Qwen

Text Generation

Qwen3-VL-8B-Thinking

Released on: Oct 15, 2025

Qwen3-VL-8B-Thinking is a vision-language model from the Qwen3 series, optimized for scenarios requiring complex reasoning. In Thinking mode, the model performs step-by-step reasoning before providing its final answer...

Total Context: 262K
Max output: 262K
Input: $0.18 / M Tokens
Output: $2.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-235B-A22B-Instruct

Released on: Oct 4, 2025

Qwen3-VL-235B-A22B-Instruct is a 235B-parameter Mixture-of-Experts (MoE) vision-language model with 22B activated parameters. It is an instruction-tuned version of Qwen3-VL-235B-A22B and is aligned for chat applications...

Total Context: 262K
Max output: 262K
Input: $0.3 / M Tokens
Output: $1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-235B-A22B-Thinking

Released on: Oct 4, 2025

Qwen3-VL-235B-A22B-Thinking is one of the Qwen3-VL series models, a reasoning-enhanced Thinking edition that achieves state-of-the-art (SOTA) results across many multimodal reasoning benchmarks, excelling in STEM, math, causal analysis, and logical, evidence-based answers. It features a Mixture-of-Experts (MoE) architecture with 235B total parameters and 22B active parameters. ...

Total Context: 262K
Max output: 262K
Input: $0.45 / M Tokens
Output: $3.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

Released on: Oct 5, 2025

Qwen3-VL series delivers superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions....

Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

Released on: Oct 11, 2025

Qwen3-VL series delivers superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions....

Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Qwen

Image-to-Video

Wan2.2-I2V-A14B

Released on: Aug 13, 2025

$0.29 / Video

Qwen

Text-to-Video

Wan2.2-T2V-A14B

Released on: Aug 13, 2025

$0.29 / Video

Qwen

Text Generation

Qwen3-Omni-30B-A3B-Captioner

Released on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Captioner is a captioning model from Alibaba's Qwen team, part of the Qwen3-Omni series. It is specifically designed for generating high-quality, detailed, and accurate audio captions. Based on a 30B total parameter Mixture of Experts (MoE) architecture, the model can deeply understand audio content and translate it into rich, natural language text...

Total Context: 66K
Max output: 66K
Input: $0.1 / M Tokens
Output: $0.4 / M Tokens

Qwen

Text Generation

Qwen3-Omni-30B-A3B-Instruct

Released on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Instruct is a member of the latest Qwen3 series from Alibaba's Qwen team. It is a Mixture of Experts (MoE) model with 30 billion total parameters and 3 billion active parameters, which effectively reduces inference costs while maintaining powerful performance. The model was trained on high-quality, multi-source, and multilingual data, demonstrating excellent performance in basic capabilities such as multilingual dialogue, as well as in code, math...

Total Context: 66K
Max output: 66K
Input: $0.1 / M Tokens
Output: $0.4 / M Tokens

Qwen

Text Generation

Qwen3-Omni-30B-A3B-Thinking

Released on: Oct 4, 2025

Qwen3-Omni-30B-A3B-Thinking is the core "Thinker" component within the Qwen3-Omni omni-modal model's "Thinker-Talker" architecture. It is specifically designed to process multimodal inputs, including text, audio, images, and video, and to execute complex chain-of-thought reasoning. As the reasoning brain of the system, this model unifies all inputs into a common representational space for understanding and analysis, but its output is text-only. This design allows it to excel at solving complex problems that require deep thought and cross-modal understanding, such as mathematical problems presented in images, making it key to the powerful cognitive abilities of the entire Qwen3-Omni architecture...

Total Context: 66K
Max output: 66K
Input: $0.1 / M Tokens
Output: $0.4 / M Tokens

Qwen

Text Generation

Qwen3-Next-80B-A3B-Thinking

Released on: Sep 25, 2025

Qwen3-Next-80B-A3B-Thinking is a next-generation foundation model from Alibaba's Qwen team, specifically designed for complex reasoning tasks. It is built on the innovative Qwen3-Next architecture, which combines a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention) with a High-Sparsity Mixture-of-Experts (MoE) structure to achieve ultimate training and inference efficiency. As an 80-billion-parameter sparse model, it activates only about 3 billion parameters during inference, significantly reducing computational costs and delivering over 10 times higher throughput than the Qwen3-32B model on long-context tasks exceeding 32K tokens. This 'Thinking' version is optimized for demanding multi-step problems like mathematical proofs, code synthesis, logical analysis, and agentic planning, and it outputs structured 'thinking' traces by default. In terms of performance, it surpasses more costly models like Qwen3-32B-Thinking and has outperformed Gemini-2.5-Flash-Thinking on multiple benchmarks...

Total Context: 262K
Max output: 262K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Qwen

Text Generation

Qwen3-Next-80B-A3B-Instruct

Released on: Sep 18, 2025

Qwen3-Next-80B-A3B-Instruct is a next-generation foundation model released by Alibaba's Qwen team. It is built on the new Qwen3-Next architecture, designed for ultimate training and inference efficiency. The model incorporates innovative features such as a Hybrid Attention mechanism (Gated DeltaNet and Gated Attention), a High-Sparsity Mixture-of-Experts (MoE) structure, and various stability optimizations. As an 80-billion-parameter sparse model, it activates only about 3 billion parameters per token during inference, which significantly reduces computational costs and delivers over 10 times higher throughput than the Qwen3-32B model for long-context tasks exceeding 32K tokens. This is an instruction-tuned version optimized for general-purpose tasks and does not support 'thinking' mode. In terms of performance, it is comparable to Qwen's flagship model, Qwen3-235B, on certain benchmarks, showing significant advantages in ultra-long-context scenarios...

Total Context: 262K
Max output: 262K
Input: $0.14 / M Tokens
Output: $1.4 / M Tokens

Qwen

Text-to-Image

Qwen-Image

Released on: Sep 15, 2025

$0.02 / Image
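Image models are billed per generated image rather than per token. Here is a sketch of a text-to-image request, assuming an OpenAI-style images route; the route, field names, and response shape vary by provider and are assumptions here.

```python
# Sketch of a text-to-image request for Qwen-Image ($0.02 per image).
# The /images/generations route and field names are assumptions.
import os
import requests

resp = requests.post(
    "https://api.example.com/v1/images/generations",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={"model": "Qwen/Qwen-Image",
          "prompt": "a watercolor fox in a pine forest"},
    timeout=120,
)
print(resp.json())  # typically a URL or base64 payload for the image
```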

Qwen

Text Generation

Qwen3-Coder-480B-A35B-Instruct

Released on: Jul 31, 2025

Qwen3-Coder-480B-A35B-Instruct is the most agentic code model released by Alibaba to date. It is a Mixture-of-Experts (MoE) model with 480 billion total parameters and 35 billion activated parameters, balancing efficiency and performance. The model natively supports a 256K (approximately 262,144) token context length, which can be extended up to 1 million tokens using extrapolation methods like YaRN, enabling it to handle repository-scale codebases and complex programming tasks. Qwen3-Coder is specifically designed for agentic coding workflows, where it not only generates code but also autonomously interacts with developer tools and environments to solve complex problems. It has achieved state-of-the-art results among open models on various coding and agentic benchmarks, with performance comparable to leading models like Claude Sonnet 4. Alongside the model, Alibaba has also open-sourced Qwen Code, a command-line tool designed to fully unleash its powerful agentic coding capabilities...

Total Context: 262K
Max output: 262K
Input: $0.25 / M Tokens
Output: $1.0 / M Tokens
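The 256K-to-1M extension mentioned above applies to self-hosted deployments, where the Qwen model cards describe enabling YaRN rope scaling in the model's config.json. A rough illustration of that fragment follows, written here as a Python dict; treat the exact keys and factor as assumptions and take real values from the official model card.

```python
# Illustrative rope-scaling fragment for extending Qwen3-Coder's native
# 256K context roughly 4x via YaRN when self-hosting. Keys mirror the
# Hugging Face config.json convention; values are assumptions.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}
```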

Qwen

Image-to-Image

Qwen-Image-Edit

Released on: Sep 18, 2025

$0.04 / Image

Qwen

Text Generation

Qwen3-Coder-30B-A3B-Instruct

Released on: Aug 1, 2025

Qwen3-Coder-30B-A3B-Instruct is a code model from the Qwen3 series developed by Alibaba's Qwen team. As a streamlined and optimized model, it maintains impressive performance and efficiency while focusing on enhanced coding capabilities. It demonstrates significant performance advantages among open-source models on complex tasks such as Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. The model natively supports a long context of 256K tokens, which can be extended up to 1M tokens, enabling better repository-scale understanding and processing. Furthermore, it provides robust agentic coding support for platforms like Qwen Code and CLINE, featuring a specially designed function call format...

Total Context: 262K
Max output: 262K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen

Text Generation

Qwen3-30B-A3B-Instruct-2507

Released on: Jul 30, 2025

Qwen3-30B-A3B-Instruct-2507 is the updated version of the Qwen3-30B-A3B non-thinking mode. It is a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion activated parameters. This version features key enhancements, including significant improvements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and offers markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Furthermore, its capabilities in long-context understanding have been enhanced to 256K. This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output...

Total Context: 262K
Max output: 262K
Input: $0.09 / M Tokens
Output: $0.3 / M Tokens

Qwen

Text Generation

Qwen3-30B-A3B-Thinking-2507

Released on: Jul 31, 2025

Qwen3-30B-A3B-Thinking-2507 is the latest thinking model in the Qwen3 series, released by Alibaba's Qwen team. As a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion active parameters, it is focused on enhancing capabilities for complex tasks. The model demonstrates significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise. It also shows markedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences. The model natively supports a 256K long-context understanding capability, which can be extended to 1 million tokens. This version is specifically designed for ‘thinking mode’ to tackle highly complex problems through step-by-step reasoning and also excels in agentic capabilities...

Total Context: 262K
Max output: 131K
Input: $0.09 / M Tokens
Output: $0.3 / M Tokens
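The Thinking editions emit their chain-of-thought before the final answer, while the 2507 Instruct editions omit the `<think></think>` block entirely, so client code often needs to separate the trace from the reply. A minimal sketch, assuming the `<think>...</think>` serialization mentioned in these cards:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning trace, final answer).

    Assumes thinking models wrap their chain-of-thought in a leading
    <think>...</think> block; non-thinking responses return ("", text).
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

trace, answer = split_thinking("<think>30.5B total, 3.3B active.</think>It is an MoE model.")
print(answer)  # It is an MoE model.
```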

Qwen

Text Generation

Qwen3-235B-A22B-Instruct-2507

Released on: Jul 23, 2025

Qwen3-235B-A22B-Instruct-2507 is a flagship Mixture-of-Experts (MoE) large language model from the Qwen3 series, developed by Alibaba Cloud's Qwen team. The model has a total of 235 billion parameters, with 22 billion activated per forward pass. It was released as an updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant enhancements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. Additionally, the model provides substantial gains in long-tail knowledge coverage across multiple languages and shows markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Notably, it natively supports an extensive 256K (262,144 tokens) context window, which enhances its capabilities for long-context understanding. This version exclusively supports the non-thinking mode and does not generate <think> blocks, aiming to deliver more efficient and precise responses for tasks like direct Q&A and knowledge retrieval...

Total Context: 262K
Max output: 262K
Input: $0.09 / M Tokens
Output: $0.6 / M Tokens

Qwen

Text Generation

Qwen3-235B-A22B-Thinking-2507

Released on: Jul 28, 2025

Qwen3-235B-A22B-Thinking-2507 is a member of the Qwen3 large language model series developed by Alibaba's Qwen team, specializing in highly complex reasoning tasks. The model is built on a Mixture-of-Experts (MoE) architecture, with 235 billion total parameters and approximately 22 billion activated parameters per token, which enhances computational efficiency while maintaining powerful performance. As a dedicated 'thinking' model, it demonstrates significantly improved performance on tasks requiring human expertise, such as logical reasoning, mathematics, science, coding, and academic benchmarks, achieving state-of-the-art results among open-source thinking models. Furthermore, the model features enhanced general capabilities like instruction following, tool usage, and text generation, and it natively supports a 256K long-context understanding capability, making it ideal for scenarios that require deep reasoning and processing of long documents...

Total Context: 262K
Max output: 262K
Input: $0.13 / M Tokens
Output: $0.6 / M Tokens

Qwen

Text Generation

Qwen3-30B-A3B

Released on: Apr 30, 2025

Qwen3-30B-A3B is the latest large language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 30.5B total parameters and 3.3B activated parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities and superior human preference alignment in creative writing, role-playing, and multi-turn dialogues. The model excels in agent capabilities for precise integration with external tools and supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.09 / M Tokens
Output: $0.45 / M Tokens

Qwen

Text Generation

Qwen3-32B

Released on: Apr 30, 2025

Qwen3-32B is the latest large language model in the Qwen series with 32.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Qwen

Text Generation

Qwen3-14B

Released on: Apr 30, 2025

Qwen3-14B is the latest large language model in the Qwen series with 14.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen

Text Generation

Qwen3-8B

Released on: Apr 30, 2025

Qwen3-8B is the latest large language model in the Qwen series with 8.2B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.06 / M Tokens
Output: $0.06 / M Tokens

Qwen

Reranker

Qwen3-Reranker-8B

Released on: Jun 6, 2025

Qwen3-Reranker-8B is the 8-billion parameter text reranking model from the Qwen3 series. It is designed to refine and improve the quality of search results by accurately re-ordering documents based on their relevance to a query. Built on the powerful Qwen3 foundational models, it excels in understanding long-text with a 32k context length and supports over 100 languages. The Qwen3-Reranker-8B model is part of a flexible series that offers state-of-the-art performance in various text and code retrieval scenarios...

$0.04 / M Tokens
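Rerankers take a query plus candidate documents and return relevance-ordered scores, typically as a second stage after embedding-based retrieval. A sketch of such a request follows, assuming a Cohere/Jina-style /rerank route; the route and field names are assumptions.

```python
# Sketch of a rerank request for Qwen3-Reranker-8B, assuming a
# Cohere/Jina-style /rerank route; route and field names are assumptions.
payload = {
    "model": "Qwen/Qwen3-Reranker-8B",
    "query": "How do MoE models cut inference cost?",
    "documents": [
        "Mixture-of-Experts activates only a few experts per token.",
        "Attention weighs pairwise token interactions.",
        "Sparse activation reduces FLOPs per forward pass.",
    ],
}
# The response is typically a list of (index, relevance_score) pairs,
# which you sort in descending order to re-rank the original documents.
```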

Qwen

Embedding

Qwen3-Embedding-8B

Released on: Jun 6, 2025

Qwen3-Embedding-8B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 8B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 4096. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It ranks No.1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58) and demonstrates state-of-the-art performance across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 4096) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios...

Input: $0.04 / M Tokens
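Embedding models return vectors whose similarity can be compared directly. A sketch, assuming an OpenAI-style /embeddings route (the endpoint and key variable are placeholders, as in the first sketch):

```python
# Sketch: embed two strings with Qwen3-Embedding-8B and compare them.
# Assumes an OpenAI-style /embeddings route; the endpoint is a placeholder.
import math
import os
import requests

resp = requests.post(
    "https://api.example.com/v1/embeddings",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={"model": "Qwen/Qwen3-Embedding-8B",
          "input": ["mixture of experts", "sparse expert routing"]},
    timeout=60,
)
a, b = (item["embedding"] for item in resp.json()["data"])

def cosine(u, v):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

print(cosine(a, b))
```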

Qwen

Reranker

Qwen3-Reranker-4B

Released on: Jun 6, 2025

Qwen3-Reranker-4B is a powerful text reranking model from the Qwen3 series, featuring 4 billion parameters. It is engineered to significantly improve the relevance of search results by re-ordering an initial list of documents based on a query. This model inherits the core strengths of its Qwen3 foundation, including exceptional understanding of long-text (up to 32k context length) and robust capabilities across more than 100 languages. According to benchmarks, the Qwen3-Reranker-4B model demonstrates superior performance in various text and code retrieval evaluations...

$0.02 / M Tokens

Qwen

Embedding

Qwen3-Embedding-4B

Released on: Jun 6, 2025

Qwen3-Embedding-4B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 4B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 2560. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves excellent performance on the MTEB multilingual leaderboard (score 69.45) and demonstrates outstanding results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 2560) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, providing an optimal balance between efficiency and effectiveness...

Input: $0.02 / M Tokens

Qwen

Reranker

Qwen3-Reranker-0.6B

Released on: Jun 6, 2025

Qwen3-Reranker-0.6B is a text reranking model from the Qwen3 series. It is specifically designed to refine the results from initial retrieval systems by re-ordering documents based on their relevance to a given query. With 0.6 billion parameters and a context length of 32k, this model leverages the strong multilingual (supporting over 100 languages), long-text understanding, and reasoning capabilities of its Qwen3 foundation. Evaluation results show that Qwen3-Reranker-0.6B achieves strong performance across various text retrieval benchmarks, including MTEB-R, CMTEB-R, and MLDR...

$0.01 / M Tokens

Qwen

Embedding

Qwen3-Embedding-0.6B

Released on: Jun 6, 2025

Qwen3-Embedding-0.6B is the latest proprietary model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 0.6B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 1024. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves strong performance on the MTEB multilingual leaderboard (score 64.33) and demonstrates excellent results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 1024) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, making it an ideal choice for applications prioritizing both efficiency and effectiveness...

Input: $0.01 / M Tokens

Qwen

Text Generation

Qwen2.5-VL-32B-Instruct

Released on: Mar 24, 2025

Qwen2.5-VL-32B-Instruct is a multimodal large language model released by the Qwen team, part of the Qwen2.5-VL series. This model is not only proficient in recognizing common objects but is highly capable of analyzing texts, charts, icons, graphics, and layouts within images. It acts as a visual agent that can reason and dynamically direct tools, capable of computer and phone use. Additionally, the model can accurately localize objects in images, and generate structured outputs for data like invoices and tables. Compared to its predecessor Qwen2-VL, this version has enhanced mathematical and problem-solving abilities through reinforcement learning, with response styles adjusted to better align with human preferences...

Total Context: 131K
Max output: 131K
Input: $0.27 / M Tokens
Output: $0.27 / M Tokens

Qwen

Text Generation

Qwen3-235B-A22B

Released on: Apr 30, 2025

Qwen3-235B-A22B is the latest large language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 235B total parameters and 22B activated parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities and superior human preference alignment in creative writing, role-playing, and multi-turn dialogues. The model excels in agent capabilities for precise integration with external tools and supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.35 / M Tokens
Output: $1.42 / M Tokens

Qwen

Text Generation

QwQ-32B

Released on: Mar 6, 2025

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, achieves significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini. The model incorporates technologies like RoPE, SwiGLU, RMSNorm, and attention QKV bias, with 64 layers and 40 query attention heads (8 key/value heads in the GQA architecture)...

Total Context: 131K
Max output: 131K
Input: $0.15 / M Tokens
Output: $0.58 / M Tokens

Qwen

Text Generation

Qwen2.5-VL-72B-Instruct

Released on: Jan 28, 2025

Qwen2.5-VL is a vision-language model in the Qwen2.5 series that shows significant enhancements in several aspects: it has strong visual understanding capabilities, recognizing common objects while analyzing texts, charts, and layouts in images; it functions as a visual agent capable of reasoning and dynamically directing tools; it can comprehend videos over 1 hour long and capture key events; it accurately localizes objects in images by generating bounding boxes or points; and it supports structured outputs for scanned data like invoices and forms. The model demonstrates excellent performance across various benchmarks including image, video, and agent tasks...

Total Context: 131K
Max output: 4K
Input: $0.59 / M Tokens
Output: $0.59 / M Tokens

Qwen

Text Generation

Qwen2.5-VL-7B-Instruct

Released on: Jan 28, 2025

Qwen2.5-VL is a new member of the Qwen series, equipped with powerful visual comprehension capabilities. It can analyze text, charts, and layouts within images, understand long videos, and capture events. It is capable of reasoning, manipulating tools, supporting multi-format object localization, and generating structured outputs. The model has been optimized for dynamic resolution and frame rate training in video understanding, and has improved the efficiency of the visual encoder....

Total Context: 33K
Max output: 4K
Input: $0.05 / M Tokens
Output: $0.05 / M Tokens

Qwen

Text Generation

Qwen2.5-Coder-32B-Instruct

Released on: Nov 11, 2024

Qwen2.5-Coder-32B-Instruct is a code-specific large language model developed based on Qwen2.5. The model has been trained on 5.5 trillion tokens, achieving significant improvements in code generation, code reasoning, and code repair. It is currently the most advanced open-source code language model, with coding capabilities comparable to GPT-4. Beyond its enhanced coding abilities, the model also maintains strengths in mathematics and general capabilities, and supports long-text processing...

Total Context: 33K
Max output: 4K
Input: $0.18 / M Tokens
Output: $0.18 / M Tokens

Qwen

Text Generation

Qwen2.5-72B-Instruct-128K

Released on: Sep 18, 2024

Qwen2.5-72B-Instruct is part of the latest series of large language models released by Alibaba Cloud. This 72B model demonstrates significant improvements in areas such as coding and mathematics, and supports a context length of up to 128K tokens. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It has shown notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 131K
Max output: 4K
Input: $0.59 / M Tokens
Output: $0.59 / M Tokens

Qwen

Text Generation

Qwen2.5-72B-Instruct

Released on: Sep 18, 2024

Qwen2.5-72B-Instruct is part of the latest series of large language models released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in following instructions, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K
Max output: 4K
Input: $0.59 / M Tokens
Output: $0.59 / M Tokens

Qwen

Text Generation

Qwen2.5-32B-Instruct

Released on: Sep 19, 2024

Qwen2.5-32B-Instruct is part of the latest series of large language models released by Alibaba Cloud. This 32B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K
Max output: 4K
Input: $0.18 / M Tokens
Output: $0.18 / M Tokens

Qwen

Text Generation

Qwen2.5-14B-Instruct

Released on: Sep 18, 2024

Qwen2.5-14B-Instruct is part of the latest series of large language models released by Alibaba Cloud. This 14B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It has shown notable advancements in instruction following, understanding structured data, and generating structured outputs, particularly in JSON format...

Total Context: 33K
Max output: 4K
Input: $0.1 / M Tokens
Output: $0.1 / M Tokens

Qwen

Text Generation

Qwen2.5-7B-Instruct

Released on: Sep 18, 2024

Qwen2.5-7B-Instruct is part of the latest series of large language models released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. It shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly JSON...

Total Context: 33K
Max output: 4K
Input: $0.05 / M Tokens
Output: $0.05 / M Tokens

Ready to accelerate your AI development?
