State-of-the-Art

AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

MiniMaxAI

Text Generation

MiniMax-M2.5

MiniMax-M2.5 is MiniMax's latest large language model, extensively trained with reinforcement learning across hundreds of thousands of complex real-world environments. Built on a 229B-parameter MoE architecture, it achieves SOTA performance in coding, agentic tool use, search, and office work, scoring 80.2% on SWE-Bench Verified with 37% faster inference than M2.1...

Total Context: 197K
Max output: 131K
Input: $0.3 / M Tokens
Output: $1.2 / M Tokens
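As a rough illustration of how the per-million-token prices on these cards translate into a per-request cost, here is a minimal sketch. The function name `request_cost_usd` and the example token counts are hypothetical; only the two prices come from the MiniMax-M2.5 card above, and actual billing rules (rounding, caching discounts, minimums) are not described in this listing.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Return the estimated USD cost of one request.

    Prices are expressed per one million tokens, as on the model cards.
    """
    tokens_per_million = 1_000_000
    input_cost = input_tokens / tokens_per_million * input_price_per_m
    output_cost = output_tokens / tokens_per_million * output_price_per_m
    return input_cost + output_cost


# Prices from the MiniMax-M2.5 card: $0.3 input, $1.2 output per M tokens.
# The token counts below are made-up example values.
cost = request_cost_usd(100_000, 20_000, 0.3, 1.2)
print(f"Estimated cost: ${cost:.4f}")
```

The same function works for any card in the list by swapping in that card's input and output prices.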

Z.ai

Text Generation

GLM-5

GLM-5 is a next-generation open-source model for complex systems engineering and long-horizon agentic tasks, scaled to ~744B sparse parameters (~40B active) with ~28.5T pretraining tokens. It integrates DeepSeek Sparse Attention (DSA) to retain long-context capacity while reducing inference cost, and leverages the “slime” asynchronous RL stack to deliver strong performance in reasoning, coding, and agentic benchmarks....

Total Context: 205K
Max output: 131K
Input: $0.3 / M Tokens
Output: $2.55 / M Tokens

StepFun

Text Generation

Step-3.5-Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model, built on a sparse Mixture of Experts (MoE) architecture with 196B total parameters and only 11B activated per token. It supports a 262K context window and achieves 100-300 tok/s generation throughput via 3-way Multi-Token Prediction (MTP-3). The model excels at coding and agentic tasks, achieving 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0...

Total Context: 262K
Max output: 66K
Input: $0.1 / M Tokens
Output: $0.3 / M Tokens
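Several cards quote both total and activated parameter counts for sparse MoE models. The sketch below simply computes the share of parameters used per token from the figures quoted in the Step-3.5-Flash and GLM-5 descriptions above. The helper name `active_fraction` is hypothetical, and the GLM-5 figures are approximate in the source ("~744B", "~40B").

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of parameters activated per token (0.0 to 1.0)."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and total_params_b")
    return active_params_b / total_params_b


# (total, active) parameter counts in billions, as quoted on the cards.
moe_models = {
    "Step-3.5-Flash": (196, 11),
    "GLM-5": (744, 40),  # approximate figures in the source
}

for name, (total_b, active_b) in moe_models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

A small active share is what lets these models keep per-token compute well below what the total parameter count would suggest.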

Moonshot AI

Text Generation

Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. With a 1T-parameter MoE architecture (32B active) and 256K context length, it seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes, as well as conversational and agentic paradigms...

Total Context: 262K
Max output: 262K
Input: $0.23 / M Tokens
Output: $3.0 / M Tokens

Z.ai

Text Generation

GLM-4.7

GLM-4.7 is Zhipu’s new-generation flagship model, with 355B total parameters and 32B activated parameters, delivering comprehensive upgrades in general conversation, reasoning, and agent capabilities. Responses are more concise and natural; writing feels more immersive; tool-call instructions are followed more reliably; and the front-end polish of artifacts and agentic coding—along with long-horizon task completion efficiency—has been further improved....

Total Context: 205K
Max output: 205K
Input: $0.42 / M Tokens
Output: $2.2 / M Tokens

DeepSeek

Text Generation

DeepSeek-V3.2

DeepSeek-V3.2 is a model that harmonizes high computational efficiency with superior reasoning and agent performance. Its approach is built upon three key technical breakthroughs: DeepSeek Sparse Attention (DSA), an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios; a Scalable Reinforcement Learning Framework, which enables performance comparable to GPT-5 and reasoning proficiency on par with Gemini-3.0-Pro in its high-compute variant; and a Large-Scale Agentic Task Synthesis Pipeline to integrate reasoning into tool-use scenarios, improving compliance and generalization in complex interactive environments. The model has achieved gold-medal performance in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI)...

Total Context: 164K
Max output: 164K
Input: $0.27 / M Tokens
Output: $0.42 / M Tokens

DeepSeek

Text Generation

DeepSeek-V3.2-Exp

DeepSeek-V3.2-Exp is an experimental version of the DeepSeek model, built on V3.1-Terminus. It debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training and inference on long contexts....

Total Context: 164K
Max output: 164K
Input: $0.27 / M Tokens
Output: $0.41 / M Tokens

Z.ai

Text Generation

GLM-4.6V

GLM-4.6V achieves SOTA (State-of-the-Art) accuracy in visual understanding among models of the same parameter scale. For the first time, it natively integrates Function Call capabilities into the visual model architecture, bridging the gap between "Visual Perception" and "Executable Action." This provides a unified technical foundation for multimodal Agents in real-world business scenarios. Additionally, the visual context window has been expanded to 128k, supporting long video stream processing and high-resolution multi-image analysis....

Total Context: 131K
Max output: 131K
Input: $0.3 / M Tokens
Output: $0.9 / M Tokens

DeepSeek

Text Generation

DeepSeek-V3.1-Terminus

DeepSeek-V3.1-Terminus is an updated version that builds on V3.1’s strengths while addressing key user feedback. It improves language consistency, reducing instances of mixed Chinese-English text and occasional abnormal characters, and delivers stronger Code Agent and Search Agent performance....

Total Context: 164K
Max output: 164K
Input: $0.27 / M Tokens
Output: $1 / M Tokens

DeepSeek

Text Generation

DeepSeek-V3.1

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly....

Total Context: 164K
Max output: 164K
Input: $0.27 / M Tokens
Output: $1 / M Tokens

DeepSeek

Text Generation

DeepSeek-V3

DeepSeek-V3-0324 demonstrates notable improvements over its predecessor, DeepSeek-V3, in several key aspects, including a major boost in reasoning performance, stronger front-end development skills, and smarter tool-use capabilities....

Total Context: 164K
Max output: 164K
Input: $0.25 / M Tokens
Output: $1 / M Tokens

DeepSeek

Text Generation

DeepSeek-R1

DeepSeek-R1-0528 is an upgraded model that shows significant improvements in handling complex reasoning tasks. It also offers a reduced hallucination rate, enhanced support for function calling, and a better experience for vibe coding. It achieves performance comparable to o3 and Gemini 2.5 Pro....

Total Context: 164K
Max output: 164K
Input: $0.5 / M Tokens
Output: $2.18 / M Tokens

Nex AGI

Text Generation

DeepSeek-V3.1-Nex-N1

DeepSeek-V3.1-Nex-N1 is a large language model developed based on leading open-source models and optimized through post-training. This optimization significantly enhances its agentic capabilities, leading to outstanding performance in agent tasks, code generation and understanding, tool usage, and role-playing. The model excels at decomposing complex tasks into multi-step plans and proactively clarifying ambiguities to ensure reliable and accurate execution....

Total Context: 131K
Max output: 164K
Input: $0.27 / M Tokens
Output: $1 / M Tokens

Moonshot AI

Text Generation

Kimi-K2-Instruct-0905

Kimi K2-Instruct-0905, a state-of-the-art mixture-of-experts (MoE) language model, is the latest, most capable version of Kimi K2. Key features include enhanced coding capabilities, especially in front-end development and tool calling, a context length extended to 256K tokens, and improved integration with various agent scaffolds....

Total Context: 262K
Max output: 262K
Input: $0.4 / M Tokens
Output: $2 / M Tokens

OpenAI

Text Generation

gpt-oss-120b

The gpt-oss series comprises OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production, general-purpose, high-reasoning use cases and fits on a single 80GB GPU (such as the NVIDIA H100 or AMD MI300X)....

Total Context: 131K
Max output: 8K
Input: $0.05 / M Tokens
Output: $0.45 / M Tokens
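The claim that a model of this size fits on a single 80GB GPU can be sanity-checked with back-of-the-envelope arithmetic on weight storage. The sketch below is only that: it takes the nominal "120b" in the model name as 120 billion parameters, uses hypothetical bit widths for illustration, and ignores the KV cache, activations, and runtime overhead. The card does not state which precision is actually used.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


NOMINAL_PARAMS = 120e9  # assumption: nominal size taken from the model name
GPU_MEMORY_GB = 80      # from the card: "a single 80GB GPU"

# Hypothetical bit widths, chosen only to illustrate the arithmetic.
for bits in (16, 8, 4):
    needed = weight_memory_gb(NOMINAL_PARAMS, bits)
    verdict = "fits" if needed <= GPU_MEMORY_GB else "does not fit"
    print(f"{bits:>2} bits/param: {needed:6.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, weights alone would only fit at roughly 5 bits per parameter or fewer, which suggests the single-GPU claim depends on low-precision weights.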

OpenAI

Text Generation

gpt-oss-20b

The gpt-oss series comprises OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b targets lower-latency, local, or specialized use cases....

Total Context: 131K
Max output: 8K
Input: $0.04 / M Tokens
Output: $0.18 / M Tokens

Z.ai

Text Generation

GLM-4.6

Compared with GLM-4.5, GLM-4.6 brings several key improvements, including a context window expanded to 200K tokens, superior coding performance, advanced reasoning, more capable agents, and refined writing....

Total Context: 205K
Max output: 205K
Input: $0.39 / M Tokens
Output: $1.9 / M Tokens

Z.ai

Text Generation

GLM-4.5-Air

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. It is also a hybrid reasoning model that provides both thinking and non-thinking modes. ...

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.86 / M Tokens

inclusionAI

Text Generation

Ling-flash-2.0

Ling-flash-2.0 is a language model from inclusionAI with a total of 100 billion parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). As part of the Ling 2.0 architecture series, it is designed as a lightweight yet powerful Mixture-of-Experts (MoE) model. It aims to deliver performance comparable to or even exceeding that of 40B-level dense models and other larger MoE models, but with a significantly smaller active parameter count. The model represents a strategy focused on achieving high performance and efficiency through extreme architectural design and training methods...

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

inclusionAI

Text Generation

Ring-flash-2.0

Ring-flash-2.0 is a high-performance thinking model, deeply optimized based on Ling-flash-2.0-base. It is a Mixture-of-Experts (MoE) model with a total of 100B parameters, but only 6.1B are activated per inference. The model leverages the independently developed 'icepop' algorithm to address the training instability challenges in reinforcement learning (RL) for MoE LLMs, enabling continuous improvement of its complex reasoning capabilities throughout extended RL training cycles. Ring-flash-2.0 demonstrates significant breakthroughs across challenging benchmarks, including math competitions, code generation, and logical reasoning. Its performance surpasses that of SOTA dense models under 40B parameters and rivals larger open-weight MoE models and closed-source high-performance thinking model APIs. More surprisingly, although Ring-flash-2.0 is primarily designed for complex reasoning, it also shows strong capabilities in creative writing. Thanks to its efficient architecture, it achieves high-speed inference, significantly reducing inference costs for thinking models in high-concurrency scenarios...

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Qwen

Text Generation

Qwen3-Coder-480B-A35B-Instruct

Qwen3-Coder-480B-A35B-Instruct is the most agentic code model released by Alibaba to date. It is a Mixture-of-Experts (MoE) model with 480 billion total parameters and 35 billion activated parameters, balancing efficiency and performance. The model natively supports a 256K (262,144) token context length, which can be extended up to 1 million tokens using extrapolation methods like YaRN, enabling it to handle repository-scale codebases and complex programming tasks. Qwen3-Coder is specifically designed for agentic coding workflows, where it not only generates code but also autonomously interacts with developer tools and environments to solve complex problems. It has achieved state-of-the-art results among open models on various coding and agentic benchmarks, with performance comparable to leading models like Claude Sonnet 4. Alongside the model, Alibaba has also open-sourced Qwen Code, a command-line tool designed to fully unleash its powerful agentic coding capabilities...

Total Context: 262K
Max output: 262K
Input: $0.25 / M Tokens
Output: $1 / M Tokens
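The context figures on this card can be checked with simple arithmetic: "256K" in binary units is 262,144 tokens, which matches the 262K shown as Total Context. The sketch below also computes the ratio between a 1M-token target and the native window. Treating "1 million" as 1,048,576 tokens, and treating that ratio as the scaling factor a YaRN-style extension would be configured with, are both assumptions made here for illustration. The card does not give configuration details.

```python
NATIVE_CONTEXT = 256 * 1024      # 256K tokens in binary units
TARGET_CONTEXT = 1024 * 1024     # assumption: "1 million" read as 1,048,576

# Assumed relationship: extension factor = target length / native length.
extension_factor = TARGET_CONTEXT / NATIVE_CONTEXT

print(f"Native context: {NATIVE_CONTEXT:,} tokens")
print(f"Target context: {TARGET_CONTEXT:,} tokens")
print(f"Extension factor: {extension_factor}")
```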

Qwen

Text Generation

Qwen3-Coder-30B-A3B-Instruct

Qwen3-Coder-30B-A3B-Instruct is a code model from the Qwen3 series developed by Alibaba's Qwen team. As a streamlined and optimized model, it maintains impressive performance and efficiency while focusing on enhanced coding capabilities. It demonstrates significant performance advantages among open-source models on complex tasks such as Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. The model natively supports a long context of 256K tokens, which can be extended up to 1M tokens, enabling better repository-scale understanding and processing. Furthermore, it provides robust agentic coding support for platforms like Qwen Code and CLINE, featuring a specially designed function call format...

Total Context: 262K
Max output: 262K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen

Text Generation

Qwen3-30B-A3B-Instruct-2507

Qwen3-30B-A3B-Instruct-2507 is the updated version of the Qwen3-30B-A3B non-thinking mode. It is a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion activated parameters. This version features key enhancements, including significant improvements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and offers markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Furthermore, its capabilities in long-context understanding have been enhanced to 256K. This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output...

Total Context: 262K
Max output: 262K
Input: $0.09 / M Tokens
Output: $0.3 / M Tokens

Qwen

Text Generation

Qwen3-30B-A3B-Thinking-2507

Qwen3-30B-A3B-Thinking-2507 is the latest thinking model in the Qwen3 series, released by Alibaba's Qwen team. As a Mixture-of-Experts (MoE) model with 30.5 billion total parameters and 3.3 billion active parameters, it is focused on enhancing capabilities for complex tasks. The model demonstrates significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise. It also shows markedly better general capabilities, such as instruction following, tool usage, text generation, and alignment with human preferences. The model natively supports a 256K long-context understanding capability, which can be extended to 1 million tokens. This version is specifically designed for ‘thinking mode’ to tackle highly complex problems through step-by-step reasoning and also excels in agentic capabilities...

Total Context: 262K
Max output: 131K
Input: $0.09 / M Tokens
Output: $0.3 / M Tokens
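The Qwen3 cards distinguish thinking models from non-thinking ones that do not emit `<think></think>` blocks. For callers who want to separate reasoning text from the final answer, a minimal sketch follows. It assumes the reasoning arrives inline, wrapped in a matching `<think>...</think>` pair. That is an assumption: real deployments may emit only a closing tag or return reasoning in a separate response field. The helper name `split_thinking` is hypothetical.

```python
import re

# Non-greedy match so multiple think blocks are handled separately;
# DOTALL lets the reasoning span several lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Output without any think block yields an empty reasoning string.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


raw_output = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(raw_output)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Output from a non-thinking model passes through unchanged, so the same helper can sit in front of both kinds of model.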

Qwen

Text Generation

Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is a flagship Mixture-of-Experts (MoE) large language model from the Qwen3 series, developed by Alibaba Cloud's Qwen team. The model has a total of 235 billion parameters, with 22 billion activated per forward pass. It was released as an updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant enhancements in general capabilities such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. Additionally, the model provides substantial gains in long-tail knowledge coverage across multiple languages and shows markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation. Notably, it natively supports an extensive 256K (262,144 tokens) context window, which enhances its capabilities for long-context understanding. This version exclusively supports the non-thinking mode and does not generate <think> blocks, aiming to deliver more efficient and precise responses for tasks like direct Q&A and knowledge retrieval...

Total Context: 262K
Max output: 262K
Input: $0.09 / M Tokens
Output: $0.6 / M Tokens

Qwen

Text Generation

Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507 is a member of the Qwen3 large language model series developed by Alibaba's Qwen team, specializing in highly complex reasoning tasks. The model is built on a Mixture-of-Experts (MoE) architecture, with 235 billion total parameters and approximately 22 billion activated parameters per token, which enhances computational efficiency while maintaining powerful performance. As a dedicated 'thinking' model, it demonstrates significantly improved performance on tasks requiring human expertise, such as logical reasoning, mathematics, science, coding, and academic benchmarks, achieving state-of-the-art results among open-source thinking models. Furthermore, the model features enhanced general capabilities like instruction following, tool usage, and text generation, and it natively supports a 256K long-context understanding capability, making it ideal for scenarios that require deep reasoning and processing of long documents...

Total Context: 262K
Max output: 262K
Input: $0.13 / M Tokens
Output: $0.6 / M Tokens

ByteDance

Text Generation

Seed-OSS-36B-Instruct

Seed-OSS is a series of open-source large language models developed by the ByteDance Seed team, designed for powerful long-context processing, reasoning, agent capabilities, and general-purpose abilities. Within this series, Seed-OSS-36B-Instruct is an instruction-tuned model with 36 billion parameters that natively supports an ultra-long context length, enabling it to process massive documents or complex codebases in a single pass. The model is specially optimized for reasoning, code generation, and agent tasks (such as tool use), while maintaining balanced and excellent general-purpose capabilities. A key feature of this model is the ‘Thinking Budget’ function, which allows users to flexibly adjust the reasoning length as needed, thereby effectively improving inference efficiency in practical applications...

Total Context: 262K
Max output: 262K
Input: $0.21 / M Tokens
Output: $0.57 / M Tokens

BAIDU

Text Generation

ERNIE-4.5-300B-A47B

ERNIE-4.5-300B-A47B is a large language model developed by Baidu based on a Mixture-of-Experts (MoE) architecture. The model has a total of 300 billion parameters, but only activates 47 billion parameters per token during inference, thus balancing powerful performance with computational efficiency. As one of the core models in the ERNIE 4.5 series, it is trained on the PaddlePaddle deep learning framework and demonstrates outstanding capabilities in tasks such as text understanding, generation, reasoning, and coding. The model utilizes an innovative multimodal heterogeneous MoE pre-training method, which effectively enhances its overall abilities through joint training on text and visual modalities, showing prominent results in instruction following and world knowledge memorization. Baidu has open-sourced this model along with others in the series to promote the research and application of AI technology...

Total Context: 131K
Max output: 131K
Input: $0.28 / M Tokens
Output: $1.1 / M Tokens

Tencent

Text Generation

Hunyuan-A13B-Instruct

Hunyuan-A13B-Instruct activates only 13B of its 80B parameters, yet matches much larger LLMs on mainstream benchmarks. It offers hybrid reasoning: low-latency “fast” mode or high-precision “slow” mode, switchable per call. Native 256K-token context lets it digest book-length documents without degradation. Agent skills are tuned for BFCL-v3, τ-Bench and C3-Bench leadership, making it an excellent autonomous assistant backbone. Grouped Query Attention plus multi-format quantization delivers memory-light, GPU-efficient inference for real-world deployment, with built-in multilingual support and robust safety alignment for enterprise-grade applications....

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Moonshot AI

Text Generation

Kimi-K2-Instruct

Kimi K2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models...

Total Context: 131K
Max output: 131K
Input: $0.58 / M Tokens
Output: $2.29 / M Tokens

Qwen

Text Generation

Qwen3-32B

Qwen3-32B is the latest large language model in the Qwen series with 32.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.14 / M Tokens
Output: $0.57 / M Tokens

Qwen

Text Generation

Qwen3-14B

Qwen3-14B is the latest large language model in the Qwen series with 14.8B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.07 / M Tokens
Output: $0.28 / M Tokens

Qwen

Text Generation

Qwen3-8B

Qwen3-8B is the latest large language model in the Qwen series with 8.2B parameters. This model uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue). It demonstrates significantly enhanced reasoning capabilities, surpassing previous QwQ and Qwen2.5 instruct models in mathematics, code generation, and commonsense logical reasoning. The model excels in human preference alignment for creative writing, role-playing, and multi-turn dialogues. Additionally, it supports over 100 languages and dialects with strong multilingual instruction following and translation capabilities...

Total Context: 131K
Max output: 131K
Input: $0.06 / M Tokens
Output: $0.06 / M Tokens

Qwen

Reranker

Qwen3-Reranker-8B

Qwen3-Reranker-8B is the 8-billion parameter text reranking model from the Qwen3 series. It is designed to refine and improve the quality of search results by accurately re-ordering documents based on their relevance to a query. Built on the powerful Qwen3 foundational models, it excels in understanding long-text with a 32k context length and supports over 100 languages. The Qwen3-Reranker-8B model is part of a flexible series that offers state-of-the-art performance in various text and code retrieval scenarios...

$0.04 / M Tokens
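To make the reranking step concrete, here is a minimal sketch of the general pattern: score each candidate document against the query, then sort by score. The scoring function below is a toy word-overlap measure standing in for a real reranker model. It is not how Qwen3-Reranker computes relevance, and the names `rerank` and `overlap_score` are hypothetical.

```python
from typing import Callable, Optional


def rerank(
    query: str,
    documents: list[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least relevant."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


candidates = [
    "Bananas are rich in potassium",
    "France borders Spain and Italy",
    "Paris is the capital of France",
]
ranked = rerank("capital of France", candidates, overlap_score, top_k=2)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline, `score_fn` would call the reranker model, typically on a short list produced by a cheaper first-stage retriever.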

Qwen

Embedding

Qwen3-Embedding-8B

Qwen3-Embedding-8B is the latest model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 8B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 4096. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It ranks No.1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58) and demonstrates state-of-the-art performance across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 4096) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios...

Input: $0.04 / M Tokens
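The card mentions flexible vector dimensions (32 to 4096) without saying how a smaller dimension is obtained. One common approach, assumed here purely for illustration, is to keep the leading components of the full vector and re-normalize to unit length. A given API may instead expose a dimension parameter on the request. The helper names and example vectors below are hypothetical.

```python
import math


def truncate_and_normalize(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 1 <= dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up vectors standing in for real embeddings.
full_a = [3.0, 4.0, 12.0, 0.5]
full_b = [3.0, 4.0, -1.0, 2.0]

short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)
print("Truncated:", short_a)
print("Similarity at 2 dims:", cosine_similarity(short_a, short_b))
print("Similarity at 4 dims:", cosine_similarity(full_a, full_b))
```

Smaller vectors reduce storage and search cost, usually at some loss in retrieval quality, so the dimension is worth tuning per application.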

Qwen

Embedding

Qwen3-Embedding-4B

Qwen3-Embedding-4B is the latest model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 4B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 2560. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves excellent performance on the MTEB multilingual leaderboard (score 69.45) and demonstrates outstanding results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 2560) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, providing an optimal balance between efficiency and effectiveness...

Input: $0.02 / M Tokens

Qwen

Reranker

Qwen3-Reranker-0.6B

Qwen3-Reranker-0.6B is a text reranking model from the Qwen3 series. It is specifically designed to refine the results from initial retrieval systems by re-ordering documents based on their relevance to a given query. With 0.6 billion parameters and a context length of 32k, this model leverages the strong multilingual (supporting over 100 languages), long-text understanding, and reasoning capabilities of its Qwen3 foundation. Evaluation results show that Qwen3-Reranker-0.6B achieves strong performance across various text retrieval benchmarks, including MTEB-R, CMTEB-R, and MLDR...

$0.01 / M Tokens

Qwen

Embedding

Qwen3-Embedding-0.6B

Qwen3-Embedding-0.6B is the latest model in the Qwen3 Embedding series, specifically designed for text embedding and ranking tasks. Built upon the dense foundational models of the Qwen3 series, this 0.6B parameter model supports context lengths up to 32K and can generate embeddings with dimensions up to 1024. The model inherits exceptional multilingual capabilities supporting over 100 languages, along with long-text understanding and reasoning skills. It achieves strong performance on the MTEB multilingual leaderboard (score 64.33) and demonstrates excellent results across various tasks including text retrieval, code retrieval, text classification, clustering, and bitext mining. The model offers flexible vector dimensions (32 to 1024) and instruction-aware capabilities for enhanced performance in specific tasks and scenarios, making it an ideal choice for applications prioritizing both efficiency and effectiveness...

Input: $0.01 / M Tokens

Qwen

Text Generation

QwQ-32B

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini. The model incorporates technologies like RoPE, SwiGLU, RMSNorm, and Attention QKV bias, with 64 layers and 40 Q attention heads (8 for KV in GQA architecture)...

Total Context: 131K
Max output: 131K
Input: $0.15 / M Tokens
Output: $0.58 / M Tokens

DeepSeek

Text Generation

DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-32B is a distilled model based on Qwen2.5-32B. The model was fine-tuned using 800k curated samples generated by DeepSeek-R1 and demonstrates exceptional performance across mathematics, programming, and reasoning tasks. It achieved impressive results in various benchmarks including AIME 2024, MATH-500, and GPQA Diamond, with a notable 94.3% accuracy on MATH-500, showcasing its strong mathematical reasoning capabilities...

Total Context: 131K
Max output: 131K
Input: $0.18 / M Tokens
Output: $0.18 / M Tokens

Qwen

Text Generation

Qwen2.5-72B-Instruct

Qwen2.5-72B-Instruct belongs to the latest large language model series released by Alibaba Cloud. The 72B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese and English. It shows notable enhancements in following instructions, understanding structured data, and generating structured outputs, particularly in JSON format....

Total Context: 33K
Max output: 4K
Input: $0.59 / M Tokens
Output: $0.59 / M Tokens
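Cards with a small window, such as this one, make the interaction between Total Context and Max output easy to see. The sketch below estimates how many output tokens a request can still ask for after the prompt is counted. It assumes the total context covers prompt and output together and reads "33K" and "4K" as 33,000 and 4,000 tokens. Both are assumptions, since the listing does not define either. The function name is hypothetical.

```python
def max_allowed_output(prompt_tokens: int, total_context: int, max_output: int) -> int:
    """Return the largest output length that fits both limits.

    Assumes prompt and output share the same total context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = total_context - prompt_tokens
    return max(0, min(max_output, remaining))


# Limits read from the Qwen2.5-72B-Instruct card, with K taken as 1,000.
TOTAL_CONTEXT = 33_000
MAX_OUTPUT = 4_000

for prompt_len in (10_000, 31_000, 40_000):
    allowed = max_allowed_output(prompt_len, TOTAL_CONTEXT, MAX_OUTPUT)
    print(f"Prompt of {prompt_len:,} tokens: up to {allowed:,} output tokens")
```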

Qwen

Text Generation

Qwen2.5-7B-Instruct

Qwen2.5-7B-Instruct belongs to the latest large language model series released by Alibaba Cloud. This 7B model demonstrates significant improvements in areas such as coding and mathematics. The model also offers multilingual support, covering over 29 languages, including Chinese, English, and others. The model shows notable enhancements in instruction following, understanding structured data, and generating structured outputs, particularly JSON....

Total Context: 33K
Max output: 4K
Input: $0.05 / M Tokens
Output: $0.05 / M Tokens

Ready to accelerate your AI development?
