State-of-the-Art AI Model Library

One API to run inference on 200+ cutting-edge AI models, and deploy in seconds

MiniMaxAI

Text Generation

MiniMax-M3

MiniMax-M3 is MiniMax’s frontier multimodal coding and agentic model, built on the MiniMax Sparse Attention (MSA) architecture. It supports up to a 1M-token context window and accepts image and video inputs. The model is designed for code generation, agentic workflows, tool use, long-context understanding, and multi-step reasoning, showing strong performance on benchmarks such as SWE-Bench Pro, Terminal-Bench 2.1, and MCP Atlas....

Total Context: 1049K
Max output: 131K
Input: $0.3 / M Tokens
Output: $1.2 / M Tokens
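The per-million-token prices listed for each model translate into a request cost with simple arithmetic. Below is a minimal sketch in Python using the MiniMax-M3 input and output prices shown above. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and cached-input pricing is left out because the page does not show a value for it.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float,
                      output_price_per_m: float) -> float:
    """Return the estimated cost in USD for one request.

    Prices are in USD per million tokens, as listed on this page.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# MiniMax-M3 prices from the listing: $0.3 input, $1.2 output per M tokens.
# The token counts below are made-up example values.
cost = estimate_cost_usd(200_000, 8_000, 0.3, 1.2)
print(f"Estimated cost: ${cost:.4f}")
```

The same function works for any other entry in the library by swapping in that model's listed input and output prices.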

Nex AGI

Text Generation

Nex-N2-Pro

Nex-N2 is a family of thinking models with Agentic Thinking. They adaptively decide when and how deeply to reason, unifying agent cognition across coding, search, and tool use into a single coherent paradigm.

Key Claims

- SOTA among open models on SWE-Verified, SWE-Pro, Terminal Bench 2.0, Tau3, WildClawBench, BFCL V4
- Top-tier in agentic coding (end-to-end dev loops), deep search (BrowserComp, Wild Search, FinSearch), and real-world productivity (GDP Val)
- Adaptive Thinking: auto-adjusts reasoning depth per step, 30-50% fewer thinking tokens vs always-on, with equal or better performance
- Plug-and-play with Claude Code, Cursor, OpenClaw, and agentic harnesses...

Total Context: 262K
Max output: 256K
Input: $0.0 / M Tokens
Output: $0.0 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.6

Kimi K2.6 is an open-source, native multimodal agentic model by Moonshot AI, achieving open-source state-of-the-art on benchmarks including HLE with tools, SWE-Bench Pro, and BrowseComp. Built on a MoE architecture with 1T total parameters and 32B activated, the model supports a 256K-token context window and multimodal inputs (image and video) via its MoonViT vision encoder. K2.6 is optimized for agentic workloads: it sustains 4,000+ tool calls over 12+ hours of continuous execution, scales to 300 parallel sub-agents × 4,000 steps per run to produce 100+ files from a single prompt, and supports both Thinking and Instant inference modes with function calling and multi-turn Preserve Thinking...

Total Context: 262K
Max output: 262K
Input: $0.77 / M Tokens
Output: $4.0 / M Tokens

Qwen

Text Generation

Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per inference, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...

Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $1.6 / M Tokens

Qwen

Text Generation

Qwen3.6-27B

Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...

Total Context: 262K
Max output: 262K
Input: $0.3 / M Tokens
Output: $3.2 / M Tokens
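The "Total Context" and "Max output" figures bound how large a request can be. The sketch below checks a request against both limits. It rests on two assumptions that the page does not confirm: that the listed 262K corresponds to the 262,144-token native context mentioned in the Qwen3.6-27B description, and that prompt tokens and generated tokens share a single context window. The function name `fits_in_context` is hypothetical.

```python
def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    total_context: int = 262_144,
                    max_output: int = 262_144) -> bool:
    """Check a request against a listed context window and output cap.

    Assumes prompt and generated tokens share one context window,
    which is common but not confirmed for this platform.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_new_tokens > max_output:
        return False
    return prompt_tokens + max_new_tokens <= total_context


# Example token counts are illustrative, not taken from the listing.
print(fits_in_context(250_000, 8_000))   # 258,000 tokens requested in total
print(fits_in_context(250_000, 16_000))  # 266,000 tokens requested in total
```

For models whose output cap is smaller than the context window, such as the 131K cap listed for MiniMax-M3, pass that model's values as `total_context` and `max_output`.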

Z.ai

Text Generation

GLM-5V-Turbo

GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....

Total Context: 205K
Max output: 131K
Input: $1.2 / M Tokens
Output: $4.0 / M Tokens

Qwen

Text Generation

Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports 256K context length, extensible to approximately 1M tokens, with support for 201 languages, unified vision-language understanding, tool calling, and reasoning (thinking) mode...

Total Context: 262K
Max output: 262K
Input: $0.39 / M Tokens
Output: $2.34 / M Tokens

Qwen

Text Generation

Qwen3.5-122B-A10B

Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: $0.26 / M Tokens
Output: $2.08 / M Tokens

Qwen

Text Generation

Qwen3.5-35B-A3B

Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: $0.24 / M Tokens
Output: $1.8 / M Tokens

Qwen

Text Generation

Qwen3.5-27B

Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: $0.25 / M Tokens
Output: $2.0 / M Tokens

Qwen

Text Generation

Qwen3.5-9B

Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: $0.1 / M Tokens
Output: $0.15 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. With a 1T-parameter MoE architecture (32B active) and 256K context length, it seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes, as well as conversational and agentic paradigms...

Total Context: 262K
Max output: 262K
Input: $0.45 / M Tokens
Output: $2.25 / M Tokens

Google

Text Generation

gemma-4-26B-A4B-it

Gemma 4 26B is Google DeepMind's latest open-source MoE model, built on a 26B-parameter Mixture of Experts architecture that activates only 3.8B parameters during inference for exceptionally fast token throughput. Purpose-built for advanced reasoning and agentic workflows, it ranks #6 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: $0.12 / M Tokens
Output: $0.4 / M Tokens
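Several of the Mixture-of-Experts entries above quote both total and activated parameter counts. A short sketch of the arithmetic behind those figures follows, computing the share of parameters that is activated for each model. The parameter counts are copied from the descriptions above, with "1T" taken as 1000B; the dictionary and the function name `active_fraction` are illustrative assumptions.

```python
# (total parameters, activated parameters) in billions, as quoted in the
# model descriptions on this page. "1T" for Kimi-K2.6 is taken as 1000B.
MOE_MODELS = {
    "Kimi-K2.6": (1000.0, 32.0),
    "Qwen3.5-397B-A17B": (397.0, 17.0),
    "Qwen3.5-122B-A10B": (122.0, 10.0),
    "Qwen3.5-35B-A3B": (35.0, 3.0),
    "gemma-4-26B-A4B-it": (26.0, 3.8),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that is activated (0.0 to 1.0)."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    if not 0 <= active_b <= total_b:
        raise ValueError("activated count must be between 0 and the total")
    return active_b / total_b


for name, (total, active) in MOE_MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: {active:g}B of {total:g}B activated ({share:.1%})")
```

The activated share indicates how much of the model participates in computing each output, which is why these entries are described as efficient relative to their total size.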

Google

Text Generation

gemma-4-31B-it

Gemma 4 31B is Google DeepMind's latest open-source model, built on a 31B dense architecture from the same research foundation as Gemini 3. Purpose-built for advanced reasoning and agentic workflows, it ranks #3 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: $0.13 / M Tokens
Output: $0.4 / M Tokens

Qwen

Text Generation

Qwen3-VL-32B-Instruct

Qwen3-VL is the vision-language model in the Qwen3 series, achieving state-of-the-art (SOTA) performance on various vision-language (VL) benchmarks. The model supports high-resolution image inputs up to the megapixel level and possesses strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a powerful language foundation, enabling it to understand and execute complex instructions....

Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $0.6 / M Tokens

Qwen

Text Generation

Qwen3-VL-32B-Thinking

Qwen3-VL-Thinking is a version of the Qwen3-VL series specially optimized for complex visual reasoning tasks. It incorporates a "Thinking Mode", enabling it to generate detailed intermediate reasoning steps (Chain-of-Thought) before providing a final answer. This design significantly enhances the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis....

Total Context: 262K
Max output: 262K
Input: $0.2 / M Tokens
Output: $1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Qwen3-VL-8B-Instruct is a vision-language model in the Qwen3 series that demonstrates strong capabilities in general visual understanding, visual-centric dialogue, and multilingual text recognition in images...

Total Context: 262K
Max output: 262K
Input: $0.18 / M Tokens
Output: $0.68 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions....

Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions....

Total Context: 262K
Max output: 262K
Input: $0.29 / M Tokens
Output: $1.0 / M Tokens

Ready to accelerate your AI development?
