🎉 gemma-4-12B-it is now available on SiliconFlow. Try it now.

State-of-the-Art AI Model Library

One API to run inference on 200+ cutting-edge AI models and deploy in seconds

Qwen

Text Generation

Qwen3-VL-32B-Instruct

Released: Oct 21, 2025

Qwen3-VL is the vision-language model in the Qwen3 series, achieving state-of-the-art (SOTA) results on a range of vision-language (VL) benchmarks. The model supports high-resolution image input up to the megapixel level and has strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a powerful language foundation, which lets it understand and follow complex instructions...

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 0.6 / M Tokens

Qwen

Text Generation

Qwen3-VL-32B-Thinking

Released: Oct 21, 2025

Qwen3-VL-Thinking is a version of the Qwen3-VL series specifically optimized for complex visual reasoning tasks. It includes a "Thinking Mode" that generates detailed intermediate reasoning steps (chain of thought) before giving the final answer. This design significantly improves the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis...

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Released: Oct 15, 2025

Qwen3-VL-8B-Instruct is a vision-language model in the Qwen3 series, demonstrating strong capabilities in general visual understanding, vision-centric dialogue, and multilingual text recognition in images...

Total Context: 262K
Max output: 262K
Input: 0.18 / M Tokens
Output: 0.68 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

Released: Oct 5, 2025

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. It is available in dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions...

Total Context: 262K
Max output: 262K
Input: 0.29 / M Tokens
Output: 1.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

Released: Oct 11, 2025

Total Context: 262K
Max output: 262K
Input: 0.29 / M Tokens
Output: 1.0 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.7-Code

Released: Jun 16, 2026

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6....

Total Context: 262K
Max output: 262K
Input: 0.94 / M Tokens
Output: 4.0 / M Tokens

MiniMaxAI

Text Generation

MiniMax-M3

Released: Jun 1, 2026

MiniMax-M3 is MiniMax’s frontier multimodal coding and agentic model, built on the MiniMax Sparse Attention (MSA) architecture. It supports up to a 1M-token context window and accepts image and video inputs. The model is designed for code generation, agentic workflows, tool use, long-context understanding, and multi-step reasoning, showing strong performance on benchmarks such as SWE-Bench Pro, Terminal-Bench 2.1, and MCP Atlas....

Total Context: 1049K
Max output: 131K
Input: 0.3 / M Tokens
Output: 1.2 / M Tokens

Nex AGI

Text Generation

Nex-N2-Pro

Released: Jun 2, 2026

Nex-N2 is a family of thinking models with Agentic Thinking. They adaptively decide when and how deeply to reason, unifying agent cognition across coding, search, and tool use into a single coherent paradigm.

Key Claims
- SOTA among open models on SWE-Verified, SWE-Pro, Terminal Bench 2.0, Tau3, WildClawBench, BFCL V4
- Top-tier in agentic coding (end-to-end dev loops), deep search (BrowserComp, Wild Search, FinSearch), and real-world productivity (GDP Val)
- Adaptive Thinking: auto-adjusts reasoning depth per step, 30-50% fewer thinking tokens vs always-on, with equal or better performance
- Plug-and-play with Claude Code, Cursor, OpenClaw, and agentic harnesses...

Total Context: 262K
Max output: 256K
Input: 0.5 / M Tokens
Output: 2.5 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.6

Released: Apr 21, 2026

Kimi K2.6 is an open-source, native multimodal agentic model by Moonshot AI, achieving open-source state-of-the-art on benchmarks including HLE with tools, SWE-Bench Pro, and BrowseComp. Built on a MoE architecture with 1T total parameters and 32B activated, the model supports a 256K-token context window and multimodal inputs (image and video) via its MoonViT vision encoder. K2.6 is optimized for agentic workloads: it sustains 4,000+ tool calls over 12+ hours of continuous execution, scales to 300 parallel sub-agents × 4,000 steps per run to produce 100+ files from a single prompt, and supports both Thinking and Instant inference modes with function calling and multi-turn Preserve Thinking...

Total Context: 262K
Max output: 262K
Input: 0.77 / M Tokens
Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.6-35B-A3B

Released: May 9, 2026

Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per inference, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 1.6 / M Tokens

Qwen

Text Generation

Qwen3.6-27B

Released: May 9, 2026

Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...

Total Context: 262K
Max output: 262K
Input: 0.3 / M Tokens
Output: 3.2 / M Tokens

Z.ai

Text Generation

GLM-5V-Turbo

Released: Mar 30, 2026

GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....

Total Context: 205K
Max output: 131K
Input: 1.2 / M Tokens
Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.5-397B-A17B

Released: May 9, 2026

Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports 256K context length, extensible to approximately 1M tokens, with support for 201 languages, unified vision-language understanding, tool calling, and reasoning (thinking) mode...

Total Context: 262K
Max output: 262K
Input: 0.39 / M Tokens
Output: 2.34 / M Tokens

Qwen

Text Generation

Qwen3.5-122B-A10B

Released: May 9, 2026

Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.26 / M Tokens
Output: 2.08 / M Tokens

Qwen

Text Generation

Qwen3.5-35B-A3B

Released: May 9, 2026

Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.24 / M Tokens
Output: 1.8 / M Tokens

Qwen

Text Generation

Qwen3.5-27B

Released: May 9, 2026

Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.25 / M Tokens
Output: 2.0 / M Tokens

Qwen

Text Generation

Qwen3.5-9B

Released: May 9, 2026

Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.1 / M Tokens
Output: 0.15 / M Tokens

Google

Text Generation

gemma-4-12B-it

Released: Jun 9, 2026

Gemma 4 26B is Google DeepMind's latest open-source MoE model, built on a 26B-parameter Mixture of Experts architecture that activates only 3.8B parameters during inference for exceptionally fast token throughput. Purpose-built for advanced reasoning and agentic workflows, it ranks #6 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: 0.1 / M Tokens
Output: 0.3 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.5

Released: Jan 30, 2026

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens on top of Kimi-K2-Base. With a 1T-parameter MoE architecture (32B active) and a 256K context length, it seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting both instant and thinking modes as well as conversational and agentic paradigms...

Total Context: 262K
Max output: 262K
Input: 0.45 / M Tokens
Output: 2.25 / M Tokens

Google

Text Generation

gemma-4-26B-A4B-it

Released: Apr 7, 2026

Total Context: 262K
Max output: 262K
Input: 0.12 / M Tokens
Output: 0.4 / M Tokens

Google

Text Generation

gemma-4-31B-it

Released: Apr 7, 2026

Gemma 4 31B is Google DeepMind's latest open-source model, built on a 31B dense architecture from the same research foundation as Gemini 3. Purpose-built for advanced reasoning and agentic workflows, it ranks #3 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: 0.13 / M Tokens
Output: 0.4 / M Tokens
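
The per-million-token prices on each card can be turned into a rough per-request estimate. The sketch below assumes billing is linear in token count and uses the gemma-4-31B-it figures listed above (0.13 input, 0.4 output). The page does not state the currency, so the result is in whatever unit the prices use. `estimate_cost` is a hypothetical helper for illustration, not part of any SiliconFlow SDK.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost from per-million-token prices.

    Assumes simple linear billing; the currency unit is whatever
    the listed prices are quoted in (not stated on the page).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# gemma-4-31B-it prices from the listing: 0.13 input, 0.4 output per M tokens.
cost = estimate_cost(10_000, 2_000, 0.13, 0.4)
print(f"{cost:.4f}")  # 10k input + 2k output tokens
```

Swapping in another card's two prices gives a quick way to compare models for a given input/output mix. Actual billing may differ, for example if cached input or tiered rates apply.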