🎉 gemma-4-12B-it is now available on SiliconFlow. Try it now.

State-of-the-Art AI Model Library

Run inference on 200+ state-of-the-art AI models through a single API and deploy in seconds

Qwen

Text Generation

Qwen3-VL-32B-Instruct

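Release date: 2025/10/21

Qwen3-VL is the vision-language model family in the Qwen3 series, achieving state-of-the-art (SOTA) performance on a range of vision-language (VL) benchmarks. The model supports high-resolution image input up to the megapixel level and offers strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a strong language foundation and can understand and carry out complex instructions...

Total Context: 262K

Max output: 262K

Input: 0.2 / M Tokens

Output: 0.6 / M Tokens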
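
The listing quotes input and output prices per million tokens, so the cost of a single request follows from simple arithmetic. The following is a minimal sketch, assuming prices are billed linearly per token and expressed in whatever currency the page uses (the listing does not state it). The function name `estimate_cost` and the token counts are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: str, output_price: str) -> Decimal:
    """Return the cost of one request in the listing's (unstated) price unit.

    Prices are passed as strings so Decimal can parse them exactly.
    Assumption: billing is linear in the number of tokens.
    """
    cost_in = Decimal(input_tokens) * Decimal(input_price) / TOKENS_PER_MILLION
    cost_out = Decimal(output_tokens) * Decimal(output_price) / TOKENS_PER_MILLION
    return cost_in + cost_out


# Hypothetical request against Qwen3-VL-32B-Instruct (0.2 in, 0.6 out per M tokens):
# 50,000 prompt tokens and 2,000 generated tokens.
cost = estimate_cost(50_000, 2_000, "0.2", "0.6")
print(cost)
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding noise.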

Qwen

Text Generation

Qwen3-VL-32B-Thinking

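Release date: 2025/10/21

Qwen3-VL-Thinking is a version of the Qwen3-VL series specifically optimized for complex visual reasoning tasks. It incorporates a "Thinking Mode" that lets it generate detailed intermediate reasoning steps (Chain-of-Thought) before giving a final answer. This design substantially improves the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and detailed analysis...

Total Context: 262K

Max output: 262K

Input: 0.2 / M Tokens

Output: 1.5 / M Tokens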

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Release date: 2025/10/15

Qwen3-VL-8B-Instruct is a vision-language model in the Qwen3 series, showing strong capabilities in general visual understanding, vision-centric dialogue, and multilingual text recognition in images...

Total Context: 262K

Max output: 262K

Input: 0.18 / M Tokens

Output: 0.68 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

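Release date: 2025/10/05

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced understanding of spatial and video dynamics, and stronger agent interaction capabilities. It is available in Dense and MoE architectures that scale from edge to cloud, and includes Instruct and reasoning-enhanced Thinking editions...

Total Context: 262K

Max output: 262K

Input: 0.29 / M Tokens

Output: 1.0 / M Tokens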

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

Release date: 2025/10/11

Total Context: 262K

Max output: 262K

Input: 0.29 / M Tokens

Output: 1.0 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.7-Code

Release date: 2026/06/16

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6....

Total Context: 262K

Max output: 262K

Input: 0.94 / M Tokens

Output: 4.0 / M Tokens
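
To see what a roughly 30% cut in thinking tokens could mean for a bill, the sketch below compares the output cost of one task before and after the reduction. It assumes that thinking tokens are billed as ordinary output tokens and that the answer length stays the same; the listing states neither. The token counts are invented for illustration, and 4.0 / M Tokens is the output price listed for both Kimi-K2.7-Code and Kimi-K2.6.

```python
from decimal import Decimal

OUTPUT_PRICE_PER_M = Decimal("4.0")   # listed output price for both models
THINKING_REDUCTION = Decimal("0.30")  # "approximately 30%" from the description


def output_cost(thinking_tokens: Decimal, answer_tokens: Decimal) -> Decimal:
    """Cost of generated tokens, assuming thinking tokens bill as output tokens."""
    total_tokens = thinking_tokens + answer_tokens
    return total_tokens * OUTPUT_PRICE_PER_M / Decimal(1_000_000)


# Hypothetical task: 10,000 thinking tokens followed by a 2,000-token answer.
baseline = output_cost(Decimal(10_000), Decimal(2_000))
reduced = output_cost(Decimal(10_000) * (1 - THINKING_REDUCTION), Decimal(2_000))

# Relative saving on the output side of the bill.
saving = (baseline - reduced) / baseline

print(f"baseline={baseline} reduced={reduced} saving={saving:.0%}")
```

Because the answer tokens are unaffected, the saving on the output bill is smaller than the 30% reduction in thinking tokens alone.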

MiniMaxAI

Text Generation

MiniMax-M3

Release date: 2026/06/01

MiniMax-M3 is MiniMax’s frontier multimodal coding and agentic model, built on the MiniMax Sparse Attention (MSA) architecture. It supports up to a 1M-token context window and accepts image and video inputs. The model is designed for code generation, agentic workflows, tool use, long-context understanding, and multi-step reasoning, showing strong performance on benchmarks such as SWE-Bench Pro, Terminal-Bench 2.1, and MCP Atlas....

Total Context: 1049K

Max output: 131K

Input: 0.3 / M Tokens

Output: 1.2 / M Tokens

Nex AGI

Text Generation

Nex-N2-Pro

Release date: 2026/06/02

Nex-N2 is a family of thinking models with Agentic Thinking. They adaptively decide when and how deeply to reason, unifying agent cognition across coding, search, and tool use into a single coherent paradigm.

Key Claims
- SOTA among open models on SWE-Verified, SWE-Pro, Terminal Bench 2.0, Tau3, WildClawBench, BFCL V4
- Top-tier in agentic coding (end-to-end dev loops), deep search (BrowserComp, Wild Search, FinSearch), and real-world productivity (GDP Val)
- Adaptive Thinking: auto-adjusts reasoning depth per step, 30-50% fewer thinking tokens vs always-on, with equal or better performance
- Plug-and-play with Claude Code, Cursor, OpenClaw, and agentic harnesses...

Total Context: 262K

Max output: 256K

Input: 0.5 / M Tokens

Output: 2.5 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.6

Release date: 2026/04/21

Kimi K2.6 is an open-source, native multimodal agentic model by Moonshot AI, achieving open-source state-of-the-art on benchmarks including HLE with tools, SWE-Bench Pro, and BrowseComp. Built on a MoE architecture with 1T total parameters and 32B activated, the model supports a 256K-token context window and multimodal inputs (image and video) via its MoonViT vision encoder. K2.6 is optimized for agentic workloads: it sustains 4,000+ tool calls over 12+ hours of continuous execution, scales to 300 parallel sub-agents × 4,000 steps per run to produce 100+ files from a single prompt, and supports both Thinking and Instant inference modes with function calling and multi-turn Preserve Thinking...

Total Context: 262K

Max output: 262K

Input: 0.77 / M Tokens

Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.6-35B-A3B

Release date: 2026/05/09

Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per inference, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...

Total Context: 262K

Max output: 262K

Input: 0.2 / M Tokens

Output: 1.6 / M Tokens

Qwen

Text Generation

Qwen3.6-27B

Release date: 2026/05/09

Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...

Total Context: 262K

Max output: 262K

Input: 0.3 / M Tokens

Output: 3.2 / M Tokens

Z.ai

Text Generation

GLM-5V-Turbo

Release date: 2026/03/30

GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....

Total Context: 205K

Max output: 131K

Input: 1.2 / M Tokens

Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.5-397B-A17B

Release date: 2026/05/09

Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports 256K context length, extensible to approximately 1M tokens, with support for 201 languages, unified vision-language understanding, tool calling, and reasoning (thinking) mode...

Total Context: 262K

Max output: 262K

Input: 0.39 / M Tokens

Output: 2.34 / M Tokens

Qwen

Text Generation

Qwen3.5-122B-A10B

Release date: 2026/05/09

Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K

Max output: 262K

Input: 0.26 / M Tokens

Output: 2.08 / M Tokens

Qwen

Text Generation

Qwen3.5-35B-A3B

Release date: 2026/05/09

Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K

Max output: 262K

Input: 0.24 / M Tokens

Output: 1.8 / M Tokens

Qwen

Text Generation

Qwen3.5-27B

Release date: 2026/05/09

Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K

Max output: 262K

Input: 0.25 / M Tokens

Output: 2.0 / M Tokens

Qwen

Text Generation

Qwen3.5-9B

Release date: 2026/05/09

Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K

Max output: 262K

Input: 0.1 / M Tokens

Output: 0.15 / M Tokens

Google

Text Generation

gemma-4-12B-it

Release date: 2026/06/09

Gemma 4 26B is Google DeepMind's latest open-source MoE model, built on a 26B-parameter Mixture of Experts architecture that activates only 3.8B parameters during inference for exceptionally fast token throughput. Purpose-built for advanced reasoning and agentic workflows, it ranks #6 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K

Max output: 262K

Input: 0.1 / M Tokens

Output: 0.3 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.5

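Release date: 2026/01/30

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. With a 1T-parameter MoE architecture (32B active) and a 256K context length, it seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting instant and thinking modes as well as conversational and agentic paradigms...

Total Context: 262K

Max output: 262K

Input: 0.45 / M Tokens

Output: 2.25 / M Tokens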

Google

Text Generation

gemma-4-26B-A4B-it

Release date: 2026/04/07

Total Context: 262K

Max output: 262K

Input: 0.12 / M Tokens

Output: 0.4 / M Tokens

Google

Text Generation

gemma-4-31B-it

Release date: 2026/04/07

Gemma 4 31B is Google DeepMind's latest open-source model, built on a 31B dense architecture from the same research foundation as Gemini 3. Purpose-built for advanced reasoning and agentic workflows, it ranks #3 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K

Max output: 262K

Input: 0.13 / M Tokens

Output: 0.4 / M Tokens
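
Since every card lists an input and an output price, the models can be compared on a fixed workload. The sketch below ranks a handful of the listed models by the cost of 3M input tokens plus 1M output tokens. The 3:1 ratio is an arbitrary assumption, as are the names `PRICES` and `workload_cost`; the prices themselves are copied from the cards in this listing, in its unstated currency.

```python
from decimal import Decimal

# (input price, output price) per million tokens, copied from the listing.
PRICES = {
    "Qwen3.5-9B": ("0.1", "0.15"),
    "gemma-4-12B-it": ("0.1", "0.3"),
    "gemma-4-31B-it": ("0.13", "0.4"),
    "MiniMax-M3": ("0.3", "1.2"),
    "Kimi-K2.6": ("0.77", "4.0"),
}


def workload_cost(input_price: str, output_price: str,
                  input_millions: int = 3, output_millions: int = 1) -> Decimal:
    """Cost of a workload measured in millions of input and output tokens."""
    return (Decimal(input_price) * input_millions
            + Decimal(output_price) * output_millions)


# Sort from cheapest to most expensive for the assumed 3M-in / 1M-out workload.
ranked = sorted(
    ((name, workload_cost(*prices)) for name, prices in PRICES.items()),
    key=lambda item: item[1],
)

for name, cost in ranked:
    print(f"{name}: {cost}")
```

Changing `input_millions` and `output_millions` to match a real traffic mix can reorder the ranking, because output tokens are priced several times higher than input tokens for most of these models.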