AI Model Library

Run inference on more than 200 state-of-the-art AI models through a single API and deploy in seconds.

Qwen

Text Generation

Qwen3-VL-32B-Instruct

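Release date: 2025-10-21

Qwen3-VL is the vision-language model of the Qwen3 series, achieving state-of-the-art (SOTA) performance on a range of vision-language (VL) benchmarks. The model supports high-resolution image input up to the megapixel level and offers strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a strong language foundation that lets it understand and execute complex instructions....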

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 0.6 / M Tokens
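The Input and Output figures on each model card are prices per million tokens, so a request's cost can be estimated from its token counts. The sketch below assumes simple linear billing and uses the Qwen3-VL-32B-Instruct prices listed above (0.2 input, 0.6 output); the listing does not state a currency, and the names `Pricing` and `estimate_cost` are hypothetical helpers for illustration, not part of any SiliconFlow SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices as shown on a model card (currency not stated in the listing)."""
    input_per_m: float
    output_per_m: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost assuming billing is linear in token counts (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens / 1_000_000) * pricing.input_per_m + (
        output_tokens / 1_000_000
    ) * pricing.output_per_m


# Prices taken from the Qwen3-VL-32B-Instruct card: Input 0.2, Output 0.6 per M tokens.
qwen3_vl_32b_instruct = Pricing(input_per_m=0.2, output_per_m=0.6)

# Hypothetical workload: 1.5M input tokens and 250K output tokens.
cost = estimate_cost(qwen3_vl_32b_instruct, 1_500_000, 250_000)
print(round(cost, 4))
```

The same helper applies to any card in the list by swapping in that model's two prices. Thinking-mode models may emit reasoning tokens that count toward output, so real output counts can be higher than the visible answer suggests.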

Qwen

Text Generation

Qwen3-VL-32B-Thinking

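Release date: 2025-10-21

Qwen3-VL-Thinking is a version of the Qwen3-VL series specifically optimized for complex visual reasoning tasks. It integrates a "thinking mode" that lets it generate detailed intermediate reasoning steps (Chain-of-Thought) before giving a final answer. This design significantly improves the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis....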

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Release date: 2025-10-15

Qwen3-VL-8B-Instruct is a vision-language model of the Qwen3 series, showing strong capabilities in general visual understanding, vision-centric dialogue, and multilingual text recognition in images....

Total Context: 262K
Max output: 262K
Input: 0.18 / M Tokens
Output: 0.68 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

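Release date: 2025-10-05

The Qwen3-VL series offers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced understanding of spatial and video dynamics, and stronger agent interaction capabilities. It is available in Dense and MoE architectures that scale from edge to cloud, in Instruct and reasoning-enhanced Thinking editions....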

Total Context: 262K
Max output: 262K
Input: 0.29 / M Tokens
Output: 1.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

Release date: 2025-10-11

Total Context: 262K
Max output: 262K
Input: 0.29 / M Tokens
Output: 1.0 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.7-Code

Release date: 2026-06-16

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6....

Total Context: 262K
Max output: 262K
Input: 0.94 / M Tokens
Output: 4.0 / M Tokens

MiniMaxAI

Text Generation

MiniMax-M3

Release date: 2026-06-01

MiniMax-M3 is MiniMax’s frontier multimodal coding and agentic model, built on the MiniMax Sparse Attention (MSA) architecture. It supports up to a 1M-token context window and accepts image and video inputs. The model is designed for code generation, agentic workflows, tool use, long-context understanding, and multi-step reasoning, showing strong performance on benchmarks such as SWE-Bench Pro, Terminal-Bench 2.1, and MCP Atlas....

Total Context: 1049K
Max output: 131K
Input: 0.3 / M Tokens
Output: 1.2 / M Tokens

Nex AGI

Text Generation

Nex-N2-Pro

Release date: 2026-06-02

Nex-N2 is a family of thinking models with Agentic Thinking. They adaptively decide when and how deeply to reason, unifying agent cognition across coding, search, and tool use into a single coherent paradigm.

Key Claims
- SOTA among open models on SWE-Verified, SWE-Pro, Terminal Bench 2.0, Tau3, WildClawBench, BFCL V4
- Top-tier in agentic coding (end-to-end dev loops), deep search (BrowseComp, Wild Search, FinSearch), and real-world productivity (GDP Val)
- Adaptive Thinking: auto-adjusts reasoning depth per step, 30-50% fewer thinking tokens vs always-on, with equal or better performance
- Plug-and-play with Claude Code, Cursor, OpenClaw, and agentic harnesses...

Total Context: 262K
Max output: 256K
Input: 0.5 / M Tokens
Output: 2.5 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.6

Release date: 2026-04-21

Kimi K2.6 is an open-source, native multimodal agentic model by Moonshot AI, achieving open-source state-of-the-art on benchmarks including HLE with tools, SWE-Bench Pro, and BrowseComp. Built on a MoE architecture with 1T total parameters and 32B activated, the model supports a 256K-token context window and multimodal inputs (image and video) via its MoonViT vision encoder. K2.6 is optimized for agentic workloads: it sustains 4,000+ tool calls over 12+ hours of continuous execution, scales to 300 parallel sub-agents × 4,000 steps per run to produce 100+ files from a single prompt, and supports both Thinking and Instant inference modes with function calling and multi-turn Preserve Thinking...

Total Context: 262K
Max output: 262K
Input: 0.77 / M Tokens
Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.6-35B-A3B

Release date: 2026-05-09

Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per inference, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...

Total Context: 262K
Max output: 262K
Input: 0.2 / M Tokens
Output: 1.6 / M Tokens

Qwen

Text Generation

Qwen3.6-27B

Release date: 2026-05-09

Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...

Total Context: 262K
Max output: 262K
Input: 0.3 / M Tokens
Output: 3.2 / M Tokens

Z.ai

Text Generation

GLM-5V-Turbo

Release date: 2026-03-30

GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....

Total Context: 205K
Max output: 131K
Input: 1.2 / M Tokens
Output: 4.0 / M Tokens

Qwen

Text Generation

Qwen3.5-397B-A17B

Release date: 2026-05-09

Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports 256K context length, extensible to approximately 1M tokens, with support for 201 languages, unified vision-language understanding, tool calling, and reasoning (thinking) mode...

Total Context: 262K
Max output: 262K
Input: 0.39 / M Tokens
Output: 2.34 / M Tokens

Qwen

Text Generation

Qwen3.5-122B-A10B

Release date: 2026-05-09

Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.26 / M Tokens
Output: 2.08 / M Tokens

Qwen

Text Generation

Qwen3.5-35B-A3B

Release date: 2026-05-09

Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.24 / M Tokens
Output: 1.8 / M Tokens

Qwen

Text Generation

Qwen3.5-27B

Release date: 2026-05-09

Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.25 / M Tokens
Output: 2.0 / M Tokens

Qwen

Text Generation

Qwen3.5-9B

Release date: 2026-05-09

Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Total Context: 262K
Max output: 262K
Input: 0.1 / M Tokens
Output: 0.15 / M Tokens

Google

Text Generation

gemma-4-12B-it

Release date: 2026-06-09

Total Context: 262K
Max output: 262K
Input: 0.1 / M Tokens
Output: 0.3 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.5

Release date: 2026-01-30

Kimi K2.5 is an open-source, native multimodal agentic model built by continued pretraining of Kimi-K2-Base on approximately 15 trillion mixed visual and text tokens. With a 1T-parameter MoE architecture (32B active) and a 256K context length, it integrates vision and language understanding and provides advanced agentic capabilities, supporting both instant and thinking modes as well as conversational and agentic paradigms....

Total Context: 262K
Max output: 262K
Input: 0.45 / M Tokens
Output: 2.25 / M Tokens

Google

Text Generation

gemma-4-26B-A4B-it

Release date: 2026-04-07

Gemma 4 26B is Google DeepMind's latest open-source MoE model, built on a 26B-parameter Mixture of Experts architecture that activates only 3.8B parameters during inference for exceptionally fast token throughput. Purpose-built for advanced reasoning and agentic workflows, it ranks #6 among all open models on the Arena AI leaderboard, outperforming models up to 20x its size, with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: 0.12 / M Tokens
Output: 0.4 / M Tokens

Google

Text Generation

gemma-4-31B-it

Release date: 2026-04-07

Gemma 4 31B is Google DeepMind's latest open-source model, built on a 31B dense architecture from the same research foundation as Gemini 3. Purpose-built for advanced reasoning and agentic workflows, it ranks #3 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Total Context: 262K
Max output: 262K
Input: 0.13 / M Tokens
Output: 0.4