Cutting-Edge Technology

AI Model Library

One API for inference across 200+ cutting-edge AI models, deployed in seconds
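As a sketch of what a single-API inference call might look like, the snippet below assembles an OpenAI-style chat-completions request for one of the listed models. The base URL, model identifier, and helper name are assumptions for illustration, not confirmed details of this service's API.

```python
import json

# Hypothetical OpenAI-compatible endpoint -- the real base URL for this
# service is an assumption for illustration.
BASE_URL = "https://api.example.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completions payload (illustrative helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Kimi-K2.6", "Summarize MoE architectures in one sentence.")
body = json.dumps(payload)  # would be POSTed to BASE_URL with an Authorization header
print(payload["model"])  # → Kimi-K2.6
```

Because most providers in listings like this expose OpenAI-compatible endpoints, switching models is typically just a change of the `model` string.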


Moonshot AI

Text Generation

Kimi-K2.6

Kimi K2.6 is an open-source, native multimodal agentic model by Moonshot AI, achieving open-source state-of-the-art on benchmarks including HLE with tools, SWE-Bench Pro, and BrowseComp. Built on a MoE architecture with 1T total parameters and 32B activated, the model supports a 256K-token context window and multimodal inputs (image and video) via its MoonViT vision encoder. K2.6 is optimized for agentic workloads: it sustains 4,000+ tool calls over 12+ hours of continuous execution, scales to 300 parallel sub-agents × 4,000 steps per run to produce 100+ files from a single prompt, and supports both Thinking and Instant inference modes with function calling and multi-turn Preserve Thinking...

Context Length: 262K

Max Output Length: 262K

Input: $0.9 / M Tokens

Output: $4.0 / M Tokens
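Per-million-token rates like those above convert to a per-request cost with simple arithmetic. The sketch below computes that cost using the Kimi-K2.6 rates from this listing ($0.9 input, $4.0 output per million tokens); the function name is illustrative, not part of any SDK.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD given token counts and per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# Kimi-K2.6 rates from the listing: $0.9 / M input, $4.0 / M output.
cost = request_cost(500_000, 250_000, input_rate=0.9, output_rate=4.0)
print(f"${cost:.2f}")  # → $1.45
```

The same function applies to every model on this page; only the two rates change.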

Qwen

Text Generation

Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is a large language model from Alibaba's Qwen3.6 series, featuring a Mixture of Experts (MoE) architecture with 35 billion total parameters and approximately 3 billion active parameters per inference, delivering strong performance with efficient compute utilization. The model supports both thinking and non-thinking modes, offering flexible switching between rapid response and deep reasoning...

Context Length: 262K

Max Output Length: 262K

Input: $0.2 / M Tokens

Output: $1.6 / M Tokens

Qwen

Text Generation

Qwen3.6-27B

Qwen3.6-27B is the first open-weight small-to-mid-sized dense model in the Qwen3.6 series, with targeted improvements for code generation, agent workflows, and real-world development tasks. Compared with Qwen3.5-27B, it delivers clear gains in frontend development, repository-level reasoning, tool use, and complex problem solving, while adding support for preserving reasoning context across turns to reduce redundant reasoning in iterative workflows. It also supports vision understanding with a native context length of 262,144 tokens...

Context Length: 262K

Max Output Length: 262K

Input: $0.3 / M Tokens

Output: $3.2 / M Tokens

Z.ai

Text Generation

GLM-5V-Turbo

GLM-5V-Turbo is Zhipu’s latest flagship multimodal foundation model, optimized for multimodal coding and agent capabilities. It supports up to 200K tokens of image, video, and text context, and, when integrated with frameworks such as Claude Code and OpenClaw, can handle complex long-horizon programming and assistant tasks....

Context Length: 205K

Max Output Length: 131K

Input: $1.2 / M Tokens

Output: $4.0 / M Tokens

Qwen

Text Generation

Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B activated parameters. It natively supports 256K context length, extensible to approximately 1M tokens, with support for 201 languages, unified vision-language understanding, tool calling, and reasoning (thinking) mode...

Context Length: 262K

Max Output Length: 262K

Input: $0.39 / M Tokens

Output: $2.34 / M Tokens

Qwen

Text Generation

Qwen3.5-122B-A10B

Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team, with 122B total parameters and only 10B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 256K context length extensible up to ~1M tokens. Through early fusion training, it achieves unified vision-language capabilities supporting text, image, and video understanding, with strong performance across knowledge, reasoning, coding, agents, visual understanding, and multilingual benchmarks, surpassing GPT-5-mini and Qwen3-235B-A22B on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Context Length: 262K

Max Output Length: 262K

Input: $0.26 / M Tokens

Output: $2.08 / M Tokens

Qwen

Text Generation

Qwen3.5-35B-A3B

Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team, with 35B total parameters and only 3B activated. It features an efficient hybrid architecture combining Gated Delta Networks with sparse Mixture-of-Experts (MoE), natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Context Length: 262K

Max Output Length: 262K

Input: $0.24 / M Tokens

Output: $1.8 / M Tokens

Qwen

Text Generation

Qwen3.5-27B

Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 256K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding with strong performance across reasoning, coding, agents, and visual understanding benchmarks, surpassing Qwen3-235B-A22B and GPT-5-mini on multiple metrics. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Context Length: 262K

Max Output Length: 262K

Input: $0.25 / M Tokens

Output: $2.0 / M Tokens

Qwen

Text Generation

Qwen3.5-9B

Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight dense model in the Qwen3.5 series, it features an efficient hybrid architecture combining Gated Delta Networks with Gated Attention, natively supporting a 262K context length extensible up to ~1M tokens. The model achieves unified vision-language capabilities through early fusion training, supporting text, image, and video understanding. It defaults to thinking mode, supports tool calling, and covers 201 languages and dialects...

Context Length: 262K

Max Output Length: 262K

Input: $0.1 / M Tokens

Output: $0.15 / M Tokens

Moonshot AI

Text Generation

Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model, built by continued pretraining on Kimi-K2-Base over roughly 15 trillion mixed vision and text tokens. With a 1T-parameter MoE architecture (32B active) and a 256K context length, it seamlessly integrates vision and language understanding with advanced agentic capabilities, supporting Instant and Thinking modes as well as chat and agentic paradigms...

Context Length: 262K

Max Output Length: 262K

Input: $0.45 / M Tokens

Output: $2.25 / M Tokens

Google

Text Generation

gemma-4-26B-A4B-it

Gemma 4 26B is Google DeepMind's latest open-source MoE model, built on a 26B-parameter Mixture of Experts architecture that activates only 3.8B parameters during inference for exceptionally fast token throughput. Purpose-built for advanced reasoning and agentic workflows, it ranks #6 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Context Length: 262K

Max Output Length: 262K

Input: $0.12 / M Tokens

Output: $0.4 / M Tokens

Google

Text Generation

gemma-4-31B-it

Gemma 4 31B is Google DeepMind's latest open-source model, built on a 31B dense architecture from the same research foundation as Gemini 3. Purpose-built for advanced reasoning and agentic workflows, it ranks #3 among all open models on the Arena AI leaderboard — outperforming models up to 20x its size — with native function-calling, 256K context, and full Apache 2.0 licensing....

Context Length: 262K

Max Output Length: 262K

Input: $0.13 / M Tokens

Output: $0.4 / M Tokens

Z.ai

Text Generation

GLM-4.6V

GLM-4.6V achieves SOTA (state-of-the-art) visual-understanding accuracy among models of comparable parameter scale. It is the first to natively integrate function-calling capabilities into a vision model architecture, bridging the gap between visual perception and executable action, and providing a unified technical foundation for multimodal agents in real-world business scenarios. In addition, the visual context window has been extended to 128K, supporting long video-stream processing and high-resolution multi-image analysis...

Context Length: 131K

Max Output Length: 131K

Input: $0.3 / M Tokens

Output: $0.9 / M Tokens

Qwen

Text Generation

Qwen3-VL-32B-Instruct

Qwen3-VL is the vision-language model of the Qwen3 series, achieving state-of-the-art (SOTA) performance across a range of vision-language (VL) benchmarks. The model supports high-resolution image input up to megapixel scale, with strong capabilities in general visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. As part of the Qwen3 series, it inherits a strong language foundation, enabling it to understand and execute complex instructions...

Context Length: 262K

Max Output Length: 262K

Input: $0.2 / M Tokens

Output: $0.6 / M Tokens

Qwen

Text Generation

Qwen3-VL-32B-Thinking

Qwen3-VL-Thinking is a variant of the Qwen3-VL series optimized for complex visual-reasoning tasks. It introduces a "thinking mode" that generates detailed intermediate reasoning steps (a chain of thought) before producing a final answer. This design significantly improves the model's performance on visual question answering (VQA) and other vision-language tasks that require multi-step logic, planning, and in-depth analysis...

Context Length: 262K

Max Output Length: 262K

Input: $0.2 / M Tokens

Output: $1.5 / M Tokens

Qwen

Text Generation

Qwen3-VL-8B-Instruct

Qwen3-VL-8B-Instruct is the vision-language model of the Qwen3 series, demonstrating strong capabilities in general visual understanding, vision-centric dialogue, and multilingual text recognition in images...

Context Length: 262K

Max Output Length: 262K

Input: $0.18 / M Tokens

Output: $0.68 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Instruct

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video-dynamics understanding, and stronger agentic interaction. Dense and MoE architectures scale from edge to cloud, with Instruct and reasoning-enhanced Thinking variants available...

Context Length: 262K

Max Output Length: 262K

Input: $0.29 / M Tokens

Output: $1.0 / M Tokens

Qwen

Text Generation

Qwen3-VL-30B-A3B-Thinking

The Qwen3-VL series delivers superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video-dynamics understanding, and stronger agentic interaction. Dense and MoE architectures scale from edge to cloud, with Instruct and reasoning-enhanced Thinking variants available...

Context Length: 262K

Max Output Length: 262K

Input: $0.29 / M Tokens

Output: $1.0 / M Tokens

Ready to accelerate your AI development?
