Kimi K2 Thinking Now on SiliconFlow: Thinking Agent That Reasons and Acts

Nov 17, 2025

TL;DR: Kimi K2 Thinking, Moonshot AI's latest and most advanced open-source thinking model, is now available on SiliconFlow. Designed as a reasoning agent, it thinks step by step and can execute 200 to 300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. It excels at reasoning, agentic search, coding, and writing, and has strong general capabilities. Get started with Kimi K2 Thinking on SiliconFlow through OpenAI- and Anthropic-compatible APIs for seamless integration into your agents and workflows.

We're excited to welcome Kimi K2 Thinking, Moonshot AI's most advanced open-source thinking model, to SiliconFlow. Unlike traditional reasoning models that only think, it reasons and acts, autonomously chaining up to 300 tool calls (search, code, and data tools) to solve complex problems end-to-end. This marks Moonshot's breakthrough in test-time scaling: extending reasoning depth and agentic capabilities at the same time to unlock new levels of problem-solving power.

With SiliconFlow's Kimi K2 Thinking API, you can expect:

Budget-Friendly Pricing: $1.1/M tokens (input) and $4.5/M tokens (output) for Kimi K2 Thinking.
262K Context Window: Perfect for long documents, complex reasoning, and extended agentic tasks.
Outperforms GPT-5 and Claude Sonnet 4.5: on key reasoning and agentic search benchmarks, with competitive coding results.
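
As a rough illustration of the pricing above, the following sketch estimates what a single request might cost from its token counts. The helper name and the example token counts are hypothetical; only the two per-million-token rates come from the list above, and actual billing may differ.

```python
# Rates quoted above, in USD per one million tokens.
INPUT_USD_PER_M = 1.1
OUTPUT_USD_PER_M = 4.5


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, not a SiliconFlow API)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Made-up example: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $0.31
```

For long agentic sessions the input side tends to dominate, because the growing conversation is resent on every step.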

Whether you're building reasoning agents, coding copilots, or research assistants, Kimi K2 Thinking is now accessible through SiliconFlow's OpenAI/Anthropic-compatible API — ready to plug into your existing workflows.

Key Features

Kimi K2 Thinking, now available on SiliconFlow, offers the following key capabilities:

Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift. For example, when building interactive visual simulations, it coordinates reasoning with tool calls to convert high-level instructions into runnable code — greatly improving automation and reliability in complex development tasks.

Production-Ready Speed: Native INT4 quantization achieves 2x inference speed with no quality loss — important when you're running tasks that involve hundreds of operations.
Reliable Over Long Sessions: Handles 200 to 300 consecutive actions through adaptive reasoning cycles: Plan → Reason → Execute → Adapt → Refine. Unlike typical models that lose focus after 30-50 steps, it decomposes complex problems into clear subtasks and completes end-to-end workflows.
Strong General Writing: Handles creative, analytical, and personalized writing with coherent logic, vivid detail, and empathetic tone — adapting smoothly across styles without losing quality.
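
To make the interleaving of reasoning and tool calls more concrete, here is a minimal sketch of a client-side loop that could drive such a session. It assumes the OpenAI-style `tool_calls` message format; `run_agent`, `send`, and the `add` tool are hypothetical names introduced for illustration, and `send` stands in for whatever function posts the conversation to the API and returns the assistant message.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_agent(
    send: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., Any]],
    user_prompt: str,
    max_steps: int = 300,
) -> str:
    """Alternate model turns and tool executions until the model answers.

    `send` is a hypothetical callable that submits the conversation to a
    chat completions endpoint and returns the assistant message as a dict.
    """
    messages: list[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = send(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: treat this turn as the final answer.
            return reply.get("content") or ""
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")
            try:
                result = tools[name](**args)
            except Exception as exc:
                # Report the failure to the model so it can adapt its plan.
                result = f"error: {exc}"
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": json.dumps(result, default=str),
                }
            )
    raise RuntimeError(f"no final answer after {max_steps} steps")


if __name__ == "__main__":
    # Stub model for illustration only: one tool call, then a final answer.
    def stub_send(messages: list[Message]) -> Message:
        if messages[-1]["role"] == "user":
            call = {
                "id": "call_1",
                "type": "function",
                "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
            }
            return {"role": "assistant", "content": None, "tool_calls": [call]}
        return {"role": "assistant", "content": "sum=" + messages[-1]["content"]}

    print(run_agent(stub_send, {"add": lambda a, b: a + b}, "What is 2 + 3?"))
```

The `max_steps` cap mirrors the 200 to 300 step range described above and guards against a session that never converges.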

Benchmark Performance

Kimi K2 Thinking sets new records on benchmarks assessing reasoning and agentic search, outperforming leading models like GPT-5 and Claude Sonnet 4.5 on HLE, BrowseComp, and SEAL-0, and posts competitive results on coding benchmarks:

Agentic Reasoning: Achieves 44.9% on Humanity's Last Exam (HLE) with tools, a rigorous benchmark of thousands of expert-level questions across 100+ subjects.
Agentic Coding: Scores 71.3% on SWE-Bench Verified and 61.1% on SWE-Multilingual, showcasing strong generalization across programming languages and agent scaffolds. Also delivers notable improvements on HTML, React, and component-intensive front-end tasks.
Agentic Search and Browsing: Reaches 60.2% on BrowseComp, roughly double the human baseline of 29.2%.

Category	Benchmark	Kimi K2 Thinking	GPT-5 (High)	Claude Sonnet 4.5 (Thinking)
Advanced Reasoning	Humanity's Last Exam (text-only, with tools)	🥇 44.9%	41.7%	32.0%
Agentic Web Browsing	BrowseComp	🥇 60.2%	54.9%	24.1%
Complex Info Search Reasoning	SEAL-0	🥇 56.3%	51.4%	53.4%
Agentic Coding	SWE-Multilingual	61.1%	55.3%	68.0%
Agentic Coding	SWE-bench Verified	71.3%	74.9%	77.2%
Competitive Programming	LiveCodeBench V6	83.1%	87.0%	64.0%
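
To read the table at a glance, the short sketch below picks the top-scoring model for each benchmark. The scores are copied from the table above; the `leaders` helper is a hypothetical name used only for this illustration.

```python
# Scores in percent, copied from the benchmark table above.
KIMI = "Kimi K2 Thinking"
GPT5 = "GPT-5 (High)"
CLAUDE = "Claude Sonnet 4.5 (Thinking)"

SCORES = {
    "Humanity's Last Exam": {KIMI: 44.9, GPT5: 41.7, CLAUDE: 32.0},
    "BrowseComp": {KIMI: 60.2, GPT5: 54.9, CLAUDE: 24.1},
    "SEAL-0": {KIMI: 56.3, GPT5: 51.4, CLAUDE: 53.4},
    "SWE-Multilingual": {KIMI: 61.1, GPT5: 55.3, CLAUDE: 68.0},
    "SWE-bench Verified": {KIMI: 71.3, GPT5: 74.9, CLAUDE: 77.2},
    "LiveCodeBench V6": {KIMI: 83.1, GPT5: 87.0, CLAUDE: 64.0},
}


def leaders(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each benchmark to the model with the highest score."""
    return {bench: max(row, key=row.get) for bench, row in scores.items()}


if __name__ == "__main__":
    for bench, model in leaders(SCORES).items():
        print(f"{bench}: {model}")
```

On these numbers, Kimi K2 Thinking leads the reasoning and search rows, while the coding rows are led by the other two models.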

Developer-Ready Integration

Beyond Kimi K2 Thinking's industry-leading performance, SiliconFlow delivers instant compatibility with your existing development ecosystem:

OpenAI-Compatible Tools: Seamless integration with Cline, Qwen Code, Gen-CLI, and other standard development environments—just plug in your SiliconFlow API key.
Anthropic-Compatible API: Works with Claude Code and any Anthropic-compatible tools for code reviews, debugging, and architectural refactoring.
Platform Integrations: Ready-to-use in Dify, ChatHub, Chatbox, Sider, MindSearch, DB-GPT, and also available through OpenRouter.
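
Because the API is OpenAI-compatible, an OpenAI client library can typically be pointed at SiliconFlow by changing its base URL. The sketch below uses the `openai` Python package and rests on two assumptions: the base URL is the chat completions URL from this post with the `/chat/completions` suffix removed, and the API key lives in an environment variable named `SILICONFLOW_API_KEY` (a hypothetical name). The helpers `make_client` and `ask` are illustrative, not part of any SDK.

```python
import os

# Assumption: derived from https://api.siliconflow.com/v1/chat/completions.
SILICONFLOW_BASE_URL = "https://api.siliconflow.com/v1"
MODEL = "moonshotai/Kimi-K2-Thinking"


def make_client():
    """Build an OpenAI SDK client that targets SiliconFlow."""
    from openai import OpenAI  # requires `pip install openai`

    return OpenAI(
        api_key=os.environ["SILICONFLOW_API_KEY"],  # hypothetical variable name
        base_url=SILICONFLOW_BASE_URL,
    )


def ask(client, prompt: str) -> str:
    """Send a single user message and return the assistant's text."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

With a key set in the environment, `ask(make_client(), "Summarize test-time scaling in two sentences.")` would send one request to the model.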

With powerful models, seamless integrations, and cost-effective pricing, SiliconFlow transforms how you build—letting you ship faster and scale smarter.

Get Started Immediately

Explore: Try Kimi K2 Thinking in the SiliconFlow Playground.
Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.

import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Ask the model for a JSON object that follows the schema given in the prompt.
payload = {
    "model": "moonshotai/Kimi-K2-Thinking",
    "messages": [
        {
            "role": "user",
            "content": "Please provide information about a person in the following JSON format: {   \"name\": \"string\",   \"age\": \"number\",   \"occupation\": \"string\",   \"hobbies\": [\"string\"] }  Generate a realistic example."
        }
    ],
    "max_tokens": 4096,
    "temperature": 0.7,
    "response_format": {"type": "json_object"}
}
headers = {
    # Replace <token> with your SiliconFlow API key.
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

# Thinking models can take a while to respond, so allow a generous timeout.
response = requests.post(url, json=payload, headers=headers, timeout=300)
response.raise_for_status()  # raise an exception on HTTP errors

print(response.text)
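
The request above prints the raw response body. As a follow-up, this sketch shows one way to pull the JSON answer out of it, assuming the response follows the OpenAI chat completions shape (`choices[0].message.content`). The `reasoning_content` field and the `extract_json_answer` helper are assumptions made for illustration, and the sample body is hand-written, not real model output.

```python
import json
from typing import Any


def extract_json_answer(body: dict[str, Any]) -> tuple[dict[str, Any], str]:
    """Return (parsed JSON answer, reasoning text) from a response body."""
    message = body["choices"][0]["message"]
    # Assumption: a thinking model may return its reasoning in a separate
    # `reasoning_content` field; use an empty string when it is absent.
    reasoning = message.get("reasoning_content") or ""
    answer = json.loads(message["content"])
    return answer, reasoning


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape.
    sample = {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": '{"name": "Ada", "age": 36, "occupation": "engineer", "hobbies": ["chess"]}',
                    "reasoning_content": "The user wants one realistic person.",
                }
            }
        ]
    }
    person, thoughts = extract_json_answer(sample)
    print(person["name"], person["age"])
```

In the script above, the same helper could be applied to `response.json()` in place of printing `response.text`.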
