AI Model Library

Run inference on 200+ cutting-edge AI models through a single API and deploy in seconds.

Tencent

Text Generation

Hunyuan-A13B-Instruct

Release date: June 30, 2025

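Hunyuan-A13B-Instruct activates only 13B of its 80B parameters yet matches much larger LLMs on mainstream benchmarks. It offers hybrid reasoning: a low-latency "fast" mode or a high-precision "slow" mode, switchable per call. Its native 256K-token context lets it digest book-length documents without degradation. Its agent skills are tuned for leading results on BFCL-v3, τ-Bench, and C3-Bench, making it a strong backbone for autonomous assistants. Grouped-query attention and multi-format quantization support memory-light, GPU-efficient inference in production deployments, while built-in multilingual support and robust safety alignment make it suitable for enterprise-grade applications....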

Total Context: 131K
Max output: 131K
Input: 0.14 / M Tokens (text)
Output: 0.57 / M Tokens
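Because the listing prices input and output tokens separately per million tokens, the cost of a single request can be estimated with simple arithmetic. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost` and the example token counts are hypothetical, the page does not state the currency, and actual billing may round or meter differently.

```python
# Listed per-million-token prices for Hunyuan-A13B-Instruct.
# Assumption: the page does not state the currency, so results are in "listing units".
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.57


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_PRICE_PER_M,
    output_price: float = OUTPUT_PRICE_PER_M,
) -> float:
    """Estimate the charge for one request from its token counts.

    Hypothetical helper: prices are per one million tokens, so the
    weighted token total is divided by 1,000,000.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    cost = estimate_cost(input_tokens=100_000, output_tokens=2_000)
    print(f"Estimated cost: {cost:.5f}")
```

Passing other values for `input_price` and `output_price` lets the same helper be reused with the rates listed for any other model on the page.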

Tencent

Text Generation

Hy3

Release date: June 26, 2026

Built for real-world business scenarios, Hy3 features a 295B/21B active MoE architecture, native 256K context support, and three reasoning modes. It enhances coding, long-form comprehension, multi-turn dialogue, and agentic task execution, balancing reliability, efficiency, and cost across both high-frequency interactions and complex workflows....

Total Context: 262K
Max output: 262K
Input: 0.0 / M Tokens (text)
Output: 0.0 / M Tokens

Tencent

Text Generation

Hy3-preview

Release date: April 7, 2026

Hy3 preview is a 295B-parameter Mixture-of-Experts (MoE) language model from Tencent Hunyuan, built for production-grade agent workloads. With only 21B parameters activated per token and native 256K context support, it handles complex tasks like cross-file code refactoring, long-document analysis, and multi-step tool use, rather than just generating fluent dialogue. Hy3 scores near state-of-the-art on SWE-bench Verified and advanced STEM benchmarks, while offering three inference modes (no_think, think_low, think_high) to dynamically trade off latency and reasoning depth. Its sparse activation architecture delivers competitive intelligence at a significantly lower token cost....

Total Context: 262K
Max output: 262K
Input: 0.066 / M Tokens (text)

Output:

0.26