FunAudioLLM/CosyVoice2-0.5B

About FunAudioLLM/CosyVoice2-0.5B

CosyVoice 2 is a streaming speech synthesis model built on a large language model, with a unified framework for streaming and non-streaming synthesis. It improves utilization of the speech token codebook through finite scalar quantization (FSQ), simplifies the text-to-speech language model architecture, and introduces a chunk-aware causal flow matching model that supports different synthesis scenarios. In streaming mode, the model achieves latency as low as 150 ms while keeping synthesis quality almost identical to that of non-streaming mode. Compared with version 1.0, the pronunciation error rate is reduced by 30% to 50%, the MOS score improves from 5.4 to 5.53, and fine-grained control over emotions and dialects is supported. The model supports Chinese (including the Cantonese, Sichuan, Shanghainese, and Tianjin dialects, among others), English, Japanese, and Korean, and it handles cross-lingual and mixed-language scenarios.
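To make the FSQ idea mentioned above more concrete, here is a minimal, generic sketch of finite scalar quantization in Python. It is not CosyVoice 2's actual tokenizer: the function names, the `tanh` bounding, and the level counts are illustrative assumptions. The point it shows is that each dimension is rounded independently to a small set of integer levels, so every combination of levels is a valid code and the implicit codebook has size equal to the product of the level counts.

```python
import math
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Round each dimension of z to one of levels[i] integer values.

    Illustrative sketch only: handles odd level counts, where the
    allowed values are -half..half with half = (levels[i] - 1) // 2.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels < 1 or n_levels % 2 == 0:
            raise ValueError("this sketch supports odd level counts only")
        half = (n_levels - 1) // 2
        bounded = half * math.tanh(value)  # squash into (-half, half)
        codes.append(round(bounded))       # snap to the nearest integer level
    return codes


def fsq_index(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Map per-dimension codes to a single token index (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)
    return index


def codebook_size(levels: Sequence[int]) -> int:
    """Size of the implicit codebook: the product of the level counts."""
    return math.prod(levels)


if __name__ == "__main__":
    levels = [3, 3, 3]  # hypothetical configuration, 27 possible codes
    codes = fsq_quantize([0.0, 10.0, -10.0], levels)
    print(codes, fsq_index(codes, levels), codebook_size(levels))
```

Because the codebook is implicit in the rounding grid rather than learned as a table of vectors, no entry can go unused by construction, which is one intuition for the improved codebook utilization described above.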

Available Serverless

Run queries immediately, pay only for usage

Price: $7.15 / M UTF-8 bytes
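Since the price is quoted per million UTF-8 bytes rather than per character or token, the cost of a request depends on how the input text encodes. The sketch below estimates cost from the listed rate of $7.15 per million UTF-8 bytes. The helper names are hypothetical, and it assumes billing is based on the encoded size of the input text alone; actual billing rules may differ.

```python
PRICE_PER_MILLION_BYTES_USD = 7.15  # rate listed on this page


def utf8_byte_count(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(text: str, price: float = PRICE_PER_MILLION_BYTES_USD) -> float:
    """Estimated cost in USD, assuming cost scales linearly with UTF-8 bytes."""
    return utf8_byte_count(text) / 1_000_000 * price


if __name__ == "__main__":
    for sample in ["hello", "你好"]:
        print(sample, utf8_byte_count(sample), estimate_cost_usd(sample))
```

Note that ASCII characters take one byte each in UTF-8, while most Chinese, Japanese, and Korean characters take three, so text of the same character length costs more in those languages.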

Metadata

Created on: Dec 16, 2024
License: not listed
Provider: FunAudioLLM
HuggingFace: FunAudioLLM/CosyVoice2-0.5B

Specification

State: Available
Architecture: LLM-based TTS
Calibrated: Yes
Mixture of Experts: No
Total Parameters: 1B
Activated Parameters: 0.5B
Reasoning: No
Precision: FP8
Context length: 0K
Max Tokens: not listed

Supported Functionality

Serverless: Supported
Serverless LoRA: Not supported
Fine-tuning: Not supported
Embeddings: Not supported
Rerankers: Not supported
Image input: Not supported
JSON Mode: Not supported
Structured Outputs: Not supported
Tools: Not supported
FIM Completion: Not supported
Chat Prefix Completion: Not supported

© 2025 SiliconFlow
