Fish-Speech-1.5 API, Fine-Tuning, Deployment

fishaudio/fish-speech-1.5

Fish Speech V1.5 is a leading open-source text-to-speech (TTS) model. It employs an innovative DualAR architecture built on a dual autoregressive transformer design. The model supports multiple languages, trained on over 300,000 hours of data each for English and Chinese and over 100,000 hours for Japanese. In independent evaluations on TTS Arena, it performed exceptionally well, with an Elo score of 1339. The model achieved a word error rate (WER) of 3.5% and a character error rate (CER) of 1.2% for English, and a CER of 1.3% for Chinese characters.
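To make the DualAR design concrete, the following toy, self-contained Python sketch shows the shape of a dual autoregressive decode loop: a slow transformer advances once per audio frame to capture global structure, and a fast transformer autoregressively fills in that frame's codebook tokens for local acoustic detail. The stub models and every name below are illustrative only, not Fish Speech's actual implementation.

import numpy as np

rng = np.random.default_rng(0)

def slow_step(frames: list) -> np.ndarray:
    # Stub "slow" transformer: one hidden vector per audio frame.
    return rng.standard_normal(16)

def fast_step(hidden: np.ndarray, codes: list) -> int:
    # Stub "fast" transformer: next codebook token within the frame.
    return int(rng.integers(0, 1024))

def dualar_decode(n_frames: int = 4, n_codebooks: int = 8) -> list:
    frames = []
    for _ in range(n_frames):
        hidden = slow_step(frames)                  # frame-level AR pass
        codes = []
        for _ in range(n_codebooks):
            codes.append(fast_step(hidden, codes))  # intra-frame AR pass
        frames.append(codes)
    return frames  # a codec decoder would turn these tokens into audio

print(dualar_decode())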

API Usage

curl --request POST \
  --url https://api.siliconflow.com/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "input": "Can you say it with a happy emotion? <|endofprompt|>I'\''m so happy, Spring Festival is coming!",
  "response_format": "mp3",
  "stream": true,
  "speed": 1,
  "gain": 0,
  "model": "fishaudio/fish-speech-1.5"
}'
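For programmatic use, the same request can be sent from Python. Below is a minimal sketch with the requests library, assuming the endpoint and payload from the curl example above and an API key in a SILICONFLOW_API_KEY environment variable; the synthesize helper is illustrative, not an official SDK function.

import os
import requests

API_URL = "https://api.siliconflow.com/v1/audio/speech"

def synthesize(text: str, out_path: str = "output.mp3") -> None:
    # Same payload as the curl example above.
    payload = {
        "model": "fishaudio/fish-speech-1.5",
        "input": text,
        "response_format": "mp3",
        "stream": True,
        "speed": 1,
        "gain": 0,
    }
    headers = {"Authorization": f"Bearer {os.environ['SILICONFLOW_API_KEY']}"}
    # stream=True writes audio chunks to disk as they arrive instead of
    # buffering the whole response in memory.
    with requests.post(API_URL, json=payload, headers=headers,
                       stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)

synthesize("Can you say it with a happy emotion? <|endofprompt|>I'm so happy, Spring Festival is coming!")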

Details

Model Provider: fishaudio
Type: audio
Sub Type: text-to-speech
Publish Time: Nov 29, 2024
Price: $15 / M UTF-8 bytes
Tags: Multilingual
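Billing is by the UTF-8 byte count of the input text, so multi-byte scripts such as Chinese cost more per character than ASCII. A quick back-of-the-envelope sketch of the cost calculation at the listed rate (the helper name is illustrative):

# Estimate synthesis cost at $15 per 1,000,000 UTF-8 bytes (rate listed above).
def estimate_cost(text: str) -> float:
    n_bytes = len(text.encode("utf-8"))  # most Chinese characters are 3 bytes
    return n_bytes / 1_000_000 * 15.0

print(estimate_cost("Hello, world!"))  # 13 bytes -> $0.000195
print(estimate_cost("春节快到了"))      # 15 bytes -> $0.000225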

Compare with Other Models

See how this model stacks up against others.

text-to-speech

FunAudioLLM/CosyVoice2-0.5B

CosyVoice 2 is a streaming speech synthesis model built on a large language model, with a unified streaming/non-streaming framework design. The model improves utilization of the speech token codebook through finite scalar quantization (FSQ), simplifies the text-to-speech language model architecture, and develops a chunk-aware causal streaming matching model that supports different synthesis scenarios. In streaming mode it achieves ultra-low latency of 150 ms while maintaining synthesis quality nearly identical to non-streaming mode. Compared to version 1.0, the pronunciation error rate is reduced by 30%-50% and the MOS score improves from 5.4 to 5.53, with fine-grained control over emotions and dialects. The model supports Chinese (including the Cantonese, Sichuanese, Shanghainese, and Tianjin dialects, among others), English, Japanese, and Korean, and handles cross-lingual and mixed-language scenarios.

Tags: Multilingual, 0.5B
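The finite scalar quantization (FSQ) mentioned above can be illustrated in a few lines: each latent dimension is bounded and rounded to one of a small number of levels, so the implicit codebook is the product grid of per-dimension levels and every code is used without a learned codebook. A toy NumPy sketch follows; the level counts are assumptions for illustration, not CosyVoice 2's actual configuration.

import numpy as np

LEVELS = np.array([7, 5, 5, 5])  # illustrative per-dimension level counts

def fsq_quantize(z: np.ndarray) -> np.ndarray:
    # Bound each dimension to (-(L-1)/2, (L-1)/2) with tanh, then round:
    # the result is one integer code per dimension.
    half = (LEVELS - 1) / 2
    return np.round(np.tanh(z) * half)

codes = fsq_quantize(np.random.randn(4))
# Flatten the per-dimension codes into a single token id on the 7*5*5*5 grid.
token_id = 0
for c, L in zip(codes, LEVELS):
    token_id = token_id * L + int(c + (L - 1) // 2)
print(codes, token_id)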

Ready to accelerate your AI development?

© 2025 SiliconFlow Technology PTE. LTD.