Fish-Speech-1.5
fishaudio/fish-speech-1.5
Fish Speech V1.5 is a leading open-source text-to-speech (TTS) model built on the DualAR architecture, a dual autoregressive transformer design. It supports multiple languages and was trained on over 300,000 hours of data each for English and Chinese, and over 100,000 hours for Japanese. In independent evaluations by TTS Arena, the model reached an Elo score of 1339. It achieved a word error rate (WER) of 3.5% and a character error rate (CER) of 1.2% for English, and a CER of 1.3% for Chinese.
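For readers unfamiliar with the WER and CER figures quoted above, the sketch below shows the standard definition: the edit distance between a reference transcript and a recognized transcript, divided by the reference length, counted in words for WER and in characters for CER. The listing does not say how the published numbers were produced (which recognizer transcribed the audio, or how text was normalized), so the function names and the whitespace tokenization here are illustrative assumptions, not the evaluation pipeline itself.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming simple whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters (no normalization applied)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

CER is the usual choice for Chinese because written Chinese has no whitespace word boundaries, which is likely why only a CER is reported for that language.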
Details
Model Provider: fishaudio
Type: audio
Sub Type: text-to-speech
Publish Time: Nov 29, 2024
Price: $15 / M UTF-8 bytes
Tags: Multilingual
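Because the price is quoted at $15 per million UTF-8 bytes rather than per character, the cost of a request depends on the script: ASCII text takes one byte per character, while Chinese and Japanese characters typically take three. The sketch below estimates cost from input text under the assumption that billing is strictly proportional to the UTF-8 byte length of the input, with no minimum charge or rounding. The listing does not state those details, and the function names are hypothetical.

```python
# Listed price: USD 15 per 1,000,000 UTF-8 bytes of input text.
PRICE_PER_MILLION_BYTES_USD = 15.0


def utf8_byte_count(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(
    text: str, price_per_million: float = PRICE_PER_MILLION_BYTES_USD
) -> float:
    """Estimated cost, assuming purely proportional per-byte billing."""
    return utf8_byte_count(text) * price_per_million / 1_000_000


if __name__ == "__main__":
    for sample in ("Hello, world!", "你好，世界！"):
        print(sample, utf8_byte_count(sample), estimate_cost_usd(sample))
```

Under this assumption, a passage of a given character count costs roughly three times as much in Chinese or Japanese as in plain English.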