Fish-Speech-1.5

fishaudio/fish-speech-1.5

About Fish-Speech-1.5

Fish Speech V1.5 is a leading open-source text-to-speech (TTS) model. It uses the DualAR architecture, a dual autoregressive transformer design. It supports multiple languages and was trained on over 300,000 hours of data each for English and Chinese and over 100,000 hours for Japanese. In independent evaluations by TTS Arena, the model achieved an Elo score of 1339. It reached a word error rate (WER) of 3.5% and a character error rate (CER) of 1.2% for English, and a CER of 1.3% for Chinese.
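To make the WER and CER figures concrete, here is a minimal sketch of how these metrics are conventionally defined: the edit distance between a reference text and a transcript, divided by the reference length. The page does not say how its numbers were measured, so the function names and the choice to ignore spaces in CER are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by reference length.

    Assumption: spaces are ignored, which suits languages such as Chinese.
    """
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For a TTS model, the hypothesis is typically a transcript of the synthesized audio produced by a speech recognizer, and the reference is the text that was sent for synthesis.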

Available Serverless

Run queries immediately, pay only for usage

$15.0 per 1M UTF-8 bytes
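Because pricing is per UTF-8 byte rather than per character, the cost of a request depends on the script of the input text. The following is a rough sketch of the arithmetic, assuming the listed rate of $15.0 per 1M bytes applies linearly with no minimum charge or rounding; the function name is hypothetical and not part of any provider API.

```python
PRICE_USD_PER_MILLION_BYTES = 15.0  # rate listed on this page


def estimate_cost_usd(text: str) -> float:
    """Estimate the synthesis cost of `text` from its UTF-8 encoded size.

    Assumption: billing is linear in bytes, with no minimum or rounding.
    """
    n_bytes = len(text.encode("utf-8"))
    return n_bytes * PRICE_USD_PER_MILLION_BYTES / 1_000_000


if __name__ == "__main__":
    for sample in ("Hello, world!", "你好，世界！"):
        size = len(sample.encode("utf-8"))
        print(f"{sample!r}: {size} bytes, ${estimate_cost_usd(sample):.6f}")
```

ASCII characters take one byte each in UTF-8, while most Chinese and Japanese characters take three, so the same number of characters can cost roughly three times as much in those languages.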

Metadata

Created on: Nov 29, 2024
License:
Provider: Fish Audio

HuggingFace

Specification

State: Available
Architecture:
Calibrated: No
Mixture of Experts: No
Total Parameters:
Activated Parameters:
Reasoning: No
Precision: FP8
Context length: 0K
Max Tokens:

Supported Functionality

Serverless: Supported
Serverless LoRA: Not supported
Fine-tuning: Not supported
Embeddings: Not supported
Rerankers: Not supported
Image input: Not supported
JSON Mode: Not supported
Structured Outputs: Not supported
Tools: Not supported
FIM Completion: Not supported
Chat Prefix Completion: Not supported
