
Parler TTS

Parler TTS is a lightweight text-to-speech model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

Trained on 45,000 hours of narrated English audiobooks, Parler TTS offers speaker consistency across generations with 34 characterized speakers that can be specified by name.

Tips for Better Results

  • Include the term 'very clear audio' to generate the highest-quality audio, and 'very noisy audio' for high levels of background noise
  • Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech
  • The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the text description
  • To ensure speaker consistency, specify which speaker to use by name: 'Jon's voice is monotone...'
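The tips above boil down to writing a voice description that mentions each controllable feature. A minimal sketch of a description builder follows; the function name, its parameters and the sentence template are hypothetical, and only the 'very clear audio' / 'very noisy audio' keywords come from the tips.

```python
def build_description(
    gender: str = "female",
    pitch: str = "moderate",
    speaking_rate: str = "moderate",
    reverberation: str = "very close-sounding",
    noisy: bool = False,
) -> str:
    """Return a natural-language voice description (hypothetical helper, not a library API)."""
    # Quality keywords quoted in the tips: 'very clear audio' vs 'very noisy audio'.
    quality = "very noisy audio" if noisy else "very clear audio"
    # One clause per controllable feature: gender, speaking rate, pitch, reverberation.
    return (
        f"A {gender} speaker delivers speech at a {speaking_rate} speed "
        f"with a {pitch} pitch. The recording is {reverberation}, "
        f"with {quality}."
    )


if __name__ == "__main__":
    print(build_description(gender="male", speaking_rate="fast"))
```

The description only shapes the voice; prosody is steered separately through punctuation in the text to be spoken, e.g. a transcript such as "Hey, how are you doing today?" where the comma adds a short pause.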

Parler TTS Features

Discover what makes Parler TTS a powerful and versatile text-to-speech solution

High-Fidelity Speech

Generates remarkably natural-sounding speech with high audio quality and clarity.

Speaker Consistency

Maintains consistent speaker characteristics across multiple generations using 34 predefined speakers.

Controllable Features

Control gender, background noise, speaking rate, pitch, and reverberation through simple text prompts.

Optimized Inference

Supports scaled dot-product attention (SDPA), torch.compile, batching and streaming for faster generation.
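To illustrate two of these options, here is a sketch that loads a checkpoint with SDPA attention and groups requests into padded batches. It assumes the `parler_tts` package exposes `ParlerTTSForConditionalGeneration`, that `from_pretrained` accepts the standard transformers `attn_implementation="sdpa"` argument, and that `generate` accepts `attention_mask` and `prompt_attention_mask` for padded inputs; verify these against the Parler-TTS inference guide. `make_batches`, `load_model` and `generate_batch` are hypothetical helpers.

```python
from typing import List, Tuple

Request = Tuple[str, str]  # (text to speak, voice description)


def make_batches(requests: List[Request], batch_size: int) -> List[List[Request]]:
    """Sort by text length so each batch pads to a similar length, then chunk (hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    ordered = sorted(requests, key=lambda r: len(r[0]))  # stable sort keeps ties in input order
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def load_model(repo_id: str, device: str = "cpu"):
    # Heavy imports are deferred so make_batches works without torch/transformers installed.
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    # Assumption: SDPA is selected through the usual transformers keyword.
    model = ParlerTTSForConditionalGeneration.from_pretrained(
        repo_id, attn_implementation="sdpa"
    ).to(device)
    prompt_tokenizer = AutoTokenizer.from_pretrained(repo_id)
    description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
    return model, prompt_tokenizer, description_tokenizer


def generate_batch(model, prompt_tokenizer, description_tokenizer, batch, device="cpu"):
    # Pad texts and descriptions separately and pass both masks so padding is ignored.
    desc = description_tokenizer([d for _, d in batch], return_tensors="pt", padding=True).to(device)
    text = prompt_tokenizer([p for p, _ in batch], return_tensors="pt", padding=True).to(device)
    return model.generate(
        input_ids=desc.input_ids,
        attention_mask=desc.attention_mask,
        prompt_input_ids=text.input_ids,
        prompt_attention_mask=text.attention_mask,
    )
```

Sorting by length is a generic way to limit wasted padding; batched outputs are likely padded to the longest clip, so check the inference guide for how to trim each sample before saving it.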

Fully Open-Source

All datasets, pre-processing, training code, and weights are released publicly under the Apache 2.0 license.

Fine-Tuning Support

Comprehensive documentation for training and fine-tuning your own Parler TTS models.

About Parler TTS

Parler TTS is a reproduction of work from the paper 'Natural language guidance of high-fidelity text-to-speech with synthetic annotations' by Dan Lyth and Simon King, from Stability AI and the University of Edinburgh, respectively.

Unlike other TTS models, Parler TTS is a fully open-source release. All of the datasets, pre-processing, training code, and weights are released publicly under a permissive license, enabling the community to build on this work and develop their own powerful TTS models.

The latest checkpoints, Parler-TTS Mini v1.1 and Large v1, are trained on 45,000 hours of narrated audio and introduce speaker consistency across generations with 34 characterized speakers that can be specified by name (e.g., Jon, Lea, Gary, Jenna, Mike, Laura).

Available Models

  • Parler-TTS Mini v1.1 - Faster generation with high-quality results
  • Parler-TTS Large v1 - Higher quality but slower generation
  • Parler-TTS Mini Expresso - Fine-tuned for expressive, voice-consistent generations
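The choice between these checkpoints can be sketched as a small lookup followed by a single generation call. The repo IDs are assumed to be the ones published under the `parler-tts` organization on the Hugging Face Hub (verify before use), and `synthesize` assumes `parler_tts`, `transformers`, `torch` and `soundfile` are installed. `CHECKPOINTS`, `pick_checkpoint` and `synthesize` are hypothetical helpers.

```python
# Assumed Hugging Face repo IDs for the checkpoints listed above.
CHECKPOINTS = {
    "mini-v1.1": "parler-tts/parler-tts-mini-v1.1",          # faster generation
    "large-v1": "parler-tts/parler-tts-large-v1",            # higher quality, slower
    "mini-expresso": "parler-tts/parler-tts-mini-expresso",  # expressive fine-tune
}


def pick_checkpoint(prefer_quality: bool = False, expressive: bool = False) -> str:
    """Choose a repo ID from the speed/quality trade-off (hypothetical helper)."""
    if expressive:
        return CHECKPOINTS["mini-expresso"]
    return CHECKPOINTS["large-v1" if prefer_quality else "mini-v1.1"]


def synthesize(prompt: str, description: str, repo_id: str, out_path: str = "out.wav") -> str:
    # Heavy imports are deferred so the lookup above can be used on its own.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
    # The text to speak and the voice description may use different tokenizers
    # (as in the Mini v1.1 model card), so the description tokenizer comes from the text encoder.
    prompt_tokenizer = AutoTokenizer.from_pretrained(repo_id)
    description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

    input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)  # write at the model's sampling rate
    return out_path
```

Keeping the lookup separate from `synthesize` makes it easy to switch between the faster Mini and the slower, higher-quality Large checkpoint without touching the generation code.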

Learn more about Parler TTS on GitHub


Frequently Asked Questions

Common questions about Parler TTS

1. What is Parler TTS?

Parler TTS is a lightweight text-to-speech model that generates high-quality, natural-sounding speech with controllable features like gender, speaking rate, pitch, and more. It's a reproduction of work from the paper 'Natural language guidance of high-fidelity text-to-speech with synthetic annotations' by Dan Lyth and Simon King.

2. How do I control the speaker's voice?

Parler TTS supports 34 characterized speakers that can be specified by name (e.g., Jon, Lea, Gary, Jenna, Mike, Laura). Simply adapt your text description to specify which speaker to use, for example: 'Jon's voice is monotone...'
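A sketch of pinning a description to a named speaker follows. Only the six example names quoted above are listed; the full set of 34 is documented in the Parler-TTS repository and can be passed in through `known`. `EXAMPLE_SPEAKERS` and `speaker_description` are hypothetical helpers.

```python
from typing import Iterable

# The six example names quoted above; the full list of 34 lives in the Parler-TTS repository.
EXAMPLE_SPEAKERS = frozenset({"Jon", "Lea", "Gary", "Jenna", "Mike", "Laura"})


def speaker_description(
    name: str,
    traits: str = "monotone",
    known: Iterable[str] = EXAMPLE_SPEAKERS,
) -> str:
    """Build a description that names the speaker explicitly (hypothetical helper)."""
    # Reject names outside the supplied list so a typo does not silently yield a random voice.
    if name not in set(known):
        raise ValueError(f"unknown speaker {name!r}; pass the full speaker list via `known`")
    return f"{name}'s voice is {traits}, with very clear audio."


if __name__ == "__main__":
    print(speaker_description("Jon"))
```

Reusing the same description string for every piece of text is what keeps the voice consistent across generations; only the text to be spoken changes between calls.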

3. What models are available?

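Parler TTS offers two main models: Parler-TTS Mini v1.1 (faster) and Parler-TTS Large v1 (higher quality but slower). Both are trained on 45,000 hours of narrated English audiobooks. A third checkpoint, Parler-TTS Mini Expresso, is fine-tuned for expressive, voice-consistent generations.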

4. How can I improve the quality of generated speech?

Include terms like 'very clear audio' to generate the highest-quality audio. Use punctuation to control prosody (e.g., commas for small breaks). Control features such as gender, speaking rate, pitch and reverberation directly through the text description.

5. Can I fine-tune Parler TTS for my own use case?

Yes, Parler TTS is fully open-source, and comprehensive documentation is available for training and fine-tuning your own models. Check out the Parler-TTS repository on GitHub for guides and examples.

6. What license is Parler TTS released under?

The Parler-TTS codebase and its associated checkpoints are licensed under Apache 2.0, enabling the community to build on this work and develop their own powerful TTS models.