Mistral just open-sourced a text-to-speech model that beats ElevenLabs

Mistral AI has released Voxtral TTS, a high-performance, 4-billion-parameter open-weights text-to-speech model that competes directly with proprietary tools such as ElevenLabs.

It is free and runs locally on 3 GB of RAM. It supports nine languages, clones a voice from as little as 3 seconds of reference audio with high speaker similarity, and delivers sub-second latency suitable for on-device applications.

Key Features of Voxtral TTS:

Performance: Achieved high win rates in human evaluation against top competitors, with superior speaker similarity.
Efficiency: The 4B model is lightweight enough to run on consumer hardware such as laptops and consumer GPUs.
Voice Cloning: Needs only 3-5 seconds of reference audio and supports cross-lingual voice adaptation.
Capabilities: Generates highly emotive, expressive, and natural-sounding speech across nine languages including English, German, Spanish, and Hindi.
License: Released under an open-source, permissive license (Apache 2.0), making it available for developers to deploy freely.

This release is part of Mistral's strategy to move into audio and provide open-source alternatives to premium voice AI services.

Source: Mistral
