Mistral AI has released Voxtral TTS, a high-performance, 4-billion-parameter open-weights text-to-speech model that competes directly with proprietary tools like ElevenLabs.

It is free to use and runs locally in about 3 GB of RAM. It supports nine languages, clones a voice from as little as three seconds of reference audio with high speaker similarity, and responds with sub-second latency, which makes it suitable for on-device applications.
Key Features of Voxtral TTS:
- Performance: Achieved high win rates in human evaluation against top competitors, with superior speaker similarity.
- Efficiency: The 4B model is lightweight enough to run on consumer hardware such as laptops and consumer-grade GPUs.
- Voice Cloning: Needs only 3-5 seconds of reference audio to clone a voice and supports cross-lingual voice adaptation.
- Capabilities: Generates highly emotive, expressive, and natural-sounding speech across nine languages including English, German, Spanish, and Hindi.
- License: Released under an open-source, permissive license (Apache 2.0), making it available for developers to deploy freely.
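As a rough sanity check on the 3 GB figure quoted above, the memory needed for a model's weights can be estimated as parameter count times bits per parameter. The sketch below is an illustration, not Mistral's published breakdown: the article does not say which numeric precision the 3 GB figure assumes, so the precisions listed are assumptions, and the estimate ignores activations, audio buffers, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 4e9  # 4 billion parameters, as stated in the article

# Illustrative precisions (assumed; the article does not specify one).
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, 16-bit weights alone would take about 8 GB and 4-bit weights about 2 GB, so a footprint near 3 GB would be consistent with some form of low-bit quantization rather than full-precision weights.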
This release is part of Mistral's strategy to move into audio and provide open-source alternatives to premium voice AI services.
Source: Mistral