Tech

Mistral just open-sourced a text-to-speech model that beats ElevenLabs

Mistral AI has released Voxtral TTS, a 4-billion-parameter open-weights text-to-speech model that competes directly with proprietary tools like ElevenLabs.

It is free to use and runs locally in about 3 GB of RAM. It supports nine languages, clones a voice from as little as three seconds of reference audio with high speaker similarity, and delivers sub-second latency suitable for on-device applications.

Key Features of Voxtral TTS:

  • Performance: Achieves high win rates in human evaluations against top competitors, with superior speaker similarity.
  • Efficiency: The 4B model is lightweight enough to run on consumer hardware such as laptops and consumer GPUs.
  • Voice Cloning: Requires only 3-5 seconds of reference audio to clone a voice and supports cross-lingual voice adaptation.
  • Capabilities: Generates highly emotive, expressive, and natural-sounding speech across nine languages including English, German, Spanish, and Hindi.
  • License: Released under an open-source, permissive license (Apache 2.0), making it available for developers to deploy freely.
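To put the 4B-parameter and 3 GB figures in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common numeric precisions. The precisions listed and the helper names are illustrative assumptions, not details from Mistral; the estimate also ignores activations, audio buffers, and runtime overhead.

```python
# Rough weight-memory estimate; helper names are hypothetical, not part of any Mistral tooling.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def max_bits_per_param(num_params: float, budget_gb: float) -> float:
    """Largest average bits per parameter that fits the weights in budget_gb."""
    return budget_gb * 1e9 * 8 / num_params


PARAMS = 4e9      # parameter count quoted in the article
BUDGET_GB = 3.0   # RAM figure quoted in the article

# Assumed precisions for comparison; the article does not say which one is used.
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gb(PARAMS, bits):.1f} GB")

print(f"Bits per parameter that fit in {BUDGET_GB:.0f} GB: "
      f"{max_bits_per_param(PARAMS, BUDGET_GB):.1f}")
```

Under these assumptions, 16-bit weights would need about 8 GB, so a 3 GB footprint would imply an average of roughly 6 bits per parameter or less, for example through quantization. The source does not say how the figure is reached.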

This release is part of Mistral's strategy to move into audio and provide open-source alternatives to premium voice AI services.

Source: Mistral
