From Sound to Text in Real Time: Understanding Voxtral Realtime
Most speech-to-text models work like a translator who reads an entire letter before responding: they need the full audio clip before they can produce any text. But what if a model could transcribe speech as it is being spoken, word by word, with barely any delay? That is exactly what Voxtral Mini 4B Realtime does. Released by Mistral AI under the Apache 2.0 license, it is a 4-billion-parameter model that transcribes audio in real time with delays as low as 80 milliseconds....
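To make the batch-versus-streaming contrast concrete, here is a minimal sketch of the latency arithmetic. It is not Voxtral's API. It assumes 16 kHz mono audio (an illustrative choice), ignores model compute time, and uses hypothetical helper names (`samples_for_ms`, `stream_chunks`, `word_latency_ms`). It shows how many samples an 80 ms chunk holds and how long a word spoken early in a clip waits for its text under each approach.

```python
from typing import Iterator, List, Optional

# Assumption for illustration only: 16 kHz mono audio.
SAMPLE_RATE_HZ = 16_000


def samples_for_ms(ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples that cover `ms` milliseconds."""
    return sample_rate_hz * ms // 1000


def stream_chunks(samples: List[float], chunk_ms: int = 80) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks, the way a live microphone delivers audio."""
    size = samples_for_ms(chunk_ms)
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def word_latency_ms(word_time_ms: int, clip_ms: int, delay_ms: Optional[int]) -> int:
    """How long a word spoken at `word_time_ms` waits for its text (compute time ignored).

    Batch (delay_ms is None): text appears only once the whole clip has been heard.
    Streaming: text appears a fixed delay after the word is spoken.
    """
    if delay_ms is None:
        return clip_ms - word_time_ms
    return delay_ms


if __name__ == "__main__":
    clip = [0.0] * samples_for_ms(10_000)  # 10 s of placeholder silence
    print(samples_for_ms(80))                 # samples per 80 ms chunk
    print(len(list(stream_chunks(clip))))     # number of 80 ms chunks in 10 s
    print(word_latency_ms(1_000, 10_000, None))  # batch wait for a word at t = 1 s
    print(word_latency_ms(1_000, 10_000, 80))    # streaming wait for the same word
```

At 16 kHz, an 80 ms chunk is 1,280 samples, so a 10 s clip arrives as 125 chunks. A word spoken one second in waits 9,000 ms for batch transcription but only the configured delay when streaming. That is why the wait in a streaming model depends on its delay setting rather than on how long the speaker keeps talking.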