The Voicebox Model and Its Applications

This page collects resources for the papers I will discuss on May 8, 2025, at the Conversational AI Reading Group at Mila.

View the Project on GitHub lsari/voicebox_talk_may_2025

Logistics

Abstract

Voicebox is a non-autoregressive generative speech model based on flow matching, trained to perform speech infilling given the audio context and the corresponding text. The model can be used for zero-shot text-to-speech synthesis, noise removal, content editing, style conversion, and diverse sample generation. In this talk, we will first review the Voicebox model. We will then focus on its synthetic speech generation capability and present several use cases of these synthetic signals in applications including automatic speech recognition and spoken language understanding. Drawing on a few early studies that use Voicebox-generated speech, we will discuss the cost-saving benefits of this approach for speech data collection, as well as the potential shortcomings of using synthetic speech in these applications.
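To make "flow matching trained for speech infilling" a little more concrete, here is a minimal sketch, assuming the optimal-transport conditional path from the Flow Matching paper (Lipman et al.) and a loss averaged only over the masked frames that the model must fill in. It is not the Voicebox implementation: the `model` callable, its argument order, the frame representation, and all function names are hypothetical, and the text is simply passed through, ignoring the frame-level alignment a real system needs.

```python
import random
from typing import Callable, List

Frame = List[float]  # one acoustic feature frame (hypothetical, e.g. a spectrogram column)


def ot_interpolate(x0: List[Frame], x1: List[Frame], t: float, sigma_min: float = 1e-5):
    """Optimal-transport conditional path (Lipman et al.):
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1,
    with target velocity u_t = x1 - (1 - sigma_min) * x0 (constant in t)."""
    scale = 1.0 - (1.0 - sigma_min) * t
    xt = [[scale * a + t * b for a, b in zip(f0, f1)] for f0, f1 in zip(x0, x1)]
    ut = [[b - (1.0 - sigma_min) * a for a, b in zip(f0, f1)] for f0, f1 in zip(x0, x1)]
    return xt, ut


def masked_fm_loss(pred: List[Frame], target: List[Frame], mask: List[int]) -> float:
    """Mean squared error over the masked (to-be-infilled) frames only."""
    total, count = 0.0, 0
    for p, u, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, u))
            count += len(p)
    return total / max(count, 1)  # avoid division by zero when nothing is masked


def infilling_loss(model: Callable, x1: List[Frame], text: str,
                   mask: List[int], rng: random.Random) -> float:
    """One loss evaluation for a hypothetical model(xt, context, text, t) -> velocities."""
    x0 = [[rng.gauss(0.0, 1.0) for _ in f] for f in x1]  # Gaussian noise sample
    t = rng.random()                                      # random time in [0, 1)
    xt, ut = ot_interpolate(x0, x1, t)
    # Audio context: unmasked frames are kept, masked frames are zeroed out.
    context = [[0.0] * len(f) if m else list(f) for f, m in zip(x1, mask)]
    pred = model(xt, context, text, t)
    return masked_fm_loss(pred, ut, mask)


# Usage with toy data: 10 frames of 4 features, frames 3..6 to be infilled.
rng = random.Random(0)
x1 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]
mask = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
zero_model = lambda xt, ctx, text, t: [[0.0] * len(f) for f in xt]  # placeholder "model"
loss = infilling_loss(zero_model, x1, "hello world", mask, rng)
print(f"masked flow-matching loss: {loss:.4f}")
```

Restricting the loss to masked frames reflects the infilling setup: the unmasked frames are already given as context, so only the missing region carries a learning signal in this sketch.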

Papers

  1. Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale, Le et al., paper

  2. Audiobox: Unified Audio Generation with Natural Language Prompts, Vyas et al., paper on arXiv

  3. Towards Selection of Text-to-speech Data to Augment ASR Training, Liu et al., paper on arXiv

  4. Using Voicebox-based Synthetic Speech for ASR Adaptation, Dhamyal et al., paper on ISCA Archive

  5. Improving Spoken Semantic Parsing using Synthetic Data from Large Generative Models, Sharma et al., paper on ISCA Archive

  6. Flow Matching for Generative Modeling, Lipman et al., paper on arXiv

  7. Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition, Kim et al., paper on arXiv

Other Links

  1. Voicebox demo page
  2. Audiobox demo page

Professional Activities

  1. YFRSW 2025 - Sponsorship opportunities are available for 2025! We are also looking for current female PhD students who would like to volunteer as speakers in our panel discussion.
  2. LinkedIn page for YFRSW
  3. IEEE MLSP 2025 - Sponsorship opportunities are available for 2025! Also stay tuned for registration!
  4. IEEE ASRU 2025 - Call for demos will be available soon!