Lessons from Our Voice Engine: Acoustic Models

May 20, 2021

Rectangle Circle

Speech Recognition Engineer Armin Saeb brings you the fifth installment of our “Lessons from Our Voice Engine” series, featuring high-level insights from our Engineering and Speech Tech teams on how our voice engine works. 

What is an acoustic model (AM)? 

Acoustic Models (AM) are key components of any speech recognition engine. An AM describes the statistical properties of sound events and connects the acoustic information with phonemes or other linguistic units. Hidden Markov Models (HMM) are one of the most common types of AMs. Other acoustic models include Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN).

Why are AMs important for SoapBox?

AMs play an even more critical role at SoapBox than in normal, adult-focused voice engines because recognizing children’s speech is much more challenging. Kids have smaller vocal tracts and slower and more variable speech patterns. They inhabit noisy environments and use a lot of spontaneous speech, imaginative words, ungrammatical phrases, and incorrect pronunciations!

At SoapBox we work hard to design and train AMs that can cater to all of this complexity and do the heavy lifting of accurately converting children’s speech to text.

Continue to Lesson #6 on keyword detection, or catch up on our previous “Lessons from Our Voice Engine”:

Share this