Welcome to “Lessons from our Voice Engine,” a series of blog posts by members of our Engineering and Speech Tech teams that explain, at a high level, how our voice engine works.
Lesson 1 comes from Nick Parslow, a Computational Linguist and member of our Speech Tech team here at SoapBox Labs.
Natural Language Processing (NLP) is about the interface between human language and machine language. This can mean taking a written sentence — just like this one — and extracting the key information from it, such as the semantic ideas, or the intent in the case of a command like "left," "right," "open," or "close." Going in the opposite direction, NLP can mean taking data and generating human-readable text from it.
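To make the command example concrete, here is a toy intent-extraction sketch. The intent labels and keyword list are invented for illustration; a production system would use a trained classifier rather than exact keyword matching.

```python
from typing import Optional

# Invented mapping from command words to intent labels (illustrative only).
INTENTS = {
    "left": "MOVE_LEFT",
    "right": "MOVE_RIGHT",
    "open": "OPEN_ITEM",
    "close": "CLOSE_ITEM",
}

def extract_intent(utterance: str) -> Optional[str]:
    """Return the intent of the first recognized command word, if any."""
    for word in utterance.lower().split():
        if word in INTENTS:
            return INTENTS[word]
    return None  # no recognized command word

print(extract_intent("please open the door"))  # OPEN_ITEM
```

Even this naive version shows the core idea: the raw words are mapped onto a small set of machine-actionable meanings.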
In speech recognition systems like the SoapBox voice engine, NLP is used to build what are called language models — statistical models of language that can predict the next word based on the context. Language models are essential for disambiguating similar-sounding phrases. A great example of this is “white shoes” and “why choose.”
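A minimal way to see this disambiguation at work is a toy bigram language model: score each candidate transcription by how likely each word is to follow the previous one. The tiny corpus and add-one smoothing below are invented for the example and bear no relation to SoapBox's actual models.

```python
from collections import defaultdict

# Invented mini-corpus for illustration.
corpus = [
    "she wore white shoes to school",
    "the white shoes were new",
    "why choose one book over another",
]

# Count bigram and unigram frequencies, with <s> as a sentence-start marker.
bigrams = defaultdict(int)
unigrams = defaultdict(int)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def score(phrase: str) -> float:
    """Product of add-one-smoothed bigram probabilities P(word | previous)."""
    words = ["<s>"] + phrase.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        # Smoothing gives unseen bigrams a small nonzero probability.
        p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(unigrams))
    return p

# Given two acoustically similar candidates, the model prefers the one
# whose word sequence is more probable in context.
print(score("the white shoes") > score("the why choose"))  # True
```

Real engines use far larger corpora and richer models, but the principle is the same: context decides between sound-alike word sequences.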
Building a language model requires “normalization” of text — taking, for example, all instances of “ice-cream” and “icecream” and converting them to the same form. Without that normalization, the computer thinks of them as completely unrelated words.
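A bare-bones normalization pass might look like the sketch below. The variant table and rules here are invented for illustration; real normalization pipelines also handle numbers, abbreviations, casing conventions, and much more.

```python
import re

# Invented table mapping spelling variants to one canonical form.
VARIANTS = {
    "ice-cream": "ice cream",
    "icecream": "ice cream",
}

def normalize(text: str) -> str:
    """Lowercase, unify known spelling variants, and strip punctuation."""
    text = text.lower()
    # Replace known variants with the canonical spelling first.
    for variant, canonical in VARIANTS.items():
        text = text.replace(variant, canonical)
    # Remove punctuation that would otherwise create "new" word forms.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

print(normalize("I love Ice-Cream!"))  # i love ice cream
```

After this pass, “ice-cream,” “icecream,” and “Ice Cream” all count as the same word when the language model tallies its statistics.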
Another use of NLP in speech recognition — in particular for English — is to work out the pronunciation of a word. This can be difficult to do automatically (compare “though” and “tough,” for example), so it often involves a mix of manual and automatic estimation.
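That mix of manual and automatic estimation can be sketched as a lexicon lookup with a fallback rule: hand-curated entries win, and anything missing falls through to an automatic guess. The mini-lexicon and the one-symbol-per-letter fallback below are invented for illustration; real systems use large pronunciation dictionaries plus trained grapheme-to-phoneme models.

```python
from typing import List

# Hand-curated entries (invented mini-lexicon, ARPAbet-style symbols).
LEXICON = {
    "though": ["DH", "OW"],
    "tough": ["T", "AH", "F"],
}

# Extremely naive automatic fallback: one sound per letter. Real G2P
# models are statistical precisely because English spelling is this
# irregular -- "though" and "tough" share letters but not sounds.
LETTER_SOUNDS = {"c": "K", "a": "AE", "t": "T"}

def pronounce(word: str) -> List[str]:
    """Return a phone sequence: manual lexicon entry if present, else a guess."""
    word = word.lower()
    if word in LEXICON:  # manual entry wins
        return LEXICON[word]
    return [LETTER_SOUNDS.get(ch, ch.upper()) for ch in word]

print(pronounce("though"))  # ['DH', 'OW']
print(pronounce("cat"))     # ['K', 'AE', 'T']
```

The design point is the priority order: curated pronunciations override the automatic estimate, so linguists can patch exactly the words the automatic rules get wrong.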
NLP plays a critical role at SoapBox because it links the spoken version of language with the written form, thereby allowing our voice engine to better understand and assess what a child is saying — their intent, their pronunciation, and much more — beyond just the words themselves.
Continue to “Lessons from Our Voice Engine” #2 on custom language models (CLMs).
© SoapBox Labs 2021 – All Rights Reserved