8 things to know before choosing a voice partner

November 11, 2022


Voice-enabled experiences are everywhere, but how well do they work for kids? Can speech recognition technology, or “voice AI” as it’s sometimes called, really enhance a gaming experience or help a child learn to read? 

The answer, of course, depends on the voice AI you decide to use. 

The education, digital gaming, and entertainment companies that choose SoapBox are adding speech technology to make their experiences more fun and impactful for kids, as well as to extend their product lifecycles and profit margins.

Here are the top eight questions customers ask before starting their voice journey with us.

8 FAQs on speech technology for kids

1. How does SoapBox differ from other speech technology providers?

We love this question! Here are a few of our key differentiators over other voice providers:

As accurate as a human annotator: Despite the unpredictability and huge variation in children’s voices and language, our voice engine has been proven to perform as accurately as, if not more accurately than, a human (for example, a teacher who sits next to a child scoring their reading errors).

We’ve now delivered over 60 million learning moments for kids in classrooms across the US, UK, and Asia using our trusted, accurate speech technology.

SoapBox’s voice engine is as accurate as — or, in some cases, more accurate than — when teachers manually conduct assessments.

Mary Eisele, VP of Intervention Products, McGraw Hill

Confidence around bias: Since our founding in 2013, we’ve worked hard to build speech technology that understands all kids equally, regardless of race, age, accent, or socioeconomic background. That work was rewarded in October 2022 when we became the first voice company (and first company ever) to earn the Prioritizing Racial Equity in AI Design product certification from Digital Promise and the EdTech Equity project.

Privacy and control of your data: SoapBox is a privacy-by-design company. All voice data sent to our system is anonymized and de-identified, and never sold to third parties for marketing or advertising. Our clients own their data and decide on the jurisdiction for data processing and whether their data is retained post-processing.

Features: The SoapBox voice engine has been built to cater to the unique challenges of colloquial language, brand names, and the noisy, emotive, unpredictable speech of kids. For entertainment clients, we offer features like voice search, custom wakeword, voice activity detection, and chat moderation.

For education clients, we’ve built plug-and-play features like phoneme breakdown and custom pronunciation, and an oral reading fluency assessment product called SoapBox Fluency.

Speaking of features, SoapBox will soon release another important feature for education clients — Prosody — to measure the expressiveness a child uses when reading. So watch our blog for updates!

Here’s Adrian Mullan, CEO and Co-Founder of Norby, on why implementing our speech technology was a “game changer”:

2. How easy is it to get started with SoapBox’s speech technology?

Many SoapBox clients use our voice engine right off the shelf for education and play use cases that require keyword spotting, pronunciation assessments, character and game navigation (“command and control” as we call it), and multiple choice questions. Got a tech person used to APIs and following technical docs? If so, expect to go live within 24 hours!

Other customers want to voice-enable experiences based on much longer bodies of text, as in the case of SoapBox Fluency for oral reading fluency assessments. We build these customers a custom language model (CLM). Contact us to tell us more about your use case and how we can get you started quickly and easily.

3. Who are some of your current clients?

In education, we’re proud to power voice-enabled experiences for big names like McGraw Hill, Imagine Learning, and Amplify and smaller digital pioneers like Learning Without Tears, Lingumi, and Norby. More than 50 companies worldwide use our technology, and together they account for the 60 million learning moments we’ve already delivered to kids.

Most of our entertainment-focused voice projects are yet to be publicly announced. Contact us if you’d like to learn more, but in the meantime, let’s just say that clients in this space are among the biggest digital gaming and media companies in the world. 

Many of these customers participate in our white papers and webinars to talk about their experiences with SoapBox. Here’s our Can Speech Recognition Help Children Learn to Read white paper, where 12 industry leaders shared their thoughts and experiences with speech technology in education. 

And here’s a recent case-study webinar we did with Imagine Learning. Check out their impressive before-and-after stats from voice-enabling their Fluent Reader+ product.

4. Does SoapBox support other languages? 

Most of our clients are starting their voice journey with English, but our voice engine is language agnostic, allowing us to work with other languages on a case-by-case basis.

We have Spanish, Portuguese, and Mandarin products in beta, so watch our blog for updates on when these will be launched to the market. 


5. How does SoapBox handle different accents and dialects?

SoapBox builds accurate, inclusive speech recognition technology that works for children everywhere. Our acoustic models are trained on kids’ voices from 193 countries to ensure that no child is penalized for using a different dialectal variant or accent.

Many of our customers, especially education companies, are delivering voice-enabled experiences to kids in the classroom. Imagine the diversity of accents and dialects you’d find in a New York City preK-12 classroom: our voice engine understands all of those students equally.

6. What are some examples of kids using your speech tech?

Here are just a few videos showing how kids can use our speech technology for learning and play across a range of use cases:

Oral reading fluency assessments

Vocabulary practice


Interactive TV

AR experiences

7. Can SoapBox’s voice engine handle background noise?

Yes! Our speech models are built to understand children’s speech in real-world noisy environments — and there’s no need for headsets and mics. We also test in noisy environments — classrooms, living rooms, etc. — with signal-to-noise ratios (SNRs) of 10 to 20 decibels (dB).
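To put those numbers in context: SNR in decibels is ten times the base-10 logarithm of the ratio of signal power to noise power. At 10 dB, the child's speech carries ten times the power of the background noise; at 20 dB, a hundred times. The short Python sketch below shows that arithmetic. It is purely illustrative: the function names are hypothetical, it is not part of SoapBox's API, and it says nothing about how SoapBox measures noise internally.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in decibels, from two RMS amplitudes.

    Power is proportional to amplitude squared, so
    10 * log10(P_signal / P_noise) equals 20 * log10(A_signal / A_noise).
    """
    if signal_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS amplitudes must be positive")
    return 20.0 * math.log10(signal_rms / noise_rms)


def power_ratio(db):
    """Convert an SNR in decibels back to a signal-to-noise power ratio."""
    return 10.0 ** (db / 10.0)
```

For example, `power_ratio(10)` and `power_ratio(20)` give the tenfold and hundredfold power ratios that bound the test range quoted above.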

Here’s five-year-old Aku practicing his English in a noisy playroom on our client Lingumi’s English language learning (ELL) platform (and our speech tech hears him perfectly!): 

8. What’s SoapBox’s pricing model?

Pricing for our voice engine is typically based on usage or transactions. We offer bespoke discounted pricing for customers with larger volumes of transactions, and a flat annual fee for customers who are smaller scale. All pricing includes access to our developer documentation area and ticketed Support Desk. 

Get in touch with us and we’ll figure out the best pricing model for you. 

Looking for more FAQs about our products, our people, and our business? We’ve got an information-packed FAQ section covering each in more detail!

More questions for us? Get in touch

Email us at hello@soapboxlabs.com, or Get Started with voice by telling us more about your product.
