Our voice

A fundamentally different approach to speech recognition

Most automatic speech recognition (ASR) technology has been built predominantly on adult voice data, and models the predictability of adult language, speech patterns, and behaviors. These pre-trained systems perform poorly with more challenging demographics such as kids, and the younger the child, the worse they perform. These systems have also been shown to suffer from bias.

To be effective as an educational tool, speech recognition systems need to offer human-level accuracy down to the phoneme level, for kids of all ages, in noisy environments. For entertainment experiences such as gaming and the Metaverse, voice technology needs to accommodate the unique challenges of colloquial language, brand names, and noisy, emotive, and “distracted” speech.

At SoapBox, we’ve re-imagined and redesigned every step in the speech recognition development process to build a proprietary, privacy-first, and highly accurate voice engine that works for kids ages 2 years and up. Our kid-specific focus has allowed us to achieve the highest levels of accuracy and to mitigate bias in relation to ages, accents, and dialects. According to the independent evaluations of our many clients, SoapBox’s voice engine performs on a par with, or surpasses, human-level accuracy.

SoapBox’s highly accurate speech recognition engine works in the real world because it’s built on our large, proprietary database of natural child speech, in all accents and dialects, from noisy environments, and from 193 countries. Our language and speech models are also readily adaptable to new speech contexts, emerging dialects, and vocabularies.


Human-level accuracy down to the individual phoneme.

Flexible & customizable

Many use cases available off-the-shelf. Partner with us for bespoke solutions.

Easy integration

Lightweight and low-code API, integrates within hours.

Online / offline / embedded

Flexible delivery options tailored to your use case.

What’s different about SoapBox’s voice technology?

SoapBox’s voice engine is unique in the industry because it gives you the power to build, change and update the voice experience you’re offering to kids, without needing support from us.

Each time you send audio to our API, you can decide, on the fly, what standard or nonsense word, brand name, or colloquialism you want to search for and assess. And when you need a custom approach for transcriptions or fluency assessments, we can build custom models for you.

Available off-the-shelf

Pre-trained, no customization required

Use cases include command and control, search, utility, menus, multi-choice questions, pronunciation assessments, conversation, keyword spotting, and moderation filters.

Custom-built models

Proprietary systems and processes designed for rapid model customization.

Use cases include speech transcripts, voice chat transcripts, dictation, and fluency assessments.

Data that delivers insights

SoapBox’s voice engine offers our clients a rich set of real-time data and valuable insights into how children best play and learn. For example, in preK-12 literacy practice and assessments, our voice engine not only converts speech to text but also offers feedback down to the individual phoneme level, and generates data points that help teachers better understand student progress. This data includes words correct per minute (WCPM), insertions, omissions, substitutions, pauses, hesitations, and other errors.
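To make these data points concrete, here is a minimal sketch of how WCPM, insertions, omissions, and substitutions could be derived by aligning the passage a child was asked to read against a transcript of what they said. It is an illustration only, not SoapBox’s implementation: the function names `align_counts` and `wcpm` are hypothetical, the alignment is a plain word-level edit distance, and it ignores pauses, hesitations, and phoneme-level scoring.

```python
def align_counts(reference: str, hypothesis: str) -> dict:
    """Align two word sequences with edit distance and count each outcome.

    reference  -- the passage the child was asked to read (assumed input)
    hypothesis -- a transcript of what was actually said (assumed input)
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to label each step of one optimal alignment.
    counts = {"correct": 0, "substitutions": 0, "omissions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            key = "correct" if ref[i - 1] == hyp[j - 1] else "substitutions"
            counts[key] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["omissions"] += 1   # a reference word was skipped
            i -= 1
        else:
            counts["insertions"] += 1  # an extra word was spoken
            j -= 1
    return counts


def wcpm(correct_words: int, seconds: float) -> float:
    """Words correct per minute: correct words scaled to a 60-second window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return correct_words * 60.0 / seconds


counts = align_counts("the cat sat on the mat", "the cat sit on the the mat")
rate = wcpm(counts["correct"], seconds=30)
```

In this example the reader substitutes “sit” for “sat” and repeats “the”, so the sketch counts five correct words, one substitution, and one insertion; five correct words in 30 seconds works out to 10 WCPM.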

SoapBox clients also use these rich sets of data to accelerate product research and new product development.

Frequently asked questions

What are the benefits of voice technology?

Voice technology transforms passive experiences into interactive ones, allowing kids to engage more actively and use their voices to learn and play more immersively. It also makes learning a more rewarding and relatable experience for the modern student, in and beyond the classroom.

Voice-enabled apps, tools, and games give kids agency and access. They also breathe new life and opportunity into education and entertainment products and product roadmaps. 

Here are some examples of the many benefits of voice technology for kids: 

  • A pre-literate child can command and control their game or toy without having to struggle with a controller or menu. 
  • An emergent bilingual child can use their voice to practice pronouncing words out loud in a new language and receive immediate feedback and recognition for their progress.
  • A teacher can undertake more regular and automated reading assessments at scale, rather than individually with students, and get accurate scoring and feedback on each student’s progress. 

Is voice technology the underlying technology of voice assistant solutions?

Yes, the voice technology that powers the smart speaker in your kitchen is fundamentally the same technology that powers voice-enabled apps, platforms, and products like games, toys, and learning tools.

Pioneering voice tech companies like SoapBox have built a specialist version of this speech recognition software so that it performs with high accuracy for all kids’ voices, without bias and even amid ambient noise, while also protecting their voice data privacy.

Does the SoapBox voice engine also work for older children and adult users?

Yes! At SoapBox, our mission has always been to deliver a solution that works for all ages, accents, and dialects. To do this, we’ve focused on the hardest problem of all — delivering high accuracy for kids of all ages. We think of adults as the most well-behaved, mature, articulate children, so our models work well for them too!

Does SoapBox build end-user products?

No, SoapBox does not build end-user solutions. We are a B2B company, whose voice engine powers end-user education and entertainment experiences for kids. Third-party entertainment and education companies license our voice technology to power their experiences for kids. See a sample list of our customers on our home page.

How accurate is SoapBox’s voice engine?

SoapBox is the only voice engine that offers accuracy for kids of all ages and accents down to the phoneme level. Here is some of the external validation we’ve received from our clients:

  • In their partner evaluation, MetaMetrics found a very high correlation between the SoapBox voice engine and human annotators for oral reading fluency assessments.
  • During extensive beta testing over a six-month period, education pioneer Amplify found a correlation of 96% between SoapBox’s automatic assessments and human scoring, which is comparable to human-to-human scoring and exceeds market standards. More details about Amplify’s evaluation and use case for our voice technology can be found in this literacy-focused white paper.

Two preeminent US-based academic research institutions have recently undertaken independent studies of the accuracy of competing speech systems and found that for kids, SoapBox performed best. One of these studies focused on the existence of bias in speech recognition and found that SoapBox performed equally well among groups of Black, Latinx, and white children from different socioeconomic backgrounds and demonstrated no bias in speech recognition towards or against any particular cohort. Public-facing reports of these studies have yet to be published.

How easy is it to integrate SoapBox’s API?

SoapBox Labs’ APIs are built to deliver functional integrations quickly and easily (i.e., we offer low-code and straightforward API implementations). Many of our clients have brought their integrations live within 24 to 48 hours. 

We use standard, RESTful APIs your engineers will be familiar with, as well as straightforward URLs, HTTP response codes, and standard HTTP features that any HTTP client understands. We also support cross-origin resource sharing, allowing you to interact securely with our API from a client-side web application.
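Because the API relies on standard HTTP response codes, client code can usually branch on the status alone. The sketch below is a generic illustration, not taken from SoapBox’s documentation: the function name `next_action` and the action labels are hypothetical, and the mapping simply follows the usual meaning of HTTP status classes.

```python
from http import HTTPStatus


def next_action(status_code: int) -> str:
    """Map an HTTP status code to a coarse client-side action (hypothetical labels)."""
    if 200 <= status_code < 300:
        return "use_result"          # success: parse the response body
    if status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN):
        return "check_credentials"   # 401/403: key missing, invalid, or not permitted
    if status_code == HTTPStatus.TOO_MANY_REQUESTS or 500 <= status_code < 600:
        return "retry_later"         # 429/5xx: back off, then resend
    return "fix_request"             # any other code: treat as a problem with the request


actions = [next_action(code) for code in (200, 401, 429, 503, 400)]
```

Check the developer portal documentation for the codes the API actually returns and what each one means before relying on a mapping like this.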

We provide clients with access to our developer portal, which includes extensive documentation to support their integration experience, and access to our ticketed Support Desk for technical queries.

We’d love to tell you more about SoapBox’s voice engine and API integration. Simply tell us about your use case on our Get Started form, and a member of our team will be happy to schedule a call. 

Does SoapBox’s voice engine work on all kinds of devices?

Our voice engine runs best on mid- to higher-end devices. Performance can depend on a number of factors, including the use case. We test performance across a range of physical device types on both Apple and Android. Our lowest development target for Apple is iOS 9 and for Android is API level 16.

Tell us about your target device and use case by completing our Get Started form, and we’ll be in touch to schedule a call.