Voice-First Experiences for Children’s Picture Books and Beyond
Sunday, February 10, 2019
Posted by: Matt Hammersley
The Evolution of Voice
Ever since we first asked Siri to tell us the weather outside in February 2010—and she did—the consumer and commercial applications for voice technology have been discussed, deliberated upon, and debated everywhere from bootstrap offices at small start-ups to corporate boardrooms at Fortune 500 companies.
As CEO of Novel Effect, I have been in both of those rooms. Founded in 2015 by me, my wife Melissa, and our sister-in-law Melody Furze, Novel Effect began with a very simple mission: to make reading books aloud with your kids fun and engaging at a time when screens seemed to be taking time and attention away from that bonding experience.
Over the course of the next year, we built a screen-free, voice-driven storytelling platform that enhances story time by syncing background music and sound effects to a reader’s voice as a book is read aloud, creating a new interactive experience around a traditional print product. A little less than three years later, we closed our Series A funding round with Amazon and a select number of well-known VC firms and are now a leader in voice-driven media and entertainment.
Today, I regularly meet with executives across media from cable distribution and streaming services to content studios and film production. Over the past year, I have seen a significant shift in the interest and commitment these companies are devoting to the development of voice-driven experiences. Because of this, I firmly believe that voice is at a pivotal turning point driven by a consumer base that is ready to embrace all the possibilities voice-interactivity presents.
Siri, Alexa, and Google Assistant have all been game changers in this space, with use cases for call-and-response voice interaction constantly being adapted for the convenience of the consumer. It still amazes me that you can turn off the lights in your home from 2,000 miles away with a simple voice command.
However, while the applications are admittedly getting more sophisticated, the user experience is still very similar to the first time you asked Siri about the temperature.
The experience goes something like this:
- User opens an application.
- User prompts the application to listen by calling its name (Hey Siri, OK Google).
- User asks a question or issues a command.
- Application provides answer or delivers the desired action.
- User thanks application and goes about their day.
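The pattern above can be sketched as a simple loop. To be clear, the wake words, commands, and handlers below are illustrative stand-ins of my own, not any assistant's actual API:

```python
# A minimal sketch of the call-and-response pattern described above.
# All names here (WAKE_WORDS, HANDLERS, the phrases) are hypothetical.

WAKE_WORDS = {"hey siri", "ok google", "alexa"}

# Hypothetical mapping from a spoken command to an action/response.
HANDLERS = {
    "what's the weather": lambda: "It's 54 degrees and cloudy.",
    "turn off the lights": lambda: "Okay, lights off.",
}

def assistant_loop(utterances):
    """Respond only to commands that follow a wake word."""
    awake = False
    responses = []
    for text in utterances:
        text = text.lower().strip()
        if text in WAKE_WORDS:
            awake = True    # start listening for a single command
        elif awake:
            handler = HANDLERS.get(text)
            responses.append(handler() if handler else "Sorry, I didn't catch that.")
            awake = False   # go back to waiting for the wake word
    return responses

print(assistant_loop(["hey siri", "what's the weather", "turn off the lights"]))
```

Note that in this sketch the second command is ignored because no wake word preceded it: the assistant answers exactly one request per invocation, which is the defining limit of the call-and-response model.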
What has changed over the past ten years is the user. They have come to accept, and perhaps even expect, that voice-enabled technology should be a part of their everyday lives. In large part, this correlates to our growing dependence on our devices.
According to a recent Pew study, 77% of Americans have smartphones, with nearly one-third of American households having at least three. A Nielsen study further notes that 25% of households now have a smart speaker, with 40% of those households owning more than one. With this proliferation of device ownership, it isn’t surprising to hear (no pun intended) predictions that by the year 2020, 55% of all searches will be voice-based.
Beyond search, there are real impacts for advertising, commerce, and content. At the most basic levels, we’ve seen higher engagement with existing content as users play podcasts, music channels, and audiobooks through their in-home assistants. The more forward-thinking developers have started to adapt existing audio content into more interactive activities for Alexa Skills and Google Actions.
Ultimately, thinking creatively about content is what will open the door for publishers, game developers, animation studios, and others to drive innovation in the voice space and grow beyond the home assistant call-and-response experience. When asked about the future of voice, I often challenge creators and developers to think about voice-interactivity as a marketing and brand development tool and how they can create engaging voice experiences that can be shared with friends and family to drive brand awareness and loyalty.
To think creatively about the voice space, it is helpful to understand a bit about how voice recognition works.
The Science of Voice
Excuse me while I get a bit technical for a minute on the complex subject area of speech recognition, the concepts of phones and phonemes, and the difference between our brains and computers. I promise to keep it light!
When I say a word out loud, my voice generates sounds that correspond to the letters (or groups of letters) in the word. These sounds are called phones. If I were to say the word “boy,” the phones produced would correspond to the sounds for “b,” “o,” and “y.”
Related to this is the concept of phonemes, which are the basic sound building blocks that all words are built from. Right about now, you are probably thinking that there isn’t a huge difference between phones and phonemes. But that is because our brains don’t really differentiate between the two.
When we listen to speech, our brains skip right to the phones and turn them back into words, sentences, thoughts, and ideas, sometimes even anticipating what people are going to say before they finish getting their thoughts out.
Even with the rise of machine learning, deep learning, and natural language processing capabilities, computers are still learning how to skip and anticipate the intent of the words being spoken. The computers learn by absorbing the phonemes as well as the phones. They have gotten pretty good at this and are working toward gaining that intent knowledge.
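Here is a toy illustration of what "absorbing the phonemes" can mean in practice: a computer stores each word it knows as a sequence of basic sound units and matches what it hears against that dictionary. The phoneme spellings below are simplified for readability, not a real phonetic alphabet:

```python
# Toy phoneme dictionary: each known word stored as the sequence of
# sound building blocks it is made from. Spellings are simplified.
PHONEME_DICT = {
    ("b", "oy"): "boy",
    ("t", "oy"): "toy",
    ("b", "oh", "t"): "boat",
}

def recognize(phonemes):
    """Look up a heard phoneme sequence and return the matching word."""
    return PHONEME_DICT.get(tuple(phonemes), "<unknown>")

print(recognize(["b", "oy"]))  # -> boy
```

Our brains do this lookup effortlessly and fill in gaps from context; a computer with only a dictionary like this fails on anything it has not explicitly learned, which is why the modeling steps below matter.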
Very broadly speaking, these are the steps computers go through as they learn to understand speech:
- Simple pattern matching: The computer recognizes each spoken word in its entirety because it has learned that word already.
- Pattern and feature analysis: The computer breaks down the word into its component parts and identifies key features, such as the vowels it contains.
- Language modeling and statistical analysis: The computer uses knowledge of grammar and the probability of certain words following one another to begin to increase speed and accuracy.
- Artificial neural networks: The computer runs models that it has learned through exhaustive training to reliably recognize patterns.
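The language modeling and statistical analysis step can be sketched with a toy bigram model: given how often pairs of words follow one another in training text, the recognizer prefers the candidate transcription with the higher overall probability. The counts below are made up purely for illustration:

```python
from collections import Counter

# Made-up bigram counts a model might have learned from training text.
BIGRAM_COUNTS = Counter({
    ("over", "there"): 40,
    ("over", "their"): 2,
    ("their", "house"): 30,
    ("there", "house"): 1,
})

def score(words):
    """Score a candidate sentence by multiplying its bigram counts."""
    total = 1
    for a, b in zip(words, words[1:]):
        total *= BIGRAM_COUNTS.get((a, b), 1)  # unseen pairs stay neutral
    return total

# Two transcriptions that sound identical; the model picks the likelier one.
candidates = [["over", "there"], ["over", "their"]]
best = max(candidates, key=score)
print(best)  # -> ['over', 'there']
```

Real systems use far richer models than raw pair counts, but the principle is the same: probability of word sequences, not just the sounds themselves, drives speed and accuracy.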
A few challenges that computers face as they are learning:
- Separating words from background noise to ensure they are grabbing and focusing on the right subject matter.
- Gauging the speed of the speech to understand when one word or sentence ends and another begins (particularly with fast talkers!).
- Understanding the changing pitch of individual voices, particularly high-pitched voices.
- Distinguishing between similar sounding words (such as “their” and “there”) to determine the correct meaning in context of the rest of the sentence.
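The first two challenges, separating speech from noise and finding word boundaries, can be illustrated with a crude energy threshold: loud stretches of the signal count as speech, and quiet gaps separate one word-like segment from the next. The signal values and threshold below are invented for the sketch:

```python
# Toy energy-based segmentation: amplitudes above a threshold count as
# speech; quiet gaps split the stream into word-like segments.
# The signal and threshold values are made up for illustration.

def segment(signal, threshold=0.3):
    """Split an amplitude sequence into segments separated by silence."""
    segments, current = [], []
    for amp in signal:
        if amp >= threshold:
            current.append(amp)
        elif current:
            segments.append(current)  # quiet gap ends the current segment
            current = []
    if current:
        segments.append(current)
    return segments

signal = [0.0, 0.6, 0.8, 0.1, 0.0, 0.5, 0.7, 0.9, 0.0]
print(len(segment(signal)))  # -> 2 word-like segments
```

A fast talker shrinks those quiet gaps toward nothing, and background noise raises the floor above the threshold, which is exactly why these challenges are hard in practice.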
There are many other challenges that computers face, and many smart developers out there are creating solutions that enable faster processing. One example is Cypher, whose software isolates the speaker’s voice and ignores everything else. As machine learning and natural language processing continue to move forward, the applications and uses for voice will continue to grow.
The Future of Voice
The user experience of call-and-response interaction is fairly predictable at this point, and current voice technology in Siri, Alexa, Google, etc. is optimized for that experience. For a variety of reasons, not the least of which is privacy, the voice assistant creators have employed this pattern because it bounds how long the device listens to the consumer before it begins to search for the answer. This window is typically around 10–15 seconds. As the application listens, it searches its preferred database to generate an answer or maps the right trigger to produce the desired reaction (such as turning off your lights).
Novel Effect was conceived as a platform that would break that mold and provide users with a way to enhance everyday experiences with their voice. The underlying premise was that as voice becomes more integral in how users engage with technology, the users’ expectations for what that technology can do will grow exponentially.
Our flagship mobile app was designed with children’s picture books in mind. Books were important parts of my and Melissa’s childhoods, and we wanted to ensure that they would be a part of our daughter’s as well, even with the competition they would face from screens and devices.
Hence, Novel Effect… the first-of-its-kind platform to use screen-free technology to blend existing print books with rich interactive content. As the app runs in the background, enveloping the reader in an immersive sensory experience, the focus remains on the physical book.
Reading a typical picture book aloud usually takes at least 10 minutes. This presented a challenge for launching on the existing voice assistants due to their time constraints. Our solution was to launch our platform as an app.
As we also take privacy very seriously, we designed the app to process voice locally. When a user accesses a particular soundscape in our app, it is downloaded to their device, so even though the app is listening through the whole story, that audio data never has to be passed to the cloud for processing.
We were able to optimize the voice recognition for all kinds of voice speeds, accents, and pitches. Our platform works just as well for a child reading to her dad as it does for a dad reading to his daughter. I know this from firsthand experience as my three-year-old daughter Eleanor loves to read to me and trigger the music and sound effects with Novel Effect.
Best of all, the app is incredibly simple to use. A parent, teacher, or caregiver simply opens the app, searches for the book they are about to read, selects that tile in our app, and hits play. Then they put the device aside and start reading the print book with their child. As they read, our app is listening for the specific words and phrases in the text and syncing background music, sound effects, and character voices in real time.
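The follow-along idea can be sketched roughly as follows; the phrases, sound file names, and matching logic here are my own simplified illustration, not Novel Effect's actual implementation:

```python
# Illustrative "soundscape": trigger phrases from a book's text mapped
# to audio cues. All phrases and cue names here are hypothetical.
SOUNDSCAPE = {
    "dark and stormy night": "thunder.wav",
    "the dragon roared": "roar.wav",
    "they lived happily": "happy_theme.wav",
}

def follow_along(transcript_words):
    """Fire a cue whenever a trigger phrase appears in the running transcript."""
    heard = " ".join(transcript_words).lower()
    cues = []
    for phrase, cue in SOUNDSCAPE.items():
        if phrase in heard:
            cues.append(cue)
    return cues

words = "it was a dark and stormy night when the dragon roared".split()
print(follow_along(words))  # -> ['thunder.wav', 'roar.wav']
```

The real system has to do this continuously and tolerantly, matching phrases across pauses, accents, and skipped words, but the core design is the same: the reader's voice, not a screen tap, is what drives the experience forward.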
Developing this platform in app form allowed us to perfect our voice recognition algorithms and positions us for future growth. With a planned Android release and a web-based version in development, we believe we are changing the way users will interact with voice technology in the very near future.
We are really excited about the possibilities that our voice recognition system has unlocked to create unique and engaging experiences. All forms of media, from books, to plays, to movies, to games can be enhanced through voice interaction.
Seeing the faces of kids and adults light up as they realize Novel Effect is reacting to their voices and words is amazing. Their surprise quickly turns to excitement as they continue to read and “hear” and see what happens next, whether it’s a story they’ve never read before or one they know well. Everyone should feel this joy when reading and our ultimate ambition is to make that happen for every child . . . and for every child at heart!
This article is brought to you through a partnership with Amnet, a technology-led provider of services and solutions, catering to the needs of businesses for content transformation, design, and accessibility. The points of view expressed are those of the author and do not necessarily represent the perspectives of Amnet or of BISG.
Matt Hammersley is co-founder and CEO of Novel Effect. A lifelong entrepreneur, Matt began his first start-up as a child selling tomatoes at a roadside stand. He went on to spend more than a decade as an engineer and patent attorney in South Carolina, Delaware, and Texas before co-founding Novel Effect with his wife. He loves traveling, playing golf, and Clemson football. He's passionate about telling great stories and has two children who serve as his nightly audience at their home in Seattle.
Seattle-based Novel Effect is voice interactive entertainment that brings stories to life through an ambient voice platform. Using speech recognition, the flagship mobile app follows along synchronizing special effects, sounds, and music in real time with a storyteller’s voice. Winner of Best Integrated Mobile Experience at the 22nd Annual Webby Awards, the app bridges the physical and digital worlds through the combination of technology and physical media. Novel Effect has patent-pending technology that was featured on Shark Tank and was integral to their participation in the 2017 Alexa Accelerator powered by Techstars. Learn more at www.noveleffect.com and download the app on iOS devices at https://apple.co/2dsswGr.
Smartphone reference: http://www.pewresearch.org/fact-tank/2017/05/25/a-third-of-americans-live-in-a-household-with-three-or-more-smartphones/
Home speaker reference: