Thursday, May 1, 2014

The Download: Making One’s Voice Heard Is Not as Easy as It Sounds

For a long time, touchless voice control was considered the holy grail of mobile devices. We knew it would make the experience on our phones irresistible for anyone driving in the car, cooking in the kitchen, glued to the TV watching that last episode of Breaking Bad, or with any other reason for not wanting to pick up the phone.
We learned a lot in the process of building it.
Touchless Control was thrilling to launch and has been even more exciting to evolve. For today’s Download, our regular spotlight on Motorola software, we want to share what we learned in developing it:
  • Everyone is different, especially in the way they speak. We set out to build a deeply personal, super useful phone experience for our users, and we quickly realized that for a device to recognize a voice command, we had to account for the many accents within each language our phones support (English, Spanish, Portuguese, French, German, and Italian). For every language, we gathered 100 to 200 people of various genders and regional accents to help us build a reference dictionary, which would provide the foundation for the technology. Then we tested the software against audio clips from TV shows. The last step was ruthless trial and error until we had refined the product to work for everyone. It turns out that to make something truly personal, you have to make it work for everyone!
  • Toys can yield insight. Touchless Control was the first hands-free voice control ever offered on a mobile phone, so during the development phase we looked for products in other industries that used similar voice command technology. We found two, both in the toy industry: a blue, one-eyed alien alarm clock that could listen to you speak, and a little Christmas ornament sold on TV that would automatically play holiday songs when you called out to it. A good toy provides endless joy.
  • It’s all in the trigger. The trigger, the “OK Google Now” phrase that activates Touchless Control, is crucial to getting always-on software to work. It had to be distinctive, so it wouldn’t be confused with common phrases used in everyday conversation, and it had to be easy for the device to pick up. In linguistic terms, this meant the trigger had to be at least four syllables (ideally at least five) and include unique sounds. We worked with several professional linguists and tested dozens of potential triggers. Some random words like “magic” and “genie” performed really well. But ultimately “OK Google Now” worked the best and made sense for our Android-based software.
  • Some women tend to say their w’s with an h. We didn’t even hear it ourselves at first, really. But when we looked into the audio data, sure enough, we learned that a significant number of English-speaking women tend to begin their w sound with an h. Quite elegant.
Today we’re inspired to see that the majority of Moto X users are actively using Touchless Control, with a significant number using it more than five times a day. Google searches are the most frequently used command, followed by voice-activated calls. The touchless features are especially popular among our younger users.
One of the most frequent questions we received during last week’s Hangout on Air with Punit was whether we were continuing to develop new Touchless Control features and commands. The short answer is yes. We’ve learned a lot in building this software from the beginning, and we look forward to launching new features and commands in the coming months.
Let us know what you’d like to see in future iterations by telling us on Google+, Facebook, or Twitter using the hashtag #motodownload.
Posted by Richard Hung, Product Manager
