Voice Control

Advances in machine learning, speech recognition, and natural language understanding will drive the development of virtual assistants and bots that act more and more like people; will be controlled by and respond with human voices; and will fulfill search queries, act as proxies, accomplish tasks, and ask questions of us in return. [1]

How It’s Developing

Voice control provides a new option for interacting with computers and technologies. It is part of an evolution from computer languages and typed commands, to graphical user interfaces, to touch screens, and now to gesture and voice control. [2] Faster wireless speeds and the proliferation of smartphones provided an ideal initial platform for deploying these virtual assistants to consumers.

Siri was one of the first mass-market voice-controlled assistants. Before Apple acquired Siri, its co-founders sought an entirely new paradigm for accessing the Internet, one in which artificially intelligent agents would assemble composed answers from multiple sources rather than pull up relevant resources for humans to consult on their own: a move from a search engine to a “do engine.” [3] When asked a question, Siri would send the audio of the speaker’s question to a server, where speech recognition software would “transcribe” the spoken words, map the contents of the question onto a domain of potential actions, and then pick the action that seemed most probable, based on its understanding of the relationships between real-world concepts. Siri could also use the time of day, a user’s preferences, and location to inform its response or to ask for more information. [4] Consumers first experienced Siri on iPhones, and Apple’s HomePod puts Siri in a speaker as an in-home assistant, offering music and podcasts as well as messages, weather, traffic, sports, and alarms. [5]
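The question-handling steps described above (transcribe, map the words onto a domain of possible actions, pick the most probable one, or ask for more information) can be illustrated with a toy intent matcher. This is a minimal sketch under stated assumptions, not Siri’s actual method: transcription is assumed to have already happened, and the action names, trigger-word sets, and word-overlap score are hypothetical stand-ins for the trained language-understanding models a real assistant would use.

```python
import string
from dataclasses import dataclass

# Hypothetical domain of actions, each described by a few trigger words.
# A real assistant would score candidates with trained models instead.
ACTIONS = {
    "get_weather": {"weather", "rain", "forecast", "temperature"},
    "set_alarm": {"alarm", "wake", "remind"},
    "play_music": {"play", "song", "music"},
}


@dataclass
class Interpretation:
    action: str
    score: float


def interpret(transcript: str) -> Interpretation:
    """Map an already-transcribed question onto the most probable action."""
    words = {w.strip(string.punctuation) for w in transcript.lower().split()}
    candidates = [
        # Fraction of an action's trigger words present: a crude proxy
        # for how probable that action is.
        Interpretation(action, len(words & triggers) / len(triggers))
        for action, triggers in ACTIONS.items()
    ]
    best = max(candidates, key=lambda c: c.score)
    if best.score == 0:
        # Nothing matched, so ask the user for more information.
        return Interpretation("ask_for_clarification", 0.0)
    return best


print(interpret("Will it rain tomorrow?").action)  # get_weather
```

A real system would also fold in context such as time of day, preferences, and location when scoring candidates; this sketch leaves that out.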

Google’s Google Now virtual assistant evolved into Google Assistant, an upgraded service that can field both individual and follow-up questions and follow a conversation to return the right answer. [6] Along with voice control, Google Assistant works through text chat, which has allowed Google to deploy it across multiple devices, including phones, the Allo chat app, the voice-controlled Google Home speaker, and numerous smart home devices. As of 2017, Assistant was said to be integrated into more than 100 million devices, including smart TVs, automobiles, and wearables. [7]

Amazon’s Echo, made widely available in 2015, is an always-listening device designed to play music and answer basic household questions when activated with a wake word. Although Amazon started behind Apple and Google in voice control, it benefited from its consumer reach, selling nearly 1 million devices during the 2015 holiday season. Independent developers wrote apps that work with the speaker’s voice controls, allowing the device to control other smart home devices, connect to apps, and perform a growing number of tasks. [8]

Apple, Google, and Amazon are eager to see this technology spread, both by adding new skills to their platforms and by integrating the technology into other devices. In 2015, Amazon gave developers the opportunity to build new capabilities for Echo’s Alexa through the Alexa Skills Kit; by February 2017, Alexa had over 10,000 skills, up from 7,000 in January 2017 and just 1,000 in June 2016. [9] Amazon also opened its voice processing technology (including technologies for wake word recognition, beamforming, noise reduction, and echo cancellation) to third-party hardware makers interested in building Alexa into their devices. [10] Google, which had already integrated Google Assistant into numerous devices, made a Google Assistant SDK available so that manufacturers could build the Assistant into any hardware. [11]
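To give a sense of what building a skill involves, here is a minimal sketch of a custom skill’s backend that reads the JSON request Alexa sends and returns text for Alexa to speak. It assumes the backend runs as an AWS Lambda-style handler and works with the raw request and response JSON rather than Amazon’s ASK SDK; the intent name `LibraryHoursIntent` and its answer are hypothetical.

```python
# Hypothetical answers keyed by intent name; a real skill defines its own intents.
ANSWERS = {
    "LibraryHoursIntent": "The library is open from 8 a.m. to 10 p.m.",
}


def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Route an incoming skill request to a spoken answer."""
    request = event.get("request", {})
    request_type = request.get("type")
    if request_type == "LaunchRequest":
        # The user opened the skill without asking anything yet; keep listening.
        return build_response("Welcome. What would you like to know?", end_session=False)
    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name", "")
        answer = ANSWERS.get(intent_name, "Sorry, I don't know that one yet.")
        return build_response(answer)
    # Other request types (such as SessionEndedRequest) get no speech in this sketch.
    return {"version": "1.0", "response": {}}
```

Routing on the request type and intent name keeps each capability a simple lookup; a real skill would also declare its intents and sample utterances in an interaction model configured in the Alexa developer console.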

As the skills available for voice-controlled technology expand, the speech technology that powers them is also improving. Amazon added support for new tags in Speech Synthesis Markup Language (SSML), a standardized markup for synthesized speech, which developers can use to make Alexa whisper, vary its speaking speed, bleep out words, add pauses, change the pronunciation of a word, spell a word out, add audio snippets, and insert special words and phrases. [12]
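SSML tags are embedded in the text Alexa is asked to speak. The sketch below assembles such a string in Python. It assumes the tag names Amazon documents for Alexa (the Amazon-specific `amazon:effect name="whispered"` and the standard `prosody`, `break`, and `say-as` elements, with `interpret-as` values `expletive` and `spell-out`); the helper functions and sample sentences are hypothetical.

```python
from xml.sax.saxutils import escape


def whisper(text: str) -> str:
    # Amazon-specific extension that makes Alexa whisper.
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def slowly(text: str) -> str:
    # Standard SSML prosody element; "slow" is one of its rate keywords.
    return f'<prosody rate="slow">{escape(text)}</prosody>'


def bleep(word: str) -> str:
    # Alexa bleeps the word out instead of saying it.
    return f'<say-as interpret-as="expletive">{escape(word)}</say-as>'


def spell(word: str) -> str:
    # Read the word letter by letter.
    return f'<say-as interpret-as="spell-out">{escape(word)}</say-as>'


def pause(ms: int) -> str:
    # Insert a silence of the given length in milliseconds.
    return f'<break time="{ms}ms"/>'


def speak(*fragments: str) -> str:
    # Every SSML document is wrapped in a single <speak> root element.
    return "<speak>" + " ".join(fragments) + "</speak>"


ssml = speak(
    whisper("Here is a secret."),
    pause(500),
    slowly("The code word is"),
    spell("SSML"),
)
print(ssml)
```

In a skill response, a string like this would go in the `ssml` field of an `outputSpeech` object whose `type` is `SSML`, in place of plain text.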

Product developers are eager to expand the reach of voice-controlled technology to new audiences. Toymaker Mattel introduced Aristotle, a $349 voice-activated speaker built for children and families. It can start out as a smart baby monitor with a camera that streams video to parents’ phones, play audio to help soothe crying babies, and track feedings and changings so that baby products can be replenished more seamlessly. As children grow, Aristotle can adapt to serve as a nanny, friend, and tutor; it is programmed to understand young voices so it can introduce games for toddlers and field homework questions for school-age children. [13]

Marketers see voice-controlled technology as an opportunity to deliver more information to consumers. Amazon’s “Notifications for Alexa” feature, which is opt-in and based on each user’s preferences, would proactively alert users to information deemed important to them, including breaking news and random weather reports. [14] In a particularly bold move, Burger King launched a TV commercial designed to wake up Google Home devices, extending the advertisement’s reach beyond the commercial itself. Burger King launched the ad without coordinating with Google, which quickly moved to limit its effect and reach. [15] As technology providers open their platforms to developers, some developers will likely seek ways to monetize their skills, including by inserting “sponsored messages” into device responses. Although Amazon’s developer agreement forbids “any advertising for third party products or services” in skills other than those for music streaming, radio, or news briefs, the growing number of skills will likely make this rule difficult to monitor. [16]

As voice-controlled technology becomes more integrated into homes, it will adapt to recognize multiple family members and residents. Google updated its Google Home assistant to support multiple users, each of whom can be identified by their voice; while convenient, such features also make clear that these devices can more accurately attribute searches, requests, and directions to specific individuals. [17] Amazon is also reportedly developing a feature that would allow Alexa to distinguish among individual users by their voices. [18]
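One common approach to telling household members apart is to reduce each utterance to a numeric “voiceprint” (an embedding vector) and compare it with profiles each user enrolled earlier. The sketch below shows only that matching step, using cosine similarity; it is an illustration, not Google’s or Amazon’s method, and the user names, three-number vectors, and 0.8 threshold are made up (real embeddings come from trained models and have far more dimensions).

```python
import math
from typing import Optional, Sequence

# Hypothetical enrolled profiles; real voice embeddings are much longer.
PROFILES = {
    "alex": [0.9, 0.1, 0.2],
    "sam": [0.1, 0.8, 0.5],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def identify(embedding: Sequence[float], threshold: float = 0.8) -> Optional[str]:
    """Return the closest enrolled user, or None if no profile is close enough."""
    best_user, best_score = None, -1.0
    for user, profile in PROFILES.items():
        score = cosine(embedding, profile)
        if score > best_score:
            best_user, best_score = user, score
    # Below the threshold, treat the speaker as an unrecognized guest.
    return best_user if best_score >= threshold else None


print(identify([0.85, 0.15, 0.25]))  # alex
print(identify([0.0, 0.0, 1.0]))     # None
```

The threshold is the privacy-relevant design choice here: it decides when the device confidently attributes a request to a specific person rather than treating the speaker as unknown.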

Why It Matters

As voice-controlled devices become more popular, they will likely become a more readily available reference tool. In 2015, 65% of smartphone owners reported using voice assistants like Apple’s Siri, continuing steady growth from prior years, and Mary Meeker estimated that half of all web searches would be conducted through voice and image search within the next four years. [19] Content integration will likely accelerate this trend. A growing number of news publishers, such as NPR, the Washington Post, and Al Jazeera, are investing in content and editorial teams specifically targeting the home voice assistant and smart speaker market. [20] Google announced partnerships with Bon Appétit, the New York Times, and Food Network to make step-by-step, voice-activated guides for more than five million recipes available through the Home speaker. Users still need to search for recipes and save them to their device, but once saved, the instructions are conveniently available via voice command. [21]

As users increasingly accept the responses produced by voice-controlled technology, concerns may arise about the relevance and authority of the information behind those responses. In a conversational interface, users will not always be able to sort through multiple possible responses (as they would in a web search), immediately know the source of the information provided, or see details that might alert them to problems with that information. Instead, the technology simply picks the programmed source for news, reference, and so on, and conveys it to the listener, with some options for customizing sources built into the app. [22]

Voice-controlled technology could also change the way people access and “read” content. In an early-stage experiment, the Washington Post is using Amazon’s Polly text-to-speech technology to produce audio versions of four articles daily. [23] These audio articles are currently available only on mobile devices, but a time may come when users can make a voice request for specific content, such as the business section of a newspaper, a website, or even a whole book, and have it retrieved and read aloud by a voice-controlled device.

Children and young people will grow up with voice-controlled technology, becoming accustomed to having these devices answer homework questions, settle disputes, and entertain them. These interactions could affect children’s social, interpersonal, language, and intellectual development, potentially moving them toward more simplistic inquiry and acceptance of simple answers rather than engagement with more complex questions and answers. [24]

These voice-controlled virtual assistants could become intellectual equalizers, standing in for a superb memory or acting as an on-hand reference. [25] In such a world, how will humans find valuable ways to work together rather than working in isolation with their voice-controlled assistants?

Beyond serving as tools for education and learning, voice-controlled devices could also become an increasingly important area of research and technological development. Amazon initiated the Alexa Fund Fellowship to fund and support researchers working on voice technology at Carnegie Mellon University, Johns Hopkins University, the University of Southern California, and the University of Waterloo. [26]

Voice-controlled technology may increasingly appear in public and shared spaces. Several hotel companies, including Marriott and Wynn Resorts, are testing devices from Apple and Amazon that let guests turn on lights, close drapes, control room temperature, and change television channels by voice. [27] Schools might also find ways to use smart speakers. Saint Louis University has unveiled plans to place 2,300 Echo Dots in student residences on campus, each able to access an SLU Alexa skill that answers “more than 100” common questions, such as the location of a building, the time of an event, or library hours. [28] Using these devices in semi-public spaces could raise privacy concerns as guests toggle between their personal accounts and more standardized accounts set up for a specific space. [29] It could also change users’ expectations for what they can do in public and shared spaces.

Privacy concerns might also arise over private exchanges overheard by voice-controlled speakers. Such recordings became a central focus of a murder investigation in Arkansas when police asked Amazon for any data an Echo device may have recorded while a murder was taking place. An Echo typically sits idle with its microphones listening for a wake word like “Alexa” before it begins recording and sending data to Amazon’s servers, but it is not unusual for the device to wake up by mistake and capture snippets of audio, so investigators requested the data in case the device had overheard key events. [30] Amazon refused to hand over the data, arguing that both the data and the voice assistant’s own responses were protected by the First Amendment. The defendant ultimately agreed to let Amazon give his Echo’s data to prosecutors, leaving unresolved the legal standard for when data from an Echo or other Internet of Things devices can be used in a court of law. [31]
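The idle-then-record behavior described above is a form of wake-word gating: audio is processed on the device and discarded until a detector fires, and only then is a clip captured and sent off. The sketch below is a simplified model under stated assumptions, not Amazon’s implementation: “frames” are stand-in strings rather than audio, the detector is a hypothetical keyword check, and the buffer and capture lengths are arbitrary. It also shows how a mistaken wake-up happens: a sound-alike triggers the detector, and nearby private conversation ends up in the captured clip.

```python
from collections import deque
from typing import Iterable, List

PRE_ROLL = 1      # hypothetical: frames kept from just before the wake word
CAPTURE_LEN = 3   # hypothetical: frames recorded after the wake word


def detect_wake_word(frame: str) -> bool:
    # Stand-in for an on-device acoustic model. Like a real detector, it is
    # imperfect: any word starting with "alex" (e.g. "Alexander") triggers it.
    return any(word.startswith("alex") for word in frame.lower().split())


def gate(frames: Iterable[str]) -> List[List[str]]:
    """Return the clips that would leave the device; all other audio is dropped."""
    recent = deque(maxlen=PRE_ROLL)  # short local buffer, constantly overwritten
    clips: List[List[str]] = []
    current: List[str] = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Already awake: keep recording until the capture window ends.
            current.append(frame)
            remaining -= 1
            if remaining == 0:
                clips.append(current)
                current = []
        elif detect_wake_word(frame):
            # Wake word heard: start a clip with the buffered pre-roll.
            current = list(recent) + [frame]
            remaining = CAPTURE_LEN
        recent.append(frame)
    if current:
        clips.append(current)  # audio ended mid-capture
    return clips


stream = ["dinner talk", "more talk", "Alexa", "what's", "the", "weather",
          "private chat", "Alexander called", "about", "the", "trip", "quiet"]
for clip in gate(stream):
    print(clip)
```

In this model, audio that never lands in a clip, such as “dinner talk,” exists only briefly in the rolling buffer and is never sent.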

Beyond privacy, there might be concerns about disruption. Amazon introduced a voice calling and messaging feature for its Echo devices to make sending and receiving voice communications more convenient; however, the feature initially launched without an option to block contacts from calling, giving any number of contacts direct access to even the private spaces where the device sits. [32]

Voice-controlled technologies have a complex relationship with issues of diversity. Many virtual assistants carry female-sounding names and use female voices by default, perpetuating notions of female servitude and societal sexism. [33] Additionally, voice-controlled technologies defer to the most standard forms of speech, so regional accents, culturally specific syntax, and correct foreign pronunciations can be problematic, which may also challenge new speakers of a given language. These technologies also push speakers to adopt a “machine” voice that differs from the voice they use with friends and family. [34] Mozilla’s Common Voice initiative seeks to collect voice data to build an open-source voice database that anyone can use to create innovative apps for devices and the web; the initiative might also gather a more diverse range of voices, spurring more inclusive voice-controlled technology.

At the same time, voice-controlled technologies could benefit specific segments of the population, including people with disabilities and older adults, who could use voice assistants to control their homes, order groceries, receive reminders and notifications, or more easily access digital content. [35] Voice-controlled technologies could make life easier for people who struggle with traditional computer interfaces, whether because of difficulty controlling a mouse or trackpad or trouble reading a computer screen.

As more players enter this space, especially big players like Apple, Google, and Amazon, voice-controlled products carry the potential for fragmentation, as certain services (such as iTunes and Gmail) integrate only with certain devices. [36]

Notes and Resources

[1] "Terrifyingly convenient," Will Oremus, Slate, April 3, 2016.

[2] "Terrifyingly convenient," Will Oremus, Slate, April 3, 2016.

[3] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone," Bianca Bosker, Huffington Post, January 22, 2013.

[4] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone," Bianca Bosker, Huffington Post, January 22, 2013.

[5] "Apple’s HomePod puts Siri in a speaker," David Pierce, Wired, June 5, 2017.

[6] "Google unveils Google Assistant, a virtual assistant that’s a big upgrade to Google Now," Matthew Lynley, TechCrunch, May 18, 2016.

[7] "Google Assistant is about to be everywhere," Andrew Tarantola, Engadget, May 17, 2017.

[8] "The real story of how Amazon built the Echo," Joshua Brustein, Bloomberg, April 19, 2016.

[9] "Amazon opens up Alexa’s microphone and voice processing technology to hardware makers," Nat Levy, GeekWire, April 13, 2017.

[10] "Amazon opens up Alexa’s microphone and voice processing technology to hardware makers," Nat Levy, GeekWire, April 13, 2017.

[11] "Google Assistant is about to be everywhere," Andrew Tarantola, Engadget, May 17, 2017.

[12] "Amazon’s Alexa can now whisper, bleep out swear words, and change its pitch," Ashley Carman, The Verge, May 2, 2017.

[13] "Mattel's new AI will help raise your kids," Mark Wilson, Fast Company, April 17, 2017.

[14] "Amazon’s Alexa is getting smarter, but potentially more intrusive," Alejandro Alba, Vocativ, May 16, 2017.

[15] "This Burger King ad forces your Google Home device to tell you about Whoppers," Mary Beth Quirk, Consumerist, April 12, 2017.

[16] "Amazon’s Alexa may soon throw ads into its responses," Allee Manning, Vocativ, May 12, 2017.

[17] "Google Home now recognizes specific users’ voices, allows for multiple accounts," Chris Moran, Consumerist, April 20, 2017.

[18] "Exclusive: Amazon developing advanced voice-recognition for Alexa," Lisa Eadicicco, Time, February 27, 2017.

[19] "Shouting at your computer is the future of search," Allee Manning, Vocativ, June 3, 2016.

[20] "For news publishers, smart speakers are the hot new platform," Lucia Moses, Digiday, July 23, 2018.

[21] "Google Home is upping your cooking game with 5 million new recipes," Brett Williams, Mashable, April 26, 2017.

[22] "Terrifyingly convenient," Will Oremus, Slate, April 3, 2016.

[23] "WaPo is testing audio articles with Amazon tech," George Slefo, Advertising Age, June 9, 2017.

[24] "How millions of kids are being shaped by know-it-all voice assistants," Michael S. Rosenwald, Washington Post, March 2, 2017.

[25] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone," Bianca Bosker, Huffington Post, January 22, 2013.

[26] "Amazon establishes Alexa Fund Fellowship to support universities researching voice technology," Nat Levy, GeekWire, March 2, 2017.

[27] "Siri and Alexa are fighting to be your hotel butler," Hui-yong Yu and Spencer Soper, Bloomberg, March 22, 2017.

[28] "Saint Louis University will put 2,300 Echo Dots in student residences," Jon Fingas, Engadget, August 16, 2018.

[29] "Siri and Alexa are fighting to be your hotel butler," Hui-yong Yu and Spencer Soper, Bloomberg, March 22, 2017.

[30] "Should an Amazon Echo help solve a murder?" Michael Reilly, MIT Technology Review, December 27, 2017.

[31] "Did Alexa hear a murder? We may finally find out," David Kravets, Ars Technica, March 7, 2017.

[32] "Amazon is bringing voice calls to the Echo," David Priest, CNET, May 9, 2017; and "Amazon says caller blocking for Alexa/Echo is coming, amid customer complaints," Todd Bishop, GeekWire, May 13, 2017.

[33] "Terrifyingly convenient," Will Oremus, Slate, April 3, 2016.

[34] "Y'all have a Texas accent? Siri (and the world) might be slowly killing it," Tom Dart, The Guardian, February 10, 2016.

[35] "How millions of kids are being shaped by know-it-all voice assistants," Michael S. Rosenwald, Washington Post, March 2, 2017.

[36] "Google Home is cool, but catching up to Amazon Echo won’t be easy," Brian Barrett, Wired, May 19, 2016.