Part 1 – Empowering the world and the Next Billion Users
When you hear ‘Voice,’ ‘digital assistant,’ or ‘smart speaker,’ you will have very different ideas depending on where you live (and I use Voice to cover not just natural language processing and recognition, but everything it enables through digital assistants). This will range from ‘I hear a lot about it, but all it does is play music and turn the lights on and off. Why would that ever really catch on?’ to ‘Duh, everyone uses it and it’s super useful. Of course Voice is huge.’ to ‘Voice is changing the world in massively meaningful ways.’ Spoiler alert, I’m in the last camp.
So, why are our ideas on the subject so varied? At iProspect we surveyed roughly 10,000 people across APAC, Europe, and the Americas, and we discovered some very interesting distinctions. There are dynamic markets, like China, India, Indonesia, and Latin America, and there are conservative markets like Europe, Japan, and Australia. Voice adoption in dynamic markets is very high, often topping 70% of smartphone users, while adoption is slower in more conservative markets.
This comes down to a couple of factors that are important to keep in mind when working with Voice. Dynamic markets undergo significant change and upheaval on a much more frequent basis than conservative markets. Let’s look at APAC, since it has such a representative dichotomy of these types of markets. In the last 20-30 years we can see incredible growth that took conservative, Western markets 100 years to achieve:
(For those who haven’t yet read Hans Rosling’s fantastic book Factfulness, it goes into these trends and more. I highly recommend it. Also, you can play around with the data here.)
Moving from level 1 to level 3 is very substantial. It’s the equivalent of going from getting your drinking water from a well or river, to having running water in your home, but across all aspects of your life. Truly life-changing on many levels.
What that means is the only thing that has been consistent in dynamic markets has been change, leading to a more open mind in trying new things and adopting new technology. This has in turn led to interesting innovation. In India, Indonesia, LATAM, and parts of Africa we are seeing the rapid adoption of smart-feature phones: phones that run KaiOS, a lightweight operating system descended from Firefox OS rather than Android, on very inexpensive ($20 to $40 USD) hardware. No expensive OLED touchscreens here, but what you do have is a microphone button which activates the Google Assistant, allowing you to navigate your phone and the internet with just your voice.
Take a minute to consider what this means. Try and forget all the phones and tech you’ve used until now. Forget about Swype, forget about Blackberry’s full keyboard, and forget about T9 prediction. Even forget using a keyboard and mouse. You’re presented with a nearly unlimited fountain of information (the internet). It can open your world to new opportunities and knowledge beyond anything you’ve yet experienced. Now, given the choice of having to learn a completely new way to interface through key presses, or continue on simply speaking as you have since as far back as you can remember, chances are you’ll just keep using your voice. And for those who lack access to education or more expensive technology—the illiterate, the poor, and many others—it’s their first real opportunity. No more relying on others for information. There is now an alternative to what’s on the radio or TV, what everyone in the community is saying. And all you have to do now is ask.
Voice technology is leading to not only mobile-first markets, but Voice-first markets. Currently 3 out of 4 people are connected to the internet around the world, and those who are not? Voice is helping to bring them online at a blistering rate. Google calls these the Next Billion Users (NBU). Voice, cheap data plans and tech, and then the internet will continue to accelerate communities from Level 1 to Level 2 to Level 3 and to Level 4. If you’re a business, it’s worth planning and putting into action how you will help these consumers today.
Pretty exciting, right? Keep up, because we’re just getting started.
Part 2 – Personalization at scale and the future of advertising
Let’s look at the conservative countries (likely where you live if you’re reading this). These markets are used to a high standard of living and anything below a certain level of quality or usefulness is often disregarded. You probably remember the first time you interacted with Voice—likely Siri, who started at a frustratingly low level of accuracy. Contrast that to 3 years ago (2017), when English language voice recognition technology passed 95% accuracy, the level at which humans recognize what is being spoken to them. The machines are now better listeners than we are, but many of us still have a bad taste left in our mouth from our first interaction.
And that leads to questions like ‘when is Voice really going to catch on?’ And the answer for these markets is that it will catch on whether you make an effort to adopt it or not. The next TV or microwave your parents buy, the next car you drive, the next watch or wearable you gotta have: all of these will have a Voice assistant very soon, if they don’t already.
Voice will come to you, whether you’re ready or not.
Which means those who get in on the ground floor are going to have a huge advantage in what will likely be the largest platform shift since mobile. So what are we, as businesses, doing about it? Not enough. We do Voice SEO (search engine optimization), to make sure our answer is the one given by Google Assistant or Siri. We optimize product listings so that our clients’ products are recommended when someone asks Alexa to buy them something. And we use location data, product feeds, and inventory levels to give the best recommendations to someone looking for a product in a store or a restaurant nearby. But this is just scratching the surface of what Voice has to offer.
There is a good amount of buzz going around marketing circles lately about ‘personalization at scale.’ For those who aren’t in the biz, personalization is the practice of optimizing experiences to each individual user, often through the use of data. So when I visit Amazon, I get recommended products more relevant to me, and when you visit, your recommendations are better suited to you. This can happen everywhere from the advertisements you see, to product recommendations, to even the McDonald’s menu you order from. When overdone this can be creepy, but when done properly it’s incredibly useful for consumers as well as businesses. So why am I talking about personalization in a piece about Voice? Because it’s the ultimate way to personalize.
Voice is the key to personalization at scale.
Consider chatbots. These, too, have come a long way thanks to the advances in natural language processing (or how a computer understands information from how we humans typically communicate). These bots have gone from rigid question-in, answer-out exchanges to taking the nuances of user input and matching them with the information and action that is most relevant. Google recently announced a ‘bot’, Meena, who can talk about anything. Meena is a model with 2.6 billion parameters, trained on roughly 341GB of text data. (For context, when made into a .txt file, this entire article is 20KB. 341GB is 341,000,000KB. It’s a lot of data.) This allows for very consistent conversations. Some research comparing this to how humans converse naturally puts Meena just 7 percentage points behind humans (79% for Meena vs 86% for humans).
If chatbots have gotten more useful, what are the business implications? Well, you can now chat with advertisements. Whether a text-based chat embedded in a banner ad on a website, or an interactive ad that comes on when listening to Pandora. This means that a well-developed chat can be much more effective than a static advertisement. It can be truly personalized to any and all users. Programming multiple value propositions into a chat can allow it to identify the best fit for the user interacting with it. Instead of a banner ad with just one message, you can have a chatbot embedded in an ad that can propose any and all of your products, services, and features. That’s a lot more personalized, and can be infinitely scaled.
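To make the idea of programming multiple value propositions into a chat a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the imaginary phone plan advertiser, the `ValueProposition` structure, the keyword lists, and the `pick_pitch` function are all hypothetical, and a real conversational ad would lean on proper natural language understanding rather than simple keyword overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueProposition:
    """One message the advertiser could lead with (hypothetical structure)."""
    name: str
    keywords: frozenset
    pitch: str


# Hypothetical propositions for an imaginary phone plan advertiser.
PROPOSITIONS = [
    ValueProposition(
        "price",
        frozenset({"cheap", "price", "cost", "budget", "afford"}),
        "Plans start at a low monthly rate.",
    ),
    ValueProposition(
        "coverage",
        frozenset({"signal", "coverage", "rural", "travel"}),
        "Our network covers rural areas too.",
    ),
    ValueProposition(
        "data",
        frozenset({"data", "streaming", "video", "unlimited"}),
        "Stream as much as you like with unlimited data.",
    ),
]

# Fallback when nothing matches: ask instead of guessing.
DEFAULT_PITCH = "What matters most to you in a phone plan?"


def pick_pitch(message: str) -> str:
    """Return the pitch whose keywords overlap most with the user's message."""
    words = {w.strip(".,!?'\"").lower() for w in message.split()}
    best = max(PROPOSITIONS, key=lambda p: len(p.keywords & words))
    if not best.keywords & words:
        return DEFAULT_PITCH
    return best.pitch


if __name__ == "__main__":
    print(pick_pitch("Is it cheap? I'm on a budget."))
    print(pick_pitch("Hello there"))
```

The design point is the fallback: when the ad cannot tell what the user cares about, it asks a question rather than pushing a single message, which is exactly what a static banner cannot do.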
Everyone in advertising is also quite worried about Google phasing out third-party ‘cookies’ in Chrome (Apple’s Safari browser already blocks them). Cookies are small pieces of data that websites place into your browser to keep track of you (see this article which was also listed above for more info). A website can use its own cookies to know that you are the user logged in so it doesn’t have to ask you to log in again, for example. Third-party cookies are also used to retarget advertisements on other websites to users who have visited a business’s website, among various other things. With these third-party cookies going away (due to privacy concerns), businesses are worried about how they will retarget consumers to come back. Well, Voice helps with this as well. You don’t need cookies and retargeting to talk to a consumer you’ve opened a chat with. You’re in their direct messages. Just send them another message and you’re good to go.
You can also connect these chatbots to your order system and have consumers purchase via chat, without leaving the site or platform they were already on. No more losing sales because someone
clicked on a link
that opened a new website
that loaded slowly due to poor optimization and distant servers
which asked them for info they already put in on the previous site
then asked for more information when checking out beyond what is necessary or comfortable for the sale
resulting in the person getting frustrated and deciding it’s not worth it. And people wonder why conversion rates online are low.
Think of this change as the difference between a billboard and a salesperson. Not only can you dynamically alter your value proposition to exactly what the consumer needs, but you can also get immediate feedback when someone is not interested, and use that to better advertise in the future. How many times have you seen ads for something you’ve already bought or something you have zero interest in? With the rise of adblockers and click-through rates in single digits, it’s safe to say everyone has. This is one of the reasons why the advertising industry isn’t as popular as it once was. It’s annoying. I work at an advertising agency and I use the Brave Browser because it automatically blocks ads and other intrusions. Feedback through Voice advertisements will allow advertisers to stop wasting their money on these scenarios. Long have advertisements been a one-way street: “Buy this! Look here! Listen to me!” Conversational Advertisements open a dialogue with the consumer. And everyone prefers being heard to being yelled at.
Advertising has been a one-way street. Voice makes it a dialogue with the consumer.
But as we’ve seen with the internet, it won’t be all positive. Personalization at scale through Voice also means disinformation at scale is possible as well.
Part 3 – The future accelerated: more meaningful connections
Let’s talk about Voice Applications. Alexa calls these skills, and Google calls them actions, but at the end of the day they are voice-enabled applications which a digital assistant can access. And they will change the world.
Think about the human body. We have various data inputs (vision, hearing, smell, taste, touch) into a central processor (the brain) which then outputs (via speech and touch). Voice today is much simpler. Smart speakers have ears (mics), brains (backend processing and chat logic), and mouths (speaker). Assistants in cell phones go a step further having eyes (cameras) and an output we humans lack–screens. What is most interesting is the opportunity to expand the capabilities of a digital assistant via additional inputs (sensors for pressure, temperature, light, motion, etc. spread across distance and in multiples) and outputs (screens, AR, VR, and hopefully soon things like volumetric images and parametric speakers creating 3D images you can feel and sounds you can hear while someone next to you cannot).
The key to utilizing this is the backend (the brain). And luckily this is very open-ended already, so often the only thing limiting us is ourselves (and funding). Going beyond just chatbots, we can have videos and entertainment enhance these apps into full-on experiences, like this cocktail mixing skill we worked on for Diageo. On mobiles, you can use voice apps to deep-link into your mobile applications on iOS or Android. What this means is you could ask Google Assistant or Siri to do something, and the assistant would then open your mobile app to that specific function, no additional navigation required. My mom uses this feature all the time, without even realizing it, when she asks Siri to give directions via the Waze app. Voice apps also allow you to talk to a smart speaker or digital assistant which in turn can give additional information or lead to a purchase. It could also send information or links to your mobile device for later use. And with studies saying anywhere from 41% to over 50% of American households have a smart speaker (and smart devices will only continue to proliferate as we mentioned in Part 2), Voice can activate traditional advertising in a very interesting way. When watching a TV advertisement, or when you see a billboard, or while listening to the radio you could be told to ‘Ask Siri…’ or ‘Say Hey Google…’ for more information. Similar to what can be done with QR codes, links, and search keywords, but easier to remember and quickly acted on as you just have to say the word.
Voice is already massively useful in many circumstances, but what is around the corner will be truly revolutionary. And that is the convergence of smart assistants with experts. You’ve likely seen that John Legend lent his voice to the Google Assistant or that Samuel L. Jackson lent his to Alexa.
This is just the start. Voice cloning, similar to deepfake video, is no longer the stuff of science fiction thanks to companies like Resemble.ai and Sonantic.io, as well as text-to-speech services like Amazon Polly and Google Cloud Text-to-Speech. It’s time to think about sonic branding and who your brand’s literal voice should be.
Beyond just the voice, the neural processing behind these assistants allows us to better upload know-how and expertise as well. By cloning their voices and uploading their knowledge, you could do any of the following, even today:
- Wake up and have Marc Jacobs pick out your outfit for the day.
His fashion sense could be combined with image recognition APIs to identify how well your outfit fits your body type. In fact, the Amazon Echo Look took an initial stab at this.
- Take a break at lunch to discuss books with Bill Gates.
He could easily upload his thoughts on the latest books he’s read to share with you, and you could have a conversation about it.
- Get football lessons from Ronaldo in the afternoon.
Upload his training regimen as well as expertise, and then have the user go to a football field. Use the AR function in your phone to map the field and your location on it. Attach the phone to your chest or use a tripod while you do some dribbling and kicking drills. Image recognition combined with a pressure sensor in your shoe (like this one made by Google + Adidas) or the ball can diagnose the angles and power of your actions, and then Ronaldo can critique you. Take the phone from your chest and look through the screen to see Ronaldo projected next to you on the field, demonstrating the perfect form.
- Cook dinner with Gordon Ramsay.
Again, using image recognition or even just user input, combined with heat and/or smoke sensors, you could have Gordon screaming down your throat that you’ve burnt the bloody chicken breast to hell and no one will want to eat it as it will taste like sawdust!
- Get an Ivy League education anywhere, anytime.
The current education system is broken. It’s impossible for any teacher to perfectly meet the needs of each individual student. Especially with class sizes of 30 or more. Especially with ‘standardized’ curriculum. Especially with an end goal of passing a test and not true learning. It’s simply too tall a task for our earnest teachers to do by themselves. But if they band together? If all the best teachers in the world were your exclusive tutors, I’m confident you would learn at an unbelievable rate. So let’s do it. Upload the teaching methods of 100 of the best teachers in the world to a digital assistant. You could even use all the informational videos on YouTube as a data source to start building this. Program logic that personalizes to each student. When a problem is answered incorrectly in this way, expand with this explanation. When a student asks this question, teach more about this. When they don’t ask anything, use this matrix of questions to identify their needs and meet them (Google made an interesting action to teach Indian kids Hindi and English). Anyone, anywhere in the world could access this super teacher at the same time. Think of the benefits a much more educated world would have. This may be the best way to fight the disinformation at scale mentioned earlier.
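The three rules in the education example (explain a known mistake, expand on a question, probe when the student is silent) can be sketched as a tiny piece of decision logic. This is only an illustration under loud assumptions: the lesson content, the mistake labels, the trigger words, and the `next_tutor_turn` function are all hypothetical, and a real tutor would need far richer models of both the subject and the student.

```python
from typing import Optional

# Hypothetical content for a single arithmetic lesson.
MISTAKE_EXPLANATIONS = {
    "forgot_carry": "Remember to carry the 1 when a column adds up to 10 or more.",
    "sign_error": "Check the sign: subtracting a negative is the same as adding.",
}

# Hypothetical trigger words mapped to what the tutor expands on.
QUESTION_TOPICS = {
    "why": "Here is the reasoning behind this step.",
    "example": "Here is another worked example.",
}

# The 'matrix of questions' used when the student says nothing.
PROBE_QUESTIONS = [
    "Which step felt the least clear?",
    "Can you explain the last answer in your own words?",
]


def next_tutor_turn(
    mistake: Optional[str] = None,
    question: Optional[str] = None,
    silent_turns: int = 0,
) -> str:
    """Pick the tutor's next line from the three rules described above."""
    # Rule 1: a recognized mistake gets its matching explanation.
    if mistake in MISTAKE_EXPLANATIONS:
        return MISTAKE_EXPLANATIONS[mistake]
    # Rule 2: a question containing a known trigger word gets expanded on.
    if question:
        lowered = question.lower()
        for trigger, reply in QUESTION_TOPICS.items():
            if trigger in lowered:
                return reply
    # Rule 3: otherwise, cycle through probing questions to find the need.
    return PROBE_QUESTIONS[silent_turns % len(PROBE_QUESTIONS)]


if __name__ == "__main__":
    print(next_tutor_turn(mistake="forgot_carry"))
    print(next_tutor_turn(question="Why does that work?"))
    print(next_tutor_turn(silent_turns=1))
```

The interesting work is in filling those tables, not in the logic itself: that is where the methods of the best teachers would be captured.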
(If you want to reach further, you could say these assistants are the beginning of ‘uploads,’ the idea of people uploading their consciousness to a digital medium, perhaps to ‘live’ forever. But that’s a whole other discussion for another day.)
Nearly everyone on Earth is now connected thanks to the internet. But that connection is quite shallow. Voice and digital assistants will make those connections incredibly rich and rewarding. As one person you can only have a conversation with one other person at a time, but combined with a digital assistant, you can talk with anyone and everyone who will listen, simultaneously. This is what makes Voice so exciting. One of my favorite books is Sapiens: A Brief History of Humankind by Yuval Noah Harari (highly recommend you read it as well as his other books). In it, Harari discusses how collaboration has continually revolutionized mankind. Banding together and communicating to take down a mammoth in the Ice Age. Working together with agriculture to feed many. Connecting the lessons of the past and communicating across great distances with written language. Concerted collaboration of millions in a united endeavor via non-physical organizations and corporations. And today, nearly everyone connected further via the internet. You can’t connect more people than everyone. It is literally everyone. But from there you can increase the quality of those connections. With each of those steps listed above, the number of connections increased but the quality of each decreased. Voice and digital assistants allow those shallow, inefficient connections to become deeper, meaningful to each individual, and scaled infinitely in parallel. This is what Voice is about.