Sometime in the near future, we won't need to type on a smartphone or computer to silently communicate our thoughts to others.
In fact, the devices themselves will quietly understand our intentions and express them to other people. We won't even need to move our mouths.
That "sometime in the near future" is now.
At the recent TED Conference, MIT student and TED Fellow Arnav Kapur was onstage with a colleague doing the first live public demo of his new technology. He was showing how you can communicate with a computer using signals from your brain. The usually cool, erudite audience seemed a little uncomfortable.
"If you look at the history of computing, we've always treated computers as external devices that compute and act on our behalf," Kapur said. "What I want to do is I want to weave computing, AI and Internet as part of us."
His colleague started up a device called AlterEgo. Thin like a sticker, AlterEgo picks up signals in the mouth cavity. It recognizes the intended speech and processes it through the built-in AI. The device then gives feedback to the user directly through bone conduction: it sends vibrations to the inner ear, delivering a response that blends with your normal hearing.
Onstage, the assistant quietly thought of a question: "What is the weather in Vancouver?" Seconds later, AlterEgo told him in his ear. "It's 50 degrees and rainy here in Vancouver," the assistant announced.
AlterEgo essentially gives you a built-in Siri.
"We don't have a deadline [to go to market], but we're moving as fast as possible to get the technology right, to get the ethics right, to get everything right," Kapur told me after the talk. "We're developing it both as a general purpose computer interface and [in specific instances] like on the clinical side or even in people's homes."
Nearly telepathic communication actually makes sense now. About ten years ago, the Apple iPhone replaced the ubiquitous cell phone keyboard with a blank touchscreen. A few years later, Google Glass put computer screens into a simple lens. More recently, Amazon Alexa and Microsoft Cortana have dropped the screen and gone straight for voice control. Now those voices are getting closer to our minds and may even become indistinguishable from our thoughts in the future.
"We knew the voice market was growing, like with getting map locations, and audio is the next frontier of user interfaces," says Dr. Rupal Patel, Founder and CEO of VocalID. The startup literally gives voices to the voiceless, particularly people unable to speak because of illness or other circumstances.
"We start with [our database of] human voices, then train our deep learning technology to learn the pattern of speech… We mix voices together from our voice bank, so it's not just Damon's voice, but three or five voices. They are different enough to blend it into a voice that does not exist today – kind of like a face morph."
The VocalID customer then has a voice as unique as he or she is, mixed together like a Sauvignon blend. It is a surrogate voice for those of us who cannot speak, just as much as AlterEgo is a surrogate companion for our brains.
Voice equality will become increasingly important as Siri, Alexa and voice-based interfaces become the dominant communication method.
It may feel odd to view your voice as a privilege, but as the world becomes more voice-activated, there will be a wider gap between the speakers and the voiceless. Picture going shopping without access to the Internet or trying to eat healthily when your neighborhood is a food desert. And suffering from vocal difficulties is more common than you might think. In fact, according to government statistics, around 7.5 million people in the U.S. have trouble using their voices.
While voice communication appears to be here to stay, at least for now, a more radical shift to mind-controlled communication is not necessarily inevitable. Tech futurist Wagner James Au, for one, is dubious.
"I'm very skeptical keyboards or voice-based communication will be replaced any time soon. Generation Z has grown up with smartphones and games like Fortnite, so I don't see them quickly switching to a new form factor. It's still unclear if even head-mounted AR/VR displays will see mass adoption, and mind-reading devices are a far greater physical imposition on the user."
How adopters use the newest brain impulse-reading, voice-altering technology is a much more complicated discussion. This spring, a video showed U.S. House Speaker Nancy Pelosi stammering and slurring her words at a press conference. The problem is that it didn't really happen: the video was manufactured and heavily altered from the original source material.
So-called deepfake videos use computer algorithms to capture the visual and vocal cues of an individual, and then the creator can manipulate them to make that person appear to say whatever the creator wants. Deepfakes have already created false narratives in the political and media systems – and these are only videos. Newer tech is making the barrier between tech and our brains, if not our entire identity, even thinner.
"Last year," says Patel of VocalID, "we did penetration testing with our voices on banks that use voice control – and our generation 4 system is even tricky for you and me to identify the difference (between real and fake). As a forward-thinking company, we want to prevent risk early on by watermarking voices, creating a detector of false voices, and so on." She adds, "The line will become more blurred over time."
Onstage at TED, Kapur reassured the audience about who would be in the driver's seat. "This is why we designed the system to deliberately record from the peripheral nervous system, which is why the control in all situations resides with the user."
And, like many creators, he quickly shifted back to the possibilities. "What could the implications of something like this be? Imagine perfectly memorizing things, where you perfectly record information that you silently speak, and then hear them later when you want to, internally searching for information, crunching numbers at speeds computers do, silently texting other people."
"The potential," he concluded, "could be far-reaching."