At the end of its I/O presentation on Wednesday, Google pulled out a “one more thing” surprise. In a short video, Google showed off a pair of augmented reality glasses with one purpose: displaying translations of spoken language right in front of your eyes. In the video, Google product manager Max Spear called the prototype’s capability “captions for the world,” and we see family members communicating with each other for the first time.
Now, hold on just a second. Like many people, we’ve used Google Translate before and mostly consider it a very impressive tool that also produces a lot of embarrassing misfires. While we might trust it to get us directions to the bus stop, that’s nowhere near the same as trusting it to correctly interpret and relay our parents’ childhood stories. And hasn’t Google claimed before that it was finally breaking the language barrier?
In 2017, Google marketed real-time translation as a feature of its original Pixel Buds. Our former colleague Sean O’Kane described the experience as “a laudable idea with a lamentable execution” and reported that some of the people he tried it with said it sounded like he was a five-year-old. That’s not quite what Google showed off in its video.
Also, we don’t want to gloss over the fact that Google is promising this translation will happen inside a pair of AR glasses. Not to hit a sore spot, but the reality of augmented reality hasn’t even quite caught up to Google’s concept video from a decade ago. You know, the one that served as a predecessor to the much-maligned and embarrassing-to-wear Google Glass?
To be fair, Google’s AR translation glasses seem much more focused than what Glass was trying to accomplish. From what Google showed, they’re meant to do one thing – display translated text – not act as an ambient computing experience that could replace a smartphone. But even then, making AR glasses isn’t easy. Even a moderate amount of ambient light can make text on transparent screens very hard to read. It’s challenging enough to read subtitles on a TV with some sun glare through a window; now imagine that experience strapped to your face (and with the added pressure of trying to hold a conversation with someone you can’t understand on your own).
But hey, technology moves fast – Google may be able to overcome a hurdle that has stymied its competitors. But that wouldn’t change the fact that Google Translate is not a magic bullet for cross-language conversation. If you’ve ever tried having an actual conversation through a translation app, then you probably know that you must speak slowly. And methodically. And clearly. Unless you want to risk a garbled translation. One slip of the tongue, and you might just be done.
People don’t converse in a vacuum or like machines do. Just as we code-switch when speaking to voice assistants like Alexa, Siri, or Google Assistant, we know we have to use much simpler sentences when dealing with machine translation. And even when we do speak correctly, the translation can still come out awkward and misconstrued. Some of our Verge colleagues fluent in Korean pointed out that Google’s own pre-roll countdown for I/O displayed an honorific version of “Welcome” in Korean that nobody actually uses.
That mildly embarrassing flub pales in comparison to the fact that, according to tweets from Rami Ismail and Sam Ettinger, Google showed over half a dozen backwards, broken, or otherwise incorrect scripts on a single slide during its Translate presentation. (Android Police notes that a Google employee acknowledged the mistake, and that it was corrected in the YouTube version of the keynote.) To be clear, it’s not that we expect perfection – but Google is trying to tell us that it’s close to cracking real-time translation, and those kinds of errors make that seem incredibly unlikely.
Congrats to @Google for getting Arabic script backwards and disconnected during @sundarpichai‘s presentation on *Google Translate*, because small indie startups like Google can’t afford to hire anyone with a four-year-old’s elementary-school-level knowledge of Arabic script. pic.twitter.com/pSEvHTFORv
— Rami Ismail (رامي) (@tha_rami) May 11, 2022
Google is trying to solve an immensely complicated problem. Translating words is easy; figuring out grammar is hard but possible. But language and communication are far more complex than just those two things. As a relatively simple example, Antonio’s mother speaks three languages (Italian, Spanish, and English). She sometimes borrows words from one language to another mid-sentence, including words from her regional Italian dialect (which is like a fourth language). That sort of thing is relatively easy for a human to parse, but could Google’s prototype glasses handle it? Never mind the messier parts of conversation, like unclear references, incomplete thoughts, or innuendo.
It’s not that Google’s goal isn’t admirable. We absolutely want to live in a world where everyone gets to experience what the research participants in the video do, staring wide-eyed in wonder as they see their loved ones’ words appear before them. Breaking down language barriers and understanding each other in ways we couldn’t before is something the world needs much more of; it’s just that there’s a long way to go before we reach that future. Machine translation is here and has been for a long time. But despite the plethora of languages it can handle, it doesn’t speak human yet.