Google needs to fix its notoriously bad bilingual speech recognition on Assistant and Gboard (Update
Google collects vast amounts of speech data across all of its products, and while it hasn't been too transparent about the practice, we as users profit from it for the most part. Speech recognition has consistently gotten better over the years, which has allowed impressive sci-fi tech like smart speakers to enter our homes. There's one department where Google needs to step up its game, though: multilingual speakers are having a hard time using more than one language on any Google product.
While a lot of people live their entire lives speaking only one language, people in many countries around the world routinely juggle two or more. These days, plenty of products are still developed by people with only one primary tongue, so bilingual support is often not as good as it could be (which is also true outside of speech and audio processing).
That's a problem for people who switch between more than one language regularly. Android Police founder Artem routinely struggles with Gboard's dictation feature while our editor Rita just straight up removed her secondary language, French, from Assistant. There are also reports on Reddit complaining about issues arising from a bilingual Assistant. Some Redditors have even noticed that their Google Home speakers need more processing time when they have to juggle two languages.
Here are some examples from Gboard's bilingual woes for me, a native German speaker. These issues would certainly be amplified for people using more than two languages.
As you can hear and see, there are also some instances where Google arbitrarily switches languages in the middle of a sentence while processing. That sometimes does make sense (I change languages mid-sentence depending on who I talk to), but it often happens when it's not supposed to. That's the case in these examples, where Google completely missed that I was speaking English and created some nonsense German gibberish, translated for your convenience:
"I will now demonstrate how bad[ly] Google understands me" -> "Arena Damen straight Herbert Google landesgrenzen" -> "Arena ladies straight Herbert Google borders"
"If only there was a way I could automate this" -> "BVG ticketautomat" -> "BVG vending machine" (BVG is the public transport agency here in Berlin, so this combination surprisingly makes sense.)
"Hold on, I still have to finish the article" -> "AUTOMEISTER Haftbefehl nicht radeke" -> "CARMASTER arrest warrant not radeke" (that last word isn't even a word in German, and I have no idea why CARMASTER would show up in all caps.)
Note that it took a few tries to get this level of misunderstanding out of Gboard and in the snippets you hear here, my accent showed through the most, which could amplify the problem. Still, Gboard should be able to notice that what it's producing in German is pure gibberish and switch to English instead.
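To make that idea a bit more concrete, here's a minimal sketch of what such a sanity check could look like. It is not how Gboard works internally (I have no insight into that), and everything in it is an assumption for illustration: the tiny word lists, the `plausibility` score, and the `pick_transcript` helper are all hypothetical. The point is only that, given one candidate transcript per language, the one made of real words in its own language should win.

```python
# Toy vocabularies, assumed for illustration only. A real system would
# rely on proper language models rather than hand-written word lists.
VOCAB = {
    "de": {"ich", "will", "jetzt", "zeigen", "wie", "schlecht",
           "google", "mich", "versteht"},
    "en": {"i", "will", "now", "demonstrate", "how", "bad",
           "google", "understands", "me"},
}


def plausibility(transcript: str, lang: str) -> float:
    """Return the fraction of words that are known in the given language."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    known = sum(1 for word in words if word in VOCAB[lang])
    return known / len(words)


def pick_transcript(candidates: dict[str, str]) -> tuple[str, str]:
    """Pick the (language, transcript) pair with the highest plausibility."""
    best = max(candidates, key=lambda lang: plausibility(candidates[lang], lang))
    return best, candidates[best]


# One hypothetical candidate per language for the same recording.
candidates = {
    "de": "Arena Damen straight Herbert Google landesgrenzen",
    "en": "I will now demonstrate how bad Google understands me",
}

language, text = pick_transcript(candidates)
print(language, "->", text)
```

With these toy word lists, the German candidate scores poorly because almost none of its words are in the German list, so the English transcript is chosen.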
The Assistant also messes up when you set it to two languages. Let's start with a classic I hear about every second day when I try to set a timer in English, recorded by my Google Home devices:
I told Google to "Set a timer for 5 minutes." For some reason, Google then decided to understand the first part in German: "Stelle Timer" instead of "set a timer." The Assistant got the rest of the sentence in English, so it decided that "for five minutes" must be the name of the timer (as in, "Set a timer called 'fries'") and asked me how long it should be. That's when I need to answer in German and tell it to set a timer for five minutes, again. The answer, fully translated: "Sure, timer for five minutes called 'for five minutes.' Starting now." ¯\_(ツ)_/¯
Translation of the German nonsense sentence above: "Timer for already." What I actually said: "Turn off the lights."
Every once in a while, Google likes to understand something completely different. When I said "turn off the lights," it thought I wanted to set a "timer for already" in German, a sentence that just doesn't make any sense.
These issues don't just arise for people with thick accents. Artem doesn't have an accent at all and still runs into them, and Google sometimes thinks I'm speaking English when I'm addressing it in (accent-free) German. The same is true for Rita with her English/French Assistant combination. It keeps happening to us to the point where it's more convenient to just turn off one of the languages altogether.
On Gboard, it would already help if there was a quick and easy way to nudge Google into re-transcribing what you've just said — maybe a long-press menu that lets you choose which language the keyboard should try again?
As for smart speakers, that's somewhat more difficult but certainly doable. Why can't I follow up a misheard, mistranslated command by saying, "Hey Google, that was German, try again"? This approach still falls short for multilingual households that throw three different languages and slang into one sentence, but that's not something I would reasonably expect at the moment given the bilingual woes we live with.
Google could also use a combination of location data and browsing history to assess my native language. Then it could expect my English to sound a little more German and develop some tolerance for mistakes or unusual pronunciations. Of course, this could work for any other combination of languages. But this is a next-level solution, and I believe it's further out and harder to implement than other approaches.
Languages are hard, and we have to remind ourselves that we often don't end up understanding each other in our everyday lives either. I feel like we might be expecting too much perfection from our machines. However, what algorithms do lack is a way of quickly ruling out false positives and understanding intent.
If I were to tell a person to set a timer while I'm cooking and that person had to decide whether I said "10 minutes" or "10 hours," they would most likely opt for minutes given the context. The same is true across languages. If I only understand gibberish because I'm tuned to English at a given moment (trust me, this happens), I pause and try to go through what I've heard just to realize it was actually German.
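Here's a rough sketch of that kind of reasoning, purely as an illustration. The numbers and the `pick_unit` helper are made up, and this isn't meant to describe how the Assistant actually decides anything. It just weighs what was (ambiguously) heard against what is likely in the current context.

```python
def pick_unit(heard: dict[str, float], context_prior: dict[str, float]) -> str:
    """Combine how well each option matches the audio with how likely it is
    in the current context, and return the option with the best score."""
    scores = {
        unit: heard[unit] * context_prior.get(unit, 0.0)
        for unit in heard
    }
    return max(scores, key=scores.get)


# Hypothetical numbers: the audio alone slightly favors "hours" ...
heard = {"minutes": 0.45, "hours": 0.55}
# ... but timers set while cooking are assumed to almost always be in minutes.
cooking_prior = {"minutes": 0.95, "hours": 0.05}

print(pick_unit(heard, cooking_prior))
```

The same weighting could apply to the language itself: if the German reading of a sentence is gibberish, context should tip the balance toward English.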
For machines, this is probably no easy feat, but Google has already managed to isolate a single voice in a crowd in a two-year-old tech demo, so it seems odd that multilingual support is still so subpar.
We have to give Google credit where credit is due, though: Amazon only introduced bilingual support to Alexa in October 2019, while Siri doesn't support more than one language at a time at all. This gives me hope that Google has just been so early to the game (remember Google Now?) that there's a lot of room for future improvement. Google is also pioneering user-accessible on-the-fly translations via Assistant and instant camera translations, so it's clear the company recognizes how important this field is.
In the meantime, I've turned off German on my Assistant, and apart from difficulties with local addresses and songs, the experience is so much better now.
Over the last week, I've tested a way to mitigate the bilingual woes on Gboard, but I wouldn't call it a proper solution. Head to the keyboard's language settings and toggle off the default multilingual typing option. Then you can manually change languages whenever you want to write or dictate in another tongue. That's less convenient than the automatic switch that usually works flawlessly when you type by hand, so what I've done is this: I've only deactivated multilingual input on my English keyboard to improve English dictation and left it enabled on my primary German keyboard.
Gboard's voice typing accuracy is now much improved for me, though I'm still occasionally seeing one or two misunderstood English words sprinkled among my German dictation. Just like turning off your secondary language on Assistant, this isn't a real fix for the underlying issues Google has with multiple languages, but playing with these settings might help you, too.