sohogasil.blogg.se - Google translate bot wiki

Google translate bot wiki

Gretchen McCulloch is WIRED's resident linguist. She's the cocreator of Lingthusiasm, a podcast that's enthusiastic about linguistics. Her book Because Internet: Understanding the New Rules of Language is due out in July 2019 from Penguin.

But in the murky middle ground are a couple hundred languages that are spoken by millions of people. These midsize languages are still fairly widely spoken, but they have vastly inconsistent levels of support online. There's Swedish, which has 9.6 million speakers, the third-largest Wikipedia with over 3 million articles, and support in Google Translate, Bing Translate, Facebook, Siri, YouTube captions, and so on. But there's also Odia, the official language of the Odisha state in India, with 38 million speakers, which has no presence in Google Translate. And Oromo, a language spoken by some 34 million people, mostly in Ethiopia, which has just 772 articles in its Wikipedia. Why do Greek, Czech, Hungarian, and Swedish, with their 8 to 13 million speakers, have Google Translate support and robust Wikipedia presences, while languages the same size or larger, like Bhojpuri (51 million), Fula (24 million), Sylheti (11 million), Quechua (9 million), and Kirundi (9 million) languish in technological obscurity?

Part of the reason is that Greek, Czech, Hungarian, and Swedish are among the 24 official languages of the European Union, which means that a small horde of human translators translate many official European Parliament documents every year. Human-translated documents make a great base for what linguists call a parallel corpus: a large mass of text that's equivalent, sentence-by-sentence, in multiple languages. Machine translation engines use parallel corpora to figure out regular correspondences between languages: if "regering" or "κυβέρνηση" or "kormány" or "vláda" all frequently appear in parallel to "government," then the machine concludes these words are equivalent.

In order to be reasonably effective, machine translation requires an enormous parallel corpus for each language. Ideally, this corpus contains documents from a variety of genres: not just parliamentary proceedings but news reports, novels, film scripts, and so on. The machine can't translate informal social media posts very well if it's been trained only on formal legal documents. Translation tools are already scraping the bottom of the parallel corpus barrel: In many languages, the largest parallel translated text is the Bible, which leads to peculiar circumstances where Google translates nonsense syllables into prophecies of doom.

In addition to EU documents, Swedish, Greek, Hungarian, and Czech have a wealth of language resources, created one human at a time over centuries. They're the languages of entire nation-states, with national TV and radio recordings that can be used as the foundation for text-to-speech models. Their speakers have the kind of disposable income that makes media companies translate popular novels and subtitle foreign movies and TV shows. They're found in countries that tech companies imagine their customers might be living in or might at least visit on holiday, meaning it's worth localizing interfaces and adding them as translation options. They have regularized spelling systems and dictionaries that can be rolled into spellcheckers and predictive text models. They have highly literate speakers with internet access who can contribute to projects like Wikipedia. (Speakers who can even, in the case of Swedish, create a bot to automatically make basic Wikipedia articles for rivers, mountains, and other natural features.)
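
To make the parallel-corpus idea concrete, here is a minimal sketch of how word correspondences such as "regering" and "government" could be inferred from sentence-aligned text. The four sentence pairs are a made-up toy corpus, the function names are hypothetical, and the Dice co-occurrence score is a simple stand-in chosen for illustration. Real translation engines rely on far larger corpora and much more sophisticated statistical or neural models.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: Swedish sentences paired with English translations.
# These pairs are invented for illustration, not taken from any real dataset.
PARALLEL_CORPUS = [
    ("en ny regering", "a new government"),
    ("en ny lag", "a new law"),
    ("varje regering har en budget", "every government has a budget"),
    ("varje lag har en text", "every law has a text"),
]


def build_counts(pairs):
    """Count, per sentence pair, which words occur and which co-occur."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_counts.update(product(src_words, tgt_words))
    return src_counts, tgt_counts, pair_counts


def best_match(word, pairs):
    """Return the target word with the highest Dice score for `word`.

    Dice = 2 * co-occurrences / (source count + target count). It rewards
    words that appear together and rarely apart, so a frequent word like
    "a" does not win just by being in every sentence.
    """
    src_counts, tgt_counts, pair_counts = build_counts(pairs)
    if word not in src_counts:
        return None  # No evidence for this word in the corpus.
    scores = {
        tgt: 2 * pair_counts[(word, tgt)] / (src_counts[word] + tgt_counts[tgt])
        for tgt in tgt_counts
        if pair_counts[(word, tgt)]
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(best_match("regering", PARALLEL_CORPUS))  # government
    print(best_match("lag", PARALLEL_CORPUS))       # law
```

With only four sentence pairs the scores are fragile, which is the article's point: a language without a large, varied parallel corpus gives the machine very little to work with.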