Machine Translation: What is it?
What is machine translation? Machine translation (MT) refers to automatic translation done by a computational engine, or program, which produces “complete” output in the target language based on source language input. It can be distinguished from translation tools – such as digital dictionaries, glossaries, and translation memories – by the autonomy with which it works. It is common for human translators to use software that facilitates their use of multiple digital tools; this is described as computer-assisted translation (CAT). Machine translation, on the other hand, does not allow for human input until the final stage – known as post-editing.
Computer-Assisted Translation vs Machine Translation
When using CAT tools, a translator may be prompted with translation suggestions based on similar translations they have used in the past. They may have identical phrases auto-completed with pre-existing translations, and they may have a digital glossary that checks against pre-approved terminology while they are working. All of this, however, is controlled by the translator’s actions and settings, and draws on stored human translations. These tools allow the human translator to work more consistently and efficiently. But the human is still central to the process – even identical phrases may need to be translated differently in a new context, so there is human engagement at every step.
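As a rough sketch of how a translation memory might surface those suggestions, consider a fuzzy lookup against stored segment pairs. Everything here is invented for illustration – the segments, the English–Spanish pairs, and the 75% similarity threshold are not from any real CAT tool:

```python
import difflib

# A toy translation memory: previously approved EN->ES segment pairs.
# All sentences and translations here are illustrative, not real data.
MEMORY = {
    "The patient signed the consent form.":
        "El paciente firmó el formulario de consentimiento.",
    "Store the samples at room temperature.":
        "Almacene las muestras a temperatura ambiente.",
}

def suggest(segment, threshold=0.75):
    """Return (stored source, stored translation, similarity) for the
    closest remembered segment, or None if nothing is similar enough."""
    best = None
    for source, target in MEMORY.items():
        score = difflib.SequenceMatcher(
            None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    return best if best and best[2] >= threshold else None

match = suggest("The patient signs the consent form.")
if match:
    # The human translator still accepts, edits, or rejects the suggestion.
    print(f"{match[2]:.0%} match: {match[1]}")
```

The key point the sketch makes is that the tool only retrieves and scores past human translations; the decision to use, adapt, or discard a suggestion stays with the translator.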
When a machine translator is used, the AI (Artificial Intelligence) makes all decisions for all text. Source text is entered and a target translation is provided. The analysis and decisions are essentially a black box: once the system is designed, the translator has no ability to change or guide the output while it runs. Once the translation is provided, then – and only then – may the human intervene by editing the machine’s translation. In this way, MT mimics the traditional translation team of two humans – a translator and an editor – but with one half being an AI.
Machine Translation: How does it work?
In older, rules-based systems, each word was identified, parsed for meaning and grammar, and then recreated step by step in the target language. This was a laborious, highly involved system to design, as any one language’s grammar and range of potential wordings and nuances is near boundless, meaning that a purely rules-based system is bound to perform poorly outside of simple textbook translations.
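The analyze–transfer–generate pipeline described above can be sketched in miniature. This is a deliberately tiny, hypothetical example – a three-word lexicon and a single reordering rule (English puts adjectives before nouns, Spanish after) – to show why covering a whole language this way was so laborious:

```python
# Hypothetical lexicon mapping English words to (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}

def translate(sentence):
    # 1. Analyze: look each word up and tag its part of speech.
    tagged = [LEXICON[w.lower()] for w in sentence.split()]
    # 2. Transfer: apply the ADJ+NOUN -> NOUN+ADJ reordering rule.
    out, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN"):
            out += [tagged[i + 1][0], tagged[i][0]]
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    # 3. Generate: join the target-language words.
    return " ".join(out)

print(translate("the red car"))  # -> "el coche rojo"
```

Every new word needs a lexicon entry and every grammatical pattern needs a hand-written rule – which is exactly why real rules-based engines struggled beyond textbook sentences.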
The next major form of AI designed for translation was statistical; that is, it took bilingual corpus texts and analyzed them for likely connections. This is similar to predictive text technology, first explored by Chinese typographers as early as the 1950s, which came to a global audience through mobile phone services such as T9 starting in the 1990s. When a language pair has a large amount of bilingual text with which to train the engine, statistical MT can become very accurate; however, it struggles with rarer language pairs and has severe limitations when the subject matter, speech register, or dialect differs from the trained material.
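At its core, the statistical approach replaces hand-written rules with probabilities learned from the corpus. A minimal sketch, assuming a toy phrase table (the phrases and all probability figures below are invented for illustration, not learned from real data):

```python
# A toy phrase table, as a statistical engine might estimate it from
# co-occurrence counts in a bilingual corpus. Numbers are invented.
PHRASE_TABLE = {
    "bank": [("banco", 0.7), ("orilla", 0.3)],  # financial bank vs. riverbank
    "interest rate": [("tasa de interés", 0.9), ("tipo de interés", 0.1)],
}

def most_likely(phrase):
    """Pick the candidate translation with the highest estimated probability."""
    return max(PHRASE_TABLE[phrase], key=lambda pair: pair[1])[0]

print(most_likely("bank"))           # -> "banco"
print(most_likely("interest rate"))  # -> "tasa de interés"
```

This also shows where the approach breaks down: if the training corpus came from financial documents, the engine will confidently render “bank” as “banco” even in a text about rivers – the domain mismatch problem described above.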
Most engines have used a hybrid system that leverages both rules-based and statistical algorithms to translate material. This allows the relative strengths and weaknesses of each approach to balance out. Some popular systems, such as SYSTRAN, still use this approach today. Google Translate, arguably the most widely recognized and used online MT resource, stood out for pursuing an essentially pure statistical approach until it introduced neural machine translation (NMT) in 2016. Microsoft Translator – the engine underlying Skype, Bing, and others – released its hybrid NMT model that same year. NMT was a natural outgrowth of the technology sector’s growing enthusiasm for deep learning and other advanced forms of artificial intelligence.
Why do we still need human translators?
MT technology has grown by leaps and bounds since its early days, and further innovations in AI and machine learning are sure to improve the quality of MT output over the coming years. However, language is a dizzyingly complex phenomenon – hence the need for smart, self-learning AI over exhaustive rule books in the design of MT systems. Currently, NMT systems work through artificial neural networks that improve via three main vectors: (1) deep learning, (2) feature learning, and (3) human correction. That last vector is still key, in many ways.
Why is language so difficult? Because language is dynamic – it changes across time and context; flexible – it can bend to fit speakers’ needs and desires; and complex – morphology, word order, context, social register, and orthography all come into play when determining the meaning of any phrase. Even human translators can make mistakes when they lack the full context or comprehension of a given word or phrase, but they usually know when to be wary of a misstep. AI, on the other hand, has no self-awareness, and so has no idea when it is treading on thin ice and may lack the information needed to make a confident translation.
So, when using MT to accomplish anything other than trying to understand someone’s Facebook comment, a level of accuracy is required that cannot be guaranteed by the technology alone. Whether documents are filed for a clinical trial, sent for international finance, or submitted for immigration, the authorities involved will expect the translation to be complete and accurate. Machine translation has no way of double-checking itself for context, completeness, and accuracy to a standard that satisfies this need for quality. For this reason, a human translator is still required to post-edit the MT output – and sometimes two linguists work on it together, treating it more as raw material to be worked than as a legitimate starting translation.
To be clear, MT is playing an increasingly large role in the translation and localization process, but it is far more likely to replace CAT tools in the coming years than it is to replace the human element of translation. Language is just too human for computers to perfectly understand AND produce, for now. Not that this will always be the case; indeed, AI will likely be able to learn formal, standardized language adequately very soon. But it remains to be seen whether NMT can ever produce flawless natural language on par with humans themselves.