
XML (eXtensible Markup Language) has revolutionized how we handle the translation of online and offline content. XML-based content can now be found in almost anything. In the translation industry, we have been working for some time with the XML Localisation Interchange File Format (XLIFF), the XML standard for the localization industry maintained under the care of OASIS.

XLIFF makes it a lot easier for developers to prepare their software for localization. While the 2.0 standard incorporates many localization best practices and defines conformance rules, the basic structure of each file is still built around a <source> and a <target> element. However, that bilingual nature does present some workflow problems in a Translation Memory workflow.
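To make that structure concrete, here is a minimal sketch in Python of how the source/target pairing in an XLIFF 1.2-style file can be read. The file name is hypothetical and the namespace assumes a typical 1.2 export rather than any particular tool.

```python
# Minimal sketch: reading <source>/<target> pairs from an XLIFF 1.2-style file.
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

tree = ET.parse("example.xlf")  # hypothetical export file
for unit in tree.getroot().iterfind(".//x:trans-unit", XLIFF_NS):
    source = unit.find("x:source", XLIFF_NS)
    target = unit.find("x:target", XLIFF_NS)
    print("SOURCE:", source.text if source is not None else "")
    print("TARGET:", target.text if target is not None else "(empty)")
```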

Quality Control in Bilingual Files 

XLIFF was designed to get linguists back to focusing on words rather than code, and that’s a great thing. However, the aspect that most affects our workflow is the bilingual nature of most XLIFF files. Our Translation Memory software (SDL Studio) can map the source and target separately into a bilingual workflow using our XLIFF file filter. This works well up to a point, but the standard presents complications that Translation Memory software has not fully worked out yet.

The main concern with the bilingual nature of XLIFF files is that it takes away an important quality control step. Whenever we receive updates as XLIFF files (a content creator may update a page or post in the source language and then send it again for translation), the file may already be pre-populated with existing translations. In a well-managed translation workflow, that pre-population is unnecessary: our Translation Memory software is specifically designed to control translation updates in a centralized manner so that there is consistency. The concern is that translations delivered this way can easily be manipulated and “creep” back into the translation workflow at a later time.
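As an illustration of that concern, the sketch below shows one way incoming targets could be neutralized before a file enters the workflow, so only the Translation Memory repopulates them. The element names assume an XLIFF 1.2-style layout and the file names are purely illustrative.

```python
# Sketch: drop pre-populated <target> elements so the TM stays the single
# source of truth. Assumes XLIFF 1.2-style element names.
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # keep output free of ns0: prefixes

tree = ET.parse("update.xlf")        # hypothetical updated export
ns = {"x": XLIFF_NS}
for unit in tree.getroot().iterfind(".//x:trans-unit", ns):
    target = unit.find("x:target", ns)
    if target is not None and (target.text or len(target)):
        unit.remove(target)          # discard the shipped translation
tree.write("update_clean.xlf", encoding="utf-8", xml_declaration=True)
```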

Traditionally, whenever we work with a translated document, we rely only on the latest English document and our Translation Memory. It’s a common misconception in our industry that providing past translations for reference, or as control documents to work from, helps our workflow. No matter how well intended those documents are, our quality control process is to work only from the updated English and let the Translation Memory (TM) handle the updates. The TM is the only control document that lets us see what has changed in the English and what needs to be updated. Plus, when changes are made to specific terminology, we can apply them across the entire Translation Memory and re-apply the result to the English.
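A toy example of why the TM works as the control document: new English segments are checked against stored source/target pairs, so unchanged segments reuse their existing translation and only changed or new segments go out to the linguist. The segments below are invented, and real TM tools add fuzzy matching on top of this exact-match idea.

```python
# Toy TM lookup: exact matches are reused, everything else is flagged.
translation_memory = {
    "Welcome to our site.": "Bienvenue sur notre site.",
    "Contact us for a quote.": "Contactez-nous pour un devis.",
}

updated_english = [
    "Welcome to our site.",               # unchanged: reuse the translation
    "Contact us today for a free quote."  # changed: needs a linguist
]

for segment in updated_english:
    hit = translation_memory.get(segment)
    if hit:
        print(f"REUSE : {segment} -> {hit}")
    else:
        print(f"UPDATE: {segment}")
```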

CDATA support

Another issue with XLIFF files is the use of CDATA sections. CDATA is useful when exporting raw coded text because it preserves the markup as-is rather than escaping it into entities. However, CDATA can be terribly inconvenient in translation when much of your text is filled with HTML or other code. While XML has made it much smoother to get content out of systems and back in, we still have to deal with code that is interwoven with the text we want to translate.
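The snag is easy to demonstrate: a standard XML parser hands the CDATA payload back as one plain string, so the HTML inside it is not seen as markup at all. The snippet below is a simplified, made-up example, not output from any particular export.

```python
# Why CDATA is awkward: the parser returns the CDATA content as plain text,
# so the embedded HTML is invisible to any tag-protection logic downstream.
import xml.etree.ElementTree as ET

sample = """<unit>
  <content><![CDATA[<p>Order your <strong>free</strong> catalog today.</p>]]></content>
</unit>"""

content = ET.fromstring(sample).find("content")
print(content.text)
# -> <p>Order your <strong>free</strong> catalog today.</p>  (just a string, not tags)
```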

Luckily, our industry has moved forward here as well and offers good support for basic HTML entity conversion within text. However, the XLIFF standard does not seem to make any provision for embedded content in CDATA sections. XLIFF is very rigid about certain localization concepts, but handling embedded content is not one of them. As a result, the XLIFF filters in our tools become largely useless: any markup comes out as plain text rather than as protected tags. This has forced us to treat XLIFF like any other XML file and create a filter that extracts the text. Using XML filters, we can also process the embedded HTML content in CDATA sections so it comes out clean.
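A rough sketch of that “extract, then protect” idea: pull the text out of the CDATA payload and replace the embedded HTML tags with numbered placeholders so the linguist sees clean text. The placeholder format here is our own illustrative convention, not part of any standard or tool.

```python
# Sketch: hide embedded HTML behind numbered placeholders before translation.
import re

def protect_tags(cdata_text):
    tags = []
    def repl(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"   # e.g. {1}, {2}, ...
    clean = re.sub(r"<[^>]+>", repl, cdata_text)
    return clean, tags

clean, tags = protect_tags("<p>Order your <strong>free</strong> catalog today.</p>")
print(clean)   # {1}Order your {2}free{3} catalog today.{4}
print(tags)    # ['<p>', '<strong>', '</strong>', '</p>']
```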

Another complaint about XLIFF is that with a lot of embedded content, we seem to lose segmentation. In the Translation Memory world, we work with clear segmentation of (typically) sentence-based structures in order to break the translation down into workable chunks of information. These chunks also become a valuable asset for companies, providing consistency with repetitive content as well as cost savings. Although part of the XLIFF standard defines segmented data, we’ll have to see whether the various export platforms for WordPress will ever support that automatically or whether it remains a manual process.
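For readers unfamiliar with segmentation, here is a very rough sketch of the idea. Real TM tools use proper segmentation rules (SRX-style), but the principle is simply breaking extracted text into sentence-sized chunks that can be matched and reused individually; the sample text is invented.

```python
# Rough sentence-based segmentation: each chunk becomes a reusable TM segment.
import re

text = ("XLIFF gets linguists back to the words. Segments become reusable "
        "assets. Repeated sentences are translated once.")

segments = re.split(r"(?<=[.!?])\s+", text.strip())
for i, seg in enumerate(segments, 1):
    print(f"{i}: {seg}")
```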

Bilingual files are just bad for translation management

XLIFF has certainly gotten us further in managing the localization process. However, our quality processes and software are simply not used to dealing with bilingual files. And even when we do have provisions to process XLIFF in the intended bilingual way, there don’t seem to be any options to work outside the standard format. The ability to manage languages and do version control is a dream for both developers and language service providers, but we would like more flexibility to segment information and to handle embedded code.

As part of our quality process, we keep the bilingual side of translation strictly separate from the deliverable. The reason is that we want to keep control over the translation process through one control document that we manage as well as possible: the Translation Memory. Every established tool in our industry uses Translation Memory and Terminology Management to control consistency and manage translation updates. When clients want to send us translations from outside sources, we explain that the issue is that they fall outside of our quality process. The mere existence of bilingual files seems to undermine that quality process. Standards don’t always improve a process.