In this post we’re going to interpret the adage “an ounce of prevention is worth a pound of cure” as “starting at the source can help eliminate pain points in translation.” Preparing content for translation can prevent delays, additional costs and version dilution, not to mention a pound of frustration!
Without knowing to some degree how translation software works, it can be difficult to know how to optimally prepare your content to be translated. Lucky for you, that’s why this blog is here! And even more luckily, the major Dos and Don’ts mostly boil down to one rule:
Translating from the original source file is always best, and a PDF is *not* a source document.
Can you create a PDF out of nothing? No. A PDF is always generated from another file type, such as a Microsoft Word document or an Adobe InDesign file. That means that where there is a PDF, there is another file type with the same content somewhere in the cybersphere. Tracking it down and providing it to your language service provider (or LSP as we say in the biz) will be the best first step to ensuring the smoothest possible translation experience.
Where Pain Points in Translation Begin: PDFs and Translation Software
Given the nature of PDFs, processing them in translation software almost always brings in elements that make the translation job difficult at best and impossible at worst.
Here’s why: translation software, such as SDL Trados Studio, which is what we use here at Language Solutions, Inc., breaks jobs up into segments. This is done for myriad reasons: so that translators can work in a more organized environment, so that quality checks can be run on smaller units to reduce the risk of missing errors, and so that the software can save confirmed translations in its translation memory. For those unfamiliar, a translation memory (or TM) stores the source text and its translation so that the translator can reuse the same translation for any repeated or similar segments (always with the option to adjust for any changes to the source segment, of course). This improves both efficiency and consistency.
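To make the idea of a translation memory a little more concrete, here is a minimal Python sketch of one. It is a toy, not how SDL Trados Studio actually works: the `TranslationMemory` class and its 0.75 match threshold are hypothetical, and Python’s standard `difflib.SequenceMatcher` stands in for the proprietary fuzzy matching that commercial tools use. An identical segment comes back as an exact match (score 1.0); a similar one comes back as a fuzzy match for the translator to review and adjust.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

Match = Tuple[float, str, str]  # (similarity score, stored source, stored target)


class TranslationMemory:
    """A toy translation memory, for illustration only."""

    def __init__(self) -> None:
        self.entries = {}  # stored source segment -> confirmed translation

    def add(self, source: str, target: str) -> None:
        """Save a confirmed translation so it can be reused later."""
        self.entries[source] = target

    def lookup(self, source: str, threshold: float = 0.75) -> Optional[Match]:
        """Return the best match scoring at least `threshold`, or None.

        An identical stored segment scores 1.0 (an exact match); anything
        lower is a fuzzy match the translator would review and adjust.
        """
        if source in self.entries:
            return (1.0, source, self.entries[source])
        best = None
        for stored_source, stored_target in self.entries.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, stored_source, stored_target)
        return best


tm = TranslationMemory()
tm.add("In Business for Over 20 Years", "Más de 20 años en el negocio")
print(tm.lookup("In Business for Over 25 Years"))  # a fuzzy match, not exact
```

Because the stored segment differs from the new one by a single character, the lookup offers the earlier translation as a high-scoring fuzzy match, and the translator only needs to update the number.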
Pain Points in Translation, continued: Not All Returns Are Created Equal
Back to segmentation: to break projects into segments, translation software follows a set of rules (just like any software). One of these is that a hard return signals the program to end the current segment and put the text that follows into a new one. A hard return is inserted by pressing “Return” or “Enter,” as most of us are accustomed to doing whenever we want to move text onto a new line. But there is another option: a soft return, which also moves text down to a new line but indicates that the text on the different lines should be kept together as a unit. You can insert a soft return by pressing Shift + Enter. Here is what those look like in Microsoft Word:
You can see these marks by clicking the Show/Hide ¶ (paragraph mark) button on the Home tab:
Translation software will recognize the difference and do its best to keep text with soft returns in the same segment, since text that belongs together as a unit in the source language most likely needs to be translated together as a unit in the target language.
Not All Spaces Are Created Equal Either!
Something else we see quite a lot in files is the use of a regular space where a non-breaking space would be a better choice. A non-breaking space is exactly what it sounds like: a space between two characters – two letters, a letter and a number, a letter and a punctuation mark, you name it! – that will not allow the elements on either side of it to be separated across two lines of text. For example, say you have a measurement such as 2 mg or 6 m² and you do not want the numeral and the unit to be separated from one another. Instead of simply hitting the space bar between the 2 and the mg, you can press Ctrl + Shift + Space to insert a non-breaking space. That way, even if the measurement falls at the end of a line of text, the numeral and unit will not break apart. Observe:
Your instinct when faced with the first sentence in the image above might be to simply hit Enter before the 2 so that 2 mg appears together on the following line. This can be problematic for a number of reasons. First of all, if you later edit any of the text ahead of the measurement, the natural line breaks will shift, but the return, hard or soft, will stay right where you put it before the 2, forcing a line break in the middle of the sentence where none is needed. As we will see shortly, a hard return also causes issues with segmentation, because it splits the sentence in two. A non-breaking space, by contrast, maintains the continuity of the sentence. Plus, it signals to a translator that they should also use a non-breaking space in the translation, if applicable.
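If you have a lot of text to clean up, this fix can also be automated. The Python sketch below assumes plain text and a short, hypothetical list of units (`UNITS`); it swaps the ordinary space between a numeral and a unit for U+00A0, the non-breaking space character that Ctrl + Shift + Space inserts in Word. It is an illustration, not a feature of any particular tool.

```python
import re

NBSP = "\u00a0"  # the non-breaking space character (U+00A0)

# Hypothetical, deliberately short list of units for this sketch.
UNITS = ["mg", "kg", "ml", "mm", "cm", "m²", "g", "m"]

# Try longer units first so "mg" wins over "m". The negative lookahead
# stops "2 months" from being read as the measurement "2 m".
_UNIT_PATTERN = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)
_MEASUREMENT = re.compile(r"(\d) (" + _UNIT_PATTERN + r")(?![A-Za-z])")


def bind_measurements(text: str) -> str:
    """Replace the regular space between a numeral and a unit with a NBSP."""
    return _MEASUREMENT.sub(lambda m: m.group(1) + NBSP + m.group(2), text)


print(repr(bind_measurements("Take 2 mg twice a day for 2 months.")))
```

Only the space in “2 mg” is replaced; “2 months” is left alone because the letter after the “m” shows it is a word, not a unit.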
And just as hard and soft returns look different in Microsoft Word, so do regular and non-breaking spaces when you turn on your paragraph markers:
Segmentation: Why It Matters
Back to why this becomes an issue when processing (or attempting to process) PDFs in translation software: If a PDF was created from a source file that used hard returns to bring text onto a new line, translation software will see the hard returns and process each line as a separate segment. For example:
While the text on these three lines may need to be split up this way visually, those pesky hard returns mean that translation software will create three separate segments: “In Business”, “for Over” and “20 Years”. And while this example only contains six words that clearly go together in English, the grammar rules of other languages (preposition placement, for instance) mean that translating each of these segments in a way that makes sense may not be as intuitive as it seems. Furthermore, what about our handy translation memory? We would much rather have the program remember how to translate “In Business for Over 20 Years” as one unit than as the three separate fragments above, which are far less useful for leveraging content in future projects.
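Here is a simplified Python sketch of the segmentation rule described above, run on the “In Business for Over 20 Years” example. It assumes a hard return is stored as a newline (`\n`) and a soft return as a vertical tab (`\v`), which is how Word represents a manual line break in plain text. Real translation software applies far more elaborate, configurable segmentation rules, so treat the hypothetical `segment` function as an illustration only.

```python
import re
from typing import List

HARD_RETURN = "\n"  # Enter (assumed representation)
SOFT_RETURN = "\v"  # Shift + Enter (assumed representation)


def segment(text: str) -> List[str]:
    """Split text into segments using two simplified rules:

    1. A hard return always ends a segment.
    2. Sentence-ending punctuation followed by a space ends a segment.

    Soft returns match neither rule, so they stay inside their segment.
    """
    segments = []
    for paragraph in text.split(HARD_RETURN):
        for sentence in re.split(r"(?<=[.!?]) +", paragraph):
            if sentence.strip():  # skip empty lines between paragraphs
                segments.append(sentence)
    return segments


hard = "In Business" + HARD_RETURN + "for Over" + HARD_RETURN + "20 Years"
soft = "In Business" + SOFT_RETURN + "for Over" + SOFT_RETURN + "20 Years"
print(segment(hard))  # three segments
print(segment(soft))  # one segment
```

With hard returns, the phrase becomes three fragments; with soft returns, it stays one segment, which is also what you would want saved in the translation memory.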
Translation software can also interpret the spacing between elements in a PDF as column and section breaks, which do not translate well (if you’ll allow us the pun) into sensible segments.
What’s a Translator to Do?
So, where does this leave an LSP that has been given a PDF for translation? With two options: recreate the PDF’s content in a different file format for processing in translation software, or process the PDF in translation software and then rework the segments so that they are not all split at hard returns and column and section breaks. While translation software allows for the latter, it is time-consuming, and the translator has to keep checking the segments they merge against the PDF. This is not only tedious but, as with any manual process, leaves room for error! Plus, letting translation software segment the text, as it was made to do, is a huge time-saver. All of this means that translating from true source files (set up without unnecessary hard returns) rather than PDFs allows for quicker turnarounds at a lower cost.
Troubleshooting: When Source Files Are Elusive
If you do not have access to the source file your PDF was created from, don’t despair! You still have ways to optimize your content for translation. Adobe Acrobat Pro DC has a built-in “Export PDF” function that lets you send the contents of your PDF to a Microsoft Word document or another, more workable format for translation. This function has come a long way: we used to rely on the ABBYY PDF Transformer (more on that shortly), but if your PDF was created from a Word file, Acrobat’s export function should now be able to get it back into a Word document without too much trouble.
The ABBYY PDF Transformer mentioned above has some additional functionality that Acrobat lacks. Besides a one-click convert function, ABBYY also lets you choose how particular items are recognized. For example, if there is text on top of an image in your PDF that doesn’t get recognized, or there is text next to a logo and you want the text but not the logo in your converted file, you can use a selection tool in ABBYY to highlight that text for extraction. You can also highlight a table with the table tool to make sure its contents are converted into a table in your export format.
PS: Both Adobe Acrobat and ABBYY can convert PDFs into Excel worksheets as well as Word documents (and a few other file formats, too). If your PDF contains both tables and text and you only want ABBYY to extract the table data into Excel, there is an option for that: a checkbox labeled “Ignore text outside tables” (pretty intuitive, right?).
If you’re not sure which file type will work best for your translation project, don’t hesitate to contact your LSP – they’d be happy to help!
Unfortunately, converting a PDF is not a problem-free strategy. This is why having a copy of the original source file is always best, and why, if someone does have to convert the PDF to another file type, it’s better for the owner of the content to handle it rather than the LSP.
For starters, fonts can cause issues. If the PDF contains a font that you (or the translator) do not have in Microsoft Word, for example, Word will select a font that it deems to be similar. Sometimes this “substitute” font is indiscernible from the original, while other times it is, shall we say, less successful.
And if the PDF was created from an InDesign file and you export it to a Word document, a nightmarish web of column and section breaks and hard returns will likely end up in the Word file and make it almost as unusable for translation as the PDF (for the reasons outlined earlier).
Potential Pitfalls, Part II: InDesign
Since InDesign files are true source files, they ordinarily work beautifully for translation. The major exception is when the text they contain has been outlined (converted with “Create Outlines,” to use InDesign’s terminology). We talk about the pros and cons of outlining fonts in another post (and the designer and printer online community also has loads to say on the subject), but suffice it to say here that if you want your LSP to be able to translate your InDesign file, the text it contains cannot have been converted to outlines. It must still be editable text. Checking whether your text is outlined is easy enough – try clicking into the text to make a change (adding or deleting a letter, for example; just make sure you undo whatever change you make before sending the file for translation!). If you can edit the text, good news! Your text is still editable and your file should be a go for translation. If you cannot, the text has been converted to outlines, and you’ll need a plan B.
There is also a lot of buzz online about “rescuing” text from its outlined form, and you may be able to turn outlines back into editable text, though it could cost you a whole lot of additional time and frustration. Depending on the amount of text and the difficulties you run into with the strategies you find online, you may simply end up retyping the outlined text as editable text. But who has time for that?
Morals of the Story
One piece of advice repeated over and over in online forums about outlining fonts is the same one we’re preaching here: always keep a copy of your source file accessible. If you are not the one making changes to source files but you have been tasked with getting new documentation or updates to existing documentation translated (we also address translations of updated documents here), you may need to talk with the appropriate parties about making sure you have access to source documentation so that translation is as efficient a process as possible.
This last point relates back to our Global Communication Maturity Model (GCMM). Is translation something you need on a regular basis, but access to source files remains restricted or nonexistent? Internal process management (in this case, access to source files) can make a huge difference to external process management, such as eliminating pain points in translation and thereby saving time and money.
TL;DR? Key takeaways
- Source documents should always be your go-to when sending copy for translation.
- A PDF is not a source document.
- Avoid unnecessary hard returns and get in the habit of using soft returns when breaking up text that should remain as a single unit.
- PDFs can be converted to more translation-friendly formats using Acrobat’s export feature or the ABBYY PDF Transformer.
- Beware of any unwanted side effects of conversion, like font differences or column and section breaks, when exporting PDFs for translation.
- InDesign files must contain editable text, not outlines, to be used for translation.
- If you are responsible for contracting translation but do not have access to source documents, start a conversation about how you can save your company time and money by being able to send source files to your LSP.
If you are still scratching your head about all of this, don’t hesitate to ask a question in the comments or contact us! We’d love to discuss your translation needs and how we can work together to help eliminate your pain points in translation and optimize your global communication.