In this post we’re going to interpret the adage “an ounce of prevention is worth a pound of cure” as “starting at the source can help eliminate pain points in translation.” Preparing content for translation can prevent delays, additional costs and version dilution, not to mention a pound of frustration!
Without some understanding of how translation software works, it can be hard to know how best to prepare your content for translation. Lucky for you, that’s why this blog is here! Even better, the major Dos and Don’ts mostly boil down to one rule:
Translating from the original source file is always best, and a PDF is *not* a source document.
A PDF is always generated from another file type, such as a Microsoft Word document or an Adobe InDesign file. That means that wherever there is a PDF, there is another file with the same content somewhere in the cybersphere. Tracking it down and providing it to your language service provider (or LSP, as we say in the biz) is the best first step toward the smoothest possible translation experience.
Where Pain Points in Translation Begin: PDFs and Translation Software
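Because a PDF mostly records where each piece of text sits on the page, rather than how the text flows from line to line and paragraph to paragraph, processing PDFs in translation software almost always introduces elements that make the translation job difficult at best and impossible at worst.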
Here’s why: translation software, such as SDL Trados Studio (which is what we use here at Language Solutions, Inc.), breaks jobs into segments, typically sentences. It does this for several reasons: so that translators can work in a more organized environment, so that quality checks can run on smaller units and reduce the risk of missed errors, and so that the software can store each finished translation in its translation memory. For those unfamiliar, a translation memory (or TM) saves the source text and its translation so that the translator can reuse the same translation for any repeated or similar segments (always with the option to adjust for any changes to the source segment, of course). This improves both efficiency and consistency.
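To make the idea concrete, here is a minimal Python sketch of a translation memory, assuming a simple in-memory store. It is not how Trados Studio works internally: the class and method names are hypothetical, and Python’s `difflib.SequenceMatcher` similarity ratio stands in for the tools’ own fuzzy-match scoring. An exact match returns the stored translation; a close enough match (75% here, an arbitrary threshold) is offered as a starting point the translator can adjust.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical): source segment -> translation."""

    def __init__(self):
        self.entries = {}  # source segment -> stored translation

    def add(self, source, target):
        # Save a finished translation so it can be reused later.
        self.entries[source] = target

    def lookup(self, segment, threshold=0.75):
        """Return (translation, score) for the best match, or (None, score)."""
        # Exact (100%) match: the stored translation can be reused as-is.
        if segment in self.entries:
            return self.entries[segment], 1.0
        # Fuzzy match: score every stored source segment by similarity.
        # (difflib is a stand-in; real tools use their own scoring.)
        best_source, best_score = None, 0.0
        for source in self.entries:
            score = SequenceMatcher(None, segment, source).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is not None and best_score >= threshold:
            # Offered to the translator as a starting point to adjust.
            return self.entries[best_source], best_score
        return None, best_score


tm = TranslationMemory()
tm.add("Take one 2 mg tablet daily.", "Tome un comprimido de 2 mg al día.")
print(tm.lookup("Take one 2 mg tablet daily."))   # exact match, score 1.0
print(tm.lookup("Take two 2 mg tablets daily."))  # fuzzy match, score < 1.0
```

Because the memory is keyed on whole segments, how the text gets segmented in the first place decides how often it comes back as a match.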
Not All Returns Are Created Equal
Back to segmentation: in order to break projects into segments, translation software follows a set of rules (just like any software). One of these is that a hard return ends the segment. A hard return is done by hitting “Return” or “Enter,” as many of us are likely accustomed to doing whenever we want to move text onto a new line. But there is another option: a soft return, which moves text down to a new line but indicates that the text on the different lines should be kept together as a unit. You can insert a soft return by hitting Shift + Enter. Here is what those look like in Microsoft Word:
You can see these marks by clicking the Show/Hide ¶ (paragraph mark) button on Word’s Home tab:
Translation software will recognize the difference and do its best to keep text with soft returns in the same segment.
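The hard-return rule can be sketched in a few lines of Python. This is a simplified illustration, not Trados Studio’s actual segmentation engine: it assumes the text uses the characters Word uses internally for these marks (`\r`, character 13, for a hard return and `\v`, character 11, for a soft return), and it ignores the sentence-punctuation rules real tools also apply. The `segment` function and the sample text are hypothetical.

```python
# Assumption: Word's internal characters, where a hard return (paragraph
# mark, Enter) is "\r" (character 13) and a soft return (manual line
# break, Shift + Enter) is "\v" (character 11).
HARD_RETURN = "\r"
SOFT_RETURN = "\v"


def segment(text):
    """Split text into translation segments at hard returns only.

    Soft returns stay inside their segment, so lines joined with
    Shift + Enter are kept together as one unit. Real tools also split
    at sentence-ending punctuation; that rule is left out here.
    """
    # Split on the explicit separator: str.splitlines() would also treat
    # "\v" as a line boundary and break up the soft-return lines.
    return [part.strip() for part in text.split(HARD_RETURN) if part.strip()]


with_soft = f"Take one tablet{SOFT_RETURN}daily with water.{HARD_RETURN}Store at room temperature."
with_hard = f"Take one tablet{HARD_RETURN}daily with water.{HARD_RETURN}Store at room temperature."
print(len(segment(with_soft)))  # 2
print(len(segment(with_hard)))  # 3
```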
Not All Spaces Are Created Equal Either!
Something else we see quite a lot in files is the use of a regular space when a non-breaking space would be a better choice. A non-breaking space is exactly what it sounds like: a space between two characters – two letters, a letter and a number, a letter and a punctuation mark, you name it – that will not allow the two elements on either side of the space to be separated across multiple lines of text. Using non-breaking spaces appropriately can prevent pain points in translation as well as in version control of your source document, should you need to make edits at a later date.
For example, say you have a measurement such as 2 mg and you do not want the numeral and the unit to be separated. Instead of simply hitting the space bar between 2 and mg, you can press Ctrl + Shift + Space to insert a non-breaking space. That way, even if the measurement falls at the end of a line of text, the numeral and unit will not break apart. Observe:
Your instinct when faced with the first sentence in the image above might be to simply hit Enter before the 2 so that 2 mg appears together on the following line. This can be problematic for a number of reasons. First, text in many languages takes up more or less space on a page than the equivalent English text. This means that the return (whether hard or soft) that you used to keep 2 and mg together in the English document will likely land in the wrong place in the translated version.
Second, when you use a non-breaking space, you can “set it and forget it,” if you will. Why rely on a manual process such as finding split units and “correcting” them when a non-breaking space will keep the text together no matter what other edits are made?
As we will see shortly, using a return instead of a non-breaking space can also cause issues with segmentation. A hard return unnecessarily splits the sentence into separate segments, while a non-breaking space maintains continuity. Plus, it signals to a translator that they should also use a non-breaking space in the translation, if applicable.
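To see why the non-breaking space is the “set it and forget it” option, here is a minimal Python sketch of line wrapping, assuming a word processor that breaks lines only at ordinary spaces (a simplification of real layout engines). The `wrap` function and the sample sentence are hypothetical; changing the line width stands in for a translation that runs longer or shorter than the English.

```python
NBSP = "\u00a0"  # non-breaking space, the character Word inserts for Ctrl + Shift + Space


def wrap(text, width):
    """Greedy line wrapping that may break only at ordinary spaces (U+0020).

    Because the text is split on " " alone, a non-breaking space keeps the
    characters on either side of it in the same word, and so on the same line.
    """
    lines, current = [], ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= width:
            current = candidate  # the word still fits (or the line is empty)
        else:
            lines.append(current)  # line is full: start a new one with this word
            current = word
    if current:
        lines.append(current)
    return lines


regular = "The recommended starting dose is 2 mg per day."
protected = regular.replace("2 mg", "2" + NBSP + "mg")

print(wrap(regular, 35))    # "2" ends line 1 and "mg" starts line 2
print(wrap(protected, 35))  # "2 mg" moves to line 2 as a unit
```

At width 35 the ordinary space lets 2 and mg land on different lines, while the non-breaking version keeps them together. The same holds at any width, with no manual returns to move later.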
And just like hard and soft returns look different in Microsoft Word, so do regular and non-breaking spaces when you turn on your paragraph marks:
Segmentation: Why It Matters
Back to why this becomes an issue when processing (or attempting to process) PDFs in translation software: If a PDF was created from a source file that used hard returns to bring text onto a new line, translation software will see the hard returns and process each line as a separate segment. For example:
While the text on these three lines may need to be broken up this way visually, those pesky hard returns mean that translation software creates three separate segments: “In Business”, “for Over” and “20 Years”. And while this example contains only six words that clearly belong together in English, grammar rules in other languages can make it challenging to translate each of these fragments in a way that makes sense on its own. Furthermore, what about our handy translation memory? We would much rather the program remember how to translate “In Business for Over 20 Years” as one unit than as the three separate fragments above, which are far less useful for leverage in future projects.
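Here is a rough Python illustration of the leverage problem, assuming a translation memory that already holds the full heading as one segment. Python’s `difflib.SequenceMatcher` ratio stands in for a real tool’s fuzzy-match scoring, and the 75% threshold and `best_score` helper are hypothetical. Each fragment produced by the hard returns scores well below the threshold, while the intact sentence is a 100% match.

```python
from difflib import SequenceMatcher

# Hypothetical TM that already holds the full heading as one segment.
# (The stored translation is a placeholder.)
tm = {"In Business for Over 20 Years": "<stored translation>"}

# What segmentation produces when the PDF has a hard return after each line.
fragments = ["In Business", "for Over", "20 Years"]

FUZZY_THRESHOLD = 0.75  # hypothetical minimum match score


def best_score(segment, memory):
    """Highest similarity (0.0 to 1.0) between a segment and any TM source."""
    return max(SequenceMatcher(None, segment, src).ratio() for src in memory)


for fragment in fragments:
    score = best_score(fragment, tm)
    status = "match" if score >= FUZZY_THRESHOLD else "no usable match"
    print(f"{fragment!r}: {score:.2f} -> {status}")

whole = " ".join(fragments)
print(f"{whole!r}: {best_score(whole, tm):.2f}")  # the intact sentence scores 1.00
```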
Translation software can also interpret the spacing between elements in a PDF as column and section breaks, which don’t translate well (if you’ll allow us the pun) into sensible segments.
What’s a Translator to Do?
So, where does this leave an LSP that is given a PDF for translation? With two options: recreate the PDF’s content in a different file type, or process the PDF in translation software and then rework the segments. Translation software typically does allow segments to be merged, but merging is time consuming, and the translator has to keep checking each merge against the PDF file. This is not only tedious but, as with any manual process, leaves room for error. Letting translation software segment the text, as it was made to do, is a huge time-saver. All of this means that translating from true source files instead of PDFs allows for quicker turnarounds at a lower cost.
Troubleshooting: When Source Files Are Elusive
If you do not have access to the source file your PDF was created from, don’t despair! You still have ways to optimize your content for translation. Adobe Acrobat Pro DC has a built-in “Export PDF” function that sends the contents of your PDF to a Microsoft Word document or another more workable format for translation. This function has come a long way: where we once always relied on the ABBYY PDF Transformer (more on that shortly), Acrobat’s export function should now be able to get a PDF that was created from a Word file back into a Word document without too much trouble.
The ABBYY PDF Transformer mentioned above has some functionality that Adobe’s tool lacks. Besides a one-click conversion option, ABBYY lets you choose how particular items are recognized. For example, if text on top of an image in your PDF isn’t recognized, or if there is text next to a logo and you want the text but not the logo in your converted file, you can use a selection tool in ABBYY to highlight that text for extraction. There is also a table tool to make sure that text in a table is converted into a table in your export format.
PS: Both Adobe Acrobat and ABBYY can convert PDFs into Excel worksheets as well as Word documents (and a few other file formats, too). If your PDF contains tables and text and you only want ABBYY to extract the data in the tables into Excel, there is an option for that: a checkbox labeled “Ignore text outside tables” (pretty intuitive, right?).
If you’re not sure which file type will work best for your translation project, don’t hesitate to contact your LSP – they’d be happy to help!
Unfortunately, even a converted file may still cause some pain points in translation. This is why having a copy of the original source file is always best, and why, if someone has to convert the PDF to another file type, it’s better for the owner of the content to handle that rather than the LSP.
For starters, fonts can cause issues. If the PDF contains a font that you (or the translator) do not have installed, Microsoft Word, for example, will substitute a font it deems similar. Sometimes this substitute is indistinguishable from the original; other times it is, shall we say, less successful.
If the PDF was created from an InDesign file and you export it to a Word document, a nightmarish web of column breaks, section breaks and hard returns will likely end up in the Word file. This can leave the Word document almost as unusable for translation as the PDF (for the reasons outlined earlier), or mean that cleaning it up takes as much time, with as much potential for error, as retyping the entire PDF.
Potential Pitfalls, Part II: InDesign
Since InDesign files are true source files, they ordinarily work beautifully for translation. The major exception is if the text they contain has been outlined (or created as outlines, which is the terminology used in InDesign). We talk about the pros and cons of outlining fonts in another post (and the design and print community online also has loads to say on the subject), but suffice it to say here that if you want your LSP to be able to translate your InDesign file, its text cannot have been converted to outlines. It must still be editable text. Checking whether your text is outlined is easy enough: try clicking into the text with the Type tool to make changes. If you can edit it, good news: your text is still editable and your file should be good to go for translation. If you cannot, the text has been converted to outlines, and you’ll need a plan B.
There is also a lot of buzz online about “rescuing” text from its outlined form. You may be able to turn outlines back into editable text, though likely not without a whole lot of additional time and frustration.
Morals of the Story
One piece of advice repeated over and over in online forums about outlining fonts is the same one we’re preaching here: always keep a copy of your source file accessible. Perhaps you have been tasked with getting translations of new documentation or of updates to existing documentation (we also address translations of updated documents here), but you do not manage source files. If so, you may need to have an internal conversation to make sure you have access to the source documentation.
This relates back to our Global Communication Maturity Model (GCMM). Do you need translation on a regular basis while access to source files remains restricted or nonexistent? Better internal process management can make a huge difference to external processes, e.g., by eliminating pain points in translation, which in turn saves time and money.
TL;DR? Key takeaways
- Source documents should always be your go-to when sending copy for translation.
- A PDF is not a source document.
- Avoid unnecessary hard returns and get in the habit of using soft returns and non-breaking spaces appropriately.
- PDFs can be converted to more translation-friendly formats using Acrobat’s export feature or the ABBYY PDF Transformer.
- Beware of any unwanted side effects of PDF conversion, like font differences or column and section breaks.
- InDesign files must contain editable text, not outlines, to be used for translation.
- If you are responsible for contracting translation but do not have access to source documents, start a conversation about how you can save your company time and money by being able to send source files to your LSP.
If you are still scratching your head about all of this, don’t hesitate to ask a question in the comments or contact us! We’d love to discuss how we can work together to help eliminate your pain points in translation and optimize your global communication.