Always keep in mind: a PDF is not a source document, and translating a PDF is not the same as working in the source document. We receive many requests where the document for translation is a PDF. However, translating PDFs is not a one-step preparation process. The first thing we typically ask is: do you have the source file? This tends to throw some requestors off because the PDF is typically the only document they deal with. But PDF documents are, in a very basic sense, nothing more than a “digital printout” of a document that was created somewhere else. Typically we see PDFs created either from a design program such as InDesign or from an Office program like Word or PowerPoint.

PDFs are convenient for collaboration when reviewing documents.

Why PDFs are so great…

PDF documents add a level of convenience that you don’t have with many other files. Anyone with a computer and internet access should be able to read PDFs (hence: Portable Document Format / PDF).

PDFs accurately display a design piece without the reader needing any of the fonts installed, as these are typically embedded in the file. PDFs are also very handy for review work. If you have Acrobat Pro, you can enable PDF documents to allow markup by anyone who has the free Reader version. We use PDF markup a lot in our own review of typeset documents because it enables us to collaborate with our translators and our clients on the review.

Comments and review markup can easily be exchanged, added to, or even imported from different sources of the same document. PDFs are also often used in print production, can be used to enhance accessibility, and can be secured. There are also plenty of enterprise-wide solutions and content management systems that compile information into PDFs along with variable data. PDF documents are simply a part of business, so we need to be mindful of how the use of PDFs affects other processes, like translation.

When PDFs are not so great…

Look through the PDF Properties to find out how your PDF was created.

In translation, you always want to work with the source document because it is the best possible place to start your preparations. PDFs have become such a mainstream medium that we often forget where they came from, and that can be a problem for translation.

In almost all cases, we deal with PDFs that came from either InDesign or Word. How do you find out where your PDF came from? Often, the PDF properties tell you how the PDF was produced. Simply go to File > Properties in Reader and check the Application and PDF Producer fields. They will often tell you whether the file was created in Word, InDesign, or another program. Sometimes it is not so clear and you may need to dig further. Maybe the author listed in the properties gives away the source of the content.
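If you have a folder full of PDFs, the same check can be scripted. The sketch below is a rough illustration only, not how Reader does it: it scans the raw bytes of a file for the `/Creator` entry (shown as Application in the properties dialog) and the `/Producer` entry (shown as PDF Producer). It assumes the document information dictionary is stored uncompressed and as plain literal strings, which is common but not guaranteed; files that keep this data in compressed streams, hex strings, UTF-16 text, or only in XMP metadata will come back empty. The function name `guess_pdf_origin` is hypothetical.

```python
import re

# Matches "/Creator (text)" or "/Producer (text)" in raw PDF bytes.
# The text may contain escaped characters or one level of nested parentheses.
_INFO_ENTRY = re.compile(
    rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()]|\((?:\\.|[^\\()])*\))*)\)"
)


def guess_pdf_origin(pdf_bytes: bytes) -> dict[str, str]:
    """Return the Creator/Producer entries found as uncompressed literal strings."""
    found: dict[str, str] = {}
    for key, raw in _INFO_ENTRY.findall(pdf_bytes):
        # Undo the backslash escapes PDF uses for parentheses and backslashes.
        value = re.sub(rb"\\([()\\])", rb"\1", raw)
        # Keep only the first occurrence of each key.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


# Made-up sample bytes standing in for the metadata portion of a PDF file.
sample = (
    b"<< /Creator (Adobe InDesign CS6 \\(Macintosh\\)) "
    b"/Producer (Adobe PDF Library 10.0.1) >>"
)
origin = guess_pdf_origin(sample)
```

For a real file you would pass in `open(path, "rb").read()`. An empty result does not mean the PDF has no source application, only that this simple scan could not see it; a dedicated PDF library or the Properties dialog is the more reliable route.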

For content creators and designers, keeping a repository of the original source files is critical if you want to make updates in the future, or… if you anticipate needing translations. Now, there are many programs out there, even the Translation Memory software that we use, that claim they can convert PDFs into workable, editable files. For a small one-pager you may get some good results, but in general PDFs are not a good source for translation.

It constantly amazes us to hear that translators are dealing with agencies who let them handle translations directly from PDF format while still expecting a per-word rate. Translators are often left to fix the many formatting and tagging issues that are prevalent when processing PDF files. Isn’t that the role of Project Management? Translators should not accept PDFs for translation at a per-word rate unless they also charge for their time. Even then, it’s not typically the translator’s core competency.

Translations simply do not flow the same way as the original source content. Once you get into exporting or manipulating PDF files, you are dealing with the inherent structure of a PDF, which is simply not set up to handle text flow. Sometimes you get text boxes that overlap. Tables almost never come back as true tables and need to be set up again.

Recommendations for Translating PDFs

When you need to translate a PDF, your best option is simply to find the original source document. It’s going to provide you with the best possible start for any translation project.

Even though InDesign files don’t seem very useful to most people in an organization because you need special software to open them, these days InDesign is a great file format to translate with. We even spend time ahead of translation optimizing InDesign files just so that we can work in that format. Unless your source authoring program is proprietary and your file format is not XLIFF or XML based in any form from which we can logically extract text, you’d be hard pressed to find a source document we cannot work with.

Recreate in Word or PowerPoint

If you can’t find the source for some reason, you typically need to recreate the file. Sometimes it is possible to recreate design files fairly accurately in Word if they are mostly text based. But how you set up the Word document affects translation costs. One program we make use of is ABBYY PDF Transformer. It’s a great tool for scanned documents and mostly text-based designs, as long as the text runs either perfectly horizontally or perfectly vertically. FineReader is a more advanced product by ABBYY that supports many other languages, including Asian languages, and offers more support for other file formats. Both programs require text that is clearly “scannable” through Optical Character Recognition (OCR) methods. You have to be careful to proofread the output, especially if you intend to reuse it for commercial purposes.

If text is set against different backgrounds that make it difficult for OCR programs to recognize characters, we sometimes prefer a simple copy/paste of the text into Word to start with and create a layout from there. PDF text, when copied, tends to break at the end of every line because line breaks are hard coded into the document, so that’s a disadvantage: you need to eliminate these hard returns. Plus, this method does not work with PDFs that are flattened or with scans.
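Removing those hard returns by hand gets tedious for anything longer than a page. As a minimal sketch, assuming the pasted text still separates paragraphs with a blank line (copied PDF text does not always preserve this), the lines within each paragraph can be rejoined like so. The function name `unwrap_hard_returns` is hypothetical, and the sketch does not rejoin words that were hyphenated at a line end.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Join the lines inside each paragraph; blank lines stay as paragraph breaks."""
    # Split on one or more blank lines to find the paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        # Replace each hard return inside a paragraph with a single space.
        " ".join(line.strip() for line in para.splitlines() if line.strip())
        for para in paragraphs
    ]
    return "\n\n".join(joined)


copied = (
    "PDF text tends to break\n"
    "at the end of every line.\n"
    "\n"
    "A new paragraph\n"
    "starts here.\n"
)
cleaned = unwrap_hard_returns(copied)
```

The result still needs a proofread: headings, list items, and table cells that were never separated by blank lines will be merged into the surrounding paragraph.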

TIP FOR SECURED DOCUMENTS: A secured PDF that does not allow copying of text or exporting it out to another format can be “scanned” by ABBYY PDF Transformer into a new PDF. From there, you can select text and copy it into another program.

Import PDFs into Illustrator

Design files that have been turned into PDFs and that are not flattened can also be imported using Illustrator. PDFs and Illustrator both support vector-based art, so Illustrator can pick up design elements as vectors and display fonts correctly if you have those fonts available. Whether importing is worthwhile depends on how much of a head start it gives you over simply recreating the document from scratch. Remember that in translation, the flow and expansion of text are important factors in the setup of any file. When you spend more time correcting issues after importing the artwork than you would recreating it from scratch, it’s obviously not a good use of your time.

Recreate from Scratch

Sometimes you just need to accept that recreating from scratch is a better investment than trying to salvage an existing design in PDF format. We always recommend redesigning the document in English (or your source language) first because it gives you the proof against which you can then apply the translation. And because we apply translation best practices in preparing the file for translation, your source documentation is already optimized when you decide to make updates.

Language and Design Readiness

In your quest for global communication excellence, the use of PDFs as source materials may not seem an important factor, but it does have an effect on the translation workflow. If your organization spends a lot of time finding source documentation for translation or recreating documents, you may want to look at streamlining your content management practices around translation. Language and Design Readiness is one of the four major Process Improvement Categories that we look at when addressing Localization Maturity. It’s often the first, and sometimes the only, area in which organizations are willing to invest in process improvements because the improvements are typically easy to manage and the pain points are clearly visible after going through a few projects.

Need help assessing Language & Design Readiness in your organization?

