Translate PDFs. Layout and fonts survive.
Drop your text-selectable PDF and get it back translated with paragraphs, columns, tables, fonts, and images in the same positions. Output as PDF or editable DOCX — your choice.
What is Native PDF?
You can tell a PDF is native by clicking and dragging — if you can select text, it's native. Native PDFs are generated from Word, InDesign, LaTeX, Google Docs, or similar — the text is real, the structure is parseable. About 75% of business PDFs are native. If you can't select text, it's a scanned PDF, and the pipeline is different — see /translate/pdf-scanned for OCR-based translation.
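The click-and-drag test can be approximated in code: if most pages yield extractable text, the file is probably native. This is a minimal sketch, not Fily's detection logic. It assumes the per-page text has already been extracted with whatever PDF library you use. The function name `classify_pdf` and both thresholds are hypothetical.

```python
def classify_pdf(page_texts, min_chars_per_page=20, native_ratio=0.8):
    """Classify a PDF as 'native', 'scanned', or 'mixed'.

    page_texts: one extracted-text string per page (assumed input).
    min_chars_per_page and native_ratio are illustrative thresholds.
    """
    if not page_texts:
        raise ValueError("page_texts must contain at least one page")

    # Count pages whose text layer holds a meaningful number of
    # non-whitespace characters.
    pages_with_text = sum(
        1
        for text in page_texts
        if len("".join(text.split())) >= min_chars_per_page
    )
    ratio = pages_with_text / len(page_texts)

    if ratio >= native_ratio:
        return "native"
    if ratio == 0:
        return "scanned"
    return "mixed"
```

A "mixed" result suggests a file where only some pages carry a text layer, such as a native contract with scanned signature pages appended.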
Why Native PDF is tricky for AI translation
- Reading order: PDF stores glyphs by visual position, not logical order. Multi-column layouts, sidebars, and footnotes can scramble into nonsense if reading order is inferred naively.
- Text expansion: Spanish runs 25–30% longer than English. A PDF with tight columns will overflow. Naive translation produces clipped text.
- Font substitution: source font may not contain target-language glyphs (English Helvetica → Arabic, Chinese). Substitution must preserve weight, size, and color.
- Embedded tables: PDF tables are usually visual constructs (lines + cells of text), not semantic tables. Rows, columns, and cell boundaries have to be inferred from line positions and text alignment before the table can be re-rendered.
- Footnotes and references: numbered references in main body link to footnote positions on the same page. Translating without preserving links breaks the reference chain.
- Images with embedded text: a native PDF can still contain image-only diagrams with captions. Those need OCR + translation.
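- Forms: PDF form fields (/Tx text fields, /Btn buttons and checkboxes) are stored separately from the page's body text, so their labels and default values have to be found and translated on their own.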
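To make the reading-order problem from the list above concrete, here is a small sketch that compares a naive top-to-bottom sort with a column-aware sort. It is an illustration under simplifying assumptions, not a description of how Fily works. The `Block` type and `reading_order` function are hypothetical, coordinates are assumed to grow downward, and full-width headings, sidebars, and footnotes are not handled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward in this sketch)
    x1: float  # right edge


def reading_order(blocks):
    """Group blocks into columns, then read each column top to bottom."""
    columns = []
    # Walk blocks left to right; a block that starts beyond the current
    # column's right edge opens a new column.
    for block in sorted(blocks, key=lambda b: b.x0):
        if columns and block.x0 <= columns[-1]["right"]:
            columns[-1]["blocks"].append(block)
            columns[-1]["right"] = max(columns[-1]["right"], block.x1)
        else:
            columns.append({"right": block.x1, "blocks": [block]})

    ordered = []
    for column in columns:
        ordered.extend(sorted(column["blocks"], key=lambda b: b.y0))
    return ordered


# Two-column page: A1, A2 in the left column, B1, B2 in the right one.
blocks = [
    Block("A2", 50, 300, 280),
    Block("B1", 310, 100, 540),
    Block("A1", 50, 100, 280),
    Block("B2", 310, 300, 540),
]

# Naive sort by vertical position interleaves the two columns.
naive = [b.text for b in sorted(blocks, key=lambda b: (b.y0, b.x0))]
column_aware = [b.text for b in reading_order(blocks)]
```

Here `naive` comes out as A1, B1, A2, B2, which splices sentences from two columns together, while `column_aware` yields A1, A2, B1, B2.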
How Fily handles Native PDF
- Block-level extraction: PDF parsed into structured blocks — paragraphs, table cells, headers, footers, captions, footnotes — with positions and reading order preserved.
- Layout reconstruction: output PDF reuses the source layout where possible. DOCX output reconstructs columns, tables, and styles.
- Font handling: when the source font lacks coverage for the target language, a target-language font is substituted and matched to the original weight, size, and color.
- Text-expansion buffer: layout engine adjusts line breaks and paragraph reflow for expansion (ES, FR, DE) and compression (ZH, JA).
- Image-text handling: image regions with embedded text are detected; OCR runs on those regions separately.
- Tables: cell-by-cell translation with column structure preserved.
- Footnotes: reference numbers stay linked; footnote text translated with original numbering.
- Output format: searchable PDF or DOCX — picked at upload.
Pipeline: pdf_qa_12step@2.0.0 · pdf_qa_12step_v2@1.0.0
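The text-expansion adjustment in the list above can be reasoned about with simple arithmetic. The sketch below is a rough illustration, not Fily's layout engine. It assumes character count is a fair proxy for rendered width, which is only approximately true, and it treats the text as a single line, so the result is conservative. The function name `fit_translation` and the `min_scale` floor are hypothetical.

```python
def fit_translation(source_text, translated_text, min_scale=0.85):
    """Estimate expansion and the font scale needed to fit the source box.

    min_scale is an illustrative floor: below it, shrinking the font
    hurts readability and the paragraph should be reflowed instead.
    """
    if not source_text:
        raise ValueError("source_text must not be empty")

    # Ratio > 1 means the translation is longer than the source.
    expansion = len(translated_text) / len(source_text)

    # Shrink the font only when the translation is longer.
    needed_scale = 1.0 if expansion <= 1.0 else 1.0 / expansion

    return {
        "expansion": expansion,
        "font_scale": max(needed_scale, min_scale),
        "needs_reflow": needed_scale < min_scale,
    }
```

With a 100-character source and a 128-character translation, the expansion is 1.28 and the needed scale is about 0.78, which falls below the 0.85 floor, so the sketch flags the paragraph for reflow rather than shrinking the font further.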
The Native PDF workflow with Fily
Upload
Drop your .pdf (single file or batch ZIP). Optional: glossary, translation memory (TM), style guide.
Process
Fily runs the Native PDF pipeline + 12 QA steps. Typical job: 10–20 minutes.
Download
Translated PDF or DOCX, as picked at upload, ready to deliver. QA report (HTML) attached.
Common upload: a 20-page native PDF (legal contract, technical spec, marketing one-pager) with mixed paragraphs, tables, and inline images. Fily delivers a translated PDF that opens identically in Acrobat — same layout, fonts substituted where needed, paragraphs reflowed for text expansion.
Ready to translate a Native PDF file?
No card. No setup. Upload one file and see the output.