Will the output PDF look like the original?

Yes — same layout, fonts substituted where needed, paragraphs reflowed to handle text expansion. Complex layouts (multi-column with embedded diagrams) may need light touch-up.

Can I get a DOCX instead?

Yes. Pick "Output as DOCX" at upload. You get an editable Word file with reconstructed structure — paragraphs, headings, tables, lists.

What about tables in the PDF?

Detected and preserved. Cell structure, column widths, and header rows survive in both PDF and DOCX output.

My PDF has diagrams with English text inside the images. Will those be translated?

Yes. Image regions with embedded text trigger OCR; recognized text is translated; results are overlaid. This adds a few minutes to processing.

Does Fily handle PDF forms?

Form fields are translated independently of the body text. Field IDs are preserved, so form fillers still work.

How big can my PDF be?

Up to 500 pages in a single job. Larger documents should be split — happy to help script that.
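A minimal sketch of the planning half of that split, assuming the only constraint is the 500-page limit above. The function name chunk_page_ranges is hypothetical and not part of any Fily tooling; copying the pages into separate files would be done with whatever PDF library you already use.

```python
def chunk_page_ranges(total_pages: int, max_pages: int = 500) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges, each at most max_pages long."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


# A 1,234-page manual becomes three jobs.
ranges = chunk_page_ranges(1234)
print(ranges)  # [(1, 500), (501, 1000), (1001, 1234)]
```

Splitting on chapter boundaries rather than fixed page counts is usually kinder to the reader; the fixed-size version is only the simplest case.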

PDF · AI translation

Translate PDFs. Layout and fonts survive.

Drop your text-selectable PDF and get it back translated with paragraphs, columns, tables, fonts, and images in the same positions. Output as PDF or editable DOCX — your choice.



What is Native PDF?

A native PDF is one generated from Word, InDesign, LaTeX, Google Docs, or similar: the text is real and the structure is parseable. The quick test is clicking and dragging. If you can select text, it's native. About 75% of business PDFs are native. If you can't select text, it's a scanned PDF, and the pipeline is different: see /translate/pdf-scanned for OCR-based translation.

Why Native PDF is tricky for AI translation

Reading order: PDF stores glyphs by visual position, not logical order. Multi-column layouts, sidebars, and footnotes can scramble into nonsense if reading order is inferred naively.

Text expansion: Spanish runs 25–30% longer than English. A PDF with tight columns will overflow. Naive translation produces clipped text.

Font substitution: source font may not contain target-language glyphs (English Helvetica → Arabic, Chinese). Substitution must preserve weight, size, and color.

Embedded tables: PDF tables are visual constructs (lines + cells of text), not semantic tables. Extracting them means inferring rows and columns from the positions of lines and text, then re-rendering the grid around the translated cells.

Footnotes and references: numbered references in main body link to footnote positions on the same page. Translating without preserving links breaks the reference chain.

Images with embedded text: a native PDF can still contain diagrams whose labels and captions exist only as pixels inside the image. Those need OCR + translation.

Forms: PDF form fields (for example text fields, /Tx, and buttons, /Btn) sit outside the body text, so they have to be found and translated separately.
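To make the reading-order problem concrete, here is a small sketch that is not Fily's implementation. It assumes a simple two-column page, block coordinates in points, and y measured from the top of the page. Block, naive_order, and column_order are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    text: str
    x0: float  # left edge in points
    y0: float  # top edge in points, measured from the top of the page (assumption)


def naive_order(blocks: list[Block]) -> list[str]:
    """Sort purely top-to-bottom, then left-to-right: interleaves columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y0, b.x0))]


def column_order(blocks: list[Block], page_width: float, columns: int = 2) -> list[str]:
    """Assign each block to an equal-width column, then read column by column."""
    col_width = page_width / columns

    def key(b: Block) -> tuple[int, float, float]:
        col = min(int(b.x0 // col_width), columns - 1)
        return (col, b.y0, b.x0)

    return [b.text for b in sorted(blocks, key=key)]


blocks = [
    Block("L1", 50, 100), Block("R1", 320, 100),
    Block("L2", 50, 200), Block("R2", 320, 200),
]
print(naive_order(blocks))           # ['L1', 'R1', 'L2', 'R2']  (scrambled)
print(column_order(blocks, 612.0))   # ['L1', 'L2', 'R1', 'R2']  (logical)
```

Real pages need more than equal-width columns (sidebars, full-width headings, footnotes), which is why naive inference fails so often.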

How Fily handles Native PDF

Block-level extraction: PDF parsed into structured blocks — paragraphs, table cells, headers, footers, captions, footnotes — with positions and reading order preserved.

Layout reconstruction: output PDF reuses the source layout where possible. DOCX output reconstructs columns, tables, and styles.

Font handling: when the source font lacks glyphs for the target language, a target-language font is substituted that matches the original weight, size, and color.

Text-expansion buffer: layout engine adjusts line breaks and paragraph reflow for expansion (ES, FR, DE) and compression (ZH, JA).

Image-text handling: image regions with embedded text are detected; OCR runs on those regions separately.

Tables: cell-by-cell translation with column structure preserved.

Footnotes: reference numbers stay linked; footnote text translated with original numbering.

Output format: searchable PDF or DOCX — picked at upload.

Pipeline: pdf_qa_12step@2.0.0 · pdf_qa_12step_v2@1.0.0
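The text-expansion point can be checked on the back of an envelope. The sketch below is illustrative only: it counts characters, while a real layout engine measures rendered glyph widths. The 30% figure is the upper end of the Spanish range quoted earlier; expanded_length and overflow_chars are hypothetical helper names.

```python
def expanded_length(source_chars: int, expansion_pct: int) -> int:
    """Estimated translated length in characters, rounded up.

    expansion_pct is a whole-number percentage; negative values model compression.
    """
    return (source_chars * (100 + expansion_pct) + 99) // 100


def overflow_chars(source_chars: int, box_capacity: int, expansion_pct: int) -> int:
    """How many characters would spill out of a box that holds box_capacity characters."""
    return max(0, expanded_length(source_chars, expansion_pct) - box_capacity)


# A 400-character English paragraph in a box with room for 450 characters.
print(expanded_length(400, 30))      # 520
print(overflow_chars(400, 450, 30))  # 70
print(overflow_chars(400, 450, 10))  # 0
```

When the overflow is above zero, the options are the ones named above: adjust line breaks and reflow the paragraph rather than clip the text.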

The Native PDF workflow with Fily

Upload

Drop your .pdf (single or batch ZIP). Optional: glossary, TM, style guide.

Process

Fily runs the Native PDF pipeline + 12 QA steps. Typical job: 10–20 minutes.

Download

Translated PDF or DOCX, as picked at upload, ready to deliver. An HTML QA report is attached.

Common upload: a 20-page native PDF (legal contract, technical spec, marketing one-pager) with mixed paragraphs, tables, and inline images. Fily delivers a translated PDF that opens identically in Acrobat — same layout, fonts substituted where needed, paragraphs reflowed for text expansion.

Beyond the standard pipeline

What we've built around Native PDF

Edge cases clients brought us for this format — and the pipelines we shipped to solve them.

Golden TM · auto-build

A translation memory that builds itself

Clients wanted TM leverage but didn't want to maintain a TM — exporting, cleaning and re-importing TMX after every project is a job nobody has time for.

Every confirmed segment is promoted into the organization's Golden TM automatically, scoped per client and language pair, and reused on every future job. Reviewer edits in the dashboard feed straight back in. Exact matches then cost $0.
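As a rough illustration of the scoping described above, and not the production design, an exact-match store can be keyed by client, language pair, and source text. The class name GoldenTM, its methods, and the whitespace-only normalization are assumptions made for this sketch.

```python
from typing import Optional


class GoldenTM:
    """Toy exact-match translation memory, scoped per client and language pair."""

    def __init__(self) -> None:
        self._segments: dict[tuple[str, str, str, str], str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Assumption: only whitespace differences are ignored for an exact match.
        return " ".join(text.split())

    def promote(self, client: str, src: str, tgt: str, source: str, target: str) -> None:
        """Store a confirmed segment; a later reviewer edit simply overwrites it."""
        self._segments[(client, src, tgt, self._normalize(source))] = target

    def lookup(self, client: str, src: str, tgt: str, source: str) -> Optional[str]:
        """Return the stored translation for an exact match, or None."""
        return self._segments.get((client, src, tgt, self._normalize(source)))


tm = GoldenTM()
tm.promote("acme", "en", "es", "Terms and  Conditions", "Términos y condiciones")
print(tm.lookup("acme", "en", "es", "Terms and Conditions"))   # Términos y condiciones
print(tm.lookup("other", "en", "es", "Terms and Conditions"))  # None
```

Keeping the client in the key is what stops one organization's terminology from leaking into another's jobs.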

semantic audit · +5%

Back-translation you can hand to compliance

For regulated content, a reviewer who doesn't read the target language still has to sign off — and 'trust the AI' is not an audit trail a compliance team accepts.

An independent pass translates every segment back to the source and compares for semantic drift, producing an auditable per-segment report — surfaced inside the review editor so a monolingual reviewer can approve with confidence.
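The shape of such a per-segment report can be sketched as follows. This is a deliberately crude stand-in: it scores word overlap with Python's difflib, whereas a real semantic audit would compare meaning, not wording. The function names, the report fields, and the 0.9 threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(source: str, back_translation: str) -> float:
    """Word-level similarity in [0, 1]; a lexical proxy, not a semantic measure."""
    a = source.casefold().split()
    b = back_translation.casefold().split()
    return SequenceMatcher(None, a, b).ratio()


def drift_report(segments: list[tuple[str, str, str]], threshold: float = 0.9) -> list[dict]:
    """segments holds (segment_id, source, back_translation) triples."""
    report = []
    for seg_id, source, back in segments:
        score = similarity(source, back)
        report.append({"id": seg_id, "score": round(score, 3), "flagged": score < threshold})
    return report


report = drift_report([
    ("s1", "The tenant must pay rent monthly", "The tenant must pay rent monthly"),
    ("s2", "The tenant may terminate the lease", "The landlord may terminate the lease"),
])
for row in report:
    print(row)
```

Here s1 passes and s2 is flagged because one word changed who holds the right to terminate. A harmless paraphrase would also be flagged by this proxy, which is exactly why the real comparison has to be semantic.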