What image formats do you accept?

jpg, jpeg, png, tiff, tif, webp, bmp, gif.

What if my photo is rotated or skewed?

Fily auto-rotates and deskews images before OCR. Results are best with photos taken straight-on in good lighting.
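This page doesn't document how Fily estimates skew. One common technique is projection-profile deskewing: try a range of candidate angles, undo each rotation on the text pixels, and keep the angle whose horizontal projection is sharpest, because text lines collapse into a few dense rows only when the page is level. The stdlib sketch below assumes the dark (text) pixels have already been extracted as (x, y) points in a y-up frame. `estimate_skew` and its parameters are hypothetical.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate text skew in degrees by projection-profile search.

    points: (x, y) coordinates of dark pixels, y-up frame (hypothetical input).
    Returns the candidate angle that best levels the text lines; rotating
    the page by -angle would deskew it.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Undo a rotation by `angle`, then bin each point's new y into a 1-px row.
        rows = Counter(round(y * cos_t - x * sin_t) for x, y in points)
        # Sum of squared row counts: high when pixels pile into few rows.
        score = sum(c * c for c in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


if __name__ == "__main__":
    tilt = math.radians(3.0)
    # Four synthetic "text lines", 200 px long, rotated by 3 degrees.
    pts = [
        (t * math.cos(tilt) - c * math.sin(tilt), t * math.sin(tilt) + c * math.cos(tilt))
        for c in (0, 20, 40, 60)
        for t in range(200)
    ]
    print(estimate_skew(pts))  # 3.0
```

A brute-force sweep over every pixel is slow at full phone-camera resolution, so a practical version would downsample first or refine the angle coarse-to-fine.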

Can it read handwriting?

Yes. Modern vision-language OCR handles both cursive and printed handwriting reasonably well, though accuracy is lower than for typed text, and handwritten passages are flagged in the QA report.
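The QA report's format isn't published on this page. Here is a minimal sketch of how handwriting and low-confidence flags might be collected, assuming each OCR block carries a `kind` label and a confidence score between 0 and 1. The `OcrBlock` type, its field names, and the 0.80 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrBlock:
    text: str
    kind: str          # "typed" or "handwritten" (hypothetical labels)
    confidence: float  # 0.0 to 1.0, as reported by the OCR backend


def qa_flags(blocks, min_confidence=0.80):
    """Return (block_index, reason) pairs a reviewer should double-check."""
    flags = []
    for i, block in enumerate(blocks):
        if block.kind == "handwritten":
            flags.append((i, "handwriting: lower expected accuracy"))
        if block.confidence < min_confidence:
            flags.append((i, f"low OCR confidence ({block.confidence:.2f})"))
    return flags


blocks = [
    OcrBlock("Invoice no. 4411", "typed", 0.97),
    OcrBlock("Paid 12/03", "handwritten", 0.71),
]
print(qa_flags(blocks))
# [(1, 'handwriting: lower expected accuracy'), (1, 'low OCR confidence (0.71)')]
```

A single block can raise both flags, which keeps "this is handwriting" separate from "the engine wasn't sure" in the report.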

Can I upload multiple images at once?

Yes. Upload them in a ZIP or one at a time.
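This page doesn't say how Fily treats non-image files inside a ZIP. If you script batch uploads, a local pre-check can list which archive entries match the accepted formats before you upload. Here is a minimal stdlib sketch. `split_zip_entries` is a hypothetical helper, not part of any Fily API.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions from the accepted-formats answer above.
ACCEPTED = {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".webp", ".bmp", ".gif"}


def split_zip_entries(zip_source):
    """Return (accepted, skipped) file names inside a ZIP archive.

    zip_source: a path or a binary file-like object.
    """
    accepted, skipped = [], []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # folder entries carry no image data
            suffix = PurePosixPath(info.filename).suffix.lower()
            (accepted if suffix in ACCEPTED else skipped).append(info.filename)
    return accepted, skipped


# Usage: build a small archive in memory and check it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("scans/", "")
    zf.writestr("scans/page1.JPG", b"...")
    zf.writestr("scans/page2.tif", b"...")
    zf.writestr("notes.txt", b"...")
buf.seek(0)
print(split_zip_entries(buf))
# (['scans/page1.JPG', 'scans/page2.tif'], ['notes.txt'])
```

Matching is case-insensitive because phone exports and scanners often write upper-case extensions such as `.JPG`.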

Can I get the original image with translated text overlaid?

Not in standard mode; the output is DOCX. Overlaying the translation on the original image is on the roadmap.

Is this the same as scanned PDF?

Same pipeline foundation, different input: PDFs go to /translate/pdf-scanned, and single images come here.

Images · OCR + AI translation

Translate images. Photos of documents, signs, screenshots, anything.

Drop a .jpg, .png, .tiff, or .webp. Fily runs OCR, translates the recognized text, and gives you back a clean DOCX with the original layout reconstructed.



What is Images (OCR + AI)?

Image translation handles single-image inputs: phone photos of documents, screenshots of UI in another language, photos of signs or menus, scanned single pages. The pipeline is OCR-then-translate, with layout reconstruction adapted to the source.

Why Images (OCR + AI) is tricky for AI translation

Resolution and orientation vary: phone photos are often rotated, skewed, or under-lit.

Mixed content: a single photo can contain typed text, handwriting, and embedded images of text.

Background complexity: in street photos of signs, background patterns can confuse OCR.

Single images differ from multi-page PDFs: many image OCR pipelines skip layout analysis and dump a wall of text.

Screenshots have their own layout: UI text, buttons, and menus sit in a nested visual hierarchy that should survive translation.

How Fily handles Images (OCR + AI)

Pre-OCR normalization: deskew, contrast adjustment, orientation correction.

Dual-backend OCR with automatic failover (same engine as scanned PDFs).

Layout-aware reconstruction: text blocks with positions inform the output layout, not a flat dump.

Mixed-content handling: typed text and handwriting flagged separately in the QA report.

Screenshot mode: opt-in for UI screenshots — preserves visual hierarchy (titles, buttons, menus) as a structured DOCX.

Pipeline: image_qa_12step_v2@1.0.0
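The reconstruction logic isn't spelled out on this page. The sketch below shows what "layout-aware" means compared with a flat dump. It takes positioned OCR blocks, groups them into rows by vertical overlap, and orders each row left to right, so a label and its value stay on one line. The `Box` type and the half-height overlap rule are assumptions. Multi-column pages would need a column-detection pass first.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels (image coordinates: y grows downward)
    h: float  # height in pixels


def reading_order(boxes):
    """Group boxes into rows by vertical overlap, then read rows top to
    bottom and boxes within a row left to right."""
    rows = []  # each row is [top, bottom, list_of_boxes]
    # Boxes arrive sorted by y, so a row's top never needs to move up.
    for box in sorted(boxes, key=lambda b: b.y):
        top, bottom = box.y, box.y + box.h
        for row in rows:
            # Join a row if the box overlaps its vertical span by over half its height.
            if min(bottom, row[1]) - max(top, row[0]) > 0.5 * box.h:
                row[1] = max(row[1], bottom)
                row[2].append(box)
                break
        else:
            rows.append([top, bottom, [box]])
    return [[b.text for b in sorted(row[2], key=lambda b: b.x)] for row in rows]


boxes = [
    Box("Total", 10, 100, 12), Box("$40", 200, 102, 12),
    Box("Invoice", 10, 20, 14),
    Box("Date:", 10, 60, 12), Box("2024-05-01", 120, 59, 12),
]
print(reading_order(boxes))
# [['Invoice'], ['Date:', '2024-05-01'], ['Total', '$40']]
```

Note that "2024-05-01" sits one pixel higher than "Date:" yet still lands after it, because order within a row comes from x, not y.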

The Images (OCR + AI) workflow with Fily

Upload

Drop your .jpg / .png / .tiff (single or batch ZIP). Optional: glossary, TM, style guide.

Process

Fily runs the Images (OCR + AI) pipeline + 12 QA steps. Typical job: 10–20 minutes.

Download

A DOCX, ready to deliver, with an HTML QA report attached.

Common uploads: a photo of a printed form in a foreign language that someone needs translated, a screenshot of a foreign-language UI to localize, or a photo of a single-page contract picked up in another country. The output is a DOCX with the recognized text, translated, in an approximately matching layout.

Beyond the standard pipeline

What we've built around Images (OCR + AI)

Edge cases clients brought us for this format — and the pipelines we shipped to solve them.

pdf_qa_12step_v2 + image_qa

Dual-backend OCR for impossible scans

Healthcare LSPs kept hitting a wall with 150-DPI faxed medical records: generic OCR failed on them silently, returning blank pages with no warning.

A fallback chain: a frontier vision-language OCR model as the primary backend, with automatic failover to an alternate backend on truncated output or JSON errors. Page-level retry, plus confidence flagging and handwriting detection in the QA report.
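The backends and their response format aren't specified here, so the sketch below treats each backend as a plain callable that returns a JSON string. It follows the failover shape described above. Each page's output is parsed, truncated or malformed output is rejected, the page falls through to the next backend, and a per-page error log is kept for the QA report. Blank output also counts as a failure, since silent blank pages were the original problem. The `text` and `finish_reason` fields, the `"length"` truncation signal, and all function names are hypothetical.

```python
import json


class OcrFailure(Exception):
    """Raised when a backend's output can't be used for this page."""


def _parse(raw):
    """Parse one backend response, rejecting bad JSON, truncation, and blanks."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OcrFailure(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or "text" not in data:
        raise OcrFailure("missing 'text' field")
    if data.get("finish_reason") == "length":  # hypothetical truncation signal
        raise OcrFailure("output truncated")
    if not str(data["text"]).strip():
        # Surface blank output instead of passing an empty page through silently.
        raise OcrFailure("empty output")
    return data


def ocr_page_with_failover(page, backends):
    """Run one page through (name, callable) backends in order.

    Returns (backend_name, text, errors); `errors` lists every failed attempt
    so a QA report can show which pages needed the fallback.
    """
    errors = []
    for name, backend in backends:
        try:
            data = _parse(backend(page))
        except OcrFailure as exc:
            errors.append((name, str(exc)))
            continue
        return name, data["text"], errors
    raise OcrFailure(f"all backends failed on page {page}: {errors}")


def primary(page):
    return '{"text": "Patient na'  # cut off mid-string: invalid JSON


def fallback(page):
    return '{"text": "Patient name: J. Doe", "finish_reason": "stop"}'


name, text, errors = ocr_page_with_failover(1, [("primary", primary), ("fallback", fallback)])
print(name, errors[0][1].startswith("invalid JSON"))  # fallback True
```

Raising when every backend fails, rather than returning an empty string, is what keeps a bad page from disappearing without a trace.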

semantic audit · +5%

Back-translation you can hand to compliance

For regulated content, a reviewer who doesn't read the target language still has to sign off — and 'trust the AI' is not an audit trail a compliance team accepts.

An independent pass translates every segment back to the source and compares for semantic drift, producing an auditable per-segment report — surfaced inside the review editor so a monolingual reviewer can approve with confidence.
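The comparison method behind the semantic audit isn't specified. As a rough, clearly weaker stand-in, the sketch below scores each source segment against its back-translation with `difflib.SequenceMatcher`, which measures character overlap, not meaning. It then flags segments under a threshold. The 0.85 threshold, the `back_translate` callable, and the report fields are hypothetical.

```python
from difflib import SequenceMatcher


def drift_report(segments, back_translate, threshold=0.85):
    """Build a per-segment audit report.

    segments: list of (source_text, target_text) pairs.
    back_translate: callable mapping target text back to the source language
    (hypothetical; stands in for the independent back-translation pass).
    """
    report = []
    for i, (source, target) in enumerate(segments):
        back = back_translate(target)
        # Lexical proxy for semantic similarity, case-insensitive.
        score = SequenceMatcher(None, source.lower(), back.lower()).ratio()
        report.append({
            "segment": i,
            "source": source,
            "back_translation": back,
            "similarity": round(score, 2),
            "flagged": score < threshold,
        })
    return report


segments = [
    ("Take one tablet daily.", "Prenez un comprimé par jour."),
    ("Do not exceed the dose.", "Dépassez la dose."),
]
# Stand-in back-translations; a real pass would use an independent MT system.
fake_back = {
    "Prenez un comprimé par jour.": "Take one tablet per day.",
    "Dépassez la dose.": "Exceed the dose.",
}
for row in drift_report(segments, fake_back.__getitem__):
    print(row["segment"], row["similarity"], row["flagged"])
# 0 0.87 False
# 1 0.82 True
```

Segment 1 is flagged, but only narrowly. Dropping "not" reverses the instruction yet still leaves a 0.82 lexical score. That is why a character-level ratio can only stand in for a genuine semantic comparison.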