Step-by-step instructions
- 1
pdftohtml (command line, Linux/Mac)
Install: `sudo apt install poppler-utils` (Ubuntu) or `brew install poppler` (Mac). Basic conversion: `pdftohtml input.pdf output.html`. By default this writes a frameset made up of several HTML files and saves the PDF's images as separate files next to them. For a single HTML document that contains every page: `pdftohtml -s input.pdf output.html`. For XML output (for parsing): `pdftohtml -xml input.pdf output.xml`. pdftohtml preserves text, basic layout, and images, but it handles complex multi-column layouts imperfectly.
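When the conversion has to run from a script, the same commands can be driven from Python. The following is a minimal sketch, not a definitive wrapper: the function names are hypothetical, it assumes `pdftohtml` is on the PATH, and the XML reader assumes the `-xml` output consists of `<page number="...">` elements that contain `<text>` elements.

```python
import subprocess
import xml.etree.ElementTree as ET


def build_pdftohtml_cmd(pdf_path, out_path, single_file=False, xml=False):
    """Return the pdftohtml argument list for the options described above."""
    cmd = ["pdftohtml"]
    if single_file:
        cmd.append("-s")    # one HTML document containing all pages
    if xml:
        cmd.append("-xml")  # XML output intended for parsing
    cmd += [str(pdf_path), str(out_path)]
    return cmd


def run_pdftohtml(pdf_path, out_path, **options):
    """Run pdftohtml; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_path, **options), check=True)


def text_by_page(xml_source):
    """Map page number -> list of text fragments found on that page.

    Assumption: the XML has <page number="..."> elements holding <text>
    elements, possibly with inline children such as <b> or <i>.
    """
    root = ET.fromstring(xml_source)
    pages = {}
    for page in root.iter("page"):
        # itertext() also collects text nested inside inline child tags.
        fragments = ["".join(t.itertext()) for t in page.iter("text")]
        pages[int(page.get("number"))] = fragments
    return pages
```

Calling `run_pdftohtml("input.pdf", "output.xml", xml=True)` and then passing the contents of the generated file to `text_by_page` gives the text grouped by page, which is usually easier to post-process than the HTML.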
- 2
pdf2htmlEX (best layout fidelity)
pdf2htmlEX produces HTML that looks nearly identical to the PDF: `pdf2htmlEX input.pdf output.html`. If you would rather not install it locally, run it through Docker from the folder that contains the PDF: `docker run -ti --rm -v "$(pwd)":/pdf bwits/pdf2htmlex pdf2htmlEX input.pdf output.html`. The output uses CSS and HTML to replicate the PDF layout, embedding fonts and images. The resulting HTML is large (fonts, vector data) but displays accurately in browsers. This makes it a good fit for publishing reports and brochures on the web.
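To run the Docker variant from a script, for example over a folder of reports, the command can be assembled in Python. This is a sketch under stated assumptions: the function name is hypothetical, the image name is the one given above, and the `-ti` flags are left out because a script normally has no terminal attached.

```python
import os
import subprocess


def build_pdf2htmlex_docker_cmd(pdf_name, html_name, workdir="."):
    """Return a docker argument list that runs pdf2htmlEX on a file in workdir."""
    host_dir = os.path.abspath(workdir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/pdf",   # mount the folder that holds the PDF
        "bwits/pdf2htmlex",         # image named in the text above
        "pdf2htmlEX", pdf_name, html_name,
    ]


def convert_with_docker(pdf_name, html_name, workdir="."):
    """Run the conversion; raises CalledProcessError if docker exits non-zero."""
    subprocess.run(build_pdf2htmlex_docker_cmd(pdf_name, html_name, workdir), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the folder path contains spaces.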
- 3
Adobe Acrobat (professional quality, Windows/Mac)
Open the PDF in Adobe Acrobat Pro. File → Export To → HTML Web Page. Acrobat analyzes the document structure and creates clean, well-structured HTML. The conversion is especially good for documents with defined headers, lists, and tables, because Acrobat infers semantic structure from the visual layout. Tables often convert better with Acrobat than with open-source tools. A subscription to Acrobat Pro is required.
- 4
Python: pdfminer.six (programmatic/NLP use)
For text extraction (not layout-preserving): `pip install pdfminer.six`. Script: `from io import StringIO; from pdfminer.high_level import extract_text_to_fp; from pdfminer.layout import LAParams; output = StringIO(); extract_text_to_fp(open('input.pdf', 'rb'), output, laparams=LAParams(), output_type='html', codec=None); html = output.getvalue()`. Passing `codec=None` makes pdfminer write text rather than bytes, which `StringIO` requires, and `LAParams()` enables its layout analysis. This extracts text as HTML but loses most layout. Use it when you want the text content for NLP, searching, or CMS import, not for visually faithful conversion.
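Written out as a small module, the extraction step might look like the sketch below. The function and class names are hypothetical, and the plain-text helper is an addition that uses only the standard library, for the NLP and search use case. pdfminer.six is imported inside the function so the helper can be used without it.

```python
from html.parser import HTMLParser
from io import StringIO


def pdf_to_html(pdf_path):
    """Return pdfminer.six's HTML rendering of the PDF as a string."""
    # Imported lazily so the rest of the module works without pdfminer.six.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    buffer = StringIO()
    with open(pdf_path, "rb") as pdf_file:
        # codec=None makes the converter write str rather than bytes,
        # which is what a StringIO buffer needs.
        extract_text_to_fp(pdf_file, buffer, laparams=LAParams(),
                           output_type="html", codec=None)
    return buffer.getvalue()


class _TextCollector(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_plain_text(html):
    """Drop the markup and collapse runs of whitespace to single spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.chunks).split())
```

A typical call would be `html_to_plain_text(pdf_to_html("input.pdf"))`, which yields one whitespace-normalized string ready for tokenizing or indexing.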
Why convert PDF to HTML?
PDFs are well suited to print and fixed layouts, but they are awkward on the web. HTML reflows to fit any screen, works well with assistive technology, is easy for search engines to index, and lets you link directly to any section.
Your files never leave your device
FormatDrop runs the conversion engine entirely inside your browser using WebAssembly. No file upload. No server. Nothing stored. You can verify this by opening DevTools → Network tab and watching: zero upload requests.
Frequently asked questions
Why does PDF to HTML conversion look different from the original?
Can search engines index a converted HTML page?
How do I handle a PDF with scanned images (no selectable text)?