Academic PDF conversion

Convert Academic PDFs to Kindle — Columns, Footnotes, and Headings Preserved

IEEE papers, arXiv preprints, ACM proceedings, and graduate theses come with layout complexity that generic converters cannot handle. leafbind reads each column in order, links footnotes, and maps numbered section headings to a navigable chapter list.

Upload your PDF — free

The problem

Why academic PDFs break every other converter

Academic publishing uses a set of layout conventions that made sense for print but are structurally hostile to ebook conversion. The double-column format used by IEEE, ACM, and most major journals splits a single flow of text across two vertical columns on each page. Most converters read text left-to-right across the full page width — which interleaves both columns line by line into unreadable output.

Section numbering is another obstacle. Academic papers use hierarchical numbered headings — 1, 1.1, 1.2, 2, 2.1, 2.2.1 — that Calibre and most generic EPUB converters cannot reliably distinguish from numbered list items or figure labels. The result is a document with no navigable structure and a blank table of contents.

Footnotes compound the problem. Journal-style footnotes sit at the physical foot of the page, a position that loses all meaning when page boundaries disappear in reflow. Most converters either strip footnotes entirely or dump them in a disconnected block at the end of the document, with no link back to the in-text marker. For papers that rely heavily on footnotes, such as legal scholarship, philosophy, and history of science, this makes the converted document nearly unusable.

Inline citations in the form [1], [14, 15], or (Author, 2022) survive only if the converter treats them as ordinary text and does not strip superscript runs. Figure captions face a similar fate: images and their captions are often separated during extraction, leaving orphaned figures with no explanation.

The fix

What the leafbind pipeline preserves

leafbind is built around four pipeline capabilities that address exactly these failure modes. See side-by-side screenshots from the same source document on the quality comparison page.

Column-aware text extraction

The pipeline uses coordinate-based analysis to identify column boundaries from the PDF's internal geometry. Each column is read sequentially from top to bottom before moving to the next. For double-column IEEE and ACM papers, this means the left column is fully extracted before the right column begins — exactly the reading order the author intended.
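As a rough illustration of the idea, and not leafbind's actual code, column-aware ordering can be sketched in a few lines. The `Block` type, the function names, and the "widest gap in the middle third of the page" rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float   # left edge of the text block
    x1: float   # right edge of the text block
    top: float  # distance from the top of the page


def find_column_split(blocks, page_width):
    """Return the x position of the widest empty vertical gap, or None.

    Assumption for this sketch: a real column gutter lies in the
    middle third of the page.
    """
    intervals = sorted((b.x0, b.x1) for b in blocks)
    if not intervals:
        return None
    best_gap, split = 0.0, None
    reach = intervals[0][1]  # rightmost edge covered so far
    for x0, x1 in intervals[1:]:
        gap = x0 - reach
        if gap > best_gap:
            best_gap, split = gap, (reach + x0) / 2
        reach = max(reach, x1)
    if split is not None and page_width / 3 < split < 2 * page_width / 3:
        return split
    return None


def reading_order(blocks, page_width):
    """Read the left column top to bottom, then the right column."""
    split = find_column_split(blocks, page_width)
    if split is None:
        # Single-column page: plain top-to-bottom order.
        return sorted(blocks, key=lambda b: b.top)
    left = sorted((b for b in blocks if b.x0 < split), key=lambda b: b.top)
    right = sorted((b for b in blocks if b.x0 >= split), key=lambda b: b.top)
    return left + right
```

A full-width block such as a paper title bridges the gutter in this sketch, so a page containing one falls back to single-column order. A production pipeline would need to handle that case separately.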

Numbered section heading detection

Section headings in academic PDFs are visually distinct: larger font, bold weight, and usually a leading section number. The pipeline classifies text runs by rendered font size and weight, identifies heading candidates by visual prominence, and tags them as h2 and h3. Both flat-numbered sections (Section 1, Section 2) and hierarchical schemes (1.1, 2.3.4) are supported. The result is a structured Kindle book with a working chapter navigation menu.
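One way to picture the heading step is the sketch below. It is an illustration under stated assumptions rather than leafbind's implementation: the `heading_tag` function, the regular expression, and the choice to map top-level numbers to h2 and every deeper level to h3 are all hypothetical.

```python
import re

# Matches "1 Intro", "1. Intro", "2.3.4 Ablation", "Section 2 Methods".
HEADING_NUMBER = re.compile(r"^(?:Section\s+)?(\d+(?:\.\d+)*)\.?\s+\S")


def heading_tag(text, font_size, is_bold, body_size):
    """Return 'h2', 'h3', or None for a single text run."""
    # Visual prominence comes first, so that an ordinary numbered
    # list item in body type is not mistaken for a heading.
    if not (font_size > body_size or is_bold):
        return None
    match = HEADING_NUMBER.match(text)
    if match is None:
        return None
    depth = match.group(1).count(".") + 1
    # Assumption: depth 1 maps to h2, anything deeper to h3.
    return "h2" if depth == 1 else "h3"
```

For example, `heading_tag("2.3.4 Ablation", 10, True, 10)` yields `"h3"`, while a plain-weight list item such as `"1. First step"` at body size yields `None`.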

Footnote detection and linking

The pipeline detects superscript footnote markers in body text, matches each to its corresponding footnote body at the page foot, and generates linked pairs in the EPUB or KFX output. On Kindle Paperwhite and Scribe, tapping a footnote number jumps to the note text. A return link brings you back to your reading position.
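The linked pair described above can be illustrated with plain HTML anchors. This is a minimal sketch, not the markup leafbind actually emits: the `link_footnotes` function, the `ref-N` and `fn-N` id scheme, and the input shape (a list of text runs flagged as superscript, plus a mapping from marker to note text) are assumptions for the example.

```python
from html import escape


def link_footnotes(runs, notes):
    """Build body HTML and footnote HTML with links in both directions.

    runs:  list of (text, is_superscript) tuples in reading order
    notes: dict mapping a marker string (e.g. "1") to its note text
    """
    body_parts, note_parts = [], []
    for text, is_superscript in runs:
        key = text.strip()
        if is_superscript and key in notes:
            n = len(note_parts) + 1  # sequential id, safe for any marker
            body_parts.append(
                f'<sup><a id="ref-{n}" href="#fn-{n}">{escape(key)}</a></sup>'
            )
            note_parts.append(
                f'<p id="fn-{n}">{escape(notes[key])} '
                f'<a href="#ref-{n}">Back</a></p>'
            )
        elif is_superscript:
            # Superscript with no matching note (an exponent, say):
            # keep it as superscript text, unlinked.
            body_parts.append(f"<sup>{escape(text)}</sup>")
        else:
            body_parts.append(escape(text))
    return "".join(body_parts), "\n".join(note_parts)
```

Numbering the ids sequentially rather than reusing the marker keeps them valid even when a paper uses symbols such as * or † as footnote markers.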

Inline citations and figure captions preserved

Inline citation markers — whether numeric [1], author-year (Smith, 2019), or symbol-based — are retained as body text and not stripped. Figure captions are associated with their adjacent images during extraction, preserving the figure-caption relationship in the converted output.
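Caption association can be pictured as a nearest-neighbour match on page geometry. The sketch below is illustrative only: the `Box` type, the `pair_captions` function, the caption pattern, and the assumption that a caption sits directly below its figure within `max_gap` points are all hypothetical choices, not documented leafbind behaviour.

```python
import re
from dataclasses import dataclass

CAPTION_START = re.compile(r"^(?:Fig\.|Figure)\s*\d+", re.IGNORECASE)


@dataclass
class Box:
    x0: float
    top: float
    x1: float
    bottom: float
    text: str = ""  # left empty for images


def horizontal_overlap(a, b):
    """Width shared by two boxes; zero or negative means no overlap."""
    return min(a.x1, b.x1) - max(a.x0, b.x0)


def pair_captions(images, text_blocks, max_gap=40.0):
    """Map image index -> caption text for captions just below an image."""
    pairs = {}
    for block in text_blocks:
        if not CAPTION_START.match(block.text):
            continue
        candidates = [
            (block.top - img.bottom, j)
            for j, img in enumerate(images)
            if horizontal_overlap(img, block) > 0
            and 0 <= block.top - img.bottom <= max_gap
        ]
        if candidates:
            _, j = min(candidates)  # closest image above the caption
            pairs[j] = block.text
    return pairs
```

Requiring horizontal overlap keeps a caption in the left column from being attached to a figure in the right column at the same height.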

Compatibility

Supported document types

The pipeline is calibrated for digital-born PDFs produced by academic publishing tools. The following document types work reliably:

✓ IEEE conference and journal papers

✓ arXiv preprints (PDF format)

✓ ACM Digital Library proceedings

✓ University theses and dissertations

✓ Technical textbooks and monographs

✓ Medical and scientific journal articles

✓ Legal scholarship with footnote-heavy text

✓ Government and policy research reports

Scanned PDFs:

Scanned PDFs (image-only pages) are supported on the premium tier via an AI-assisted OCR pass powered by Gemini. Clean black-and-white scans of academic papers produce accurate OCR; colour scans or documents with complex embedded images may produce more artifacts. For digital-born academic PDFs, OCR is not invoked — the standard extraction path handles them directly. Inline mathematical equations may render as plain text substitutes rather than properly formatted notation, and chemical structures and hand-drawn diagrams are not reconstructed.

Source notes

Tuned for IEEE, arXiv, and academic journal articles

IEEE conference and journal papers use a strict two-column layout with numbered section headings (I, II, III) and reference lists numbered in citation order. The pipeline detects the column boundary from coordinate clusters, reads each column top-to-bottom in sequence, and preserves the numbered hierarchy as Kindle chapter entries. Footnote-style references resolve to popup links on a Kindle Scribe.

arXiv preprints ship as text-based PDFs generated from LaTeX, which gives the extraction layer high-quality coordinate data and clean glyph mapping. Equations rendered as inline text typically survive conversion as plain-text substitutes; equations rendered as vector graphics are preserved as inline images. Bibliography sections and cross-references are linked when the source PDF includes the underlying anchors.

Articles from publishers such as Nature, PLOS, and ACM, and from medical journals, follow two-column patterns similar to IEEE's but with publisher-specific heading styles. The font-size histogram approach picks up section headings regardless of font choice, so the pipeline handles a wide range of journal templates without per-publisher tuning. For the academic reader on a Kindle Scribe, the output reads like a published book, with chapter navigation, font-size control, and tappable footnote popups.
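To make the font-size histogram idea concrete, here is a minimal sketch. It assumes the body size is the size that covers the most characters and that anything at least 15% larger is a heading candidate. The function names and the `min_ratio` threshold are hypothetical, not values taken from leafbind.

```python
from collections import Counter


def body_font_size(runs):
    """Return the font size that covers the most characters, or None.

    runs: list of (text, font_size) tuples
    """
    histogram = Counter()
    for text, size in runs:
        histogram[round(size, 1)] += len(text)
    if not histogram:
        return None
    return histogram.most_common(1)[0][0]


def heading_candidates(runs, min_ratio=1.15):
    """Return the text of runs set noticeably larger than body type."""
    body = body_font_size(runs)
    if body is None:
        return []
    return [text for text, size in runs if size >= body * min_ratio]
```

Because the threshold is relative to each document's own body size, the same rule applies whether a template sets body text at 9 pt or 11 pt, which is why no per-publisher table of font names is needed in this sketch.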

How to convert

Three steps to a readable academic library

Upload your academic PDF

Drop your paper, thesis, or textbook on the leafbind upload page. Files up to 20 MB are accepted on the free tier; premium accounts accept up to 100 MB.

Let the pipeline analyse the layout

leafbind automatically detects column boundaries, identifies section numbering patterns, matches footnote markers to footnote bodies, and classifies figure captions. No configuration needed.

Download and send to Kindle

Your converted file is ready within seconds. Download the EPUB or KFX file and send it to your Kindle via USB, email, or the Send to Kindle app.

FAQ

Common questions

Does it work on scanned academic PDFs?

Yes, on the premium tier. When the pipeline detects a scanned page — one where the PDF contains an image rather than selectable text — it routes that page through a Gemini OCR pass. Clean black-and-white scans of academic papers produce accurate OCR; heavily degraded or colour-heavy scans may produce more artifacts. For text-based PDFs — the vast majority of digital-born papers from IEEE, ACM, arXiv, and university repositories — the standard extraction path is used and OCR is not needed.
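The per-page routing described in this answer can be sketched as follows. The `PageInfo` type, the `is_scanned` test (images present, no selectable characters), and the `route` function are assumptions for illustration. How leafbind actually counts text and images on a page is not documented here.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    number: int
    text_chars: int   # count of selectable characters on the page
    image_count: int  # count of embedded images on the page


def is_scanned(page, min_chars=1):
    """A page with images but no selectable text is treated as a scan."""
    return page.image_count > 0 and page.text_chars < min_chars


def route(pages):
    """Split page numbers into the OCR path and the standard path."""
    plan = {"ocr": [], "standard": []}
    for page in pages:
        plan["ocr" if is_scanned(page) else "standard"].append(page.number)
    return plan
```

Deciding page by page means a mostly digital thesis with a few scanned appendix pages sends only those pages through OCR.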

Will chapter and section numbers survive? What about headings like 1.1 or 2.3.4?

Yes. The pipeline detects numbered headings by their visual prominence and position. Whether your paper uses a flat Section 1 / Section 2 pattern or a hierarchical 1.1, 1.2.3 numbering scheme, the headings are tagged as h2 and h3 in the output. On Kindle, they appear in the chapter navigation menu so you can jump directly to any section.

What about inline citations like [1] or (Author, 2022)?

Inline citations are preserved as body text. They are not stripped, reordered, or linked (linking citations to a bibliography is outside the scope of v1). The citation markers appear exactly where they appear in the original PDF, so if your PDF reads "…as shown in prior work [14, 15]…" your Kindle will show the same text.

Is there a file size limit?

The free tier accepts PDFs up to 20 MB, which covers most individual research papers and many textbooks. The premium tier raises the limit to 100 MB, which handles large textbooks and multi-chapter dissertations. If your file exceeds 100 MB, consider splitting it by chapter using a PDF editor before uploading.

Start converting — free

Free tier: 3 conversions per day, up to 20 MB per file. No account required.
