
Comparing Python PDF Generation Options

At some point in your programming career, you’ll be asked to generate a PDF. Maybe receipts are legally required to be in PDF format, or you need to send something to a printer, or, most commonly, your users simply expect PDF reports.

So you search Google for “how to generate a PDF with Python.” You’ll find three options:

  • PyFPDF, or similar build-a-pdf-line-by-line open-source libraries
  • Python-pdfkit, or similar HTML-to-PDF browser-based open-source libraries
  • Commercial engines, like DocRaptor or PDFreactor

What are the differences? Advantages and disadvantages? How do you choose? Let me explain.

Build a PDF Line-By-Line

Here’s the default code example for PyFPDF:

from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
pdf.cell(40, 10, 'Hello World!')
pdf.output('tuto1.pdf', 'F')

For many use cases, this is ideal. You’ll get the exact PDF you want, every time. But as you can see, it requires building the PDF object by object, line by line. 
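To make the object-by-object cost concrete, here is a rough sketch of a small receipt table built the same way. The `receipt_rows` and `build_receipt` helpers, the column widths, and the row format are illustrative assumptions, not part of PyFPDF; the only library calls used are the ones from the example above plus `ln()`.

```python
def receipt_rows(items):
    """Turn (description, quantity, unit_price) tuples into printable rows."""
    rows = []
    total = 0.0
    for description, quantity, unit_price in items:
        line_total = quantity * unit_price
        total += line_total
        rows.append((description, str(quantity), f"{line_total:.2f}"))
    rows.append(("Total", "", f"{total:.2f}"))
    return rows


def build_receipt(items, path):
    """Draw every cell of the receipt by hand (hypothetical helper)."""
    from fpdf import FPDF  # imported lazily so receipt_rows works without fpdf

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('Arial', 'B', 12)
    widths = (100, 30, 40)  # assumed column widths, in the default unit (mm)
    for row in receipt_rows(items):
        for width, text in zip(widths, row):
            pdf.cell(width, 10, text, 1)  # border=1 draws the cell outline
        pdf.ln()  # move to the start of the next row
    pdf.output(path, 'F')
```

Something like `build_receipt([("Coffee", 2, 4.50)], "receipt.pdf")` would write the file. Note that every layout decision (column widths, row height, borders) lives in your code, which is the trade-off of this approach.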

It’s common to already have a document written in HTML, or perhaps as a developer you’re most familiar with building frontend code in HTML and CSS. In either case, you probably want an HTML-to-PDF solution.

Browser-Based Engines

Alternatively, there are many HTML-to-PDF libraries based on various browsers. Headless Chrome is extremely popular these days (as it should be), but there are many older tools built on PhantomJS and wkhtmltopdf (don’t use these; they rely on ancient WebKit engines).

These libraries are generally good for simple PDF documents, but they tend to break down on complex, multi-page documents or when the design and layout must be pixel-perfect. This is because browsers are built around the concept of a single, continuously scrolling web page. They don’t understand “pages” at all.

pdfkit, a wrapper around wkhtmltopdf, has a really simple interface:

import pdfkit

pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

pychromepdf lets you use the more modern Headless Chrome generator, but it’s more complicated to set up and maintain.
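If you want to see what a Headless Chrome render involves without a wrapper library, one option is to call the browser directly. The sketch below is not pychromepdf’s API; it shells out to Chrome using its `--headless` and `--print-to-pdf` command-line flags. The helper names and the list of binary names are assumptions and will vary by platform.

```python
import shutil
import subprocess


def chrome_pdf_command(chrome_path, source, output_path):
    """Build the argument list for a one-shot headless Chrome PDF render."""
    return [
        chrome_path,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_path}",
        source,  # a URL or a path to a local HTML file
    ]


def render_pdf(source, output_path,
               candidates=("google-chrome", "chromium", "chromium-browser")):
    """Run the first Chrome binary found on PATH; return False if none exists."""
    for name in candidates:
        chrome_path = shutil.which(name)
        if chrome_path:
            command = chrome_pdf_command(chrome_path, source, output_path)
            subprocess.run(command, check=True)
            return True
    return False
```

The maintenance burden mentioned above shows up here too: you have to install Chrome on every server and keep it updated.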

Commercial Engines

Finally, as you can probably guess, the commercial PDF generators offer the most advanced functionality. That functionality comes at a price. A PDFreactor license starts at $2,500 and a PrinceXML license at $3,800. DocRaptor’s online HTML-to-PDF API provides access to Prince’s engine starting at a more affordable $15/mo.


In return, these HTML-to-PDF engines come with features like:

  • Dynamic table of contents
  • CSS-based headers and footers
  • Different page styles and backgrounds for different sections of the document
  • Flexbox support
  • Advanced page break handling
  • Watermarks
  • Accessible PDFs
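Several of these features are driven by plain CSS. As a rough illustration, the sketch below builds an HTML string whose stylesheet uses rules from the CSS Paged Media specification: `@page` margin boxes for a running header and page-numbered footer, and `page-break-before` to start each section on a new page. The helper name, header text, and section data are made up for the example, and how faithfully the rules are honored depends on the engine.

```python
from html import escape

# Paged-media rules: page size, a running header, and a "Page X of Y" footer.
PAGE_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @top-center { content: "Monthly Report"; }
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
h2 { page-break-before: always; }
"""


def build_report_html(sections):
    """Render (heading, body) pairs as HTML with the paged-media CSS attached."""
    parts = [
        f"<h2>{escape(heading)}</h2><p>{escape(body)}</p>"
        for heading, body in sections
    ]
    return (
        "<html><head><style>" + PAGE_CSS + "</style></head>"
        "<body>" + "".join(parts) + "</body></html>"
    )
```

A string like this is what you would hand to an HTML-to-PDF engine as the document content.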

DocRaptor’s Python library is as simple as:

import docraptor

doc_api = docraptor.DocApi()
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

response = doc_api.create_doc({
  "test": True,
  "document_content": "<html><body>Hello World</body></html>",
  # "document_url": "http://docraptor.com/examples/invoice.html",
  "document_type": "pdf",
  # "javascript": True,
})
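The snippet above stops once `create_doc` returns. A minimal sketch of the remaining step, writing the result to disk, might look like the following. It assumes the response is bytes-like (or convertible with `bytes()`); `save_pdf` is a hypothetical helper, not part of the DocRaptor library.

```python
def save_pdf(data, path):
    """Write PDF bytes to disk after a basic sanity check; return bytes written."""
    payload = bytes(data)  # assumption: the API response is bytes-like
    if not payload.startswith(b"%PDF"):
        # Every PDF file begins with the "%PDF" marker; anything else is
        # probably an error message rather than a document.
        raise ValueError("response does not look like a PDF")
    with open(path, "wb") as f:
        return f.write(payload)
```

With the example above, that would be something like `save_pdf(response, "hello.pdf")`.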

So which is best?

That’s up to you! With a larger budget, you can support more complex PDFs and complete your project much faster. With a simpler document, maybe the open-source libraries will save you money (depending on how much work you have to do on the infrastructure side).

If the document hasn’t been created yet, PyFPDF’s pixel perfection may be the best route. The choice is yours.
