At some point in your programming career, you’ll be asked to generate a PDF. Maybe receipts are legally required to be in PDF format, or you need to send something to a printer, or, most commonly, your users simply expect PDF reports.
So you search Google for “how to generate a PDF with Python.” You’ll find three options:
- PyFPDF, or similar build-a-pdf-line-by-line open-source libraries
- Python-pdfkit, or similar HTML-to-PDF browser-based open-source libraries
- Commercial engines, like DocRaptor or PDFreactor
What are the differences? Advantages and disadvantages? How do you choose? Let me explain.
Build a PDF Line-By-Line
Here’s the default code example for PyFPDF:
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
pdf.cell(40, 10, 'Hello World!')
pdf.output('tuto1.pdf', 'F')
For many use cases, this is ideal. You’ll get the exact PDF you want, every time. But as you can see, it requires building the PDF object by object, line by line.
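To make the "object by object" point concrete, here is a minimal sketch of a small receipt drawn with the same PyFPDF calls as the example above. The item data, the column widths, and the helper names `format_rows` and `write_receipt` are assumptions for illustration, not part of PyFPDF.

```python
def format_rows(items):
    """Turn (description, quantity, unit_price) tuples into printable rows."""
    rows = []
    total = 0.0
    for description, quantity, unit_price in items:
        line_total = quantity * unit_price
        total += line_total
        rows.append((f"{description} x{quantity}", f"{line_total:.2f}"))
    rows.append(("Total", f"{total:.2f}"))
    return rows


def write_receipt(items, path):
    """Draw each row as two cells, one line at a time."""
    # Imported here so format_rows can be used without fpdf installed.
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('Arial', 'B', 16)
    # Width 0 stretches to the right margin; the final 1 moves to the next line.
    pdf.cell(0, 10, 'Receipt', 0, 1)
    pdf.set_font('Arial', '', 12)
    for label, amount in format_rows(items):
        pdf.cell(140, 8, label, 1, 0)       # bordered cell, stay on the same line
        pdf.cell(40, 8, amount, 1, 1, 'R')  # right-aligned amount, then line break
    pdf.output(path)
```

Calling `write_receipt([("Widget", 2, 3.50)], "receipt.pdf")` would draw one item row and a total row. Every position and width is set by hand, which is the source of both the precision and the effort.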
It’s common to already have a document written in HTML, or perhaps as a developer you’re most familiar with building frontend code in HTML and CSS. In either case, you probably want an HTML-to-PDF solution.
Browser-Based Engines
Alternatively, there are many HTML-to-PDF libraries built on various browser engines. Headless Chrome is extremely popular these days (as it should be), but there are many older tools built on PhantomJS and wkhtmltopdf (don’t use these; they rely on ancient WebKit engines).
These libraries are generally good for simple PDF documents, but they tend to break down on complex documents that span more than one page or have pixel-perfect design and layout requirements. This is because browsers are built around the concept of a single, continuously scrolling web page, so their understanding of “pages” (headers, footers, page-specific styles) is limited at best.
pdfkit, which wraps wkhtmltopdf, has a really simple interface:
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')
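Because multi-page layout is where these tools tend to struggle, a common workaround is to force page breaks with CSS and pass page settings through pdfkit's `options` argument. The sketch below assumes the wkhtmltopdf binary is installed and honors `page-break-before` for your content; `build_html` and `write_report` are hypothetical helpers, not part of pdfkit.

```python
import html


def build_html(sections):
    """Wrap each (title, body) pair in a <section>; each later section starts a new page."""
    parts = [
        "<html><head><meta charset='utf-8'>",
        "<style>section + section { page-break-before: always; }</style>",
        "</head><body>",
    ]
    for title, body in sections:
        parts.append(
            f"<section><h1>{html.escape(title)}</h1><p>{html.escape(body)}</p></section>"
        )
    parts.append("</body></html>")
    return "".join(parts)


def write_report(sections, path):
    """Render the sections to a PDF file with pdfkit."""
    import pdfkit  # imported here; requires the wkhtmltopdf binary on PATH

    # Option names mirror wkhtmltopdf's command-line flags.
    options = {"page-size": "Letter", "margin-top": "0.75in", "encoding": "UTF-8"}
    pdfkit.from_string(build_html(sections), path, options=options)
```

This works for straightforward section breaks, but it does not give you running headers, page counters, or per-section page styles.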
pychromepdf lets you use the more modern Headless Chrome generator, but it’s more complicated to set up and maintain.
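If a wrapper library is more setup than you want, Headless Chrome can also be driven directly through its command-line flags. This is a rough sketch rather than a recommended setup: the binary name `google-chrome` is an assumption (it varies by platform and install), and `chrome_pdf_command` and `render_with_chrome` are hypothetical helper names.

```python
import subprocess


def chrome_pdf_command(source, output_path, chrome_binary="google-chrome"):
    """Build the argument list for a headless print-to-PDF run."""
    return [
        chrome_binary,  # assumption: adjust to your Chrome or Chromium binary
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_path}",
        source,  # a URL or a file:// path
    ]


def render_with_chrome(source, output_path, timeout=60):
    """Run Chrome; raises CalledProcessError if it exits with an error."""
    subprocess.run(chrome_pdf_command(source, output_path), check=True, timeout=timeout)
```

Keeping the command construction separate from the `subprocess.run` call makes the flags easy to inspect and test without launching a browser.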
Commercial Engines
Finally, as you can probably guess, the commercial PDF generators offer the most advanced functionality, and that functionality comes at a price. A PDFreactor license starts at $2,500 and a PrinceXML license at $3,800. DocRaptor’s online HTML-to-PDF API provides access to Prince’s engine starting at a more affordable $15/mo.
But these HTML to PDF libraries come with features like:
- Dynamic table of contents
- CSS-based headers and footers
- Different page styles and backgrounds for different sections of the document
- Flexbox support
- Advanced page break handling
- Watermarks
- Accessible PDFs
DocRaptor’s Python library is as simple as:
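import docraptor

doc_api = docraptor.DocApi()
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

response = doc_api.create_doc({
    "test": True,  # test mode; set to False for production documents
    "document_content": "<html><body>Hello World</body></html>",
    # "document_url": "http://docraptor.com/examples/invoice.html",
    "document_type": "pdf",
    # "javascript": True,
})

# create_doc() returns the PDF contents as binary data; write it to disk
with open('out.pdf', 'wb') as f:
    f.write(bytearray(response))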
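Several of the features listed above are driven by CSS paged media rules rather than by Python code. As a hedged sketch, the HTML below uses `@page` margin boxes for a running header and a page-numbered footer, which Prince-based engines support. The title text, page size, margins, and the `build_paged_html` helper are assumptions for illustration.

```python
def build_paged_html(title, body_html):
    """Return HTML whose CSS requests a running header and a page-numbered footer."""
    # Note: title is inserted into a CSS string as-is, so it must not contain double quotes.
    css = (
        "@page {"
        " size: Letter; margin: 1in;"
        f' @top-center {{ content: "{title}"; }}'
        ' @bottom-right { content: "Page " counter(page) " of " counter(pages); }'
        " }"
    )
    return f"<html><head><style>{css}</style></head><body>{body_html}</body></html>"


# Payload in the same shape as the create_doc() call above.
payload = {
    "test": True,
    "document_type": "pdf",
    "document_content": build_paged_html("Hello Report", "<p>Hello World</p>"),
}
```

Passing `payload` to `doc_api.create_doc()` would ask the engine to repeat the header and footer on every page. Browser-based tools generally ignore these margin-box rules.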
So which is best?
That’s up to you! With a larger budget, you can support more complex PDFs and complete your project much faster. With a simpler document, maybe the open-source libraries will save you money (depending on how much work you have to do on the infrastructure side).
If the document hasn’t been created yet, PyFPDF’s pixel perfection may be the best route. The choice is yours.