How to convert PDF document pages to images using Python?

Published: October 01, 2019

Examples of how to convert PDF document pages to images using Python

1. Using the Python module pdf2image

The Python module pdf2image is available on GitHub. One way to install it is with pip:

pip install pdf2image

Note: the module needs poppler to run. If you use the Anaconda Python distribution, poppler can be installed with the following command, for example:

conda install -c conda-forge poppler

The module can then be imported:

>>> from pdf2image import convert_from_path

and the convert_from_path() function can be used. It returns a list of images, one per page:

>>> pages = convert_from_path('document.pdf', dpi=200)

1.1 Convert all PDF document pages to images

To convert all pages of the PDF document to images, loop over the list pages and save each element:

>>> for idx, page in enumerate(pages):
...     page.save('page' + str(idx) + '.jpg', 'JPEG')
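
The loop above can be wrapped in a small reusable helper. The following is only a sketch: the function name save_pages, the output directory and the zero-padded file names are illustrative choices, not part of pdf2image. It assumes each element of pages has a save(path, format) method, as the images returned by convert_from_path() do.

```python
from pathlib import Path


def save_pages(pages, out_dir="pages", prefix="page", fmt="JPEG"):
    """Save each page image into out_dir and return the written paths.

    save_pages is a hypothetical helper; `pages` is assumed to be the
    list returned by pdf2image.convert_from_path().
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Use the conventional .jpg extension for JPEG, otherwise the format name.
    ext = "jpg" if fmt.upper() == "JPEG" else fmt.lower()
    paths = []
    for idx, page in enumerate(pages, start=1):
        # Zero-pad the number so page_010 sorts after page_009.
        path = out / f"{prefix}_{idx:03d}.{ext}"
        page.save(path, fmt)
        paths.append(path)
    return paths
```

With the pages list created earlier, save_pages(pages) would write pages/page_001.jpg, pages/page_002.jpg, and so on.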

1.2 Convert a given page

To save a single page, select it from pages by its index (0 is the first page):

>>> page_idx = 0
>>> page = pages[page_idx]
>>> page.save('image.jpg', 'JPEG')
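
For a long document, rendering every page just to keep one can be slow. A possible alternative, sketched below, is to pass the first_page and last_page arguments of convert_from_path() so that only the requested page is rendered. The helper name render_single_page is hypothetical; note that pdf2image counts pages from 1, while the index used above starts at 0.

```python
def render_single_page(pdf_path, page_idx, dpi=200):
    """Render only the page at zero-based index page_idx.

    render_single_page is a hypothetical helper built on
    pdf2image.convert_from_path().
    """
    # Imported inside the function so the sketch can be loaded
    # even when pdf2image is not installed.
    from pdf2image import convert_from_path

    page_number = page_idx + 1  # pdf2image page numbers start at 1
    pages = convert_from_path(
        pdf_path,
        dpi=dpi,
        first_page=page_number,
        last_page=page_number,
    )
    return pages[0]
```

The returned image can then be saved as before, for example render_single_page('document.pdf', 0).save('image.jpg', 'JPEG').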

2. Using ImageMagick

Another solution is to use ImageMagick. For example, to create a preview of the first page of a PDF document:

>>> import os
>>> os.system('convert document.pdf[0] image.jpg')

To change the image size and resolution, run the following on the command line:

convert -density 144 document.pdf[0] -resize 50% image.jpg
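
The same command can also be launched from Python with subprocess.run(), which reports failures instead of silently returning an exit code as os.system() does. This is a minimal sketch: the function names build_convert_command and convert_page are hypothetical, and it assumes the ImageMagick convert executable is on the PATH (newer ImageMagick releases may expose it as magick instead).

```python
import subprocess


def build_convert_command(pdf_path, image_path, page_idx=0,
                          density=144, resize="50%"):
    """Build the argument list for the ImageMagick command shown above."""
    return [
        "convert",                   # assumed executable name
        "-density", str(density),    # rendering resolution
        f"{pdf_path}[{page_idx}]",   # [0] selects the first page
        "-resize", resize,           # scale the rendered image
        image_path,
    ]


def convert_page(pdf_path, image_path, **options):
    """Run ImageMagick; raises CalledProcessError if the command fails."""
    command = build_convert_command(pdf_path, image_path, **options)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces. For example, convert_page('document.pdf', 'image.jpg') runs the command shown above.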