Best tool for text extraction from PDF in Python 3.4 [closed]

Question

To work with PDFs in Python, you can install the pypdf package, which can extract text and images from PDF files. Extracted text is returned as a Python string. To install it, run pip install pypdf from the command line. The module name is case-sensitive when importing, so type it in all lowercase.

from pypdf import PdfReader

reader = PdfReader('my_file.pdf')
print(len(reader.pages))      # number of pages, e.g. 56
page = reader.pages[9]        # pages are zero-indexed, so index 9 is the tenth page
print(page.extract_text())    # text of that page as a string

The last statement prints all the text that can be extracted from the tenth page (index 9) of the ‘my_file.pdf’ document.
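
To collect the text of every page rather than a single one, the same extract_text() call can be applied in a loop over reader.pages. The following is a minimal sketch, not part of the original answer: the helper names extract_all_text and read_pdf_text are made up for illustration, and joining pages with a blank line is an arbitrary choice.

```python
def extract_all_text(pages):
    """Return the text of every page, joined with a blank line between pages.

    `pages` is any iterable of objects with an extract_text() method,
    such as the `pages` attribute of a pypdf PdfReader.
    """
    # Defensive: treat a missing or empty result as empty text, since a page
    # with no extractable text (for example a scanned image) may yield nothing.
    return "\n\n".join((page.extract_text() or "") for page in pages)


def read_pdf_text(path):
    """Open the PDF at `path` (hypothetical helper) and return all its text."""
    # Imported here so extract_all_text can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return extract_all_text(reader.pages)
```

Under these assumptions, read_pdf_text('my_file.pdf') would return the whole document as one string. The quality of the result still depends on how the PDF was produced.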
