How to check if a PDF is a scanned image or contains text

September 6, 2023 by Tarik

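The code below reads the text layer of a PDF with PyMuPDF. It works on both searchable and non-searchable PDFs: a searchable page returns its text, while a scanned (image-only) page returns an empty string, which is how you can tell the two apart.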

import fitz  # PyMuPDF

text = ""
path = "Your_scanned_or_partial_scanned.pdf"

doc = fitz.open(path)
for page in doc:
    # get_text() returns the page's text layer; it is empty for image-only pages
    text += page.get_text()

You can refer to this link for more information.
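To turn the extracted text into an actual scanned-or-not answer, one option is to look at the text page by page. The sketch below is only an illustration: the function names classify_text_layers and classify_pdf and the min_chars threshold of 20 characters are assumptions of mine, not part of PyMuPDF. Note that a scanned PDF that already carries an OCR text layer would still be reported as "text".

```python
def classify_text_layers(page_texts, min_chars=20):
    """Label a document from the text extracted on each page.

    page_texts: list of strings, one per page (e.g. from page.get_text()).
    min_chars:  assumed cutoff; pages with fewer non-whitespace-trimmed
                characters are treated as image-only.
    Returns "text", "scanned", "partial", or "empty" (no pages).
    """
    has_text = [len(t.strip()) >= min_chars for t in page_texts]
    if not has_text:
        return "empty"
    if all(has_text):
        return "text"
    if not any(has_text):
        return "scanned"
    return "partial"


def classify_pdf(path, min_chars=20):
    """Open a PDF with PyMuPDF and classify it (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so the helper above runs without it

    doc = fitz.open(path)
    try:
        return classify_text_layers([page.get_text() for page in doc], min_chars)
    finally:
        doc.close()
```

Calling classify_pdf("Your_scanned_or_partial_scanned.pdf") would then return one of the four labels. The threshold is there because scanned pages sometimes contain a few stray characters (page numbers, stamps), so you may need to adjust it for your own files.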

If you don’t have the fitz module, install PyMuPDF, which provides it:

pip install --upgrade pymupdf