Extract / Identify Tables from PDF python [closed]

After many fruitful hours of exploring OCR libraries, bounding boxes and clustering algorithms – I found a solution so simple it makes you want to cry!

I hope you are using Linux;

pdftotext -layout NAME_OF_PDF.pdf

AMAZING!!

Now you have a nice text file with all the information lined up in nice columns, now it is trivial to format into a csv etc..

It is for times like this that I love Linux, these guys came up with AMAZING solutions to everything, and put it there for FREE!

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)