What is the ideal font for OCR?

After trying a lot of different fonts and OCR engines I tend to get the best results using Consolas. It is a monospaced typeface like OCR-A, but easier to read for humans. Consolas is included in several Microsoft products. There is also an open source font Inconsolata, which is influenced by Consolas. Inconsolata is a … Read more

How do I segment a document using Tesseract then output the resulting bounding boxes and labels

Success. Many thanks to the people at the Pattern Recognition and Image Analysis Research Lab (PRImA) for producing tools to handle this. You can obtain them freely on their website or github. Below I give the full solution for a Mac running 10.10 and using the homebrew package manager. I use wine to run windows … Read more

Extracting code from photograph of T-shirt via OCR

You can probably type faster than you can clean up images and install OCR engines: #!/usr/bin/perl (my$d=q[AA GTCAGTTCCT CGCTATGTA ACACACACCA TTTGTGAGT ATGTAACATA CTCGCTGGC TATGTCAGAC AGATTGATC GATCGATAGA ATGATAGATC GAACGAGTGA TAGATAGAGT GATAGATAGA GAGAGA GATAGAACGA TC GATAGAGAGA TAGATAGACA G ATCGAGAGAC AGATA GAACGACAGA TAGATAGAT TGAGTGATAG ACTGAGAGAT AGATAGATTG ATAGATAGAT AGATAGATAG ACTGATAGAT AGAGTGATAG ATAGAATGAG AGATAGACAG ACAGACAGAT AGATAGACAG AGAGACAGAT TGATAGATAG ATAGATAGAT TGATAGATAG … Read more

Using Tesseract for handwriting recognition

In short, you would have to train the Tesseract engine to recognize the handwriting. Take a look at this link: Tesseract handwriting with dictionary training This is what the linked post says: It’s possible to train tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract But don’t expect very good results. Academics have typically gotten … Read more

Getting the bounding box of the recognized words using python-tesseract

Use pytesseract.image_to_data() import pytesseract from pytesseract import Output import cv2 img = cv2.imread(‘image.jpg’) d = pytesseract.image_to_data(img, output_type=Output.DICT) n_boxes = len(d[‘level’]) for i in range(n_boxes): (x, y, w, h) = (d[‘left’][i], d[‘top’][i], d[‘width’][i], d[‘height’][i]) cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2) cv2.imshow(‘img’, img) cv2.waitKey(0) Among the data returned by pytesseract.image_to_data(): … Read more

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)