2013-11-16

I am creating a desktop (WinForms) application that reads TIF/PDF payable invoices and extracts all the invoice information to store in a database.

I can read the standard barcodes (QR Code, Code 39, etc.) and some of the standard payable-invoice fields (invoice date, company name, address) with OCR (by running OCR on a specific region of the image), but I am unable to capture line items and amounts correctly.

I extract information in two phases:

1. Read specific regions based on the template (user-mapped regions for specific fields)

2. OCR the whole page and search for standard payable-invoice field names and values
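
A minimal sketch of how these two phases could be combined, written in Python for brevity (the same logic ports directly to C#). It assumes the OCR engine returns each recognized word with its bounding box and that the words arrive in reading order. The `Word` type, the field names, the template layout, and the regular expressions are all hypothetical illustrations, not part of any specific OCR SDK.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with its bounding box in pixels (hypothetical structure)."""
    text: str
    x: int
    y: int
    w: int
    h: int


# Phase 2 patterns: label followed by a value. These are illustrative
# assumptions; real invoices need many more label variants.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*:?\s*(\S+)", re.I),
    "invoice_date": re.compile(
        r"invoice\s*date\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I
    ),
    "total": re.compile(r"\btotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}


def read_region(words, region):
    """Phase 1: join the words whose center lies inside a user-mapped region.

    region is (left, top, right, bottom) in the same pixel coordinates.
    """
    left, top, right, bottom = region
    inside = [
        wd for wd in words
        if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
    ]
    # Simple ordering by top edge, then left edge; assumes words on one
    # text line share the same y value.
    inside.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in inside)


def search_fields(page_text):
    """Phase 2: search the full-page text for known field labels."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            found[name] = match.group(1)
    return found


def extract(words, template):
    """Run phase 2 on the whole page, then let template regions override it."""
    page_text = " ".join(wd.text for wd in words)
    result = search_fields(page_text)
    for field_name, region in template.items():
        value = read_region(words, region)
        if value:
            result[field_name] = value
    return result
```

Here a template is just a mapping such as `{"company_name": (0, 0, 100, 25)}`. Letting the template value win over the keyword match is a design choice made for this sketch; the opposite priority may suit invoices whose layout shifts between pages.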

I am aware of the following three approaches:

1. Create a template for each type of invoice and use it to process all invoices of that type.

2. A neural-network-based engine, which needs to be trained on sample data so that it works from patterns.

3. Form processing, a kind of OMR: the OCR looks at the exact coordinates where the fields were placed on the form (during form design).

Question:

How can I extract payable invoice data using OCR or some intelligent reader?

Primarily I am looking for an algorithm (C# + OCR engine) or a philosophy for capturing payable invoices, but a reference to an SDK with this feature or a solid commercial product would be helpful too.

I googled and found Abbyy FlexiCapture Engine and IRIS Capture & Extract, which look somewhat promising, but they are mostly based on templates or training. They claim that no template or training is required, but nothing looks like 100% automatic capture.

Kindly refer me to some product (at least with a free trial), SDK, or example/sample.
