Thursday, February 18, 2010

Come Together

By Mark Brousseau

At a time when businesses are focusing like never before on operational efficiency and corporate responsiveness, respondents to a recent TAWPI Question of the Week cited data extraction as the most important part of the capture process to automate. Data extraction is the process of interpreting data on documents for further processing or storage. Sixty percent of respondents to the survey identified data extraction as the key area for automation, topping business process integration (20 percent), classification/sorting (20 percent), filing/archive (0 percent) and data validation (0 percent).

The results come as no surprise to ibml President and CEO Derrick Murphy, who notes that with recognition rates plateauing, data extraction is the capture function with the most room for improvement.

But Murphy warns that achieving significant improvements in data extraction results will require organizations and their integrators to combine capabilities such as higher image quality, database lookups/verification, auto-classification, and physical sorting into an integrated business process. This is a big change from traditionally siloed capture functions that often amounted to simple picture-taking with downstream exceptions, Murphy notes.

Here's how Murphy sees the pieces fitting together:

• Image Quality -- There's no question that higher quality and cleaner images translate into higher read rates. While image quality can be affected by forms design (e.g., clear zones), the type of scanner an organization uses can also play a pivotal role. For instance, many scanners don't actually capture at their advertised resolution; manufacturers are often citing output resolution, not the true optical scan resolution. "Image enhancement can only do so much," Murphy notes. In this scenario, a fuzzy image translates into a poor black and white image for recognition. To ensure high quality images, as a starting point Murphy recommends that users look for scanners that meet the 300 dpi (dots per inch) optical scan level (not output) suggested by most recognition vendors.

• Database verification -- Murphy sees increasing demand for database verification -- the process of matching recognition results against existing data. These matches can be used to automatically populate data entry fields or correct intelligent character recognition (ICR) misreads. Database verification is gaining traction in applications such as invoice processing and remittance lockbox processing, Murphy notes. So what's the appeal of this technology? Murphy says that while recognition read rates can now top 90 percent, the remaining misreads may require the manual keying of many fields on a document. Database verification can automatically verify, then populate or correct, these fields, reducing manual keying as well as the potential for errors that it introduces.
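To make the idea concrete, here is a minimal sketch of database verification using fuzzy matching. The list of known payers, the function name, and the match threshold are all invented for illustration; production systems use far richer lookup logic against real databases.

```python
# Sketch of database verification for ICR output. KNOWN_PAYERS stands in
# for a real database table; names and the 0.8 cutoff are illustrative.
from difflib import get_close_matches

KNOWN_PAYERS = ["Acme Industries", "Globex Corporation", "Initech LLC"]

def verify_icr_result(icr_text, candidates=KNOWN_PAYERS, cutoff=0.8):
    """Match a possibly garbled ICR read against known values.

    Returns the corrected value if a close match is found, else None
    (meaning the field falls out for manual keying).
    """
    matches = get_close_matches(icr_text, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# An ICR misread like "Acme Industr1es" is corrected automatically,
# while an unrecognized value is routed to a keyer instead of guessed at.
print(verify_icr_result("Acme Industr1es"))  # -> Acme Industries
print(verify_icr_result("Unknown Payer"))    # -> None
```

The design point is the one Murphy makes: fields that clear the match threshold never reach a data entry operator, and only genuine unknowns fall out for manual keying.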

• Classification -- Combining character recognition with sophisticated logic, auto-classification groups documents, reducing the time necessary to organize information and fill in data gaps. Better document classification drives improvements in data extraction rates, Murphy said.
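As a toy illustration of how recognition output can feed classification logic, the sketch below assigns a document class from OCR text using keyword rules. The categories, keywords, and function names are invented for this example; commercial auto-classification combines recognition with much more sophisticated logic.

```python
# Rule-based auto-classification sketch: score OCR text against keyword
# lists per document class. RULES is illustrative, not from any product.
RULES = {
    "invoice": ("invoice", "amount due", "remit to"),
    "remittance": ("payment enclosed", "account number", "coupon"),
    "correspondence": ("dear", "sincerely"),
}

def classify(ocr_text):
    """Return the best-matching document class, or 'unclassified'."""
    text = ocr_text.lower()
    scores = {
        doc_class: sum(keyword in text for keyword in keywords)
        for doc_class, keywords in RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("INVOICE #123 -- Amount Due: $500. Remit to: ..."))  # -> invoice
```

Grouping documents this way up front is what shortens downstream work: each class can then be routed to extraction rules tuned for that document type.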

• Physical sorting -- Despite the industry's push towards electronification, there are times when physically sorting documents still makes sense, particularly in complex data extraction environments, Murphy said. "If a document can't pass your validation or quality assurance processes, chances are, you will need to rescan the document," Murphy said. "The time to determine this is early in the capture process, and not after the documents have been boxed up or moved to another location while you have batches of work awaiting completion."

Taken together, Murphy believes these functions will help provide the improvements in data extraction results that respondents to the TAWPI Question of the Week cited as being critical to their capture process.

What do you think?
