Straight Through Processing for Document Automation (33/42)

Character-level and Word-level Accuracy Limitations

The focus on character-level and word-level accuracy generally results in good accuracy for converting images into searchable text. However, it provides very limited capability to output reliable confidence scores at the data field level, as opposed to the character or word level. In other words, off-the-shelf OCR tools may get you accurate page-level data. And yet, without significant modification of the OCR tools themselves and additional development on top of the OCR results, a project that requires knowing when data is accurate or not cannot attain straight through processing.

Another problem is that many solutions implement a lot of rules that focus on validation of data, but these rules are run only after receiving output from OCR. So solutions might check output against a dictionary or other list of expected values, or process the output using pattern recognition to detect if the output is accurate. These efforts don't do anything for the confidence score itself; scores from OCR are not changed, and therefore they cannot be used to establish a reliable threshold. The only way to potentially improve the reliability of confidence scores is to use this type of validation during the recognition process, where it can help the OCR engine make a better selection of the correct answer.
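
The difference between validating after OCR and validating during recognition can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the candidate list, the lexicon, the function names, and the `miss_penalty` weight are hypothetical and do not correspond to any particular OCR engine's API. It assumes the engine exposes several candidate readings for a field, each with a raw score.

```python
from typing import Iterable, List, Set, Tuple

# (candidate text, raw OCR confidence); a hypothetical structure for illustration
Candidate = Tuple[str, float]


def validate_after_ocr(candidates: Iterable[Candidate],
                       lexicon: Set[str]) -> Tuple[str, float, bool]:
    """Post-OCR validation: take the engine's top answer, then check it.

    The check yields a pass/fail flag, but the confidence score returned
    is the engine's raw score, unchanged by the validation.
    """
    text, score = max(candidates, key=lambda c: c[1])
    return text, score, text in lexicon


def select_with_lexicon(candidates: Iterable[Candidate],
                        lexicon: Set[str],
                        miss_penalty: float = 0.1) -> Tuple[str, float]:
    """Validation during selection: reweight candidates by the lexicon.

    Candidates missing from the lexicon are down-weighted by miss_penalty
    (an assumed value), and the weights are renormalized so the returned
    confidence reflects the validation evidence.
    """
    weighted: List[Candidate] = [
        (text, score * (1.0 if text in lexicon else miss_penalty))
        for text, score in candidates
    ]
    total = sum(weight for _, weight in weighted)
    if total == 0:
        raise ValueError("no candidate has a non-zero weight")
    text, weight = max(weighted, key=lambda c: c[1])
    return text, weight / total


# Hypothetical candidates for a document-type field.
candidates = [("1NV0ICE", 0.48), ("INVOICE", 0.46), ("INVO1CE", 0.06)]
lexicon = {"INVOICE", "RECEIPT"}

print(validate_after_ocr(candidates, lexicon))   # wrong text, raw score kept
print(select_with_lexicon(candidates, lexicon))  # lexicon word, rescaled score
```

In this sketch the post-OCR check can only flag the top answer as invalid; its score is still the raw 0.48, so a threshold on that score says nothing new. The lexicon-aware selection both picks the expected word and produces a confidence that incorporates the validation, which is the property a field-level threshold needs.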
