Straight Through Processing - Document Automation

Degree of Variance

While there are always differences between two different document types, there are also potential differences between two documents of the same type. For instance, the document class of “Credit Report” can be considered a single type. However, within that type, there are as many variations in terms of data and layout as there are organizations providing credit reports. That is, there is no single format where we can always anticipate the same data.

As a result, there are different key attributes that might indicate a credit report from Experian versus one from TransUnion. Just like the potential for error when we’re dealing with multiple document types, the degree of variance within a document type introduces the possibility of error.

Available Information

Some documents are easy to classify just by looking at them. For instance, receipts have a typical shape and data. Invoices typically include tables somewhere in the middle of the page. Other documents, such as text-heavy agreements, require more analysis to determine the correct document class assignment. As a general rule, the more attribute-based information that is distinct to a particular document class, the better. When document classes combine many different and distinct attributes, we can realize fairly reliable results.

Examples of Document Types

For instance, an invoice can be distinct based on the layout (table in the middle, numeric data on the bottom right and address block on the top half), text (presence of the word invoice), and non-text data such as a logo. For an agreement, we rely much more on text that might be shared with other document types, so the ability to correctly assign the document class is hampered. Generally, classifiers of all types do better when a document class has a distinct set of attributes, and the more, the better.
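To make the point about distinct attributes concrete, here is a minimal sketch of attribute-based scoring. It is an illustration only, not how any particular classifier works: the attribute labels, the "policy" class, and the scoring rule (fraction of a class's attributes found on the page) are all assumptions made for this example.

```python
# Hypothetical attribute labels per document class. A real system would derive
# these from layout analysis, recognized text, and image features.
CLASS_ATTRIBUTES = {
    "invoice": {
        "table_in_middle",
        "numeric_bottom_right",
        "address_block_top",
        "word_invoice",
        "logo",
    },
    "agreement": {"dense_text", "word_agreement"},
    "policy": {"dense_text", "word_policy"},
}


def score(observed, expected):
    """Fraction of a class's expected attributes that were observed."""
    if not expected:
        return 0.0
    return len(observed & expected) / len(expected)


def classify(observed, class_attributes=CLASS_ATTRIBUTES):
    """Return (best class, best score, margin over the runner-up).

    Assumes at least two classes are defined. A small margin means the
    observed attributes do not separate the classes well.
    """
    ranked = sorted(
        ((score(observed, attrs), name) for name, attrs in class_attributes.items()),
        reverse=True,
    )
    (best_score, best_name), (runner_up_score, _) = ranked[0], ranked[1]
    return best_name, best_score, best_score - runner_up_score


# An invoice-like page: several distinct attributes point to one class.
invoice_page = {"table_in_middle", "numeric_bottom_right", "word_invoice", "logo"}

# A text-heavy page: its only attribute is shared by two classes.
text_page = {"dense_text"}

print(classify(invoice_page))
print(classify(text_page))
```

The invoice-like page wins by a wide margin because it matches several attributes that no other class uses. The text-heavy page ties between two classes with a margin of zero, which is the situation described above for agreements.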
Most document classification projects, even complex ones such as mortgage classification, can get 70% or more STP with enough sample data, time for analysis, configuration and refinement. Generally speaking, your classification results drop by a fraction of a percentage with each new document type, but the calculation is not linear. A few document types can achieve 90% or more, while 500 to 800 may get somewhere around 70% STP.
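As a rough illustration of how an STP percentage might be measured, the sketch below assumes that STP is the share of documents classified without manual review, and that a confidence threshold decides which documents are routed to review. Neither detail is specified in this section, and the confidence values and threshold are made up for the example.

```python
def stp_rate(confidences, threshold):
    """Share of documents whose classification confidence meets the threshold.

    Assumption: documents below the threshold go to manual review and
    therefore do not count as straight-through.
    """
    if not confidences:
        raise ValueError("no documents to measure")
    automated = sum(1 for c in confidences if c >= threshold)
    return automated / len(confidences)


# Made-up classifier confidences for a batch of ten documents.
batch = [0.99, 0.95, 0.91, 0.88, 0.72, 0.97, 0.64, 0.93, 0.90, 0.85]

rate = stp_rate(batch, threshold=0.90)
print(f"STP: {rate:.0%}")
```

Under these assumptions, raising the threshold lowers the STP rate and sends more documents to review, while lowering it raises the STP rate and lets more potential misclassifications through.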
