

INTELLIGENT DOCUMENT PROCESSING STACK

OVERVIEW

This eBook provides a primer on intelligent document processing (IDP) technologies, from the advent of capture to where it is today. It also addresses the differences between intelligent capture and Optical Character Recognition (OCR), when to use OCR and when to use intelligent capture, the interpretation methods involved, and what matters most depending on your organization's needs and objectives. Machine learning is an essential component of the most advanced intelligent capture. Techniques that don't require machine learning are reviewed here, as well as when machine learning is best applied in intelligent document processing. Supervised and unsupervised learning as well as artificial neural networks are addressed. Understanding each technology in the IDP stack, applied to document preprocessing, data location and extraction, and data verification, conveys the essence of what the intelligent document processing stack can be for your organization. Building a custom IDP stack, buying out-of-the-box, or leveraging off-the-shelf SDKs/APIs are also explored here. These options, along with how to futureproof your document processing moving forward, conclude this eBook.

Table of Contents

1. What is intelligent document processing?
   Intelligent Capture: It all started with taking pictures
   Advent of Forms Processing
   Arrival of Intelligent Capture
   Expert Systems and Their Challenges
   Intelligent Capture Gains Cognition
   Portability of Inferences
   True Machine Learning

2. What role does OCR play within intelligent document processing?
   Extracting Good Data from Bad
   Intelligent Document Processing: Interpretation Methods
   Focusing on What Matters Most

3. Is Machine Learning all the same?
   Techniques Outside the World of Machine Learning
   Machine Learning Applied to Intelligent Capture
   Supervised and Unsupervised Learning
   Artificial Neural Networks
   Machine Learning Models
   Moving Forward

4. What Technologies Are Involved in Intelligent Document Processing?
   Document Preprocessing
   Data Location and Extraction
   Intelligent Document Processing: Its Essence
   Building Intelligent Document Processing from Scratch

What is intelligent document processing?

These days, it's harder to find a technology solution, whether hardware or software, that doesn't wrangle the words artificial intelligence, machine learning or cognitive into its description. It's easy to understand why: progress in digital assistants and other automated capabilities has led to intense interest among organizations in selecting solutions with the ability to learn. Unfortunately, as with any trend in technology, while organizations rush to avail themselves of these new capabilities, solution providers tend to confuse the market with too many buzzwords and claims while overselling capabilities. All of this leads to the dreaded Gartner trough of disillusionment. Advanced document capture is no different. Visit any vendor website (including Parascript's site), and you will come across words related to artificial intelligence. So how is one to truly understand what is meant by applying AI to Intelligent Capture, and what does it really mean in terms of benefits? To provide an answer, it helps to understand the history of Intelligent Capture.

Intelligent Capture: It all started with taking pictures

Several decades ago, the document scanner came into being, with the benefit largely focused on the ability to make documents portable and easier to store. The benefits were greatly enhanced with the increased use of email and then with the public Internet and Web. Organizations could easily scan, store and share document-based information. However, this was hardly Intelligent Capture as we know it today. In the mid-1990s, with all of these documents digitized, businesses were eager to automate the process of describing them to improve access. Up until then, most organizations manually created the equivalent of the library card catalog, assigning index data (known as metadata) to each document to support the ability to better organize and retrieve it.

Advent of Forms Processing

Enter forms processing, where software introduced the ability to designate the location of data on a document by supplying X/Y coordinates and applying OCR to those locations. The result was the ability to efficiently add metadata automatically to larger volumes of documents, relying upon staff only to review the metadata and make occasional corrections. However, there are only so many instances of standardized forms. Organizations increasingly needed to more efficiently manage other non-standardized, more complex documents.
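The X/Y-coordinate template idea behind forms processing can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the TEMPLATE zones, and the `ocr_zone` stand-in for a real OCR engine are invented for illustration only.

```python
# A minimal sketch of template-based forms processing: each field is tied
# to a fixed X/Y zone on the page, and OCR is run only on those zones.
# The ocr_zone callable is a stand-in for a real OCR engine.

TEMPLATE = {
    "invoice_number": {"x": 450, "y": 40, "w": 120, "h": 20},
    "invoice_date":   {"x": 450, "y": 70, "w": 120, "h": 20},
}

def extract_fields(page_image, template, ocr_zone):
    """Apply OCR to each templated zone and return field -> text metadata."""
    fields = {}
    for name, zone in template.items():
        fields[name] = ocr_zone(page_image, zone)
    return fields

# Simulated OCR: pretend the engine returns text keyed by zone position.
fake_page = {(450, 40): "INV-1042", (450, 70): "2024-03-01"}
fields = extract_fields(
    fake_page, TEMPLATE,
    ocr_zone=lambda page, z: page.get((z["x"], z["y"]), ""),
)
print(fields)  # {'invoice_number': 'INV-1042', 'invoice_date': '2024-03-01'}
```

The fragility is visible even in the sketch: any form that prints the invoice number somewhere other than the templated zone yields an empty field.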

Arrival of Intelligent Capture

Enter Intelligent Capture. Intelligent Capture is designed to tackle documents known as semi-structured and unstructured. Examples include invoices, bills of lading and other document types where the data is similar from document to document, but the format and location of the data are highly variable. Intelligent Capture introduced techniques such as using keywords and pattern matching algorithms like regular expressions to both classify documents and locate the information held within them. These techniques, which inform the software how to operate on specific data, were part of a realm of AI known as expert systems. Expert systems require a subject matter expert to provide the system with specific instructions, which are stored in a knowledge base. As documents are presented to the system, the software uses the knowledge base to determine the correct course of action. When results are reviewed and either verified or corrected, the knowledge base is updated.

Expert Systems and Their Challenges

While Intelligent Capture based on expert systems provides a leap forward in the ability to work with more complex types of data, the problem with expert systems of any type is that the techniques on which they are built create a significant increase in complexity. Gone are the simple templates that mapped the precise location of data, in favor of algorithms encoded by one or more people to manage classification and data location. All these rules require a means of storage, which means use of a database. Over years of use, the database can become very large as it amasses more and more rules. Not only are the systems more complex, but so, too, are the documents on which they operate. So the effort required to build rules is also significantly more complex and error-prone.
Building a system that can reliably classify documents and locate data requires analysis of a large set of representative data and a lot of time to encode each rule. Once encoded, the system can be fairly brittle: new documents or new variants of known documents mean system updates. With these systems, it is not unrealistic to spend two to three times as much on configuration as on the software itself. Once in production, these systems often degrade over time as new document variants are encountered, and they become more expensive to manage.
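The keyword-plus-regular-expression style of expert-system rule described above can be sketched as follows. The rules, keywords and patterns are invented for illustration; a production knowledge base would store hundreds of such rules, each hand-encoded by a subject matter expert.

```python
import re

# Hypothetical expert-system rules: each rule pairs a trigger keyword with
# a regular expression that locates the data value near it.
RULES = [
    {"doc_type": "invoice", "keyword": "invoice",
     "pattern": re.compile(r"invoice\s*(?:no\.?|number)[:\s]*(\S+)", re.I)},
    {"doc_type": "purchase_order", "keyword": "purchase order",
     "pattern": re.compile(r"p\.?o\.?\s*(?:no\.?|number)[:\s]*(\S+)", re.I)},
]

def classify_and_extract(text):
    """Return (doc_type, extracted value) from the first matching rule."""
    for rule in RULES:
        if rule["keyword"] in text.lower():
            match = rule["pattern"].search(text)
            return rule["doc_type"], match.group(1) if match else None
    return "unknown", None

doc_type, value = classify_and_extract("ACME Corp Invoice Number: 8841")
print(doc_type, value)  # invoice 8841
```

The brittleness the text describes shows up immediately: an invoice that labels its number "Inv. #" instead of "Invoice Number" slips through until someone encodes yet another rule.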

Intelligent Capture Gains Cognition

In practically any technology solution, the word cognition equates to the area of AI known as machine learning. From here on out, I will refer to cognitive as the machine learning branch of AI. With machine learning, the aim is to let the computer take over the development of rules, approaching them as inferences based upon reasoning and extrapolation instead of hard-coded sets of if-then statements. The benefit is obvious: no more tedious and brittle rule-making. Another benefit is that machine learning can easily parse significant amounts of data to develop its inferences, leading to more comprehensive and reliable rules. The rules are often more abstract and flexible, more closely emulating the process by which humans solve problems. For instance, using expert systems, if I encode a rule to identify a purchase order by the presence of the words "purchase order" in the upper right-hand portion of the document, then purchase orders that do not have those precise words in that precise location will be left out. With machine learning crunching on a large sample set, the system develops a more abstract view of purchase orders that can contain many different hints or clues about how to discern a purchase order from a remittance and vice versa. Just as importantly, the same machine learning process used to configure the system can be run again and again. All of this results in the ability to manage a much larger variety of document-based information, increases the likelihood of a new variant of a purchase order being correctly identified, and allows the system to adapt and improve. Unlike expert systems-based approaches that increase the technology burden over time, becoming more costly and less valuable, machine learning-based systems, with their ability to adapt and improve, grow more valuable over time.

True Machine Learning

While machine learning is a part of artificial intelligence, not all AI systems involve true machine learning. Many are based on the reliable, but complex and brittle, expert systems form of AI. There is a place for expert systems. However, as data becomes more complex and requirements increase in terms of precision of location and accuracy of the resulting output, machine learning-based systems become necessary. They will be required in order to truly achieve both high levels of accuracy and automation.

Portability of Inferences

Another key difference with machine learning-based Intelligent Capture is that the inferences are often very portable, both physically and logically. The results of machine learning inferences, often called models, are stored abstractly, in a different form than a traditional database. This means that the resulting model can be exported from one system and imported into another fairly easily (provided both systems are of the same type). Additionally, a trained model can operate on a corpus of document-based data similar to the one on which it was trained. This means that a system trained on one set of documents within a department or organization can be used to process a similar set of documents within another department or organization. It can truly be a case of the rising tide lifting all boats.

What role does OCR play within intelligent document processing?

Is OCR the hard part? Is text parsing the same as intelligent document processing? To answer these questions, let's start with a popular meme:

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

This meme is not entirely accurate: there is no record of Cambridge University staff conducting this research, and ample examples exist of words that remain unreadable even when the first and last letters are preserved. However, the meme does provide a good illustration of the difference between OCR and cognitive capabilities. Simply stated, while OCR certainly has machine learning at its core, its job is simply to transcribe the text in an image into machine-readable formats. If you were to run OCR on an image of the above, you would get the same scrambled text back, verbatim. By design, OCR doesn't attempt to make corrections, because making corrections implies knowledge of the context of the information. And if the information above isn't embedded within an image, you don't even need OCR to provide a transcription.

Extracting Good Data from Bad

So how do we extract good data from the above meme? First, there isn't a cognitive solution that can take 100% misspellings and make instantaneous corrections like the human brain can. However, in the domain of Intelligent Capture, most of the effort is placed on what to do WITH the information contained within the document. You might say, "Hey, we can run a spell-checker to correct the words." Nice idea, and this is typically Step 1. And yet, this won't solve all problems, as many words just aren't contained within standard vocabularies.

Intelligent Document Processing: Interpretation Methods

Interpretation methods with fancy names like "n-gram" can determine, probabilistically, the next word or words in a sequence. These techniques are especially useful with complex multi-word data. Using these and other techniques, intelligent document processing deals with the presence of various specialized words and phrases often contained within a given structured form field or unstructured document, where specialized vocabularies or general dictionaries fall short. Further distancing intelligent document processing from OCR software is that intelligent document processing also attempts to reduce or obviate the use of OCR, using it only when necessary. Just like a human would, these systems learn where needed information is located and what clues help to find it. Instead of reading an entire document, intelligent document processing focuses directly on the information.

Focusing on What Matters Most

So, instead of performing OCR on or parsing the entire document text, the system skips over irrelevant information and focuses on specific sections. This helps to avoid unnecessary OCR or full-text parsing that can slow down the entire process. If documents are born-digital, then OCR can be skipped altogether to immediately interpret the document and extract the needed data.
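Returning to the interpretation methods above: the n-gram idea can be sketched as a toy bigram model that, given a corpus of known field values, predicts the most likely next word. The corpus and phrases here are invented; a real system would train on far more data and use higher-order n-grams.

```python
from collections import Counter, defaultdict

# A toy bigram model of the kind interpretation methods build: given a
# corpus of known multi-word field values, count which word follows which.
corpus = [
    "bill of lading", "bill of sale", "bill of materials",
    "statement of account",
]

following = defaultdict(Counter)
for phrase in corpus:
    words = phrase.split()
    for a, b in zip(words, words[1:]):
        following[a][b] += 1

def most_likely_next(word):
    """Most probable next word, used to resolve uncertain OCR output."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("bill"))  # of
```

If OCR returns "bill o1" on a noisy scan, a model like this supplies the evidence that "of" is the far more probable reading.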
By now, it should be clear that while OCR is an important step within intelligent document processing for scanned documents, it delivers only text, not interpretation. If the documents are digital, OCR is not required at all. To move to a capability where document-based information can be used within an automated transaction, several further levels of capability are required in order to go from text to real structured data.

Is Machine Learning all the same?

As covered in the previous section, "cognitive" more than likely refers to a branch of artificial intelligence called machine learning. Machine learning has been around for decades in one form or another, and there are many different machine learning techniques, each with its own strengths and weaknesses.

Techniques Outside the World of Machine Learning

It is easiest to start with the types of techniques that do not belong in the world of machine learning. Rules-based approaches developed by humans are not machine learning. When applied to Intelligent Capture, rules-based approaches generally fall into two categories: explicit rules, such as supplying the actual location of expected data on a given page (often referred to as "templates"), and more lenient rules using regular expressions and other types of pattern matching, such as "find the value that has nine numerals and label it as a social security number." There are many variations of these rules-based approaches. However, in every case, these rules require a person to construct them. There are novel ways to construct these rules that don't incur as much up-front effort, such as the use of a knowledge base in which corrections made by individual staff turn into specific rules. Even in this case, there is no machine actually doing any "learning."

Machine Learning Applied to Intelligent Capture

With that out of the way, we can focus on machine learning techniques as applied to Intelligent Capture. The two most common areas in which to apply machine learning are document classification (or document ID) and data extraction. For document classification, the objective is to supply the software with examples of each document type. The system goes about identifying the key unique attributes of each document type so that it can reliably perform class assignments. For data extraction, the objective is to provide the software with tagged examples of document-based data that need to be found and presented. Again, the system analyzes the samples and derives its own methods of reliably parsing documents to find the needed data. Let's delve into the most common machine learning techniques and explore how and where they are used.

Supervised and Unsupervised Learning

Within machine learning for Intelligent Capture, there are two common categories: supervised learning and unsupervised learning.

Supervised learning requires input sample data along with the "answer key" that describes the desired output. Together, these are often referred to as the training data. For document classification, it would be a set of documents along with the actual class to which each belongs. For data extraction, it might be the document along with the location and value of each data field that needs to be extracted. From here, the software develops its own models for how to achieve the desired output. The most common types of machine learning algorithms are classification and regression. Classification is commonly used in (wait for it!) document classification, where the class assignment options are limited. Regression is used to handle scenarios where there could be many potential answers. There are many variations of classification and regression algorithms that can be used and/or combined to optimize results.

The other type that can be used in Intelligent Capture is called unsupervised learning. In reality, these algorithms do not learn or create logic. Instead, they are employed to find structure in data, such as grouping documents by likeness. There is no need for training data because there is no function to be learned or preserved. These algorithms can be used in Intelligent Capture to segment documents based on likeness prior to using other machine learning algorithms, in order to reduce the number of potential variables.

Reinforcement learning is a third area of research that is growing in awareness in the industry, but it has no real practical application for Intelligent Capture as of yet. These algorithms are more suitable to problems such as autonomous vehicles and game theory. The DeepMind Go program is a good example of reinforcement learning.
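The grouping-by-likeness idea behind unsupervised learning can be sketched with a simple word-overlap (Jaccard) measure. This is a toy illustration with invented documents and a hand-picked threshold; real systems use much richer features than raw word sets.

```python
# A minimal sketch of unsupervised grouping by likeness: documents whose
# word sets overlap beyond a threshold land in the same group. No labels
# or training data are involved.

def jaccard(a, b):
    """Word-set overlap between two documents, from 0.0 to 1.0."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def group_by_likeness(docs, threshold=0.5):
    groups = []
    for doc in docs:
        for group in groups:
            if jaccard(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:  # no existing group is similar enough; start a new one
            groups.append([doc])
    return groups

docs = [
    "invoice number total amount due",
    "invoice number amount due date",
    "patient claim diagnosis code",
]
groups = group_by_likeness(docs)
print(len(groups))  # 2
```

The two invoice-like documents cluster together while the claim stands alone, which is exactly the kind of pre-segmentation the text describes feeding into downstream algorithms.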

Artificial Neural Networks

Digging deeper within the supervised learning branch, there are several underlying machine learning models, the first of which is perhaps the best known: the artificial neural network (ANN). This type of model is loosely inspired by the human brain, in which a network of neurons is involved. ANNs take input training data and process it; each node can communicate with other nodes to influence the final output. Over a large amount of data, some nodes become "stronger" while others get "weaker," based on successful output. Over time (and a lot of data), the ANN can become better at providing output.

Machine Learning Models

A form of ANN that has garnered increased excitement is the deep learning artificial neural network. These networks are roughly similar in design to traditional ANNs but have the ability to process more data, so they are better suited to more complex problems. Arguably, deep learning networks can perform much better than other machine learning models. However, their weakness is that they require a significant amount of training data, so they are not always the best fit for particular tasks.

Another popular machine learning model is the support vector machine (SVM). SVMs are used mostly for classification and regression analysis where the goal is to assign an input to one group or another. In many respects, SVMs work best at document classification.

Bayesian networks are a third model that can be applied to Intelligent Capture. This type of model is probabilistic: it can deduce from input data the probability that a given set of features belongs to a particular document type, or that the amount at the bottom of the page is the total amount.

By now you might be asking, "Where is NLP in this discussion?" The answer is that NLP, or Natural Language Processing, is an area of AI devoted to building systems that can interpret language.
This task may or may not implement machine learning, but increasingly, machine learning is involved because there is often too much data to process. Ultimately, NLP is an area of applied AI, not a specific technology or technique. As such, it is another approach that can be used to aid with classification or data extraction to automate document-oriented tasks.

Moving Forward

As the adoption of various models grows, Intelligent Capture systems will become much more self-sufficient at configuring themselves, using a variety of training data inputs and adapting to gradual changes in documents. Just as with any application of machine learning, the most important prerequisite is training data. Without it, there is no learning and, ultimately, no automation. As such, use of real machine learning in Intelligent Capture is still in its infancy, with much of it used to tackle specific tasks such as document classification or handwriting recognition rather than serving as a black box that does everything in automated fashion. But this is just the beginning.
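The probabilistic idea behind the Bayesian models mentioned in this chapter can be sketched with a tiny naive Bayes classifier. This is a deliberate simplification (naive Bayes, not a full Bayesian network), and the training documents are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# A toy naive Bayes document classifier: a simplification of the
# probabilistic models described above, not a full Bayesian network.
class NaiveBayes:
    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            s = math.log(self.class_counts[label])
            for w in doc.split():
                # Laplace smoothing so unseen words don't zero out a class.
                s += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return s
        return max(self.class_counts, key=score)

model = NaiveBayes().fit(
    ["purchase order quantity ship to", "invoice amount due remit to"],
    ["purchase_order", "invoice"],
)
print(model.predict("total amount due"))  # invoice
```

Even with two training documents, the model deduces that "amount due" is stronger evidence for an invoice than for a purchase order, which is the essence of the feature-probability reasoning described above.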

What Technologies Are Involved in Intelligent Document Processing?

Delving into the key technologies involved in intelligent document processing and the areas they support requires some context. We will use the standard document capture workflow of input, preprocessing, document classification and separation, data location and extraction, validation, verification and then output. That's a lot, so we'll cover these areas at a higher level and limit the discussion to areas that are specific to intelligent document processing, leaving out the input and output stages.

Document Preprocessing

The document preprocessing stage is traditionally limited to handling documents that are captured with cameras or scanners, including mobile phones. The reality is that any time a document moves from digital (presumably created by Microsoft Word or another application) to analog (printed out) and then back to digital again (via a camera), there is loss of fidelity. This loss in fidelity reduces the amount of automation that can be achieved. If a system must use OCR or another form of recognition, it performs best when the data on which it operates is in pristine condition. To be in pristine condition means that the data is very clear and there is no "noise" in the form of blotches or even tiny speckles, which can be introduced by the camera or by problems with the paper such as bends or folds, dirt or ink smudging, or the occasional spilled beverage. The ideal scenario is that the data to be processed is unadulterated. There are other problems associated with image quality, such as low contrast, excessive tilting (often called "skew") in one direction or another, images that are upside-down, or images that are stretched or not the proper size or resolution.

Preprocessing Technologies

All the technologies at this stage are designed to rectify common (or uncommon) problems to get the document as close as possible to the original, pristine version. Specific technologies or techniques involved include:

1. De-skewing: The process of taking an image of a document that was scanned at an angle and reorienting it so that all text is horizontal. The algorithms involved can be simple, relying upon the image boundaries, or they can involve analyzing the text in order to orient the image.

2. De-speckling: The process of analyzing an image and removing pixels that, through image analysis, do not appear to be part of the original document.

3. Scaling: The process of changing the size of the image as it is displayed. This function allows a document to match the size that is expected. Some software does a better job than others at correcting the degree of scaling.

4. Binarization: The process of converting a color or grayscale image to black-and-white. Black-and-white images provide the highest level of contrast for the recognizers.

5. Resolution setting: The process of changing the resolution of the image so that it conforms to the parameters of a given recognition process.
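Of the steps above, binarization is the simplest to sketch. The version below uses a fixed threshold on a pixel matrix purely for illustration; real preprocessing engines choose the threshold adaptively per region of the image.

```python
# A minimal sketch of binarization: each grayscale pixel (0-255) is
# thresholded to pure black (0) or white (255) to give the recognizer
# maximum contrast. The fixed threshold here is an illustrative choice.

def binarize(pixels, threshold=128):
    return [[0 if p < threshold else 255 for p in row] for row in pixels]

grayscale = [
    [12, 200, 130],
    [90, 255, 40],
]
print(binarize(grayscale))  # [[0, 255, 255], [0, 255, 0]]
```

The same pattern (a per-pixel or per-region transform over the image matrix) underlies de-speckling and the other corrections in the list; only the decision rule changes.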

Document Classification and Separation

If you have more than one document type to process within a single workflow, some type of document classification is required so that documents can be identified and routed to different processes. For instance, an accounts receivable process may involve a remittance document and a check, each of which has data that needs to be extracted, validated and exported into an accounting system. Most document classification in production uses a rules-based approach where a subject matter expert analyzes key attributes of each document type, such as the presence of keywords or specific data, and then constructs rules that dictate document type assignment.

Document Separation

Closely related to document classification is document separation. Traditionally, scanned documents were separated by the presence of blank pages, barcodes or some other identifier that the system could use to discern one document from another. These identifiers are typically applied manually during what is called batch preparation. Increasingly, documents arrive at an organization already digitized, whether previously scanned or born-digital. For documents that exist as individual files (e.g., a Word or PDF file), there is really no need to separate them. But many cases exist where multiple documents are stored as a single file. For instance, a patient claim often has the claim form and supplemental documentation. Another example is a mortgage loan file that can have from 50 to 500 or more documents stored within a single PDF. In these cases, manual insertion of document separators is impractical, so something else must be done. A rules-based method is often the favored approach because it is simple to understand and implement. However, as with any rules-based system, there is an unfortunate tradeoff between comprehensiveness and cost, with most organizations opting to minimize costs. This typically results in a lot of errors.

Text and Visual Classification

The intelligent document processing variant of document classification involves machine learning: instead of someone manually encoding rules, a set of algorithms parses and analyzes documents to identify key "features" that are reliable enough to distinguish one document type from another. There are two basic types of classifiers: text and visual. For text classification, the algorithms analyze different characteristics beyond just keywords, such as the frequency of terms and the proximity of one term to another. Visual classifiers evaluate the graphical elements of the document, ignoring text altogether (OCR is not required here).
Aspects such as pictures, the layout of paragraphs or tabular data, and even logos can be considered. Overall, the benefit of machine learning algorithms, apart from the fact that they relieve us from manual work and upkeep, is that they can analyze far more data and identify features that you or I might easily miss. Also, the algorithms can be continuously updated, resulting in more stable, reliable performance.

Enter machine learning-based separation. Just as with document classification, we hand over the task of analyzing documents to computer algorithms. This time, instead of finding attributes that identify a document type, the underlying analysis focuses on features that indicate first, middle and last pages. Page numbers, titles, headers and footers all come into play, in addition to other attributes. The result is a higher level of fidelity due to a more comprehensive analysis, without the significant attendant costs.
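The first/middle/last-page analysis described above can be sketched with a couple of hand-set signals. In a trained system these weights would be learned from labeled pages; here the signals, weights, threshold and sample pages are all invented for illustration.

```python
import re

# A sketch of feature-based page separation: simple signals (a "Page 1
# of N" footer, a title-like all-caps first line) vote on whether a page
# starts a new document. A trained model would learn these weights.

def first_page_score(page_text):
    score = 0.0
    if re.search(r"page\s*1\s*of\s*\d+", page_text, re.I):
        score += 0.6
    first_line = page_text.strip().splitlines()[0]
    if first_line.isupper():  # title-like heading suggests a first page
        score += 0.3
    return score

def split_documents(pages, threshold=0.5):
    """Group consecutive pages; a high first-page score starts a new doc."""
    docs = []
    for page in pages:
        if not docs or first_page_score(page) >= threshold:
            docs.append([page])
        else:
            docs[-1].append(page)
    return docs

pages = [
    "LOAN APPLICATION\nPage 1 of 2\n...",
    "continued...\nPage 2 of 2",
    "BANK STATEMENT\nPage 1 of 3\naccount number ...",
]
print(len(split_documents(pages)))  # 2
```

Three pages separate cleanly into two documents without any manually inserted separator sheets, which is the point of the approach.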

Data Location and Extraction

The next technology area involved in intelligent document processing is data location and extraction. While many organizations are thrilled to automate document classification and separation, they often also require additional metadata that is stored within the documents themselves. For instance, an insurance claim requires claimant data such as name, social security number, address, services rendered and so on. There may also be a need to locate and verify information regarding the claim in supporting documentation, such as a provider invoice.

Handling Unstructured Data

The objective in any process is to take the unstructured data in these documents and use it in a more structured manner to shepherd a process from beginning to end. Historically, metadata was manually entered, with other techniques introduced in the intervening years such as using templates for forms, and regular expressions or keyword/value pairs for more complex data typically found in invoices, remittances, receipts and explanation of benefits documents. As with document classification, these data location and extraction techniques rely on the manual creation of rules. More recently, software vendors have introduced "loopback" mechanisms that allow rules to be created gradually during production by having staff handle errors and tell the system exactly where the data is located (a mechanism often called a knowledge base). This method, while reducing the amount of upfront effort, has the same limitations as any rules-based system. Here machine learning can improve the process. Instead of creating rules manually, the system comprehensively analyzes documents to create a data model that reliably locates needed information at a field or data element level, and then extracts and validates it. The result is a more flexible model than a brittle rules-based approach, one that also allows continuous updates to occur.
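The keyword/value-pair technique mentioned above can be sketched as follows: each field is located relative to a label rather than at a fixed position. The field names, labels and patterns here are invented for illustration.

```python
import re

# A sketch of keyword/value-pair location on semi-structured text: each
# field is found relative to a label, not a fixed X/Y position.
FIELDS = {
    "total":   re.compile(r"total\s*(?:due)?[:\s]*\$?([\d,]+\.\d{2})", re.I),
    "account": re.compile(r"account\s*(?:no\.?|number)[:\s#]*(\w+)", re.I),
}

def locate_fields(text):
    out = {}
    for name, pattern in FIELDS.items():
        m = pattern.search(text)
        out[name] = m.group(1) if m else None
    return out

text = "Account Number: 99812\nServices rendered ...\nTotal Due: $1,204.50"
print(locate_fields(text))  # {'total': '1,204.50', 'account': '99812'}
```

Because the labels float with the layout, this handles more variation than a coordinate template, yet it is still a hand-written rule; the machine learning approach described above replaces these patterns with a learned data model.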
You might be wondering where OCR or other forms of recognition, such as handwriting recognition (often referred to as "ICR"), come into play within the bevy of technologies involved in intelligent document processing. The reality is that OCR or ICR is a necessary prerequisite for image-based documents where either content-based classification or data extraction is required. However, the use of either is not mandatory. For example, if digital documents are involved, such as a searchable PDF, there is no need for OCR (or image preprocessing, for that matter). OCR and ICR simply transcribe image-based document information into something that a computer can read. This information is produced as plain text without any concept of document types, document boundaries or data fields. OCR and ICR may be a necessary prerequisite for image-based documents, but the real heavy lifting involves the critical steps of document classification, document separation and data location. Also, the majority of OCR and ICR packages come fully baked. This means that even though they use machine learning techniques to perform image-to-text transcription, the run-time software used in intelligent capture cannot continue to learn. There are some cases of deploying OCR/ICR that is able to learn in production environments, but these deployments are not the norm.

Intelligent Document Processing: Its Essence

By now you're probably realizing that there is a theme to all of this "cognitive stuff": the use of machine learning applied to specific tasks. The reality is that machine learning itself is a tool just like any other type of tool. Without the proper application, it is meaningless. A hammer in a drawer is just a bunch of atoms. When held correctly and used with force against the head of a nail, it becomes extraordinarily useful. It's the same with machine learning. No vendor, except for purveyors of machine learning toolkits, offers machine learning without applying it to a specific problem. One of the most popular applications for intelligent capture is automating the identification and sorting of documents. There are a lot of processes that involve many different documents, sometimes even several hundred, that can be submitted without any organization. These include claims adjudication, loan origination and commercial logistics. If organizations are not manually processing these documents (and most are), they are undoubtedly using a rules-based process that attempts to identify incoming documents based upon specific, identified attributes. For instance, with mortgage documentation, a rules-based approach attempts to mimic a manual process. But instead of looking at the overall document, including its graphical orientation, specific keywords might be used to discern a document establishing proof of income from a document providing information on assets. Even though a person might easily distinguish between a W-2 and a bank statement, the rules-based approach relies upon the presence (or absence) of specific words or other textual data. So rules-based automation might look for instances of "W-2" or "Total Income" for the W-2 document, while the presence of words like "account balance" along with "account number" and "statement" might establish that a document is a bank statement.
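The rules-based mortgage-document classification just described can be sketched in a few lines. The keywords are taken from the example in the text; the scoring scheme is a hypothetical simplification.

```python
# A sketch of rules-based classification: hand-written keyword lists
# distinguish a W-2 from a bank statement. Keywords follow the example
# in the text; the voting scheme is illustrative.

RULES = {
    "w2": ["w-2", "total income"],
    "bank_statement": ["account balance", "account number", "statement"],
}

def classify(text, rules=RULES):
    """Pick the type whose keywords match most often; 'unknown' if none."""
    text = text.lower()
    scores = {doc_type: sum(kw in text for kw in kws)
              for doc_type, kws in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify("Statement Period ... Account Number 4411 ... Account Balance"))
# bank_statement
```

With two document types this works fine; the scaling problem discussed next is that every additional type multiplies the keyword lists that must be analyzed, encoded and kept free of overlap.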
As you might suspect, the power of rules-based classification is directly tied to the amount of time a subject matter expert (SME) spends reviewing available data, identifying the key characteristics of each document type and then encoding the rules. For some needs, where there are only a few document types, a rules-based approach might make sense because it is typically simpler to implement. In a case where there are many document types, say 30 or more, and where the characteristics of each might overlap, a rules-based approach will fall short. Where 50 document types are involved, and where any particular document type can have different versions, it is very probable that the rules identified for one type will overlap the rules for another. It really isn't practical (mostly due to the time required, but also because of the ongoing maintenance) to analyze each type and version, verify that there is no overlap, and then test and tune each one.
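To make the approach described above concrete, here is a minimal sketch of a keyword rules-based classifier. The document type names and keyword lists are hypothetical examples, not taken from any particular product:

```python
# A minimal sketch of rules-based document classification.
# Each document type is defined by keywords an SME identified by hand;
# a document is classified by checking its text for those keywords.

RULES = {
    "W-2": ["w-2", "total income"],
    "bank_statement": ["account balance", "account number", "statement"],
}

def classify(text: str) -> str:
    """Return the first document type whose keywords all appear in the text."""
    lowered = text.lower()
    for doc_type, keywords in RULES.items():
        if all(keyword in lowered for keyword in keywords):
            return doc_type
    return "unknown"
```

The fragility described in the text is visible even in this toy version: every new document type or new version of an existing one means revisiting the keyword lists by hand and re-checking that no two rule sets overlap.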

The Power of Machine Learning

One of the strongest benefits of machine learning-based solutions, or, as the industry increasingly calls them, cognitive systems, is the ability to analyze a very large sample of data to identify and record key attributes (often called "features") of each document type. These features are compared against the attributes of other documents to arrive at the most reliable set with which to apply automation. Machine learning systems can detect even slight variances that might go unnoticed by SMEs, and they can record a larger number and frequency of these key features, using the most reliable inferences to produce high-quality results. This ability reduces the cost, complexity and risk associated with manually analyzing and configuring rules, including their upkeep. Cognitive classification turns potentially several hundred hours of effort into a "compute-time exercise": better, more reliable performance at a much lower level of effort.
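To contrast with hand-written rules, here is a minimal sketch of learned classification: a naive Bayes text classifier that derives word-frequency "features" for each document type directly from labeled examples. The training snippets below are invented for illustration; production systems learn from far larger samples and richer features, including layout:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Learns word frequencies ("features") per document type from examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # doc type -> word frequencies
        self.doc_counts = Counter()              # doc type -> example count
        self.vocabulary = set()

    def train(self, text: str, doc_type: str) -> None:
        words = text.lower().split()
        self.word_counts[doc_type].update(words)
        self.doc_counts[doc_type] += 1
        self.vocabulary.update(words)

    def classify(self, text: str) -> str:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_type, best_score = None, float("-inf")
        for doc_type, count in self.doc_counts.items():
            # log prior + log likelihood with add-one smoothing
            score = math.log(count / total_docs)
            total_words = sum(self.word_counts[doc_type].values())
            for word in words:
                freq = self.word_counts[doc_type][word] + 1
                score += math.log(freq / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_type, best_score = doc_type, score
        return best_type
```

No keywords are hand-picked here: adding a new document type is just a matter of supplying more labeled examples to `train`, which is the "compute-time exercise" the text refers to.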

Building Intelligent Document Processing from Scratch

What it takes to build an intelligent document processing solution from scratch is an important discussion, because even after reviewing all of the various moving parts, some IT groups prefer to develop their own solution, tailored to their organization's needs, rather than implement an off-the-shelf product. With the availability of cloud-based capabilities that provide elements like OCR, classification and some level of data extraction, all offered as discrete capabilities, many organizations are seduced into believing the process of designing their own solution will be simple. There are many aspects of the build-versus-buy decision process that won't be covered here because they are too generic. Let's focus instead on the hidden elements of creating your own solution.

Hidden Elements of Custom Solutions

There are two primary hidden costs associated with developing a custom capture solution: staff skills and OCR performance. The staff skills issue might seem like a traditional development skills acquisition problem. But when it comes to creating software whose primary objective is a high level of comprehensive data accuracy, knowledge of software development is a necessary prerequisite, yet only a small factor. It may be easy to develop software that uses third-party capabilities such as Google Document Understanding or Amazon Textract to perform certain operations.

Data Science & Machine Learning Expertise

Most decisions to go with a customized solution are based on the need to handle specific problems where no ready-made solutions exist. In this case, most offerings on the market force a trade-off between out-of-the-box capabilities and a custom project to deliver specific capabilities; there is no in-between. The result is development projects that start out seemingly small and turn into large custom projects, often costing more than commercial software alternatives. The skills required to bring these complex projects to fruition include expertise in data science and an in-depth understanding of machine learning algorithms, including when to choose one technique over another. Commercial alternatives offer the flexibility to configure the system to meet very specific needs without the same significant investment in data science and machine learning skills.

OCR Performance

Unbeknownst to most, even those with solid technical backgrounds, are the peculiarities of OCR toolkits and their cloud-based brethren. OCR is largely designed and used to convert image-based text into machine-readable form. To perform that function, OCR software has been tuned at the character and word level to achieve high levels of reliability. The problem arises when an organization needs to find specific data within documents and output it in a structured format. There is a lot to consider.

Start with the ability to reliably locate data. Many programmers might assume that it is simply a matter of applying regular expressions to the text. If you need a date, simply look for a format of XX/XX/XXXX. But what if there are many different date formats? Going down these obvious routes neglects key contextual information that significantly aids the task, such as the spatial proximity of targeted data to other data, the fonts of needed data and many other typically visual aspects.

And then there are the issues with the data output, especially with data called "confidence scores." Confidence scores for OCR are different from those in intelligent document processing solutions. OCR provides confidence scores at the character and word level, while intelligent document processing solutions provide confidence scores at the data field level. Analyzing scores at the field level is essential to successful intelligent document processing projects. There are even intelligent document processing solutions that cannot overcome the OCR confidence score problem when it comes to data field-level outputs, which results in the need to manually verify every single data output.

Where It Makes Sense to Build a Solution or Purchase One

There are many use cases where it makes sense to build a solution versus purchase one.
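As a hypothetical illustration of the character-versus-field distinction, consider combining word-level OCR confidences into a single field-level score. The combination rule and threshold below are invented for illustration; real systems use more sophisticated models:

```python
# A minimal sketch: deriving a field-level confidence from word-level
# OCR confidences. Treating word scores as independent, the field score
# is their product, so one weakly recognized word drags down the whole field.

def field_confidence(word_confidences: list[float]) -> float:
    """Combine per-word OCR confidences into one field-level score."""
    score = 1.0
    for confidence in word_confidences:
        score *= confidence
    return score

def needs_review(word_confidences: list[float], threshold: float = 0.9) -> bool:
    """Route the field to manual verification when the combined score is low."""
    return field_confidence(word_confidences) < threshold
```

Even three words each read at 0.97 confidence yield a field score of only about 0.91, and a single 0.70 word pulls the field well below the threshold. This is why analyzing confidence at the field level, rather than trusting individually high word scores, matters for deciding what must be manually verified.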
While many different toolkits, SDKs and web services focused on OCR, classification and handwriting recognition are available to developers, the reality is that an intelligent document processing solution is more than the sum of its parts. A lot goes into creating a solution that converts document-based information into structured data in a reliable, accurate manner. Most of these services perform better when applied as out-of-the-box capabilities that do not require significant data science skills. This means that where organizations require solutions to their specific problems, an off-the-shelf intelligent document processing software solution is almost always the best option.

[email protected] parascript.com 888.225.0169 ©2020 Parascript Management, Inc. All Rights Reserved.