How Intelligent Capture Utilizes Machine Learning for Template-free Training
There has recently been a lot of chatter in our industry about machine learning, and I want to share a few thoughts about how ImageSource’s ILINX Advanced Capture platform tackles this topic. Our approach is to deploy a neural network-based document processing model that does not rely on templates. Our machine-learning platform supports custom-developed content classification projects with much faster turnaround than traditional rules-based models. The result is significantly faster time-to-production with more reliable and accurate results for our customer partners. ILINX machine learning offers:
- Dynamic data location and extraction of information on complex documents
- Image comparison to support check fraud applications
- Feature extraction and classification to support medical imaging diagnostics
Our solutions leverage machine learning to create pre-built form classification algorithms in the lab, which provides a more flexible and efficient way to develop new document processes.
For the sake of this article, I’d like to focus on the goals of almost every accounts payable (AP) automation project:
- Reduce paper handling and workload
- Simplify processes down to one system for all invoices and other types of documents, whether paper or digital
- Gain visibility into where each invoice is in the process
For the initial document discovery, a technique called “clustering” can be used to automate the logical grouping of like documents. In this example, clustering sorts documents into categories such as invoices, checks, receipts and remittances, so they are organized automatically. The grouping can also be finer-grained: invoices from one vendor can be grouped together, as can receipts that belong with travel documents. The result is a set of documents grouped by likeness that can then be further evaluated.
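To make the idea of clustering concrete, here is a minimal sketch in Python. It is not how ILINX groups documents; it assumes the pages have already been converted to text, and the `tokens`, `jaccard` and `cluster` helpers, the 0.5 threshold and the sample documents are all hypothetical. It simply groups documents that share most of their words.

```python
import re

def tokens(text):
    """Lowercased set of alphabetic words; digits are dropped so that
    invoice numbers and amounts do not split otherwise similar documents."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering: a document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, doc in enumerate(docs):
        words = tokens(doc)
        for members in clusters:
            if jaccard(words, tokens(docs[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical sample documents (illustrative text only).
docs = [
    "Acme Supply invoice number 1001 amount due 250.00",
    "Acme Supply invoice number 1002 amount due 975.50",
    "Hotel receipt room charge paid by card",
    "Hotel receipt room charge paid by cash",
]
groups = cluster(docs)
print(groups)
```

With these four samples, the two vendor invoices land in one group and the two hotel receipts in another. A production system would likely work from richer features than shared words, such as page layout.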
Next, each cluster that represents a required document can be given a document type (or class). The resulting training set can then be imported into the ILINX machine learning solution, which is designed to automatically identify key characteristics of each document type (often called “feature extraction”). This trains the neural network for each document type. When performance is not ideal for a specific class, the customer can add the misclassified or unclassified documents to that class’s sample set to “re-train” the neural network.
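The train, review and re-train loop can be sketched in a few lines of Python. To keep the example short and runnable, it swaps the neural network for a much simpler nearest-centroid classifier over word counts, so it illustrates the workflow rather than the ILINX model. The function names, the 0.2 confidence cutoff and the sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def vectorize(text):
    """Very simple feature extraction: word counts, digits ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(samples):
    """samples: list of (text, label). Returns one summed word-count vector per label."""
    centroids = defaultdict(Counter)
    for text, label in samples:
        centroids[label].update(vectorize(text))
    return dict(centroids)

def classify(centroids, text, min_score=0.2):
    """Best-matching label, or None when no class is similar enough (unclassified)."""
    vec = vectorize(text)
    label, score = max(((lbl, cosine(vec, c)) for lbl, c in centroids.items()),
                       key=lambda pair: pair[1])
    return label if score >= min_score else None

# Hypothetical training set built from labeled clusters.
samples = [
    ("invoice number amount due remit to", "invoice"),
    ("receipt paid thank you for your purchase", "receipt"),
]
model = train(samples)

missed = "remittance advice payment enclosed check number"
first_guess = classify(model, missed)      # no class scores high enough yet

# "Re-train": add the unclassified document to a class sample set and rebuild.
samples.append((missed, "remittance"))
model = train(samples)
second_guess = classify(model, "remittance advice check number 4471")

print(first_guess, second_guess)
```

The first pass leaves the remittance advice unclassified; after it is added to the sample set and the model is rebuilt, a similar document is assigned to the new class.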
Data extraction is simplified by taking sample invoices that have already been processed, along with the data captured from each one. Together, these automatically train the software to locate the matching data and derive positional algorithms for each data field. The software takes the processed data for each page and locates the corresponding values on the document. It does this for every sample and then automatically creates algorithms based on exact location, changes in placement across the examples and position relative to other data, among other elements. The knowledge worker simply reviews the results.
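As a rough illustration of deriving a positional rule from processed samples, the Python sketch below learns where a field sits relative to a nearby label. It covers only one of the elements mentioned above (position relative to other data) and is not the ILINX algorithm. It assumes each page is available as a list of `(text, x, y)` words from OCR, and `learn_field_rule`, `extract` and the sample pages are hypothetical.

```python
from collections import Counter
from statistics import mean

def learn_field_rule(samples):
    """samples: list of (words, known_value), where words are (text, x, y) tuples.
    Finds the known value on each page and records the closest word to its left
    on the same line. Returns (anchor_text, mean_dx) for the most common anchor.
    Assumes at least one sample contains its value with a word to the left."""
    observations = []
    for words, value in samples:
        hit = next(((x, y) for text, x, y in words if text == value), None)
        if hit is None:
            continue  # value not found on this page; skip the sample
        vx, vy = hit
        left = [(vx - x, text) for text, x, y in words if y == vy and x < vx]
        if left:
            dx, anchor = min(left)  # nearest word to the left of the value
            observations.append((anchor, dx))
    anchor = Counter(a for a, _ in observations).most_common(1)[0][0]
    return anchor, mean(dx for a, dx in observations if a == anchor)

def extract(words, rule):
    """Apply a learned rule: find the anchor, then take the word on the same
    line whose horizontal offset is closest to the learned offset."""
    anchor, mean_dx = rule
    for text, ax, ay in words:
        if text == anchor:
            candidates = [(abs((x - ax) - mean_dx), t)
                          for t, x, y in words if y == ay and x > ax]
            if candidates:
                return min(candidates)[1]
    return None

# Hypothetical processed samples: page words plus the value keyed for "total".
page_a = [("Invoice", 40, 100), ("Total:", 400, 700), ("250.00", 480, 700)]
page_b = [("Invoice", 40, 100), ("Date:", 40, 130), ("Total:", 380, 520), ("975.50", 470, 520)]
rule = learn_field_rule([(page_a, "250.00"), (page_b, "975.50")])

# A new page where the total sits elsewhere and another word follows the label.
new_page = [("Invoice", 40, 100), ("Total:", 300, 610), ("USD", 340, 610), ("1,310.25", 388, 610)]
total = extract(new_page, rule)
print(rule, total)
```

Because the rule is relative to the label rather than tied to fixed page coordinates, it still finds the total when the field moves between documents, which is the point of avoiding templates.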
The technology used to configure the system also makes real-time adjustments. Complicated projects that would typically take weeks, if not months, can be completed in significantly less time. Machine learning technology streamlines the manual processes used in production and helps reduce overall labor costs.
This approach can be applied to automate both paper-based and electronic document processes in a single workflow.
by Terry Sutherland, CEO, ImageSource
If you would like to learn more about how ILINX machine learning can automate your business please contact us at inforequest@imagesourceinc.com or