Vetting ABBYY ‘Keen Eye’ FlexiCapture at ImageSource
First off, ABBYY means “keen eye”, an apt name for a product that dynamically and automatically captures and processes widely disparate documents. Powerful document recognition separates and classifies docs, and state-of-the-art optical character recognition rips the data from the images. I like the motto that pops up on screen – “take the data, leave the paper”. I love doing just that, sending paper briskly off to start its next recycled life. It’s the greenest thing to do, especially when compared to filling endless cabinets and long-term off-site storage facilities.
When you want to recommend, sell, support, and solve major customer problems with ECM software at ImageSource, due diligence mandates a thorough feature review and testing. I’ll describe some of the steps I was involved with in this process for ABBYY FlexiCapture – but mine is only a single slice of the vet team pie. Development teams and other engineering teams performed specific examinations to answer questions about integration, APIs, and narrower capabilities needed to solve unique problems faced by eager customers. Also, ImageSource staff with a variety of titles took a week-long training course with intensive labs. Unfortunately, I missed the class, but I was given the opportunity to spin up for a pre-sales demo last year, which was a lot of fun.
So here’s a peek at our process:
Laptop Install
First things first! I like to be able to run new software on my laptop whenever possible. This frees me from all bandwidth and location constraints. I can easily focus on the vet effort on a plane, down by the river, wherever and whenever. ABBYY FlexiCapture has a convenient ‘Standalone Installation’ which gives you access to all the key components on one box.
Obtain Sample Images from Client
In this case we gathered dozens of hardcopy invoices from a large international corporation. The images were not pretty and included originals, copies, printed faxes, you name it.
Ascertain Server Needs
After reviewing the ABBYY documentation we set the requirements for our labs – memory per server, disk space, software required, scan station requirements, scanner requirements, and required operating systems.
Spin Up VMs
Thanks to Mike Peterson we had three servers up in no time.
Convening the Team, Locking Down the ‘War Room’
Gene Eckhardt, Jeff Doyle and I met in our Olympia office for a week. Gene secured the war room where we periodically met with developers, project managers, engineers, and principals. Most of the time it was the three of us banging away.
Lab Software Install
Now we installed ILINX Capture on one server, ABBYY’s ‘Distributed Installation’ on another server, and SQL Server on the last. This architecture would mimic what we’d encounter in the field – and the standalone install wouldn’t cut it, as it doesn’t scale and it uses SQL Express as a support database. As installed, we could easily add more servers for high-volume stress testing. By running a WebEx all week we were able to record every moment of each day’s work, easily pass the focus from machine to machine, and give remote colleagues a view of what we were doing. We involved ABBYY tech support when we had a question and felt we could speed up an installation process. Turns out we could, and it was great to have the technician join our session without delay and see what was up. Also, as we installed we meticulously kept a running log of any issues – however minor – we encountered. At the end of each day Gene led a review session where we discussed and polished the invaluable ‘Lessons’ doc.
End-To-End Test
This was our ‘Hello World’ moment – we set up communication between ILINX Capture and ABBYY, and created an appropriate ILINX Capture workflow. Then we created a simple FlexiLayout, exported it, imported it into FlexiCapture, and created a document definition and an export. We configured the scanner and the scan station and established we had end-to-end connectivity.
Building Generic FlexiLayouts
One of the many goals of our week was to share baseline knowledge as well as advanced techniques for capturing documents. We identified two forms that were relatively easy to identify and constituted a large amount of the total paper volume. In short order we had FlexiLayouts and document definitions configured. Then it was time to tweak and refine. The ability to chain elements together worked outstandingly – find a keyword, then find the nearest zip code with the help of regular expressions. Then using out-of-the-box settings we could find the state, city, address, and addressee. Wow, powerful.
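For readers who have not seen element chaining, here is a rough sketch of the idea in plain Python. It is not FlexiLayout code and does not use any ABBYY API; the function name, the sample text, and the choice to treat “nearest” as “first ZIP code after the keyword in reading order” are all my own assumptions for illustration.

```python
import re

# US ZIP code, with an optional +4 suffix (an assumed pattern for this sketch).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def nearest_zip_after(text, keyword):
    """Find the keyword, then return the first ZIP code that follows it.

    Returns None if the keyword or a following ZIP code is missing.
    """
    hit = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if hit is None:
        return None
    # Start the ZIP search where the keyword ends, so the second
    # element is located relative to the first one.
    match = ZIP_RE.search(text, hit.end())
    return match.group(0) if match else None


# Made-up OCR text, not taken from the client's invoices.
sample = (
    "Remit To: Acme Corp\n"
    "123 Main St\n"
    "Olympia, WA 98501-1234\n"
    "Ship To: Acme Warehouse\n"
    "9 Dock Rd\n"
    "Tacoma, WA 98402"
)

remit_zip = nearest_zip_after(sample, "Remit To")
ship_zip = nearest_zip_after(sample, "Ship To")
```

Once the ZIP code is pinned down, the same trick can be repeated outward from it to pick up the state, city, and street lines, which is roughly what the out-of-the-box settings did for us.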
Building an Uber FlexiLayout
Now it was time to roll up our sleeves and build a smarter FlexiLayout that could capture invoices from a variety of sources. We used advanced features such as FlexiLayout alternatives, element groups, object collection elements, and other settings to start recognizing semi-structured forms from a wide variety of sources. Then we added a little bit of FlexiLayout language code to help us “crawl” around the identified forms to find dates and monetary amounts that could sometimes be below keywords, or to the right, etc. We didn’t need to script any validation rules for our purposes, but I showed some script I had created prior to our meeting. A quick unit test showed great results – we had now stepped away from a model where each form had to have its own FlexiLayout.
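To give a feel for the “crawl” logic, here is a small Python sketch of searching to the right of a keyword first and then below it. This is not the FlexiLayout language and not what we actually wrote; the `Word` class, the pixel tolerance, and the amount pattern are hypothetical stand-ins for whatever the recognition engine hands you.

```python
import re
from dataclasses import dataclass

# Assumed pattern for a monetary amount such as $1,250.00 or 89.50.
AMOUNT_RE = re.compile(r"^\$?\d[\d,]*\.\d{2}$")


@dataclass
class Word:
    """One recognized word with its top-left corner in pixels (hypothetical)."""
    text: str
    x: int
    y: int


def find_value(words, keyword, pattern, tol=10):
    """Look for a pattern match to the right of the keyword, then below it."""
    anchor = next((w for w in words if w.text.lower() == keyword.lower()), None)
    if anchor is None:
        return None
    matches = [w for w in words if pattern.match(w.text)]
    # First choice: same line (within tol pixels), to the right of the keyword.
    right = [w for w in matches if abs(w.y - anchor.y) <= tol and w.x > anchor.x]
    if right:
        return min(right, key=lambda w: w.x - anchor.x).text
    # Fallback: roughly the same column, lower on the page.
    below = [w for w in matches if w.y > anchor.y and abs(w.x - anchor.x) <= tol]
    if below:
        return min(below, key=lambda w: w.y - anchor.y).text
    return None


# Two made-up layouts: amount beside the keyword, and amount under it.
layout_right = [Word("$5.00", 300, 100), Word("Total", 100, 500), Word("$1,250.00", 300, 502)]
layout_below = [Word("$5.00", 300, 100), Word("Total", 100, 500), Word("$89.50", 104, 540)]

total_right = find_value(layout_right, "Total", AMOUNT_RE)
total_below = find_value(layout_below, "Total", AMOUNT_RE)
```

Passing a date pattern instead of `AMOUNT_RE` would reuse the same search for invoice dates, which is the appeal of one layout that adapts rather than one layout per form.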
Running Recognition Tests
We traded our lab coats for testing hazmat suits and ran many batches of documents we had used in development as well as documents we had never looked at before.
Recording Results
While never a thrill, here we benefitted from a spreadsheet created by Jeff Martin, Gene Eckhardt and Brandon Konen that allowed easy entry of recognition results. This is known as our “Advanced Capture Analysis and Comparison Tool”, highly regarded in our ranks. The data was automatically crunched, allowing us to very quickly establish baselines, compare our scan results with other products, share our results with coworkers and principals, etc.
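I can’t share the spreadsheet itself, but the kind of crunching involved can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the row layout, the exact-match scoring rule, the product names, and the sample values are invented and are not our actual results or the tool’s real formulas.

```python
from collections import defaultdict


def field_accuracy(rows):
    """Return the share of fields captured correctly, per product.

    Each row is (product, field_name, expected_value, captured_value).
    A field counts as correct only on an exact match after trimming spaces.
    """
    counts = defaultdict(lambda: [0, 0])  # product -> [correct, total]
    for product, _field, expected, captured in rows:
        counts[product][1] += 1
        if expected.strip() == captured.strip():
            counts[product][0] += 1
    return {product: correct / total for product, (correct, total) in counts.items()}


# Invented sample rows, purely to show the shape of the data.
rows = [
    ("Product A", "InvoiceNumber", "10442", "10442"),
    ("Product A", "Total", "1,250.00", "1,250.00"),
    ("Product A", "InvoiceDate", "03/14/2012", "03/14/2012 "),
    ("Product A", "Zip", "98501", "9850l"),
    ("Product B", "InvoiceNumber", "10442", "10442"),
    ("Product B", "Total", "1,250.00", "1,25O.00"),
]

accuracy = field_accuracy(rows)
```

With a per-product number like this for each batch, setting a baseline and comparing products side by side becomes a quick lookup instead of a manual tally.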
Lessons Learned Doc Revisited
It’s a privilege to be able to work with industry veterans such as Jeff Doyle and Gene Eckhardt on a project such as this. They brought years of experience with them to improve every process we covered. While evaluating the Lessons Learned doc, they were able to extrapolate possible impacts in environments and scenarios they have seen in the field. They also added fresh mitigation alternatives to work through problems encountered. Our Lessons Learned docs are part of a large and valuable knowledge base that ImageSource has added to year after year.
Findings and Conclusions Write-Up
After a demonstration to some coworkers who needed to ramp up on our configuration, we collaborated to create a summary document, and here Gene took the lead. We were able to draw on the Lessons Learned doc, the Advanced Capture Analysis and Comparison Tool, and meeting notes to piece together our findings and quantify our conclusions. The summary outlined the scope of our efforts, including excluded activities, our environment and products tested, results, conclusions, general observations, and Best Practice recommendations.
It’s one thing to kick the tires on a car before purchase. But a methodical, thorough and thoughtful approach is the norm for analogous software tasks at ImageSource.