Document Scanning and Some Best Practices

There are a number of variables to consider before determining the document capture best practices for an organization. The first question I would ask is: Is this a departmental/workgroup or an enterprise scanning solution? The second question I would ask is: Is this an intelligent capture solution? The definition of ‘intelligent capture’ I use is the ability to automatically separate documents and extract data from them. Those variables should always be at the forefront of your mind when designing the capture solution. Setting those variables aside, let’s think about why we are scanning in the first place. The goal of any capture solution should be to save money; where that money is saved varies from organization to organization, but the goal is the same. Scanning documents has an inherent cost, so the goal of any solution design should be to walk the line between the cost of input and the savings in retrieval and business process, and come out on the winning side. There are many techniques that could be considered blanket ‘best practices’ for document capture, but I believe the end goal (where you see the savings) determines the priority of those best practices.

For any scanning solution, we have to make it easy. We need the application to be easy to install and we need end-user training to be simple; we basically need that ‘big red scan button’. Certain software suites are more conducive to meeting some of these goals. Kofax Capture, for example, has a ‘big blue scanning button’ and is quite easy for end users. However, Kofax Capture has a very cumbersome installation that can be difficult to deploy. Another product I deal with is ILINX Capture. ILINX Capture has a straightforward interface that is easy to train end users on. It is also a web-based solution, so deployment is far simpler than for a standard client/server application. Determining which product fits the needs of the scanning solution can be the first hurdle to clear in the creation of a capture system.

Along the same lines as ‘ease of use’ is the idea of the metadata schema. For any ECM system the old adage ‘garbage in, garbage out’ applies. When capturing documents, it is best to make the association of the metadata (often referred to as indexes) to the image as simple as possible. Things such as pick-lists, database lookups, and programmatically enforced business rules come into play here. The goal in capturing the metadata is to make the documents easy to retrieve down the line, so anything that can be done to enforce quality data is beneficial. The goal of cost savings must be kept in mind here as well, because the time spent ‘indexing’ a document is a system cost that must be minimized as much as possible.
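To make the idea of pick-lists and programmatically enforced business rules concrete, here is a minimal Python sketch. It is not taken from Kofax Capture or ILINX Capture; the field names, the pick-list values, and the invoice number format are all hypothetical and only illustrate the kind of check a capture application might run before a document is released.

```python
import re

# Hypothetical pick-list: the only document types an operator may choose.
DOCUMENT_TYPES = {"Invoice", "Purchase Order", "Contract"}

# Hypothetical business rule: invoice numbers look like INV-000123.
INVOICE_NUMBER_PATTERN = re.compile(r"INV-\d{6}")


def validate_index(fields):
    """Return a list of problems found in one document's index fields.

    An empty list means the metadata passed every check.
    """
    errors = []

    # Pick-list check: reject free-typed values that are not on the list.
    doc_type = fields.get("document_type", "")
    if doc_type not in DOCUMENT_TYPES:
        errors.append("document_type must be one of: "
                      + ", ".join(sorted(DOCUMENT_TYPES)))

    # Business rule: invoices need a well-formed invoice number.
    invoice_number = fields.get("invoice_number", "")
    if doc_type == "Invoice" and not INVOICE_NUMBER_PATTERN.fullmatch(invoice_number):
        errors.append("invoice_number must look like INV-000123")

    return errors


if __name__ == "__main__":
    print(validate_index({"document_type": "Invoice", "invoice_number": "INV-000123"}))
    print(validate_index({"document_type": "Invoice", "invoice_number": "123"}))
```

A database lookup would slot into the same function, for example by confirming that a vendor ID exists before accepting it. Catching these problems at index time is generally cheaper than discovering at retrieval time that a document cannot be found.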

When dealing with intelligent capture solutions, image quality becomes even more important. In standard archival scanning a 200 DPI image should suffice; when trying to extract data and separate documents automatically, 300 DPI or higher should be used. Think of it like this: just because you can read it doesn’t mean a computer can. The more data an OCR engine has to work with, the better the results. An increase in DPI does slow down the capture process: scanners run slower, images take longer to process, and the storage space required on the back end increases. These are all things to consider.
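The storage side of that trade-off is easy to estimate. The sketch below assumes a US Letter page (8.5 x 11 inches) scanned in black and white at 1 bit per pixel, and it computes uncompressed sizes only; those assumptions are mine, and real systems normally store compressed images, so actual file sizes will be smaller. What carries over is the ratio: pixel count grows with the square of the DPI.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Estimate the uncompressed size of one scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed page size: US Letter, black and white (1 bit per pixel).
    size_200 = uncompressed_bytes(8.5, 11, 200)
    size_300 = uncompressed_bytes(8.5, 11, 300)

    print(size_200)             # 467500
    print(size_300)             # 1051875
    print(size_300 / size_200)  # 2.25
```

Going from 200 to 300 DPI is a 1.5x increase in resolution but a 2.25x increase in raw pixel data per page, which is why scan speed, processing time, and storage all feel the change.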

In the end, any ECM solution should save money and increase efficiency. Spending a little extra time designing the capture portion with all the project goals in mind should pay dividends. A few of the things to keep in mind are:

  • Ease of use
  • Quality of metadata
  • Quality of image

Many of the breakout sessions at Nexus ‘09 will touch on these points and many more. In her breakout session, Michelle Semple will discuss the value of in-process capture and the benefits of a web-based capture solution like ILINX Capture. In her breakout session, Sophia Marchi will discuss production scanning, some of the pain points involved, and how to maximize productivity. There are many more sessions at Nexus ‘09 that will touch on document capture and how to effectively implement new systems or improve existing ones.

John Linehan
ImageSource, Inc.