KTM TDS Model Building

Are you tired of separator sheets?  Tired of wasting paper and spending countless hours flipping through pages to insert a barcode sheet at the start of each new document, only to pull it back out after the batch is scanned or leave it in and have more paper to store?  Why not have the computer do the work for you?  That’s the idea behind the Project Planner module in KTM.  The standard separation functionality built into KTM works very well on structured and semi-structured documents, but when you have more complex separation rules, the Project Planner component of KTM is what you need.

This program is designed to create a template or “model” for the automatic separation which is then used by the KTM Server during the normal batch workflow.  This is why you might also hear this process referred to as “model building.” I want to give you a brief look at the setup of a TDS (Trainable Document Separation) model and how to integrate it with a KTM project.

The first thing that you need to do is collect lots and lots of samples.  The program requires that each class or document type have at least 50 samples.  Each document needs to be in TIFF format and have its own folder.  Furthermore, multi-page documents should be split into single-page TIFF images, with all the pages of one document placed together in that document's folder.  The next step is to take your collection of document folders and group them into folders by document type (these will become the classes in your KTM project).  This is a very time-consuming process, but it pays off at import time, when Project Planner can use the folder structure to create the classes and assign the documents for you.
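
Before importing, it can help to sanity-check the sample tree.  The following is a minimal sketch in Python, assuming a layout of class folder, then document folder, then single-page TIFF files.  The function names, the MIN_SAMPLES constant, and the folder layout are my own illustration and are not part of KTM or Project Planner.

```python
from pathlib import Path

MIN_SAMPLES = 50  # minimum documents per class, per the requirement above
TIFF_SUFFIXES = {".tif", ".tiff"}


def count_documents_per_class(root):
    """Return {class_name: number_of_document_folders_containing_tiff_pages}.

    Assumed (hypothetical) layout: root/<class>/<document>/<page>.tif
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        docs = 0
        for doc_dir in class_dir.iterdir():
            if not doc_dir.is_dir():
                continue
            # A document folder only counts if it holds at least one TIFF page.
            has_pages = any(
                p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
                for p in doc_dir.iterdir()
            )
            if has_pages:
                docs += 1
        counts[class_dir.name] = docs
    return counts


def classes_below_minimum(counts, minimum=MIN_SAMPLES):
    """Return the sorted names of classes with fewer than `minimum` documents."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Calling count_documents_per_class on the top-level sample folder and passing the result to classes_below_minimum lists the classes that still need more samples before you start the import.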

Another thing that you should be aware of is that Project Planner requires an additional license and is not installed by the normal KTM install.  After the standard KTM install, you can find the Project Planner setup.exe on the install media under the “Kofax Transformation Modules” folder and then “Project Planner.”

After you have Project Planner installed and have created a new project, you need to import the documents that you just sorted.  During the import, there is a handy tool that lets you select which folder level marks the separation between classes, between documents, and between pages.  This allows the system to automatically create the classes and import the appropriate documents into each class.

After the files are imported you will see the classes automatically created for this model.  The next step is to run all of the documents through the OCR engine in order for the system to be able to read the documents.  This process can take hours for larger sample sets so there have been times that I just let it run overnight.

When all the documents have completed running through the OCR engine, you can begin the cleanup process.  This is simply a matter of confirming whether or not each document belongs to a class by checking the checkbox on that document.  You only need to confirm enough documents for the system to be confident in the classification based on the samples provided.  As you can see from the screenshot, the bar across the bottom is color-coded to show the confidence for a particular class.

Blue marks the documents that you have confirmed, green means confident, and red means unconfident.  As documents are confirmed, the red bar will get smaller and eventually go away.  The cleanup process is complete when enough documents have been confirmed in each document class that all of the red is gone.
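
The completion rule above can be written down as a small check.  This is only an illustration of the rule in Python, not something Project Planner exposes; the status labels and function names are hypothetical, and the statuses would have to be noted by hand from the screen.

```python
from collections import Counter

# Status labels mirror the bar colors described above (illustrative names only).
CONFIRMED, CONFIDENT, UNCONFIDENT = "blue", "green", "red"


def classes_needing_review(statuses_by_class):
    """Return sorted class names that still contain unconfident (red) documents."""
    return sorted(
        name
        for name, statuses in statuses_by_class.items()
        if Counter(statuses)[UNCONFIDENT] > 0
    )


def cleanup_complete(statuses_by_class):
    """Cleanup is done when no class has any red documents left."""
    return not classes_needing_review(statuses_by_class)
```

For example, a class with statuses blue, green, and red still needs review, while a class with only blue and green documents is finished.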

The next step is to compile the information into a TDS model that the KTM project can use for separation.  This is done by creating two files in Project Planner.  The first is a classification file, or the mod file, which the system uses to determine what class each page belongs to.  You can use either text classification or image classification.  The second file that needs to be compiled is the document separation file, or the ads file.  This allows the KTM Server to use the training provided in the cleanup step to know where to separate each document.

The final step is to link the model to your KTM project.  Open your project in Project Builder and go to the project settings.  On the Document Separation tab, one of the options is “Trainable Document Separation (TDS)”.  Select this and browse to the folder containing the mod and ads files.
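
If the import complains, a quick way to rule out a wrong folder is to confirm that both compiled files are actually there.  Here is a minimal Python sketch; it assumes the two files carry .mod and .ads extensions (inferred from the names "mod file" and "ads file"), and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: the classification and separation files end in .mod and .ads.
REQUIRED_SUFFIXES = (".mod", ".ads")


def missing_model_files(folder):
    """Return the required suffixes that have no matching file in `folder`."""
    present = {p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()}
    return [suffix for suffix in REQUIRED_SUFFIXES if suffix not in present]
```

An empty list means both files were found; otherwise the list names the extension that is missing from the folder you are about to browse to.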

When you click OK you should get a message that tells you that “The TDS project was successfully imported.  New classes were created according to the definition of the document separation model.”  If not already there, classes will be automatically created in your project.  You’re now ready to synchronize the project within Kofax Administration and publish the batch class.

In summary, by combining KTM with a TDS model you will save time and money by reducing the amount of document preparation required when scanning.  For example, in a recent install I worked with a company that had a whole room of employees (about 20) doing manual separation.  We installed KTM and used the TDS model for separation, and now they have only 4-5 people handling the same volume of documents in less time.  This is a very powerful tool that I would suggest to anyone who has a need for automatic separation of semi-structured and unstructured documents.

Brandon Konen
Systems Engineer
ImageSource Inc.

