College Transcript Processing
College Transcript Processing refers to converting a paper-based transcript into an electronic transcript using software that runs OCR on the scanned paper version, locates specific data within the transcript, and saves that data for later use. The reason for processing a transcript with software rather than having a data entry specialist key it in manually is to speed up the transfer of data into another system for storage and retrieval. This is a somewhat difficult task for the following reasons:
- Every college presents similar data in a very different format.
- Almost all colleges attempt to prevent copying of the paper transcript through various copy protection methods. Most of these methods render the data on the transcript almost unreadable.
The data that transcripts have in common falls into several main areas:
- College Identifying Information
- Student Identifying Information
- Session/Course Information
- Previous Colleges Attended Information
- Degrees Awarded Information
The data is similar but not identical from one college transcript to the next. In addition, the layout of a transcript varies greatly between colleges. Session/Course data could take up the entire width of the page for one college, but be formatted as multiple columns for another. Many such variations need to be taken into account when using OCR to find and extract the data.
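To make the five data areas concrete, here is a minimal sketch in Python of a layout-independent record that extracted data could be stored in. This is not part of FlexiCapture; every class and field name below is a hypothetical choice for illustration, and a real transcript would likely need more fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Course:
    code: str                       # e.g. "ENG 101" (hypothetical format)
    title: str
    credits: Optional[float] = None  # None when OCR could not read the value
    grade: Optional[str] = None


@dataclass
class Session:
    term: str                       # e.g. "Fall 2009"
    courses: List[Course] = field(default_factory=list)


@dataclass
class Degree:
    name: str
    date_awarded: Optional[str] = None


@dataclass
class Transcript:
    college_name: str               # College Identifying Information
    student_name: str               # Student Identifying Information
    student_id: Optional[str] = None
    sessions: List[Session] = field(default_factory=list)       # Session/Course Information
    previous_colleges: List[str] = field(default_factory=list)  # Previous Colleges Attended
    degrees: List[Degree] = field(default_factory=list)         # Degrees Awarded

    def total_credits(self) -> float:
        """Sum the credits of every course, counting unreadable values as 0."""
        return sum(c.credits or 0.0 for s in self.sessions for c in s.courses)
```

Whether a college prints its courses across the full page width or in several columns, the extracted values end up in the same structure, so the layout differences stay confined to the extraction step.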
So far, the Abbyy FlexiCapture 9.x software has been able to handle most of these issues out of the box. One of its most powerful features, I am finding, is its scripting support: rule scripts, custom scripts, and export scripts can correct OCR issues and assist the Verification Operator, improving efficiency and throughput.
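As an illustration of the kind of correction such a script might make, here is a minimal sketch in Python rather than in the FlexiCapture scripting language. It assumes a credit-hours field where OCR tends to confuse letters with digits, and a grade field that should match a known list. The function names, the substitution table, and the grade list are all assumptions made for this example.

```python
import re
from typing import Optional, Tuple

# Letters that OCR commonly confuses with digits (assumed list, for numeric fields only).
_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Hypothetical set of grades accepted without operator review.
VALID_GRADES = {
    "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F", "W", "P", "I",
}


def clean_credits(raw: str) -> Optional[float]:
    """Repair common OCR confusions in a credit-hours value.

    Returns the value as a float, or None if it still does not look numeric.
    """
    text = raw.strip().translate(_DIGIT_FIXES).replace(",", ".")
    if not re.fullmatch(r"\d{1,2}(\.\d{1,2})?", text):
        return None
    return float(text)


def check_grade(raw: str) -> Tuple[str, bool]:
    """Normalize a grade and report whether it is in the accepted list."""
    grade = raw.strip().upper().replace(" ", "")
    return grade, grade in VALID_GRADES
```

A value that comes back as `None`, or a grade flagged as not valid, is one the Verification Operator would be asked to review, while clean values pass through untouched.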
Rule scripts, custom scripts, and export scripts can be written in VBScript or JScript. There is some documentation on the Abbyy classes and objects, but not a whole lot. Most of what I have done has come from trial and error or, in specific cases, from examples provided by Tech Support. Even so, the scripts developed so far work well for correcting OCR issues and for running automated checks on extracted field data. Custom scripts can even run a database lookup on extracted data and return other fields from the database, which helps in building a complete set of validated information.
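The database lookup idea can be sketched as follows. This is a minimal Python example using the standard sqlite3 module, not the FlexiCapture scripting API. The `colleges` table, its columns, and the rows in it are invented for the demonstration; a real lookup would run against whatever reference database is available.

```python
import sqlite3
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Uppercase and collapse whitespace so small OCR spacing errors still match."""
    return " ".join(name.upper().split())


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory reference table filled with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE colleges (code TEXT PRIMARY KEY, name TEXT, city TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO colleges VALUES (?, ?, ?, ?)",
        [
            ("C001", "EXAMPLE STATE UNIVERSITY", "Springfield", "IL"),
            ("C002", "SAMPLE COMMUNITY COLLEGE", "Riverton", "WY"),
        ],
    )
    return conn


def lookup_college(conn: sqlite3.Connection, extracted_name: str) -> Optional[Dict[str, str]]:
    """Return the reference record for an extracted college name, or None if absent."""
    row = conn.execute(
        "SELECT code, name, city, state FROM colleges WHERE name = ?",
        (normalize(extracted_name),),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("code", "name", "city", "state"), row))
```

When the extracted name matches a row, the remaining fields (code, city, state) can be filled in from the database instead of relying on OCR; when it does not match, the field can be flagged for the Verification Operator.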
This has been a learning experience, but it is proving to be well worth the effort: getting the data off the paper and into the system used to evaluate a student for enrollment now takes fewer man-hours than it did under the old manual data entry.