Census – Choosing the Right Paper-Based Data Capture Method

Population size is a major factor dictating the scale of a census project, but the specific census data required and the method used to capture it will influence costs, schedule and quality to a much greater degree.

Census projects often rely on a single data capture method, and each method has inherent advantages and disadvantages. Choosing only one method can limit the benefits, impose unwelcome compromises or extra costs, or both.

With almost four decades of practical experience in large-scale data capture, DRS believe that there is no single ‘best’ method.  We advocate combining methods to avoid compromise and most effectively meet cost, schedule and quality needs.  Our products and supporting services reflect this stance.

Review of paper-based data capture methods

Some countries choose manual data entry, while others prefer to scan census forms and capture the data with OMR or ICR. DRS’ focus is on flexibility and efficiency when capturing accurate information from paper forms.

Manual data entry

Operators key data directly into a computer from completed census forms.  Some data entry can be computer-assisted with operators selecting predefined options.

Average rates vary from 5,000 to 10,000 keystrokes per hour. In the 2000 Papua New Guinea census, this equated to an average of 15 forms per hour for each operator.
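These rates can be turned into a rough staffing estimate. The sketch below is a back-of-the-envelope calculation, assuming the 15 forms per operator-hour reported for Papua New Guinea; the form count, keying hours per day and number of working days are hypothetical inputs, not figures from any census. If operators worked within the quoted keystroke range, 15 forms per hour would imply roughly 333 to 667 keystrokes per form.

```python
import math


def keystrokes_per_form(keystrokes_per_hour: float, forms_per_hour: float) -> float:
    """Implied keying effort per form for a given operator rate."""
    return keystrokes_per_hour / forms_per_hour


def operators_needed(total_forms: int, forms_per_operator_hour: float,
                     hours_per_day: float, working_days: int) -> int:
    """Minimum number of operators needed to key every form once in the window.

    Ignores double keying, absences and supervisors, all of which add staff.
    """
    operator_hours = total_forms / forms_per_operator_hour
    return math.ceil(operator_hours / (hours_per_day * working_days))


low = keystrokes_per_form(5_000, 15)
high = keystrokes_per_form(10_000, 15)

# Hypothetical census: 1,000,000 forms, 6 keying hours a day, 120 working days.
staff = operators_needed(1_000_000, 15, 6, 120)
print(f"{low:.0f}-{high:.0f} keystrokes per form, {staff} operators")
```

Doubling the keying effort for quality control, as described below, roughly doubles the operator count in this estimate.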

advantages of manual data entry
  • local staff can be employed for the clerical and supervisory roles
  • no page separation, scanning or related hardware is needed
  • low software costs (e.g. CSPro, which allows direct data entry, is license-free)
  • disk storage requirements are relatively low – only captured data are stored
disadvantages of manual data entry
  • many staff are required, including data entry clerks, supervisors and IT managers
  • quality control involves costly ‘double keying’ and/or ‘sampling’ techniques
  • staff need to be motivated to maintain high throughput volumes over a long period of time
  • there are considerable logistical challenges: censuses can require several hundred tonnes of paper 
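In ‘double keying’, two operators key the same form independently and the two results are compared field by field. Only disagreements need a supervisor’s attention. The following is a minimal sketch, assuming each keying is held as a mapping from field name to keyed text; the field names, form ID and adjudication step are hypothetical and not taken from any particular census system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Discrepancy:
    form_id: str
    field: str
    first: str
    second: str


def compare_double_keyed(form_id: str, first_pass: dict[str, str],
                         second_pass: dict[str, str]) -> list[Discrepancy]:
    """Return the fields where two independent keyings of one form disagree.

    A field missing from one pass counts as a disagreement. Agreed fields are
    accepted; disagreements go to a supervisor to check against the form.
    """
    discrepancies = []
    for field in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(field, "")
        b = second_pass.get(field, "")
        if a.strip() != b.strip():
            discrepancies.append(Discrepancy(form_id, field, a, b))
    return discrepancies


first = {"age": "34", "sex": "2", "district": "0412"}
second = {"age": "43", "sex": "2", "district": "0412"}
issues = compare_double_keyed("F-000123", first, second)
for d in issues:
    print(f"{d.form_id} {d.field}: '{d.first}' vs '{d.second}'")
```

Here only the transposed age is flagged. The cost noted above comes from the second keying itself: every form is keyed twice.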

Key From Image (KFI)

KFI involves separating the census forms into single pages, scanning the pages on high-speed scanners and storing a digital image of each page in a central repository.  Data entry operators then key the information from the images into the census system.

advantages of KFI
  • the advantages of manual data entry still apply, except that page separation and scanning are now required
  • quality control systems, like those for manual data entry, can be deployed much more effectively
  • costs for double entry and sampling are greatly reduced, as it is far easier to route digital images around a workflow than paper forms
  • optional expansion to manage backlogs using, for example, specialist off-shore agencies
  • a full digital archive of all census forms is available at the end of the project
disadvantages of KFI
  • paper forms must be stored until all images have been keyed, so that problem forms can be re-scanned
  • a sophisticated computer network with significant storage space is required
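With images in a central repository, sampling-based quality control amounts to drawing a random sample of keyed images, having verifiers re-key them, and estimating the field error rate from the disagreements. The sketch below assumes simple random sampling and uses the Wilson score interval, one standard interval for a proportion; the 2% sampling rate, image IDs and error counts are hypothetical.

```python
import math
import random


def sample_for_verification(image_ids: list, rate: float, seed=None) -> list:
    """Draw a simple random sample of keyed images to be re-keyed by verifiers."""
    rng = random.Random(seed)
    k = max(1, round(len(image_ids) * rate))
    return rng.sample(image_ids, k)


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for the field error rate."""
    if n == 0:
        raise ValueError("no fields verified")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical batch of 20,000 keyed images, 2% of them re-keyed.
images = [f"IMG-{i:06d}" for i in range(20_000)]
sample = sample_for_verification(images, 0.02, seed=1)

# Suppose verifiers checked 4,000 fields in the sample and found 12 keying errors.
low, high = wilson_interval(errors=12, n=4_000)
print(len(sample), f"{low:.3%} to {high:.3%}")
```

If the upper bound exceeds an agreed error tolerance, the batch (or the operator) can be sent back for fuller checking; that policy is a project decision, not part of the sketch.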

Optical Mark Recognition (OMR)

OMR requires specially designed and printed forms with tick-box response areas marked out for each question. The forms are scanned on OMR scanners, which recognise the marks and immediately generate accurate data output. No images are captured or retained.

advantages of OMR
  • high accuracy due to the real-time, automatic quality checks during scanning
  • high speed: the DRS PS900 OMR scanner is capable of scanning over 8,500 forms per hour
  • data capture and validation costs are well-defined and predictable
  • problem forms are out-sorted at source, not discovered later in the process
disadvantages of OMR
  • forms require accurate printing on high-grade paper
  • OMR forms are designed for machine recognition and may not easily accommodate creative form designs
  • tick-box responses are not best suited to all types of data capture (e.g. names)
  • enumerator training may be required to ensure accurate form completion
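OMR scanners perform recognition inside the scanner, so the following is only an illustration of the idea, not a description of how the DRS PS900 works. Each tick box yields a darkness reading; a question reads cleanly only when exactly one box is clearly marked, and a blank, faint or multiple mark causes the whole form to be out-sorted. The thresholds and question names are hypothetical.

```python
def read_single_response(darkness, mark_threshold=0.5, faint_margin=0.15):
    """Decode one single-answer question from box darkness readings.

    darkness: one value per box, from 0.0 (blank) to 1.0 (fully filled).
    Returns (option_index, None) for a clean read, else (None, reason).
    """
    marked = [i for i, d in enumerate(darkness) if d >= mark_threshold]
    if len(marked) == 1:
        return marked[0], None
    if len(marked) > 1:
        return None, "multiple marks"
    if any(d >= mark_threshold - faint_margin for d in darkness):
        return None, "faint mark"
    return None, "blank"


def process_form(questions):
    """Read every question; any problem out-sorts the whole form at source."""
    answers, problems = {}, {}
    for name, darkness in questions.items():
        value, reason = read_single_response(darkness)
        if reason is None:
            answers[name] = value
        else:
            problems[name] = reason
    status = "out-sort" if problems else "accept"
    return status, answers, problems


status, answers, problems = process_form({
    "sex": [0.04, 0.88],
    "marital_status": [0.05, 0.71, 0.66, 0.02],
})
print(status, answers, problems)
```

Because the check happens while the form is in the scanner, the problem form is diverted immediately rather than discovered later in the process.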

Intelligent Character Recognition (ICR)

For ICR, forms are scanned and images are captured in the same way as for KFI.  These images are then interpreted by ICR software to recognise handwritten text in predefined response boxes.  An automated workflow supports the character correction and field validation processes.
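A correction and validation workflow of this kind can be sketched as a routing rule: a field whose recognised characters fall below a confidence threshold goes to character correction, a cleanly read field that breaks an edit rule goes to field validation, and everything else is accepted. This assumes the ICR engine returns a confidence score per character; the 0.90 threshold, the field names and the edit rules are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RecognisedField:
    name: str
    text: str                 # characters returned by the ICR engine
    confidences: list[float]  # one score per character, 0.0 to 1.0


# Hypothetical edit rules; a real census defines its own.
VALIDATORS: dict[str, Callable[[str], bool]] = {
    "age": lambda s: s.isdigit() and int(s) <= 120,
    "household_size": lambda s: s.isdigit() and 1 <= int(s) <= 50,
}


def route_field(f: RecognisedField, min_confidence: float = 0.90) -> str:
    """Decide which queue a recognised field is sent to."""
    if not f.text or min(f.confidences, default=0.0) < min_confidence:
        # An operator checks the doubtful characters against the image.
        return "character correction"
    rule = VALIDATORS.get(f.name)
    if rule is not None and not rule(f.text):
        # Characters read cleanly, but the value is implausible.
        return "field validation"
    return "accepted"


for f in [
    RecognisedField("age", "34", [0.98, 0.95]),
    RecognisedField("age", "184", [0.97, 0.93, 0.96]),
    RecognisedField("surname", "BROWN", [0.95, 0.62, 0.91, 0.97, 0.94]),
]:
    print(f.name, f.text, "->", route_field(f))
```

Raising the threshold sends more fields to operators but lets fewer misreads through; the right balance depends on form content and handwriting quality.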

advantages of ICR
  • forms designed for ICR processing are easy to complete
  • forms can be printed locally as the print quality does not need to be as high as for OMR
  • ICR works very well with numeric characters and comparatively well with alphanumeric ones
  • more varied information can be electronically captured, compared to OMR
disadvantages of ICR
  • ICR software cannot automatically recognise all handwriting
  • more clerical staff are required than for OMR, although fewer than for KFI
  • a correction workflow is needed, sensitive to form content and handwriting quality
  • ICR software and the necessary computer infrastructure are needed
  • IT staff will need to be trained to support the ICR system

The value of combining OMR & ICR

Capturing OMR and ICR together, sometimes referred to as Image Mark Recognition (IMR), requires a scanner that can perform OMR data capture and imaging at the same time. Forms are designed to the same quality standards as OMR forms, but response areas present a mixture of OMR and ICR fields.

Where a simple answer is required, such as a short number or ‘yes/no’, OMR is used. More complex responses such as names, addresses and large numbers are captured using ICR response areas.

In a single pass, OMR data are immediately and accurately recognised and captured while images are saved to a central repository.  OMR data are ready for immediate export while images go to a separate workflow for capture of ICR data.

Optionally, images can be used for direct keying (KFI) of selected fields.

The OMR quality checks ensure that stored images are fit for purpose, which greatly reduces downstream error handling. If a form fails the quality checks, it is out-sorted for manual entry, so poor images never enter the workflow. This is a significant improvement over pure image scanners, which save only a picture with no means of testing its validity.
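The single-pass split described above can be sketched as a dispatcher. Each field is assigned a capture method when the form is designed; a form that fails the scanner’s OMR checks is out-sorted for manual entry without its image entering the workflow, while a form that passes has its OMR values exported straight away and its image queued for ICR or KFI. The layout, field names, image path and queue names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical form layout: capture method chosen per field at design time.
FORM_LAYOUT = {
    "sex": "omr",
    "age": "omr",
    "household_size": "omr",
    "surname": "icr",
    "address": "icr",
    "occupation": "kfi",
}


@dataclass
class ScanResult:
    form_id: str
    passed_omr_checks: bool
    omr_values: dict[str, str]
    image_ref: str | None  # location of the stored page image, if any


def dispatch(scan: ScanResult, layout: dict[str, str] = FORM_LAYOUT) -> dict[str, list]:
    """Split one scanned form into the downstream work queues."""
    if not scan.passed_omr_checks:
        # Out-sorted at the scanner: no image is admitted to the ICR/KFI workflow.
        return {"manual_entry": [scan.form_id]}
    queues: dict[str, list] = {"export": [], "icr": [], "kfi": []}
    for field_name, method in layout.items():
        if method == "omr":
            # OMR values are already captured and can be exported immediately.
            queues["export"].append((scan.form_id, field_name, scan.omr_values[field_name]))
        else:
            # ICR and KFI fields wait in their own workflow, keyed by image.
            queues[method].append((scan.form_id, field_name, scan.image_ref))
    return queues


ok = ScanResult("F-0001", True, {"sex": "2", "age": "34", "household_size": "5"},
                "images/F-0001.tif")
bad = ScanResult("F-0002", False, {}, None)
print(dispatch(ok))
print(dispatch(bad))
```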

advantages of combining OMR & ICR
  • the accuracy and speed of OMR can be combined with the flexibility of ICR and/or KFI; OMR techniques add significant quality to the ICR workflow
  • the compromises are limited – almost any type of data can be captured
  • reduced correction workloads by using ICR only when required
  • ability to ‘fast track’ selected fields, leaving more complex and/or less important data to be processed later (e.g. early reporting of household totals, while leaving ‘occupation’ to be provided later)
disadvantages of combining OMR & ICR
  • scanners are needed which can process OMR and ICR data at the same time   
  • greater care and planning are needed for the form design
  • multiple workflows are needed to manage OMR, ICR and KFI
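The ‘fast track’ advantage follows from the split between workflows: fields captured by OMR are exported as soon as forms are scanned, so provisional figures can be computed before slower ICR or KFI fields such as ‘occupation’ are finished. A minimal sketch, assuming each exported OMR record carries hypothetical `district` and `household_size` fields:

```python
from collections import defaultdict


def provisional_totals(omr_rows):
    """Household and person counts per district from OMR-exported fields only.

    Fields captured later through ICR or KFI are not needed, so these figures
    can be reported before that workflow finishes.
    """
    totals = defaultdict(lambda: {"households": 0, "persons": 0})
    for row in omr_rows:
        t = totals[row["district"]]
        t["households"] += 1
        t["persons"] += int(row["household_size"])
    return dict(totals)


rows = [
    {"district": "01", "household_size": "5"},
    {"district": "01", "household_size": "3"},
    {"district": "02", "household_size": "7"},
]
print(provisional_totals(rows))
```

These figures are provisional: forms out-sorted for manual entry join the totals only once they have been keyed.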

Conclusions

Combining data capture methods has been shown to deliver significant benefits in capturing census data from paper-based forms accurately and efficiently.

Combining OMR and ICR methods limits the compromises inherent in using only one of the methods, while offering a greater range of solutions potentially suitable to local requirements and environments.

DRS appreciate that no single data capture method supports all the needs of a complex project such as a census.  The DRS PS960 Image Mark Recognition scanner has been designed to combine OMR and ICR technologies and has been proven to deliver cost-effective, accurate and efficient results for census projects around the world.

For more information about our work in census, see our Census case studies.