Optical Mark Recognition (OMR) systems have been studied since 1970, and OMR is widely accepted as a data entry technique. OMR technology is used for surveys and multiple-choice questionnaires. Because of its ease of use, OMR has grown in popularity over the past two decades and is widely used in universities and colleges to grade student responses to questionnaires automatically. The accuracy of OMR systems is very important because of the environment in which they are used. Existing OMR algorithms rely on pixel projection or the Hough transform to determine the marked answers in a document; these techniques rely on majority voting to approximate a predetermined shape. The performance of these systems depends on precise input from dedicated hardware. Printing and scanning OMR sheets introduces artifacts that make sheet processing error-prone, which is a fundamental limitation of traditional pixel projection and Hough transform techniques. Different types of artifacts affect accuracy in different ways. We classified the types of errors and their frequencies according to the artifacts in the OMR system. As our major contribution, we propose an improved algorithm that fixes errors due to skew. Our proposal uses the Hough transform to improve the accuracy of skew correction in OMR documents. As a minor contribution, it also improves the accuracy of detecting marks in OMR documents. The results show an improvement in accuracy over existing algorithms for each of the identified problems. This improvement increases confidence in OMR document processing and the efficiency of automated OMR document processing.
Optical Mark Recognition (OMR) is a technique for automatically reading user input in surveys, questionnaires, interviews and examination systems. It is also called "mark sensing", because users' marks on paper are sensed automatically by counting the black pixels inside each circle on the document. Being automated, it is much faster, more accurate and more time efficient than manual evaluation of user input. An OMR system is very important for an organization, and adopting it greatly increases the efficiency and accuracy of the staff [
Mark recognition technology has been studied since the 1950s. In early techniques, users filled circles on a document with graphite pencils, and sensing brushes detected the graphite particles in each area. An improvement came in the 1970s, when dedicated OMR machines were designed to project light onto special sheets [
As technology improved, OMR became possible with an ordinary image scanner and desktop PC: the scanner scans the OMR sheet and converts it into a digital image [
After scanning with an optical scanner, OMR sheets need some preprocessing, which involves skew correction, deviation removal and noise removal. Image processing algorithms are then applied to the document to identify filled and empty circles. In this method, the OMR template can be created with ordinary word processing software such as Microsoft Word (Windows), Pages (Mac OS X) or LibreOffice Writer (Linux), or with an image editing program such as Adobe Photoshop or GIMP. This technique is very flexible: changing the template does not require any change in the hardware, so it is easy to design a custom template and use it for OMR.
A digital image is a grid of pixels. The pixel projection method was introduced in 1999 to detect user input in OMR using an ordinary scanner and a desktop PC [
Another technique registers the image of the filled OMR sheet against the image of the correct answer sheet [
OMR can be implemented using a phone camera instead of optical image scanners [
A framework for OMR was implemented using an Octave script [
OpenCV (an open-source image processing library) was used to extract ROIs (Regions of Interest) [
A Field Programmable Gate Array (FPGA) has been used to improve the speed of OMR [
The issues that arise in OMR and how they affect its accuracy are discussed in the literature [
In this work, we address the following problems. How can the skew angle be detected when information is missing from an OMR sheet? Information may be missing because the sheet is folded, torn or poorly printed, or because of artifacts introduced by mishandling the paper during printing and scanning. How can skew be corrected in that case? How can distorted circles be detected, and how can the error rate be reduced for sheets with missing information? We compare related works and their limitations in
Year | Proposed method | Limitations |
---|---|---|
2021 [ | Their work is mostly based on the Canny edge detector method [ | Their application can only correct smudged or deformed circles; it does not correct errors introduced during printing and scanning. The claimed accuracy is about 99.24%. |
2019 [ | A quadrilateral transformation is performed, using a reference document as a guide for the transformation algorithm. | The authors claim an error rate of 1.25%. The method may be hindered by failed or poor-quality printing, which increases the error rate. The authors state that the method can handle skew of at most 20 degrees, translations of at most 100 pixels, and scaling of at most 10% of the total image size. |
2019 [ | A bold vertical line is included on the OMR sheet to perform skew correction; the work focuses on speeding up skew detection and correction. | A bold vertical bar must be added to the template, so the method cannot be used on sheets printed without it. The authors optimize the speed of the process without regard to its accuracy. The whole process depends on this line: if it is damaged by an artifact during printing or scanning, the whole sheet becomes unreadable. |
2019 [ | A tracing pattern on the margins of the OMR sheet is used, together with image processing techniques that extract the required information from the image. | If the tracing line is damaged during printing or scanning, the document cannot be read by the software. Translation and scaling issues are not discussed. |
2018 [ | A new method for detecting filled circles is introduced; artifacts introduced during printing and scanning are also addressed. | Skew detection and correction, scaling and translation are not discussed; the paper focuses only on detecting the mark inside the bubble. Skew detection and correction is the very first step in processing a sheet accurately, and a skewed sheet will have poor accuracy unless its skew angle is corrected by an algorithm. |
2020 [ | The Canny edge detector is used to find contours in the OMR image. | The accuracy of the method is unclear, since the paper does not report the accuracy of its experiments. It also does not discuss how to handle skew, translation, scaling, or artifacts introduced while printing the OMR sheets. |
This paper is organized as follows. Section 1 is the introduction. Section 2 explains the complete process of an OMR system and the possible causes of issues in it; Subsections 2.6 and 2.7 explain the two most popular OMR methods in the literature, which appear there with minor variations. Section 3 explains the proposed system, and Section 4 discusses the experiments and their results.
In this section, we discuss the complete process of an OMR system in detail. The following steps are involved in an OMR process.
In the first step, the OMR template is created with word processing software or an image editing program such as Adobe Photoshop or GIMP, and the OMR sheets are printed on a printing press or a printer. A basic requirement of OMR templates in the literature is a special shape at each of the four corners of the document, such as an angle bracket, a solid square, a thick circle, or a special pattern. A dotted or thick border can be used instead of the four corner shapes. These corner points are used in stage D, i.e., during the processing of images. A bubble is usually 30 pixels in diameter, although this is not mandatory.
We noticed two types of issues when printing OMR sheets at a large scale, which requires special quality control. Normal wear and tear of the printer's mechanical parts and roller mechanism introduces deviations in the printed OMR sheets, whereas current OMR methods in the literature need pixel-perfect printing for an acceptable level of accuracy. When printing at a large scale, the printer's drum may also get dirty and leave noise or lines on the OMR sheet. Another issue we found was misprinted areas on OMR sheets printed at a large scale.
In this step, the printed sheets are distributed to the users, who fill in the marks; in an examination, the circles are filled by students. A student can fill the circles in the following ways.
(1) All circles are filled. (2) Circles are filled only partially. (3) Circles are filled inaccurately, with the mark shifted to the left or right of the circle. (4) All circles are left empty. (5) Circles are filled normally.
After the marks are filled in, the sheets are collected and sent for scanning, where operators scan them into digital images. Because OMR sheets are handled by people throughout the process, they can be folded or torn. Current methods are not able to process such damaged sheets, even when enough information remains on them to read the marks.
In this step, the images are preprocessed. First, the skew angle is detected and corrected. After skew correction, the image is converted to black and white using a threshold value of 127. This conversion reduces the complexity of processing the images and also improves the performance of the algorithm. Other corrections, such as deviation correction and noise removal, are also performed at this stage.
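The thresholding step can be illustrated with a minimal sketch that binarizes a greyscale image stored as a list of rows. The function name is hypothetical, and the boundary convention (values strictly above the threshold become white, everything else black) is chosen to mirror OpenCV's `cv2.threshold` with `THRESH_BINARY`; a real pipeline would call such a library routine on an image array instead of looping in Python.

```python
def binarize(gray, threshold=127):
    """Convert a greyscale image (list of rows of 0..255 ints) to black and white.

    Pixels strictly brighter than `threshold` become white (255); all others,
    including pixels equal to the threshold, become black (0).
    """
    return [[255 if px > threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    sheet = [[250, 30, 128],
             [127, 126, 200]]
    print(binarize(sheet))  # [[255, 0, 255], [0, 0, 255]]
```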
Current methods used in OMR systems cannot identify, and hence cannot correct, the skew angle when the OMR document is worn out or when the sheet is tilted beyond an acceptable angle. If the skew detection mechanism fails on a document, the student's responses cannot be extracted from it. Most researchers have used the skew detection and calculation method shown in
Let
A digital image is a collection of pixels arranged horizontally and vertically in a grid. Each pixel is denoted by an address P (X, Y), where P is the pixel, X is its position in pixels along the X axis, and Y is its position in pixels along the Y axis.
During the processing of an OMR sheet, current methods search the image for anchor points (circles or angle brackets) and use their addresses as starting points for the algorithm. These anchor points are used by the skew detection and correction mechanism.
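Locating an anchor point is a pattern-matching problem. The sketch below is a deliberately naive, pure-Python exhaustive template matcher that scores every placement by the sum of squared differences (SSD); the function name and the toy anchor are hypothetical, and a production system would more likely use an optimized routine such as OpenCV's `cv2.matchTemplate`. It also shows why damaged anchors are a problem: if the anchor is torn off, the function still returns some best-scoring placement, just not the right one.

```python
def find_anchor(image, pattern):
    """Return the top-left (x, y) where `pattern` best matches `image`.

    Both arguments are greyscale images stored as lists of rows. Every placement
    of the pattern is scored by the sum of squared differences (SSD), and the
    placement with the smallest SSD wins. This brute-force search costs
    O(W * H * w * h) for a W x H image and a w x h pattern.
    """
    ph, pw = len(pattern), len(pattern[0])
    best_ssd, best_xy = None, None
    for y in range(len(image) - ph + 1):
        for x in range(len(image[0]) - pw + 1):
            ssd = sum((image[y + j][x + i] - pattern[j][i]) ** 2
                      for j in range(ph) for i in range(pw))
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_xy = ssd, (x, y)
    return best_xy


if __name__ == "__main__":
    # White 10x10 page with a solid 3x3 black "anchor" whose top-left is (4, 2).
    page = [[255] * 10 for _ in range(10)]
    for j in range(3):
        for i in range(3):
            page[2 + j][4 + i] = 0
    print(find_anchor(page, [[0] * 3 for _ in range(3)]))  # (4, 2)
```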
In this step, the algorithm scans the location of each circle in the image to determine the user's actual response. The accuracy of the OMR system depends on this step, i.e., on how accurately the algorithm calculates the position of each circle. If the algorithm scans an area that is supposed to contain a circle but does not, the accuracy of the OMR system is compromised.
The methods used in the literature to calculate the positions of the circles fall into two classes.
The first method uses predetermined positions of the circles. The positions of all circles are stored in a file when a new template is created. Each position is stored as P (X, Y), where X is the number of pixels along the horizontal axis and Y is the number of pixels along the vertical axis. These positions can be stored in an XML (Extensible Markup Language) file, a JSON (JavaScript Object Notation) file, or a database. The position P (X, Y) is relative to an anchor point, in most cases the top-left anchor. For example, a circle at P (10, 15) lies 10 pixels to the right of and 15 pixels below the top-left anchor point.
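A minimal sketch of the predetermined-positions method, assuming a hypothetical JSON template that maps circle names to offsets from the top-left anchor (the key names are illustrative only). Once the anchor has been located in the scan, absolute circle positions are simple additions; note that nothing here compensates for skew or translation, which is why this method is sensitive to distortions of the scan.

```python
import json

# Hypothetical template: circle name -> offset (X, Y) in pixels from the top-left anchor.
TEMPLATE_JSON = '{"mcq1_A": [10, 15], "mcq1_B": [40, 15], "mcq2_A": [10, 45]}'


def absolute_positions(template_json, anchor_xy):
    """Convert anchor-relative P(X, Y) offsets into absolute pixel coordinates.

    anchor_xy is where the top-left anchor was found in the scanned image.
    No skew or translation correction is applied, so any distortion of the
    scan shifts every computed circle position.
    """
    ax, ay = anchor_xy
    offsets = json.loads(template_json)
    return {name: (ax + dx, ay + dy) for name, (dx, dy) in offsets.items()}


if __name__ == "__main__":
    print(absolute_positions(TEMPLATE_JSON, (100, 200)))
    # {'mcq1_A': (110, 215), 'mcq1_B': (140, 215), 'mcq2_A': (110, 245)}
```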
When the circles are processed, the algorithm searches for the four anchor points at the corners of the document using pattern matching techniques. The processing algorithm takes the top-left anchor as the starting point, P (0, 0). The readability of the OMR system therefore depends heavily on finding the anchor points: if the algorithm fails to find an anchor point, the remaining steps cannot be completed. To check the user's response for a circle, the position of that circle is retrieved from the file containing the positions of all circles. Once the position of the circle is retrieved, the algorithm moves its cursor to that position and the algorithm scans
In the second method, the positions of the circles are calculated by the Hough transform algorithm, which is essentially a pattern matching technique. Usually, the whole image is searched for circles 30 pixels in diameter. This method does not require storing predetermined circle positions in a file; all positions found by the Hough transform are kept in memory for further calculations. Once the positions of all circles in the image are retrieved, the algorithm counts the black pixels in each circle, and if this count exceeds a given threshold, the circle is considered filled by the student.
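The circle search in this second method is a Hough transform. Below is a minimal pure-Python sketch of the fixed-radius variant, assuming edge pixels have already been extracted (for example by an edge detector); the function names are hypothetical. In practice one would call a library routine such as OpenCV's `cv2.HoughCircles`, which also searches over a range of radii.

```python
import math
from collections import Counter


def hough_votes(edge_points, radius, angle_steps=360):
    """Fixed-radius circle Hough transform.

    Each edge pixel (x, y) votes for every candidate centre lying `radius`
    pixels away from it. Votes are accumulated per integer (x, y) cell, and a
    pixel votes at most once per cell so that rounding does not inflate counts.
    """
    votes = Counter()
    for x, y in set(edge_points):
        cells = set()
        for k in range(angle_steps):
            t = 2 * math.pi * k / angle_steps
            cells.add((round(x - radius * math.cos(t)),
                       round(y - radius * math.sin(t))))
        votes.update(cells)
    return votes


def strongest_centre(edge_points, radius):
    """Return the accumulator cell with the most votes."""
    return hough_votes(edge_points, radius).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic edge pixels of a 15-pixel-radius bubble centred at (40, 40).
    ring = [(round(40 + 15 * math.cos(math.radians(d))),
             round(40 + 15 * math.sin(math.radians(d)))) for d in range(360)]
    print(strongest_centre(ring, 15))  # should lie within one pixel of (40, 40)
```

Only one circle is returned here; detecting every bubble on a sheet would mean keeping all local maxima above a vote threshold instead of the single strongest cell.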
The user's response for each individual circle is calculated in the previous step. In this step, the circles are grouped into questions. As mentioned before, an MCQ typically requires 4 circles and a digit question needs 10 circles. The user input is then evaluated for each defined question. For MCQs, the user input is compared with the answer key to generate a score automatically for each OMR image. For surveys and votes, graphs are generated, and the results can be stored in a database for future use.
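The grouping and scoring step can be sketched as follows, assuming a hypothetical per-circle result format of `(question, option, is_filled)` tuples. The scoring rule (a question earns a mark only if exactly the correct option is filled, so multiple marks score zero) is our assumption, chosen to be consistent with treating double-filled questions as an error.

```python
from collections import defaultdict


def group_by_question(circle_results):
    """Group per-circle results into questions.

    circle_results: iterable of (question, option, is_filled) tuples.
    Returns {question: set of filled options}; unanswered questions map to set().
    """
    grouped = defaultdict(set)
    for question, option, is_filled in circle_results:
        filled = grouped[question]  # creates an empty set for unanswered questions
        if is_filled:
            filled.add(option)
    return dict(grouped)


def score(answers, answer_key):
    """One mark per MCQ whose filled options are exactly {correct option}."""
    return sum(1 for q, correct in answer_key.items()
               if answers.get(q, set()) == {correct})


if __name__ == "__main__":
    results = [(1, "A", True), (1, "B", False), (2, "B", True), (2, "C", True),
               (3, "A", False), (4, "D", True)]
    answers = group_by_question(results)
    print(score(answers, {1: "A", 2: "B", 3: "C", 4: "C"}))  # 1
```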
When the conventional OMR methods in the literature are used, images with the following errors are either unreadable or produce inaccurate results.
This error occurs when one of the four anchor points is missing: it may be torn off through mishandling of the paper by students or staff in the examination hall, lost through folding of the paper during printing or scanning, or misprinted. An example of E1 is shown in
This error occurs when the image is tilted too much: the algorithms that look for anchor points in the corners of the image fail because the anchor shapes are skewed beyond an acceptable angle, or because the anchor points are lost during scanning due to the skew of the document.
Because of the mechanical nature of a scanner, a scanned document does not match the template pixel for pixel: some pixels are translated horizontally or vertically. The printing mechanism of printers also causes translations in OMR sheets.
Translation of pixels is shown in
Artifacts can be introduced by students, staff, or a faulty printer or scanner. In
Some researchers perform optical mark recognition by finding circles in the document. An overfilled circle loses its shape, i.e., becomes oval, so the circle detection algorithm fails to detect it, which reduces the accuracy of the algorithms in the literature. An example of an overfilled/deformed circle is shown in
This error occurs when a circle is not printed, usually because the printer has a defective drum or low toner. An example of misprinting is shown in
The first step of OMR processing is to detect and correct the skew angle introduced during printing and scanning. Our proposed system ignores the anchor points at the four corners of the document, because if one of them is damaged, the skew angle cannot be found. Instead, the OMR document is passed through the Hough transform algorithm, which in our case identifies the positions of circles with a radius of 15 pixels. Once the positions of all circles in the OMR document are known, an algorithm draws lines through the centers of the circles. The angles of these lines with the X axis allow the algorithm to calculate the skew angle using
Next, our algorithm finds the regions of circles in the OMR document. A region is an area containing questions whose circles are located close to each other; in other words, the circles in a particular area are grouped together. We divided our OMR document into 4 major regions: the first for MCQs 1 to 12, the second for MCQs 13 to 24, the third for the Paper Code, and the fourth for Questions 1 to 10 and Total Marks. Once the regions are defined, the circle positions found by the Hough transform are separated by region. The algorithm also finds the top-left, top-right, bottom-left and bottom-right circles of each region, which mark the boundaries of the region. We call these four circles boundary circles. This is the region detection phase.
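The skew and region detection steps can be sketched on the circle centres returned by the Hough transform. The sketch below is a minimal illustration under stated assumptions, not the paper's equation: it treats the nearest nearly horizontal neighbour to the right of each centre as a same-row circle (the 0.35 slope limit is a hypothetical tuning value), takes the median angle of those connecting lines with the X axis as the skew, rotates points back, and picks a region's boundary circles as the extremes of x + y and x - y. All function names are hypothetical.

```python
import math
from statistics import median


def estimate_skew_deg(centres, max_row_slope=0.35):
    """Estimate the skew angle (degrees) from lines joining circle centres.

    For each centre, the nearest centre to its right whose connecting line is
    nearly horizontal (|dy| <= max_row_slope * dx) is assumed to be in the same
    row. The angle of that line with the X axis is recorded, and the median of
    all recorded angles is returned, which tolerates a few spurious detections.
    """
    angles = []
    for x1, y1 in centres:
        same_row = [(x2, y2) for x2, y2 in centres
                    if x2 > x1 and abs(y2 - y1) <= max_row_slope * (x2 - x1)]
        if same_row:
            x2, y2 = min(same_row, key=lambda p: p[0] - x1)
            angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))
    return median(angles) if angles else 0.0


def rotate(points, angle_deg, origin=(0.0, 0.0)):
    """Rotate points by -angle_deg about origin, undoing a skew of angle_deg."""
    a = math.radians(-angle_deg)
    c, s = math.cos(a), math.sin(a)
    ox, oy = origin
    return [(ox + c * (x - ox) - s * (y - oy), oy + s * (x - ox) + c * (y - oy))
            for x, y in points]


def boundary_circles(centres):
    """Pick a region's top-left, top-right, bottom-left and bottom-right circles.

    In image coordinates (y grows downwards), the extremes of x + y and x - y
    give the four corners of a roughly rectangular, mildly skewed grid.
    """
    return {
        "tl": min(centres, key=lambda p: p[0] + p[1]),
        "br": max(centres, key=lambda p: p[0] + p[1]),
        "tr": max(centres, key=lambda p: p[0] - p[1]),
        "bl": min(centres, key=lambda p: p[0] - p[1]),
    }


if __name__ == "__main__":
    theta = math.radians(3.0)
    grid = [(100 + 30 * c, 200 + 40 * r) for r in range(3) for c in range(4)]
    skewed = [(x * math.cos(theta) - y * math.sin(theta),
               x * math.sin(theta) + y * math.cos(theta)) for x, y in grid]
    est = estimate_skew_deg(skewed)
    print(f"{est:.2f}")  # 3.00
    print(boundary_circles(grid))
```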
The positions calculated for the boundary circles of a region are used as reference points for the other circles in that region, i.e., the positions of the other circles are expressed relative to these four boundary circles. This solves the translation problem for that region. For every circle in the region, the algorithm estimates its position using the JSON/database file created in the first step.
Two situations can then arise, which we handle separately.
If a circle found by the Hough transform lies within the estimated area, which extends ±5 pixels around the estimated position, we take the detected position as the exact position of the circle. If no circle detected by the Hough transform lies within the estimated area, the Hough transform has missed the circle. This may happen because of misprinting, a circle deformed by the student while filling it, or an artifact introduced during printing and scanning. In this case, we use the position estimated by the algorithm from the JSON/database file. Because this estimate is referred to the boundary circles detected by the Hough transform, translation and misprinting issues are also mitigated, which gives a more accurate position for that circle.
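The two situations above can be sketched as a small position resolver. The paper only states that positions are referred to the four boundary circles, so the bilinear interpolation used here to map a template position into the scan is our assumption, as are reading "±5 pixels" as a square window and choosing the nearest detection when several fall inside it. The function names and the `template_box` layout are hypothetical.

```python
def estimate_position(template_xy, template_box, corners):
    """Map a template circle position into the scan by bilinear interpolation.

    template_box: (x_left, y_top, x_right, y_bottom) of the region's boundary
        circles in the template (e.g. read from the JSON/database file).
    corners: detected boundary circles {"tl", "tr", "bl", "br"} -> (x, y).
    """
    x0, y0, x1, y1 = template_box
    u = (template_xy[0] - x0) / (x1 - x0)  # 0 at the left boundary, 1 at the right
    v = (template_xy[1] - y0) / (y1 - y0)  # 0 at the top boundary, 1 at the bottom
    weights = {"tl": (1 - u) * (1 - v), "tr": u * (1 - v),
               "bl": (1 - u) * v, "br": u * v}
    x = sum(w * corners[k][0] for k, w in weights.items())
    y = sum(w * corners[k][1] for k, w in weights.items())
    return x, y


def resolve_position(estimated, detected, tolerance=5):
    """Prefer a Hough-detected circle within +/- tolerance pixels of the estimate.

    Returns (position, source): the nearest detected centre inside the window,
    or the estimate itself when the Hough transform missed the circle.
    """
    ex, ey = estimated
    inside = [(x, y) for x, y in detected
              if abs(x - ex) <= tolerance and abs(y - ey) <= tolerance]
    if inside:
        nearest = min(inside, key=lambda p: (p[0] - ex) ** 2 + (p[1] - ey) ** 2)
        return nearest, "hough"
    return (ex, ey), "estimated"
```

For example, if a region's boundary circles are all shifted by (+7, +3) relative to the template, a template circle at (25, 10) in a region spanning (0, 0) to (100, 50) is estimated at about (32, 13); a Hough detection at (34, 11) would then be accepted, while a detection at (70, 13) would be ignored and the estimate used instead.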
Once the position of each circle is calculated, the algorithm scans the area of each circle to detect whether it has been filled by the student or is empty. For this purpose, it counts the black pixels in the circle. If the number of black pixels exceeds 70 percent of the area of the circle in pixels, the circle is considered filled; otherwise, it is considered empty. Finally, the algorithm matches the student's filled circles against the answer key, and the student's score is calculated and stored in the database.
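The 70 percent rule can be sketched directly on a black-and-white image. This is a minimal illustration, assuming a list-of-rows image with 0 for black and 255 for white, and a default radius of 15 pixels to match the 30-pixel bubbles; the function name is hypothetical.

```python
def is_filled(bw, cx, cy, radius=15, ratio=0.70):
    """Decide whether the circle centred at (cx, cy) is filled.

    bw is a black-and-white image (list of rows, 0 = black, 255 = white).
    The circle's area is the number of in-image pixels whose integer
    coordinates lie within `radius` of (cx, cy); the circle counts as filled
    when more than `ratio` of those pixels are black.
    """
    area = black = 0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            inside_circle = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            inside_image = 0 <= y < len(bw) and 0 <= x < len(bw[0])
            if inside_circle and inside_image:
                area += 1
                if bw[y][x] == 0:
                    black += 1
    return area > 0 and black > ratio * area


if __name__ == "__main__":
    # Left half black: under 50% of the circle is black, so it counts as empty.
    half = [[0 if x < 20 else 255 for x in range(40)] for _ in range(40)]
    print(is_filled(half, 20, 20))  # False
```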
We performed experiments on 86,913 sheets filled in by students during an examination. The OMR documents were scanned with a high-speed Canon imageFORMULA DR-G1100 scanner, which can scan more than 200 pages per minute in greyscale mode. Sheets were fed to the scanner in bundles of about 250 that were deliberately not aligned properly, in order to test the accuracy of the system. To compare accuracy, we processed each document with Algorithm 1, Algorithm 2 and finally our proposed Algorithm 3. The algorithms were implemented with the OpenCV image library and command-line PHP scripts, on Ubuntu Linux running on a 3rd-generation Intel Core i3 CPU with 8 GB of RAM.
An OMR document is marked as invalid if the number of correct MCQs does not equal the MCQ score, or if the sum of Question 1 to Question 10 does not equal Total Marks. In other words, an OMR document is valid only if both of these conditions hold.
The possible manual errors in our experiments are:
(1) the number of correct MCQ answers does not equal Q1; (2) the sum of Q1 to Q10 does not equal Total Marks; (3) two circles are filled for an MCQ or any other question; (4) circles are filled over less than 70% of their area.
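The first three manual-error checks can be sketched as a consistency test on one processed sheet; the fourth is a per-circle decision made by the fill test. Following the first item in the list, we assume Q1 holds the MCQ score; the function name, argument layout and error labels are hypothetical.

```python
def sheet_errors(correct_mcqs, q_marks, total_marks, filled_counts):
    """Return the consistency errors found on one processed sheet ([] = valid).

    correct_mcqs: number of MCQs answered correctly according to the answer key.
    q_marks: marks read for Q1..Q10 in order; Q1 is assumed to hold the MCQ score.
    total_marks: the Total Marks value read from the sheet.
    filled_counts: question label -> number of circles detected as filled.
    """
    errors = []
    if correct_mcqs != q_marks[0]:
        errors.append("correct MCQs != Q1")
    if sum(q_marks) != total_marks:
        errors.append("Q1..Q10 do not sum to Total Marks")
    errors += [f"{label}: {n} circles filled"
               for label, n in filled_counts.items() if n > 1]
    return errors


if __name__ == "__main__":
    print(sheet_errors(8, [8, 2, 0, 0, 0, 0, 0, 0, 0, 0], 10, {"MCQ 1": 1}))  # []
```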
The possible software errors are explained in Section 3. We focus on software errors and therefore ignore manual errors. The detailed results are shown in
Algorithm used | Total documents | Total errors | Percentage of software errors |
---|---|---|---|
OMR using method of predetermined positions of circles | 86,913 | 28,501 | |
OMR using method of Hough transform | 86,913 | 23,122 | |
Proposed algorithm | 86,913 | 17,225 | |
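The software-error column of the table is blank, but the total-error counts can still be turned into rates. The short calculation below does only that; these totals include manual errors, so the resulting percentages are not the software error rates quoted in the conclusion. The constant and function names are hypothetical.

```python
TOTAL_DOCUMENTS = 86_913
TOTAL_ERRORS = {  # counts copied from the table above (manual + software errors)
    "Predetermined positions of circles": 28_501,
    "Hough transform": 23_122,
    "Proposed algorithm": 17_225,
}


def error_rates(errors, total=TOTAL_DOCUMENTS):
    """Percentage of documents flagged as erroneous, rounded to two decimals."""
    return {method: round(100 * n / total, 2) for method, n in errors.items()}


if __name__ == "__main__":
    for method, rate in error_rates(TOTAL_ERRORS).items():
        print(f"{method}: {rate:.2f}%")
    # Predetermined positions of circles: 32.79%
    # Hough transform: 26.60%
    # Proposed algorithm: 19.82%
```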
Our proposed system has two parts. In the first part, we introduce a new skew correction mechanism for OMR documents that uses the Hough transform algorithm to correct skew in the document. The second part computes the exact position of each circle in the OMR document so that user input can be identified accurately; here we use the Hough transform algorithm, with a fallback to pixel projection in case of errors, to improve the accuracy of the calculated circle positions. On our dataset, the lowest software error rate achieved by the existing algorithms is 7.22%, whereas the proposed algorithm reduces the software error rate to 0.44%. This is a substantial improvement that makes OMR systems more accurate, reliable and practical. With further fine-tuning of the algorithm, the accuracy of the OMR system could be pushed closer to 100%. This work can also be ported to mobile applications, allowing higher-precision OMR systems to be used with cell phone cameras.
The authors extend their appreciation to King Saud University for funding this work through Researchers Supporting Project number (RSP2022R426), King Saud University, Riyadh, Saudi Arabia.