Individuals with special needs learn more slowly than their peers and need repetition for learning to become permanent. In crowded classrooms, however, it is difficult for a teacher to work with each student individually. Supportive education applications can help overcome this problem, but most such applications are not designed for special education and are therefore less effective than expected. Special education students differ from their peers in their development, characteristics, and educational qualifications, and their handwriting skills are lower than those of their peers. This makes the task of Handwriting Recognition (HWR) more difficult. To overcome this problem, we propose a new personalized handwriting verification system that validates digits from the handwriting of special education students. The system uses a Convolutional Neural Network (CNN) created and trained from scratch. The dataset was obtained by visiting a special education center and collecting the students’ handwritten digits on a tablet under the supervision of special education teachers. The system is person-dependent because every student has their own writing style. Overall, the system achieves promising results, reaching a recognition accuracy of about 94%. It can verify special education students’ handwritten digits with high accuracy and is ready to be integrated with a mobile application designed to teach digits to special education students.
Handwriting Digit Recognition (HDR), verification, and identification are frequently confused. HDR is a broad subject that covers both digit identification and digit verification. The goal of handwriting digit identification is to determine which digit has been written. Handwriting digit verification, by contrast, is a one-to-one matching task: it validates a claimed identity against the written digit and either accepts or rejects the claim. Although recognizing a handwritten digit is easy for humans, it is still a challenge for machines because everyone’s handwriting is unique [
HDR is an area of machine learning, in which a machine is trained to identify handwritten digits [
In the literature, handwriting digit verification studies using CNNs are rare. However, there are many studies in other fields that verify the correctness of an image, face, fingerprint, or traffic sign using a CNN. For example, in 2019, in the field of biometrics, Mohapatra proposed a study that performed offline handwritten signature verification using a CNN [
Most children who need special education do not have the perceptual-motor and intellectual potential for handwriting. In addition, many special education teachers have not been trained to teach handwriting, or they cannot make enough time for each individual. As a result, students’ handwriting differs from that of their peers, especially in slope, format, and conformity with writing rules, and most of them cannot learn good handwriting. Therefore, when developing an HDR application for such individuals, it may be more appropriate to work with a dataset prepared for special education. Yılmaz presented a Mobile Device Application Design with Handwriting Recognition in 2014 to make learning easier for students who have learning disabilities [
The application implemented in this study is handwritten digit verification. Some people regard HDR and handwriting digit verification as different problems, while others regard them as the same. In fact, the idea behind both is the same; only the application area differs. Although it is explained in more detail below, the difference between the two can be summarized as follows:
Handwriting digit verification answers: Is this digit x?
Whereas, HDR answers: What is this digit?
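The two questions above can be contrasted in a minimal Python sketch. It is an illustration only, not the implementation used in this study: the function names `identify` and `verify`, the ten-score input, and the 0.5 acceptance threshold are all assumptions introduced here.

```python
from typing import Sequence


def identify(scores: Sequence[float]) -> int:
    """Identification ("What is this digit?").

    Hypothetical input: one score per digit class 0-9.
    Returns the digit with the highest score.
    """
    if len(scores) != 10:
        raise ValueError("expected one score for each digit 0-9")
    return max(range(10), key=lambda digit: scores[digit])


def verify(p_true: float, threshold: float = 0.5) -> bool:
    """Verification ("Is this digit x?").

    Hypothetical input: the probability of the 'true' class from a
    two-class model built for one claimed digit x. The threshold of
    0.5 is an assumption, not a value from the study.
    """
    return p_true >= threshold


if __name__ == "__main__":
    scores = [0.01, 0.02, 0.80, 0.03, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02]
    print(identify(scores))  # one answer out of ten classes
    print(verify(0.80))      # accept or reject a single claim
```

Identification chooses among ten classes, whereas verification makes a single accept or reject decision for one claimed digit.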
HDR, also known as digit identification, is the process of converting handwritten digits into machine-readable form. The image of the written digit may be recognized online or offline. Online character recognition is the process of recognizing handwriting recorded with a digitizer as a time sequence of pen coordinates. Offline HWR is the process of recognizing a scanned digit that is stored digitally in greyscale format, as in the proposed system [
Handwriting digit verification can be considered a sub-branch of HDR. As the name suggests, the system tries to authenticate a handwritten digit through a one-to-one comparison. Here, the purpose of the system is not to guess the entered number but to verify whether the entered number is correct. As seen in
The proposed method is based on a CNN model and is person-dependent. CNNs are a class of deep learning models often used to analyze visual information; their design is inspired by the functioning of the human brain. Generally, a CNN consists of an input layer, multiple hidden layers, and one output layer. The hidden layers comprise convolutional layers, activation function layers, pooling layers, fully connected layers, and normalization layers. Compared with other classification algorithms, a CNN needs much less preprocessing and gives better results as the amount of training increases [
The algorithm has been developed for a special education application that supports the teacher while teaching digits. It aims to recognize the digit written by a user with high accuracy. In one section of the application, the user answers a question by writing. For this reason, before a student starts to learn a digit, samples of that student’s handwriting are collected as input. For example, if digit 2 is to be learned, the application directs the student to write this digit several times. These data are used in the designed CNN together with what the student writes while learning that digit in the application. What the student writes at the beginning of the application becomes a training sample, and what the student writes during the application becomes a test sample.
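The sample flow described above can be sketched as a small Python data structure. This is a hypothetical illustration: the class `SampleStore`, its method names, and the representation of an image as nested lists of pixel values are assumptions made here and are not part of the application described in this study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel values
Key = Tuple[str, int]      # (student identifier, digit being learned)


class SampleStore:
    """Keeps enrollment writes and in-lesson writes apart per student and digit."""

    def __init__(self) -> None:
        self.training: Dict[Key, List[Image]] = defaultdict(list)
        self.test: Dict[Key, List[Image]] = defaultdict(list)

    def enroll(self, student: str, digit: int, image: Image) -> None:
        # Written at the beginning of the application: a training sample.
        self.training[(student, digit)].append(image)

    def record_answer(self, student: str, digit: int, image: Image) -> None:
        # Written during the application: a test sample.
        self.test[(student, digit)].append(image)


if __name__ == "__main__":
    store = SampleStore()
    blank = [[0.0] * 4 for _ in range(3)]  # placeholder image for the demo
    for _ in range(3):
        store.enroll("student-1", 2, blank)
    store.record_answer("student-1", 2, blank)
    print(len(store.training[("student-1", 2)]), len(store.test[("student-1", 2)]))
```

Keying the samples by student and digit keeps each student's writing style separate, which is what a person-dependent design requires.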
Person-dependent training is used in this study, which means that each subject must have a separate CNN for each digit. If there are n subjects, there should be n × 10 CNNs, because there are 10 different digits (0–9) to verify. Below, in
In this section, the structure of the architecture is explained in more detail. First, some pre-processing operations, such as resizing, are applied to the images. Once the necessary pre-processing is done, the data can be fed to the system. This step is explained in the dataset section. A CNN consists of several layers, and each layer transforms one volume of activations into another through a differentiable function. We used 7 different layer types and a total of 15 layers while building the network. These are convolutionLayer, poolingLayer, reluLayer, softmaxLayer, classificationLayer, imageInputLayer, and batchNormalizationLayer. The order of the layers was finalized, and the architecture of the system is shown in
The initial block acts as a feature extractor, which is the distinguishing feature of this kind of Neural Network (NN). It performs template matching through convolution filtering operations: the first layer filters the image with several convolution kernels and returns “feature maps”, which are then normalized with an activation function and/or resized. The network first needs an input, so an imageInputLayer is used to feed 2-D images to the network and to apply data normalization. Its dimension is 100 × 186 × 1 with ‘zero center’ normalization, where 100 × 186 is the resolution of the input image and 1 indicates a grayscale image. Next comes a convolution layer, which passes the input images through a set of convolutional filters, each of which activates certain features in the images. It has 8 filters of size 3 × 3, with stride [1 1] and padding [0 0 0 0]. After the convolution, a batch normalization layer normalizes each input channel across a mini-batch. Placed between the convolutional and ReLU layers, it speeds up the training of CNNs and reduces the sensitivity to network initialization. The fourth layer is a reluLayer, which serves as the activation layer. A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero as shown in
After that, a max-pooling layer with pool size [2 2] and stride [2 2] performs down-sampling by dividing the input into rectangular pooling regions and computing the maximum of each region. The layers other than the input layer are repeated in the second block; the only difference is the convolution layer, whose parameters differ from those of the first one. This time, a 2-D convolutional layer with 32 filters of size [3 3] and ‘same’ padding is created. At training time, the software calculates and sets the size of the zero padding so that the layer output has the same size as the input. In the last block, a convolution layer like the one in the second block, a batchNormalizationLayer, and a reluLayer are repeated. A fully connected layer with an output size of 2 then follows the reluLayer; it multiplies the input by a weight matrix and adds a bias vector. The last two layers are output layers. The first is the softmax layer, which applies a softmax function to the input as shown in
The last one is the classificationLayer to compute the cross-entropy loss for multi-class classification problems with mutually exclusive classes.
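To make the layer description concrete, the following Python sketch traces the activation sizes through the network and gives plain versions of the ReLU, softmax, and cross-entropy operations. It is a hedged illustration written with the standard library only, not the MATLAB implementation used in the study. The helper names are hypothetical, and the third convolution is assumed to have 32 filters with ‘same’ padding because the text describes it as being like the one in the second block.

```python
import math
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output length of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size: int, pool: int = 2, stride: int = 2) -> int:
    """Output length of unpadded max pooling along one dimension."""
    return (size - pool) // stride + 1


def trace_shapes(height: int = 100, width: int = 186) -> List[Tuple[str, tuple]]:
    """Sizes after each size-changing layer (batch norm and ReLU keep the size)."""
    shapes = [("input", (height, width, 1))]
    h, w = conv_out(height, 3), conv_out(width, 3)  # 3x3, stride 1, no padding
    shapes.append(("conv1, 8 filters", (h, w, 8)))
    h, w = pool_out(h), pool_out(w)                 # 2x2 pooling, stride 2
    shapes.append(("pool1", (h, w, 8)))
    shapes.append(("conv2, 32 filters, same", (h, w, 32)))  # 'same' keeps h, w
    h, w = pool_out(h), pool_out(w)
    shapes.append(("pool2", (h, w, 32)))
    shapes.append(("conv3, assumed 32 filters, same", (h, w, 32)))
    shapes.append(("fully connected", (2,)))
    return shapes


def relu(values: List[float]) -> List[float]:
    """Set every value below zero to zero."""
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to one."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_index: int) -> float:
    """Cross-entropy loss for one sample with mutually exclusive classes."""
    return -math.log(probs[true_index])


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:34s} {shape}")
    probs = softmax(relu([-1.5, 2.0]))
    print(probs, cross_entropy(probs, 1))
```

Under these assumptions, the fully connected layer receives 24 × 46 × 32 = 35,328 values and maps them to the two outputs of the verification task.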
Since supervised learning is applied in this study, a labeled dataset is required for training [
After collecting sample forms as shown in
To implement our CNN, we used ConvNet, a network architecture for deep learning in MATLAB. ConvNet exposes the building blocks of CNNs as MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and much more. ConvNet thus enables rapid prototyping of new CNN architectures; moreover, it supports efficient computation on the Central Processing Unit (CPU) and Graphics Processing Unit (GPU), allowing complex models to be trained on large datasets. CNN is the current state-of-the-art architecture for the image classification task. We trained our person-dependent handwriting digit verification system with images small enough not to slow the system down but large enough not to reduce the recognition rate. There are 400 samples, of which 360 are used for training and 40 for testing. While doing this, we created two classes, true and false. We put the samples on which the system is trained in the false class, and the samples on which the system is tested in the true class. For example, if we aim to find out whether the written number is 5, in the true part we put the samples of 5’s that were taken from the user as shown in
Our system was trained using Stochastic Gradient Descent with Momentum (SGDM) with a learning rate of 0.001. The system was trained for 10 epochs with a validation frequency of 30 iterations, as shown in
Name | Value |
---|---|
Training time | 31 s |
Number of epochs | 10 |
Validation frequency | 30 iterations |
Learning rate | 0.001 |
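The SGDM update behind these settings can be sketched in a few lines of Python. This is a minimal illustration on a toy one-parameter problem, not the training code of the study. The learning rate of 0.001 comes from the table above, whereas the momentum value of 0.9 and the function name `sgdm_step` are assumptions, since the momentum is not reported.

```python
from typing import List, Tuple


def sgdm_step(
    weights: List[float],
    grads: List[float],
    velocity: List[float],
    lr: float = 0.001,      # learning rate from the table above
    momentum: float = 0.9,  # assumed value; not reported in the study
) -> Tuple[List[float], List[float]]:
    """One SGDM update: keep a fraction of the previous step, add the new gradient step."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


if __name__ == "__main__":
    # Toy problem: minimize f(w) = w^2, whose gradient is 2w.
    weights, velocity = [1.0], [0.0]
    for _ in range(100):
        grads = [2.0 * w for w in weights]
        weights, velocity = sgdm_step(weights, grads, velocity)
    print(weights[0])  # moves from 1.0 toward the minimum at 0
```

The momentum term reuses part of the previous update, which smooths the noisy gradients of small mini-batches.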
Among 160 test cases for 4 subjects and 10 digits, the best training and validation accuracy of our handwritten digit recognition system is 100%. Each digit from 0 to 9 written by each subject is tested 4 times. In
Subject/Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Subject 1 | 90% | 90% | 100% | 90% | 90% | 90% | 90% | 90% | 80% | 100% |
Subject 2 | 90% | 80% | 90% | 100% | 100% | 100% | 100% | 100% | 100% | 90% |
Subject 3 | 100% | 100% | 90% | 90% | 100% | 90% | 100% | 90% | 100% | 100% |
Subject 4 | 100% | 100% | 90% | 90% | 90% | 80% | 100% | 90% | 90% | 90% |
In
Subject/Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Subject 1 | 100% | 90% | 100% | 90% | 90% | 90% | 100% | 90% | 80% | 100% |
Subject 2 | 90% | 90% | 90% | 100% | 100% | 90% | 100% | 100% | 90% | 100% |
Subject 3 | 100% | 100% | 90% | 100% | 100% | 90% | 100% | 90% | 100% | 100% |
Subject 4 | 100% | 100% | 90% | 90% | 90% | 80% | 100% | 100% | 90% | 90% |
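As a cross-check, the averages of the two accuracy tables above can be recomputed with a short Python sketch. The values are copied from the tables; the variable and function names are introduced here for illustration only.

```python
from typing import Dict, List

# Accuracy (%) per digit 0-9, copied from the two tables above.
first_table: Dict[str, List[int]] = {
    "Subject 1": [90, 90, 100, 90, 90, 90, 90, 90, 80, 100],
    "Subject 2": [90, 80, 90, 100, 100, 100, 100, 100, 100, 90],
    "Subject 3": [100, 100, 90, 90, 100, 90, 100, 90, 100, 100],
    "Subject 4": [100, 100, 90, 90, 90, 80, 100, 90, 90, 90],
}
second_table: Dict[str, List[int]] = {
    "Subject 1": [100, 90, 100, 90, 90, 90, 100, 90, 80, 100],
    "Subject 2": [90, 90, 90, 100, 100, 90, 100, 100, 90, 100],
    "Subject 3": [100, 100, 90, 100, 100, 90, 100, 90, 100, 100],
    "Subject 4": [100, 100, 90, 90, 90, 80, 100, 100, 90, 90],
}


def subject_means(table: Dict[str, List[int]]) -> Dict[str, float]:
    """Mean accuracy per subject across the ten digits."""
    return {name: sum(row) / len(row) for name, row in table.items()}


def overall_mean(table: Dict[str, List[int]]) -> float:
    """Mean accuracy across all subjects and digits."""
    values = [v for row in table.values() for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(overall_mean(first_table))    # 93.5
    print(overall_mean(second_table))   # 94.5
    print(subject_means(second_table))
```

The second table averages 94.5%, which matches the average accuracy reported for the system, while the first table averages 93.5%.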
Even when some digits are poorly written, our system is able to classify them correctly. The above results can be considered very successful because in a normal HDR system the success rate falls even below 50% for some poorly written numbers. For example, as is shown in
The handwriting of individuals with special education needs is less legible than that of their peers. In
Many studies in the literature address HDR, and many of them report very successful results. Two of the examples mentioned in the related work are “Hybrid CNN-SVM Classifier for Handwritten Digit Recognition”, published in 2019, and “A Robust Handwritten Digit Recognition System Based on Sliding window with Edit distance”, published in 2020. In these two studies, well-written samples were used to train and test the systems developed. However, because no systems have been designed for corrupted handwriting such as that of special education students, we identified a gap in this area and developed a person-dependent system that recognizes the often corrupted handwriting of special education students. As a result, we achieved high success in recognizing this type of handwriting.
Authors | Method | Dataset | Accuracy |
---|---|---|---|
Ciresan et al. [ | A handwriting digit recognition system based on a sliding window with edit distance for handwritten digit images (PCA). | 1016 train and 1016 for validation of each digit (0–9). | 97.91% |
Lauer et al. [ | A hybrid model of a powerful Convolutional Neural Networks (CNN) and Support Vector Machine (SVM) for recognition of handwritten digits. | 6000 train and 1000 for validation of each digit (0–9). | 99.28% |
Proposed method | A person-dependent CNN created and trained from scratch. | 36 train and 4 for validation of each digit (0–9). | 94.5% |
Because the computer used for the tests has a low-end GPU, and because the system is designed for an educational application on tablets, the resolution of the images was reduced and some system parameters were changed so that the system runs quickly. It is therefore expected that the system would be more successful if tested on a better computer. The training set size also affects the accuracy, which increases as the amount of data increases. Here we used a very small dataset for training and testing. The more data in the training set, the smaller the impact of training error and test error, and ultimately the accuracy can be improved.
In this study, a novel person-dependent handwriting verification system is proposed to verify digits from the handwriting of students who need special education. Because the handwriting of individuals with special needs differs from that of their peers, the proposed system fills this gap in the literature as an enhanced application of handwriting verification. The learning system is employed to cover all possible translations and transformations of the subjects’ images, supported by person-dependent training. The system employs a CNN that is created and trained from scratch. The dataset was built by collecting the students’ handwriting on a tablet under the supervision of special education teachers. With the proposed method, the system recognizes poor handwriting with a very high accuracy. Overall, the system reaches an average accuracy of 94.5% and can verify the handwriting of special education students with acceptable accuracy. It is ready to be integrated with the mobile application that we are designing to support the teaching of special education students. The proposed system can be further improved by applying additional feature selection procedures and transfer learning approaches as well as re-training for one