This paper presents a handwritten document recognition system based on convolutional neural networks. Handwritten document recognition is rapidly attracting research attention because of its promise as an assistive technology for visually impaired users; it is also useful for automatic data-entry systems. For the proposed system, a dataset of English-language handwritten character images was prepared. The system was trained on a large set of sample data and tested on sample images of user-defined handwritten documents. Multiple experiments yielded very good recognition results. The system first applies image pre-processing stages to prepare the data for training with a convolutional neural network. The input document is then segmented into lines, words and characters. The proposed system achieves up to 86% accuracy during character segmentation. The segmented characters are then passed to a convolutional neural network for recognition. The segmentation and recognition techniques proposed in this paper provide highly acceptable accuracy on the given dataset: training accuracy reaches up to 93%, while validation accuracy is slightly lower at 90.42%.
Character recognition is a field where many machine learning techniques are widely applied. The world is advancing towards paperless communication, but handwritten documents are still shared daily in many fields. A significant challenge in handwritten document recognition is the processing of distorted character shapes and varied writing styles. Secondly, proper segmentation techniques are required for character-by-character processing. Handwritten words raise many challenging tasks for researchers: the characters in a dataset are not necessarily sharp, nor written perfectly in a straight line. Another issue is that the curves of handwritten characters are not as consistently smooth as those of printed characters. Different orientations and sizes of handwritten characters can also complicate processing. Finally, characters may not always appear in their complete shape; such incomplete characters can fall into different categories and lead to incorrect results in the recognition process.
Over the last few years, deep learning techniques have performed successfully in many fields, such as speech recognition, image and text classification, face and facial-expression recognition, semantic video search and many other areas. Many previously studied problems have been revisited with deep learning to obtain significantly better results. The proposed method addresses the issues defined above by developing an intelligent and efficient handwritten-script recognition system. The development of such systems always demands extensive image processing and pattern recognition techniques. This study applies a deep convolutional neural network to recognize handwritten scripts. A deep convolutional neural network consists of many hidden layers, and these layers comprise many neurons. An extensive dataset of 65,000 handwritten character images is first prepared to train and test the proposed approach. A deep convolutional neural network is then trained on this dataset, and a separate testing phase is applied to handwritten scripts to check recognition accuracy. When a handwritten script is given to the recognition system, it demands proper segmentation of the written lines, words and characters. The proposed approach therefore develops three kinds of segmentation algorithms: line-based, word-based and character-based segmentation of handwritten words. These techniques separate each character of the script so that it can later be recognized by the deep convolutional neural network. The recognized handwritten script is immediately converted into an electronic text document. The proposed technique makes a valuable contribution to the field of handwritten document recognition systems.
Handwriting recognition has been under investigation since the 1950s, when the application of the then-new digital computer technology became a subject of interest. In 1968, Eden suggested the technique known as analysis-by-synthesis, in which he formally argued that all characters consist of a finite number of schematic features. Many researchers have since added valuable contributions to this field.
The work of Grimsdale and Bullingham describes how the process of handwriting recognition can be simplified and sped up. In their paper, flying-spot technology uses a high-resolution scanner as a spot of light to read or scan an image [
In [
In research [
The proposed approach describes the effects of changing the models of Artificial Neural Networks to recognize the characters in the input document [
In [
In [
Image acquisition is the first phase of a handwritten character recognition system. The proposed recognition system takes a scanned handwritten character image as input. To start the process, the user uploads an image of the handwriting.
Pre-processing is used to increase the quality of the image. It involves the following operations.
The original image is first converted into a greyscale image [
Noise is unwanted information that degrades the quality of the image: pixels whose intensity values differ from the true pixel values. Filtering is the standard image-processing way to eliminate noise from an input image. For example, a median filter is an efficient technique for removing salt-and-pepper noise.
The proposed method uses a median filter to remove salt-and-pepper noise [
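For illustration, median filtering as described above can be sketched in pure Python. The function name, the fixed 3×3 window, and the choice to leave border pixels unchanged are assumptions for this sketch, not details of the paper's implementation:

```python
from statistics import median

def median_filter_3x3(img):
    """Apply a 3x3 median filter to a greyscale image given as a list
    of lists of intensity values (0-255). Each interior pixel is
    replaced by the median of its 3x3 neighbourhood, which suppresses
    isolated salt-and-pepper outliers; border pixels are left as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out
```

A single "salt" pixel (255) surrounded by uniform grey is replaced by the neighbourhood median, since eight of the nine window values agree.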
This step is also part of pre-processing. It converts the grey-scale image into a black-and-white image [
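Binarization can be sketched as a simple global threshold. The paper does not state which thresholding method is used, so the fixed threshold value of 128 here is purely an illustrative assumption:

```python
def binarize(img, threshold=128):
    """Convert a greyscale image (0-255) to a binary image:
    pixels at or above the threshold become 1 (foreground/white),
    all others become 0 (background/black)."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]
```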
Segmentation of characters is a crucial step in handwriting recognition because it directly affects the accuracy of the system [
In line segmentation, the input image has dark background pixels and white foreground pixels, so text can touch the top and bottom edges of the image. Removing such errors requires first padding the image with black space at the top and bottom; this also helps when calculating the dark centroids in the image. The proposed line segmentation method is based on the projection idea from Algorithm 2.1, the “Projection Profile-Based Algorithm”, presented in [
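The projection-profile idea can be sketched as follows: sum the foreground pixels in each row, and treat runs of empty rows as gaps between text lines. This is a minimal illustration of the general technique, not the cited Algorithm 2.1 itself:

```python
def segment_lines(img):
    """Split a binary image (1 = text pixel, 0 = background) into text
    lines via a horizontal projection profile. Rows whose pixel sum is
    zero are gaps; each maximal run of non-empty rows becomes one line.
    Returns a list of (start_row, end_row) half-open ranges."""
    profile = [sum(row) for row in img]
    lines, start = [], None
    for y, s in enumerate(profile):
        if s > 0 and start is None:
            start = y                     # a new text line begins
        elif s == 0 and start is not None:
            lines.append((start, y))      # the line ended at the gap
            start = None
    if start is not None:                 # text touching the bottom edge
        lines.append((start, len(img)))
    return lines
```

The final clause is why the paper pads the image top and bottom: without an empty row after the last line, the line would otherwise run to the image border.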
In the skew correction, the algorithm “Skew Detection using Center of Gravity” presented in [
Line segmentation further demands the segmentation of words. In word segmentation, morphology is applied to the image [
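A common way to realise word segmentation, sketched below, is to scan the vertical projection of a line image and treat a run of empty columns wider than a small gap as inter-word space; this stands in for the morphological operation mentioned above, whose exact parameters the paper does not give. The function name and the `gap` threshold are illustrative assumptions:

```python
def segment_words(line_img, gap=2):
    """Split a binary line image into words. Columns with no foreground
    pixels are blanks; `gap` or more consecutive blank columns end the
    current word. Returns (start_col, end_col) half-open ranges."""
    cols = [sum(line_img[y][x] for y in range(len(line_img)))
            for x in range(len(line_img[0]))]
    words, start, blanks = [], None, 0
    for x, s in enumerate(cols):
        if s > 0:
            if start is None:
                start = x                 # a word begins
            blanks = 0
        elif start is not None:
            blanks += 1
            if blanks >= gap:             # wide gap: the word has ended
                words.append((start, x - blanks + 1))
                start, blanks = None, 0
    if start is not None:                 # word running to the edge
        words.append((start, len(cols) - blanks))
    return words
```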
In handwritten documents, a negative or positive slant sometimes occurs in the written words, which demands slant correction [ Two kinds of character segmentation are then performed: cursive character segmentation and untouched character segmentation.
In cursive segmentation, characters are segmented from a word image of cursive handwriting. The main challenges are to avoid mis-segmentation and over-segmentation. Mis-segmentation means that characters which should have been segmented are not segmented properly. Over-segmentation means that a single character is segmented as two characters: ‘m’ can be segmented as two ‘n’s, and ‘w’ as two ‘v’s. This problem can be avoided through the combination of algorithms presented in [
In the first step, the word image is skeletonized and the vertical projection of the word is calculated by summing each column. The vertical projection can identify ligatures between characters, since a ligature column contains only one foreground pixel. Over-segmentation can still occur for characters such as ‘m’, ‘n’, ‘u’, ‘v’ and ‘w’. For ‘m’ and ‘n’ it is avoided by calculating the midpoint of the image height: the image is scanned vertically, and for each column the position of any white pixel is determined. If that pixel lies below the midpoint of the height, the column may be a segmentation column; if the sum of the column is greater than one, the column is discarded, since it is unlikely to be a joining point; otherwise it is stored as a potential segmentation column. The over-segmentation problem for ‘u’, ‘v’ and ‘w’ is avoided through a distance-based approach: if the distance between a candidate column and the previous segmentation column is less than a given threshold, the candidate is discarded, and the process is repeated for the next columns.
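The heuristics above can be sketched directly. The function name and the `min_dist` threshold value are illustrative assumptions; row index 0 is the top of the image, so "below the midpoint" means a row index at or beyond `h // 2`:

```python
def candidate_cuts(word_img, min_dist=3):
    """Find candidate segmentation columns in a skeletonised binary word
    image. A column qualifies only if it holds exactly one foreground
    pixel (a likely ligature), that pixel lies below the vertical
    midpoint, and the column is at least `min_dist` columns away from
    the previously accepted cut (the distance-based rule)."""
    h, w = len(word_img), len(word_img[0])
    mid = h // 2
    cuts = []
    for x in range(w):
        col = [word_img[y][x] for y in range(h)]
        if sum(col) != 1:
            continue                      # not a single-pixel ligature column
        y = col.index(1)
        if y < mid:
            continue                      # pixel above the midpoint: skip
        if cuts and x - cuts[-1] < min_dist:
            continue                      # too close to the previous cut
        cuts.append(x)
    return cuts
```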
Feature extraction is the process of detecting features of interest in an image and storing them for further processing. In image processing, feature extraction is a critical step that allows moving from a pictorial to a data representation. The proposed work uses a convolutional neural network for feature extraction [
The proposed method uses a convolutional neural network (CNN). A CNN contains neurons with learnable weights and biases, and can contain multiple layers, which is why it is classed as deep learning. A CNN is a feed-forward neural network with one or more convolutional layers, followed by one or more fully connected layers, just as in a simple multilayer neural network. The CNN architecture is modelled so that it can use the 2D structure of an image as the network input. The proposed configuration uses the back-propagation technique for training the CNN. A traditional CNN consists of the layers defined in
The configured neural network then requires training. To achieve this, the overall dataset is split into two sections: training data and validation data. The training samples consist of handwritten images in different writing styles. A dataset of 65,000 handwritten characters is prepared for training and validating the neural network: there are 2,500 sample images of each alphabet letter, for a total of 2,500 × 26 = 65,000 sample images. Each sample image is a grey-scale image with a white foreground and black background. The neural network is trained with an initial learning rate of 0.005. The maximum training accuracy of the CNN is recorded as 97%. The trained neural network is saved and then used for the recognition process.
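The dataset arithmetic above can be checked in a few lines; the per-class split of 2,400 training and 100 validation images is taken from the recognition results table later in the paper, and the constant names here are illustrative:

```python
NUM_CLASSES = 26                 # letters A-Z
PER_CLASS = 2500                 # sample images per letter
TRAIN_PER_CLASS = 2400           # per-letter training images (results table)
VAL_PER_CLASS = PER_CLASS - TRAIN_PER_CLASS   # 100 validation images

total_images = NUM_CLASSES * PER_CLASS        # 2500 * 26 = 65000
train_images = NUM_CLASSES * TRAIN_PER_CLASS  # 62400
val_images = NUM_CLASSES * VAL_PER_CLASS      # 2600
```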
For image recognition, the neural network comprises four primary operations defined in the next sections.
A CNN depends on the following operations:
In the convolutional layer, the convolution (•) is the dot product between the input image M and the filter matrix N. The output of the whole process is a convolved feature matrix, represented as
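A minimal sketch of this operation in pure Python follows. As is conventional in CNNs, it is implemented as cross-correlation (no filter flip): each output entry is the dot product of the filter N with the patch of M beneath it. The function name is illustrative:

```python
def conv2d(M, N):
    """Valid-mode 2D convolution of input image M with filter N,
    both given as lists of lists. The output has shape
    (rows(M) - rows(N) + 1) x (cols(M) - cols(N) + 1)."""
    mh, mw = len(M), len(M[0])
    nh, nw = len(N), len(N[0])
    out = []
    for i in range(mh - nh + 1):
        row = []
        for j in range(mw - nw + 1):
            # dot product between the filter and the underlying patch
            row.append(sum(M[i + a][j + b] * N[a][b]
                           for a in range(nh) for b in range(nw)))
        out.append(row)
    return out
```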
The mathematical model of the ReLU function is the piecewise nonlinear operator ReLU(x) = max(0, x), applied element-wise; the function is represented as ReLU(•). The output of the ReLU function is the rectified feature map,
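Element-wise, this is a one-liner over the convolved feature matrix:

```python
def relu(feature_map):
    """Element-wise rectifier: negative activations are set to zero,
    non-negative activations pass through unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]
```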
Mathematically, the max-pooling function Pool(•) is defined as shown in
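Max pooling takes the maximum over each non-overlapping k×k window of the rectified feature map. The sketch below assumes a stride equal to the window size k and dimensions divisible by k; both are common defaults rather than details stated in the paper:

```python
def max_pool(F, k=2):
    """k x k max pooling with stride k over feature map F: the output
    keeps the strongest activation in each non-overlapping window,
    shrinking each spatial dimension by a factor of k."""
    return [[max(F[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(0, len(F[0]), k)]
            for i in range(0, len(F), k)]
```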
The softmax function, Softmax(•), is a multiclass classifier. The i-th probabilistic output of the function can be calculated using
MATLAB 2017b was used to find the overall accuracy of the proposed work on the prepared dataset. In this study, the accuracy of the proposed method and dataset is demonstrated using the CNN algorithm.
The application view is defined in
Character | Number of instances | Correctly segmented | Over-segmented | Mis-segmented
---|---|---|---|---
A | 50 | 45 (90%) | 3 (6%) | 2 (4%) |
B | 50 | 48 (96%) | 1 (2%) | 1 (2%) |
C | 50 | 47 (94%) | 2 (4%) | 1 (2%) |
D | 50 | 48 (96%) | 1 (2%) | 1 (2%) |
E | 50 | 46 (92%) | 1 (2%) | 3 (6%) |
F | 50 | 47 (94%) | 1 (2%) | 2 (4%) |
G | 50 | 45 (90%) | 3 (6%) | 2 (4%) |
H | 50 | 42 (84%) | 6 (12%) | 2 (4%) |
I | 50 | 35 (70%) | 2 (4%) | 13 (26%) |
J | 50 | 46 (92%) | 1 (2%) | 3 (6%) |
K | 50 | 43 (86%) | 5 (10%) | 2 (4%) |
L | 50 | 40 (80%) | 1 (2%) | 9 (18%) |
M | 50 | 35 (70%) | 13 (26%) | 2 (4%) |
N | 50 | 40 (80%) | 8 (16%) | 2 (4%) |
O | 50 | 49 (98%) | 0 (0%) | 1 (2%) |
P | 50 | 48 (96%) | 2 (4%) | 0 (0%) |
Q | 50 | 42 (84%) | 6 (12%) | 2 (4%) |
R | 50 | 41 (82%) | 7 (14%) | 2 (4%) |
S | 50 | 44 (88%) | 4 (8%) | 2 (4%) |
T | 50 | 40 (80%) | 3 (6%) | 7 (14%) |
U | 50 | 41 (82%) | 8 (16%) | 1 (2%) |
V | 50 | 40 (80%) | 8 (16%) | 2 (4%) |
W | 50 | 38 (76%) | 10 (20%) | 2 (4%) |
X | 50 | 42 (84%) | 6 (12%) | 2 (4%) |
Y | 50 | 42 (84%) | 6 (12%) | 2 (4%) |
Z | 50 | 45 (90%) | 4 (8%) | 1 (2%) |
Total average accuracy | Total average over-segmented | Total average mis-segmented
---|---|---
86% | 8.615% | 5.308%

Maximum accuracy | Maximum over-segmented | Maximum mis-segmented
---|---|---
98% | 26% | 26%
The proposed solution's average accuracy reached 90%, which is an excellent result for handwritten document recognition on the given dataset. This study will help the design and development of handwritten optical character recognition systems in the future. The comparison between the recognition accuracy of previous studies and the proposed method is demonstrated in
In this study, the essential contributions by the authors are:

- Designing a valid dataset that can be used to train systems efficiently for both printed and handwritten documents.
- Designing new algorithms for line, word and character segmentation of cursive and non-cursive handwriting.
- Identifying every possible writing style for each alphabet letter, along with its joining styles with other letters of the English alphabet.
Characters | Train instances | Validation instances | Correctly recognized | Incorrectly recognized
---|---|---|---|---
A | 2400 | 100 | 90 (90%) | 10 (10%) |
B | 2400 | 100 | 92 (92%) | 8 (8%) |
C | 2400 | 100 | 94 (94%) | 6 (6%) |
D | 2400 | 100 | 92 (92%) | 8 (8%) |
E | 2400 | 100 | 91 (91%) | 9 (9%) |
F | 2400 | 100 | 88 (88%) | 12 (12%) |
G | 2400 | 100 | 90 (90%) | 10 (10%) |
H | 2400 | 100 | 92 (92%) | 8 (8%) |
I | 2400 | 100 | 72 (72%) | 28 (28%) |
J | 2400 | 100 | 90 (90%) | 10 (10%) |
K | 2400 | 100 | 85 (85%) | 15 (15%) |
L | 2400 | 100 | 83 (83%) | 17 (17%) |
M | 2400 | 100 | 97 (97%) | 3 (3%) |
N | 2400 | 100 | 88 (88%) | 12 (12%) |
O | 2400 | 100 | 94 (94%) | 6 (6%) |
P | 2400 | 100 | 92 (92%) | 8 (8%) |
Q | 2400 | 100 | 92 (92%) | 8 (8%) |
R | 2400 | 100 | 87 (87%) | 13 (13%) |
S | 2400 | 100 | 94 (94%) | 6 (6%) |
T | 2400 | 100 | 97 (97%) | 3 (3%) |
U | 2400 | 100 | 92 (92%) | 8 (8%) |
V | 2400 | 100 | 83 (83%) | 17 (17%) |
W | 2400 | 100 | 88 (88%) | 12 (12%) |
X | 2400 | 100 | 91 (91%) | 9 (9%) |
Y | 2400 | 100 | 86 (86%) | 14 (14%) |
Z | 2400 | 100 | 97 (97%) | 3 (3%) |
Algorithms | Training average accuracy | Training miss rate | Validation average accuracy | Validation miss rate
---|---|---|---|---
Acharjya et al. [ | 92.4% | 7.6% | 94% | 16%
Simayi et al. [ | 89.67% | 10.33% | 86.04% | 13.96%
Sasipriyaa et al. [ | 93.4% | 6.6% | 95.36% | 4.64%
Konkimalla et al. [ | 91.6% | 8.4% | 89.7% | 10.3%
Honey et al. [ | 88.67% | 11.33% | 86.55% | 13.45%
Proposed technique | 93% | 7% | 90.423% | 9.577% |
This paper proposed a convolutional neural network and different segmentation approaches for the recognition of handwritten English documents. The proposed technique was trained and tested on a standard user-defined dataset prepared for the proposed system. The experimental results show that the proposed technique provides a very good accuracy rate, currently achieving 90.42% accuracy in the validation phase. The decrease in accuracy relative to training is due to many factors, such as distorted strokes in the writing, multiple sizes and thicknesses of characters, different writing styles, illumination of the writing, and others. In the future, the accuracy can be further improved by refining the line, word and character segmentation techniques and by adding more intermediate layers and filters to the convolutional neural network.