Open Access
ARTICLE
Recurrent Convolutional Neural Network MSER-Based Approach for Payable Document Processing
1 Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
2 Ainfinity Algorythma, Abu Dhabi, United Arab Emirates
3 Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
4 Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
* Corresponding Author: Ali Mustafa Qamar. Email:
(This article belongs to the Special Issue: Emerging Trends in Artificial Intelligence and Machine Learning)
Computers, Materials & Continua 2021, 69(3), 3399-3411. https://doi.org/10.32604/cmc.2021.018724
Received 19 March 2021; Accepted 20 April 2021; Issue published 24 August 2021
Abstract
The corporate sector generates a tremendous number of vendor invoices. Automating the manual data entry for payable documents requires highly accurate Optical Character Recognition (OCR). This paper proposes an end-to-end OCR system that performs both text localization and recognition as a single unit to automate the processing of payable documents such as cheques and cash disbursements. For text localization, maximally stable extremal regions (MSER) are used to extract word or digit chunks from an invoice. Each chunk is then passed to a deep learning model that performs text recognition. The model combines a convolutional neural network with long short-term memory (LSTM): the convolutional layers extract features, which are fed to the LSTM. The model integrates feature extraction, sequence modeling, and transcription into a unified network. It handles sequences of unconstrained length without depending on character segmentation or horizontal scale normalization. Furthermore, it applies to both lexicon-free and lexicon-based text recognition, and it produces a comparatively smaller model that can be deployed in practical applications. The superior overall performance in the experimental evaluation demonstrates the usefulness of the proposed model, which is generic and can be used for other similar recognition scenarios.
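To illustrate how a transcription step can turn per-frame LSTM outputs into text of arbitrary length without character segmentation, the following is a minimal sketch of greedy decoding. It assumes a CTC-style output layer with a reserved blank class, which is common in recurrent convolutional recognizers but is not stated in the abstract; the function name `greedy_ctc_decode`, the `BLANK` index, and the digit alphabet are hypothetical choices made only for this example.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the "blank" class (illustrative, not from the paper)


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Collapse per-frame class scores into a string.

    frame_scores[t][k] is the score of class k at time step t.
    Class 0 is the blank; class k >= 1 maps to alphabet[k - 1].
    """
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring class for this frame.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a character only when it is not blank and differs from
        # the previous frame's label (repeated labels are merged).
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


def one_hot(k: int, n: int = 11) -> List[float]:
    """Helper for the demo: a score vector with all mass on class k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


# Six frames: '1', '1', blank, '0', blank, '0'
frames = [one_hot(k) for k in [2, 2, 0, 1, 0, 1]]
print(greedy_ctc_decode(frames, "0123456789"))  # prints 100
```

The blank between the two `0` frames is what lets the decoder keep a genuinely repeated digit, while the two adjacent `1` frames collapse into one character. The number of input frames does not need to match the number of output characters.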
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.