Open Access
ARTICLE
Image Captioning Using Detectors and Swarm Based Learning Approach for Word Embedding Vectors
1 CSE Department, Sethu Institute of Technology, Pulloor, Kariapatti, 626115, India
2 CSE Department, National Engineering College, K.R. Nagar, Kovilpatti, 628503, India
* Corresponding Author: B. Lalitha. Email:
Computer Systems Science and Engineering 2023, 44(1), 173-189. https://doi.org/10.32604/csse.2023.024118
Received 05 October 2021; Accepted 16 December 2021; Issue published 01 June 2022
Abstract
IC (Image Captioning) is a crucial part of Visual Data Processing and aims at understanding an image in order to provide captions that verbalize its important elements. However, existing works struggle with the complexity of images, neglect the major relations between the objects in an image, and are hampered by poor image quality and labelling, which remains a significant problem for researchers. Hence, this work attempts to overcome these challenges by proposing a novel framework for IC. The main contribution of this research is a framework consisting of three phases: image understanding, textual understanding, and decoding. Initially, the image understanding phase begins with image pre-processing to enhance image quality. Objects are then detected using IYV3MMDs (Improved YoloV3 Multishot Multibox Detectors) to capture the interrelation between the image and its objects, followed by MBFOCNNs (Modified Bacterial Foraging Optimization in Convolution Neural Networks), which encode the image and provide the final feature vectors. Secondly, the textual understanding phase is performed on the text associated with an image, beginning with text preprocessing in which unwanted words, phrases, and punctuation are removed to yield clean text. This is followed by MGloVEs (Modified Global Vectors for Word Representation), which provide word embeddings that give the highest priority to the objects present in an image. Finally, the decoding phase uses MDAA (Modified Deliberate Adaptive Attention), whose learning ability allows it to decode both normal and complex scene images and generate accurate captions. Experimental results show that the proposed work achieves an accuracy of 96.24% when generating captions for images, outperforming existing and similar methods.
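To make the flow of the proposed three-phase framework easier to follow, the sketch below outlines it in Python. It is a minimal, illustrative skeleton only: the class and function names (preprocess_image, detect_objects, encode_features, embed_tokens, decode_caption) are hypothetical placeholders, and the IYV3MMD, MBFOCNN, MGloVE, and MDAA components described in the paper are replaced with trivial stand-in logic so that the control flow runs end to end.

```python
# Minimal sketch of the three-phase captioning pipeline (image understanding,
# textual understanding, decoding). All names are hypothetical placeholders;
# the paper's IYV3MMD, MBFOCNN, MGloVE and MDAA modules are stubbed.
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedObject:
    label: str
    box: tuple  # (x, y, w, h)


def preprocess_image(image: List[List[float]]) -> List[List[float]]:
    """Phase 1a: enhance image quality (placeholder max-normalisation)."""
    peak = max(max(row) for row in image) or 1.0
    return [[px / peak for px in row] for row in image]


def detect_objects(image) -> List[DetectedObject]:
    """Phase 1b: stand-in for the IYV3MMD detector (fixed toy detections)."""
    return [DetectedObject("dog", (10, 12, 40, 30)),
            DetectedObject("ball", (55, 60, 12, 12))]


def encode_features(image, objects) -> List[float]:
    """Phase 1c: stand-in for the MBFOCNN encoder producing a feature vector."""
    return [sum(sum(row) for row in image) / len(image), float(len(objects))]


def preprocess_text(caption: str) -> List[str]:
    """Phase 2a: strip punctuation and very short tokens."""
    cleaned = "".join(c.lower() if c.isalnum() or c.isspace() else " " for c in caption)
    return [tok for tok in cleaned.split() if len(tok) > 2]


def embed_tokens(tokens: List[str], objects: List[DetectedObject]) -> List[List[float]]:
    """Phase 2b: stand-in for MGloVE; boosts tokens that match detected objects."""
    labels = {o.label for o in objects}
    return [[float(len(tok)), 2.0 if tok in labels else 1.0] for tok in tokens]


def decode_caption(features, embeddings, objects) -> str:
    """Phase 3: stand-in for the MDAA decoder; simply names the detected objects."""
    return "a scene containing " + " and ".join(o.label for o in objects)


if __name__ == "__main__":
    image = [[0.1, 0.5], [0.9, 0.3]]        # toy 2x2 "image"
    reference = "A dog plays with a ball!"  # toy reference caption
    img = preprocess_image(image)
    objs = detect_objects(img)
    feats = encode_features(img, objs)
    tokens = preprocess_text(reference)
    embeds = embed_tokens(tokens, objs)
    print(decode_caption(feats, embeds, objs))
```

In a real implementation, the stubbed functions would be replaced by the trained detector, encoder, embedding model, and attention-based decoder described in the paper; the skeleton only fixes the order in which the phases exchange their outputs.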
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.