Samar Elbedwehy1,*, T. Medhat2, Taher Hamza3, Mohammed F. Alrahmawy3
CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 1483-1500, 2022, DOI:10.32604/cmc.2022.029313
Abstract Image captioning is an emerging field in machine learning. It refers to the ability to automatically generate a syntactically and semantically meaningful sentence that describes the content of an image. Image captioning requires a complex machine learning process because it involves two sub-models: a vision sub-model that extracts object features and a language sub-model that uses the extracted features to generate meaningful captions. Attention-based vision transformer models have recently had a great impact on the vision field. In this paper, we studied the effect of using vision transformers on the image captioning process by evaluating…