Komal Rani Narejo1, Hongying Zan1,*, Kheem Parkash Dharmani2, Orken Mamyrbayev3,*, Ainur Akhmediyarova4, Zhibek Alibiyeva4, Janna Alimkulova5
CMC-Computers, Materials & Continua, Vol.84, No.2, pp. 3407-3429, 2025, DOI:10.32604/cmc.2025.065872
- 03 July 2025
Abstract While automatic image captioning systems have made notable progress in the past few years, generating captions that fully convey sentiment remains a considerable challenge. Although existing models achieve strong performance in visual recognition and factual description, they often fail to account for the emotional context that is naturally present in human-generated captions. To address this gap, we propose the Sentiment-Driven Caption Generator (SDCG), which combines transformer-based visual and textual processing with multi-level fusion. RoBERTa is used for extracting sentiment from textual input, while visual features are handled by the Vision Transformer (ViT). These features are ...