Open Access

ARTICLE

Text-Image Feature Fine-Grained Learning for Joint Multimodal Aspect-Based Sentiment Analysis

by Tianzhi Zhang1, Gang Zhou1,*, Shuang Zhang2, Shunhang Li1, Yepeng Sun1, Qiankun Pi1, Shuo Liu3

1 School of Data and Target Engineering, Information Engineering University, Zhengzhou, 450001, China
2 Information Engineering Department, Liaoning Provincial College of Communications, Shenyang, 110122, China
3 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450000, China

* Corresponding Author: Gang Zhou

Computers, Materials & Continua 2025, 82(1), 279-305. https://doi.org/10.32604/cmc.2024.055943

Abstract

Joint Multimodal Aspect-based Sentiment Analysis (JMASA) is a significant task in multimodal fine-grained sentiment analysis that combines two subtasks: Multimodal Aspect Term Extraction (MATE) and Multimodal Aspect-oriented Sentiment Classification (MASC). Most existing JMASA models encode text and image features only at a basic level and neglect in-depth analysis of the intrinsic features of each modality; this insufficient learning of intra-modal features can lead to low accuracy in aspect term extraction and weak sentiment prediction. To address this problem, we propose a Text-Image Feature Fine-grained Learning (TIFFL) model for JMASA. First, we construct an enhanced adjacency matrix of word dependencies and adopt a graph convolutional network to learn syntactic structure features for text, which mitigates the context interference that arises when identifying different aspect terms. Then, adjective-noun pairs extracted from the image are introduced to make the semantic representation of visual features more intuitive, which addresses the ambiguous semantics encountered during image feature learning. Together, these components further optimize aspect term extraction and sentiment polarity prediction. Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results on JMASA, MATE, and MASC, validating the effectiveness of the proposed methods.
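The graph-convolutional step described in the abstract can be illustrated with a standard GCN layer applied to a dependency adjacency matrix. The sketch below is a generic, minimal version of that operation (it does not reproduce the paper's enhanced adjacency-matrix construction); the toy sentence, embedding sizes, and weights are all hypothetical:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer over a dependency graph:
    add self-loops, symmetrically normalize, then ReLU(A_norm @ X @ W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    return np.maximum(0.0, A_norm @ X @ W)    # ReLU activation

# Toy sentence "food was great" with dependency edges food-was, was-great
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.random((3, 8))   # hypothetical 8-dim word embeddings
W = rng.random((8, 4))   # layer weight matrix
H = gcn_layer(A, X, W)   # syntax-aware word representations, shape (3, 4)
```

Each output row mixes a word's embedding with those of its syntactic neighbors, which is how such a layer injects dependency structure into the text representation.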

Cite This Article

APA Style
Zhang, T., Zhou, G., Zhang, S., Li, S., Sun, Y. et al. (2025). Text-image feature fine-grained learning for joint multimodal aspect-based sentiment analysis. Computers, Materials & Continua, 82(1), 279-305. https://doi.org/10.32604/cmc.2024.055943
Vancouver Style
Zhang T, Zhou G, Zhang S, Li S, Sun Y, Pi Q, et al. Text-image feature fine-grained learning for joint multimodal aspect-based sentiment analysis. Comput Mater Contin. 2025;82(1):279-305. https://doi.org/10.32604/cmc.2024.055943
IEEE Style
T. Zhang et al., “Text-Image Feature Fine-Grained Learning for Joint Multimodal Aspect-Based Sentiment Analysis,” Comput. Mater. Contin., vol. 82, no. 1, pp. 279-305, 2025. https://doi.org/10.32604/cmc.2024.055943



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.