Open Access

REVIEW

A Comprehensive Survey on Deep Learning Multi-Modal Fusion: Methods, Technologies and Applications

by Tianzhe Jiao, Chaopeng Guo, Xiaoyue Feng, Yuming Chen, Jie Song*

Software College, Northeastern University, Shenyang, 110819, China

* Corresponding Author: Jie Song.

Computers, Materials & Continua 2024, 80(1), 1-35. https://doi.org/10.32604/cmc.2024.053204

Abstract

Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. In complex scenes, multi-modal fusion exploits the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed, in-depth analysis. According to the stage at which data are fused, multi-modal fusion methods fall into four primary categories: early fusion, deep fusion, late fusion, and hybrid fusion. The paper also surveys three major multi-modal fusion technologies that can significantly improve the effect of data fusion, and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and varying data quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology, since ineffective data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.

Cite This Article

APA Style
Jiao, T., Guo, C., Feng, X., Chen, Y., Song, J. (2024). A comprehensive survey on deep learning multi-modal fusion: methods, technologies and applications. Computers, Materials & Continua, 80(1), 1-35. https://doi.org/10.32604/cmc.2024.053204
Vancouver Style
Jiao T, Guo C, Feng X, Chen Y, Song J. A comprehensive survey on deep learning multi-modal fusion: methods, technologies and applications. Comput Mater Contin. 2024;80(1):1-35. https://doi.org/10.32604/cmc.2024.053204
IEEE Style
T. Jiao, C. Guo, X. Feng, Y. Chen, and J. Song, “A Comprehensive Survey on Deep Learning Multi-Modal Fusion: Methods, Technologies and Applications,” Comput. Mater. Contin., vol. 80, no. 1, pp. 1-35, 2024. https://doi.org/10.32604/cmc.2024.053204



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.