Open Access

ARTICLE

UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model

Jiakang Sun1,2, Ke Chen1,2, Xinyang He1,2, Xu Liu1,2, Ke Li1,2, Cheng Peng1,2,*

1 Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, 610213, China
2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 101499, China

* Corresponding Author: Cheng Peng.

Computers, Materials & Continua 2025, 83(1), 219-238. https://doi.org/10.32604/cmc.2025.059745

Abstract

With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks demonstrate that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters. Additionally, it achieves superior performance compared to fully fine-tuned models on certain benchmarks.
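The abstract names Vector-based Cross-modal Random Matrix Adaptation as the fine-tuning mechanism but does not spell out its form. As a rough illustration only, the general vector-based random-matrix approach (in the spirit of VeRA-style adapters) freezes a pair of shared random projection matrices and trains only two small scaling vectors per adapted layer; the sketch below assumes that formulation, and all variable names and dimensions are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 16, 4  # toy layer dimensions; r is the adapter rank

# Frozen pre-trained weight and frozen shared random projections
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # frozen random down-projection
B = rng.standard_normal((d_out, r))  # frozen random up-projection

# The only trainable parameters: two small scaling vectors.
# b is zero-initialized so the adapter starts as a no-op.
b = np.zeros(d_out)  # scales the output of B
d = np.ones(r)       # scales the output of A

def adapted_forward(x):
    """Compute y = W x + diag(b) B diag(d) A x."""
    return W @ x + b * (B @ (d * (A @ x)))

x = rng.standard_normal(d_in)
# With b = 0 the adapted layer reproduces the frozen model exactly.
assert np.allclose(adapted_forward(x), W @ x)
```

Under this formulation the trainable budget per layer is only `d_out + r` scalars, versus `r * (d_in + d_out)` for a low-rank adapter with its own trained matrices, which is consistent with the extreme parameter economy (about 5% of competing methods) that the abstract claims.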

Keywords

Parameter-efficient transfer learning; multimodal alignment; image captioning; image-text retrieval; visual question answering

Cite This Article

APA Style
Sun, J., Chen, K., He, X., Liu, X., Li, K. et al. (2025). UniTrans: unified parameter-efficient transfer learning and multimodal alignment for large multimodal foundation model. Computers, Materials & Continua, 83(1), 219–238. https://doi.org/10.32604/cmc.2025.059745
Vancouver Style
Sun J, Chen K, He X, Liu X, Li K, Peng C. UniTrans: unified parameter-efficient transfer learning and multimodal alignment for large multimodal foundation model. Comput Mater Contin. 2025;83(1):219–238. https://doi.org/10.32604/cmc.2025.059745
IEEE Style
J. Sun, K. Chen, X. He, X. Liu, K. Li, and C. Peng, “UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model,” Comput. Mater. Contin., vol. 83, no. 1, pp. 219–238, 2025. https://doi.org/10.32604/cmc.2025.059745



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.