Search Results (20)
  • Open Access

    ARTICLE

    Improving VQA via Dual-Level Feature Embedding Network

    Yaru Song*, Huahu Xu, Dikai Fang

    Intelligent Automation & Soft Computing, Vol.39, No.3, pp. 397-416, 2024, DOI:10.32604/iasc.2023.040521

Abstract Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to associate relevant visual regions with input questions so that they can be answered effectively. The detection-based features extracted by an object detection network capture the visual attention distribution over predetermined detection boxes and provide object-level insights, answering questions about foreground objects more effectively. However, they cannot answer questions about background regions, which lack detection boxes, because of missing fine-grained details; capturing such details is the advantage of grid-based features. In…
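The question-guided attention this abstract refers to can be illustrated with a minimal sketch. All names and sizes below are hypothetical (36 regions, 512-dimensional features); this is not the paper's dual-level embedding network, just the generic mechanism: score each region feature against the question embedding, then pool the regions by softmax weights.

```python
import numpy as np

def question_guided_attention(question_vec, region_feats):
    """Score each visual region against the question embedding and
    pool the regions by their softmax attention weights."""
    scores = region_feats @ question_vec                # (num_regions,)
    scores = scores - scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()     # softmax over regions
    attended = weights @ region_feats                   # (feat_dim,)
    return attended, weights

rng = np.random.default_rng(0)
q = rng.standard_normal(512)               # toy question embedding
regions = rng.standard_normal((36, 512))   # 36 detection-based region features
attended, weights = question_guided_attention(q, regions)
```

The attended vector is a question-conditioned summary of the visual input; grid-based features would be pooled the same way, just over grid cells instead of detection boxes.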

  • Open Access

    ARTICLE

    Enhancing Multi-Modality Medical Imaging: A Novel Approach with Laplacian Filter + Discrete Fourier Transform Pre-Processing and Stationary Wavelet Transform Fusion

Mian Muhammad Danyal, Sarwar Shah Khan*, Rahim Shah Khan, Saifullah Jan, Naeem ur Rahman

    Journal of Intelligent Medicine and Healthcare, Vol.2, pp. 35-53, 2024, DOI:10.32604/jimh.2024.051340

Abstract Multi-modality medical images are essential in healthcare as they provide valuable insights for disease diagnosis and treatment. To harness the complementary data provided by various modalities, these images are amalgamated to create a single, more informative image. This fusion process enhances the overall quality and comprehensiveness of the medical imagery, aiding healthcare professionals in making accurate diagnoses and informed treatment decisions. In this study, we propose a new hybrid pre-processing approach, Laplacian Filter + Discrete Fourier Transform (LF+DFT), to enhance medical images before fusion. The LF+DFT approach highlights key details, captures subtle information, and sharpens…
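The general shape of an LF+DFT-style pre-processing step can be sketched as follows. This is only an assumption-laden illustration, not the authors' pipeline: the parameter names `alpha` and `cutoff` are invented here, and the paper's actual filter design may differ. The idea shown is to sharpen edges with a discrete Laplacian, then amplify high frequencies in the DFT domain.

```python
import numpy as np

def lf_dft_enhance(img, alpha=0.5, cutoff=0.1):
    """Illustrative LF+DFT sketch: Laplacian sharpening followed by
    a simple high-boost filter applied in the frequency domain."""
    # 4-neighbour discrete Laplacian via array shifts
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    sharpened = img - alpha * lap           # subtract Laplacian to sharpen
    # High-boost filtering in the DFT domain
    F = np.fft.fftshift(np.fft.fft2(sharpened))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2) / max(h, w)
    mask = 1.0 + (dist > cutoff)            # double frequencies beyond cutoff
    return np.fft.ifft2(np.fft.ifftshift(F * mask)).real

img = np.random.default_rng(1).random((64, 64))   # toy single-channel image
enhanced = lf_dft_enhance(img)
```

A real implementation would normalise the output range and apply the step per modality before the stationary-wavelet fusion stage.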

  • Open Access

    ARTICLE

    Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation

    Shiwen Song, Rui Zhang, Min Hu*, Feiyao Huang

    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5243-5271, 2024, DOI:10.32604/cmc.2024.050879

Abstract Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a…

  • Open Access

    ARTICLE

    A Hand Features Based Fusion Recognition Network with Enhancing Multi-Modal Correlation

    Wei Wu*, Yuan Zhang, Yunpeng Li, Chuanyang Li, Yan Hao

    CMES-Computer Modeling in Engineering & Sciences, Vol.140, No.1, pp. 537-555, 2024, DOI:10.32604/cmes.2024.049174

Abstract Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities and leverages inter-modal correlation to improve recognition performance; judiciously exploiting the correlation among multimodal features likewise strengthens the robustness of the system. Nevertheless, two issues persist in multi-modal feature fusion recognition: Firstly, efforts to enhance recognition performance have not comprehensively considered the inter-modality correlations among distinct modalities. Secondly, during modal fusion, improper weight selection diminishes the salience of crucial modal features, thereby reducing overall recognition performance. To address these two issues, we introduce an…
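The weight-selection issue the abstract raises shows up even in the simplest score-level fusion. The sketch below is illustrative only (the modality names and weights are invented, and the paper's correlation-aware learned weighting is more sophisticated): it fuses per-modality feature vectors by normalised fixed weights, where a poorly chosen weight visibly suppresses a modality's contribution.

```python
import numpy as np

def fuse_modalities(features, weights):
    """Fuse same-dimensional per-modality feature vectors by a
    normalised weighted sum (naive fixed-weight fusion)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalise so weights sum to 1
    stacked = np.stack(features)     # (num_modalities, feat_dim)
    return w @ stacked               # weighted sum, shape (feat_dim,)

rng = np.random.default_rng(2)
palm, finger, vein = (rng.standard_normal(128) for _ in range(3))
fused = fuse_modalities([palm, finger, vein], weights=[0.5, 0.3, 0.2])
```

Learning the weights from inter-modal correlation, as the paper proposes, replaces the hand-picked `weights` argument with a data-driven estimate.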

  • Open Access

    ARTICLE

    Fake News Detection Based on Text-Modal Dominance and Fusing Multiple Multi-Model Clues

Lifang Fu, Huanxin Peng*, Changjin Ma, Yuhan Liu

    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4399-4416, 2024, DOI:10.32604/cmc.2024.047053

Abstract In recent years, efficiently and accurately identifying multi-model fake news has become more challenging. First, multi-model data provides more evidence, but not all of it is equally important. Second, social structure information has proven effective in fake news detection, and how to incorporate it while reducing noise is critical. Unfortunately, existing approaches fail to handle these problems. This paper proposes a multi-model fake news detection framework based on Text-modal Dominance and fusing Multiple Multi-model Cues (TD-MMC), which utilizes three valuable multi-model clues: text-modal importance, text-image complementarity, and text-image inconsistency. TD-MMC is…

  • Open Access

    ARTICLE

    Generative Multi-Modal Mutual Enhancement Video Semantic Communications

Yuanle Chen, Haobo Wang, Chunyu Liu, Linyi Wang, Jiaxin Liu, Wei Wu*

    CMES-Computer Modeling in Engineering & Sciences, Vol.139, No.3, pp. 2985-3009, 2024, DOI:10.32604/cmes.2023.046837

Abstract Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by the research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main…

  • Open Access

    ARTICLE

    Explainable Conformer Network for Detection of COVID-19 Pneumonia from Chest CT Scan: From Concepts toward Clinical Explainability

Mohamed Abdel-Basset, Hossam Hawash, Mohamed Abouhawwash*, S. S. Askar, Alshaimaa A. Tantawy

    CMC-Computers, Materials & Continua, Vol.78, No.1, pp. 1171-1187, 2024, DOI:10.32604/cmc.2023.044425

Abstract The early implementation of treatment therapies necessitates the swift and precise identification of COVID-19 pneumonia through the analysis of chest CT scans. This study investigates the need for precise and interpretable diagnostic tools to improve clinical decision-making in COVID-19 diagnosis. This paper proposes a novel deep learning approach, called Conformer Network, for explainable discrimination of viral pneumonia based on the lung Region of Infection (ROI) within a single-modality radiographic CT scan. Firstly, an efficient U-shaped transformer network is integrated for lung image segmentation. Then, a robust transfer learning technique is introduced…

  • Open Access

    ARTICLE

    Multi-Modal Military Event Extraction Based on Knowledge Fusion

    Yuyuan Xiang, Yangli Jia*, Xiangliang Zhang, Zhenling Zhang

    CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 97-114, 2023, DOI:10.32604/cmc.2023.040751

Abstract Event extraction stands as a significant endeavor within the realm of information extraction, aspiring to automatically extract structured event information from vast volumes of unstructured text. Extracting event elements from multi-modal data remains a challenging task due to the presence of a large number of images and overlapping event elements in the data. Although researchers have proposed various methods to accomplish this task, most existing event extraction models cannot address these challenges because they are only applicable to text scenarios. To solve the above issues, this paper proposes a multi-modal event extraction method based on…

  • Open Access

    ARTICLE

    Multi-Modal Scene Matching Location Algorithm Based on M2Det

    Jiwei Fan, Xiaogang Yang*, Ruitao Lu, Qingge Li, Siyu Wang

    CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 1031-1052, 2023, DOI:10.32604/cmc.2023.039582

Abstract In recent years, many visual positioning algorithms based on computer vision have been proposed and have achieved good results. However, these algorithms serve a single function, cannot perceive the environment, and generalize poorly, and mismatches can degrade positioning accuracy. Therefore, this paper proposes a location algorithm that combines a target recognition algorithm with a depth-feature matching algorithm to address unmanned aerial vehicle (UAV) environment perception and multi-modal image-matching fusion location. The algorithm is based on the single-shot object detector based on multi-level feature…

  • Open Access

    ARTICLE

    DCRL-KG: Distributed Multi-Modal Knowledge Graph Retrieval Platform Based on Collaborative Representation Learning

Leilei Li, Yansheng Fu, Dongjie Zhu*, Xiaofang Li, Yundong Sun, Jianrui Ding, Mingrui Wu, Ning Cao*, Russell Higgs

    Intelligent Automation & Soft Computing, Vol.36, No.3, pp. 3295-3307, 2023, DOI:10.32604/iasc.2023.035257

Abstract The knowledge graph, with its abundant relational information, has been widely used as the basic data support for retrieval platforms. Image and text descriptions added to the knowledge graph enrich node information, which is the key advantage of the multi-modal knowledge graph. In cross-modal retrieval platforms, multi-modal knowledge graphs can help improve retrieval accuracy and efficiency because of the abundant relational information they provide. The representation learning method is significant to the application of multi-modal knowledge graphs. This paper proposes a distributed collaborative vector retrieval platform (DCRL-KG) using…

Displaying results 1-10 of 20 (page 1).