    MVCE-Net: Multi-View Region Feature and Caption Enhancement Co-Attention Network for Visual Question Answering

    Feng Yan1, Wushouer Silamu2, Yanbing Li1,*

    CMC-Computers, Materials & Continua, Vol.76, No.1, pp. 65-80, 2023, DOI:10.32604/cmc.2023.038177

    Abstract Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions to answer questions about images more accurately. However, existing models tend to ignore the implicit knowledge in the images and focus only on the visual information, which limits the depth of understanding of the image content. Images contain more than just visual objects: some contain textual information about the scene, and slightly more complex images contain relationships between individual visual objects. First, this paper proposes a model using image description for feature enhancement. This model encodes images and their descriptions separately…

  • Open Access


    3D Object Detection with Attention: Shell-Based Modeling

    Xiaorui Zhang1,2,3,4,*, Ziquan Zhao1, Wei Sun4,5, Qi Cui6

    Computer Systems Science and Engineering, Vol.46, No.1, pp. 537-550, 2023, DOI:10.32604/csse.2023.034230

    Abstract LIDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with a Bounding Box (BBox). However, in the three-dimensional space of autonomous driving scenes, previous object detection methods pre-process the original LIDAR point cloud into voxels or pillars and therefore lose the coordinate information of the original point cloud, detect slowly, and position bounding boxes inaccurately. To address these issues, this study proposes a new two-stage network structure that extracts point cloud features directly with PointNet++, which effectively preserves the original point cloud coordinate information. To improve the detection…

  • Open Access


    Profiling of Urban Noise Using Artificial Intelligence

    Le Quang Thao1,2,*, Duong Duc Cuong2, Tran Thi Tuong Anh3, Tran Duc Luong4

    Computer Systems Science and Engineering, Vol.45, No.2, pp. 1309-1321, 2023, DOI:10.32604/csse.2023.031010

    Abstract Noise pollution tends to receive less attention than other types of pollution; however, it greatly affects human quality of life, for example by causing sleep disruption, stress, or hearing impairment. Profiling urban sound through the identification of noise sources in cities could improve livability by reducing exposure to noise pollution through methods such as noise control, planning of the soundscape environment, or selection of safe living spaces. In this paper, we propose a self-attention long short-term memory (LSTM) method that can improve sound classification compared to previous baselines. An attention mechanism will be designed solely to…

  • Open Access


    A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition

    Peizhu Gong1, Jin Liu1, Zhongdai Wu2, Bing Han2, Y. Ken Wang3, Huihua He4,*

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 4203-4220, 2023, DOI:10.32604/cmc.2023.028291

    Abstract Speech emotion recognition, as an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task, because it includes the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to effectively represent features and capture correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model can be divided into three steps: feature extraction, interaction, and fusion. Self-supervised embedding models are introduced for feature extraction, which give a more powerful representation of the…

  • Open Access


    Image Color Rendering Based on Hinge-Cross-Entropy GAN in Internet of Medical Things

    Hong’an Li1, Min Zhang1,*, Dufeng Chen2, Jing Zhang1, Meng Yang3, Zhanli Li1

    CMES-Computer Modeling in Engineering & Sciences, Vol.135, No.1, pp. 779-794, 2023, DOI:10.32604/cmes.2022.022369

    Abstract Computer-aided diagnosis based on image color rendering promotes medical image analysis and doctor-patient communication by highlighting important information for medical diagnosis. To overcome the limitations of deep learning-based color rendering methods, such as poor model stability, poor rendering quality, fuzzy boundaries, and crossed color boundaries, we propose a novel hinge-cross-entropy generative adversarial network (HCEGAN). A self-attention mechanism was added and improved to focus on the important information in the image, and the hinge-cross-entropy loss function was used to stabilize the training process of GAN models. In this study, we implement the HCEGAN model for image color rendering…

  • Open Access


    Vehicle Density Prediction in Low Quality Videos with Transformer Timeseries Prediction Model (TTPM)

    D. Suvitha*, M. Vijayalakshmi

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 873-894, 2023, DOI:10.32604/csse.2023.025189

    Abstract Recent advances in low-cost cameras have facilitated surveillance in various developing towns in India. The video obtained from such surveillance is of low quality. Still, counting vehicles in such videos is necessary to avoid traffic congestion and to allow drivers to plan their routes more precisely. On the other hand, detecting vehicles in such low-quality videos is highly challenging with vision-based methodologies. In this research, a meticulous attempt is made to access low-quality videos to describe traffic in the town of Salem in India, which most available sources have not attempted. In this work profound Detection Transformer (DETR)…

  • Open Access


    An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task

    Muhammad Babar Kamal1, Arfat Ahmad Khan2, Faizan Ahmed Khan3, Malik Muhammad Ali Shahid4, Chitapong Wechtaisong2,*, Muhammad Daud Kamal5, Muhammad Junaid Ali6, Peerapong Uthansakul2

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 5547-5562, 2022, DOI:10.32604/cmc.2022.024590

    Abstract Advances in deep learning have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). Overall, the RNN works well with short sequences but suffers from the vanishing gradient problem on long sequences. Transformer networks have neutralized this issue and have shown state-of-the-art results on sequential or speech-related data. Generally, in speech recognition, the input audio is converted into an image using a Mel-spectrogram to illustrate frequencies and intensities. The image is classified by the machine learning mechanism to generate a classification transcript. However, the audio…

  • Open Access


    WMA: A Multi-Scale Self-Attention Feature Extraction Network Based on Weight Sharing for VQA

    Yue Li, Jin Liu*, Shengjie Shang

    Journal on Big Data, Vol.3, No.3, pp. 111-118, 2021, DOI:10.32604/jbd.2021.017169

    Abstract Visual Question Answering (VQA) has attracted extensive research focus and has recently become a hot topic in deep learning. The development of computer vision and natural language processing technology has contributed to the advancement of this research area. Key solutions for improving the performance of VQA systems lie in the feature extraction, multimodal fusion, and answer prediction modules. An unsolved issue in popular VQA image feature extraction modules is that they have difficulty extracting fine-grained features from objects of different scales. In this paper, a novel feature extraction network that combines multi-scale convolution and self-attention branches to solve the above…

  • Open Access


    Chinese Q&A Community Medical Entity Recognition with Character-Level Features and Self-Attention Mechanism

    Pu Han1,2, Mingtao Zhang1, Jin Shi3, Jinming Yang4, Xiaoyan Li5,*

    Intelligent Automation & Soft Computing, Vol.29, No.1, pp. 55-72, 2021, DOI:10.32604/iasc.2021.017021

    Abstract With the rapid development of the Internet, the medical Q&A community has become an important channel for people to obtain and share medical and health knowledge. Online medical entity recognition (OMER), as the foundation of medical and health information extraction, has attracted extensive attention from researchers in recent years. To further advance research on Chinese OMER, the LSTM-Att-Med model is proposed in this paper to capture more external semantic features and important information. First, Word2vec is used to generate the character-level vectors with semantic features on the basis of the unlabeled corpus in the medical domain and open…

  • Open Access


    Leverage External Knowledge and Self-attention for Chinese Semantic Dependency Graph Parsing

    Dianqing Liu1,2, Lanqiu Zhang1,2, Yanqiu Shao1,2,*, Junzhao Sun3

    Intelligent Automation & Soft Computing, Vol.28, No.2, pp. 447-458, 2021, DOI:10.32604/iasc.2021.016320

    Abstract Chinese semantic dependency graph (CSDG) parsing aims to analyze the semantic relationships between words in a sentence. Since it is a deep semantic analysis task, the parser needs a lot of prior knowledge about the real world to distinguish different semantic roles and determine the range of the head nodes of each word. Existing CSDG parsers usually use part-of-speech (POS) and lexical features, which can provide only linguistic knowledge, not semantic knowledge about the word. To solve this problem, we propose an entity recognition method based on distant supervision and entity classification to recognize entities in sentences, and then…

