Open Access
ARTICLE
RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer
1 School of Information Science and Technology, Beijing Forestry University, Beijing, 100083, China
2 Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing, 100083, China
3 Division of Biostatistics, Department of Population Health, Grossman School of Medicine, New York University, New York, 10016, USA
* Corresponding Author: Dongmei Li. Email:
# These authors contribute equally to this work
(This article belongs to the Special Issue: Transfroming from Data to Knowledge and Applications in Intelligent Systems)
Computers, Materials & Continua 2024, 78(3), 3603-3618. https://doi.org/10.32604/cmc.2024.047321
Received 01 November 2023; Accepted 12 January 2024; Issue published 26 March 2024
Abstract
Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.