Open Access

ARTICLE

Chinese Clinical Named Entity Recognition Using Multi-Feature Fusion and Multi-Scale Local Context Enhancement

Meijing Li*, Runqing Huang, Xianxian Qi

College of Information Engineering, Shanghai Maritime University, Shanghai, 200306, China

* Corresponding Author: Meijing Li. Email: email

Computers, Materials & Continua 2024, 80(2), 2283-2299. https://doi.org/10.32604/cmc.2024.053630

Abstract

Chinese Clinical Named Entity Recognition (CNER) is a crucial step in extracting medical information and is of great significance in promoting medical informatization. However, CNER poses challenges due to the specificity of clinical terminology, the complexity of Chinese text semantics, and the uncertainty of Chinese entity boundaries. To address these issues, we propose an improved CNER model based on multi-feature fusion and multi-scale local context enhancement. The model fuses multi-feature representations of pinyin, radicals, Part of Speech (POS), and word boundaries with BERT’s deep contextual representations to enrich the semantic representation of text for more effective entity recognition. Furthermore, to overcome the model’s limitation of focusing only on global features, we incorporate Convolutional Neural Networks (CNNs) with various kernel sizes to capture multi-scale local features of the text and deepen the model’s comprehension of it. Finally, we integrate the obtained global and local features and employ a multi-head attention mechanism (MHA) to strengthen the model’s focus on characters associated with medical entities, thereby boosting performance. We obtained F1 scores of 92.74% and 87.80% on the two CNER benchmark datasets, CCKS2017 and CCKS2019, respectively. These results demonstrate that our model outperforms the latest CNER models, showcasing its outstanding overall performance. The proposed model thus has significant application value in constructing clinical medical knowledge graphs and intelligent Q&A systems.

1  Introduction

The medical field has witnessed rapid growth in medical information technology, leading to a significant focus on the informatization of Electronic Medical Records (EMRs) [1]. During hospital visits, EMRs are commonly used to record the patient’s physical health status and capture the entire process of medical diagnosis. They are an indispensable medical data resource in healthcare services, as they provide patients with reliable medical evidence, assist doctors in grasping patients’ health status, and support clinical experiments and research [2]. EMR data are usually stored in various formats, including unstructured free text that computers cannot automatically extract and recognize [3]. To effectively utilize such unstructured free text, it is essential to employ entity extraction methods such as Named Entity Recognition (NER) [4], which aims to extract valuable information from text. While NER has achieved considerable success in English, Chinese Named Entity Recognition (CHNER) [5] poses greater complexity and difficulty, largely due to the abundance of homophones and the absence of clear word boundaries in the language. These factors pose significant challenges that distinguish Chinese from other languages when it comes to recognizing named entities.

Previous research in CHNER has explored various approaches, including dictionary-based [6], rule-based [7], and machine learning-based methods [8], which have achieved a degree of success in CHNER tasks. Although often accurate, these methods rely heavily on manual annotation and feature engineering, which is laborious and resource-intensive [9]. With ongoing advancements in science, technology, and computing power, there is a growing trend toward deep learning techniques in CHNER. Deep learning has demonstrated superior performance across various domains; in particular, neural network models based on Long Short-Term Memory (LSTM) [10] have gained significant popularity in CHNER tasks. Among these, the BiLSTM-CRF (Bidirectional LSTM with Conditional Random Fields) [11] model has emerged as a prominent approach and achieved noteworthy results. Nevertheless, the majority of existing CHNER models rely on character-based [12] or word-based [13] vector models. On the one hand, character-based models alone may not capture sufficient semantic information compared to word vectors. On the other hand, relying solely on word vector-based models may result in inadequate representation due to inaccuracies in the word segmentation tool, leading to subpar performance [14]. Moreover, existing CHNER methods frequently focus on global context information and overlook local context information, which is also essential for accurate entity recognition. For these reasons, existing CHNER models cannot fully exploit semantic information when extracting named entities from Chinese text. To solve these problems, we propose a CNER model based on multi-feature fusion and multi-scale local context enhancement. Our contributions can be summarized as follows:

1.    We propose a new feature extraction method based on multi-feature fusion and multi-scale local context enhancement, which comprehensively considers the multi-feature semantics of Chinese characters and simultaneously extracts deep global and local semantic information from Chinese Electronic Medical Record (CEMR) text.

2.    We propose a multi-scale local context enhancement method based on multiple Convolutional Neural Networks (CNNs) with different kernel sizes to capture local contextual features from various scales, ranging from fine-grained to coarse-grained. This enables the model to delve deeper into the semantic information of the text.

3.    We conduct extensive experiments on the publicly available CEMR datasets CCKS2017 and CCKS2019. The experimental results prove the validity of the model and verify the importance of each component in the model.

2  Related Work

In the initial phases of NER, dictionary-based and rule-based approaches were dominant. These methods rely primarily on manual formulation: domain experts recognize entities, formulate specific rules, and combine them with a dictionary using pattern matching. However, due to the specificity of rule formulation and the incompleteness of dictionaries, such methods have significant limitations: they consume considerable effort and time and do not extend easily to new datasets. Machine learning-based approaches use supervised learning to convert NER into a sequence labeling or classification task, a process that often involves extensive feature engineering; typical examples are Hidden Markov Models (HMMs) [15] and Conditional Random Fields (CRFs) [16]. Although this approach substantially improves over previous methods, it still requires extensive labeling by domain experts and carries a high training time cost.

Deep learning technology has advanced quickly in recent years, making it the dominant strategy in NER research. Deep learning utilizes neural networks to automatically extract features and has demonstrated success [17]. Huang et al. [11] proposed the BiLSTM-CRF model for sequence annotation tasks, which significantly enhanced the accuracy of NER. Zhang et al. [18] proposed a lattice LSTM model specifically designed for NER, which incorporates word meanings into the word vector model, significantly improving the ability to segment Chinese boundaries. Wu et al. [19] proposed a CNN-LSTM-CRF network structure for NER that jointly trained the NER and word segmentation models, enhancing the ability to accurately recognize entity boundaries in Chinese text. Xue et al. [20] proposed a centralized attention model that integrated BERT pre-training and collaborative learning to enhance feature representation in the parameter-sharing layer, improving accuracy in extracting medical text entities and relations. Zhao et al. [21] proposed an adversarial-training-based lattice LSTM model that integrated character and word embeddings, incorporating adversarial perturbations into the lattice LSTM structure with the specific goal of enhancing recognition performance on Chinese clinical texts. Li et al. [22] enhanced CNER by including dictionary data and radical properties of Chinese characters to improve the contextualized representation of words in their model. Kong et al. [23] introduced a model that employs a multi-level CNN structure to capture contextual information, making use of GPU parallelism and enhancing model performance. An et al. [24] enhanced CNER by integrating a multi-head attention mechanism with a medical dictionary, enabling more efficient capture of the relationships between Chinese characters and multi-level semantic features. Guo et al. [25] used a Transformer layer with a soft-dictionary structure to replace the traditional LSTM; the soft-dictionary Transformer not only supports parallel computation, saving substantial time, but also captures more contextual dependencies and correlations.

In summary, deep learning-based NER has shown promising results and is gaining traction for practical NER tasks. Therefore, we propose a multi-feature fusion and multi-scale local context enhancement method for CNER. By extracting multi-feature embedding and fusing multi-scale local contextual features, our approach enables the model to better comprehend Chinese clinical text information, thereby improving overall model performance.

3  Proposed Method

The model’s architecture, shown in Fig. 1, consists of four neural network layers: a feature embedding layer, a feature extraction layer, a multi-head attention (MHA) mechanism layer, and a CRF layer. The details of each layer are described below:


Figure 1: The overall model architecture

3.1 Feature Embedding Layer

To obtain as much rich semantic information as possible from the sequence, we extract sequence features from five perspectives: pinyin features, radical features, Part of Speech (POS) features, word boundary features, and deep contextual word-level features from BERT.

3.1.1 Pinyin Feature

Pinyin, as the official standard for the pronunciation of Chinese characters, contains a wealth of semantic information. For low-frequency and unknown words, pinyin can also provide valuable pronunciation information. This feature is particularly important in medical texts, which contain a large number of specialized clinical terms, because the same Chinese character may have completely different meanings under different pinyin. For example, although the character “中” in “中医” (ENG: Traditional Chinese Medicine) and “中风” (ENG: stroke) has the same written form, it has different pinyin readings, which leads to clear differences in semantics: the former refers to traditional Chinese medicine, while the latter denotes an acute cerebrovascular disease. Simply converting Chinese characters into word-level vectors would therefore lose this semantic information. To distinguish the different meanings expressed by polyphonic and homophonic characters, we supplement the word-level semantic representation of Chinese characters with pinyin features. Specifically, the pinyin vector consists of 27 dimensions: the first 26 dimensions correspond to the 26 letters of the pinyin system, while the last dimension represents the tone of the character. To construct a pinyin vector, we first obtain the pinyin of each Chinese character in the corpus. Then, we count the occurrences of each pinyin letter in the first 26 dimensions of the vector and combine them with the tone information in the last dimension to construct the full 27-dimensional pinyin vector. The construction process of the pinyin vector is as follows:

$Q_{\text{pinyin}} = E_{\text{pinyin}}[F_{\text{pinyin}}(X)]$ (1)

where $X$ represents the input sequence, $F_{\text{pinyin}}$ maps the input sequence to a pinyin sequence, $E_{\text{pinyin}}$ represents the mapping table from the pinyin sequence to pinyin vectors, and $Q_{\text{pinyin}}$ represents the pinyin vector.
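As a rough illustration of this construction, the sketch below builds the 27-dimensional pinyin vector for a single character. It assumes the pypinyin package for grapheme-to-pinyin conversion (the paper does not name a conversion tool); Style.TONE3 renders the tone as a trailing digit.

```python
# A minimal sketch of the 27-dim pinyin vector: letter counts (dims 0-25)
# plus the tone digit (dim 26). pypinyin is an assumed tool choice.
import numpy as np
from pypinyin import lazy_pinyin, Style

def pinyin_vector(char: str) -> np.ndarray:
    vec = np.zeros(27, dtype=np.float32)
    py = lazy_pinyin(char, style=Style.TONE3)[0]  # e.g. '中' -> 'zhong1'
    for ch in py.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1          # count each pinyin letter
        elif ch.isdigit():
            vec[26] = int(ch)                     # tone stored in the last dim
    return vec

print(pinyin_vector('中'))  # 'zhong1': z, h, o, n, g counted, tone 1
```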

3.1.2 Radical Feature

The radicals of Chinese characters are important for understanding their composition and meaning. Radicals are often presented as specific symbols or shapes that help decipher the pronunciation and meaning of a character. In the Chinese character system, characters sharing the same radical are often semantically related. For example, “花” (ENG: flower) and “草” (ENG: grass) both contain the radical “艹”, which is commonly associated with plants and herbs. In the medical field, many specialized clinical terms also show consistent radical patterns; for example, many disease terms carry the radical “疒”, such as “疹” (ENG: rash) and “痒” (ENG: itch). Traditional BERT models may not capture such subtle internal feature differences when dealing with low-frequency or unknown words, so extracting radical feature vectors can help correct the vector bias of these words. To construct the radical vectors, we first extract the radical of each character in the corpus and collect the set of all occurring radicals. Then, for the current character, we take the position index of its radical in this set as the character’s one-dimensional radical vector. The construction process of the radical vector is as follows:

$Q_{\text{radical}} = E_{\text{radical}}[F_{\text{radical}}(X)]$ (2)

where $X$ represents the input sequence, $F_{\text{radical}}$ maps the input sequence to a radical sequence, $E_{\text{radical}}$ represents the mapping table from the radical sequence to radical vectors, and $Q_{\text{radical}}$ represents the radical vector.
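A minimal sketch of the one-dimensional radical feature follows: the index of a character's radical in the set of radicals observed in the corpus. The tiny RADICAL_OF mapping is a hypothetical stand-in for a full radical dictionary, which the paper does not specify.

```python
# Hypothetical character-to-radical table; a real system would use a
# complete radical dictionary.
RADICAL_OF = {'花': '艹', '草': '艹', '疹': '疒', '痒': '疒'}

def build_radical_index(corpus):
    # Collect the set of radicals seen in the corpus and assign indices.
    radicals = sorted({RADICAL_OF[c] for s in corpus for c in s if c in RADICAL_OF})
    return {r: i for i, r in enumerate(radicals)}

def radical_vector(char, radical2id, unk_id=-1):
    # One-dimensional feature: position of the character's radical in the set.
    return radical2id.get(RADICAL_OF.get(char), unk_id)

radical2id = build_radical_index(['花草', '疹痒'])
print(radical_vector('疹', radical2id))
```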

3.1.3 POS Feature

POS is the grammatical property or lexical category that a word has in a sentence; it describes the word’s role and grammatical characteristics. In CNER, POS provides great help in extracting named entities. Medical texts also contain many terms with the same written form but different POS and very different meanings. For example, “感染” (ENG: infection), when used as a noun, denotes a concept or state: the process of a pathogen spreading into an organism and causing an abnormal reaction. When used as a verb, it denotes an action or process: the invasion of an organism by a pathogen that causes an infection. POS helps distinguish such terms and determine their roles and functions in the sentence, so extracting POS features helps the model understand the text more deeply. To construct the POS vector, we first build the set of all POS tags, then extract the POS of each character and take the position index of that POS in the set as the character’s one-dimensional POS vector. The procedure for constructing the POS vector is as follows:

$Q_{\text{POS}} = E_{\text{POS}}[F_{\text{POS}}(X)]$ (3)

where $X$ represents the input sequence, $F_{\text{POS}}$ maps the input sequence to a POS sequence, $E_{\text{POS}}$ represents the mapping table from the POS sequence to POS vectors, and $Q_{\text{POS}}$ represents the POS vector.
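The sketch below illustrates the per-character POS feature. The paper does not name its POS tagger, so jieba's tagger (jieba.posseg) is an assumed choice here; each character inherits the tag of the word containing it, and the tag set grows as new tags are encountered.

```python
# Sketch of the one-dimensional POS feature using jieba's POS tagger
# (an assumed tool); each character gets the index of its word's POS tag.
import jieba.posseg as pseg

def pos_vectors(sentence, pos2id):
    ids = []
    for word, flag in pseg.cut(sentence):
        if flag not in pos2id:                  # grow the POS tag set on the fly
            pos2id[flag] = len(pos2id)
        ids.extend([pos2id[flag]] * len(word))  # one id per character
    return ids

pos2id = {}
print(pos_vectors('患者出现感染', pos2id), pos2id)
```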

3.1.4 Word Boundary Feature

In general-domain datasets, names of places and organizations usually have distinct word boundaries, such as the boundary words “省” (ENG: province) and “市” (ENG: city). In CNER, however, many entities lack distinct boundaries, such as “横纹肌肉瘤” (ENG: rhabdomyosarcoma) and “肿瘤组织” (ENG: tumor tissue). Using the word boundary feature is therefore crucial to addressing entity boundary ambiguity. To construct the word boundary vector, we first segment the sequence into words to obtain the word boundary sequence. We then encode the word boundary sequence using 3-dimensional one-hot encoding to obtain the final word boundary vector. Table 1 illustrates an example of constructing the word boundary vector. The construction process of the word boundary vector is as follows:

$Q_{\text{boundary}} = E_{\text{boundary}}[F_{\text{boundary}}(X)]$ (4)

where $X$ represents the input sequence, $F_{\text{boundary}}$ maps the input sequence to a word boundary sequence, $E_{\text{boundary}}$ represents the mapping table from the word boundary sequence to word boundary vectors, and $Q_{\text{boundary}}$ represents the word boundary vector.

[Table 1: Example of constructing the word boundary vector]
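Since the original Table 1 image is unavailable, the sketch below shows one plausible reading of the 3-dimensional one-hot boundary encoding. The {B, M, E} tag set (begin / middle / end, with single-character words tagged B) and the jieba segmenter are both assumptions, not the paper's confirmed scheme.

```python
# Sketch of the 3-dim one-hot word boundary feature under an assumed
# {B, M, E} tag set; jieba is an assumed segmenter.
import jieba

ONE_HOT = {'B': [1, 0, 0], 'M': [0, 1, 0], 'E': [0, 0, 1]}

def boundary_vectors(sentence):
    vecs = []
    for word in jieba.cut(sentence):
        for i in range(len(word)):
            tag = 'B' if i == 0 else ('E' if i == len(word) - 1 else 'M')
            vecs.append(ONE_HOT[tag])
    return vecs

print(boundary_vectors('横纹肌肉瘤'))
```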

3.1.5 Deep Context Word-Level Feature

To more accurately represent the semantic information of Chinese clinical texts, we introduce BERT [26], an unsupervised deep bidirectional language model, to obtain the deep contextual representation of each word. BERT uses a deep bidirectional Transformer encoder as its core architecture. The Transformer incorporates a self-attention mechanism and employs residual connections to mitigate network degradation, resulting in notable improvements in both training speed and model expressiveness. BERT encodes the sequence into the word embedding representation $Q_{\text{BERT}}$.

We concatenate the five obtained features to get the final fused representation vector:

$Q_{\text{fus}} = Q_{\text{BERT}} \oplus Q_{\text{pinyin}} \oplus Q_{\text{radical}} \oplus Q_{\text{POS}} \oplus Q_{\text{boundary}}$ (5)

where $\oplus$ denotes vector concatenation.
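A short sketch of Eq. (5) follows; the feature dimensions (768 for BERT, 27/1/1/3 for the handcrafted features) are illustrative, since the paper does not report the exact embedding widths beyond the pinyin and boundary constructions above.

```python
# Concatenating the five per-character feature vectors (Eq. 5).
import torch

batch, seq_len = 2, 128
q_bert     = torch.randn(batch, seq_len, 768)  # BERT hidden states
q_pinyin   = torch.randn(batch, seq_len, 27)   # 27-dim pinyin vectors
q_radical  = torch.randn(batch, seq_len, 1)    # 1-dim radical index
q_pos      = torch.randn(batch, seq_len, 1)    # 1-dim POS index
q_boundary = torch.randn(batch, seq_len, 3)    # 3-dim one-hot boundary

q_fus = torch.cat([q_bert, q_pinyin, q_radical, q_pos, q_boundary], dim=-1)
print(q_fus.shape)  # torch.Size([2, 128, 800])
```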

3.2 Feature Extraction Layer

To obtain structural and semantic information at different levels of the data, we simultaneously extract deep semantic features from global and local perspectives using a BiLSTM and multi-scale CNNs, respectively.

3.2.1 BiLSTM

To accurately represent the global semantic information of fusion vectors, we employ an LSTM network for feature extraction. The LSTM structure comprises three gates: the input gate, the output gate, and the forget gate. These gates allow the LSTM to select and utilize important information while handling input sequences. They enable selective storage and discarding of data, efficiently addressing the problem of gradient vanishing or exploding during the processing of lengthy text sequences.

To address the limitation that the hidden vector $h_t$ captures contextual information in only one direction and learns semantic dependencies in a unidirectional sequence, we extend the traditional LSTM to a BiLSTM. This modification better captures semantic dependencies over longer distances. The BiLSTM utilizes contextual information from both the forward and backward directions, generating two distinct semantic vectors: $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$. Finally, the hidden vectors from the two opposite directions are concatenated to obtain the complete contextual semantic vector $H_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$.
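A minimal sketch of this layer in PyTorch: nn.LSTM with bidirectional=True concatenates the forward and backward hidden states along the feature axis, matching $H_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. The hidden size of 256 is an illustrative assumption.

```python
# BiLSTM over the fused embeddings from Eq. (5).
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=800, hidden_size=256,
                 batch_first=True, bidirectional=True)
q_fus = torch.randn(2, 128, 800)   # fused embeddings
H, _ = bilstm(q_fus)               # H: (2, 128, 512) = [h_fwd; h_bwd]
print(H.shape)
```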

3.2.2 Multi-Scale CNNs

Because CEMR contain a large number of clinical terms, there may be strong correlations between neighboring characters; for example, “胃癌” (ENG: stomach cancer) and “胃CT” (ENG: stomach CT) share a character, yet the former is a disease and the latter an examination. To capture local features between characters, multiple CNNs with varying kernel sizes are utilized to extract potential local contextual features within text sequences. This compensates for the limitation of BiLSTM, which primarily captures global features.

For the multi-feature fusion sequence $Q = (Q_1, Q_2, \ldots, Q_n)$, we perform convolution operations using multiple convolution kernels of different sizes. A kernel of size $k$ captures the local context features among $k$ neighboring characters, so applying multiple kernels of different sizes yields multiple sets of local context features at different scales. This multi-scale convolutional method enhances the model’s feature extraction on sequence data by capturing local semantic and structural features more effectively. The formulas for the multi-scale CNNs are as follows:

$O_t^{k_n} = \mathrm{ReLU}\left(w^{T} Q_{t-\frac{k-1}{2}:t+\frac{k-1}{2}} + b\right)$ (6)

$O_t = O_t^{k_1} + O_t^{k_2} + \cdots + O_t^{k_n}$ (7)

where $Q_{t-\frac{k-1}{2}:t+\frac{k-1}{2}}$ represents the embeddings from position $t-\frac{k-1}{2}$ to $t+\frac{k-1}{2}$, ReLU is the activation function, $O_t^{k_n}$ represents the convolutional output with kernel size $k_n$, “+” denotes the element-wise summation operation, and $O_t$ denotes the fused feature embedding.

To improve CNER, we utilize a gate mechanism to effectively combine the global and multi-scale local contextual semantic features. The gate mechanism dynamically assigns weights and decides how to utilize these features when labeling named entities. Its formulas are as follows:

$S_t = \sigma\left(W_{s1} H_t + W_{s2} O_t + b_t\right)$ (8)

$G_t = \left[S_t \odot H_t\right] \oplus \left[(1 - S_t) \odot O_t\right]$ (9)

where $S_t$ weighs the global and local contextual feature encodings, $W_{s1}$ and $W_{s2}$ are trainable matrices, $b_t$ is the bias term, $O_t$ is the local context feature input, $H_t$ is the global context feature input, $\odot$ denotes element-wise multiplication, $\oplus$ denotes concatenation, and $G_t$ is the output of the gate mechanism.

3.3 Multi-Head Attention Mechanism Layer

To better capture important features and correlations in a sequence, we employ a multi-head attention mechanism [27]. This mechanism automatically learns the distribution of attention weights at different positions and scales to enable feature selection and generate more expressive feature representations. The formulas for the MHA are as follows:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$ (10)

$E_i = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\, W^{O}$ (11)

$\mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right)$ (12)

where $Q$, $K$ and $V$ denote the query, key and value matrices, respectively, and $\sqrt{d_k}$ is the scaling factor used to adjust the range of the attention weights.

After obtaining context-specific representations from the multiple attention heads, we use a feed-forward neural network (FNN) to better aggregate and encode the features from different subspaces.

The formula is as follows:

$E_i' = \mathrm{FNN}(E_i)$ (13)
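A minimal sketch of this layer, assuming PyTorch's built-in MultiheadAttention as the implementation (8 heads, matching Section 4.4.3) with the gated features as query, key, and value; the FNN layer sizes are illustrative.

```python
# Self-attention over the gated features (Eqs. 10-12) plus the FNN (Eq. 13).
import torch
import torch.nn as nn

G = torch.randn(2, 128, 1024)   # gated features from Eq. (9)
mha = nn.MultiheadAttention(embed_dim=1024, num_heads=8, batch_first=True)
E, _ = mha(G, G, G)             # Q = K = V = G (self-attention)

ffn = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 1024))
E = ffn(E)                      # Eq. (13)
print(E.shape)
```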

3.4 CRF Layer

To assign a label to each character based on the final output vectors, we employ a CRF for prediction. CRFs are commonly applied in tasks such as POS tagging and NER, taking advantage of their ability to model label dependencies. The CRF computes the probability distribution over label sequences and uses the Viterbi algorithm for decoding, which takes the relationships between adjacent labels into account to obtain the optimal overall label sequence.

Given an input sequence $X$ and the corresponding hidden state sequence $h$ obtained from the model, the conditional probability of the output label sequence $y$ can be computed using the definition of the CRF. The formulas are as follows:

$s(h, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$ (14)

$P(y \mid h) = \dfrac{e^{s(h, y)}}{\sum_{\tilde{y} \in Y(h)} e^{s(h, \tilde{y})}}$ (15)

where $A$ represents the transition score matrix between labels, $A_{y_i, y_{i+1}}$ represents the score of transitioning from label $y_i$ at position $i$ to label $y_{i+1}$, $P_{i, y_i}$ denotes the score of labeling position $i$ as $y_i$, $P(y \mid h)$ corresponds to the normalized exponential over label sequences, and $Y(h)$ represents the set of all possible label sequences.

4  Experiments

4.1 Datasets

We assessed our model’s performance using two datasets: CCKS2017 and CCKS2019. The datasets provide an impartial evaluation of our model. Here are the dataset descriptions.

CCKS2017¹: The dataset is a collection of CEMR released at the 2017 National Conference on Knowledge Graph and Semantic Computing and donated by Beijing Jimu Cloud Health Technology Co. (Beijing, China). The dataset comprises 1596 labeled samples, divided into 1198 samples for training and 398 samples for testing. The dataset has five categories of entities: Symptom, Disease, Check, Treatment, and Body. The statistics for each entity category are available in Table 2.

[Table 2: Statistics of entity categories in the CCKS2017 dataset]

CCKS2019²: The dataset is a CEMR dataset released by Yidu Cloud Technology Co. as part of the 2019 National Conference on Knowledge Graph and Semantic Computing. The dataset consists of 1379 labeled samples, divided into a training set of 1000 samples and a testing set of 379 samples. The dataset contains six categories of entities: Anatomy, Disease, Exam, Medicine, Operation, and Check. The statistics for each entity category are contained in Table 3.

[Table 3: Statistics of entity categories in the CCKS2019 dataset]

4.2 Evaluation Metrics

We employ common evaluation measures for CNER to evaluate the model’s performance: precision rate (P), recall rate (R), and F1-score (F1). The formulas for each evaluation metric are as follows:

$P = \dfrac{TP}{TP + FP}$ (16)

$R = \dfrac{TP}{TP + FN}$ (17)

$F1 = \dfrac{2 \times P \times R}{P + R}$ (18)

where $TP$ denotes the number of entities correctly predicted by the model, $FP$ denotes the number of spurious entities predicted by the model, and $FN$ denotes the number of true entities that the model failed to predict.
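A minimal sketch of Eqs. (16)-(18), assuming the usual entity-level convention of comparing sets of predicted and gold (span, type) tuples; the paper does not spell out its matching procedure.

```python
# Entity-level precision, recall, and F1 over (start, end, type) tuples.
def prf1(pred_entities: set, gold_entities: set):
    tp = len(pred_entities & gold_entities)   # correctly predicted entities
    fp = len(pred_entities - gold_entities)   # spurious predictions
    fn = len(gold_entities - pred_entities)   # missed gold entities
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf1({(0, 2, 'Disease'), (5, 7, 'Check')}, {(0, 2, 'Disease')}))
```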

4.3 Experiment Setting

The parameter configurations used in this work are detailed in Table 4: a maximum sequence length of 128, a batch size of 16, 25 training epochs, and the AdamW optimizer with a learning rate of 2e-5 and a dropout rate of 0.1.

[Table 4: Parameter settings]
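A brief sketch of the optimizer setup mirroring Table 4; the model below is only a placeholder for the full network of Section 3, so the optimizer and hyperparameter settings are the meaningful part.

```python
# Training configuration matching Table 4 (AdamW, lr = 2e-5, dropout 0.1).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(800, 512), nn.Dropout(p=0.1))  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

MAX_SEQ_LEN, BATCH_SIZE, EPOCHS = 128, 16, 25  # remaining Table 4 settings
```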

4.4 Experiments and Analyses

4.4.1 Models Performance Comparison

This section presents a comparison between our model and other models. The results of the comparison between our model and benchmark models are presented in Table 5. Additionally, we compared our model with the latest models, as shown in Tables 6 and 7. The comparison models we selected include ELMo-lattice-LSTM-CRF [28], ACNN [23], RD-CNN-CRF [5], MKRGCN [29], MUSA-BiLSTM-CRF [24], AT-LatticeLSTM-CRF [21], FT-BERT-BiLSTM-CRF [30], ELMo-ET-CRF [31], RGT-CRF [32].

[Tables 5-7: Performance comparison with benchmark models and with the latest models on CCKS2017 and CCKS2019]

As shown in Table 5, our model outperforms all benchmark models on both the CCKS2017 and CCKS2019 datasets. On the CCKS2017 dataset, our model achieves 92.00% precision, 93.55% recall, and a 92.74% F1 value. On the CCKS2019 dataset, it achieves 89.02% precision, 86.78% recall, and an 87.80% F1 value. Comparison with the benchmark model BERT-BiLSTM-MHA-CRF reveals a maximum F1 difference of 3.31% and a minimum of 2.54%. This indicates that, after fusing multi-feature embeddings and multi-scale local contextual features, the model produces better feature representations than the BERT-only model, validating the effectiveness of incorporating multi-feature embeddings and extracting multi-scale local contextual features. Additionally, as illustrated in Tables 6 and 7, our model also outperforms the latest models. On the CCKS2017 dataset, our model increases the F1 value by 0.86% over the second-highest model and by 3.1% over the lowest. On the CCKS2019 dataset, it increases the F1 value by 1.11% over the second-highest model and by 2.78% over the lowest. These results indicate that incorporating multi-feature embeddings greatly enhances the semantic representation of entities and gives the model a better contextual representation, while using CNNs to extract multi-scale local contextual features compensates for the shortcoming of using BiLSTM alone, which extracts global contextual features while ignoring local ones, thereby enhancing the effectiveness of feature extraction.

4.4.2 The Effect of Different Features on the Model

To investigate the impact of various features on entity recognition performance for CEMR, we incorporated the individual features into the BiLSTM-CRF model and carried out experiments. The impact of the distinct features on the model’s entity recognition performance is displayed in Table 8.

[Table 8: Effect of different features on the model]

Based on the experimental findings in Table 8, incorporating the pinyin, radical, POS, and word boundary features individually into the BiLSTM-CRF model yielded comparable impacts on entity recognition, with only slight distinctions. Thus, these four features have a similar level of influence on the model. We then combined the four feature vectors and fed them into the BiLSTM-CRF model, which achieved the highest precision, recall, and F1 values. Consequently, combining multiple features surpasses the results achieved with any single feature alone, demonstrating the superior usefulness of multi-feature fusion.

4.4.3 Impact of Different Number of Heads on the Model of the MHA

The MHA is commonly utilized in tasks like NER to capture inter-sequence relationships by employing multiple attention heads concurrently. However, the impact of the number of attention heads on model performance has not been thoroughly investigated. We therefore conducted tests on the CCKS2017 and CCKS2019 datasets to investigate how the number of heads affects model performance.

Fig. 2 illustrates that increasing the number of attention heads initially improves model performance, which peaks at 8 heads: more attention heads enhance the model’s capacity to characterize complicated patterns and represent input sequences more effectively. Yet as the number of heads increases further, performance starts to decline, and the additional heads also raise computational complexity. We must therefore balance performance improvement against computational complexity when selecting the number of attention heads.


Figure 2: Impact of different head counts on CCKS2017, CCKS2019

4.4.4 Effectiveness of Multi-Scale CNNs

We performed ablation experiments on the CCKS2017 and CCKS2019 datasets to assess the impact of the multi-scale CNNs module in the proposed model. Table 9 displays the findings of the experiments on both datasets. The results demonstrate that the full model outperforms all ablated variants, and eliminating the multi-scale CNNs decreases performance. This highlights the necessity of multi-scale local features and validates the effectiveness of the multi-scale CNNs.

[Table 9: Ablation results for the multi-scale CNNs on CCKS2017 and CCKS2019]

In addition, when extracting local contextual features with the multi-scale CNNs, we compared multiple combinations of convolutional kernels. Fig. 3 displays the comparison results of several sets of convolutional kernels on CCKS2019. The precision, recall, and F1-score are highest with kernel sizes of 1, 3, and 5, and decrease as the window size increases. This decrease may be attributed to the loss of local contextual information with larger convolutional kernels. Therefore, to extract as many local contextual features as possible, we used convolutional kernels with the three window sizes 1, 3, and 5.


Figure 3: Comparison results of several sets of convolutional kernels on CCKS2019

4.4.5 Separate P, R and F1 for Each Entity Category

For a comprehensive evaluation of our model, Fig. 4 illustrates the precision, recall, and F1 for each entity category individually across the two datasets.


Figure 4: Precision, recall, and F1 for each entity category in CCKS2017 and CCKS2019

5  Conclusion and Future Work

This study introduces a CNER model that incorporates multi-feature fusion and multi-scale local context enhancement. The model combines pinyin, radical, POS, and word boundary features while also leveraging the deep contextual representation of BERT. In addition, the fusion of multi-scale CNNs overcomes the limitation that BiLSTM alone extracts only global contextual features, enhancing feature extraction and realizing a comprehensive understanding of sentence information. Experimental assessments on two public datasets showcase the model’s robust performance.

In future research, we will focus on exploring more effective fusion strategies and incorporating additional information from different dimensions to further improve recognition.

Acknowledgement: None.

Funding Statement: This study was supported by the National Natural Science Foundation of China (61911540482 and 61702324).

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Meijing Li, Runqing Huang; data collection: Runqing Huang; methodology: Meijing Li, Runqing Huang; analysis and interpretation of results: Xianxian Qi, Runqing Huang; writing—original draft: Runqing Huang; writing—review and editing: Meijing Li, Xianxian Qi. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The data are contained within the article. The code is no longer publicly available due to copyright restrictions.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

¹CCKS2017: https://www.sigkg.cn/ccks2017 (accessed on 22/03/2024).

²CCKS2019: https://www.sigkg.cn/ccks2019 (accessed on 04/04/2024).

References

1. A. Dash, S. Darshana, D. K. Yadav, and V. Gupta, “A clinical named entity recognition model using pretrained word embedding and deep neural networks,” Decis. Anal. J., vol. 10, pp. 100426, Mar. 2024. doi: 10.1016/j.dajour.2024.100426. [Google Scholar] [CrossRef]

2. A. Boonstra and M. Broekhuis, “Barriers to the acceptance of electronic medical records by physicians from systematic review to taxonomy and interventions,” BMC Health Serv. Res., vol. 10, no. 1, pp. 231, Dec. 2010. doi: 10.1186/1472-6963-10-231. [Google Scholar] [PubMed] [CrossRef]

3. T. Wang, P. Xuan, Z. Liu, and T. Zhang, “Assistant diagnosis with Chinese electronic medical records based on CNN and BiLSTM with phrase-level and word-level attentions,” BMC Bioinform., vol. 21, no. 1, pp. 230, Dec. 2020. doi: 10.1186/s12859-020-03554-x. [Google Scholar] [PubMed] [CrossRef]

4. J. Lei, B. Tang, X. Lu, K. Gao, M. Jiang and H. Xu, “A comprehensive study of named entity recognition in Chinese clinical text,” J. Am. Med. Inform. Assoc., vol. 21, no. 5, pp. 808–814, Sep. 2014. doi: 10.1136/amiajnl-2013-002381. [Google Scholar] [PubMed] [CrossRef]

5. J. Qiu, Y. Zhou, Q. Wang, T. Ruan, and J. Gao, “Chinese clinical named entity recognition using residual dilated convolutional neural network with conditional random field,” IEEE Trans. Nanobiosci., vol. 18, no. 3, pp. 306–315, Jul. 2019. doi: 10.1109/TNB.2019.2908678. [Google Scholar] [PubMed] [CrossRef]

6. Q. Wang, Y. Zhou, T. Ruan, D. Gao, Y. Xia and P. He, “Incorporating dictionaries into deep neural networks for the Chinese clinical named entity recognition,” J. Biomed. Inform., vol. 92, pp. 103133, Apr. 2019. doi: 10.1016/j.jbi.2019.103133. [Google Scholar] [PubMed] [CrossRef]

7. P. J. Gorinski et al., “Named entity recognition for electronic health records: A comparison of rule-based and machine learning approaches,” arXiv:1903.03985, 2019. [Google Scholar]

8. M. Jiang et al., “A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries,” J. Am. Med. Inform. Assoc., vol. 18, no. 5, pp. 601–606, Sep. 2011. doi: 10.1136/amiajnl-2011-000163. [Google Scholar] [PubMed] [CrossRef]

9. Y. Hu et al., “Improving large language models for clinical named entity recognition via prompt engineering,” J. Am. Med. Inform. Assoc., vol. 13, pp. ocad259, Jan. 2024. doi: 10.1093/jamia/ocad259. [Google Scholar] [PubMed] [CrossRef]

10. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. doi: 10.1162/neco.1997.9.8.1735. [Google Scholar] [PubMed] [CrossRef]

11. Z. Huang, W. Xu, and K. Yu, “Bidirectional LSTM-CRF models for sequence tagging,” arXiv:1508.01991, 2015. [Google Scholar]

12. J. Yin, S. Luo, Z. Wu, and L. Pan, “Chinese named entity recognition with character level BLSTM and soft attention model,” J. Beijing Instit. Technol., vol. 29, no. 1, pp. 1520–1532, 2020. doi: 10.1109/TASLP.2020.2994436. [Google Scholar] [CrossRef]

13. Z. Tang, B. Wan, and L. Yang, “Word-character graph convolution network for Chinese named entity recognition,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 28, pp. 1520–1532, 2020. doi: 10.1109/TASLP.2020.2994436. [Google Scholar] [CrossRef]

14. Z. Zhu, J. Li, Q. Zhao, and F. Akhtar, “A dictionary-guided attention network for biomedical named entity recognition in Chinese electronic medical records,” Expert. Syst. Appl., vol. 231, pp. 120709, Nov. 2023. doi: 10.1016/j.eswa.2023.120709. [Google Scholar] [CrossRef]

15. M. Awad and R. Khanna, “Hidden Markov model,” in Efficient Learning Machines. Berkeley, CA, USA: Apress, 2015, pp. 81–104. [Google Scholar]

16. J. D. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Int. Conf. Mach. Learn., San Francisco, CA, USA, 2001, pp. 282–289. [Google Scholar]

17. T. Wang et al., “A hybrid model based on deep convolutional network for medical named entity recognition,” J. Electr. Comput. Eng., vol. 2023, pp. 1–11, May 2023. doi: 10.1155/2023/8969144. [Google Scholar] [CrossRef]

18. Y. Zhang and J. Yang, “Chinese NER using lattice LSTM,” in Proc. Annu. Meet. Assoc. Comput. Linguist., Melbourne, Australia, 2018, pp. 1554–1564. [Google Scholar]

19. F. Wu, J. Liu, C. Wu, Y. Huang, and X. Xie, “Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation,” in Proc. World Wide Web Conf., San Francisco, CA, USA, 2019, pp. 3342–3348. [Google Scholar]

20. K. Xue, Y. Zhou, Z. Ma, T. Ruan, H. Zhang and P. He, “Fine-tuning BERT for joint entity and relation extraction in Chinese medical text,” in 2019 IEEE Int. Conf. Bioinform. Biomed., San Diego, CA, USA, 2019, pp. 892–897. [Google Scholar]

21. S. Zhao, Z. Cai, H. Chen, Y. Wang, F. Liu and A. Liu, “Adversarial training based lattice LSTM for Chinese clinical named entity recognition,” J. Biomed. Inform., vol. 99, pp. 103290, Nov. 2019. doi: 10.1016/j.jbi.2019.103290. [Google Scholar] [PubMed] [CrossRef]

22. D. Li, J. Long, J. Qu, and X. Zhang, “Chinese clinical named entity recognition with ALBERT and MHA mechanism,” Evid. Based Complement. Alternat. Med., vol. 2022, pp. 1–9, May 2022. doi: 10.1155/2022/2056039. [Google Scholar] [PubMed] [CrossRef]

23. J. Kong, L. Zhang, M. Jiang, and T. Liu, “Incorporating multi-level CNN and attention mechanism for Chinese clinical named entity recognition,” J. Biomed. Inform., vol. 116, pp. 103737, Apr. 2021. doi: 10.1016/j.jbi.2021.103737. [Google Scholar] [PubMed] [CrossRef]

24. Y. An, X. Xia, X. Chen, F. X. Wu, and J. Wang, “Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF,” Artif. Intell. Med., vol. 127, no. C, pp. 102282, May 2022. doi: 10.1016/j.artmed.2022.102282. [Google Scholar] [PubMed] [CrossRef]

25. S. Guo, W. Yang, L. Han, X. Song, and G. Wang, “A multi-layer soft lattice based model for Chinese clinical named entity recognition,” BMC Med. Inform. Decis. Mak., vol. 22, no. 1, pp. 201, Dec. 2022. doi: 10.1186/s12911-022-01924-4. [Google Scholar] [PubMed] [CrossRef]

26. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Conf. N. Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Minneapolis, MN, USA, 2019, pp. 4171–4186. [Google Scholar]

27. A. Vaswani et al., “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 6000–6010. [Google Scholar]

28. Y. Li et al., “Chinese clinical named entity recognition in electronic medical records: Development of a Lattice long short-term memory model with contextualized character representations,” JMIR Med. Inform., vol. 8, no. 9, pp. e19848, Sep. 2020. doi: 10.2196/19848. [Google Scholar] [PubMed] [CrossRef]

29. Y. Xiong et al., “Leveraging multi-source knowledge for Chinese clinical named entity recognition via relational graph convolutional network,” J. Biomed. Inform., vol. 128, pp. 104035, Apr. 2022. doi: 10.1016/j.jbi.2022.104035. [Google Scholar] [PubMed] [CrossRef]

30. X. Li, H. Zhang, and X. H. Zhou, “Chinese clinical named entity recognition with variant neural structures based on BERT methods,” J. Biomed. Inform., vol. 107, pp. 103422, Jul. 2020. doi: 10.1016/j.jbi.2020.103422. [Google Scholar] [PubMed] [CrossRef]

31. Q. Wan et al., “A self-attention based neural architecture for Chinese medical named entity recognition,” MBE, vol. 17, no. 4, pp. 3498–3511, 2020. doi: 10.3934/mbe.2020197. [Google Scholar] [PubMed] [CrossRef]

32. J. Li, R. Liu, C. Chen, S. Zhou, X. Shang and Y. Wang, “An RG-FLAT-CRF model for named entity recognition of Chinese electronic clinical records,” Electronics, vol. 11, no. 8, pp. 1282, Apr. 2022. doi: 10.3390/electronics11081282. [Google Scholar] [CrossRef]




Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.