Open Access
ARTICLE
Chinese Q&A Community Medical Entity Recognition with Character-Level Features and Self-Attention Mechanism
1 School of Management, Nanjing University of Posts & Telecommunications, Nanjing, 210023, China
2 Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing, 210023, China
3 School of Information Management, Nanjing University, Nanjing, 210093, China
4 School of Business and Economics, Loughborough University, Leicestershire, LE11 3TU, United Kingdom
5 School of Basic Medical Sciences, Nanjing Medical University, Nanjing, 210029, China
* Corresponding Author: Xiaoyan Li. Email:
Intelligent Automation & Soft Computing 2021, 29(1), 55-72. https://doi.org/10.32604/iasc.2021.017021
Received 18 January 2021; Accepted 15 March 2021; Issue published 12 May 2021
Abstract
With the rapid development of the Internet, the medical Q&A community has become an important channel through which people obtain and share medical and health knowledge. Online medical entity recognition (OMER), the foundation of medical and health information extraction, has attracted extensive attention from researchers in recent years. To advance research on Chinese OMER, this paper proposes the LSTM-Att-Med model, which captures more external semantic features and important information. First, Word2vec is used to generate character-level vectors with semantic features from unlabeled corpora in the medical domain and the open domain, respectively. Then, the two sets of character-level vectors are embedded into BiLSTM-CRF as features to construct the LSTM-Wiki and LSTM-Med models. Finally, a Self-Attention mechanism is introduced into the LSTM-Med model, and the performance of the resulting model is validated on self-labeled data. A 10-fold cross-validation experiment shows that LSTM-Att-Med, with the Self-Attention mechanism, achieves the best performance, reaching an F-value of up to 91.66%, which is 0.72% higher than that of BiLSTM-CRF. In addition, the experimental results show that the F-value improvement achieved by LSTM-Att-Med is not consistent across corpora. The paper also analyzes the recognition performance and the error results for different types of medical entities.
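As a rough illustration of the Self-Attention step mentioned in the abstract, the sketch below applies scaled dot-product self-attention to a sequence of per-character state vectors. It is only a minimal sketch under stated assumptions: the abstract does not specify the attention variant, so the use of scaled dot-product scoring, the absence of learned projection matrices, and the names `softmax`, `self_attention`, and `toy_states` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with queries = keys = values = states.

    Assumption: no learned projections are applied; a trained model would
    normally project the states before scoring.

    states: list of equal-length float vectors, one per character.
    Returns (outputs, weights); weights[i][j] is how strongly character i
    attends to character j.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in states:
        # Similarity of this character to every character in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all character vectors, one output dimension at a time.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, states)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical 3-character sequence with 2-dimensional state vectors.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention = self_attention(toy_states)
```

In a model such as the one described, the input vectors would presumably be BiLSTM hidden states and the attended outputs would feed the CRF layer; that wiring is likewise an assumption, not a detail given in the abstract.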
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.