Open Access
ARTICLE
Short Text Entity Disambiguation Algorithm Based on Multi-Word Vector Ensemble
1 College of Computer Science and Information Technology, Central South University of Forestry and Technology, Changsha, 410004, China
2 Department of Mathematics and Computer Science, Northeastern State University, OK, 74464, USA
* Corresponding Author: Xuyu Xiang. Email:
Intelligent Automation & Soft Computing 2021, 30(1), 227-241. https://doi.org/10.32604/iasc.2021.017648
Received 05 February 2021; Accepted 16 April 2021; Issue published 26 July 2021
Abstract
With the rapid development of network media, short text has become the main carrier of information dissemination because it spreads entity-related information quickly. However, the lack of context in short text easily leads to ambiguity, which greatly reduces the efficiency of obtaining information and seriously affects the user experience, especially in the financial field. This paper proposes an entity disambiguation algorithm based on multi-word vector ensemble and decision to eliminate entity ambiguity and purify text information in information processing. First, we integrate several unsupervised pre-trained word vector models as vector embeddings, according to the characteristics of each model. Next, we use the classic architecture of long short-term memory (LSTM) combined with a convolutional neural network (CNN) to fine-tune pre-trained Chinese word vectors such as BERT and to integrate the entity recognition outputs. We then build the knowledge base and introduce the focal loss function on top of a CNN binary classifier to improve entity disambiguation. Experimental results show that the algorithm outperforms traditional entity disambiguation algorithms based on a single word vector. The method accurately locates the entity to be disambiguated and achieves good disambiguation accuracy.
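As an illustration of the focal loss named in the abstract, the following is a minimal sketch of its commonly used binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The abstract does not state the exact formulation or hyperparameters used in the paper, so the function name `binary_focal_loss` and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration only.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch, not the paper's code).

    probs  : predicted probabilities that a candidate entity is the correct match
    labels : ground-truth labels, 1 for a correct candidate and 0 otherwise
    gamma  : focusing parameter (assumed value); larger values down-weight easy examples
    alpha  : weight of the positive class (assumed value)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct prediction contributes far less than an uncertain one.
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
print(easy < hard)
```

With `gamma=0` and `alpha=1`, the expression reduces to ordinary cross-entropy on positive examples, which is why focal loss is often described as a reweighted cross-entropy that concentrates training on hard, frequently misclassified candidates.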
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.