Open Access
ARTICLE
Graph-Based Chinese Word Sense Disambiguation with Multi-Knowledge Integration
School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China.
School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China.
Department of Computing, Macquarie University, Sydney, NSW 2109, Australia.
Centre for Audio, Acoustics and Vibration, University of Technology Sydney, Sydney, NSW 2006, Australia.
School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China.
Jinan Intellectual Property Information Center, Jinan, 250099, China.
* Corresponding Author: Wenpeng Lu.
Computers, Materials & Continua 2019, 61(1), 197-212. https://doi.org/10.32604/cmc.2019.06068
Abstract
Word sense disambiguation (WSD) is a fundamental task in natural language processing that directly affects the performance of downstream applications. However, WSD is very challenging because of the knowledge bottleneck: it is hard to acquire abundant disambiguation knowledge, especially for Chinese. To address this problem, this paper proposes a graph-based Chinese WSD method with multi-knowledge integration. In particular, it designs a graph model that combines various Chinese and English knowledge resources through word sense mapping. First, the content words in an ambiguous Chinese sentence are extracted and mapped to English words with BabelNet. Then, English word similarity is computed from English word embeddings and a knowledge base, while Chinese word similarity is computed separately from Chinese word embeddings and from HowNet. The weights of the three kinds of word similarity are optimized with a simulated annealing algorithm to obtain an overall similarity, which is used to construct a disambiguation graph. A graph scoring algorithm then evaluates the importance of each word sense node and selects the correct senses of the ambiguous words. Extensive experimental results on a SemEval dataset show that the proposed WSD method significantly outperforms the baselines.
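The graph construction and scoring steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sense identifiers, the toy similarity values, and the fixed weights are all hypothetical, and weighted degree is assumed here as a stand-in for the paper's graph scoring algorithm. The simulated annealing step that tunes the weights is omitted; the weights are simply given.

```python
from itertools import combinations


def combined_similarity(sims, weights):
    """Weighted sum of several similarity scores for one pair of senses."""
    return sum(w * s for w, s in zip(weights, sims))


def disambiguate(senses_by_word, similarity, weights):
    """Pick, for each word, the sense node with the highest weighted degree.

    senses_by_word: dict mapping a word to its candidate sense ids
                    (sense ids are assumed unique across words)
    similarity:     function (sense_a, sense_b) -> tuple of similarity scores
    weights:        one weight per similarity score (assumed already tuned)
    """
    score = {s: 0.0 for senses in senses_by_word.values() for s in senses}
    # Connect senses of different words; the edge weight is the overall similarity.
    for w1, w2 in combinations(senses_by_word, 2):
        for a in senses_by_word[w1]:
            for b in senses_by_word[w2]:
                edge = combined_similarity(similarity(a, b), weights)
                score[a] += edge
                score[b] += edge
    return {w: max(senses, key=score.__getitem__)
            for w, senses in senses_by_word.items()}


# Toy data (hypothetical): three similarity scores per sense pair.
TOY_SIMS = {
    frozenset(("bank#finance", "money#cash")): (0.9, 0.8, 0.7),
    frozenset(("bank#river", "money#cash")): (0.1, 0.2, 0.1),
}


def toy_similarity(a, b):
    """Look up the toy scores; unknown pairs get zero similarity."""
    return TOY_SIMS.get(frozenset((a, b)), (0.0, 0.0, 0.0))


toy_senses = {"bank": ["bank#river", "bank#finance"], "money": ["money#cash"]}
result = disambiguate(toy_senses, toy_similarity, (0.4, 0.3, 0.3))
print(result)
```

In this toy setting the financial sense of "bank" receives the larger total edge weight because it is more similar to the only sense of "money" under all three measures, so it is the one selected.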
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.