Open Access
ARTICLE
New Generation Model of Word Vector Representation Based on CBOW or Skip-Gram
HPCL, School of Computer Science, National University of Defense Technology, Changsha, 410073, China.
Department of Computer Science, University of North Carolina at Chapel Hill, 27599, USA.
* Corresponding Author: Zeyu Xiong. Email: .
Computers, Materials & Continua 2019, 60(1), 259-273. https://doi.org/10.32604/cmc.2019.05155
Abstract
Word vector representation is widely used in natural language processing tasks. Most word vectors are generated based on probability model, its bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. Recently, neural-network language models CBOW and Skip-Gram are developed as continuous-space language models for words representation in high dimensional real-valued vectors. These vector representations have recently demonstrated promising results in various NLP tasks because of their superiority in capturing syntactic and contextual regularities in language. In this paper, we propose a new strategy based on optimization in contiguous subset of documents and regression method in combination of vectors, two of new models CBOW-OR and SkipGram-OR for word vector learning are established. Experimental results show that for some words-pair, the cosine distance obtained by the CBOW-OR (or SkipGram-OR) model is generally larger and is more reasonable than CBOW (or Skip-Gram), the vector space for Skip-Gram and SkipGram-OR keep the same structure property in Euclidean distance, and the model SkipGram-OR keeps higher performance for retrieval the relative words-pair as a whole. Both CBOW-OR and SkipGram-OR model are inherent parallel models and can be expected to apply in large-scale information processing.Keywords
Cite This Article
Citations
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.