GNEG: Graph-Based Negative Sampling for word2vec

Zheng Zhang, Pierre Zweigenbaum


Abstract
Negative sampling is an important component of word2vec for learning distributed word representations. We hypothesize that taking into account global, corpus-level information and generating a different noise distribution for each target word better satisfies the requirements of negative examples for each training word than the original frequency-based distribution. For this purpose, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms such as random walks to them. We test this hypothesis through a set of experiments whose results show that our approach improves performance on the word analogy task by about 5% and on word similarity tasks by about 1% compared to the skip-gram negative sampling baseline.
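The idea in the abstract (co-occurrence statistics, a random walk over the resulting word graph, and one noise distribution per target word) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the number of walk steps, and the uniform fallback for isolated words are all assumptions made for illustration, and the paper's actual weighting and filtering choices are not reproduced here.

```python
import numpy as np


def cooccurrence_matrix(sentences, vocab, window=2):
    """Count how often each vocabulary word appears within `window` tokens of another."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sentence in sentences:
        ids = [index[w] for w in sentence if w in index]
        for pos, target in enumerate(ids):
            lo = max(0, pos - window)
            hi = min(len(ids), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    counts[target, ids[ctx_pos]] += 1
    return counts


def noise_distributions(counts, steps=2):
    """Row i is a noise distribution for target word i (hypothetical construction).

    The co-occurrence counts are row-normalized into a transition matrix,
    and a `steps`-step random walk is taken by raising it to that power.
    """
    n = counts.shape[0]
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: words with no co-occurrences get a uniform row.
    transition = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n)
    return np.linalg.matrix_power(transition, steps)


def sample_negatives(distributions, target_id, k, rng):
    """Draw k negative word ids from the target word's own noise distribution."""
    n = distributions.shape[0]
    return rng.choice(n, size=k, p=distributions[target_id])


if __name__ == "__main__":
    toy_corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    toy_vocab = ["the", "cat", "sat", "dog"]
    dist = noise_distributions(cooccurrence_matrix(toy_corpus, toy_vocab, window=1))
    rng = np.random.default_rng(0)
    print(sample_negatives(dist, toy_vocab.index("cat"), 5, rng))
```

In this sketch each row of the walk matrix stays a valid probability distribution because the product of row-stochastic matrices is row-stochastic. A real system would also need to decide whether the target word itself may be drawn as a negative, which this sketch does not prevent.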
Anthology ID: P18-2090
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 566–571
URL: https://www.aclweb.org/anthology/P18-2090
Poster: P18-2090.Poster.pdf