Batch IS NOT Heavy: Learning Word Representations From All Samples

Xin Xin, Fajie Yuan, Xiangnan He, Joemon M. Jose


Abstract
Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learning word representations. However, sampling methods are known to be biased, especially when the sampling distribution deviates from the true data distribution. Moreover, SGD suffers from dramatic fluctuation because it updates on only one sample at a time. In this work, we propose AllVec, which uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remains at the same level as SGD's, determined by the number of positive samples rather than by all samples. We evaluate AllVec on several benchmark tasks. Experiments show that AllVec outperforms sampling-based SGD methods with comparable efficiency, especially for small training corpora.
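
The efficiency claim above rests on decoupling the loss term that ranges over all (word, context) pairs. The sketch below (not the authors' code) illustrates this kind of decoupling under an assumed squared loss with a per-context negative weight alpha[c] and a zero target for unobserved pairs; the function name, weighting scheme, and parameters are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch of an all-sample batch gradient whose cost is dominated by
# the positive pairs, not by the full |V| x |V| pair space. Assumptions:
# squared loss, constant weight on positive pairs, per-context weight alpha[c]
# and target 0 on the all-pair term. Not the authors' implementation.
import numpy as np

def full_batch_gradient_U(U, V, pos_pairs, pos_vals, pos_weight, alpha):
    """Gradient w.r.t. U of
        L = sum_{(w,c) in S} pos_weight * (pos_vals_{wc} - u_w . v_c)^2
          + sum_{w} sum_{c} alpha[c] * (u_w . v_c)^2
    computed in O(|S| k + |V| k^2) instead of O(|V|^2 k).
    pos_pairs: int array of shape (|S|, 2) with (word, context) indices.
    """
    grad = np.zeros_like(U)

    # All-pair part: collapse the sum over contexts into a k x k cache
    # Q = V^T diag(alpha) V, so the gradient is simply 2 * U @ Q.
    Q = V.T @ (alpha[:, None] * V)        # O(|V| k^2)
    grad += 2.0 * U @ Q                   # O(|V| k^2)

    # Positive part: iterates only over observed pairs, O(|S| k).
    w_idx, c_idx = pos_pairs[:, 0], pos_pairs[:, 1]
    pred = np.einsum('ij,ij->i', U[w_idx], V[c_idx])   # row-wise dot products
    resid = pred - pos_vals
    np.add.at(grad, w_idx, (2.0 * pos_weight * resid)[:, None] * V[c_idx])

    return grad

The same cache-and-collapse step applies symmetrically to the gradient for V, which is what keeps a full-batch epoch at roughly the cost of one pass over the positive samples.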
Anthology ID:
P18-1172
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1853–1862
URL:
https://aclanthology.org/P18-1172
DOI:
10.18653/v1/P18-1172
Cite (ACL):
Xin Xin, Fajie Yuan, Xiangnan He, and Joemon M. Jose. 2018. Batch IS NOT Heavy: Learning Word Representations From All Samples. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1853–1862, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Batch IS NOT Heavy: Learning Word Representations From All Samples (Xin et al., ACL 2018)
PDF:
https://aclanthology.org/P18-1172.pdf
Video:
https://aclanthology.org/P18-1172.mp4