Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons

Hwiyeol Jo, Stanley Jungkyu Choi


Abstract
We propose a post-processing method, which we call extrofitting, that enriches not only word representations but also their vector space using semantic lexicons. The method consists of three steps: (i) expanding every word vector by one or more dimensions and filling them with the vector's representative value; (ii) transferring semantic knowledge by averaging the representative values of each group of synonyms and filling the expanded dimension(s) with that average; together, these two steps bring the representations of synonyms closer together; and (iii) projecting the vector space with Linear Discriminant Analysis, which eliminates the expanded dimension(s) using the semantic knowledge. In experiments with GloVe, we find that our method outperforms Faruqui’s retrofitting on some word similarity tasks. We also report further analysis of our method with respect to word vector dimension, vocabulary size, and other well-known pretrained word vectors (e.g., Word2Vec, Fasttext).
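The three steps described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' released code (HwiyeolJo/Extrofitting): it assumes the representative value is the mean of a vector's elements, that each synonym group forms one LDA class while every other word is its own singleton class, and it solves LDA directly as a regularised generalised eigenproblem. The function names (`expand`, `transfer`, `class_labels`, `lda_projection`, `extrofit`) and the `eps` parameter are hypothetical.

```python
import numpy as np


def expand(vectors, n_extra=1):
    """Step (i): append n_extra dimensions filled with each vector's
    representative value (ASSUMPTION: the mean of its elements)."""
    rep = vectors.mean(axis=1, keepdims=True)                 # shape (V, 1)
    return np.hstack([vectors, np.repeat(rep, n_extra, axis=1)])


def transfer(expanded, d, synonym_groups):
    """Step (ii): overwrite the expanded dimensions of every synonym group
    with the group's average representative value."""
    out = expanded.copy()
    for group in synonym_groups:                              # later groups override earlier ones
        out[group, d:] = expanded[group, d:].mean(axis=0)
    return out


def class_labels(n_words, synonym_groups):
    """ASSUMPTION: one LDA class per synonym group, singleton class otherwise."""
    labels = np.arange(n_words)
    for g_id, group in enumerate(synonym_groups):
        labels[group] = n_words + g_id
    return labels


def lda_projection(x, labels, k, eps=1e-6):
    """Step (iii): top-k LDA directions, solving S_b w = lambda S_w w."""
    mean = x.mean(axis=0)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))                                # within-class scatter
    s_b = np.zeros((dim, dim))                                # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)
        diff = (mc - mean)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    # Whiten with (S_w + eps*I)^(-1/2), then diagonalise the whitened S_b.
    evals, evecs = np.linalg.eigh(s_w + eps * np.eye(dim))
    whiten = evecs / np.sqrt(evals)                           # evecs @ diag(1/sqrt(evals))
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ s_b @ whiten)
    order = np.argsort(b_vals)[::-1][:k]                      # largest eigenvalues first
    return whiten @ b_vecs[:, order]                          # (dim, k) projection matrix


def extrofit(vectors, synonym_groups, n_extra=1):
    """Expand, transfer, then project (centred) back to the original dimension d."""
    n_words, d = vectors.shape
    expanded = transfer(expand(vectors, n_extra), d, synonym_groups)
    labels = class_labels(n_words, synonym_groups)
    w = lda_projection(expanded, labels, k=d)
    return (expanded - expanded.mean(axis=0)) @ w


rng = np.random.default_rng(0)
vecs = rng.normal(size=(8, 3))                                # toy vocabulary: 8 words, d = 3
groups = [[0, 1], [2, 3, 4]]                                  # hypothetical synonym lexicon
out = extrofit(vecs, groups)
print(out.shape)  # (8, 3)
```

Whitening with S_w^(-1/2) keeps both eigenproblems symmetric, so `np.linalg.eigh` returns real results. The small `eps` term is required because the transfer step makes every group's members identical along the expanded dimension(s), so the within-class scatter is exactly zero there and S_w is singular.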
Anthology ID: W18-3003
Volume: Proceedings of the Third Workshop on Representation Learning for NLP
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Isabelle Augenstein, Kris Cao, He He, Felix Hill, Spandana Gella, Jamie Kiros, Hongyuan Mei, Dipendra Misra
Venue: RepL4NLP
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 24–29
URL: https://aclanthology.org/W18-3003
DOI: 10.18653/v1/W18-3003
Cite (ACL): Hwiyeol Jo and Stanley Jungkyu Choi. 2018. Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons. In Proceedings of the Third Workshop on Representation Learning for NLP, pages 24–29, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons (Jo & Choi, RepL4NLP 2018)
PDF: https://aclanthology.org/W18-3003.pdf
Code: HwiyeolJo/Extrofitting
Data: FrameNet