Collective Entity Disambiguation with Structured Gradient Tree Boosting

Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman


Abstract
We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most works limit gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, our work is the first to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system produces globally optimized entity assignments for the mentions in a document. Exact inference is prohibitively expensive for our globally normalized model. To solve this problem, we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of standard beam search. BiBSG uses global information from both past and future decisions to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results. Specifically, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset.
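The abstract describes BiBSG as a variant of standard beam search over entity assignments scored by local features plus global features over previous decisions. As a hedged illustration of that baseline only, the sketch below runs plain left-to-right beam search. The scoring functions `toy_local` and `toy_global`, the entity names, and all score values are hypothetical stand-ins for the SGTB tree-ensemble score; the sketch does not use future information or a gold path, so it is not BiBSG itself.

```python
from typing import Callable, List, Sequence, Tuple

Assignment = Tuple[str, ...]


def beam_search(
    candidates: Sequence[Sequence[str]],
    local_score: Callable[[int, str], float],
    global_score: Callable[[Assignment, str], float],
    beam_size: int = 4,
) -> Tuple[Assignment, float]:
    """Left-to-right beam search over joint entity assignments.

    candidates[i] lists the candidate entities for mention i.
    local_score(i, e) scores entity e for mention i in isolation.
    global_score(prefix, e) scores e against the entities already
    assigned to earlier mentions (past decisions only).
    """
    beam: List[Tuple[float, Assignment]] = [(0.0, ())]
    for i, cands in enumerate(candidates):
        expanded: List[Tuple[float, Assignment]] = []
        for score, prefix in beam:
            for e in cands:
                s = score + local_score(i, e) + global_score(prefix, e)
                expanded.append((s, prefix + (e,)))
        # Keep only the beam_size highest-scoring partial assignments.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_size]
    best_score, best = beam[0]
    return best, best_score


# Hypothetical toy example: made-up candidates and scores, for illustration only.
CANDIDATES = [["Paris_France", "Paris_Hilton"], ["France", "France_national_team"]]
LOCAL = {
    (0, "Paris_France"): 1.0,
    (0, "Paris_Hilton"): 1.2,
    (1, "France"): 1.0,
    (1, "France_national_team"): 0.5,
}
RELATED = {("Paris_France", "France")}


def toy_local(i: int, e: str) -> float:
    # Hypothetical local score: a lookup table per (mention, entity).
    return LOCAL[(i, e)]


def toy_global(prefix: Assignment, e: str) -> float:
    # Hypothetical global score: +1 for each earlier entity related to e.
    return sum(1.0 for p in prefix if (p, e) in RELATED)


if __name__ == "__main__":
    print(beam_search(CANDIDATES, toy_local, toy_global, beam_size=2))
    print(beam_search(CANDIDATES, toy_local, toy_global, beam_size=1))
```

With `beam_size=2`, the coherence bonus lets the search recover `("Paris_France", "France")`. With `beam_size=1`, the locally stronger `Paris_Hilton` is committed before the second mention is seen. This early-commitment error is the kind of problem that, per the abstract, BiBSG's use of information from both past and future is meant to reduce.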
Anthology ID:
N18-1071
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
777–786
URL:
https://aclanthology.org/N18-1071
DOI:
10.18653/v1/N18-1071
Cite (ACL):
Yi Yang, Ozan Irsoy, and Kazi Shefaet Rahman. 2018. Collective Entity Disambiguation with Structured Gradient Tree Boosting. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 777–786, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Collective Entity Disambiguation with Structured Gradient Tree Boosting (Yang et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1071.pdf
Code:
bloomberg/sgtb
Data:
AIDA CoNLL-YAGO, CoNLL 2003