Ensemble Methods for Native Language Identification

Sophia Chan, Maryam Honari Jahromi, Benjamin Benetti, Aazim Lakhani, Alona Fyshe


Abstract
Our team, Uvic-NLP, explores and evaluates a variety of lexical features for Native Language Identification (NLI) within the framework of ensemble methods. Using a subset of the highest-performing features, we train Support Vector Machines (SVM) and Fully Connected Neural Networks (FCNN) as base classifiers, and test different methods for combining their outputs. Restricting our scope to the closed essay track of the NLI Shared Task 2017, we find that our best SVM ensemble achieves an F1 score of 0.8730 on the test set.
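The abstract does not say which combination methods were tested, so the following is only a minimal sketch of two common ways to combine base classifier outputs: plurality voting over hard labels and averaging of per-class probabilities. The function names, the language codes, and the scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(predictions):
    """Combine hard labels from several base classifiers for one document.

    Ties are broken in favour of the label predicted by the earliest classifier.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def mean_probability(prob_dicts):
    """Average per-class probabilities across classifiers and return the top class."""
    labels = sorted({label for probs in prob_dicts for label in probs})
    averaged = {
        label: sum(probs.get(label, 0.0) for probs in prob_dicts) / len(prob_dicts)
        for label in labels
    }
    return max(labels, key=lambda label: averaged[label])


# Hypothetical outputs of three base classifiers for a single essay.
hard_labels = ["GER", "FRE", "GER"]
soft_scores = [
    {"GER": 0.60, "FRE": 0.40},
    {"GER": 0.30, "FRE": 0.70},
    {"GER": 0.55, "FRE": 0.45},
]

print(plurality_vote(hard_labels))    # majority of hard labels
print(mean_probability(soft_scores))  # class with the highest averaged score
```

With these made-up inputs the two rules disagree: voting picks the label that two of the three classifiers chose, while averaging is swayed by the one classifier that is much more confident in the other label. That kind of disagreement is one reason to compare combination methods empirically.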
Anthology ID:
W17-5023
Volume:
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Joel Tetreault, Jill Burstein, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
217–223
URL:
https://aclanthology.org/W17-5023
DOI:
10.18653/v1/W17-5023
Cite (ACL):
Sophia Chan, Maryam Honari Jahromi, Benjamin Benetti, Aazim Lakhani, and Alona Fyshe. 2017. Ensemble Methods for Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 217–223, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Ensemble Methods for Native Language Identification (Chan et al., BEA 2017)
PDF:
https://aclanthology.org/W17-5023.pdf