Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists

Taraka Rama


Abstract
We present and evaluate two similarity dependent Chinese Restaurant Process (sd-CRP) algorithms on the task of automated cognate detection. The sd-CRP clustering algorithms do not require any predefined threshold for detecting cognate sets in a multilingual word list. We evaluate the algorithms on six language families (more than 750 languages) and find that both sd-CRP variants perform as well as InfoMap and better than UPGMA at inferring cognate clusters. The algorithms presented in this paper are family-agnostic and can be applied to any linguistically under-studied language family.
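The abstract does not describe the model itself. The sketch below shows the general customer-link idea behind distance-dependent CRPs, of which an sd-CRP is presumably a variant. It is not the paper's algorithm. Each word links either to another word, with probability proportional to their similarity, or to itself, with probability proportional to a concentration parameter `alpha`. The clusters are the connected components of these links, so no similarity cutoff has to be chosen. The names `similarity`, `sample_links` and `clusters_from_links` are hypothetical. The `difflib` ratio is an assumed stand-in for whatever word-similarity score the paper actually uses.

```python
import difflib
import random
from typing import Callable, Dict, List, Optional, Sequence


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a word-similarity score in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def sample_links(words: Sequence[str],
                 sim: Callable[[str, str], float] = similarity,
                 alpha: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Draw one link per word from a similarity-dependent CRP-style prior.

    Word i links to word j != i with probability proportional to
    sim(words[i], words[j]), or to itself with probability proportional
    to alpha (assumed > 0 so every row of weights has a positive total).
    """
    rng = rng if rng is not None else random.Random(0)
    n = len(words)
    links = []
    for i, wi in enumerate(words):
        weights = [alpha if j == i else max(sim(wi, wj), 0.0)
                   for j, wj in enumerate(words)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def clusters_from_links(links: Sequence[int]) -> List[int]:
    """Turn links into cluster labels: one cluster per connected component."""
    parent = list(range(len(links)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(links))]


words = ["hand", "hant", "mano", "main"]
labels = clusters_from_links(sample_links(words, alpha=0.5))
print(labels)
```

A smaller `alpha` makes self-links (new clusters) rarer, so more words end up linked together. In a full model each link would be resampled given the data likelihood, for example by Gibbs sampling. This sketch covers only the prior draw and the mapping from links to clusters.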
Anthology ID: K18-1027
Volume: Proceedings of the 22nd Conference on Computational Natural Language Learning
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Anna Korhonen, Ivan Titov
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 271–281
URL: https://aclanthology.org/K18-1027
DOI: 10.18653/v1/K18-1027
Cite (ACL): Taraka Rama. 2018. Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 271–281, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists (Rama, CoNLL 2018)
PDF: https://aclanthology.org/K18-1027.pdf