Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation

Burcu Can, Suresh Manandhar


Abstract
This article presents a probabilistic hierarchical clustering model for morphological segmentation. In contrast to existing approaches to morphology learning, our method learns the hierarchical organization of word morphology as a collection of tree structured paradigms. The model is fully unsupervised and based on the hierarchical Dirichlet process. Tree hierarchies and their corresponding morphological paradigms are learned simultaneously. Our model is evaluated on Morpho Challenge and shows competitive performance compared with state-of-the-art unsupervised morphological segmentation systems. Although we apply this model to morphological segmentation, the model itself can also be used for hierarchical clustering of other types of data.
Anthology ID: J18-2005
Volume: Computational Linguistics, Volume 44, Issue 2 - June 2018
Month: June
Year: 2018
Address: Cambridge, MA
Venue: CL
Publisher: MIT Press
Pages: 349–374
URL: https://aclanthology.org/J18-2005
DOI: 10.1162/COLI_a_00318
Cite (ACL): Burcu Can and Suresh Manandhar. 2018. Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation. Computational Linguistics, 44(2):349–374.
Cite (Informal): Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation (Can & Manandhar, CL 2018)
PDF: https://aclanthology.org/J18-2005.pdf
Code: burcu-can/TreeStructuredDP