Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus

Erwan Moreau, Carl Vogel


Anthology ID:
L18-1180
Volume:
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Month:
May
Year:
2018
Address:
Miyazaki, Japan
Editors:
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
https://aclanthology.org/L18-1180
Cite (ACL):
Erwan Moreau and Carl Vogel. 2018. Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Cite (Informal):
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus (Moreau & Vogel, LREC 2018)
PDF:
https://aclanthology.org/L18-1180.pdf