Dimitar Doikoff


2004

A Language Resources Infrastructure for Bulgarian
Kiril Simov | Petya Osenova | Sia Kolkovska | Elisaveta Balabanova | Dimitar Doikoff
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper describes the infrastructure of a basic language resources set for Bulgarian in the context of the BLARK initiative requirements. We focus on the treebanking task as a trigger for the compilation of basic language resources. Two strategies have been applied in this respect: (1) implementing the main pre-processing modules before the treebank compilation and (2) creating more elaborate types of resources in parallel with the treebank compilation. The description of language resources within the BulTreeBank project is divided into two parts: language technology, which includes tokenization, a morphosyntactic analyzer, morphosyntactic disambiguation, and partial grammars; and language data, which includes the layers of the BulTreeBank corpus and the variety of lexicons. The advantages of our approach to a less-spoken language (like Bulgarian) are as follows: it triggers the creation of the basic set of language resources that are lacking for certain languages, and it raises the question of how language resources are created.

2002

Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank
Kiril Simov | Petya Osenova | Milena Slavcheva | Sia Kolkovska | Elisaveta Balabanova | Dimitar Doikoff | Krassimira Ivanova | Alexander Simov | Milen Kouylekov
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)