The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic

Amal Al-Saif, Katja Markert


Abstract
We present the first effort towards producing an Arabic Discourse Treebank,a news corpus where all discourse connectives are identified and annotated with the discourse relations they convey as well as with the two arguments they relate. We discuss our collection of Arabic discourse connectives as well as principles for identifying and annotating them in context, taking into account properties specific to Arabic. In particular, we deal with the fact that Arabic has a rich morphology: we therefore include clitics as connectives as well as a wide range of nominalizations as potential arguments. We present a dedicated discourse annotation tool for Arabic and a large-scale annotation study. We show that both the human identification of discourse connectives and the determination of the discourse relations they convey is reliable. Our current annotated corpus encompasses a final 5651 annotated discourse connectives in 537 news texts. In future, we will release the annotated corpus to other researchers and use it for training and testing automated methods for discourse connective and relation recognition.
Anthology ID:
L10-1332
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/479_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Amal Al-Saif and Katja Markert. 2010. The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic (Al-Saif & Markert, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/479_Paper.pdf