ACL Anthology
A Digital Archive of Research Papers in Computational Linguistics

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)

L04-1001 : Marilyn Walker
Can We Talk? Prospects for Automatically Training Spoken Dialogue Systems

L04-1002 : Hans Uszkoreit
Strategic Directions of National and International Research Funding

L04-1003 : Gregor Thurmair
Multilingual Content Processing

L04-1004 : Brian MacWhinney
Collaborative Commentary: Opening Up Spoken Language Databases

L04-1005 : Nick Campbell
Getting to the Heart of the Matter; Speech is More than Just the Expression of Text or Language

L04-1006 : Bente Maegaard
Industrial Needs for Language Resources

L04-1007 : Junichi Tsujii
Thesaurus or Logical Ontology, Which do we Need for Mining Text?

L04-1008 : Kamlesh Dutta; Saroj Kaushik; Nupur Prakash
Information Extraction from Hindi Texts

L04-1009 : Cornelis H.A. Koster; Stefan Gradmann
The Language Belongs to the People!

L04-1010 : Paul Schmidt; Sandrine Garnier; Mike Sharwood; Toni Badia; Lourdes Díaz; Martí Quixal; Ana Ruggia; Antonio S. Valderrabanos; Alberto J. Cruz; Enrique Torrejon; Celia Rico; Jorge Jimenez
ALLES: Integrating NLP in ICALL Applications

L04-1011 : George Doddington; Alexis Mitchell; Mark Przybocki; Lance Ramshaw; Stephanie Strassel; Ralph Weischedel
The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation

L04-1012 : Lee Schwartz; Takako Aikawa
Multilingual Corpus-based Approach to the Resolution of English –ing

L04-1013 : Diana Santos; Anabela Barreiro
On the Problems of Creating a Golden Standard of Inflected Forms in Portuguese

L04-1014 : Sebastian Möller; Jan Krebber; Alexander Raake; Paula Smeele; Martin Rajman; Mirek Melichar; Vincenzo Pallotta; Gianna Tsakou; Basilis Kladis; Anestis Vovos; Jettie Hoonhout; Dietmar Schuchardt; Nikos Fakotakis; Todor Ganchev; Ilyas Potamitis
INSPIRE: Evaluation of a Smart-Home System for Infotainment Management and Device Control

L04-1015 : Ielka van der Sluis; Emiel Krahmer
Evaluating Multimodal NLG Using Production Experiments

L04-1016 : Nuno Seco; Tony Veale; Jer Hayes
Concept Creation in Lexical Ontologies

L04-1017 : Tony Veale
Polysemy and Category Structure in WordNet: An Evidential Approach

L04-1018 : Susanne Salmon-Alt; Laurent Romary
Towards a Reference Annotation Framework

L04-1019 : Sebastian Möller
A New ITU-T Recommendation on the Evaluation of Telephone-Based Spoken Dialogue Systems

L04-1020 : Dekai Wu; Grace Ngai; Marine Carpuat
Raising the Bar: Stacked Conservative Error Correction Beyond Boosting

L04-1021 : Franca Debole; Fabrizio Sebastiani
An Analysis of the Relative Difficulty of Reuters-21578 Subsets

L04-1022 : Ajay S. Bhaskarabhatla; Sriganesh Madhvanath
Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts

L04-1023 : Hsin-Hsi Chen; Yi-Cheng Yu; Chih-Long Lin
Collocation Extraction Using Web Statistics

L04-1024 : Ajay S. Bhaskarabhatla; Sriganesh Madhvanath
An XML Representation for Annotated Handwriting Datasets for Online Handwriting Recognition

L04-1025 : Christina Alexandris; Stavroula-Evita Fotinea
Reusing Language Resources for Speech Applications involving Emotion

L04-1026 : Eva Navas; Amaia Castelruiz; Iker Luengo; Jon Sánchez; Inmaculada Hernáez
Designing and Recording an Audiovisual Database of Emotional Speech in Basque

L04-1027 : Gaël Dias; Sérgio Nunes
Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment

L04-1028 : Hiroshi Nakagawa; Hidetaka Masuda; Dai Sato
Terminal Device Oriented Comparable Corpora and its Alignment- Towards Extracting Paraphrasing Patterns

L04-1029 : Serge Sharoff
Towards Basic Categories for Describing Properties of Texts in a Corpus

L04-1030 : Michael Carl; Ecaterina Rascu; Johann Haller
Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts

L04-1031 : Luciana Bordoni
Investigation on Semantics to Improve the COVAX System

L04-1032 : Wim Peters
Incremental Knowledge Acquisition from WordNet and EuroWordNet

L04-1033 : Vivi Năstase; Rada Mihalcea
Finding Semantic Associations on Express Lane

L04-1034 : Mickel Grönroos; Manne Miettinen
Infrastructure for Collaborative Annotation of Speech

L04-1035 : Diana Maynard; Kalina Bontcheva; Hamish Cunningham
Automatic Language-Independent Induction of Gazetteer Lists

L04-1036 : Nikos Fakotakis
Corpus Design, Recording and Phonetic Analysis of Greek Emotional Database

L04-1037 : Yorick Wilks; Nick Webb; Andrea Setzer; Mark Hepple; Roberta Catizone
Human Dialogue Modelling Using Annotated Corpora

L04-1038 : Stefan Schaden
CrossTowns: Automatically Generated Phonetic Lexicons of Cross-lingual Pronunciation Variants of European City Names

L04-1039 : Hsin-Hsi Chen; Yi-Lin Chu
Pattern Discovery in Named Organization Corpus

L04-1040 : Masumi Narita; Chieko Sato; Masatoshi Sugiura
Connector Usage in the English Essay Writing of Japanese EFL Learners

L04-1041 : Patrick Drouin
Detection of Domain Specific Terminology Using Corpora Comparison

L04-1042 : Wolfgang Minker
Comparative Evaluation of a Stochastic Parser on Semantic and Syntactic-semantic Labels

L04-1043 : Chu-Ren Huang; Ru-Yng Chang; Hshiang-Pin Lee
Sinica BOW (Bilingual Ontological Wordnet): Integration of Bilingual WordNet and SUMO

L04-1044 : Stephan Bopp; Sandro Pedrazzini; Elisabeth Maier
How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words

L04-1045 : Darinka Verdonik; Matej Rojc; Zdravko Kačič
Creating Slovenian Language Resources for Development of Speech-to-speech Translation Components

L04-1046 : Magnus Sahlgren
Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data

L04-1047 : Bojan Kotnik; Zdravko Kačič; Bogomir Horvat
The Development and Integration of the LDA-Toolkit Into COST249 SpeechDat(II) SIG Reference Recognizer

L04-1048 : Özlem Öztürk; Özgul Salor; Tolga Çiloğlu; Mubeccel Demirekler
Duration Modeling For Turkish Text-to-Speech Synthesis System

L04-1049 : Philipp Cimiano; Andreas Hotho; Steffen Staab
Clustering Concept Hierarchies from Text

L04-1050 : Alvin F. Martin; John S. Garofolo; Jonathan C. Fiscus; Audrey N. Le; David S. Pallett; Mark A. Przybocki; Gregory A. Sanders
NIST Language Technology Evaluation Cookbook

L04-1051 : Satoshi Sekine; Chikashi Nobata
Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy

L04-1052 : Beom-mo Kang; Hunggyu Kim
Sejong Korean Corpora in the Making

L04-1053 : Yong-Ju Lee; Bong-Wan Kim; Young-Il Kim; Dae-Lim Choi; Kwang-Hyun Lee; Yongnam Um
Creation and Assessment of Korean Speech and Noise DB in Car Environment

L04-1054 : Alessandro Cucchiarelli; Roberto Navigli; Francesca Neri; Paola Velardi
Automatic Generation of Glosses in the OntoLearn System

L04-1055 : An Vandecatseye; Jean-Pierre Martens; Joao Neto; Hugo Meinedo; Carmen Garcia-Mateo; Javier Dieguez; France Mihelic; Janez Zibert; Jan Nouza; Petr David; Matus Pleva; Anton Cizmar; Harris Papageorgiou; Christina Alexandris
The COST278 Pan-European Broadcast News Database

L04-1056 : Daan Wissing; Jean-Pierre Martens; Ulrike Janke; Wim Goedertier
A Spoken Afrikaans Language Resource Designed for Research on Pronunciation Variations

L04-1057 : Tania Ellbogen; Florian Schiel; Alexander Steffen
The BITS Speech Synthesis Corpus for German

L04-1058 : Florian Schiel
MAUS Goes Iterative

L04-1059 : Mark Stevenson; Paul Clough
EuroWordNet as a Resource for Cross-language Information Retrieval

L04-1060 : Jonas Sjöbergh; Viggo Kann
Finding the Correct Interpretation of Swedish Compounds, a Statistical Approach

L04-1061 : Maya Ando; Satoshi Sekine; Shun Ishizaki
Automatic Extraction of Hyponyms from Japanese Newspapers. Using Lexico-syntactic Patterns

L04-1062 : Karin Kipper; Benjamin Snyder; Martha Palmer
Extending a Verb-lexicon Using a Semantically Annotated Corpus

L04-1063 : J.C.T. Beeken; P.H.J. van der Kamp
The Centre for Dutch Language and Speech Technology (TST Centre)

L04-1064 : Sue Ellen Wright
A Global Data Category Registry for Interoperable Language Resources

L04-1065 : J. G. Kruyt
The Integrated Language Database of 8th - 21st-Century Dutch

L04-1066 : Hans Dybkjær; Laila Dybkjær
From Acts and Topics to Transactions and Dialogue Smoothness

L04-1067 : Hideki Kashioka
Grouping Synonymous Sentences from a Parallel Corpus

L04-1068 : Khurshid Ahmad; Maria Teresa Musacchio
Discovery of (New) Knowledge and the Analysis of Text Corpora

L04-1069 : Harald Höge; Josef G. Bauer; Christian Geißler; Panji Setiawan; Kai Steinert
Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework

L04-1070 : Janez Žibert; France Mihelič
Development of Slovenian Broadcast News Speech Database

L04-1071 : Eckhard Bick
A Named Entity Recognizer for Danish

L04-1072 : M. Teresa Cabré; Carme Bach; Rosa Estopà; Judit Feliu; Gemma Martínez; Jorge Vivaldi
The GENOMA-KB Project: Towards the Integration of Concepts, Terms, Textual Corpora and Entities

L04-1073 : Elisabete Ranchhod; Paula Carvalho; Cristina Mota; Anabela Barreiro
Portuguese Large-scale Language Resources for NLP Applications

L04-1074 : Umut Özge; Bilge Say
Development of a Corpus Workbench for the METU Turkish Corpus

L04-1075 : Raúl Araya; Jordi Vivaldi
Mercedes, a Term-in-Context Highlighter

L04-1076 : Henrik Selsøe Sørensen
The Bilingual Web Dictionary on Demand

L04-1077 : Tomaž Erjavec; Kristina Hmeljak Sangawa; Irena Srdanović; Anton ml. Vahčič
Making an XML-based Japanese-Slovene Learners' Dictionary

L04-1078 : Tomaž Erjavec
MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora

L04-1079 : Lorena Seijo Pereiro; Ana Martínez Ínsua; Francisco Méndez Pazó; Francisco Campillo Díaz; Eduardo Rodríguez Banga
A Galician Textual Corpus for Morphosyntactic Tagging with Application to Text-to-Speech Synthesis

L04-1080 : Salvador España; María José Castro; José Luis Hidalgo
The SPARTACUS-Database: a Spanish Sentence Database for Offline Handwriting Recognition

L04-1081 : Sofia Stamou; Goran Nenadic; Dimitris Christodoulakis
Exploring Balkanet Shared Ontology for Multilingual Conceptual Indexing

L04-1082 : Mitsuo Shimohata; Eiichiro Sumita; Yuji Matsumoto
Building a Paraphrase Corpus for Speech Translation

L04-1083 : Yasuhiro Akiba; Eiichiro Sumita; Hiromi Nakaiwa; Seiichi Yamamoto; Hiroshi G. Okuno
Incremental Methods to Select Test Sentences for Evaluating Translation Ability

L04-1084 : Jan Odijk
Reusable Lexical Representations for Idioms

L04-1085 : Daniel Tihelka; Jindřich Matoušek
The Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems

L04-1086 : Janez Stergar; Caglayan Erdem; Bogomir Horvat; Zdravko Kačič
A Data-driven Adaptation of Prosody in a Multilingual TTS

L04-1087 : M. Taulé; M. Civit; N. Artigas; M. García; L. Màrquez; M.A. Martí; B. Navarro
MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora

L04-1088 : Johnny Bigert
Probabilistic Detection of Context-Sensitive Spelling Errors

L04-1089 : Andrej Žgank; Tomaž Rotovnik; Mirjam Sepesy Maučec; Darinka Verdonik; Janez Kitak; Damjan Vlaj; Vladimir Hozjan; Zdravko Kačič; Bogomir Horvat
Acquisition and Annotation of Slovenian Broadcast News Database

L04-1090 : Andrej Žgank; Zdravko Kačič; Frank Diehl; Klara Vicsi; Gyorgy Szaszak; Jozef Juhar; Slavomir Lihan
The COST 278 MASPER Initiative - Crosslingual Speech Recognition with Large Telephone Databases

L04-1091 : Reinhard Rapp
Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation

L04-1092 : Reinhard Rapp
A Freely Available Automatically Generated Thesaurus of Related Words

L04-1093 : Vincent Vandeghinste; Erik Tjong Kim Sang
Using a Parallel Transcript/Subtitle Corpus for Sentence Compression

L04-1094 : Sofia Stamou; Dimitris Christodoulakis
Handling Subtle Sense Distinctions Through Wordnet Semantic Types

L04-1095 : Athanasios Karasimos; Amy Isard
Multi-lingual Evaluation of a Natural Language Generation System

L04-1096 : Heike Telljohann; Erhard Hinrichs; Sandra Kübler
The Tüba-D/Z Treebank: Annotating German with a Context-Free Backbone

L04-1097 : John S. Garofolo; Christophe D. Laprun; Martial Michel; Vincent M. Stanford; Elham Tabassi
The NIST Meeting Room Pilot Corpus

L04-1098 : Dafydd Gibbon; Catherine Bow; Steven Bird; Baden Hughes
Securing Interpretability: The Case of Ega Language Documentation

L04-1099 : Toshiyuki Takezawa; Genichiro Kikui
A Comparative Study on Human Communication Behaviors and Linguistic Characteristics for Speech-to-Speech Translation

L04-1100 : Núria Bel; Cornelis H.A. Koster; Marta Villegas
Cost-effective Cross-lingual Document Classification

L04-1101 : Katrin Erk; Sebastian Padó
A Powerful and Versatile XML Format for Representing Role-semantic Annotation

L04-1102 : Stefan Baumann; Caren Brinckmann; Silvia Hansen-Schirra; Geert-Jan Kruijff; Ivana Kruijff-Korbayová; Stella Neumann; Erich Steiner; Elke Teich; Hans Uszkoreit
The MULI Project: Annotation and Analysis of Information Structure in German and English

L04-1103 : P. H. J. van der Kamp; J. G. Kruyt
Putting the Dutch PAROLE Corpus to Work

L04-1104 : Julie Carson-Berndsen; Robert Kelly
Acquiring Reusable Multilingual Phonotactic Resources

L04-1105 : Moritz Neugebauer; Stephen Wilson
Phonological Treebanks. Issues in Generation and Application

L04-1106 : Pedro Concejero Cerezo; Juan José Rodríguez Soler; Daniel Tapias Merino; Alberto J. Sánchez García
Methodology for Rapid Prototyping and Testing of ASR Based User Interfaces

L04-1107 : Lars Degerstedt; Arne Jönsson
Open Resources for Language Technology

L04-1108 : Marie-Laure Reinberger; Walter Daelemans
Unsupervised Text Mining for Ontology Extraction: An Evaluation of Statistical Measures

L04-1109 : Daniel Aioanei; Julie Carson-Berndsen; Anja Geumann; Robert Kelly; Moritz Neugebauer; Stephen Wilson
A Multilingual Phonological Resource Toolkit for Ubiquitous Speech Technology

L04-1110 : Oscar Corcho; Raúl García-Castro; Asunción Gómez-Pérez
Benchmarking Ontology Tools. A Case Study for the WebODE Platform.

L04-1111 : Bayan Abu Shawar; Eric Atwell
A Chatbot as a Novel Corpus Visualization Tool

L04-1112 : Florentina Vasilescu; Philippe Langlais; Guy Lapalme
Evaluating Variants of the Lesk Approach for Disambiguating Words

L04-1113 : Sergei Nirenburg; Marjorie McShane; Stephen Beale
The Rationale for Building an Ontology Expressly for NLP

L04-1114 : Marjorie McShane; Stephen Beale; Sergei Nirenburg
Some Meaning Procedures of Ontological Semantics

L04-1115 : Eric K. Ringger; Robert C. Moore; Eugene Charniak; Lucy Vanderwende; Hisami Suzuki
Using the Penn Treebank to Evaluate Non-Treebank Parsers

L04-1116 : Hidetsugu Nanba; Manabu Okumura
Comparison of Some Automatic and Manual Methods for Summary Evaluation Based on the Text Summarization Challenge 2

L04-1117 : Anthony McEnery; Zhonghua Xiao
The Lancaster Corpus of Mandarin Chinese: A Corpus for Monolingual and Contrastive Language Study

L04-1118 : H. Folch; B. Habert; M. Jardino; N. Pernelle; M.C. Rousset; A. Termier
Highlighting Latent Structure in Documents

L04-1119 : Dan Tufis; Radu Ion; Nancy Ide
Word Sense Disambiguation as a Wordnets' Validation Method in Balkanet

L04-1120 : Dan Tufis
Term Translations in Parallel Corpora: Discovery and Consistency Check

L04-1121 : Luís Sarmento; Belinda Maia; Diana Santos
The Corpógrafo – a Web-based Environment for Corpora Research

L04-1122 : Daniel Ferrés; Marc Massot; Muntsa Padró; Horacio Rodríguez; Jordi Turmo
Automatic Classification of Geographic Named Entities

L04-1123 : Olivia Sanchez-Graillet; Massimo Poesio
Acquiring Bayesian Networks from Text

L04-1124 : Thanh Bon Nguyen; Thi Minh Huyen Nguyen; Laurent Romary; Xuan Luong Vu
Developping Tools and Building Linguistic Resources for Vietnamese Morpho-syntactic Processing

L04-1125 : Christoph Draxler; Klaus Jänsch
SpeechRecorder - a Universal Platform Independent Multi-Channel Audio Recording Software

L04-1126 : Yasmina Quatrain; Sylvaine Nugier; Anne Peradotto
An Evaluation Protocol for Text Mining Tools : ALCESTE, SAS Text Miner, SPAD-CRM and Temis Text Mining Solutions Testing

L04-1127 : Alessandro Panunzi; Eugenio Picchi; Massimo Moneglia
Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian

L04-1128 : Marco Baroni; Silvia Bernardini; Federica Comastri; Lorenzo Piccioni; Alessandra Volpi; Guy Aston; Marco Mazzoleni
Introducing the La Repubblica Corpus: A Large, Annotated, TEI(XML)-compliant Corpus of Newspaper Italian

L04-1129 : Nancy Ide; David Woolner
Exploiting Semantic Web Technologies for Intelligent Access to Historical Documents

L04-1130 : Marco Baroni; Sabrina Bisi
Using Cooccurrence Statistics and the Web to Discover Synonyms in a Technical Language

L04-1131 : Hiroyuki Shinnou; Minoru Sasaki
Semi-supervised Learning by Fuzzy Clustering and Ensemble Learning

L04-1132 : Nick Campbell
Speech & Expression; the Value of a Longitudinal Corpus

L04-1133 : Salma Jamoussi; Kamel Smaïli; Dominique Fohr; Jean-Paul Haton
A Complete Understanding Speech System Based on Semantic Concepts

L04-1134 : Kiril Simov; Alexander Simov; Hristo Ganev; Krasimira Ivanova; Ilko Grigorov
The CLaRK System: XML-based Corpora Development System for Rapid Prototyping

L04-1135 : Toni Badia; Àngel Gil; Martí Quixal; Oriol Valentín
NLP-enhanced Error Checking for Catalan Unrestricted Text

L04-1136 : Kalina Bontcheva
Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies

L04-1137 : Agnes Lisowska; Andrei Popescu-Belis; Susan Armstrong
User Query Analysis for the Specification and Evaluation of a Dialogue Processing and Retrieval System

L04-1138 : Borislav Popov; Angel Kirilov; Diana Maynard; Dimitar Manov
Creation of Reusable Components and Language Resources for Named Entity Recognition in Russian

L04-1139 : Andrei Popescu-Belis
Abstracting a Dialog Act Tagset for Meeting Processing

L04-1140 : Andrei Popescu-Belis; Loïs Rigouste; Susanne Salmon-Alt; Laurent Romary
Online Evaluation of Coreference Resolution

L04-1141 : Xavier Carreras; Isaac Chao; Lluís Padró; Muntsa Padró
FreeLing: An Open-Source Suite of Language Analyzers

L04-1142 : Hisami Suzuki
Phrase-Based Dependency Evaluation of a Japanese Parser

L04-1143 : Baden Hughes; Catherine Bow; Steven Bird
Functional Requirements for an Interlinear Text Editor

L04-1144 : Baden Hughes; David Penton; Steven Bird; Catherine Bow; Gillian Wigglesworth; Patrick McConvell; Jane Simpson
Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project

L04-1145 : Adam Przepiórkowski; Zygmunt Krynicki; Łukasz Dębowski; Marcin Woliński; Daniel Janus; Piotr Bański
A Search Tool for Corpora with Positional Tagsets and Ambiguities

L04-1146 : Peter A. Heeman
The American English SALA-II Data Collection

L04-1147 : Andrew Finch; Yasuhiro Akiba; Eiichiro Sumita
How Does Automatic Machine Translation Evaluation Correlate with Human Scoring as the Number of Reference Translations Increases?

L04-1148 : Slaven Bilac; Timothy Baldwin; Hozumi Tanaka
Evaluating the FOKS Error Model

L04-1149 : Guillaume Gibert; Gérard Bailly; Frédéric Eliséi; Denis Beautemps; Rémi Brun
Evaluation of a Speech Cuer: From Motion Capture to a Concatenative Text-to-cued Speech System

L04-1150 : Nikolaos Nanas; Victoria Uren; Anne de Roeck; John Domingue
Beyond TREC's Filtering Track

L04-1151 : Sanni Nimb
A Corpus-based Syntactic Lexicon for Adverbs

L04-1152 : Carol Peters; Martin Braschler; Khalid Choukri; Julio Gonzalo; Michael Kluck
The Future of Evaluation for Cross-Language Information Retrieval Systems

L04-1153 : Henk van den Heuvel; Phil Hall; Harald Höge; Asunción Moreno; Antonio Rincon; Francesco Senia
SALA II Across the Finish Line: A Large Collection of Mobile Telephone Speech Databases from North and Latin America completed

L04-1154 : Xavier Gómez-Guinovart; Elena Sacau Fontenla
Parallel Corpora for the Galician Language: Building and Processing of the CLUVI (Linguistic Corpus of the University of Vigo)

L04-1155 : Junko Hosaka; Igor V. Kurochkin; Akihiko Konagaya
PBIE: A Data Preparation Toolkit Toward Developing a Parsing-Based Information Extraction System

L04-1156 : Andreas Wagner; Bettina Zeisler
A Syntactically Annotated Corpus of Tibetan

L04-1157 : Montserrat Marimon; Núria Bel
Lexical Entry Templates for Robust Deep Parsing

L04-1158 : Dan Tufis; Liviu Dragomirescu
Tiered Tagging Revisited

L04-1159 : Dan Tufis; Eduard Barbu
A Methodology and Associated Tools for Building Interlingual Wordnets

L04-1160 : Doaa Samy; Antonio Moreno-Sandoval; José M. Guirao
Construction of a Bilingual Arabic-Spanish Lexicon of Verbs Based on a Parallel Corpus

L04-1161 : I. Alegria; A. Gurrutxaga; P. Lizaso; X. Saralegi; S. Ugartetxea; R. Urizar
A XML-Based Term Extraction Tool for Basque

L04-1162 : Manolis Maragoudakis; Nikos Fakotakis; George Kokkinakis
A Bayesian Model for Shallow Syntactic Parsing of Natural Language Texts

L04-1163 : Florbela Barreto; Raquel Amaro
Multifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications

L04-1164 : Jacques Duchateau; Tim Ceyssens; Hugo Van hamme
Use and Evaluation of Prosodic Annotations in Dutch

L04-1165 : Stephan Busemann; Hans-Ulrich Krieger
Resources and Techniques for Multilingual Information Extraction

L04-1166 : Lei Chen; Yang Liu; Mary Harper; Eduardo Maia; Susan McRoy
Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus

L04-1167 : C. Barras; G. Adda; M. Adda-Decker; B. Habert; P. Boula de Mareüil; P. Paroubek
Automatic Audio and Manual Transcripts Alignment, Time-code Transfer and Selection of Exact Transcripts

L04-1168 : V. Guijarrubia; I. Torres; L.J. Rodríguez
Evaluation of a Spoken Phonetic Database in Basque Language

L04-1169 : Yves Lepage; Guilhem Peralta
Using Paradigm Tables to Generate New Utterances Similar to those Existing in Linguistic Resources

L04-1170 : Mohamed Afify; Ossama Emam
Collection and Evaluation of Broadcast News Data for Arabic

L04-1171 : Kiril Simov; Petya Osenova; Sia Kolkovska; Elisaveta Balabanova; Dimitar Doikoff
A Language Resources Infrastructure for Bulgarian

L04-1172 : A. Batliner; C. Hacker; S. Steidl; E. Nöth; S. D'Arcy; M. Russell; M. Wong
"You Stupid Tin Box" - Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus

L04-1173 : James Dowdall; Will Lowe; Jeremy Ellman; Fabio Rinaldi; Michael Hess
The Role of MultiWord Terminology in Knowledge Management

L04-1174 : Jörg Tiedemann; Lars Nygaard
The OPUS Corpus - Parallel and Free

L04-1175 : Javier Farreres; Horacio Rodríguez
Selecting the Correct English Synset for a Spanish Sense

L04-1176 : Asunción Moreno; Khalid Choukri; Phil Hall; Henk van den Heuvel; Eric Sanders; Francesco Senia; Herbert Tropf
Collection of SLR in the Asian-Pacific Area

L04-1177 : Jaroslava Hlaváčová; Jana Klímová
Derivational Relations in Flectional Languages - Czech Case

L04-1178 : David Dalby; Lee Gillam; Christopher Cox; Debbie Garside
Standards for Language Codes: developing ISO 639

L04-1179 : Henk van den Heuvel; Dorota Iskra; Eric Sanders; Folkert de Vriend
SLR Validation: Current Trends and Developments

L04-1180 : Horacio Saggion
Identifying Definitions in Text Collections for Question Answering

L04-1181 : Laura Alonso; Irene Castellón; Jordi Escribano; Xavier Messeguer; Lluís Padró
Multiple Sequence Alignment for Characterizing the Lineal Structure of Revision

L04-1182 : Ben Hutchinson
Mining the Web for Discourse Markers

L04-1183 : Magnus Merkel; Andreas Lange
A Pattern Extraction Workbench Combining Multiple Linguistic Levels

L04-1184 : Anke Holler; Jan Frederik Maas; Angelika Storrer
Exploiting Coreference Annotations for Text-to-Hypertext Conversion

L04-1185 : Laura Hasler
"Why do you Ignore me?" - Proof that not all Direct Speech is Bad

L04-1186 : Costanza Navarretta; Bolette Sandford Pedersen; Dorte Haltrup Hansen
Human Language Technology Elements in a Knowledge Organisation System - The VID Project

L04-1187 : Kedar Bellare; Anish Das Sarma; Atish Das Sarma; Navneet Loiwal; Vaibhav Mehta; Ganesh Ramakrishnan; Pushpak Bhattacharyya
Generic Text Summarization Using WordNet

L04-1188 : Natalia V. Loukachevitch; Boris V. Dobrov
Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing

L04-1189 : Natalia V. Loukachevitch; Boris V. Dobrov
Development of Ontologies with Minimal Set of Conceptual Relations

L04-1190 : Maria Fernanda Bacelar do Nascimento; Amália Mendes; Luísa Pereira
Providing On-line Access to Portuguese Language Resources: Corpora and Lexicons

L04-1191 : Bruno Cartoni; Pierrette Bouillon; Yalina Alphonse; Sabine Lehmann
Automatisation of the Activity of Term Collection in Different Languages

L04-1192 : Jorge Vivaldi; Horacio Rodríguez
Automatically Selecting Domain Markers for Terminology Extraction

L04-1193 : Anne Vilnat; Patrick Paroubek; Laura Monceaux; Isabelle Robba; Véronique Gendner; Gabriel Illouz; Michèle Jardino
The Ongoing Evaluation Campaign of Syntactic Parsing of French: EASY

L04-1194 : Kateřina Veselá; Jiří Havelka; Eva Hajičová
Annotators’ Agreement: The Case of Topic-Focus Articulation

L04-1195 : Scott S. L. Piao; Paul Rayson; Dawn Archer; Tony McEnery
Evaluating Lexical Resources for a Semantic Tagger

L04-1196 : Frédéric Landragin; Alexandre Denis; Annalisa Ricci; Laurent Romary
Multimodal Meaning Representation for Generic Dialogue Systems Architectures

L04-1197 : Anna Braasch; Sussi Olsen
STO: A Danish Lexicon Resource - Ready for Applications

L04-1198 : Kalliopi Zervanou; John McNaught
A Domain-Independent Approach to IE Rule Development

L04-1199 : Laurence Devillers; Hélène Maynard; Sophie Rosset; Patrick Paroubek; Kevin McTait; D. Mostefa; Khalid Choukri; Laurent Charnay; Caroline Bousquet; Nadine Vigouroux; Frédéric Béchet; Laurent Romary; Jean-Yves Antoine; J. Villaneau; Myriam Vergnes; J. Goulian
The French MEDIA/EVALDA Project: the Evaluation of the Understanding Capability of Spoken Language Dialogue Systems

L04-1200 : Emanuela Cresti; Fernanda Bacelar do Nascimento; Antonio Moreno Sandoval; Jean Veronis; Philippe Martin; Khalid Choukri
The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

L04-1201 : Bodil Nistrup Madsen; Hanne Erdman Thomsen; Carl Vikner
Principles of a System for Terminological Concept Modelling

L04-1202 : Christophe Van Bael; Helmer Strik; Henk van den Heuvel
On the Usefulness of Large Spoken Language Corpora for Linguistic Research

L04-1203 : Dafydd Gibbon; Firmin Ahoua; Eddi Gbéry; Eno-Abasi Urua; Moses Ekpenyong
WALA: A Multilingual Resource Repository for West African Languages

L04-1204 : Sabine Bartsch
Annotating a Corpus for Building a Domain-specific Knowledge Base

L04-1205 : Constantin Orăsan; Viktor Pekar; Laura Hasler
A Comparison of Summarisation Methods Based on Term Specificity Estimation

L04-1206 : Massimo Moneglia
Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects

L04-1207 : L. Devillers; I. Vasilescu
Reliability of Lexical and Prosodic Cues in Two Real-life Spoken Dialog Corpora

L04-1208 : Carlo Strapparava; Alessandro Valitutti
WordNet Affect: an Affective Extension of WordNet

L04-1209 : Margarita Hospedales; Manel Rodríguez
The GENOMA-KB Platform: Queries over Integrated Linguistic Resources

L04-1210 : Morena Danieli; Juan María Garrido; Massimo Moneglia; Andrea Panizza; Silvia Quazza; Marc Swerts
Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech "C-ORAL-ROM"

L04-1211 : Maja Popović; Hermann Ney
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

L04-1212 : Matthias Eck; Stephan Vogel; Alex Waibel
Language Model Adaptation for Statistical Machine Translation Based on Information Retrieval

L04-1213 : Olga Uryupina
Evaluating Name-Matching for Coreference Resolution

L04-1214 : Carlos Amaral; Dominique Laurent; André Martins; Afonso Mendes; Cláudia Pinto
Design and Implementation of a Semantic Search Engine for Portuguese

L04-1215 : Richard Campbell; Eric Ringger
Converting Treebank Annotations to Language Neutral Syntax

L04-1216 : Yalina Alphonse; Pierrette Bouillon
Methodology For Building Thematic Indexes In Medicine For French

L04-1217 : Carmen Garcia-Mateo; Javier Dieguez-Tirado; Laura Docio-Fernandez; Antonio Cardenal-Lopez
Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News

L04-1218 : Arantza Díaz de Ilarraza; Aitzpea Garmendia; Maite Oronoz
Abar-Hitz: An Annotation Tool for the Basque Dependency Treebank

L04-1219 : Valia Kordoni; Julia Neu
Creating Multi-purpose Linguistic Resources for Modern Greek: a Deep Modern Greek Grammar

L04-1220 : Leo Wanner; Margarita Alonso Ramos; Antonia Martí
Enriching the Spanish EuroWordNet by Collocations

L04-1221 : Charles J. Fillmore; Collin F. Baker; Hiroaki Sato
FrameNet as a "Net"

L04-1222 : Alfonso Ortega; Federico Sukno; Eduardo LLeida; Alejandro Frangi; Antonio Miguel; Luis Buera; Ernesto Zacur
AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition

L04-1223 : Robert S. Melvin; Win May; Shrikanth Narayanan; Panayiotis Georgiou; Shadi Ganjavi
Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients

L04-1224 : Brian MacWhinney; Steven Bird; Christopher Cieri; Craig Martell
Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction

L04-1225 : Robert S. Belvin; Susanne Riehemann; Kristin Precoda
A Fine-Grained Evaluation Method for Speech-to-Speech Machine Translation Using Concept Annotations

L04-1226 : David M. de Matos; Ricardo Ribeiro; Nuno J. Mamede
Rethinking Reusable Resources

L04-1227 : Adam Meyers; Ruth Reeves; Catherine Macleod; Rachel Szekely; Veronika Zielinska; Brian Young
The Cross-Breeding of Dictionaries

L04-1228 : Adam Meyers; Ruth Reeves; Catherine Macleod; Rachel Szekely; Veronika Zielinska; Brian Young; Ralph Grishman
Annotating Noun Argument Structure for NomBank

L04-1229 : Felix Sasaki; Andreas Witt; Dafydd Gibbon; Thorsten Trippel
Concept-based Queries: Combining and Reusing Linguistic Corpus Formats and Query Languages

L04-1230 : Felix Sasaki; Andreas Witt
Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources

L04-1231 : Hiroyuki Kaji; Osamu Imaichi
Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora

L04-1232 : Alexis Palmer; Jonas Kuhn; Carlota Smith
Utilization of Multiple Language Resources for Robust Grammar-Based Tense and Aspect Classification

L04-1233 : Yoshida Kyôsuke; Hashimoto Taiichi; Tokunaga Takenobu; Tanaka Hozumi
Retrieving Annotated Corpora for Corpus Annotation

L04-1234 : Tokunaga Takenobu; Koyama Tomofumi; Saito Suguru; Nakajima Masayuki
Classification of Japanese Spatial Nouns

L04-1235 : Antonio Sanfilippo; Gus Calapristi; Vernon Crow; Beth Hetzler; Alan Turner
Meaningful Clusters

L04-1236 : V. Finley Lacatusu; Steven J. Maiorano; Sanda M. Harabagiu
Multi-Document Summarization Using Multiple-Sequence Alignment

L04-1237 : Jahna Otterbacher; Dragomir Radev
RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation

L04-1238 : Sandra Aluisio; Gisele Montilha Pinheiro; Aline M. P. Manfrin; Leandro H. M. de Oliveira; Luiz C. Genoves Jr.; Stella E. O. Tagnin
The Lácio-Web: Corpora and Tools to Advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools

L04-1239 : Dragomir Radev; Jahna Otterbacher; Zhu Zhang
CST Bank: A Corpus for the Study of Cross-document Structural Relationships

L04-1240 : Jonas Kuhn; B'alam Mateo-Toledo
Applying Computational Linguistic Techniques in a Documentary Project for Q'anjob'al (Mayan, Guatemala)

L04-1241 : Minoru Sasaki; Hiroyuki Shinnou
Information Retrieval System Using Latent Contextual Relevance

L04-1242 : Daisuke Kawahara; Ryohei Sasano; Sadao Kurohashi
Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames

L04-1243 : Sun-Mee Bae; Key-Sun Choi
Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers

L04-1244 : Rita Nüebel
Evaluation and Adaptation of a Specialised Language Checking Tool for Non-specialised Machine Translation and Non-expert MT Users for Multi-lingual Telecooperation

L04-1245 : A. Lavelli; M. E. Califf; F. Ciravegna; D. Freitag; C. Giuliano; N. Kushmerick; L. Romano
A Critical Survey of the Methodology for IE Evaluation

L04-1246 : Jer Hayes; Tony Veale; Nuno Seco
Enriching WordNet Via Generative Metonymy and Creative Polysemy

L04-1247 : Tom Laureys; Guy De Pauw; Hugo Van hamme; Walter Daelemans; Dirk Van Compernolle
Evaluation and Adaptation of the Celex Dutch Morphological Database

L04-1248 : Li Tang; Donghong Ji; Lingpeng Yang; Yu Nie
A Model of Semantic Representations Analysis for Chinese Sentences

L04-1249 : Kyonghee Paik; Kiyonori Ohtake; Kazuhide Yamamoto
A Comparison of Two Variant Corpora: The Same Content with Different Source

L04-1250 : Christopher B. Quirk
Training a Sentence-Level Machine Translation Confidence Measure

L04-1251 : Sonja E. Bosch; Laurette Pretorius
Software Tools for Morphological Tagging of Zulu Corpora and Lexicon Development

L04-1252 : David Wible; Chin-Hwa Kuo; Nai-Lung Tsao
Improving Collocation Extraction for High Frequency Words

L04-1253 : Ai Kawazoe; Asanobu Kitamoto; Nigel Collier
Annotation of Coreference Relations Among Linguistic Expressions and Images in Biological Articles

L04-1254 : Michael Kluck
Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus

L04-1255 : Hélène Manuélian
Generating Coreferential Descriptions from a Structured Model of the Context

L04-1256 : Thatsanee Charoenporn; Virach Sornlertlamvanich; Sawit Kasuriya; Chatchawarn Hansakunbuntheung; Hitoshi Isahara
Open Collaborative Development of the Thai Language Resources for Natural Language Processing

L04-1257 : Lambros Kranias; Anna Samiotou
Automatic Translation Memory Fuzzy Match Post-Editing: A Step Beyond Traditional TM/MT Integration

L04-1258 : Ineke Schuurman; Wim Goedertier; Heleen Hoekstra; Nelleke Oostdijk; Richard Piepenbrock; Machteld Schouppe
Linguistic Annotation of the Spoken Dutch Corpus: If We Had To Do It All Over Again

L04-1259 : Attila Novák; Viktor Nagy; Csaba Oravecz
Combining Symbolic and Statistical Methods in Morphological Analysis and Unknown Word Guessing

L04-1260 : Balázs Kis; Begoña Villada; Gosse Bouma; Gábor Ugray; Tamás Bíró; Gábor Pohl; John Nerbonne
A New Approach to the Corpus-based Statistical Investigation of Hungarian Multi-word Lexemes

L04-1261 : M. Begoña Villada Moirón
Discarding Noise in an Automatically Acquired Lexicon of Support Verb Constructions

L04-1262 : Francisco Nevado; Francisco Casacuberta; Josu Landa
Translation Memories Enrichment by Statistical Bilingual Segmentation

L04-1263 : J. C. Roux; P. H. Louw; T. R. Niesler
The African Speech Technology Project: An Assessment

L04-1264 : Kris Demuynck; Tom Laureys; Patrick Wambacq; Dirk Van Compernolle
Automatic Phonemic Labeling and Segmentation of Spoken Dutch

L04-1265 : Nelleke Oostdijk; Lou Boves
Using Large Multi-purpose Corpora for Specific Research Questions: Discourse Phenomena Related to Wh-questions in the Spoken Dutch Corpus

L04-1266 : Paola Mariani; Costanza Badii
Methods of Digital Access for Legal Language Documentation

L04-1267 : Peter Wittenburg; Heidi Johnson; Markus Buchhorn; Hennie Brugman; Daan Broeder
Architecture for Distributed Language Resource Management and Archiving

L04-1268 : Hanne Fersøe; Elviira Hartikainen; Henk van den Heuvel; Giulio Maltese; Asunción Moreno; Shaunie Shammass; Ute Ziegenhain
Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes

L04-1269 : Antoni Oliver; Marko Tadić
Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora

L04-1270 : Panagiotis Zervas; Manolis Maragoudakis; Nikos Fakotakis; George Kokkinakis
Learning to Predict Pitch Accents Using Bayesian Belief Networks for Greek Language

L04-1271 : Joaquim Moré; Salvador Climent; Antoni Oliver
A Grammar and Style Checker Based on Internet Searches

L04-1272 : Peter Wittenburg; Greg Gulrajani; Daan Broeder; Marcus Uneson
Cross-Disciplinary Integration of Metadata Descriptions

L04-1273 : Valeria Quochi
Representing Italian Complex Nominals: A Pilot Study

L04-1274 : Hayssam Traboulsi; David Cheng; Khurshid Ahmad
Text Corpora, Local Grammars and Prediction

L04-1275 : Helmut Schmid; Arne Fitschen; Ulrich Heid
SMOR: A German Computational Morphology Covering Derivation, Composition and Inflection

L04-1276 : Emi Izumi; Kiyotaka Uchimoto; Hitoshi Isahara
The Overview of the SST Speech Corpus of Japanese Learner English and Evaluation Through the Experiment on Automatic Detection of Learners' Errors

L04-1277 : Paul Gévaudan; Dirk Wiebel
Dynamic Lexicographic Data Modelling. A Diachronic Dictionary Development Report

L04-1278 : Laura Alonso; Maria Fuentes; Marc Massot; Horacio Rodríguez
Re-using High-quality Resources for Continued Evaluation of Automated Summarization Systems

L04-1279 : Marc Rössler
Corpus-based Learning of Lexical Resources for German Named Entity Recognition

L04-1280 : Hennie Brugman; Onno Crasborn; Albert Russel
Collaborative Annotation of Sign Language Data with Peer-to-Peer Technology

L04-1281 : Glòria Vàzquez; Ana Fernández Montraveta; Irene Castellón; Laura Alonso
Semantic Categorization of Spanish Se-constructions

L04-1282 : Angelo Dalli; Valentin Tablan; Kalina Bontcheva; Yorick Wilks; Daan Broeder; Hennie Brugman; Peter Wittenburg
Web Services Architecture for Language Resources

L04-1283 : Daan Broeder; Thierry Declerck; Laurent Romary; Markus Uneson; Sven Strömqvist; Peter Wittenburg
A Large Metadata Domain of Language Resources

L04-1284 : Tamás Gröbler; Gábor Hodász; Balázs Kis
MetaMorpho TM: A Rule-Based Translation Corpus

L04-1285 : Hennie Brugman; Albert Russel
Annotating Multi-media/Multi-modal Resources with ELAN

L04-1286 : Agnès Tutin; Meriam Haddara; Ruslan Mitkov; Constantin Orasan
Annotation of Anaphoric Expressions in an Aligned Bilingual Corpus

L04-1287 : Tylman Ule; Kiril Simov
Unexpected Productions May Well be Errors

L04-1288 : Avik Sarkar; Anne De Roeck
A Framework for Evaluating the Suitability of Non-English Corpora for Language Engineering

L04-1289 : Anna Samiotou; Lambros Kranias; Dimitrios Kokkinakis
Intelligent Building of Language Resources for HLT Applications

L04-1290 : Tomoyosi Akiba; Atsushi Fujii; Katunobu Itou
Collecting Spontaneously Spoken Queries for Information Retrieval

L04-1291 : Hristo Tanev; Milen Kouylekov; Matteo Negri; Bonaventura Coppola; Bernardo Magnini
Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

L04-1292 : Michael Daum; Kilian A. Foth; Wolfgang Menzel
Automatic Transformation of Phrase Treebanks to Dependency Trees

L04-1293 : Maria Luigia Ceccotti; Manuela Sassi
Computational Lexicography and Carlo Emilio Gadda, Principe dell'Analisi e Duca della Buona Cognizione

L04-1294 : Yoko Mizuta; Nigel Collier
An Annotation Scheme for a Rhetorical Analysis of Biology Articles

L04-1295 : Antoinette Renouf; Andrew Kehoe
Textual Distraction as a Basis for Evaluating Automatic Summarisers

L04-1296 : Milena Slavcheva
Verb Valency Descriptors for a Syntactic Treebank

L04-1297 : Walter Kasper; Jörg Steffen; Jakub Piskorski; Paul Buitelaar
Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project

L04-1298 : S.R. Deepa; Kalika Bali; A.G. Ramakrishnan; Partha Pratim Talukdar
Automatic Generation of Compound Word Lexicon for Hindi Speech Synthesis

L04-1299 : Saif Ahmad; Paulo C F de Oliveira; Khurshid Ahmad
Summarization of Multimodal Information

L04-1300 : Toomas Altosaar; Matti Karjalainen
Design of an Interactive Web-based User Interface for Speech Database Query Formation

L04-1301 : Syd Bauman; Alejandro Bia; Lou Burnard; Tomaž Erjavec; Christine Ruotolo; Susan Schreibman
Migrating Language Resources from SGML to XML: The Text Encoding Initiative Recommendations

L04-1302 : Niels Ole Bernsen; Laila Dybkjær; Svend Kiilerich
Evaluating Conversation with Hans Christian Andersen

L04-1303 : Catia Cucchiarini; Elisabeth D'Halleweyn
The New Dutch-Flemish HLT Programme: a Concerted Effort to Stimulate the HLT Sector

L04-1304 : Eiko Yamamoto; Kyoji Umemura
Related Word-pairs Extraction Without Dictionaries

L04-1305 : Rachel Aires; Aline Manfrin; Sandra Aluísio; Diana Santos
What is my Style? Using Stylistic Features of Portuguese Web Texts to Classify Web Pages According to Users' Needs

L04-1306 : Marco Baroni; Silvia Bernardini
BootCaT: Bootstrapping Corpora and Terms from the Web

L04-1307 : Jörg Steffen
N-Gram Language Modeling for Robust Multi-Lingual Document Classification

L04-1308 : Ana-Maria Barbu
A Word Alignment System Based on a Translation Equivalence Extractor

L04-1309 : Daan Broeder; Peter Wittenburg; Onno Crasborn
Using Profiles for IMDI Metadata Creation

L04-1310 : Karlheinz Mörth
Rethinking Readability of Digital Editions – The Case of the AAC's "Digital Brenner"

L04-1311 : Daniel Ferrés; Marc Massot; Muntsa Padró; Horacio Rodríguez; Jordi Turmo
Automatic Building Gazetteers of Co-referring Named Entities

L04-1312 : Nilda Ruimy; Pierrette Bouillon; Bruno Cartoni
Semi-Automatic Derivation of a French Lexicon from CLIPS

L04-1313 : Nancy Ide; Keith Suderman
The American National Corpus First Release

L04-1314 : Stefan Evert; Ulrich Heid; Kristina Spranger
Identifying Morphosyntactic Preferences in Collocations

L04-1315 : Laila Dybkjær; Niels Ole Bernsen
Towards General-Purpose Annotation Tools – How Far Are We Today?

L04-1316 : Uwe D. Reichel; Karl Weilhammer
Automated Morphological Segmentation and Evaluation

L04-1317 : Nancy Ide; Laurent Romary
A Registry of Standard Data Categories for Linguistic Annotation

L04-1318 : Andrew Hippisley; Chara Karavasili
A Natural Language Approach to Information Management: Tracking Scientific Advances Through the Structure of Words

L04-1319 : Rita Marinelli; Adriana Roventini; Alessandro Enea
Building a Maritime Domain Lexicon: a Few Considerations on the Database Structure and the Semantic Coding

L04-1320 : Péter Halácsy; András Kornai; László Németh; András Rung; István Szakadát; Viktor Trón
Creating Open Language Resources for Hungarian

L04-1321 : Atsushi Fujii; Makoto Iwayama; Noriko Kando
Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop

L04-1322 : Yuka Tateisi; Jun-ichi Tsujii
Part-of-Speech Annotation of Biology Research Abstracts

L04-1323 : Božo Bekavac; Petya Osenova; Kiril Simov; Marko Tadić
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

L04-1324 : Lina Henriksen; Bart Jongejan; Bente Maegaard
Corporate Voice, Tone of Voice and Controlled Language Techniques

L04-1325 : Nikos Fakotakis
Cypriot Speech Database: Data Collection and Greek to Cypriot Dialect Adaptation

L04-1326 : Borja Navarro; Manuel Palomar; Patricio Martínez-Barco
Automatic Extraction of Syntactic Semantic Patterns for Multilingual Resources

L04-1327 : Dominique Dutoit; Pierre Nugues; Patrick de Torcy
The Integral Dictionary: An Ontological Resource for the Semantic Web: Integration of EuroWordNet, Balkanet, TID, and SUMO

L04-1328 : Viktor Pekar; Richard Evans; Ruslan Mitkov
Categorizing Web Pages as a Preprocessing Step for Information Extraction

L04-1329 : Christian Weiss
A Framework for Data-driven Video-realistic Audio-visual Speech-synthesis

L04-1330 : Manuela Kunze; Dietmar Rösner
Corpus Based Enrichment of GermaNet Verb Frames

L04-1331 : Thierry Poibeau; Bénédicte Goujon
Semi-automatic Acquisition of Command Grammar

L04-1332 : Thierry Declerck; Paul Buitelaar; Nicoletta Calzolari; Alessandro Lenci
Towards a Language Infrastructure for the Semantic Web

L04-1333 : Alvin Martin; David Miller; Mark Przybocki; Joseph Campbell; Hirotaka Nakasone
Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004

L04-1334 : Stephan Vogel; Christian Monson
Augmenting Manual Dictionaries for Statistical Machine Translation Systems

L04-1335 : Christian Biemann; Uwe Quasthoff; Christian Wolff
Linguistic Corpus Search

L04-1336 : Nicoletta Calzolari; Khalid Choukri; Maria Gavrilidou; Bente Maegaard; Paola Baroni; Hanne Fersøe; Alessandro Lenci; Valérie Mapelli; Monica Monachini; Stelios Piperidis
ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs

L04-1337 : Evie Coussé; Steven Gillis; Hanne Kloots; Marc Swerts
The Influence of the Labeller’s Regional Background on Phonetic Transcriptions: Implications for the Evaluation of Spoken Language Resources

L04-1338 : Paul Buitelaar; Diana Steffen; Martin Volk; Dominic Widdows; Bogdan Sacaleanu; Špela Vintar; Stanley Peters; Hans Uszkoreit
Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain

L04-1339 : Chris Biemann; Stefan Bordag; Uwe Quasthoff
Automatic Acquisition of Paradigmatic Relations Using Iterated Co-occurrences

L04-1340 : Paul Buitelaar; Daniel Olejnik; Mihaela Hutanu; Alexander Schutz; Thierry Declerck; Michael Sintek
Towards Ontology Engineering Based on Linguistic Analysis

L04-1341 : Dorota Iskra; Rainer Siemund; Jamal Borno; Asuncion Moreno; Ossama Emam; Khalid Choukri; Oren Gedge; Herbert Tropf; Albino Nogueiras; Imed Zitouni; Anastasios Tsopanoglou; Nikos Fakotakis
OrienTel - Telephony Databases Across Northern Africa and the Middle East

L04-1342 : Hanne Fersøe; Monica Monachini
ELRA Validation Methodology and Standard Promotion for Linguistic Resources

L04-1343 : Hanno Biber; Evelyn Breiteneder
The AAC [Austrian Academy Corpus] – An Enterprise to Develop Large Electronic Text Corpora

L04-1344 : Diana Binnenpoorte; Catia Cucchiarini; Helmer Strik; Lou Boves
Improving Automatic Phonetic Transcription of Spontaneous Speech Through Variant-Based Pronunciation Variation Modelling

L04-1345 : Massimo Poesio; Mijail A. Kabadjov
A General-Purpose, Off-the-shelf Anaphora Resolution Module: Implementation and Preliminary Evaluation

L04-1346 : Donghong Ji; Li Tang; Lingpeng Yang
Building a Conceptual Graph Bank for Chinese Language

L04-1347 : Anne Abeillé; Nicolas Barrier
Enriching a French Treebank

L04-1348 : Béatrice Daille; Samuel Dufour-Kowalski; Emmanuel Morin
French-English Multi-word Term Alignment Based on Lexical Context Analysis

L04-1349 : Vincenzo Pallotta; Hatem Ghorbel; Patrick Ruch; Giovanni Coray
An Argumentative Annotation Schema for Meeting Discussions

L04-1350 : Jochen Trommer; Dalina Kallulli
A Morphological Analyzer for Standard Albanian

L04-1351 : Abdelhadi Soudi; Andreas Eisele
Generating an Arabic Full-form Lexicon for Bidirectional Morphology Lookup

L04-1352 : Petr Pollák; Jan Černocký
Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment

L04-1353 : Catarina Ribeiro; Ricardo Santos; João Correia; Rui Pedro Chaves; Palmira Marrafa
INQUER: A WordNet-based Question-Answering Application

L04-1354 : António Branco; João Silva
Evaluating Solutions for the Rapid Development of State-of-the-Art POS Taggers for Portuguese

L04-1355 : Stefan Klatt
A High Quality Partial Parser for Annotating German Text Corpora

L04-1356 : Manolis Maragoudakis; Nikos Fakotakis
Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval

L04-1357 : Lars Bo Larsen
Usability Evaluation of Spoken Dialogue Systems

L04-1358 : Iulia Nica; Mª Antònia Martí; Andrés Montoyo; Sonia Vázquez
Enriching EWN with Syntagmatic Information by Means of WSD

L04-1359 : Rita Marinelli
Proper Names and Polysemy: From a Lexicographic Experience

L04-1360 : Ulrich Heid; Bettina Säuberlich; Esther Debus-Gregor; Werner Scholze-Stubenrecht
Tools for Upgrading Printed Dictionaries by Means of Corpus-based Lexical Acquisition

L04-1361 : Jakub Piskorski
Extraction of Polish Named-Entities

L04-1362 : Juan Fernández; Mauro Castillo; German Rigau; Jordi Atserias; Jordi Turmo
Automatic Acquisition of Sense Examples Using ExRetriever

L04-1363 : Cvetana Krstev; Duško Vitas; Ranka Stanković; Ivan Obradović; Gordana Pavlović-Lažetić
Combining Heterogeneous Lexical Resources

L04-1364 : Viet-Bac Le; Do-Dat Tran; Eric Castelli; Laurent Besacier; Jean-François Serignat
Spoken and Written Language Resources for Vietnamese

L04-1365 : Andrei Popescu-Belis; Maria Georgescul; Alexander Clark; Susan Armstrong
Building and Using a Corpus of Shallow Dialogue Annotated Meetings

L04-1366 : Lorenzo Piccioni; Eros Zanchetta
XTERM: A Flexible Standard-Compliant XML-Based Termbase Management System

L04-1367 : Márton Miháltz
Word Sense Disambiguation Using Random Indexing

L04-1368 : Ulrich Heid; Holger Voormann; Jan-Torsten Milde; Ulrike Gut; Katrin Erk; Sebastian Padó
Querying Both Time-aligned and Hierarchical Corpora with NXT Search

L04-1369 : A. Chalamandaris; P. Tsiakoulis; S. Raptis; G. Giannopoulos; G. Carayannis
Bypassing Greeklish!

L04-1370 : Catarina Ribeiro; Ricardo Santos; Rui Pedro Chaves; Palmira Marrafa
Semi-Automatic UNL Dictionary Generation Using WordNet.PT

L04-1371 : Alexander Geyken
Bootstrapping a Database of German Multi-word Expressions

L04-1372 : Le An Ha
A Practical Comparison of Different Filters Used in Automatic Term Extraction

L04-1373 : Jesús Giménez; Lluís Màrquez
SVMTool: A general POS Tagger Generator Based on Support Vector Machines

L04-1374 : Stefanie Herrmann; Hartmut Keck; Stephan Kepser
A Multi-Modal Documentation System for Warao

L04-1375 : Ulrich Callmeier; Andreas Eisele; Ulrich Schäfer; Melanie Siegel
The DeepThought Core Architecture Framework

L04-1376 : Jordi Atserias; Salvador Climent; German Rigau
Towards the Meaning Top Ontology: Sources of Ontological Meaning

L04-1377 : Zygmunt Vetulani
An Environment for Dialogue Corpora Collection (ENDIACC)

L04-1378 : G. Bordel; A. Ezeiza; K. Lopez de Ipina; M. Méndez; M. Peñagarikano; T. Rico; C. Tovar; E. Zulueta
Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish

L04-1379 : António Teixeira; Liliana Ferreira; Lurdes Moutinho; Rosa Lídia Coimbra; Raquel Lisboa
An Acoustic Corpus Contemplating Regional Variation for Studies of European Portuguese Nasals

L04-1380 : Laurent Romary; Amalia Todirascu; David Langlois
Experiments on Building Language Resources for Multi-Modal Dialogue Systems

L04-1381 : David Day; Chad McHenry; Robyn Kozierok; Laurel Riek
Callisto: A Configurable Annotation Workbench

L04-1382 : Ray Clifford; Neil Granoien; Douglas Jones; Wade Shen; Clifford Weinstein
The Effect of Text Difficulty on Machine Translation Performance -- A Pilot Study with ILR-Rated Texts in Spanish, Farsi, Arabic, Russian and Korean

L04-1383 : Joachim Wermter; Udo Hahn
An Annotated German-Language Medical Text Corpus as Language Resource

L04-1384 : Diana Pérez; Enrique Alfonseca; Pilar Rodríguez
Application of the BLEU Method for Evaluating Free-text Answers in an E-learning Environment

L04-1385 : Kyoko Kanzaki; Qing Ma; Eiko Yamamoto; Masaki Murata; Hitoshi Isahara
Extraction of Hyperonymy of Adjectives from Large Corpora by Using the Neural Network Model

L04-1386 : Eleni Miltsakaki; Rashmi Prasad; Aravind Joshi; Bonnie Webber
The Penn Discourse Treebank

L04-1387 : Violeta Seretan; Luka Nerima; Eric Wehrli
Using the Web as a Corpus for the Syntactic-Based Collocation Identification

L04-1388 : Michael Schiehlen; Kristina Spranger
Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons

L04-1389 : Henk Harkema; Robert Gaizauskas; Mark Hepple; Neil Davis; Yikun Guo; Angus Roberts; Ian Roberts
A Large-Scale Resource for Storing and Recognizing Technical Terminology

L04-1390 : Holmer Hemsen
Evaluation of a Multimodal Dialogue System for Small-screen Devices

L04-1391 : Christian Biemann; Stefan Bordag; Uwe Quasthoff; Christian Wolff
Web Services for Language Resources and Language Technology Applications

L04-1392 : Elisabeth Pinto; Delphine Charlet; Hélène François; Djamel Mostefa; Olivier Boëffard; Dominique Fohr; Odile Mella; Frédéric Bimbot; Khalid Choukri; Yann Philip; Francis Charpentier
Development of New Telephone Speech Databases for French: the NEOLOGOS Project

L04-1393 : Karel Pala; Pavel Smrz
Top Ontology as a Tool for Semantic Role Tagging

L04-1394 : Argyrios Vasilakopoulos; Michele Bersani; William J. Black
A Suite of Tools for Marking Up Textual Data for Temporal Text Mining Scenarios

L04-1395 : Anne De Roeck; Avik Sarkar; Paul Garthwaite
Frequent Term Distribution Measures for Dataset Profiling

L04-1396 : Josef Psutka; Pavel Ircing; Jan Hajič; Vlasta Radová; Josef V. Psutka; William J. Byrne; Samuel Gustman
Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project

L04-1397 : Asunción Gómez-Pérez; M. Carmen Suárez-Figueroa
Ontology Evaluation Functionalities of RDF(S), DAML+OIL, and OWL Parsers and Ontology Platforms

L04-1398 : Anna Sinopalnikova; Pavel Smrz
Word Association Norms as a Unique Supplement of Traditional Language Resources

L04-1399 : Nadine Aldinger
Towards a Dynamic Lexicon: Predicting the Syntactic Argument Structure of Complex Verbs

L04-1400 : Robert Král
Semantic Annotating of Czech Corpus via WSD

L04-1401 : Jean Carletta; Shipra Dingare; Malvina Nissim; Tatiana Nikitina
Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: a Case Study

L04-1402 : Malvina Nissim; Shipra Dingare; Jean Carletta; Mark Steedman
An Annotation Scheme for Information Status in Dialogue

L04-1403 : Alex Trutnev; Antoine Ronzenknop; Martin Rajman
Speech Recognition Simulation and its Application for Wizard-of-Oz Experiments

L04-1404 : Murat Deviren; Khalid Daoudi; Kamel Smaïli
Language Modeling Using Dynamic Bayesian Networks

L04-1405 : Udo Hahn; Joachim Wermter
Pumping Documents Through a Domain and Genre Classification Pipeline

L04-1406 : Kiril Simov; Petya Osenova
A Hybrid Strategy For Regular Grammar Parsing

L04-1407 : Jordi Atserias; Bernardo Magnini; Octavian Popescu; Eneko Agirre; Aitziber Atutxa; German Rigau; John Carroll; Rob Koeling
Cross-Language Acquisition of Semantic Models for Verbal Predicates

L04-1408 : Andrea Sansò
MED-TYP: A Typological Database for Mediterranean Languages

L04-1409 : Kallirroi Georgila; Nikos Fakotakis; George Kokkinakis
A Graphical Tool for Handling Rule Grammars in Java Speech Grammar Format

L04-1410 : Svetlana Sheremetyeva
A Flexible Language Acquisition Tool Kit for Natural Language Processing

L04-1411 : David Martínez; Eneko Agirre
The Effect of Bias on an Automatically-built Word Sense Corpus

L04-1412 : Victoria Arranz; Núria Castell; Josep Maria Crego; Jesús Giménez; Adrià de Gispert; Patrik Lambert
Bilingual Connections for Trilingual Corpora: An XML Approach

L04-1413 : Thorsten Trippel; Dafydd Gibbon; Alexandra Thies; Jan-Torsten Milde; Karin Looks; Benjamin Hell; Ulrike Gut
CoGesT: a Formal Transcription System for Conversational Gesture

L04-1414 : Anders Nøklestad
Memory-based Classification of Proper Names in Norwegian

L04-1415 : Alex Trutnev; Martin Rajman
Comparative Evaluations in the Domain of Automatic Speech Recognition

L04-1416 : Thorsten Trippel; Felix Sasaki; Dafydd Gibbon
Consistent Storage of Metadata in Inference Lexica: the MetaLex Approach

L04-1417 : Nuno Cavalheiro Marques; Sérgio Gonçalves
Applying a Part-of-Speech Tagger to Postal Address Detection on the Web

L04-1418 : Monica Monachini; Federico Calzolari; Michele Mammini; Sergio Rossi; Marisa Ulivieri
Unifying Lexicons in view of a Phonological and Morphological Lexical DB

L04-1419 : A. Braffort; A. Choisier; C. Collet; P. Dalle; F. Gianni; F. Lenseigne; J. Segouat
Toward an Annotation Software for Video of Sign Language, Including Image Processing Tools and Signing Space Modelling

L04-1420 : Fabio Tamburini
Building Distributed Language Resources By Grid Computing

L04-1421 : Bernd Bohnet; Halyna Seniv
Mapping Dependency Structures to Phrase Structures and the Automatic Acquisition of Mapping Rules

L04-1422 : Georgiana Puşcaşu
A Framework for Temporal Resolution

L04-1423 : Stephan Busemann
EGRAM – A Grammar Development Environment and its Usage for Language Generation

L04-1424 : Louise Guthrie; Roberto Basili; Fabio Zanzotto; Kalina Bontcheva; Hamish Cunningham; David Guthrie; Jia Cui; Marco Cammisa; Jerry Cheng-Chieh Liu; Cassia Farria Martin; Kristiyan Haralambiev; Martin Holub; Klaus Macherey; Fredrick Jelinek
Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text

L04-1425 : Eneko Agirre; Aitziber Atutxa; Koldo Gojenola; Kepa Sarasola
Exploring Portability of Syntactic Information from English to Basque

L04-1426 : Jordi Atserias; Luís Villarejo; German Rigau
Spanish WordNet 1.6: Porting the Spanish Wordnet Across Princeton Versions

L04-1427 : Magdalena Wolska; Bao Quoc Vo; Dimitra Tsovaltzi; Ivana Kruijff-Korbayová; Elena Karagjosova; Helmut Horacek; Armin Fiedler; Christoph Benzmüller
An Annotated Corpus of Tutorial Dialogs on Mathematical Theorem Proving

L04-1428 : Lonneke van der Plas; Vincenzo Pallotta; Martin Rajman; Hatem Ghorbel
Automatic Keyword Extraction from Spoken Text. A Comparison of Two Lexical Resources: EDR and WordNet

L04-1429 : Anna Kupść; Teruko Mitamura; Benjamin Van Durme; Eric Nyberg
Pronominal Anaphora Resolution for Unrestricted Text

L04-1430 : G. Gravier; J-F. Bonastre; E. Geoffrois; S. Galliano; K. Mc Tait; K. Choukri
The ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News

L04-1431 : Manfred Klenner; Fabio Rinaldi; Michael Hess
Steps Towards Semantically Annotated Language Resources

L04-1432 : Nina Wacholder; Sharon Small; Bing Bai; Diane Kelly; Robert Rittman; Sean Ryan; Robert Salkin; Peng Song; Ying Sun; Liu Ting; Paul Kantor; Tomek Strzalkowski
Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System

L04-1433 : Karin Müller
Semi-Automatic Construction of a Question Treebank

L04-1434 : Bogdan Babych; Debbie Elliott; Anthony Hartley
Calibrating Resource-light Automatic MT Evaluation: a Cheap Approach to Ranking MT Systems by the Usability of Their Output

L04-1435 : Stelios Piperidis; Iason Demiros; Prokopis Prokopidis; Peter Vanroose; Anja Hoethker; Walter Daelemans; Elsa Sklavounou; Manos Konstantinou; Yannis Karavidas
Multimodal, Multilingual Resources in the Subtitling Process

L04-1436 : Kazuki Adachi; Tomoki Toda; Hiromichi Kawanami; Hiroshi Saruwatari; Kiyohiro Shikano
Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification

L04-1437 : Serge A. Yablonsky
Integration of Russian Language Resources

L04-1438 : Roberto Basili; Nicola Lorusso; Maria Teresa Pazienza; Fabio Massimo Zanzotto
A2Q: An Agent-based Architecture for Multilingual Q&A

L04-1439 : Guadalupe Aguado de Cea; Inmaculada Álvarez-de-Mon; Antonio Pareja-Lora
OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

L04-1440 : Kaarel Kaljurand; Fabio Rinaldi; James Dowdall; Michael Hess
Exploiting Language Resources for Semantic Web Annotations

L04-1441 : Kiyong Lee; Lou Burnard; Laurent Romary; Eric de la Clergerie; Thierry Declerck; Syd Bauman; Harry Bunt; Lionel Clément; Tomaž Erjavec; Azim Roussanaly; Claude Roux
Towards an International Standard on Feature Structure Representation

L04-1442 : Ariadna Font Llitjós; Jaime Carbonell
The Translation Correction Tool: English-Spanish User Studies

L04-1443 : Brian Mitchell; Robert Gaizauskas
A Labelled Corpus for Prepositional Phrase Attachment

L04-1444 : Gabriel Infante-Lopez; Maarten de Rijke
Comparing the Ambiguity Reduction Abilities of Probabilistic Context-Free Grammars

L04-1445 : Paul Morarescu; Sanda Harabagiu
NameNet: a Self-Improving Resource for Name Classification

L04-1446 : Katerina Pastra; Yorick Wilks
Image-Language Multimodal Corpora: Needs, Lacunae and an AI Synergy for Annotation

L04-1447 : Na-Rae Han; Martin Chodorow; Claudia Leacock
Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus

L04-1448 : Radek Sedláček
The Core of the Czech Derivational Dictionary

L04-1449 : Walter Daelemans; Anja Höthker; Erik Tjong Kim Sang
Automatic Sentence Simplification for Subtitling in Dutch and English

L04-1450 : Canasai Kruengkrai; Thatsanee Charoenporn; Virach Sornlertlamvanich; Hitoshi Isahara
Enriching a Thai Lexical Database with Selectional Preferences

L04-1451 : Jonathan G. Fiscus
Results of the 2003 Topic Detection and Tracking Evaluation

L04-1452 : Jennifer Foster
Parsing Ungrammatical Input: an Evaluation Procedure

L04-1453 : Melania Degeratu; Vasileios Hatzivassiloglou
An Automatic Method for Constructing Domain-Specific Ontology Resources

L04-1454 : Ann Copestake; Fabre Lambeau; Benjamin Waldron; Francis Bond; Dan Flickinger; Stephan Oepen
A Lexicon Module for a Grammar Development Environment

L04-1455 : Bogdan Babych; Anthony Hartley
Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality

L04-1456 : Roberto Bartolini; Alessandro Lenci; Simonetta Montemagni; Vito Pirrelli; Claudia Soria
Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques

L04-1457 : Lionel Clément; Benoît Sagot; Bernard Lang
Morphology Based Automatic Acquisition of Large-coverage Lexica

L04-1458 : Kiril Ribarov
Towards Intelligent Written Cultural Heritage Processing - Lexical processing

L04-1459 : Violetta Cavalli-Sforza; Jaime G. Carbonell; Peter J. Jansen
Developing Language Resources for a Transnational Digital Government System

L04-1460 : Mary D. Swift; Myroslava O. Dzikovska; Joel R. Tetreault; James F. Allen
Semi-automatic Syntactic and Semantic Corpus Annotation with a Deep Parser

L04-1461 : Georges Fafiotte; Christian Boitet; Mark Seligman; Zong Chengqing
Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment

L04-1462 : Judita Preiss; Caroline Gasperin; Ted Briscoe
Can Anaphoric Definite Descriptions be Replaced by Pronouns?

L04-1463 : Roberto Bartolini; Alessandro Lenci; Simonetta Montemagni; Vito Pirrelli
Hybrid Constraints for Robust Parsing: First Experiments and Evaluation

L04-1464 : Véronique Aubergé; Nicolas Audibert; Albert Rilliard
E-Wiz: a Trapper Protocol for Hunting the Expressive Speech Corpora in Lab

L04-1465 : Simone Teufel; Hans van Halteren
Agreement in Human Factoid Annotation for Summarization Evaluation

L04-1466 : Albert Rilliard; Véronique Aubergé; Nicolas Audibert
Evaluating an Authentic Audio-Visual Expressive Speech Corpus

L04-1467 : Nadia Mana; Roldano Cattoni; Emanuele Pianta; Franca Rossi; Fabio Pianesi; Susanne Burger
The Italian NESPOLE! Corpus: a Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

L04-1468 : Eugenio Picchi; Maria Luigia Ceccotti; Sebastiana Cucurullo; Manuela Sassi; Eva Sassolini
Linguistic Miner: An Italian Linguistic Knowledge System

L04-1469 : Antonietta Alonge; Birte Lönneker
Metaphors in Wordnets: From Theory to Practice

L04-1470 : Harry Bunt; Laurent Romary
Standardization in Multimodal Content Representation: Some Methodological Issues

L04-1471 : Roberto Basili; Marco Cammisa; Fabio Massimo Zanzotto
A Similarity Measure for Unsupervised Semantic Disambiguation

L04-1472 : Laila Dybkjær; Niels Ole Bernsen; Wolfgang Minker
Usability Evaluation of Multimodal and Domain-Oriented Spoken Language Dialogue Systems

L04-1473 : Jaap Kamps; Maarten Marx; Robert J. Mokken; Maarten de Rijke
Using WordNet to Measure Semantic Orientations of Adjectives

L04-1474 : Per Weijnitz; Eva Forsbom; Ebba Gustavii; Eva Pettersson; Jörg Tiedemann
MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain

L04-1475 : Esmeralda Uraga; César Gamboa
VOXMEX Speech Database: Design of a Phonetically Balanced Corpus

L04-1476 : Christopher Brewster; Harith Alani; Srinandan Dasmahapatra; Yorick Wilks
Data Driven Ontology Evaluation

L04-1477 : Oliver Schonefeld; Jan-Torsten Milde
Embedding IMDI Metadata into a Large Phonetic Corpus

L04-1478 : Francesca Bertagna
Using Semantic Language Resources to Support Textual Inference for Question Answering

L04-1479 : Vasco Calais Pedro; Jeongwoo Ko; Eric Nyberg; Teruko Mitamura
An Information Repository Model for Advanced Question Answering Systems

L04-1480 : Francesca Bertagna; Alessandro Lenci; Monica Monachini; Nicoletta Calzolari
Content Interoperability of Lexical Resources: Open Issues and "MILE" Perspectives

L04-1481 : Martin Čmejrek; Jan Cuřín; Jiří Havelka; Jan Hajič; Vladislav Kuboň
Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation

L04-1482 : Christian Monson; Lori Levin; Rodolfo Vega; Ralf Brown; Ariadna Font Llitjos; Alon Lavie; Jaime Carbonell; Eliseo Cañulef; Rosendo Huisca
Data Collection and Analysis of Mapudungun Morphology for Spelling Correction

L04-1483 : Arlindo O. Veiga; Fernando S. Perdigão
An Efficient Word Confidence Measure Using Likelihood Ratio Scores

L04-1484 : Kenji Sagae; Brian MacWhinney; Alon Lavie
Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs

L04-1485 : Huarui Zhang; Churen Huang; Shiwen Yu
Distributional Consistency: As a General Method for Defining a Core Lexicon

L04-1486 : Rebecca J. Passonneau
Computing Reliability for Coreference Annotation

L04-1487 : Eneko Agirre; Oier Lopez de Lacalle
Publicly Available Topic Signatures for all WordNet Nominal Senses

L04-1488 : Timothy Baldwin; Emily M. Bender; Dan Flickinger; Ara Kim; Stephan Oepen
Road-testing the English Resource Grammar Over the British National Corpus

L04-1489 : Ying Zhang; Stephan Vogel; Alex Waibel
Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?

L04-1490 : Peter Anick
Exploiting Anchor Text as a Lexical Resource

L04-1491 : Dragomir Radev; Timothy Allison; Sasha Blair-Goldensohn; John Blitzer; Arda Çelebi; Stanko Dimitrov; Elliott Drabek; Ali Hakim; Wai Lam; Danyu Liu; Jahna Otterbacher; Hong Qi; Horacio Saggion; Simone Teufel; Michael Topper; Adam Winkel; Zhu Zhang
MEAD - A Platform for Multidocument Multilingual Text Summarization

L04-1492 : Saurabh Garg; Bilyana Martinovski; Susan Robinson; Jens Stephan; Joel Tetreault; David R. Traum
Evaluation of Transcription and Annotation Tools for a Multi-modal, Multi-party Dialogue Corpus

L04-1493 : Michael Emonts
Current Projects in Languages of Military Interest at the Defense Language Institute

L04-1494 : Aline Villavicencio; Timothy Baldwin; Benjamin Waldron
A Multilingual Database of Idioms

L04-1495 : Kazuaki Maeda; Stephanie Strassel
Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium

L04-1496 : Stephanie Strassel
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text

L04-1497 : Marc Vilain
Building part-of-speech Corpora Through Histogram Hopping

L04-1498 : Gregory Ernest Monaco; Abdelhadi Soudi
An Emerging Transcontinental Collaborative Research and Education Agenda in Human Language Technologies

L04-1499 : Susan Robinson; Bilyana Martinovski; Saurabh Garg; Jens Stephan; David Traum
Issues in Corpus Development for Multi-party Multi-modal Task-oriented Dialogue

L04-1500 : Christopher Cieri; David Miller; Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text

L04-1501 : David R. Traum; Susan Robinson; Jens Stephan
Evaluation of Multi-party Virtual Reality Dialogue Interaction

L04-1502 : Christopher Cieri; Joseph P. Campbell; Hirotaka Nakasone; David Miller; Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data

L04-1503 : Alessandro Mazzei; Vincenzo Lombardo
Building a Large Grammar for Italian

L04-1504 : Kitazawa Shigeyoshi; Kiriyama Shinya; Itoh Toshihiko; Nick Campbell
Japanese MULTEXT: a Prosodic Corpus

L04-1505 : Giuseppe Cappeli; Paulo Alberto
The OLISSIPO and LECTIO Projects

L04-1506 : Long Qiu; Min-Yen Kan; Tat-Seng Chua
A Public Reference Implementation of the RAP Anaphora Resolution Algorithm

L04-1507 : Mark Hepple; Neil Ireson; Paolo Allegrini; Simone Marchi; Simonetta Montemagni; Jose Maria Gomez Hidalgo
NLP-enhanced Content Filtering Within the POESIA Project

L04-1508 : Philippe Martin
WinPitch Corpus, a Text to Speech Alignment Tool for Multimodal Corpora

L04-1509 : Stefan Evert
The Statistical Analysis of Morphosyntactic Distributions

L04-1510 : Luciana Bordoni; Leonardo Pasqualini; Filippo Sciarrone
CHeM: A System for the Automatic Analysis of e-mails in the Restoration and Conservation Domain

L04-1511 : Robert Irie; Beth Sundheim
Resources for Place Name Analysis

L04-1512 : Bente Maegaard
NEMLAR - An Arabic Language Resources Project

L04-1513 : Key-Sun Choi; Hee-Sook Bae; Wonseok Kang; Juho Lee; Eunhe Kim; Hekyeong Kim; Donghee Kim; Youngbin Song; Hyosik Shin
Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy

L04-1514 : Christophe Jouis; Jean-Marie Ferru
Intranet Try To Find Project (ITTF): An Approach for the Search of Relevant Information Inside an Organization

L04-1515 : Christopher Cieri; Mark Liberman
A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

L04-1516 : Khalid Choukri
Recent Activities within the European Language Resources Association: Issues on Sharing Language Resources and Evaluation

L04-1517 : Widad Mustafa El Hadi; Ismail Timimi; Marianne Dabbadie
EVALDA-CESART Project: Terminological Resources Acquisition Tools Evaluation Campaign

L04-1518 : Gabriella Pardelli; Manuela Sassi; Sara Goggi
From Weaver to the ALPAC Report

L04-1519 : Rute Costa; Raquel Silva
The Verb in the Terminological Collocations. Contribution to the Development of a Morphological Analyser: MorphoCom

L04-1520 : Joaquim F. Ferreira da Silva; Zornitsa Kozareva; José Gabriel Pereira Lopes
Cluster Analysis and Classification of Named Entities

L04-1521 : Khalid Choukri; Mahtab Nikkhou; Niklas Paulsson
Network of Data Centres (NetDC): BNSC - An Arabic Broadcast News Speech Corpus

L04-1522 : Valérie Mapelli; Maria Nava; Sylvain Surcin; Djamel Mostefa; Khalid Choukri
Technolangue: A Permanent Evaluation and Information Infrastructure

L04-1523 : Palmira Marrafa
Extending Wordnets To Implicit Information

L04-1524 : Boris Dobrov; Igor Kuralenok; Natalia Loukachevitch; Igor Nekrestyanov; Ilya Segalovich
Russian Information Retrieval Evaluation Seminar