Owen Rambow

Also published as: Owen C. Rambow

2023

NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi Fung | Tuhin Chakrabarty | Hao Guo | Owen Rambow | Smaranda Muresan | Heng Ji
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Knowledge of norms is needed to understand and reason about acceptable behavior in human communication and interactions across sociocultural scenarios. Most computational research on norms has focused on a single culture, and manually built datasets, from non-conversational settings. We address these limitations by proposing a new framework, NormSage, to automatically extract culture-specific norms from multi-lingual conversations. NormSage uses GPT-3 prompting to 1) extract candidate norms directly from conversations and 2) provide explainable self-verification to ensure correctness and relevance. Comprehensive empirical results show the promise of our approach to extract high-quality culture-aware norms from multi-lingual conversations (English and Chinese), across several quality metrics. Further, our relevance verification can be extended to assess the adherence and violation of any norm with respect to a conversation on-the-fly, along with textual explanation. NormSage achieves an AUC of 94.6% in this grounding setup, with generated explanations matching human-written quality.

Proceedings of the Seventh International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2023)
Owen Rambow | François Lareau
Proceedings of the Seventh International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2023)

A Cautious Generalization Goes a Long Way: Learning Morphophonological Rules
Salam Khalifa | Sarah Payne | Jordan Kodner | Ellen Broselow | Owen Rambow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Explicit linguistic knowledge, encoded by resources such as rule-based morphological analyzers, continues to prove useful in downstream NLP tasks, especially for low-resource languages and dialects. Rules are an important asset in descriptive linguistic grammars. However, creating such resources is usually expensive and non-trivial, especially for spoken varieties with no written standard. In this work, we present a novel approach for automatically learning morphophonological rules of Arabic from a corpus. Motivated by classic cognitive models for rule learning, rules are generalized cautiously. Rules that are memorized for individual items are only allowed to generalize to unseen forms if they are sufficiently reliable in the training data. The learned rules are further examined to ensure that they capture true linguistic phenomena described by domain experts. We also investigate the learnability of rules in low-resource settings across different experimental setups and dialects.

Deep Active Learning for Morphophonological Processing
Seyed Morteza Mirbostani | Yasaman Boreshban | Salam Khalifa | SeyedAbolghasem Mirroshandel | Owen Rambow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Building a system for morphological processing is a challenging task in morphologically complex languages like Arabic. Although there are some deep learning based models that achieve successful results, these models rely on a large amount of annotated data. Building such datasets, especially for some of the lower-resource Arabic dialects, is very difficult, time-consuming, and expensive. In addition, some parts of the annotated data do not contain useful information for training machine learning models. Active learning strategies allow the learner algorithm to select the most informative samples for annotation. There has been little research that focuses on applying active learning for morphological inflection and morphophonological processing. In this paper, we propose a deep active learning method for this task. Our experiments on Egyptian Arabic show that with only about 30% of annotated data, we achieve the same results as the state-of-the-art model does on the whole dataset.

Towards Generative Event Factuality Prediction
John Murzaku | Tyler Osborne | Amittai Aviram | Owen Rambow
Findings of the Association for Computational Linguistics: ACL 2023

We present a novel end-to-end generative task and system for predicting event factuality holders, targets, and their associated factuality values. We perform the first experiments using all sources and targets of factuality statements from the FactBank corpus. We perform multi-task learning with other tasks and event-factuality corpora to improve on the FactBank source and target task. We argue that a careful, domain-specific target text output format is important in generative systems and verify this with multiple experiments on target text output structure. We redo previous state-of-the-art author-only event factuality experiments and also offer insights towards a generative paradigm for the author-only event factuality prediction task.

Finding Common Ground: Annotating and Predicting Common Ground in Spoken Conversations
Magdalena Markowska | Mohammad Taghizadeh | Adil Soubki | Seyed Mirroshandel | Owen Rambow
Findings of the Association for Computational Linguistics: EMNLP 2023

When we communicate with other humans, we do not simply generate a sequence of words. Rather, we use our cognitive state (beliefs, desires, intentions) and our model of the audience’s cognitive state to create utterances that affect the audience’s cognitive state in the intended manner. An important part of cognitive state is the common ground, which is the content the speaker believes, and the speaker believes the audience believes, and so on. While much attention has been paid to common ground in cognitive science, there has not been much work in natural language processing. In this paper, we introduce a new annotation and corpus to capture common ground. We then describe some initial experiments extracting propositions from dialog and tracking their status in the common ground from the perspective of each speaker.

2022

We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what. This corpus is inspired by similar source-and-target corpora, specifically MPQA and FactBank. The corpus comprises two genres, newswire and discussion forums, in three languages, Chinese (Mandarin), English, and Spanish. The corpus is distributed through the LDC.

Towards Learning Arabic Morphophonology
Salam Khalifa | Jordan Kodner | Owen Rambow
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

One core challenge facing morphological inflection systems is capturing language-specific morphophonological changes. This is particularly true of languages like Arabic which are morphologically complex. In this paper, we learn explicit morphophonological rules from morphologically annotated Egyptian Arabic and corresponding surface forms. These rules are human-interpretable, capture known morphophonological phenomena in the language, and are generalizable to unseen forms.

KOJAK: A New Corpus for Studying German Discourse Particle ja
Adil Soubki | Owen Rambow | Chong Kang
Proceedings of the 3rd Workshop on Computational Approaches to Discourse

In German, ja can be used as a discourse particle to indicate that a proposition, according to the speaker, is believed by both the speaker and audience. We use this observation to create KoJaK, a distantly-labeled English dataset derived from Europarl for studying when a speaker believes a statement to be common ground. This corpus is then analyzed to identify lexical choices in English that correspond with German ja. Finally, we perform experiments on the dataset to predict if an English clause corresponds to a German clause containing ja and achieve an F-measure of 75.3% on a balanced test corpus.

Re-Examining FactBank: Predicting the Author’s Presentation of Factuality
John Murzaku | Peter Zeng | Magdalena Markowska | Owen Rambow
Proceedings of the 29th International Conference on Computational Linguistics

We present a corrected version of a subset of the FactBank data set. Previously published results on FactBank are no longer valid. We perform experiments on FactBank using multiple training paradigms, data smoothing techniques, and polarity classifiers. We argue that f-measure is an important alternative evaluation metric for factuality. We provide new state-of-the-art results for four corpora including FactBank. We perform an error analysis on FactBank combined with two similar corpora.

From Stance to Concern: Adaptation of Propositional Analysis to New Tasks and Domains
Brodie Mather | Bonnie Dorr | Adam Dalton | William de Beaumont | Owen Rambow | Sonja Schmer-Galunder
Findings of the Association for Computational Linguistics: ACL 2022

We present a generalized paradigm for adaptation of propositional analysis (predicate-argument pairs) to new tasks and domains. We leverage an analogy between stances (belief-driven sentiment) and concerns (topical issues with moral dimensions/endorsements) to produce an explanatory representation. A key contribution is the combination of semi-automatic resource building for extraction of domain-dependent concern types (with 2-4 hours of human labor per domain) and an entirely automatic procedure for extraction of domain-independent moral dimensions and endorsement values. Prudent (automatic) selection of terms from propositional structures for lexical expansion (via semantic similarity) produces new moral dimension lexicons at three levels of granularity beyond a strong baseline lexicon. We develop a ground truth (GT) based on expert annotators and compare our concern detection output to GT, to yield 231% improvement in recall over baseline, with only a 10% loss in precision. F1 yields 66% improvement over baseline and 97.8% of human performance. Our lexically based approach yields large savings over approaches that employ costly human labor and model building. We provide to the community a newly expanded moral dimension/value lexicon, annotation guidelines, and GT.

2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Young-bum Kim | Yunyao Li | Owen Rambow
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Finite-state Model of Shupamem Reduplication
Magdalena Markowska | Jeffrey Heinz | Owen Rambow
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Shupamem, a language of Western Cameroon, is a tonal language which also exhibits the morpho-phonological process of full reduplication. This creates two challenges for a finite-state model of its morpho-syntax and morphophonology: how to manage the full reduplication and the autosegmental nature of lexical tone. Dolatian and Heinz (2020) explain how 2-way finite-state transducers can model full reduplication without an exponential increase in states, and finite-state transducers with multiple tapes have been used to model autosegmental tiers, including tone (Wiebe, 1992; Dolatian and Rawski, 2020a). Here we synthesize 2-way finite-state transducers and multitape transducers, resulting in a finite-state formalism that subsumes both, to account for the full reduplicative processes in Shupamem which also affect tone.

2020

To Test Machine Comprehension, Start by Defining Comprehension
Jesse Dunietz | Greg Burnham | Akash Bharadwaj | Owen Rambow | Jennifer Chu-Carroll | Dave Ferrucci
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many tasks aim to measure machine reading comprehension (MRC), often focusing on question types presumed to be difficult. Rarely, however, do task designers start by considering what systems should in fact comprehend. In this paper we make two key contributions. First, we argue that existing approaches do not adequately define comprehension; they are too unsystematic about what content is tested. Second, we present a detailed definition of comprehension—a “Template of Understanding”—for a widely useful class of texts, namely short narratives. We then conduct an experiment that strongly suggests existing systems are not up to the task of narrative understanding as we define it.

Email Classification Incorporating Social Networks and Thread Structure
Sakhar Alkhereyf | Owen Rambow
Proceedings of the Twelfth Language Resources and Evaluation Conference

Existing methods for different document classification tasks in the context of social networks typically only capture the semantics of texts, while ignoring the users who exchange the text and the network they form. However, some work has shown that incorporating the social network information in addition to information from language is effective for various NLP applications including sentiment analysis, inferring user attributes, and predicting inter-personal relations. In this paper, we present an empirical study of email classification into “Business” and “Personal” categories. We represent the email communication using various graph structures. As features, we use both the textual information from the email content and social network information from the communication graphs. We also model the thread structure for emails. We focus on detecting personal emails, and we evaluate our methods on two corpora, only one of which we train on. The experimental results reveal that incorporating social network information improves over the performance of an approach based on textual information only. The results also show that considering the thread structure of emails improves the performance further. Furthermore, our approach improves over a state-of-the-art baseline which uses node embeddings based on both lexical and social network information.

2019

We present a collection of morphologically annotated corpora for seven Arabic dialects: Taizi Yemeni, Sanaani Yemeni, Najdi, Jordanian, Syrian, Iraqi and Moroccan Arabic. The corpora collectively cover over 200,000 words, and are all manually annotated in a common set of standards for orthography, diacritized lemmas, tokenization, morphological units and English glosses. These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.

Syntax-aware Neural Semantic Role Labeling with Supertags
Jungo Kasai | Dan Friedman | Robert Frank | Dragomir Radev | Owen Rambow
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.

2018

Automatically Tailoring Unsupervised Morphological Segmentation to the Language
Ramy Eskander | Owen Rambow | Smaranda Muresan
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Morphological segmentation is beneficial for several natural language processing tasks dealing with large vocabularies. Unsupervised methods for morphological segmentation are essential for handling a diverse set of languages, including low-resource languages. Eskander et al. (2016) introduced a Language Independent Morphological Segmenter (LIMS) using Adaptor Grammars (AG) based on the best-on-average performing AG configuration. However, while LIMS worked best on average and outperformed other state-of-the-art unsupervised morphological segmentation approaches, it did not provide the optimal AG configuration for five out of the six languages. We propose two language-independent classifiers that enable the selection of the optimal or nearly-optimal configuration for the morphological segmentation of unseen languages.

Author Commitment and Social Power: Automatic Belief Tagging to Infer the Social Context of Interactions
Vinodkumar Prabhakaran | Premkumar Ganeshkumar | Owen Rambow
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Understanding how social power structures affect the way we interact with one another is of great interest to social scientists who want to answer fundamental questions about human behavior, as well as to computer scientists who want to build automatic methods to infer the social contexts of interactions. In this paper, we employ advancements in extra-propositional semantics extraction within NLP to study how author commitment reflects the social context of an interaction. Specifically, we investigate whether the level of commitment expressed by individuals in an organizational interaction reflects the hierarchical power structures they are part of. We find that subordinates use significantly more instances of non-commitment than superiors. More importantly, we also find that subordinates attribute propositions to other agents more often than superiors do, an aspect that has not been studied before. Finally, we show that enriching lexical features with commitment labels captures important distinctions in social meanings.

End-to-End Graph-Based TAG Parsing with Neural Networks
Jungo Kasai | Robert Frank | Pauli Xu | William Merrill | Owen Rambow
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs. Our best end-to-end parser, which jointly performs supertagging, POS tagging, and parsing, outperforms the previously reported best results by more than 2.2 LAS and UAS points. The graph-based parsing architecture allows for global inference and rich feature representations for TAG parsing, alleviating the fundamental trade-off between transition-based and graph-based parsing systems. We also demonstrate that the proposed parser achieves state-of-the-art performance in the downstream tasks of Parsing Evaluation using Textual Entailments (PETE) and Unbounded Dependency Recovery. This provides further support for the claim that TAG is a viable formalism for problems that require rich structural analysis of sentences.

2017

TAG Parsing with Neural Networks and Vector Representations of Supertags
Jungo Kasai | Bob Frank | Tom McCoy | Owen Rambow | Alexis Nasr
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present supertagging-based models for Tree Adjoining Grammar parsing that use neural network architectures and dense vector representation of supertags (elementary trees) to achieve state-of-the-art performance in unlabeled and labeled attachment scores. The shift-reduce parsing model eschews lexical information entirely, and uses only the 1-best supertags to parse a sentence, providing further support for the claim that supertagging is “almost parsing.” We demonstrate that the embedding vector representations the parser induces for supertags possess linguistically interpretable structure, supporting analogies between grammatical structures like those familiar from recent work in distributional semantics. This dense representation of supertags overcomes the drawbacks for statistical models of TAG as compared to CCG parsing, raising the possibility that TAG is a viable alternative for NLP tasks that require the assignment of richer structural descriptions to sentences.

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora
Sakhar Alkhereyf | Owen Rambow
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing

In this paper, we present an empirical study of email classification into two main categories, “Business” and “Personal”. We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For this classification task, we extract social network features from the graphs in addition to lexical features from email content, and we compare the performance of SVM and Extra-Trees classifiers using these features. Combining graph features with lexical features improves the performance of both classifiers. We also provide manually annotated sets of the Avocado and Enron email corpora as a supplementary contribution.

Predicting User Views in Online News
Daniel Hardt | Owen Rambow
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

We analyze user viewing behavior on an online news site. We collect data from 64,000 news articles, and use text features to predict frequency of user views. We compare predictiveness of the headline and “teaser” (viewed before clicking) and the body (viewed after clicking). Both are predictive of clicking behavior, with the full article text being most predictive.

Linguistically Rich Vector Representations of Supertags for TAG Parsing
Dan Friedman | Jungo Kasai | R. Thomas McCoy | Robert Frank | Forrest Davis | Owen Rambow
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

TAG Parser Evaluation using Textual Entailments
Pauli Xu | Robert Frank | Jungo Kasai | Owen Rambow
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

2016

The Columbia University - New York University Abu Dhabi SIGMORPHON 2016 Morphological Reinflection Shared Task Submission
Dima Taji | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Revisiting Supertagging and Parsing: How to Use Supertags in Transition-Based Parsing
Wonchang Chung | Suhas Siddhesh Mhatre | Alexis Nasr | Owen Rambow | Srinivas Bangalore
Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)

Hyperedge Replacement and Nonprojective Dependency Structures
Daniel Bauer | Owen Rambow
Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12)

Detecting Level of Belief in Chinese and Spanish
Juan Pablo Colomer | Keyu Lai | Owen Rambow
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

There has been extensive work on detecting the level of committed belief (also known as “factuality”) that an author is expressing towards the propositions in his or her utterances. Previous work on English has revealed that this can be done as a sequence tagging task. In this paper, we investigate the same task for Chinese and Spanish, two very different languages from English and from each other.

Incrementally Learning a Dependency Parser to Support Language Documentation in Field Linguistics
Morgan Ulinski | Julia Hirschberg | Owen Rambow
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present experiments in incrementally learning a dependency parser. The parser will be used in the WordsEye Linguistics Tools (WELT) (Ulinski et al., 2014) which supports field linguists documenting a language’s syntax and semantics. Our goal is to make syntactic annotation faster for field linguists. We have created a new parallel corpus of descriptions of spatial relations and motion events, based on pictures and video clips used by field linguists for elicitation of language from native speaker informants. We collected descriptions for each picture and video from native speakers in English, Spanish, German, and Egyptian Arabic. We compare the performance of MSTParser (McDonald et al., 2006) and MaltParser (Nivre et al., 2006) when trained on small amounts of this data. We find that MaltParser achieves the best performance. We also present the results of experiments using the parser to assist with annotation. We find that even when the parser is trained on a single sentence from the corpus, annotation time significantly decreases.

Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages
Ramy Eskander | Owen Rambow | Tianchun Yang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We investigate using Adaptor Grammars for unsupervised morphological segmentation. Using six development languages, we investigate in detail different grammars, the use of morphological knowledge from outside sources, and the use of a cascaded architecture. Using cross-validation on our development languages, we propose a system which is language-independent. We show that it outperforms two state-of-the-art systems on 5 out of 6 languages.

Automatically Processing Tweets from Gang-Involved Youth: Towards Detecting Loss and Aggression
Terra Blevins | Robert Kwiatkowski | Jamie MacBeth | Kathleen McKeown | Desmond Patton | Owen Rambow
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Violence is a serious problem for cities like Chicago and has been exacerbated by the use of social media by gang-involved youths for taunting rival gangs. We present a corpus of tweets from a young and powerful female gang member and her communicators, which we have annotated with discourse intention, using a deep read to understand how and what triggered conversations to escalate into aggression. We use this corpus to develop a part-of-speech tagger and phrase table for the variant of English that is used and a classifier for identifying tweets that express grieving and aggression.

Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine
Ramy Eskander | Nizar Habash | Owen Rambow | Arfath Pasha
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Arabic dialects present a special problem for natural language processing because there are few resources, they have no standard orthography, and have not been studied much. However, as more and more written dialectal Arabic is found in social media, NLP for Arabic dialects becomes an important goal. We present a methodology for creating a morphological analyzer and a morphological tagger for dialectal Arabic, and we illustrate it on Egyptian and Levantine Arabic. To our knowledge, these are the first analyzer and tagger for Levantine.

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Kevin Knight | Ani Nenkova | Owen Rambow
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
Faisal Al-Shargi | Aidan Kaplan | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present new language resources for Moroccan and Sanaani Yemeni Arabic. The resources include corpora for each dialect which have been morphologically annotated, and morphological analyzers for each dialect which are derived from these corpora. These are the first sets of resources for Moroccan and Yemeni Arabic. The resources will be made available to the public.

A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels
Vinodkumar Prabhakaran | Owen Rambow
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In order to gain a deep understanding of how social context manifests in interactions, we need data that represents interactions from a large community of people over a long period of time, capturing different aspects of social context. In this paper, we present a large corpus of Wikipedia Talk page discussions that are collected from a broad range of topics, containing discussions that happened over a period of 15 years. The dataset contains 166,322 discussion threads, across 1236 articles/topics that span 15 different topic categories or domains. The dataset also captures whether the post is made by a registered user or not, and whether he/she was an administrator at the time of making the post. It also captures the Wikipedia age of editors in terms of number of months spent as an editor, as well as their gender. This corpus will be a valuable resource to investigate a variety of computational sociolinguistics research questions regarding online social interactions.

Text preprocessing is an important and necessary task for all NLP applications. A simple variation in any preprocessing step may drastically affect the final results. Moreover, replicability and comparability, as much as feasible, are among the goals of our scientific enterprise, so building systems that can ensure consistency across our various pipelines would contribute significantly to those goals. The problem has become quite pronounced as NLP tools become more and more widely available, yet with different levels of specification. In this paper, we present a dynamic unified preprocessing framework and tool, SPLIT, that is highly configurable based on user requirements and serves as a preprocessing tool for several tools at once. SPLIT aims to standardize the implementations of the most important preprocessing steps by allowing for a unified API that could be exchanged across different researchers to ensure complete transparency in replication. The user is able to select the required preprocessing tasks from a long list of preprocessing steps. The user is also able to specify the order of execution, which in turn affects the final preprocessing output.

2015

SLSA: A Sentiment Lexicon for Standard Arabic
Ramy Eskander | Owen Rambow
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Sentiment and Belief: How to Think about, Represent, and Annotate Private States
Owen Rambow | Janyce Wiebe
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing: Tutorial Abstracts

Validating Literary Theories Using Automatic Social Network Extraction
Prashant Jayannavar | Apoorv Agarwal | Melody Ju | Owen Rambow
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

pdf bib
Committed Belief Tagging on the Factbank and LU Corpora: A Comparative Study
Gregory Werner | Vinodkumar Prabhakaran | Mona Diab | Owen Rambow
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

pdf bib
DIWAN: A Dialectal Word Annotation Tool for Arabic
Faisal Al-Shargi | Owen Rambow
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

pdf bib
Automatic Transliteration of Romanized Dialectal Arabic
Mohamed Al-Badrashiny | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages
Jeff Good | Julia Hirschberg | Owen Rambow
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf bib
Documenting Endangered Languages with the WordsEye Linguistics Tool
Morgan Ulinski | Anusha Balakrishnan | Daniel Bauer | Bob Coyne | Julia Hirschberg | Owen Rambow
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf bib
Power of Confidence: How Poll Scores Impact Topic Dynamics in Political Debates
Vinodkumar Prabhakaran | Ashima Arora | Owen Rambow
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Using Simple NLP Tools to Trace the Globalization of the Art World
Mohamed AlTantawy | Alix Rule | Owen Rambow | Zhongyu Wang | Rupayan Basu
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Power of Confidence: How Poll Scores Impact Topic Dynamics in Political Debates
Vinodkumar Prabhakaran | Ashima Arora | Owen Rambow
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

pdf bib
Using Frame Semantics in Natural Language Processing
Apoorv Agarwal | Daniel Bauer | Owen Rambow
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

pdf bib
Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script
Ramy Eskander | Mohamed Al-Badrashiny | Nizar Habash | Owen Rambow
Proceedings of the First Workshop on Computational Approaches to Code Switching

pdf bib
Light verb constructions with ‘do’ and ‘be’ in Hindi: A TAG analysis
Ashwini Vaidya | Owen Rambow | Martha Palmer
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007). MADAMIRA improves upon the two systems with a more streamlined Java implementation that is more robust, portable, and extensible, and is faster than its ancestors by more than an order of magnitude. We also discuss an online demo (see http://nlp.ldeo.columbia.edu/madamira/) that highlights these aspects.

pdf bib
Unsupervised Morphology-Based Vocabulary Expansion
Mohammad Sadegh Rasooli | Thomas Lippincott | Nizar Habash | Owen Rambow
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Predicting Power Relations between Participants in Written Dialog from a Single Thread
Vinodkumar Prabhakaran | Owen Rambow
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
WELT: Using Graphics Generation in Linguistic Fieldwork
Morgan Ulinski | Anusha Balakrishnan | Bob Coyne | Julia Hirschberg | Owen Rambow
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Frame Semantic Tree Kernels for Social Network Extraction from Text
Apoorv Agarwal | Sriramkumar Balasubramanian | Anup Kotalwar | Jiehan Zheng | Owen Rambow
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Staying on Topic: An Indicator of Power in Political Debates
Vinodkumar Prabhakaran | Ashima Arora | Owen Rambow
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Gender and Power: How Gender and Gender Environment Affect Manifestations of Power
Vinodkumar Prabhakaran | Emily E. Reid | Owen Rambow
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Written Dialog and Social Power: Manifestations of Different Types of Power in Dialog Behavior
Vinodkumar Prabhakaran | Owen Rambow
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Automatic Extraction of Social Networks from Literary Text: A Case Study on Alice in Wonderland
Apoorv Agarwal | Anup Kotalwar | Owen Rambow
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
SINNET: Social Interaction Network Extractor from Text
Apoorv Agarwal | Anup Kotalwar | Jiehan Zheng | Owen Rambow
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

pdf bib
Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features
Yuval Marton | Nizar Habash | Owen Rambow
Computational Linguistics, Volume 39, Issue 1 - March 2013

pdf bib
Morphological Analysis and Disambiguation for Dialectal Arabic
Nizar Habash | Ryan Roth | Owen Rambow | Ramy Eskander | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Processing Spontaneous Orthography
Ramy Eskander | Nizar Habash | Owen Rambow | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improving the Quality of Minority Class Identification in Dialog Act Tagging
Adinoyi Omuya | Vinodkumar Prabhakaran | Owen Rambow
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
SPMRL’13 Shared Task System: The CADIM Arabic Dependency Parser
Yuval Marton | Nizar Habash | Owen Rambow | Sarah Alkhulani
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

pdf bib
Who’s (Really) the Boss? Perception of Situational Power in Written Interactions
Vinodkumar Prabhakaran | Owen Rambow | Mona Diab
Proceedings of COLING 2012

pdf bib
Predicting Overt Display of Power in Written Dialogs
Vinodkumar Prabhakaran | Owen Rambow | Mona Diab
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Detecting Influencers in Written Online Conversations
Or Biran | Sara Rosenthal | Jacob Andreas | Kathleen McKeown | Owen Rambow
Proceedings of the Second Workshop on Language in Social Media

pdf bib
Social Network Analysis of Alice in Wonderland
Apoorv Agarwal | Augusto Corvalan | Jacob Jensen | Owen Rambow
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

pdf bib
Creating a Tree Adjoining Grammar from a Multilayer Treebank
Rajesh Bhatt | Owen Rambow | Fei Xia
Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11)

pdf bib
A Comprehensive Gold Standard for the Enron Organizational Hierarchy
Apoorv Agarwal | Adinoyi Omuya | Aaron Harnly | Owen Rambow
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Unsupervised Induction of a Syntax-Semantics Lexicon Using Iterative Refinement
Hagen Fürstenau | Owen Rambow
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib abs
Conventional Orthography for Dialectal Arabic
Nizar Habash | Mona Diab | Owen Rambow
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world. DA lives side-by-side with the official language, Modern Standard Arabic (MSA). DA differs from MSA on all levels of linguistic representation, from phonology and morphology to lexicon and syntax. Unlike MSA, DA has no standard orthography since there are no Arabic dialect academies, nor is there a large edited body of dialectal literature that follows the same spelling standard. In this paper, we present CODA, a conventional orthography for dialectal Arabic; it is designed primarily for the purpose of developing computational models of Arabic dialects. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Egyptian Arabic.

pdf bib abs
Annotations for Power Relations on Email Threads
Vinodkumar Prabhakaran | Huzaifa Neralwala | Owen Rambow | Mona Diab
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Social relations like power and influence are difficult concepts to define, but are easily recognizable when expressed. In this paper, we describe a multi-layer annotation scheme for social power relations that are recognizable from online written interactions. We introduce a typology of four types of power relations between dialog participants: hierarchical power, situational power, influence and control of communication. We also present a corpus of Enron emails comprising 122 threaded conversations, manually annotated with instances of these power relations between participants. Our annotations also capture attempts at exercise of power or influence and whether those attempts were successful or not. In addition, we capture utterance level annotations for overt display of power. We describe the annotation definitions using two example email threads from our corpus illustrating each type of power relation. We also present detailed instructions given to the annotators and provide various statistics on annotations in the corpus.

pdf bib abs
The Dependency-Parsed FrameNet Corpus
Daniel Bauer | Hagen Fürstenau | Owen Rambow
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

When training semantic role labeling systems, the syntax of example sentences is of particular importance. Unfortunately, for the FrameNet annotated sentences, there is no standard parsed version. The integration of the automatic parse of an annotated sentence with its semantic annotation, while conceptually straightforward, is complex in practice. We present a standard dataset that is publicly available and that can be used in future research. This dataset contains parser-generated dependency structures (with POS tags and lemmas) for all FrameNet 1.5 sentences, with nodes automatically associated with FrameNet annotations.

2011

pdf bib
Sentiment Analysis of Twitter Data
Apoorv Agarwal | Boyi Xie | Ilia Vovsha | Owen Rambow | Rebecca Passonneau
Proceedings of the Workshop on Language in Social Media (LSM 2011)

pdf bib
VigNet: Grounding Language in Graphics using Frame Semantics
Bob Coyne | Daniel Bauer | Owen Rambow
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

pdf bib
Fuzzy Syntactic Reordering for Phrase-based Statistical Machine Translation
Jacob Andreas | Nizar Habash | Owen Rambow
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Fast Yet Rich Morphological Analysis
Mohamed Altantawy | Nizar Habash | Owen Rambow
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

pdf bib
Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Linguistic Phenomena, Analyses, and Representations: Understanding Conversion between Treebanks
Rajesh Bhatt | Owen Rambow | Fei Xia
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Automatic Detection and Classification of Social Events
Apoorv Agarwal | Owen Rambow
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Word-Based Dialect Identification with Georeferenced Rules
Yves Scherrer | Owen Rambow
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Arabic Dependency Parsing with Lexical and Inflectional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf bib
Annotation Scheme for Social Network Extraction from Text
Apoorv Agarwal | Owen C. Rambow | Rebecca J. Passonneau
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib
Automatic Committed Belief Tagging
Vinodkumar Prabhakaran | Owen Rambow | Mona Diab
Coling 2010: Posters

pdf bib
The Simple Truth about Dependency and Phrase Structure Representations: An Opinion Piece
Owen Rambow
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib abs
Morphological Analysis and Generation of Arabic Nouns: A Morphemic Functional Approach
Mohamed Altantawy | Nizar Habash | Owen Rambow | Ibrahim Saleh
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

MAGEAD is a morphological analyzer and generator for Modern Standard Arabic (MSA) and its dialects. We introduced MAGEAD in previous work with an implementation of MSA and Levantine Arabic verbs. In this paper, we port that system to MSA nominals (nouns and adjectives), which are far more complex to model than verbs. Our system is a functional morphological analyzer and generator, i.e., it analyzes to and generates from a representation consisting of a lexeme and linguistic feature-value pairs, where the features are syntactically (and perhaps semantically) meaningful, rather than just morphologically. A detailed evaluation of the current implementation comparing it to a commonly used morphological analyzer shows that it has good morphological coverage with precision and recall scores in the 90s. An error analysis reveals that the majority of recall and precision errors are problems in the gold standard or a result of the discrepancy between different models of form-based/functional morphology.

We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make use of ECs. We make a high-level distinction between two types of ECs, trace and silent, on the basis of whether they are postulated to mark displacement or not. Each type is further refined into several subtypes based on the underlying linguistic phenomena which the ECs are introduced to handle. This paper discusses the stages at which we add ECs to the Hindi/Urdu treebank and why. We investigate methodically the different types of ECs and their role in our syntactic and semantic representations. We also examine our decisions whether or not to coindex each type of ECs with other elements in the representation.

2009

pdf bib
MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note)
Srinivas Bangalore | Pierre Boullier | Alexis Nasr | Owen Rambow | Benoît Sagot
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Contrasting the Interaction Structure of an Email and a Telephone Corpus: A Machine Learning Approach to Annotation of Dialogue Function Units
Jun Hu | Rebecca Passonneau | Owen Rambow
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
Is Coordination Quantification?
Kevin Lerman | Owen Rambow
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

pdf bib abs
Improving NER in Arabic Using a Morphological Tagger
Benjamin Farber | Dayne Freitag | Nizar Habash | Owen Rambow
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We discuss a named entity recognition system for Arabic, and show how we incorporated the information provided by MADA, a full morphological tagger which uses a morphological analyzer. Surprisingly, the relevant features used are the capitalization of the English gloss chosen by the tagger, and the fact that an analysis is returned (that a word is not OOV to the morphological analyzer). The use of the tagger also improves over a third system which just uses a morphological analyzer, yielding a 14% reduction in error over the baseline. We conduct a thorough error analysis to identify sources of success and failure among the variations, and show that by combining the systems in simple ways we can significantly influence the precision-recall trade-off.

pdf bib abs
Using Semantically Annotated Corpora to Build Collocation Resources
Margarita Alonso Ramos | Owen Rambow | Leo Wanner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present an experiment in extracting collocations from the FrameNet corpus, specifically, support verbs such as ‘direct’ in ‘Environmentalists directed strong criticism at world leaders’. Support verbs do not contribute meaning of their own and the meaning of the construction is provided by the noun; the recognition of support verbs is thus useful in text understanding. Having access to a list of support verbs is also useful in applications that can benefit from paraphrasing, such as generation (where paraphrasing can provide variety). This paper starts with a brief presentation of support verbs in Meaning-Text Theory, where they fall under the notion of lexical function, and then discusses how relevant information is encoded in the FrameNet corpus. We describe the resource extracted from the FrameNet corpus.

pdf bib
Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking
Ryan Roth | Owen Rambow | Nizar Habash | Mona Diab | Cynthia Rudin
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Nizar Habash | Ryan Gabbard | Owen Rambow | Seth Kulick | Mitch Marcus
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Semi-automatic error analysis for large-scale statistical machine translation
Katrin Kirchhoff | Owen Rambow | Nizar Habash | Mona Diab
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Building and Refining Rhetorical-Semantic Relation Models
Sasha Blair-Goldensohn | Kathleen McKeown | Owen Rambow
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Arabic Diacritization through Full Morphological Tagging
Nizar Habash | Owen Rambow
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf bib
Grammar Approximation by Representative Sublanguage: A New Model for Language Learning
Smaranda Muresan | Owen Rambow
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Parsing Arabic Dialects
David Chiang | Mona Diab | Nizar Habash | Owen Rambow | Safiullah Shareef
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
The Hidden TAG Model: Synchronous Grammars for Parsing Resource-Poor Languages
David Chiang | Owen Rambow
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
The Metagrammar Goes Multilingual: A Cross-Linguistic Look at the V2-Phenomenon
Alexandra Kinyon | Owen Rambow | Tatjana Scheffler | SinWon Yoon | Aravind K. Joshi
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects
Nizar Habash | Owen Rambow
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

In this paper, we describe the methodological procedures and issues that emerged from the development of a pilot Levantine Arabic Treebank (LATB) at the Linguistic Data Consortium (LDC) and its use at the Johns Hopkins University (JHU) Center for Language and Speech Processing workshop on Parsing Arabic Dialects (PAD). This pilot, consisting of morphological and syntactic annotation of approximately 26,000 words of Levantine Arabic conversational telephone speech, was developed under severe time constraints; hence the LDC team drew on their experience in treebanking Modern Standard Arabic (MSA) text. The resulting Levantine dialect treebanked corpus was used by the PAD team to develop and evaluate parsers for Levantine dialect texts. The parsers were trained on MSA resources and adapted using dialect-MSA lexical resources (some developed especially for this task) and existing linguistic knowledge about syntactic differences between MSA and dialect. The use of the LATB for development and evaluation of syntactic parsers allowed the PAD team to provide feedback to the LDC treebank developers. In this paper, we describe the creation of resources for this corpus, as well as transformations on the corpus to eliminate speech effects and lessen the gap between our pre-existing MSA resources and the new dialectal corpus.

pdf bib abs
Inter-annotator Agreement on a Multilingual Semantic Annotation Task
Rebecca Passonneau | Nizar Habash | Owen Rambow
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Six sites participated in the Interlingual Annotation of Multilingual Text Corpora (IAMTC) project (Dorr et al., 2004; Farwell et al., 2004; Mitamura et al., 2004). Parsed versions of English translations of news articles in Arabic, French, Hindi, Japanese, Korean and Spanish were annotated by up to ten annotators. Their task was to match open-class lexical items (nouns, verbs, adjectives, adverbs) to one or more concepts taken from the Omega ontology (Philpot et al., 2003), and to identify theta roles for verb arguments. The annotated corpus is intended to be a resource for meaning-based approaches to machine translation. Here we discuss inter-annotator agreement for the corpus. The annotation task is characterized by annotators’ freedom to select multiple concepts or roles per lexical item. As a result, the annotation categories are sets, the number of which is bounded only by the number of distinct annotator-lexical item pairs. We use a reliability metric designed to handle partial agreement between sets. The best results pertain to the part of the ontology derived from WordNet. We examine change over the course of the project, differences among annotators, and differences across parts of speech. Our results suggest a strong learning effect early in the project.

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

2005

pdf bib
Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop
Nizar Habash | Owen Rambow
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Morphological Analysis and Generation for Arabic Dialects
Nizar Habash | Owen Rambow | George Kiraz
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2004

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

pdf bib
A Simple String-Rewriting Formalism for Dependency Grammar
Alexis Nasr | Owen Rambow
Proceedings of the Workshop on Recent Advances in Dependency Grammar

pdf bib
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms
Owen Rambow | Matthew Stone
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
SuperTagging and Full Parsing
Alexis Nasr | Owen Rambow
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

pdf bib
Summarizing Email Threads
Owen Rambow | Lokesh Shrestha | John Chen | Christy Laurdisen
Proceedings of HLT-NAACL 2004: Short Papers

This paper describes an approach for handling structural divergences and recovering dropped arguments in an implemented Korean to English machine translation system. The approach relies on canonical predicate-argument structures (or dependency structures), which provide a suitable pivot representation for the handling of structural divergences and the recovery of dropped arguments. It can also be converted to and from the interface representations of many off-the-shelf parsers and generators.

pdf bib
Exploiting a Probabilistic Hierarchical Model for Generation
Srinivas Bangalore | Owen Rambow
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

pdf bib
Evaluation Metrics for Generation
Srinivas Bangalore | Owen Rambow | Steve Whittaker
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

pdf bib
Using TAGs, a Tree Model, and a Language Model for Generation
Srinivas Bangalore | Owen Rambow
Proceedings of the Fifth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+5)

pdf bib
The Sino-Korean light verb construction and lexical argument structure
Chung-hye Han | Owen Rambow
Proceedings of the Fifth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+5)

pdf bib
Corpus-Based Lexical Choice in Natural Language Generation
Srinivas Bangalore | Owen Rambow
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf bib
A Framework for MT and Multilingual NLG Systems Based on Uniform Lexico-Structural Processing
Benoit Lavoie | Richard Kittredge | Tanya Korelsky | Owen Rambow
Sixth Applied Natural Language Processing Conference

1998

pdf bib
Pseudo-Projectivity: A Polynomially Parsable Non-Projective Dependency Grammar
Sylvain Kahane | Alexis Nasr | Owen Rambow
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf bib
A Framework for Customizable Generation of Hypertext Presentations
Benoit Lavoie | Owen Rambow
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf bib
Wh-islands in TAG and related formalisms
Owen Rambow | K. Vijay-Shanker
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)

pdf bib
Pseudo-Projectivity, A Polynomially Parsable Non-Projective Dependency Grammar
Sylvain Kahane | Alexis Nasr | Owen Rambow
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
A Framework for Customizable Generation of Hypertext Presentations
Benoit Lavoie | Owen Rambow
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib abs
Rapid prototyping of domain-specific machine translation systems
Martha Palmer | Owen Rambow | Alexis Nasr
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper reports on an experiment in assembling a domain-specific machine translation prototype system from off-the-shelf components. The design goals of this experiment were to reuse existing components, to use machine-learning techniques for parser specialization and for transfer lexicon extraction, and to use an expressive, lexicalized formalism for the transfer component.

1997

pdf bib
Enriching lexical transfer with cross-linguistic semantic features or how to do interlingua without interlingua
Alexis Nasr | Owen Rambow | Martha Palmer | Joseph Rosenzweig
AMTA/SIG-IL First Workshop on Interlinguas

pdf bib
Customizable Descriptions of Object-Oriented Models
Benoit Lavoie | Owen Rambow | Ehud Reiter
Fifth Conference on Applied Natural Language Processing

1996

pdf bib
Synchronous Models of Language
Owen Rambow | Giorgio Satta
34th Annual Meeting of the Association for Computational Linguistics

pdf bib
The ModelExplainer
Benoit Lavoie | Owen Rambow | Ehud Reiter
Eighth International Natural Language Generation Workshop (Posters and Demonstrations)

1995

pdf bib
D-Tree Grammars
Owen Rambow | K. Vijay-Shanker | David Weir
33rd Annual Meeting of the Association for Computational Linguistics

pdf bib abs
Parsing Non-Immediate Dominance Relations
Tilman Becker | Owen Rambow
Proceedings of the Fourth International Workshop on Parsing Technologies

We present a new technique for parsing grammar formalisms that express non-immediate dominance relations by ‘dominance-links’. Dominance links have been introduced in various formalisms such as extensions to CFG and TAG in order to capture long-distance dependencies in free-word order languages (Becker et al., 1991; Rambow, 1994). We show how the addition of ‘link counters’ to standard parsing algorithms such as CKY- and Earley-based methods for TAG results in a polynomial time complexity algorithm for parsing lexicalized V-TAG, a multi-component version of TAGs defined in (Rambow, 1994). A variant of this method has previously been applied to context-free grammar based formalisms such as UVG-DL.

pdf bib
Parsing D-Tree Grammars
K. Vijay-Shanker | David Weir | Owen Rambow
Proceedings of the Fourth International Workshop on Parsing Technologies