All ‘A’ sessions are held in the Bayside Auditorium A; ‘B’ sessions are in Bayside 103; ‘C’ sessions are in Bayside 104; and ‘D’ sessions and the Student Research Workshop sessions are in Bayside 102.
Session Chair: David Chiang
Fatiha Sadat and Nizar Habash
Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
Necip Fazil Ayan and Bonnie J. Dorr
This paper presents an extensive evaluation of five different alignments and investigates their impact on the corresponding MT system output. We introduce new measures for intrinsic evaluations and examine the distribution of phrases and untranslated words during decoding to identify which characteristics of different alignments affect translation. We show that precision-oriented alignments yield better MT output (translating more words and using longer phrases) than recall-oriented alignments.
Session Chair: Martha Palmer
Matthew Purver, Konrad P. Körding, Thomas L. Griffiths and Joshua B. Tenebaum
We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003) while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors.
Igor Malioutov and
We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account long-range cohesion dependencies. Our results demonstrate that global analysis improves the segmentation accuracy and is robust in the presence of speech recognition errors.
Session Chair: Vincent Ng
Shane Bergsma and Dekang Lin
We present an approach to pronoun resolution based on syntactic paths. Through a simple bootstrapping procedure, we learn the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities. This path information enables us to handle previously challenging resolution instances, and also robustly addresses traditional syntactic coreference constraints. Highly coreferent paths also allow mining of precise probabilistic gender/number information. We combine statistical knowledge with well-known features in a Support Vector Machine pronoun resolution classifier. Significant gains in performance are observed on several datasets.
Xiaofeng Yang, Jian Su and Chew Lim Tan
Syntactic knowledge is important for pronoun resolution. Traditionally, the syntactic information for pronoun resolution is represented in terms of features that have to be selected and defined heuristically. In the paper, we propose a kernel-based method that can automatically mine the syntactic information from the parse trees for pronoun resolution. Specifically, we utilize the parse trees directly as a structured feature and apply kernel functions to this feature, as well as other normal features, to learn the resolution classifier. In this way, our approach avoids the efforts of decoding the parse trees into the set of flat syntactic features. The experimental results show that our approach can bring significant performance improvement and is reliably effective for the pronoun resolution task.
Session Chair: Martin Kay
It has previously been assumed in the psycholinguistic literature that finite-state models of language are crucially limited in their explanatory power by the locality of the probability distribution and the narrow scope of information used by the model. We show that a simple computational model (a bigram part-of-speech tagger based on the design used by Corley and Crocker (2000) makes correct predictions on processing difficulty observed in a wide range of empirical sentence processing data. We use two modes of evaluation: one that relies on comparison with a control sentence, paralleling practice in human studies; another that measures probability drop in the disambiguating region of the sentence. Both are surprisingly good indicators of the processing difficulty of garden-path sentences. The sentences tested are drawn from published sources and systematically explore five different types of ambiguity: previous studies have been narrower in scope and smaller in scale. We do not deny the limitations of finite-state models, but argue that our results show that their usefulness has been underestimated.
Philippe Blache, Barbara Hemforth and Stéphane Rauzy
We propose in this paper a method for quantifying sentence grammaticality. The approach based on Property Grammars, a constraint-based syntactic formalism, makes it possible to evaluate a grammaticality index for any kind of sentence, including ill-formed ones. We compare on a sample of sentences the grammaticality indices obtained from PG formalism and the acceptability judgements measured by means of a psycholinguistic analysis. The results show that the derived grammaticality index is a fairly good tracer of acceptability scores.
Session Chair: David Chiang
Phil Blunsom and Trevor Cohn
In this paper we present a novel approach for inducing word alignments from sentence-aligned data. We use a Conditional Random Field (CRF), a discriminative model, which is estimated on a small supervised training set. The CRF is conditioned on both the source and target texts, and thus allows for the use of arbitrary and overlapping features over these data. Moreover, the CRF has efficient training and decoding processes which both find globally optimal solutions.
We apply this alignment model to both French-English and Romanian-English language pairs. We show how a large number of highly predictive features can be easily incorporated into the CRF, and demonstrate that even with only a few hundred word-aligned training sentences, our model improves over the current state-of-the-art with alignment error rates of 5.29 and 25.8 for the two tasks respectively.
Richard Sproat, Tao Tao and ChengXiang Zhai
In this paper we investigate Chinese-English name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics --- and therefore share references to named entities --- but are not translations of each other. We present two distinct methods for transliteration, one approach using phonetic transliteration, and the second using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.
Dragos Stefan Munteanu and Daniel Marcu
We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
Session Chair: Martha Palmer
Yee Seng Chan and Hwee Tou Ng
Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well-calibrated probabilities when performing these estimations. By using well-calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.
Samuel Brody, Roberto Navigli and Mirella Lapata
Combination methods are an effective way of improving system performance. This paper examines the benefits of system combination for unsupervised WSD. We investigate several voting- and arbiter-based combination strategies over a diverse pool of unsupervised WSD systems. Our combination methods rely on predominant senses which are derived automatically from raw text. Experiments using the SemCor and Senseval-3 data sets demonstrate that our ensembles yield significantly better results when compared with state-of-the-art.
Fine-grained sense distinctions are one of the major obstacles to successful Word Sense Disambiguation. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense hierarchies, namely the Oxford Dictionary of English. We assess the quality of the mapping and the induced clustering, and evaluate the performance of coarse WSD systems in the Senseval-3 English all-words task.
Session Chair: Vincent Ng
Patrick Pantel and Marco Pennacchiotti
In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state of the art systems, on different size and genre corpora, on extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with small effect on overall precision.
Zhou GuoDong, Su Jian and Zhang Min
This paper proposes a novel hierarchical learning strategy
to deal with the data sparseness
problem in relation extraction by modeling the commonality among related classes. For each class
in the hierarchy either predefined manually
or automatically clustered, a linear discriminative function is determined in a top-down way using a
perceptron algorithm with the lower-level
weight vector derived from the upper-level weight vector. As the
upper-level class normally has
much more positive training examples than the lower-level class, the corresponding linear
discriminative function can be determined more reliably. The upper-level discriminative function then can
effectively guide the
discriminative function learning in the lower-level, which otherwise might suffer from limited training data.
Evaluation on the ACE
Jinxiu Chen, Donghong Ji, Chew Lim Tan and Zhengyu Niu
of manually labeled data is an obstacle to supervised relation extraction
methods. In this paper we investigate a graph based semi-supervised learning
algorithm, a label propagation (LP) algorithm, for relation extraction. It
represents labeled and unlabeled examples and their distances as the nodes and
the weights of edges of a graph, and tries to obtain a labeling function to
satisfy two constraints: 1) it should be fixed on the labeled nodes, 2) it
should be smooth on the whole graph. Experiment results on the ACE corpus
showed that this LP algorithm achieves better performance than
Session Chair: Martin Kay
This paper proposes a
generic mathematical formalism for the combination of various structures:
strings, trees, dags, graphs and products of them. The polarization of the
objects of the elementary structures controls the saturation of the final
structure. This formalism is both elementary and powerful enough to strongly
simulate many grammar formalisms, such as rewriting systems, dependency
Yael Cohen-Sygal and Shuly Wintner
This work provides the essential foundations for modular construction of (typed) unification grammars for natural languages. Much of the information in such grammars is encoded in the signature, and hence the key is facilitating a modularized development of type signatures. We introduce a definition of signature modules and show how two modules combine. Our definitions are motivated by the actual needs of grammar developers obtained through a careful examination of large scale grammars. We show that our definitions meet these needs by conforming to a detailed set of desiderata.
Özlem Çetinoğlu and Kemal Oflazer
This paper investigates the use of sublexical units as a solution to
handling the complex morphology with productive derivational processes, in the
development of a lexical functional grammar for Turkish. Such sublexical units
make it possible to expose the internal structure of words with multiple
derivations to the grammar rules in a uniform manner. This in turn leads to
more succinct and manageable rules. Further, the semantics of the derivations
can also be systematically reflected in a compositional way by constructing
Session Chair: Joakim Nivre
John Hale, Izhak Shafran, Lisa Yung, Bonnie Dorr, Mary Harper, Anna Krasnyanskaya, Matthew Lease, Yang Liu, Brian Roark, Matthew Snover and Robin Stewart
A grammatical method of combining two kinds of speech repair cues is presented. One cue, prosodic disjuncture, is detected by a decision tree-based ensemble classifier that uses acoustic cues to identify where normal prosody seems to be interrupted (Lickley, 1996). The other cue, syntactic parallelism, codifies the expectation that repairs continue a syntactic category that was left unfinished in the reparandum (Levelt, 1983). The two cues are combined in a Treebank PCFG whose states are split using a few simple tree transformations. Parsing performance on the Switchboard and Fisher corpora suggests that these two cues help to locate speech repairs in a synergistic way.
Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Takehiko Maruyama and Yasuyoshi Inagaki
Spoken monologues feature greater sentence length and structural complexity than do spoken dialogues. To achieve high parsing performance for spoken monologues, it could prove effective to simplify the structure by dividing a sentence into suitable language units. This paper proposes a method for dependency parsing of Japanese monologues based on sentence segmentation. In this method, the dependency parsing is executed in two stages: at the clause level and the sentence level. First, the dependencies within a clause are identified by dividing a sentence into clauses and executing stochastic dependency parsing for each clause. Next, the dependencies over clause boundaries are identified stochastically, and the dependency structure of the entire sentence is thus completed. An experiment using a spoken monologue corpus shows this method to be effective for efficient dependency parsing of Japanese monologue sentences.
This paper describes a
parser which generates parse trees with empty
elements in which traces and fillers are co-indexed. The parser is an unlexicalized PCFG parser which is guaranteed to return the most probable parse. The grammar is extracted from a version of the
Session Chair: Stanley Peters
Matthew Frampton and Oliver Lemon
We explore the use of restricted dialogue contexts in reinforcement learning (RL) of effective dialogue strategies for information seeking spoken dialogue systems (e.g. COMMUNICATOR (Walker et al., 2001)). The contexts we use are richer than previous research in this area, e.g. (Levin and Pieraccini, 1997; Schefer and Young, 2001; Singh et al., 2002; Pietquin, 2004), which use only slot-based information, but are much less complex than the full dialogue Information States explored in (Henderson et al., 2005), for which tractable learning is an issue. We explore how incrementally adding richer features allows learning of more effective dialogue strategies. We use 2 user simulations learned from COMMUNICATOR data (Walker et al., 2001; Georgila et al., 2005b) to explore the effects of different features on learned dialogue strategies. Our results show that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 65:9% over the original (hand-coded) COMMUNICATOR systems, and by 7:8% over a baseline RL policy that uses only slot-status features. We show that the learned strategies exhibit an emergent focus switching strategy and effective use of the `give help' action.
Mihai Rotaru and Diane J. Litman
Speech recognition problems are a reality in current spoken dialogue systems. In order to better understand these phenomena, we study dependencies between speech recognition problems and several higher level dialogue factors that define our notion of student state: frustration/anger, certainty and correctness. We apply Chi Square (?2) analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies both within and across turns. Significant dependencies are combined to produce interesting insights regarding speech recognition problems and to propose new strategies for handling these problems. We also find that tutoring, as a new domain for speech applications, exhibits interesting tradeoffs and new factors to consider for spoken dialogue design.
Srinivas Bangalore, Giuseppe Di Fabbrizio and Amanda Stent
Data-driven techniques have been used for many computational linguistics tasks. Models derived from data are generally more robust than hand-crafted systems since they better reflect the distribution of the phenomena being modeled. With the availability of large corpora of spoken dialog, dialog management is now reaping the benefits of data-driven techniques. In this paper, we compare two approaches to modeling subtask structure in dialog: a chunk-based model of subdialog sequences, and a parse-based, or hierarchical, model. We evaluate these models using customer agent dialogs from a catalog service domain.
Session Chair: Hal Daumé
Feng Jiao, Shaojun Wang, Chi-Hoon Lee, Russell Greiner and Dale Schuurmans
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
Jun Suzuki, Erik McDermott and Hideki Isozaki
This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e. text chunking and named entity recognition, we introduce a loss function which closely reflects the target evaluation measure for these tasks, namely, segmentation F-score. Our experiments show that our method performs better than standard CRF training.
Jianfeng Gao, Hisami Suzuki and Bin Yu
Lasso is a regularization method for parameter estimation in linear models. It optimizes the model parameters with respect to a loss function subject to model complexities. This paper explores the use of lasso for statistical language modeling for text input. Owing to the very large number of parameters, directly optimizing the penalized lasso loss function is impossible. Therefore, we investigate two approximation methods, the boosted lasso (BLasso) and the forward stagewise linear regression (FSLR). Both methods, when used with the exponential loss function, bear strong resemblance to the boosting algorithm which has been used as a discriminative training method for language modeling. Evaluations on the task of Japanese text input show that BLasso is able to produce the best approximation to the lasso solution, and leads to a significant improvement, in terms of character error rate, over boosting and the traditional maximum likelihood estimation.
Session Chair: John Prager
Tsunenori Ishioka and Masayuki Kameda
We have developed an automated Japanese essay scoring system called Jess. The system needs expert writings rather than expert raters to build the evaluation model. By detecting statistical outliers of predetermined aimed essay features compared with many professional writings for each prompt, our system can evaluate essays. The following three features are examined: (1) rhetoric – syntactic variety, or the use of various structures in the arrangement of phases, clauses, and sentences, (2) organization – characteristics associated with the orderly presentation of ideas, such as rhetorical features and linguistic cues, and (3) content – vocabulary related to the topic, such as relevant information and precise or specialized vocabulary. The final evaluation score is calculated by deducting from a perfect score assigned by a learning process using editorials and columns from the Mainichi Daily News newspaper. A diagnosis for the essay is also given.
Ryo Nagata, Atsuo Kawai, Koichiro Morihiro and Naoki Isu
This paper proposes a method for detecting errors in article usage and singular plural usage based on the mass count distinction. First, it learns decision lists from training data generated automatically to distinguish mass and count nouns. Then, in order to improve its performance, it is augmented by feedback that is obtained from the writing of learners. Finally, it detects errors by applying rules to the mass count distinction. Experiments show that it achieves a recall of 0.71 and a precision of 0.72 and outperforms other methods used for comparison when augmented by feedback.
Chris Brockett, William B. Dolan and Michael Gamon
This paper presents a
pilot study of the use of phrasal Statistical Machine Translation (
Session Chair: Joakim Nivre
Jens Nilsson, Joakim Nivre and Johan Hall
Transforming syntactic representations in order to improve parsing accuracy has been exploited successfully in statistical parsing systems using constituency-based representations. In this paper, we show that similar transformations can give substantial improvements also in data-driven dependency parsing. Experiments on the Prague Dependency Treebank show that systematic transformations of coordinate structures and verb groups result in a 10% error reduction for a deterministic data-driven dependency parser. Combining these transformations with previously proposed techniques for recovering non-projective dependencies leads to state-of-the-art accuracy for the given data set.
Session Chair: Stanley Peters
Ryuichiro Higashinaka, Rashmi Prasad and Marilyn A. Walker
Spoken language generation for dialogue systems requires a dictionary of mappings between semantic representations of concepts the system wants to express and realizations of those concepts. Dictionary creation is a costly process; it is currently done by hand for each dialogue domain. We propose a novel unsupervised method for learning such mappings from user reviews in the target domain, and test it on restaurant reviews. We test the hypothesis that user reviews that provide individual ratings for distinguished attributes of the domain entity make it possible to map review sentences to their semantic representation with high precision. Experimental analyses show that the mappings learned cover most of the domain ontology, and provide good linguistic variation. A subjective user evaluation shows that the consistency between the semantic representations and the learned realizations is high and that the naturalness of the realizations is higher than a hand-crafted baseline.
Measuring Language Divergence by Intra-Lexical Comparison
T. Mark Ellison and Simon Kirby
This paper presents a method for building genetic language taxonomies based on a new approach to comparing lexical forms. Instead of comparing forms cross-linguistically, a matrix of language-internal similarities between forms is calculated. These matrices are then compared to give distances between languages. We argue that this coheres better with current thinking in linguistics and psycholinguistics. An implementation of this approach, called PHILOLOGICON, is described, along with its application to Dyen et al.'s (1992) ninety-five wordlists from Indo-European languages.
Session Chair: John Prager
Olivier Ferret and Michael Zock
A good dictionary contains not only many entries and a lot of information concerning each one of them, but also adequate means to reveal the stored information. Information access depends crucially on the quality of the index. We will present here some ideas of how a dictionary could be enhanced to support a speaker/writer to find the word s/he is looking for. To this end we suggest to add to an existing electronic resource an index based on the notion of association. We will also present preliminary work of how a subset of such associations, for example, topical associations, can be acquired by filtering a network of lexical co-occurrences extracted from a corpus.
Session Chair: Dan Klein
Kilian A. Foth, Tomas By and Wolfgang Menzel
We investigate the utility of supertag information for guiding an existing dependency parser of German. Using weighted constraints to integrate the additionally available information, the decision process of the parser is influenced by changing its preferences, without excluding alternative structural interpretations from being considered. The paper reports on a series of experiments using varying models of supertags that significantly increase the parsing accuracy. In addition, an upper bound on the accuracy that can be achieved with perfect supertags is estimated.
Session Chair: Chu Ren Huang
Dmitry Davidov and Ari Rappoport
a novel approach for discovering word categories, sets of words sharing a
significant aspect of their meaning. We utilize meta-patterns of high-frequency
words and content words in order to discover pattern candidates. Symmetric
patterns are then identified using graph-based measures, and word categories
are created based on graph clique sets. Our method is the first pattern-based
method that requires no corpus annotation or manually provided seed patterns or
words. We evaluate our algorithm on very large corpora in two languages, using
both human judgments and WordNet-based evaluation. Our fully unsupervised
results are superior to previous work that used a
Session Chair: Simone Teufel
We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
Session Chair: Johan Bos
Peter D. Turney
We present an unsupervised learning algorithm that mines
large text corpora for patterns that express implicit semantic relations. For a given
input word pair X:Y with some unspecified semantic relations, the corresponding
output list of patterns <P1,...,Pm> is ranked according to how well each
pattern Pi expresses the relations between X and Y. For example, given
X=ostrich and Y=bird, the two highest-ranking output patterns are "X is
the largest Y" and "Y such as the X". The output patterns are
intended to be useful for finding further pairs with the same relations, to
support the construction of lexicons, ontologies, and semantic networks. The
patterns are sorted by pertinence, where the pertinence of a pattern Pi for a
word pair X:Y is the expected relational similarity between the given pair and
typical pairs for Pi. The algorithm is empirically evaluated on two tasks,
Session Chair: Owen Rambow
Kilian A. Foth and Wolfgang Menzel
In this paper we investigate the benefit of stochastic predictor components for the parsing quality which can be obtained with a rule-based dependency grammar. By including a chunker, a supertagger, a PP attacher, and a fast probabilistic parser we were able to improve upon the baseline by 3.2%, bringing the overall labelled accuracy to 91.1% on the German NEGRA corpus. We attribute the successful integration to the ability of the underlying grammar model to combine uncertain evidence in a soft manner, thus avoiding the problem of error propagation.
Benoît Sagot and Éric de La Clergerie
We introduce an error mining technique for automatically detecting errors in resources that are used in parsing systems. We applied this technique on parsing results produced on several million words by two distinct parsing systems, which share the syntactic lexicon and the pre-parsing processing chain. We were thus able to identify missing and erroneous information in these resources.
David McClosky, Eugene Charniak and Mark Johnson
Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard "Charniak parser" checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus.
This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in (McClosky et al. 2006) raise this to 87.8% (an error reduction of 28%) again without any use of labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.
Session Chair: Chu Ren Huang
Anna Korhonen, Yuval Krymolowski and Nigel Collier
Lexical classes, when
tailored to the application and domain in question, can provide an effective
means to deal with a number of natural language processing (NLP) tasks. While
manual construction of such classes is difficult, recent research shows that it
is possible to automatically induce verb classes from cross-domain corpora with
promising accuracy. We report a novel experiment where similar technology is
applied to the important, challenging domain of biomedicine. We show that the
resulting classification, acquired from a corpus of biomedical journal
articles, is highly accurate and strongly domain specific. It can be used to
Masato Hagiwara, Yasuhiro Ogawa and Katsuhiko Toyama
Various methods have been proposed for automatic synonym acquisition, as synonyms are one of the most fundamental lexical knowledge. Whereas many methods are based on contextual clues of words, little attention has been paid to what kind of categories of contextual information are useful for the purpose. This study has experimentally investigated the impact of contextual information selection, by extracting three kinds of word relationships from corpora: dependency, sentence co-occurrence, and proximity. The evaluation result shows that while dependency and proximity perform relatively well by themselves, combination of two or more kinds of contextual information gives more stable performance. We’ve further investigated useful selection of dependency relations and modification categories, and it is found that modification has the greatest contribution, even greater than the widely adopted subject object combination.
James Gorman and James R. Curran
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naive nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O (n2) in the vocabulary size).
In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance.
Session Chair: Simone Teufel
Wenjie Li, Mingli Wu, Qin Lu, Wei Xu and Chunfa Yuan
Event-based summarization attempts to select and organize sentences in a summary with respect to events or sub-events that the sentences describe. Each event has its own internal structure and meanwhile relates to the other events semantically, temporally, spatially, causally or conditionally. In this paper, we define an event as one or more event terms along with the named entities associated, and present a novel approach to derive intra- and inter- event relevance using the information of internal association, semantic related-ness, distributional similarity and named entity clustering. We then apply PageRank ranking algorithm to estimate the significance of an event for inclusion in a summary from the event relevance derived. Experiments on the DUC 2001 test data shows that the relevance of the named entities involved in events achieves better result when their relevance is derived from the event terms they associate. It also reveals that the topic-specific from documents themselves outperforms the semantic relevance from a general purpose knowledge base like Word-Net.
James Clarke and Mirella Lapata
Sentence compression is the task of producing a summary at the sentence level. This paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. We provide a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm and examine how these models port to different domains (written vs. spoken text). To achieve this, a human-authored compression corpus has been created and our study highlights potential problems with the automatically gathered compression corpora currently used. Finally, we assess whether automatic evaluation measures can be used to determine compression quality.
Danushka Bollegala, Naoaki Okazaki and Mitsuru Ishizuka
Ordering information is a difficult but important task for applications generating natural-language text. We present a bottom-up approach to arranging sentences extracted for multi-document summarization. To capture the association and order of two textual segments (eg, sentences), we define four criteria, chronology, topical-closeness, precedence, and succession. These criteria are integrated into a criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion until we obtain the overall segment with all sentences arranged. Our experimental results show a significant improvement over existing sentence ordering strategies.
Session Chair: Johan Bos
Feng Pan, Rutu Mulkar and Jerry R. Hobbs
We have constructed a corpus of news articles in which events are annotated for estimated bounds on their durations. Here we describe a method for measuring inter-annotator agreement for these event duration distributions. We then show that machine learning techniques applied to this data yield coarse-grained event duration information, considerably outperforming a baseline and approaching human performance.
Fabio Massimo Zanzotto and Alessandro Moschitti
In this paper we define a novel similarity measure between examples of textual entailments and we use it as a kernel function in Support Vector Machines (SVMs). This allows us to automatically learn the rewrite rules that describe a non trivial set of entailment cases. The experiments with the data sets of the RTE 2005 challenge show an improvement of 4.4% over the state-of-the-art methods.
Alexander Koller and Stefan Thater
We present an efficient algorithm for the redundancy elimination problem: Given an underspecified semantic representation (USR) of a scope ambiguity, compute an USR with fewer mutually equivalent readings. The algorithm operates on underspecified chart representations which are derived from dominance graphs; it can be applied to the USRs computed by large-scale grammars. We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime.
Session Chair: Takashi Ninomiya
Amit Dubey, Frank Keller and Patrick Sturt
The psycholinguistic literature provides evidence for syntactic priming, i.e., the tendency to repeat structures. This paper describes a method for incorporating priming into an incremental probabilistic parser. Three models are compared, which involve priming of rules between sentences, within sentences, and within coordinate structures. These models simulate the reading time advantage for parallel structures found in human data, and also yield a small increase in overall parsing accuracy.
Mengqiu Wang, Kenji Sagae and Teruko Mitamura
We present a novel classifier-based deterministic parser
for Chinese constituency parsing. Our parser computes parse trees from bottom
up in one pass, and uses classifiers to make shift-reduce decisions. Trained
and evaluated on the standard training and test sets, our best model (using
stacked classifiers) runs in linear time and has labeled precision and recall
above 88% using gold-standard part-of-speech tags, surpassing the best
published results. Our
Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple Xbar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
Session Chair: Hwee Tou Ng
Oana Frunza and Diana Inkpen
Partial cognates are pairs of words in two languages that have the same meaning in some, but not all contexts. Detecting the actual meaning of a partial cognate in con-text can be useful for Machine Translation tools and for Computer-Assisted Language Learning tools. In this paper we propose a supervised and a semi-supervised method to disambiguate partial cognates between two languages: French and English. The methods use only automatically-labeled data; therefore they can be applied for other pairs of languages as well. We also show that our methods perform well when using corpora from different domains.
Ido Dagan, Oren Glickman, Alfio Gliozzo, Efrat Marmorshtein and Carlo Strapparava
This paper investigates conceptually and empirically the novel sense matching task, which requires to recognize whether the senses of two synonymous words match in context. We suggest direct approaches to the problem, which avoid the intermediate step of explicit word sense disambiguation, and demonstrate their appealing advantages and stimulating potential for future research.
Zhimao Lu, Haifeng Wang, Jianmin Yao, Ting Liu and Sheng Li
This paper presents a new approach based on Equivalent Pseudowords (EPs) to tackle Word Sense Disambiguation (WSD) in Chinese language. EPs are particular artificial ambiguous words, which can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the EP solution on Senseval-3 Chinese test set. The performance is better than state-of-the-art results with an average F-measure of 0.80. The experiment verifies the value of EP for unsupervised WSD.
Session Chair: Ming Zhou
Daisuke Okanohara, Yusuke Miyao, Yoshimasa Tsuruoka and Jun'ichi Tsujii
paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks
with a tractable computational cost. Our framework can handle an
Radu Florian, Hongyan Jing, Nanda Kambhatla and Imed Zitouni
As natural language processing moves towards natural language understanding, the tasks are becoming more and more subtle: we are interested in more nuanced word characteristics, more linguistic properties, more semantic and syntactic features. One such example, which we consider in this article, is the mention detection in the ACE project (NIST, 2004), where the goal is to identify named, nominal or pronominal references to real-world entities – mentions – and label them with three types of information: entity type, entity subtype and mention type. In this article, we investigate several methods to assign these related tags and compare them on several data sets. A system based on the methods presented in this article ranked very well in the ACE’04 evaluation.
Zhenmei Gu and Nick Cercone
Hidden Markov models (HMMs) are powerful statistical models that have found successful applications in Information Extraction (IE). In current approaches to applying HMMs to IE, an HMM is used to model text at the document level. This modeling might cause undesired redundancy in extraction in the sense that more than one filler is identified and extracted. We propose to use HMMs to model text at the segment level, in which the extraction process consists of two steps: a segment retrieval step followed by an extraction step. In order to retrieve extraction relevant segments from documents, we introduce a method to use HMMs to model and retrieve segments. Our experimental results show that the resulting segment HMM IE system not only achieves near zero extraction redundancy, but also has better overall extraction performance than traditional document HMM IE systems.
Session Chair: Erhard Hinrichs
Lei Shi, Cheng Niu, Ming Zhou and Jianfeng Gao
This paper presents a
new web mining
scheme for parallel data acquisition. Based on the Document Object Model (
John Judge, Aoife Cahill and Josef van Genabith
This paper describes
the development of QuestionBank, a corpus of 4000 parse annotated questions for
(i) use in training parsers employed in QA, and (ii) evaluation of question
parsing. We present a series of experiments to investigate the effectiveness of
QuestionBank as both an exclusive and supplementary training resource for a
state-of-the-art parser in parsing both question and non-question test sets. We
introduce a new method for recovering empty nodes and their antecedents
(capturing long distance dependencies) from parser output in CFG trees using
We present an algorithm
which creates a German CCGbank by translating the syntax graphs in the German
Tiger corpus into
Session Chair: Kevin Knight
Robert C. Moore, Wen-tau Yih and Andreas Bode
For many years, statistical machine translation relied on generative models to provide bilingual word alignments. In 2005, several independent efforts showed that discriminative models could be used to enhance or replace the standard generative approach. Building on this work, we demonstrate substantial improvement in word-alignment accuracy, partly though improved training methods, but predominantly through selection of more and better features. Our best model produces the lowest alignment error rate yet reported on Canadian Hansard’s bilingual data.
Deyi Xiong, Qun Liu and Shouxun Lin
We propose a novel reordering model for phrase-based statistical machine translation
Yaser Al-Onaizan and Kishore Papineni
In this paper, we argue that n-gram language models are
not sufficient to address word reordering required for Machine Translation. We
propose a new distortion model that can be used with existing phrase-based
Session Chair: Janyce Wiebe
Anette Hulth and Beáta B. Megyesi
This paper presents a study on if and how automatically extracted keywords can be used to improve text categorization. In summary we show that a higher performance – as measured by micro-averaged F-measure on a standard text categorization collection – is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full-texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams have the best performance, although neither performs as well as headlines only
Jingyang Li, Maosong Sun and Xian Zhang
Words and character-bigrams are both used as features in Chinese text processing tasks, but no systematic comparison or analysis of their values as features for Chinese text categorization has been reported heretofore. We carry out here a full performance comparison between them by experiments on various document collections (including a manually word-segmented corpus as a golden standard), and a semi-quantitative analysis to elucidate the characteristics of their behavior; and try to provide some preliminary clue for feature term choice (in most cases, character-bigrams are better than words) and dimensionality setting in text categorization systems.
Alfio Gliozzo and Carlo Strapparava
Cross-language Text Categorization is the task to assign semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled documents in a source language (e.g. Italian).
In this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora.
Experiments show the effectiveness of our approach, providing a low cost solution for the Cross Language Text Categorization task. In particular, when bilingual dictionaries are available the performance of the categorization gets close to that of monolingual text categorization.
Session Chair: Anoop Sarkar
Qi Zhang, Fuliang Weng and Zhe Feng
Recent developments in statistical modeling of various
linguistic phenomena have shown that additional features give consistent
performance improvements. Quite often, improvements are limited by the number
of features a system is able to explore. This paper describes a novel
progressive training algorithm that selects features from virtually unlimited
feature spaces for conditional maximum entropy (
Noah A. Smith and Jason Eisner
We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward "broken" hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005), apply the latter to grammar induction in si languages, and show that our new approach improves accuracy by 1-17% (absolute) over CE (and 8-30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems.
Imed Zitouni, Jeffrey S. Sorensen and Ruhi Sarikaya
Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Script without diacritics have considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document. The approach can easily integrate and make effective use of diverse types of information; the model we propose integrates a wide array of lexical, segment based and part-of-speech tag features. The combination of these feature types leads to a state-of-the-art diacritization model. Using a publicly available corpus (LDC's Arabic Treebank Part 3), we achieve a diacritic error rate of 5:1%, a segment error rate 8:5%, and a word error rate of 17:3%. In case-ending-less setting, we obtain a diacritic error rate of 2:2%, a segment error rate 4:0%, and a word error rate of 7:2%.
Session Chair: Jian-Yun Nie
Yuanhua Lv, Le Sun, Junlin Zhang, Jian-Yun Nie, Wan Chen and Wei Zhang
General information retrieval systems are designed to serve all users without considering individual needs. In this paper, we propose a novel approach to personalized search. It can, in a unified way, exploit and utilize implicit feedback information, such as query logs and immediately viewed documents. Moreover, our approach can implement result re-ranking and query expansion simultaneously and collaboratively. Based on this approach, we develop a client-side personalized web search agent PAIR (Personalized Assistant for Information Retrieval), which supports both English and Chinese. Our experiments on TREC and HTRDP collections clearly show that the new approach is both effective and efficient.
Jiang Zhu and Haifeng Wang
This paper explores the relationship between the translation quality and the retrieval effectiveness in Machine Translation (MT) based Cross-Language Information Retrieval (CLIR). To obtain MT systems of different translation quality, we degrade a rule-based MT system by decreasing the size of the rule base and the size of the dictionary. We use the degraded MT systems to translate queries and submit the translated queries of varying quality to the IR system. Retrieval effectiveness is found to correlate highly with the translation quality of the queries. We further analyze the factors that affect the retrieval effectiveness. Title queries are found to be preferred in MT-based CLIR. In addition, dictionary-based degradation is shown to have stronger impact than rule-based degradation in MT-based CLIR.
The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in question-answering system. Despite this trend, systems continue to model language at a document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average ISF and ITF values are 5.5 and 10.4 higher than IDF. All language models appeared to follow a power law distribution with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus.
Yang Liu, Qun Liu and Shouxun Lin
We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.
Katsuhito Sudoh, Hajime Tsukada and Hideki Isozaki
This paper proposes a named entity recognition (
Ryu Iida, Kentaro Inui and Yuji Matsumoto
We approach the zero-anaphora resolution problem by decomposing it into intra-sentential and inter-sentential zero-anaphora resolution. For the former problem, syntactic patterns of the appearance of zero-pronouns and their antecedents are useful clues. Taking Japanese as a target language, we empirically demonstrate that incorporating rich syntactic pattern features in a state-of-the-art learning-based anaphora resolution model dramatically improves the accuracy of intra-sentential zero-anaphora, which consequently improves the overall performance of zero-anaphora resolution.
Seong-Bae Park, Yoon-Shik Tae and Se-Young Park
Automatic word spacing is one of the important tasks in Korean language processing and information retrieval. Since there are a number of confusing cases in word spacing of Korean, there are some mistakes in many texts including news articles. This paper presents a high-accurate method for automatic word spacing based on self-organizing n-gram model. This method is basically a variant of n-gram model, but achieves high accuracy by automatically adapting context size.
In order to find the optimal context size, the proposed method automatically increases the context size when the contextual distribution after increasing it does not agree with that of the current context. It also decreases the context size when the distribution of reduced context is similar to that of the current context. This approach achieves high accuracy by considering higher dimensional data in case of necessity, and the increased computational cost are compensated by the reduced context size. The experimental results show that the self-organizing structure of n-gram model enhances the basic model.
Session Chair: Michael White
Qing Li, Sung-Hyon Myaeng, Yun Jin and Bo-yeong Kang
Due to the historical and cultural reasons, English phases, especially the proper nouns and new words, frequently appear in Web pages written primarily in Asian languages such as Korean and Chinese. Although these English terms and their equivalences in the Asian language refer to the same concept, they are erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and suggests a novel technique to solve it. Our method firstly extracts the English phrase from Asian language Web pages, and then unifies the extracted phrase and its equivalence(s) in the language as one index unit. Experimental results show that the high precision of our conceptual unification approach greatly improves the IR performance.
Niladri Chatterjee and Saumya Agrawal
Word alignment using recency-vector based approach has recently become popular. One major advantage of these techniques is that unlike other approaches they perform well even if the size of the parallel corpora is small. This makes these algorithms worth-studying for languages where resources are scarce. In this work we studied the performance of two very popular recency-vector based approaches, proposed in (Fung and McKeown, 1994) and (Somers, 1998), respectively, for word alignment in English-Hindi parallel corpus. But performance of the above algorithms was not found to be satisfactory. However, subsequent addition of some new constraints improved the performance of the recency-vector based alignment technique significantly for the said corpus. The present paper discusses the new version of the algorithm and its performance in detail.
Badam-Osor Khaltar, Atsushi Fujii and Tetsuya Ishikawa
This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese–Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract words in Mongolian corpora that are phonetically similar to Japanese Katakana words as loanwords. In addition, we correspond the extracted loanwords to Japanese words and produce a bilingual dictionary. We propose a stemming method for Mongolian to extract loanwords correctly. We verify the effectiveness of our methods experimentally.
Session Chair: Yuji Matsumoto
Meni Adler and Michael Elhadad
Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model - the only resource we use is a morphological analyzer - which deals with the data sparseness problem caused by the affixational morphology of the Hebrew language.
We present a text encoding method for languages with affixational morphology in which the knowledge of word formation rules (which are quite restricted in Hebrew) helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation, in such a way that segmentation and tagging can be learned in parallel in one step. Results on a large-scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affix morphology.
Sharon Goldwater, Thomas L. Griffiths and Mark Johnson
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
Nizar Habash and Owen Rambow
We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD performs an on-line analysis to or generation from a root+pattern+features representation, it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects. We present a detailed evaluation of MAGEAD.
Session Chair: Jan Hajič
Yoav Goldberg, Meni Adler and Michael Elhadad
We present a method for Noun Phrase chunking in Hebrew. We
show that the traditional definition of base-NPs as non-recursive noun phrases
does not apply in Hebrew, and propose an alternative definition of Simple
NPs. We review syntactic properties
of Hebrew related to noun phrases, which indicate that the task of Hebrew
SimpleNP chunking is harder than base-NP chunking in English. As a
confirmation, we apply methods known to work well for English to Hebrew data.
These methods give low results (F from 76 to 86) in Hebrew. We then discuss our
method, which applies
James R. Curran, Stephen Clark and David Vadas
With performance above
97% accuracy for newspaper text, part of speech (
We describe a
multi-tagging approach which maintains a suitable level of lexical category
ambiguity for accurate and efficient
Tetsuji Nakagawa and Yuji Matsumoto
In this paper, we
present a method for guessing
Session Chair: Marine Carpuat
Both rhetorical structure and punctuation have been helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of 6 Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against patterns around cue phrases in general. Results show that these Chinese punctuation marks, though fewer in number than cue phrases, are easy to identify, have strong correlation with certain relations, and can be used as distinctive indicators of nuclearity in Chinese texts.
Current parsing models are not immediately applicable for languages that exhibit strong interaction between morphology and syntax, e.g., Modern Hebrew (MH), Arabic and other Semitic languages. This work represents a first attempt at modeling morphological-syntactic interaction in a generative probabilistic framework to allow for MH parsing. We show that morphological information selected in tandem with syntactic categories is instrumental for parsing Semitic languages. We further show that redundant morphological information helps syntactic disambiguation.
We present a novel hybrid approach for Word Sense Disambiguation (WSD) which makes use of a relational formalism to represent instances and background knowledge. It is built using Inductive Logic Programming techniques to combine evidence coming from both sources during the learning process, producing a rule-based WSD model. We experimented with this approach to disambiguate 7 highly ambiguous verbs in English- Portuguese translation. Results showed that the approach is promising, achieving an average accuracy of 75%, which outperforms the other machine learning techniques investigated (66%).
Session Chair: Alon Lavie
Masaaki Nagata, Kuniko Saito, Kazuhide Yamamoto and Kazuteru Ohashi
In this paper, we present a novel global reordering model that can be incorporated into standard phrase-based statistical machine translation. Unlike previous local reordering models that emphasize the reordering of adjacent phrase pairs [Tillmann-Zhang05], our model explicitly models the reordering of long distances by directly estimating the parameters from the phrase alignments of bilingual training sentences. In principle, the global phrase-reordering model is conditioned on the source and target phrases that are currently being translated, and the previously translated source and target phrases. To cope with sparseness, we use N-best phrase alignments and bilingual phrase clustering, and investigate a variety of combinations of conditioning factors. Through experiments, we show, that the global reordering model significantly improves the translation accuracy of a standard Japanese-English translation task.
Christoph Tillmann and Tong Zhang
paper presents a novel training algorithm for a linearly-scored block sequence
translation model. The key component is a new procedure to directly optimize
the global scoring function used by a
Session Chair: Roland Kuhn
Shinsuke Mori, Daisuke Takuma and Gakuto Kurata
The noisy channel model approach is successfully applied to various natural language processing tasks. Currently the main research focus of this approach is adaptation methods, how to capture characteristics of words and expressions in a target domain given example sentences in that domain. As a solution we describe a method enlarging the vocabulary of a language model to an almost infinite size and capturing their context information. Especially the new method is suitable for languages in which words are not delimited by whitespace. We applied our method to a phoneme-to-text transcription task in Japanese and reduced about 10% of the errors in the results of an existing method.
Shourya Roy and L Venkata Subramaniam
Call centers handle customer queries from various domains such as computer sales and support, mobile phones, car rental, etc. Each such domain generally has a domain model which is essential to handle customer complaints. These models contain common problem categories, typical customer issues and their solutions, greeting styles. Currently these models are manually created over time. Towards this, we propose an unsupervised technique to generate domain models automatically from call transcriptions. We use a state of the art Automatic Speech Recognition system to transcribe the calls between agents and customers, which still results in high word error rates (40%) and show that even from these noisy transcriptions of calls we can automatically build a domain model. The domain model is comprised of primarily a topic taxonomy where every node is characterized by topic(s), typical Questions-Answers (Q&As), typical actions and call statistics. We show how such a domain model can be used for topic identification of unseen calls. We also propose applications for aiding agents while handling calls and for agent monitoring based on the domain model.
Session Chair: Daniel Marcu
John D. Kelleher, Geert-Jan M. Kruijff and Fintan J. Costello
The paper presents a new model for context-dependent interpretation of linguistic expressions about spatial proximity between objects in a natural scene. The paper discusses novel psycholinguistic experimental data that tests and verifies the model. The model has been implemented, and enables a conversational robot to identify objects in a scene through topological spatial relations (e.g. “X near Y''). The model can help motivate the choice between topological and projective prepositions.
Inderjeet Mani, Marc Verhagen, Ben Wellner, Chong Min Lee and James Pustejovsky
This paper investigates a machine learning approach for temporally ordering and anchoring events in natural language texts. To address data sparseness, we used temporal reasoning as an over-sampling method to dramatically expand the amount of training data, resulting in predictive accuracy on link labeling as high as 93% using a Maximum Entropy classifier on human annotated data. This method compared favorably against a series of increasingly sophisticated baselines involving expansion of rules derived from human intuitions.
Session Chair: Kevin Duh
An open-domain spoken dialog system has to deal with the challenge of lacking lexical as well as conceptual knowledge. As the real world is constantly changing, it is not possible to store all necessary knowledge beforehand. Therefore, this knowledge has to be acquired during the run time of the system, with the help of the out-of-vocabulary information of a speech recognizer. As every word can have various meanings depending on the context in which it is uttered, additional context information is taken into account, when searching for the meaning of such a word. In this paper, I will present the incremental ontology learning framework On2L. The defined tasks for the framework are: the hypernym extraction from Internet texts for unknown terms delivered by the speech recognizer; the mapping of those and their hypernyms into ontological concepts and instances; and the following integration of them into the system’s ontology.
We analyze the concept of focus in speech and the relationship between focus and speech acts for prosodic generation. We determine how the speaker’s utterances are influenced by speaker’s intention. The relationship between speech acts and focus information is used to define which parts of the sentence serve as the focus parts. We propose the Focus to Emphasize Tones (FET) structure to analyze the focus components. We also design the FET grammar to analyze the intonation patterns and produce tone marks as a result of our analysis. We present a proof-of-the-concept working example to validate our proposal. More comprehensive evaluations are part of our current work.
Session Chair: Alon Lavie
Percy Liang, Alexandre Bouchard-Côté, Dan Klein and Ben Taskar
We present a perceptron-style discriminative approach to machine translation in which large feature sets can be exploited. Unlike discriminative reranking approaches, our system can take advantage of learned features in all stages of decoding. We first discuss several challenges to error-driven discriminative approaches. In particular, we explore different ways of updating parameters given a training example. We find that making frequent but smaller updates is preferable to making fewer but larger updates. Then, we discuss an array of features and show both how they quantitatively increase BLEU score and how they qualitatively interact on specific examples. One particular feature we investigate is a novel way to introduce learning into the initial phrase extraction process, which has previously been entirely heuristic.
Alexander Fraser and Daniel Marcu
We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.
Taro Watanabe, Hajime Tsukada and Hideki Isozaki
We present a hierarchical phrase-based statistical machine translation in which a target sentence is efficiently generated in left-to-right order. The model is a class of synchronous-CFG with a Greibach Normal Form-like structure for the projected production rule: The paired target-side of a production rule takes a phrase-prefixed form. The decoder for the target-normalized form is based on an Early-style top down parser on the source side. The target-normalized form coupled with our top down parser implies a left-to-right generation of translations which enables us a straightforward integration with ngram language models. Our model was experimented on a Japanese-to-English newswire translation task, and showed statistically significant performance improvements against a phrase-based translation system.
Session Chair: Nicoletta Calzolari
Joachim Wermter and Udo Hahn
In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of co-occurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.
Marco Pennacchiotti and Patrick Pantel
Many algorithms have been developed to harvest lexical semantic resources, however few have linked the mined knowledge into formal knowledge repositories. In this paper, we propose two algorithms for automatically ontologizing (attaching) semantic relations into WordNet. We present an empirical evaluation on the task of attaching partof and causation relations, showing an improvement on F-score over a baseline model.
Rion Snow, Daniel Jurafsky and Andrew Y. Ng
We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on independent classifiers for discovering new single relationships based on hand-constructed or automatically discovered textual patterns. By contrast, our algorithm flexibly incorporates evidence from multiple classifiers over heterogenous relationships to optimize the entire structure of the taxonomy, using knowledge of a word’s coordinate terms to help in determining its hypernyms, and vice versa. We apply our algorithm on the problem of sense-disambiguated noun hyponym acquisition, where we combine the predictions of hypernym and coordinate term classifiers with the knowledge in a preexisting semantic taxonomy (WordNet 2.1). We add 10; 000 novel synsets to WordNet 2.1 at 84% precision, a relative error reduction of 70% over a non-joint algorithm using the same component classifiers. Finally, we show that a taxonomy built using our algorithm shows a 23% relative F-score improvement over WordNet 2.1 on an independent testset of hypernym pairs.
Session Chair: Yorick Wilks
Marius Paşca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits and Alpa Jain
In a new approach to large-scale extraction of facts from unstructured text, distributional similarities become an integral part of both the iterative acquisition of high-coverage contextual extraction patterns, and the validation and ranking of candidate facts. The evaluation measures the quality and coverage of facts extracted from one hundred million Web documents, starting from ten seed facts and using no additional knowledge, lexicons or complex tools.
Alexandre Klementiev and Dan Roth
Min Zhang, Jie Zhang, Jian Su and Guodong Zhou
This paper proposes a novel composite kernel for relation extraction. The composite kernel consists of two individual kernels: an entity kernel that allows for entity-related features and a convolution parse tree kernel that models syntactic information of relation examples. The motivation of our method is to fully utilize the nice properties of kernel methods to explore diverse knowledge for relation extraction. Our study illustrates that the composite kernel can effectively capture both flat and structured features without the need for extensive feature engineering, and can also easily scale to include more features. Evaluation on the ACE corpus shows that our method outperforms the previous best-reported methods and significantly outperforms previous two dependency tree kernels for relation extraction.
Session Chair: Stephen Wan
We present the implementation of a system which extracts
not only lexicalized grammars but also feature-based lexicalized grammars from
Korean Sejong Treebank. We report on some practical experiments where we
In this paper, we compare the performance of a state-of-the-art statistical parser (Bikel, 2004) in parsing written and spoken language and in generating subcategorization cues from written and spoken language. Although Bikel’s parser achieves a higher accuracy for parsing written language, it achieves a higher accuracy when extracting subcategorization cues from spoken language. Additionally, we explore the utility of punctuation in helping parsing and extraction of subcategorization cues. Our experiments show that punctuation is of little help in parsing spoken language and extracting subcategorization cues from spoken language. This indicates that there is no need to add punctuation in transcribing spoken corpora simply in order to help parsers.
We introduce a new multi-threaded parsing algorithm on unification grammars designed specifically for multimodal interaction and noisy environments. By lifting some traditional constraints, namely those related to the ordering of constituents, we overcome several difficulties of other systems in this domain. We also present several criteria used in this model to constrain the search process using dynamically loadable scoring functions. Some early analyses of our implementation are discussed.
Session Chair: Srinivas Bangalore
Takeshi Abekawa and Manabu Okumura
In this paper, we present a method that improves Japanese dependency parsing by using large-scale statistical information. It takes into account two kinds of information not considered in previous statistical (machine learning based) parsing methods: information about dependency relations among the case elements of a verb, and information about co-occurrence relations between a verb and its case element. This information can be collected from the results of automatic dependency parsing of large-scale corpora. The results of an experiment in which our method was used to rerank the results obtained using an existing machine learning based parsing method showed that our method can improve the accuracy of the results obtained using the existing method.
Session Chair: Dan Moldovan
Dina Demner-Fushman and Jimmy Lin
This paper presents a hybrid approach to question answering in the clinical domain that combines techniques from summarization and information retrieval. We tackle a frequently-occurring class of questions that takes the form “What is the best drug treatment for X?” Starting from an initial set of MEDLINE citations, our system first identifies the drugs under study. Abstracts are then clustered using semantic classes from the UMLS ontology. Finally, a short extractive summary is generated for each abstract to populate the clusters. Two evaluations—a manual one focused on short answers and an automatic one focused on the supporting abstracts—demonstrate that our system compares favorably to PubMed, the search system most widely used by physicians today.
Session Chair: Alexander Koller
Fabio Massimo Zanzotto, Marco Pennacchiotti and Maria Teresa Pazienza
In this paper we investigate a novel method to detect asymmetric entailment relations between verbs. Our starting point is the idea that some point-wise verb selectional preferences carry relevant semantic information. Experiments using WordNet as a gold standard show promising results. Where applicable, our method, used in combination with other approaches, significantly increases the performance of entailment detection. A combined approach including our model improves the AROC of 5% with respect to standard models.
Session Chair: Eva Hajičová
Harry Halpin and Johanna D. Moore
In this paper we present how the automatic extraction of events from text can be used to both classify narrative texts according to plot quality and produce advice in an interactive learning environment intended to help students with story writing. We focus on the story-rewriting task, in which an exemplar story is read to the students and the students rewrite the story in their own words. The system automatically extracts events from the raw text, formalized as a sequence of temporally ordered predicate-arguments. These events are given to a machine-learner that produces a coarse-grained rating of the story. The results of the machine-learner and the extracted events are then used to generate fine-grained advice for the students.
Session Chair: Srinivas Bangalore
We investigate generalizations of the all-subtrees "DOP"
approach to unsupervised parsing. Unsupervised DOP models assign all possible
binary trees to a set of sentences and
next use (a large random subset of) all subtrees from these binary trees to
compute the most probable parse trees. We will test both a relative frequency
estimator for unsupervised DOP and a maximum likelihood estimator which is
known to be statistically consistent. We report state-of-the-art results on
English (WSJ), German (NEGRA) and Chinese (
Joseph Turian and I. Dan Melamed
The present work advances the accuracy and training speed of discriminative parsing. Our discriminative parsing method has no generative component, yet surpasses a generative baseline on constituent parsing, and does so with minimal linguistic cleverness. Our model can incorporate arbitrary features of the input and parse state, and performs feature selection incrementally over an exponential feature space during training. We demonstrate the flexibility of our approach by testing it with several parsing strategies and various feature sets. Our implementation is freely available at: http://nlp.cs.nyu.edu/parser/.
Aria Haghighi and Dan Klein
We investigate prototype-driven learning for primarily unsupervised grammar induction. Prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type. This sparse prototype information is then propagated across a corpus using distributional similarity features, which augment an otherwise standard PCFG model. We show that distributional features are effective at distinguishing bracket labels, but not determining bracket locations. To improve the quality of the induced trees, we combine our PCFG induction with the CCM model of Klein and Manning (2002), which has complementary strengths: it identifies brackets but does not label them. Using only a handful of prototypes, we show substantial improvements over naive PCFG induction for English and Chinese grammar induction.
Session Chair: Dan Moldovan
Dan Shen and Dietrich Klakow
In this paper, we explore correlation of dependency
relation paths to rank candidate answers in answer extraction. Using the
correlation measure, we compare dependency relations of a candidate answer and
mapped question phrases in sentence with the corresponding relations in
question. Different from previous studies, we propose an approximate
phrase-mapping algorithm and incorporate the mapping score into the correlation
measure. The correlations are further incorporated into a Maximum Entropy-based
ranking model which estimates path weights from training. Experimental results
show that our method significantly outperforms state-of-the-art syntactic
relation-based methods by up to 20% in
Adrian Novischi and Dan Moldovan
This paper describes an algorithm for propagating verb arguments along lexical chains consisting of WordNet relations. The algorithm creates verb argument structures using VerbNet syntactic patterns. In order to increase the coverage, a larger set of verb senses were automatically associated with the existing patterns from VerbNet. The algorithm is used in an in-house Question Answering system for re-ranking the set of candidate answers. Tests on factoid questions from TREC 2004 indicate that the algorithm improved the system performance by 2.4%.
Sanda Harabagiu and Andrew Hickl
Work on the semantics of questions has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. In this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (Q/A) systems. In our experiments, we show that when textual entailment information is used to either filter or rank answers returned by a Q/A system, accuracy can be increased by as much as 20% overall.
Session Chair: Alexander Koller
Rohit J. Kate and Raymond J. Mooney
We present a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers. Our system learns these classifiers for every production in the formal language grammar. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. Our experiments on two real-world data sets show that this approach compares favorably to other existing systems and is particularly robust to noise.
Rashid M. Abdalla and Simone Teufel
We investigate the unsupervised detection of semi-fixed cue phrases such as “This paper proposes a novel approach …” from unseen text, on the basis of only a handful of seed cue phrases with the desired semantics. The problem, in contrast to bootstrapping approaches for Question Answering and Information Extraction, is that it is hard to find a constraining context for occurrences of semi-fixed cue phrases. Our method uses components of the cue phrase itself, rather than external context, to bootstrap. It successfully excludes phrases which are different from the target semantics, but which look superficially similar. The method achieves 88% accuracy, outperforming standard bootstrapping approaches.
Ana-Maria Giuglea and Alessandro Moschitti
This article describes a robust semantic parser that uses a broad knowledge base created by interconnecting three major resources: FrameNet, VerbNet and PropBank. The FrameNet corpus contains the examples annotated with semantic roles whereas the VerbNet lexicon provides the knowledge about the syntactic behavior of the verbs. We connect VerbNet and FrameNet by mapping the FrameNet frames to the VerbNet Intersective Levin classes. The PropBank corpus, which is tightly connected to the VerbNet lexicon, is used to increase the verb coverage and also to test the effectiveness of our approach. The results indicate that our model is an interesting step towards the design of more robust semantic parsers.
Session Chair: Eva Hajičová
Gilles Sérasset, Francis Brunet-Manquat and Elena Chiocchetti
This paper presents the particular use of “Jibiki” (Papillon’s web server development platform) for the LexALP1 project. LexALP’s goal is to harmonise the terminology on spatial planning and sustainable development used within the Alpine Convention2, so that the member states are able to cooperate and communicate efficiently in the four official languages (French, German, Italian and Slovene). To this purpose, LexALP uses the Jibiki platform to build a term bank for the contrastive analysis of the specialised terminology used in six different national legal systems and four different languages. In this paper we present how a generic platform like Jibiki can cope with a new kind of dictionary.
G. Craig Murray, Bonnie Dorr, Jimmy Lin, Jan Hajič and Pavel Pecina
Thesauri and ontologies provide important value in facilitating access to digital archives by representing underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most ontologies precludes fully-automated machine translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations when constructing domain-specific lexical resources. We evaluate the effectiveness of this process by producing a probabilistic phrase dictionary and translating a thesaurus of 56,000 concepts used to catalogue a large archive of oral histories. Our experiments demonstrate a cost-effective technique for accurate machine translation of large ontologies.
Violeta Seretan and Eric Wehrli
This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing for detecting accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over using a traditional window method (which ignores the syntactic information) is first theoretically motivated, then empirically validated by a comparative evaluation experiment.
Session Chair: Dekai Wu
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang and Ignacio Thayer
Statistical MT has made great progress in the last few years, but current translation models are weak on re-ordering and target language fluency. Syntactic approaches seek to remedy these problems. In this paper, we take the framework for acquiring multi-level syntactic translation rules of (Galley et al., 2004) from aligned tree-string pairs, and present two main extensions of their approach: first, instead of merely computing a single derivation that minimally explains a sentence pair, we construct a large number of derivations that include contextually richer rules, and account for multiple interpretations of unaligned words. Second, we propose probability estimates and a training procedure for weighting these rules. We contrast different approaches on real examples, show that our estimates based on multiple derivations favor phrasal re-orderings that are linguistically better motivated, and establish that our larger rules provide a 3.63 BLEU point increase over minimal rules.
David Talbot and Miles Osborne
Certain distinctions made in the lexicon of one language may be redundant when translating into another language. We quantify redundancy among source types by the similarity of their distributions over target types. We propose a language-independent framework for minimising lexical redundancy that can be optimised directly from parallel text. Optimisation of the source lexicon for a given target language is viewed as model selection over a set of cluster-based translation models.
Redundant distinctions between types may exhibit
monolingual regularities, for example, inflexion patterns. We define a prior
over model structure using a Markov random field and learn features over sets
of monolingual types that are predictive of bilingual redundancy. The prior
makes model selection more robust without the need for language-specific
assumptions regarding redundancy. Using these models in a phrase-based
Benjamin Wellington, Sonjia Waxmonsky and I. Dan Melamed
This paper describes a
study of the patterns of translational equivalence exhibited by a variety of
bitexts. The study found that the complexity of these patterns in every bitext
was higher than suggested in the literature. These findings shed new light on
why “syntactic” constraints have not helped to improve statistical translation
models, including finite state phrase-based models, tree-to-string models, and
tree-to-tree models. The paper also presents evidence that inversion
transduction grammars cannot generate some translational equivalence relations,
even in relatively simple real bitexts in syntactically similar languages with
rigid word order. Instructions for replicating our experiments are at
Session Chair: Jianfeng Gao
Yee Whye Teh
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
Yunqing Xia, Kam-Fai Wong and Wenjie Li
Chatting is a popular communication media on the Internet via ICQ, chat rooms, etc. Chat language is different from natural language due to its anomalous and dynamic natures, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes static chat language corpus outdated quickly in representing contemporary chat language. To address the dynamic problem, we propose the phonetic mapping models to present mappings between chat terms and standard words via phonetic transcription, i.e. Chinese Pinyin in our case. Different from character mappings, the phonetic mappings can be constructed from available standard Chinese corpus. To perform the task of dynamic chat language term normalization, we extend the source channel model by incorporating the phonetic mapping models. Experimental results show that this method is effective and stable in normalizing dynamic chat language terms.
Jianfeng Li, Haifeng Wang, Dengjun Ren and Guohua Li
This paper presents a discriminative pruning method of n-gram language model for Chinese word segmentation. To reduce the size of the language model that is used in a Chinese word segmentation system, importance of each bigram is computed in terms of discriminative pruning criterion that is related to the performance loss caused by pruning the bigram. Then we propose a step-by-step growing algorithm to build the language model of desired size. Experimental results show that the discriminative pruning method leads to a much smaller model compared with the model pruned using the state-of-the-art method. At the same Chinese word segmentation F-measure, the number of bigrams in the model can be reduced by up to 90%. Correlation between language model perplexity and word segmentation performance is also discussed.
Session Chair: Rosie Jones
Hsin-Hsi Chen, Ming-Shun Lin and Yu-Chuan Wei
A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-Occurrence Double Check (CODC), are presented. In the experiments on Rubenstein-Goodenough’s benchmark data set, the CODC measure achieves correlation coefficient 0.8492, which competes with the performance (0.8914) of the model using WordNet. The experiments on link detection of named entities using the strategies of direct association, association matrix and scalar association matrix verify that the double-check frequencies are reliable. Further study on named entity clustering shows that the five measures are quite useful. In particular, CODC measure is very stable on word-word and name-name experiments. The application of CODC measure to expand community chains for personal name disambiguation achieves 9.65% and 14.22% increase compared to the system without community expansion. All the experiments illustrate that the novel model of web search with double-checking is feasible for mining associations from the web.
Yusuke Miyao, Tomoko Ohta, Katsuya Masuda, Yoshimasa Tsuruoka, Kazuhiro Yoshida, Takashi Ninomiya and Jun'ichi Tsujii
This paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts. Prior to retrieval, all sentences are annotated with predicate argument structures and ontological identifiers by applying a deep parser and a term recognizer. During the run time, user requests are converted into queries of region algebra on these annotations. Structural matching with pre-computed semantic annotations establishes the accurate and efficient retrieval of relational concepts. This framework was applied to a text retrieval system for MEDLINE. Experiments on the retrieval of biomedical correlations revealed that the cost is sufficiently small for real-time applications and that the retrieval precision is significantly improved.
Mu Li, Muhua Zhu, Yang Zhang and Ming Zhou
A query speller is crucial to search engine in improving web search relevance. This paper describes novel methods for use of distributional similarity estimated from query logs in learning improved query spelling correction models. The key to our methods is the property of distributional similarity between two terms: it is high between a frequently occurring misspelling and its correction, and low between two irrelevant terms only with similar spellings. We present two models that are able to take advantage of this property. Experimental results demonstrate that the distributional similarity based models can significantly outperform their baseline systems in the web query spelling correction task.
Session Chair: Donia Scott
Aoife Cahill and Josef van Genabith
We present a novel PCFG-based architecture for robust
probabilistic generation based on wide-coverage
John D. Kelleher and Geert-Jan M. Kruijff
This paper presents an approach to incrementally generating locative expressions. It addresses the issue of combinatorial explosion inherent in the construction of relational context models by: (a) contextually defining the set of objects in the context that may function as a landmark, and (b) sequencing the order in which spatial relations are considered using a cognitively motivated hierarchy of relations, and visual and discourse salience.
Hisami Suzuki and Kristina Toutanova
Japanese case markers, which indicate the grammatical relation of the complement NP to the predicate, often pose challenges to the generation of Japanese text, be it done by a foreign language learner, or by a machine translation (MT) system. In this paper, we describe the task of predicting Japanese case markers and propose machine learning methods for solving it in two settings: (i) monolingual, when given information only from the Japanese sentence; and (ii) bilingual, when also given information from a corresponding English source sentence in an MT context. We formulate the task after the well-studied task of English semantic role labelling, and explore features from a syntactic dependency structure of the sentence. For the monolingual task, we evaluated our models on the Kyoto Corpus and achieved over 84% accuracy in assigning correct case markers for each phrase. For the bilingual task, we achieved an accuracy of 92% per phrase using a bilingual dataset from a technical domain. We show that in both settings, features that exploit dependency information, whether derived from gold-standard annotations or automatically assigned, contribute significantly to the prediction of case markers.
Session Chair: Peter Turney
Wei-Hao Lin and Alexander Hauptmann
In this paper we investigate how to automatically determine if two document collections are written from different perspectives. By perspectives we mean a point of view, for example, from the perspective of Democrats or Republicans. We propose a test of different perspectives based on distribution divergence between the statistical models of two collections. Experimental results show that the test can successfully distinguish document collections of different perspectives from other types of collections.
Janyce Wiebe and Rada Mihalcea
Subjectivity and meaning are both important properties of language. This paper explores their interaction, and brings empirical evidence in support of the hypotheses that (1) subjectivity is a property that can be associated with word senses, and (2) word sense disambiguation can directly benefit from subjectivity annotations.
Session Chair: John Prange
John Prager, Pablo Duboue and Jennifer Chu-Carroll
This paper demonstrates a conceptually simple but effective method of increasing the accuracy of QA systems on factoid-style questions. We define the notion of an inverted question, and show that by requiring that the answers to the original and inverted questions be mutually consistent, incorrect answers get demoted in confidence and correct ones promoted. Additionally, we show that lack of validation can be used to assert no-answer (nil) conditions. We demonstrate increases of performance on TREC and other question-sets, and discuss the kinds of future activities that can be particularly beneficial to approaches such as ours.
Yi Chen, Ming Zhou and Shilong Wang
Statistical ranking methods based on centroid vector (profile) extracted from ex-ternal knowledge have become widely adopted in the top definitional QA systems in TREC 2003 and 2004. In these approaches, terms in the centroid vector are treated as a bag of words based on the independent assumption. To relax this assumption, this paper proposes a novel language model-based answer reranking method to improve the existing bag-of-words model approach by considering the dependence of the words in the centroid vector. Experiments have been conducted to evaluate the different dependence models. The results on the TREC 2003 test set show that the reranking approach with biterm language model, significantly outperforms the one with the bag-of-words model and unigram language model by 14.9% and 12.5% respectively in F-Measure(5).
Session Chair: Gerald Penn
Daniel Feinstein and Shuly Wintner
Unification grammars are widely accepted as an expressive means for describing the structure of natural languages. In general, the recognition problem is undecidable for unification grammars. Even with restricted variants of the formalism, offline parsable grammars, the problem is computationally hard. We present two natural constraints on unification grammars which limit their expressivity. We first show that non-reentrant unification grammars generate exactly the class of context-free languages. We then relax the constraint and show that one-reentrant unification grammars generate exactly the class of tree-adjoining languages. We thus relate the commonly used and linguistically motivated formalism of unification grammars to more restricted, computationally tractable classes of languages.
Kim Gerdes and Sylvain Kahane
This paper describes a minimal topology driven parsing algorithm for topological grammars that synchronizes a rewriting grammar and a dependency grammar, obtaining two linguistically motivated syntactic structures. The use of non-local slash and visitor features can be restricted to obtain a CKY type analysis in polynomial time. German long distance phenomena illustrate the algorithm, bringing to the fore the procedural needs of the analyses of syntax-topology mismatches in constraint based approaches like for example HPSG.
Session Chair: Donia Scott
Radu Soricut and Daniel Marcu
We propose WIDL-expressions as a flexible formalism that facilitates the integration of a generic sentence realization system within end-to-end language processing applications. WIDL-expressions represent compactly probability distributions over finite sets of candidate realizations, and have optimal algorithms for realization via interpolation with language model probability distributions. We show the effectiveness of a WIDL-based NLG system in two sentence realization tasks: automatic translation and headline generation.
Crystal Nakatsu and Michael White
This paper presents a method for adapting a language generator to the strengths and weaknesses of a synthetic voice, thereby improving the naturalness of synthetic speech in a spoken language dialogue system. The method trains a discriminative reranker to select paraphrases that are predicted to sound natural when synthesized. The ranker is trained on realizer and synthesizer features in supervised fashion, using human judgments of synthetic voice quality on a sample of the paraphrases representative of the generator’s capability. Results from a cross-validation study indicate that discriminative paraphrase reranking can achieve substantial improvements in naturalness on average, ameliorating the problem of highly variable synthesis quality typically encountered with today’s unit selection synthesizers.