All ‘A’ sessions are held in the Bayside Auditorium A; ‘B’ sessions are in Bayside 103; ‘C’ sessions are in Bayside 104; and ‘D’ sessions and the Student Research Workshop sessions are in Bayside 102.
Monday 17th July 9:30am–10:30am
1A: Machine Translation I
Session Chair: David Chiang
Combination of Arabic Preprocessing Schemes for Statistical Machine Translation
Fatiha Sadat and Nizar Habash
Statistical machine translation is quite robust when it comes to the choice of input representation: it only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes, resulting in improved translation quality.
Going Beyond
Necip Fazil Ayan and Bonnie J. Dorr
This paper presents an extensive evaluation of five different alignments and investigates their impact on the corresponding MT system output. We introduce new measures for intrinsic evaluations and examine the distribution of phrases and untranslated words during decoding to identify which characteristics of different alignments affect translation. We show that precision-oriented alignments yield better MT output (translating more words and using longer phrases) than recall-oriented alignments.
1B: Topic Segmentation
Session Chair: Martha Palmer
Unsupervised Topic Modelling for Multi-Party Spoken Discourse
Matthew Purver, Konrad P. Körding, Thomas L. Griffiths and Joshua B. Tenenbaum
We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003) while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors.
Minimum Cut Model for Spoken Lecture Segmentation
Igor Malioutov and
We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account long-range cohesion dependencies. Our results demonstrate that global analysis improves the segmentation accuracy and is robust in the presence of speech recognition errors.
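To make the normalized cut criterion concrete, here is a minimal sketch, assuming sentences are nodes of a graph with a symmetric similarity matrix W (how similarities are computed is not specified here). For a split into parts A and B, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). The helper names `normalized_cut` and `best_boundary` are hypothetical, and this sketch only searches for a single boundary by brute force; segmenting a lecture into many contiguous segments requires a search over multi-way partitions, which is not shown.

```python
def normalized_cut(W, boundary):
    """Ncut value of separating sentences [0, boundary) from [boundary, n)."""
    n = len(W)
    A, B = range(boundary), range(boundary, n)
    cut = sum(W[i][j] for i in A for j in B)          # weight crossing the split
    assoc_a = sum(W[i][j] for i in A for j in range(n))  # total weight from A
    assoc_b = sum(W[i][j] for i in B for j in range(n))  # total weight from B
    return cut / assoc_a + cut / assoc_b


def best_boundary(W):
    """Single topic boundary minimizing Ncut (exhaustive over split points)."""
    return min(range(1, len(W)), key=lambda k: normalized_cut(W, k))


# Two hypothetical topics of three sentences each: high similarity within a
# topic (1.0), low similarity across topics (0.1).
W = [[1.0 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
print(best_boundary(W))
```

Because Ncut normalizes the cut by each side's total association, it does not simply favour chopping off one tiny segment, which is one reason it suits global (rather than purely local) segmentation.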
1C: Coreference
Session Chair: Vincent Ng
Bootstrapping Path-Based Pronoun Resolution
Shane Bergsma and Dekang Lin
We present an approach to pronoun resolution based on syntactic paths. Through a simple bootstrapping procedure, we learn the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities. This path information enables us to handle previously challenging resolution instances, and also robustly addresses traditional syntactic coreference constraints. Highly coreferent paths also allow mining of precise probabilistic gender/number information. We combine statistical knowledge with well-known features in a Support Vector Machine pronoun resolution classifier. Significant gains in performance are observed on several datasets.
Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge
Xiaofeng Yang, Jian Su and Chew Lim Tan
Syntactic knowledge is important for pronoun resolution. Traditionally, the syntactic information for pronoun resolution is represented in terms of features that have to be selected and defined heuristically. In this paper, we propose a kernel-based method that can automatically mine the syntactic information from the parse trees for pronoun resolution. Specifically, we utilize the parse trees directly as a structured feature and apply kernel functions to this feature, as well as to other normal features, to learn the resolution classifier. In this way, our approach avoids the effort of decoding the parse trees into a set of flat syntactic features. The experimental results show that our approach can bring significant performance improvements and is reliably effective for the pronoun resolution task.
1D: Grammars I
Session Chair: Martin Kay
A Finite-State Model of Human Sentence Processing
It has previously been assumed in the psycholinguistic literature that finite-state models of language are crucially limited in their explanatory power by the locality of the probability distribution and the narrow scope of information used by the model. We show that a simple computational model (a bigram part-of-speech tagger based on the design used by Corley and Crocker (2000)) makes correct predictions on processing difficulty observed in a wide range of empirical sentence processing data. We use two modes of evaluation: one that relies on comparison with a control sentence, paralleling practice in human studies; another that measures probability drop in the disambiguating region of the sentence. Both are surprisingly good indicators of the processing difficulty of garden-path sentences. The sentences tested are drawn from published sources and systematically explore five different types of ambiguity; previous studies have been narrower in scope and smaller in scale. We do not deny the limitations of finite-state models, but argue that our results show that their usefulness has been underestimated.
Acceptability Prediction by Means of Grammaticality Quantification
Philippe Blache, Barbara Hemforth and Stéphane Rauzy
We propose in this paper a method for quantifying sentence grammaticality. The approach, based on Property Grammars (PG), a constraint-based syntactic formalism, makes it possible to evaluate a grammaticality index for any kind of sentence, including ill-formed ones. On a sample of sentences, we compare the grammaticality indices obtained from the PG formalism with the acceptability judgements measured by means of a psycholinguistic analysis. The results show that the derived grammaticality index is a fairly good tracer of acceptability scores.
Monday 17th July 11:00am–12:30pm
2A: Machine Translation II
Session Chair: David Chiang
Discriminative Word Alignment with Conditional Random Fields
Phil Blunsom and Trevor Cohn
In this paper we present a novel approach for inducing word alignments from sentence-aligned data. We use a Conditional Random Field (CRF), a discriminative model, which is estimated on a small supervised training set. The CRF is conditioned on both the source and target texts, and thus allows for the use of arbitrary and overlapping features over these data. Moreover, the CRF has efficient training and decoding processes which both find globally optimal solutions. We apply this alignment model to both French-English and Romanian-English language pairs. We show how a large number of highly predictive features can be easily incorporated into the CRF, and demonstrate that even with only a few hundred word-aligned training sentences, our model improves over the current state of the art, with alignment error rates (AER) of 5.29 and 25.8 for the two tasks, respectively.
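Alignment error rates such as 5.29 and 25.8 are conventionally reported as percentages. The sketch below uses the standard definition of AER over gold "sure" links S and "possible" links P (with S a subset of P): AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|) for a predicted alignment A. It also returns alignment precision (against P) and recall (against S), the two quantities that distinguish precision-oriented from recall-oriented alignments. The function name and the link representation as (source index, target index) pairs are hypothetical choices for illustration.

```python
def alignment_scores(predicted, sure, possible):
    """Precision, recall and AER for a predicted word alignment.

    Each argument is an iterable of (source_index, target_index) links.
    Sure links are added to the possible set so that S is a subset of P.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    precision = len(a & p) / len(a) if a else 0.0
    recall = len(a & s) / len(s) if s else 0.0
    if not a and not s:
        return precision, recall, 0.0  # nothing to align, nothing wrong
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, aer


# Hypothetical links: one correct sure link, one spurious link.
print(alignment_scores({(1, 1), (2, 3)}, {(1, 1), (2, 2)}, {(3, 3)}))
```

Multiplying the returned AER by 100 gives the percentage form used in the abstract.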
Named Entity Transliteration with Comparable Corpora
Richard Sproat, Tao Tao and ChengXiang Zhai
In this paper we investigate Chinese-English name transliteration using comparable corpora, i.e. corpora in which texts in the two languages deal with some of the same topics (and therefore share references to named entities) but are not translations of each other. We present two distinct methods for transliteration: one using phonetic transliteration, and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.
Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora
Dragos Stefan Munteanu and Daniel Marcu
We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
2B: Word Sense Disambiguation I
Session Chair: Martha Palmer
Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation
Yee Seng Chan and Hwee Tou Ng
Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well-calibrated probabilities when performing these estimations. By using well-calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.
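The abstract does not spell out the estimation procedure. One widely used option, assumed here rather than taken from the paper, is an EM loop that rescales a classifier's posteriors by the ratio of new to training-domain priors and then averages them over the unlabeled target-domain instances. It shows why calibration matters: the loop treats the classifier's outputs as true probabilities. The function name `estimate_priors` and the fixed iteration count are hypothetical.

```python
def estimate_priors(posteriors, train_priors, iterations=100):
    """EM re-estimation of sense priors on an unlabeled target domain.

    posteriors[i][k]: classifier estimate of P(sense k | instance i), from a
    model trained with priors train_priors[k]. Assumes these probabilities
    are well calibrated and that every training prior is non-zero.
    """
    priors = list(train_priors)
    n = len(posteriors)
    for _ in range(iterations):
        totals = [0.0] * len(priors)
        for post in posteriors:
            # E-step: adjust each posterior by the ratio of new to old priors.
            scaled = [p * q / t for p, q, t in zip(post, priors, train_priors)]
            z = sum(scaled)
            for k, v in enumerate(scaled):
                totals[k] += v / z
        # M-step: the new priors are the average adjusted posteriors.
        priors = [total / n for total in totals]
    return priors


# Hypothetical target-domain posteriors from a classifier trained with 50/50 priors.
print(estimate_priors([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]], [0.5, 0.5]))
```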
Ensemble Methods for Unsupervised WSD
Samuel Brody, Roberto Navigli and Mirella Lapata
Combination methods are an effective way of improving system performance. This paper examines the benefits of system combination for unsupervised WSD. We investigate several voting- and arbiter-based combination strategies over a diverse pool of unsupervised WSD systems. Our combination methods rely on predominant senses which are derived automatically from raw text. Experiments using the SemCor and Senseval-3 data sets demonstrate that our ensembles yield significantly better results than the state of the art.
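The simplest of the voting-based strategies mentioned above can be sketched as plain majority voting over the sense labels proposed by each system for one word instance. This is an illustrative assumption, not the paper's exact scheme: the tie-breaking rule (earliest system wins) and the function name are hypothetical, and arbiter-based combination is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the sense chosen by most systems; ties go to the earliest system."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in system order so that ties are broken deterministically.
    return next(sense for sense in predictions if counts[sense] == best)


# Hypothetical outputs of three unsupervised WSD systems for one instance.
print(majority_vote(["bank%finance", "bank%river", "bank%finance"]))
```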
Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance
Roberto Navigli
Fine-grained sense distinctions are one of the major obstacles to successful Word Sense Disambiguation. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense hierarchies, namely the Oxford Dictionary of English. We assess the quality of the mapping and the induced clustering, and evaluate the performance of coarse WSD systems in the Senseval-3 English all-words task.
2C: Information Extraction I
Session Chair: Vincent Ng
Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations
Patrick Pantel and Marco Pennacchiotti
In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state-of-the-art systems, on corpora of different sizes and genres, on extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with small effect on overall precision.
Modeling Commonality among Related Classes in Relation Extraction
Zhou GuoDong, Su Jian and Zhang Min
This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in relation extraction by modeling the commonality among related classes. For each class in the hierarchy, either predefined manually or automatically clustered, a linear discriminative function is determined in a top-down way using a perceptron algorithm, with the lower-level weight vector derived from the upper-level weight vector. As the upper-level class normally has many more positive training examples than the lower-level class, the corresponding linear discriminative function can be determined more reliably. The upper-level discriminative function can then effectively guide the learning of the discriminative function at the lower level, which otherwise might suffer from limited training data. Evaluation on the ACE
Relation Extraction Using Label Propagation Based Semi-supervised Learning
Jinxiu Chen, Donghong Ji, Chew Lim Tan and Zhengyu Niu
Shortage of manually labeled data is an obstacle to supervised relation extraction methods. In this paper we investigate a graph-based semi-supervised learning algorithm, a label propagation (LP) algorithm, for relation extraction. It represents labeled and unlabeled examples and their distances as the nodes and the weights of edges of a graph, and tries to obtain a labeling function that satisfies two constraints: 1) it should be fixed on the labeled nodes, and 2) it should be smooth on the whole graph. Experimental results on the ACE corpus showed that this LP algorithm achieves better performance than
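The two constraints in this abstract correspond to the classic iterative form of label propagation: labeled nodes are clamped to their labels, and every unlabeled node repeatedly takes the weighted average of its neighbours' label distributions. The sketch below assumes a precomputed symmetric weight matrix; the similarity function between relation instances, the fixed iteration count, and the function name are hypothetical choices, not details from the paper.

```python
def label_propagation(W, seeds, num_classes, iterations=1000):
    """Iterative label propagation over a weighted graph.

    W: symmetric non-negative weight matrix (list of lists).
    seeds: dict mapping labeled node index -> class index.
    Returns a predicted class index for every node.
    """
    n = len(W)
    F = []
    for i in range(n):
        if i in seeds:
            F.append([1.0 if c == seeds[i] else 0.0 for c in range(num_classes)])
        else:
            F.append([1.0 / num_classes] * num_classes)
    for _ in range(iterations):
        new_F = []
        for i in range(n):
            degree = sum(W[i])
            if i in seeds or degree == 0:
                # Constraint 1: labeled nodes stay fixed (isolated nodes are left as-is).
                new_F.append(F[i])
                continue
            # Constraint 2 (smoothness): weighted average of neighbours' distributions.
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / degree
                          for c in range(num_classes)])
        F = new_F
    return [max(range(num_classes), key=lambda c: row[c]) for row in F]


# Chain 0-1-2-3: strong edges 0-1 and 2-3, weak edge 1-2; nodes 0 and 3 are labeled.
W = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
print(label_propagation(W, {0: 0, 3: 1}, num_classes=2))
```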
2D: Grammars II
Session Chair: Martin Kay
Polarized Unification Grammars
Sylvain Kahane
This paper proposes a generic mathematical formalism for the combination of various structures: strings, trees, dags, graphs and products of them. The polarization of the objects of the elementary structures controls the saturation of the final structure. This formalism is both elementary and powerful enough to strongly simulate many grammar formalisms, such as rewriting systems, dependency grammars,
Partially Specified Signatures: a Vehicle for Grammar Modularity
Yael Cohen-Sygal and Shuly Wintner
This work provides the essential foundations for modular construction of (typed) unification grammars for natural languages. Much of the information in such grammars is encoded in the signature, and hence the key is facilitating a modularized development of type signatures. We introduce a definition of signature modules and show how two modules combine. Our definitions are motivated by the actual needs of grammar developers, obtained through a careful examination of large-scale grammars. We show that our definitions meet these needs by conforming to a detailed set of desiderata.
Morphology-Syntax Interface for Turkish
Özlem Çetinoğlu and Kemal Oflazer
This paper investigates the use of sublexical units as a solution to handling the complex morphology with productive derivational processes in the development of a lexical functional grammar for Turkish. Such sublexical units make it possible to expose the internal structure of words with multiple derivations to the grammar rules in a uniform manner. This in turn leads to more succinct and manageable rules. Further, the semantics of the derivations can also be systematically reflected in a compositional way by constructing
Monday 17th July 2:00pm–3:30pm
3A: Parsing I
Session Chair: Joakim Nivre
PCFGs with Syntactic and Prosodic Indicators of Speech Repairs
John Hale, Izhak Shafran, Lisa Yung, Bonnie Dorr, Mary Harper, Anna Krasnyanskaya, Matthew Lease, Yang Liu, Brian Roark, Matthew Snover and Robin Stewart
A grammatical method of combining two kinds of speech repair cues is presented. One cue, prosodic disjuncture, is detected by a decision tree-based ensemble classifier that uses acoustic cues to identify where normal prosody seems to be interrupted (Lickley, 1996). The other cue, syntactic parallelism, codifies the expectation that repairs continue a syntactic category that was left unfinished in the reparandum (Levelt, 1983). The two cues are combined in a Treebank PCFG whose states are split using a few simple tree transformations. Parsing performance on the Switchboard and Fisher corpora suggests that these two cues help to locate speech repairs in a synergistic way.
Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries
Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Takehiko Maruyama and Yasuyoshi Inagaki
Spoken monologues feature greater sentence length and structural complexity than do spoken dialogues. To achieve high parsing performance for spoken monologues, it could prove effective to simplify the structure by dividing a sentence into suitable language units. This paper proposes a method for dependency parsing of Japanese monologues based on sentence segmentation. In this method, the dependency parsing is executed in two stages: at the clause level and the sentence level. First, the dependencies within a clause are identified by dividing a sentence into clauses and executing stochastic dependency parsing for each clause. Next, the dependencies over clause boundaries are identified stochastically, and the dependency structure of the entire sentence is thus completed. An experiment using a spoken monologue corpus shows this method to be effective for efficient dependency parsing of Japanese monologue sentences.
Trace Prediction and Recovery With Unlexicalized PCFGs and Slash Features
Helmut Schmid
This paper describes a parser which generates parse trees with empty elements in which traces and fillers are co-indexed. The parser is an unlexicalized PCFG parser which is guaranteed to return the most probable parse. The grammar is extracted from a version of the
3B: Dialogue I
Session Chair: Stanley Peters
Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features
Matthew Frampton and Oliver Lemon
We explore the use of restricted dialogue contexts in reinforcement learning (RL) of effective dialogue strategies for information-seeking spoken dialogue systems (e.g. COMMUNICATOR (Walker et al., 2001)). The contexts we use are richer than those of previous research in this area, e.g. (Levin and Pieraccini, 1997; Scheffler and Young, 2001; Singh et al., 2002; Pietquin, 2004), which use only slot-based information, but are much less complex than the full dialogue Information States explored in (Henderson et al., 2005), for which tractable learning is an issue. We explore how incrementally adding richer features allows learning of more effective dialogue strategies. We use two user simulations learned from COMMUNICATOR data (Walker et al., 2001; Georgila et al., 2005b) to explore the effects of different features on learned dialogue strategies. Our results show that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 65.9% over the original (hand-coded) COMMUNICATOR systems, and by 7.8% over a baseline RL policy that uses only slot-status features. We show that the learned strategies exhibit an emergent focus-switching strategy and effective use of the 'give help' action.
Dependencies between Student State and Speech Recognition Problems in Spoken Tutoring Dialogues
Mihai Rotaru and Diane J. Litman
Speech recognition problems are a reality in current spoken dialogue systems. In order to better understand these phenomena, we study dependencies between speech recognition problems and several higher-level dialogue factors that define our notion of student state: frustration/anger, certainty and correctness. We apply chi-square (χ²) analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies both within and across turns. Significant dependencies are combined to produce interesting insights regarding speech recognition problems and to propose new strategies for handling these problems. We also find that tutoring, as a new domain for speech applications, exhibits interesting tradeoffs and new factors to consider for spoken dialogue design.
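The χ² analysis above tests whether two categorical variables, such as "turn has a speech recognition problem" and "student is frustrated", are independent. A minimal sketch of Pearson's statistic for a contingency table follows; the counts in the example are invented for illustration and are not data from the paper, and the function name is hypothetical. Comparing the statistic to a χ² distribution with (rows − 1)(columns − 1) degrees of freedom gives the significance level.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = ASR problem (yes/no), columns = frustrated (yes/no).
print(chi_square([[10, 20], [30, 40]]))
```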
Learning the Structure of Task-driven Human-Human Dialogs
Srinivas Bangalore, Giuseppe Di Fabbrizio and Amanda Stent
Data-driven techniques have been used for many computational linguistics tasks. Models derived from data are generally more robust than hand-crafted systems since they better reflect the distribution of the phenomena being modeled. With the availability of large corpora of spoken dialog, dialog management is now reaping the benefits of data-driven techniques. In this paper, we compare two approaches to modeling subtask structure in dialog: a chunk-based model of subdialog sequences, and a parse-based, or hierarchical, model. We evaluate these models using customer-agent dialogs from a catalog service domain.
3C: Machine Learning Methods I
Session Chair: Hal Daumé
Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling
Feng Jiao, Shaojun Wang, Chi-Hoon Lee, Russell Greiner and Dale Schuurmans
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
Training Conditional Random Fields with Multivariate Evaluation Measures
Jun Suzuki, Erik McDermott and Hideki Isozaki
This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e. text chunking and named entity recognition, we introduce a loss function which closely reflects the target evaluation measure for these tasks, namely segmentation F-score. Our experiments show that our method performs better than standard CRF training.
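The target measure here, segmentation F-score, is computed over whole chunks rather than individual tokens, which is what makes it non-linear with respect to per-token decisions. The sketch below computes it from BIO tag sequences; it illustrates the evaluation measure only, not the paper's loss function. The function names and the convention of treating an I- tag without a matching preceding chunk as a chunk start are assumptions.

```python
def chunk_spans(tags):
    """Extract (start, end, type) chunks from a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == kind
        if kind is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if prefix in ("B", "I") and not continues:
            start, kind = i, label
    return spans


def segmentation_f1(gold_tags, pred_tags):
    """Chunk-level F-score: a chunk counts only if boundaries and type match."""
    gold, pred = chunk_spans(gold_tags), chunk_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(segmentation_f1(["B-NP", "I-NP", "O", "B-VP"], ["B-NP", "I-NP", "O", "B-NP"]))
```

Note that a single wrong tag (VP labelled as NP) costs a whole chunk in both precision and recall.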
Approximation Lasso Methods for Language Modeling
Jianfeng Gao, Hisami Suzuki and Bin Yu
Lasso is a regularization method for parameter estimation in linear models. It optimizes the model parameters with respect to a loss function subject to a penalty on model complexity. This paper explores the use of lasso for statistical language modeling for text input. Owing to the very large number of parameters, directly optimizing the penalized lasso loss function is impossible. Therefore, we investigate two approximation methods, the boosted lasso (BLasso) and forward stagewise linear regression (FSLR). Both methods, when used with the exponential loss function, bear a strong resemblance to the boosting algorithm, which has been used as a discriminative training method for language modeling. Evaluations on the task of Japanese text input show that BLasso is able to produce the best approximation to the lasso solution, and leads to a significant improvement, in terms of character error rate, over boosting and traditional maximum likelihood estimation.
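Forward stagewise linear regression approximates the lasso path by taking many tiny steps: at each step it nudges the single coefficient whose feature is most correlated with the current residual. The sketch below uses squared loss for simplicity, whereas the abstract pairs FSLR with the exponential loss; the step size, stopping tolerance, and function name are hypothetical choices.

```python
def forward_stagewise(X, y, step=0.01, max_steps=1000):
    """Forward stagewise linear regression under squared loss.

    X: list of feature rows; y: list of targets. Returns the coefficients.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(max_steps):
        residual = [yi - sum(b * x for b, x in zip(beta, row))
                    for row, yi in zip(X, y)]
        # Correlation of each feature with the current residual.
        corr = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        j = max(range(n_features), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-12:
            break  # residual is uncorrelated with every feature
        # Move only the best feature's weight by a small fixed amount.
        beta[j] += step if corr[j] > 0 else -step
    return beta


# Hypothetical data where y depends only on the first feature (y = 2 * x1).
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2, -2, 0, 0]
print(forward_stagewise(X, y))
```

Because only one coefficient moves per step and irrelevant features are never touched unless they become the most correlated, the result tends to be sparse, which is the property it shares with the lasso.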
3D: Applications I
Session Chair: John Prager
Automated Japanese Essay Scoring System based on Articles Written by Experts
Tsunenori Ishioka and Masayuki Kameda
We have developed an automated Japanese essay scoring system called Jess. The system needs expert writings rather than expert raters to build the evaluation model. By detecting statistical outliers of predetermined target essay features compared with many professional writings for each prompt, our system can evaluate essays. The following three features are examined: (1) rhetoric: syntactic variety, or the use of various structures in the arrangement of phrases, clauses, and sentences; (2) organization: characteristics associated with the orderly presentation of ideas, such as rhetorical features and linguistic cues; and (3) content: vocabulary related to the topic, such as relevant information and precise or specialized vocabulary. The final evaluation score is calculated by deducting from a perfect score assigned by a learning process using editorials and columns from the Mainichi Daily News newspaper. A diagnosis for the essay is also given.
A Feedback-Augmented Method for Detecting Errors in the Writing of Learners of English
Ryo Nagata, Atsuo Kawai, Koichiro Morihiro and Naoki Isu
This paper proposes a method for detecting errors in article usage and singular/plural usage based on the mass/count distinction. First, it learns decision lists from automatically generated training data to distinguish mass and count nouns. Then, to improve its performance, it is augmented by feedback obtained from the writing of learners. Finally, it detects errors by applying rules to the mass/count distinction. Experiments show that it achieves a recall of 0.71 and a precision of 0.72 and outperforms other methods used for comparison when augmented by feedback.
Correcting
Chris Brockett, William B. Dolan and Michael Gamon
This paper presents a pilot study of the use of phrasal Statistical Machine Translation (
Monday 17th July 4:00pm–4:30pm
4A: Parsing II
Session Chair: Joakim Nivre
Graph Transformations in Data-Driven Dependency Parsing
Jens Nilsson, Joakim Nivre and Johan Hall
Transforming syntactic representations in order to improve parsing accuracy has been exploited successfully in statistical parsing systems using constituency-based representations. In this paper, we show that similar transformations can give substantial improvements also in data-driven dependency parsing. Experiments on the Prague Dependency Treebank show that systematic transformations of coordinate structures and verb groups result in a 10% error reduction for a deterministic data-driven dependency parser. Combining these transformations with previously proposed techniques for recovering non-projective dependencies leads to state-of-the-art accuracy for the given data set.
4B: Dialogue II
Session Chair: Stanley Peters
Learning to Generate Naturalistic Utterances Using Reviews in Spoken Dialogue Systems
Ryuichiro Higashinaka, Rashmi Prasad and Marilyn A. Walker
Spoken language generation for dialogue systems requires a dictionary of mappings between semantic representations of concepts the system wants to express and realizations of those concepts. Dictionary creation is a costly process; it is currently done by hand for each dialogue domain. We propose a novel unsupervised method for learning such mappings from user reviews in the target domain, and test it on restaurant reviews. We test the hypothesis that user reviews that provide individual ratings for distinguished attributes of the domain entity make it possible to map review sentences to their semantic representation with high precision. Experimental analyses show that the mappings learned cover most of the domain ontology and provide good linguistic variation. A subjective user evaluation shows that the consistency between the semantic representations and the learned realizations is high and that the naturalness of the realizations is higher than that of a hand-crafted baseline.
4C: Linguistic Kinships
Measuring Language Divergence by Intra-Lexical Comparison
T. Mark Ellison and Simon Kirby
This paper presents a method for building genetic language taxonomies based on a new approach to comparing lexical forms. Instead of comparing forms cross-linguistically, a matrix of language-internal similarities between forms is calculated. These matrices are then compared to give distances between languages. We argue that this coheres better with current thinking in linguistics and psycholinguistics. An implementation of this approach, called PHILOLOGICON, is described, along with its application to Dyen et al.'s (1992) ninety-five wordlists from Indo-European languages.
4D: Applications II
Session Chair: John Prager
Enhancing Electronic Dictionaries with an Index Based on Associations
Olivier Ferret and Michael Zock
A good dictionary contains not only many entries and a lot of information concerning each one of them, but also adequate means to reveal the stored information. Information access depends crucially on the quality of the index. We present here some ideas on how a dictionary could be enhanced to help a speaker/writer find the word s/he is looking for. To this end, we suggest adding an index based on the notion of association to an existing electronic resource. We also present preliminary work on how a subset of such associations, for example topical associations, can be acquired by filtering a network of lexical co-occurrences extracted from a corpus.
Tuesday 18th July 10:00am–10:30am
5A: Parsing III
Session Chair: Dan Klein
Guiding a Constraint Dependency Parser with Supertags
Kilian A. Foth, Tomas By and Wolfgang Menzel
We investigate the utility of supertag information for guiding an existing dependency parser of German. Using weighted constraints to integrate the additionally available information, the decision process of the parser is influenced by changing its preferences, without excluding alternative structural interpretations from being considered. The paper reports on a series of experiments using varying models of supertags that significantly increase the parsing accuracy. In addition, an upper bound on the accuracy that can be achieved with perfect supertags is estimated.
5B: Lexical Issues I
Session Chair: Chu Ren Huang
Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words
Dmitry Davidov and Ari Rappoport
We present a novel approach for discovering word categories, sets of words sharing a significant aspect of their meaning. We utilize meta-patterns of high-frequency words and content words in order to discover pattern candidates. Symmetric patterns are then identified using graph-based measures, and word categories are created based on graph clique sets. Our method is the first pattern-based method that requires no corpus annotation or manually provided seed patterns or words. We evaluate our algorithm on very large corpora in two languages, using both human judgments and WordNet-based evaluation. Our fully unsupervised results are superior to previous work that used a
5C: Summarization I
Session Chair: Simone Teufel
Bayesian Query-Focused Summarization
Hal Daumé
We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
5D: Semantics I
Session Chair: Johan Bos
Expressing Implicit Semantic Relations without Supervision
Peter D. Turney
We present an unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations. For a given input word pair X:Y with some unspecified semantic relations, the corresponding output list of patterns <P1,...,Pm> is ranked according to how well each pattern Pi expresses the relations between X and Y. For example, given X=ostrich and Y=bird, the two highest-ranking output patterns are "X is the largest Y" and "Y such as the X". The output patterns are intended to be useful for finding further pairs with the same relations, to support the construction of lexicons, ontologies, and semantic networks. The patterns are sorted by pertinence, where the pertinence of a pattern Pi for a word pair X:Y is the expected relational similarity between the given pair and typical pairs for Pi. The algorithm is empirically evaluated on two tasks, solving multiple-choice
Tuesday 18th July 1100am–1230pm
6A: Parsing IV
Session Chair: Owen Rambow
Hybrid Parsing: Using Probabilistic Models as Predictors for a Symbolic Parser
Kilian A. Foth and Wolfgang Menzel
In this paper we investigate the benefit of stochastic predictor components for the parsing quality which can be obtained with a rule-based dependency grammar. By including a chunker, a supertagger, a PP attacher, and a fast probabilistic parser we were able to improve upon the baseline by 3.2%, bringing the overall labelled accuracy to 91.1% on the German NEGRA corpus. We attribute the successful integration to the ability of the underlying grammar model to combine uncertain evidence in a soft manner, thus avoiding the problem of error propagation.
Error mining in parsing results
Benoît Sagot and Éric de La Clergerie
We introduce an error mining technique for automatically detecting errors in resources that are used in parsing systems. We applied this technique to parsing results produced on several million words by two distinct parsing systems, which share the same syntactic lexicon and pre-parsing processing chain. We were thus able to identify missing and erroneous information in these resources.
Reranking and Self-Training for Parser Adaptation
David McClosky, Eugene Charniak and Mark Johnson
Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard "Charniak parser" checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus.
This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in McClosky et al. (2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.
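For readers checking the 28% figure: relative error reduction is usually computed on the error rate 100 - F rather than on F itself. A minimal Python sketch, assuming the reduction is measured against the baseline parser's 82.9% on Brown (the function name is illustrative, not from the paper):

```python
def relative_error_reduction(baseline_f: float, new_f: float) -> float:
    """Share of the baseline error (100 - F) that the new system removes."""
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


# Baseline Charniak parser vs. self-trained parser, both on the Brown test set.
reduction = relative_error_reduction(82.9, 87.8)
print(f"{reduction:.1%}")
```

The error falls from 17.1 to 12.2 points, about 28.7% of the original error, in line with the roughly 28% quoted above.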
6B: Lexical Issues II
Session Chair: Chu Ren Huang
Automatic Classification of Verbs in Biomedical Texts
Anna Korhonen, Yuval Krymolowski and Nigel Collier
Lexical classes, when tailored to the application and domain in question, can provide an effective means to deal with a number of natural language processing (NLP) tasks. While manual construction of such classes is difficult, recent research shows that it is possible to automatically induce verb classes from cross-domain corpora with promising accuracy. We report a novel experiment where similar technology is applied to the important, challenging domain of biomedicine. We show that the resulting classification, acquired from a corpus of biomedical journal articles, is highly accurate and strongly domain specific. It can be used to aid
Selection of Effective Contextual Information for Automatic Synonym Acquisition
Masato Hagiwara, Yasuhiro Ogawa and Katsuhiko Toyama
Various methods have been proposed for automatic synonym acquisition, as synonyms are one of the most fundamental kinds of lexical knowledge. Whereas many methods are based on contextual clues of words, little attention has been paid to which categories of contextual information are useful for the purpose. This study experimentally investigates the impact of contextual information selection by extracting three kinds of word relationships from corpora: dependency, sentence co-occurrence, and proximity. The evaluation results show that while dependency and proximity perform relatively well by themselves, a combination of two or more kinds of contextual information gives more stable performance. We further investigated the useful selection of dependency relations and modification categories, and found that modification makes the greatest contribution, even greater than the widely adopted subject-object combination.
Scaling Distributional Similarity to Large Corpora
James Gorman and James R. Curran
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naive nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O(n²) in the vocabulary size).
In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance.
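The quadratic cost comes from comparing every word's context vector with every other word's. A minimal Python sketch of that naive baseline, assuming sparse context vectors stored as dicts and cosine similarity as the measure (the paper's own weighting and similarity functions may differ, and SASH and the other approximations are not shown):

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse context vector: context feature -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(feature, 0.0) for feature, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def naive_nearest_neighbours(vectors: Dict[str, Vector],
                             k: int) -> Dict[str, List[Tuple[str, float]]]:
    """Score every word against every other word: n * (n - 1) comparisons."""
    result = {}
    for word, vec in vectors.items():
        scores = [(other, cosine(vec, other_vec))
                  for other, other_vec in vectors.items() if other != word]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[word] = scores[:k]
    return result


# Toy vectors (hypothetical counts, not from any corpus).
vectors = {
    "car": {"drive": 2.0, "park": 1.0},
    "automobile": {"drive": 1.0, "park": 1.0},
    "banana": {"eat": 3.0},
}
print(naive_nearest_neighbours(vectors, k=1)["car"])
```

Doubling the vocabulary roughly quadruples the number of `cosine` calls, which is what the approximate methods compared in the paper try to avoid.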
6C: Summarization II
Session Chair: Simone Teufel
Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Mingli Wu, Qin Lu, Wei Xu and Chunfa Yuan
Event-based summarization attempts to select and organize sentences in a summary with respect to the events or sub-events that the sentences describe. Each event has its own internal structure and at the same time relates to other events semantically, temporally, spatially, causally or conditionally. In this paper, we define an event as one or more event terms together with their associated named entities, and present a novel approach to deriving intra- and inter-event relevance using information about internal association, semantic relatedness, distributional similarity and named entity clustering. We then apply the PageRank algorithm to the derived event relevance to estimate the significance of each event for inclusion in a summary. Experiments on the DUC 2001 test data show that the relevance of the named entities involved in events gives better results when it is derived from the event terms they are associated with. They also reveal that topic-specific relevance derived from the documents themselves outperforms semantic relevance from a general-purpose knowledge base such as WordNet.
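The PageRank step can be pictured as power iteration over a weighted graph whose nodes are events and whose edge weights are the derived relevance scores. A minimal Python sketch, assuming a symmetric graph with hypothetical weights and the commonly used damping factor of 0.85 (neither is taken from the paper):

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def weighted_pagerank(graph: Graph, damping: float = 0.85,
                      iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration for PageRank on a weighted graph.

    Each node passes its score to its neighbours in proportion to edge weight.
    Nodes with no outgoing weight simply drop their mass in this sketch.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] * graph[u][v] / out_weight[u]
                           for u in nodes
                           if v in graph[u] and out_weight[u] > 0)
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical symmetric event-relevance weights (not taken from the paper).
events: Graph = {
    "e1": {"e2": 0.8, "e3": 0.1},
    "e2": {"e1": 0.8, "e3": 0.5},
    "e3": {"e1": 0.1, "e2": 0.5},
}
scores = weighted_pagerank(events)
print(max(scores, key=scores.get))  # e2: the event most strongly linked to the others
```

Events would then be taken in descending score order when choosing sentences for the summary.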
Models for Sentence Compression: A Comparison across Domains, Training Requirements and Evaluation Measures
James Clarke and Mirella Lapata
Sentence compression is the task of producing a summary at the sentence level. This paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. We provide a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm and examine how these models port to different domains (written vs. spoken text). To achieve this, a human-authored compression corpus has been created and our study highlights potential problems with the automatically gathered compression corpora currently used. Finally, we assess whether automatic evaluation measures can be used to determine compression quality.
A Bottom-up Approach to Sentence Ordering for Multi-document Summarization
Danushka Bollegala, Naoaki Okazaki and Mitsuru Ishizuka
Ordering information is a difficult but important task for applications that generate natural-language text. We present a bottom-up approach to arranging sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g., sentences), we define four criteria: chronology, topical closeness, precedence, and succession. These criteria are integrated into a single criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on this criterion until we obtain a single segment with all sentences arranged. Our experimental results show a significant improvement over existing sentence ordering strategies.
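The repeated-concatenation loop described above can be sketched as greedy agglomeration: start with one segment per sentence and keep joining the ordered pair that the criterion scores highest. A minimal Python sketch, assuming a single hand-written chronology-style criterion with hypothetical timestamps in place of the paper's learned combination of four criteria:

```python
from itertools import permutations
from typing import Callable, List, Sequence

Segment = List[str]


def bottom_up_order(sentences: Sequence[str],
                    criterion: Callable[[Segment, Segment], float]) -> Segment:
    """Merge segments pairwise until one ordered segment remains."""
    segments: List[Segment] = [[s] for s in sentences]
    while len(segments) > 1:
        # Pick the ordered pair (a, b) whose "a then b" score is highest.
        i, j = max(permutations(range(len(segments)), 2),
                   key=lambda ij: criterion(segments[ij[0]], segments[ij[1]]))
        merged = segments[i] + segments[j]
        segments = [seg for k, seg in enumerate(segments) if k not in (i, j)]
        segments.append(merged)
    return segments[0] if segments else []


# Hypothetical publication times standing in for the chronology criterion.
timestamps = {"s1": 1, "s2": 2, "s3": 3}


def chronology(a: Segment, b: Segment) -> float:
    """Higher when b starts soon after a ends; 0 if b would go back in time."""
    gap = timestamps[b[0]] - timestamps[a[-1]]
    return 1.0 / (1 + gap) if gap > 0 else 0.0


print(bottom_up_order(["s3", "s1", "s2"], chronology))
```

Swapping `chronology` for a learned scoring function is where the paper's supervised integration of the four criteria would plug in.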
6D: Semantics II
Session Chair: Johan Bos
Learning Event Durations from Event Descriptions
Feng Pan, Rutu Mulkar and Jerry R. Hobbs
We have constructed a corpus of news articles in which events are annotated for estimated bounds on their durations. Here we describe a method for measuring inter-annotator agreement for these event duration distributions. We then show that machine learning techniques applied to this data yield coarse-grained event duration information, considerably outperforming a baseline and approaching human performance.
Automatic learning of textual entailments with cross-pair similarities
Fabio Massimo Zanzotto and Alessandro Moschitti
In this paper, we define a novel similarity measure between examples of textual entailment and use it as a kernel function in Support Vector Machines (SVMs). This allows us to automatically learn the rewrite rules that describe a non-trivial set of entailment cases. Experiments with the data sets of the RTE 2005 challenge show an improvement of 4.4% over state-of-the-art methods.
An Improved Redundancy Elimination Algorithm for Underspecified Representations
Alexander Koller and Stefan Thater
We present an efficient algorithm for the redundancy elimination problem: given an underspecified semantic representation (USR) of a scope ambiguity, compute a USR with fewer mutually equivalent readings. The algorithm operates on underspecified chart representations which are derived from dominance graphs; it can be applied to the USRs computed by large-scale grammars. We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime.
Tuesday 18th July 200pm–330pm
7A: Parsing V
Session Chair: Takashi Ninomiya
Integrating Syntactic Priming into an Incremental Probabilistic Parser, with an Application to Psycholinguistic Modeling
Amit Dubey, Frank Keller and Patrick Sturt
The psycholinguistic literature provides evidence for syntactic priming, i.e., the tendency to repeat structures. This paper describes a method for incorporating priming into an incremental probabilistic parser. Three models are compared, which involve priming of rules between sentences, within sentences, and within coordinate structures. These models simulate the reading time advantage for parallel structures found in human data, and also yield a small increase in overall parsing accuracy.
A Fast, Accurate Deterministic Parser for Chinese
Mengqiu Wang, Kenji Sagae and Teruko Mitamura
We present a novel classifier-based deterministic parser for Chinese constituency parsing. Our parser computes parse trees bottom-up in one pass, and uses classifiers to make shift-reduce decisions. Trained and evaluated on the standard training and test sets, our best model (using stacked classifiers) runs in linear time and has labeled precision and recall above 88% using gold-standard part-of-speech tags, surpassing the best published results. Our
Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
7B: Word Sense Disambiguation II
Session Chair: Hwee Tou Ng
Semi-Supervised Learning of Partial Cognates using Bilingual Bootstrapping
Oana Frunza and Diana Inkpen
Partial cognates are pairs of words in two languages that have the same meaning in some, but not all, contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation tools and for Computer-Assisted Language Learning tools. In this paper we propose a supervised and a semi-supervised method to disambiguate partial cognates between two languages: French and English. The methods use only automatically labeled data; therefore they can be applied to other pairs of languages as well. We also show that our methods perform well when using corpora from different domains.
Direct Word Sense Matching for Lexical Substitution
Ido Dagan, Oren Glickman, Alfio Gliozzo, Efrat Marmorshtein and Carlo Strapparava
This paper investigates, conceptually and empirically, the novel sense matching task, which requires recognizing whether the senses of two synonymous words match in context. We suggest direct approaches to the problem, which avoid the intermediate step of explicit word sense disambiguation, and demonstrate their appealing advantages and stimulating potential for future research.
An Equivalent Pseudoword for Unsupervised Chinese Word Sense Disambiguation
Zhimao Lu, Haifeng Wang, Jianmin Yao, Ting Liu and Sheng Li
This paper presents a new approach based on Equivalent Pseudowords (EPs) to tackle Word Sense Disambiguation (WSD) in Chinese. EPs are particular artificial ambiguous words, which can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the EP solution on the Senseval-3 Chinese test set. Its performance is better than state-of-the-art results, with an average F-measure of 0.80. The experiment verifies the value of EPs for unsupervised WSD.