Underspecifying and Predicting Voice for Surface Realisation Ranking

Sina Zarrieß,  Aoife Cahill,  Jonas Kuhn
University of Stuttgart


Abstract

This paper studies a data-driven surface realisation model based on a large-scale reversible grammar of German. We investigate the relationship between surface realisation performance and the character of the input to generation, i.e., its degree of underspecification. We extend a syntactic surface realisation system, which can be trained to choose among word order variants, so that the candidate set also includes active and passive variants. This allows us to study the interaction of voice and word order alternations in realistic German corpus data. We show that, given an appropriately underspecified input, a linguistically informed realisation model trained to regenerate strings from the underlying semantic representation predicts the original voice with 91.5% accuracy, compared with a baseline of 82.5%.
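To make the ranking setup concrete, the sketch below shows how a trained ranker might choose among candidate realisations that differ in both word order and voice, and how voice-prediction accuracy could then be measured against the original corpus sentences. It is a minimal illustration under stated assumptions, not the authors' system: the `Candidate` structure, the linear scoring function (the quantity a log-linear ranker maximises), the feature names `subject_initial`, `passive` and `given_patient_initial`, the weights, and the toy German sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One surface string generated from an underspecified input (hypothetical structure)."""
    text: str
    voice: str  # "active" or "passive"
    features: Dict[str, float]


def score(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Linear score w . f(candidate); a log-linear ranker picks the argmax of this value."""
    return sum(weights.get(name, 0.0) * value for name, value in candidate.features.items())


def rank_best(candidates: List[Candidate], weights: Dict[str, float]) -> Candidate:
    """Return the highest-scoring realisation among active and passive variants."""
    return max(candidates, key=lambda c: score(c, weights))


def voice_accuracy(items: List[List[Candidate]], gold_voices: List[str],
                   weights: Dict[str, float]) -> float:
    """Fraction of inputs whose top-ranked candidate has the original (gold) voice."""
    correct = sum(rank_best(cands, weights).voice == gold
                  for cands, gold in zip(items, gold_voices))
    return correct / len(gold_voices)


# Toy usage with hypothetical features and hand-set weights (not learned, not from the paper).
weights = {"subject_initial": 1.0, "passive": -0.5, "given_patient_initial": 1.5}

item1 = [  # neutral context: the active SVO order should win
    Candidate("Der Hund biss den Mann.", "active", {"subject_initial": 1.0}),
    Candidate("Der Mann wurde vom Hund gebissen.", "passive",
              {"subject_initial": 1.0, "passive": 1.0}),
]
item2 = [  # patient is discourse-given: a patient-initial passive should win
    Candidate("Den Mann biss der Hund.", "active", {"given_patient_initial": 1.0}),
    Candidate("Der Hund biss den Mann.", "active", {"subject_initial": 1.0}),
    Candidate("Der Mann wurde vom Hund gebissen.", "passive",
              {"subject_initial": 1.0, "passive": 1.0, "given_patient_initial": 1.0}),
]

print(voice_accuracy([item1, item2], ["active", "passive"], weights))  # → 1.0
```

Because voice is just one more property of each candidate, the same ranker that orders word order variants also decides between active and passive; this mirrors the abstract's point that voice and word order alternations interact within a single candidate set.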

Full paper: http://www.aclweb.org/anthology/P/P11/P11-1101.pdf