Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?

Maoxi Li (1), Chengqing Zong (1), Hwee Tou Ng (2)
(1) Chinese Academy of Sciences, (2) National University of Singapore


Abstract

The word is usually adopted as the smallest unit in most Chinese language processing tasks. However, for automatic evaluation of Chinese translation output when translating from other languages, either a word-level or a character-level approach is possible. So far, no detailed study has compared how well these two approaches correlate with human assessment. In this paper, we compare word-level metrics with character-level metrics on the submitted output of English-to-Chinese translation systems in the IWSLT’08 CT-EC and NIST’08 EC tasks. Our experimental results reveal that character-level metrics correlate with human assessment better than word-level metrics. Our analysis suggests several key reasons behind this finding.
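
To make the word-level versus character-level distinction concrete, the following is a minimal sketch, not the evaluation code used in the paper. It computes a clipped unigram precision (the matching step used in BLEU-style metrics) over the same sentence pair at both granularities. The function names, the example sentences, and their hand-made segmentation are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hyp_tokens, ref_tokens, n=1):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hyp_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / total


def to_chars(words):
    """Split a word-segmented sentence into single characters."""
    return [ch for word in words for ch in word]


# Hypothetical example: the same character string, segmented differently
# (assumption: segmentation was done by hand for this illustration).
ref_words = ["我", "喜欢", "机器", "翻译"]
hyp_words = ["我", "喜欢", "机器翻译"]

word_level = clipped_precision(hyp_words, ref_words, n=1)
char_level = clipped_precision(to_chars(hyp_words), to_chars(ref_words), n=1)

print(f"word-level unigram precision:      {word_level:.3f}")
print(f"character-level unigram precision: {char_level:.3f}")
```

In this toy pair the hypothesis and reference contain identical characters, so the character-level score is perfect, while the word-level score drops because one word is segmented differently. This only illustrates how the two granularities can diverge; it is not a claim about which reasons the paper's analysis identifies.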




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2028.pdf