N-Best Rescoring Based on Pitch-accent Patterns

Je Hun Jeon (1), Wen Wang (2), Yang Liu (1)
(1) The University of Texas at Dallas, (2) SRI International


Abstract

In this paper, we adopt an n-best rescoring scheme that uses pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, which allows us to develop it independently. N-best hypotheses from the recognizers are rescored with additional scores that measure the correlation between the pitch-accent patterns derived from the acoustic signal and those derived from lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups. The first is English radio news data that has pitch-accent labels, but the recognizer is trained on a small amount of data and has a high error rate. The second is English broadcast news data recognized with a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach reduces word error rate by about 3% relative. This gain is consistent across the two tests, which suggests that incorporating prosodic information is a promising direction for improving speech recognition.
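
As a rough illustration of what n-best rescoring with an extra knowledge source can look like, the sketch below re-ranks a toy n-best list by adding a weighted pitch-accent score to each hypothesis's recognizer score. The abstract does not give the combination formula, so the weighted sum, the `Hypothesis` fields, the `accent_weight` parameter, and all score values here are assumptions made for illustration only, not the method used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]
    asr_score: float     # recognizer log score (assumed: acoustic + language model)
    accent_score: float  # hypothetical log score for pitch-accent agreement


def rescore(nbest: List[Hypothesis], accent_weight: float = 0.5) -> List[Hypothesis]:
    """Return the hypotheses sorted by combined score, best first.

    The combined score is an assumed weighted sum:
    asr_score + accent_weight * accent_score.
    """
    return sorted(
        nbest,
        key=lambda h: h.asr_score + accent_weight * h.accent_score,
        reverse=True,
    )


# Toy n-best list with made-up scores.
nbest = [
    Hypothesis(["recognize", "speech"], asr_score=-10.0, accent_score=-4.0),
    Hypothesis(["wreck", "a", "nice", "beach"], asr_score=-9.5, accent_score=-6.0),
]

best = rescore(nbest)[0]
print(" ".join(best.words))
```

With `accent_weight=0.0` the ranking falls back to the recognizer's own ordering, which reflects the decoupling described above: the pitch-accent score is an add-on that can be tuned or removed without changing the recognizer.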




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1074.pdf