Decision tree based rate of speech modeling for speech recognition
Abstract
A real-world speech recognition system encounters a variety of speaking styles and speaking rates, and its accuracy depends strongly on the speaking rate: it degrades sharply on very fast or very slow speech (including hyperarticulated speech). In this paper, we propose a generic modeling scheme that uses decision trees to capture a range of speaking rates from very slow to very fast. This approach improves recognition performance on fast and slow speech without degrading performance on normal speech. The main idea behind this scheme is to model the context-dependent HMM state likelihoods differently for different speaking rates, in terms of the joint probability of observing the sequence of durations given the sequence of acoustic states, without relying on any explicit duration computation at run time.
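
To make the quantity named above concrete, the joint probability of the duration sequence given the acoustic state sequence can be written out as follows. The symbols are assumptions introduced here for illustration and do not appear in the abstract: s_1, ..., s_N denotes the sequence of acoustic (HMM) states and d_i the duration associated with state s_i. The second line is only the chain rule, which holds for any joint distribution. How the conditioning context is then clustered by the decision trees is not specified in the abstract, so it is left open here.

```latex
% Assumed notation: s_i = i-th acoustic state, d_i = its duration, N = number of states.
\begin{align}
  P(d_1, \dots, d_N \mid s_1, \dots, s_N)
    &= \prod_{i=1}^{N} P\bigl(d_i \mid d_1, \dots, d_{i-1},\; s_1, \dots, s_N\bigr)
\end{align}
```

Read this way, a decision tree would be one natural device for grouping the conditioning contexts on the right-hand side into a manageable number of classes, though this reading is a sketch rather than a statement of the procedure used in the paper.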