Predictive model-based compensation schemes for robust speech recognition
Abstract
For practical applications speech recognition systems need to be insensitive to differences between training and test acoustic conditions. Differences in the acoustic environment may result from various sources, such as ambient back-ground noise, channel variations and speaker stress. These differences can dramatically degrade the performance of a speech recognition system. A wide range of techniques have been proposed for achieving noise robustness. This paper considers one particular approach to model-based compensation, predictive model-based compensation, which has been shown to achieve good noise robustness in a wide range of acoustic environments. The characteristic of these schemes is that they combine a speech model with an additive noise model, a channel model and, in the general case, a speaker stress model, to generate a corrupted-speech model. The general theory of these predictive techniques is discussed. Various approximations for rapidly performing the model combination stage have been proposed and are reviewed in this paper. The advantages and the limitations of such a predictive approach to noise robustness are also discussed. In addition, methods for combining predictive schemes with schemes which make use of speech data in the new environment, adaptive schemes, are detailed. This combined approach overcomes some of the limitations of the predictive schemes. © 1998 Published by Elsevier science B.V. All rights reserved.