Improved voice activity detection using static harmonic features
Abstract
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. We previously proposed a statistical-model-based VAD that uses long-term temporal information in speech and shows good robustness against noise in an automobile environment. To improve it further, this paper describes a new method that exploits harmonic structure information within statistical models. In our approach, local peaks considered to be harmonic structures are extracted without explicit pitch detection or voiced-unvoiced classification. The proposed method, which combines long-term temporal and static harmonic features, led to considerable improvements under low-SNR conditions in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system. ©2010 IEEE.
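To make the idea of extracting local peaks without pitch detection concrete, the following Python sketch marks bins of a frame's log power spectrum that are local maxima and stand clearly above a smoothed spectral envelope. It is an illustration only, not the feature extraction used in the paper: the function name `local_peak_mask`, the Hann window, the 6 dB margin, and the 9-bin moving-average envelope are all assumptions chosen for the example.

```python
import numpy as np


def local_peak_mask(frame, margin_db=6.0, smooth_bins=9):
    """Mark spectral bins that look like local harmonic peaks.

    Hypothetical helper for illustration; the margin and smoothing width
    are assumed values, not parameters taken from the paper.
    Returns (mask, log_spec), both of length len(frame) // 2 + 1.
    """
    frame = np.asarray(frame, dtype=float)

    # Log power spectrum (dB) of the Hann-windowed frame.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_spec = 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)

    # Coarse envelope: moving average of the log spectrum across bins.
    kernel = np.ones(smooth_bins) / smooth_bins
    envelope = np.convolve(log_spec, kernel, mode="same")

    # A bin is kept if it is a local maximum and exceeds the envelope
    # by at least margin_db. The first and last bins are never marked.
    mask = np.zeros(len(log_spec), dtype=bool)
    interior = log_spec[1:-1]
    mask[1:-1] = (
        (interior > log_spec[:-2])
        & (interior >= log_spec[2:])
        & (interior > envelope[1:-1] + margin_db)
    )
    return mask, log_spec


if __name__ == "__main__":
    # Synthetic voiced-like frame: four harmonics of 250 Hz at 8 kHz sampling.
    sample_rate, frame_len = 8000, 512
    t = np.arange(frame_len) / sample_rate
    frame = sum(np.sin(2 * np.pi * 250 * k * t) for k in range(1, 5))
    mask, _ = local_peak_mask(frame)
    print(np.flatnonzero(mask) * sample_rate / frame_len)  # peak frequencies in Hz
```

No fundamental frequency is estimated and no voiced-unvoiced decision is made here: the binary mask depends only on the local shape of the spectrum. In a statistical-model-based detector, such a mask (or the spectrum restricted to the masked bins) could serve as a per-frame "static" feature alongside long-term temporal features, though how the paper combines them is not specified in the abstract.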