WAVELET-BASED ENERGY BINNING CEPSTRAL FEATURES FOR AUTOMATIC SPEECH RECOGNITION
Abstract
Speech production models, coding methods, and text-to-speech technology often lead to the introduction of modulation models that represent speech signals by primary components, which are amplitude- and phase-modulated sine functions. Parallels between properties of the wavelet transform of primary components and algorithmic representations of speech signals derived from auditory-nerve models such as the EIH lead to the introduction of synchrosqueezing measures. In automatic speech (and speaker) recognition, on the other hand, cepstral features have become the quasi-universal acoustic characterization of speech utterances. This paper analyses the cepstral representation in the context of the synchrosqueezed representation, the wastrum. It discusses energy-accumulation-derived wastra as opposed to classical MEL- and LPC-derived cepstra. In the former method, the primary components and formants play a central role. Recognition results are presented on the Wall Street Journal database using the IBM continuous decoder.
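As a rough, hedged illustration of the kind of pipeline the abstract describes (not the authors' algorithm), the sketch below computes cepstral-like coefficients from a synchrosqueezed wavelet representation by accumulating energy in frequency bins, taking logs, and applying a DCT. The wavelet parameters, analysis-frequency grid, bin layout, frame handling, and the name `wastrum_like_features` are all assumptions made for the example.

```python
# Minimal sketch, assuming a Morlet CWT, phase-derivative synchrosqueezing,
# and log-spaced energy bins; none of these choices are taken from the paper.
import numpy as np
from scipy.fft import fft, ifft, dct

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Analytic Morlet CWT evaluated in the frequency domain
    (one row per analysis frequency, in Hz)."""
    n = len(x)
    X = fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s
    W = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                 # scale (s) so the wavelet peaks at f
        psi_hat = np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        W[i] = ifft(X * psi_hat)
    return W

def synchrosqueeze(W, freqs, fs):
    """Reassign CWT energy to the instantaneous frequency estimated from the
    phase derivative of each coefficient (a crude synchrosqueezing measure)."""
    n_f, n_t = W.shape
    phase = np.unwrap(np.angle(W), axis=1)
    inst_f = np.gradient(phase, axis=1) * fs / (2 * np.pi)  # Hz
    T = np.zeros((n_f, n_t))
    for i in range(n_f):
        # nearest analysis-frequency row for each time sample
        idx = np.clip(np.searchsorted(freqs, inst_f[i]), 0, n_f - 1)
        np.add.at(T, (idx, np.arange(n_t)), np.abs(W[i]) ** 2)
    return T

def wastrum_like_features(frame, fs, n_bins=24, n_ceps=13):
    """Energy-binned cepstral coefficients from the synchrosqueezed plane."""
    freqs = np.geomspace(60.0, 0.9 * fs / 2, 96)   # log-spaced analysis frequencies
    T = synchrosqueeze(morlet_cwt(frame, fs, freqs), freqs, fs)
    edges = np.linspace(0, len(freqs), n_bins + 1, dtype=int)
    energies = np.array([T[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    return dct(np.log(energies + 1e-10), norm='ortho')[:n_ceps]

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.025, 1 / fs)                # one 25 ms frame
    frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)
    print(wastrum_like_features(frame, fs))
```

The structure deliberately mirrors the MEL cepstrum pipeline (filter-bank energies, log, DCT) so that the only substituted stage is the time-frequency front end, here a synchrosqueezed wavelet plane instead of a short-time Fourier filter bank.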