AUTOMATIC SPEECH RECOGNITION VIA PSEUDO-INDEPENDENT MARGINAL MIXTURES
Abstract
Statistical models (prototypes) of the multivariate probability distribution of vectors (frames) of speech parameters can be used in several ways. If the stream of vectors is passed directly to the decoder of a continuous-parameter speech recognizer, the decoder uses the prototypes; if the recognizer has a time-synchronous labeling acoustic processor, the prototypes are used for vector quantization (labeling), and the resulting label stream is passed to the decoder. Other uses are possible as well. A method for constructing such prototypes is presented. Speech recognition experiments are described in which the prototypes were trained by iteratively interleaving steps of a K-means-type clustering algorithm with steps of an expectation-maximization (EM) algorithm for reestimation. Results obtained with a labeling acoustic processor show significantly fewer decoding errors than previous methods.
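
The interleaved training and the labeling step described above can be illustrated with a minimal sketch. It is not the pseudo-independent marginal mixture construction itself: it assumes each prototype is a single diagonal-covariance Gaussian, and the function names (`kmeans_step`, `em_step`, `train_prototypes`, `label_frames`), the variance floor, and the number of rounds are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans_step(frames, means):
    """One hard-assignment step: nearest-mean assignment, then mean update."""
    d2 = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    new_means = means.copy()
    for k in range(means.shape[0]):
        members = frames[assign == k]
        if len(members) > 0:  # keep the old mean if a cluster is empty
            new_means[k] = members.mean(axis=0)
    return new_means, assign


def log_gauss_diag(frames, means, variances):
    """Log density of every frame under every diagonal Gaussian prototype."""
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2
    terms = np.log(2 * np.pi * variances)[None, :, :] + diff2 / variances[None, :, :]
    return -0.5 * terms.sum(axis=2)  # shape (n_frames, n_protos)


def em_step(frames, weights, means, variances, var_floor=1e-3):
    """One EM reestimation step; also returns the log-likelihood of the inputs."""
    log_joint = np.log(weights)[None, :] + log_gauss_diag(frames, means, variances)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    resp = np.exp(log_joint - log_norm)  # soft assignments (E-step)
    counts = resp.sum(axis=0) + 1e-12  # guard against empty prototypes
    new_weights = counts / counts.sum()
    new_means = (resp.T @ frames) / counts[:, None]
    diff2 = (frames[:, None, :] - new_means[None, :, :]) ** 2
    new_vars = (resp[:, :, None] * diff2).sum(axis=0) / counts[:, None]
    return new_weights, new_means, np.maximum(new_vars, var_floor), float(log_norm.sum())


def train_prototypes(frames, n_protos, n_rounds=5, seed=0):
    """Alternate one K-means-type step with one EM step for n_rounds rounds."""
    rng = np.random.default_rng(seed)
    means = frames[rng.choice(len(frames), n_protos, replace=False)].copy()
    weights = np.full(n_protos, 1.0 / n_protos)
    variances = np.tile(np.maximum(frames.var(axis=0), 1e-3), (n_protos, 1))
    for _ in range(n_rounds):
        means, _ = kmeans_step(frames, means)
        weights, means, variances, _ = em_step(frames, weights, means, variances)
    return weights, means, variances


def label_frames(frames, weights, means, variances):
    """Vector quantization: label each frame with its most likely prototype."""
    scores = np.log(weights)[None, :] + log_gauss_diag(frames, means, variances)
    return scores.argmax(axis=1)
```

In this sketch the label stream returned by `label_frames` plays the role of the labeling acoustic processor's output, while the trained weights, means, and variances could instead be handed to a continuous-parameter decoder.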