Boosting systems for large vocabulary continuous speech recognition

George Saon; Hagen Soltau

doi:10.1016/j.specom.2011.07.011

Speech Communication

Paper

01 Feb 2012

Abstract

We employ a variant of the popular Adaboost algorithm to train multiple acoustic models such that the aggregate system exhibits improved performance over the individual recognizers. Each model is trained sequentially on re-weighted versions of the training data. At each iteration, the weights are decreased for the frames that are correctly decoded by the current system. These weights are then multiplied by the frame-level statistics for the decision trees and Gaussian mixture components of the next iteration's system. The composite system uses a log-linear combination of HMM state observation likelihoods. We report experimental results on several broadcast news transcription setups which differ in the language being spoken (English and Arabic) and in the amount of training data. Additionally, we study the impact of boosting on maximum likelihood (ML) and discriminatively trained acoustic models. Our findings suggest that significant gains can be obtained for small amounts of training data even after feature and model-space discriminative training. © 2011 Elsevier B.V. All rights reserved.
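The two mechanisms named in the abstract, frame re-weighting and log-linear combination of state likelihoods, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the multiplicative decay factor `beta`, the renormalization to a mean weight of 1, the uniform default combination weights, and the function names `reweight_frames` and `combine_log_likelihoods` are all assumptions made for illustration.

```python
import numpy as np


def reweight_frames(weights, correct, beta=0.5):
    """Decrease the weights of correctly decoded frames.

    Assumption: a fixed multiplicative factor `beta` in (0, 1) and a
    renormalization so the mean weight stays 1. The paper's actual
    update rule is not stated in the abstract.
    """
    weights = np.asarray(weights, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    new_weights = np.where(correct, weights * beta, weights)
    return new_weights * (len(new_weights) / new_weights.sum())


def combine_log_likelihoods(log_likes, model_weights=None):
    """Log-linear combination of per-model state log-likelihoods.

    log_likes has shape (num_models, num_frames, num_states).
    Assumption: uniform model weights when none are supplied.
    Returns an array of shape (num_frames, num_states).
    """
    log_likes = np.asarray(log_likes, dtype=float)
    if model_weights is None:
        num_models = log_likes.shape[0]
        model_weights = np.full(num_models, 1.0 / num_models)
    model_weights = np.asarray(model_weights, dtype=float)
    # A weighted sum in the log domain is a weighted product of likelihoods.
    return np.tensordot(model_weights, log_likes, axes=1)


# Toy usage: four frames, the first two decoded correctly.
frame_weights = reweight_frames([1.0, 1.0, 1.0, 1.0], [True, True, False, False])

# Toy usage: two models, one frame, two HMM states.
combined = combine_log_likelihoods([[[-1.0, -2.0]], [[-3.0, -4.0]]])
```

In a sketch like this, the re-weighted values would scale each frame's contribution to the sufficient statistics of the next model, so that frames the current system already decodes correctly count for less in the next training pass.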
