The IBM RATS phase II speaker recognition system: Overview and analysis
Abstract
IBM's submission for the Phase II speaker recognition evaluation of the DARPA-sponsored Robust Automatic Transcription of Speech (RATS) program is examined. The objectives of the paper are threefold: (1) to provide a system description, (2) to identify key techniques for performance improvement, and (3) to quantify their contribution. In the system design, the fundamental idea revolves around exploiting diversity and modeling complementary information at all levels. To speed up system development, a push-button system was designed whereby all system development steps could be rapidly completed. Noise robustness is improved by utilizing two speech activity detectors (SADs) and five acoustic feature extractors. Furthermore, the probabilistic linear discriminant analysis (PLDA) based back-ends were trained with two different data subsets. To better exploit the complementary information, system combination was performed in two modules. The first module trained new PLDA back-ends from concatenated compact representations, while the second combined all the system scores and duration-related side information in a neural network. The official results from the Phase II evaluation are also examined. The results indicate that for the 30s-30s task the performance of the overall system was better than the best single system by 46% and 40% on the internal and evaluation test sets, respectively. Copyright © 2013 ISCA.
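As an illustration of the second combination module, the sketch below fuses per-system scores with duration side information in a small feed-forward network. It is a minimal sketch only: the abstract does not specify the network architecture, so the single tanh hidden layer, the use of log-durations as inputs, the function name `fuse_trial`, and all weight values are assumptions made for this example, not the authors' configuration.

```python
import math
from typing import Sequence


def fuse_trial(
    system_scores: Sequence[float],
    enroll_dur_s: float,
    test_dur_s: float,
    w_hidden: Sequence[Sequence[float]],
    b_hidden: Sequence[float],
    w_out: Sequence[float],
    b_out: float,
) -> float:
    """Forward pass of a one-hidden-layer network (hypothetical architecture).

    Inputs are one score per subsystem plus two duration features.
    """
    # Assumption: durations enter as log-seconds; the abstract only says
    # "duration related side information".
    x = list(system_scores) + [math.log(enroll_dur_s), math.log(test_dur_s)]

    for row in w_hidden:
        if len(row) != len(x):
            raise ValueError("each hidden weight row must match the input size")
    if not (len(w_hidden) == len(b_hidden) == len(w_out)):
        raise ValueError("hidden-layer parameter sizes are inconsistent")

    # Hidden layer: affine transform followed by tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Linear output unit gives the fused trial score.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Usage with made-up numbers: three subsystem scores for a 30s-30s trial.
# In practice the weights would be learned from development trials.
fused = fuse_trial(
    system_scores=[1.2, 0.7, -0.3],
    enroll_dur_s=30.0,
    test_dur_s=30.0,
    w_hidden=[[0.5, 0.3, 0.2, 0.1, 0.1], [-0.2, 0.4, 0.6, 0.0, 0.05]],
    b_hidden=[0.0, 0.1],
    w_out=[1.0, 0.8],
    b_out=-0.5,
)
print(f"fused score: {fused:.4f}")
```

Feeding duration alongside the scores lets the fusion weight subsystems differently for short and long recordings, which is one plausible reason to include such side information.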