Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition
Abstract
HMM-based audio-visual speech recognition (AVSR) systems, which combine audio and visual information, have shown success in continuous speech recognition, especially in noisy environments. In this paper we study how to improve the decision trees used to create context classes in HMM-based AVSR systems. Traditionally, the visual models have been trained with the same context classes as the audio-only models. Here we investigate the use of separate decision trees to model the context classes for the audio and visual streams independently. Additionally, we investigate the use of viseme classes when building the decision tree for the visual stream. In experiments on a 37-speaker, 1.5-hour test set (about 12,000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain in AVSR performance by using separate decision trees for the audio and visual streams, with viseme classes used in decision-tree building for the visual stream.
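For intuition, the sketch below illustrates the general idea of viseme-question decision-tree clustering for the visual stream, in the spirit of standard likelihood-based state tying. It is a minimal illustrative assumption, not the paper's implementation: the viseme class table, the toy state statistics, and all function names are hypothetical.

```python
"""Hedged sketch: greedy decision-tree clustering of context-dependent visual
states using viseme-class questions. All names and data here are illustrative
assumptions, not the system described in the paper."""
import math

# Hypothetical viseme classes grouping phones with similar lip shapes.
VISEME_CLASSES = {
    "bilabial": {"p", "b", "m"},
    "labiodental": {"f", "v"},
    "rounded": {"ow", "uw", "w"},
    "open": {"aa", "ae", "ay"},
}

def log_likelihood(states):
    """Single-Gaussian (diagonal-covariance) log-likelihood approximation for
    a pooled node, as commonly used in tree-based state tying."""
    n = sum(s["count"] for s in states)
    dim = len(states[0]["mean"])
    ll = 0.0
    for d in range(dim):
        mean = sum(s["count"] * s["mean"][d] for s in states) / n
        var = (sum(s["count"] * (s["var"][d] + s["mean"][d] ** 2)
                   for s in states) / n - mean ** 2)
        ll += -0.5 * n * (math.log(2 * math.pi * max(var, 1e-6)) + 1.0)
    return ll

def best_split(states):
    """Pick the viseme question (on the left or right context) with the
    largest likelihood gain."""
    base = log_likelihood(states)
    best = None
    for pos in ("left", "right"):
        for name, phones in VISEME_CLASSES.items():
            yes = [s for s in states if s[pos] in phones]
            no = [s for s in states if s[pos] not in phones]
            if not yes or not no:
                continue
            gain = log_likelihood(yes) + log_likelihood(no) - base
            if best is None or gain > best[0]:
                best = (gain, pos, name, yes, no)
    return best

def build_tree(states, min_gain=1.0):
    """Split greedily until no question yields enough gain; each leaf becomes
    one tied context class for the visual stream."""
    split = best_split(states)
    if split is None or split[0] < min_gain:
        return states  # leaf: one tied class
    _, pos, name, yes, no = split
    return {"question": (pos, name),
            "yes": build_tree(yes, min_gain),
            "no": build_tree(no, min_gain)}
```

In this toy setup each state carries an occupancy count, mean and variance vectors, and its left/right phone context; growing one such tree per stream (audio with phonetic questions, visual with viseme questions) corresponds to the separate, stream-specific context clustering discussed above.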