Speed improvement of the tree-based time asynchronous search
Abstract
The IBM large-vocabulary continuous speech recognition system is based on an asynchronous stack decoding scheme, which is essentially a tree search, as described in [1]. Its main advantages, efficient memory utilization and a single-pass search strategy, make the system well suited to real-time applications. This article describes further improvements in the efficiency of the search method. These improvements are achieved in part by a more efficient expansion of words into context-dependent acoustic models, which produces equivalent search results and therefore does not affect recognition accuracy. Additional improvements are achieved by introducing an approximation in the computation of the likelihood of the hypothesized path. The basic idea is to allow some branches of the search tree to be shared, which effectively transforms the tree into a network.