Joint morphological-lexical language modeling (JMLLM) for Arabic
Abstract
Language modeling for inflected languages such as Arabic poses new challenges for speech recognition due to rich morphology. The rich morphology results in large increases in perplexity and out-of-vocabulary (OOV) rate. In this study, we present a new language modeling method that takes advantage of Arabic morphology by combining morphological segments with the underlying lexical items and additional available information sources with regards to morphological segments and lexical items within a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates. Preliminary experiments detailed in this paper show satisfactory improvements over word and morpheme based trigram language models and their interpolations. © 2007 IEEE.