Accessing speech documents on smartphones
Abstract
This paper introduces BBSearch, an experimental system for exploring the challenges of ubiquitous access to recorded speech data. BBSearch applies information retrieval (IR) techniques to transcripts obtained by automatic speech recognition, and it aims to provide a uniform user experience across platforms. To provide identical search functionality and document ranking, all BBSearch applications use the same IR library for indexing and retrieval, namely Apache Lucene. For Java-enabled mobile platforms, BBSearch uses our J2ME port of Lucene, called LuceneME. This paper evaluates the resource requirements of LuceneME when it is used for Boolean searches and for supporting the podcast navigation GUI. On a BlackBerry smartphone, a diverse set of queries against a 70-hour corpus complete in less than 3 seconds and use less than 2 MB of memory. The results of the evaluation validate our design and warrant extending BBSearch to less capable cellphones, to larger corpora, or to more complex search capabilities.