Voice-melody transcription under a speech recognition framework
Abstract
This paper presents a robust voice-melody transcription system based on a speech recognition framework. While many previous voice-melody transcription systems have relied on non-statistical approaches, statistical recognition technology can potentially achieve more robust results. A cepstrum-based acoustic model is employed to avoid the hard decisions required by explicit voiced-unvoiced segmentation and pitch extraction, and a key-independent 4-gram language model is employed to capture the prior probabilities of different melodic sequences. Evaluations are conducted in terms of both note recognition error rate and end-to-end Query-by-Humming performance, and the results are compared with three other voice-melody transcription systems. Experiments show that our system achieves state-of-the-art performance: it is substantially more robust than the other systems on noisy data and close to the best of the systems on the clean data set.
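To illustrate the idea of a key-independent n-gram language model over melodic sequences, the sketch below models pitch intervals (semitone differences between consecutive notes) rather than absolute pitches, so that transposing a melody to another key leaves its probability unchanged. This is only a minimal illustrative sketch under assumed design choices (the class name, add-one smoothing, and the interval vocabulary bounds are assumptions), not the implementation described in the paper.

```python
from collections import defaultdict


class IntervalNGramLM:
    """Minimal sketch of a key-independent n-gram model over melodic intervals.

    Assumptions (not from the paper): add-one smoothing and an interval
    vocabulary limited to +/- two octaves.
    """

    def __init__(self, n=4, min_interval=-24, max_interval=24):
        self.n = n
        self.vocab = range(min_interval, max_interval + 1)
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def to_intervals(midi_notes):
        # Absolute MIDI pitches -> key-independent interval sequence.
        return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

    def train(self, melodies):
        # Count (n-1)-interval contexts followed by the next interval.
        for notes in melodies:
            seq = self.to_intervals(notes)
            for i in range(len(seq)):
                context = tuple(seq[max(0, i - self.n + 1):i])
                self.counts[context][seq[i]] += 1

    def prob(self, context, interval):
        # Add-one (Laplace) smoothed probability of the next interval.
        context = tuple(context[-(self.n - 1):])
        c = self.counts[context]
        total = sum(c.values())
        return (c[interval] + 1) / (total + len(self.vocab))


# Example: the same tune in two keys gets identical interval sequences.
lm = IntervalNGramLM(n=4)
lm.train([[60, 62, 64, 65, 67], [67, 69, 71, 72, 74]])
print(lm.prob([2, 2, 1], 2))
```

Because the model conditions only on intervals, a hummed query sung in a different key from the training melodies is scored the same way, which is the sense in which the language model is key-independent.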