Publication
ICASSP 2007
Conference paper

Database mining for flexible concatenative text-to-speech

View publication

Abstract

In this paper we explore mining a concatenative text-to-speech database to exploit subtle, naturally-occurring stylistic and contextual variability for runtime synthesis. By making a desired style or context known to the search during synthesis, the cost function can be biased toward finding units which satisfy these additional criteria. Having the ability to bias the output of the synthesizer towards a particular voice quality, or other characteristic such as speaking rate, increases its flexibility and potential value. In this paper we illustrate the approach to synthesizing subtle speech variation by focusing on three aspects: prosodic structure (phrase-finalness), prosodic prominence (prosodic accent), and voice quality (breathiness). Target values for the first two of these are automatically generated, while the target value for breathiness is specified by the user. We present results which indicate the value of distinguishing our data along these dimensions, and discuss possible improvements and new uses in the future. © 2007 IEEE.

Date

Publication

ICASSP 2007

Authors

Share