The Metamorphic Algorithm: A Speaker Mapping Approach to Data Augmentation
Abstract
Large-vocabulary, speaker-dependent speech recognition systems adapt to the acoustic peculiarities of each new speaker using enrollment data provided by that speaker. Because the amount of data required grows with the sophistication of the underlying acoustic models, enrollment can become lengthy. To shorten it, it is desirable to make use of previously acquired speech data. We describe a data augmentation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired data of the reference speaker onto the space of the new speaker. The performance of the resulting procedure, dubbed the metamorphic algorithm, is illustrated on an isolated-utterance speech recognition task with a vocabulary of 20 000 words. Results show that the metamorphic algorithm can substantially reduce the word error rate when only a limited amount of enrollment data is available. Alternatively, it achieves a level of performance comparable to that obtained when a much greater amount of enrollment data is collected from the new speaker. In addition, it can be used to track spectral evolution over time, thus providing a possible means for robust speaker self-adaptation. © 1994 IEEE
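To make the idea of a piecewise linear speaker mapping concrete, the following is a minimal sketch and not the metamorphic algorithm itself, whose details the abstract does not give. It assumes that feature vectors from both speakers have already been grouped by a class label (for example, a phone), and it fits one affine map per class and per feature dimension by matching means and standard deviations. The function names `fit_class_maps` and `transform`, the per-dimension (diagonal) form of the map, and the toy data are all illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_class_maps(ref_frames, new_frames):
    """Fit one per-dimension affine map per class (illustrative assumption).

    ref_frames / new_frames: dict mapping a class label to a list of
    feature vectors (lists of floats) from the reference / new speaker.
    Returns {label: [(scale, offset), ...]} for labels seen in both.
    """
    maps = {}
    for label in ref_frames.keys() & new_frames.keys():
        params = []
        # zip(*frames) iterates over feature dimensions instead of frames.
        for ref_dim, new_dim in zip(zip(*ref_frames[label]),
                                    zip(*new_frames[label])):
            ref_sd = pstdev(ref_dim)
            # Fall back to a pure shift if the reference dimension is constant.
            scale = pstdev(new_dim) / ref_sd if ref_sd > 0 else 1.0
            offset = mean(new_dim) - scale * mean(ref_dim)
            params.append((scale, offset))
        maps[label] = params
    return maps


def transform(label, vector, maps):
    """Map one reference-speaker vector into the new speaker's space."""
    return [s * x + o for x, (s, o) in zip(vector, maps[label])]


# Toy data (hypothetical): two classes, two feature dimensions.
ref = {"a": [[0.0, 1.0], [2.0, 3.0]], "b": [[10.0, 0.0], [12.0, 4.0]]}
new = {"a": [[1.0, 1.0], [5.0, 2.0]], "b": [[8.0, 1.0], [9.0, 3.0]]}

maps = fit_class_maps(ref, new)
# Reference data mapped into the new speaker's space; it can then be
# pooled with the new speaker's own enrollment data.
augmented = [transform("a", v, maps) for v in ref["a"]]
```

The map is linear within each class but differs from class to class, which is what makes the overall transformation piecewise linear. A full-matrix map per class would capture correlations between feature dimensions at the cost of needing more enrollment data per class.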