Computer Engineering and Networks Laboratory (TIK)
 

Publication Details for Techreport "Speaker normalization with respect to F0: a perceptual approach"

 


 

Authors: Ulrike Glavitsch
Group: Computer Engineering
Type: Techreport
Title: Speaker normalization with respect to F0: a perceptual approach
Year: 2003
Month: December
Pub-Key: Gla03a
Keywords: SPE
Rep Nbr: 185
Abstract: A speaker normalization scheme that uses explicit knowledge of acoustic phonetics is presented. The scheme warps the frequency axis linearly in critical band rate with respect to the fundamental frequency F0. It thus adapts immediately to a new speaker, which is an advantage over commonly used schemes. Variants with different values of F0 and different parameters have been evaluated on several tasks of SpeechDat(II). The results show significant performance improvements on three tasks with monophone models; the most prominent result is a 44.5% reduction in word error rate (WER) for an isolated digit task. However, the results achieved with tied triphone models are very modest. It is argued that the normalization scheme may still be correct but that the MFCC feature extraction erases its effect. Evidence is given for the need for a new feature extraction method that locates spectral peaks and ignores irrelevant portions of the spectrum.
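
The following is only an illustrative sketch of what a warp that is linear in critical band rate and driven by F0 could look like; it is not taken from the report. It assumes the simplest linear form, a constant shift on the Bark scale equal to the Bark distance between the speaker's F0 and a reference F0. The Hz-to-Bark conversion uses Traunmüller's approximation without its low- and high-end corrections. The function names and the reference value of 150 Hz are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Critical band rate in Bark (Traunmueller's approximation, no end corrections)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark, valid for z < 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def warp_frequency(f_hz: float, f0_hz: float, f0_ref_hz: float = 150.0) -> float:
    """Shift f_hz on the Bark scale by the Bark distance between F0 and a reference F0.

    ASSUMPTION: a pure shift is one possible warp that is linear in critical
    band rate; the report's actual scheme and parameters may differ.
    """
    shift = hz_to_bark(f0_hz) - hz_to_bark(f0_ref_hz)
    return bark_to_hz(hz_to_bark(f_hz) - shift)


if __name__ == "__main__":
    # A speaker with F0 above the reference has spectral content moved downward.
    for f in (500.0, 1000.0, 2000.0):
        print(f, round(warp_frequency(f, f0_hz=220.0), 1))
```

Because the shift depends only on an F0 estimate of the current utterance, a warp of this kind needs no enrollment data from the new speaker, which is consistent with the immediate adaptation described in the abstract.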

 
