NASY: Improving the naturalness in TTS synthesis

The aim of this project was to improve those parts of our TTS system SVOX that are primarily responsible for its current lack of speech quality and naturalness. The work concentrated mainly on the following two fields:

The first field was the realization of prosody control for all types of sentences, such as statements, questions, commands, and exclamations, and for various speaking styles, such as reading aloud, dialogue speech, and spelling.

The second major field was the preparation of a corpus of synthesis units, which is particularly demanding in the case of mixed-lingual TTS synthesis (such a corpus has been realized in project POSSY/TTS'99). Since the prosodic modification and concatenation of the synthesis units can heavily impair the quality and naturalness of the synthetic speech, improved signal processing techniques were also investigated.

The main achievements of this project are as follows. An improved fundamental frequency control based on a neural network has been realized. It predicts highly natural speech melodies for all sentence types, i.e. for declarative sentences, various types of questions, commands, and emphasis. Furthermore, a new method for modifying the duration and the fundamental frequency of speech signals has been developed.

The project results have been presented at several workshops ([Jan00], [Tra00a], [Tra00b], and [TJ01]).

Supported by: Bundesamt für Bildung und Wissenschaft

In collaboration with: This project was carried out in the collaborative framework of COST 258.

