Speech Processing

The research in this area is supervised by Dr. Beat Pfister.

The research of the speech processing group has focused on text-to-speech (TTS) synthesis and speech recognition. These two topics share a peculiarity that sets them apart from most other topics in speech processing, such as speech coding or speaker recognition: both involve the two surface structures of natural language, namely text and speech. The aim is to transform one surface structure into the other, i.e., text into speech or vice versa.

It is commonly acknowledged that this cannot be achieved without linguistic knowledge of the language(s) concerned. In order to recognize speech in a specific language, we must know the set of phonemes of this language, the words that belong to it and their pronunciation, which words can form expressions and sentences, etc. Our intention is always to use the type of representation that matches the type of knowledge to be represented.

The major research directions we are pursuing are:

  • Rule-Based Language Model for Speech Recognition,
  • Improving Speech Recognition Through Linguistics,
  • Prosody Control in Polyglot TTS Synthesis,
  • Speaker Verification.

Further information can be found on the home page of the speech processing group or in a possibly outdated flyer.

Selected publications since 2000: Publications Speech Processing.