Computer Engineering and Networks Laboratory (TIK)
 

Publication Details for PhD Thesis "A Rule-based Language Model for Speech Recognition"

 


Authors: Tobias Kaufmann
Group: Computer Engineering
Type: PhD Thesis
Title: A Rule-based Language Model for Speech Recognition
Year: 2009
Month: October
Pub-Key: Kau09
Keywords: speech processing
ETH Nbr: 18700
Pub Nbr: 109
School: ETH
Abstract: Large-vocabulary continuous speech recognition relies on prior knowledge about a natural language to complement the acoustic models. N-grams, the most widely used language models, largely disregard the structure of natural language. In the last decade, progress in the field of statistical parsing has led to the development of more powerful statistical language models that also consider syntactic structure. These models have shown that information about the structure of natural language can significantly improve the accuracy of automatic speech recognition.

Unlike statistical parsers, formal grammars are designed to discriminate between grammatical and ungrammatical sentences. Formal grammars have been successfully applied to narrow-domain natural language understanding tasks, but not to broad-domain speech recognition. In fact, it appears that grammar-based approaches to speech recognition do not easily scale up to broad domains. Some of the main difficulties that have to be faced are lack of precision, lack of coverage and a reduced benefit from pure grammaticality information.

The aim of this thesis is to demonstrate that hard linguistic constraints represented in a formal grammar can improve automatic speech recognition on a broad domain. To this end, a novel approach to integrating formal grammars into large-vocabulary continuous speech recognition is proposed. This approach is based on a discriminative reranking scheme that considers syntactic features of the N best speech recognition hypotheses. The syntactic features are extracted by means of a precise formal grammar (a Head-driven Phrase Structure Grammar) with a stochastic disambiguation component.

The feasibility of our approach is verified experimentally. For a German broadcast news transcription task, we report a statistically significant reduction of the word error rate by 1.3% absolute (9.7% relative) compared to a competitive baseline system, namely the LIMSI German broadcast news transcription system. To our knowledge, this is the first significant improvement on a broad-domain speech recognition task due to a formal grammar. Different properties of the proposed approach are investigated in a series of additional experiments.
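
The reranking scheme described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the `Hypothesis` class, the feature names (`full_parse`, `parse_failed`), the example sentences, and the hand-set weights are all hypothetical. In the thesis the syntactic features come from an HPSG parser and the weights are trained discriminatively; here both are simply assumed as given.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: str
    # Combined recognizer log score (acoustic + n-gram); assumed to be given.
    recognizer_score: float
    # Syntactic features; in practice these would be extracted by a parser.
    syntactic_features: dict = field(default_factory=dict)


def rerank(hypotheses, weights, recognizer_weight=1.0):
    """Sort hypotheses by a linear combination of recognizer score and
    weighted syntactic features, best first."""
    def total(h):
        syntactic = sum(
            weights.get(name, 0.0) * value
            for name, value in h.syntactic_features.items()
        )
        return recognizer_weight * h.recognizer_score + syntactic

    return sorted(hypotheses, key=total, reverse=True)


# Toy 2-best list: the recognizer slightly prefers the ungrammatical hypothesis.
n_best = [
    Hypothesis("der Mann sehen den Hund", -10.0, {"parse_failed": 1.0}),
    Hypothesis("der Mann sieht den Hund", -10.5, {"full_parse": 1.0}),
]

# Hand-set weights for illustration only; a real system would learn them.
weights = {"full_parse": 1.0, "parse_failed": -1.0}

best = rerank(n_best, weights)[0]
print(best.words)
```

In this toy setup the syntactic features outweigh the small recognizer-score gap, so the grammatical hypothesis moves to the top of the list.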
Location: Zurich

 
