Publications

GH16
J.-P. Goldman, P.-E. Honnet, et al. The SIWIS database: a multilingual speech database with acted emphasis. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (175KB)

TGPV16
N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool. Deep convolutional neural networks and data augmentation for acoustic event recognition. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (363KB)

TNP16
N. Takahashi, T. Naghibi, and B. Pfister. Automatic pronunciation generation by utilizing a semi-supervised deep neural networks. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (359KB)

Nag15
T. Naghibi. Towards Robust Audio-Visual Speech Recognition. PhD thesis, No. 22867, Computer Engineering and Networks Laboratory, ETH Zurich, 2015. PDF (1885KB)

NHP15
T. Naghibi, S. Hoffmann, and B. Pfister. A semidefinite programming based search strategy for feature selection with mutual information measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8):1529-1541, 2015. PDF (451KB)

GCG14
P. Garner, R. Clark, J.-P. Goldman, et al. Translation and prosody in Swiss languages. In Proceedings of 3rd Swiss Workshop on Prosody: Nouveaux cahiers de linguistique française, pages 211-221, 2014. PDF (370KB)

Hof14
S. Hoffmann. A Data-driven Model for the Generation of Prosody from Syntactic Sentence Structures. PhD thesis, No. 21991, Computer Engineering and Networks Laboratory, ETH Zurich, 2014. PDF (999KB)

LH14
H. Liang and S. Hoffmann. Capturing Speaker-Independent Prosodic Information by Syntax Tree-Based Prosody Modelling. Internal Report of the SNSF project SIWIS. TIK, ETH Zurich, June 2014.

Lia14
H. Liang. Analysis of duration: Are emphases achieved differently in different languages in terms of duration?, November 2014.

Nag14
T. Naghibi. Modality Weighting for Audio-Visual Fusion in Speech Recognition. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2014.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_14_report.pdf). PDF (237KB)

NP14
T. Naghibi and B. Pfister. A boosting framework on grounds of online learning. In Proceedings of NIPS, Montréal (Canada), December 2014.

HP13
S. Hoffmann and B. Pfister. Text-to-speech alignment of long recordings using universal phone models. In Proceedings of Interspeech, pages 1520-1524, Lyon (France), September 2013. PDF (124KB)

Nag13
T. Naghibi. Robust Feature Extraction for Bimodal Speech Recognizer. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2013.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_13_report.pdf). PDF (120KB)

NHP13a
T. Naghibi, S. Hoffmann, and B. Pfister. Convex approximation of the NP-hard search problem in feature subset selection. In Proceedings of ICASSP, pages 3273-3277, Vancouver (Canada), May 2013. PDF (167KB)

NHP13b
T. Naghibi, S. Hoffmann, and B. Pfister. An efficient method to estimate pronunciation from multiple utterances. In Proceedings of Interspeech, pages 1951-1955, Lyon (France), September 2013. PDF (170KB)

Ewe12
T. Ewender. Automatic Selection of Speech Segments for Concatenative Speech Synthesis. PhD thesis, No. 20828, Computer Engineering and Networks Laboratory, ETH Zurich, 2012. PDF (2431KB)

HP12
S. Hoffmann and B. Pfister. Employing sentence structure: Syntax trees as prosody generators. In Proceedings of Interspeech, Portland, Oregon (USA), September 2012. PDF (984KB)

KP12
T. Kaufmann and B. Pfister. Syntactic language modeling with formal grammars. Speech Communication (Elsevier), 54(6):715-731, July 2012. PDF (382KB)

Nag12
T. Naghibi. Multi-Channel Audio Processing for Human Machine Interaction Applications. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2012.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_12_report.pdf). PDF (127KB)

NP12a
T. Naghibi and B. Pfister. An approach to prevent adaptive beamformers from cancelling the desired signal. In Proceedings of ICASSP, pages 205-208, Kyoto (Japan), March 2012. IEEE. PDF (194KB)

NP12b
T. Naghibi and B. Pfister. Beamformer design for nonstationary signals by means of interfrequency correlations. In Proceedings of SAM, pages 261-264, Hoboken, NJ (USA), June 2012. PDF (487KB)

EP11
T. Ewender and B. Pfister. Automatically creating a diphone set from a speech database. In Proceedings of Interspeech, pages 2169-2172, Florence (Italy), August 2011. PDF (159KB)

Ger11
M. Gerber. Speech Recognition Techniques for Languages with Limited Linguistic Resources. PhD thesis, No. 19507, Computer Engineering and Networks Laboratory, ETH Zurich, 2011. PDF (1264KB)

GKP11
M. Gerber, T. Kaufmann, and B. Pfister. Extended Viterbi algorithm for optimized word HMMs. In Proceedings of ICASSP, pages 4932-4935, Prague (Czech Republic), May 2011. PDF (220KB)

Nag11
T. Naghibi. VSHMI Experimentation System. Annual Report of the SNSF project no. 200021 130224/1. TIK, ETH Zurich, March 2011. PDF (267KB)

EP10
T. Ewender and B. Pfister. Accurate pitch marking for prosodic modification of speech segments. In Proceedings of Interspeech, pages 178-181, Makuhari (Japan), September 2010. PDF (291KB)

Hof10
S. Hoffmann. Preliminary Study of Prosody in Foreign Language Inclusions. Report for ETH project no. TH-22 07-2. Speech Processing Group, TIK, ETH Zurich, June 2010. PDF (13688KB)

HP10
S. Hoffmann and B. Pfister. Fully automatic segmentation for prosodic speech corpora. In Proceedings of Interspeech, pages 1389-1392, Makuhari (Japan), September 2010. PDF (204KB)

KP10
T. Kaufmann and B. Pfister. Semi-automatic extension of morphological lexica. In Workshop Computational Linguistics - Applications, Wisla (Poland), 2010. PDF (117KB)

PN10
B. Pfister and T. Naghibi. Concept of the VSHMI Experimentation System. Report of the SNSF project no. 200021 130224/1. TIK, ETH Zurich, June 2010. PDF

EHP09
T. Ewender, S. Hoffmann, and B. Pfister. Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis. In Proceedings of Interspeech, pages 100-103, Brighton (United Kingdom), September 2009. demo examples PDF (771KB)

Hof09
S. Hoffmann. Automatic Phone Segmentation. Progress report of project no. TH-22 07-2. Speech Processing Group, TIK, ETH Zurich, September 2009. PDF (5979KB)

Kau09
T. Kaufmann. A Rule-based Language Model for Speech Recognition. PhD thesis, No. 18700, Computer Engineering and Networks Laboratory, ETH Zurich, 2009. PDF (897KB)

KEP09
T. Kaufmann, T. Ewender, and B. Pfister. Improving broadcast news transcription with a precision grammar and discriminative reranking. In Proceedings of Interspeech, pages 356-359, Brighton (United Kingdom), September 2009. PDF (264KB)

Rom09a
H. Romsdorfer. Polyglot speech prosody control. In Proceedings of Interspeech, pages 488-491, Brighton (United Kingdom), September 2009. PDF (482KB)

Rom09b
H. Romsdorfer. Polyglot Text-to-Speech Synthesis: Text Analysis & Prosody Control. PhD thesis, No. 18210, ETH Zurich. Shaker Verlag Aachen (ISBN 978-3-8322-8090-1), February 2009. PDF (1223KB)

Rom09c
H. Romsdorfer. Weighted neural network ensemble models for speech prosody control. In Proceedings of Interspeech, pages 492-495, Brighton (United Kingdom), September 2009. PDF (606KB)

GP08
M. Gerber and B. Pfister. Fast search for common segments in speech signals for speaker verification. In Proceedings of Interspeech, pages 375-378, Brisbane (Australia), September 2008. PDF (204KB)

KP08
T. Kaufmann and B. Pfister. Applying a grammar-based language model to a simplified broadcast-news transcription task. In Proceedings of ACL, pages 106-113, Columbus (Ohio), June 2008. PDF (464KB)

PK08
B. Pfister and T. Kaufmann. Sprachverarbeitung: Grundlagen und Methoden der Sprachsynthese und Spracherkennung. Springer Verlag (ISBN: 978-3-540-75909-6), 2008.

Beu07
R. Beutler. Improving Speech Recognition through Linguistic Knowledge. PhD thesis, No. 17039, Computer Engineering and Networks Laboratory, ETH Zurich, January 2007. PDF (2135KB)

GBP07
M. Gerber, R. Beutler, and B. Pfister. Quasi text-independent speaker verification based on pattern matching. In Proceedings of Interspeech, pages 1993-1996, Antwerp, August 2007. PDF (658KB)

GKP07
M. Gerber, T. Kaufmann, and B. Pfister. Perceptron-based class verification. In Proceedings of NOLISP (ISCA Workshop on non linear speech processing), Paris, May 2007. PDF (170KB)

KP07
T. Kaufmann and B. Pfister. Applying licenser rules to a grammar with continuous constituents. In Stefan Müller, editor, Proceedings of the 14th International Conference on Head-Driven Phrase Structure Grammar, pages 150-162, Stanford, 2007. CSLI Publications. PDF (73KB)

RP07
H. Romsdorfer and B. Pfister. Text analysis and language identification for polyglot text-to-speech synthesis. Speech Communication (Elsevier), 49(9):697-724, September 2007. PDF (563KB)

RP06
H. Romsdorfer and B. Pfister. Character stream parsing of mixed-lingual text. In ISCA Tutorial and Research Workshop on Multilingual Speech and Language Processing (MultiLing 2006), Stellenbosch (South Africa), April 2006. PDF (122KB)

BKP05a
R. Beutler, T. Kaufmann, and B. Pfister. Integrating a non-probabilistic grammar into large vocabulary continuous speech recognition. In Proceedings of the IEEE ASRU 2005 Workshop, pages 104-109, San Juan (Puerto Rico), November 2005. PDF (124KB)

BKP05b
R. Beutler, T. Kaufmann, and B. Pfister. Using rule-based knowledge to improve LVCSR. In Proceedings of ICASSP, pages 829-832, Philadelphia (USA), March 2005. PDF (204KB)

GP05
M. Gerber and B. Pfister. Quasi text-independent speaker verification with neural networks. MLMI'05 Workshop, Edinburgh (United Kingdom), July 2005. PDF (337KB)

Kau05
T. Kaufmann. Evaluation von Grammatikformalismen in Hinblick auf die Anwendung in der Spracherkennung. Zwischenbericht zum Nationalfonds-Projekt 105211-104078/1: Rule-Based Language Model for Speech Recognition. Institut TIK, ETH Zürich, September 2005.

RP05
H. Romsdorfer and B. Pfister. Phonetic labeling and segmentation of mixed-lingual prosody databases. In Proceedings of Interspeech, pages 3281-3284, Lisbon (Portugal), September 2005. PDF (224KB)

RPB05
H. Romsdorfer, B. Pfister, and R. Beutler. A mixed-lingual phonological component which drives the statistical prosody control of a polyglot TTS synthesis system. In S. Bengio and H. Bourlard, editors, Machine Learning for Multimodal Interaction, pages 263-276. Springer-Verlag Heidelberg, January 2005. PDF (237KB)

Beu04
R. Beutler. Open vocabulary CSR by linguistic knowledge. COST 278 workshop, Mons (Belgium), January 2004.

NP04
U. Niesen and B. Pfister. Speaker verification by means of ANNs. In Proceedings of ESANN, Bruges (Belgium), pages 145-150, April 2004. PDF (63KB)

RP04
H. Romsdorfer and B. Pfister. Multi-context rules for phonological processing in polyglot TTS synthesis. In Proceedings of Interspeech - ICSLP, pages 737-740, Jeju Island (Korea), October 2004. PDF (115KB)

Beu03
R. Beutler. Improve continuous speech recognition thru linguistic knowledge. COST 278 workshop, Barcelona, February 2003.

BP03
R. Beutler and B. Pfister. Integrating statistical and rule-based knowledge for continuous German speech recognition. In Proceedings of Eurospeech, pages 937-940, Geneva, September 2003. PDF (174KB)

Gla03
U. Glavitsch. Speaker Normalization with Respect to F0: a Perceptual Approach. IM2.SP Project Report. TIK/ETH Zurich, December 2003. PDF (206KB)

PB03
B. Pfister and R. Beutler. Estimating the weight of evidence in forensic speaker verification. In Proceedings of Eurospeech, pages 701-704, Geneva, September 2003. PDF (88KB)

PR03
B. Pfister and H. Romsdorfer. Mixed-lingual text analysis for polyglot TTS synthesis. In Proceedings of Eurospeech, pages 2037-2040, Geneva, September 2003. PDF (52KB)

Beu02
R. Beutler. Recognition of continuously spoken German language using linguistic knowledge. COST 278 workshop, Eindhoven, August 2002.

Leh02
G. Lehtinen. Sprecheradaptation und Out-of-Vocabulary-Modell. Bericht zum Projekt: Einsatz von Spracherkennung in der SAPH. Institut TIK, ETH Zürich, April 2002.

PW02
B. Pfister, E. Wehrli et al. Lexical and Syntactic Analysis of Mixed-Lingual Sentences for Text-to-Speech. Final Report of SNSF Project No 21-59396.99. Institut TIK, ETH Zürich, November 2002.

Pfi01
B. Pfister. Personenidentifizierung anhand der Stimme. Kriminalistik, 55. Jahrgang, Heft 4, S. 287-292 (Fachzeitschrift des Hüthig Verlags, Heidelberg), April 2001. PDF (338KB)

PL01
B. Pfister and G. Lehtinen. Schlussbericht für das Projekt COST249: Erkennung kontinuierlicher Sprache über das Telefon. Institut TIK, ETH Zürich, Januar 2001. PostScript (210KB)

TJ01
C. Traber and V. Jantzen. The SVOX TTS System. COST258 workshop, Prague, May 2001.

Jan00
V. Jantzen. Neural network-based pitch control for various sentence types. COST258 workshop, Stockholm, April 2000. PDF (176KB)

JWLL00
F.T. Johansen, N. Warakagoda, B. Lindberg, G. Lehtinen, et al. The COST249 SpeechDat multilingual reference recogniser. In Proceedings of LREC'2000 (Conference on Language, Resources and Evaluation), Athens (Greece), June 2000. PostScript (119KB)

LJWL00
B. Lindberg, F.T. Johansen, N. Warakagoda, G. Lehtinen, et al. A noise robust multilingual reference recogniser based on SpeechDat(II). In Proceedings of ICSLP, Beijing (China), October 2000. PostScript (60KB)

LS00
G. Lehtinen, S. Safra, et al. IDAS: Interactive Directory Assistance Services. In Proceedings of the COST249 ISCA Workshop on Voice Operated Telecom Services, pages 51-54, Gent (Belgium), May 2000. PostScript (128KB)

Tra00a
C. Traber. Das Sprachsynthesesystem SVOX. 11. Konferenz Elektronische Sprachsignalverarbeitung (ESSV 2000), Cottbus, September 2000.

Tra00b
C. Traber. Spectral smoothing of diphone boundary mismatches. COST258 workshop, Stockholm, April 2000.

TH99
C. Traber, K. Huber, et al. From multilingual to polyglot speech synthesis. In Proceedings of Eurospeech, pages 835-838, Budapest, September 1999. PDF

HPT98
K. Huber, B. Pfister, and Ch. Traber. POSSY: Ein Projekt zur Realisierung einer polyglotten Sprachsynthese. In DAGA-Tagungsband, S. 392-393, 1998. PostScript (33KB)

Hub98a
K. Huber. Swiss German Polyphone - Schlussbericht. TIK-Report Nr.48. Institut TIK, ETH Zürich, Juni 1998.

Hub98b
K. Huber. Zusammenstellung der Trägerwörter für Deutsch und Italienisch. Bericht Nr.1 zum Projekt TTS'97. Institut TIK, ETH Zürich, Juni 1998.

Leh98
G. Lehtinen. Einsatz des konfigurierbaren Worterkenners WOROV. Bericht Nr.2 zum Projekt: Reverse Directory Service. Institut TIK, ETH Zürich, Januar 1998.

LS98a
G. Lehtinen and S. Safra. Generation and selection of pronunciation variants for a flexible word recognizer. In Proceedings of the ESCA Workshop: Modeling Pronunciation Variation for ASR, pages 67-71, Rolduc (The Netherlands), May 1998. PostScript (98KB)

LS98b
G. Lehtinen and S. Safra. Generierung von Aussprachevariantenregeln und Verbesserung von Subwortmodellen für einen flexiblen Worterkenner. In DAGA-Tagungsband, S. 400-401, March 1998. PostScript (124KB)

PH98
B. Pfister, K. Huber et al. Das Sprachsynthesesystem SVOX und seine praktische Anwendbarkeit. In DAGA-Tagungsband, S. 338-339, 1998. PostScript (107KB)

Rie98
M. Riedi. Controlling Segmental Duration in Speech Synthesis Systems. PhD thesis, No. 12487, Computer Engineering and Networks Laboratory, ETH Zurich (TIK-Schriftenreihe Nr. 26, ISBN 3-906469-05-0), February 1998. PostScript (3168KB)

Saf98
S. Safra. A Parsing Strategy in ARCOS-G. Talk at the COST249 meeting in Porto, Portugal, February 12-13, 1998. (printed in Final Report of COST249). PDF (52KB)

SLH98
S. Safra, G. Lehtinen, and K. Huber. Modeling pronunciation variations and coarticulation with finite-state transducers in CSR. In Proceedings of the ESCA Workshop: Modeling Pronunciation Variation for ASR, pages 125-130, Rolduc (The Netherlands), May 1998. PostScript (197KB)

LP97
G. Lehtinen, B. Pfister, et al. Reverse Directory Service. Projektbericht Nr.1, Institut TIK, ETH Zürich, September 1997.

Rie97
M. Riedi. Modeling segmental duration with multivariate adaptive regression splines. In Proceedings of Eurospeech, pages 2627-2630, Rhodes (Greece), September 1997. PostScript (162KB)

Saf97
S. Safra. Das Experimentalsystem ARCOS: Konzepte, Aufbau, Methoden. Zwischenbericht zum Projekt ARCOS-G. Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Juni 1997.

Tra97
C. Traber. Improvements of the Morpho-Syntactic Analysis of the SVOX Text-to-Speech System. Projektbericht, Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Mai 1997.

Hut96
H.-P. Hutter. Comparison of Classic and Hybrid HMM Approaches to Speech Recognition over Telephone Lines. PhD thesis, No. 11662, Computer Engineering and Networks Laboratory, ETH Zurich (TIK-Schriftenreihe Nr. 15, ISBN 3 7281 2424 9), October 1996.

Pfi96a
B. Pfister. High-quality prosodic modification of speech signals. In Proceedings of ICSLP, pages 2446-2449, Philadelphia, October 1996. demo examples PDF (822KB)

Pfi96b
B. Pfister. Prosodische Modifikation von Sprachsegmenten für die konkatenative Sprachsynthese. Diss. Nr. 11331, TIK-Schriftenreihe Nr. 11 (ISBN 3 7281 2316 1), ETH Zürich, März 1996. PostScript (2987KB)

Saf96
S. Safra. Chartparsing in Continuous Speech Recognition. Talk at the COST249 meeting in Kosice, Slovakia, February 29, 1996. (printed in Final Report of COST249). PDF (99KB)

Hut95
H.-P. Hutter. Comparison of a new hybrid connectionist-SCHMM approach with other hybrid approaches for speech recognition. In Proceedings of ICASSP. IEEE, 1995. PDF (427KB)

LP95
G. Lehtinen and B. Pfister. Portierung des ARA-Systems auf die SparcStation-Plattform von Sun Microsystems. Bericht Nr.3 zum Projekt Realisation einer automatischen Rufnummernauskunft. Institut TIK, ETH Zürich, Oktober 1995.

Pfi95
B. Pfister. The SVOX Text-to-Speech System. Laboratory TIK, ETH Zurich, September 1995. PDF (109KB)

Rie95
M. Riedi. A neural network-based model of segmental duration for speech synthesis. In Proceedings of Eurospeech, pages 599-602, Madrid (Spain), September 1995. PDF (365KB)

Saf95
S. Safra. Handling Pronunciation Variants and Co-articulation with Finite State Transducers. Talk at the COST249 meeting in Nancy, France (printed in Final Report of COST249), March 6/7, 1995. PDF (20KB)

Tra95
C. Traber. SVOX: The Implementation of a Text-to-Speech System for German. PhD thesis, No. 11064, Computer Engineering and Networks Laboratory, ETH Zurich, TIK-Schriftenreihe Nr. 7 (ISBN 3 7281 2239 4), March 1995. PDF (927KB) PostScript (2271KB)

HP94
H.-P. Hutter and B. Pfister. Neuartiger hybrider SKHMM/KNN-Ansatz für die Spracherkennung. In Studientexte zur Sprachkommunikation, Heft 11, S. 90-97. TU Berlin, Oktober 1994.

Hut94
H.-P. Hutter. Recognizer for isolated German digits over telephone lines: RECO. In Final Report of COST232, 1994.

PLC94
B. Pfister, G. Lehtinen, and D. Christnach. ARA-V1: Systembeschreibung und Auswertung eines Testeinsatzes. Bericht Nr.2 zum Projekt Realisation einer automatischen Rufnummernauskunft. Institut für Elektronik, ETH Zürich, September 1994.

PS94
B. Pfister and A. Schaub. Automatische Rufnummern-Auskunft. Technische Mitteilungen Telecom PTT, Mai 1994.

Saf94
S. Safra. Experimentalsystem zur Erkennung kontinuierlicher Sprache. Erster Bericht zum Projekt ARCOS-G. Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Februar 1994.

SP94
S. Safra and B. Pfister. ARCOS-G: Ein Experimentalsystem zur Erkennung kontinuierlicher deutscher Sprache. In Studientexte zur Sprachkommunikation, Heft 11, S. 174-181. TU Berlin, Oktober 1994. PostScript (613KB)

Tra93
C. Traber. Syntactic processing and prosody control in the SVOX TTS system for German. In Proceedings of Eurospeech, pages 2099-2102, September 1993.

Hub91
K. Huber. Messung und Modellierung der Segmentdauer für die Synthese deutscher Lautsprache. Diss. Nr. 9535, Institut für Elektronik, ETH Zürich, Juli 1991.

Hub90
K. Huber. A statistical model of duration control for speech synthesis. In Proc. of the EUSIPCO, Barcelona, September 1990.

Rus90
T. Russi. A Framework for Syntactic and Morphological Analysis and its Application in a Text-to-Speech System. PhD thesis, No. 9328, Electronics Laboratory, ETH Zurich, December 1990.




Last updated: Thu Oct 27 14:58:12 CEST 2016 by: Beat Pfister