

SLIDE 1

Signal Processing and Speech Communication Laboratory

Example-Based Automatic Phonetic Transcription

Language Resources and Evaluation Conference 2010 Christina Leitner, Martin Schickbichler, Stefan Petrik

Signal Processing and Speech Communication Laboratory Graz University of Technology, Austria

21 May 2010

  • C. Leitner, M. Schickbichler, S. Petrik

21 May 2010 page 1/21

SLIDE 2


Motivation

Why use automatic phonetic transcription?

Phonetic transcriptions are an essential resource in speech technologies and linguistics.

• Speech recognizers
• Speech synthesis
• Labelling of corpora

Manual transcription is time-consuming, expensive and error-prone.

SLIDE 3


Motivation (2)

Benefits of automatic phonetic transcription

• Creation of draft transcriptions
  • Correction by human transcribers instead of creation from scratch
  • Faster and cheaper
• More objective than the transcriptions of a team of human transcribers
• Consistency check of already transcribed material

SLIDE 4


Existing approaches

Mostly based on Hidden Markov Models (HMMs): “model-based”

“Aquarell” → HMM parameters + Viterbi alignment (+ optional language model) → [akvaˈʁɛl]
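The decoding step these model-based transcribers rely on is Viterbi alignment. A toy sketch with an invented 2-state HMM and made-up probabilities (not the paper's models), just to show the mechanics:

```python
# Toy Viterbi decoding over an invented 2-state HMM with made-up
# probabilities: the core alignment step of a model-based transcriber.
def viterbi(obs, states, start, trans, emit):
    """Most likely state sequence for a list of discrete observations."""
    # V[t][s] = (best probability of ending in state s at time t, best path)
    V = [{s: (start[s] * emit[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        prev = V[-1]
        V.append({})
        for s in states:
            p, path = max((prev[q][0] * trans[q][s], prev[q][1]) for q in states)
            V[-1][s] = (p * emit[s][o], path + [s])
    return max(V[-1].values())[1]

states = ["b", "e"]
start = {"b": 0.9, "e": 0.1}
trans = {"b": {"b": 0.6, "e": 0.4}, "e": {"b": 0.1, "e": 0.9}}
emit = {"b": {"lo": 0.8, "hi": 0.2}, "e": {"lo": 0.3, "hi": 0.7}}
print(viterbi(["lo", "lo", "hi", "hi"], states, start, trans, emit))
# ['b', 'b', 'e', 'e']
```

In a real transcriber the states are phone HMM states and the observations are MFCC frames scored by Gaussian mixtures, but the dynamic-programming recursion is the same.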

SLIDE 5


Our approach

Inspired by concatenative speech synthesis and template-based speech recognition: “example-based”

“Aquarell” → database of examples → candidate selection (opt.) → pattern comparison → synthesis → [akvaˈʁɛl]



SLIDE 9


Example-based APT

2 scenarios

Constrained phone recognition

Decision based on the audio sample and an intermediate transcription derived from the orthographic transcription by letter-to-sound rules: “Bäcker” /b e k 6/ → [be̞kɐ]

Unconstrained phone recognition

Decision based on the audio sample only → [be̞kɐ]
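The intermediate transcription in the constrained scenario comes from letter-to-sound rules. A toy rule set (these rules are invented for illustration, not the system's actual rules) that maps “Bäcker” to the broad SAMPA form /b e k 6/:

```python
# Toy letter-to-sound mapping (hypothetical rules, not the paper's rule set):
# the orthographic form is converted to a broad SAMPA transcription that then
# constrains the phone recognizer.
RULES = [            # ordered, longest grapheme first
    ("ck", ["k"]),
    ("ä",  ["e"]),
    ("er", ["6"]),   # -er reduces to /6/ (toy simplification)
    ("b",  ["b"]),
]

def letter_to_sound(word: str) -> list[str]:
    """Greedy left-to-right rule application."""
    word = word.lower()
    phones, i = [], 0
    while i < len(word):
        for graph, ph in RULES:
            if word.startswith(graph, i):
                phones += ph
                i += len(graph)
                break
        else:
            i += 1  # unknown letter: skip (a real system would back off)
    return phones

print(letter_to_sound("Bäcker"))  # ['b', 'e', 'k', '6']
```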

SLIDE 10


Example-based APT: system overview

Database of examples

• Three-phone speech samples
• Phone boundaries determined by forced alignment with the Hidden Markov Toolkit (HTK)
• 12 Mel-frequency cepstral coefficients (MFCCs) plus overall energy, delta and acceleration coefficients: 39 parameters per frame
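The 39-parameter frame layout can be illustrated in a few lines. The 13 static values per frame (12 MFCCs plus energy) are assumed to come from a standard frontend such as HTK; the two-point delta formula below is a common simplification of the regression formula such toolkits use:

```python
# Sketch of the 39-dimensional frame layout: 13 static values (12 MFCCs +
# energy) plus delta and acceleration (delta-delta) coefficients per frame.

def deltas(frames):
    """Simple two-point delta: d[t] = (x[t+1] - x[t-1]) / 2, edges clamped."""
    T = len(frames)
    out = []
    for t in range(T):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, T - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_features(static):
    d = deltas(static)   # delta coefficients
    dd = deltas(d)       # acceleration coefficients
    return [s + x + y for s, x, y in zip(static, d, dd)]

# 4 dummy frames of 13 static parameters each (12 MFCCs + energy)
static = [[float(t + i) for i in range(13)] for t in range(4)]
obs = stack_features(static)
print(len(obs), len(obs[0]))  # 4 frames, 39 parameters per frame
```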

Pattern matching

• Measure of similarity between two utterances
• Dynamic time warping (DTW) algorithm
• Segmental and open-begin-end DTW
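The pattern matching rests on DTW. A minimal full-path sketch (Euclidean local cost is an assumption; the system itself uses the segmental and open-begin-end variants, which relax the boundary constraints):

```python
# Minimal dynamic time warping between two feature sequences (lists of
# equal-dimension frames). Full-path DTW with Euclidean local cost, as a
# stand-in for the segmental / open-begin-end variants used by the system.
import math

def dtw(a, b):
    """Return the DTW alignment cost between frame sequences a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])   # local frame distance
            D[i][j] = cost + min(D[i - 1][j],      # step in a only
                                 D[i][j - 1],      # step in b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]

x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(dtw(x, y))  # 0.0: y is x with one frame repeated, which DTW absorbs
```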


SLIDE 15


Example-based APT: system overview (2)

Transcription synthesis

“Bäcker” /b e k 6/

(Figure: candidate phones from the best-matching three-phone samples, aligned per target position; selecting the most frequent phone in each column yields b e_o k 6 → [be̞kɐ].)

Constrained phone recognition

• Number of phones fixed
• Most frequent phones from the best-matching three-phone samples
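The constrained selection step amounts to a per-position majority vote. A sketch with invented candidate lists in the spirit of the “Bäcker” example (the exact candidates are illustrative, not the system's output):

```python
# Constrained synthesis sketch: the number of phones is fixed, and for each
# target position we keep the most frequent phone among the candidates
# contributed by the best-matching three-phone samples.
from collections import Counter

def synthesize_constrained(candidates_per_position):
    """One phone per position: the majority candidate (ties -> first seen)."""
    return [Counter(c).most_common(1)[0][0] for c in candidates_per_position]

candidates = [                  # illustrative candidate lists
    ["b", "b", "b"],            # all samples agree
    ["e_o", "e_o", "@", "a"],   # narrow variants compete; e_o wins
    ["k", "k", "u"],
    ["6", "6", "6", "@\\"],
]
print(synthesize_constrained(candidates))  # ['b', 'e_o', 'k', '6']
```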

Unconstrained phone recognition

• Number of phones unknown
• List of n best-matching samples for each frame
• Nearest-neighbor classification

SLIDE 16


Evaluation

Evaluation database: ADABA

• Austrian pronunciation database
• 6 professional speakers: Austrian, German and Swiss
• Narrow transcriptions: 89 phonemes instead of 45 in SAMPA German
• About 12,000 utterances per speaker (≈ 5 h of speech)
• Recordings in studio quality
• Provided by Rudolf Muhr, Research Center for Austrian German, http://adaba.at/

SLIDE 17


Evaluation (2)

Data set specification

• Restriction to a single speaker
• 85% training data, 5% development data, and 10% test data

Evaluation measures

Percentage of correct phones (PC) and phone accuracy (PA):

  PC = (N − D − S) / N × 100%
  PA = (N − D − S − I) / N × 100%

N … total number of phones in the reference transcription
D … number of deletions, S … number of substitutions, I … number of insertions
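Both measures follow directly from a minimum-cost Levenshtein alignment of hypothesis against reference. A sketch that derives D, S and I and then applies the PC/PA formulas (the example phone strings are invented):

```python
# PC and PA from a Levenshtein alignment: PC = (N-D-S)/N, PA = (N-D-S-I)/N.

def edit_counts(ref, hyp):
    """Return (D, S, I): deletions, substitutions and insertions of a
    minimum-cost alignment (all unit costs)."""
    n, m = len(ref), len(hyp)
    # table cell: (cost, deletions, substitutions, insertions)
    D = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        D[i][0] = (i, i, 0, 0)
    for j in range(1, m + 1):
        D[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            options = [
                (D[i - 1][j - 1][0] + sub, i - 1, j - 1, (0, sub, 0)),  # match/sub
                (D[i - 1][j][0] + 1,       i - 1, j,     (1, 0, 0)),    # deletion
                (D[i][j - 1][0] + 1,       i,     j - 1, (0, 0, 1)),    # insertion
            ]
            cost, pi, pj, (d, s, ins) = min(options)
            _, pd, ps, pins = D[pi][pj]
            D[i][j] = (cost, pd + d, ps + s, pins + ins)
    return D[n][m][1:]

ref = "b e k 6".split()
hyp = "b e o k 6".split()   # one spurious insertion
d, s, i = edit_counts(ref, hyp)
N = len(ref)
PC = (N - d - s) / N * 100
PA = (N - d - s - i) / N * 100
print(PC, PA)  # 100.0 75.0: insertions hurt PA but not PC
```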

SLIDE 18


Evaluation (3)

Benchmark: Comparison to a model-based transcriber

• Trained with the Hidden Markov Toolkit (HTK)
• Same data and acoustic frontend
• 5-state left-to-right context-dependent triphone models with up to 16 Gaussian mixture components
• For constrained phone recognition: use of the intermediate transcription for the language model


SLIDE 20


Results

Constrained phone recognition

        Int. Tr.   Model-based   Example-based
  PC    83.36%     90.88%        91.95%
  PA    81.22%     88.83%        89.89%

(Int. Tr. = intermediate transcription from the letter-to-sound rules)

Performance differences are significant at the 0.1% level using the Matched-Pairs test.

Unconstrained phone recognition

        Model-based   Example-based
  PC    88.10%        85.21%
  PA    86.96%        82.38%

Performance differences are significant at the 0.1% level using McNemar’s test.
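McNemar's test compares the two systems' paired correct/incorrect decisions on the same phones, using only the discordant pairs. A minimal sketch with invented counts (b and c are illustrative, not the paper's data):

```python
# McNemar's test on paired decisions: b = items system A got right and
# system B got wrong, c = the reverse. Counts here are illustrative only.
def mcnemar(b, c):
    """Continuity-corrected chi-square statistic with 1 degree of freedom."""
    return (abs(b - c) - 1) ** 2 / (b + c)

chi2 = mcnemar(b=40, c=10)
print(chi2)  # 16.82
```

A statistic above the 1-df critical value of about 10.83 corresponds to significance at the 0.1% level.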

SLIDE 21


Implementations

EXTRA

Standalone Java application

• Evaluation and analysis of transcriptions
• Batch transcription mode

ELAN-EXTRA

Extension for the ELAN linguistic annotation software

http://www.spsc.tugraz.at/people/stefan-petrik/project-extra


SLIDE 23


ELAN-EXTRA

[be̞kɐ]

SLIDE 24


EXTRA

SLIDE 25


Conclusion

Example-based approach to automatic phonetic transcription

• Comparison to concrete audio samples instead of a model
• Detection of rare pronunciation variants possible

Useful support for the transcription of speech corpora

• Manual transcription of part of the corpus, the rest automatically
• Consistency check easily feasible

Evaluation on the ADABA database

• Comparable to an HMM-based transcription system
• Best results with a combination of rule-based and example-based APT

SLIDE 26


Discussion

Thank you for your attention!

SLIDE 27


References I

  • C. Cucchiarini and H. Strik, “Automatic phonetic transcription: An overview,” Proceedings of ICPhS, pp. 347–350, 2003.
  • M. De Wachter, M. Matton, K. Demuynck, P. Wambacq, R. Cools, and D. Van Compernolle, “Template-based continuous speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, pp. 1377–1390, 2007.
  • C. Leitner, “Data-based automatic phonetic transcription,” Master’s thesis, Graz University of Technology, 2008.
  • R. Muhr, “The Pronouncing Dictionary of Austrian German (AGPD) and the Austrian Phonetic Database (ADABA) – Report on a large phonetic resources database of the three major varieties of German,” Proceedings of LREC, 2008.
  • L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Prentice Hall PTR, 1993.
  • A. Park and J. R. Glass, “Towards unsupervised pattern discovery in speech,” IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 53–58, 2005.

SLIDE 28


References II

  • P. Tormene, T. Giorgino, S. Quaglini, and M. Stefanelli, “Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation,” Artificial Intelligence in Medicine, vol. 45, no. 1, pp. 11–34, January 2009.
  • P. Wittenburg, H. Brugman, A. Russel, A. Klassmann, and H. Sloetjes, “ELAN: a professional framework for multimodality research,” Proceedings of LREC, 2006.
  • S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. A. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book. Cambridge University Engineering Department, 2006.

SLIDE 29


Synthesis - constrained phone recognition

(Figure: constrained synthesis example — candidate phones from the best-matching three-phone samples aligned per target position; alternatives such as R\, k and k_h compete at individual positions, and selecting the most frequent phone per column yields sil f R a g @ sil.)

SLIDE 30


Synthesis - unconstrained phone recognition

(Figure: unconstrained synthesis — the input utterance is divided into frames; for each frame the best-matching three-phone examples are retrieved, e.g. [sil a k], [b e_o k], [e_o k 6], [k a R], [e k 6], [6 R a], and phones are assigned by nearest-neighbor classification over the frames of the input utterance.)
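The frame-wise procedure in the figure can be written compactly. The 1-D features and the example set below are invented stand-ins for the 39-dimensional MFCC frames and the three-phone database:

```python
# Unconstrained synthesis sketch: each input frame is labelled with the phone
# of its nearest example frame, and runs of identical labels are collapsed
# into a phone sequence. Features and examples are illustrative 1-D stand-ins.
from itertools import groupby

def nearest_label(x, examples):
    """examples: (feature, phone) pairs from the database; 1-D features stand
    in for the 39-dimensional MFCC vectors."""
    return min(examples, key=lambda e: abs(e[0] - x))[1]

def transcribe(frames, examples):
    labels = [nearest_label(x, examples) for x in frames]
    return [phone for phone, _ in groupby(labels)]  # merge repeated labels

examples = [(0.0, "sil"), (1.0, "b"), (2.0, "e_o"), (3.0, "k"), (4.0, "6")]
frames = [0.1, 0.9, 1.1, 2.2, 1.9, 3.1, 4.0, 3.9, 0.2]
print(transcribe(frames, examples))
# ['sil', 'b', 'e_o', 'k', '6', 'sil']
```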
