SLIDE 1

Data Acoustic Modelling Scripts Evaluation

Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license

Matěj Korvas, Ondřej Plátek, Ondřej Dušek, Lukáš Žilka, Filip Jurčíček

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague

May 30th, 2014, LREC, Reykjavík, Iceland

Korvas, Plátek, Dušek, Žilka, Jurčíček Free English and Czech telephone speech corpus 0/ 10 1/ 10

SLIDE 2

Introduction

The Vystadial 2013 telephone speech corpus

  • Two corpora of transcribed telephone speech, English and Czech
  • Under a free license
  • Distributed with scripts for ASR training

Outline

  • 1. Acquiring the data using crowdsourcing
  • 2. ASR training scripts
  • 3. Evaluation

SLIDE 4

Motivation

ASR for a spoken dialogue system?

  • Commercial (Nuance & others) – costly, restrictive license
  • Cloud-based (Google, Nuance) – costly or unclear licensing
  • Custom ASR model – data needed
      • Available for English
      • Restrictive license and/or costly for non-LDC members

The Vystadial 2013 Speech corpus

  • English and Czech, telephone speech
  • CC-BY-SA 3.0 license: for research and commercial use
  • Training scripts for HTK and Kaldi ASR toolkits

SLIDE 6

English Data

Collection

  • Using crowdsourcing via Amazon Mechanical Turk
  • Most speakers: American English
  • Interaction with a spoken dialogue system – restaurant information domain

Transcription

  • Also using Amazon Mechanical Turk
  • Quality checks, restricted to experienced workers
  • Orthographic, with non-speech events
      • __NOISE__, __LAUGH__
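
Transcripts that mix words with non-speech markers usually need the two separated before scoring or language modelling. A minimal sketch, assuming the markers are whitespace-separated tokens in upper case wrapped in double underscores, as in the two examples above; the function name `split_non_speech` is hypothetical and not part of the corpus scripts.

```python
import re

# Assumption: a non-speech event is a token such as __NOISE__ or __LAUGH__,
# i.e. upper-case letters wrapped in double underscores.
NON_SPEECH = re.compile(r"^__[A-Z]+__$")


def split_non_speech(transcript):
    """Return (words, events) for one whitespace-tokenised transcript line."""
    words, events = [], []
    for token in transcript.split():
        if NON_SPEECH.match(token):
            events.append(token)
        else:
            words.append(token)
    return words, events


words, events = split_non_speech("i want __NOISE__ a cheap restaurant __LAUGH__")
print(words)   # ['i', 'want', 'a', 'cheap', 'restaurant']
print(events)  # ['__NOISE__', '__LAUGH__']
```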

SLIDE 8

Data Collection – Czech

Collection

  • Using crowdsourcing, free Czech phone numbers (AMT unavailable)
      • Call-a-friend
      • Repeat-after-me
      • Spoken dialogue system – public transport information
  • License agreement at the beginning of the call

Transcription

  • Similar to English
  • Hired transcribers
  • Anonymization (personal information excluded)

SLIDE 10

Data

Size

  • English: 41 hours, 47k sentences (178k words)
  • Czech: 15 hours, 22k sentences (126k words)
  • + 2k sentences dev, 2k sentences test in both languages (ca. 1.5 hr each)

Characteristics

  • Different sources (no problem for a general acoustic model)
  • English: narrow domain
  • Czech: general domain (multiple domains)
  • 16 kHz mono WAV files (X.wav) + matching plain text files with transcription (X.wav.trn)
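
The X.wav / X.wav.trn pairing can be read with the Python standard library alone. The sketch below is only an illustration: the directory layout, the UTF-8 encoding of the transcripts, and the helper names `load_corpus` and `is_16khz_mono` are assumptions, not taken from the distributed scripts.

```python
import wave
from pathlib import Path


def load_corpus(root):
    """Yield (wav_path, transcript) for every X.wav with a matching X.wav.trn."""
    for wav_path in sorted(Path(root).rglob("*.wav")):
        trn_path = wav_path.with_name(wav_path.name + ".trn")
        if not trn_path.is_file():
            continue  # skip recordings without a transcription
        # Assumption: transcripts are UTF-8 plain text.
        yield wav_path, trn_path.read_text(encoding="utf-8").strip()


def is_16khz_mono(wav_path):
    """Check the WAV header against the format stated on the slide."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Checking the header before training is a cheap way to catch files from another source that were recorded at a different sampling rate.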

SLIDE 12

ASR Acoustic Modelling Scripts

  • Scripts to create acoustic models for ASR
  • Coding recordings into MFCCs + ∆ + ∆∆ features
  • For both languages, for HTK and Kaldi
  • Easily applicable to other data sets (and other languages):
      • Just need X.wav + X.wav.trn
      • Language-specific parts:
          • List of phones in the language
          • Orthography-to-phonetics mapping (dictionary and/or rules)
          • “Phonetic questions” – to group similar triphones (HTK only)
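
The ∆ and ∆∆ features mentioned above are time derivatives of the static MFCC vectors. A minimal pure-Python sketch of the regression formula commonly used for them (as documented in the HTK Book) follows. It assumes the static vectors are already computed, uses a window of 2 frames as an assumption, and is not the feature extraction code of either toolkit; `deltas` and `add_deltas` are hypothetical names.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frame repeated at the edges.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


def add_deltas(static, window=2):
    """Append delta and delta-delta coefficients to each static vector."""
    d = deltas(static, window)
    dd = deltas(d, window)
    return [s + a + b for s, a, b in zip(static, d, dd)]
```

Each output vector is three times as long as the static input, so 13 static coefficients would become 39 features per frame.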

SLIDE 14

HTK vs. Kaldi

HTK

  • Hidden Markov models, Gaussian mixtures
  • EM training: uniform → monophone → triphone model
  • Triphones clustered using phonetic questions
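
The step from monophone to triphone models replaces each phone with a label that also names its left and right neighbour. The sketch below illustrates this with HTK-style `left-centre+right` labels; it treats the utterance as a single phone sequence and ignores silence and word-boundary handling, which is a simplification. The function name `to_triphones` is hypothetical.

```python
def to_triphones(phones):
    """Expand a phone sequence into context-dependent labels 'left-centre+right'.

    The first and last phone only get the context that exists.
    """
    labels = []
    for i, phone in enumerate(phones):
        label = phone
        if i > 0:
            label = phones[i - 1] + "-" + label
        if i < len(phones) - 1:
            label = label + "+" + phones[i + 1]
        labels.append(label)
    return labels


print(to_triphones(["k", "ae", "t"]))  # ['k+ae', 'k-ae+t', 'ae-t']
```

Because the number of distinct labels grows roughly with the cube of the phone inventory, many triphones are rare in the training data, which is why they are clustered and share parameters.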

Kaldi

  • Finite state transducers
  • Generative models parallel to HTK (but Viterbi training)
  • Discriminative models:
  • Multiple methods and feature transformations available
  • Our models: non-speaker-adaptive
  • BMMI training (with unigram LM), LDA + MLLT transformations

SLIDE 18

Evaluation

  • Generative models of similar complexity for both toolkits + discriminative models for Kaldi
  • 0-gram and bigram LMs (testing the acoustic models alone & realistic use)
  • Czech: bigger dictionary & higher perplexity than English

Word Error Rate (%)

language  kit    method                 0-gram  bigram
Czech     HTK    tri ∆ + ∆∆             64.5    60.4
Czech     Kaldi  tri ∆ + ∆∆             69.3    53.8
Czech     Kaldi  tri LDA + MLLT         65.4    51.2
Czech     Kaldi  tri LDA + MLLT / BMMI  –       48.0
English   HTK    tri ∆ + ∆∆             50.0    17.5
English   Kaldi  tri ∆ + ∆∆             41.1    17.5
English   Kaldi  tri LDA + MLLT         37.3    17.2
English   Kaldi  tri LDA + MLLT / BMMI  –       12.0
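
Word error rate is the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The sketch below shows that definition only; it is not the scoring tool used for the numbers above, and details such as case handling or the treatment of non-speech tokens are left out. The name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c"))  # 50.0
```

Since insertions count as errors, the value can exceed 100 when the hypothesis is much longer than the reference.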

SLIDE 19

Thank you for your attention

Links

  • The corpora (CC-BY-SA 3.0 + Apache 2.0): http://bit.ly/free-phone-corp
  • Online lattice decoding for Kaldi: Plátek & Jurčíček: Free on-line speech recogniser based on Kaldi ASR toolkit producing word posterior lattices. To appear at SIGDIAL in June.
  • Our spoken dialogue systems framework (Apache 2.0): https://github.com/UFAL-DSG/alex

Contact us

Ondřej Dušek, Institute of Formal and Applied Linguistics, Charles University in Prague

  • dusek@ufal.mff.cuni.cz
