Data-Driven Speech Synthesis

Seminar on Language Technology
Data-Driven Speech Synthesis
Konstantin Tretjakov
kt@ut.ee
11.12.07

Speech Synthesis
“Computers are getting smarter all the time. Scientists tell us that soon they will be able to talk with us. (By ‘they’, I mean computers. I doubt scientists will ever be able to talk to us.)” - Dave Barry

Speech Synthesis in 1791

Speech Synthesis in 1835: J. Faber, “Euphonia” http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in 1937: Riesz model http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in 1939: H. Dudley, “VODER” http://www.ling.su.se/staff/hartmut/kemplne.htm

Speech Synthesis in 1953: Gunnar Fant's “OVE” (Orator Verbis Electris), a formant synthesizer for vowels http://www.ling.su.se/staff/hartmut/kemplne.htm

Formant Synthesis

http://www.geofex.com/Article_Folders/wahpedl/voicewah.htm

Modern Speech Synthesis
Rule-based:
● 1968 - First full TTS system (Umeda et al.)
● 1977 - Diphone concatenation (J. Olive)
● 1979 - MITalk (Allen et al.)
● 1984 - DECtalk (Klatt, DEC)
● 1995 - Eurovocs
Data-driven:
● 200? - IBM

Outline
● History of Speech Synthesis
● Text-to-Speech System Architecture

Text-to-Speech System
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis
http://www.stanford.edu/class/linguist236/

Text-to-Speech System: Data-driven?
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis

1) Text Normalization
● He stole $100 million from the bank.
● It's 13 St. Andrews St.
● The home page is http://www.ut.ee.
Method:
● Split into tokens.
● Map tokens to words.
● Identify types for words.
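The method above can be illustrated on the three example sentences. This is a toy, rule-based sketch, not a data-driven normalizer. The token classes, regular expressions, URL reading, and the "St." heuristic (Saint before a capitalized word, Street otherwise) are assumptions chosen to fit these examples. The sketch identifies the token's type before expanding it, because the expansion depends on the type.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = {"thousand", "million", "billion"}


def number_words(n):
    """Spell out an integer 0 <= n < 1000 in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else "")
    rest = n % 100
    return ONES[n // 100] + " hundred" + (" " + number_words(rest) if rest else "")


def classify(token, next_token):
    """Identify the token type (toy classes, assumed; not a standard tag set)."""
    if re.fullmatch(r"https?://\S+", token):
        return "URL"
    if re.fullmatch(r"\$\d{1,3}", token):
        return "MONEY"
    if re.fullmatch(r"\d{1,3}", token):
        return "NUMBER"
    if token == "St.":
        # Assumed heuristic: "St. Andrews" is Saint, a trailing "St." is Street.
        return "SAINT" if next_token[:1].isupper() else "STREET"
    return "WORD"


def expand_url(url):
    """Read the host part by part, spelling out short parts letter by letter."""
    host = re.sub(r"^https?://", "", url.rstrip("."))
    return " dot ".join(" ".join(p) if len(p) <= 3 else p for p in host.split("."))


def normalize(text):
    tokens = text.split()                     # split into tokens (whitespace only)
    words, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        kind = classify(tok, nxt)             # identify the type first
        # Then map the token to words according to its type.
        if kind == "MONEY":
            n = int(tok[1:])
            if nxt.rstrip(".").lower() in SCALES:
                # "$100 million" is read "one hundred million dollars".
                words += [number_words(n), nxt.rstrip("."), "dollars"]
                i += 2
                continue
            words += [number_words(n), "dollar" if n == 1 else "dollars"]
        elif kind == "NUMBER":
            words.append(number_words(int(tok)))
        elif kind == "URL":
            words.append(expand_url(tok))
        elif kind == "SAINT":
            words.append("Saint")
        elif kind == "STREET":
            words.append("Street")
        else:
            words.append(tok.rstrip(".,"))    # drop sentence punctuation
        i += 1
    return " ".join(words)
```

For example, `normalize("It's 13 St. Andrews St.")` gives "It's thirteen Saint Andrews Street". A data-driven system would learn the type decision (e.g. Saint vs. Street) from labeled text instead of hard-coding it.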

2) Phonetic Analysis
● My latest project is to learn how to better project my voice.
● On May 5 1996, the university bought 1996 computers.
● Yesterday it rained 3 in. Take 1 out, then put 3 in.
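The "1996" example shows that one digit string can have two readings. The following is a minimal sketch that covers numbers only (not the "project" or "in" cases). It assumes an earlier step, such as the tagger, has already decided whether the token is a year or a count. The function names are hypothetical, and the readings follow American English without "and".

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_100(n):
    """Spell out 0 <= n < 100."""
    if n < 20:
        return ONES[n]
    return TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else "")


def cardinal(n):
    """Read 0 <= n < 10000 as a count: 1996 -> one thousand nine hundred ninety-six."""
    parts = []
    if n >= 1000:
        parts.append(ONES[n // 1000] + " thousand")
        n %= 1000
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n or not parts:
        parts.append(below_100(n))
    return " ".join(parts)


def year(n):
    """Read 1000 <= n <= 9999 as a year: 1996 -> nineteen ninety-six."""
    hi, lo = divmod(n, 100)
    if n % 1000 == 0 or (hi % 10 == 0 and lo < 10):
        return cardinal(n)                          # 2000, 2007
    if lo == 0:
        return below_100(hi) + " hundred"           # 1900
    if lo < 10:
        return below_100(hi) + " oh " + ONES[lo]    # 1905
    return below_100(hi) + " " + below_100(lo)      # 1996


def read_number(token, is_year):
    """Choose a reading once context (e.g. a preceding month name) fixed the type."""
    return year(int(token)) if is_year else cardinal(int(token))
```

In the example sentence, the first "1996" follows a date ("May 5"), so `read_number("1996", True)` applies. The second modifies "computers", so `read_number("1996", False)` applies.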

2) Phonetic Analysis
● How to pronounce a word?
  – Look it up in the dictionary!
● But what about unknown words and names?
● Complex languages: German/French/Turkish
  – Letter-to-sound rules
● ... also neural networks (NETTalk)
● ... pronunciation by analogy (PRONOUNCE)
● ... case-based (MBRTalk) (more later)
● ... and much more.

3) Prosodic Analysis
● Prosody: phrases, accents, F0 contour, duration
● The Tilt intonation model
● e.g. trees
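In Paul Taylor's Tilt model, each intonational event (a pitch accent or boundary tone) is described by the amplitudes and durations of its F0 rise and fall. The model summarizes the event's shape as a single tilt value between -1 (pure fall) and +1 (pure rise):

tilt = (|A_rise| - |A_fall|) / (2(|A_rise| + |A_fall|)) + (D_rise - D_fall) / (2(D_rise + D_fall))

The following is a minimal sketch of that parameterization and its inverse. It assumes the rise and fall have already been measured from the F0 contour. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TiltEvent:
    amplitude: float  # |A_rise| + |A_fall|, e.g. in Hz
    duration: float   # D_rise + D_fall, e.g. in seconds
    tilt: float       # -1.0 = pure fall, 0.0 = equal rise and fall, +1.0 = pure rise


def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Analysis: rise/fall measurements of one F0 event -> Tilt parameters."""
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    if amplitude == 0 or duration == 0:
        raise ValueError("the event needs a non-zero rise or fall")
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude
    tilt_dur = (d_rise - d_fall) / duration
    return TiltEvent(amplitude, duration, (tilt_amp + tilt_dur) / 2)


def tilt_to_rise_fall(event):
    """Synthesis: Tilt parameters -> (|A_rise|, |A_fall|, D_rise, D_fall)."""
    t = event.tilt
    return (event.amplitude * (1 + t) / 2, event.amplitude * (1 - t) / 2,
            event.duration * (1 + t) / 2, event.duration * (1 - t) / 2)
```

In a TTS system, the prosody module would predict the three numbers (amplitude, duration, tilt) for each event, for instance with trees. `tilt_to_rise_fall` then turns those predictions back into a rise and a fall to draw the F0 contour.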

4) Waveform synthesis
● Articulatory synthesis (à la Euphonia or the Riesz model)
● Formant synthesis (à la OVE)
● Concatenative synthesis
  – Domain-specific (“talking clock”, “weather”)
  – Diphones (PSOLA, MBROLA)
  – Unit selection

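4) Waveform synthesis
● Domain-specific synthesis is easy:

    #!/bin/bash
    # Talking clock: play one pre-recorded file per word.
    # Assumes 0.wav ... 59.wav, am.wav and pm.wav exist in the current directory.
    hours=$(date +"%-l")   # hour 1-12 without padding (GNU date)
    mins=$(date +"%-M")    # minute 0-59 without padding
    ampm=$(date +"%P")     # "am" or "pm" in lowercase (GNU date)
    play "$hours.wav"      # `play` is part of SoX
    play "$mins.wav"
    play "$ampm.wav"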

4) Waveform synthesis
● Diphone synthesis
  – Use diphones: from the middle of one phone to the middle of the next.
  – Just a bit of DSP to connect the diphones.
● PSOLA
● MBROLA
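Diphone synthesis can be illustrated in two small steps. The first step turns a phone string into the names of the diphones to fetch from the inventory. The second step joins the fetched waveforms. This sketch makes several assumptions: "#" marks silence, diphones are named "a-b", and the join is a plain linear crossfade. The crossfade stands in for the "bit of DSP" only. PSOLA- or MBROLA-style processing also adjusts pitch and duration, which this sketch does not attempt.

```python
def to_diphones(phones, silence="#"):
    """Pad with silence and name each phone-to-phone transition 'a-b'."""
    seq = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(seq, seq[1:])]


def crossfade_concat(units, overlap):
    """Join sample lists, linearly crossfading `overlap` samples at each boundary."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)                 # weight of the incoming unit
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out
```

For "cat" (k ae t), `to_diphones(["k", "ae", "t"])` yields `["#-k", "k-ae", "ae-t", "t-#"]`. Because each diphone is cut in the stable middle of a phone, the joins fall where adjacent units are most similar. This is why a crude join works reasonably well.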

4) Waveform synthesis
● Unit selection
  – Use the entire speech corpus as the acoustic inventory.
  – At runtime, select the longest available strings of phonetic segments.
  – Minimize the number of concatenations.
  – Reduce DSP.
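The "longest strings, fewest concatenations" idea can be sketched as a dynamic program over the target phone string. This is a simplified illustration under stated assumptions: units are any contiguous phone spans of corpus utterances, and the only cost is the number of pieces. Full unit-selection systems additionally weigh acoustic target and join costs when choosing among candidate units. The function names are hypothetical.

```python
def index_spans(corpus):
    """Map every contiguous phone string in the corpus to its first (utterance, start)."""
    spans = {}
    for u, utt in enumerate(corpus):
        for i in range(len(utt)):
            for j in range(i + 1, len(utt) + 1):
                spans.setdefault(tuple(utt[i:j]), (u, i))
    return spans


def select_units(target, corpus):
    """Cover `target` with corpus spans using as few pieces (joins) as possible."""
    spans = index_spans(corpus)
    n = len(target)
    # best[j] = (pieces needed to cover target[:j], start index of the last piece)
    best = [None] * (n + 1)
    best[0] = (0, None)
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] is not None and tuple(target[i:j]) in spans:
                pieces = best[i][0] + 1
                if best[j] is None or pieces < best[j][0]:
                    best[j] = (pieces, i)
    if best[n] is None:
        raise ValueError("some phone in the target never occurs in the corpus")
    units, j = [], n
    while j > 0:                      # trace back the chosen pieces
        i = best[j][1]
        piece = tuple(target[i:j])
        units.append((piece, spans[piece]))
        j = i
    return units[::-1]
```

Indexing every span costs memory quadratic in utterance length, so this brute-force version only suits toy corpora. Real systems index the database by phone (or diphone) and search over candidates with Viterbi.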

Outline
● History of Speech Synthesis
● Text-to-Speech System Architecture
● Grapheme-to-Phoneme transcription

GTP transcription
● Lexicon:
  – “cepstra” -> (k eh p)' (s t r aa)
  – What about unknown words?
  – Commercial systems have a 3-part system:
    ● Big dictionary
    ● Special code for names/acronyms/etc.
    ● Machine-learned letter-to-sound (LTS) system for other unknown words
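The three-part design can be sketched as a chain of fallbacks. Everything in this sketch is hypothetical:

- the one-word lexicon (phones in the slide's style, with syllables and stress dropped);
- the all-capitals test for acronyms;
- the excerpted letter-name table;
- `toy_lts`, a stand-in for the machine-learned LTS model discussed next.

```python
from typing import Callable, List, Tuple

# Toy data (assumed): a real system has a dictionary with 100k+ entries.
LEXICON = {"cepstra": ["k", "eh", "p", "s", "t", "r", "aa"]}
LETTER_NAMES = {"B": ["b", "iy"], "I": ["ay"], "M": ["eh", "m"]}  # excerpt only


def toy_lts(word: str) -> List[str]:
    """Hypothetical stand-in for a learned LTS model: one phone per known letter."""
    table = {"a": "ae", "b": "b", "d": "d", "e": "eh", "g": "g", "i": "ih",
             "k": "k", "m": "m", "n": "n", "o": "aa", "p": "p", "s": "s", "t": "t"}
    return [table[ch] for ch in word.lower() if ch in table]


def pronounce(word: str,
              lts: Callable[[str], List[str]] = toy_lts) -> Tuple[List[str], str]:
    """Return (phones, source), trying the three components in order."""
    # 1. Big dictionary
    if word.lower() in LEXICON:
        return LEXICON[word.lower()], "lexicon"
    # 2. Special code: all-capital tokens are spelled out as acronyms
    if word.isupper() and all(ch in LETTER_NAMES for ch in word):
        return [p for ch in word for p in LETTER_NAMES[ch]], "acronym"
    # 3. Everything else goes to the letter-to-sound model
    return lts(word), "lts"
```

The order matters: the lexicon is the most reliable source, so it is consulted first. The learned LTS model only handles what the other two components cannot.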

Learning LTS rules
● Induce LTS rules from a dictionary of the language (Black et al. 1998)
● Two steps:
  – Alignment
  – Decision-tree-based rule induction

Alignment
● Letters: c h e c k e d
● Phones: ch _ eh _ k _ t
● Black et al. propose two methods:
  – Expectation-Maximization
  – Estimate p(letter | phone) from all valid alignments, then take the best alignment.
● The devil is in the details.
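The second method can be sketched in three steps:

1. Enumerate every alignment permitted by a table of allowable letter-to-phone pairs.
2. Estimate p(letter | phone) by counting pairs over all those alignments.
3. Keep the highest-scoring alignment of each word.

This sketch rests on several assumptions. Each letter maps to exactly one phone or to the empty phone `_`, so letters that produce two phones (such as the x in "box") are not handled. The allowables table and the two-word lexicon are toy data.

```python
import math
from collections import Counter
from itertools import combinations

# Toy allowables (assumed): which phones each letter may produce; "_" = silent.
ALLOWABLES = {
    "c": {"k", "ch", "s", "_"}, "h": {"hh", "_"}, "e": {"eh", "iy", "_"},
    "k": {"k", "_"}, "d": {"d", "t"}, "i": {"ih", "ay", "_"}, "t": {"t", "_"},
}
LEXICON = {"checked": ["ch", "eh", "k", "t"], "kit": ["k", "ih", "t"]}


def valid_alignments(word, phones, allowables):
    """All ways to give each letter one phone or '_' while keeping phone order."""
    if len(phones) > len(word):
        return []
    result = []
    for positions in combinations(range(len(word)), len(phones)):
        aligned = ["_"] * len(word)
        for pos, phone in zip(positions, phones):
            aligned[pos] = phone
        if all(p in allowables.get(l, set()) for l, p in zip(word, aligned)):
            result.append(tuple(aligned))
    return result


def best_alignments(lexicon, allowables):
    candidates = {w: valid_alignments(w, ph, allowables) for w, ph in lexicon.items()}
    # Count letter/phone pairs over *all* valid alignments of all words.
    pair_counts, phone_counts = Counter(), Counter()
    for word, alignments in candidates.items():
        for alignment in alignments:
            for letter, phone in zip(word, alignment):
                pair_counts[letter, phone] += 1
                phone_counts[phone] += 1

    def log_score(word, alignment):
        # Sum of log p(letter | phone); every pair here was counted at least once.
        return sum(math.log(pair_counts[l, p] / phone_counts[p])
                   for l, p in zip(word, alignment))

    return {w: max(alns, key=lambda a: log_score(w, a))
            for w, alns in candidates.items() if alns}
```

Here "checked" has two valid alignments, depending on whether "c" or "k" carries the /k/. The word "kit" adds evidence that the letter k produces /k/, which tips the choice toward `ch _ eh _ k _ t`, the alignment shown on the slide.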

Decision trees for LTS
● Now that aligned data is available, train a decision tree:
  – ### c hec -> ch
  – che c ked -> _
● 92-96% letter accuracy (58-75% word accuracy) for English
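A minimal sketch of the training step follows. Each letter becomes a seven-letter window: three letters of context on each side, padded with "#", as in the slide's examples. A small ID3-style tree is then grown by information gain. This learner is a simplified stand-in chosen for illustration, not necessarily the exact tree-building algorithm of Black et al.; a practical learner would also prune and would train on a full dictionary.

```python
import math
from collections import Counter

PAD = "#"
FEATURE_ORDER = [3, 2, 4, 1, 5, 0, 6]   # focus letter first, then nearest context


def windows(word, phones, k=3):
    """Turn one aligned word into (7-letter context, phone) training examples."""
    if len(word) != len(phones):
        raise ValueError("word and phones must be aligned one-to-one")
    padded = PAD * k + word + PAD * k
    return [(tuple(padded[i:i + 2 * k + 1]), ph) for i, ph in enumerate(phones)]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(data, features):
    """ID3-style tree: split on the context position with the highest information gain."""
    labels = [ph for _, ph in data]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority                                   # leaf: a phone

    def gain(f):
        groups = {}
        for ctx, ph in data:
            groups.setdefault(ctx[f], []).append(ph)
        remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)                        # ties go to earlier features
    if gain(best) <= 0:
        return majority
    branches = {}
    for ctx, ph in data:
        branches.setdefault(ctx[best], []).append((ctx, ph))
    rest = [f for f in features if f != best]
    return (best, majority, {v: build_tree(sub, rest) for v, sub in branches.items()})


def predict(tree, ctx):
    """Walk the tree; unseen letter values fall back to the node's majority phone."""
    while isinstance(tree, tuple):
        feature, default, branches = tree
        if ctx[feature] not in branches:
            return default
        tree = branches[ctx[feature]]
    return tree


# Aligned toy data, as produced by the alignment step.
ALIGNED = {"checked": ["ch", "_", "eh", "_", "k", "_", "t"], "kit": ["k", "ih", "t"]}
data = [ex for word, phones in ALIGNED.items() for ex in windows(word, phones)]
tree = build_tree(data, FEATURE_ORDER)
```

The first window of "checked" is `### c hec`, and the fourth is `che c ked`. These windows correspond to the slide's two examples. Letter accuracy counts correct predictions per window, whereas word accuracy requires every window in a word to be right. This is why the word-level figure is much lower.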

GTP transcription
● Decision-tree-based (Black et al.)
● ANN-based (NETTalk, Sejnowski et al.)
● Pronunciation by analogy (Damper et al.)
● Memory-based (MBRTalk, Stanfill)
● Transducer-based (I. Bulyko)
● Non-segmental (A. Cohen)

Outline
● History of Speech Synthesis
● Text-to-Speech System Architecture
● Grapheme-to-Phoneme transcription
● Conclusion

Text-to-Speech System
Text Analysis
● Text normalization
● PoS tagging
Phonetic Analysis
● Homonym disambiguation
● Dictionary lookup
● Grapheme-to-phoneme
Prosodic Analysis
● Boundary placement
● Pitch accent assignment
● Duration computation
Waveform Synthesis
http://www.stanford.edu/class/linguist236/
