  1. Language Stuff (Slides from Hal Daumé III)

  2. Digitizing Speech
     Hal Daumé III (me@hal3.name), CS421: Intro to AI

  3. Speech in an Hour
     ➢ Speech input is an acoustic wave form
     [Figure: waveform for "speech lab", segmented into the phones s p ee ch l a b, with the "l" to "a" transition shown in detail]
     Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/

  4. Spectral Analysis
     ➢ Frequency gives pitch; amplitude gives volume
     ➢ Sampling at ~8 kHz for phone, ~16 kHz for mic (kHz = 1000 cycles/sec)
     ➢ Fourier transform of the wave is displayed as a spectrogram
     ➢ Darkness indicates energy at each frequency
     [Figures: amplitude of the "s p ee ch l a b" waveform over time, and the corresponding spectrogram with frequency on the y-axis]

  5. Adding 100 Hz + 1000 Hz Waves
     [Figure: the summed wave over 0–0.05 s, with amplitude ranging from about –0.97 to 0.99]

  6. Spectrum
     ➢ Frequency components (100 and 1000 Hz) on the x-axis, amplitude on the y-axis
     [Figure: spectrum with peaks at 100 Hz and 1000 Hz]
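A minimal sketch of what slides 5–6 describe, assuming NumPy: build the summed 100 Hz + 1000 Hz wave, take its Fourier transform, and read the two frequency components off the spectrum. The sampling rate and component amplitudes below are illustrative, not taken from the slides.

```python
import numpy as np

fs = 8000                       # sampling rate in Hz (phone-quality, per slide 4)
t = np.arange(0, 0.05, 1 / fs)  # 50 ms of signal, as in the plot
wave = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(wave))          # magnitude of each frequency component
freqs = np.fft.rfftfreq(len(wave), d=1 / fs)  # frequency axis in Hz

# The two largest peaks sit at (roughly) 100 Hz and 1000 Hz.
top = freqs[np.argsort(spectrum)[-2:]]
print(sorted(round(f) for f in top))  # -> [100, 1000]
```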

  7. Part of [ae] from "lab"
     ➢ Note the complex wave repeating nine times in the figure
     ➢ Plus smaller waves which repeat 4 times for every large pattern
     ➢ Large wave has a frequency of 250 Hz (9 times in .036 seconds)
     ➢ Small wave roughly 4 times this, or roughly 1000 Hz
     ➢ Two little tiny waves on top of each peak of the 1000 Hz waves

  8. Back to Spectra
     ➢ Spectrum represents these frequency components
     ➢ Computed by Fourier transform, an algorithm which separates out each frequency component of a wave
     ➢ x-axis shows frequency, y-axis shows magnitude (in decibels, a log measure of amplitude)
     ➢ Peaks at 930 Hz, 1860 Hz, and 3020 Hz

  9. Acoustic Feature Sequence
     ➢ Time slices of the spectrogram are translated into acoustic feature vectors (~39 real numbers per slice): ... e12, e13, e14, e15, e16, ...
     ➢ These are the observations; now we need the hidden states X
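For concreteness, a rough sketch of how time slices become feature vectors. Real recognizers use ~39-dimensional MFCC-plus-delta features; the stand-in below just cuts the waveform into overlapping frames and takes a log-magnitude spectrum per frame, so the frame/step sizes and the feature choice are assumptions for illustration.

```python
import numpy as np

def acoustic_features(wave, fs=16000, frame_ms=25, step_ms=10):
    """Slice a waveform into overlapping frames and return one feature
    vector (here a log-magnitude spectrum) per time slice."""
    frame, step = int(fs * frame_ms / 1000), int(fs * step_ms / 1000)
    feats = []
    for start in range(0, len(wave) - frame + 1, step):
        windowed = wave[start:start + frame] * np.hamming(frame)
        feats.append(np.log(np.abs(np.fft.rfft(windowed)) + 1e-10))
    return np.array(feats)   # shape: (num_slices, frame // 2 + 1)

# e.g. one second of (random stand-in) audio -> ~98 feature vectors e_1, e_2, ...
obs = acoustic_features(np.random.randn(16000))
print(obs.shape)
```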

  10. State Space
     ➢ P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
     ➢ P(X|X') encodes how sounds can be strung together
     ➢ We will have one state for each sound in each word
     ➢ From some state x, we can only:
        ➢ Stay in the same state (e.g. speaking slowly)
        ➢ Move to the next position in the word
        ➢ At the end of the word, move to the start of the next word
     ➢ We build a little state graph for each word and chain them together to form our state space X
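A minimal sketch of this state-space construction, with a made-up two-word lexicon and made-up 0.5/0.5 stay-vs-advance probabilities (none of these numbers come from the slides):

```python
# One state per sound in each word, chained into a single state space.
lexicon = {"speech": ["s", "p", "ee", "ch"], "lab": ["l", "ae", "b"]}

states = []
for word, phones in lexicon.items():
    states += [(word, i) for i in range(len(phones))]

transitions = {}                 # transitions[x] maps next states x' to P(x' | x)
for word, phones in lexicon.items():
    for i in range(len(phones)):
        here = (word, i)
        transitions[here] = {here: 0.5}                  # stay (speaking slowly)
        if i + 1 < len(phones):
            transitions[here][(word, i + 1)] = 0.5       # next position in the word
        else:
            for nxt in lexicon:                          # word end: start of next word
                transitions[here][(nxt, 0)] = 0.5 / len(lexicon)

print(states[:4])                      # [('speech', 0), ('speech', 1), ('speech', 2), ('speech', 3)]
print(transitions[("speech", 3)])      # self-loop plus the start of each word
```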

  11. HMMs for Speech

  12. Markov Process with Bigrams
     Figure from Huang et al., page 618

  13. Decoding
     ➢ While there are some practical issues, finding the words given the acoustics is an HMM inference problem
     ➢ We want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}: x*_{1:T} = argmax_{x_{1:T}} P(x_{1:T} | e_{1:T})
     ➢ From the sequence x, we can simply read off the words
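This "most likely state sequence" computation is standard Viterbi decoding; below is a generic sketch with toy states, probabilities, and observations, not the course's actual recognizer.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence x_{1:T} given evidence e_{1:T}."""
    # best[t][x] = log-probability of the best path ending in state x at time t
    best = [{x: math.log(start_p[x]) + math.log(emit_p[x][obs[0]]) for x in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for x in states:
            prev, score = max(
                ((xp, best[t - 1][xp] + math.log(trans_p[xp][x])) for xp in states),
                key=lambda p: p[1])
            best[t][x] = score + math.log(emit_p[x][obs[t]])
            back[t][x] = prev
    # pick the best final state, then follow the back-pointers
    x = max(states, key=lambda s: best[-1][s])
    path = [x]
    for t in range(len(obs) - 1, 0, -1):
        x = back[t][x]
        path.append(x)
    return list(reversed(path))

# Toy usage: two "word" states emitting two acoustic symbols.
states = ["hi", "lo"]
print(viterbi(["a", "b", "b"], states,
              start_p={"hi": 0.6, "lo": 0.4},
              trans_p={"hi": {"hi": 0.7, "lo": 0.3}, "lo": {"hi": 0.4, "lo": 0.6}},
              emit_p={"hi": {"a": 0.8, "b": 0.2}, "lo": {"a": 0.1, "b": 0.9}}))
```

Reading the words off the returned state path is then just a matter of noting where the path enters each word's first state.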

  14. Training (aka "preview of ML")
     ➢ Two key components of a speech HMM:
        ➢ Acoustic model: p(E | X)
        ➢ Language model: p(X | X')
     ➢ Where do these come from? Can we estimate these models from data?
        ➢ p(E | X) might be estimated from transcribed speech
        ➢ p(X | X') might be estimated from large amounts of raw text

  15. n-gram Language Models
     ➢ Assign a probability to a sequence of words:
        p(w_1, w_2, ..., w_I) = ∏_{i=1}^{I} p(w_i | w_1, ..., w_{i-1}) ≈ ∏_{i=1}^{I} p(w_i | w_{i-k}, ..., w_{i-1})
     ➢ If I gave you a copy of the web, how would you estimate these probabilities?
     ➢ Need to "smooth" estimates intelligently to avoid zero-probability n-grams. Language modeling is the art of good smoothing. See [Goodman 1998], [Teh 2007]
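As a concrete illustration of estimating these probabilities, here is a bigram (k = 1) model with crude add-one smoothing over a made-up toy corpus; real language models use far better smoothing, as the references above discuss.

```python
from collections import Counter

corpus = "i like speech . i like the speech lab .".split()   # toy stand-in for web text
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev, k=1.0):
    """Add-k smoothed estimate of p(w | prev)."""
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * len(vocab))

def p_sentence(words):
    # Product of bigram probabilities (ignoring the start-of-sentence term).
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_sentence("i like the lab .".split()))
```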

  16. Acoustic models
     ➢ What if I gave you data like a spectrogram labeled with its phone sequence: ... sp ee ch l ae b ...
     ➢ How would you estimate p(E|X)?
     ➢ What's wrong with this approach?

  17. Acoustic models II
     ➢ What does our data really look like?
        Acc: [waveform]
        W: "yesterday I went to visit the speech lab"
     ➢ We'd like to know alignments between transcript and waveform
     ➢ Suppose someone gave us a good speech recognizer... could we figure out alignments from that?

  18. Expectation Maximization
     ➢ A general framework to do parameter estimation in the presence of hidden variables
     ➢ Repeat ad infinitum:
        ➢ E-step: make probabilistic guesses at latent variables
        ➢ M-step: fit parameters according to these guesses
     [Figure: acoustics (Acc) aligned against the transcript W: "I LIKE A I"]

  19. Expectation Maximization
     [Table: initial estimates — for each acoustic symbol e, p(e | "I") = p(e | "LIKE") = p(e | "A") = 0.33, shown with the corresponding expected counts]

  20. Expectation Maximization
     [Table: after an EM iteration the estimates shift, e.g. the p(e | "I") column becomes 0.5 / 0.25 / 0.25 across the three acoustic symbols, and the expected counts update accordingly]
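To make the E-step/M-step loop concrete, here is a minimal sketch in the spirit of slides 18–20: each utterance pairs a transcript with acoustic symbols, which word emitted which symbol is hidden, and EM alternates soft alignment guesses with re-estimation of p(e | word). The data, and the simplification that any word in the transcript could have emitted any symbol, are assumptions for illustration.

```python
from collections import defaultdict

# Made-up training data: (words in the utterance, acoustic symbols observed).
data = [(["I", "LIKE", "A", "I"], ["e1", "e2", "e3", "e1"]),
        (["I", "LIKE"],           ["e1", "e2"])]

words = {w for ws, _ in data for w in ws}
symbols = {e for _, es in data for e in es}
p = {w: {e: 1.0 / len(symbols) for e in symbols} for w in words}  # uniform start

for _ in range(20):                                  # "repeat ad infinitum"
    counts = defaultdict(lambda: defaultdict(float))
    for ws, es in data:
        for e in es:
            # E-step: soft guess at which word in the transcript emitted this symbol
            z = sum(p[w][e] for w in ws)
            for w in ws:
                counts[w][e] += p[w][e] / z
    # M-step: re-fit p(e | word) from the expected counts
    for w in words:
        total = sum(counts[w].values())
        if total:
            p[w] = {e: counts[w][e] / total for e in symbols}

print({w: {e: round(v, 2) for e, v in p[w].items()} for w in p})
```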

  21. State of the Art: DBNs for Speech

  22. Summary
     ➢ HMMs allow us to "separate" two models:
        ➢ acoustic model (how does what I want to say sound?)
        ➢ language model (what do I want to say?)
     ➢ Speech recognition is "just" decoding in an HMM/DBN, plus a heck of a lot of engineering
     ➢ Expectation maximization lets us estimate parameters in models with hidden variables
     ➢ Most research today focuses on language modeling

  23. Translate Centauri -> Arcturan
     Your assignment: translate this Centauri sentence into Arcturan:
        farok crrrok hihok yorok clok kantok ok-yurp
     Parallel Centauri (a) / Arcturan (b) sentence pairs:
        1a. ok-voon ororok sprok .                        1b. at-voon bichat dat .
        2a. ok-drubel ok-voon anok plok sprok .           2b. at-drubel at-voon pippat rrat dat .
        3a. erok sprok izok hihok ghirok .                3b. totat dat arrat vat hilat .
        4a. ok-voon anok drok brok jok .                  4b. at-voon krat pippat sat lat .
        5a. wiwok farok izok stok .                       5b. totat jjat quat cat .
        6a. lalok sprok izok jok stok .                   6b. wat dat krat quat cat .
        7a. lalok farok ororok lalok sprok izok enemok .  7b. wat jjat bichat wat dat vat eneat .
        8a. lalok brok anok plok nok .                    8b. iat lat pippat rrat nnat .
        9a. wiwok nok izok kantok ok-yurp .               9b. totat nnat quat oloat at-yurp .
        10a. lalok mok nok yorok ghirok clok .            10b. wat nnat gat mat bat hilat .
        11a. lalok nok crrrok hihok yorok zanzanok .      11b. wat nnat arrat mat zanzanat .
        12a. lalok rarok nok izok hihok mok .             12b. wat nnat forat arrat vat gat .
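One way to start this exercise (not the slide's official solution) is to count how often each Centauri word co-occurs with each Arcturan word across the sentence pairs; frequent partners are translation candidates, which is the intuition behind IBM-style alignment models. The code only uses the pairs shown above.

```python
from collections import Counter

pairs = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok", "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]

cooc = Counter()
for c_sent, a_sent in pairs:
    for c in c_sent.split():
        for a in a_sent.split():
            cooc[(c, a)] += 1

# For each word of the sentence to translate, list its most frequent partners.
for c in "farok crrrok hihok yorok clok kantok ok-yurp".split():
    best = [a for (cw, a), n in cooc.most_common() if cw == c][:3]
    print(c, "->", best)
```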

  24. Topology of the Field
     [Figure: "Human Language Technologies" spanning Automatic Speech Recognition (ICASSP), Information Retrieval (SIGIR), and Natural Language Processing / Computational Linguistics (ACL), with tasks including Generation, Summarization, Question Answering, Machine Translation, Information Extraction, Parsing, and "Understanding"]

  25. A Bit of History
     1940s: Computation begins, AI hot, Turing test; machine translation = code-breaking?
     1950s: Cold war continues
     1960s: Chomsky and statistics, ALPAC report
     1970s: Dry spell
     1980s: Statistics makes significant advances in speech
     1990s: Web arrives; statistical revolution in machine translation, parsing, IE, etc.; serious "corpus" work, increasing focus on evaluation
     2000s: Focus on optimizing loss functions, reranking; how much can we automate?; huge progress in machine translation; gigantic corpora become available, scaling; new challenges

  26. Ready-to-use Data
     [Figure: millions of words (English side) of French-English, Chinese-English, and Arabic-English parallel text available, growing over 1994–2004 (y-axis: 0–180 million)]

  27. Classical MT (1970s and 1980s)
     [Figure: source text → source language analysis (source lexicon) → transfer / interlingua representation (knowledge base, transfer rules) → target language generation (target lexicon) → target text]
