Unit 2: Natural Language Learning. Unsupervised Learning (EM, forward-backward, inside-outside)


  1. Natural Language Processing, Spring 2017. Unit 2: Natural Language Learning. Unsupervised Learning (EM, forward-backward, inside-outside). Liang Huang, liang.huang.sh@gmail.com

  2. Review of Noisy-Channel Model

  3. Example 1: Part-of-Speech Tagging. Use a tag bigram model as the language model; the channel model is context-independent.
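  The factorization behind this slide, written out for reference (standard noisy-channel/HMM notation; the formula itself is not on the slide):

      p(t_1 \ldots t_n, w_1 \ldots w_n) \;=\; \prod_{i=1}^{n} p(t_i \mid t_{i-1}) \; p(w_i \mid t_i)

  where p(t_i | t_{i-1}) is the tag-bigram language model and p(w_i | t_i) is the context-independent channel model; tagging returns the argmax over t_1 ... t_n given w_1 ... w_n.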

  4. Ideal vs. Available Data. (Figure contrasting the ideal data with the data actually available.)

  5. Ideal vs. Available Data. HW2 gives ideal data (English phonemes, Japanese phonemes, and the alignment); HW4 gives realistic data (the phoneme pairs only, no alignments):

     English phonemes    Japanese phonemes    alignment (HW2 only)
     EY B AH L           A B E R U            1 2 3 4 4
     AH B AW T           A B A U T O          1 2 3 3 4 4
     AH L ER T           A R A A T O          1 2 3 3 4 4
     EY S                E E S U              1 1 2 2

  6. Incomplete Data / Model

  7. EM: Expectation-Maximization

  8. How to Change m? 1) Hard

  9. How to Change m? 1) Hard

  10. How to Change m? 2) Soft

  11. Fractional Counts. Keep a distribution over all possible hallucinated hidden variables. For W AY N -> W A I N there are three alignments:

     z1: W -> W,    AY -> A,    N -> I N
     z2: W -> W,    AY -> A I,  N -> N
     z3: W -> W A,  AY -> I,    N -> N

     hard-EM counts: 1, 0, 0. Fractional counts: 0.333, 0.333, 0.333, giving
       AY -> A: 0.333, A I: 0.333, I: 0.333
       W  -> W: 0.667, W A: 0.333
       N  -> N: 0.667, I N: 0.333
     Regenerate: p(x, z1) = 2/3 * 1/3 * 1/3, p(x, z2) = 2/3 * 1/3 * 2/3, p(x, z3) = 1/3 * 1/3 * 2/3.
     New fractional counts: 0.25, 0.5, 0.25, giving
       AY -> A I: 0.500, A: 0.250, I: 0.250
       W  -> W: 0.750, W A: 0.250
       N  -> N: 0.750, I N: 0.250
     Eventually the fractional counts converge to 0, 1, 0.

  12. Is EM magic? Well, sort of... How about W EH T -> W E T O, or B IY -> B I I (alignable as B -> B, IY -> I I or as B -> B I, IY -> I)? EM can possibly: (1) learn something correct; (2) learn something wrong; (3) learn nothing at all. But with lots of data it is likely to learn something good.

  13. EM: slow version (non-DP)
     - initialize the conditional probability table to uniform
     - repeat until converged:
       - E-step: for each training example x (here: an (e...e, j...j) pair):
         - for each hidden z (e.g. the alignments z, z', z'' above): compute p(x, z) from the current model
         - p(x) = sum_z p(x, z)   [debug: corpus probability p(data) *= p(x)]
         - for each hidden z = (z_1 z_2 ... z_n), for each i:
           #(z_i) += p(x, z) / p(x);  #(LHS(z_i)) += p(x, z) / p(x)
       - M-step: count-n-divide on the fractional counts => new model:
         p(RHS(z_i) | LHS(z_i)) = #(z_i) / #(LHS(z_i)),  e.g. p(A I | AY) = #(AY -> A I) / #(AY)

  14. EM: slow version (non-DP). The distribution over all possible hallucinated hidden variables for W AY N -> W A I N (the three alignments z1, z2, z3 from slide 11); a code sketch of this loop follows below:
     - fractional counts 1/3, 1/3, 1/3:
       AY -> A: 0.333, A I: 0.333, I: 0.333;  W -> W: 0.667, W A: 0.333;  N -> N: 0.667, I N: 0.333
     - regenerate p(x, z): 2/3 * 1/3 * 1/3, 2/3 * 1/3 * 2/3, 1/3 * 1/3 * 2/3; renormalize by p(x) = 2/27 + 4/27 + 2/27 = 8/27
     - fractional counts 1/4, 1/2, 1/4:
       AY -> A I: 0.500, A: 0.250, I: 0.250;  W -> W: 0.750, W A: 0.250;  N -> N: 0.750, I N: 0.250
     - regenerate p(x, z): 3/4 * 1/4 * 1/4, 3/4 * 1/2 * 3/4, 1/4 * 1/4 * 3/4; renormalize by p(x) = 3/64 + 18/64 + 3/64 = 3/8
     - fractional counts 1/8, 3/4, 1/8
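  A minimal sketch of this slow E-step/M-step loop on the toy example, enumerating the three alignments by hand (Python 3; the variable names and scaffolding are mine, not from the course code):

      # Slow (non-DP) EM on the W AY N -> W A I N example of slides 11-14.
      from collections import defaultdict

      # each alignment z is a list of rules (epron, jseg)
      alignments = [
          [("W", ("W",)),     ("AY", ("A",)),     ("N", ("I", "N"))],  # z1
          [("W", ("W",)),     ("AY", ("A", "I")), ("N", ("N",))],      # z2
          [("W", ("W", "A")), ("AY", ("I",)),     ("N", ("N",))],      # z3
      ]

      # initialize the conditional probability table to uniform (per LHS)
      table = defaultdict(dict)
      for z in alignments:
          for epron, jseg in z:
              table[epron][jseg] = 1.0
      for epron in table:
          for jseg in table[epron]:
              table[epron][jseg] = 1.0 / len(table[epron])

      for it in range(6):
          # E-step: joint p(x, z) under the current model, then p(z | x)
          joints = []
          for z in alignments:
              p = 1.0
              for epron, jseg in z:
                  p *= table[epron][jseg]
              joints.append(p)
          px = sum(joints)                       # p(x) = sum_z p(x, z)
          posteriors = [p / px for p in joints]
          print(it, [round(q, 3) for q in posteriors])
          # fractional counts: each rule in z gets weight p(z | x)
          counts = defaultdict(lambda: defaultdict(float))
          for q, z in zip(posteriors, alignments):
              for epron, jseg in z:
                  counts[epron][jseg] += q
          # M-step: count-n-divide
          for epron in counts:
              total = sum(counts[epron].values())
              for jseg in counts[epron]:
                  table[epron][jseg] = counts[epron][jseg] / total

      # prints 0.333/0.333/0.333, then 0.25/0.5/0.25, then 0.125/0.75/0.125,
      # converging toward 0, 1, 0 as on slide 11.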

  15. EM: fast version (DP)
     - initialize the conditional probability table to uniform
     - repeat until converged:
       - E-step: for each training example x (here: an (e...e, j...j) pair):
         - forward pass from source s to sink t; note forw[t] = p(x) = sum_z p(x, z)
         - backward pass from t to s; note back[t] = 1 and back[s] = forw[t]
         - for each edge (u, v) in the DP graph with label(u, v) = z_i:
           fraccount(z_i) += forw[u] * back[v] * prob(u, v) / p(x)
           (the numerator equals sum over z with (u, v) in z of p(x, z))
       - M-step: count-n-divide on the fractional counts => new model
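  Why that numerator is right (one step the slide leaves implicit): forw[u] sums the probabilities of all partial paths from s to u, and back[v] sums those from v to t, so every full path through the edge (u, v) is counted exactly once:

      \mathrm{forw}[u] \cdot p(u \to v) \cdot \mathrm{back}[v] \;=\; \sum_{z \,:\, (u,v) \in z} p(x, z)

  Dividing by p(x) = forw[t] turns this into the expected (fractional) count of the rule labeling edge (u, v).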

  16. How to avoid enumeration? Dynamic programming: the forward-backward algorithm. Forward is just like Viterbi, replacing max by sum; backward is like reverse Viterbi (also with sum). This covers POS tagging, alignment, crypto, edit-distance, ...; inside-outside extends the same idea to PCFGs, SCFGs, ...

  17. Example Forward Code (for HW5; this example shows forward only):

      n, m = len(eprons), len(jprons)
      forward[0][0] = 1
      for i in xrange(0, n):
          epron = eprons[i]
          for j in forward[i]:
              for k in range(1, min(m-j, 3)+1):
                  jseg = tuple(jprons[j:j+k])
                  score = forward[i][j] * table[epron][jseg]
                  forward[i+1][j+k] += score
      totalprob *= forward[n][m]

     (Figure: the trellis for W AY N -> W A I N, rows i = 0..3 over the English phonemes W, AY, N and columns j = 0..4 over the Japanese phonemes W, A, I, N.)
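  The snippet above assumes forward, table, eprons, and jprons already exist; a self-contained runnable version (Python 3's range instead of xrange; the toy data is mine, chosen to match the W AY N example, not the actual HW5 input) could be:

      # Runnable version of the slide's forward pass. The table values are
      # toy numbers: uniform over the candidate segments per English phoneme.
      from collections import defaultdict

      eprons = ["W", "AY", "N"]
      jprons = ["W", "A", "I", "N"]
      table = {
          "W":  {("W",): 1/2, ("W", "A"): 1/2},
          "AY": {("A",): 1/3, ("A", "I"): 1/3, ("I",): 1/3},
          "N":  {("N",): 1/2, ("I", "N"): 1/2},
      }

      n, m = len(eprons), len(jprons)
      forward = defaultdict(lambda: defaultdict(float))
      forward[0][0] = 1.0
      for i in range(n):
          epron = eprons[i]
          for j in list(forward[i]):
              for k in range(1, min(m - j, 3) + 1):
                  jseg = tuple(jprons[j:j+k])
                  if jseg not in table[epron]:
                      continue                   # unseen segment: prob 0
                  forward[i+1][j+k] += forward[i][j] * table[epron][jseg]

      print(forward[n][m])  # p(x) = 1/12 + 1/12 + 1/12 = 0.25 (3 alignments)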

  18. Example Forward Code, shown against the DP graph. Same code as slide 17; the figure draws the trellis with node (i, j) holding forw[i][j], and an edge from (i, j) to (i+1, j+k) labeled with the rule for jprons[j:j+k] (e.g. AY -> A I) entering a node holding back[i+1][j+k]. Boundary conditions: forw[s] = back[t] = 1.0, and forw[t] = back[s] = p(x).
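  The slides show forward only; continuing from the runnable setup above, a matching backward pass plus the edge-posterior fractional counts of slide 15 might look like this (my sketch, not the homework solution):

      # Backward pass over the same trellis: back[i][j] sums path
      # probabilities from node (i, j) to the sink (n, m); back[0][0]
      # must equal forward[n][m] = p(x).
      back = defaultdict(lambda: defaultdict(float))
      back[n][m] = 1.0
      for i in reversed(range(n)):
          epron = eprons[i]
          for j in range(m + 1):
              for k in range(1, min(m - j, 3) + 1):
                  jseg = tuple(jprons[j:j+k])
                  if jseg in table[epron]:
                      back[i][j] += table[epron][jseg] * back[i+1][j+k]
      assert abs(back[0][0] - forward[n][m]) < 1e-12

      # E-step fractional counts: forw[u] * prob(u, v) * back[v] / p(x)
      px = forward[n][m]
      counts = defaultdict(lambda: defaultdict(float))
      for i in range(n):
          epron = eprons[i]
          for j in list(forward[i]):
              for k in range(1, min(m - j, 3) + 1):
                  jseg = tuple(jprons[j:j+k])
                  if jseg in table[epron]:
                      counts[epron][jseg] += (forward[i][j] * table[epron][jseg]
                                              * back[i+1][j+k] / px)
      # reproduces slide 11's first E-step, e.g. counts["W"][("W",)] == 2/3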

  19. EM: fast version (DP). (Slide 15 repeated.)

  20. EM

  21. Why does EM increase p(data) iteratively?

  22. Why does EM increase p(data) iteratively? EM converges to a local maximum; the argument uses a convex auxiliary function and KL-divergence.
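  Filling in the standard argument those keywords point to (not spelled out on the slide): for any distribution q over the hidden variable z, Jensen's inequality gives a lower bound on the log-likelihood,

      \log p(x;\theta) \;=\; \log \sum_z q(z)\,\frac{p(x,z;\theta)}{q(z)} \;\ge\; \sum_z q(z) \log \frac{p(x,z;\theta)}{q(z)} \;=\; F(q,\theta),

  and the gap is exactly \mathrm{KL}\big(q(z) \,\|\, p(z \mid x;\theta)\big), which is where KL-divergence enters. The E-step sets q(z) = p(z \mid x;\theta_t), making the bound tight at \theta_t; the M-step maximizes the auxiliary F(q,\theta) over \theta. Hence

      \log p(x;\theta_{t+1}) \;\ge\; F(q,\theta_{t+1}) \;\ge\; F(q,\theta_t) \;=\; \log p(x;\theta_t),

  so p(data) never decreases, and EM converges to a local maximum.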

  23. How to maximize the auxiliary? For the three alignments of W AY N -> W A I N with posteriors p(z | x) = 0.5, p(z' | x) = 0.3, p(z'' | x) = 0.2: just count-n-divide on the fractional data (as if running MLE on complete data). Equivalently, pretend the complete-data corpus contains 5 copies of z, 3 copies of z', and 2 copies of z''. A sketch of count-n-divide follows below.
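  Count-n-divide is just per-LHS normalization of the fractional counts; a minimal sketch (the function name is mine):

      # count-n-divide: relative-frequency (MLE) estimation from fractional
      # counts, where counts[lhs][rhs] is the fractional count of lhs -> rhs
      # (e.g. as accumulated by the E-step sketches above).
      def count_n_divide(counts):
          table = {}
          for lhs, rhss in counts.items():
              total = sum(rhss.values())              # #(LHS)
              table[lhs] = {rhs: c / total for rhs, c in rhss.items()}
          return table

      # Weighting the three alignments by 0.5 / 0.3 / 0.2 yields the same
      # table as MLE on a corpus holding 5, 3, and 2 copies of them.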
