Unsupervised Vocabulary Induction



  1. Infant Language Acquisition (Saffran et al., 1997)
     • 8-month-old babies exposed to a stream of syllables
     • Stream composed of synthetic words (pabikumalikiwabufa)
     • After only 2 minutes of exposure, infants can distinguish words from non-words (e.g., pabiku vs. kumali)

     Today: Unsupervised Vocabulary Induction
     • Vocabulary Induction from Unsegmented Text
     • Vocabulary Induction from Speech Signal
       – Sequence Alignment Algorithms

     Task: Unsupervised learning of word boundary segmentation
     • Simple: Ourenemiesareinnovativeandresourceful,andsoarewe. Theyneverstopthinkingaboutnewwaystoharmourcountryandourpeople,andneitherdowe.
     • More ambitious: induce the vocabulary directly from the speech signal

  2. Word Segmentation (Ando & Lee, 2000)
     • Key idea: for each candidate boundary, compare the frequency of the n-grams adjacent to the proposed boundary with the frequency of the n-grams that straddle it.
     • Example (N = 4): for the string ...T I N G | E V I D... with a candidate boundary at |, the two non-straddling 4-grams are "TING" and "EVID" and the three straddling 4-grams are "INGE", "NGEV", and "GEVI". Consider the 6 questions of the form "Is #(s_i) ≥ #(t_j)?", where #(x) is the number of occurrences of x in the corpus. For instance: is "TING" more frequent than "INGE"?

     Algorithm for Word Segmentation
     • s_1^n: the non-straddling n-gram just to the left of location k
     • s_2^n: the non-straddling n-gram just to the right of location k
     • t_j^n: the straddling n-gram with j characters to the right of location k
     • I≥(y, z): indicator function that is 1 when y ≥ z, and 0 otherwise
     1. Calculate the fraction of affirmative answers for each n ≤ N:
        v_n(k) = 1 / (2(n−1)) · Σ_{i=1..2} Σ_{j=1..n−1} I≥( #(s_i^n), #(t_j^n) )
     2. Average the contributions of each n-gram order:
        v_N(k) = (1/|N|) · Σ_{n∈N} v_n(k)
     (A code sketch of this procedure follows after this slide.)

     Algorithm for Word Segmentation (Cont.)
     Place a boundary at every location l such that either:
     • l is a local maximum: v_N(l) > v_N(l−1) and v_N(l) > v_N(l+1), or
     • v_N(l) ≥ t, a threshold parameter

     Experimental Framework
     • Corpus: 150 megabytes of 1993 Nikkei newswire
     • Manual annotations: 50 sequences for the development set (parameter tuning) and 50 sequences for the test set
     • Baseline algorithms: Chasen and Juman morphological analyzers (115,000 and 231,000 words)
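
The counting rule and boundary test above are compact enough to sketch in code. The following is an illustrative Python sketch, not Ando & Lee's implementation: the n-gram orders (2–4), the default threshold of 0.95, and the helper names `boundary_scores` / `segment` are assumptions made for the example.

```python
from collections import Counter

def boundary_scores(text, orders=(2, 3, 4)):
    """Vote v_N(k) for every candidate boundary position k = 1 .. len(text)-1."""
    counts = {n: Counter(text[i:i + n] for i in range(len(text) - n + 1))
              for n in orders}
    scores = []
    for k in range(1, len(text)):                       # boundary before text[k]
        votes = []
        for n in orders:
            if k < n or k + n > len(text):               # not enough context for this order
                continue
            s = [text[k - n:k], text[k:k + n]]           # non-straddling n-grams s_1, s_2
            t = [text[k - n + j:k + j] for j in range(1, n)]  # straddling n-grams t_1..t_{n-1}
            yes = sum(counts[n][si] >= counts[n][tj] for si in s for tj in t)
            votes.append(yes / (2 * (n - 1)))             # v_n(k)
        scores.append(sum(votes) / len(votes) if votes else 0.0)  # v_N(k)
    return scores

def segment(text, orders=(2, 3, 4), threshold=0.95):
    """Cut at local maxima of v_N and wherever v_N >= threshold (threshold is illustrative)."""
    v = boundary_scores(text, orders)
    cuts = []
    for i, k in enumerate(range(1, len(text))):
        local_max = 0 < i < len(v) - 1 and v[i] > v[i - 1] and v[i] > v[i + 1]
        if local_max or v[i] >= threshold:
            cuts.append(k)
    words, prev = [], 0
    for c in cuts + [len(text)]:
        words.append(text[prev:c])
        prev = c
    return words
```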

  3. Evaluation
     • Precision (P): the percentage of proposed brackets that exactly match word-level brackets in the annotation
     • Recall (R): the percentage of word-level annotation brackets that are proposed by the algorithm
     • F = 2PR / (P + R)   (a small worked example follows after this slide)
     • F = 82% (an improvement of 1.38% over Juman and of 5.39% over Chasen)

     Performance on other datasets (Cheng & Mitzenmacher)
       Orwell (English)       79.8
       Song lyrics (Romaji)   67.6
       Goethe (German)        75.2
       Verne (French)         72.9
       Arrighi (Italian)      73.1

     Today: Unsupervised Vocabulary Induction
     • Vocabulary Induction from Unsegmented Text
     • Vocabulary Induction from Speech Signal
       – Sequence Alignment Algorithms

     Aligning Two Sequences
     • Given two possibly related strings S1 and S2, find the longest common subsequence
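
To make the bracket-based scores concrete, here is a small Python sketch of the precision / recall / F computation; the span representation and the toy numbers are made up for illustration and are not from the paper.

```python
def bracket_prf(proposed, gold):
    """Precision, recall and F over word brackets, each bracket a (start, end) span."""
    proposed, gold = set(proposed), set(gold)
    correct = len(proposed & gold)
    p = correct / len(proposed)
    r = correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical toy example: gold segmentation "the|red|cat" vs. proposed "the|redcat"
gold = [(0, 3), (3, 6), (6, 9)]
proposed = [(0, 3), (3, 9)]
print(bracket_prf(proposed, gold))   # (0.5, 0.333..., 0.4)
```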

  4. How Can We Compute the Best Alignment?
     • We need a scoring system for ranking alignments
       – Substitution cost, e.g.:
               A     G     T     C
         A    +1  −0.5    −1    −1
         G  −0.5    +1    −1    −1
         T    −1    −1    +1  −0.5
         C    −1    −1  −0.5    +1
       – Gap (insertion & deletion) cost

     Key Insight: the Score Is Additive
     • Compute the best alignment recursively
     • For a given aligned pair (i, j), the best alignment is:
       best alignment of S1[1..i] and S2[1..j] + best alignment of S1[i..n] and S2[j..m]

     Can We Simply Enumerate All Possible Alignments?
     • Naive enumeration is prohibitively expensive: for sequences of lengths n and m the number of alignments is
         C(n+m, m) = (n+m)! / (n! m!), which for n = m is (2n)! / (n!)² ≈ 2^(2n) / √(πn)
       n = m     number of alignments
       10        184,756
       20        ~1.4 × 10^11
       100       ~9.0 × 10^58
       (these counts are reproduced in the snippet after this slide)
     • Alignment using dynamic programming can be done in O(n · m)

     Alignment Matrix
     • Alignment of two sequences can be modeled as the task of finding the path with the highest weight in a matrix
     • Example: aligning S2 = PAW to S1 = HEAGAWG
         H E A G A W G
         - - P - A W -
       [Figure: the corresponding path through the matrix with rows P, A, W and columns H, E, A, G, A, W, G]
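
The enumeration counts in the table can be reproduced directly from the binomial formula; a quick Python check (math.comb requires Python 3.8+):

```python
from math import comb

# Number of alignments of two length-n sequences (n = m case): C(2n, n)
for n in (10, 20, 100):
    print(n, comb(2 * n, n))
# 10  -> 184756
# 20  -> 137846528820   (~1.4e11)
# 100 -> ~9.05e58
```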

  5. Global Alignment: Needleman-Wunsch Algorithm
     • To align two strings x, y, we construct a matrix F
       – F(i, j): the score of the best alignment between the initial segment x_1..i of x (up to x_i) and the initial segment y_1..j of y (up to y_j)
     • We compute F recursively, starting from F(0, 0) = 0
     • Each cell F(i, j) is reached from F(i−1, j−1) with score s(x_i, y_j), or from F(i−1, j) or F(i, j−1) with gap penalty −d

     Dynamic Programming Formulation
     • s(x_i, y_j): similarity between x_i and y_j; d: gap penalty
     • F(i, j) = max{ F(i−1, j−1) + s(x_i, y_j),  F(i−1, j) − d,  F(i, j−1) − d }
       (a code sketch of this recurrence follows after this slide)
     • Boundary conditions:
       – Top row: F(i, 0) = −i·d  (F(i, 0) represents the alignment of a prefix of x to all gaps in y)
       – Left column: F(0, j) = −j·d

     Dynamic Programming Formulation (traceback)
     • We know how to compute the best score: it is the number in the bottom-right entry, F(n, m)
     • But we also need to remember where it came from
       – Keep a pointer to the choice made at each step
     • Retrace the path through the matrix by following the pointers
     • Time: O(m · n)

     Local Alignment: Smith-Waterman Algorithm
     • Global alignment: find the best match between the sequences from one end to the other
     • Local alignment: find the best match between subsequences of the two sequences
       – Useful for comparing highly divergent sequences when only local similarity is expected
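
A minimal Python sketch of the Needleman-Wunsch recurrence above; the scoring choices (match +1, mismatch −1, gap penalty d = 2) are arbitrary assumptions for illustration, and only the score (not the traceback) is returned.

```python
def needleman_wunsch(x, y, d=2, s=lambda a, b: 1 if a == b else -1):
    """Global alignment score via the recurrence above (score only, traceback omitted)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d                    # prefix of x aligned to gaps in y
    for j in range(1, m + 1):
        F[0][j] = -j * d                    # prefix of y aligned to gaps in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),   # align x_i with y_j
                          F[i - 1][j] - d,                           # gap in y
                          F[i][j - 1] - d)                           # gap in x
    return F[n][m]

print(needleman_wunsch("HEAGAWG", "PAW"))
```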

  6. Dynamic Programming Formulation (local alignment)
     • F(i, j) = max{ 0,  F(i−1, j−1) + s(x_i, y_j),  F(i−1, j) − d,  F(i, j−1) − d }
       (a code sketch follows after this slide)
     • Boundary conditions: F(i, 0) = F(0, j) = 0
     • Finding the best local alignment:
       – Find the highest value of F(i, j), and start the traceback from there
       – The traceback ends when a cell with value 0 is reached

     Today: Unsupervised Vocabulary Induction
     • Vocabulary Induction from Unsegmented Text
     • Vocabulary Induction from Speech Signal
       – Sequence Alignment Algorithms

     Local vs. Global Alignment (aligning PAW against HEAGAWG)
     Similarity matrix:
           H   E   A   G   A   W   G
       P  -2  -1  -1  -2  -1  -4  -2
       A  -2  -1   5   0   5  -3   0
       W  -3  -3  -3  -3  -3  15  -3
     Global alignment matrix:
            -    H    E    A    G    A    W    G
       -    0   -8  -16  -24  -32  -40  -48  -56
       P   -8   -2   -9  -17  -25  -33  -42  -49
       A  -16  -10   -3   -4  -12  -20  -28  -36
       W  -24  -18  -11   -6   -7  -15   -5  -13
     Local alignment matrix:
           -   H   E   A   G   A   W   G
       -   0   0   0   0   0   0   0   0
       P   0   0   0   0   0   0   0   0
       A   0   0   0   5   0   5   0   0
       W   0   0   0   0   2   0  20  12

     Finding Words in Speech
     • Traditional approaches to speech recognition are supervised:
       – Recognizers are trained using a large corpus of speech with corresponding transcripts
       – During the training process, a recognizer is provided with a vocabulary
     • Is it possible to learn the vocabulary directly from the speech signal?
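
And the local (Smith-Waterman) variant, differing only in the clamp at zero and where the traceback starts; a Python sketch under the same assumed toy scoring (match +1, mismatch −1, gap 2):

```python
def smith_waterman(x, y, d=2, s=lambda a, b: 1 if a == b else -1):
    """Best local alignment score: same recurrence, clamped at 0, best cell anywhere."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]    # boundary row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            best = max(best, F[i][j])            # traceback would start from this maximum
    return best

print(smith_waterman("HEAGAWG", "PAW"))          # best local match is "AW" vs "AW"
```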

  7. Vocabulary Induction: Outline

     Spectral Vectors
     • A spectral vector is a vector in which each component is a measure of energy in a particular frequency band
     • We divide the acoustic signal (a one-dimensional waveform) into short overlapping intervals (25 msec windows with 15 msec overlap)
     • We convert each overlapping window using the Fourier transform (a sketch follows after this slide)

     Comparing Acoustic Signals: Example of Spectral Vectors
     [Spectrograms, Freq (Hz) vs. Time (sec), of two utterances: "he too was diagnosed with paranoid schizophrenia" and "were willing to put nash's schizophrenia on record"]
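
A rough NumPy sketch of the windowing-plus-Fourier-transform step described above; only the 25 ms / 15 ms framing comes from the slide, while the 16 kHz sample rate and the Hamming taper are assumptions made for illustration.

```python
import numpy as np

def spectral_vectors(signal, sample_rate=16000, win_ms=25, overlap_ms=15):
    """Cut the waveform into 25 ms windows with 15 ms overlap and FFT each one."""
    win = int(sample_rate * win_ms / 1000)                   # samples per window
    hop = int(sample_rate * (win_ms - overlap_ms) / 1000)    # 10 ms step between windows
    taper = np.hamming(win)                                  # assumed taper (not on the slide)
    frames = [signal[i:i + win] * taper
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1))   # one spectral vector per frame

# e.g. one second of 16 kHz audio -> roughly 98 spectral vectors
vecs = spectral_vectors(np.random.randn(16000))
```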

  8. Comparing Spectral Vectors: Computing Local Alignment
     • Divide the acoustic signal into "word segments" based on pauses
     • Compute spectral vectors for each segment
     • Build a distance matrix for each pair of "word segments"
       – Use Euclidean distance to compare spectral vectors (a sketch follows after this slide)

     Example of Distance Matrix
     [Figure: distance matrix for a pair of word segments]

     Clustering Similar Utterances
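
The pairwise comparison step might look like the following NumPy sketch (names and shapes are illustrative): it builds the Euclidean distance matrix between the spectral vectors of two word segments, which is then searched for a low-cost local alignment.

```python
import numpy as np

def distance_matrix(seg_a, seg_b):
    """Euclidean distance between every pair of spectral vectors from two word segments.

    seg_a: (frames_a, bands) array, seg_b: (frames_b, bands) array.
    Returns a (frames_a, frames_b) matrix on which local alignment can be run.
    """
    diff = seg_a[:, None, :] - seg_b[None, :, :]      # broadcast to (frames_a, frames_b, bands)
    return np.sqrt((diff ** 2).sum(axis=-1))
```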

  9. Examples of Computed Clusters
