Unsupervised Vocabulary Induction

MIT

Today: Unsupervised Vocabulary Induction

  • Vocabulary Induction from Unsegmented Text
  • Vocabulary Induction from Speech Signal

– Sequence Alignment Algorithms

Infant Language Acquisition

(Saffran et al., 1997)

  • 8-month-old babies exposed to a stream of syllables
  • Stream composed of synthetic words
    (pabikumalikiwabufa)
  • After only 2 minutes of exposure, infants can
    distinguish words from non-words (e.g., pabiku vs. kumali)

Vocabulary Induction

Task: Unsupervised learning of word boundary segmentation

  • Simple:

Ourenemiesareinnovativeandresourceful,andsoarewe.

Theyneverstopthinkingaboutnewwaystoharmourcountryandourpeople,andneitherdowe.

  • More ambitious: a raw, unsegmented speech signal

Word Segmentation (Ando & Lee, 2000)

Key idea: for each candidate boundary, compare the frequency of the n-grams adjacent to the proposed boundary with the frequency of the n-grams that straddle it.

Example: in . . . T I N G | E V I D . . . , with a candidate boundary between TING and EVID, the non-straddling 4-grams are s1 = TING and s2 = EVID, and the straddling 4-grams are t1 = INGE, t2 = NGEV, t3 = GEVI.

For N = 4, consider the 6 questions of the form "Is #(si) ≥ #(tj)?", where #(x) is the number of occurrences of x.

Example: Is "TING" more frequent in the corpus than "INGE"?
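As an illustration, a minimal sketch of these boundary questions in Python. The toy corpus and the function name are made up for this example, not from the paper:

```python
from collections import Counter

corpus = "EVIDENTTINGEVIDTINGLE"  # toy corpus, made up for illustration
N = 4
counts = Counter(corpus[i:i + N] for i in range(len(corpus) - N + 1))

def boundary_questions(text, k, n=4):
    """For a candidate boundary after position k, return the 2*(n-1)
    questions 'Is #(s_i) >= #(t_j)?' comparing the non-straddling
    n-grams s_1, s_2 with the straddling n-grams t_1 .. t_{n-1}."""
    s = [text[k - n:k], text[k:k + n]]                # non-straddling n-grams
    t = [text[k - n + j:k + j] for j in range(1, n)]  # straddling n-grams
    return [(si, tj, counts[si] >= counts[tj]) for si in s for tj in t]

# e.g. the boundary in "TING|EVID": 6 questions for n = 4
print(boundary_questions("TINGEVID", 4))
```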

Algorithm for Word Segmentation

Notation, for a candidate location k and n-gram order n:

  • s^n_1, s^n_2: the non-straddling n-grams to the left and to the right of location k
  • t^n_j: the straddling n-gram with j characters to the right of location k
  • I≥(y, z): indicator function that is 1 when y ≥ z, and 0 otherwise

1. Calculate the fraction of affirmative answers for each n ≤ N:

   v_n(k) = (1 / (2(n − 1))) Σ_{i=1..2} Σ_{j=1..n−1} I≥(#(s^n_i), #(t^n_j))

2. Average the contributions of each n-gram order:

   V_N(k) = (1 / |N|) Σ_{n ∈ N} v_n(k)
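The two formulas translate directly into code. A minimal sketch, assuming the n-gram counts are precomputed per order; the set of orders (2, 3, 4) is an arbitrary choice for illustration:

```python
from collections import Counter

def ngram_counts(corpus, orders=(2, 3, 4)):
    """Precompute n-gram frequency tables, one per order."""
    return {n: Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))
            for n in orders}

def v_n(text, counts, k, n):
    """Fraction of affirmative answers 'Is #(s_i) >= #(t_j)?' at location k."""
    s = [text[k - n:k], text[k:k + n]]                # non-straddling n-grams
    t = [text[k - n + j:k + j] for j in range(1, n)]  # straddling n-grams
    yes = sum(counts[n][si] >= counts[n][tj] for si in s for tj in t)
    return yes / (2 * (n - 1))

def V_N(text, counts, k, orders=(2, 3, 4)):
    """Average the per-order scores v_n(k) over the set of orders N."""
    return sum(v_n(text, counts, k, n) for n in orders) / len(orders)
```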

Algorithm for Word Segmentation (Cont.)

Place a boundary at every location l such that either:

  • l is a local maximum: V_N(l) > V_N(l − 1) and V_N(l) > V_N(l + 1), or
  • V_N(l) ≥ t, a threshold parameter

[Figure: V_N(k) plotted along the sequence A B | C D | W X | Y | Z, with boundaries placed at local maxima and at locations where the score exceeds the threshold t.]
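A sketch of the boundary-placement rule; the threshold value below is a placeholder, since the paper tunes it on the development set:

```python
def place_boundaries(V, t=0.6):
    """V[l] is the averaged score V_N at candidate location l.
    A boundary goes wherever V has a local maximum or clears the threshold."""
    return [l for l in range(1, len(V) - 1)
            if (V[l] > V[l - 1] and V[l] > V[l + 1]) or V[l] >= t]
```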

Experimental Framework

  • Corpus: 150 megabytes of 1993 Nikkei newswire
  • Manual annotations: 50 sequences for development

set (parameter tuning) and 50 sequences for test set

  • Baseline algorithms: Chasen and Juman morphological analyzers (with lexicons of 115,000 and 231,000 words, respectively)


Evaluation

  • Precision (P): the percentage of proposed brackets

that exactly match word-level brackets in the annotation

  • Recall (R): the percentage of word-level annotation

brackets that are proposed by the algorithm

  • F = 2PR / (P + R)
  • F = 82% (an improvement of 1.38% over Juman and 5.39% over Chasen)
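As a quick sanity check of the F formula; the precision and recall values below are hypothetical, not from the paper:

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(f_measure(0.84, 0.80))  # ~0.82, with made-up P and R
```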

Performance on other datasets

Cheng & Mitzenmacher:

  Dataset                 Score
  Orwell (English)        79.8
  Song lyrics (Romaji)    67.6
  Goethe (German)         75.2
  Verne (French)          72.9
  Arrighi (Italian)       73.1

Today: Unsupervised Vocabulary Induction

  • Vocabulary Induction from Unsegmented Text
  • Vocabulary Induction from Speech Signal

– Sequence Alignment Algorithms

Aligning Two Sequences

Given two possibly related strings S1 and S2, find the longest common subsequence


How Can We Compute the Best Alignment?

  • We need a scoring system for ranking alignments:

    – Substitution cost, e.g. for DNA:

             A     G     T     C
      A     +1   −0.5   −1    −1
      G    −0.5   +1    −1    −1
      T     −1    −1    +1   −0.5
      C     −1    −1   −0.5   +1

    – Gap (insertion & deletion) cost
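One way to encode this scoring system, as a sketch; the gap cost value is an assumption, since the slide does not give it:

```python
# Match +1, transition (A<->G or T<->C) -0.5, transversion -1.
SUB = {(a, b): (1.0 if a == b
                else -0.5 if {a, b} in ({"A", "G"}, {"T", "C"})
                else -1.0)
       for a in "AGTC" for b in "AGTC"}
GAP = 1.0  # gap (insertion/deletion) cost d; the value here is assumed
```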

Can We Simply Enumerate All Possible Alignments?

  • Naive enumeration is prohibitively expensive. The number of possible alignments is

      C(n + m, m) = (n + m)! / (n! m!)

    which for n = m becomes (2m)! / (m!)² ≈ 2^{2m} / √(πm):

      n = m    Enumeration
      10       184,756
      20       1.4E+11
      100      9.00E+58

  • Alignment using dynamic programming can be done in O(n · m)
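The table is easy to reproduce with an exact binomial, a quick check:

```python
from math import comb, pi, sqrt

for m in (10, 20, 100):
    exact = comb(2 * m, m)                # (2m)! / (m!)^2
    approx = 2 ** (2 * m) / sqrt(pi * m)  # Stirling-style approximation
    print(m, f"{exact:.2e}", f"{approx:.2e}")
# 184,756 for m = 10, ~1.4e11 for m = 20, ~9.0e58 for m = 100
```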

Key Insight: Score is Additive

Compute best alignment recursively

  • For a given aligned pair (i, j), the best alignment is:

Best alignment of S1[1 . . . i] and S2[1 . . . j] + Best alignment of S1[i . . . n] and S2[j . . . m]

Alignment Matrix

Alignment of two sequences can be modeled as the task of finding the path with the highest weight in a matrix.

[Figure: matrix with columns H E A G A W G and rows P A W; the alignment of PAW with HEAGAWG corresponds to a path through this matrix.]


Global Alignment: Needleman-Wunsch Algorithm

  • To align two strings x, y, we construct a matrix F

– F(i, j): the score of the best alignment between the initial segment x1...i of x up to xi and the initial segment y1...j of y up to yj

  • We compute F recursively, starting from F(0, 0) = 0: cell F(i, j) is reached from F(i − 1, j − 1) with score s(xi, yj), or from F(i − 1, j) or F(i, j − 1) with gap penalty −d

Dynamic Programming Formulation

s(xi, yj): similarity between xi and yj; d: gap penalty

F(i, j) = max { F(i − 1, j − 1) + s(xi, yj),  F(i − 1, j) − d,  F(i, j − 1) − d }

Boundary conditions:

  • The top row: F(i, 0) = −i·d
    (F(i, 0) represents alignments of a prefix of x to all gaps in y)
  • The left column: F(0, j) = −j·d

Dynamic Programming Formulation

  • We know how to compute the best score

– The number at the bottom right entry (i.e., F(n, m))

  • But we need to remember where it came from

– Pointer to the choice we made at each step

  • Retrace the path through the matrix
    – Need to remember all the pointers

Time: O(m · n)
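Putting the recurrence, boundary conditions, and traceback together; a minimal sketch, not the lecture's reference code (s is any substitution function, d the gap penalty):

```python
def needleman_wunsch(x, y, s, d):
    """Global alignment. Fills F with pointers, then retraces the path
    from the bottom-right corner F(n, m)."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                       # left column: gaps in y
        F[i][0], ptr[i][0] = -i * d, "up"
    for j in range(1, m + 1):                       # top row: gaps in x
        F[0][j], ptr[0][j] = -j * d, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cand = [(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]), "diag"),
                    (F[i - 1][j] - d, "up"),
                    (F[i][j - 1] - d, "left")]
            F[i][j], ptr[i][j] = max(cand)          # keep score and pointer
    i, j, ax, ay = n, m, [], []                     # traceback
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

Run on the HEAGAWG / PAW example with d = 8 and the similarity scores from the "Local vs. Global Alignment" slide below, this should reproduce the global matrix shown there.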

Local Alignment: Smith-Waterman Algorithm

  • Global alignment: find the best match between the two sequences from one end to the other
  • Local alignment: find the best match between subsequences of the two sequences
    – Useful for comparing highly divergent sequences when only local similarity is expected


Dynamic Programming Formulation

F(i, j) = max { 0,  F(i − 1, j − 1) + s(xi, yj),  F(i − 1, j) − d,  F(i, j − 1) − d }

Boundary conditions: F(i, 0) = F(0, j) = 0

Finding the best local alignment:

  • Find the highest value of F(i, j), and start the traceback from there
  • The traceback ends when a cell with value 0 is found
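The change from the global recurrence is small; a sketch, with the traceback itself omitted for brevity:

```python
def smith_waterman(x, y, s, d):
    """Local alignment: same recurrence as Needleman-Wunsch, but the max
    includes 0 and the boundary cells are 0. Returns the matrix, the best
    score, and the cell where the traceback should start."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, start = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0.0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, start = F[i][j], (i, j)
    return F, best, start  # trace back from `start`, stop at a 0 cell
```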

Local vs. Global Alignment

Similarity matrix:

          H    E    A    G    A    W    G
  P      −2   −1   −1   −2   −1   −4   −2
  A      −2   −1    5    0    5   −3    0
  W      −3   −3   −3   −3   −3   15   −3

Global alignment (gap penalty d = 8):

           H    E    A    G    A    W    G
       0   −8  −16  −24  −32  −40  −48  −56
  P   −8   −2   −9  −17  −25  −33  −42  −49
  A  −16  −10   −3   −4  −12  −20  −28  −36
  W  −24  −18  −11   −6   −7  −15   −5  −13

Local alignment (same scores; all other cells are 0):

          H    E    A    G    A    W    G
  P       0    0    0    0    0    0    0
  A       0    0    5    0    5    0    0
  W       0    0    0    2    0   20   12

The best local alignment starts at the highest cell (20) and aligns the subsequence AW with AW.

Today: Unsupervised Vocabulary Induction

  • Vocabulary Induction from Unsegmented Text
  • Vocabulary Induction from Speech Signal

– Sequence Alignment Algorithms

Finding Words in Speech

  • Traditional approaches to speech recognition are supervised:
    – Recognizers are trained using a large corpus of speech with corresponding transcripts
    – During the training process, a recognizer is provided with a vocabulary
  • Is it possible to learn a vocabulary directly from the speech signal?


Vocabulary Induction: Outline

Comparing Acoustic Signals

Spectral Vectors

  • A spectral vector is a vector in which each component measures the energy in a particular frequency band
  • We divide the acoustic signal (a one-dimensional waveform) into short overlapping intervals (25 msec windows with 15 msec overlap)
  • We convert each overlapping window using the Fourier transform (a sketch follows)
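A minimal sketch of this framing-plus-FFT step using NumPy; the function name and defaults are illustrative assumptions (a 10 msec shift gives the 15 msec overlap from the slide):

```python
import numpy as np

def spectral_vectors(signal, rate, win=0.025, shift=0.010):
    """Cut a 1-D waveform into overlapping windows (25 msec windows,
    10 msec shift = 15 msec overlap) and take the magnitude of the
    Fourier transform of each window."""
    w, h = int(win * rate), int(shift * rate)
    frames = [signal[i:i + w] for i in range(0, len(signal) - w + 1, h)]
    return np.array([np.abs(np.fft.rfft(f)) for f in frames])
```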

Example of Spectral Vectors

[Figure: spectrograms (frequency 0–6000 Hz vs. time in seconds) of two utterances: "were willing to put Nash's schizophrenia on record" and "he too was diagnosed with paranoid schizophrenia".]


Comparing Spectral Vectors

  • Divide the acoustic signal into "word segments" based on pauses
  • Compute spectral vectors for each segment
  • Build a distance matrix for each pair of "word segments"
    – use the Euclidean distance to compare spectral vectors (a sketch follows)
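The pairwise distance computation, as a sketch with NumPy broadcasting; the names and shapes are assumptions:

```python
import numpy as np

def distance_matrix(a, b):
    """Euclidean distance between every pair of spectral vectors from two
    word segments a (shape [Ta, bands]) and b (shape [Tb, bands])."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```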

Example of Distance Matrix

Computing Local Alignment

Clustering Similar Utterances


Examples of Computed Clusters