Unsupervised Vocabulary Induction
MIT
Today: Unsupervised Vocabulary Induction
- Vocabulary Induction from Unsegmented Text
- Vocabulary Induction from Speech Signal
– Sequence Alignment Algorithms
Infant Language Acquisition
(Saffran et al., 1997)
- 8-month-old infants are exposed to a stream of syllables
- Stream composed of synthetic words
(pabikumalikiwabufa)
- After only 2 minutes of exposure, infants can
distinguish words from non-words (e.g., pabiku vs. kumali)
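One common reading of this result is that infants track how predictably one syllable follows another: transitions inside a word are highly predictable, while transitions across a word boundary are not. The slide does not spell out a mechanism, so the following is only a minimal sketch under that assumption. The three "words" are inferred from the example stream above, and the word order, the function names, and the 0.75 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical three-syllable "words", inferred from the example stream on the slide.
WORDS = [("pa", "bi", "ku"), ("ma", "li", "ki"), ("wa", "bu", "fa")]


def make_stream(order):
    """Concatenate words (given by index) into one flat syllable stream."""
    return [syl for i in order for syl in WORDS[i]]


def transitional_probs(stream):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(stream, tp, threshold=0.75):
    """Start a new word wherever the transitional probability falls below threshold."""
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:  # unpredictable transition: assume a boundary
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Assumed word order: no word repeats immediately, and each word is
# followed by both of the other words equally often.
order = [0, 1, 2, 0, 2, 1] * 20
stream = make_stream(order)
tp = transitional_probs(stream)
induced = segment(stream, tp)
vocabulary = sorted(set(induced))

print(vocabulary)        # ['maliki', 'pabiku', 'wabufa']
print(tp[("pa", "bi")])  # 1.0 (inside a word)
print(tp[("ku", "ma")])  # 0.5 (across a word boundary)
```

In this toy stream the non-word kumali straddles the low-probability transition ku to ma, which is one way to account for it sounding less word-like than pabiku. A fixed threshold works here only because the stream is noise-free; real data would likely need something more robust, such as cutting at local minima.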
Vocabulary Induction
Task: Unsupervised learning of word boundary segmentation
- Simple:
Ourenemiesareinnovativeandresourceful,andsoarewe.
Theyneverstopthinkingaboutnewwaystoharmourcountryandourpeople,andneitherdowe.
- More ambitious: