Lecture 18
Natural Language Processing
Marco Chiarandini
Department of Mathematics & Computer Science University of Southern Denmark
Slides by Dan Klein at Berkeley
Recap Speech Recognition Machine Translation
◮ Games and adversarial search
◮ Minimax search and alpha-beta pruning
◮ Multiagent search
◮ Knowledge representation and reasoning
◮ Propositional logic
◮ First-order logic
◮ Inference
◮ Planning
◮ State trellis: graph of states and transitions over time
◮ Each arc represents some transition x_{t−1} → x_t
◮ Each arc has weight Pr(x_t | x_{t−1}) Pr(e_t | x_t)
◮ Each path is a sequence of states
◮ The product of weights on a path is the sequence's probability
◮ Can think of the Forward (and now Viterbi) algorithms as computing sums of path weights (Forward) or the single best path (Viterbi) in the trellis
◮ 100,000 years ago humans started to speak
◮ 7,000 years ago humans started to write
◮ acquire information
◮ communicate with humans
◮ Speech technologies
  ◮ Automatic speech recognition (ASR)
  ◮ Text-to-speech synthesis (TTS)
  ◮ Dialog systems
◮ Language processing technologies
  ◮ Machine translation
  ◮ Information extraction
  ◮ Web search, question answering
  ◮ Text classification, spam filtering, etc.
◮ Pr(E | X) encodes which acoustic vectors are appropriate for each sound
◮ Pr(X | X′) encodes how sounds can be strung together
◮ We will have one state for each sound in each word
◮ From some state x, we can only:
  ◮ Stay in the same state (e.g. speaking slowly)
  ◮ Move to the next position in the word
  ◮ At the end of the word, move to the start of the next word
◮ We build a little state graph for each word and chain them together to form the state space X
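The per-word state graphs and their chaining can be sketched as follows. The lexicon, phone symbols, and the uniform stay/advance probability `p_stay` are all hypothetical, chosen only to show the stay / advance / next-word structure.

```python
def build_state_graph(lexicon, p_stay=0.5):
    """lexicon maps words to phone lists; returns (states, transitions).

    One state per sound per word. From state (word, i) you can:
    stay put, advance within the word, or, at the word's last sound,
    jump to the start of any word in the lexicon.
    """
    states = [(word, i) for word, phones in lexicon.items()
              for i in range(len(phones))]
    word_starts = [(word, 0) for word in lexicon]
    trans = {}
    for word, phones in lexicon.items():
        for i in range(len(phones)):
            arcs = {(word, i): p_stay}          # stay (speaking slowly)
            if i + 1 < len(phones):
                arcs[(word, i + 1)] = 1 - p_stay  # next position in the word
            else:
                # word end: move to the start of some next word
                for s in word_starts:
                    arcs[s] = (1 - p_stay) / len(word_starts)
            trans[(word, i)] = arcs
    return states, trans

# Hypothetical two-word lexicon with made-up phone sequences
lexicon = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
states, trans = build_state_graph(lexicon)
```

In a real recognizer the jump into the next word would be weighted by a language model rather than uniformly, but the graph shape is the same.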
◮ While there are some practical issues, finding the words given the acoustics is an HMM inference problem
◮ We want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}
◮ From the sequence x, we can simply read off the words
◮ Fundamental goal: analyze and process human language, broadly, robustly, and accurately
◮ End systems that we want to build:
◮ A formal language is defined as a set of strings generated by rules called a grammar.
◮ Formal languages also need semantics that define meaning.
◮ Natural languages:
◮ n-gram: a sequence of n characters, or of n words or syllables
◮ n-gram models: define probability distributions over these sequences
◮ An n-gram model is a Markov chain of order n − 1
◮ With an alphabet of 100 characters the model's table already has millions of entries; with words it is even worse
◮ Corpus: a body of text
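A character bigram model (a Markov chain of order n − 1 = 1) can be estimated by maximum likelihood in a few lines; the tiny corpus string here is made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Estimate Pr(next char | previous char) by relative frequency."""
    counts = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        counts[prev][cur] += 1
    return {prev: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for prev, ctr in counts.items()}

model = train_bigram("the theme then")  # toy corpus
# e.g. model["t"]["h"] is the estimated Pr(h | t)
```

The table-size blow-up above is visible here: a character model conditions on one of ~100 symbols, while a word bigram model would need a row per vocabulary word, and higher-order models multiply that again.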
◮ Interlingual model: the source language, i.e. the text to be translated, is analyzed into a language-independent representation (an interlingua), from which the target text is generated
◮ Transfer model: the source language is transformed into an abstract, less language-specific representation, which is then mapped to a corresponding representation of the target language
◮ Direct model: words are translated directly, without passing through an intermediate representation
◮ e: sequence of strings (words) in English
◮ f: sequence of strings in French
◮ Pr(e | f ) is learned from a bilingual (parallel) corpus made of phrases seen in aligned sentence pairs
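The noisy-channel decoding this sets up can be sketched as below: by Bayes' rule, maximizing Pr(e | f) over candidate English sentences is the same as maximizing Pr(f | e) · Pr(e), since Pr(f) is constant over candidates. The two probability tables and the French input are toy values invented for the example.

```python
# Toy language model Pr(e) and translation model Pr(f | e) (made-up numbers)
lm = {"the dog": 0.6, "dog the": 0.1}
tm = {("le chien", "the dog"): 0.5, ("le chien", "dog the"): 0.5}

def decode(f, candidates):
    """Pick argmax_e Pr(f | e) * Pr(e) among the candidate sentences."""
    return max(candidates, key=lambda e: tm.get((f, e), 0.0) * lm.get(e, 0.0))

best = decode("le chien", ["the dog", "dog the"])
```

Here the translation model cannot distinguish the two word orders, so the language model breaks the tie, which is exactly the division of labor the noisy-channel factorization is designed for.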
◮ context-free grammars (see
◮ probabilistic context-free grammars
◮ lexicalized probabilistic context-free grammars
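A probabilistic context-free grammar attaches a probability to each rule (the rules for a given left-hand side summing to 1), and a parse tree's probability is the product of the rules it uses. The grammar and sentence below are a made-up toy example.

```python
# Toy PCFG: rule (lhs, rhs) -> probability; probabilities per lhs sum to 1
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("John",)): 0.5,
    ("NP", ("Mary",)): 0.5,
    ("VP", ("sleeps",)): 1.0,
}

def tree_prob(tree):
    """tree = (label, children); leaf children are plain strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)  # multiply in the subtree's rule probabilities
    return p

t = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
```

A parser then looks for the highest-probability tree over all derivations of the sentence; lexicalized PCFGs refine this by conditioning rule probabilities on head words.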
◮ Translate text from one language to another
◮ Recombines fragments of example translations
◮ Challenges:
  ◮ What fragments? [learning to translate]
  ◮ How to make it efficient? [fast translation search]
◮ After a first bubble, the sector is now at full speed
◮ In spite of the economic crisis, 7% growth worldwide
◮ Commercial and technological focus
◮ Danish is a marginal language and existing systems cannot be applied
◮ www.eicom.dk and www.oversaetterhuset.dk search development in
◮ Visual Interactive Syntax Learning project at the Institute of Language and Communication, University of Southern Denmark
◮ Eckhard Bick, project leader