

Slide 1

Lecture 18

Natural Language Processing

Marco Chiarandini

Department of Mathematics & Computer Science, University of Southern Denmark

Slides by Dan Klein at Berkeley

Slide 2

Course Overview

✔ Introduction
  ✔ Artificial Intelligence
  ✔ Intelligent Agents
✔ Search
  ✔ Uninformed Search
  ✔ Heuristic Search
✔ Uncertain knowledge and Reasoning
  ✔ Probability and Bayesian approach
  ✔ Bayesian Networks
  ✔ Hidden Markov Chains
  ✔ Kalman Filters
✔ Learning
  ✔ Supervised: Decision Trees, Neural Networks, Learning Bayesian Networks
  ✔ Unsupervised: EM Algorithm
  ✔ Reinforcement Learning
◮ Games and Adversarial Search
  ◮ Minimax search and Alpha-beta pruning
  ◮ Multiagent search
◮ Knowledge representation and Reasoning
  ◮ Propositional logic
  ◮ First order logic
  ◮ Inference
  ◮ Planning

Slide 3

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 4

Recap: Sequential data

Slide 5

Recap: Filtering

Slide 6

Recap: State Trellis

◮ State trellis: graph of states and transitions over time
◮ Each arc represents some transition xt−1 → xt
◮ Each arc has weight Pr(xt | xt−1) Pr(et | xt)
◮ Each path is a sequence of states
◮ The product of the weights on a path is that sequence's probability
◮ Can think of the Forward (and now Viterbi) algorithms as computing sums over all paths (resp. best paths) in this graph
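
A minimal NumPy sketch of Viterbi over such a trellis, maximizing the product of the arc weights Pr(xt | xt−1) Pr(et | xt) (the array layout and the function name are illustrative assumptions, not from the slides):

```python
import numpy as np

def viterbi(init, trans, emit, observations):
    """Most likely state sequence of an HMM given observations.

    init[j]     = Pr(x_1 = j)
    trans[i, j] = Pr(x_t = j | x_{t-1} = i)
    emit[j, o]  = Pr(e_t = o | x_t = j)
    """
    T, S = len(observations), len(init)
    delta = np.zeros((T, S))            # best probability of a path ending in state j at time t
    back = np.zeros((T, S), dtype=int)  # backpointer to the best predecessor
    delta[0] = init * emit[:, observations[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * Pr(j | i) * Pr(e_t | j)
        scores = delta[t - 1][:, None] * trans * emit[:, observations[t]]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]    # best final state
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Replacing max/argmax with a sum over predecessors turns this into the Forward algorithm mentioned above.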

Slide 7

Recap: Forward/Viterbi

Slide 8

Recap: Particle Filtering

Particles: track samples of states rather than an explicit distribution
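
A minimal sketch of one propagate–weight–resample step (the helper names transition_sample and obs_weight stand for the HMM's transition and sensor models and are illustrative assumptions):

```python
import random

def particle_filter_step(particles, transition_sample, obs_weight, evidence):
    """One elapse-time + observe step with resampling.

    particles         : list of sampled states
    transition_sample : x -> x' drawn from Pr(X' | x)
    obs_weight        : (e, x') -> Pr(e | x')
    """
    moved = [transition_sample(x) for x in particles]            # 1. propagate
    weights = [obs_weight(evidence, x) for x in moved]           # 2. weight by evidence
    return random.choices(moved, weights=weights, k=len(moved))  # 3. resample
```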

Slide 9

Natural Language

◮ Humans started to speak about 100,000 years ago
◮ Humans started to write about 7,000 years ago

Machines process natural language to:

◮ acquire information
◮ communicate with humans

Slide 10

Natural Language Processing

◮ Speech technologies
  ◮ Automatic speech recognition (ASR)
  ◮ Text-to-speech synthesis (TTS)
  ◮ Dialog systems
◮ Language processing technologies
  ◮ Machine translation
  ◮ Information extraction
  ◮ Web search, question answering
  ◮ Text classification, spam filtering, etc.

Slide 11

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 12

Digitizing Speech

Speech input is an acoustic waveform

Slide 13

Spectral Analysis

Slide 14

Acoustic Feature Sequence

Slide 15

State Space

◮ Pr(E | X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
◮ Pr(X | X′) encodes how sounds can be strung together
◮ We will have one state for each sound in each word
◮ From some state x, we can only:
  ◮ stay in the same state (e.g. speaking slowly)
  ◮ move to the next position in the word
  ◮ at the end of the word, move to the start of the next word
◮ We build a little state graph for each word and chain them together to form our state space X (see the sketch below)
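
A hedged sketch of such a per-word state graph as an adjacency structure (the phoneme encoding, the EXIT marker linking to the next word, and the self-loop probability p_stay are illustrative assumptions):

```python
def word_state_graph(phonemes, p_stay=0.5):
    """Left-to-right HMM states for one word: every phoneme state may
    stay in place (self-loop) or advance; the last state exits toward
    the start of the next word. Returns {state: [(successor, prob), ...]}.
    """
    graph = {}
    for i, ph in enumerate(phonemes):
        stay = ((i, ph), p_stay)
        advance = ("EXIT" if i == len(phonemes) - 1 else (i + 1, phonemes[i + 1]),
                   1.0 - p_stay)
        graph[(i, ph)] = [stay, advance]
    return graph

# e.g. a word pronounced with the phoneme sequence [n iy d]
print(word_state_graph(["n", "iy", "d"]))
```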

Slide 16

HMM for speech

Slide 17

Transition with Bigrams

Slide 18

Decoding

◮ While there are some practical issues, finding the words given the acoustics is an HMM inference problem
◮ We want to know which state sequence x1:T is most likely given the evidence e1:T:

  x*1:T = argmax_{x1:T} Pr(x1:T | e1:T)

◮ From the sequence x, we can simply read off the words

Slide 19

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 20

Machine Translation

◮ Fundamental goal: analyze and process human language, broadly, robustly, accurately...
◮ End systems that we want to build:
  ◮ Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering...
  ◮ Modest: spelling correction, text categorization, language recognition, genre classification.

Slide 21

Language Models

◮ A language is defined by a set of strings and by rules, called a grammar, for generating them.
◮ Formal languages also need semantics that define meaning.
◮ Natural languages are:

  • 1. not definitive: there is disagreement about the grammar rules
       "Not to be invited is sad" vs. "To be not invited is sad"
  • 2. ambiguous:
       "Entire store 25% off"
       "I will bring my bike tomorrow if it looks nice in the morning."

  • 3. large and constantly changing

Slide 22

◮ n-gram: a sequence of n characters (or of n words, syllables, ...)
◮ n-gram models: probability distributions over these sequences
◮ An n-gram model is a Markov chain of order n − 1. For a character trigram model:

  Pr(ci | c1:i−1) = Pr(ci | ci−2:i−1)

  Pr(c1:N) = ∏_{i=1}^{N} Pr(ci | c1:i−1) = ∏_{i=1}^{N} Pr(ci | ci−2:i−1)

◮ With an alphabet of 100 characters, the trigram table already has millions of entries
◮ With words instead of characters it is far worse
◮ Corpus: a body of text
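
The conditional probabilities of such a trigram model can be estimated by counting over a corpus. A minimal sketch (the function name and the unsmoothed maximum-likelihood estimate are illustrative assumptions, not from the slides):

```python
from collections import Counter, defaultdict

def train_char_trigram(text):
    """Estimate Pr(c_i | c_{i-2:i-1}) from trigram counts in `text`."""
    counts = defaultdict(Counter)
    for a, b, c in zip(text, text[1:], text[2:]):
        counts[a + b][c] += 1            # context "ab" followed by c
    return {ctx: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for ctx, ctr in counts.items()}

model = train_char_trigram("the theory of the thing")
print(model["th"])   # distribution over the characters that follow "th"
```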

Slide 23

Language identification

For each candidate language l, learn from a corpus: Pr(ci | ci−2:i−1, l)

Most probable language:

  l* = argmax_l Pr(l | c1:N)
     = argmax_l Pr(l) Pr(c1:N | l)                         (Bayes' rule)
     = argmax_l Pr(l) ∏_{i=1}^{N} Pr(ci | ci−2:i−1, l)     (Markov property)

Character-trigram models can reach about 99% accuracy on this task.
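
In log space the argmax becomes a sum; a hedged sketch reusing trigram tables of the form built above (the names models and priors and the smoothing floor for unseen trigrams are assumptions):

```python
import math

def best_language(text, models, priors, floor=1e-7):
    """argmax_l  log Pr(l) + sum_i log Pr(c_i | c_{i-2:i-1}, l).

    models[l] : two-character context -> distribution over the next character
    priors[l] : Pr(l)
    floor     : stand-in probability for unseen trigrams (smoothing)
    """
    def score(l):
        s = math.log(priors[l])
        for a, b, c in zip(text, text[1:], text[2:]):
            s += math.log(models[l].get(a + b, {}).get(c, floor))
        return s
    return max(models, key=score)
```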

Slide 24

Machine Translation

◮ Rough translation: gives the main point but contains errors
◮ Pre-edited translation: the original text is written in a constrained language that is easier to translate automatically
◮ Restricted-source translation: fully automatic, but only for narrow technical content, e.g. weather forecasts

Slide 25

Machine Translation Systems

Very much simplified, there are three types of machine translation:

Statistical machine translation (SMT): learns relational dependencies of features such as n-grams, lemmas, etc.
  • Requires large data sets
  • Relatively easy to implement
  • Example: Google Translate

Rule-based machine translation (RBMT): uses grammatical rules and language constructions to analyze syntax and semantics
  • Uses moderate-size data sets
  • Long development time; requires expertise

Hybrid machine translation: either constructs a translation with RBMT and uses SMT to post-process and optimize the result, or uses grammatical rules to derive further features that are then fed into the statistical learner
  • A new direction of research

Slide 26

Brief History

Slide 27

◮ Interlingual model: the source language, i.e. the text to be translated, is transformed into an interlingua, an abstract language-independent representation. The target language is then generated from the interlingua.

◮ Transfer model: the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform the source-language representation into an abstract target-language representation, and from this the target sentence is generated.

◮ Direct model: words are translated directly without passing through an additional representation.

Slide 28

Levels of Transfer

Vauquois pyramid, from the base (words) to the apex (interlingua):

  English Words: John loves Mary  ↔  French Words: Jean aime Marie
  English Syntax: S(NP(John), VP(loves, NP(Mary)))  ↔  French Syntax: S(NP(Jean), VP(aime, NP(Marie)))
  English Semantics: Loves(John, Mary)  ↔  French Semantics: Aime(Jean, Marie)
  Interlingua Semantics: Attraction(NamedJohn, NamedMary, High)

Slide 29

Levels of Transfer

Slide 30

The problem with dictionary look ups

Slide 31

Statistical machine translation

Data-driven MT

Slide 32

◮ e: sequence of strings in English
◮ f: sequence of strings in French

  f* = argmax_f Pr(f | e) = argmax_f Pr(e | f) Pr(f)

◮ Pr(e | f) is learned from a bilingual (parallel) corpus made of phrases seen before
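
Once the two model scores exist, this noisy-channel argmax is a one-liner over a candidate set; a minimal sketch (all names are illustrative assumptions, and real decoders search the candidate space rather than enumerate it):

```python
def noisy_channel_best(candidates, p_e_given_f, p_f):
    """f* = argmax_f Pr(e | f) Pr(f) over explicit candidates.

    p_e_given_f : translation model, f -> Pr(e | f) for the fixed input e
    p_f         : language model, f -> Pr(f)
    """
    return max(candidates, key=lambda f: p_e_given_f(f) * p_f(f))
```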

Slide 33

[Figure: the English sentence "There is a smelly wumpus sleeping in 2 2" segmented into phrases e1, ..., e5, aligned with the French "Il y a un wumpus qui dort malodorant à 2 2" segmented into f1, ..., f5, with the French phrases appearing in the order f1 f3 f2 f4 f5 and distortions d1 = 0, d2 = +1, d3 = −2, d4 = +1, d5 = 0]

Given an English sentence e, find the French sentence f*:

  • 1. break English e into phrases e1, . . . , en
  • 2. for each ei choose a French phrase fi with probability Pr(fi | ei)
  • 3. choose a permutation of the phrases f1, . . . , fn: for each fi choose the distortion di, the number of words that phrase fi has moved with respect to fi−1

  Pr(f, d | e) = ∏_{i=1}^{n} Pr(fi | ei) Pr(di)

With 100 candidate French phrases per English phrase, a sentence broken into 5 phrases has 100^5 possible phrase choices and 5! reorderings.
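
Scoring one candidate segmentation and alignment under this model is a straight product; a minimal sketch (the lookup tables p_phrase and p_distortion are assumed to have been estimated from an aligned corpus):

```python
def phrase_model_prob(pairs, distortions, p_phrase, p_distortion):
    """Pr(f, d | e) = prod_i Pr(f_i | e_i) * Pr(d_i).

    pairs       : [(e_1, f_1), ..., (e_n, f_n)] phrase pairs
    distortions : [d_1, ..., d_n] word offsets of each French phrase
    """
    prob = 1.0
    for (e_i, f_i), d_i in zip(pairs, distortions):
        prob *= p_phrase[(e_i, f_i)] * p_distortion[d_i]
    return prob
```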

Slide 34

Learn probabilities

  • 1. Collect a parallel corpus: parliamentary debates, web pages
  • 2. Segment the texts into sentences: periods are good indicators, with some care
  • 3. Align sentences: the lengths of the sentences are one indicator, landmark words another (see the toy sketch below)
  • 4. Align phrases within sentences: an iterative process that aggregates the evidence that no other pair appears as frequently in the corpus; this yields Pr(fi | ei)
  • 5. Extract distortions: count how often each distortion appears in the corpus after phrase alignment (with smoothing)
  • 6. Improve the estimates of Pr(f | e) and Pr(d) with EM.
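
The slides keep the sentence-alignment step abstract; as a toy illustration of step 3 in the spirit of length-based alignment (the word-count proxy and the constant c, an assumed average French/English length ratio, are illustrative assumptions):

```python
def length_alignment_score(e_sents, f_sents, c=1.1):
    """Score a 1-1 sentence alignment by length compatibility: French is
    assumed to run about c times as long as English on average."""
    score = 0.0
    for e, f in zip(e_sents, f_sents):
        ratio = len(f.split()) / (c * max(len(e.split()), 1))
        score -= abs(ratio - 1.0)   # a perfect length match costs nothing
    return score
```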

Slide 35

Learning to translate

Slide 36

An HMM model

Slide 37

Machine translation systems

Slide 38

Grammars

Grammar: a set of rules (rewriting left-hand sides into right-hand sides) that describe how to form strings from the language's alphabet that are valid according to the language's syntax (a language generator).

Parsing is the process of recognizing a string of a natural language by breaking it down into a set of symbols and analyzing each one against the grammar of the language, i.e., determining whether the string belongs to the language or is grammatically incorrect. The result is a parse tree.

◮ context-free grammars (see http://en.wikipedia.org/wiki/Chomsky_hierarchy)
◮ probabilistic context-free grammars
◮ lexicalized probabilistic context-free grammars

A toy probabilistic parser is sketched below.
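
A hedged sketch of parsing with a probabilistic context-free grammar in Chomsky normal form, via the CYK algorithm (the toy grammar and the chart representation are illustrative assumptions, not from the slides):

```python
from collections import defaultdict

def cyk_pcfg(words, lexicon, rules):
    """Most probable constituents for a PCFG in Chomsky normal form.

    lexicon : {(POS, word): prob}   unary rules POS -> word
    rules   : {(A, B, C): prob}     binary rules A -> B C
    Returns a chart {(i, j, symbol): (prob, backpointer)}.
    """
    n = len(words)
    chart = defaultdict(lambda: (0.0, None))
    for i, w in enumerate(words):                       # fill in the words
        for (pos, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, pos)] = (p, w)
    for span in range(2, n + 1):                        # combine shorter spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in rules.items():
                    cand = p * chart[(i, k, b)][0] * chart[(k, j, c)][0]
                    if cand > chart[(i, j, a)][0]:
                        chart[(i, j, a)] = (cand, (k, b, c))
    return chart

# Toy grammar: S -> NP VP, VP -> V NP
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexicon = {("NP", "Jean"): 0.5, ("NP", "Marie"): 0.5, ("V", "aime"): 1.0}
print(cyk_pcfg(["Jean", "aime", "Marie"], lexicon, rules)[(0, 3, "S")])
```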

Slide 39

Parsing as search

Slide 40

Probabilistic Context Free Grammars

Slide 41

Hybrid Systems

The translated sentence can be checked against a monolingual corpus.

Slide 42

Machine Translation

◮ Translate text from one language to another
◮ Recombine fragments of example translations
◮ Challenges:
  ◮ Which fragments? [learning to translate]
  ◮ How to make it efficient? [fast translation search]

Slide 43

Machine Translation

◮ After a first bubble, the sector is now moving at full speed
◮ In spite of the economic crisis, 7% growth worldwide
◮ Commercial and technological focus
◮ Danish is a marginal language and existing systems cannot be applied reliably
◮ www.eicom.dk and www.oversaetterhuset.dk pursue development in collaboration with research institutions (SDU, CBS, ASB)

Slide 44

Announcement

Need for human resources; possibilities for thesis work and individual study activities together with:

◮ the Visual Interactive Syntax Learning project at the Institute for Language and Communication of SDU: http://beta.visl.sdu.dk/constraint_grammar.html
◮ Eckhard Bick, project leader: http://en.wikipedia.org/wiki/Eckhard_Bick

If interested, contact me.
