SLIDE 1

Computational Models of Language Learning
MoL Guest Lecture

Jelle Zuidema
Institute for Logic, Language and Computation, U. of Amsterdam
MSc Brain & Cognitive Science, Artificial Intelligence, Logic

SLIDE 2

Plan for today

• Introduction: Grammars in cognitive science and language technology
• What kind of grammars do we need? A quick intro to probabilistic grammars
• How do we learn them? A quick intro to statistical inference
• Efficiency
• Accuracy
SLIDE 3

2;5  *CHI: seen one those .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a bandana .
3;0  *CHI: I never seen a monkey train .
3;0  *CHI: I never seen a tree dance .
3;2  *CHI: I never seen a duck like that # riding@o on a pony .
3;2  *CHI: I never seen (a)bout dat [: that] .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen a Sky_Dart .
3;5  *CHI: I never seen this before .
3;8  *CHI: yeah # I seen carpenters too .
3;8  *CHI: where had you seen carpenters do that ?
3;8  *CHI: I never seen her .
3;8  *CHI: I never seen people wear de [: the] fish flies .
3;8  *CHI: where have you seen a whale ?
3;8  *CHI: I never seen a bird talk .
3;11 *CHI: I never seen a kangaroo knit .
3;11 *CHI: I never seen dat [: that] to play .
3;11 *CHI: I never seen a dog play a piano # have you ?
3;11 *CHI: I never seen a rhinoceros eat with a hands .
4;7  *CHI: I seen one in the store some days .

SLIDE 4

Grammar in child language
(Adam, 3;11.01)

MacWhinney et al. (1983); Sagae et al. (2007); Borensztajn, Zuidema & Bod (CogSci, 2008)

SLIDE 5

Grammar in NLP applications

• E.g., speech recognition (homophone confusions):
  – please, right this down
  – write now
  – who's write, and who's wrong
• E.g., anaphora resolution:
  – Mary didn't know who John was married to. He told her, and it turned out, she already knew her.
• E.g., machine translation
SLIDE 6

[figure: example from Steedman, 2008, Computational Linguistics]

SLIDE 7

SLIDE 8

SLIDE 9

Learning grammars from data

• Syntactically annotated corpora
  – Penn WSJ Treebank train set: 38k sentences, ~1M words
  – Tübingen spoken/written English/German treebanks
  – Corpus Gesproken Nederlands
• Unannotated corpora
  – the web ...
  – Google's n-gram corpora

SLIDE 10

"Spam"
[n-gram frequency plot from www.culturomics.org; Penn WSJ: 0 counts]

SLIDE 11

"Kick the bucket"
[n-gram frequency plot from www.culturomics.org; Penn WSJ: 0 counts]

SLIDE 12

"... know but were afraid to ..."
[n-gram frequency plot from www.culturomics.org; Penn WSJ: 0 counts]

SLIDE 13

Probabilistic Grammar Paradigm

• Generative models define the process by which sentences are generated, and assign probabilities to sentences.
• Statistical inference lets us search through the space of possible generative models.
• Empirical evaluation against a manually written 'gold standard' allows us to more or less objectively compare different models.

SLIDE 14

A very brief tour of generative models

SLIDE 15

Sequences: e.g., Hidden Markov Model
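As a concrete illustration, here is a minimal sketch of an HMM as a generative model over word sequences, with the forward algorithm computing the probability of a sentence. The tag set, vocabulary, and all probabilities are invented for this example (a real model would be estimated from a tagged corpus), and for simplicity there is no explicit stop probability.

# Hidden states are part-of-speech-like tags; all numbers are toy values.
states = ["DT", "NN", "VB"]

start = {"DT": 0.8, "NN": 0.1, "VB": 0.1}   # P(first state)
trans = {                                   # P(next state | state)
    "DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
emit = {                                    # P(word | state)
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"screen": 0.4, "sea": 0.3, "watch": 0.3},
    "VB": {"was": 0.6, "seen": 0.4},
}

def forward(words):
    """Total probability of a word sequence, summing over all state paths."""
    # Initialise with the start distribution times the first emission.
    alpha = {s: start[s] * emit[s].get(words[0], 0.0) for s in states}
    for w in words[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(w, 0.0)
            for s in states
        }
    return sum(alpha.values())

print(forward(["the", "screen", "was", "a", "sea"]))  # a small positive probability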

SLIDE 16

Syntax: e.g., Probabilistic Context-Free Grammars
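To make "generative" concrete: a PCFG defines a top-down sampling process over sentences, and each derivation gets the product of its rule probabilities. Below is a minimal sketch with an invented toy grammar (not the lecture's grammar); probabilities for each left-hand side sum to one.

import random

pcfg = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["DT", "N"], 0.7), (["NP", "PP"], 0.3)],
    "VP": [(["V", "NP"], 0.6), (["VP", "PP"], 0.4)],
    "PP": [(["P", "NP"], 1.0)],
    "DT": [(["the"], 0.6), (["a"], 0.4)],
    "N":  [(["screen"], 0.5), (["sea"], 0.5)],
    "V":  [(["was"], 1.0)],
    "P":  [(["of"], 1.0)],
}

def generate(symbol="S"):
    """Expand a symbol top-down; return the words and the derivation probability."""
    if symbol not in pcfg:                       # terminal: a word
        return [symbol], 1.0
    expansions = pcfg[symbol]
    rhs, p = random.choices(expansions, weights=[q for _, q in expansions])[0]
    words, prob = [], p
    for sym in rhs:
        w, q = generate(sym)
        words.extend(w)
        prob *= q
    return words, prob

words, prob = generate()
print(" ".join(words), prob)   # e.g. "the screen was a sea" with its probability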

SLIDE 17

SLIDE 18

Semantics: e.g., Discourse Representation Structure

• "It is not clear"
  [figure: DRS for this sentence, annotated with negation, present tense, agent, anaphor resolution]

SLIDE 19

SLIDE 20

Semantics: e.g., Discourse Representation Structure
(Le & Zuidema, 2012, COLING)

SLIDE 21

A very brief tour of statistical learning

SLIDE 22

Bayes' Rule

P(G|D) = P(D|G) · P(G) / P(D)

SLIDE 23

Bayes' Rule

P(G|D) = P(D|G) · P(G) / P(D)
posterior = likelihood · prior / probability of the data

SLIDE 24

Bayes' Rule

P(G|D) = P(D|G) · P(G) / P(D)
posterior = likelihood · prior / probability of the data
[figure: prior, likelihood and posterior each plotted as a function of the grammar G]
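To make the rule concrete, here is a toy numeric instance over two hypothetical candidate grammars G1 and G2; the prior and likelihood values are invented for illustration only.

# P(G): suppose G1 is simpler and so preferred a priori.
priors = {"G1": 0.7, "G2": 0.3}
# P(D|G): suppose G2 fits the observed data D better.
likelihoods = {"G1": 0.001, "G2": 0.01}

# P(D) = sum over grammars of P(D|G) * P(G)  (the normalisation constant)
p_data = sum(likelihoods[g] * priors[g] for g in priors)

posterior = {g: likelihoods[g] * priors[g] / p_data for g in priors}
print(posterior)  # {'G1': 0.189..., 'G2': 0.810...}: the data overturn the prior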

SLIDE 25

Statistical inference

[figure: the posterior P(G|D) as a function over the space of grammars G]

SLIDE 26

Statistical inference

P(G|D) = P(D|G) · P(G) / P(D)
[figure: the likelihood P(D|G) and the posterior P(G|D) over the space of grammars G]


SLIDE 28

Statistical inference

[diagram: the generative model defines P(D|G); Bayesian inversion yields P(G|D)]

SLIDE 29

Stochastic hillclimbing

[figure, animated over slides 29-34: a search that repeatedly moves to nearby grammars, climbing the posterior P(G|D)]
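A minimal sketch of stochastic hillclimbing follows. The one-dimensional objective is an invented stand-in for the posterior P(G|D); in grammar learning, a hypothesis g would be a grammar and neighbour(g) a small change to it, such as adding, removing or reweighting a rule.

import math
import random

def posterior(g):
    """Toy two-peaked objective standing in for P(G|D)."""
    return math.exp(-(g - 2.0) ** 2) + 0.6 * math.exp(-(g + 1.5) ** 2)

def neighbour(g):
    """Propose a small random change to the current hypothesis."""
    return g + random.gauss(0.0, 0.1)

def hillclimb(g, steps=10000):
    for _ in range(steps):
        candidate = neighbour(g)
        if posterior(candidate) > posterior(g):   # greedy: accept only improvements
            g = candidate
    return g

random.seed(0)
print(hillclimb(-3.0))  # climbs to the nearby peak at -1.5, not the global peak at 2.0

Because acceptance is greedy, a search started on the wrong side of a valley stops at a lower peak, which is exactly the situation on the next slide.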

SLIDE 35

Local optimum

[figure: the search stuck on a lower peak of the posterior P(G|D)]

SLIDE 36

Statistical inference

[diagram, repeated on slides 36-38: the generative model defines P(D|G); Bayesian inversion yields P(G|D)]

SLIDE 39

MAP: Maximum A Posteriori

G* = argmax_G P(G|D)

[diagram: the generative model defines P(D|G); Bayesian inversion yields P(G|D)]

SLIDE 40

Maximum likelihood

G* = argmax_G P(D|G)

SLIDE 41

Learning a grammar

• Choose a generative model
  – HMM, PCFG, PTSG, PTAG, …
• Choose an objective function
  – Maximum Likelihood, Bayesian, …
• Choose an optimization strategy
  – Stochastic hillclimbing
• Choose a dataset
  – Penn WSJ treebank
• Find the generative model that maximizes the objective function on the dataset! (A maximum-likelihood sketch follows below.)
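For the supervised case, the maximum-likelihood estimate of a PCFG from a treebank is simply relative frequency: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs). Below is a minimal sketch; the two toy trees stand in for real treebank data (e.g. Penn WSJ) and are invented for illustration. Trees are nested tuples (label, child, child, ...).

from collections import Counter

trees = [
    ("S", ("NP", ("DT", "the"), ("N", "screen")),
          ("VP", ("V", "was"), ("NP", ("DT", "a"), ("N", "sea")))),
    ("S", ("NP", ("N", "Mary")),
          ("VP", ("V", "slept"))),
]

rule_counts = Counter()
lhs_counts = Counter()

def count_rules(tree):
    """Recursively count one rule per internal node."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c)

for t in trees:
    count_rules(t)

# Maximum-likelihood estimate: relative frequency of each rule.
pcfg = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")   # e.g. NP -> DT N  0.67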

SLIDE 42

Two major issues for research

• Efficiency: How can we optimize our objective functions, given exponentially many grammars that assign exponentially many analyses to sentences? (See the inside-algorithm sketch below.)
• Accuracy: Which combination of generative models, objective functions and efficiency heuristics actually works best?

Does it work in practice?
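The standard answer to the efficiency question is dynamic programming: although a sentence can have exponentially many analyses, the inside algorithm (the credits below mention Mark Johnson's Inside-Outside slides) sums over all of them in O(n^3) time for a PCFG in Chomsky normal form. A minimal sketch with an invented toy grammar:

from collections import defaultdict

binary = {  # P(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "N")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical = {  # P(A -> word)
    ("DT", "the"): 0.6, ("DT", "a"): 0.4,
    ("N", "screen"): 0.5, ("N", "sea"): 0.5,
    ("V", "was"): 1.0,
}

def inside(words):
    n = len(words)
    # chart[(i, j)][A] = total probability that A yields words[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][A] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for (A, (B, C)), p in binary.items():
                    chart[(i, j)][A] += p * chart[(i, k)][B] * chart[(k, j)][C]
    return chart[(0, n)]["S"]

print(inside("the screen was a sea".split()))   # P(sentence) under the toy PCFG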

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46

SLIDE 47

Evaluation

[figure: a treebank parse]

SLIDE 48

Evaluation

[figure: treebank parse of "The screen was a sea of red"]

SLIDE 49

Evaluation

[figure: the treebank parse of "The screen was a sea of red" next to an unsupervised parse obtained by grammar induction/parsing]


SLIDE 51

Evaluation

• Precision (correctness): the fraction of constituents in the unsupervised parse that are also in the treebank parse;
• Recall (completeness): the fraction of constituents in the treebank parse that are also in the unsupervised parse;
• F-score: their harmonic mean, i.e. F = 2·P·R / (P+R);
• Labels are usually ignored.
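These metrics are easy to compute once parses are represented as sets of (start, end) spans. A minimal sketch follows; the example spans are loosely modelled on the slide's sentence but invented for illustration.

def evaluate(gold_spans, test_spans):
    """Unlabelled precision, recall and F-score of a test parse against gold."""
    matched = len(gold_spans & test_spans)
    precision = matched / len(test_spans)
    recall = matched / len(gold_spans)
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_score

# Toy constituent spans for "The screen was a sea of red" (word positions 0..6).
gold = {(0, 7), (0, 2), (2, 7), (3, 7), (3, 5), (5, 7)}   # treebank parse
test = {(0, 7), (0, 2), (2, 7), (3, 5), (4, 7)}           # unsupervised parse

p, r, f = evaluate(gold, test)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")   # P=0.80 R=0.67 F=0.73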


SLIDE 57

SLIDE 58

Learning more

• 'Speech and Language Processing', Jurafsky & Martin (2009, 2nd ed.)
• Take the course 'Unsupervised Language Learning', in February/March 2012
• Contact me at: zuidema@uva.nl

Credits:
• Slides on PCFGs and Inside-Outside by Mark Johnson
• Figure to illustrate precision/recall by Dan Klein