Computational Models of Language Learning
Jelle Zuidema
Institute for Logic, Language and Computation, University of Amsterdam
MSc Brain & Cognitive Science, Artificial Intelligence, Logic
MoL Guest Lecture
Plan for today
- Introduction: Grammars in cognitive science and language technology
- What kind of grammars do we need? A quick intro to probabilistic grammars
- How do we learn them? A quick intro to statistical inference
- Efficiency
- Accuracy
2;5 *CHI: seen one those .
3;0 *CHI: I never seen a watch .
3;0 *CHI: I never seen a bandana .
3;0 *CHI: I never seen a monkey train .
3;0 *CHI: I never seen a tree dance .
3;2 *CHI: I never seen a duck like that # riding@o on a pony .
3;2 *CHI: I never seen (a)bout dat [: that] .
3;5 *CHI: I never seen this jet .
3;5 *CHI: I never seen a Sky_Dart .
3;5 *CHI: I never seen this before .
3;8 *CHI: yeah # I seen carpenters too .
3;8 *CHI: where had you seen carpenters do that ?
3;8 *CHI: I never seen her .
3;8 *CHI: I never seen people wear de [: the] fish flies .
3;8 *CHI: where have you seen a whale ?
3;8 *CHI: I never seen a bird talk .
3;11 *CHI: I never seen a kangaroo knit .
3;11 *CHI: I never seen dat [: that] to play .
3;11 *CHI: I never seen a dog play a piano # have you ?
3;11 *CHI: I never seen a rhinoceros eat with a hands .
4;7 *CHI: I seen one in the store some days .
Adam, 3;11.01
Grammar in child language
MacWhinney et al. (1983); Sagae et al. (2007); Borensztajn, Zuidema & Bod (CogSci, 2008)
Grammar in NLP applications
- E.g., speech recognition
  – please, right this down
  – write now
  – who's write, and who's wrong
- E.g., anaphora resolution
  – Mary didn't know who John was married to. He told her, and it turned out, she already knew her.
- E.g., machine translation
[Figure: Steedman, 2008, CL]
Learning grammars from data
- Syntactically annotated corpora
  – Penn WSJ Treebank training set: 38k sentences, ~1M words
  – Tübingen spoken/written English/German
  – Corpus Gesproken Nederlands (Spoken Dutch Corpus)
- Unannotated corpora
  – the web ...
  – Google's n-gram corpora
[Figure: n-gram frequency over time for "Spam" (www.culturomics.org); Penn WSJ: 0 counts]
[Figure: n-gram frequency over time for "kick the bucket" (www.culturomics.org); Penn WSJ: 0 counts]
[Figure: n-gram frequency over time for "... know but were afraid to ..." (www.culturomics.org); Penn WSJ: 0 counts]
Probabilistic Grammar Paradigm
- Generative models define the process by which sentences are generated, and assign probabilities to sentences.
- Statistical inference lets us search through the space of possible generative models.
- Empirical evaluation against a manually written 'gold standard' allows us to more-or-less objectively compare different models.
A very brief tour of generative models
Sequences: e.g., Hidden Markov Model
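To make this concrete, here is a minimal sketch (not from the slides) of an HMM as a generative model for tag/word sequences; the states, words, and all probabilities are invented toy values.

```python
import random

# Toy HMM: hidden states are POS tags, observations are words.
# All probabilities below are invented for illustration.
transitions = {            # P(next_tag | tag); 'START' begins, 'END' halts
    "START": {"DT": 0.8, "PRP": 0.2},
    "DT":    {"NN": 1.0},
    "NN":    {"VBD": 0.7, "END": 0.3},
    "PRP":   {"VBD": 1.0},
    "VBD":   {"DT": 0.5, "END": 0.5},
}
emissions = {              # P(word | tag)
    "DT":  {"the": 0.7, "a": 0.3},
    "NN":  {"dog": 0.5, "screen": 0.5},
    "PRP": {"she": 1.0},
    "VBD": {"saw": 0.6, "was": 0.4},
}

def sample(dist):
    """Draw a key from a {key: prob} dictionary."""
    r, total = random.random(), 0.0
    for key, p in dist.items():
        total += p
        if r < total:
            return key
    return key  # guard against floating-point rounding

def generate():
    """Generate one (tags, words) pair from the HMM."""
    tag, tags, words = sample(transitions["START"]), [], []
    while tag != "END":
        tags.append(tag)
        words.append(sample(emissions[tag]))
        tag = sample(transitions[tag])
    return tags, words

tags, words = generate()
print(" ".join(words), "/", " ".join(tags))  # e.g. "the dog saw a screen / DT NN VBD DT NN"
```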
Syntax: e.g., Probabilistic Context-Free Grammars
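Analogously, a PCFG generates a sentence top-down by repeatedly sampling rewrite rules. A minimal sketch with an invented toy grammar (all rule probabilities made up for illustration):

```python
import random

# Toy PCFG; rule probabilities (invented) must sum to 1 per left-hand
# side.  Anything without an entry in `rules` is a terminal word.
rules = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["DT", "NN"], 0.7), (["NP", "PP"], 0.3)],
    "VP":  [(["VBD", "NP"], 0.6), (["VBD"], 0.4)],
    "PP":  [(["IN", "NP"], 1.0)],
    "DT":  [(["the"], 1.0)],
    "NN":  [(["screen"], 0.5), (["sea"], 0.5)],
    "VBD": [(["was"], 1.0)],
    "IN":  [(["of"], 1.0)],
}

def expand(symbol, p=1.0):
    """Recursively expand a symbol; return (list_of_words, derivation_prob)."""
    if symbol not in rules:                   # terminal: emit the word
        return [symbol], p
    r, total = random.random(), 0.0
    for rhs, prob in rules[symbol]:           # sample a rule for this symbol
        total += prob
        if r < total:
            break
    p *= prob                                 # derivation prob = product of rule probs
    words = []
    for child in rhs:
        child_words, p = expand(child, p)
        words += child_words
    return words, p

words, p = expand("S")
print(" ".join(words), f"(derivation probability {p:.4f})")
```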
Semantics: e.g., Discourse Representation Structure

- "It is not clear"

[Figure: DRS for "It is not clear", marking negation, present tense, the agent, and anaphor resolution]
Semantics: e.g., Discourse Representation Structure

(Le & Zuidema, 2012, COLING)
A very brief tour of statistical learning
Bayes' Rule

P(G|D) = P(D|G) * P(G) / P(D)

posterior = likelihood * prior / probability of data

Here G ranges over candidate grammars and D is the observed data.
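As a concrete (entirely invented) illustration: with a finite set of hypothetical candidate grammars, Bayesian inversion is a few lines of arithmetic.

```python
# Bayes' rule over a hypothetical finite set of candidate grammars.
# P(G|D) = P(D|G) * P(G) / P(D), with P(D) = sum_G P(D|G) * P(G).
# Likelihood and prior values below are invented for illustration.
likelihood = {"G1": 1e-6, "G2": 4e-5, "G3": 2e-5}   # P(D|G)
prior      = {"G1": 0.5,  "G2": 0.2,  "G3": 0.3}    # P(G)

evidence = sum(likelihood[g] * prior[g] for g in prior)          # P(D)
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}

for g, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({g}|D) = {p:.3f}")
# The MAP grammar is the one with the highest posterior; the maximum
# likelihood grammar the one with the highest P(D|G): here G2 for both.
```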
Statistical inference

The generative model defines P(D|G), the probability of the data given a grammar; Bayesian inversion turns this into the posterior P(G|D) over the space of grammars.

[Figure: the posterior P(G|D) as a landscape over grammars G]
Stochastic hillclimbing

[Figure: animation of a search that repeatedly proposes small changes to G and climbs the posterior landscape P(G|D)]
Local optimum

[Figure: hillclimbing can get stuck on a local peak of P(G|D)]
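A minimal sketch of the idea, using a one-dimensional stand-in for grammar space and an invented two-peaked score in place of P(G|D); it also shows the local-optimum problem.

```python
import random

# Stochastic hillclimbing, sketched on a 1-D stand-in for grammar space.
# score(g) plays the role of the (unnormalized) posterior P(G|D); the
# two-peaked function below is invented to expose the local-optimum problem.
def score(g):
    return max(0.0, 2 - abs(g - 2)) + max(0.0, 4 - abs(g - 9))

def hillclimb(g, steps=1000, step_size=0.5):
    for _ in range(steps):
        proposal = g + random.uniform(-step_size, step_size)  # small random change
        if score(proposal) > score(g):                        # accept only improvements
            g = proposal
    return g

random.seed(0)
g = hillclimb(0.0)
print(f"ended at g={g:.2f}, score={score(g):.2f}")
# Starting near the smaller peak (around g=2), pure hillclimbing with small
# steps cannot cross the valley to the higher peak near g=9: a local optimum.
```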
MAP

Maximum a posteriori: choose the grammar G that maximizes the posterior P(G|D).
Maximum likelihood

Choose the grammar G that maximizes the likelihood P(D|G).
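In the supervised case, maximizing P(D|G) for a PCFG has a closed-form solution: set each rule's probability to its relative frequency in the treebank. A minimal sketch with invented toy trees:

```python
from collections import Counter

# Supervised maximum-likelihood estimation for a PCFG: the grammar that
# maximizes P(D|G) sets each rule probability to its relative frequency
# in the treebank.  The two trees below are invented toy data.
# Trees as nested tuples: (label, child, child, ...); leaves are words.
treebank = [
    ("S", ("NP", ("DT", "the"), ("NN", "screen")),
          ("VP", ("VBD", "was"), ("NP", ("DT", "a"), ("NN", "sea")))),
    ("S", ("NP", ("NN", "Mary")),
          ("VP", ("VBD", "slept"))),
]

counts = Counter()       # counts[(lhs, rhs)]
lhs_totals = Counter()   # counts[lhs]

def count_rules(tree):
    if isinstance(tree, str):          # a word, not a rule
        return
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(lhs, rhs)] += 1
    lhs_totals[lhs] += 1
    for child in children:
        count_rules(child)

for tree in treebank:
    count_rules(tree)

for (lhs, rhs), n in sorted(counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}   p = {n}/{lhs_totals[lhs]} = {n/lhs_totals[lhs]:.2f}")
```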
Learning a grammar
- Choose a generative model
  – HMM, PCFG, PTSG, PTAG, …
- Choose an objective function
  – Maximum Likelihood, Bayesian, …
- Choose an optimization strategy
  – Stochastic hillclimbing
- Choose a dataset
  – Penn WSJ treebank
- Find the generative model that maximizes the objective function on the dataset! (A tiny end-to-end sketch follows below.)
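Putting the recipe together on a deliberately tiny, invented example: here the "grammar" is the single parameter of a biased-coin model, so all four choices and the optimization fit in a few lines.

```python
import math, random

# The four choices of the recipe, instantiated on a tiny invented example:
# model = Bernoulli(p), objective = likelihood, optimizer = stochastic
# hillclimbing, dataset = a list of coin flips.
data = [1, 1, 0, 1, 1, 1, 0, 1]                     # 4. dataset

def log_likelihood(p, data):                        # 2. objective: max likelihood
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

def learn(data, steps=2000):                        # 3. optimizer: hillclimbing
    p = 0.5                                         # 1. model: Bernoulli(p)
    for _ in range(steps):
        q = min(0.999, max(0.001, p + random.uniform(-0.05, 0.05)))
        if log_likelihood(q, data) > log_likelihood(p, data):
            p = q
    return p

random.seed(0)
print(f"learned p = {learn(data):.2f}   (the ML estimate is 6/8 = 0.75)")
```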
Two major issues for research
- Efficiency: How can we optimize our objective functions given exponentially many grammars that assign exponentially many analyses to sentences?
- Accuracy: Which combination of generative models, objective functions and efficiency heuristics actually works best?
Does it work in practice?
Evaluation

Example sentence: The screen was a sea of red
- Treebank parse (the gold standard)
- Unsupervised parse (produced by grammar induction/parsing)
Evaluation

- Precision (correctness): the proportion of constituents in the unsupervised parse that are also in the treebank parse;
- Recall (completeness): the proportion of constituents in the treebank parse that are also in the unsupervised parse;
- F-score: the harmonic mean of the two, i.e. F = 2*P*R / (P+R);
- Labels usually ignored.
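A minimal scoring sketch, treating constituents as (start, end) word spans and ignoring labels; the example spans are invented for the sentence above.

```python
# PARSEVAL-style scoring sketch: constituents as (start, end) word spans,
# labels ignored.  The spans below are invented for "the screen was a sea
# of red" (words 0..6); gold = treebank parse, test = unsupervised parse.
gold = {(0, 7), (0, 2), (2, 7), (3, 7), (3, 5), (5, 7)}
test = {(0, 7), (0, 2), (2, 4), (3, 5), (5, 7)}

matched = gold & test
precision = len(matched) / len(test)   # correctness of proposed constituents
recall    = len(matched) / len(gold)   # completeness w.r.t. the treebank
f_score   = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"P = {precision:.2f}, R = {recall:.2f}, F = {f_score:.2f}")
# P = 0.80, R = 0.67, F = 0.73
```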
Learning more
- 'Speech and Language Processing', Jurafsky & Martin (2009, 2nd ed.)
- Take the course 'Unsupervised Language Learning' in February/March 2012
- Contact me at: zuidema@uva.nl
Credits:
- Slides on PCFGs and Inside-Outside by Mark Johnson
- Figure to illustrate precision/recall by Dan Klein