Language Modeling
- Prof. Sameer Singh
CS 295: STATISTICAL NLP WINTER 2017
January 24, 2017
Based on slides from Dan Jurafsky, Noah Smith, and everyone else they copied from.
Outline
Wrapup Word Embeddings
Introduction to Language Models
A bottle of tezguino is on the table. u v
Variations
Uses
Probability of a Sentence
Probability of the Next Word
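The two quantities are tied together by the chain rule of probability. This is a standard identity rather than something the slide text spells out, and the symbols w_1 ... w_n (the words of a sentence of length n) are notation assumed here:

```latex
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

So a model that can score the next word given its history can also score a whole sentence, and vice versa.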
“eyes awe of an” OR “I saw a van”
http://www.cedar.buffalo.edu/handwriting/HRoverview.html
The office is about fifteen minuets from my house
P(about fifteen minutes from) >> P(about fifteen minuets from)
Summarization
Question Answering
Dialog Systems
Best choice: Extrinsic
2nd choice: Intrinsic
P(“I do not like green eggs and ham”)
P(w | “I do not like green eggs and ”)
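As a rough illustration of how both quantities could be estimated from counts, here is a minimal sketch that assumes a bigram (first-order Markov) approximation, which the slide text itself does not specify. The toy corpus, the `<s>`/`</s>` boundary markers, and all function names are invented for this example.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration (not from the slides).
corpus = [
    "I am Sam",
    "Sam I am",
    "I do not like green eggs and ham",
]

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers


def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def sentence_prob(counts, sentence):
    """Product of bigram probabilities, including the boundary markers."""
    tokens = [BOS] + sentence.split() + [EOS]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= next_word_prob(counts, prev, word)
    return prob


counts = train_bigram(corpus)
p_next = next_word_prob(counts, "and", "ham")
p_sentence = sentence_prob(counts, "I do not like green eggs and ham")
print(p_next, p_sentence)
```

With a corpus this small the estimates are not meaningful; the point is only that the sentence probability is assembled from next-word probabilities.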
“The computer which I had just put into the dining room on the fifth floor crashed.”
“The computer which I had just put into the dining room on the fifth floor had lunch.”
Use Logs
Filter out n-grams
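The "Use Logs" point can be sketched as follows: multiplying many small probabilities underflows in floating point, while adding their logarithms does not. The per-word probability and the text length below are hypothetical values chosen only to trigger the underflow.

```python
import math

# Hypothetical per-word conditional probabilities for a 200-word text.
word_probs = [1e-5] * 200

# Naive product: falls below the smallest representable double and becomes 0.0.
naive = 1.0
for p in word_probs:
    naive *= p

# Sum of log-probabilities: stays well within floating-point range.
log_prob = sum(math.log(p) for p in word_probs)

print(naive, log_prob)
```

Working in log space also turns products into sums, so sentence scores can be compared by comparing their log-probabilities directly.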
New words
Rare words/combinations
Misspellings
Training set:
… denied the allegations
… denied the reports
… denied the claims
… denied the request
Test set:
… denied the offer
… denied the loan
P(“offer” | denied the) = 0
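A minimal sketch of why the estimate comes out as zero, assuming plain maximum-likelihood (relative-frequency) estimation over the four training continuations listed on the slide; the function name `mle` is made up for this example.

```python
from collections import Counter

# Words seen after "denied the" in the slide's training set.
train_continuations = ["allegations", "reports", "claims", "request"]
counts = Counter(train_continuations)
total = sum(counts.values())


def mle(word):
    """Relative frequency of `word` after "denied the" (0 if never seen)."""
    return counts[word] / total


p_reports = mle("reports")  # seen once out of four
p_offer = mle("offer")      # never seen in training
print(p_reports, p_offer)
```

Any sentence containing "denied the offer" would then receive probability zero overall, no matter how plausible the rest of it is.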
When we have sparse statistics:
P(w | denied the)
3 allegations
2 reports
1 claims
1 request
7 total

Steal probability mass to generalize better:
P(w | denied the)
2.5 allegations
1.5 reports
0.5 claims
0.5 request
2 other
7 total
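The adjusted counts on the slide are consistent with subtracting a fixed 0.5 from every seen count and handing the freed mass to unseen words. The sketch below assumes that reading (a form of absolute discounting); the slide text does not name the method, and the constant `DISCOUNT` is inferred from the numbers rather than stated.

```python
from collections import Counter

# Counts of words following "denied the", as given on the slide.
counts = Counter({"allegations": 3, "reports": 2, "claims": 1, "request": 1})
total = sum(counts.values())  # 7

DISCOUNT = 0.5  # assumed: the value that reproduces the slide's adjusted counts

# Take a fixed amount from every seen word...
discounted = {word: count - DISCOUNT for word, count in counts.items()}
# ...and reserve what was taken for words never seen after "denied the".
reserved = DISCOUNT * len(counts)

probs = {word: count / total for word, count in discounted.items()}
p_other = reserved / total
print(discounted, reserved, p_other)
```

The total stays at 7, so the distribution still sums to one; how the reserved mass is split among individual unseen words (such as "attack" or "man") is a separate modeling choice not shown here.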
Backoff
Interpolation
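The slide text gives only the two names, so the following is a hedged sketch of one common reading of each, not necessarily the formulation used in the lecture. Interpolation always mixes a higher-order and a lower-order estimate; backoff uses the lower-order estimate only when the higher-order one is zero. The backoff shown is a simplified, unnormalized variant in the style of "stupid backoff" (it returns scores, not true probabilities). The weights `lam` and `alpha` and the input values are hypothetical.

```python
def interpolate(p_bigram, p_unigram, lam=0.7):
    """Linear interpolation: lam * P(w | prev) + (1 - lam) * P(w)."""
    return lam * p_bigram + (1.0 - lam) * p_unigram


def backoff(p_bigram, p_unigram, alpha=0.4):
    """Use the bigram estimate if it is nonzero, else a scaled unigram score."""
    return p_bigram if p_bigram > 0 else alpha * p_unigram


# Hypothetical estimates for a word never seen after its context.
p_bigram, p_unigram = 0.0, 0.01
score_interp = interpolate(p_bigram, p_unigram)
score_backoff = backoff(p_bigram, p_unigram)
print(score_interp, score_backoff)
```

In both cases an unseen bigram no longer gets a zero score, because the unigram estimate contributes.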
Homework
Project