Language Modeling (Prof. Sameer Singh, CS 295: STATISTICAL NLP)

SLIDE 1

Language Modeling

  • Prof. Sameer Singh

CS 295: STATISTICAL NLP WINTER 2017

January 24, 2017

Based on slides from Dan Jurafsky, Noah Smith, and everyone else they copied from.

SLIDE 2

Outline

CS 295: STATISTICAL NLP (WINTER 2017) 2

  • Wrap-up: Word Embeddings
  • Introduction to Language Models
  • N-Gram Based Language Models
  • Smoothing Language Models

SLIDE 3

Outline

  • Wrap-up: Word Embeddings
  • Introduction to Language Models
  • N-Gram Based Language Models
  • Smoothing Language Models

SLIDE 4

Predict surrounding words

A bottle of tezguino is on the table.

(Figure: word vectors labeled u and v.)
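
A minimal sketch of how skip-gram turns this sentence into training examples, assuming a symmetric context window of 2 words (an illustrative choice; `skipgram_pairs` is a hypothetical helper, not part of any library):

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each word is paired with every
    neighbour at most `window` positions to its left or right."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "A bottle of tezguino is on the table".lower().split()
for center, context in skipgram_pairs(sentence, window=2):
    if center == "tezguino":
        print(center, "->", context)
```

For "tezguino" this prints its four neighbours (bottle, of, is, on). Skip-gram learns to predict each context word from the center word; CBOW reverses the direction and predicts the center word from the bag of its context words.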

SLIDE 5

Negative Sampling
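
Negative sampling replaces the full softmax over the vocabulary with a handful of binary decisions: the true (center, context) pair should score high, and k randomly drawn "noise" words should score low. A sketch of that loss in plain Python, assuming center-word vectors are called v and context-word vectors u (our naming), and drawing noise words from the unigram distribution raised to the 3/4 power as word2vec does; the counts and vectors below are made up:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(v_center, u_context, u_negatives):
    """Loss for one (center, context) pair with k sampled negatives:
    -[log σ(u_o·v_c) + Σ_k log σ(-u_k·v_c)]."""
    loss = -math.log(sigmoid(dot(u_context, v_center)))   # reward a high score for the true context
    for u_neg in u_negatives:
        loss -= math.log(sigmoid(-dot(u_neg, v_center)))   # reward low scores for noise words
    return loss


def sample_negatives(unigram_counts, k, rng):
    """Draw k noise words from the unigram distribution raised to the 3/4 power."""
    words = list(unigram_counts)
    weights = [unigram_counts[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)


rng = random.Random(0)
counts = {"the": 50, "table": 5, "bottle": 3, "tezguino": 1}   # hypothetical counts
print(sample_negatives(counts, k=3, rng=rng))
print(neg_sampling_loss([1.0, 0.0], [2.0, 0.0], [[-1.0, 0.5]]))
```

This `sigmoid` is the naive formula and can overflow for large negative inputs; a real implementation would use a numerically stable version and update the vectors by gradient descent on this loss. The sampler also does not exclude the true context word.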

SLIDE 6

Neural View of Embeddings

SLIDE 7

Word embeddings

Variations

  • Skip-gram: predict context from word
  • CBOW: predict word from context bag of words
  • Dependencies: a better description of context

Uses

  • Similarity
  • Grammar
  • Analogies
  • Gender
  • Facts
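
Both the Similarity and Analogies uses rest on cosine similarity between vectors. A sketch with hand-made 3-dimensional vectors (hypothetical values chosen so the arithmetic is easy to follow, not learned embeddings); `analogy` is an illustrative helper:

```python
import math


def cosine(a, b):
    """Cosine similarity: the usual similarity measure between embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(vectors, a, b, c):
    """Solve 'a is to b as c is to ?' by the nearest word (by cosine) to b - a + c,
    excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made vectors with dimensions [royalty, female, person]; learned embeddings
# typically have hundreds of dimensions with no such readable axes.
vectors = {
    "man":   [0.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.1, 0.0, 0.0],
}
print(analogy(vectors, "man", "king", "woman"))   # queen
```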

SLIDE 8

Outline

  • Wrap-up: Word Embeddings
  • Introduction to Language Models
  • N-Gram Based Language Models
  • Smoothing Language Models

SLIDE 9

Language Models

Probability of a Sentence

  • Is a given sentence something you would expect to see?
  • Syntactically (grammar) and Semantically (meaning)

Probability of the Next Word

  • Predict what comes next for a given sequence of words.
  • Think of it as V-way classification, where V is the vocabulary size

SLIDE 10

Task: Speech Recognition

“eyes awe of an” OR “I saw a van”

SLIDE 11

Task: Machine Translation

SLIDE 12

Task: Handwriting Recognition

http://www.cedar.buffalo.edu/handwriting/HRoverview.html

SLIDE 13

Task: Image Captioning

SLIDE 14

Task: Spelling Correction

The office is about fifteen minuets from my house

P(about fifteen minutes from) >> P(about fifteen minuets from)

SLIDE 15

Other Applications

  • Summarization
  • Question Answering
  • Dialog Systems

SLIDE 16

Evaluating Language Models

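  • Best choice: Extrinsic (plug the LM into a downstream task, such as speech recognition or machine translation, and measure task performance)
  • 2nd choice: Intrinsic (measure how well the LM predicts held-out text, e.g. its perplexity)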

SLIDE 17

Perplexity
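
Perplexity is the inverse probability of the test set, normalized by the number of tokens; lower is better. A minimal sketch, assuming the model's per-token conditional probabilities have already been computed (how they were computed, and whether </s> counts as a token, are left open here):

```python
import math


def perplexity(token_probs):
    """Perplexity of a test sequence, given the model's conditional probability
    P(w_i | history) for each of its N tokens:
    PP = P(w_1..w_N) ** (-1/N) = exp(-(1/N) * sum_i log P(w_i | history))."""
    n = len(token_probs)
    log_likelihood = sum(math.log(p) for p in token_probs)  # math.log(0) raises ValueError
    return math.exp(-log_likelihood / n)


# A model that is uniform over a 10-word vocabulary assigns 0.1 to every token,
# so its perplexity is 10 (up to floating-point rounding).
print(perplexity([0.1] * 20))
```

A zero probability makes `math.log` raise `ValueError`: perplexity is infinite for any test set containing an event the model considers impossible.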

SLIDE 18

Generating Text from an LM
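
One way to generate text is to sample word by word from the model's conditional distributions, starting from <s> and stopping at </s>. A sketch for a bigram model trained on a three-sentence toy corpus (the corpus, the helper names, and the 20-word cap are our assumptions):

```python
import random
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """counts[prev][word] = number of times `word` follows `prev`."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def generate(counts, rng, max_len=20):
    """Sample w_1 ~ P(. | <s>), w_2 ~ P(. | w_1), ... until </s> or max_len words."""
    word, output = "<s>", []
    for _ in range(max_len):
        followers = counts[word]
        # sampling proportional to counts = sampling from the MLE P(next | word)
        word = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
counts = train_bigram_counts(corpus)
rng = random.Random(0)
for _ in range(3):
    print(generate(counts, rng))
```

Every generated word pair occurred somewhere in training, so with a corpus this small the samples mostly recombine training fragments; the output varies with the random seed.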

SLIDE 19

Outline

  • Wrap-up: Word Embeddings
  • Introduction to Language Models
  • N-Gram Based Language Models
  • Smoothing Language Models

SLIDE 20

Direct Language Modeling

  • P(“I do not like green eggs and ham”)
  • P(w | “I do not like green eggs and”)

SLIDE 21

Applying the Chain Rule
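
The chain rule is exact: the probability of a sentence factors into next-word probabilities, each conditioned on the full history. Written out in standard notation, with the green-eggs-and-ham sentence as the example:

```latex
P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})

P(\text{I do not like green eggs and ham})
  = P(\text{I}) \, P(\text{do} \mid \text{I}) \, P(\text{not} \mid \text{I do})
    \cdots P(\text{ham} \mid \text{I do not like green eggs and})
```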

SLIDE 22

Markov Assumption
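
The Markov assumption truncates each history to the last k words, which turns the exact chain-rule product into an approximation. Here k is our notation (an n-gram model uses k = n - 1), and histories that reach before the sentence start are assumed to be padded with <s>:

```latex
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \ldots, w_{i-1})

P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-k}, \ldots, w_{i-1})

% Bigram case (k = 1):
P(\text{ham} \mid \text{I do not like green eggs and}) \approx P(\text{ham} \mid \text{and})
```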

SLIDE 23

Unigram Language Model
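
A unigram model is the extreme case k = 0 of the Markov assumption: every word is drawn independently, with its probability estimated from counts. In the sketch below, c(w) is the training count of w and N the total number of training tokens (our notation):

```latex
P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i),
\qquad \hat{P}(w) = \frac{c(w)}{N}
```

Because the product ignores order, every permutation of a sentence receives the same probability.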

SLIDE 24

Bigram Language Model
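
The bigram maximum-likelihood estimate is P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1}). A sketch on a three-sentence toy corpus, padding each sentence with <s> and </s> (the corpus and the function names are illustrative assumptions):

```python
from collections import Counter


def bigram_mle(sentences):
    """Maximum-likelihood bigram model: P(w | prev) = c(prev, w) / c(prev)."""
    history, bigram = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        history.update(tokens[:-1])            # every token except </s> serves as a history
        bigram.update(zip(tokens, tokens[1:]))

    def prob(word, prev):
        if history[prev] == 0:
            return 0.0                         # unseen history: MLE undefined, return 0
        return bigram[(prev, word)] / history[prev]

    return prob


def sentence_prob(prob, sent):
    """Product of bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(word, prev)
    return p


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
prob = bigram_mle(corpus)
print(prob("I", "<s>"), prob("am", "I"), prob("Sam", "am"))
print(sentence_prob(prob, "I am Sam"))
```

Any bigram absent from training, such as "Sam green", gets probability 0, so a single unseen pair zeroes out the whole sentence.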

SLIDE 25

Berkeley Restaurant Project

SLIDE 26

Berkeley Restaurant Project

SLIDE 27

N-Gram Language Models

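“The computer which I had just put into the dining room on the fifth floor crashed.”
“The computer which I had just put into the dining room on the fifth floor had lunch.”

Whether “crashed” or “had lunch” is plausible depends on “computer”, many words earlier: language has long-distance dependencies that a fixed-size n-gram window cannot see.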
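
An n-gram model conditions only on the previous n - 1 words. A sketch (the helper `ngram_contexts` is hypothetical) that lists the history a trigram model actually uses when predicting the last word of the first sentence:

```python
def ngram_contexts(tokens, n):
    """Pair each token with the (n-1)-word history an n-gram model conditions on,
    padding the start of the sentence with <s>."""
    padded = ["<s>"] * (n - 1) + tokens
    return [(tuple(padded[i:i + n - 1]), padded[i + n - 1]) for i in range(len(tokens))]


sentence = ("The computer which I had just put into the dining room "
            "on the fifth floor crashed").split()
print(ngram_contexts(sentence, 3)[-1])   # (('fifth', 'floor'), 'crashed')
```

The trigram history is (fifth, floor); even a 5-gram model would not reach back to "computer".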

SLIDE 28

Shakespeare

SLIDE 29

Wall Street Journal

SLIDE 30

Implementation Tips

Use Logs

  • Prevent underflow
  • Add log probabilities instead of multiplying probabilities

Filter out n-grams

  • Rare n-grams are noisy and have low probability
  • Use unigrams to filter bigrams…
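
A sketch of why logs matter, with made-up probabilities: multiplying many small numbers underflows double-precision floats, while the sum of their logarithms stays representable and preserves the ranking of sentences (log is monotonic):

```python
import math

# Hypothetical: a 100-token sentence whose every token has probability 1e-5.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p                             # true value 1e-500 is far below the smallest double

log_prob = sum(math.log(p) for p in probs)   # a sum of logs stays in a comfortable range

print(product)    # 0.0: the product has underflowed
print(log_prob)   # roughly -1151.3
```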

SLIDE 31

Outline

  • Wrap-up: Word Embeddings
  • Introduction to Language Models
  • N-Gram Based Language Models
  • Smoothing Language Models

SLIDE 32

Zero Probability Problem

  • New words: Truthiness, #letalonethehashtags, bigly
  • Rare words/combinations
  • Misspellings: “minuets”

Because the corpus is finite, some valid n-grams never appear in training.

Training set:

  • … denied the allegations
  • … denied the reports
  • … denied the claims
  • … denied the request

Test set:

  • … denied the offer
  • … denied the loan

P(“offer” | denied the) = 0, so the model assigns zero probability to any test sentence containing “denied the offer”.

SLIDE 33

Laplace Smoothing
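
Add-one (Laplace) smoothing adds 1 to every bigram count and V to each history count, so each conditional distribution still sums to 1: P(w_i | w_{i-1}) = (c(w_{i-1}, w_i) + 1) / (c(w_{i-1}) + V). A sketch on a three-sentence toy corpus, assuming V counts every word that can be predicted, including </s> but not <s> (conventions for V vary; names are illustrative):

```python
from collections import Counter


def laplace_bigram(sentences):
    """Add-one (Laplace) smoothed bigram model:
    P(w | prev) = (c(prev, w) + 1) / (c(prev) + V)."""
    history, bigram, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])             # predictable words: includes </s>, not <s>
        history.update(tokens[:-1])          # c(prev): how often each token is a history
        bigram.update(zip(tokens, tokens[1:]))
    V = len(vocab)

    def prob(word, prev):
        return (bigram[(prev, word)] + 1) / (history[prev] + V)

    return prob, vocab


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
prob, vocab = laplace_bigram(corpus)
print(len(vocab), prob("am", "I"), prob("Sam", "ham"))
```

Here V = 11, the seen bigram "I am" drops from its MLE of 2/3 to 3/14, and the unseen "ham Sam" rises from 0 to 1/12: add-one moves a lot of mass to unseen events when V is large relative to the counts.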

SLIDE 34

Intuition Behind Smoothing

When we have sparse statistics:

P(w | denied the)
  3    allegations
  2    reports
  1    claims
  1    request
  7    total

Steal probability mass to generalize better:

P(w | denied the)
  2.5  allegations
  1.5  reports
  0.5  claims
  0.5  request
  2    other
  7    total

(Figure: bar charts of these counts over allegations, reports, claims, request, and unseen words such as attack, man, outcome.)
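
The smoothed column is consistent with subtracting a fixed 0.5 from every seen count and pooling the removed mass (4 × 0.5 = 2) as "other", a scheme known as absolute discounting. A sketch of that arithmetic, assuming this reading of the numbers (`absolute_discount` is a hypothetical helper):

```python
def absolute_discount(counts, d=0.5):
    """Subtract a fixed discount d from every seen count and pool the removed
    mass under a single '<other>' entry reserved for unseen words."""
    discounted = {w: c - d for w, c in counts.items()}
    discounted["<other>"] = d * len(counts)
    return discounted


counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}   # after "denied the"
smoothed = absolute_discount(counts)
total = sum(smoothed.values())
print(smoothed)   # {'allegations': 2.5, 'reports': 1.5, 'claims': 0.5, 'request': 0.5, '<other>': 2.0}
print(total)      # 7.0
probs = {w: c / total for w, c in smoothed.items()}
```

The total stays at 7, so dividing by it still gives a proper distribution; a full model would then split the "other" mass among the unseen words, for example in proportion to their unigram probabilities.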

SLIDE 35

Berkeley Restaurant Project

SLIDE 36

Berkeley Restaurant Project

SLIDE 37

Backoff and Interpolation

Backoff

  • Use trigram, unless rare
  • Then use bigram, unless rare
  • Then use unigram

Interpolation

  • Combine all three!
  • Linear combination of the three estimates, with the weights as parameters
  • Learn the weights on held-out data
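
For interpolation, the estimate is λ3 P(w | u, v) + λ2 P(w | v) + λ1 P(w), where u, v are the two previous words and the λs are non-negative and sum to 1. A sketch that picks the λs by a coarse grid search on held-out data (grid search is our simplification; EM is a common alternative), with hypothetical held-out probabilities and default weights:

```python
import math


def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation: l3 * P(w | u, v) + l2 * P(w | v) + l1 * P(w)."""
    l3, l2, l1 = lambdas
    assert abs(l3 + l2 + l1 - 1.0) < 1e-9, "weights must sum to 1"
    return l3 * p_tri + l2 * p_bi + l1 * p_uni


def best_lambdas(heldout, step=0.1):
    """Grid-search (l3, l2, l1) on the probability simplex, keeping the weights that
    maximise held-out log-likelihood. `heldout` holds one (p_tri, p_bi, p_uni)
    triple of MLE estimates per held-out token."""
    n = round(1 / step)
    best, best_ll = None, -math.inf
    for i in range(n + 1):
        for j in range(n + 1 - i):
            lam = (i / n, j / n, (n - i - j) / n)
            probs = [interpolate(*t, lambdas=lam) for t in heldout]
            if min(probs) <= 0:
                continue                      # a zero probability means log-likelihood -inf
            ll = sum(math.log(p) for p in probs)
            if ll > best_ll:
                best, best_ll = lam, ll
    return best


# Hypothetical held-out estimates: the trigram is unseen (0.0) for the second token.
heldout = [(0.5, 0.2, 0.01), (0.0, 0.1, 0.02), (0.4, 0.3, 0.05)]
print(best_lambdas(heldout))
```

Tuning on held-out rather than training data matters: on the training set the trigram MLE always looks best, so the search would put all the weight on it.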

SLIDE 38

Upcoming…

Homework

  • Homework 1 is due: January 26, 2017
  • Write-up, data, and code for Homework 2 are up
  • Homework 2 is due: February 9, 2017

Project

  • Proposal is due: February 7, 2017 (~2 weeks)
  • Make things more concrete: approach, metrics, baselines
  • Mention progress and address my concerns, if any
  • Only 2 pages