Lecture 2: N-gram
Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Course webpage: http://kwchang.net/teaching/NLP16
CS 6501: Natural Language Processing
This lecture
Language Models
What are N-gram models?
How to use
http://recognize-speech.com/language-model/n-gram-model/comparison
Example from Julia Hockenmaier, Intro to NLP
$P(A)$
Chain rule: from conditional probability to joint probability
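As a quick sanity check of the chain rule, the sketch below builds a small made-up joint distribution over three-word sequences and confirms that P(w1, w2, w3) = P(w1) P(w2 | w1) P(w3 | w1, w2). The toy weights, the two-word vocabulary, and the helper names are assumptions for illustration, not part of the lecture.

```python
from itertools import product

# Hypothetical joint distribution over 3-word sequences from a 2-word vocabulary.
vocab = ["a", "b"]
weights = [1, 2, 3, 4, 5, 6, 7, 8]  # arbitrary positive weights, one per sequence
total = sum(weights)
joint = {seq: w / total for seq, w in zip(product(vocab, repeat=3), weights)}

def marginal(prefix):
    """P(the first len(prefix) words equal prefix), summing out the rest."""
    k = len(prefix)
    return sum(p for seq, p in joint.items() if seq[:k] == prefix)

def chain_rule_prob(seq):
    """Multiply the conditionals P(w_i | w_1..w_{i-1}) = P(w_1..w_i) / P(w_1..w_{i-1})."""
    prob = 1.0
    for i in range(1, len(seq) + 1):
        prob *= marginal(seq[:i]) / marginal(seq[:i - 1])
    return prob

# The product of conditionals should match the joint probability of every sequence.
print(all(abs(chain_rule_prob(s) - p) < 1e-12 for s, p in joint.items()))
```

Note that each conditional here needs the full history, which is exactly what becomes infeasible to estimate for long sentences.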
We need independence assumptions!
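One common independence assumption is the first-order Markov (bigram) assumption, under which each word depends only on its predecessor. A minimal sketch, assuming a hand-written table of conditional probabilities; the numbers, the `<s>` and `</s>` boundary markers, and the function name are all hypothetical:

```python
# Hypothetical bigram table: P(next word | previous word). Values are made up.
bigram_prob = {
    ("<s>", "i"): 0.6, ("<s>", "you"): 0.4,
    ("i", "like"): 0.5, ("i", "see"): 0.5,
    ("you", "like"): 1.0,
    ("like", "nlp"): 0.8, ("like", "</s>"): 0.2,
    ("see", "nlp"): 1.0,
    ("nlp", "</s>"): 1.0,
}

def sentence_prob(words):
    """P(sentence) under the bigram assumption: product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram_prob.get((prev, cur), 0.0)  # unseen bigram -> probability 0
    return prob

print(sentence_prob(["i", "like", "nlp"]))  # 0.6 * 0.5 * 0.8 * 1.0
```

With this assumption the model only needs one conditional distribution per previous word, rather than one per full history.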
First Citizen: Nay, then, that was hers, It speaks against your other service: But since the youth of the circumstance be spoken: Your uncle and one Baptista's daughter. SEBASTIAN: Do I stand till the break off. BIRON: Hide thy head.
/* * linux/kernel/time.c * Please report this on hardware. */ void irq_mark_irq(unsigned long old_entries, eval); /* * Divide only 1000 for ns^2 -> us^2 conversion values don't
seq_puts(m, "\ttramp: %pS", (void *)class->contending_point]++; if (likely(t->flags & WQ_UNBOUND)) { /* * Update inode information. If the * slowpath and sleep time (abs or rel) * @rmtp: remaining (either due * to consume the state of ring buffer size. */ header_size - size, in bytes, of the chain. */ BUG_ON(!error); } while (cgrp) { if (old) { if (kdb_continue_catastrophic; #endif
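The two passages above read like text sampled from a language model trained on Shakespeare and on Linux kernel source respectively; the slides do not say which model produced them. Assuming a simple bigram model, sampling could be sketched as follows (the tiny corpus, the seed, and the function names are placeholders):

```python
import random
from collections import defaultdict

def train_bigram_successors(tokens):
    """Map each word to the list of words that followed it (with repetition)."""
    successors = defaultdict(list)
    for prev, cur in zip(tokens, tokens[1:]):
        successors[prev].append(cur)
    return successors

def sample_text(successors, start, max_len, rng):
    """Generate words by repeatedly drawing a successor of the last word.

    Drawing uniformly from the stored list samples in proportion to bigram counts.
    Generation stops at max_len words or when the last word has no successor.
    """
    out = [start]
    while len(out) < max_len and successors.get(out[-1]):
        out.append(rng.choice(successors[out[-1]]))
    return out

# Placeholder corpus; a real model would be trained on a large text collection.
corpus = "hide thy head and hide thy face and stand till the break".split()
model = train_bigram_successors(corpus)
print(" ".join(sample_text(model, "hide", 8, random.Random(0))))
```

Every adjacent word pair in the output occurs somewhere in the training text, which is why such samples look locally plausible but globally incoherent.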
Unigram Language Model: p(w | θ) = ?
Estimation from a document: a paper (total #words = 100)

Word          Count   Estimate p(w | θ)
text          10      10/100
mining        5       5/100
association   3       3/100
database      3       3/100
algorithm     2
…
query         1       1/100
efficient     1
…
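The estimates above are relative frequencies: each word's count divided by the document length. A minimal sketch using the counts shown on the slide; the counts hidden behind the ellipses are not available, so the document length of 100 is passed in explicitly rather than summed (the function name is illustrative):

```python
from fractions import Fraction

# Counts copied from the slide; words elided by "…" are not included.
counts = {"text": 10, "mining": 5, "association": 3, "database": 3,
          "algorithm": 2, "query": 1, "efficient": 1}
total_words = 100  # stated on the slide: total #words = 100

def unigram_estimate(counts, total):
    """Relative-frequency estimate p(w) = c(w) / total for each listed word."""
    return {w: Fraction(c, total) for w, c in counts.items()}

p = unigram_estimate(counts, total_words)
print(float(p["text"]), float(p["mining"]), float(p["query"]))
```

Exact fractions are used here only to keep the arithmetic transparent; floating-point division would serve equally well in practice.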
$$p(W \mid \theta) = \binom{N}{c(w_1), \ldots, c(w_N)} \prod_{i=1}^{N} \theta_i^{c(w_i)} \propto \prod_{i=1}^{N} \theta_i^{c(w_i)}$$
$$\Rightarrow \log p(W \mid \theta) = \sum_{i=1}^{N} c(w_i) \log \theta_i + \text{const}$$
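Set partial derivatives to zero
$$L(W, \theta) = \sum_{i=1}^{N} c(w_i) \log \theta_i + \lambda \left( \sum_{i=1}^{N} \theta_i - 1 \right)$$
$$\frac{\partial L}{\partial \theta_i} = \frac{c(w_i)}{\theta_i} + \lambda \;\rightarrow\; \theta_i = -\frac{c(w_i)}{\lambda}$$
Since $\sum_{i=1}^{N} \theta_i = 1$ (requirement from probability), we have $\lambda = -\sum_{i=1}^{N} c(w_i)$
ML estimate: $\theta_i = \dfrac{c(w_i)}{\sum_{i=1}^{N} c(w_i)}$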
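To see the result numerically, the sketch below compares the log-likelihood at the closed-form estimate with the log-likelihood at randomly drawn alternative distributions. The counts are made up, and the random comparison is only a sanity check, not a proof.

```python
import math
import random

counts = [10, 5, 3, 2]  # hypothetical word counts c(w_i)

def log_likelihood(theta, counts):
    """sum_i c(w_i) * log(theta_i), dropping the constant multinomial term."""
    return sum(c * math.log(t) for c, t in zip(counts, theta))

# Closed-form ML estimate: theta_i = c(w_i) / sum_j c(w_j)
total = sum(counts)
theta_ml = [c / total for c in counts]

rng = random.Random(0)

def random_distribution(k):
    """A random point on the probability simplex with strictly positive entries."""
    raw = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(raw)
    return [r / s for r in raw]

best = log_likelihood(theta_ml, counts)
others = [log_likelihood(random_distribution(len(counts)), counts)
          for _ in range(1000)]
print(best >= max(others))
```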
$$p(w_i \mid w_{i-1}, \ldots, w_{i-n+1}) = \frac{c(w_i, w_{i-1}, \ldots, w_{i-n+1})}{c(w_{i-1}, \ldots, w_{i-n+1})}$$
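The count ratio above can be computed directly from a token list. A minimal sketch, assuming whitespace-tokenized text and no smoothing (the function name and the toy corpus are illustrative):

```python
from collections import Counter

def ngram_mle(tokens, n):
    """Return p(w_i | previous n-1 words) = c(context + w_i) / c(context)."""
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Count each context only where a following word exists, so that the
    # conditional probabilities for every context sum to 1.
    contexts = Counter()
    for gram, c in ngrams.items():
        contexts[gram[:-1]] += c
    return {gram: c / contexts[gram[:-1]] for gram, c in ngrams.items()}

tokens = "the cat sat on the mat the cat ran".split()
probs = ngram_mle(tokens, 2)
print(probs[("the", "cat")])  # c(the cat) / c(the) = 2 / 3
```

Any n-gram that never occurs in the corpus is simply absent from the returned table, which corresponds to an estimated probability of zero.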
File sizes: approx. 24 GB compressed (gzip'ed) text files
Number of tokens: 1,024,908,267,229
Number of sentences: 95,119,665,584
Number of unigrams: 13,588,391
Number of bigrams: 314,843,401
Number of trigrams: 977,069,902
Number of fourgrams: 1,313,818,354
Number of fivegrams: 1,176,470,663