
CS325 Artificial Intelligence: Natural Language Processing I (Ch. 22)

  • Dr. Cengiz Günay, Emory Univ.

Spring 2013


AI in Natural Language Processing (NLP)

What's NLP? Computers understanding our languages: English, French, Japanese, ...

Why?

  • We can talk to the computer
  • It can talk to us, too
  • And it can read our stuff

Entry/Exit Surveys

Exit survey: Robotics II – Navigation

  • Why do we normalize particle weights? Where are they used next?
  • How can we control whether or not a robot chooses actions like taking a left turn?

Entry survey: Natural Language Processing I (0.25 pts)

  • Give some examples, other than those shown, where NLP would be useful.
  • Explain briefly how the spam filter in the machine learning lecture worked.

What Can We Do With NLP?

What NLP tasks can we do with the following?

  • Classification: Spam vs. Ham
  • Clustering: News articles, emails, ...
  • Spelling: Atuo-crorect, auto-correct
  • Product ranking: Read user reviews.
  • Information retrieval: Search engines.
  • Answering questions: IBM's Watson.
  • Translation: Google Translate, AltaVista Babelfish.
  • Speech recognition: Dictation programs, Siri.
  • Learning: Tap into the world's knowledge...

Learning From Language

How Can We Understand Language?

Two methods:

  Model           How?          Construct?
  Probabilistic   Word-based    Learned
  Logical         Grammar       Programmed

Remember Bag of Words?

  • P(Hello) = 2/5
  • P(I) = 1/5 = P(Will) = P(Say)
  • Words are independent? Called unigram or 1-gram (sketch below):

    P(w1, w2, ..., wn) = ∏_i P(wi)
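A minimal sketch of the unigram model above. The five-word toy corpus is an assumption chosen to reproduce the counts on the slide (P(Hello) = 2/5, P(I) = P(Will) = P(Say) = 1/5); a real model would be estimated from a large corpus.

```python
from collections import Counter

# Assumed toy corpus, consistent with the slide's counts.
corpus = "hello I will say hello".split()
counts = Counter(corpus)
total = sum(counts.values())  # 5 words

def p_unigram(word):
    """Maximum-likelihood unigram estimate: P(w) = count(w) / N."""
    return counts[word] / total

def p_sentence(words):
    """Bag-of-words probability: words treated as independent."""
    p = 1.0
    for w in words:
        p *= p_unigram(w)
    return p

print(p_unigram("hello"))                      # 0.4 = 2/5
print(p_sentence("I will say hello".split()))  # (1/5)^3 * (2/5)
```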

Can we get more from Bayes?

Distinguish between:

  • "I will say hello"
  • "I hello say will"

P("hello" | "I will say") > P("will" | "I hello say")

Words are dependent on previous words: called an N-gram model.

P(w1, w2, ..., wn) = P(w1:n) = ∏_i P(wi | w1:(i−1))

Must Remember All Words That Came Before?

P("1752" | "Thomas Bayes ...") = ?

Markov assumption: Only remember the last N words: N-gram.

P(w1:k) = ∏_{i=1..k} P(wi | w(i−N):(i−1))
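As a concrete illustration of the Markov assumption, here is a minimal bigram (N = 2) sketch. The two toy training sentences are assumptions for the demo, and "^" marks the start of a sentence, as on a later slide; unseen contexts would need smoothing in a real model.

```python
from collections import Counter

# Assumed toy corpus; "^" is the start-of-sentence symbol.
sentences = [["^", "I", "will", "say", "hello"],
             ["^", "I", "will", "say", "goodbye"]]

bigrams = Counter()
contexts = Counter()
for s in sentences:
    for prev, cur in zip(s, s[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1

def p_bigram(cur, prev):
    """Markov (N = 2) estimate: P(w_i | w_{i-1}) = count(prev, cur) / count(prev)."""
    return bigrams[(prev, cur)] / contexts[prev]

def p_sentence(words):
    """Chain rule under the bigram Markov assumption."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= p_bigram(cur, prev)
    return p

print(p_bigram("will", "I"))                           # 1.0
print(p_sentence(["^", "I", "will", "say", "hello"]))  # 0.5
```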

Let's Read Shakespeare... In Unigram

Unigram = 1-gram

Shakespeare In Bigram

N = 2: bigram

Shakespeare In Trigram

N = 3: trigram

Shakespeare In 4-gram

Shakespeare N-gram Quiz

Find:

  • 1 real quote
  • 3× unigram picks
  • 3× bigram picks
  • 3× trigram picks

Bigram Probability Question

P("woe is me" | ^) = ?

Given that:

  • ^: symbol marking the start of a sentence
  • P(woe_i | ^_(i−1)) = .0002
  • P(is_i | woe_(i−1)) = .07
  • P(me_i | is_(i−1)) = .0005

Answer:

P("woe is me" | ^) = .0002 × .07 × .0005 = 7 × 10^−9

Other Tricks

  • Stationarity assumption: Context doesn't change over time.
  • Smoothing: Remember Laplace smoothing?
  • Hidden variables: E.g., identify what a "noun" is.
  • Use abstractions: Group "New York City", or just look at letters.

Smaller Than Words?

What if we cannot distinguish words?

English: "choosespain.com". Is it "Choose Spain" or "Chooses Pain"?

Segmentation: Dividing text into words.

Use Bayes again:

s* = argmax P(w1:n) = argmax ∏_i P(wi | w1:(i−1))

Or with a Markov assumption (e.g., unigram):

s* = argmax ∏_i P(wi)

Segmentation Complexity

s* = argmax ∏_i P(wi)

What's the complexity of segmenting "nowisthetime"?

  1. n − 1
  2. (n − 1)^2
  3. (n − 1)!
  4. 2^(n−1)
  5. (n − 1)^n

Solution: there are n − 1 possible division points between the characters; each segmentation chooses, for every point, whether a division exists or not, giving 2^(n−1) segmentations (option 4), as the enumeration sketch below confirms.
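A quick sanity check of the 2^(n−1) count: treat each of the n − 1 gaps between characters as cut-or-not and enumerate. A throwaway sketch:

```python
from itertools import product

def all_segmentations(text):
    """Yield every split of text: each of the n-1 gaps is cut or not."""
    n = len(text)
    for cuts in product([False, True], repeat=n - 1):
        words, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                words.append(text[start:i])
                start = i
        words.append(text[start:])
        yield words

segs = list(all_segmentations("now"))
print(len(segs))  # 4 = 2^(3-1)
print(segs)       # [['now'], ['no', 'w'], ['n', 'ow'], ['n', 'o', 'w']]
```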

Reducing Segmentation Complexity

Exploit independence: "nowisthetime"?

Divide into the first word, f, and recurse on the rest, r (see the sketch below):

s* = max_(s = f + r) P(f) · s*(r)

Gives 99% accuracy and an easy implementation!
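A minimal sketch of the recursive segmenter, in the spirit of Norvig's segment: split off a first word f, recurse on the rest r, and memoize so each suffix is solved only once. The tiny unigram table and the unknown-word penalty are assumptions for the demo; a real segmenter estimates P(w) from a large corpus.

```python
from functools import lru_cache

# Assumed toy unigram probabilities; real systems estimate these from data.
P = {"now": 0.01, "is": 0.02, "the": 0.05, "time": 0.01,
     "no": 0.004, "wis": 1e-6}

def p_word(w):
    # Crude penalty for unknown words, shrinking with length (an assumption).
    return P.get(w, 1e-10 / 10 ** len(w))

@lru_cache(maxsize=None)
def segment(text):
    """Best split: s* = max over first word f of P(f) * score(segment(rest))."""
    if not text:
        return (1.0, ())
    candidates = []
    for i in range(1, len(text) + 1):
        f, r = text[:i], text[i:]
        p_rest, words = segment(r)
        candidates.append((p_word(f) * p_rest, (f,) + words))
    return max(candidates)

prob, words = segment("nowisthetime")
print(words)  # ('now', 'is', 'the', 'time') under the toy model
```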

Segmentation Problems

How can we improve?

  1. More Data
  2. Markov
  3. Smoothing

Need to get the context.

Segmentation Problems (2)

How can we improve?

  1. More Data
  2. Markov
  3. Smoothing

Need to know more words.

What Else Can We Do with Letters?

Language identification?

Bigram Recognition with Letters

Trigram Recognition with Letters

99% accuracy from trigrams!
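A hedged sketch of letter n-gram language identification (shown here with bigrams for brevity): score a string under per-language character bigram models and pick the higher-scoring language. The two tiny training strings are assumptions; real systems train on large corpora.

```python
import math
from collections import Counter

def char_bigram_model(text):
    """Return a Laplace-smoothed character-bigram log-probability function."""
    pairs = Counter(zip(text, text[1:]))
    context = Counter(text[:-1])
    vocab = set(text)
    def logp(a, b):
        return math.log((pairs[(a, b)] + 1) / (context[a] + len(vocab)))
    return logp

# Assumed toy training text, one string per language.
english = char_bigram_model("the quick brown fox jumps over the lazy dog " * 3)
german  = char_bigram_model("der schnelle braune fuchs springt ueber den hund " * 3)

def score(text, logp):
    """Sum of bigram log-probabilities of the string under one model."""
    return sum(logp(a, b) for a, b in zip(text, text[1:]))

s = "the brown dog"
print("English" if score(s, english) > score(s, german) else "German")
```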

Can We Identify Categories Too?

Text classification

Text Classification

What algorithms can we use?

  • Naive Bayes: Spam vs. Ham (a minimal sketch follows this list)
  • k-Nearest Neighbor: Similar words
  • Support Vector Machines: Supervised learning
  • Regression: Prediction
  • Zip: What??
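A minimal sketch of the first option, Naive Bayes for Spam vs. Ham, with Laplace smoothing; the four training messages are made up for the demo.

```python
import math
from collections import Counter

# Assumed toy training data.
train = [("spam", "win money now"), ("spam", "free money offer"),
         ("ham", "meeting at noon"), ("ham", "lunch meeting today")]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """argmax_c log P(c) + sum_w log P(w | c), with Laplace smoothing."""
    best = None
    for label in word_counts:
        total = sum(word_counts[label].values())
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if best is None or logp > best[0]:
            best = (logp, label)
    return best[1]

print(classify("free money"))     # spam
print(classify("lunch at noon"))  # ham
```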

Compression As Classifier
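This is the idea behind the "Zip" entry above: a document compresses better when appended to text it resembles, so a compressor can act as a classifier. A hedged sketch using zlib, with made-up class corpora:

```python
import zlib

# Assumed toy corpora, one per class; real uses would compress real collections.
corpora = {
    "spam": b"win free money now claim your free prize offer " * 20,
    "ham":  b"meeting agenda lunch schedule project notes today " * 20,
}

def compressed_size(data):
    return len(zlib.compress(data))

def classify(text):
    """Pick the class whose corpus the new text extends most cheaply,
    i.e. the smallest increase in compressed size when appended."""
    doc = text.encode()
    return min(corpora,
               key=lambda c: compressed_size(corpora[c] + doc)
                             - compressed_size(corpora[c]))

print(classify("claim your free prize"))  # spam
print(classify("project meeting today"))  # ham
```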

Spelling Correction

Correction, c, for word, w:

c* = argmax_c P(c | w)

Use Bayes' rule:

c* = argmax_c P(w | c) P(c)

where

  • P(c) comes from data counts
  • P(w | c) comes from spelling-correction data
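A compact sketch in the spirit of Norvig's spelling corrector. The toy word counts stand in for P(c); as a simplification, P(w | c) is reduced to preferring candidates at edit distance 0 over distance 1, rather than using real spelling-error data.

```python
from collections import Counter

# Assumed toy language-model counts; a real corrector counts a big corpus.
WORDS = Counter({"hello": 50, "help": 30, "hell": 10, "jello": 2})
alphabet = "abcdefghijklmnopqrstuvwxyz"

def edits1(w):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(w[:i], w[i:]) for i in range(len(w) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(w):
    """c* = argmax_c P(c), over known candidates at edit distance 0 or 1."""
    candidates = ({w} & WORDS.keys()) or (edits1(w) & WORDS.keys()) or {w}
    return max(candidates, key=WORDS.get)

print(correct("helo"))  # hello
```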

Spelling Correction Data

Spelling Correction Example