SLIDE 1

Classification, Linear Models, Naïve Bayes

CMSC 470 Marine Carpuat

Slides credit: Dan Jurafsky & James Martin, Jacob Eisenstein

SLIDE 2

Today

  • Text classification problems and their evaluation
  • Linear classifiers
  • Features & Weights
  • Bag of words
  • Naïve Bayes
SLIDE 3

Classification problems

SLIDE 4

Multiclass Classification

[Diagram: during training, a supervised machine learning algorithm learns a classifier from labeled training data (documents marked label1, label2, label3, label4), represented via feature functions; at testing time, the classifier predicts one of these labels for an unlabeled document.]

SLIDE 5

Is this spam?

From: "Fabian Starr“ <Patrick_Freeman@pamietaniepeerelu.pl> Subject: Hey! Sofware for the funny prices! Get the great discounts on popular software today for PC and Macintosh http://iiled.org/Cj4Lmx 70-90% Discounts from retail price!!! All sofware is instantly available to download - No Need Wait!

SLIDE 6

What is the subject of this article?

  • Antagonists and Inhibitors
  • Blood Supply
  • Chemistry
  • Drug Therapy
  • Embryology
  • Epidemiology

[Diagram: a MEDLINE article is assigned a subject category from the MeSH Subject Category Hierarchy.]

SLIDE 7

Text Classification

  • Assigning subject categories, topics, or genres
  • Spam detection
  • Authorship identification
  • Age/gender identification
  • Language Identification
  • Sentiment analysis
SLIDE 8

Text Classification: definition

  • Input:
  • a document d
  • a fixed set of classes Y = {y1, y2,…, yJ}
  • Output: a predicted class y ∈ Y
SLIDE 9

Classification Methods: Supervised Machine Learning

  • Input
  • a document d
  • a fixed set of classes Y = {y1, y2,…, yJ}
  • a training set of m hand-labeled documents (d1,y1),....,(dm,ym)
  • Output
  • a learned classifier d → y
SLIDE 10

Aside: getting examples for supervised learning

  • Human annotation
  • By experts or non-experts (crowdsourcing)
  • Found data
  • How do we know how good a classifier is?
  • Compare classifier predictions with human annotation
  • On held out test examples
  • Evaluation metrics: accuracy, precision, recall
SLIDE 11

The 2-by-2 contingency table

                  correct        not correct
    selected      tp             fp
    not selected  fn             tn

SLIDE 12

Precision and recall

  • Precision: % of selected items that are correct
  • Recall: % of correct items that are selected

                  correct        not correct
    selected      tp             fp
    not selected  fn             tn
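In terms of the contingency table:

    P = \frac{tp}{tp + fp} \qquad R = \frac{tp}{tp + fn}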

SLIDE 13

A combined measure: F

  • A combined measure that assesses the P/R tradeoff is the F measure (a weighted harmonic mean):

    F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}

  • People usually use the balanced F1 measure, i.e., with β = 1 (that is, α = ½):

    F_1 = \frac{2 P R}{P + R}
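A minimal Python sketch of these metrics (function and variable names are illustrative, not from the slides):

    def precision(tp, fp):
        # Fraction of selected items that are correct.
        return tp / (tp + fp)

    def recall(tp, fn):
        # Fraction of correct items that are selected.
        return tp / (tp + fn)

    def f_measure(p, r, beta=1.0):
        # Weighted harmonic mean of P and R; beta = 1 gives the balanced F1.
        return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

    # Example: 8 true positives, 2 false positives, 4 false negatives.
    p, r = precision(8, 2), recall(8, 4)   # 0.8, 0.666...
    f1 = f_measure(p, r)                   # 2PR / (P + R) ≈ 0.727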

SLIDE 14

Linear Models for Multiclass Classification

SLIDE 15

Linear Models for Classification

  • Feature function representation
  • Weights
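A linear model scores each candidate class y by applying a weight vector θ to a feature function f(x, y), and predicts the highest-scoring class:

    \hat{y} = \operatorname{argmax}_{y \in Y} \theta \cdot f(x, y)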

SLIDE 16

Defining features: Bag of words
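A minimal sketch of a bag-of-words feature function, assuming whitespace tokenization and lowercasing (both choices are illustrative):

    from collections import Counter

    def bag_of_words(document):
        # Each word's count is a feature; word order is thrown away.
        return Counter(document.lower().split())

    bag_of_words("Great movie , great fun")
    # Counter({'great': 2, 'movie': 1, ',': 1, 'fun': 1})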

SLIDE 17

Defining features

SLIDE 18

Linear Classification

SLIDE 19

Linear Models for Classification

  • Feature function representation
  • Weights
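A minimal Python sketch of prediction with such a model, assuming features and weights are stored as dictionaries keyed by (feature name, label) pairs (this representation is illustrative):

    def score(features, weights, label):
        # Dot product theta . f(x, y) between weights and the feature vector.
        return sum(v * weights.get((name, label), 0.0)
                   for name, v in features.items())

    def predict(features, weights, labels):
        # Pick the highest-scoring label.
        return max(labels, key=lambda y: score(features, weights, y))

    weights = {("great", "positive"): 1.2, ("great", "negative"): -0.4}
    predict({"great": 2}, weights, ["positive", "negative"])  # -> "positive"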

SLIDE 20

How can we learn weights?

  • By hand
  • Probability
    • e.g., Naïve Bayes
  • Discriminative training
    • e.g., perceptron, support vector machines
SLIDE 21

Naïve Bayes Models for Text Classification

SLIDE 22

Generative Story for Multinomial Naïve Bayes

  • A hypothetical stochastic process describing how training examples are generated
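A minimal sketch of this process, assuming a class prior and per-class word distributions are given as dictionaries (all names, and the fixed document length, are illustrative assumptions):

    import random

    def generate_example(prior, word_dists, length=10):
        # 1. Draw a label y from the class prior P(Y).
        y = random.choices(list(prior), weights=list(prior.values()))[0]
        # 2. Draw each word independently from the class-specific P(word | y).
        dist = word_dists[y]
        words = random.choices(list(dist), weights=list(dist.values()), k=length)
        return words, y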

SLIDE 23

Prediction with Naïve Bayes

Score(x, y):

    P(y \mid x) \propto P(y) \, P(x \mid y)      (definition of conditional probability)
    P(x \mid y) = \prod_j P(x_j \mid y)          (generative story assumptions)

So \text{Score}(x, y) = \log P(y) + \sum_j \log P(x_j \mid y): this is a linear model!


SLIDE 26

Parameter Estimation

  • “count and normalize”
  • Parameters of a multinomial distribution
  • Relative frequency estimator
  • Formally: this is the maximum likelihood estimate
  • See CIML for derivation
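Concretely, writing count(·) for counts in the m training examples, the "count and normalize" estimates are:

    \hat{P}(y) = \frac{\text{count}(y)}{m} \qquad \hat{P}(w \mid y) = \frac{\text{count}(w, y)}{\sum_{w'} \text{count}(w', y)}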
SLIDE 27

Smoothing (add alpha)
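Add-α smoothing adds a pseudo-count α > 0 to every word before normalizing, so that unseen words do not receive zero probability (α = 1 is Laplace smoothing). With vocabulary V:

    \hat{P}(w \mid y) = \frac{\text{count}(w, y) + \alpha}{\alpha |V| + \sum_{w'} \text{count}(w', y)}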

SLIDE 28

Naïve Bayes recap
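A minimal end-to-end sketch tying the pieces together: "count and normalize" training with add-α smoothing, then log-space prediction with the resulting linear model (all names are illustrative, not from the slides):

    import math
    from collections import Counter

    def train(docs, labels, alpha=1.0):
        # docs: list of token lists; labels: the corresponding classes.
        vocab = {w for doc in docs for w in doc}
        class_counts = Counter(labels)
        word_counts = {y: Counter() for y in class_counts}
        for doc, y in zip(docs, labels):
            word_counts[y].update(doc)
        # Priors and smoothed likelihoods, stored as log probabilities.
        log_prior = {y: math.log(n / len(docs)) for y, n in class_counts.items()}
        log_lik = {}
        for y, counts in word_counts.items():
            denom = sum(counts.values()) + alpha * len(vocab)
            log_lik[y] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
        return log_prior, log_lik, vocab

    def predict(doc, log_prior, log_lik, vocab):
        # Score(x, y) = log P(y) + sum over tokens of log P(word | y).
        def score(y):
            return log_prior[y] + sum(log_lik[y][w] for w in doc if w in vocab)
        return max(log_prior, key=score)

    docs = [["free", "discount", "today"], ["lecture", "notes", "today"]]
    model = train(docs, ["spam", "ham"])
    predict(["free", "discount", "notes"], *model)  # -> "spam"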

SLIDE 29

Why is this model called “Naïve Bayes”? Another view of the same model

    \hat{y} = \operatorname{argmax}_y P(Y = y \mid X = x)
            = \operatorname{argmax}_y P(Y = y) \, P(X = x \mid Y = y)
            = \operatorname{argmax}_y P(Y = y) \prod_{i=1}^{d} P(X_i = x_i \mid Y = y)

Bayes rule + conditional independence assumption

SLIDE 30

Today

  • Text classification problems and their evaluation
  • Linear classifiers
  • Features & Weights
  • Bag of words
  • Naïve Bayes