SLIDE 1

VECTOR SPACE CLASSIFICATION

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze,
Introduction to Information Retrieval, Cambridge University Press, Chapter 14.

Wei (wwei@idi.ntnu.no)
Lecture series TDT4215

Vector Space Classification

SLIDE 2

Recall: Naïve Bayes classifiers

  • Classify based on the prior weight of the class and the conditional
    parameter for what each word says:

        c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]

  • Training is done by counting and dividing:

        P(c_j) = N_{c_j} / N

        P(x_k | c_j) = (T_{c_j,x_k} + α) / Σ_{x_i ∈ V} (T_{c_j,x_i} + α)

  • Don't forget to smooth (e.g., add-one smoothing: α = 1)
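The counting-and-dividing training step and the argmax classification rule above can be sketched as follows. This is a minimal illustration with add-one (α = 1) smoothing, not the course's reference code; the function names and the token-list document representation are illustrative assumptions.

```python
import math
from collections import Counter

def train_nb(docs):
    """Train multinomial Naive Bayes with add-one smoothing.
    docs: list of (token_list, class_label) pairs."""
    classes = {label for _, label in docs}
    vocab = {t for tokens, _ in docs for t in tokens}
    prior, cond = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, label in docs if label == c]
        prior[c] = len(class_docs) / len(docs)          # P(c_j) = N_{c_j} / N
        counts = Counter(t for tokens in class_docs for t in tokens)
        denom = sum(counts.values()) + len(vocab)       # Σ (T_{c,x_i} + 1)
        cond[c] = {t: (counts[t] + 1) / denom for t in vocab}
    return prior, cond, vocab

def classify_nb(tokens, prior, cond, vocab):
    """c_NB = argmax_c [ log P(c) + Σ_i log P(x_i | c) ]."""
    def score(c):
        s = math.log(prior[c])
        for t in tokens:
            if t in vocab:                              # ignore unseen terms
                s += math.log(cond[c][t])
        return s
    return max(prior, key=score)
```

Working in log space, as on the slide, avoids floating-point underflow when multiplying many small probabilities.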

SLIDE 3

Vector space text classification

  • Today:
    – Vector space methods for text classification
      • Rocchio classification
      • k Nearest Neighbors
    – Linear classifiers and non-linear classifiers
    – Classification with more than two classes

SLIDE 4

Vector space text classification

Vector space methods for text classification

SLIDE 5

VECTOR SPACE CLASSIFICATION

  • Vector space representation
    – Each document is a vector, one component for each term (= word).
    – Normally we normalize vectors to unit length.
    – High-dimensional vector space:
      • Terms are axes
      • 10,000+ dimensions, or even 100,000+
      • Docs are vectors in this space
    – How can we do classification in this space?

SLIDE 6

VECTOR SPACE CLASSIFICATION

  • As before, the training set is a set of documents, each labeled with its
    class (e.g., topic)
  • In vector space classification, this set corresponds to a labeled set of
    points (or, equivalently, vectors) in the vector space
  • Hypothesis 1: Documents in the same class form a contiguous region of space
  • Hypothesis 2: Documents from different classes don't overlap
  • We define surfaces to delineate classes in the space

SLIDE 7

Documents in a vector space

[Figure: documents plotted in a vector space, grouped into the classes Government, Science, and Arts]

SLIDE 8

Test document: which class?

[Figure: the same vector space with an unlabeled test document added among the Government, Science, and Arts regions]

SLIDE 9

Test document = Government

[Figure: the test document assigned to the Government class]

Our main topic today is how to find good separators

SLIDE 10

Vector space text classification

Rocchio text classification

SLIDE 11

Rocchio text classification

  • Rocchio text classification
    – Use standard tf-idf weighted vectors to represent text documents
    – For the training documents in each category, compute a prototype vector
      by summing the vectors of the training documents in the category
      • Prototype = centroid of the members of the class
    – Assign test documents to the category with the closest prototype vector,
      based on cosine similarity

SLIDE 12

DEFINITION OF CENTROID

        μ(c) = (1 / |D_c|) Σ_{d ∈ D_c} v(d)

  • where D_c is the set of all documents that belong to class c, and v(d) is
    the vector space representation of d.
  • Note that the centroid will in general not be a unit vector, even when the
    inputs are unit vectors.
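The centroid formula and the Rocchio assignment rule can be sketched together. This is a minimal sketch using a sparse dict-of-weights vector representation; the function names are illustrative, not from the slides.

```python
import math

def unit(v):
    """Normalize a sparse vector (dict: term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else v

def centroid(vectors):
    """mu(c) = (1 / |D_c|) * sum of v(d) over d in D_c."""
    acc = {}
    for v in vectors:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in acc.items()}

def rocchio_classify(doc, prototypes):
    """Assign doc to the class whose centroid is closest by cosine similarity."""
    d = unit(doc)
    def cos(c):
        p = unit(prototypes[c])
        return sum(w * p.get(t, 0.0) for t, w in d.items())
    return max(prototypes, key=cos)
```

Note that `centroid` does not renormalize its result, matching the slide's remark that the centroid is in general not a unit vector; normalization happens only inside the cosine computation.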

SLIDE 13

ROCCHIO TEXT CLASSIFICATION

[Figure: two classes of training points (r1, r2, r3 and b1, b2) with their prototypes; the test point t is assigned t = b]

SLIDE 14

ROCCHIO PROPERTIES

  • Forms a simple generalization of the examples in each class (a prototype).
  • The prototype vector does not need to be averaged or otherwise normalized
    for length, since cosine similarity is insensitive to vector length.
  • Classification is based on similarity to class prototypes.
  • Does not guarantee that classifications are consistent with the given
    training data.

SLIDE 15

ROCCHIO ANOMALY

  • Prototype models have problems with polymorphic (disjunctive) categories.

[Figure: class r split into two separate clusters (r1, r2 and r3, r4) with points b1, b2 of class b between them; the test point is assigned t = r]

SLIDE 16

Vector space text classification

k Nearest Neighbor classification

SLIDE 17

K NEAREST NEIGHBOR CLASSIFICATION

  • kNN = k Nearest Neighbor
  • To classify document d into class c:
    – Define the k-neighborhood N as the k nearest neighbors of d
    – Count the number of documents i in N that belong to c
    – Estimate P(c|d) as i/k
    – Choose as class argmax_c P(c|d) [= majority class]
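The four steps above can be sketched directly. This is an illustrative sketch, not the course's code; the similarity function is passed in as a parameter, matching the later slide on similarity metrics.

```python
def knn_classify(doc_vec, training, k, sim):
    """Classify doc_vec by majority vote among its k nearest neighbors.
    training: list of (vector, label) pairs; sim: similarity function."""
    # k-neighborhood N: the k training examples most similar to doc_vec
    neighbors = sorted(training, key=lambda ex: sim(doc_vec, ex[0]),
                       reverse=True)[:k]
    # count members of each class in N; votes[c] / k estimates P(c|d)
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    # argmax_c P(c|d) = the majority class among the neighbors
    return max(votes, key=votes.get)
```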

SLIDE 18

K NEAREST NEIGHBOR CLASSIFICATION

  • Unlike Rocchio, kNN classification determines the decision boundary locally.
  • For 1NN (k = 1), we assign each document to the class of its closest
    neighbor.
  • For kNN, we assign each document to the majority class of its k closest
    neighbors. Here k is a parameter.
  • The rationale of kNN: the contiguity hypothesis.
    – We expect a test document d to have the same label as the training
      documents located nearby.

SLIDE 19

kNN: k = 1

[Figure: 1NN decision boundary]

SLIDE 20

kNN: k = 1, 5, 10

[Figure: decision boundaries for k = 1, 5, and 10]

SLIDE 21

kNN: weighted-sum voting

[Figure: weighted-sum voting example]

SLIDE 22

K NEAREST NEIGHBOR CLASSIFICATION

[Figure: a test document among the Government, Science, and Arts classes]

SLIDE 23

K NEAREST NEIGHBOR CLASSIFICATION

  • Learning is just storing the representations of the training examples in D.
  • Testing instance x (under 1NN):
    – Compute the similarity between x and all examples in D.
    – Assign x the category of the most similar example in D.
  • Does not explicitly compute a generalization or category prototypes.
  • Also called:
    – Case-based learning
    – Memory-based learning
    – Lazy learning
  • Rationale of kNN: the contiguity hypothesis

SLIDE 24

SIMILARITY METRICS

  • The nearest neighbor method depends on a similarity (or distance) metric.
  • The simplest choice for a continuous m-dimensional instance space is
    Euclidean distance.
  • The simplest choice for an m-dimensional binary instance space is Hamming
    distance (the number of feature values that differ).
  • For text, cosine similarity of tf-idf weighted vectors is typically most
    effective.
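Two of the metrics above can be sketched briefly: cosine similarity over sparse tf-idf vectors, and Hamming distance over binary feature vectors. The sparse dict representation is an illustrative assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (dicts: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hamming_distance(a, b):
    """Number of positions at which two binary feature vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

Because cosine similarity divides by both vector lengths, it depends only on direction, which is why (as the Rocchio slides note) prototype vectors need no length normalization.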

SLIDE 25

An example: cosine similarity

[Figure: cosine similarity example with training points r1, r2, r3 and b1, b2; the test point is assigned t = b]

SLIDE 26

kNN discussion

  • Functional definition of "similarity"
    – e.g., cosine, Euclidean, kernel functions, ...
  • How many neighbors do we consider?
    – The value of k is determined empirically (normally 3 or 5)
  • Does each neighbor get the same weight?
    – Weighted-sum or not
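The weighted-sum option in the last bullet can be sketched as follows: instead of each neighbor casting one vote, its vote is weighted by its similarity to the test document. The function name and representation are illustrative assumptions.

```python
def knn_weighted_classify(doc_vec, training, k, sim):
    """kNN where each of the k nearest neighbors votes with a weight equal
    to its similarity to the test document, rather than one vote each."""
    scored = sorted(((sim(doc_vec, v), label) for v, label in training),
                    reverse=True)[:k]
    weights = {}
    for s, label in scored:
        weights[label] = weights.get(label, 0.0) + s
    return max(weights, key=weights.get)
```

With plain majority voting, two distant neighbors can outvote one very close neighbor; similarity weighting lets the close neighbor dominate, as the example below shows.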

SLIDE 27

kNN discussion (cont.)

  • No feature selection necessary
  • Scales well with a large number of classes
    – Don't need to train n classifiers for n classes
  • Classes can influence each other
    – Small changes to one class can have a ripple effect
  • Scores can be hard to convert to probabilities
  • No training necessary
    – Actually: perhaps not true. (Data editing, etc.)
  • May be more expensive at test time

SLIDE 28

Text classification

Linear classifiers and non-linear classifiers

SLIDE 29

Linear classifiers

  • Many common text classifiers are linear classifiers:
    – Naïve Bayes
    – Perceptron
    – Rocchio
    – Logistic regression
    – Support vector machines (with linear kernel)
    – Linear regression

SLIDE 30

Linear classifier: 2D

  • In two dimensions, a linear classifier is a line. These lines have the
    functional form

        w_1 x_1 + w_2 x_2 = b

    The classification rule:

        if w_1 x_1 + w_2 x_2 > b  =>  c
        if w_1 x_1 + w_2 x_2 < b  =>  not-c

    Here x = (x_1, x_2)^T is the 2D vector representation of the document,
    and w = (w_1, w_2)^T is the parameter vector that (together with b)
    defines the decision boundary.
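The 2D rule, and its generalization to a hyperplane in m dimensions, can be sketched in a few lines. The function names are illustrative; the tie case w·x = b is left unassigned on the slide, so here it falls to not-c.

```python
def linear_classify_2d(x, w, b):
    """2D rule: class c iff w_1*x_1 + w_2*x_2 > b, else not-c."""
    return w[0] * x[0] + w[1] * x[1] > b

def linear_classify(x, w, b):
    """Hyperplane generalization: class c iff the dot product w . x > b."""
    return sum(wi * xi for wi, xi in zip(w, x)) > b
```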

SLIDE 31

Linear classifier

  • We can generalize this 2D linear classifier to higher dimensions by
    defining a hyperplane:

        w^T x = b

  • The classification rule is then:

        if w^T x > b  =>  c
        if w^T x < b  =>  not-c

  • This is why Rocchio and Naïve Bayes classifiers are linear classifiers.

SLIDE 32

Non-linear classifiers

  • A non-linear classifier: kNN
  • A linear classifier (e.g., Naïve Bayes) does badly on this task:

[Figure: a class distribution that is not linearly separable]

    kNN will do very well (assuming enough training data)

SLIDE 33

Text classification

Classification with more than two classes

SLIDE 34

Classification with more than two classes

  • Classification for classes that are not mutually exclusive is called the
    any-of classification problem.
  • Classification for classes that are mutually exclusive is called the
    one-of classification problem.
  • We have learned two-class linear classifiers
    – linear classifiers that can classify d as c or not-c.
  • How can we extend the two-class linear classifiers to J > 2 classes?
    – i.e., classify a document d into one of, or any of, the classes
      c1, c2, c3, ...

SLIDE 35

Classification with more than two classes

  • For one-of classification tasks:
    1. Build a classifier for each class, where the training set consists of
       the set of documents in the class and its complement.
    2. Given the test document, apply each classifier separately.
    3. Assign the document to the class with
       • the maximum score,
       • the maximum confidence value, or
       • the maximum probability.
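The three one-of steps above can be sketched as follows. Each per-class binary classifier is represented here simply as a scoring function; the dict-of-scorers interface is an illustrative assumption, not from the slides.

```python
def one_of_classify(doc, scorers):
    """One-of classification: apply each class's scorer to the document
    separately, then assign the document to the class with the maximum
    score (or confidence value, or probability)."""
    return max(scorers, key=lambda c: scorers[c](doc))
```

Because the classes are mutually exclusive, exactly one label is returned even if several scorers would individually accept the document.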

SLIDE 36

Classification with more than two classes

  • For any-of classification tasks:
    1. Build a classifier for each class, where the training set consists of
       the set of documents in the class and its complement.
    2. Given the test document, apply each classifier separately. The decision
       of one classifier has no influence on the decisions of the other
       classifiers.
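The any-of procedure differs from one-of only in the final step: each binary decision stands alone, so a document may receive zero, one, or several labels. A minimal sketch, with per-class thresholds as an illustrative assumption:

```python
def any_of_classify(doc, scorers, thresholds):
    """Any-of classification: run each class's binary classifier
    independently; return every class whose score clears its threshold."""
    return {c for c, score_fn in scorers.items() if score_fn(doc) > thresholds[c]}
```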

SLIDE 37

THE TEXT CLASSIFICATION PROBLEM

An example:

  • Document d with only one sentence:
    "London is planning to organize the 2012 Olympics."
  • We have six classes:
    <UK>, <China>, <car>, <coffee>, <elections>, <sports>
  • Determined: <UK> and <sports>
    – <UK> because p(UK) p(d|UK) > t1
    – <sports> because p(sports) p(d|sports) > t2

SLIDE 38

Summary

  • Vector space methods for text classification
    – Rocchio classification
    – k Nearest Neighbors
  • Linear classifiers and non-linear classifiers
  • Classification with more than two classes