SLIDE 1

Natural Language Processing CSCI 4152/6509 — Lecture 12 Classifier Evaluation

Instructor: Vlado Keselj
Time and date: 09:35–10:25, 31-Jan-2020
Location: Dunn 135

CSCI 4152/6509, Vlado Keselj Lecture 12 1 / 29

SLIDE 2

Previous Lecture

IR evaluation measures

◮ Precision, Recall, F-measure

Precision-recall curves, example
Other evaluation measures
Text classification

◮ Text classification as a text mining problem
◮ Types of text classification

SLIDE 3

Evaluation Measures for Text Classification

Contingency table (confusion matrix) and Accuracy

Example (classes A, B, and C):

                        Gold standard
Model classification    A     B     C   Total
A                       5     1     1       7
B                       3    10     2      15
C                       0     2    10      12
Total                   8    13    13      34

Accuracy: percentage of correct classifications; in the example, (5 + 10 + 10)/34 = 25/34 ≈ 0.7353 = 73.53%
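The accuracy computation can be sketched in a few lines of Python. The nested-list layout (rows = model classification, columns = gold standard) and the name `accuracy` are illustrative assumptions, not part of the lecture. The cell for "model says C, gold is A" is blank on the slide and is taken to be 0 here, as the row and column totals imply.

```python
# Confusion matrix from the example: rows = model classification,
# columns = gold standard, class order A, B, C.
CONFUSION = [
    [5, 1, 1],
    [3, 10, 2],
    [0, 2, 10],  # first cell is blank on the slide; 0 is implied by the totals
]

def accuracy(matrix):
    """Fraction of objects on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

print(f"{accuracy(CONFUSION):.4f}")  # 25/34, prints 0.7353
```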

SLIDE 4

Per class: Precision, Recall, and F-measure

For each class: Yes = in class, No = not in class

                Yes is correct    No is correct
Yes assigned          a                 b
No assigned           c                 d

precision $P = \frac{a}{a+b}$, recall $R = \frac{a}{a+c}$, fallout $\frac{b}{b+d}$

F-measure: $F = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$

If β = 1 ⇒ Precision and Recall treated equally

macro-averaging (equal weight to each class) and micro-averaging (equal weight to each object)
(2×2 contingency tables vs. one large contingency table)
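As a rough sketch, the four per-class measures can be computed directly from the cells a, b, c, d of the 2×2 table. The function name `per_class_measures` is an assumption made for illustration; the sample counts are those of the A1 table used later in the lecture. The sketch does not guard against empty rows or columns (division by zero).

```python
def per_class_measures(a, b, c, d, beta=1.0):
    """Precision, recall, fallout and F-measure from a 2x2 table.

    a = Yes assigned, Yes correct    b = Yes assigned, No correct
    c = No assigned,  Yes correct    d = No assigned,  No correct
    """
    precision = a / (a + b)
    recall = a / (a + c)
    fallout = b / (b + d)
    f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fallout, f

# Counts for class A1: a=5, b=2, c=3, d=24.
p, r, fo, f1 = per_class_measures(a=5, b=2, c=3, d=24)
print(p, r, fo, f1)
```

With β = 1 the F-measure reduces to the harmonic mean 2PR/(P + R).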

SLIDE 5

Example: Classification Results

                   Gold standard
System response    A1    A2    A3   Total
A1                  5     1     1       7
A2                  3    10     2      15
A3                  0     2    10      12
Total               8    13    13      34

Or, we can create contingency tables for each class separately:

           Gold standard
           A1   not A1   Total
A1          5        2       7
not A1      3       24      27
Total       8       26      34

           Gold standard
           A2   not A2   Total
A2         10        5      15
not A2      3       16      19
Total      13       21      34

SLIDE 6

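           Gold standard
           A3   not A3   Total
A3         10        2      12
not A3      3       19      22
Total      13       21      34

The overall accuracy can be calculated using the overall table:

$$\mathrm{Accuracy} = \frac{5 + 10 + 10}{34}$$

Per-class precisions are:

$$P_{A1} = \frac{5}{7} \qquad P_{A2} = \frac{10}{15} \qquad P_{A3} = \frac{10}{12}$$

Per-class recalls are:

$$R_{A1} = \frac{5}{8} \qquad R_{A2} = \frac{10}{13} \qquad R_{A3} = \frac{10}{13}$$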

SLIDE 7

Macro-averaged precision, recall, and F-measure are:

$$P_{macro} = \frac{5/7 + 10/15 + 10/12}{3}$$

$$R_{macro} = \frac{5/8 + 10/13 + 10/13}{3}$$

$$F_{macro} = \frac{2 \cdot P_{macro} \cdot R_{macro}}{P_{macro} + R_{macro}}$$
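A small sketch of macro-averaging over the example confusion matrix, assuming rows are the system response and columns the gold standard. The name `macro_average` is illustrative. The F-measure is computed from the two macro averages, following the formula above, rather than by averaging per-class F-measures.

```python
# Rows = system response, columns = gold standard, class order A1, A2, A3.
CONFUSION = [
    [5, 1, 1],
    [3, 10, 2],
    [0, 2, 10],  # first cell is blank on the slide; 0 is implied by the totals
]

def macro_average(matrix):
    """Average per-class precision and recall with equal weight per class."""
    k = len(matrix)
    # Precision of class i: correct / everything the system labeled i (row sum).
    precisions = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    # Recall of class i: correct / everything that truly is i (column sum).
    recalls = [matrix[i][i] / sum(row[i] for row in matrix) for i in range(k)]
    p_macro = sum(precisions) / k
    r_macro = sum(recalls) / k
    f_macro = 2 * p_macro * r_macro / (p_macro + r_macro)
    return p_macro, r_macro, f_macro

p_macro, r_macro, f_macro = macro_average(CONFUSION)
print(round(p_macro, 4), round(r_macro, 4), round(f_macro, 4))
```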

SLIDE 8

To calculate micro-averaged precision, recall, and F-measure, we calculate the cumulative per-class table:

           Gold standard
           A    not A   Total
A         25        9      34
not A      9       59      68
Total     34       68     102

and then we calculate the micro-averaged measures:

$$P_{micro} = \frac{25}{34} \qquad R_{micro} = \frac{25}{34}$$

$$F_{micro} = \frac{2 \cdot P_{micro} \cdot R_{micro}}{P_{micro} + R_{micro}} = \frac{25}{34}$$
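The cumulative table is the cell-wise sum of the three per-class 2×2 tables. A minimal sketch, assuming the same row/column layout as the overall table (rows = system response, columns = gold standard); the function name `cumulative_table` is hypothetical.

```python
CONFUSION = [
    [5, 1, 1],
    [3, 10, 2],
    [0, 2, 10],  # first cell is blank on the slide; 0 is implied by the totals
]

def cumulative_table(matrix):
    """Sum the per-class 2x2 tables into one (a, b, c, d) table."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    a = b = c = d = 0
    for i in range(k):
        yes_yes = matrix[i][i]                            # assigned i, truly i
        yes_no = sum(matrix[i]) - yes_yes                 # assigned i, not i
        no_yes = sum(row[i] for row in matrix) - yes_yes  # not assigned, truly i
        a += yes_yes
        b += yes_no
        c += no_yes
        d += total - yes_yes - yes_no - no_yes
    return a, b, c, d

a, b, c, d = cumulative_table(CONFUSION)
p_micro = a / (a + b)
r_micro = a / (a + c)
f_micro = 2 * p_micro * r_micro / (p_micro + r_micro)
print((a, b, c, d), p_micro, r_micro, f_micro)
```

Because every misclassified object counts once as a wrong "Yes" for one class and once as a missed "Yes" for another, b equals c, so micro-averaged precision, recall, and F-measure all coincide with accuracy in this single-label setting.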

SLIDE 9

Evaluation Methods for Classification

General issues in classification

◮ Underfitting and Overfitting

Example with polynomial-based function learning

◮ Underfitting and Overfitting

SLIDE 10

Evaluation Methods for Text Classifiers

Training Error
Train and Test
N-fold Cross-validation

SLIDE 11

Train and Test

Labeled data is divided into training and testing data
Typically training data size : testing data size = 9 : 1, sometimes 2 : 1

[Diagram: training data → training → classifier; testing data → classifier → evaluation]
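A minimal sketch of a train-and-test split, assuming the labeled data fits in a list and that a random shuffle is acceptable. The function name, the `test_fraction` parameter, and the fixed seed are illustrative choices, not prescribed by the lecture.

```python
import random

def train_test_split(items, test_fraction=0.1, seed=0):
    """Shuffle the labeled items and hold out a fraction for testing.

    test_fraction=0.1 corresponds to the 9 : 1 split; use 1/3 for 2 : 1.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(100))
print(len(train), len(test))  # prints 90 10
```

The classifier is then trained on `train` only and evaluated on `test` only, so the evaluation uses data the classifier has not seen.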

SLIDE 12

N-fold Cross-Validation

[Diagram: the labeled data is split into n folds (fold 1, fold 2, ..., fold n). Each of the n classifiers (classifier 1, ..., classifier n) is trained on n−1 of the folds and evaluated on the one remaining fold.]
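N-fold cross-validation can be sketched as follows. The helper names (`n_fold_splits`, `cross_validate`), the round-robin assignment of items to folds, and the toy majority-class classifier are all assumptions made for illustration; any training and evaluation functions could be plugged in.

```python
from collections import Counter

def n_fold_splits(items, n):
    """Yield (training, testing) pairs; each fold is the test set exactly once."""
    items = list(items)
    folds = [items[i::n] for i in range(n)]  # round-robin assignment to folds
    for i in range(n):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]

def cross_validate(items, n, train_fn, eval_fn):
    """Train n classifiers and average their evaluation scores."""
    scores = [eval_fn(train_fn(tr), te) for tr, te in n_fold_splits(items, n)]
    return sum(scores) / len(scores)

# Toy classifier (hypothetical): always predict the majority training label.
def train_majority(training):
    return Counter(label for _, label in training).most_common(1)[0][0]

def eval_accuracy(predicted_label, testing):
    return sum(label == predicted_label for _, label in testing) / len(testing)

data = [(f"doc{i}", "spam" if i % 5 else "ham") for i in range(10)]
score = cross_validate(data, 5, train_majority, eval_accuracy)
print(score)
```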

SLIDE 13

Text Clustering

Text clustering is an interesting text mining task.
It is relevant to the course, and a clustering task can be a project topic.
Since it is covered in some other courses, we will not cover it in much detail here.
Some notes are provided for your information.

SLIDE 14

Similarity-based Text Classification

Aggregate training text for each class into a profile
Aggregate testing text into another profile
Classify according to profile similarity
If a profile is a vector, we can use different similarity measures; e.g.,

◮ cosine similarity,
◮ Euclidean similarity, or
◮ some other type of vector similarity
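One way to make these steps concrete is sketched below, assuming that a profile is a bag-of-words count vector and that cosine similarity is the chosen measure. The function names and the toy training texts are hypothetical.

```python
import math
from collections import Counter

def profile(text):
    """Bag-of-words profile: lowercase word -> count (an assumed representation)."""
    return Counter(text.lower().split())

def cosine(p, q):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def classify(test_text, training_texts):
    """training_texts maps a class label to a list of training documents."""
    class_profiles = {
        label: profile(" ".join(docs)) for label, docs in training_texts.items()
    }
    test_profile = profile(test_text)
    return max(class_profiles, key=lambda c: cosine(test_profile, class_profiles[c]))

training = {
    "sports": ["ball game team", "team ball"],
    "finance": ["bank money market", "bank"],
}
print(classify("ball team", training))  # prints sports
```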

SLIDE 15

CNG Method for Text Classification

A simple method, initially used for authorship attribution
Authorship attribution problem:

SLIDE 16

CNG Method Overview

Method based on character n-grams
Language independent
Based on creating n-gram based author profiles
Similarity based (a type of kNN method, i.e., k Nearest Neighbours)
Similarity measure:

$$\sum_{g \in D_1 \cup D_2} \left( \frac{f_1(g) - f_2(g)}{\frac{f_1(g) + f_2(g)}{2}} \right)^2 = \sum_{g \in D_1 \cup D_2} \left( \frac{2 \cdot (f_1(g) - f_2(g))}{f_1(g) + f_2(g)} \right)^2 \qquad (1)$$

where $f_i(g) = 0$ if $g \notin D_i$.

SLIDE 17

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

Preparing character n-gram profile (n=3, L=5)

SLIDE 18

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar

Preparing character n-gram profile (n=3, L=5)

SLIDE 19

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl

Preparing character n-gram profile (n=3, L=5)

SLIDE 20

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle

Preparing character n-gram profile (n=3, L=5)

SLIDE 21

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley

Preparing character n-gram profile (n=3, L=5)

SLIDE 22

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley ey_

Preparing character n-gram profile (n=3, L=5)

SLIDE 23

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley ey_ y_w

Preparing character n-gram profile (n=3, L=5)

SLIDE 24

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley ey_ y_w _wa was ...

Preparing character n-gram profile (n=3, L=5)

SLIDE 25

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley ey_ y_w _wa was ...

sort by frequency:

_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007
_an 0.007
nd_ 0.007
ed_ 0.006

Preparing character n-gram profile (n=3, L=5)

SLIDE 26

Example of Creating an Author Profile

Marley was dead: to begin with. There is no doubt whatever about that...
(from Christmas Carol by Charles Dickens)

n=3: Mar arl rle ley ey_ y_w _wa was ...

sort by frequency, L=5 (profile length):

_th 0.015
___ 0.013
the 0.013
he_ 0.011
and 0.007
_an 0.007
nd_ 0.007
ed_ 0.006

Preparing character n-gram profile (n=3, L=5)
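The profile construction shown on these slides can be sketched roughly as follows. The sketch assumes that every whitespace character is written as `_`, that frequencies are normalized by the total number of n-grams in the text, and that ties are broken alphabetically; the function names are illustrative, and details such as case handling in the original method may differ.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Slide a window of n characters over the text; whitespace becomes '_'."""
    s = "".join("_" if ch.isspace() else ch for ch in text)
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def ngram_profile(text, n=3, L=5):
    """Return the L most frequent character n-grams with normalized frequencies."""
    grams = char_ngrams(text, n)
    counts = Counter(grams)
    # Sort by decreasing count; ties broken alphabetically (an assumption).
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:L]
    return {g: c / len(grams) for g, c in top}

print(char_ngrams("Marley was dead")[:8])
# ['Mar', 'arl', 'rle', 'ley', 'ey_', 'y_w', '_wa', 'was']
print(ngram_profile("the cat and the hat", n=3, L=5))
```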

SLIDE 27

How to measure profile similarity?

SLIDE 28

CNG Distance Measure

Euclidean-style distance with relative differences, rather than absolute
Example: instead of using 0.88 − 0.80 = 0.08, we say it is about a 10% difference, which is the same for 0.088 and 0.080
To be symmetric, divide by the arithmetic average:

$$d(f_1, f_2) = \sum_{n \in \mathrm{dom}(f_1) \cup \mathrm{dom}(f_2)} \left( \frac{f_1(n) - f_2(n)}{\frac{f_1(n) + f_2(n)}{2}} \right)^2$$

$\mathrm{dom}(f_i)$ is the domain of function $f_i$, i.e., of the profile $i$
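A direct transcription of the distance into Python, assuming each profile is a dictionary from n-gram to a positive normalized frequency and that an n-gram missing from a profile has frequency 0. The name `cng_distance` is illustrative.

```python
def cng_distance(f1, f2):
    """Sum of squared relative differences over the union of both profiles."""
    total = 0.0
    for g in set(f1) | set(f2):
        a = f1.get(g, 0.0)  # frequency is 0 when g is not in the profile
        b = f2.get(g, 0.0)
        # Divide by the arithmetic average so the measure is symmetric.
        total += ((a - b) / ((a + b) / 2)) ** 2
    return total

# The relative difference is the same at both scales, as in the example above.
print(cng_distance({"the": 0.88}, {"the": 0.80}))
print(cng_distance({"the": 0.088}, {"the": 0.080}))
```

An n-gram present in only one profile contributes the maximum per-term value of 4, whatever its frequency.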

SLIDE 29

Classification using CNG

Create profile for each class using training text

◮ done by merging all texts in each class into one long document
◮ another option: centroid of profiles of individual documents

Create profile for the test document
Assign to the document the class whose profile is closest according to the CNG distance
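The three steps can be put together in a compact sketch, using the first option (merging all texts of a class into one long document). It assumes whitespace is written as `_`, ties in frequency are broken alphabetically, and documents are joined with a single space, which adds a few n-grams spanning document boundaries. All function names and the toy training data are hypothetical.

```python
from collections import Counter

def profile(text, n=3, L=5):
    """The L most frequent character n-grams with normalized frequencies."""
    s = "".join("_" if ch.isspace() else ch for ch in text)
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:L]
    return {g: c / total for g, c in top}

def cng_distance(f1, f2):
    """Sum of squared relative differences; missing n-grams have frequency 0."""
    return sum(
        ((f1.get(g, 0.0) - f2.get(g, 0.0)) / ((f1.get(g, 0.0) + f2.get(g, 0.0)) / 2)) ** 2
        for g in set(f1) | set(f2)
    )

def classify(test_text, training_texts, n=3, L=5):
    """training_texts maps a class label to a list of training documents."""
    # Step 1: one profile per class, built from the merged training text.
    class_profiles = {
        label: profile(" ".join(docs), n, L) for label, docs in training_texts.items()
    }
    # Step 2: profile of the test document.
    test_profile = profile(test_text, n, L)
    # Step 3: the class with the closest profile wins.
    return min(class_profiles, key=lambda c: cng_distance(test_profile, class_profiles[c]))

training = {"a": ["aaaa", "aaaa"], "b": ["bbbb", "bbbb"]}
print(classify("aaaa", training))  # prints a
```

In practice n and L are tuned on held-out data; the values n=3 and L=5 here only mirror the small example from the earlier slides.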
