  • Data Mining Lecture 3: Classification 1

  • !"#$%$&$'

!"#$&$'$( )) * +(( ,

  • $ . ,

)$ +( ,

!

/ 0($ ((,

1 ( (, *) ( ),

/ - (( ,2$( $+( ( ,

"#$

[Figure: a learning algorithm learns a model from the training set (induction); the model is then applied to the test set (deduction)]

Training Set

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

%&$

  • ))((3$($
  • 4+$

+($$)$

'%& 56$$2, 7(), 8$8)$, 7 +(8, )(, ),

%& (#%

739!%:(!6 7#:;!3;%:(! 73;#:(!2 >=28 <28 x >=18 <18 x U G VG

%& )#

View letters as constructed from 5 components.

[Figure: Letter C, Letter E, Letter A, Letter D, Letter F, Letter B]

  • *&&

/ ) 8+ 3), /

  • )) )+,

), <(.$ +8$ (,

!#

[Figure: two panels, "Partitioning Based" and "Distance Based", each showing Class A, Class B, and Class C]

" <

/ 7 / )+(

=> <

/ / 3

+##! With Outliers Without Outliers ?

+#%&!

Name       Gender  Height  Output1  Output2
Kristina   F       1.60    Short    Medium
Jim        M       2.02    Tall     Medium
Maggie     F       1.90    Medium   Tall
Martha     F       1.88    Medium   Tall
Stephanie  F       1.71    Short    Medium
Bob        M       1.85    Medium   Medium
Kathy      F       1.60    Short    Medium
Dave       M       1.72    Short    Medium
Worth      M       2.12    Tall     Tall
Steven     M       2.10    Tall     Tall
Debbie     F       1.78    Medium   Medium
Todd       M       1.95    Medium   Medium
Kim        F       1.89    Medium   Tall
Amy        F       1.81    Medium   Medium
Wynette    F       1.75    Medium   Medium

  • True Positive: Tall classified as Tall
  • False Positive: Not Tall classified as Tall
  • True Negative: Not Tall classified as Not Tall
  • False Negative: Tall classified as Not Tall

Counts shown in the figure: 20, 10, 45, 25
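The four outcome counts can be turned into summary measures. The sketch below is illustrative only: it assumes the figure's counts 20, 10, 45, 25 pair with the outcomes in the order they are listed (Tall classified Tall, Not Tall classified Tall, Not Tall classified Not Tall, Tall classified Not Tall), which the extracted slide text does not confirm. The function name `binary_metrics` and the choice of accuracy, precision, and recall are not from the slides.

```python
def binary_metrics(tp, fp, tn, fn):
    """Summarize a two-class result from its four outcome counts."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all records classified correctly.
        "accuracy": (tp + tn) / total,
        # Of the records classified Tall, the fraction that really are Tall.
        "precision": tp / (tp + fp),
        # Of the records that really are Tall, the fraction classified Tall.
        "recall": tp / (tp + fn),
    }

# ASSUMPTION: the counts follow the order in which the outcomes are listed.
METRICS = binary_metrics(tp=20, fp=10, tn=45, fn=25)
for name, value in METRICS.items():
    print(f"{name}: {value:.3f}")
```

If the counts pair with the outcomes differently, only the keyword arguments in the call need to change.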

'%& 2((3)+(>)# >)%

Confusion matrix (rows: actual membership; columns: assignment)

Actual   Short  Medium  Tall
Short    0      4       0
Medium   0      5       3
Tall     0      1       2
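The confusion matrix can be reproduced from the height table. The sketch below assumes Output1 is the actual membership and Output2 is the assignment; the slides do not state this, but that reading yields the counts 4, 5, 3, 1, 2 shown in the matrix. The helper name `confusion_matrix` is illustrative.

```python
from collections import Counter

# (Output1, Output2) pairs copied from the height table, in row order.
# ASSUMPTION: Output1 is the actual membership, Output2 the assignment.
PAIRS = [
    ("Short", "Medium"),   # Kristina
    ("Tall", "Medium"),    # Jim
    ("Medium", "Tall"),    # Maggie
    ("Medium", "Tall"),    # Martha
    ("Short", "Medium"),   # Stephanie
    ("Medium", "Medium"),  # Bob
    ("Short", "Medium"),   # Kathy
    ("Short", "Medium"),   # Dave
    ("Tall", "Tall"),      # Worth
    ("Tall", "Tall"),      # Steven
    ("Medium", "Medium"),  # Debbie
    ("Medium", "Medium"),  # Todd
    ("Medium", "Tall"),    # Kim
    ("Medium", "Medium"),  # Amy
    ("Medium", "Medium"),  # Wynette
]
CLASSES = ["Short", "Medium", "Tall"]

def confusion_matrix(pairs, classes):
    """Rows are actual classes, columns are assigned classes."""
    counts = Counter(pairs)  # missing (actual, assigned) pairs count as 0
    return [[counts[(actual, assigned)] for assigned in classes]
            for actual in classes]

MATRIX = confusion_matrix(PAIRS, CLASSES)
for name, row in zip(CLASSES, MATRIX):
    print(f"{name:<7}", row)
```

The diagonal entries (0, 5, 2) are the records whose assignment matches the actual membership.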

)#

  • )

# ?$#$&$,

  • *!?@#3#@&@3@ε

0. *

)#,

  • #)#

! 2 , * 2) (),7) ,

!

". 03)*

/

<4) ( (3)3

/ (

28AB)()

# C*

/ 7+888$.888$(5 )8

[Figure labels: Training Records; Test Record; Compute Distance; Choose k of the “nearest” records]

  • #!'

(+((( AB, <+ , )

/ * , / '*) ), / - )

  • (*/0#/

0#

  • Requires three things:

      – The set of stored records
      – A distance metric to compute the distance between records
      – The value of k, the number of nearest neighbors to retrieve

  • To classify an unknown record:

      – Compute its distance to the training records
      – Identify the k nearest neighbors
      – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
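The three classification steps can be sketched in a few lines. This is a minimal illustration, not the lecture's own implementation: it assumes numeric attributes, Euclidean distance as the metric, and a plain majority vote; the function name `knn_classify` and the 2-D records are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, unknown, k):
    """training: list of (features, label) pairs; unknown: a feature tuple."""
    # Step 1: compute the distance from the unknown record to each training
    # record (Euclidean distance is an assumption) and sort by it.
    by_distance = sorted(training, key=lambda rec: math.dist(rec[0], unknown))
    # Step 2: identify the k nearest neighbors.
    neighbors = by_distance[:k]
    # Step 3: take a majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical stored records: two well-separated classes in 2-D.
TRAINING = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]

print(knn_classify(TRAINING, (2.0, 2.0), k=3))
print(knn_classify(TRAINING, (6.0, 5.0), k=3))
```

When the vote is tied, `Counter.most_common` returns the tied label that was counted first, which here is the label of the nearer neighbor; an odd k avoids ties in two-class problems.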

[Figure label: Unknown record]

!#

[Figure: three panels showing the neighborhood of a record X: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

#

Voronoi Diagram

# )++)*

/ 0

((

  • / 8(D (8

( / E((

+($+!#F%

$$d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$$
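Reading the formula on this slide as the Euclidean distance, d(p, q) = sqrt(sum over i of (p_i - q_i)^2), a direct translation might look like the sketch below. The function name `euclidean_distance` and the length check are illustrative additions, not part of the slides.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences, attribute by attribute."""
    if len(p) != len(q):
        raise ValueError("records must have the same number of attributes")
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))

# A 3-4-5 right triangle: the distance from the origin to (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

Each attribute contributes its squared difference, so an attribute with a large numeric range has a larger influence on the result.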

/#/ , 03, +)+(( , >. (), =. (4(,

/*#

"& FF FF( FF) & FF+(( /*# !∅ G ∈ HH; !∪ "'G

  • !(+(3 G

$;$ !/ "'∪ "'G !(+((∈ G

/*# ?

# " (( 8*

/ 78$ ) / 78$(()(

  • X

# ' I

/ -( )

  • (

/ 03)*

(() #,J#,: +() K?L?? ) M#?M#<

# 1&0& 84

/ (3) / 28(

  • / 8+ 3)