1 Attribute Description Examples Operations Attribute - - PDF document

Data Mining Lecture 2


Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Columns are attributes; rows are objects.


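The type of an attribute depends on which of the following properties it possesses:
– Distinctness: = ≠
– Order: < >
– Addition: + −
– Multiplication: * /
– Nominal attribute: distinctness
– Ordinal attribute: distinctness and order
– Interval attribute: distinctness, order and addition
– Ratio attribute: all 4 properties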


Nominal
  Description: The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another. (=, ≠)
  Examples: zip codes, employee ID numbers, eye color, sex: {male, female}
  Operations: mode, entropy, contingency correlation, χ² test

Ordinal
  Description: The values of an ordinal attribute provide enough information to order objects. (<, >)
  Examples: hardness of minerals, {good, better, best}, grades, street numbers
  Operations: median, percentiles, rank correlation, run tests, sign tests

Interval
  Description: For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists. (+, −)
  Examples: calendar dates, temperature in Celsius or Fahrenheit
  Operations: mean, standard deviation, Pearson's correlation, t and F tests

Ratio
  Description: For ratio attributes, both differences and ratios are meaningful. (*, /)
  Examples: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
  Operations: geometric mean, harmonic mean, percent variation
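As a minimal sketch of the Operations column (assuming Python 3.8+ for `statistics.geometric_mean`), the function below returns one representative statistic per attribute type. The `AttributeType` enum and the `summarize` function are hypothetical names for illustration; each level also permits the operations of the levels above it, which the sketch does not enumerate.

```python
import statistics
from enum import Enum


class AttributeType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def summarize(values, attr_type):
    """Return one statistic that is meaningful for the given attribute type."""
    if attr_type is AttributeType.NOMINAL:
        # Only = and != are meaningful, so use the most frequent value.
        return statistics.mode(values)
    if attr_type is AttributeType.ORDINAL:
        # Values must be encoded so that < and > follow the real order.
        # median_low returns an observed value instead of averaging two.
        return statistics.median_low(values)
    if attr_type is AttributeType.INTERVAL:
        # Differences are meaningful, so the arithmetic mean is allowed.
        return statistics.mean(values)
    if attr_type is AttributeType.RATIO:
        # Ratios are meaningful; the geometric mean needs positive values.
        return statistics.geometric_mean(values)
    raise ValueError(f"unknown attribute type: {attr_type}")


print(summarize(["brown", "blue", "brown"], AttributeType.NOMINAL))  # brown
print(summarize([1, 3, 2, 3], AttributeType.ORDINAL))                # 2
print(summarize([20.0, 25.0, 30.0], AttributeType.INTERVAL))         # 25.0
print(summarize([1.0, 4.0, 16.0], AttributeType.RATIO))              # about 4.0
```

Using the mean on an ordinal code such as {good=1, better=2, best=3} would run without error, which is exactly why the attribute type has to be tracked separately from the stored numbers.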

Permissible transformations by attribute level:

Nominal
  Transformation: Any permutation of values.
  Comments: If all employee ID numbers were reassigned, would it make any difference?

Ordinal
  Transformation: An order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonically increasing function.
  Comments: An attribute encompassing the notion of good, better, best can be represented equally well by the values {1, 2, 3} or by {0.5, 1, 10}.

Interval
  Transformation: new_value = a * old_value + b, where a and b are constants.
  Comments: Thus, the Fahrenheit and Celsius temperature scales differ in terms of where their zero value is and the size of a unit (degree).

Ratio
  Transformation: new_value = a * old_value.
  Comments: Length can be measured in meters or feet.
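The transformation rules can be checked numerically. This sketch assumes the standard conversions F = 9/5 * C + 32 and 1 ft = 0.3048 m; the sample values are made up. It shows that an interval-scale transformation keeps ratios of differences but not ratios of values, a ratio-scale transformation keeps ratios of values, and a strictly increasing recoding keeps the order of ordinal values.

```python
import math


def celsius_to_fahrenheit(c):
    # Interval transformation: new_value = a * old_value + b, a = 9/5, b = 32.
    return 9 / 5 * c + 32


def meters_to_feet(m):
    # Ratio transformation: new_value = a * old_value, a = 1 / 0.3048.
    return m / 0.3048


# Interval scale: ratios of differences survive, ratios of values do not.
c = [10.0, 20.0, 40.0]
f = [celsius_to_fahrenheit(x) for x in c]
diff_ratio_c = (c[2] - c[1]) / (c[1] - c[0])
diff_ratio_f = (f[2] - f[1]) / (f[1] - f[0])
value_ratio_c = c[2] / c[1]
value_ratio_f = f[2] / f[1]
print(math.isclose(diff_ratio_c, diff_ratio_f))    # True
print(math.isclose(value_ratio_c, value_ratio_f))  # False (2.0 vs. 104/68)

# Ratio scale: ratios of values survive.
lengths_m = [2.0, 6.0]
lengths_ft = [meters_to_feet(x) for x in lengths_m]
print(math.isclose(lengths_m[1] / lengths_m[0], lengths_ft[1] / lengths_ft[0]))  # True

# Ordinal scale: any strictly increasing recoding keeps the order.
codes = {"good": 1, "better": 2, "best": 3}
recoded = {"good": 0.5, "better": 1, "best": 10}
print(sorted(codes, key=codes.get) == sorted(recoded, key=recoded.get))  # True
```

"It is twice as warm" is therefore not a meaningful statement in Celsius, while "it is twice as long" holds in any length unit.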


Thickness  Load  Distance  Projection of y load  Projection of x Load
1.1        2.2   16.22     6.25                  12.65
1.2        2.7   15.22     5.27                  10.23


Document data (term counts per document). Documents: Document 1, Document 2, Document 3. Terms: season, timeout, lost, win, game, score, ball, play, coach, team. Non-zero counts in the matrix: 3 5 2 6 2 2 7 2 1 3 1 1 2 2 3.
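Document data is usually stored as a term-frequency matrix: one row per document, one column per term. A minimal sketch of building such rows, assuming a simple lowercase-letters tokenizer; the vocabulary is the term list from the document-data example, while the two document texts are invented and are not the documents behind the original counts. `term_vector` is a hypothetical helper name.

```python
import re
from collections import Counter


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # A Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]


vocabulary = ["season", "timeout", "lost", "win", "game",
              "score", "ball", "play", "coach", "team"]
docs = [
    "The team lost the game after a late timeout.",   # invented text
    "Coach says the team will win the season opener.",  # invented text
]
matrix = [term_vector(d, vocabulary) for d in docs]
for row in matrix:
    print(row)
```

Most entries of such a matrix are zero, so real systems typically store it in a sparse format.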


TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
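Each transaction is a set of items, so a natural in-memory form is a mapping from TID to a Python set. The sketch below uses the five transactions from the table; the helper name `transactions_containing` is hypothetical. Dividing the number of matching transactions by the total gives the fraction of transactions that contain the itemset.

```python
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def transactions_containing(itemset, transactions):
    """Return the TIDs of transactions that contain every item in itemset."""
    wanted = set(itemset)
    # <= on sets is the subset test.
    return [tid for tid, items in transactions.items() if wanted <= items]


print(transactions_containing({"Diaper", "Milk"}, transactions))  # [3, 4, 5]
print(transactions_containing({"Beer", "Bread"}, transactions))   # [2, 4]
```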

<a href="papers/papers.html#bbbb">
Data Mining </a>
<li>
<a href="papers/papers.html#aaaa">
Graph Partitioning </a>
<li>
<a href="papers/papers.html#aaaa">
Parallel Solution of Sparse Linear System of Equations </a>
<li>
<a href="papers/papers.html#ffff">
N-Body Computation and Dense Linear System Solvers
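In graph data built from web pages, each `<a href>` link is an edge from the current page to the target. A minimal sketch using the standard library's `html.parser.HTMLParser` to pull (target, anchor text) pairs out of the HTML excerpt; the `LinkCollector` class is a hypothetical helper, and it tolerates the final `<a>` whose closing tag is missing from the excerpt.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs; each pair is one outgoing edge."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> element currently open
        self._text = []

    def _flush(self):
        # Record the open link, if any, with whitespace-normalized text.
        if self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()  # a new <a> also ends an unclosed previous one
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def close(self):
        super().close()
        self._flush()  # the last link in the excerpt has no </a>


html = (
    '<a href="papers/papers.html#bbbb"> Data Mining </a> <li> '
    '<a href="papers/papers.html#aaaa"> Graph Partitioning </a> <li> '
    '<a href="papers/papers.html#aaaa"> Parallel Solution of Sparse Linear '
    'System of Equations </a> <li> '
    '<a href="papers/papers.html#ffff"> N-Body Computation and Dense Linear '
    'System Solvers'
)
parser = LinkCollector()
parser.feed(html)
parser.close()
for href, text in parser.links:
    print(href, "->", text)
```

Two links point to the same target (`#aaaa`), so the resulting graph has a node with in-degree 2.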

Benzene molecule: C6H6


Figure: a sequence of transactions (labels: "An element of the sequence", "Items/Events").


GGTTCCGCCTTCAGCCCCGCGCC CGCAGGGCCCGCCCCGCGCCGTC GAGAAGGGCCCGCCTGGCGGGCG GGGGGAGGCGGGGCCGCCCGAGC CCAACCGAGTCCGACCAGGTGCC CCCTCTGCTCGGCCTAGACCTGA GCTCATTAGGCGGCAGCGGACAG GCCAAGTAGAACACGCGAAGCGC TGGGCTGCCTGCTGCGACCAGGG
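Genomic sequence data is ordered: the meaning lies in the order of the bases. A common first step, not covered on the slide, is to count overlapping substrings of length k (k-mers). A minimal sketch on the first 23 bases of the sequence above; `kmer_counts` is a hypothetical helper name.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping substrings of length k in a DNA sequence."""
    seq = seq.upper().replace(" ", "")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


fragment = "GGTTCCGCCTTCAGCCCCGCGCC"  # first 23 bases of the sequence above
print(kmer_counts(fragment, 1))                 # base composition
print(kmer_counts(fragment, 2).most_common(3))  # most frequent 2-mers
```

With k = 1 this is simply the base composition; larger k keeps more of the ordering information.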


Figure: Average Monthly Temperature of land and ocean.


Figure panels: Two Sine Waves; Two Sine Waves + Noise.
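The figure contrasts a clean signal with the same signal plus noise. A minimal sketch that generates both, assuming Gaussian noise and two arbitrarily chosen frequencies (2 and 7 cycles over the sampled interval); the function name and its parameters are hypothetical.

```python
import math
import random


def two_sine_waves(n, noise_sd=0.0, seed=0):
    """Sum of two sine waves sampled at n points, plus optional Gaussian noise."""
    rng = random.Random(seed)
    signal = []
    for i in range(n):
        t = i / n
        clean = math.sin(2 * math.pi * 2 * t) + math.sin(2 * math.pi * 7 * t)
        # With noise_sd = 0 the added term is zero.
        signal.append(clean + rng.gauss(0.0, noise_sd))
    return signal


clean = two_sine_waves(200)
noisy = two_sine_waves(200, noise_sd=0.5)
print(max(abs(a - b) for a, b in zip(clean, noisy)))  # largest noise deviation
```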



Figure: Variation of Precipitation in Australia. Panels: Standard Deviation of Average Monthly Precipitation; Standard Deviation of Average Yearly Precipitation.
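Aggregation replaces many values with a summary, and aggregated values usually vary less, as the two precipitation panels suggest. A minimal sketch with synthetic data: the monthly values are assumed independent and normally distributed, which real precipitation is not. `yearly_means` is a hypothetical helper.

```python
import random
import statistics


def yearly_means(monthly):
    """Aggregate a flat list of monthly values into one mean per year."""
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly data")
    return [statistics.mean(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]


rng = random.Random(42)
# Hypothetical monthly precipitation (mm) for 30 years at one location.
monthly = [max(0.0, rng.gauss(50.0, 20.0)) for _ in range(30 * 12)]
yearly = yearly_means(monthly)
print(round(statistics.stdev(monthly), 1), round(statistics.stdev(yearly), 1))
```

For independent months the standard deviation of a 12-month mean shrinks by roughly a factor of sqrt(12); real data with seasonal patterns will differ.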


Figure panels: 8000 points; 2000 points; 500 points.
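The panels show how much structure survives as the sample shrinks from 8000 to 500 points. Simple random sampling without replacement is one way to draw such samples; a minimal sketch with the standard library's `random.sample`, using synthetic 2-D points because the original data set is not available. `simple_random_sample` is a hypothetical helper name.

```python
import random


def simple_random_sample(points, size, seed=None):
    """Draw `size` distinct items from `points` (sampling without replacement)."""
    return random.Random(seed).sample(points, size)


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(8000)]  # synthetic points
for size in (2000, 500):
    sample = simple_random_sample(data, size, seed=size)
    print(size, len(sample))
```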


  • Randomly generate 500 points
  • Compute the difference between the max and min distance between any pair of points
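The two steps above can be run directly. This sketch assumes points drawn uniformly from the unit hypercube and reports log10((max - min) / min) of the pairwise Euclidean distances, a normalization chosen here so that different dimensions can be compared; the function name and the list of dimensions are illustrative. It needs Python 3.8+ for `math.dist`.

```python
import math
import random


def relative_distance_contrast(n_points, dim, seed=0):
    """log10((max_dist - min_dist) / min_dist) over all pairs of random points."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    lo, hi = min(dists), max(dists)
    return math.log10((hi - lo) / lo)


for dim in (2, 10, 50):
    print(dim, round(relative_distance_contrast(500, dim), 2))
```

The value drops as the dimension grows: in high dimensions the nearest and farthest pairs are almost equally far apart, which is what makes distance-based methods less meaningful there. With 500 points each call computes about 125,000 distances.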


Figure: data points in the (x1, x2) plane with a direction vector e.



Figure: Short, Medium and Tall as crisp sets and as fuzzy sets (x-axis: height; y-axis: membership, up to 1). Panels: Crisp Sets; Fuzzy Sets.
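The figure contrasts crisp and fuzzy versions of Short, Medium and Tall. A minimal sketch for Tall only; the 180 cm threshold and the 170 to 190 cm ramp are invented for illustration, and Short and Medium would use analogous (decreasing or triangular) membership functions.

```python
def crisp_tall(height_cm, threshold=180.0):
    """Crisp set: membership is either 0 or 1 (threshold is hypothetical)."""
    return 1.0 if height_cm >= threshold else 0.0


def fuzzy_tall(height_cm, start=170.0, full=190.0):
    """Fuzzy set: membership rises linearly from 0 at `start` to 1 at `full`."""
    if height_cm <= start:
        return 0.0
    if height_cm >= full:
        return 1.0
    return (height_cm - start) / (full - start)


for h in (165, 179, 181, 195):
    print(h, crisp_tall(h), round(fuzzy_tall(h), 2))
```

Moving from 179 cm to 181 cm flips the crisp membership from 0 to 1, while the fuzzy membership only changes from 0.45 to 0.55.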


Figure: loan decision over salary and loan amount, shown as a 0-1 decision and as a fuzzy decision (regions: Accept, Reject).


Precision = |Relevant ∩ Retrieved| / |Retrieved|
Recall = |Relevant ∩ Retrieved| / |Relevant|

Figure panels: IR; Classification.
IR cells: Relevant Retrieved; Not Relevant Retrieved; Not Relevant Not Retrieved; Relevant Not Retrieved.
Classification cells: Tall Classified Tall; Not Tall Classified Tall; Not Tall Classified Not Tall; Tall Classified Not Tall.
Counts shown in the figure: 20, 10, 45, 25.
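Reading the figure's cells as a 2x2 table (treating Tall as relevant and Classified Tall as retrieved), precision and recall follow from three of the four counts. A minimal sketch; the counts in the example call are hypothetical, not the ones in the figure, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(rel_ret, notrel_ret, rel_notret):
    """Precision = |Rel ∩ Ret| / |Ret|, recall = |Rel ∩ Ret| / |Rel|."""
    retrieved = rel_ret + notrel_ret
    relevant = rel_ret + rel_notret
    # Convention: report 0.0 when nothing was retrieved or nothing is relevant.
    precision = rel_ret / retrieved if retrieved else 0.0
    recall = rel_ret / relevant if relevant else 0.0
    return precision, recall


p, r = precision_recall(rel_ret=8, notrel_ret=2, rel_notret=4)
print(p, round(r, 3))  # 0.8 0.667
```

The fourth cell (not relevant, not retrieved) does not enter either measure.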



Bayes theorem:
– Posterior probability: P(h1|xi)
– Prior probability: P(h1)
– P(h1|xi) = P(xi|h1) P(h1) / P(xi)

Hypotheses (classes): h1, h2, h3, h4.
Prior probabilities estimated from the training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

Data values x1 to x12, one for each combination of income (1 to 4) and credit rating:

           Income 1  Income 2  Income 3  Income 4
Excellent  x1        x2        x3        x4
Good       x5        x6        x7        x8
Bad        x9        x10       x11       x12

ID  Income  Credit     Class  xi
1   4       Excellent  h1     x4
2   3       Good       h1     x7
3   2       Excellent  h1     x2
4   3       Good       h1     x7
5   4       Good       h1     x8
6   2       Excellent  h1     x2
7   3       Bad        h2     x11
8   2       Bad        h2     x10
9   3       Bad        h3     x11
10  1       Bad        h4     x9


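Likelihoods from the training data for class h1: P(x7|h1) = 2/6; P(x4|h1) = 1/6; P(x2|h1) = 2/6; P(x8|h1) = 1/6; P(xi|h1) = 0 for all other xi.
To classify x4, compute P(hj|x4) for every hj and place x4 in the class with the largest value:
– P(h1|x4) = P(x4|h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1
– so x4 is placed in class h1.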
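The example can be reproduced directly from the ten training tuples. A minimal sketch that estimates P(hj), P(xi|hj) and P(xi) by counting and applies Bayes theorem with exact fractions; the helper names are hypothetical, and `evidence` (hence `posterior`) fails with a division by zero for a data value that never occurs in the training data.

```python
from collections import Counter
from fractions import Fraction

# (data value, class) for IDs 1 to 10 of the training table.
training = [
    ("x4", "h1"), ("x7", "h1"), ("x2", "h1"), ("x7", "h1"), ("x8", "h1"),
    ("x2", "h1"), ("x11", "h2"), ("x10", "h2"), ("x11", "h3"), ("x9", "h4"),
]

n = len(training)
class_counts = Counter(h for _, h in training)
joint_counts = Counter(training)
x_counts = Counter(x for x, _ in training)


def prior(h):
    return Fraction(class_counts[h], n)


def likelihood(x, h):
    return Fraction(joint_counts[(x, h)], class_counts[h])


def evidence(x):
    return Fraction(x_counts[x], n)


def posterior(h, x):
    # Bayes theorem: P(h|x) = P(x|h) P(h) / P(x)
    return likelihood(x, h) * prior(h) / evidence(x)


def predict(x):
    # Place x in the class with the largest posterior probability.
    return max(class_counts, key=lambda h: posterior(h, x))


print(prior("h1"), likelihood("x4", "h1"), posterior("h1", "x4"), predict("x4"))
# 3/5 1/6 1 h1
```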

Hypothesis testing:
– H0: null hypothesis, the hypothesis to be tested
– H1: alternative hypothesis

Chi-squared example (observed values O, expected value E):
– O = {50, 93, 67, 78, 87}
– E = 75
– χ² = 15.55
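The statistic is χ² = Σ (O − E)² / E, summed over the observed values. A minimal sketch reproducing the value in the example; `chi_squared` is a hypothetical helper name, and deciding significance would additionally require comparing the statistic with a chi-squared distribution, which the sketch does not do.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E."""
    if isinstance(expected, (int, float)):
        # A single expected value applies to every observation.
        expected = [expected] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


O = [50, 93, 67, 78, 87]
print(round(chi_squared(O, 75), 2))  # 15.55
```

The squared deviations are 625, 324, 64, 9 and 144, which sum to 1166; 1166 / 75 ≈ 15.55.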

Linear regression model with coefficients c0, …, cn:
$y = c_0 + c_1 x_1 + \dots + c_n x_n$
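For a single input variable the linear regression model y = c0 + c1x1 + … + cnxn reduces to y = c0 + c1x. A minimal sketch of the ordinary least-squares estimates of c0 and c1; least squares is assumed here as the criterion for the "best" coefficients, and `fit_simple_linear` is a hypothetical name.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates of c0 and c1 in y = c0 + c1 * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


print(fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```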

Correlation coefficient r:
– r = 1: perfect correlation
– r = −1: perfect but opposite correlation
– r = 0: no (linear) correlation
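Pearson's correlation coefficient, listed among the operations for interval attributes, can be computed as below. A minimal sketch that reproduces the three reference values r = 1, r = −1 and r = 0 with made-up data; `pearson_r` is a hypothetical name, and the function assumes neither input is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0
print(pearson_r(xs, [1, -1, 0, -1, 1]))  # 0.0
```

The third case has r = 0 even though y clearly depends on x (it is symmetric around the middle), a reminder that r measures only linear association.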
