Modeling Data: the different views on Data Mining



slide-1
SLIDE 1

Modeling Data

the different views on Data Mining

slide-2
SLIDE 2

Views on Data Mining

• Fitting the data
• Density Estimation
• Learning
   • being able to perform a task more accurately than before
• Prediction
   • use the data to predict future data
• Compressing the data
   • capture the essence of the data
   • discard the noise and details

slide-4
SLIDE 4

Data fitting

• Very old concept
• Capture function between variables
• Often
   • few variables
   • simple models
• Functions
   • step-functions
   • linear
   • quadratic
• Trade-off between complexity of model and fit (generalization)
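The trade-off between model complexity and fit can be sketched with step functions of increasing resolution. This is only an illustration under stated assumptions: the data is synthetic (a linear trend plus Gaussian noise), and the names `fit_step`, `predict_step`, `mse` and `sample` are made up for this example.

```python
import random

def fit_step(xs, ys, bins):
    """Fit a step function on [0, 1): one mean response per equal-width bin."""
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback level for bins that received no data
    return [sums[b] / counts[b] if counts[b] else overall for b in range(bins)]

def predict_step(levels, x):
    """Return the level of the bin that x falls into."""
    b = min(int(x * len(levels)), len(levels) - 1)
    return levels[b]

def mse(levels, xs, ys):
    """Mean squared error of the step function on the given data."""
    return sum((predict_step(levels, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(rng, n):
    """Synthetic data (an assumption): y = 2x plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = sample(rng, 40)    # small sample used for fitting
test_x, test_y = sample(rng, 400)     # fresh data to check generalization

results = []
for bins in (1, 2, 4, 8, 16, 32):     # more steps = more complex model
    levels = fit_step(train_x, train_y, bins)
    results.append((bins, mse(levels, train_x, train_y), mse(levels, test_x, test_y)))

for bins, train_err, test_err in results:
    print(f"{bins:2d} steps  train MSE {train_err:.3f}  test MSE {test_err:.3f}")
```

The training error can only go down as steps are added, because each finer partition refines the previous one. The error on fresh data need not follow, which is the generalization issue the slide points at.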

slide-5
SLIDE 5

[Figure: response to new drug plotted against body weight]

slide-7
SLIDE 7

[Figure: money spent plotted against income]

slide-8
SLIDE 8

[Figure: money spent plotted against income; annotation: ¾ ratio]

slide-9
SLIDE 9

Kleiber’s Law of Metabolic Rate
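Kleiber's law says that an animal's metabolic rate scales roughly with the 3/4 power of its body mass. A power law like this becomes a straight line on log-log axes, so it can be fitted with ordinary linear regression. The sketch below assumes synthetic data: the masses, the prefactor 3.0 and the noise level are invented, and `fit_power_law` is a hypothetical helper.

```python
import math
import random

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (exponent, prefactor)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx                 # exponent of the power law
    intercept = my - slope * mx       # log of the prefactor
    return slope, math.exp(intercept)

rng = random.Random(1)
# Invented body masses spread over six orders of magnitude.
masses = [10 ** rng.uniform(-2, 4) for _ in range(50)]
# Invented metabolic rates: 3.0 * mass^0.75 with multiplicative noise.
rates = [3.0 * m ** 0.75 * math.exp(rng.gauss(0.0, 0.1)) for m in masses]

exponent, prefactor = fit_power_law(masses, rates)
print(f"fitted exponent {exponent:.3f}, prefactor {prefactor:.2f}")
```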

slide-11
SLIDE 11

Density Estimation

• Dataset describes a sample from a distribution
• Describe distribution in simple terms

[Figure label: prototypes]
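One common way to summarise a sample by a few prototypes is k-means. The slide does not name a method, so this is an assumed choice, shown on invented two-cluster data. The farthest-first initialisation is also just a simple option picked for this sketch.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) returning k prototypes."""
    # Farthest-first initialisation: start from the first point, then keep
    # adding the point farthest from the prototypes chosen so far.
    prototypes = [points[0]]
    while len(prototypes) < k:
        prototypes.append(max(points, key=lambda p: min(dist2(p, c) for c in prototypes)))
    for _ in range(iterations):
        # Assign every point to its nearest prototype.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, prototypes[i]))
            clusters[nearest].append(p)
        # Move each prototype to the mean of its points (keep it if it has none).
        prototypes = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else prototypes[i]
            for i, c in enumerate(clusters)
        ]
    return prototypes

rng = random.Random(2)
centers = [(0.0, 0.0), (5.0, 5.0)]    # invented cluster centres
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(100)]

prototypes = sorted(kmeans(points, k=2))
print(prototypes)
```

Two prototypes now stand in for 200 points, which is the "simple terms" description of the distribution.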

slide-12
SLIDE 12

Density Estimation

• Other methods also take into account the spatial relationships between prototypes
• Self-Organizing Map (SOM)
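A minimal sketch of the SOM idea, assuming one-dimensional data and a chain of units: the unit closest to a sample wins, and its neighbours on the chain are pulled along with it. The decay schedules and the name `train_som` are illustrative choices, not taken from the slides.

```python
import math
import random

def train_som(data, units=10, steps=2000, seed=3):
    """Train a 1-D self-organizing map on scalar data; returns the unit weights."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(units)]
    for t in range(steps):
        frac = t / steps
        rate = 0.5 * (1.0 - frac)                        # learning rate decays to 0
        radius = max(0.5, (units / 2) * (1.0 - frac))    # neighbourhood shrinks
        x = rng.choice(data)
        # The winner is the unit whose weight is closest to the sample.
        winner = min(range(units), key=lambda i: abs(weights[i] - x))
        for i in range(units):
            # Units near the winner on the chain move more than distant ones.
            influence = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
            weights[i] += rate * influence * (x - weights[i])
    return weights

rng = random.Random(3)
data = [rng.random() for _ in range(500)]    # invented sample from [0, 1)
weights = train_som(data)
print([round(w, 2) for w in weights])
```

Unlike plain prototypes, neighbouring units are trained together, so positions on the chain tend to reflect positions in the data.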

slide-14
SLIDE 14

Learning

• Perform a task more accurately than before
• Learn to perform a task (at all)
• Suggests an interaction between model and domain
   • perform some action in domain
   • observe performance
   • update model to reflect desirability of action
• Often includes some form of experimentation
• Not so common in Data Mining
   • often static data (warehouse), observational data
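The act, observe, update loop can be sketched with a small epsilon-greedy experiment. The setting is an assumption made for illustration: three possible actions with invented success rates, and a model that keeps a running average of the reward per action. `run_bandit` is a hypothetical name.

```python
import random

def run_bandit(success_rates, steps=5000, epsilon=0.1, seed=4):
    """Epsilon-greedy learning loop; returns (reward estimates, pulls per action)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(success_rates)
    pulls = [0] * len(success_rates)
    for _ in range(steps):
        # Perform some action in the domain; experiment with probability epsilon.
        if rng.random() < epsilon:
            action = rng.randrange(len(success_rates))
        else:
            action = max(range(len(estimates)), key=lambda a: estimates[a])
        # Observe performance: reward 1 on success, 0 otherwise.
        reward = 1.0 if rng.random() < success_rates[action] else 0.0
        # Update the model to reflect the desirability of the action.
        pulls[action] += 1
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls

# Invented success rates; the learner does not see them directly.
estimates, pulls = run_bandit([0.2, 0.5, 0.8])
print("estimates:", [round(e, 2) for e in estimates])
print("pulls:", pulls)
```

The experimentation step is what static warehouse data lacks: there the actions were already chosen by someone else.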

slide-16
SLIDE 16

Prediction: learning a decision boundary

[Figure: scatter plot of data points marked +]
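A decision boundary can be learned in many ways. The sketch below uses a perceptron, which is an assumed choice since the slides do not name an algorithm. The two classes are invented and deliberately placed in separate boxes so that a straight-line boundary exists.

```python
import random

def train_perceptron(points, labels, epochs=100):
    """Perceptron learning rule for 2-D points with labels +1 / -1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # A point on the wrong side (or on the boundary) triggers an update.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:      # boundary separates all training points
            break
    return w, b

def predict(w, b, point):
    """Classify a point by the side of the boundary it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

rng = random.Random(6)
# Invented data: class +1 in the box [1, 3] x [1, 3], class -1 in [-3, -1] x [-3, -1].
positives = [(rng.uniform(1, 3), rng.uniform(1, 3)) for _ in range(30)]
negatives = [(rng.uniform(-3, -1), rng.uniform(-3, -1)) for _ in range(30)]
points = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_perceptron(points, labels)
print("boundary: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
```

Once learned, the boundary is used for prediction: `predict(w, b, (2.0, 2.5))` classifies a point that was not in the training data.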

slide-24
SLIDE 24

Compression

• Compression is possible when data contains structure (repeating patterns)
• Compression algorithms will discover structure and replace that by short code
• Code table forms interesting set of patterns

[Table: binary dataset with columns A to F; a 1 marks an item present in a row]

• Pattern ACD appears frequently
• ACD helps to compress the data
• ACD is a relevant pattern to report
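The link between frequent patterns and compression can be made concrete with a toy code table. Everything in the sketch is an assumption made for illustration: the transactions are invented (the slide's table cannot be read from this transcript), and `encoded_size` is a deliberately crude size measure that counts symbols, including the cost of storing the pattern in the code table.

```python
from itertools import combinations

# Invented toy transactions over the items A to F.
transactions = [
    {"A", "C", "D"},
    {"A", "C", "D", "F"},
    {"A", "B", "C", "D"},
    {"B", "E"},
    {"A", "C", "D", "E"},
    {"B", "F"},
]

def support(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if set(pattern) <= t)

def encoded_size(pattern):
    """Symbols needed when each occurrence of the pattern becomes one code."""
    total = len(pattern) + 1          # code table entry: the items plus the code
    for t in transactions:
        if set(pattern) <= t:
            total += len(t) - len(pattern) + 1   # pattern replaced by one code
        else:
            total += len(t)                      # transaction stored as is
    return total

items = sorted(set().union(*transactions))
baseline = sum(len(t) for t in transactions)     # size without any code table
candidates = [p for n in (2, 3) for p in combinations(items, n)]
best = min(candidates, key=encoded_size)

print("baseline size:", baseline)
print("best pattern:", "".join(best), "support", support(best), "size", encoded_size(best))
```

On this toy data the pattern that compresses best is also the one worth reporting, which mirrors the argument on the slide.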

slide-25
SLIDE 25

Compression

Paul Vitanyi (CWI, Amsterdam)

• Software to unzip identity of unknown composers
   • Beethoven, Miles Davis, Jimi Hendrix
• SARS virus similarity
• internet worms, viruses
• intruder attack traffic
• images, video, …
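The slide does not say which measure the software uses. One compression-based similarity measure from Vitányi's group is the normalized compression distance (NCD), sketched here with `zlib` as the compressor. The choice of `zlib` and the sample strings are assumptions made for this example.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps to compress y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Invented inputs: two near-identical texts and one block of random bytes.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"the quick brown fox leaps over the lazy dog " * 20
rng = random.Random(5)
noise = bytes(rng.randrange(256) for _ in range(len(text_a)))

print("similar texts:", round(ncd(text_a, text_b), 2))
print("text vs noise:", round(ncd(text_a, noise), 2))
```

Objects that share structure compress well together, so their distance is small. The same idea applies to any data that can be written as bytes, which is why the list above ranges from music to viruses.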

slide-26
SLIDE 26

Mobile calls: modeling duration of calls

slide-27
SLIDE 27

More data: linear model

slide-28
SLIDE 28

Even more data: still linear?

slide-29
SLIDE 29

Hmmm