Modeling Data: the different views on Data Mining


  1. Modeling Data: the different views on Data Mining

  2. Views on Data Mining
     - Fitting the data
     - Density Estimation
     - Learning
       - being able to perform a task more accurately than before
     - Prediction
       - use the data to predict future data
     - Compressing the data
       - capture the essence of the data
       - discard the noise and details

  4. Data fitting
     - Very old concept
     - Capture function between variables
     - Often
       - few variables
       - simple models
     - Functions
       - step functions
       - linear
       - quadratic
     - Trade-off between complexity of model and fit (generalization)
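
The trade-off between model complexity and fit can be illustrated with a small sketch. The data here are invented for illustration (a straight line plus alternating noise), and `fit_line`, `fit_steps`, and `mse` are hypothetical helper names, not part of the original slides. The step function uses one step per training point, so it fits the training data perfectly but generalizes worse than the simpler linear model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_steps(xs, ys):
    """Step function with one step per training point (memorizes the data)."""
    pairs = sorted(zip(xs, ys))

    def predict(x):
        value = pairs[0][1]
        for px, py in pairs:
            if px <= x:
                value = py  # keep the y of the last training x at or below x
        return value

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed toy data: y = 2x + 1, with alternating noise on the training set only.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
test_x = [i + 0.5 for i in range(9)]
test_y = [2 * x + 1 for x in test_x]

line = fit_line(train_x, train_y)
steps = fit_steps(train_x, train_y)
for name, model in (("linear", line), ("step function", steps)):
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The step function reaches zero training error, yet the linear model has the lower error on the held-out points.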

  5. [Figure, slides 5-6: response to new drug plotted against body weight]

  7. [Figure: money spent plotted against income]

  8. [Figure: money spent plotted against income, annotated with a ¾ ratio]

  9. Kleiber’s Law of Metabolic Rate
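
Kleiber's law states that metabolic rate scales roughly with body mass to the power ¾. A power law becomes a straight line on log-log axes, so it can be fitted with the same linear machinery as above. The sketch below uses synthetic data generated from an exact ¾ power law; the masses, the constant 70, and the helper name `fit_power_law` are assumptions for illustration, not measurements.

```python
import math


def fit_power_law(masses, rates):
    """Fit rate = a * mass ** b by least squares on log-transformed data."""
    log_m = [math.log(m) for m in masses]
    log_r = [math.log(r) for r in rates]
    n = len(log_m)
    mean_m = sum(log_m) / n
    mean_r = sum(log_r) / n
    b = (sum((x - mean_m) * (y - mean_r) for x, y in zip(log_m, log_r))
         / sum((x - mean_m) ** 2 for x in log_m))
    a = math.exp(mean_r - b * mean_m)
    return a, b


# Synthetic, noise-free data; the constant 70 is an illustrative choice.
masses = [0.02, 0.3, 4.0, 70.0, 600.0, 4000.0]
rates = [70.0 * m ** 0.75 for m in masses]

a, b = fit_power_law(masses, rates)
print("fitted constant:", round(a, 3), "fitted exponent:", round(b, 3))
```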

  11. Density Estimation
      - Dataset describes a sample from a distribution
      - Describe the distribution in simple terms: prototypes
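
The slide does not name a method for finding prototypes. As one possible sketch, a k-means-style procedure (an assumption here, not necessarily what the author had in mind) alternates between assigning each point to its nearest prototype and moving each prototype to the mean of its points. The data and the name `fit_prototypes` are hypothetical.

```python
def fit_prototypes(data, prototypes, iterations=10):
    """Summarize 1-D data by a few prototypes using k-means-style updates."""
    prototypes = list(prototypes)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest prototype.
        clusters = [[] for _ in prototypes]
        for x in data:
            nearest = min(range(len(prototypes)),
                          key=lambda i: abs(x - prototypes[i]))
            clusters[nearest].append(x)
        # Update step: move each prototype to the mean of its group
        # (a prototype with no points stays where it is).
        prototypes = [sum(c) / len(c) if c else p
                      for c, p in zip(clusters, prototypes)]
    return prototypes


# Assumed toy sample with two visible groups.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
prototypes = fit_prototypes(data, prototypes=[0.0, 6.0])
print([round(p, 2) for p in prototypes])
```

The six points are then described by two prototypes, one per group.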

  12. Density Estimation
      - Other methods also take into account the spatial relationships between prototypes
        - Self-Organizing Map (SOM)
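
A minimal sketch of a one-dimensional SOM, assuming prototypes arranged on a chain: each sample pulls its best-matching prototype towards it, and also pulls that prototype's chain neighbours, with a weaker pull the further away they are on the chain. The function name `train_som`, the parameter values, and the data are illustrative assumptions.

```python
import math
import random


def train_som(data, n_units=5, epochs=50, lr=0.5, radius=1.0, seed=0):
    """Train a 1-D self-organizing map on scalar data; returns the prototypes."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Start with prototypes at random positions inside the data range.
    units = [rng.uniform(lo, hi) for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate shrinks over time
        for x in rng.sample(data, len(data)):
            # Best-matching unit: the prototype closest to the sample.
            best = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # Neighbourhood weight decays with distance on the chain,
                # not with distance in data space.
                h = math.exp(-((i - best) ** 2) / (2 * radius ** 2))
                units[i] += rate * h * (x - units[i])
    return units


# Assumed toy sample: evenly spaced values between 0 and 1.
data = [i / 20 for i in range(21)]
units = train_som(data)
print([round(u, 2) for u in units])
```

The neighbourhood term is what distinguishes this from plain prototype fitting: prototypes that are adjacent on the chain tend to end up describing nearby regions of the data.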

  14. Learning
      - Perform a task more accurately than before
      - Learn to perform a task (at all)
      - Suggests an interaction between model and domain
        - perform some action in domain
        - observe performance
        - update model to reflect desirability of action
      - Often includes some form of experimentation
      - Not so common in Data Mining
        - often static data (warehouse), observational data
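
The act, observe, update loop can be sketched as follows. This is a simple explore-then-exploit scheme chosen for illustration; the slides do not prescribe an algorithm. The domain `toy_domain`, its reward values, and the action names are hypothetical.

```python
def run_learning(domain, actions, rounds=20, trials_per_action=2):
    """Interact with a domain and learn how desirable each action is."""
    estimates = {a: 0.0 for a in actions}  # the model: estimated reward per action
    counts = {a: 0 for a in actions}
    for t in range(rounds):
        if t < trials_per_action * len(actions):
            # Experimentation: try every action a few times.
            action = actions[t % len(actions)]
        else:
            # Exploitation: pick the action the model currently rates best.
            action = max(actions, key=lambda a: estimates[a])
        reward = domain(action)  # perform the action, observe performance
        counts[action] += 1
        # Update the model: running mean of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


def toy_domain(action):
    """Hypothetical domain with fixed rewards, for illustration only."""
    return {"a": 0.2, "b": 0.8}[action]


estimates, counts = run_learning(toy_domain, ["a", "b"])
print(estimates, counts)
```

With static warehouse data there is no `domain` to call, which is why this kind of loop is less common in Data Mining.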

  16. Prediction: learning a decision boundary
      [Figure, repeated on slides 16-20: a row of labeled examples]
      - - - - - - - - - - - - + - + - + + + + + + +
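
One simple way to learn a boundary for the row of examples on these slides is to try every split position and keep the one with the fewest misclassified examples. This is a sketch of a threshold (decision stump) search; treating the examples as evenly spaced positions 0, 1, 2, … is an assumption, since the slides do not give coordinates.

```python
# The labeled examples from the slide, read left to right.
labels = "- - - - - - - - - - - - + - + - + + + + + + +".split()


def errors(threshold):
    """Misclassifications when every position >= threshold is predicted '+'."""
    return sum((i >= threshold) != (label == "+")
               for i, label in enumerate(labels))


# Try every possible boundary and keep the first one with the fewest errors.
best = min(range(len(labels) + 1), key=errors)
print("boundary before position", best, "with", errors(best), "errors")
```

Because the two classes overlap in the middle of the row, no single boundary separates them perfectly; the best boundary still leaves some errors.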

  22. Compression (slides 22-24)
      - Compression is possible when data contains structure (repeating patterns)
      - Compression algorithms will discover structure and replace it with a short code
      - The code table forms an interesting set of patterns

      A B C D E F
      1 0 1 1 0 0
      1 1 1 1 1 0
      0 1 0 1 1 0
      1 1 1 1 0 1
      … … … … … …

      - Pattern ACD appears frequently
      - ACD helps to compress the data
      - ACD is a relevant pattern to report
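
The claim that ACD helps to compress the data can be checked on the four visible rows of the table. The cost model below is a deliberately crude assumption (every item and every code counts as one symbol, and the code table costs one symbol per item in the pattern); real compression-based pattern miners use more careful encodings.

```python
items = "ABCDEF"
# The four visible rows of the table on the slide.
rows = ["101100", "111110", "010110", "111101"]
transactions = [{item for item, bit in zip(items, row) if bit == "1"}
                for row in rows]
pattern = set("ACD")

# How many rows contain the whole pattern?
support = sum(pattern <= t for t in transactions)

# Size without a code table: one symbol per item present in each row.
plain_size = sum(len(t) for t in transactions)

# Size with ACD replaced by a single code wherever it occurs.
encoded_size = sum(len(t - pattern) + 1 if pattern <= t else len(t)
                   for t in transactions)
table_cost = len(pattern)  # assumed cost of storing the pattern itself

print("support:", support)
print("plain:", plain_size, "encoded:", encoded_size + table_cost)
```

Even after paying for the code table, the encoded data is smaller than the plain data on these rows.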

  25. Compression: Paul Vitanyi (CWI, Amsterdam)
      - Software to unzip identity of unknown composers
        - Beethoven, Miles Davis, Jimi Hendrix
      - SARS virus similarity
      - internet worms, viruses
      - intruder attack traffic
      - images, video, …
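
The slide does not say how the software works. One published technique from this line of work is the normalized compression distance (NCD), which treats two objects as similar when compressing them together costs little more than compressing one of them alone. The sketch below assumes that technique, uses `zlib` as the compressor, and uses made-up byte strings as stand-ins for real music files.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, used as a stand-in for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy inputs, not real recordings.
melody = b"C D E C " * 100
variation = b"C D E C " * 90 + b"E F G " * 10
rng = random.Random(0)
unrelated = bytes(rng.randrange(256) for _ in range(800))

print("melody vs variation:", round(ncd(melody, variation), 2))
print("melody vs unrelated:", round(ncd(melody, unrelated), 2))
```

Grouping objects by such distances requires no domain knowledge, which would explain why the same idea can be applied to music, genomes, and network traffic alike.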

  26. Mobile calls: modeling duration of calls

  27. More data: linear model

  28. Even more data: still linear?

  29. Hmmm
