
Information Retrieval & Data Mining



  1. Nine Credit-Point Core Lecture on Information Retrieval & Data Mining
     Tuesday 14–16 & Thursday 16–18 @ HS003 (E1.3)
     Martin Theobald, Pauli Miettinen
     Data Mining is…
     • About finding new and interesting information from data
       – Association rules
       – Clusterings
       – Latent models
       – Classifiers
     IR&DM, WS'11/12, 18 October 2011

  2. Data Mining: motivation
     What to do with the information you’ve retrieved?
     The "PHT" Pirate wanted all information of the world. But before he realized most of it was useless, he was already buried under it.
     (Stanisław Lem, The Cyberiad)

  3. Data Mining: definition
     • Data mining is the process of extracting hidden patterns from data.
       (Wikipedia)
     • Data mining, in a broad sense, is the set of techniques for analyzing and understanding data.
       (Zaki & Meira: Fundamentals of Data Mining Algorithms)
     • Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
       (Hand, Mannila & Smyth: Principles of Data Mining)

  4. Data Mining: definition
     Data mining, in a broad sense, is the set of techniques for analyzing and understanding data.
     (Zaki & Meira: Fundamentals of Data Mining Algorithms)

  16. Data Mining Applications
      • Business intelligence
        – What do customers buy together?
        – What are the seasonal trends?
        – How to make more money?
      • Scientific data analysis
        – What genes cause diseases?
        – What species co-inhabit areas?
        – What happens if the average temperature rises?
      • And anything else where you have data…
        – Whom should Barack Obama persuade to vote for him?
        – Is there a problem on the International Space Station?

  18. What Do You Need to Do Data Mining?
      • Data
      • Domain knowledge
      • Data mining techniques (the topic of this course)

  19. The Techniques
      • Frequent itemset mining & association rules
      • Clustering
      • Dimensionality reduction
      • Matrix factorization & latent factor models
      • Classifiers

  20. Frequent itemset mining demo
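The demo itself is not part of the transcript. As a rough stand-in, the sketch below shows one common way to mine frequent itemsets: a level-wise, Apriori-style search. The function name `frequent_itemsets`, the `min_support` parameter, and the shopping baskets are all made up for illustration and are not taken from the lecture.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in
    at least `min_support` transactions (level-wise, Apriori-style)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    result = {
        frozenset([item]): count
        for item, count in item_counts.items()
        if count >= min_support
    }
    current = set(result)

    k = 2
    while current:
        # Candidates: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        current = set(level)
        k += 1
    return result


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = frequent_itemsets(baskets, min_support=3)
```

With these baskets and `min_support=3`, the only frequent pair is bread together with milk, which is the kind of answer the "What do customers buy together?" question asks for.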

  21. Clustering for Medical Data
      • Temperament data
        – Individuals are assigned values on different scales
          • Fear of uncertainty, shyness, impulsiveness, etc.
        – Data is clustered (people with similar value combinations go to the same cluster)
        – Results:
          • 4 clusters are enough
          • Strong association between temperament and socio-economic status and education
          • Males and females cluster similarly, even if clustered independently
      Wessman: Clustering methods in the Analysis of Complex Diseases, manuscript

  22. Clustering for Medical Data
      Stable, persistent, not very impulsive
      High socio-economic status and education

  23. Clustering for Medical Data
      Outgoing, impulsive, energetic
      High socio-economic status and education

  24. Clustering for Medical Data
      No extreme scales
      High hypomania and psychosis proneness

  25. Clustering for Medical Data
      Shy, pessimistic, prefer routines and privacy
      Low socio-economic status, high levels of depression and schizophrenia
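The slides do not say which clustering method was used in the study. As an illustration of the general idea (people with similar value combinations end up in the same cluster), here is a minimal k-means sketch (Lloyd's algorithm) with a deterministic farthest-first initialization. The helper names, the two score dimensions, and all data points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical (shyness, impulsiveness) scores for twelve individuals.
scores = [
    (1, 1), (1, 2), (2, 1),
    (1, 9), (2, 9), (1, 8),
    (9, 1), (9, 2), (8, 1),
    (9, 9), (8, 9), (9, 8),
]
centroids, labels = kmeans(scores, k=4)
```

On this toy data the four groups of three are recovered exactly. Real temperament data would have more scales and far less clean separation, and choosing the number of clusters (4 in the study) is a separate question that this sketch does not address.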

  26. Ecological Niche Modeling
      • Goal: Describe the area species inhabit using bio-ecological variables
        – Temperature, rainfall, etc.
      • Application: Forecast what happens to species if the bio-ecological environment changes
        – Consequences of global warming
      • Data Mining Problem: classification
        – Classify the areas inhabited by species using the bio-ecological variables

  27. Ecological Niche Modeling
      European Elk
      • Either
        – February’s max temperature is between -9.8°C and 0.4°C
        – July’s max temperature is between 12.2°C and 24.6°C
        – August’s average rainfall is between 56.85 mm and 136.46 mm
      • Or
        – September’s average rainfall is between 183.27 mm and 238.78 mm
      Galbrun & Miettinen: From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World. SDM ’11
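The European Elk rule above can be read as a simple predicate over the climate variables of an area. The sketch below assumes that the three conditions under "Either" must all hold together and that the interval bounds are inclusive; neither detail is stated on the slide. The function and parameter names are hypothetical.

```python
def matches_elk_rule(feb_max_temp_c, jul_max_temp_c, aug_avg_rain_mm, sep_avg_rain_mm):
    """Return True if an area's climate satisfies the European Elk rule.

    Assumption: the "Either" branch is a conjunction of its three
    conditions, and all bounds are inclusive.
    """
    either_branch = (
        -9.8 <= feb_max_temp_c <= 0.4
        and 12.2 <= jul_max_temp_c <= 24.6
        and 56.85 <= aug_avg_rain_mm <= 136.46
    )
    or_branch = 183.27 <= sep_avg_rain_mm <= 238.78
    return either_branch or or_branch


# Hypothetical areas (values are illustrative, not from the paper's data).
cold_winter_area = matches_elk_rule(-5.0, 18.0, 80.0, 100.0)   # first branch holds
mild_winter_area = matches_elk_rule(5.0, 18.0, 80.0, 100.0)    # neither branch holds
wet_autumn_area = matches_elk_rule(5.0, 30.0, 10.0, 200.0)     # second branch holds
```

A forecast under changed climate conditions then amounts to re-evaluating the same predicate on the shifted temperature and rainfall values for each area.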

  28. Ecological Niche Modeling
