Data Mining 2020 Introduction
Ad Feelders
Universiteit Utrecht
Ad Feelders ( Universiteit Utrecht ) Data Mining 1 / 54
The Course
Literature: Lecture Notes, Book Chapters, Articles, Slides (the slides appear in the schedule on the course
1 Write your own classification tree and random forest algorithm in R or
2 Text Mining: analyze text documents and make a predictive model
1 Classification trees, bagging and random forests.
2 Undirected graphical models.
3 Frequent pattern mining.
4 Bayesian networks.
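Writing your own classification tree starts with two ingredients: an impurity measure for a node and a search for the best split. The sketch below assumes the Gini index as impurity measure and binary splits on a single numeric attribute; both are common choices but are assumptions here, and the course may prescribe other measures or stopping rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2 (assumed measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Find the threshold on numeric attribute x that maximizes the
    reduction in Gini impurity of the class labels y.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values of x; a case goes left when x <= threshold.
    Returns (threshold, impurity_reduction), or (None, 0.0) if no split helps.
    """
    n = len(y)
    parent = gini(y)
    values = sorted(set(x))
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Weighted average impurity of the two child nodes.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        reduction = parent - child
        if reduction > best[1]:
            best = (t, reduction)
    return best


print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 0.5)
```

A full tree applies `best_split` to every attribute, picks the best one, and recurses on the two children until a stopping rule fires.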
1 All horses are mammals
2 All mammals have lungs
3 Therefore, all horses have lungs

1 All horses observed so far have lungs
2 Therefore, all horses have lungs
1 4% of the products we tested are defective
2 Therefore, 4% of all products (tested or otherwise) are defective
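This generalization is inductive: the 4% observed among tested products is only an estimate of the rate among all products. A small simulation makes the uncertainty visible. It assumes a hypothetical true defect rate of 4% and independent products, and shows how much the observed rate varies from sample to sample for different sample sizes.

```python
import random


def sample_defect_rate(population_rate, n, rng):
    """Draw n products, each defective with probability population_rate
    (hypothetical, independent products), and return the observed fraction."""
    defects = sum(1 for _ in range(n) if rng.random() < population_rate)
    return defects / n


rng = random.Random(2020)  # fixed seed so the run is reproducible
for n in (50, 500, 5000):
    # Five independent samples of size n from the same hypothetical population.
    rates = [sample_defect_rate(0.04, n, rng) for _ in range(5)]
    print(n, [f"{r:.1%}" for r in rates])
```

With small samples the observed rate can be far from 4%; as n grows the observed rates cluster more tightly around the true rate.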
1 data editing: what to do when records contain impossible values?
2 incomplete data: what to do with missing values?
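One simple way to combine the two steps is to treat impossible values as missing and then fill in missing values. The sketch below assumes hypothetical records, hypothetical validity ranges, and mean imputation; mean imputation is only one of many options and tends to understate the variability in the data.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 2800},
    {"age": -5, "income": 3100},    # impossible age
    {"age": None, "income": 2500},  # missing age
    {"age": 51, "income": None},    # missing income
]

# Hypothetical edit rules: a value outside its range is impossible.
VALID_RANGE = {"age": (0, 120), "income": (0, float("inf"))}


def edit(records, valid_range):
    """Data editing: replace impossible values by None (treat them as missing)."""
    cleaned = []
    for r in records:
        new = dict(r)  # leave the original record untouched
        for var, (lo, hi) in valid_range.items():
            v = new.get(var)
            if v is not None and not (lo <= v <= hi):
                new[var] = None
        cleaned.append(new)
    return cleaned


def mean_impute(records, variables):
    """Replace missing values of the given variables by the mean of the
    observed values (raises StatisticsError if a variable is never observed)."""
    means = {v: mean(r[v] for r in records if r[v] is not None) for v in variables}
    return [
        {k: (means[k] if k in means and val is None else val) for k, val in r.items()}
        for r in records
    ]


cleaned = edit(records, VALID_RANGE)
imputed = mean_impute(cleaned, ["age", "income"])
for row in imputed:
    print(row)
```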
Respondent Demographics: Respondent ID (PK), Household ID, Number of Children
Tv Program Viewing: ID (PK), Respondent ID, Telecast ID, Offset Seconds, Date, Duration Seconds
Telecast: ID (PK), Program ID, Date, Duration Seconds
Program: ID (PK), Program Name, Channel Name
Relationships (m:1): Tv Program Viewing to Respondent Demographics, Tv Program Viewing to Telecast, Telecast to Program
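Most data mining algorithms expect a single table with one row per case, so relational data like this must first be joined and aggregated. The sketch below uses Python's built-in `sqlite3` module and assumes the table layout of the Respondent Demographics, Tv Program Viewing, Telecast and Program tables described above; the SQL table and column names and all rows are hypothetical.

```python
import sqlite3

# In-memory database mirroring the four tables; all rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE respondent (id INTEGER PRIMARY KEY, household_id INTEGER, n_children INTEGER);
CREATE TABLE program (id INTEGER PRIMARY KEY, program_name TEXT, channel_name TEXT);
CREATE TABLE telecast (id INTEGER PRIMARY KEY, program_id INTEGER REFERENCES program(id),
                       date TEXT, duration_seconds INTEGER);
CREATE TABLE viewing (id INTEGER PRIMARY KEY,
                      respondent_id INTEGER REFERENCES respondent(id),
                      telecast_id INTEGER REFERENCES telecast(id),
                      offset_seconds INTEGER, date TEXT, duration_seconds INTEGER);
INSERT INTO respondent VALUES (1, 10, 0), (2, 11, 2);
INSERT INTO program VALUES (1, 'Program A', 'Channel A'), (2, 'Program B', 'Channel B');
INSERT INTO telecast VALUES (1, 1, '2020-01-06', 1800), (2, 2, '2020-01-06', 1200);
INSERT INTO viewing VALUES (1, 1, 1, 0, '2020-01-06', 1800),
                           (2, 2, 2, 60, '2020-01-06', 900),
                           (3, 2, 1, 300, '2020-01-06', 600);
""")

# Follow the m:1 links viewing -> respondent, viewing -> telecast -> program,
# and aggregate viewing time per respondent and channel.
rows = con.execute("""
SELECT r.id, r.n_children, p.channel_name, SUM(v.duration_seconds) AS seconds
FROM viewing v
JOIN respondent r ON v.respondent_id = r.id
JOIN telecast t ON v.telecast_id = t.id
JOIN program p ON t.program_id = p.id
GROUP BY r.id, r.n_children, p.channel_name
ORDER BY r.id, p.channel_name
""").fetchall()

for row in rows:
    print(row)
```

Pivoting the result (one column per channel) would then give one row per respondent, ready for a standard learning algorithm.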
Subgroups (condition: interval, size):
No condition: [49.7%, 50.3%], 100,000
age in [19,24]: [54.6%, 56.2%], 14,249
gender = m: [52.8%, 53.7%], 53,179
gender = m, age in [19,24]: [60.2%, 62.3%], 8,130
age in [19,24], carprice in [59000,79995]: [55.9%, 59.6%], 2,831 and [61.2%, 67.4%], 1,134
category = lease: [50.7%, 52.0%], 20,315
category = lease, gender = m: [53.5%, 55.4%], 10,778
category = lease, gender = m, age in [19,24]: [59.4%, 64.1%], 1,651
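The intervals listed for the subgroups are consistent with 95% normal-approximation (Wald) intervals for a proportion; for example, a 50% rate in 100,000 records gives roughly plus or minus 0.3 percentage points. The sketch below assumes that interpretation and uses the interval midpoints (50.0% and 55.4%) as the observed proportions; the construction actually used for the figure may differ.

```python
from math import sqrt


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (assumed method).

    p_hat: observed proportion in the subgroup, n: subgroup size,
    z: standard-normal quantile (1.96 gives roughly 95% coverage).
    """
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


for p_hat, n in [(0.500, 100_000), (0.554, 14_249)]:
    lo, hi = wald_interval(p_hat, n)
    print(f"n={n:,}: [{lo:.1%}, {hi:.1%}]")
# n=100,000: [49.7%, 50.3%]
# n=14,249: [54.6%, 56.2%]
```

The interval width shrinks with the square root of the subgroup size, which is why small subgroups such as the one with 1,134 records get much wider intervals.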
Graphical model over the variables: age, ninsclas, income, race, death, cat1, meanbp1, swang1, ca, gender.
death is independent of the remaining variables given age, cat1, and ca.
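Conditional independence means that once the conditioning variables are known, the other variables carry no further information. The sketch below checks this numerically on a hypothetical three-variable distribution X - Y - Z, where Y plays the role of the conditioning set (as age, cat1 and ca do for death). It illustrates the definition only; it is not the model from the slide.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain X - Y - Z: p(x, y, z) = p(y) p(x | y) p(z | y),
# so X and Z are conditionally independent given Y.
p_y = {0: F(1, 3), 1: F(2, 3)}
p_x_given_y = {0: {0: F(1, 4), 1: F(3, 4)}, 1: {0: F(2, 3), 1: F(1, 3)}}
p_z_given_y = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 5), 1: F(4, 5)}}

joint = {
    (x, y, z): p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}


def p_x_given(x, y, z=None):
    """P(X = x | Y = y), or P(X = x | Y = y, Z = z) if z is given."""
    keep = [k for k in joint if k[1] == y and (z is None or k[2] == z)]
    num = sum(joint[k] for k in keep if k[0] == x)
    return num / sum(joint[k] for k in keep)


# Once Y is known, also knowing Z does not change the distribution of X
# (exact fractions, so the comparison is exact).
for x, y, z in product((0, 1), repeat=3):
    assert p_x_given(x, y, z) == p_x_given(x, y)
```

Note that X and Z are still dependent when Y is not observed; independence here holds only conditionally on Y.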
1 A representation language: what models are we looking for?
2 A quality function: when do we consider a model to be good?
3 A search algorithm: how do we go about finding good models?
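To make the three components concrete, here is a sketch that assumes straight lines y = a + bx as the model class (an illustrative choice). The representation is the pair (a, b), the quality function is the sum of squared errors, and for this particular model class the search has a closed-form least-squares solution, so no iterative search is needed. The data points are hypothetical.

```python
def fit_line(x, y):
    """Search: closed-form least-squares estimates for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sse(a, b, x, y):
    """Quality function: sum of squared errors of the line (lower is better)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data.
x = [4, 6, 8, 10]
y = [10, 20, 25, 35]
a, b = fit_line(x, y)
print(a, b)             # -5.5 4.0
print(sse(a, b, x, y))  # 5.0
```

For richer model classes (for example, choosing which predictors to include) no closed form exists and an explicit search over candidate models is needed.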
1 Start with some initial model, e.g. y = a, and compute its quality.
2 Neighbours: add or remove a predictor.
3 If all neighbours have lower quality, then stop and return the current model.
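The search steps listed above can be sketched as a hill climber over subsets of predictors. The sketch assumes that, when some neighbour is better, the search moves to the best neighbour and repeats, and that the quality function is supplied by the caller (for regression one might use a penalized fit measure; that choice is an assumption here). The `GAIN` scores and `toy_quality` are hypothetical, for illustration only.

```python
def hill_climb(predictors, quality, start=frozenset()):
    """Local search over subsets of predictors.

    Neighbours of the current subset: add one predictor or remove one.
    Moves to the best neighbour while it improves quality; stops when
    no neighbour is strictly better and returns (subset, quality).
    """
    current = frozenset(start)  # e.g. the empty model y = a
    current_q = quality(current)
    while True:
        neighbours = [current | {p} for p in predictors if p not in current]
        neighbours += [current - {p} for p in current]
        if not neighbours:
            return current, current_q
        best = max(neighbours, key=quality)
        best_q = quality(best)
        if best_q <= current_q:  # no neighbour improves: local optimum
            return current, current_q
        current, current_q = best, best_q


# Hypothetical quality: each predictor adds a fixed gain, and every
# predictor in the model costs a penalty of 1.0 (mimicking a complexity penalty).
GAIN = {"x1": 5.0, "x2": 2.5, "x3": 0.4}


def toy_quality(subset):
    return sum(GAIN[p] for p in subset) - 1.0 * len(subset)


subset, q = hill_climb(["x1", "x2", "x3"], toy_quality)
print(sorted(subset), q)  # selects x1 and x2 (quality 5.5)
```

Stopping when no neighbour is strictly better (rather than only when all are lower) prevents the search from cycling between models of equal quality; like any hill climber, it may return a local rather than a global optimum.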