Itera rati tive Dat ata a Min inin ing Jill illes V s - PowerPoint PPT Presentation

Itera rati tive Dat ata a Min inin ing Jill illes V s Vreeken 26 June une 2014 2014 (TA TADA)

Ser ervic ice Ann e Announ uncemen ent #1 Evaluation Forms Hand forms out (me) 1. Fill forms out (you) 2. Collect forms (you) 3. Put forms in envelop (you) 4. Bring envelop back to Evelyn (one ‘volunteer’ and me) 5.

Ser ervic ice Ann e Announ uncemen ent #2 The Exam type: oral when: September 11 th time: individual where: E1.3 room 0.16 what: all material discussed in the lectures, plus one assignment (your choice) per topic The Re-Exam type: oral when: October 1 st time: individual where: E1.3 room 001

Ser ervic ice Ann e Announ uncemen ent #3 Master thesis projects  in principle: yes!  in practice: depending background, motivation, interests, and grades --- plus, on whether I have time  interested? mail me and/or Pauli Student Research Assistant (HiWi) positions  in principle: maybe!  in practice: depends on background, grades, and in particular your motivation and interests  interested? mail me and/or Pauli, include CV and grades

Ser ervic ice Ann e Announ uncemen ent #4 Tensors Introduction - Introduction to tensors - Is DM science? - Tensors in DM - DM in action - Special topics in tensors Information Theory Mixed Grill - MDL + patterns - Influence Propagation - Entropy + correlation - Redescription Mining - MaxEnt + iterative DM - <special request>

Ser ervic ice Ann e Announ uncemen ent #4 Tensors Introduction <special request>? - Introduction to tensors - Is DM science? - Tensors in DM - DM in action - Special topics in tensors Let us know (asap, mail) what topic you would Information Theory Mixed Grill like to see discussed - MDL + patterns - Influence Propagation - Entropy + correlation - Redescription Mining - MaxEnt + iterative DM - <special request>

Ser ervic ice Ann e Announ uncemen ent #5 Introduction Tensors Information Theory Mixed Grill Wrap-up + < ask-us-anything>

Ser ervic ice Ann e Announ uncemen ent #5 <ask-us-anything>? Introduction Yes! Prepare questions on Tensors anything* you’ve always wanted to ask Pauli and/or me. Information Theory We’ll answer on the spot Mixed Grill * preferably related to TADA, data mining, machine learning, science, the world, etc. Wrap-up + < ask-us-anything>

Go Good R d Rea eads ds Data Analysis: a Bayesian Tutorial Elements of Information Theory The Information D.S. Sivia & J. Skilling Thomas Cover & Joy Thomas James Gleick (very good, but skip the MaxEnt stuff) (very good textbook) (great light reading)

Itera rati tive Dat ata a Min inin ing Jill illes V s Vreeken 26 June une 2014 2014 (TA TADA)

Qu Question o of f th the da day How can we find things that are interesting with regard to what we already know ? How can we measure subjective interestingness ?

Wha hat is is int inter eres estin ing? something that increases our knowledge about the data

Wha hat is is a go good r d result esult? something that reduces our uncertainty about the data (ie. increases the likelihood of the data)

Wha hat is is rea eally lly g good? d? something that, in simple terms, strongly reduces our uncertainty about the data (maximise likelihood, but avoid overfitting)

Let et’s m s make e this v is visua isual universe  of possible datasets our dataset D

dimensions, margins Giv Given en wh what we we kno now possible datasets, given current knowledge all possible our dataset D datasets

dimensions, margins, Mo More k e kno nowled wledge. ge... pattern P 1 all possible our dataset D datasets

dimensions, margins, Fewe ewer p possib ssibilit ilities es... patterns P 1 and P 2 all possible our dataset D datasets

dimensions, margins, Less u ess unc ncer ertain inty. the key structure all possible our dataset D datasets

dimensions, margins, Ma Maxim ximis isin ing c cer ertain inty patterns P 1 and P 2 knowledge added by P 2 all possible our dataset D datasets

Ho How c w can n we we def define ine ‘uncertainty’ and ‘simplicity’? interpretability and informativeness are intrinsically subjective

Mea Measu surin ing U g Uncer ertain inty We need access to the likelihood of data D given background knowledge B such that we can calculate the gain for X …which distribution should we use?

Mea Measu surin ing S g Sur urpris ise We need access to the likelihood of result X given background knowledge B such that we can mine the data for X that have a low likelihood, that are surprising …which distribution should we use?

Measu Mea surin ing S g Sur urpris ise We need access to the likelihood of result X given background knowledge B This is called the p-value of result X such that we can mine the data for X that have a low likelihood, that are surprising …which distribution should we use?

Approach 1: Rando ndomiz izatio ion Mine original data 1. Mine random data 2. Determine probability 3. Random Random Random Original ... data #1 data #2 data #N data score ( X | D )

Approach 1: Rando ndomiz izatio ion Mine original data 1. Mine random data 2. Determine probability The fraction of better ‘randoms’ is the 3. empirical p-value of result X Random Random Random Original ... data #1 data #2 data #N data score ( X | D )

Rando ndom Da Data So, we need data that  maintains our background knowledge, and  is otherwise completely random How can we get our hands on that?

Swa wap R Rando ndomiz izatio ion Let there be data 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion Say we only know overall density. How to sample random data? 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 27 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion Didactically, let us instead consider a Monte-Carlo Markov Chain 1 1 1 0 1 1 1 Very simple scheme 0 1 1 0 1 0 1 1 1 1 1 0 0 0 1. select two cells at random, 1 1 1 1 0 0 1 2. swap values, 3. repeat until convergence . 0 1 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 27 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion Margins are easy understandable for binary data, how can we sample data with same margins? 1 1 1 0 1 1 1 6 0 1 1 0 1 0 1 4 1 1 1 1 0 0 0 4 1 1 1 1 0 0 1 5 0 1 1 1 0 0 0 3 0 1 1 1 0 1 0 4 0 0 0 0 1 0 0 1 3 6 6 4 3 2 3 27 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion By MCMC! 1 1 1 0 1 1 1 6 1. randomly find submatrix 0 1 1 0 1 0 1 4 1 1 1 1 0 0 0 4 1 1 1 1 0 0 1 5 0 1 1 1 0 0 0 3 0 1 1 1 0 1 0 4 0 0 0 0 1 0 0 1 3 6 6 4 3 2 3 27 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion By MCMC! 1 1 1 0 1 1 1 6 1. randomly find submatrix 0 1 1 0 1 0 1 4 1 1 1 1 0 0 0 4 1 1 1 1 0 0 1 5 0 1 1 1 0 0 0 3 0 1 1 1 0 1 0 4 0 0 0 0 1 0 0 1 2. swap values 3 6 6 4 3 2 3 27 (swap randomization, Gionis et al. 2005)

Swa wap R Rando ndomiz izatio ion By MCMC! 1 1 1 1 1 1 0 0 1 1 1 1 1 1 6 6 1. randomly find submatrix 1 0 0 1 1 1 1 1 0 0 1 0 0 1 4 4 1 1 1 1 1 1 0 1 0 1 0 0 0 0 4 4 1 1 1 1 1 1 1 1 1 0 0 0 0 1 5 5 0 0 1 1 1 1 1 0 0 0 0 0 1 0 3 3 0 0 1 1 1 1 1 1 0 0 1 1 0 0 4 4 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 2. swap values 3 3 6 6 6 6 4 4 3 3 2 2 3 3 27 27 3. repeat until convergence (swap randomization, Gionis et al. 2005)

Static ic Mo Models dels Many ways to test static null hypothesis assuming distribution, swap-randomization, MaxEnt What can we use this for? ranking based on static significance mining the top-k most significant patterns, but not suited for iterative mining

Dynamic Mo Dy Models dels For iterative data mining, we need models that can maintain the type of information (eg. patterns) that we mine Randomization is powerful  variations exists for many data types (Ojala ‘09, Henelius et al ’13)  can be pushed beyond margins (see Hanhijärvi et al 2009)  but… has key disadvantages

Itera rati tive Dat ata a Min inin ing Jill illes V s - PowerPoint PPT Presentation

Itera rati tive Dat ata a Min inin ing Jill illes V s Vreeken 26 June une 2014 2014 (TA TADA) Ser ervic ice Ann e Announ uncemen ent #1 Evaluation Forms Hand forms out (me) 1. Fill forms out (you) 2. Collect forms (you) 3.

1 min 2 min 3 min www.matsgroup.info 1 min 2 min 3 min www.matsgroup.info 1 min 2 min 3

Overview What is iteraon? Racket has no loops, and yet can express iteraon. Iteration

Overview What is iteraon? Racket has no loops, and yet can express iteraon. Iteration

Class 4 @rwdkent Overview Current Events (10 min) Break (5 min) Explore RWD (25 min) CSS

MDL L for or Pat atte tern Min inin ing Jill illes V s Vreeken 4 4 June une 2014 2014

Construcng Tree Decomposions Using Itera*ve Compression

Spelling, Punctuation and Grammar Suffixes -ing Year One SPaG | Suffixes -ing Suffixes Suffixes

Im Immig igra rati tion on Dat Data a in in Can Canada ada A story of Statistics Canada

Bitly Link & DAT Page Link to Digital Preservation Peer Assessment: http://bit.ly/BPE-DAT

Melb lbourne Min inin ing Clu lub, 30 30 May 20 2019 19 Photo credit: ABC

De Deal aling ing wit ith h mi missing ssing dat ata a in in pr pract actice: ice:

WE WELCOME ME REFEREE T TRAIN ININ ING SPRIN ING 2 2019 Responsibilities EAST BAY FLAG

ing a Robust PMO ing & Sustain inin A Complimentary Webinar From healthsystemCIO.com

ASTRO News Brie iefin ing: Refin inin ing Treatment Decis isions Monday, September 26, 8-9am

Why EVs are key to your biz strategy now Beln Gallego ATA Insights belen.gallego@ata.email

D ATA S CIENCE E COSYSTEM M. T AMER ZSU N ANCY R EID R AYMOND N G U. W ATERLOO U. T ORONTO UBC

NRC Sociology Rankings Andrew J. Perrin November 3, 2010 Andrew J. Perrin () NRC Sociology

Data Centre Acceleration Monica Qin Li Aaron Chelvan Sijun Zhu Background 2019: Data

FPGAs 1 To read more This days papers: Brown and Rose, Architecture of FPGAs and

Building OSGi Components Carsten Ziegeler | cziegeler@apache.org ApacheCon NA 2014 1 About

Prototyping CS 4730 Computer Game Design Credit:

SunyoungKim,PhD Last class 1. Brainstorming 2. Sketch 3. Scenario 4. Storyboard Recap:

Rapid Prototyping of Eclipse RCP Applications So, you want to create a quick RCP prototype for a

Justifying a 3D Printer Investment For Rapid Prototyping 1 TODAYS TOPIC The Challenge

Sambuz

Useful Links

Newsletter

Mail Us

Itera rati tive Dat ata a Min inin ing Jill illes V s - PowerPoint PPT Presentation

Itera rati tive Dat ata a Min inin ing Jill illes V s Vreeken 26 June une 2014 2014 (TA TADA) Ser ervic ice Ann e Announ uncemen ent #1 Evaluation Forms Hand forms out (me) 1. Fill forms out (you) 2. Collect forms (you) 3.

1 min 2 min 3 min www.matsgroup.info 1 min 2 min 3 min www.matsgroup.info 1 min 2 min 3

Overview What is itera*on? Racket has no loops, and yet can express itera*on. Iteration

Overview What is itera*on? Racket has no loops, and yet can express itera*on. Iteration

Class 4 @rwdkent Overview Current Events (10 min) Break (5 min) Explore RWD (25 min) CSS

MDL L for or Pat atte tern Min inin ing Jill illes V s Vreeken 4 4 June une 2014 2014

Construc*ng Tree Decomposi*ons Using Itera*ve Compression

Spelling, Punctuation and Grammar Suffixes -ing Year One SPaG | Suffixes -ing Suffixes Suffixes

Im Immig igra rati tion on Dat Data a in in Can Canada ada A story of Statistics Canada

Bitly Link &amp; DAT Page Link to Digital Preservation Peer Assessment: http://bit.ly/BPE-DAT

Melb lbourne Min inin ing Clu lub, 30 30 May 20 2019 19 Photo credit: ABC

De Deal aling ing wit ith h mi missing ssing dat ata a in in pr pract actice: ice:

WE WELCOME ME REFEREE T TRAIN ININ ING SPRIN ING 2 2019 Responsibilities EAST BAY FLAG

ing a Robust PMO ing &amp; Sustain inin A Complimentary Webinar From healthsystemCIO.com

ASTRO News Brie iefin ing: Refin inin ing Treatment Decis isions Monday, September 26, 8-9am

Why EVs are key to your biz strategy now Beln Gallego ATA Insights belen.gallego@ata.email

D ATA S CIENCE E COSYSTEM M. T AMER ZSU N ANCY R EID R AYMOND N G U. W ATERLOO U. T ORONTO UBC

NRC Sociology Rankings Andrew J. Perrin November 3, 2010 Andrew J. Perrin () NRC Sociology

Data Centre Acceleration Monica Qin Li Aaron Chelvan Sijun Zhu Background 2019: Data

FPGAs 1 To read more This days papers: Brown and Rose, Architecture of FPGAs and

Building OSGi Components Carsten Ziegeler | cziegeler@apache.org ApacheCon NA 2014 1 About

Prototyping CS 4730 Computer Game Design Credit:

SunyoungKim,PhD Last class 1. Brainstorming 2. Sketch 3. Scenario 4. Storyboard Recap:

Rapid Prototyping of Eclipse RCP Applications So, you want to create a quick RCP prototype for a

Justifying a 3D Printer Investment For Rapid Prototyping 1 TODAYS TOPIC The Challenge

Sambuz

Useful Links

Newsletter

Mail Us

Overview What is iteraon? Racket has no loops, and yet can express iteraon. Iteration

Overview What is iteraon? Racket has no loops, and yet can express iteraon. Iteration

Construcng Tree Decomposions Using Itera*ve Compression

Bitly Link & DAT Page Link to Digital Preservation Peer Assessment: http://bit.ly/BPE-DAT

ing a Robust PMO ing & Sustain inin A Complimentary Webinar From healthsystemCIO.com