Data Mining and Exploration
Michael Gutmann
michael.gutmann@ed.ac.uk
http://homepages.inf.ed.ac.uk/mgutmann
Institute for Adaptive and Neural Computation
School of Informatics, University of Edinburgh
19th January 2017
Michael Gutmann DME 1 / 14
◮ From Latin: dare, to give; datum: something given
◮ A piece of information
Sherlock Holmes, by Frederic Dorr Steele
Source: https://home.cern/about/computing
Source: https://www.gwava.com/blog/internet-data-created-daily
Source: https://www.domo.com/blog/data-never-sleeps-4-0
Sources: From Machine-To-Machine to the Internet of Things, Ch 2, 2014; aviationweek.com/connected-aerospace/internet-aircraft-things-industry-set-be-transformed
◮ Given a data generating process (data source), what are the properties of the outcomes (the data)?
◮ Given the outcomes (the data), what can we say about the process that generated them?
Based on Figure 1 of All of statistics by Larry Wasserman
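The two questions can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it assumes a Gaussian data generating process with hypothetical parameters `true_mu` and `true_sigma`, and the helper names `generate` and `infer` are made up for illustration. The forward direction draws outcomes from the known process; the inverse direction estimates the process parameters from the outcomes alone.

```python
import random
import statistics


def generate(mu, sigma, n, seed=0):
    """Forward direction: draw n outcomes from a known Gaussian process."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def infer(data):
    """Inverse direction: estimate the process parameters from the outcomes."""
    return statistics.fmean(data), statistics.stdev(data)


# Hypothetical process parameters, chosen only for this illustration.
true_mu, true_sigma = 2.0, 0.5

data = generate(true_mu, true_sigma, n=10_000)
mu_hat, sigma_hat = infer(data)
print(f"estimated mean {mu_hat:.3f}, estimated std {sigma_hat:.3f}")
```

With many samples the estimates should land close to the parameters used for generation; with few samples they can be far off, which is one reason the inverse question is the harder of the two.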
Data analysis workflow (figure):
◮ Objectives and key results
◮ Get (raw) data
◮ Sanity checks
◮ Exploratory data analysis
◮ Prep data for further analysis
◮ Build and fit model
◮ Summarise, visualise results
◮ Deploy the product / Communicate findings
Figure annotations: sampling process; between data analysis and collection?; selection/exclusion; coherent story?
Workflow figure repeated with course components overlaid: Lectures 1-3, Lecture 4, Lecture 5, Mini-project, Presentations.
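The workflow stages named in the figure can be walked through on a toy problem. The sketch below is an illustration only and is not taken from the slides: the data are synthetic, the linear model and all variable names are assumptions, and each stage is reduced to a single step using the Python standard library.

```python
import math
import random
import statistics

# 1. Get (raw) data: synthetic (x, y) pairs from a hypothetical linear
#    process, plus one corrupted record injected on purpose.
rng = random.Random(1)
raw = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in range(20)]
raw.append((20, float("nan")))

# 2. Sanity checks: exclude records with non-finite values.
clean = [(x, y) for x, y in raw if math.isfinite(x) and math.isfinite(y)]

# 3. Exploratory data analysis: simple summaries of each variable.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)

# 4. Prep data for further analysis: centre both variables.
xc = [x - mean_x for x in xs]
yc = [y - mean_y for y in ys]

# 5. Build and fit model: least-squares line through the centred data.
slope = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)
intercept = mean_y - slope * mean_x

# 6. Summarise results.
print(f"kept {len(clean)} of {len(raw)} records")
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

In practice the stages are rarely this linear: a failed sanity check or a surprising summary typically sends the analysis back to an earlier stage.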