Wrapping It Up
Jilles Vreeken
31 July 2015

What did we do?
- Introduction
- Patterns
- Correlation and Causation
- (Subjective) Interestingness
- Graphs
- Wrap-up + <ask-me-anything>

Take Home: Overall
An overview of the hot topics in data mining that Jilles thinks are cool: a strongly biased sample, by interest and available time. I wanted to give a general picture of what data mining is, what makes it special, and what's currently happening at the edge of human knowledge.

Key Take-Home Message
Data mining is descriptive, not predictive. The goal is to give you insight into your data and to offer (parts of) candidate hypotheses. What you do with those is up to you.

Take Home: Information Theory
Exploratory data analysis is wandering around your data, looking for interesting things, without being asked questions you cannot know the answer to. Questions like:
- What distribution should we assume?
- How many clusters/factors/patterns do you want?
- Please parameterize this Bayesian network?

Take Home: Patterns
Pattern mining aims to provide simple descriptions of the structures that your data exhibits locally. Mining patterns is easy. Mining interesting patterns that are significant, and doing so without redundancy, not so much.
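A minimal sketch of why mining patterns is easy but avoiding redundancy is not. It assumes frequent itemsets as the pattern type and uses closed itemsets as one standard notion of non-redundancy; the lectures may have used other criteria. The function names and the toy transactions are made up for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that occurs in at least
    min_support transactions (brute force, fine for toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result

def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}

# Hypothetical toy data: four transactions over three items.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
freq = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(freq)
print(len(freq), "frequent itemsets,", len(closed), "closed itemsets")
```

Even on four transactions, every non-empty subset of the three items is frequent, while only a few itemsets carry support information that the others do not.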

Take Home: Information Theory
Information theory is a branch of statistics concerned with measuring information.
- information = reduction of uncertainty
- uncertainty can be quantified in bits
- everything new you learn about your data allows you to compress it better
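As a small illustration of "information = reduction of uncertainty", the sketch below measures Shannon entropy in bits before and after learning a second variable. It uses plug-in (empirical frequency) estimates on made-up data; the variable names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Empirical Shannon entropy H(X) in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy_bits(xs, ys):
    """Empirical H(X | Y): the entropy of X within each group of equal Y,
    weighted by how often that Y value occurs."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy_bits(g) for g in groups.values())

# Hypothetical toy data: the umbrella column fully determines the weather.
weather = ["sun", "sun", "rain", "rain"]
umbrella = ["no", "no", "yes", "yes"]

before = entropy_bits(weather)                       # uncertainty about weather
after = conditional_entropy_bits(weather, umbrella)  # uncertainty once umbrella is known
print("information gained (bits):", before - after)
```

The bits saved are exactly the bits a compressor no longer has to spend on the weather column once the umbrella column is known.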

Take Home: Correlations
Correlations can be spurious and deceiving. Mutual information (MI) is a strong notion of interaction. Based on Shannon entropy, MI is hard to compute for continuous-valued data without making assumptions on the distribution. Based on cumulative entropy, MI can detect non-linear correlations without requiring assumptions.
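A minimal sketch of a dependency that a linear correlation coefficient misses but mutual information picks up. It uses discrete data and the Shannon-entropy plug-in estimate, not the cumulative-entropy estimator mentioned above; the data and function names are assumptions made for illustration.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient (measures linear dependence only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information_bits(xs, ys):
    """Empirical mutual information I(X; Y) in bits for discrete data."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical toy data: y is fully determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = mutual_information_bits(xs, ys)
print("Pearson r:", r, " MI (bits):", round(mi, 3))
```

Here the linear correlation vanishes because the relationship is symmetric around zero, while the mutual information is clearly positive.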

Take Home: Causation
Causality is a difficult concept. Standard probabilistic approaches based on likelihood cannot detect causal direction between pairs. Additive noise models and information theoretic measures can. Oh, and storks cause babies.
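A small check of the likelihood claim, as a sketch under stated assumptions: with maximum-likelihood estimates on discrete data, p(x) p(y | x) and p(y) p(x | y) both equal the joint p(x, y), so the two causal directions get the same likelihood. The code does not implement additive noise models; the data and names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(causes, effects):
    """Log2-likelihood of the sample under the factorisation
    p(cause) * p(effect | cause), with empirical (maximum-likelihood) estimates."""
    n = len(causes)
    cause_counts = Counter(causes)
    pair_counts = Counter(zip(causes, effects))
    total = 0.0
    for c, e in zip(causes, effects):
        total += math.log2(cause_counts[c] / n)                     # log p(cause)
        total += math.log2(pair_counts[(c, e)] / cause_counts[c])   # log p(effect | cause)
    return total

# Hypothetical toy observations of two discrete variables.
storks = [0, 0, 0, 1, 1, 2]
babies = [0, 0, 1, 1, 1, 1]

ll_storks_to_babies = log_likelihood(storks, babies)
ll_babies_to_storks = log_likelihood(babies, storks)
print(ll_storks_to_babies, ll_babies_to_storks)
```

Both directions score the same (up to floating-point rounding), which is why extra assumptions, such as the form of the noise, are needed to tell them apart.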

Take Home: Interestingness
Interestingness is ultimately subjective. Still, to have algorithms that can find potentially interesting things, we somehow need to formalize it.

Take Home: Graph Mining
Most graph mining approaches are global and predictive: 'explain everything in one go'. Real graphs are too complex for that. Taking a local and descriptive approach allows for more detailed results, richer problems, easier formalization, and efficient solutions. Very little has been done so far, so there are many cool open problems.

Take Home: Dynamic Data
Data is rarely static, even though many algorithms expect that. Streaming algorithms work when data is too big to fit anywhere, while dynamic algorithms aim to adjust the answer with the changing data.
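A rough sketch of the difference, using the mean as a stand-in for "the answer". The running mean is a one-pass, constant-memory summary in the streaming spirit; the sliding-window mean is one simple way of letting the answer follow changing data. Both classes and the toy stream are illustrative assumptions, not algorithms from the lectures.

```python
from collections import deque

class RunningMean:
    """Streaming flavour: one pass, constant memory, the data is never stored."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def add(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean update

class WindowMean:
    """Dynamic flavour: the answer is updated as old points expire."""
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def add(self, x):
        self.window.append(x)
        self.total += x
        if len(self.window) > self.size:
            self.total -= self.window.popleft()  # forget the oldest point

    @property
    def mean(self):
        return self.total / len(self.window)

# Hypothetical stream whose level shifts halfway through.
stream = [1, 1, 1, 1, 9, 9, 9, 9]
running, window = RunningMean(), WindowMean(size=4)
for x in stream:
    running.add(x)
    window.add(x)
print("running mean:", running.mean, " window mean:", window.mean)
```

The running mean summarises everything seen so far, whereas the window mean tracks the most recent level of the data.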

Take Home: Assignments
"What the hell was he thinking??"
I wanted you to learn to
- read scientific papers without getting lost in details, quickly forming high-level pictures of complex ideas
- read critically, seeing through scientific sales pitches
- show independent thinking, make ideas your own
I was not disappointed.

Take Home: TADA
Data analysis
- is important, upcoming, but still very young
- aims to tackle impossible problems, such as finding interesting things in enormous search spaces
- is a weird mix of theory and practice: likes to be foundational, yet is not afraid of ad hoc
and, not unimportant, it's lots of fun.

Exam dates
The Exam
- type: oral
- when: August 3rd and 4th
- time: individual
- where: E1.7 room 3.01
- what: all material discussed in the lectures, plus one assignment (your choice) per topic
The Re-Exam
- type: oral
- when: September 28th
- time: individual
- where: E1.7 room 3.01

Evaluation: I did not like
- "Class should end in time :)"
- "The amount of time necessary for every assignment."

Evaluation: Suggestions
- "More motivated slide": More details on the why?
- "Bit heavy course for 5 ECTS": Yes.
- "More practical follow-up to implement/text ideas": Maybe…
- "Discuss assignments the day it is brought online": We can do that.

Things to do
Master thesis projects
- in principle: yes!
- in practice: depending on background, motivation, interests, and grades; plus, on whether I have time
- interested? mail me
Student Research Assistant (HiWi) positions
- in principle: maybe…
- in practice: depends on background, grades, and in particular your motivation and interests
- interested? mail me, include CV and grades

Sample Topics
Causality
- did X cause Y?
- mining causal graphs
- what's the cause of this?
- predicting the future
Graphs
- characterising viruses
- realistic graph generators
- interesting subgraphs
- comparing graphs
Useful Patterns
- tell me… about this
- privacy & data generation
- pattern-based indexing
- noise reduction
Rich Data & Text
- pattern-based topic models
- grammar & compression
- rich MaxEnt modelling
- outliers in rich data

Good Reads
- Data Analysis: a Bayesian Tutorial, by D.S. Sivia & J. Skilling (very good, but skip the MaxEnt stuff)
- Elements of Information Theory, by Thomas Cover & Joy Thomas (very good textbook)
- The Information, by James Gleick (great light reading)

Teach us More!
Well, ok… let me advertise Information Retrieval and Data Mining, together with Gerhard Weikum (Core Lecture, 9 ECTS).
In addition, Hoang Vu and Mario will likely teach one or two courses next semester. Options include:
- Causal Inference (seminar+lectures)
- Mining High Dimensional Data (seminar+lectures)
- Mining (Correlated) Patterns (seminar+lectures)

Question Time!

Privacy & Data Mining
"What is your opinion on privacy preserving data mining? Have you ever worked with it? Do you think it is useful, or does it somehow contradict 'the spirit' of data mining?"

Text Mining
"Have you ever worked with text mining? Do you think considering grammar is necessary, or is mere statistics enough?"

Big Data
- "Does Big Data exist?"
- "How big is Big Data?"
- "When is the data Big enough? Is more data always better?"

Mining Massive Data
MapReduce, Hadoop, Bigtable, Cassandra, Spark, Dremel, etc., etc.
Engineering or science? Essentially tricks, not magic, that work well for certain specific problems.
For KDD 2014, at least 25 out of 150 presentations will be specifically aimed at 'large scale' stuff.

Mining the Cloud
"How about data analytics in the cloud?"

Social Network Analysis
Many, many, many papers about social network analysis. So far: lots of statistics, not much 'mining'. That is, most are about how to model a graph probabilistically, or how to fit a given distribution.
The Elephant in the Room: what is the 'graph' distribution? Nobody knows. Yet.

Your Question Here!

Conclusions
This concludes TADA'15. I hope you enjoyed the ride.

Thank you!
