TADA!
Topics in Algorithmic Data Analysis
24 April 2015
Jilles Vreeken
Question of the Course
What are the hot topics in data mining that are cool*?
* and important to know
Question of the Course
How can we extract novel knowledge and insight from large data?
Organization
This is an advanced lecture,
with lectures, and reading, and assignments.
Beware!
this lecture will be well worth its 5 ECTS
I’m not afraid!
You will be, you will be.
I’m not afraid.
Yes… you will be. You will be.
a lot of reading, a lot of thinking; it’ll take quite some effort, but you’ll learn a lot
Reading Materials
We’ll mainly consider scientific articles
All will be available on the website
directly accessible from the MPI network, or using login/password that you can get by email
Lectures
Meetings that cover the basic topics
format: ‘sit, listen, interact’
Required reading
announced on website
read at your own convenience
but, strongly preferred, before the lecture
Exam
Type tba
most likely oral
Day and place tba
most likely in early August
Grading
final grade will be based on final exam and assignments
Assignments: general
4 assignments
Grading scale: fail, pass, excellent.
You may fail on [...] fails and you fail the course
Every excellent gives 1/3 bonus point
You must pass the final exam to pass the course
Assignments: requirements
To be written in proper academic-style English
Use proper citations
you are given sources
you are encouraged to find additional sources
all sources must be mentioned
plagiarism: instant fail (at best)
Assignments: format
Return assignment reports as PDF files by email
no .doc(x), .odt, .rtf, .txt, .xml, .html, .pages, .ps, .eps, .etc
No page limit!
probably most will need 3 to 5 pages
more is not necessarily better
Reports must clearly state on the first page
name, matriculation number, email address and topic
Assignments: returning
Assignment reports are to be returned by email
tada@mpi-inf.mpg.de
Deadline is at 14:00 hours on the stated day
NO delays, no excuses; time is based on the mail time stamp.
Submissions that I receive before the deadline day I will ACK
Assignments: grading
Assignments are not for repeating what papers say
perhaps surprisingly, but I have already read the papers.
You are expected to critically discuss the sources,
build connections, point out differences, provide
new insights, etc.
Some assignments are marked as hard
this is because they are, and this will be taken into account when grading
News & Updates
Urgent and personal messages by email
everything else via the website
Question of the Course
How can we extract novel knowledge and insight from large data?
1st Paradigm: Empirical Science
For thousands of years, science was empirical: describing natural phenomena
2nd Paradigm: Theoretical Science
The last few hundred years, science was theoretical: used models, generalizations, made predictions
3rd Paradigm: Computational Science
The last decades, science was computational: complex models simulating complex phenomena
4th Paradigm: Data-Intensive Science
Interesting phenomena are too complex to come up with good hypotheses.
We need to unify theory, experimentation, and simulation
capture data, mine hypotheses, inspect and evaluate, generate extra data to select the best ones, iterate
iterative procedure between world and model, scientist in the middle
laws
Shopping Data
Which products are [...] together?
Train Delays
Which trains are delayed because of others?
Drug Discovery
What part of the molecule makes the drug work?
More patterns than you can shake a stick at
Pattern-based Modelling
Mining Algorithm
[Figure: summary of JMLR abstract database. Stemmed terms shown: support vector machin svm associ rule mine nearest neighbor frequent itemset mine naïv bay linear discrimin analysi lda cluster high dimension state art frequent pattern mine algorithm synthet real]
Summarising
Which sales characterise your customers?
Summarising
Google Flu
Quite Healthy
Patient Deceased
Big Data, Bigger Data, Biggest Data
No model is perfect
Science has lots of data, not the tools to analyse it
Social Science & the Web
Astronomy
Sloan Sky Survey:
100TB between 2000 and 2008
1 billion objects: 260M galaxies, 260M stars
non-trivial analysis: currently impossible
With Your Help!
Maybe!