SLIDE 1
Data Science: The End of Statistics?
Larry Wasserman
Carnegie Mellon University
Interface 2015
SLIDE 2
SLIDE 3
Conclusion
Let’s turn the Interface meeting into the statistics version of NIPS
SLIDE 4
This Talk
SLIDE 5
This Talk
Will be short
SLIDE 6
This Talk
Will be short
Will be annoying provocative
SLIDE 7
Main Points
SLIDE 8
Main Points
- Statisticians are being left out
SLIDE 9
Main Points
- Statisticians are being left out
- This should worry everyone (not just statisticians)
SLIDE 10
Main Points
- Statisticians are being left out
- This should worry everyone (not just statisticians)
- It’s (partly) our fault
SLIDE 11
Main Points
- Statisticians are being left out
- This should worry everyone (not just statisticians)
- It’s (partly) our fault
- We need a culture shift:
- 1. modernize training (no more UMVUE’s)
- 2. embrace the CS conference culture
- 3. watch and learn from CS: active learning, deep learning, SVM,
online learning, RKHS, differential privacy ...
SLIDE 12
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ...
SLIDE 13
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
SLIDE 14
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy.
SLIDE 15
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy. Not a statistician.
SLIDE 16
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy. Not a statistician.
- Forbes: World’s 7 Most Powerful Data Scientists
SLIDE 17
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy. Not a statistician.
- Forbes: World’s 7 Most Powerful Data Scientists
0 statisticians.
SLIDE 18
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy. Not a statistician.
- Forbes: World’s 7 Most Powerful Data Scientists
0 statisticians.
- Startups?
SLIDE 19
Where are the Statisticians?
- President’s Council of Advisors on Science and Technology
(PCAST) includes ... 0 statisticians!
- Chief Data Scientist of the United States Office of Science and
Technology Policy. Not a statistician.
- Forbes: World’s 7 Most Powerful Data Scientists
0 statisticians.
- Startups?
- Google, Microsoft, Facebook all have Chief Economists. Chief
Statisticians?
SLIDE 20
Everyone Should Care (Not Just Statisticians)
- Big Data + Bad Analysis = Bad Decisions
SLIDE 21
Everyone Should Care (Not Just Statisticians)
- Big Data + Bad Analysis = Bad Decisions
- Gary King: Big data is not about the data, it’s about the
analytics.
SLIDE 22
Everyone Should Care (Not Just Statisticians)
- Big Data + Bad Analysis = Bad Decisions
- Gary King: Big data is not about the data, it’s about the
analytics.
- Google search: big data bad analytics = 10,700,000 hits
SLIDE 23
Everyone Should Care (Not Just Statisticians)
- Big Data + Bad Analysis = Bad Decisions
- Gary King: Big data is not about the data, it’s about the
analytics.
- Google search: big data bad analytics = 10,700,000 hits
- Statisticians have been doing data science for at least 100 years.
SLIDE 24
Everyone Should Care (Not Just Statisticians)
- Big Data + Bad Analysis = Bad Decisions
- Gary King: Big data is not about the data, it’s about the
analytics.
- Google search: big data bad analytics = 10,700,000 hits
- Statisticians have been doing data science for at least 100 years.
- You would not get brain surgery done by a cardiologist.
SLIDE 25
Why Are Statisticians Left Out?
Statisticians are:
SLIDE 26
Why Are Statisticians Left Out?
Statisticians are: conservative
SLIDE 27
Why Are Statisticians Left Out?
Statisticians are: conservative, stubborn
SLIDE 28
Why Are Statisticians Left Out?
Statisticians are: conservative, stubborn, inflexible
SLIDE 29
Why Are Statisticians Left Out?
Statisticians are: conservative, stubborn, inflexible, bad at selling themselves
SLIDE 30
Why Are Statisticians Left Out?
Statisticians are: conservative, stubborn, inflexible, bad at selling themselves, afraid
SLIDE 31
Why Are Statisticians Left Out?
Statisticians are: conservative, stubborn, inflexible, bad at selling themselves, afraid, experts at saying what you can’t do
SLIDE 32
A (mostly) True Story
- Astronomer asks us for help.
SLIDE 33
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
SLIDE 34
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
SLIDE 35
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
- In the meantime...
SLIDE 36
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
- In the meantime...
... my astronomer friend went to see my friends in ML.
SLIDE 37
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
- In the meantime...
... my astronomer friend went to see my friends in ML.
- Two days later the ML people produced fancy plots, analyses etc.
SLIDE 38
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
- In the meantime...
... my astronomer friend went to see my friends in ML.
- Two days later the ML people produced fancy plots, analyses etc.
- We complain that their analysis was not rigorous.
SLIDE 39
A (mostly) True Story
- Astronomer asks us for help.
- We spend months learning the science, cleaning the data and
carefully analyzing the data.
- Some careful, modest results after one year.
- In the meantime...
... my astronomer friend went to see my friends in ML.
- Two days later the ML people produced fancy plots, analyses etc.
- We complain that their analysis was not rigorous.
- Who will the astronomer go to in the future?
SLIDE 40
Anecdote: My One Week as Editor of JASA
SLIDE 41
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA.
SLIDE 42
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA. I insisted that the journal be made freely available, online.
SLIDE 43
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired.
SLIDE 44
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis.
SLIDE 45
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis. JASA is still behind a paywall.
SLIDE 46
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis. JASA is still behind a paywall. Compare this to JMLR (Journal of Machine Learning Research, jmlr.org), NIPS (nips.cc), ICML (icml.cc), etc.
SLIDE 47
What to Do?
SLIDE 48
What to Do?
- Change “Department of Statistics” to “Department of Statistics
and Data Science”
SLIDE 49
What to Do?
- Change “Department of Statistics” to “Department of Statistics
and Data Science”
- Mostly, we need a cultural shift: training, conferences, topics.
SLIDE 50
Training
SLIDE 51
Training
- Get rid of: MVUE, ancillarity, completeness, ...
SLIDE 52
Training
- Get rid of: MVUE, ancillarity, completeness, ...
- Get rid of assumptions (more on this in a minute)
SLIDE 53
Training
- Get rid of: MVUE, ancillarity, completeness, ...
- Get rid of assumptions (more on this in a minute)
- Add:
VC dimension
support vector machines
online learning, bandits
deep learning
optimization
coding (not just R)
cloud computing
basic software engineering (github etc)
SLIDE 54
Assumptions are For Suckers
SLIDE 55
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
SLIDE 56
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
SLIDE 57
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
SLIDE 58
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
SLIDE 59
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
SLIDE 60
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
SLIDE 61
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
design assumptions (incoherence)
SLIDE 62
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
design assumptions (incoherence)
radical suggestion: let’s get rid of probability!
SLIDE 63
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
design assumptions (incoherence)
radical suggestion: let’s get rid of probability!
Jim Ramsay: “what good has probability ever done for statistics?”
SLIDE 64
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
design assumptions (incoherence)
radical suggestion: let’s get rid of probability!
Jim Ramsay: “what good has probability ever done for statistics?”
Why should X1, ..., Xn be thought of as draws from some distribution?
SLIDE 65
Assumptions are For Suckers
- model-based, assumption-laden methods are useless in the world
of big, complex datasets
- We need assumption-light methods with good visualization
- I propose we ban these things:
Y = Xβ + ε
Normality
sparsity (sparse methods not sparse models)
design assumptions (incoherence)
radical suggestion: let’s get rid of probability!
Jim Ramsay: “what good has probability ever done for statistics?”
Why should X1, ..., Xn be thought of as draws from some distribution?
- online learning, individual sequence prediction, ...
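The slide names online learning and individual sequence prediction without showing a method. As a rough illustration (not taken from the talk), the sketch below runs a standard exponentially weighted average forecaster on a fixed, non-random sequence. The function name, the four toy experts, the sequence, and the choice of absolute loss are all assumptions made for this example. The point is that the regret guarantee holds for every sequence, with no distribution assumed for X1, ..., Xn.

```python
import math

def exp_weights_forecast(outcomes, experts, eta):
    """Exponentially weighted average forecaster under absolute loss.

    outcomes: values in [0, 1], treated as an arbitrary fixed sequence
    experts:  functions mapping the round index t to a prediction in [0, 1]
    Returns (learner's cumulative loss, best expert's cumulative loss).
    """
    cum_loss = [0.0] * len(experts)   # cumulative loss of each expert so far
    learner_loss = 0.0
    for t, y in enumerate(outcomes):
        preds = [f(t) for f in experts]
        # Weights depend only on past losses; subtract the minimum for stability.
        m = min(cum_loss)
        w = [math.exp(-eta * (loss - m)) for loss in cum_loss]
        p = sum(wi * pi for wi, pi in zip(w, preds)) / sum(w)
        learner_loss += abs(p - y)
        for i, pi in enumerate(preds):
            cum_loss[i] += abs(pi - y)
    return learner_loss, min(cum_loss)

# Hypothetical deterministic sequence: 1, 1, 0 repeating. No randomness anywhere.
T = 300
outcomes = [1.0 if t % 3 != 2 else 0.0 for t in range(T)]
experts = [
    lambda t: 0.0,           # always predict 0
    lambda t: 1.0,           # always predict 1
    lambda t: 0.5,           # always hedge
    lambda t: float(t % 2),  # alternate 0, 1, 0, 1, ...
]
eta = math.sqrt(8 * math.log(len(experts)) / T)
learner, best = exp_weights_forecast(outcomes, experts, eta)
regret = learner - best
# Worst-case bound for convex losses in [0, 1] with this choice of eta.
bound = math.sqrt(T * math.log(len(experts)) / 2)
print(f"regret = {regret:.2f}, bound = {bound:.2f}")
```

The learning rate `eta` here is tuned to a known horizon `T`; that is a simplification for the sketch.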
SLIDE 66
Conference Culture
SLIDE 67
Conference Culture
- conference model: refereed conferences: NIPS, ICML, AISTATS,
etc
SLIDE 68
Conference Culture
- conference model: refereed conferences: NIPS, ICML, AISTATS,
etc
- leads to energetic, fast, continuous progress
SLIDE 69
Conference Culture
- conference model: refereed conferences: NIPS, ICML, AISTATS,
etc
- leads to energetic, fast, continuous progress
- Every student should be regularly submitting papers to NIPS,
AISTATS, ICML, ...
SLIDE 70
Conference Culture
- conference model: refereed conferences: NIPS, ICML, AISTATS,
etc
- leads to energetic, fast, continuous progress
- Every student should be regularly submitting papers to NIPS,
AISTATS, ICML, ...
- The Interface:
SLIDE 71
Conference Culture
- conference model: refereed conferences: NIPS, ICML, AISTATS,
etc
- leads to energetic, fast, continuous progress
- Every student should be regularly submitting papers to NIPS,
AISTATS, ICML, ...
- The Interface:
- Let’s make the Interface the epicenter of statistics. Make it like
NIPS.
SLIDE 72
Conclusion
SLIDE 73
Conclusion
- Statisticians are the original Data Scientists.
SLIDE 74
Conclusion
- Statisticians are the original Data Scientists.
- Let’s embrace some of the CS culture. (If you can’t beat them,
join them).
SLIDE 75
Conclusion
- Statisticians are the original Data Scientists.
- Let’s embrace some of the CS culture. (If you can’t beat them,