Data Science: The End of Statistics? Larry Wasserman Carnegie - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Data Science: The End of Statistics?

Larry Wasserman Carnegie Mellon University Interface 2015

slide-2
SLIDE 2

Conclusion

slide-3
SLIDE 3

Conclusion

Let’s turn the Interface meeting into the statistics version of NIPS

slide-4
SLIDE 4

This Talk

slide-5
SLIDE 5

This Talk

Will be short

slide-6
SLIDE 6

This Talk

Will be short
Will be annoying provocative

slide-7
SLIDE 7

Main Points

slide-8
SLIDE 8

Main Points

  • Statisticians are being left out
slide-9
SLIDE 9

Main Points

  • Statisticians are being left out
  • This should worry everyone (not just statisticians)
slide-10
SLIDE 10

Main Points

  • Statisticians are being left out
  • This should worry everyone (not just statisticians)
  • It’s (partly) our fault
slide-11
SLIDE 11

Main Points

  • Statisticians are being left out
  • This should worry everyone (not just statisticians)
  • It’s (partly) our fault
  • We need a culture shift:
      1. modernize training (no more UMVUEs)
      2. embrace the CS conference culture
      3. watch and learn from CS: active learning, deep learning, SVM, online learning, RKHS, differential privacy ...
slide-12
SLIDE 12

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ...

slide-13
SLIDE 13

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!

slide-14
SLIDE 14

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy.

slide-15
SLIDE 15

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.

slide-16
SLIDE 16

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.
  • Forbes: World’s 7 Most Powerful Data Scientists
slide-17
SLIDE 17

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.
  • Forbes: World’s 7 Most Powerful Data Scientists. 0 statisticians.

slide-18
SLIDE 18

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.
  • Forbes: World’s 7 Most Powerful Data Scientists. 0 statisticians.
  • Startups?
slide-19
SLIDE 19

Where are the Statisticians?

  • President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
  • Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.
  • Forbes: World’s 7 Most Powerful Data Scientists. 0 statisticians.
  • Startups?
  • Google, Microsoft, Facebook all have Chief Economists. Chief Statisticians?

slide-20
SLIDE 20

Everyone Should Care (Not Just Statisticians)

  • Big Data + Bad Analysis = Bad Decisions
slide-21
SLIDE 21

Everyone Should Care (Not Just Statisticians)

  • Big Data + Bad Analysis = Bad Decisions
  • Gary King: Big data is not about the data, it’s about the analytics.

slide-22
SLIDE 22

Everyone Should Care (Not Just Statisticians)

  • Big Data + Bad Analysis = Bad Decisions
  • Gary King: Big data is not about the data, it’s about the analytics.
  • Google search: big data bad analytics = 10,700,000 hits
slide-23
SLIDE 23

Everyone Should Care (Not Just Statisticians)

  • Big Data + Bad Analysis = Bad Decisions
  • Gary King: Big data is not about the data, it’s about the analytics.
  • Google search: big data bad analytics = 10,700,000 hits
  • Statisticians have been doing data science for at least 100 years.
slide-24
SLIDE 24

Everyone Should Care (Not Just Statisticians)

  • Big Data + Bad Analysis = Bad Decisions
  • Gary King: Big data is not about the data, it’s about the analytics.
  • Google search: big data bad analytics = 10,700,000 hits
  • Statisticians have been doing data science for at least 100 years.
  • You would not get brain surgery done by a cardiologist.
slide-25
SLIDE 25

Why Are Statisticians Left Out?

Statisticians are:

slide-26
SLIDE 26

Why Are Statisticians Left Out?

Statisticians are:

  • conservative

slide-27
SLIDE 27

Why Are Statisticians Left Out?

Statisticians are:

  • conservative
  • stubborn

slide-28
SLIDE 28

Why Are Statisticians Left Out?

Statisticians are:

  • conservative
  • stubborn
  • inflexible

slide-29
SLIDE 29

Why Are Statisticians Left Out?

Statisticians are:

  • conservative
  • stubborn
  • inflexible
  • bad at selling themselves

slide-30
SLIDE 30

Why Are Statisticians Left Out?

Statisticians are:

  • conservative
  • stubborn
  • inflexible
  • bad at selling themselves
  • afraid

slide-31
SLIDE 31

Why Are Statisticians Left Out?

Statisticians are:

  • conservative
  • stubborn
  • inflexible
  • bad at selling themselves
  • afraid
  • experts at saying what you can’t do

slide-32
SLIDE 32

A (mostly) True Story

  • Astronomer asks us for help.
slide-33
SLIDE 33

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.

slide-34
SLIDE 34

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
slide-35
SLIDE 35

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
  • In the meantime...
slide-36
SLIDE 36

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
  • In the meantime... my astronomer friend went to see my friends in ML.

slide-37
SLIDE 37

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
  • In the meantime... my astronomer friend went to see my friends in ML.
  • Two days later the ML people produced fancy plots, analyses, etc.
slide-38
SLIDE 38

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
  • In the meantime... my astronomer friend went to see my friends in ML.
  • Two days later the ML people produced fancy plots, analyses, etc.
  • We complain that their analysis was not rigorous.
slide-39
SLIDE 39

A (mostly) True Story

  • Astronomer asks us for help.
  • We spend months learning the science, cleaning the data and carefully analyzing the data.
  • Some careful, modest results after one year.
  • In the meantime... my astronomer friend went to see my friends in ML.
  • Two days later the ML people produced fancy plots, analyses, etc.
  • We complain that their analysis was not rigorous.
  • Who will the astronomer go to in the future?
slide-40
SLIDE 40

Anecdote: My One Week as Editor of JASA

slide-41
SLIDE 41

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA.

slide-42
SLIDE 42

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA. I insisted that the journal be made freely available, online.

slide-43
SLIDE 43

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired.

slide-44
SLIDE 44

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis.

slide-45
SLIDE 45

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis. JASA is still behind a paywall.

slide-46
SLIDE 46

Anecdote: My One Week as Editor of JASA

I was hired as editor of JASA. I insisted that the journal be made freely available, online. I was fired. ASA sold the rights to the journal to Taylor and Francis. JASA is still behind a paywall. Compare this to JMLR (Journal of Machine Learning Research, jmlr.org), NIPS (nips.cc), ICML (icml.cc), etc.

slide-47
SLIDE 47

What to Do?

slide-48
SLIDE 48

What to Do?

  • Change “Department of Statistics” to “Department of Statistics and Data Science”

slide-49
SLIDE 49

What to Do?

  • Change “Department of Statistics” to “Department of Statistics and Data Science”
  • Mostly, we need a cultural shift: training, conferences, topics.
slide-50
SLIDE 50

Training

slide-51
SLIDE 51

Training

  • Get rid of: MVUE, ancillarity, completeness, ...
slide-52
SLIDE 52

Training

  • Get rid of: MVUE, ancillarity, completeness, ...
  • Get rid of assumptions (more on this in a minute)
slide-53
SLIDE 53

Training

  • Get rid of: MVUE, ancillarity, completeness, ...
  • Get rid of assumptions (more on this in a minute)
  • Add:
      VC dimension
      support vector machines
      online learning, bandits
      deep learning
      optimization
      coding (not just R)
      cloud computing
      basic software engineering (GitHub, etc.)

slide-54
SLIDE 54

Assumptions are For Suckers

slide-55
SLIDE 55

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
slide-56
SLIDE 56

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
slide-57
SLIDE 57

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
slide-58
SLIDE 58

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε

slide-59
SLIDE 59

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality

slide-60
SLIDE 60

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)

slide-61
SLIDE 61

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)
      design assumptions (incoherence)

slide-62
SLIDE 62

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)
      design assumptions (incoherence)
      radical suggestion: let’s get rid of probability!

slide-63
SLIDE 63

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)
      design assumptions (incoherence)
      radical suggestion: let’s get rid of probability!
      Jim Ramsay: “what good has probability ever done for statistics?”

slide-64
SLIDE 64

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)
      design assumptions (incoherence)
      radical suggestion: let’s get rid of probability!
      Jim Ramsay: “what good has probability ever done for statistics?”
      Why should X1, ..., Xn be thought of as draws from some distribution?

slide-65
SLIDE 65

Assumptions are For Suckers

  • model-based, assumption-laden methods are useless in the world of big, complex datasets
  • We need assumption-light methods with good visualization
  • I propose we ban these things:
      Y = Xβ + ε
      Normality
      sparsity (sparse methods not sparse models)
      design assumptions (incoherence)
      radical suggestion: let’s get rid of probability!
      Jim Ramsay: “what good has probability ever done for statistics?”
      Why should X1, ..., Xn be thought of as draws from some distribution?

  • online learning, individual sequence prediction, ...
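
To make the "individual sequence prediction" idea concrete, here is a minimal sketch (not from the talk) of an exponentially weighted average forecaster. It treats X1, ..., Xn as an arbitrary sequence rather than draws from a distribution and only tries to do nearly as well as the best of a fixed set of "experts". The function name, the learning rate eta, and the use of absolute loss are illustrative assumptions.

```python
import math


def exp_weights_forecast(outcomes, expert_predictions, eta=0.5):
    """Exponentially weighted average forecaster (illustrative sketch).

    outcomes: sequence of values in [0, 1]; no distribution is assumed.
    expert_predictions: one list per round, one prediction in [0, 1] per expert.
    Returns (cumulative forecaster loss, list of cumulative expert losses),
    both measured with absolute loss.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts          # start with uniform weights
    forecaster_loss = 0.0
    expert_losses = [0.0] * n_experts

    for y, preds in zip(outcomes, expert_predictions):
        # Predict with the weighted average of the experts' predictions.
        total = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, preds)) / total
        forecaster_loss += abs(forecast - y)

        # Downweight each expert according to the loss it just suffered.
        for i, p in enumerate(preds):
            loss = abs(p - y)
            expert_losses[i] += loss
            weights[i] *= math.exp(-eta * loss)

    return forecaster_loss, expert_losses


# Hypothetical usage: three experts (always 0, always 1, alternating 0/1).
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
experts = [[0.0, 1.0, float(t % 2)] for t in range(len(outcomes))]
loss, expert_losses = exp_weights_forecast(outcomes, experts, eta=0.5)
print(loss, expert_losses)
```

For convex losses bounded in [0, 1], the standard guarantee for this forecaster is that its cumulative loss exceeds that of the best expert by at most ln(N)/eta + eta*n/8, for every sequence of outcomes, with no probabilistic assumption on the data.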
slide-66
SLIDE 66

Conference Culture

slide-67
SLIDE 67

Conference Culture

  • conference model: refereed conferences: NIPS, ICML, AISTATS, etc.

slide-68
SLIDE 68

Conference Culture

  • conference model: refereed conferences: NIPS, ICML, AISTATS, etc.
  • leads to energetic, fast, continuous progress
slide-69
SLIDE 69

Conference Culture

  • conference model: refereed conferences: NIPS, ICML, AISTATS, etc.
  • leads to energetic, fast, continuous progress
  • Every student should be regularly submitting papers to NIPS, AISTATS, ICML, ...

slide-70
SLIDE 70

Conference Culture

  • conference model: refereed conferences: NIPS, ICML, AISTATS, etc.
  • leads to energetic, fast, continuous progress
  • Every student should be regularly submitting papers to NIPS, AISTATS, ICML, ...
  • The Interface:
slide-71
SLIDE 71

Conference Culture

  • conference model: refereed conferences: NIPS, ICML, AISTATS, etc.
  • leads to energetic, fast, continuous progress
  • Every student should be regularly submitting papers to NIPS, AISTATS, ICML, ...
  • The Interface:
  • Let’s make the Interface the epicenter of statistics. Make it like NIPS.

slide-72
SLIDE 72

Conclusion

slide-73
SLIDE 73

Conclusion

  • Statisticians are the original Data Scientists.
slide-74
SLIDE 74

Conclusion

  • Statisticians are the original Data Scientists.
  • Let’s embrace some of the CS culture. (If you can’t beat them, join them.)

slide-75
SLIDE 75

Conclusion

  • Statisticians are the original Data Scientists.
  • Let’s embrace some of the CS culture. (If you can’t beat them, join them.)

THE END