UCSD Robustness Summer School, David Donoho, 20190812

SLIDE 1

What is Statistics? What is Statistical Research? Paradigm Shifts in Statistics How did Robustness in Statistics Emerge? What has ’Robustness in Statistics’ delivered? What has ’Robustness in Statistics’ to do with CS?

UCSD Robustness Summer School

David Donoho 20190812

David Donoho UCSD Robustness Summer School

SLIDE 2

Outline

◮ What is Statistics?
◮ What is Statistical Research?
◮ Paradigm Shifts in Statistics
◮ How did Robustness in Statistics Emerge?
◮ What has ’Robustness in Statistics’ delivered?
◮ What has ’Robustness in Statistics’ to do with CS?

SLIDE 3

Questions for Today

◮ What is Statistics?
◮ What is Statistical Research?
◮ How did ‘Robustness in Statistics’ Emerge?
◮ What has ‘Robustness in Statistics’ Delivered?
◮ How is any of this related to what CS people call robustness?

SLIDE 4

Two possible definitions:

◮ Academic discipline: mathematics of inference from data sampled from generative models
◮ Professional discipline: organizing term for professionally trained data analysts

SLIDE 5

Helpful framing for CS people

When you are raising money, it’s AI. When you are hiring, it’s machine learning. When you are actually doing it, it’s statistics. – Various.

SLIDE 6

Some history of Academic Statistics

◮ Cardano, Pascal
◮ John Arbuthnott & Nicholas Bernoulli (1710’s)
◮ John Graunt (1662)
◮ Abraham de Moivre (1718)
◮ Bayes (1763), Laplace (ca. 1783, 1813)
◮ Gauss (1809) + Legendre (1805)
◮ Chebyshev (ca. 1860)
◮ Galton (1880)
◮ Pearson (1890)
◮ Student (1905)
◮ Fisher (1920’s)
◮ Neyman-Pearson (1932)
◮ Kolmogorov (1932)
◮ Wald (1940’s)
◮ Tukey (1960’s)

See the books of Stephen Stigler, e.g. Statistics on the Table.

SLIDE 7

Killer App: Scientific inference

◮ 1 million papers/year (many: clinical medicine)
◮ Large fraction use inference
  ◮ P-values
  ◮ Type I, Type II errors
  ◮ Confidence statements
  ◮ (Generalized) Linear Model hypothesis tests

Common Situations
◮ Small data
◮ Formal tests
◮ Prescriptive conclusions

SLIDE 8

Achievements of Scientific Inference

◮ Green Revolution
◮ Cancer research
◮ Surgical procedures
◮ Quality manufacture

SLIDE 9

Deliverables

◮ Generative Models
◮ Theoretical Frameworks
◮ Data Analysis Tools

SLIDE 10

Deliverables, I: Generative Models

◮ Normal Distribution
◮ Bivariate Normal Distribution
◮ Chi-squared, F
◮ Binomial, Poisson
◮ Poisson Processes
◮ Brownian Motion

SLIDE 11

Deliverables, II: Theoretical Frameworks/Analyses

Theoretical Frameworks
◮ Optimality
◮ Decision Theory
◮ Bayes Decision Theory
◮ Minimax Decision Theory
◮ Design of Experiments
◮ Asymptotic Analysis
◮ Universality (e.g. Central Limit Theorem)
◮ Ill-Posed problems and Regularization

SLIDE 12

Deliverables, III: Data Analysis Tools

Data Analysis Tools
◮ Fisher, analysis of variance
◮ Fisher, Yates, Design of Experiments
◮ Survival Analysis, censoring
◮ Fixed and Random effects analysis
◮ Observational Studies, Causal inference
◮ Time Series methods
◮ Multivariate methods
◮ Robust methods
◮ Regularization methods
◮ . . .
◮ Meta Analysis

SLIDE 13

Successes outside ‘Statistics’ stricto sensu

◮ Statistical Signal Processing in Radar/RF
  ◮ Matched Filtering & Detection
  ◮ Blind Equalization and Deconvolution
  ◮ On-line algorithms
◮ Nobel Prizes in economics/physics/chemistry
  ◮ Harry Markowitz – portfolio allocation
  ◮ Robert Engle – volatility clustering
  ◮ . . .
  ◮ W.E. Moerner – fluorescence microscopy
  ◮ Higgs/CERN – Higgs mass
  ◮ Kip Thorne – gravitational waves

SLIDE 14

Psychology of Statistical Research

◮ Noise is huge, signal is weak.
◮ Generative model provides driving insight
◮ Tools that thousands of researchers can use correctly
◮ Possibility to be completely misled (replication crisis)

SLIDE 15

Goals in Statistics Research

Papers develop methodology that
◮ Applies very widely
◮ Many people could actually use correctly
◮ Is understood using a generative model:
  ◮ derive from optimality;
  ◮ establish operating characteristics

SLIDE 16

Stats Impact

R provides a kind of universal platform for statistics
◮ Stats papers typically lead to R packages
◮ Packages supply code, data used in paper
◮ Allow replication of results
◮ Allow others to benchmark new methods on same data
◮ Allow others to use methodology easily on fresh data

SLIDE 17

Paradigm Shifts

◮ 1900 – Math Paper Paradigm
◮ 1960 – Data Analysis Process
◮ 1960 – Data Reuse Framework
◮ 1985 – Common Task Framework
◮ 2005 – Data Science Pipelines

SLIDE 18

◮ Math Paper model: (Biometrika ... AoS)
  ◮ Generative Model
  ◮ Derive Procedure formally
  ◮ Properties of Procedure
◮ Data Analysis Process model:
  ◮ Propose Model
  ◮ Take Residuals
  ◮ Diagnose
  ◮ Transform data / Refine model
  ◮ Iterate
◮ Data reuse model:
  ◮ Cross-validation
  ◮ Bootstrapping
  ◮ Random Forests
◮ Common Task Framework
  ◮ Competitors
  ◮ Shared dataset
  ◮ Robotic Scoring
◮ Data Science Pipeline
  ◮ Design/Build
  ◮ Improve
  ◮ Share/Reuse
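
To make the data reuse model concrete, here is a minimal sketch of bootstrapping: the same sample is resampled with replacement many times to estimate the standard error of a statistic. The function name `bootstrap_se`, the number of resamples, and the sample values are assumptions made for illustration, not taken from the slides.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # Spread of the replicated statistic approximates its standard error.
    return statistics.stdev(replicates)


# Hypothetical sample with one unusually large value.
data = [2.1, 2.4, 1.9, 2.8, 3.0, 2.2, 2.6, 2.5, 9.7, 2.3]

se_mean = bootstrap_se(data, statistics.mean)
se_median = bootstrap_se(data, statistics.median)
print(f"bootstrap SE of mean:   {se_mean:.3f}")
print(f"bootstrap SE of median: {se_median:.3f}")
```

No generative model is assumed here beyond treating the observed sample as a stand-in for the population, which is what distinguishes this paradigm from the Math Paper model.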

SLIDE 19

How did Robustness in Statistics Emerge?

◮ R.A. Fisher vs. A.S. Eddington (ca. 1922)
◮ Neyman-Pearson & LRT
◮ Wald & decision theory
◮ Wald, Wolfowitz & Asymptotic Minimaxity
◮ WWII
◮ Professionalization of Statistics
◮ Tukey & Future of Data Analysis (1962)

SLIDE 20

Tukey Critiques

◮ Data typically contaminated
◮ No generative model is ever exactly true
◮ Narrow optimality can be misleading
◮ Eddington was right, not Fisher
◮ Practical power of a test
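
The first and third critiques can be illustrated with a small, deterministic sketch. The sample mean is optimal under an exact Normal model, yet a single gross error moves it arbitrarily far, while the median barely changes. The measurements below are made up for illustration and are not from the slides.

```python
import statistics

# Hypothetical measurements clustered near 10 (made up for illustration).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]

# Contaminate 10% of the sample: the last reading becomes a gross error.
contaminated = clean[:-1] + [1000.0]

# How far does each estimate move when one observation is corrupted?
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

With these numbers the mean moves by roughly 99 units while the median moves by about 0.05, even though the mean is the better estimator when the Normal model holds exactly.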

SLIDE 21

Mathematical Formalizations: Huber/Hampel

◮ Data contamination as a formal process: modification of a ‘central model’
◮ 2-person game against nature
  ◮ Nature chooses contamination
  ◮ Statistician chooses procedure
◮ Quantitative robustness
  ◮ Huber – ‘optimality in minimax sense’
  ◮ Hampel – ‘boundedness against substantial fraction contamination’
◮ Qualitative robustness
  ◮ Hampel – ‘Robustness == continuity in appropriate topology’
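
One common way to write these formalizations down is sketched below. The notation ($F_0$ for the central model, $\varepsilon$ for the contamination fraction, $H$ for the contaminating distribution, $T$ for the procedure, $V$ for asymptotic variance) is an assumption of this sketch and does not appear on the slides.

```latex
% Contamination neighborhood of the central model F_0
\mathcal{P}_\varepsilon(F_0)
  = \{\, F : F = (1-\varepsilon)\,F_0 + \varepsilon H,\ H \text{ an arbitrary distribution} \,\},
  \qquad 0 \le \varepsilon < 1

% Game against nature (Huber): the statistician picks T, nature picks F
\min_{T}\ \sup_{F \in \mathcal{P}_\varepsilon(F_0)} V(T, F)

% Boundedness under contamination (Hampel): breakdown point of T at F_0
\varepsilon^{*}
  = \sup \Bigl\{ \varepsilon :
      \sup_{F \in \mathcal{P}_\varepsilon(F_0)} \lvert T(F) - T(F_0) \rvert < \infty \Bigr\}

% Qualitative robustness (Hampel): continuity of the functional T,
% i.e. F_n \to F weakly implies T(F_n) \to T(F)
```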

SLIDE 22

Deliverable tools

◮ Iteratively reweighted least squares (IRLS) – Tukey
◮ Robust M estimates – Huber
◮ High-breakdown methods – Hampel, Rousseeuw and others
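
The first two tools can be combined in a minimal sketch: a Huber-type M-estimate of location computed by iteratively reweighted least squares. The function name `huber_location`, the tuning constant `k = 1.345`, and the fixed MAD-based scale are conventional choices assumed for this sketch, not details given on the slides.

```python
import statistics


def huber_location(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber-type M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(data)  # robust starting value
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad  # MAD rescaled to match the Normal standard deviation
    if scale == 0:
        return mu  # more than half the data coincide; nothing to reweight
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            # Full weight near the center, down-weighted beyond k scale units.
            weights.append(1.0 if r <= k else k / r)
        # Weighted least squares step for a location parameter.
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample near 10 with one gross error (made up for illustration).
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 1000.0]
estimate = huber_location(sample)
print(f"Huber estimate: {estimate:.3f}")
```

Each iteration is an ordinary weighted average, so the robust fit needs nothing beyond least squares machinery. The gross error keeps a small, bounded influence instead of dominating the estimate.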

SLIDE 23

Stats vs. CS Goals

Contrast between CS Research and Stats Research Goals.
◮ Stats research seeks to write lasting papers about methods which are widely applied by a broad community of scientists/engineers.
◮ CS research seeks to write papers of intense current interest to a focused community of other CS researchers.

Statisticians are similar to MDs.
◮ Hundreds of years of organized professional practice
◮ Tradition; standards of care
◮ Emphasis on not ‘killing patient’
◮ Emphasis on systemic effects (‘morality’ + ‘consequentialism’)

SLIDE 24

Points of Contact

Mathematical Formalizations: Huber/Hampel
◮ Data contamination as a formal process: modification of a ‘central model’
◮ 2-person game against nature
  ◮ Nature chooses contamination
  ◮ Statistician chooses procedure
◮ Quantitative robustness
  ◮ Huber – ‘optimality in minimax sense’
  ◮ Hampel – ‘boundedness against substantial fraction contamination’
◮ Qualitative robustness
  ◮ Hampel – ‘Robustness == continuity in appropriate topology’

SLIDE 25

Reading suggestions

◮ D. Donoho (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, Volume 26, Issue 4.
◮ R.A. Fisher (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A.
◮ Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley, 2nd ed. 2005.
◮ Peter J. Huber (1982). Robust Statistics. Wiley. 2nd ed. w/ E. Ronchetti (2009).
◮ Peter J. Huber (1997). “Speculations on the Path of Statistics,” in The Practice of Data Analysis, eds. D. R. Brillinger, L. T. Fernholz, and S. Morgenthaler. Princeton, NJ: Princeton University Press, pp. 175–191.
◮ Colin Mallows (2006). Tukey’s paper after 40 years. Technometrics 48, 319–325.
◮ John W. Tukey. The Future of Data Analysis. Ann. Math. Statist. 33 (1962), no. 1, 1–67. doi:
