

SLIDE 1

Calibrate p‐values by taking the square root

Rutgers Foundations of Probability Seminar September 12, 2016 Glenn Shafer

  • 1. Significance levels and p‐values
  • 2. Game‐theoretic probability
  • 3. The dynamic nature of game‐theoretic testing
  • 4. Calibrating p‐values
  • 5. Insuring against loss of evidence

1

SLIDE 2

See Working Papers at www.probabilityandfinance.com:

  • 33. Test martingales, Bayes factors, and p-values, by Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Statistical Science 26, 84–101, 2011.
  • 34. Insuring against loss of evidence in game-theoretic probability, by A. Philip Dawid, Steven de Rooij, Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Statistics and Probability Letters 81, 157–162, 2011.

2

SLIDE 3

For nearly 100 years, researchers have persisted in using p-values in spite of fierce criticism. Both Bayesians and Neyman-Pearson purists contend that use of a p-value is cheating even in the simplest case, where the hypothesis to be tested and a test statistic are specified in advance. Bayesians point out that a small p-value often does not translate into a strong Bayes factor against the hypothesis. Neyman-Pearson purists insist that you should state a significance level in advance and stick with it, even if the p-value turns out to be much smaller than this significance level. But many applied statisticians persist in feeling that a p-value much smaller than the significance level is meaningful evidence.

In the game-theoretic approach to probability (see my 2001 book with Vladimir Vovk, described at www.probabilityandfinance.com), you test a statistical hypothesis by using its probabilities to bet. You reject at a significance level of 0.01, say, if you succeed in multiplying the capital you risk by 100. In this picture, we can calibrate small p-values so as to measure their meaningfulness while absolving them of cheating. There are various ways to implement this calibration, but one of them leads to a very simple rule of thumb: take the square root of the p-value. Thus rejection at a significance level of 0.01 requires a p-value of one in 10,000.
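As a quick illustration of that rule of thumb, here is a minimal sketch (the function name and interface are mine, not from the talk) that turns a reported p-value into its square-root calibration and the corresponding factor by which the capital risked is multiplied.

```python
import math

def sqrt_calibrate(p):
    """Rule of thumb from the talk: the calibrated p-value is sqrt(p),
    and 1/sqrt(p) is roughly the factor by which the capital risked
    is multiplied."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    calibrated = math.sqrt(p)
    return calibrated, 1 / calibrated

# Rejection at significance level 0.01 means multiplying capital by 100,
# which under this rule requires a raw p-value of one in 10,000.
print(sqrt_calibrate(0.0001))  # (0.01, 100.0)
print(sqrt_calibrate(0.01))    # (0.1, 10.0): "p < 0.01" alone is far weaker
```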

3

SLIDE 4

4

Part 1. Significance levels and p‐values

  • Is use of p‐values cheating?

Part 2. Game‐theoretic probability

  • Use a game to define probability
  • Game‐theoretic justification of significance testing

Part 3. The dynamic nature of game‐theoretic testing

  • Evidence can go up and then back down.
  • Pretending you had stopped earlier = using p‐value

Part 4. Calibrating p‐values

  • Averaging stopped versions of Skeptic’s play
  • The square‐root calibrator
SLIDE 5

Part 1. Significance levels and p‐values

5

Is use of p-values cheating?

  • Karl Pearson, 1857–1936
  • R. A. Fisher, 1890–1962

The emphasis on p‐values began with Karl Pearson and R. A. Fisher.

SLIDE 6

Part 1. Significance levels and p‐values

6

SLIDE 7

Part 1. Significance levels and p‐values

7

Twentieth‐century questions

SLIDE 8

Part 1. Significance levels and p‐values

8

SLIDE 9

Part 1. Significance levels and p‐values

9

SLIDE 10

Part 1. Significance levels and p‐values

10

SLIDE 11

Part 2. Game‐theoretic probability

11

  • Use a game to define probability
  • Game‐theoretic justification of significance testing

Pierre Fermat, 1601–1665. Blaise Pascal, 1623–1662.

Fermat: probability = measure of cases that produce event. Pascal: probability = capital you risk to get 1 if event happens.

SLIDE 12

Pascal’s question to Fermat in 1654

[Game tree of the remaining rounds, branches labeled Peter and Paul: the stake of 64 pistoles goes to Paul only if he wins both rounds.]

Paul needs 2 points to win. Peter needs only one.

12

Part 2. Game‐theoretic probability

If the game must be broken off, how much of the stake should Paul get?

Paul’s payoffs are shown.

SLIDE 13

Fermat’s answer

Suppose they play two rounds. There are 4 possible outcomes:

  • 1. Peter wins first, Peter wins second
  • 2. Peter wins first, Paul wins second
  • 3. Paul wins first, Peter wins second
  • 4. Paul wins first, Paul wins second

Paul wins only in outcome 4. So his share should be ¼, or 16 pistoles.
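Fermat's counting argument fits in a few lines; this is a minimal sketch of the enumeration above.

```python
from itertools import product

# Fermat's argument: imagine both remaining rounds are played out in full.
outcomes = list(product(["Peter", "Paul"], repeat=2))          # 4 equally likely cases
paul_takes_stake = [o for o in outcomes if o == ("Paul", "Paul")]
print(64 * len(paul_takes_stake) / len(outcomes))              # 16.0 pistoles
```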

Pascal didn’t like the argument.

13

[Same game tree as before: Paul receives the 64-pistole stake only if he wins both rounds.]

Part 2. Game‐theoretic probability

SLIDE 14

Pascal’s answer (game theory)

[Game tree with Paul's values: 64 after Paul wins both rounds, 32 after he wins the first, 16 at the start.]
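Pascal's backward-induction reasoning can be sketched as follows (the function is my reconstruction, not taken from the slide): at even odds, a position is worth the average of the two positions it can lead to.

```python
def position_value(stake, paul_needs, peter_needs):
    """Value to Paul of the unfinished game under Pascal's backward induction,
    assuming even odds on every round."""
    if paul_needs == 0:      # Paul has won: he takes the whole stake
        return stake
    if peter_needs == 0:     # Peter has won: Paul gets nothing
        return 0
    win = position_value(stake, paul_needs - 1, peter_needs)
    lose = position_value(stake, paul_needs, peter_needs - 1)
    return (win + lose) / 2  # 64 after two Paul wins, 32 after one, 16 at the start

print(position_value(64, paul_needs=2, peter_needs=1))  # 16.0
```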

14

Part 2. Game‐theoretic probability

SLIDE 15

15

[The same tree labeled two ways: with Paul's values at each node (16, 32, 64) and with his probabilities of winning the stake (1/4, 1/2, 1).]

Part 2. Game‐theoretic probability

SLIDE 16

Measure‐theoretic probability:

  • Classical: elementary events with probabilities adding to one.
  • Modern: space with filtration and probability measure.

Probability of A = total probability of elementary events favoring A.

Game-theoretic probability:

  • Forecaster offers prices for uncertain payoffs.
  • Skeptic decides what to buy.

Probability of A = stake Skeptic must risk to get 1 if A happens.
Upper probability of A = stake Skeptic must risk to get at least 1 if A happens.
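To make the betting definition concrete, here is a small sketch (my own illustration, using the Peter-and-Paul game from the earlier slides): the game-theoretic probability that Paul wins both remaining rounds is 1/4, because a stake of 1/4, bet entirely on Paul at even odds on each round, becomes exactly 1 if that event happens and 0 otherwise.

```python
def capital_after(outcomes, initial=0.25):
    """Skeptic's capital when he bets his whole current capital on Paul
    at even odds on each round; `outcomes` lists the winner of each round."""
    capital = initial
    for winner in outcomes:
        capital = 2 * capital if winner == "Paul" else 0.0
    return capital

print(capital_after(["Paul", "Paul"]))   # 1.0 -- the event happened
print(capital_after(["Paul", "Peter"]))  # 0.0
print(capital_after(["Peter", "Paul"]))  # 0.0
```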

16

Part 2. Game‐theoretic probability

SLIDE 17

17

Sequential nature of the game is fundamental.

Ville's Picture. On each round:

  • 1. Skeptic decides which offers to accept.
  • 2. Reality decides the outcome.

Ville revived game‐theoretic probability (e.g., martingales) in 1939.

Jean Ville 1910‐1989

Part 2. Game‐theoretic probability

SLIDE 18

18

Part 2. Game‐theoretic probability

SLIDE 19

Ville’s game‐theoretic foundation for classical probability

  • An event has probability zero if and only if Skeptic can multiply his capital infinitely if the event fails.
  • An event has probability < 1/K if and only if Skeptic can multiply his capital by K if the event fails.

19

Part 2. Game‐theoretic probability

Vovk and I generalize in two ways:

  • 1. We say upper probability instead of probability when too few bets are offered to construct an exact 0/1 payoff.
  • 2. We allow bets to be offered in the course of the game by a forecaster.

Vovk's Picture. On each round:

  • 1. Forecaster offers bets.
  • 2. Skeptic decides which offers to accept.
  • 3. Reality decides the outcome.
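A minimal sketch of one instantiation of Vovk's picture (binary outcomes, with Forecaster announcing a probability each round; the names and the simple ticket-style payoff rule are my own, not from the slides):

```python
def play(rounds, forecaster, skeptic, reality, capital=1.0):
    """One run of the three-player protocol: Forecaster offers bets,
    Skeptic decides what to buy, Reality decides the outcome."""
    for n in range(rounds):
        p = forecaster(n)            # price of a ticket paying 1 if the outcome is 1
        m = skeptic(n, p, capital)   # number of tickets Skeptic buys
        y = reality(n)               # outcome, 0 or 1
        capital += m * (y - p)       # net gain or loss on the tickets
    return capital

# Fair-coin forecasts; Skeptic stakes his whole capital on outcome 1 each round.
print(play(rounds=10,
           forecaster=lambda n: 0.5,
           skeptic=lambda n, p, c: c / p,
           reality=lambda n: 1))     # 1024.0 -- capital doubles every round
```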

SLIDE 20

20

Game‐theoretic justification of significance testing

Part 2. Game‐theoretic probability

SLIDE 21

21

Game‐theoretic justification of significance testing

The gambling picture justifies significance testing. Don’t try to understand game‐theoretic probability in terms of classical statistics. The logic goes in the other direction.

Part 2. Game‐theoretic probability

SLIDE 22

22

Game‐theoretic explanation of why p‐values are less convincing

Part 2. Game‐theoretic probability

Strategy depends on p!!
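One standard way to see the point (my own illustration; the construction on the slide itself is in the figures): a Skeptic who commits to a level alpha in advance can multiply his capital by 1/alpha exactly when the p-value comes out at or below alpha. Claiming the factor 1/p instead amounts to choosing alpha after seeing the data.

```python
def capital_factor(p_value, alpha):
    """Factor by which Skeptic multiplies the capital he risks when he buys,
    in advance, the indicator of the event {p-value <= alpha} at price alpha."""
    return 1 / alpha if p_value <= alpha else 0.0

print(capital_factor(0.003, alpha=0.01))    # 100.0 -- level fixed in advance
print(capital_factor(0.003, alpha=0.003))   # ~333.3 -- but this alpha was chosen
                                            # after seeing p: the strategy depends on p
```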

SLIDE 23

Part 3. Dynamic nature of game‐theoretic testing

23

  • Evidence can go up and then back down.
  • Pretending you stopped earlier = using p‐value
SLIDE 24

24

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Classic principle as special case of game‐theoretic principle:

  • 1. Assume forecast on each round is probability distribution for Reality's next move.
  • 2. Fix a strategy for Forecaster, thus defining a classical probability model for Reality's moves.
  • 3. Fix a strategy for Skeptic (including a stopping time).
  • 4. Fix a factor K by which Skeptic aims to multiply capital.

In this special case, the two principles are equivalent.

Game-theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 25

25

The scope of the generalization:

  • 1. Forecast on each round may fall short of a complete probability distribution for Reality's next move.
  • 2. Forecaster need not follow a strategy.
  • 3. Skeptic need not follow a strategy.
  • 4. Skeptic need not set a goal for multiplying his capital.
  • 5. But the stopping time must be fixed.

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Game‐theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 26

26

Skeptic need not follow a strategy. But we study strategies for Skeptic in order to see what he can accomplish. The capital process for a strategy for Skeptic is called a martingale. (This usage is due to Jean Ville.)

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 27

27

  • Forecaster gives 50‐50 odds on Paul each time.
  • Skeptic's strategy: start with 16 pistoles and bet all his money on Paul each time.
  • The numbers in red constitute the martingale for this strategy (see the sketch below).

[Game tree with the martingale values in red: 16 at the start, 32 after Paul wins the first round, 64 after he wins the second.]
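A minimal sketch of that capital process (the function name is mine): the martingale is just the running capital of the bet-everything-on-Paul strategy.

```python
def martingale(initial, outcomes):
    """Capital process (a martingale in Ville's sense) of the strategy that
    bets all current capital on Paul at even odds on every round."""
    capital = [initial]
    for winner in outcomes:
        capital.append(2 * capital[-1] if winner == "Paul" else 0)
    return capital

print(martingale(16, ["Paul", "Paul"]))   # [16, 32, 64] -- the values in red
print(martingale(16, ["Paul", "Peter"]))  # [16, 32, 0]
```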

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 28

28

Selecting a strategy for Skeptic (martingale)

In the probability case, the martingale is a likelihood ratio. This is how the notion of an alternative hypothesis enters the picture.
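A minimal sketch of that connection, assuming a fair-coin null P and an alternative Q that puts probability 0.7 on heads (the numbers are illustrative): betting at P's prices in the proportions recommended by Q gives a capital process equal to the running likelihood ratio Q/P.

```python
def likelihood_ratio_martingale(outcomes, q_heads=0.7, p_heads=0.5):
    """Skeptic's capital when each round's outcome is bought at the null's
    prices in the proportions the alternative Q recommends; after n rounds
    the capital equals the likelihood ratio Q(data)/P(data)."""
    capital = [1.0]
    for x in outcomes:  # 'H' or 'T'
        q = q_heads if x == "H" else 1 - q_heads
        p = p_heads if x == "H" else 1 - p_heads
        capital.append(capital[-1] * q / p)
    return capital

print(likelihood_ratio_martingale("HHHTH"))  # grows when the data favor Q
```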

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 29

29

The scope of the generalization:

  • 1. Forecast on each round may fall short of a complete probability distribution for Reality's next move.

  • 2. Forecaster need not follow a strategy.
  • 3. Skeptic need not follow a strategy.
  • 4. Skeptic need not set a goal for multiplying his capital.
  • 5. But the stopping time must be fixed.

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Game‐theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 30

30

Cheating by pretending you stopped earlier…

On each round, Forecaster gives 50-50 odds on Paul beating Peter. To refute Forecaster, Skeptic bets all his money on Paul each time. He wins his first 10 bets, turning his $1 into $1024. He should have stopped, but instead he bets once more and loses.

Which result best reports the evidence against Forecaster? $1 → $1024? Or $1 → $0?

[Figure: Skeptic's capital doubling from $1 to $2, $4, $8, …, $1024 as Paul keeps winning, then falling to $0 on the eleventh bet.]

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 31

31

Part 3. Dynamic nature of game-theoretic testing

We use this protocol to fix ideas, but this lecture's ideas apply to any instantiation of our three-player game.
SLIDE 32

32

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 33

33

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 34

Part 4. Calibrating p‐values

34

  • Averaging stopped versions of Skeptic’s play
  • The square root calibrator
SLIDE 35

Part 4. Calibrating p‐values

35

SLIDE 36

Part 4. Calibrating p‐values

36

This is a special case of the general three‐player game, with Honest Skeptic playing the role of Skeptic. Glenn’s move is auxiliary information provided on each round by Reality.

SLIDE 37

Part 4. Calibrating p‐values

37

SLIDE 38

Part 4. Calibrating p‐values

38

SLIDE 39

Part 4. Calibrating p‐values

39

SLIDE 40

Part 4. Calibrating p‐values

40

With this calibrator, a cheating p‐value of 0.002 is reduced to about 0.05.

p-value      1/p          capital multiplied by    adjusted p-value
0.05         20           3                        0.3
0.02         50           6                        0.2
0.01         100          9                        0.1
0.002        500          21                       0.05
0.001        1,000        31                       0.03
0.0001       10,000       99                       0.01
0.00001      100,000      315                      0.003
0.000001     1,000,000    999                      0.001
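The rows are consistent with the calibrator k(p) = 1/√p − 1 (a formula I inferred from the numbers, not copied from the image slides); such a k is a legitimate calibrator because it integrates to 1 over (0, 1], and the adjusted p-value is roughly 1/k(p).

```python
import math

def calibrate(p):
    """Capital factor and adjusted p-value under the assumed calibrator
    k(p) = 1/sqrt(p) - 1 (inferred from the table, not quoted from the talk)."""
    factor = 1 / math.sqrt(p) - 1
    return factor, 1 / factor

for p in (0.05, 0.02, 0.01, 0.002, 0.001, 0.0001, 0.00001, 0.000001):
    factor, adjusted = calibrate(p)
    print(f"p = {p:<9} capital factor ~ {factor:7.0f}   adjusted p ~ {adjusted:.1g}")
```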

SLIDE 41

Part 4. Calibrating p‐values

41

SLIDE 42

Part 4. Calibrating p‐values

42

SLIDE 43

Part 4. Calibrating p‐values

43

For low p‐values, the second calibrator justifies slightly stronger evidential claims.

p-value      1/p          first calibrator            second calibrator
                          factor    adjusted p        factor    adjusted p
0.05         20           3         0.3               4         0.2
0.02         50           6         0.2               7         0.2
0.01         100          9         0.1               9         0.1
0.002        500          21        0.05              26        0.04
0.001        1,000        31        0.03              42        0.02
0.0001       10,000       99        0.01              236       0.004
0.00001      100,000      315       0.003             1,509     0.0007
0.000001     1,000,000    999       0.001             10,478    0.0001

SLIDE 44

Part 4. Calibrating p‐values

44

When to use a calibrator

  • To discount a test result reported by someone who looked back to see when he had the strongest evidence.
  • To discount any p-value if a level of significance was not adopted in advance.

When to discount even more

  • When the reported p-value has been selected as the most significant of many p-values.

SLIDE 45

Extra Slides

45

SLIDE 46

46

Strategy for Skeptic (Ville 1939):

Part 2. Game‐theoretic probability

SLIDE 47

47

Generalizations

Part 2. Game‐theoretic probability

SLIDE 48

48

In practice, only finitely many rounds.

Strong (infinitary) law of large numbers: Borel 1909
Weak (finitary) law of large numbers: Bernoulli 1713

Part 2. Game‐theoretic probability

SLIDE 49

49

The analogy is not perfect:

Part 2. Game‐theoretic probability

SLIDE 50

50

Recall the definition of upper expectation from Lecture 1. So classical statistical testing already uses upper expectations. Upper expectations can also be characterized as suprema.

Part 2. Game‐theoretic probability

SLIDE 51

51

Classical statistical testing uses upper expectations:

Part 2. Game‐theoretic probability