

SLIDE 1

Calibrate p‐values by taking the square root

Rutgers Foundations of Probability Seminar September 12, 2016 Glenn Shafer

  • 1. Significance levels and p‐values
  • 2. Game‐theoretic probability
  • 3. The dynamic nature of game‐theoretic testing
  • 4. Calibrating p‐values
  • 5. Insuring against loss of evidence

1

SLIDE 2

See Working Papers at www.probabilityandfinance.com:

  • 33. Test martingales, Bayes factors, and p-values, by Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Statistical Science 26, 84–101, 2011.
  • 34. Insuring against loss of evidence in game-theoretic probability, by A. Philip Dawid, Steven de Rooij, Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Statistics and Probability Letters 81, 157–162, 2011.

2

SLIDE 3

For nearly 100 years, researchers have persisted in using p-values in spite of fierce criticism. Both Bayesians and Neyman-Pearson purists contend that use of a p-value is cheating even in the simplest case, where the hypothesis to be tested and a test statistic are specified in advance. Bayesians point out that a small p-value often does not translate into a strong Bayes factor against the hypothesis. Neyman-Pearson purists insist that you should state a significance level in advance and stick with it, even if the p-value turns out to be much smaller than this significance level. But many applied statisticians persist in feeling that a p-value much smaller than the significance level is meaningful evidence.

In the game-theoretic approach to probability (see my 2001 book with Vladimir Vovk, described at www.probabilityandfinance.com), you test a statistical hypothesis by using its probabilities to bet. You reject at a significance level of 0.01, say, if you succeed in multiplying the capital you risk by 100. In this picture, we can calibrate small p-values so as to measure their meaningfulness while absolving them of cheating. There are various ways to implement this calibration, but one of them leads to a very simple rule of thumb: take the square root of the p-value. Thus rejection at a significance level of 0.01 requires a p-value of one in 10,000.
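As a quick illustration of that rule of thumb, here is a minimal sketch (the function name and interface are mine, not from the talk) that turns a reported p-value into its square-root calibration and the corresponding factor by which the capital risked is multiplied.

```python
import math

def sqrt_calibrate(p):
    """Rule of thumb from the talk: the calibrated p-value is sqrt(p),
    and 1/sqrt(p) is roughly the factor by which the capital risked
    is multiplied."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    calibrated = math.sqrt(p)
    return calibrated, 1 / calibrated

# Rejection at significance level 0.01 means multiplying capital by 100,
# which under this rule requires a raw p-value of one in 10,000.
print(sqrt_calibrate(0.0001))  # (0.01, 100.0)
print(sqrt_calibrate(0.01))    # (0.1, 10.0): "p < 0.01" alone is far weaker
```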

3

SLIDE 4

4

Part 1. Significance levels and p‐values

  • Is use of p‐values cheating?

Part 2. Game‐theoretic probability

  • Use a game to define probability
  • Game‐theoretic justification of significance testing

Part 3. The dynamic nature of game‐theoretic testing

  • Evidence can go up and then back down.
  • Pretending you had stopped earlier = using p‐value

Part 4. Calibrating p‐values

  • Averaging stopped versions of Skeptic’s play
  • The square‐root calibrator
SLIDE 5

Part 1. Significance levels and p‐values

5

Is use of p-values cheating?

  • Karl Pearson, 1857–1936
  • R. A. Fisher, 1890–1962

The emphasis on p‐values began with Karl Pearson and R. A. Fisher.

SLIDE 6

Part 1. Significance levels and p‐values

6

SLIDE 7

Part 1. Significance levels and p‐values

7

Twentieth‐century questions

SLIDE 8

Part 1. Significance levels and p‐values

8

SLIDE 9

Part 1. Significance levels and p‐values

9

SLIDE 10

Part 1. Significance levels and p‐values

10

SLIDE 11

Part 2. Game‐theoretic probability

11

  • Use a game to define probability
  • Game‐theoretic justification of significance testing

Pierre Fermat, 1601–1665. Blaise Pascal, 1623–1662.

Fermat: probability = measure of cases that produce event. Pascal: probability = capital you risk to get 1 if event happens.

SLIDE 12

Pascal’s question to Fermat in 1654

[Game tree of the remaining rounds, branches labeled Peter and Paul: the stake of 64 pistoles goes to Paul only if he wins both rounds.]

Paul needs 2 points to win. Peter needs only one.

12

Part 2. Game‐theoretic probability

If the game must be broken off, how much of the stake should Paul get?

Paul’s payoffs are shown.

SLIDE 13

Fermat’s answer

Suppose they play two rounds. There are 4 possible outcomes:

  • 1. Peter wins first, Peter wins second
  • 2. Peter wins first, Paul wins second
  • 3. Paul wins first, Peter wins second
  • 4. Paul wins first, Paul wins second

Paul wins only in outcome 4. So his share should be ¼, or 16 pistoles.
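Fermat's counting argument fits in a few lines; this is a minimal sketch of the enumeration above.

```python
from itertools import product

# Fermat's argument: imagine both remaining rounds are played out in full.
outcomes = list(product(["Peter", "Paul"], repeat=2))          # 4 equally likely cases
paul_takes_stake = [o for o in outcomes if o == ("Paul", "Paul")]
print(64 * len(paul_takes_stake) / len(outcomes))              # 16.0 pistoles
```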

Pascal didn’t like the argument.

13

[Same game tree as before: Paul receives the 64-pistole stake only if he wins both rounds.]

Part 2. Game‐theoretic probability

SLIDE 14

Pascal’s answer (game theory)

[Game tree with Paul's values: 64 after Paul wins both rounds, 32 after he wins the first, 16 at the start.]
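Pascal's backward-induction reasoning can be sketched as follows (the function is my reconstruction, not taken from the slide): at even odds, a position is worth the average of the two positions it can lead to.

```python
def position_value(stake, paul_needs, peter_needs):
    """Value to Paul of the unfinished game under Pascal's backward induction,
    assuming even odds on every round."""
    if paul_needs == 0:      # Paul has won: he takes the whole stake
        return stake
    if peter_needs == 0:     # Peter has won: Paul gets nothing
        return 0
    win = position_value(stake, paul_needs - 1, peter_needs)
    lose = position_value(stake, paul_needs, peter_needs - 1)
    return (win + lose) / 2  # 64 after two Paul wins, 32 after one, 16 at the start

print(position_value(64, paul_needs=2, peter_needs=1))  # 16.0
```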

14

Part 2. Game‐theoretic probability

SLIDE 15

15

[The same tree labeled two ways: with Paul's values at each node (16, 32, 64) and with his probabilities of winning the stake (1/4, 1/2, 1).]

Part 2. Game‐theoretic probability

SLIDE 16

Measure‐theoretic probability:

  • Classical: elementary events with probabilities adding to one.
  • Modern: space with filtration and probability measure.

Probability of A = total probability of elementary events favoring A.

Game-theoretic probability:

  • Forecaster offers prices for uncertain payoffs.
  • Skeptic decides what to buy.

Probability of A = stake Skeptic must risk to get 1 if A happens.
Upper probability of A = stake Skeptic must risk to get at least 1 if A happens.
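To make the betting definition concrete, here is a small sketch (my own illustration, using the Peter-and-Paul game from the earlier slides): the game-theoretic probability that Paul wins both remaining rounds is 1/4, because a stake of 1/4, bet entirely on Paul at even odds on each round, becomes exactly 1 if that event happens and 0 otherwise.

```python
def capital_after(outcomes, initial=0.25):
    """Skeptic's capital when he bets his whole current capital on Paul
    at even odds on each round; `outcomes` lists the winner of each round."""
    capital = initial
    for winner in outcomes:
        capital = 2 * capital if winner == "Paul" else 0.0
    return capital

print(capital_after(["Paul", "Paul"]))   # 1.0 -- the event happened
print(capital_after(["Paul", "Peter"]))  # 0.0
print(capital_after(["Peter", "Paul"]))  # 0.0
```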

16

Part 2. Game‐theoretic probability

SLIDE 17

17

Sequential nature of the game is fundamental.

Ville's Picture. On each round:

  • 1. Skeptic decides which offers to accept.
  • 2. Reality decides the outcome.

Ville revived game‐theoretic probability (e.g., martingales) in 1939.

Jean Ville 1910‐1989

Part 2. Game‐theoretic probability

SLIDE 18

18

Part 2. Game‐theoretic probability

SLIDE 19

Ville’s game‐theoretic foundation for classical probability

  • An event has probability zero if and only if Skeptic can multiply his capital infinitely if the event fails.
  • An event has probability < 1/K if and only if Skeptic can multiply his capital by K if the event fails.

19

Part 2. Game‐theoretic probability

Vovk and I generalize in two ways:

  • 1. We say upper probability instead of probability when too few bets are offered to construct an exact 0/1 payoff.
  • 2. We allow bets to be offered in the course of the game by a forecaster.

Vovk's Picture. On each round:

  • 1. Forecaster offers bets.
  • 2. Skeptic decides which offers to accept.
  • 3. Reality decides the outcome.
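A minimal sketch of one instantiation of Vovk's picture (binary outcomes, with Forecaster announcing a probability each round; the names and the simple ticket-style payoff rule are my own, not from the slides):

```python
def play(rounds, forecaster, skeptic, reality, capital=1.0):
    """One run of the three-player protocol: Forecaster offers bets,
    Skeptic decides what to buy, Reality decides the outcome."""
    for n in range(rounds):
        p = forecaster(n)            # price of a ticket paying 1 if the outcome is 1
        m = skeptic(n, p, capital)   # number of tickets Skeptic buys
        y = reality(n)               # outcome, 0 or 1
        capital += m * (y - p)       # net gain or loss on the tickets
    return capital

# Fair-coin forecasts; Skeptic stakes his whole capital on outcome 1 each round.
print(play(rounds=10,
           forecaster=lambda n: 0.5,
           skeptic=lambda n, p, c: c / p,
           reality=lambda n: 1))     # 1024.0 -- capital doubles every round
```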

SLIDE 20

20

Game‐theoretic justification of significance testing

Part 2. Game‐theoretic probability

SLIDE 21

21

Game‐theoretic justification of significance testing

The gambling picture justifies significance testing. Don’t try to understand game‐theoretic probability in terms of classical statistics. The logic goes in the other direction.

Part 2. Game‐theoretic probability

SLIDE 22

22

Game‐theoretic explanation of why p‐values are less convincing

Part 2. Game‐theoretic probability

Strategy depends on p!!
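One standard way to see the point (my own illustration; the construction on the slide itself is in the figures): a Skeptic who commits to a level alpha in advance can multiply his capital by 1/alpha exactly when the p-value comes out at or below alpha. Claiming the factor 1/p instead amounts to choosing alpha after seeing the data.

```python
def capital_factor(p_value, alpha):
    """Factor by which Skeptic multiplies the capital he risks when he buys,
    in advance, the indicator of the event {p-value <= alpha} at price alpha."""
    return 1 / alpha if p_value <= alpha else 0.0

print(capital_factor(0.003, alpha=0.01))    # 100.0 -- level fixed in advance
print(capital_factor(0.003, alpha=0.003))   # ~333.3 -- but this alpha was chosen
                                            # after seeing p: the strategy depends on p
```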

SLIDE 23

Part 3. Dynamic nature of game‐theoretic testing

23

  • Evidence can go up and then back down.
  • Pretending you stopped earlier = using p‐value
SLIDE 24

24

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Classic principle as special case of game‐theoretic principle:

  • 1. Assume forecast on each round is probability distribution for Reality's next move.
  • 2. Fix a strategy for Forecaster, thus defining a classical probability model for Reality's moves.
  • 3. Fix a strategy for Skeptic (including a stopping time).
  • 4. Fix a factor K by which Skeptic aims to multiply capital.

In this special case, the two principles are equivalent.

Game-theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 25

25

The scope of the generalization:

  • 1. Forecast on each round may fall short of a complete probability distribution for Reality's next move.
  • 2. Forecaster need not follow a strategy.
  • 3. Skeptic need not follow a strategy.
  • 4. Skeptic need not set a goal for multiplying his capital.
  • 5. But the stopping time must be fixed.

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Game‐theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 26

26

Skeptic need not follow a strategy. But we study strategies for Skeptic in order to see what he can accomplish. The capital process for a strategy for Skeptic is called a martingale. (This usage is due to Jean Ville.)

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 27

27

  • Forecaster gives 50‐50 odds on Paul each time.
  • Skeptic's strategy: start with 16 pistoles and bet all his money on Paul each time.
  • The numbers in red constitute the martingale for this strategy (see the sketch below).

[Game tree with the martingale values in red: 16 at the start, 32 after Paul wins the first round, 64 after he wins the second.]
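A minimal sketch of that capital process (the function name is mine): the martingale is just the running capital of the bet-everything-on-Paul strategy.

```python
def martingale(initial, outcomes):
    """Capital process (a martingale in Ville's sense) of the strategy that
    bets all current capital on Paul at even odds on every round."""
    capital = [initial]
    for winner in outcomes:
        capital.append(2 * capital[-1] if winner == "Paul" else 0)
    return capital

print(martingale(16, ["Paul", "Paul"]))   # [16, 32, 64] -- the values in red
print(martingale(16, ["Paul", "Peter"]))  # [16, 32, 0]
```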

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 28

28

Selecting a strategy for Skeptic (martingale)

In the probability case, the martingale is a likelihood ratio. This is how the notion of an alternative hypothesis enters the picture.
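A minimal sketch of that connection, assuming a fair-coin null P and an alternative Q that puts probability 0.7 on heads (the numbers are illustrative): betting at P's prices in the proportions recommended by Q gives a capital process equal to the running likelihood ratio Q/P.

```python
def likelihood_ratio_martingale(outcomes, q_heads=0.7, p_heads=0.5):
    """Skeptic's capital when each round's outcome is bought at the null's
    prices in the proportions the alternative Q recommends; after n rounds
    the capital equals the likelihood ratio Q(data)/P(data)."""
    capital = [1.0]
    for x in outcomes:  # 'H' or 'T'
        q = q_heads if x == "H" else 1 - q_heads
        p = p_heads if x == "H" else 1 - p_heads
        capital.append(capital[-1] * q / p)
    return capital

print(likelihood_ratio_martingale("HHHTH"))  # grows when the data favor Q
```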

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 29

29

The scope of the generalization:

  • 1. Forecast on each round may fall short of a complete probability distribution for Reality's next move.

  • 2. Forecaster need not follow a strategy.
  • 3. Skeptic need not follow a strategy.
  • 4. Skeptic need not set a goal for multiplying his capital.
  • 5. But the stopping time must be fixed.

Classical Cournot principle

Meaning of probability model = Event of small probability 1/K selected in advance will not happen.

Game‐theoretic Cournot principle

Meaning of forecasts = Skeptic will not multiply capital risked by large factor.

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 30

30

Cheating by pretending you stopped earlier…

On each round, Forecaster gives 50-50 odds on Paul beating Peter. To refute Forecaster, Skeptic bets all his money on Paul each time. He wins his first 10 bets, turning his $1 into $1024. He should have stopped, but instead he bets once more and loses.

Which result best reports the evidence against Forecaster? $1 → $1024? Or $1 → $0?

[Figure: Skeptic's capital doubling from $1 to $2, $4, $8, …, $1024 as Paul keeps winning, then falling to $0 on the eleventh bet.]

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 31

31

Part 3. Dynamic nature of game-theoretic testing

We use this protocol to fix ideas, but this lecture's ideas apply to any instantiation of our three-player game.
SLIDE 32

32

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 33

33

Part 3. Dynamic nature of game‐theoretic testing

SLIDE 34

Part 4. Calibrating p‐values

34

  • Averaging stopped versions of Skeptic’s play
  • The square root calibrator
SLIDE 35

Part 4. Calibrating p‐values

35

SLIDE 36

Part 4. Calibrating p‐values

36

This is a special case of the general three‐player game, with Honest Skeptic playing the role of Skeptic. Glenn’s move is auxiliary information provided on each round by Reality.

SLIDE 37

Part 4. Calibrating p‐values

37

SLIDE 38

Part 4. Calibrating p‐values

38

SLIDE 39

Part 4. Calibrating p‐values

39

SLIDE 40

Part 4. Calibrating p‐values

40

With this calibrator, a cheating p‐value of 0.002 is reduced to about 0.05.

p-value      1/p          capital multiplied by    adjusted p-value
0.05         20           3                        0.3
0.02         50           6                        0.2
0.01         100          9                        0.1
0.002        500          21                       0.05
0.001        1,000        31                       0.03
0.0001       10,000       99                       0.01
0.00001      100,000      315                      0.003
0.000001     1,000,000    999                      0.001
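The rows are consistent with the calibrator k(p) = 1/√p − 1 (a formula I inferred from the numbers, not copied from the image slides); such a k is a legitimate calibrator because it integrates to 1 over (0, 1], and the adjusted p-value is roughly 1/k(p).

```python
import math

def calibrate(p):
    """Capital factor and adjusted p-value under the assumed calibrator
    k(p) = 1/sqrt(p) - 1 (inferred from the table, not quoted from the talk)."""
    factor = 1 / math.sqrt(p) - 1
    return factor, 1 / factor

for p in (0.05, 0.02, 0.01, 0.002, 0.001, 0.0001, 0.00001, 0.000001):
    factor, adjusted = calibrate(p)
    print(f"p = {p:<9} capital factor ~ {factor:7.0f}   adjusted p ~ {adjusted:.1g}")
```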

SLIDE 41

Part 4. Calibrating p‐values

41

SLIDE 42

Part 4. Calibrating p‐values

42

SLIDE 43

Part 4. Calibrating p‐values

43

For low p‐values, the second calibrator justifies slightly stronger evidential claims.

p-value      1/p          first calibrator            second calibrator
                          factor    adjusted p        factor    adjusted p
0.05         20           3         0.3               4         0.2
0.02         50           6         0.2               7         0.2
0.01         100          9         0.1               9         0.1
0.002        500          21        0.05              26        0.04
0.001        1,000        31        0.03              42        0.02
0.0001       10,000       99        0.01              236       0.004
0.00001      100,000      315       0.003             1,509     0.0007
0.000001     1,000,000    999       0.001             10,478    0.0001

SLIDE 44

Part 4. Calibrating p‐values

44

When to use a calibrator

  • To discount a test result reported by someone who looked back to see when he had the strongest evidence.
  • To discount any p-value if a level of significance was not adopted in advance.

When to discount even more

  • When the reported p-value has been selected as the most significant of many p-values.

SLIDE 45

Extra Slides

45

SLIDE 46

46

Strategy for Skeptic (Ville 1939):

Part 2. Game‐theoretic probability

SLIDE 47

47

Generalizations

Part 2. Game‐theoretic probability

SLIDE 48

48

In practice, only finitely many rounds.

Strong (infinitary) law of large numbers: Borel 1909
Weak (finitary) law of large numbers: Bernoulli 1713

Part 2. Game‐theoretic probability

SLIDE 49

49

The analogy is not perfect:

Part 2. Game‐theoretic probability

SLIDE 50

50

Recall the definition of upper expectation from Lecture 1. So classical statistical testing already uses upper expectations. Upper expectations can also be characterized as suprema.

Part 2. Game‐theoretic probability

SLIDE 51

51

Classical statistical testing uses upper expectations:

Part 2. Game‐theoretic probability