

SLIDE 1

General Overview Glymour’s Argument Summary and Conclusions

Bayesians Can Learn From Old Data

William H. Jefferys University of Texas at Austin University of Vermont 27th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2007

Jefferys Old Data

SLIDE 2

Abstract

In a paper that has been widely cited within the philosophy of science community, Glymour claims to show that Bayesians cannot learn from old data. His argument contains elementary errors, ones which E. T. Jaynes and others have often warned about. I explain exactly where Glymour went wrong, and how to handle the problem correctly. When the problem is fixed, it is seen that Bayesians, just like logicians, can indeed learn from old data.

SLIDE 3

Outline

1. General Overview: Standard Logic; Probability and Logic; A Toy Example
2. Glymour's Argument: Counterexample to Glymour's Argument; Glymour's Friend; Where Glymour Went Wrong
3. Summary and Conclusions

SLIDE 4

Logic is Fundamental

Combine propositions A, B, … having truth-values with the logical operations ∧, ∨, →, ¬ to calculate truth-values of the combined propositions. Time-independent: the results do not depend on when we learn the truth-values.

SLIDE 5

Cox’s Theorem

Probability theory extends logic to degrees of plausibility in [0, 1].

Probability theory is the unique extension of logic to degrees of plausibility, given some obvious requirements (Cox).

SLIDE 6

Jaynes’ Desiderata

Jaynes proposes three desiderata that must be satisfied by any reasonable extension of logic to a theory of plausibility:

1. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
2. The calculation takes into account all of the evidence relevant to the question. It does not arbitrarily ignore some of the information, basing its conclusion only on what remains. It is "completely nonideological."
3. Equivalent states of knowledge are always represented by equivalent plausibility assignments.

SLIDE 7

A Toy Example

Two theories, T and ¬T. Two possible observations, E and ¬E.

T → E    ¬T → ¬E

Logic is time-independent; these relations don't depend on when we learn the truth or falsity of E.

SLIDE 8

Example: Relativity vs. Newtonian Physics

T is general relativity. It predicts that we will observe E = anomalous perihelion motion of Mercury. ¬T is Newtonian physics. It predicts that we will observe ¬E = no anomalous perihelion motion. Assume we observe E or ¬E with 100% certainty. Then

T → E    ¬T → ¬E

SLIDE 9

We Can Learn From Logic and Knowledge of E

We know E is true. From logic we conclude:

¬T → ¬E. Therefore, by contraposition, E → T. Therefore, T is true and ¬T is false.
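The contrapositive step can be checked mechanically. A minimal sketch in Python (my own encoding, not part of the talk) brute-forces the equivalence of ¬T → ¬E and E → T over all truth assignments:

```python
# Material implication: a -> b is (not a) or b.
def implies(a: bool, b: bool) -> bool:
    return (not a) or b

# notT -> notE is logically equivalent to its contrapositive E -> T,
# so once E is known to be true, T must be true as well.
for T in (False, True):
    for E in (False, True):
        assert implies(not T, not E) == implies(E, T)

print("contrapositive verified over all truth assignments")
```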

SLIDE 10

The Likelihood

Logic translates into probability statements. T → E yields

P(E | T) = 1    P(¬E | T) = 0

¬T → ¬E yields

P(E | ¬T) = 0    P(¬E | ¬T) = 1

Up to a common factor, these four equations are the likelihood function for the two cases of observing E and ¬E.
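These likelihood values can be wired straight into Bayes' theorem. A minimal numerical sketch (the dictionary encoding and function name are mine, not the talk's):

```python
# Likelihood table for the toy model: P(data | theory).
# T -> E gives P(E|T)=1, P(notE|T)=0; notT -> notE gives P(E|notT)=0, P(notE|notT)=1.
LIKELIHOOD = {
    ("E", "T"): 1.0, ("notE", "T"): 0.0,
    ("E", "notT"): 0.0, ("notE", "notT"): 1.0,
}

def posterior_T(data: str, prior_T: float) -> float:
    """Posterior probability of T after observing `data`, via Bayes' theorem."""
    num = LIKELIHOOD[(data, "T")] * prior_T
    den = num + LIKELIHOOD[(data, "notT")] * (1.0 - prior_T)
    return num / den

print(posterior_T("E", 0.5))     # 1.0: observing E makes T certain
print(posterior_T("notE", 0.5))  # 0.0: observing notE rules T out
```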

SLIDE 11

The Argument

Know that E is true. Therefore, P(E) = 1 (???). Therefore, P(E | T) = 1. Therefore

P(T | E) = P(E | T) P(T) / P(E) = P(T)

Since P(T | E) = P(T), we haven't learned anything.
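Numerically, the argument on this slide amounts to the following (a sketch with an illustrative prior; the variable names are mine):

```python
prior_T = 0.5        # any prior will do; illustrative
p_E_given_T = 1.0    # from T -> E
p_E = 1.0            # the questionable step: E is "old data", so P(E) is set to 1

# Bayes' theorem with those numbers:
posterior = p_E_given_T * prior_T / p_E

assert posterior == prior_T  # no update: "we haven't learned anything"
print(posterior)             # 0.5
```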

SLIDE 12

Violates Desideratum #1 and Cox’s Theorem

Glymour’s argument says that unless P(T) = 1, we can’t conclude that P(T | E) = 1, i.e., that T is true given E. But logic tells us that, given E, T is true, period. Cox’s Theorem and Desideratum #1 guarantee that all valid calculations must arrive at the same conclusion. The calculation from logic is manifestly correct; therefore Glymour’s argument cannot be valid.

SLIDE 13

Directly Contradicts Logic

Glymour’s questionable assumption that P(E) = 1 generates a contradiction with logic. If P(E) = 1, then P(E | X) = 1 for any X. In particular, P(E | ¬T) = 1. Therefore, ¬T → E. But we know that ¬T → ¬E. Glymour’s argument leads to an absurd conclusion: that Newtonian physics predicts anomalous perihelion motion of Mercury.

Therefore, P(E) ≠ 1, and Glymour’s argument falls apart.

SLIDE 14

It’s Even More Absurd

Glymour’s logic goes even further: it says that if E is old data, then for any theory X whatsoever, P(E | X) = 1, so X → E, and thus X predicts that we will observe E. This is obviously absurd; the predictions of a theory depend only on the theory, and are independent of observations we may or may not have made.

Under Glymour’s reasoning, if E is “old data”, then every theory X is an example of Jaynes’ dreaded “Sure Thing®” theory, under which E is just what the theory predicts will be observed.

SLIDE 15

Violates Desideratum #3

“Wigner’s Friend” shows how people with different initial states of knowledge arrive at the same conclusions when their knowledge is made to coincide. Glymour’s friend Tom is ignorant of E, so he is entitled to regard E, when he learns it, as “new” data. Suppose Tom chooses P(T) = 1/2, informs Glymour, and Glymour agrees on this prior. Using Bayes’ theorem, Tom can calculate in advance that if E is observed, then T is true, and if ¬E is observed, then ¬T is true.

SLIDE 16

Violates Desideratum #3

When he is informed that E was observed, Tom concludes that P(T | E) = 1, so T is true. Glymour concludes only that P(T | E) = P(T) = 1/2. Tom and Glymour have the same priors and now know the same relevant facts, but they have reached different conclusions. This violates Jaynes’ Desideratum #3.
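The disagreement can be made concrete. A sketch (my own arithmetic, following the slide's numbers) of the two calculations side by side:

```python
prior_T = 0.5  # the prior Tom and Glymour agree on

# Tom conditions on E with the correct likelihoods P(E|T)=1, P(E|notT)=0:
tom = (1.0 * prior_T) / (1.0 * prior_T + 0.0 * (1.0 - prior_T))

# Glymour, having set P(E) = P(E|T) = 1, never moves off the prior:
glymour = prior_T

print(tom, glymour)  # 1.0 vs 0.5: same priors, same facts, different conclusions
```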

SLIDE 17

All Probability Is Conditional

Jaynes: a fruitful source of error, and even of apparent paradoxes, in probability theory is the failure to condition properly and explicitly on all background information used. Let B represent all background information except E. For example, in the toy example B includes the physics, e.g., T → E; B can also be regarded as including the priors P(T | B), … This point of view makes Glymour’s error embarrassingly obvious: just systematically insert the conditioning information E, B that he actually used into his proof.

SLIDE 18

What Glymour Actually Proved

P(E | E, B) = 1 (!!!)

P(E | E, T, B) = 1

P(T | E, E, B) = [P(E | E, T, B) / P(E | E, B)] · P(T | E, B) = P(T | E, B)

Glymour has actually proved that no one can abuse the Bayesian machinery by using the same evidence twice.
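The "same evidence twice" point can be demonstrated with a small update function (a sketch under my own naming, not the talk's):

```python
def update(prior_T: float, lik_T: float, lik_notT: float) -> float:
    """One Bayesian update of P(T) from the likelihoods under T and notT."""
    num = lik_T * prior_T
    return num / (num + lik_notT * (1.0 - prior_T))

# A genuine update: condition on E using P(E|T)=1, P(E|notT)=0.
p = update(0.5, 1.0, 0.0)        # P(T | E, B) = 1

# "Using E twice": relative to a state already conditioned on E,
# P(E | E, T, B) = P(E | E, B) = 1, so the update multiplies by 1/1
# and leaves every probability unchanged.
for prior in (0.2, 0.5, p):
    assert update(prior, 1.0, 1.0) == prior

print(p)   # 1.0
```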

SLIDE 19

It’s Logic, Not Epistemology

Glymour’s failure to condition explicitly on all the background information he used misled him into thinking that P(E) is an epistemological statement about our knowledge of E at a particular point t in time (he even used subscripts t to make this point). This viewpoint is incorrect. P(E | B) is a logical relationship between B and E; its value depends on B. It gives the probability of observing E when we know only the theory and the priors, under the mixture model defined by the likelihood and priors.

SLIDE 20

The Toy Example

The background information B for the toy example consists of the physics (T → E and ¬T → ¬E) and the priors (P(T | B) and P(¬T | B)). This yields

P(E | B) = P(E | T, B) P(T | B) + P(E | ¬T, B) P(¬T | B) = P(T | B)
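The total-probability computation above, as a numerical sketch (the prior value is illustrative and the names are mine):

```python
prior_T = 0.3                            # P(T | B); any value illustrates the point
p_E_given_T, p_E_given_notT = 1.0, 0.0   # from T -> E and notT -> notE

# Law of total probability over the two theories:
p_E = p_E_given_T * prior_T + p_E_given_notT * (1.0 - prior_T)

assert p_E == prior_T                    # P(E | B) = P(T | B), as derived
print(p_E)                               # 0.3
```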

SLIDE 21

Violates Desideratum #2

Logic deduces T from E by using ¬T → ¬E. Glymour’s argument does not use that fact, and even denies that it is true. The correct Bayesian calculation makes use of that information, in the form P(E | ¬T, B) = 0, when calculating the denominator P(E | B). Misled by poor notation, Glymour “jumped to a conclusion” and never considered the ¬T part of the likelihood, thus ignoring crucial information. He didn’t condition carefully, and the consequence was that he also violated Jaynes’ Desideratum #2.

SLIDE 22

Jaynes Was Right

Jaynes was right. Probability theory is a generalization of ordinary logic; probability theory and logic must give consistent results.

“Time plays the same role in probability theory as it does in logic: That is to say, no role whatsoever” — Tom Loredo

There is one, and only one, Bayesian way to take into account your knowledge of a particular piece of information: to condition on it. You leave out explicit conditioning at your peril!

SLIDE 23

Philosophy of Science and the Likelihood Function

The philosophy of science community seems unclear on the exact nature of the likelihood. They routinely regard it as a probability whose absolute value is relevant, rather than as an equivalence class of functions on the hypothesis space, for fixed data, that happens to be proportional to the sampling distribution. Arguments that depend on the absolute value of the denominator in Bayes’ theorem, rather than regarding it as a mere normalizing factor to be computed, are common. Glymour’s argument would probably not have gotten far had this community had a firmer grasp of the likelihood concept.

SLIDE 24

Thanks!

I thank Jim Berger, David van Dyk, and especially Rob Pennock and Tom Loredo for valuable discussions and suggestions. I dedicate this paper to the memory of Edwin T. Jaynes. I was not fortunate enough to know him personally, but I have learned much from his writings.
