Probability and Likelihood, a brief introduction in support of a course on molecular evolution (BIOL 3046)
A statistical definition of probability: frequentist concepts
Probability

The subject of probability is a branch of mathematics dedicated to building models that describe conditions of uncertainty and chance.
Month | Subjects (S) | Controlled (E) | Cumulative S | Cumulative E | crf
    1 |          100 |             80 |          100 |           80 | 0.800
    2 |          100 |             88 |          200 |          168 | 0.840
    3 |          100 |             75 |          300 |          243 | 0.810
    4 |          100 |             77 |          400 |          320 | 0.800
    5 |          100 |             80 |          500 |          400 | 0.800
    6 |          100 |             76 |          600 |          476 | 0.793
    7 |          100 |             82 |          700 |          558 | 0.797
    8 |          100 |             79 |          800 |          637 | 0.796
    9 |          100 |             80 |          900 |          717 | 0.797
   10 |          100 |             76 |         1000 |          793 | 0.793
   11 |          100 |             77 |         1100 |          870 | 0.791
   12 |          100 |             78 |         1200 |          948 | 0.790

crf = cumulative relative frequency (Cumulative E / Cumulative S). [Data for this example are after McColl (1995).]
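The crf column above is just the running total of controlled cases divided by the running total of subjects. As a minimal sketch (the monthly counts are taken from the table; variable names are mine), the column can be recomputed like this:

```python
# Recompute the cumulative relative frequencies (crf) from the monthly
# counts in the table above (data after McColl 1995).
controlled = [80, 88, 75, 77, 80, 76, 82, 79, 80, 76, 77, 78]

cum_s, cum_e = 0, 0
for month, e in enumerate(controlled, start=1):
    cum_s += 100          # 100 subjects are enrolled each month
    cum_e += e            # running total of controlled cases
    print(f"month {month:2d}: crf = {cum_e / cum_s:.3f}")
```

The final line printed reproduces the last table entry, 948/1200 = 0.790.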
P(E) = lim (n → ∞) n(E) / n

In words, the probability of an event E, written as P(E), is the long-run (cumulative) relative frequency of E: the number of occurrences of E, n(E), divided by the number of trials, n, as n grows without bound.
[Figure: hypothetical plot of the crf of an event; the crf fluctuates early and settles toward a limiting value as n runs from 0 to 10,000.]
Probability axioms:

1. For any event E, P(E) ≥ 0.
2. All possible outcomes together form the sample space, the "space of all possible outcomes" (S), where P(S) = 1. Hence, if the probability of an event is 1, the event is certain to occur.
3. The probability of disjoint (mutually exclusive) events E or F is P(E or F) = P(E) + P(F). Countable additivity is the extension of axiom 3 to an infinite sequence of disjoint events.
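The axioms can be checked on a toy sample space. A minimal sketch, using one roll of a fair die (the example is mine, not from the notes):

```python
from fractions import Fraction

# One roll of a fair die: sample space S = {1, ..., 6}, each outcome 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

assert all(prob >= 0 for prob in p.values())   # axiom 1: P(E) >= 0
assert sum(p.values()) == 1                    # axiom 2: P(S) = 1

# Axiom 3: E = "roll a 1" and F = "roll a 2" are disjoint (they cannot
# co-occur), so P(E or F) = P(E) + P(F).
p_e_or_f = p[1] + p[2]
assert p_e_or_f == Fraction(1, 3)
```

Using Fraction avoids floating-point rounding, so the axioms hold exactly.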
Product rule:

The product rule applies when two events E1 and E2 are independent. E1 and E2 are independent if the occurrence or non-occurrence of E1 does not change the probability of E2, and vice versa. (A formal statistical definition uses the multiplication theorem.) It is important to note that proving statistical independence for a specific case via the multiplication theorem is rarely possible; hence, most models incorporate independence as a model assumption. When E1 and E2 occur together they are joint events. The joint probability of the independent events E1 and E2 is P(E1, E2) = P(E1) × P(E2); hence the names "product rule" and "multiplication principle".
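The product rule can be verified by brute force for two dice rolls. A sketch (my own illustration): enumerating all 36 equally likely joint outcomes and comparing against the product of the marginals.

```python
from fractions import Fraction
from itertools import product

# Two independent rolls of a fair die: each single-roll outcome has
# probability 1/6, and each of the 36 joint outcomes has probability 1/36.
p_single = Fraction(1, 6)
joint = {pair: Fraction(1, 36) for pair in product(range(1, 7), repeat=2)}

# Product rule check for one joint event, e.g. (first roll 3, second roll 5):
assert joint[(3, 5)] == p_single * p_single

# Sanity check (axiom 2): the joint probabilities sum to 1.
assert sum(joint.values()) == 1
```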
There is a logically satisfying result here: since the two rolls are independent, the outcome of the first roll should not matter, and indeed the probability of the second roll, conditional on the first roll, is 1/6.
P(k heads in n tosses) = [n! / (k!(n − k)!)] × p^k × (1 − p)^(n − k)

where p is the probability of heads on a single toss and n!/(k!(n − k)!) is the binomial coefficient, "n choose k".
Case 1: probability

The question is: "If I toss a fair coin 12 times, what is the probability that I will obtain 5 heads and 7 tails?" The answer comes directly from the formula above with n = 12 and k = 5: the probability of such a future event is 0.193359. Axiom 2 holds: P(S) = 1; the probabilities of all possible outcomes (i.e., 0 to 12 heads) sum to 1.
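The quoted value is easy to reproduce. A minimal sketch of the Case 1 calculation:

```python
from math import comb

# P(5 heads in 12 tosses of a fair coin) = C(12, 5) * (1/2)^12
p = comb(12, 5) * 0.5**12
print(p)  # 792 / 4096 = 0.193359375, the 0.193359 quoted above

# Axiom 2 check: the probabilities of all outcomes (0..12 heads) sum to 1.
total = sum(comb(12, k) * 0.5**12 for k in range(13))
print(total)
```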
[Figure: binomial distribution of the number of heads in 12 tosses of a fair coin; the bar for our outcome of 5 heads & 7 tails has probability 0.1933.]
Case 2: likelihood

The second question is: "What is the probability that my coin is fair, given that I tossed it 12 times and obtained 5 heads and 7 tails?"
We have inverted the problem: In case 1: we were interested in the probability of a future outcome given that my coin is fair. In case 2: we are interested in the probability that my coin is fair, given a particular outcome. So, in the likelihood framework we have inverted the question such that the hypothesis (H) is variable, and the outcome (let’s call it the data, D) is constant.
A problem: what we want to measure is P(H|D). The problem is that we cannot work with the probability of a hypothesis directly, only with the relative frequencies of outcomes, i.e., the data.
The solution: define the likelihood of a hypothesis given the data as proportional to the probability of the data given the hypothesis:

L(H|D) = α × P(D|H)

where α is a constant value of proportionality. Because α is the same for every hypothesis, it cancels whenever likelihoods are compared, so only relative likelihoods are meaningful.
Applying P(k | n, p) = [n!/(k!(n − k)!)] p^k (1 − p)^(n − k) with n = 2 tosses:

PROBABILITIES
                      Data:  D1: 1H & 1T    D2: 2H
Hypotheses
H1: p(H) = 1/4               0.375          0.0625
H2: p(H) = 1/2               0.5            0.25
The corresponding likelihoods attach a constant of proportionality to each data set (α1 for D1, α2 for D2):

LIKELIHOODS
                      Data:  D1: 1H & 1T    D2: 2H
Hypotheses
H1: p(H) = 1/4               α1 × 0.375     α2 × 0.0625
H2: p(H) = 1/2               α1 × 0.5       α2 × 0.25
Coin toss: What is the likelihood that my coin is "fair", given 12 tosses yielding 5 heads and 7 tails? Is the hypothesis of "fairness" the best explanation of these data?

P(H|D) = α × P(D|H)
L = α × P(D|H)

Setting α = 1 for convenience gives L = P(D|H).
L(p | D) = [n!/(k!(n − k)!)] × p^k × (1 − p)^(n − k)

Here the data D fix the outcome (n and k, and hence "n choose k"), while the hypothesis H is the value of the parameter p.
Maximum Likelihood score = 0.228
[Figure: likelihood curve of p for 5 heads in 12 tosses; L rises to a peak of about 0.228 near p = 0.42 and falls toward 0 as p approaches 0 or 1.]
ML estimate of p = 5/12 ≈ 0.42
The likelihood that the coin is fair (p = 0.5) is 0.193. This (p = 0.5) is less likely than the MLE by a factor of about 1.18 (0.228 / 0.193 ≈ 1.18).
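The MLE and the two likelihood values can be recovered by a simple grid search over p. A sketch (grid resolution is my choice; a closed-form answer is p̂ = k/n = 5/12):

```python
from math import comb

def likelihood(p, n=12, k=5):
    # L(p | data) = C(n, k) p^k (1 - p)^(n - k); the constant C(n, k) is
    # shared by all p, so it does not affect which p maximises L.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Grid search over p in (0, 1); the maximum lies at p = 5/12 ~ 0.4167.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)

print(p_hat)                          # ~0.417, the "0.42" quoted above
print(likelihood(p_hat))              # ~0.2286, the maximum likelihood score
print(likelihood(0.5))                # ~0.1934, likelihood of a fair coin
print(likelihood(p_hat) / likelihood(0.5))   # ratio ~1.18
```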
p = 0.5 (L = 0.193). Recall Case 1: the probability of obtaining 5 heads and 7 tails in 12 tosses of a fair coin was also 0.193359. With α = 1, the likelihood of the hypothesis given the data equals the probability of the data given the hypothesis: both are 0.1933.