Mathematical Tools for Neural and Cognitive Science
Probability & Statistics: Intro, summary statistics, probability
Fall semester, 2018
- Efron & Tibshirani, Introduction to the Bootstrap, 1998
Some history:
- 1600s: …
- … estimation/decision theory (e.g., Shannon, Wiener, etc.)
- … (Wald, Savage, Jaynes, …)
- … (Tukey)
- … (statistical inference + lots of data)
The scientific cycle:
Create/modify hypothesis/model → Generate predictions, design experiment → Observe / measure data → Summarize/fit model(s), compare with predictions → (repeat)
Summary statistics: central tendency and dispersion.

The mean is the value minimizing the average squared deviation from the data:
$$\mu(\vec{x}) = \arg\min_c \frac{1}{N}\sum_{n=1}^{N}(x_n - c)^2 = \frac{1}{N}\sum_{n=1}^{N} x_n$$

More generally, one can minimize the $L^p$ deviation:
$$\arg\min_c \left[\frac{1}{N}\sum_{n=1}^{N}\lvert x_n - c\rvert^p\right]^{1/p}$$
For $p = 2$ this gives the mean $\mu(\vec{x})$; for $p = 1$, the median $m(\vec{x})$; and for $L^\infty$, the midpoint of the range.

A dispersion statistic $d(\vec{x})$ summarizes spread. The standard deviation is the minimal root-mean-squared deviation, achieved at the mean:
$$\sigma(\vec{x}) = \min_c \left[\frac{1}{N}\sum_{n=1}^{N}(x_n - c)^2\right]^{1/2} = \left[\frac{1}{N}\sum_{n=1}^{N}\bigl(x_n - \mu(\vec{x})\bigr)^2\right]^{1/2}$$
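These minimizers can be checked numerically. A minimal sketch in numpy (the brute-force grid search and the exponential example data are illustrative, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)          # skewed data, so mean != median

# L^p cost of a candidate center c
def lp_cost(c, x, p):
    return np.mean(np.abs(x - c) ** p) ** (1.0 / p)

cs = np.linspace(x.min(), x.max(), 10001)   # brute-force grid search over c
for p, target in [(1, np.median(x)), (2, np.mean(x))]:
    c_star = cs[np.argmin([lp_cost(c, x, p) for c in cs])]
    print(p, c_star, target)   # grid minimizer matches the median (p=1), the mean (p=2)
```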
From data to a probability model: data $\{x_n\}$ → histogram $\{c_k, h_k\}$ (bin centers and counts) → probability distribution $p(x)$.
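A minimal sketch of that pipeline in numpy (the Gaussian sample data and bin count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10000)                  # data {x_n}

h, edges = np.histogram(x, bins=50)         # histogram: counts h_k
c = 0.5 * (edges[:-1] + edges[1:])          # bin centers c_k
p = h / (h.sum() * np.diff(edges))          # normalize counts to a density p(x)
print(np.sum(p * np.diff(edges)))           # ≈ 1.0
```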
probabilistic model $p_\theta(\vec{x})$ ⇄ data $\{\vec{x}_n\}$
Measurement: the model generates data. Inference: from observed data, estimate the model.
Probabilistic model:
In Middleville, every family has two children, brought by the stork. The stork delivers boys and girls randomly, with family probabilities {BB, BG, GB, GG} = {0.2, 0.3, 0.2, 0.3}. You pick a family at random and discover that one of the two children is a boy. What are the chances that the other child is a girl?

Inference:
In Middleville, every family has two children, brought by the stork. In a survey of 100 of the Middleville families, 32 have two girls, 23 have two boys, and the remainder have one of each. What can you infer about the stork's delivery probabilities?
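The model question can be answered by direct enumeration. A small sketch (it assumes "discovering" a child means learning that at least one of the two is a boy):

```python
# Family types and their probabilities, as given in the model.
families = {"BB": 0.2, "BG": 0.3, "GB": 0.2, "GG": 0.3}

# P(at least one boy)
p_boy = sum(p for kids, p in families.items() if "B" in kids)

# P(at least one boy AND the other child is a girl) -- the mixed families
p_boy_and_girl = families["BG"] + families["GB"]

print(p_boy_and_girl / p_boy)   # P(other is a girl | at least one boy) = 0.5/0.7 ≈ 0.714
```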
Notation:
- Let $X, Y, Z$ be random variables. They can take on values (like 'heads' or 'tails'; or integers 1-6; or real-valued numbers).
- Let $x, y, z$ stand generically for values they can take, and denote events such as $X = x$.
- Write the probability that $X$ takes on value $x$ as $P(X = x)$, or $P_X(x)$, or sometimes just $P(x)$.
- $P(x)$ is a function over values $x$, which we call the probability "distribution" function (pdf); for continuous variables, the "density".
[Useful to have this notation up on the slide while introducing concepts on the board]
Discrete random variable: distribution $P(x)$, with $\sum_i P(x_i) = 1$.
Continuous random variable: density $p(x)$, with $\int_{-\infty}^{\infty} p(x)\,dx = 1$.

[Figure: example distributions — a not-quite-fair coin; roll of a fair die; sum of two rolled fair dice; clicks of a Geiger counter in a fixed time interval (and the time between clicks); horizontal velocity of gas molecules exiting a fan]
The expected value of $X$ [the mean, $\mu$]:
$$\mu = E(X) = \sum_{i=1}^{N} x_i\, p(x_i)$$

[Figure: example distribution $P(x)$ over # of credit cards]

More generally, for a function $f$ of $X$:
$$E(f(X)) = \sum_{i=1}^{N} f(x_i)\, p(x_i)$$
For continuous variables:
$$E(x) = \int x\, p(x)\, dx \qquad \text{[mean, $\mu$]}$$
$$E(x^2) = \int x^2\, p(x)\, dx \qquad \text{[``second moment'', $m_2$]}$$
$$E\!\left((x-\mu)^2\right) = \int (x-\mu)^2\, p(x)\, dx = \int x^2\, p(x)\, dx - \mu^2 \qquad \text{[variance, $\sigma^2$, equal to $m_2$ minus $\mu^2$]}$$
$$E(f(x)) = \int f(x)\, p(x)\, dx \qquad \text{[``expected value of $f$'']}$$

Note: expectation is an inner product (of $f$ with $p$), and thus linear:
$$E(a f(x) + b g(x)) = a\, E(f(x)) + b\, E(g(x))$$
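A quick numerical check of the $\sigma^2 = m_2 - \mu^2$ identity (a sketch; the fair-die example distribution is illustrative):

```python
import numpy as np

x = np.arange(1, 7)                 # faces of a fair die
p = np.full(6, 1/6)                 # p(x)

mu = np.sum(x * p)                  # mean, E(X)
m2 = np.sum(x**2 * p)               # second moment, E(X^2)
var = np.sum((x - mu)**2 * p)       # variance, E((X - mu)^2)

print(mu, m2, var, m2 - mu**2)      # var == m2 - mu^2  (3.5, 15.17, 2.92, 2.92)
```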
[Figures: distributions $p(x)$ and their cumulative distributions $c(x) = \int_{-\infty}^{x} p(x')\,dx'$]
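The cumulative is just a running sum (discrete) or integral (continuous) of $p$. A one-line discrete sketch with a made-up distribution:

```python
import numpy as np

p = np.array([1, 2, 3, 4, 3, 2, 1], dtype=float)
p /= p.sum()                 # a discrete p(x)
c = np.cumsum(p)             # cumulative c(x), rises monotonically from 0 to 1
print(c[-1])                 # 1.0
```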
[on board]
Examples (drawing from a standard deck of cards):
- P(Ace)
- P(Heart)
- P(Ace & Heart)
- P(Ace | Heart)
- P(not Jack of Diamonds)
- P(Ace | not Jack of Diamonds)
"Independence"
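All of these can be checked by brute-force enumeration of the 52 cards (a sketch; the helper `prob` and the card encoding are illustrative):

```python
from fractions import Fraction

ranks = ["A","2","3","4","5","6","7","8","9","10","J","Q","K"]
suits = ["hearts","diamonds","clubs","spades"]
deck = [(r, s) for r in ranks for s in suits]

def prob(event, given=None):
    """Count cards satisfying `event` among those satisfying `given`."""
    pool = [c for c in deck if given is None or given(c)]
    return Fraction(sum(event(c) for c in pool), len(pool))

print(prob(lambda c: c[0] == "A"))                                    # 1/13
print(prob(lambda c: c[1] == "hearts"))                               # 1/4
print(prob(lambda c: c == ("A", "hearts")))                           # 1/52
print(prob(lambda c: c[0] == "A", given=lambda c: c[1] == "hearts"))  # 1/13
print(prob(lambda c: c != ("J", "diamonds")))                         # 51/52
print(prob(lambda c: c[0] == "A", given=lambda c: c != ("J", "diamonds")))  # 4/51
```

Note that P(Ace | Heart) = P(Ace): rank and suit are independent. Conditioning on "not the Jack of Diamonds" breaks that, nudging P(Ace) up to 4/51.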
[Venn diagram: events A and B, their overlap A & B, and the region containing neither A nor B]

$$p(A\,|\,B) = \text{probability of $A$, given that $B$ is asserted to be true} = \frac{p(A \,\&\, B)}{p(B)}$$
Conditionalizing a joint distribution $p(x, y)$: slice the joint distribution, then normalize (by the marginal):
$$p(x\,|\,y=68) = \frac{p(x,\, y=68)}{\int p(x,\, y=68)\, dx} = \frac{p(x,\, y=68)}{p(y=68)}$$
More generally:
$$p(x\,|\,y) = p(x, y)\,/\,p(y)$$
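On a discretized joint, "slice" and "normalize" are one line each. A sketch using a made-up correlated Gaussian grid (the grid and the slice index 68 are illustrative):

```python
import numpy as np

# Discretized joint p(x, y) on a grid (correlated Gaussian, rho = 0.8).
xs = np.linspace(-3, 3, 200)
ys = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(xs, ys, indexing="ij")
joint = np.exp(-(X**2 - 1.6 * X * Y + Y**2) / (2 * (1 - 0.8**2)))
joint /= joint.sum()                      # normalize the joint

j = 68                                    # pick a y value (grid index)
slice_ = joint[:, j]                      # slice the joint at y = ys[j]
p_y = joint.sum(axis=0)                   # marginal p(y)
cond = slice_ / p_y[j]                    # p(x | y = ys[j]): normalize by marginal
print(cond.sum())                         # ≈ 1.0
```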
Bayes' rule:
$$p(A \,\&\, B) = p(B)\,p(A\,|\,B) = p(A)\,p(B\,|\,A) \;\;\Rightarrow\;\; p(A\,|\,B) = \frac{p(B\,|\,A)\,p(A)}{p(B)}$$
(a direct consequence of the definition of conditional probability)
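A one-line check of the identity using the deck-of-cards numbers above (illustrative):

```python
from fractions import Fraction

# A = "Ace", B = "Heart", using the deck probabilities computed earlier.
p_A, p_B = Fraction(1, 13), Fraction(1, 4)
p_B_given_A = Fraction(1, 4)             # suit is independent of rank

p_A_given_B = p_B_given_A * p_A / p_B    # Bayes' rule
print(p_A_given_B)                       # 1/13, matching the enumeration
```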
[Figure: conditional $p(x\,|\,y=120)$ compared with the marginal $p(x)$]
In general, the conditionals for different values of $Y$ differ. When are they the same? In particular, when are all conditionals equal to the marginal?
Random variables $X$ and $Y$ are statistically independent if (and only if):
$$p(x, y) = p(x)\,p(y) \quad \forall\, x, y$$
Independence implies that all conditionals are equal to the corresponding marginal.
[Note: for discrete distributions, the joint is then an outer product of the marginals!]
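For discrete variables this is literally np.outer (a minimal sketch; the two marginals are made up):

```python
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])          # marginal p(x)
p_y = np.array([0.1, 0.4, 0.4, 0.1])     # marginal p(y)

joint = np.outer(p_x, p_y)               # independent joint: p(x,y) = p(x)p(y)

# Every conditional p(x | y) equals the marginal p(x):
cond = joint / joint.sum(axis=0, keepdims=True)
print(np.allclose(cond, p_x[:, None]))   # True
```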
Let $Z = X + Y$. Since expectation is linear:
$$\mu_Z = \mu_X + \mu_Y$$
In addition, if $X$ and $Y$ are independent, then
$$\sigma_Z^2 = E\!\left((Z - \mu_Z)^2\right) = \sigma_X^2 + \sigma_Y^2$$
and $p_Z$ is a convolution of $p_X$ and $p_Y$.
[on board]
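A numerical check of both claims for the sum of two dice (a sketch; np.convolve plays the role of the convolution of $p_X$ and $p_Y$):

```python
import numpy as np

p_x = np.full(6, 1/6)                    # p_X: fair die, values 1..6
p_y = np.full(6, 1/6)                    # p_Y: another fair die

p_z = np.convolve(p_x, p_y)              # p_Z for Z = X + Y, values 2..12

vals_x = np.arange(1, 7)
vals_z = np.arange(2, 13)

var = lambda v, p: np.sum(v**2 * p) - np.sum(v * p)**2
print(var(vals_z, p_z), 2 * var(vals_x, p_x))   # variances add: ≈ 5.83 for both
```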
Central limit for a uniform distribution...
[Figure: histograms of $10^4$ samples of a uniform density ($\sigma = 1$); $(u+u)/\sqrt{2}$; $(u+u+u+u)/\sqrt{4}$; and 10 $u$'s divided by $\sqrt{10}$ — the standardized sums approach a Gaussian]
Central limit for a binary distribution...
[Figure: histograms of the average of 4, 16, 64, and 256 coin flips — again approaching a Gaussian]
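Both demos are easy to reproduce (a sketch; the sample counts and the choice of $n$ values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000

# Uniform case: sum n zero-mean, unit-variance uniforms, divide by sqrt(n).
for n in [1, 2, 4, 10]:
    u = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(N, n))   # sigma = 1
    z = u.sum(axis=1) / np.sqrt(n)
    print(n, z.std())            # std stays ~1 while the histogram becomes Gaussian

# Binary case: average of n fair coin flips.
for n in [4, 16, 64, 256]:
    flips = rng.integers(0, 2, size=(N, n))
    print(n, flips.mean(axis=1).std())   # ~ 0.5/sqrt(n); histogram → Gaussian
```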