Data Mining Techniques
CS 6220 - Section 3 - Fall 2016
Lecture 3: Probability
Jan-Willem van de Meent (credit: Zhao, CS 229, Bishop)
Project Vote
1. Freeform: develop your own project proposals
(30% of grade; homework 30%)
Homework 1 will be out today (due 30 Sep)
Must be completed individually (absolutely no sharing of code)
No late submissions
TAs have authority to deduct points
Maximum Likelihood
Log joint probability of N independent data points:
log p(x_1, …, x_N | θ) = Σ_{n=1}^N log p(x_n | θ)
The maximum likelihood estimate θ_ML maximizes this sum.
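As a quick illustration (a minimal sketch, not from the slides: it assumes Bernoulli-distributed coin flips and a hypothetical log_joint helper), the log joint of independent observations is a sum of per-point log probabilities, and maximizing it over a grid of µ values recovers the closed-form estimate µ_ML = mean(x):

```python
import numpy as np

# Log joint probability of N independent coin flips x_n ~ Bern(mu):
# log p(x_1..N | mu) = sum_n [x_n log(mu) + (1 - x_n) log(1 - mu)]
def log_joint(x, mu):
    x = np.asarray(x, dtype=float)
    return np.sum(x * np.log(mu) + (1 - x) * np.log(1 - mu))

x = np.array([1, 0, 1, 1, 0, 1])   # six flips, four heads
mus = np.linspace(0.01, 0.99, 99)
log_liks = [log_joint(x, mu) for mu in mus]

# The grid maximizer matches the closed form mu_ML = mean(x)
mu_ml = mus[np.argmax(log_liks)]
print(mu_ml, x.mean())             # both close to 4/6, i.e. 0.667
```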
What is the probability of rolling 1, 2, 3, 4, 5, 6 in order if we roll a die six times?
Suppose a fraction p of students like pizza. If three students are chosen at random with replacement, what is the probability that all three like pizza? By independence, p³.
[Figure: red bin with 2 apples and 6 oranges; blue bin with 3 apples and 1 orange; 12 fruit in total]
If I take a fruit from the red bin, what is the probability that I get an apple?
Conditional Probability
P(fruit = apple | bin = red) = 2/8
Joint Probability
P(fruit = apple, bin = red) = 2/12
Joint Probability
P(fruit = apple, bin = blue) = ?
Joint Probability
P(fruit = apple, bin = blue) = 3/12
Joint Probability
P(fruit = orange, bin = blue) = ?
Joint Probability
P(fruit = orange, bin = blue) = 1/12
Marginal Probability
P(fruit = apple) = P(fruit = apple, bin = blue) + P(fruit = apple, bin = red) = ?
P(fruit = apple) = 3/12 + 2/12 = 5/12
Product Rule
P(fruit = apple, bin = red) = P(fruit = apple | bin = red) P(bin = red) = ?
P(fruit = apple, bin = red) = P(fruit = apple | bin = red) P(bin = red) = 2/8 × 8/12 = 2/12
Product Rule (reversed)
P(fruit = apple, bin = red) = P(bin = red | fruit = apple) P(fruit = apple) = ?
P(fruit = apple, bin = red) = P(bin = red | fruit = apple) P(fruit = apple) = 2/5 × 5/12 = 2/12
Sum Rule: P(X) = Σ_Y P(X, Y)
Product Rule: P(X, Y) = P(Y | X) P(X)
Bayes' Rule: P(Y | X) = P(X | Y) P(Y) / P(X), i.e. posterior ∝ likelihood × prior
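A short numerical check of the sum, product, and Bayes rules on the fruit-bin example above (a sketch using Python's fractions module; the helper names joint, marginal_fruit, and cond_fruit_given_bin are illustrative, not from the lecture):

```python
from fractions import Fraction as F

# Counts from the bin example: red bin has 2 apples + 6 oranges,
# blue bin has 3 apples + 1 orange (12 fruit in total).
counts = {('apple', 'red'): 2, ('orange', 'red'): 6,
          ('apple', 'blue'): 3, ('orange', 'blue'): 1}
total = sum(counts.values())

def joint(fruit, bin_):                 # P(fruit, bin) from counts
    return F(counts[(fruit, bin_)], total)

def marginal_fruit(fruit):              # sum rule: P(fruit) = sum_bin P(fruit, bin)
    return sum(joint(fruit, b) for b in ('red', 'blue'))

def cond_fruit_given_bin(fruit, bin_):  # P(fruit | bin) = P(fruit, bin) / P(bin)
    p_bin = sum(joint(f, bin_) for f in ('apple', 'orange'))
    return joint(fruit, bin_) / p_bin

# Bayes' rule: P(bin | fruit) = P(fruit | bin) P(bin) / P(fruit)
p_red = F(8, 12)
p_red_given_apple = cond_fruit_given_bin('apple', 'red') * p_red / marginal_fruit('apple')
print(marginal_fruit('apple'))   # 5/12
print(p_red_given_apple)         # 2/5
```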
Probability of rare disease: 0.005
Probability of detection (true positive rate): 0.98
Probability of false positive: 0.05
What is the probability of disease when the test is positive?

P(positive) = 0.98 × 0.005 + 0.05 × 0.995 ≈ 0.0547
P(positive, disease) = 0.98 × 0.005 = 0.0049
P(disease | positive) = 0.0049 / 0.0547 ≈ 0.09
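The same calculation in code (a minimal sketch; the variable names are mine):

```python
# Bayes' rule for the rare-disease test, using the numbers above
p_disease = 0.005        # prior: P(disease)
p_pos_given_d = 0.98     # likelihood: P(positive | disease)
p_pos_given_not = 0.05   # false-positive rate: P(positive | no disease)

# Evidence via the sum rule: P(positive)
p_pos = p_pos_given_d * p_disease + p_pos_given_not * (1 - p_disease)

# Posterior: P(disease | positive)
posterior = p_pos_given_d * p_disease / p_pos
print(p_pos, posterior)  # roughly 0.0547 and 0.09
```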
Sample space Ω: the set of all outcomes ω ∈ Ω of an experiment
Event space F: the set of all possible events A ∈ F, which are subsets A ⊆ Ω of possible outcomes
Probability measure: a function P: F → R satisfying the axioms of probability

Properties:
If A ⊆ B, then P(A) ≤ P(B)
P(A ∩ B) ≤ min(P(A), P(B))
P(A ∪ B) ≤ P(A) + P(B) (Union Bound)
P(Ω \ A) = 1 − P(A)
If A_1, …, A_k is a disjoint partition of Ω, then Σ_{i=1}^k P(A_i) = 1
Conditional Probability and Independence
The probability of event A, conditioned on event B:
P(A | B) = P(A ∩ B) / P(B)
Events A and B are independent iff P(A ∩ B) = P(A) P(B), which implies P(A | B) = P(A).
Exercises
What is the probability P(B3)?
What is the probability P(B1 | B3)?
What is the probability P(B2 | A)?
What percent of those who passed the first test also passed the second test?
What is the probability that a house has a backyard, given that it has a garage?
Random Variables
Rolling a die: X ∈ {1, 2, 3, 4, 5, 6}
Rolling two dice at the same time: (X, Y) with X, Y ∈ {1, …, 6}

Probability Mass Function
A PMF is a function p: R → R such that p(x) = P(X = x)
Rolling a die: p(x) = 1/6 for each x ∈ {1, …, 6}
Rolling two dice at the same time: p(x, y) = 1/36 for each pair (x, y)
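A sketch that enumerates these PMFs exactly (names are illustrative; the PMF of the sum of two dice is obtained with the sum rule over the joint):

```python
from itertools import product
from collections import Counter
from fractions import Fraction as F

# PMF of a single die: p(x) = 1/6 for x in 1..6
die = range(1, 7)

# Joint PMF of two dice: p(x, y) = 1/36 for each ordered pair
joint = {(x, y): F(1, 36) for x, y in product(die, die)}

# PMF of the sum S = X + Y, by summing over outcomes (sum rule)
pmf_sum = Counter()
for (x, y), p in joint.items():
    pmf_sum[x + y] += p

print(pmf_sum[7])              # 6/36 = 1/6, the most likely sum
print(sum(pmf_sum.values()))   # 1, as a PMF must
```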
[Figure: joint distribution p(X, Y) with Y ∈ {1, 2}, marginals p(X) and p(Y), and conditional p(X | Y = 1)]
[Figure: probability density p(x) and cumulative distribution P(x); the probability of falling in an interval (x, x + δx) is approximately p(x) δx]
[Table: terminology in Statistics vs Machine Learning]
Mean: E[x] = Σ_x x p(x)
Variance: var[x] = E[(x − E[x])²] = E[x²] − E[x]²
Covariance: cov[x, y] = E[(x − E[x])(y − E[y])] = E[xy] − E[x] E[y]
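These quantities are easy to estimate from samples; a minimal numpy sketch (the data here are synthetic, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)
y = 0.5 * x + rng.normal(size=10_000)   # correlated with x by construction

print(x.mean())            # estimate of E[x], close to 2.0
print(x.var())             # estimate of var[x] = E[x^2] - E[x]^2, close to 2.25
print(np.cov(x, y)[0, 1])  # estimate of cov[x, y], close to 0.5 * var[x] = 1.125
```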
Bernoulli Distribution
Models a binary variable x ∈ {0, 1} with a single continuous parameter µ ∈ [0, 1]:
Bern(x | µ) = µ^x (1 − µ)^(1−x)
E[x] = µ
var[x] = µ(1 − µ)
mode[x] = 1 if µ ≥ 0.5, 0 otherwise
H[x] = −µ ln µ − (1 − µ) ln(1 − µ)
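These identities can be checked against scipy.stats.bernoulli (a quick sketch; scipy reports entropy in nats, matching the ln-based formula above):

```python
import numpy as np
from scipy import stats

mu = 0.3
d = stats.bernoulli(mu)

print(d.mean(), mu)            # E[x] = mu
print(d.var(), mu * (1 - mu))  # var[x] = mu(1 - mu)
print(d.entropy(),             # entropy in nats
      -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)))
```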
Binomial Distribution
Models the number of ones m among N Bernoulli trials:
Bin(m | N, µ) = (N choose m) µ^m (1 − µ)^(N−m)
E[m] = Nµ
var[m] = Nµ(1 − µ)
mode[m] = ⌊(N + 1)µ⌋
Beta Distribution
Beta(µ | a, b) = [Γ(a + b) / (Γ(a) Γ(b))] µ^(a−1) (1 − µ)^(b−1)
E[µ] = a / (a + b)
var[µ] = ab / ((a + b)² (a + b + 1))
mode[µ] = (a − 1) / (a + b − 2)
Conjugate Prior: Beta-Binomial
Likelihood: Bin(m | N, µ) = (N choose m) µ^m (1 − µ)^(N−m)
Prior: Beta(µ | a, b) = [Γ(a + b) / (Γ(a) Γ(b))] µ^(a−1) (1 − µ)^(b−1)
Because both are proportional to powers of µ and (1 − µ), the posterior is again a Beta distribution:
p(µ | m, N, a, b) = Beta(µ | a + m, b + N − m)
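A sketch verifying the conjugate update numerically (the numbers a = b = 2, N = 10, m = 7 are hypothetical; the grid-based renormalization is only a check, not part of the method):

```python
import numpy as np
from scipy import stats

a, b = 2.0, 2.0   # Beta prior on the coin bias mu
N, m = 10, 7      # observed: m heads in N flips

# Conjugacy: the posterior is again Beta(a + m, b + N - m)
posterior = stats.beta(a + m, b + N - m)
print(posterior.mean())  # posterior mean (a + m) / (a + b + N) = 9/14

# Numerical check: prior x likelihood, renormalized on a grid
mu = np.linspace(1e-4, 1 - 1e-4, 10_000)
unnorm = stats.beta(a, b).pdf(mu) * stats.binom(N, mu).pmf(m)
post_grid = unnorm / (unnorm.sum() * (mu[1] - mu[0]))
print(np.abs(post_grid - posterior.pdf(mu)).max())  # small
```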
Example: Biased Coin
Unknown variable: the coin bias µ
Observed data: flip outcomes
Posterior ∝ Likelihood × Prior
Likelihood: probability of the outcomes given the bias
Prior: belief about the bias before any flips
Posterior: belief about the bias after observing the trials
[Figure: posterior distribution over the coin bias, updated after successive flips]
Dirichlet Distribution
Dir(µ | α) = C(α) Π_{k=1}^K µ_k^(α_k − 1), with C(α) = Γ(α̂) / (Γ(α_1) ⋯ Γ(α_K)) and α̂ = Σ_k α_k
E[µ_k] = α_k / α̂
var[µ_k] = α_k (α̂ − α_k) / (α̂² (α̂ + 1))
cov[µ_j, µ_k] = −α_j α_k / (α̂² (α̂ + 1)) (for j ≠ k)
mode[µ_k] = (α_k − 1) / (α̂ − K)
E[ln µ_k] = ψ(α_k) − ψ(α̂)
[Figure: Dirichlet densities for α = (0.1, 0.1, 0.1), α = (1, 1, 1), α = (10, 10, 10)]
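A quick Monte Carlo check of the mean and variance formulas (a sketch with an arbitrary α = (2, 3, 5)):

```python
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])
a_hat = alpha.sum()

rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=100_000)

print(samples.mean(axis=0))   # close to alpha / a_hat = [0.2, 0.3, 0.5]
print(samples.var(axis=0))    # close to the formula below
print(alpha * (a_hat - alpha) / (a_hat**2 * (a_hat + 1)))
```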
Multivariate Gaussian
For x ∈ R^D:
N(x | µ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
E[x] = µ
cov[x] = Σ
mode[x] = µ
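A sketch comparing the density formula against scipy.stats.multivariate_normal (the particular µ, Σ, and query point are arbitrary):

```python
import numpy as np
from scipy import stats

D = 2
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
x = np.array([0.5, 0.0])

# Density from the formula above
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
pdf_formula = np.exp(-0.5 * quad) / (
    (2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)

# Density from scipy
pdf_scipy = stats.multivariate_normal(mu, Sigma).pdf(x)
print(np.isclose(pdf_formula, pdf_scipy))  # True
```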
Linear Gaussian Models
Given a Gaussian marginal and a Gaussian conditional
p(x) = N(x | µ, Λ⁻¹)
p(y | x) = N(y | Ax + b, L⁻¹)
the marginal over y and the posterior over x are also Gaussian:
p(y) = N(y | Aµ + b, L⁻¹ + AΛ⁻¹Aᵀ)
p(x | y) = N(x | Σ{AᵀL(y − b) + Λµ}, Σ), where Σ = (Λ + AᵀLA)⁻¹
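These identities can be sanity-checked by sampling (a sketch with arbitrary parameters; only the marginal p(y) is checked here):

```python
import numpy as np

rng = np.random.default_rng(0)

# p(x) = N(x | mu, Lambda^-1), p(y | x) = N(y | A x + b, L^-1)
mu = np.array([1.0, 2.0])
Lam_inv = np.array([[1.0, 0.2], [0.2, 0.5]])   # Lambda^-1
A = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, 1.0])
L_inv = 0.1 * np.eye(2)                        # L^-1

n = 200_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
y = x @ A.T + b + rng.multivariate_normal(np.zeros(2), L_inv, size=n)

# Marginal: p(y) = N(y | A mu + b, L^-1 + A Lambda^-1 A^T)
print(y.mean(axis=0), A @ mu + b)              # empirical vs predicted mean
print(np.cov(y.T))                             # empirical covariance
print(L_inv + A @ Lam_inv @ A.T)               # predicted covariance
```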
Maximum A Posteriori (MAP)
Combining a Gaussian prior with a Gaussian likelihood gives a Gaussian posterior; taking the mode of that posterior (the MAP estimate) yields Ridge Regression.
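A minimal sketch of the MAP/ridge connection (synthetic data; with noise variance σ² and prior precision α the ridge weight is λ = σ²α, taken here as λ = 1 for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear model y = X w + noise, with Gaussian prior w ~ N(0, (1/lam) I).
# The MAP estimate maximizes log-likelihood + log-prior, which reduces
# to minimizing ||y - X w||^2 + lam ||w||^2, i.e. ridge regression.
n, d, lam = 100, 3, 1.0
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=n)

# Closed-form ridge / MAP solution: w = (X^T X + lam I)^-1 X^T y
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_map)  # close to w_true
```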