COMP90051 Statistical Machine Learning
- 23. PGM Statistical Inference
Semester 2, 2017. Lecturer: Trevor Cohn
Statistical inference on PGMs: learning from data by fitting probability tables.
Deck 23 Statistical Machine Learning (S2 2017)
* PGMs encode conditional independence
* Computing other distributions from the joint: elimination and sampling algorithms
* Learn parameters from data
[Figure: example network with empty probability tables; marginal tables (false/true) and conditional tables indexed by parent values (HT × FG and FA × HG), every entry ‘?’ to be learned from data]
* If we observe all r.v.’s 𝒀 in a PGM independently n times as 𝒚_1, …, 𝒚_n
* Then maximise the full joint
  argmax_{θ∈Θ} ∏_{j=1}^{n} ∏_{k} p(Y_k = y_{jk} | Y_{parents(k)} = y_{j,parents(k)})
* Maximise the log-likelihood instead; the product becomes a sum of logs
  argmax_{θ∈Θ} ∑_{j=1}^{n} ∑_{k} log p(Y_k = y_{jk} | Y_{parents(k)} = y_{j,parents(k)})
* The big maximisation over all parameters together decouples into small independent problems, one per probability table
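As a concrete illustration of the decoupling, here is a hedged sketch on a toy two-variable network A → B; the variable names, CPT values and data are invented, not from the slides:

```python
import math

# Toy two-variable network A -> B (names, CPTs and data invented for illustration)
p_A = {True: 0.3, False: 0.7}                     # P(A)
p_B_given_A = {True: {True: 0.8, False: 0.2},     # P(B=b | A=a) as [a][b]
               False: {True: 0.1, False: 0.9}}

data = [(True, True), (False, False), (True, False), (False, False)]

# Log-likelihood of the full joint: a sum over observations j and nodes k of
# log p(Y_k = y_jk | parents) -- the double sum from the slide.  Terms group
# by node, so each probability table can be maximised on its own.
log_lik = sum(math.log(p_A[a]) + math.log(p_B_given_A[a][b]) for a, b in data)
print(round(log_lik, 4))   # -5.1646
```

Because the log-likelihood is a plain sum over nodes, the terms involving P(A) and those involving P(B | A) can be maximised separately.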
[Plate diagram: variables AS_i, HG_i, FA_i, HT_i, FG_i, replicated for i = 1..n]
[Figure: the network’s tables again, with each ‘?’ filled by a count ratio from the n observations]
Example maximum-likelihood estimates:
  p(FG = true) = #{j : FG_j = true} / n,   p(FG = false) = #{j : FG_j = false} / n
  p(HG = true | HT = false, FG = false) = #{j : HG_j = true, HT_j = false, FG_j = false} / #{j : HT_j = false, FG_j = false}
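These count-ratio estimates can be computed directly by filtering and counting; a minimal sketch (the records are made up, the field names follow the slide):

```python
# Fitting table entries by counting (records are invented; field names follow the slide)
data = [
    {"FG": True,  "HT": False, "HG": True},
    {"FG": False, "HT": False, "HG": False},
    {"FG": False, "HT": False, "HG": True},
    {"FG": False, "HT": True,  "HG": False},
]
n = len(data)

# p(FG = true) = #{j : FG_j = true} / n
p_FG_true = sum(r["FG"] for r in data) / n

# p(HG = true | HT = false, FG = false)
#   = #{j : HG_j = true, HT_j = false, FG_j = false} / #{j : HT_j = false, FG_j = false}
matching = [r for r in data if not r["HT"] and not r["FG"]]
p_HG_true_given = sum(r["HG"] for r in matching) / len(matching)

print(p_FG_true, p_HG_true_given)   # 0.25 0.5
```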
* Maximise the likelihood of the observed data only
* Marginalise the full joint to get the desired “partial” joint over the observed variables
  argmax_{θ∈Θ} ∏_{j=1}^{n} ∑_{latent 𝒚} ∏_{k} p(Y_k = y_{jk} | Y_{parents(k)} = y_{j,parents(k)})
* This won’t decouple: the sum over latent values sits inside the product (and hence inside the log), so the objective no longer separates into one problem per table
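Why the decoupling fails can be seen in a toy case. The sketch below (all names and numbers invented) uses an A → B network where the parent A is latent and only the child B is observed; the sum over A’s values ends up inside each log term:

```python
import math

# Toy case: A -> B, with A latent and B observed (all values invented)
def marginal_log_lik(pA_true, pB_given_A, observed_bs):
    ll = 0.0
    for b in observed_bs:
        # p(b) = sum_a p(a) p(b | a): the sum over the latent value of A sits
        # *inside* the log, so the objective no longer splits per table.
        p_b = sum((pA_true if a else 1.0 - pA_true) * pB_given_A[a][b]
                  for a in (True, False))
        ll += math.log(p_b)
    return ll

# P(B=b | A=a), indexed as pB[a][b]
pB = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
result = round(marginal_log_lik(0.3, pB, [True, False, False]), 4)
print(result)   # -1.9133
```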
* If we had guesses for the missing variables, we could employ MLE as if on fully-observed data
* This suggests an iterative scheme, alternating between:
  * updating the missing data (given the current parameters)
  * updating the probability tables/parameters (given the completed data)
[Figure: plate model (i = 1..n) with all probability-table entries still ‘?’]
[Figure: as above, but one marginal table is now filled in (false 0.9, true 0.1); all other entries remain ‘?’]
[Figure, shown over two slides: every remaining unknown entry seeded uniformly at 0.5; the 0.9/0.1 marginal is unchanged]
[Figure, shown over two slides: the tables after EM updates; marginals now 0.7/0.3, 0.6/0.4 and 0.9/0.1, and conditional tables with column pairs such as (0.7, 0.3), (0.3, 0.7), (0.4, 0.6), (0.8, 0.2)]
* Seed parameters randomly
* E-step: complete the unobserved data with their posterior distributions given the observed data and current parameters (probabilistic inference), not mere point estimates
* M-step: update parameters with MLE as if on the fully-observed (completed) data
* Repeat the E- and M-steps until convergence
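The loop above can be sketched end-to-end on perhaps the simplest latent-variable model, a mixture of two biased coins, where the identity of the coin behind each trial is unobserved; the data and seed values here are invented for illustration:

```python
from math import comb

# Each entry is the number of heads in m = 10 flips of one of two biased
# coins; which coin was flipped is latent.  (Data and seeds are invented.)
data = [9, 8, 9, 1, 0, 2, 8, 9, 1, 0]
m = 10
theta = [0.6, 0.4]   # seed P(heads) for each coin (normally seeded randomly)

for _ in range(20):
    # E-step: posterior responsibility of each coin for each trial
    # (a distribution over the latent coin, not a hard point estimate)
    resp = []
    for h in data:
        lik = [comb(m, h) * t**h * (1 - t)**(m - h) for t in theta]
        z = sum(lik)
        resp.append([l / z for l in lik])
    # M-step: MLE on the expected completed data
    # (expected heads per coin / expected flips per coin)
    theta = [sum(r[k] * h for r, h in zip(resp, data)) /
             sum(r[k] * m for r in resp) for k in range(2)]

print([round(t, 2) for t in theta])   # the coins separate towards ~0.86 and ~0.08
```

The high-count and low-count trials pull the two parameters apart, matching the empirical head rates of the two clusters.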
k-means clustering:
* Randomly assign cluster centres
* Repeat:
  * Hard E-step: assign each point wholly to its nearest cluster
  * M-step: update the cluster centres from the assigned points

EM for latent-variable models:
* Randomly seed parameters
* Repeat:
  * Soft E-step: compute expectations of the latent variables given the observed data and current parameters; each point has a fractional degree of belonging to each cluster (e.g., 10% C1, 20% C2, 70% C3)
  * M-step: update the parameters from these expectations
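The hard-vs-soft distinction can be sketched for a single 1-D point and two cluster centres (toy numbers; unit-variance Gaussians assumed for the soft weights):

```python
from math import exp

# One 1-D point and two cluster centres (toy numbers, invented for illustration)
x = 0.8
centres = [0.0, 2.0]

# k-means (hard E-step): the point belongs 100% to the nearest centre
hard = min(range(2), key=lambda k: (x - centres[k]) ** 2)

# EM (soft E-step): fractional membership proportional to a Gaussian likelihood
weights = [exp(-0.5 * (x - c) ** 2) for c in centres]
soft = [w / sum(weights) for w in weights]

print(hard, [round(s, 2) for s in soft])   # 0 [0.6, 0.4]
```

The hard assignment discards how close the call was, while the soft assignment keeps it (here roughly 60%/40%), which is what the M-step then averages over.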
* What is statistical inference on PGMs, and why do we care?
* Straight MLE for fully-observed data
* The EM algorithm for mixed latent/observed data