Probabilistic Graphical Models
BN Structure Learning
Daphne Koller
Why structure learning?
– To learn a model for new queries, when domain expertise is not perfect
– For structure discovery, when inferring the network structure is itself the goal
Importance of accurate structure:
– Missing an arc: some dependencies cannot be learned; the model encodes incorrect independence assumptions
– Adding an arc: the true distribution can still be represented, but the spurious dependency adds parameters and hurts generalization
[Figure: a true network over A, B, C, D, a variant missing an arc, and a variant with an added arc]
Score-based learning: given data D over variables (e.g., samples over A, B, C such as <1,0,0>, <1,1,1>, <0,0,1>, <0,1,1>, ..., <0,1,0>) and a set of candidate network structures:
– Define a scoring function that evaluates how well a structure matches the data
– Search for a structure that maximizes the score
Likelihood score example: compare G0, where X and Y are disconnected, with G1: X → Y. The difference in likelihood scores is

score_L(G1 : D) − score_L(G0 : D) = M · I_P̂(X ; Y)

where I_P̂(X ; Y) is the mutual information between X and Y in the empirical distribution P̂, and M is the number of samples.

Limitations of the likelihood score:
– I_P̂(X ; Y) ≥ 0 in the empirical distribution, with equality only if X and Y are exactly independent in P̂
– Hence adding an edge almost always helps, and never hurts, the likelihood
– The score is maximized by a fully connected network, so the plain likelihood overfits
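The identity above can be checked numerically. A minimal sketch (function names are illustrative) that computes the empirical mutual information and the two maximum log-likelihoods directly from (x, y) samples:

```python
import math
from collections import Counter

def empirical_mi(pairs):
    """Mutual information I(X;Y) under the empirical distribution of (x, y) pairs."""
    m = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / m) * math.log((c / m) / ((px[x] / m) * (py[y] / m)))
               for (x, y), c in pxy.items())

def loglik_indep(pairs):
    """Max log-likelihood of G0 (X and Y disconnected), with MLE parameters."""
    m = len(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return (sum(c * math.log(c / m) for c in px.values())
            + sum(c * math.log(c / m) for c in py.values()))

def loglik_edge(pairs):
    """Max log-likelihood of G1 (X -> Y), i.e. P(x)P(y|x) with MLE parameters."""
    m = len(pairs)
    px = Counter(x for x, _ in pairs)
    pxy = Counter(pairs)
    return (sum(c * math.log(c / m) for c in px.values())
            + sum(c * math.log(c / px[x]) for (x, _), c in pxy.items()))

data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]
# log-likelihood gain from adding X -> Y equals M * I(X;Y)
gap = loglik_edge(data) - loglik_indep(data)
```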
Avoiding overfitting:
– Restrict the hypothesis space: limit the number of parents or the number of parameters
– Penalize complexity explicitly, as in the BIC score
– Penalize complexity implicitly: the Bayesian score averages over all possible parameter values
Likelihood score: summary
– Measures the likelihood of the data D relative to G, using MLE parameters
– Parameters are optimized for D, so the score overfits the training data
– The score decomposes in terms of the (in)dependencies in G
– Adding edges never hurts: extra edges only relax independence assumptions (they don't impose constraints)
Trade-off in the BIC score, score_BIC(G : D) = ℓ(θ̂_G : D) − (log M / 2) · Dim[G]:
– The fit-to-data (likelihood) term grows linearly with M, while the model-complexity penalty grows only logarithmically with M
– As M grows, more emphasis is given to fit to data
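A sketch of the BIC score for a discrete network, following the trade-off above; the function names and the row-of-dicts data layout are illustrative choices, not a fixed API:

```python
import math
from collections import Counter

def family_bic(data, child, parents, card):
    """BIC family score: max log-likelihood of P(child | parents) minus
    (log M / 2) * number of free parameters. `data` is a list of dicts
    {var: value}; `card[v]` is the cardinality of variable v."""
    m = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    par = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / par[pa]) for (pa, _), c in joint.items())
    n_parent_configs = math.prod(card[p] for p in parents)
    n_params = n_parent_configs * (card[child] - 1)
    return loglik - 0.5 * math.log(m) * n_params

def bic_score(data, structure, card):
    """BIC score of a network: sum of family scores (decomposability)."""
    return sum(family_bic(data, x, pa, card) for x, pa in structure.items())
```

On data where Y copies X, the family score of Y with parent X beats the parent-free family, despite the larger penalty.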
Consistency:
– Asymptotically, spurious edges will not contribute to the likelihood and will be penalized
– Required edges will be added, due to the linear growth of the likelihood term compared to the logarithmic growth of the model-complexity penalty
– The negation of the BIC score is often called MDL (minimum description length)
– Consistency: if the data are generated by G*, networks I-equivalent to G* will have the highest score as M grows to ∞
Bayesian score: P(G | D) = P(D | G) · P(G) / P(D), where P(D | G) is the marginal likelihood, P(G) is the prior over structures, and P(D) is the marginal probability of the data, a normalizing constant that does not affect the ranking of structures.
Marginal likelihood: P(D | G) = ∫ P(D | θ_G, G) · P(θ_G | G) dθ_G, integrating the likelihood P(D | θ_G, G) against the prior over parameters P(θ_G | G).
The marginal likelihood is expressed in terms of the Gamma function:

Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt

Γ(x+1) = x · Γ(x), so Γ(n) = (n−1)! for positive integers n.
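In practice the marginal likelihood is computed in log space via `math.lgamma`. A sketch for a single multinomial variable with a symmetric Dirichlet prior of equivalent sample size `alpha` (names illustrative):

```python
import math

def log_marginal_likelihood(counts, alpha):
    """Log marginal likelihood of multinomial counts under a symmetric
    Dirichlet(alpha/k, ..., alpha/k) prior, computed via the Gamma function:
    log Gamma(alpha) - log Gamma(alpha + M)
      + sum_j [log Gamma(alpha_j + M_j) - log Gamma(alpha_j)]."""
    k = len(counts)
    m = sum(counts)
    a = alpha / k  # per-value hyperparameter
    return (math.lgamma(alpha) - math.lgamma(alpha + m)
            + sum(math.lgamma(a + c) - math.lgamma(a) for c in counts))
```

With k = 2 and alpha = 2 (a uniform prior on the parameter), one observation has marginal probability 1/2, and two identical observations have marginal probability 1/3, as the Beta integrals give directly.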
Structure priors:
– Uniform prior: P(G) ∝ constant
– Prior penalizing the number of edges: P(G) ∝ c^|G| (0 < c < 1)
– Prior penalizing the number of parameters
BDe prior:
– α: equivalent sample size
– B0: a network representing the prior probability of events
– Set α(x_i, pa_i^G) = α · P(x_i, pa_i^G | B0)
– Note: the parents of X_i in a candidate network G need not be the same as the parents of X_i in B0, so a single prior network provides parameter priors for all candidate networks
– With a BDe prior, I-equivalent networks have the same Bayesian score
Bayesian score: summary
– BDe requires assessing a prior network, and can naturally incorporate prior knowledge
– I-equivalent networks have the same score
– Asymptotically equivalent to BIC, and asymptotically consistent
– But for small M, BIC tends to underfit
Input:
– Training data
– Scoring function (including priors, if needed)
– Set of possible structures
Output: a network that maximizes the score
Key property: decomposability, score(G) = Σ_i score(X_i | Pa_G(X_i))
Learning tree-structured networks:
– At most one parent per variable
Why trees?
– Elegant math
– Efficient optimization
– Sparse parameterization, which helps avoid overfitting
For trees, the likelihood score decomposes as M · Σ_i I_P̂(X_i ; Pa(X_i)) − M · Σ_i H_P̂(X_i): the second term is the score of the “empty” network, and the first term is the improvement over the “empty” network.
– If the score is the plain likelihood, the optimal structure is always a tree, since adding edges never hurts
– If the score penalizes complexity (BIC, BDe), the optimal structure might be a forest, since under the empirical distribution P̂ some candidate edges reduce the score
Algorithm (for score-equivalent scores, i.e., those that assign I-equivalent structures the same score; such scores include likelihood, BIC, and BDe):
– Define symmetric edge weights from the score, and run a standard algorithm for maximum-weight spanning trees (e.g., Prim's or Kruskal's) in O(n²) time
– Remove all edges of weight ≤ 0 to produce a forest
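For the likelihood score the edge weights are M · I(X_i ; X_j), giving the classic Chow–Liu construction. A sketch using Kruskal's algorithm with union-find (names illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_info(data, i, j):
    """Empirical mutual information between columns i and j of `data` (list of tuples)."""
    m = len(data)
    pij = Counter((r[i], r[j]) for r in data)
    pi = Counter(r[i] for r in data)
    pj = Counter(r[j] for r in data)
    return sum((c / m) * math.log(c * m / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree with edge weight M * I(Xi; Xj).
    Kruskal's algorithm with union-find; returns a list of undirected edges."""
    m = len(data)
    edges = sorted(((m * mutual_info(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and w > 0:  # drop non-positive-weight edges -> forest
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On data where X0 and X1 are perfectly correlated and X2 is independent of both, only the edge (0, 1) survives, yielding a forest.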
[Figure: tree learned from data sampled from the Alarm network, shown against the true network, with correct and spurious edges marked]
Tree learned from data of the Alarm network.
Input: training data, a scoring function, and a set of possible structures.
– Finding the maximal-scoring network structure with at most k parents per variable is NP-hard for k > 1
– Example: even when allowing only two parents, a greedy algorithm is no longer guaranteed to find the optimal network
[Figure: search space over structures on A, B, C, D; neighboring structures differ by a single edge addition, deletion, or reversal]
Design choices:
– Search operators: local steps (edge addition, deletion, reversal) or global steps
– Search algorithm:
  – Greedy hill-climbing
  – Best-first search
  – Simulated annealing
  – ...
Greedy hill-climbing:
– Start with an initial network: the empty network, the best tree, a random network, or a network encoding prior knowledge
– At each iteration:
  – Consider the score for all possible changes
  – Apply the change that most improves the score
– Stop when no change improves the score
Greedy hill-climbing can get stuck at:
– Local maxima
– Plateaux: large sets of I-equivalent structures that have the same score and are neighbors in the search space
[Figure: two I-equivalent structures over A, B, C that are neighbors under edge reversal]
– Random restarts: when we get stuck, take some number of random steps and then start climbing again
– Tabu list: keep a list of the K most recently taken steps; the search cannot reverse any of these steps
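A toy sketch of hill-climbing with a tabu list, generic over states; the convention that `moves(state)` yields `(move, undo)` pairs is an assumption of this sketch:

```python
from collections import deque

def tabu_search(state, moves, apply_move, score, k=10, max_steps=50):
    """Hill-climbing with a tabu list: the undo of each of the K most recent
    steps is forbidden, letting the search walk off plateaux and out of
    shallow local optima. Returns the best state visited."""
    tabu = deque(maxlen=k)
    best_state, best = state, score(state)
    for _ in range(max_steps):
        candidates = [(mv, undo) for mv, undo in moves(state) if mv not in tabu]
        if not candidates:
            break
        mv, undo = max(candidates, key=lambda c: score(apply_move(state, c[0])))
        state = apply_move(state, mv)
        tabu.append(undo)  # forbid immediately reversing this step
        if score(state) > best:
            best_state, best = state, score(state)
    return best_state
```

On a one-dimensional toy landscape with a single peak, the search walks to the peak, keeps moving past it (the backward step is tabu), and reports the best state visited.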
[Figure: KL divergence to the true distribution vs. number of samples M (500 to 5000), comparing the known true structure (BDe, α = 10) against a learned unknown structure (BDe, α = 10)]
[Figures: application of BN structure learning, from Horvitz, Apacible, Sarin, & Liao, UAI 2005]
[Figure: causal protein-signaling network learned from multiparameter single-cell data, over phospho-proteins (PKC, Raf, Erk, Mek, Plcγ, PKA, Akt, Jnk, P38) and phospho-lipids (PIP2, PIP3), with some variables perturbed in the data. Edge accuracy: known 15/17, supported 2/17, reversed 1, missed 3; some edges were subsequently validated in the wet lab.]
From “Causal protein-signaling networks derived from multiparameter single-cell data”, Sachs et al., Science 308:523, 2005. Reprinted with permission from AAAS. This figure may be used for non-commercial and classroom purposes only; any other uses require prior written permission from AAAS.
Summary:
– Structure learning is useful when domain experts don't know the structure, and for knowledge discovery
– Structure search is typically a heuristic combinatorial optimization: local steps (edge addition, deletion, reversal), with hill-climbing, tabu lists, and random restarts
[Figure: search space over structures on A, B, C, D, with local moves between neighbors]
Cost of evaluating a candidate move:
– Recomputing components of the score
– Computing sufficient statistics from the data
– Checking acyclicity
Exploiting decomposability, for example when adding the edge B → D:
score(G) = Score(A | {}) + Score(B | {}) + Score(C | {A,B}) + Score(D | {C})
score(G') = Score(A | {}) + Score(B | {}) + Score(C | {A,B}) + Score(D | {B,C})
Only D's family changes, so Δscore(D) = Score(D | {B,C}) − Score(D | {C}).
Delta-scores for local moves from the network with families C | {A,B} and D | {C}:
– Adding B → D: Δscore(D) = Score(D | {B,C}) − Score(D | {C})
– Deleting B → C: Δscore(C) = Score(C | {A}) − Score(C | {A,B})
– Reversing B → C: Δscore(C) + Δscore(B) = Score(C | {A}) + Score(B | {C}) − Score(C | {A,B}) − Score(B | {})
After applying a move, we only need to recompute the delta-scores of families that changed in the last move; cached delta-scores such as Δscore(C) = Score(C | {A}) − Score(C | {A,B}) remain valid for families the move did not touch.
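Caching makes this concrete. A minimal sketch where `family_score` stands for any decomposable scoring function (hypothetical name); scores are memoized by (variable, parent set), so repeated delta-score queries after a move hit the cache:

```python
class ScoreCache:
    """Memoizes a decomposable family score by (variable, parent set), so that
    after a local move only the families whose parents changed are recomputed."""
    def __init__(self, family_score):
        self._raw = family_score
        self._cache = {}
        self.misses = 0  # number of actual score evaluations

    def __call__(self, child, parents):
        key = (child, frozenset(parents))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._raw(child, parents)
        return self._cache[key]

def delta_score(cache, child, old_parents, new_parents):
    """Change in total network score from replacing `child`'s parent set."""
    return cache(child, new_parents) - cache(child, old_parents)
```

Asking for the same delta-score twice triggers only two underlying evaluations, not four.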
Per-move cost with caching:
– Compute the O(n) delta-scores invalidated (“damaged”) by the move
– Each one takes O(M) time, for a pass over the data to collect sufficient statistics