Structured Probabilistic Models for Deep Learning
Lecture slides for Chapter 16 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-10-04
Roadmap
Challenges of Unstructured Modeling
Using Graphs to Describe Model Structure
Tasks for Generative Models
(Berthelot et al., 2017) Images are 128 pixels wide and 128 pixels tall, with an R, G, and B value at each pixel location.
Number of values per variable (BEGAN faces): 256
Number of variables (BEGAN faces): 128 × 128 = 16,384
There are roughly 10^40,000 times more points in the discretized domain of the BEGAN face model than there are atoms in the universe.
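A quick sanity check of that claim, as a sketch (the 10^80 atom count is the usual order-of-magnitude estimate, not part of the slides):

```python
import math

values_per_variable = 256          # possible values of one pixel channel
num_variables = 128 * 128          # variables counted on the slide

# Size of the discretized domain, 256 ** 16384, expressed as a power of ten.
log10_domain = num_variables * math.log10(values_per_variable)
log10_atoms = 80                   # ~10^80 atoms in the observable universe

print(f"domain size ~ 10^{log10_domain:.0f}")                  # ~ 10^39457
print(f"ratio       ~ 10^{log10_domain - log10_atoms:.0f}")    # ~ 10^39377, i.e. roughly 10^40,000
```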
Tabular Approach is Infeasible
An extremely high number of parameters requires an extremely high number of training examples.
Figure 16.2: Directed graphical model for the relay race example, with finishing times t0 (Alice), t1 (Bob), and t2 (Carol).
$p(\mathbf{x}) = \prod_i p(x_i \mid Pa_{\mathcal{G}}(x_i))$    (16.1)
$p(t_0, t_1, t_2) = p(t_0)\, p(t_1 \mid t_0)\, p(t_2 \mid t_1)$    (16.2)
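A minimal sketch of how factorization (16.2) is used: the joint density is evaluated as a product of one marginal and two conditionals. The Gaussian conditionals and their means and standard deviations below are hypothetical, purely for illustration.

```python
# Hypothetical relay-race model: t0, t1, t2 are finishing times, and each
# runner's time depends only on the previous runner's time, as in Eq. (16.2).
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def p_joint(t0, t1, t2):
    # p(t0, t1, t2) = p(t0) p(t1 | t0) p(t2 | t1)
    p_t0 = gaussian_pdf(t0, mean=10.0, std=1.0)                 # Alice's time
    p_t1_given_t0 = gaussian_pdf(t1, mean=t0 + 10.0, std=1.0)   # Bob starts when Alice finishes
    p_t2_given_t1 = gaussian_pdf(t2, mean=t1 + 10.0, std=1.0)   # Carol starts when Bob finishes
    return p_t0 * p_t1_given_t0 * p_t2_given_t1

print(p_joint(10.0, 20.5, 30.2))
```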
Directed models work best when influence clearly flows in one direction
[Figure: undirected graph over h_r (your roommate's health), h_y (your health), and h_c (your work colleague's health).]
Undirected models work best when influence has no clear direction or is best modeled as flowing in both directions. Example questions: Do you have a cold? Does your roommate have a cold? Does your work colleague have a cold?
Unnormalized probability:  $\tilde{p}(\mathbf{x}) = \prod_{\mathcal{C} \in \mathcal{G}} \phi(\mathcal{C})$    (16.3)
$p(\mathbf{x}) = \frac{1}{Z}\, \tilde{p}(\mathbf{x})$    (16.4)
Partition function:  $Z = \int \tilde{p}(\mathbf{x})\, d\mathbf{x}$    (16.5)
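A sketch of Eqs. (16.3)–(16.5) on the three-variable cold model: with binary variables and made-up pairwise potentials, the integral defining Z becomes a sum over the eight joint configurations.

```python
import itertools

# Hypothetical pairwise potential for the cold example: each factor rewards
# its two neighbours agreeing (both healthy or both sick).
def phi(a, b):
    return 2.0 if a == b else 1.0

def p_tilde(h_r, h_y, h_c):
    # Cliques of the graph h_r -- h_y -- h_c: unnormalized probability (16.3)
    return phi(h_r, h_y) * phi(h_y, h_c)

# Partition function (16.5): sum of p_tilde over all joint configurations.
Z = sum(p_tilde(*x) for x in itertools.product([0, 1], repeat=3))

def p(h_r, h_y, h_c):
    # Normalized probability (16.4)
    return p_tilde(h_r, h_y, h_c) / Z

print(Z, p(1, 1, 1))   # 18.0, ~0.222
```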
[Figure: path a – s – b, with s unobserved in panel (a) and observed in panel (b).]
When s is not observed, influence can flow from a to b and vice versa through s. When s is observed, it blocks the flow of influence between a and b: they are separated.
[Figure: undirected graph over a, b, c, d with b observed.]
The nodes a and c are separated. One path between a and d is still active, though the other path is blocked, so these two nodes are not separated.
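Separation can be checked mechanically: delete the observed nodes and test whether the two query nodes are still connected. The edge list below is a hypothetical graph chosen only to reproduce the behavior described above.

```python
from collections import deque

def separated(edges, start, goal, observed):
    """True if `start` and `goal` are separated given `observed`
    in the undirected graph defined by `edges` (an iterable of pairs)."""
    # Build adjacency lists, dropping observed nodes: no influence flows through them.
    adj = {}
    for u, v in edges:
        if u in observed or v in observed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Breadth-first search: separated iff goal is unreachable from start.
    frontier, seen = deque([start]), {start}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return False
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

edges = [("a", "b"), ("b", "c"), ("a", "d"), ("b", "d")]   # hypothetical graph
print(separated(edges, "a", "c", observed={"b"}))   # True: every a-c path passes through b
print(separated(edges, "a", "d", observed={"b"}))   # False: the direct a-d edge stays active
```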
The flow of influence is more complicated for directed models. The path between a and b is active in all of these graphs:
[Figures: three-node graphs over a, s, b, panels (a)–(c); a four-node graph over a, s, b, c; and a larger example graph over a, b, c, d, e.]
Observing variables can activate paths!
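A numeric sketch of this point for the collider a → s ← b, with hypothetical conditional tables: a and b are independent marginally, but observing s makes them dependent ("explaining away").

```python
import itertools

# Hypothetical distributions: a and b are independent coin flips, and
# s = 1 whenever at least one of them is 1 (a -> s <- b structure).
p_a = {0: 0.5, 1: 0.5}
p_b = {0: 0.5, 1: 0.5}
def p_s_given_ab(s, a, b):
    return float(s == (a or b))

def p_joint(a, b, s):
    return p_a[a] * p_b[b] * p_s_given_ab(s, a, b)

# Marginally, a is a fair coin regardless of b:
p_a1 = sum(p_joint(1, b, s) for b, s in itertools.product([0, 1], repeat=2))
print(p_a1)                                  # 0.5

# Given s = 1, a and b become dependent: learning b = 1 "explains away" a.
p_s1 = sum(p_joint(a, b, 1) for a, b in itertools.product([0, 1], repeat=2))
p_a1_given_s1 = sum(p_joint(1, b, 1) for b in [0, 1]) / p_s1
p_a1_given_s1_b1 = p_joint(1, 1, 1) / sum(p_joint(a, 1, 1) for a in [0, 1])
print(p_a1_given_s1, p_a1_given_s1_b1)       # ~0.667 vs 0.5
```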
A complete graph can represent any probability distribution
The benefits of graphical models come from omitting edges
Converting between graphs
Any distribution can be represented by either an undirected or a directed graph. Either kind of graph may fail to imply some independences that actually hold in the distribution (the distribution is simpler than the graph describes); to see those independences, one needs to examine the conditional probability distributions themselves.
Converting directed to undirected
[Figure: directed graphs over h1, h2, h3, v1, v2, v3 and over a, b, c, shown alongside their undirected conversions.]
Must add an edge between unconnected coparents.
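A small sketch of that rule (often called "moralization"), using plain dict-of-sets graphs and a hypothetical v-structure: keep every directed edge as an undirected edge, and connect ("marry") any coparents that share a child.

```python
from itertools import combinations

def moralize(parents):
    """Convert a DAG, given as {node: set_of_parents}, to an undirected graph."""
    undirected = {node: set() for node in parents}
    for child, pars in parents.items():
        # Keep each directed edge parent -> child as an undirected edge.
        for p in pars:
            undirected[p].add(child)
            undirected[child].add(p)
        # "Marry" coparents: connect every pair of parents of the same child.
        for p, q in combinations(pars, 2):
            undirected[p].add(q)
            undirected[q].add(p)
    return undirected

# Hypothetical v-structure a -> c <- b: moralization adds the edge a -- b.
print(moralize({"a": set(), "b": set(), "c": {"a", "b"}}))
```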
Converting undirected to directed
[Figure: undirected graph over a, b, c, d with a loop of length four, shown being triangulated and then given edge directions.]
No loops of length greater than three allowed! Add edges to triangulate long loops. Then assign directions to the edges; no directed cycles allowed.
Factor graphs are less ambiguous
[Figure: an undirected graph over a, b, c; a factor graph with one factor f1 over all three variables; and a factor graph with pairwise factors f1, f2, f3.]
Undirected graph: is this three pairwise potentials, or one potential over three variables? Factor graphs disambiguate by placing each potential explicitly in the graph.
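One way to see the disambiguation: a factor graph is just an explicit list of factors and their scopes, so the two readings of the same undirected graph correspond to two different factor lists (potential tables omitted in this sketch).

```python
# Two factor graphs consistent with the same undirected graph over {a, b, c}.
# Each factor is just (name, scope); the potential tables themselves are omitted.
pairwise_version = [("f1", ("a", "b")), ("f2", ("b", "c")), ("f3", ("a", "c"))]
single_factor_version = [("f1", ("a", "b", "c"))]

def implied_edges(factors):
    """Undirected edges implied by a factor list: all pairs within each scope."""
    edges = set()
    for _, scope in factors:
        for i in range(len(scope)):
            for j in range(i + 1, len(scope)):
                edges.add(frozenset((scope[i], scope[j])))
    return edges

# Both factorizations imply the same undirected graph, which is why the
# undirected drawing alone is ambiguous.
print(implied_edges(pairwise_version) == implied_edges(single_factor_version))  # True
```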
Sampling from directed models
Ancestral sampling: put the nodes in a topological order, then sample each node given its parents (see the sketch below).
This does not directly handle conditioning on observations, unless the observed nodes are at the start of the topological order.
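A sketch of ancestral sampling under those assumptions; the two-variable cloudy → rain model and its probabilities are hypothetical.

```python
import random

def ancestral_sample(topological_order, samplers):
    """`samplers[x]` draws a value for x given a dict of already-sampled ancestors."""
    sample = {}
    for var in topological_order:          # parents always come before children
        sample[var] = samplers[var](sample)
    return sample

# Hypothetical model: cloudy -> rain.
samplers = {
    "cloudy": lambda s: random.random() < 0.3,
    "rain":   lambda s: random.random() < (0.8 if s["cloudy"] else 0.1),
}
print(ancestral_sample(["cloudy", "rain"], samplers))
```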
Sampling from undirected models
Sampling from an undirected model generally requires approximate, iterative methods such as Gibbs sampling.
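A sketch of Gibbs sampling on the cold model, reusing the same hypothetical agreement potential as earlier: repeatedly resample each variable from its conditional given its neighbours in the graph.

```python
import math
import random

def phi(a, b):
    return 2.0 if a == b else 1.0    # same hypothetical agreement potential as above

def gibbs_sample(num_sweeps):
    """Approximate sampling from the cold model by repeatedly resampling each
    variable from its conditional given its neighbours in the graph."""
    state = {"h_r": 0, "h_y": 0, "h_c": 0}
    neighbours = {"h_r": ["h_y"], "h_y": ["h_r", "h_c"], "h_c": ["h_y"]}
    for _ in range(num_sweeps):
        for var, nbrs in neighbours.items():
            # Unnormalized probability of each candidate value, neighbours held fixed.
            score = [math.prod(phi(v, state[n]) for n in nbrs) for v in (0, 1)]
            state[var] = int(random.random() < score[1] / (score[0] + score[1]))
    return state

print(gibbs_sample(1000))   # one (approximately distributed) sample after 1000 sweeps
```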
Advantage of structured modeling: costs are driven by the number of variables in the factor with the largest scope, so far fewer parameters are needed.
Learning about dependencies
Structure learning strategy: greedily propose single-edge modifications to the graph so far (remove edge / add edge / flip edge), score each candidate model, and keep the best (a sketch follows below).
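A sketch of that greedy loop over undirected edge sets (edge flips matter only for directed graphs). In practice, scoring a candidate means training it and measuring validation performance; the `score` callable below is a hypothetical stand-in.

```python
from itertools import combinations

def neighbours_of(graph, nodes):
    """All graphs reachable from `graph` (a set of undirected edges)
    by adding or removing a single edge."""
    candidates = []
    for u, v in combinations(nodes, 2):
        edge = frozenset((u, v))
        candidates.append(graph - {edge} if edge in graph else graph | {edge})
    return candidates

def greedy_structure_search(nodes, score, num_steps=10):
    graph = set()                                  # start with no edges
    for _ in range(num_steps):
        best = max(neighbours_of(graph, nodes), key=score)
        if score(best) <= score(graph):            # stop when no modification helps
            return graph
        graph = best
    return graph

# Toy stand-in score: pretend the "true" structure is the chain a - b - c.
target = {frozenset(("a", "b")), frozenset(("b", "c"))}
score = lambda g: -len(g ^ target)                 # hypothetical; normally a validation score
print(greedy_structure_search(["a", "b", "c"], score))
```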
Deep learning alternative: rely on latent variables rather than structure search, so that each variable needs to interact strongly with only a small subset of observed variables.
Inference and Approximate Inference
Computing the conditional distribution of some nodes given other nodes is #P-hard in general.
#P-hardness describes counting problems, e.g., how many solutions are there to a problem where finding even one solution is NP-hard?
Approximate inference is covered in Chapter 19.
Deep Learning Stylistic Tendencies
Figure 16.14: An RBM drawn as a Markov network, with hidden units h1–h4 and visible units v1–v3.
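A sketch matching the figure: an RBM has only hidden–visible interactions, so the unnormalized probability is exp(−E(v, h)) with the bilinear energy E(v, h) = −bᵀv − cᵀh − vᵀWh. The weights below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 3, 4                     # v1..v3, h1..h4 as in the figure
W = rng.normal(size=(n_visible, n_hidden))     # placeholder weights
b = rng.normal(size=n_visible)                 # visible biases
c = rng.normal(size=n_hidden)                  # hidden biases

def energy(v, h):
    # E(v, h) = -b^T v - c^T h - v^T W h : only visible-hidden interactions.
    return -(b @ v) - (c @ h) - v @ W @ h

def p_tilde(v, h):
    # Unnormalized probability; the partition function sums exp(-E) over all (v, h).
    return np.exp(-energy(v, h))

v = np.array([1, 0, 1])
h = np.array([0, 1, 1, 0])
print(p_tilde(v, h))
```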