Confronting the Partition Function

  1. Confronting the Partition Function
     Lecture slides for Chapter 18 of Deep Learning (www.deeplearningbook.org), Ian Goodfellow. Last updated 2017-12-29.

  2. Unnormalized models
     $$p(x; \theta) = \frac{1}{Z(\theta)}\, \tilde p(x; \theta), \quad (18.1)$$
     where $Z(\theta)$ is
     $$\int \tilde p(x)\, dx \quad (18.2)$$
     or
     $$\sum_x \tilde p(x). \quad (18.3)$$
     (Goodfellow 2017)
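For intuition, equation 18.3 can be evaluated by brute force when the state space is tiny. Below is a minimal NumPy sketch, assuming an illustrative 3-unit fully visible Boltzmann machine with $\log \tilde p(x) = \frac{1}{2} x^\top W x + b^\top x$; the model and its parameters W and b are hypothetical, not from the slides.

import itertools
import numpy as np

# Hypothetical parameters of a 3-unit fully visible Boltzmann machine.
W = np.array([[0.0, 0.5, -0.3],
              [0.5, 0.0, 0.8],
              [-0.3, 0.8, 0.0]])   # symmetric, zero diagonal
b = np.array([0.1, -0.2, 0.3])

def p_tilde(x):
    """Unnormalized probability exp(0.5 * x^T W x + b^T x)."""
    return np.exp(0.5 * x @ W @ x + b @ x)

# Equation 18.3: Z(theta) is the sum of p_tilde over all 2^3 binary states.
Z = sum(p_tilde(np.array(x)) for x in itertools.product([0, 1], repeat=3))

# Equation 18.1: normalized probability.
p = lambda x: p_tilde(np.array(x)) / Z

For realistic models this sum (or integral) is intractable, which is what the rest of the chapter confronts.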

  3. Gradient of log-likelihood
     $$\nabla_\theta \log p(x; \theta) = \nabla_\theta \log \tilde p(x; \theta) - \nabla_\theta \log Z(\theta). \quad (18.4)$$
     Positive phase: push up on data points. Negative phase: push down on model samples.
     (Goodfellow 2017)

  4. Negative phase sampling
     $$\nabla_\theta \log Z = \mathbb{E}_{x \sim p(x)} \nabla_\theta \log \tilde p(x). \quad (18.5)\text{--}(18.13)$$
     (Goodfellow 2017)
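The slide states the identity without the intermediate steps (equations 18.5 through 18.13). A condensed version of the derivation, assuming $\tilde p(x) > 0$ for all $x$ and that the gradient and the integral can be exchanged, is:

\begin{align*}
\nabla_\theta \log Z
  &= \frac{\nabla_\theta Z}{Z}
   = \frac{1}{Z} \nabla_\theta \int \tilde p(x)\, dx
   = \frac{1}{Z} \int \nabla_\theta \tilde p(x)\, dx \\
  &= \frac{1}{Z} \int \tilde p(x)\, \nabla_\theta \log \tilde p(x)\, dx
   = \int p(x)\, \nabla_\theta \log \tilde p(x)\, dx
   = \mathbb{E}_{x \sim p(x)} \nabla_\theta \log \tilde p(x).
\end{align*}

The discrete case replaces the integrals with sums.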

  5. Basic learning algorithm for undirected models
     For each minibatch:
     • Generate model samples
     • Compute the positive phase using data samples
     • Compute the negative phase using model samples
     • Combine the positive and negative phases and take a gradient step to update the parameters
     (Goodfellow 2017)
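A minimal NumPy sketch of one such update, assuming two hypothetical helpers that the slides do not define: grad_log_ptilde(theta, x), the gradient of $\log \tilde p(x; \theta)$, and sample_model(theta, n), an MCMC sampler for the current model.

import numpy as np

def sgd_step(theta, data_batch, grad_log_ptilde, sample_model,
             n_model_samples=100, lr=1e-3):
    """One stochastic gradient ascent step on the log-likelihood (eq. 18.4)."""
    # Positive phase: push up on the unnormalized log-probability of data points.
    positive = np.mean([grad_log_ptilde(theta, x) for x in data_batch], axis=0)

    # Negative phase: push down on the unnormalized log-probability of model samples.
    model_samples = sample_model(theta, n_model_samples)
    negative = np.mean([grad_log_ptilde(theta, x) for x in model_samples], axis=0)

    # The log-likelihood gradient is the positive phase minus the negative phase.
    return theta + lr * (positive - negative)

The expensive part is sample_model, which usually means running a Markov chain; the speed tricks on slide 7 reduce that cost.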

  6. [Figure 18.1: two panels titled "The positive phase" and "The negative phase", each plotting p_model(x) and p_data(x) as curves, with p(x) on the vertical axis and x on the horizontal axis.]
     Figure 18.1: The view of algorithm 18.1 as having a "positive phase" and a "negative phase." (Left) In the positive phase, we sample points from the data distribution and push up on their unnormalized probability. This means points that are likely in the data get pushed up on more. (Right) In the negative phase, we sample points from the model distribution and push down on their unnormalized probability. This counteracts the positive phase's tendency to just add a large constant to the unnormalized probability everywhere. When the data distribution and the model distribution are equal, the positive phase has the same chance to push up at a point as the negative phase has to push down. When this occurs, there is no longer any gradient (in expectation), and training must terminate.
     (Goodfellow 2017)

  7. Challenge: model samples are slow
     • Undirected models usually need Markov chains
     • Naive approach: run the Markov chain for a long time, starting from a random initialization, for every minibatch
     • Speed tricks:
       • Contrastive divergence: start the Markov chain from the data
       • Persistent contrastive divergence: for each minibatch, continue the Markov chain from where it stopped after the previous minibatch
     (Goodfellow 2017)
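A sketch of persistent contrastive divergence for a binary RBM, using the standard Bernoulli RBM Gibbs updates; the single-example formulation and all variable names are illustrative rather than taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b, c, rng):
    """One block Gibbs sweep: sample hidden units given v, then visibles given h."""
    h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
    v = (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)
    return v

def pcd_update(v_data, v_persistent, W, b, c, lr=1e-2, k=1, rng=None):
    """One PCD update for a single example and a single persistent chain."""
    rng = rng or np.random.default_rng()
    # Positive phase: expected hidden activations given the data.
    h_data = sigmoid(v_data @ W + c)
    # Negative phase: continue the persistent chain for k Gibbs steps.
    for _ in range(k):
        v_persistent = gibbs_step(v_persistent, W, b, c, rng)
    h_model = sigmoid(v_persistent @ W + c)
    # Data statistics minus model statistics (positive phase minus negative phase).
    W += lr * (np.outer(v_data, h_data) - np.outer(v_persistent, h_model))
    b += lr * (v_data - v_persistent)
    c += lr * (h_data - h_model)
    return v_persistent  # carried over to the next minibatch

Plain contrastive divergence is the same update with the chain reinitialized at v_data each time, instead of carrying v_persistent across minibatches.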

  8. Sidestep the problem
     Use criteria other than likelihood so that there is no need to compute Z or its gradient:
     • Pseudolikelihood
     • Score matching
     • Ratio matching
     • Noise contrastive estimation
     (Goodfellow 2017)
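As one example, pseudolikelihood replaces $\log p(x)$ with $\sum_i \log p(x_i \mid x_{-i})$, and the conditionals do not involve Z. Below is a sketch for a fully visible binary Boltzmann machine with $\log \tilde p(x) = \frac{1}{2} x^\top W x + b^\top x$, W symmetric with zero diagonal; the model choice is illustrative, not from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudolikelihood(x, W, b):
    """sum_i log p(x_i | x_{-i}) for binary x; the partition function cancels.

    For this model, p(x_i = 1 | x_{-i}) = sigmoid(W[i] @ x + b[i]),
    since W has a zero diagonal so x_i does not appear in its own logit.
    """
    logits = W @ x + b
    probs_one = sigmoid(logits)
    # Bernoulli log-likelihood of each coordinate under its conditional.
    return np.sum(x * np.log(probs_one) + (1 - x) * np.log(1 - probs_one))

Maximizing this objective with respect to W and b never requires computing Z, which is the point of the sidestep.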

  9. Estimating the partition function
     • To evaluate a trained model, we want to know the likelihood
     • This requires estimating Z, even if we trained using a method that doesn't differentiate Z
     • Z can be estimated using annealed importance sampling
     (Goodfellow 2017)
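A toy one-dimensional sketch of annealed importance sampling in NumPy, estimating log Z of an unnormalized target log_ptilde by annealing from a standard normal proposal with known normalizer; the target, annealing schedule, and Metropolis kernel are illustrative choices, not from the slides.

import numpy as np

def ais_log_Z(log_ptilde, n_chains=1000, n_steps=200, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    log_p0 = lambda x: -0.5 * x**2              # unnormalized standard normal
    log_Z0 = 0.5 * np.log(2 * np.pi)            # its known partition function

    x = rng.standard_normal(n_chains)           # exact samples from the proposal
    log_w = np.zeros(n_chains)                  # accumulated log importance weights
    betas = np.linspace(0.0, 1.0, n_steps + 1)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Intermediate distribution: f_b(x) proportional to p0(x)^(1-b) * ptilde(x)^b.
        log_f = lambda x, b=b: (1 - b) * log_p0(x) + b * log_ptilde(x)
        # AIS weight update: ratio of consecutive intermediate densities.
        log_w += log_f(x, b) - log_f(x, b_prev)
        # One Metropolis step leaving f_b invariant.
        prop = x + step_size * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_f(prop) - log_f(x)
        x = np.where(accept, prop, x)

    # log Z = log Z0 + log mean(w), computed stably in log space.
    return log_Z0 + np.logaddexp.reduce(log_w) - np.log(n_chains)

# Example: unnormalized N(3, 2^2); the true value is log(2 * sqrt(2 * pi)) ~ 1.61.
# print(ais_log_Z(lambda x: -0.5 * ((x - 3) / 2.0) ** 2))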
