SLIDE 1

Probabilistic Graphical Models

David Sontag

New York University

Lecture 1, January 31, 2013

SLIDE 2

One of the most exciting advances in machine learning (AI, signal processing, coding, control, . . .) in the last decades

SLIDE 3

How can we gain global insight based on local observations?

SLIDE 4

Key idea

1. Represent the world as a collection of random variables X1, . . . , Xn with joint distribution p(X1, . . . , Xn)

2. Learn the distribution from data

3. Perform “inference” (compute conditional distributions p(Xi | X1 = x1, . . . , Xm = xm))

SLIDE 5

Reasoning under uncertainty

As humans, we are continuously making predictions under uncertainty.

Classical AI and ML research ignored this phenomenon.

Many of the most recent advances in technology are possible because of this new, probabilistic, approach.

SLIDE 6

Applications: Deep question answering

SLIDE 7

Applications: Machine translation

SLIDE 8

Applications: Speech recognition

SLIDE 9

Applications: Stereo vision

Input: two images

Output: disparity

SLIDE 10

Key challenges

1. Represent the world as a collection of random variables X1, . . . , Xn with joint distribution p(X1, . . . , Xn)
   - How does one compactly describe this joint distribution?
   - Directed graphical models (Bayesian networks)
   - Undirected graphical models (Markov random fields, factor graphs)

2. Learn the distribution from data
   - Maximum likelihood estimation. Other estimation methods?
   - How much data do we need? How much computation does it take?

3. Perform “inference” (compute conditional distributions p(Xi | X1 = x1, . . . , Xm = xm))

SLIDE 11

Syllabus overview

We will study Representation, Inference & Learning.

First in the simplest case:
- Only discrete variables
- Fully observed models
- Exact inference & learning

Then generalize:
- Continuous variables
- Partially observed data during learning (hidden variables)
- Approximate inference & learning

Learn about algorithms, theory & applications.

SLIDE 12

Logistics: class

Class webpage: http://cs.nyu.edu/~dsontag/courses/pgm13/
- Sign up for mailing list!
- Draft slides posted before each lecture

Book: Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman, MIT Press (2009)
- Required readings for each lecture posted to course website
- Many additional reference materials available!

Office hours: Wednesday 5-6pm and by appointment. 715 Broadway, 12th floor, Room 1204

Teaching Assistant: Li Wan (wanli@cs.nyu.edu)

Li’s office hours: Monday 5-6pm. 715 Broadway, Room 1231

SLIDE 13

Logistics: prerequisites & grading

Prerequisites:
- Previous class on machine learning
- Basic concepts from probability and statistics
- Algorithms (e.g., dynamic programming, graphs, complexity)
- Calculus

Grading: problem sets (65%) + in-class final exam (30%) + participation (5%)
- Class attendance is required
- 7-8 assignments (every 1–2 weeks), both theory and programming
- First homework out today, due next Thursday (Feb. 7) at 5pm
- Important: see collaboration policy on class webpage
- Solutions to the theoretical questions require formal proofs
- For the programming assignments, I recommend Python, Java, or Matlab. Do not use C++.

SLIDE 14

Review of probability: outcomes

Reference: Chapter 2 and Appendix A

An outcome space specifies the possible outcomes that we would like to reason about, e.g.

- Coin toss: Ω = { Heads, Tails }
- Die toss: Ω = { 1, 2, 3, 4, 5, 6 }

We specify a probability p(ω) for each outcome ω such that

p(ω) ≥ 0,   ∑_{ω∈Ω} p(ω) = 1

E.g., for a biased coin, p(Heads) = .6, p(Tails) = .4

SLIDE 15

Review of probability: events

An event is a subset of the outcome space, e.g.

- Odd die tosses: O = { 1, 3, 5 }
- Even die tosses: E = { 2, 4, 6 }

The probability of an event is given by the sum of the probabilities of the outcomes it contains,

p(E) = ∑_{ω∈E} p(ω)

E.g., p(E) = p(2) + p(4) + p(6) = 1/2, if fair die
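To make this concrete, here is a minimal Python sketch (not from the slides; Python is one of the course's recommended languages) that stores an outcome distribution as a dictionary and sums it over an event, reproducing the fair-die example above.

```python
# Minimal sketch: an outcome space as a dict from outcomes to probabilities.

def event_prob(p, event):
    """p(E) = sum of p(omega) for omega in E."""
    return sum(p[omega] for omega in event)

# Fair six-sided die: p(omega) = 1/6 for each outcome.
die = {omega: 1.0 / 6 for omega in range(1, 7)}

odd = {1, 3, 5}   # event O
even = {2, 4, 6}  # event E

assert abs(sum(die.values()) - 1.0) < 1e-9  # probabilities sum to 1
print(event_prob(die, even))                # 0.5, matching the slide
```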

SLIDE 16

Independence of events

Two events A and B are independent if p(A ∩ B) = p(A)p(B)

Are these two events independent? (Venn diagram: A and B are two disjoint single-outcome events of one die.)

No! p(A ∩ B) = 0, while p(A)p(B) = (1/6)²

Now suppose our outcome space had two different dice:

Ω = { (1,1), (1,2), . . . , (6,6) }   two die tosses, 6² = 36 outcomes

and the probability distribution is such that each die is independent: the probability of a pair of faces is the product of the two faces' individual probabilities.

SLIDE 17

Independence of events

Two events A and B are independent if p(A ∩ B) = p(A)p(B)

Are these two events independent? (Here A is an event about the first die only and B is an event about the second die only.)

p(A) = p(first die's outcome),   p(B) = p(second die's outcome)

Yes! Since the dice are independent, p(A ∩ B) = p(A)p(B) = (1/6)²
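A minimal Python sketch (not from the slides) of the same independence check on the 36-outcome two-dice space; the particular events chosen here (each die showing a 6) are illustrative assumptions.

```python
# Minimal sketch: checking p(A ∩ B) = p(A)p(B) on the two-dice outcome space.
from itertools import product

# Uniform distribution over ordered pairs (die1, die2).
outcomes = list(product(range(1, 7), repeat=2))
p = {pair: 1.0 / 36 for pair in outcomes}

A = {pair for pair in outcomes if pair[0] == 6}  # first die shows 6
B = {pair for pair in outcomes if pair[1] == 6}  # second die shows 6

p_A = sum(p[w] for w in A)
p_B = sum(p[w] for w in B)
p_AB = sum(p[w] for w in A & B)

print(p_AB, p_A * p_B)  # both ≈ 1/36, so A and B are independent
```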

SLIDE 18

Conditional probability

(Venn diagram: conditioning on B restricts attention to the region A ∩ B)

Let A, B be events with p(B) > 0. The conditional probability is

p(A | B) = p(A ∩ B) / p(B)

Claim 1: ∑_{ω∈S} p(ω | S) = 1, for any event S with p(S) > 0

Claim 2: If A and B are independent, then p(A | B) = p(A)

SLIDE 19

Two important rules

1. Chain rule

   Let S1, . . . , Sn be events with p(Si) > 0.

   p(S1 ∩ S2 ∩ · · · ∩ Sn) = p(S1) p(S2 | S1) · · · p(Sn | S1, . . . , Sn−1)

2. Bayes’ rule

   Let S1, S2 be events with p(S1) > 0 and p(S2) > 0.

   p(S1 | S2) = p(S1 ∩ S2) / p(S2) = p(S2 | S1) p(S1) / p(S2)
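As a concrete illustration of Bayes' rule (not from the slides), the following sketch computes a posterior from a prior and two likelihoods; all numbers are hypothetical.

```python
# Minimal sketch: p(S1 | S2) = p(S2 | S1) p(S1) / p(S2) with made-up numbers.

p_disease = 0.01                # p(S1): prior probability of the disease
p_pos_given_disease = 0.95      # p(S2 | S1): test sensitivity
p_pos_given_healthy = 0.05      # p(S2 | not S1): false positive rate

# Marginalize to get p(S2), the overall probability of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ≈ 0.161
```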

SLIDE 20

Discrete random variables

Often each outcome corresponds to a setting of various attributes (e.g., “age”, “gender”, “hasPneumonia”, “hasDiabetes”)

A random variable X is a mapping X : Ω → D
- D is some set (e.g., the integers)
- Induces a partition of all outcomes Ω

For some x ∈ D, we say

p(X = x) = p({ω ∈ Ω : X(ω) = x})

“probability that variable X assumes state x”

Notation: Val(X) = set D of all values assumed by X (will interchangeably call these the “values” or “states” of variable X)

p(X) is a distribution: ∑_{x∈Val(X)} p(X = x) = 1

SLIDE 21

Multivariate distributions

Instead of one random variable, we have a random vector X(ω) = [X1(ω), . . . , Xn(ω)]

Xi = xi is an event. The joint distribution p(X1 = x1, . . . , Xn = xn) is simply defined as p(X1 = x1 ∩ · · · ∩ Xn = xn)

We will often write p(x1, . . . , xn) instead of p(X1 = x1, . . . , Xn = xn)

Conditioning, chain rule, Bayes’ rule, etc. all apply

SLIDE 22

Working with random variables

For example, the conditional distribution

p(X1 | X2 = x2) = p(X1, X2 = x2) / p(X2 = x2).

This notation means

p(X1 = x1 | X2 = x2) = p(X1 = x1, X2 = x2) / p(X2 = x2)   ∀x1 ∈ Val(X1)

Two random variables are independent, X1 ⊥ X2, if

p(X1 = x1, X2 = x2) = p(X1 = x1) p(X2 = x2)

for all values x1 ∈ Val(X1) and x2 ∈ Val(X2).

SLIDE 23

Example

Consider three binary-valued random variables X1, X2, X3 with Val(Xi) = {0, 1}

Let the outcome space Ω be the cross-product of their states: Ω = Val(X1) × Val(X2) × Val(X3)

Xi(ω) is the value for Xi in the assignment ω ∈ Ω

Specify p(ω) for each outcome ω ∈ Ω by a big table:

  x1  x2  x3   p(x1, x2, x3)
   0   0   0       .11
   0   0   1       .02
   .   .   .       . . .
   1   1   1       .05

How many parameters do we need to specify? 2³ − 1
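A minimal sketch (not from the slides) of storing such a table explicitly; the uniform probabilities are placeholders, and the point is the 2³ entries and 2³ − 1 free parameters.

```python
# Minimal sketch: an explicit joint table over three binary variables.
from itertools import product

# One entry per assignment (x1, x2, x3): 2**3 = 8 entries in total.
joint = {assignment: 1.0 / 8 for assignment in product([0, 1], repeat=3)}

num_entries = len(joint)            # 8
num_free_params = num_entries - 1   # 7 = 2**3 - 1, since the entries sum to 1
print(num_entries, num_free_params)
```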

SLIDE 24

Marginalization

Suppose X and Y are random variables with distribution p(X, Y)
- X: Intelligence, Val(X) = {“Very High”, “High”}
- Y: Grade, Val(Y) = {“a”, “b”}

Joint distribution specified by:

           X = vh    X = h
  Y = a      0.7      0.15
  Y = b      0.1      0.05

p(Y = a) = ?  It is 0.7 + 0.15 = 0.85

More generally, suppose we have a joint distribution p(X1, . . . , Xn). Then

p(Xi = xi) = ∑_{x1} ∑_{x2} · · · ∑_{xi−1} ∑_{xi+1} · · · ∑_{xn} p(x1, . . . , xn)
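The same marginalization can be spelled out in a few lines of Python (a sketch, not from the slides), using the joint table from this slide:

```python
# Minimal sketch: marginalizing p(X, Y) to recover p(Y = a).

# Joint distribution p(X, Y) from the slide, keyed by (x, y).
joint = {
    ("vh", "a"): 0.7, ("h", "a"): 0.15,
    ("vh", "b"): 0.1, ("h", "b"): 0.05,
}

def marginal_y(joint, y):
    """p(Y = y) = sum over x of p(X = x, Y = y)."""
    return sum(p for (x, y2), p in joint.items() if y2 == y)

print(marginal_y(joint, "a"))  # ≈ 0.85, matching the slide
```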

SLIDE 25

Conditioning

Suppose X and Y are random variables with distribution p(X, Y)
- X: Intelligence, Val(X) = {“Very High”, “High”}
- Y: Grade, Val(Y) = {“a”, “b”}

           X = vh    X = h
  Y = a      0.7      0.15
  Y = b      0.1      0.05

Can compute the conditional probability

p(Y = a | X = vh) = p(Y = a, X = vh) / p(X = vh)
                  = p(Y = a, X = vh) / [p(Y = a, X = vh) + p(Y = b, X = vh)]
                  = 0.7 / (0.7 + 0.1) = 0.875
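And the conditioning computation, again as a small sketch (not from the slides) over the same table:

```python
# Minimal sketch: conditioning p(X, Y) to get p(Y = a | X = vh).

joint = {
    ("vh", "a"): 0.7, ("h", "a"): 0.15,
    ("vh", "b"): 0.1, ("h", "b"): 0.05,
}

def conditional_y_given_x(joint, y, x):
    """p(Y = y | X = x) = p(X = x, Y = y) / p(X = x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return joint[(x, y)] / p_x

print(conditional_y_given_x(joint, "a", "vh"))  # ≈ 0.875, matching the slide
```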

SLIDE 26

Example: Medical diagnosis

Variable for each symptom (e.g. “fever”, “cough”, “fast breathing”, “shaking”, “nausea”, “vomiting”)

Variable for each disease (e.g. “pneumonia”, “flu”, “common cold”, “bronchitis”, “tuberculosis”)

Diagnosis is performed by inference in the model:

p(pneumonia = 1 | cough = 1, fever = 1, vomiting = 0)

One famous model, Quick Medical Reference (QMR-DT), has 600 diseases and 4000 findings

SLIDE 27

Representing the distribution

Naively, we could represent a multivariate distribution with a table of probabilities for each outcome (assignment)

How many outcomes are there in QMR-DT? 2^4600

Estimation of the joint distribution would require a huge amount of data

Inference of conditional probabilities, e.g.

p(pneumonia = 1 | cough = 1, fever = 1, vomiting = 0)

would require summing over exponentially many variables’ values

Moreover, this defeats the purpose of probabilistic modeling, which is to make predictions with previously unseen observations

SLIDE 28

Structure through independence

If X1, . . . , Xn are independent, then

p(x1, . . . , xn) = p(x1) p(x2) · · · p(xn)

2^n entries can be described by just n numbers (if |Val(Xi)| = 2)!

However, this is not a very useful model – observing a variable Xi cannot influence our predictions of Xj

If X1, . . . , Xn are conditionally independent given Y, denoted as Xi ⊥ X−i | Y, then

p(y, x1, . . . , xn) = p(y) p(x1 | y) ∏_{i=2}^{n} p(xi | x1, . . . , xi−1, y)
                    = p(y) p(x1 | y) ∏_{i=2}^{n} p(xi | y).

This is a simple, yet powerful, model
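A small sketch (not from the slides) tabulating these parameter counts for n binary variables; the conditionally-independent count assumes a binary class variable Y.

```python
# Minimal sketch: free parameters needed for n binary variables under the
# full joint, full independence, and conditional independence given binary Y.

def param_counts(n):
    full_joint = 2 ** n - 1               # one entry per assignment, minus sum-to-1
    fully_independent = n                 # one p(Xi = 1) per variable
    cond_independent_given_y = 1 + 2 * n  # p(Y = 1) plus p(Xi = 1 | y) for y in {0, 1}
    return full_joint, fully_independent, cond_independent_given_y

print(param_counts(10))  # (1023, 10, 21)
# At QMR-DT scale (thousands of binary variables) the full joint is astronomical,
# while the factored models stay linear in n.
```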

SLIDE 29

Example: naive Bayes for classification

Classify e-mails as spam (Y = 1) or not spam (Y = 0)
- Let 1 : n index the words in our vocabulary (e.g., English)
- Xi = 1 if word i appears in an e-mail, and 0 otherwise
- E-mails are drawn according to some distribution p(Y, X1, . . . , Xn)

Suppose that the words are conditionally independent given Y. Then,

p(y, x1, . . . , xn) = p(y) ∏_{i=1}^{n} p(xi | y)

Estimate the model with maximum likelihood. Predict with:

p(Y = 1 | x1, . . . , xn) = p(Y = 1) ∏_{i=1}^{n} p(xi | Y = 1) / ∑_{y∈{0,1}} p(Y = y) ∏_{i=1}^{n} p(xi | Y = y)

Are the independence assumptions made here reasonable?

Philosophy: Nearly all probabilistic models are “wrong”, but many are nonetheless useful
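A minimal sketch (not from the slides) of this prediction rule with a hypothetical three-word vocabulary and made-up parameter values:

```python
# Minimal sketch: naive Bayes posterior p(Y = 1 | x) via the formula above.

p_y = {0: 0.7, 1: 0.3}                   # p(Y): prior over not-spam / spam
p_x_given_y = {                          # p(X_i = 1 | Y = y) for each word i
    0: [0.10, 0.20, 0.05],
    1: [0.60, 0.25, 0.40],
}

def posterior_spam(x):
    """Return p(Y = 1 | x1, ..., xn) for a binary word-presence vector x."""
    scores = {}
    for y in (0, 1):
        score = p_y[y]
        for i, xi in enumerate(x):
            theta = p_x_given_y[y][i]
            score *= theta if xi == 1 else (1 - theta)
        scores[y] = score
    return scores[1] / (scores[0] + scores[1])  # normalize over y in {0, 1}

print(posterior_spam([1, 0, 1]))  # posterior probability the e-mail is spam
```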

SLIDE 30

Bayesian networks

Reference: Chapter 3

A Bayesian network is specified by a directed acyclic graph G = (V, E) with:

1. One node i ∈ V for each random variable Xi
2. One conditional probability distribution (CPD) per node, p(xi | xPa(i)), specifying the variable’s probability conditioned on its parents’ values

Corresponds 1-1 with a particular factorization of the joint distribution:

p(x1, . . . , xn) = ∏_{i∈V} p(xi | xPa(i))

Powerful framework for designing algorithms to perform probability computations

SLIDE 31

Example

Consider the following Bayesian network over Difficulty (D), Intelligence (I), Grade (G), SAT (S), and Letter (L), with CPDs:

  p(D):  d0 = 0.6,  d1 = 0.4
  p(I):  i0 = 0.7,  i1 = 0.3

  p(S | I):          s0     s1
      i0            0.95   0.05
      i1            0.2    0.8

  p(G | I, D):       g1     g2     g3
      i0, d0        0.3    0.4    0.3
      i0, d1        0.05   0.25   0.7
      i1, d0        0.9    0.08   0.02
      i1, d1        0.5    0.3    0.2

  p(L | G):          l0     l1
      g1            0.1    0.9
      g2            0.4    0.6
      g3            0.99   0.01

What is its joint distribution?

p(x1, . . . , xn) = ∏_{i∈V} p(xi | xPa(i))

p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)
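A minimal sketch (not from the slides) that evaluates the joint probability of one full assignment by multiplying one entry from each CPD, using the values shown in the example above:

```python
# Minimal sketch: p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g).

p_d = {"d0": 0.6, "d1": 0.4}
p_i = {"i0": 0.7, "i1": 0.3}
p_s_given_i = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
p_g_given_id = {
    ("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2},
}
p_l_given_g = {"g1": {"l0": 0.1,  "l1": 0.9},
               "g2": {"l0": 0.4,  "l1": 0.6},
               "g3": {"l0": 0.99, "l1": 0.01}}

def joint(d, i, g, s, l):
    """Product of one entry from each CPD, following the network structure."""
    return (p_d[d] * p_i[i] * p_g_given_id[(i, d)][g]
            * p_s_given_i[i][s] * p_l_given_g[g][l])

print(joint("d0", "i1", "g1", "s1", "l1"))  # 0.6 * 0.3 * 0.9 * 0.8 * 0.9
```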

SLIDE 32

More examples

p(x1, . . . , xn) = ∏_{i∈V} p(xi | xPa(i))

Will my car start this morning?

Heckerman et al., Decision-Theoretic Troubleshooting, 1995

SLIDE 33

More examples

p(x1, . . . , xn) = ∏_{i∈V} p(xi | xPa(i))

What is the differential diagnosis?

Beinlich et al., The ALARM Monitoring System, 1989

SLIDE 34

Bayesian networks are generative models

Naive Bayes model: the label Y is the parent of the features X1, X2, X3, . . . , Xn

Evidence is denoted by shading in a node

Can interpret a Bayesian network as a generative process. For example, to generate an e-mail, we

1. Decide whether it is spam or not spam, by sampling y ∼ p(Y)
2. For each word i = 1 to n, sample xi ∼ p(Xi | Y = y)
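A minimal sketch (not from the slides) of this generative process for a naive Bayes model, with hypothetical parameters and a three-word vocabulary:

```python
# Minimal sketch: ancestral sampling from a naive Bayes model.
import random

p_spam = 0.3                              # p(Y = 1)
p_word_given_y = {0: [0.10, 0.20, 0.05],  # p(X_i = 1 | Y = y), 3-word vocabulary
                  1: [0.60, 0.25, 0.40]}

def sample_email(rng=random):
    # Step 1: sample the label y ~ p(Y).
    y = 1 if rng.random() < p_spam else 0
    # Step 2: sample each word indicator x_i ~ p(X_i | Y = y).
    x = [1 if rng.random() < theta else 0 for theta in p_word_given_y[y]]
    return y, x

print(sample_email())  # e.g. (1, [1, 0, 1])
```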

SLIDE 35

Bayesian network structure implies conditional independencies!

(Student network: Difficulty, Intelligence, Grade, SAT, Letter)

The joint distribution corresponding to the above BN factors as

p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)

However, by the chain rule, any distribution can be written as

p(d, i, g, s, l) = p(d) p(i | d) p(g | i, d) p(s | i, d, g) p(l | d, i, g, s)

Thus, we are assuming the following additional independencies:

D ⊥ I,   S ⊥ {D, G} | I,   L ⊥ {I, D, S} | G.   What else?

SLIDE 36

Bayesian network structure implies conditional independencies!

Generalizing the above arguments, we obtain that a variable is independent from its non-descendants given its parents

Common parent – fixing B decouples A and C (graph: A ← B → C)

Cascade – knowing B decouples A and C (graph: A → B → C)

V-structure – knowing C couples A and B (graph: A → C ← B)

This important phenomenon is called explaining away and is what makes Bayesian networks so powerful

SLIDE 37

A simple justification (for common parent)

(Graph: A ← B → C)

We’ll show that p(A, C | B) = p(A | B) p(C | B) for any distribution p(A, B, C) that factors according to this graph structure, i.e.

p(A, B, C) = p(B) p(A | B) p(C | B)

Proof.

p(A, C | B) = p(A, B, C) / p(B) = p(A | B) p(C | B)

SLIDE 38

D-separation (“directed separated”) in Bayesian networks

Algorithm to calculate whether X ⊥ Z | Y by looking at graph separation

Look to see if there is an active path between X and Z when variables Y are observed:

(Figures (a) and (b): example three-node graphs over X, Y, and Z)

SLIDE 39

D-separation (“directed separated”) in Bayesian networks

Algorithm to calculate whether X ⊥ Z | Y by looking at graph separation

Look to see if there is an active path between X and Z when variables Y are observed:

(Figures (a) and (b): example three-node graphs over X, Y, and Z)

SLIDE 40

D-separation (“directed separated”) in Bayesian networks

Algorithm to calculate whether X ⊥ Z | Y by looking at graph separation

Look to see if there is an active path between X and Z when variables Y are observed:

(Figures (a) and (b): example three-node graphs over X, Y, and Z)

If no such path exists, then X and Z are d-separated with respect to Y

d-separation reduces statistical independencies (hard) to connectivity in graphs (easy)

Important because it allows us to quickly prune the Bayesian network, finding just the relevant variables for answering a query
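One standard way to implement such a test (a sketch, not the course's prescribed algorithm) uses the moralized ancestral graph: X ⊥ Z | Y holds iff X and Z are disconnected after restricting to the ancestors of X ∪ Y ∪ Z, "marrying" co-parents, dropping edge directions, and deleting the observed nodes Y.

```python
# Minimal sketch: d-separation via the moralized ancestral graph.
from collections import deque

def ancestors(parents, nodes):
    """All nodes in `nodes` plus their ancestors, given a node -> parents dict."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, X, Z, Y):
    keep = ancestors(parents, set(X) | set(Z) | set(Y))
    # Build the moralized, undirected graph on the ancestral set.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, []) if p in keep]
        for p in ps:                       # parent-child edges
            adj[v].add(p); adj[p].add(v)
        for a in ps:                       # marry co-parents
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # Delete observed nodes and check connectivity from X to Z.
    observed = set(Y)
    seen = set(X) - observed
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in Z:
            return False                   # active path found
        for u in adj[v] - observed - seen:
            seen.add(u)
            queue.append(u)
    return True

# Student network from the earlier example (node -> list of parents).
parents = {"G": ["I", "D"], "S": ["I"], "L": ["G"], "D": [], "I": []}
print(d_separated(parents, {"D"}, {"I"}, set()))  # True: D and I independent
print(d_separated(parents, {"D"}, {"I"}, {"G"}))  # False: observing G couples them
```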

SLIDE 41

D-separation example 1

(Figure: a Bayesian network over X1, X2, X3, X4, X5, X6)

SLIDE 42

D-separation example 2

(Figure: a Bayesian network over X1, X2, X3, X4, X5, X6)

SLIDE 43

2011 Turing Award was for Bayesian networks

SLIDE 44

Summary

Bayesian networks given by (G, P) where P is specified as a set of local conditional probability distributions associated with G’s nodes

One interpretation of a BN is as a generative model, where variables are sampled in topological order

Local and global independence properties identifiable via d-separation criteria

Computing the probability of any assignment is obtained by multiplying CPDs

Bayes’ rule is used to compute conditional probabilities

Marginalization or inference is often computationally difficult
