
Probabilistic Models

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning


Outline

1. Probabilistic Models
2. Maximum Likelihood Estimation
   - Linear Regression
   - Logistic Regression
3. Maximum A Posteriori Estimation
4. Bayesian Estimation**

1. Probabilistic Models


Predictions based on Probability

In supervised learning, we are given a training set $\mathbb{X} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$.

Model $\mathcal{F}$: a collection of functions parametrized by $\Theta$.

Goal: to train a function $f$ such that, given a new data point $\mathbf{x}_0$, the output value $\hat{y} = f(\mathbf{x}_0; \Theta)$ is closest to the correct label $y_0$.

Examples in $\mathbb{X}$ are usually assumed to be i.i.d. samples of random variables $(\mathbf{x}, \mathrm{y})$ following some data-generating distribution $P(\mathbf{x}, \mathrm{y})$.

In probabilistic models, $f$ is replaced by $P(\mathrm{y} = y \mid \mathbf{x} = \mathbf{x}_0)$ and a prediction is made by
$$\hat{y} = \arg\max_{y} P(\mathrm{y} = y \mid \mathbf{x} = \mathbf{x}_0; \Theta).$$

How do we find $\Theta$? (A sketch of this prediction rule follows below.)
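To make the prediction rule concrete, here is a minimal sketch of $\hat{y} = \arg\max_y P(y \mid \mathbf{x} = \mathbf{x}_0; \Theta)$ over a finite label set. The conditional model `cond_prob` is a hypothetical stand-in for whatever $P(y \mid \mathbf{x}; \Theta)$ is fitted later in these slides.

```python
import numpy as np

def predict(x0, cond_prob, labels):
    """Return argmax_y P(y | x = x0; Theta) over a finite label set."""
    probs = [cond_prob(y, x0) for y in labels]
    return labels[int(np.argmax(probs))]

# Hypothetical conditional model: a logistic-style P(y | x) with fixed Theta = w.
w = np.array([1.0, -2.0])
cond_prob = lambda y, x: 1.0 / (1.0 + np.exp(-y * (w @ x)))  # P(+1|x) + P(-1|x) = 1

print(predict(np.array([0.5, 0.1]), cond_prob, labels=[-1, +1]))  # -> 1
```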


Function (Θ) as Point Estimate

Regard $\Theta$ (equivalently, $f$) as an estimate of the "true" $\Theta^*$ ($f^*$), mapped from the training set $\mathbb{X}$.

Maximum a posteriori (MAP) estimation:
$$\arg\max_{\Theta} P(\Theta \mid \mathbb{X}) = \arg\max_{\Theta} P(\mathbb{X} \mid \Theta)\, P(\Theta),$$
by Bayes' rule ($P(\mathbb{X})$ is irrelevant). MAP solves for $\Theta$ first, then uses it as a constant in $P(y \mid \mathbf{x}; \Theta)$ to get $\hat{y}$.

Maximum likelihood (ML) estimation: $\arg\max_{\Theta} P(\mathbb{X} \mid \Theta)$. It assumes a uniform $P(\Theta)$ and does not prefer any particular $\Theta$.
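To spell out the Bayes'-rule step connecting the two estimators:
$$\arg\max_{\Theta} P(\Theta \mid \mathbb{X}) = \arg\max_{\Theta} \frac{P(\mathbb{X} \mid \Theta)\, P(\Theta)}{P(\mathbb{X})} = \arg\max_{\Theta} P(\mathbb{X} \mid \Theta)\, P(\Theta),$$
since $P(\mathbb{X})$ does not depend on $\Theta$. When $P(\Theta)$ is uniform (a constant), the objective further reduces to $\arg\max_{\Theta} P(\mathbb{X} \mid \Theta)$, i.e., ML is MAP with a flat prior.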

2. Maximum Likelihood Estimation: Linear Regression


Probability Interpretation

Assumption: $y = f^*(\mathbf{x}) + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, \beta^{-1})$.

The unknown deterministic function is defined as $f^*(\mathbf{x}; \mathbf{w}^*) = \mathbf{w}^{*\top}\mathbf{x}$. (All variables are z-normalized, so there is no bias term $b$.)

We then have $(y \mid \mathbf{x}) \sim \mathcal{N}(\mathbf{w}^{*\top}\mathbf{x}, \beta^{-1})$.

So our goal is to find $\mathbf{w}$ as close to $\mathbf{w}^*$ as possible such that
$$\hat{y} = \arg\max_{y} P(y \mid \mathbf{x} = \mathbf{x}; \mathbf{w}) = \mathbf{w}^\top\mathbf{x}.$$
Note that $\hat{y}$ is irrelevant to $\beta$, so we do not need to solve for $\beta$.

ML estimation: $\arg\max_{\mathbf{w}} P(\mathbb{X} \mid \mathbf{w})$.


ML Estimation I

Problem: $\arg\max_{\mathbf{w}} P(\mathbb{X} \mid \mathbf{w})$.

Since we assume i.i.d. samples, we have
$$P(\mathbb{X} \mid \mathbf{w}) = \prod_{i=1}^{N} P(\mathbf{x}^{(i)}, y^{(i)} \mid \mathbf{w}) = \prod_{i=1}^{N} P(y^{(i)} \mid \mathbf{x}^{(i)}, \mathbf{w})\, P(\mathbf{x}^{(i)} \mid \mathbf{w})$$
$$= \prod_{i=1}^{N} P(y^{(i)} \mid \mathbf{x}^{(i)}, \mathbf{w})\, P(\mathbf{x}^{(i)}) = \prod_{i} \mathcal{N}(y^{(i)}; \mathbf{w}^\top\mathbf{x}^{(i)}, \beta^{-1})\, P(\mathbf{x}^{(i)})$$
$$= \prod_{i} \sqrt{\tfrac{\beta}{2\pi}}\, \exp\!\Big(-\tfrac{\beta}{2}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2\Big)\, P(\mathbf{x}^{(i)}).$$

To make the problem tractable, we prefer "sums" over "products". We can instead maximize the log likelihood
$$\arg\max_{\mathbf{w}} \log P(\mathbb{X} \mid \mathbf{w}) = \arg\max_{\mathbf{w}} \log \prod_{i} \sqrt{\tfrac{\beta}{2\pi}}\, \exp\!\Big(-\tfrac{\beta}{2}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2\Big)\, P(\mathbf{x}^{(i)})$$
$$= \arg\max_{\mathbf{w}} \; N\log\sqrt{\tfrac{\beta}{2\pi}} - \tfrac{\beta}{2}\sum_{i}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2 + \sum_{i}\log P(\mathbf{x}^{(i)}).$$
The optimal point does not change since $\log$ is monotonically increasing.


ML Estimation II

$$\arg\max_{\mathbf{w}} \; N\log\sqrt{\tfrac{\beta}{2\pi}} - \tfrac{\beta}{2}\sum_{i}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2 + \sum_{i}\log P(\mathbf{x}^{(i)})$$

Ignoring terms irrelevant to $\mathbf{w}$, we have
$$\arg\min_{\mathbf{w}} \sum_{i}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2.$$

In other words, we seek $\mathbf{w}$ by minimizing the SSE (sum of squared errors), as we have done before, e.g., by the stochastic gradient descent algorithm.

This new perspective explains our ad hoc choice of SSE for empirical risk minimization: checking the assumptions helps us understand when a model works best. (A numerical check of this equivalence follows below.)

It also motivates new models. What would a probabilistic model for classification look like?
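The equivalence above is easy to verify numerically. A minimal sketch, assuming NumPy (the names `w_star` and the synthetic sizes are illustrative, not from the slides): generate data under the slides' assumption $y = \mathbf{w}^{*\top}\mathbf{x} + \varepsilon$, then recover the ML estimate as the SSE minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data under the slides' assumption: y = w*^T x + eps, eps ~ N(0, beta^{-1}).
N, D, beta = 500, 3, 25.0
w_star = rng.normal(size=D)                       # the "true" w*
X = rng.normal(size=(N, D))                       # z-normalized-style features, no bias term
y = X @ w_star + rng.normal(scale=beta ** -0.5, size=N)

# ML estimate = SSE minimizer, here via the ordinary least-squares solver.
w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.abs(w_ml - w_star).max())                # small, and shrinks as N grows (consistency)
```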

2. Maximum Likelihood Estimation: Logistic Regression


Probabilistic Models for Binary Classification

Probabilistic models: $\hat{y} = \arg\max_{y} P(y \mid \mathbf{x}; \Theta)$.

In regression, we assume $(y \mid \mathbf{x}) \sim \mathcal{N}$ (based on $y = f^*(\mathbf{x}) + \varepsilon$). However, the Gaussian distribution is not applicable to binary classification: the values of $y$ should concentrate on either $-1$ or $1$.

Which distribution should we assume? Coin flipping: $(y \mid \mathbf{x}) \sim \mathrm{Bernoulli}(\rho)$, where
$$P(y \mid \mathbf{x}; \rho) = \rho^{y'}(1-\rho)^{(1-y')}, \qquad y' = \frac{y+1}{2}.$$

How do we relate $\mathbf{x}$ to $\rho$?
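As a quick sanity check of the label mapping $y' = (y+1)/2$:
$$P(y = 1 \mid \mathbf{x}; \rho) = \rho^{1}(1-\rho)^{0} = \rho, \qquad P(y = -1 \mid \mathbf{x}; \rho) = \rho^{0}(1-\rho)^{1} = 1-\rho,$$
so $\rho$ is exactly the probability of the positive class.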


Logistic Function

Recall that the logistic function
$$\sigma(z) = \frac{\exp(z)}{\exp(z)+1} = \frac{1}{1+\exp(-z)}$$
is commonly used as a parametrizing function of the Bernoulli distribution. We have $P(y \mid \mathbf{x}; z) = \sigma(z)^{y'}\,(1-\sigma(z))^{(1-y')}$: the larger $z$, the higher the chance of a "positive flip".

How do we relate $\mathbf{x}$ to $z$?
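A small sketch of the logistic function in code (the piecewise form is a standard trick to avoid overflow for large $|z|$; the helper name `sigmoid` is ours, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = exp(z) / (exp(z) + 1) = 1 / (1 + exp(-z)), computed stably."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))   # safe here: exp(-z) <= 1
    ez = np.exp(z[~pos])                       # safe here: exp(z) <= 1
    out[~pos] = ez / (1.0 + ez)
    return out

print(sigmoid(np.array([-6.0, 0.0, 6.0])))     # ~[0.0025, 0.5, 0.9975]: larger z, higher chance of a positive flip
```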


Logistic Regression

In logistic regression, we let $z = \mathbf{w}^\top\mathbf{x}$. Basically, $z$ is the projection of $\mathbf{x}$ along the direction $\mathbf{w}$.

We have $P(y \mid \mathbf{x}; \mathbf{w}) = \sigma(\mathbf{w}^\top\mathbf{x})^{y'}\,[1-\sigma(\mathbf{w}^\top\mathbf{x})]^{(1-y')}$.

Prediction: $\hat{y} = \arg\max_{y} P(y \mid \mathbf{x}; \mathbf{w}) = \mathrm{sign}(\mathbf{w}^\top\mathbf{x})$.

How do we learn $\mathbf{w}$ from $\mathbb{X}$? ML estimation: $\arg\max_{\mathbf{w}} P(\mathbb{X} \mid \mathbf{w})$.


ML Estimation

Log-likelihood:
$$\log P(\mathbb{X} \mid \mathbf{w}) = \log \prod_{i=1}^{N} P(\mathbf{x}^{(i)}, y^{(i)} \mid \mathbf{w}) = \log \prod_{i} P(y^{(i)} \mid \mathbf{x}^{(i)}, \mathbf{w})\, P(\mathbf{x}^{(i)} \mid \mathbf{w})$$
$$\propto \log \prod_{i} \sigma(\mathbf{w}^\top\mathbf{x}^{(i)})^{y'^{(i)}}\,[1-\sigma(\mathbf{w}^\top\mathbf{x}^{(i)})]^{(1-y'^{(i)})} \quad \text{(dropping the additive } \textstyle\sum_{i}\log P(\mathbf{x}^{(i)}) \text{ term)}$$
$$= \sum_{i} \Big( y'^{(i)}\,\mathbf{w}^\top\mathbf{x}^{(i)} - \log\big(1+e^{\mathbf{w}^\top\mathbf{x}^{(i)}}\big) \Big). \; \text{[Homework]}$$

Unlike in linear regression, we cannot solve for $\mathbf{w}$ analytically in closed form via
$$\nabla_{\mathbf{w}} \log P(\mathbb{X} \mid \mathbf{w}) = \sum_{i=1}^{N} \big[y'^{(i)} - \sigma(\mathbf{w}^\top\mathbf{x}^{(i)})\big]\,\mathbf{x}^{(i)} = \mathbf{0}.$$

However, we can still evaluate $\nabla_{\mathbf{w}} \log P(\mathbb{X} \mid \mathbf{w})$ and use iterative methods to solve for $\mathbf{w}$, e.g., stochastic gradient descent. It can be shown that $\log P(\mathbb{X} \mid \mathbf{w})$ is concave in $\mathbf{w}$ [1], so such iterative algorithms converge (see the sketch below).
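A minimal sketch of this iterative scheme, using full-batch gradient ascent rather than SGD for brevity (the data sizes and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Synthetic binary data; y' = (y + 1) / 2 in {0, 1} as on the slides.
N, D = 500, 2
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(N, D))
y_prime = (rng.uniform(size=N) < sigmoid(X @ w_true)).astype(float)

# Ascend the concave log-likelihood: grad = sum_i [y'^(i) - sigma(w^T x^(i))] x^(i).
w, lr = np.zeros(D), 0.5
for _ in range(1000):
    w += lr * ((y_prime - sigmoid(X @ w)) @ X) / N   # averaged gradient step
print(w)                                             # approaches w_true as N grows
```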

3. Maximum A Posteriori Estimation


MAP Estimation

So far, we have solved for $\mathbf{w}$ by ML estimation: $\arg\max_{\mathbf{w}} P(\mathbb{X} \mid \mathbf{w})$.

In MAP estimation, we instead solve
$$\arg\max_{\mathbf{w}} P(\mathbf{w} \mid \mathbb{X}) = \arg\max_{\mathbf{w}} P(\mathbb{X} \mid \mathbf{w})\, P(\mathbf{w}),$$
where $P(\mathbf{w})$ models our preference or prior knowledge about $\mathbf{w}$.


MAP Estimation for Linear Regression

MAP estimation in linear regression: $\arg\max_{\mathbf{w}} \log[P(\mathbb{X} \mid \mathbf{w})\, P(\mathbf{w})]$.

If we assume that $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \beta^{-1}\mathbf{I})$, then
$$\log[P(\mathbb{X} \mid \mathbf{w})\, P(\mathbf{w})] = \log P(\mathbb{X} \mid \mathbf{w}) + \log P(\mathbf{w})$$
$$\propto -\sum_{i}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2 + \log\!\left( \sqrt{\tfrac{1}{(2\pi)^D \det(\beta^{-1}\mathbf{I})}}\, \exp\!\Big[-\tfrac{1}{2}(\mathbf{w}-\mathbf{0})^\top(\beta^{-1}\mathbf{I})^{-1}(\mathbf{w}-\mathbf{0})\Big] \right)$$
$$\propto -\sum_{i}\big(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\big)^2 - \beta\,\mathbf{w}^\top\mathbf{w}.$$

$P(\mathbf{w})$ corresponds to the weight-decay term in ridge regression (see the sketch below). MAP estimation thus provides a way to design complicated yet interpretable regularization terms: e.g., we get LASSO by letting $P(\mathbf{w}) \sim \mathrm{Laplace}(0, b)$ [Proof], and we can also let $P(\mathbf{w})$ be a mixture of Gaussians.
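A sketch of the resulting estimator. Up to constants, MAP here is ridge regression, $\arg\min_{\mathbf{w}} \sum_i (y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)})^2 + \lambda\,\mathbf{w}^\top\mathbf{w}$, whose minimizer has the closed form below (the weight `lam` absorbs the noise and prior precisions; the toy sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Small-N regression problem, where the Gaussian prior w ~ N(0, beta^{-1} I) helps.
N, D, lam = 20, 5, 1.0
w_star = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = X @ w_star + rng.normal(scale=0.3, size=N)

# MAP / ridge closed form: w = (X^T X + lam * I)^{-1} X^T y.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)
print(w_map)          # letting lam -> 0 recovers the ML (plain SSE) solution
```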


Remarks on ML and MAP Estimation

Theorem (Consistency). The ML estimator $\Theta_{\mathrm{ML}}$ is consistent, i.e., $\Theta_{\mathrm{ML}} \xrightarrow{\Pr} \Theta^*$ as $N \to \infty$, as long as the "true" $P(y \mid \mathbf{x}; \Theta^*)$ lies within our model $\mathcal{F}$.

Theorem (Cramér-Rao Lower Bound [2]). At a fixed (large) number $N$ of examples, no consistent estimator of $\Theta^*$ has a lower expected MSE (mean squared error) than the ML estimator $\Theta_{\mathrm{ML}}$. That is, $\Theta_{\mathrm{ML}}$ has a low sample complexity (or is statistically efficient).

ML estimation is popular due to its consistency and efficiency. When $N$ is small enough that overfitting occurs, we can use MAP estimation to introduce bias and reduce variance.

4. Bayesian Estimation**


Bayesian Estimation

In ML/MAP estimation, we solve for $\Theta$ first, then use it as a constant to make predictions: $\hat{y} = \arg\max_{y} P(y \mid \mathbf{x}; \Theta)$.

Bayesian estimation treats $\Theta$ as a random variable:
$$\hat{y} = \arg\max_{y} P(y \mid \mathbf{x}, \mathbb{X}) = \arg\max_{y} \int P(y, \Theta \mid \mathbf{x}, \mathbb{X})\, d\Theta.$$
It makes predictions by considering all $\Theta$'s, weighted by their chances.

Bayesian estimation usually generalizes much better when the size $N$ of the training set is small. (A worked linear-regression sketch follows below.)
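For the linear-Gaussian model of the earlier slides, the integral is tractable. A sketch under the standard conjugate setup, with an assumed prior $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \alpha^{-1}\mathbf{I})$ and noise precision $\beta$ (the hyperparameter values are illustrative): the posterior $P(\mathbf{w} \mid \mathbb{X})$ stays Gaussian, and the predictive distribution at a new $\mathbf{x}_0$ averages over all $\mathbf{w}$'s in closed form.

```python
import numpy as np

rng = np.random.default_rng(3)

alpha, beta, N, D = 1.0, 25.0, 20, 3
w_star = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = X @ w_star + rng.normal(scale=beta ** -0.5, size=N)

# Posterior P(w | X) = N(m, S): Gaussian prior x Gaussian likelihood stays Gaussian.
S = np.linalg.inv(alpha * np.eye(D) + beta * X.T @ X)   # posterior covariance
m = beta * S @ X.T @ y                                  # posterior mean

# Predictive P(y | x0, X) = N(m^T x0, 1/beta + x0^T S x0): every w contributes,
# weighted by its posterior probability, instead of a single point estimate.
x0 = rng.normal(size=D)
print(m @ x0, 1.0 / beta + x0 @ S @ x0)
```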


Bayesian vs. ML Estimation

Example: polynomial regression. [Figure: the red line shows predictions by the Bayesian-estimation regressor; the shaded area shows predictions by ML/MAP-estimation regressors.]


Bayesian vs. MAP Estimation

MAP gains some of the benefit of the Bayesian approach by incorporating the prior, trading $\mathrm{bias}(\Theta_{\mathrm{MAP}})$ for a reduced $\mathrm{Var}_{\mathbb{X}}(\Theta_{\mathrm{MAP}})$ when the training set is small.

However, this does not work if $\Theta_{\mathrm{MAP}}$ is unrepresentative of the majority of $\Theta$'s in $\int P(y, \Theta \mid \mathbf{x}, \mathbb{X})\, d\Theta$, e.g., when $P(\Theta \mid \mathbb{X})$ is a mixture of Gaussians.


Remarks

Bayesian estimation:
$$\hat{y} = \arg\max_{y} P(y \mid \mathbf{x}, \mathbb{X}) = \arg\max_{y} \int P(y, \Theta \mid \mathbf{x}, \mathbb{X})\, d\Theta.$$

It usually generalizes much better given a small training set. Unfortunately, the solution may not be tractable in many applications; even when tractable, it incurs a high computation cost and is not suitable for large-scale learning tasks.

References

[1] Deepak Roy Chittajallu. Why is the error function minimized in logistic regression convex? http://mathgotchas.blogspot.tw/2011/10/why-is-error-function-minimized-in.html, 2011.

[2] Harald Cramér. Mathematical Methods of Statistics. Princeton University Press, 1946.