Basics of Model-Based Learning
Michael Gutmann
Probabilistic Modelling and Reasoning (INFR11134)
School of Informatics, University of Edinburgh
Spring semester 2018
Recap

p(x|y_o) = ∑_z p(x, y_o, z) / ∑_{x,z} p(x, y_o, z)

Assume that x, y, z are each d = 500 dimensional, and that each element of the vectors can take K = 10 values.
◮ Issue 1: To specify p(x, y, z), we need to specify K^{3d} − 1 = 10^{1500} − 1 non-negative numbers, which is impossible.

Topic 1: Representation
What reasonably weak assumptions can we make to efficiently represent p(x, y, z)?
◮ Directed and undirected graphical models, factor graphs
◮ Factorisation and independencies
Recap

p(x|y_o) = ∑_z p(x, y_o, z) / ∑_{x,z} p(x, y_o, z)

◮ Issue 2: The sum in the numerator goes over the order of K^d = 10^{500} non-negative numbers and the sum in the denominator over the order of K^{2d} = 10^{1000}, which is impossible to compute.

Topic 2: Exact inference
Can we further exploit the assumptions on p(x, y, z) to efficiently compute the posterior probability or derived quantities?
◮ Yes! Factorisation can be exploited by using the distributive law and by caching computations.
◮ Variable elimination and sum/max-product message passing
◮ Inference for hidden Markov models
Recap

p(x|y_o) = ∑_z p(x, y_o, z) / ∑_{x,z} p(x, y_o, z)

◮ Issue 3: Where do the non-negative numbers p(x, y, z) come from?

Topic 3: Learning
How can we learn the numbers from data?
Program
- 1. Basic concepts
- 2. Learning by maximum likelihood estimation
- 3. Learning by Bayesian inference
Program
- 1. Basic concepts
Observed data as a sample drawn from an unknown data generating distribution
Probabilistic, statistical, and Bayesian models
Partition function and unnormalised statistical models
Learning = parameter estimation or learning = Bayesian inference
- 2. Learning by maximum likelihood estimation
- 3. Learning by Bayesian inference
Learning from data
◮ Use observed data D to learn about their source
◮ Enables probabilistic inference, decision making, . . .
[Diagram: a data source with unknown properties produces an observation in the data space, from which we gain insight.]
Data
◮ We typically assume that the observed data D correspond to a random sample (draw) from an unknown distribution p*(D),

D ∼ p*(D)

◮ In other words, we consider the data D to be a realisation (observation) of a random variable with distribution p*.
Data
◮ Example: You use some transition and emission distributions and generate data from the hidden Markov model using ancestral sampling.

[DAG: hidden chain h1 → h2 → h3 → h4 with emissions h_i → v_i]

◮ You know the visibles (v1, v2, v3, . . . , vT) ∼ p(v1, . . . , vT).
◮ You give the generated visibles to a friend who does not know about the distributions that you used, nor possibly that you used an HMM. For your friend:

D = (v1, v2, v3, . . . , vT)    D ∼ p*(D)
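A minimal sketch of this setup, assuming a hypothetical two-state HMM with three observation symbols (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM with three observation symbols.
initial = np.array([0.6, 0.4])          # p(h1)
transition = np.array([[0.7, 0.3],      # p(h_t | h_{t-1})
                       [0.2, 0.8]])
emission = np.array([[0.5, 0.4, 0.1],   # p(v_t | h_t)
                     [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Ancestral sampling: draw h1, then alternately emit v_t and move to h_{t+1}."""
    h = rng.choice(2, p=initial)
    visibles = []
    for _ in range(T):
        visibles.append(rng.choice(3, p=emission[h]))  # v_t | h_t
        h = rng.choice(2, p=transition[h])             # h_{t+1} | h_t
    return visibles

# The friend only sees D = (v1, ..., vT), a draw from some unknown p*(D).
D = sample_hmm(T=50)
```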
Independent and identically distributed (iid) data
◮ Let D = {x1, . . . , xn}. If

p*(D) = ∏_{i=1}^{n} p*(x_i)

then the data (or the corresponding random variables) are said to be iid. D is also said to be a random sample from p*.
◮ In other words, the xi were independently drawn from the
same distribution p∗(x).
◮ Example: n time series (v1, v2, v3, . . . , vT) each independently
generated with the same transition and emission distribution.
Independent and identically distributed (iid) data
◮ Example: For a distribution

p(x1, x2, x3, x4, x5) = p(x1)p(x2)p(x3|x1, x2)p(x4|x3)p(x5|x2)

with known conditional probabilities, you run ancestral sampling n times.
◮ You record the n observed values of x4, i.e.

x_4^{(1)}, . . . , x_4^{(n)}

and give them to a friend who does not know how you generated the data, but who does know that they are iid.

[DAG: x1 → x3 ← x2, x3 → x4, x2 → x5]

◮ For your friend, the x_4^{(i)} are data points x_i ∼ p*.
◮ Remark: if the subscript index is occupied, we often use superscripts to enumerate the data points.
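A similar sketch for this example, with made-up conditional probabilities: each ancestral-sampling run follows the factorisation, and only x4 is recorded, so the recorded values are iid draws from the marginal of x4:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_x4():
    """One run through p(x1)p(x2)p(x3|x1,x2)p(x4|x3)p(x5|x2); binary variables."""
    x1 = rng.random() < 0.3                            # p(x1 = 1)
    x2 = rng.random() < 0.6                            # p(x2 = 1)
    x3 = rng.random() < (0.9 if x1 and x2 else 0.2)    # p(x3 = 1 | x1, x2)
    x4 = rng.random() < (0.7 if x3 else 0.1)           # p(x4 = 1 | x3)
    x5 = rng.random() < (0.5 if x2 else 0.4)           # p(x5 = 1 | x2), not recorded
    return int(x4)

# n independent runs; the recorded x4 values are iid data points.
data = [sample_x4() for _ in range(100)]
```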
Using models to learn from data
◮ Set up a model with potential properties θ (parameters)
◮ See which θ are in line with the observed data D
[Diagram: as before, but now the observation in the data space is compared with data generated by the model M(θ); learning identifies the θ in line with the observation.]
Models
◮ The term “model” has multiple meanings, see e.g.
https://en.wikipedia.org/wiki/Model
◮ In our course:
◮ probabilistic model
◮ statistical model
◮ Bayesian model
◮ See Section 3 in the background document Introduction to
Probabilistic Modelling
◮ Note: the three types are often confounded, and often just
called probabilistic or statistical model, or just “model”.
Probabilistic model
Example from the first lecture: cognitive impairment test
◮ Sensitivity of 0.8 and specificity of 0.95 (Scharre, 2010)
◮ Probabilistic model for presence of impairment (x = 1) and detection by the test (y = 1):

Pr(x = 1) = 0.11    (prior)
Pr(y = 1|x = 1) = 0.8    (sensitivity)
Pr(y = 0|x = 0) = 0.95    (specificity)

(Example from sagetest.osu.edu)
◮ From first lecture:
A probabilistic model is an abstraction of reality that uses probability theory to quantify the chance of uncertain events.
Probabilistic model
◮ More technically:
probabilistic model ≡ probability distribution (pmf/pdf).
◮ The probabilistic model was written in terms of the probability measure Pr. In terms of the pmf it is

p_x(1) = 0.11    p_{y|x}(1|1) = 0.8    p_{y|x}(0|0) = 0.95

◮ Commonly written as

p(x = 1) = 0.11    p(y = 1|x = 1) = 0.8    p(y = 0|x = 0) = 0.95

where the notation for the probability measure Pr and the pmf p are confounded.
Statistical model
◮ If we substitute the numbers with parameters, we obtain a (parametric) statistical model

p(x = 1) = θ1    p(y = 1|x = 1) = θ2    p(y = 0|x = 0) = θ3

◮ For each value of the θ_i, we obtain a different pmf. The dependency is highlighted by writing

p(x = 1; θ1) = θ1    p(y = 1|x = 1; θ2) = θ2    p(y = 0|x = 0; θ3) = θ3

◮ Or: p(x, y; θ) where θ = (θ1, θ2, θ3) is a vector of parameters.
◮ A statistical model corresponds to a set of probabilistic models indexed by the parameters: {p(x; θ)}_θ
Bayesian model
◮ In Bayesian models, we combine statistical models with a (prior) probability distribution on the parameters θ.
◮ Each member of the family {p(x; θ)}_θ is considered a conditional pmf/pdf of x given θ.
◮ Use conditioning notation p(x|θ).
◮ The conditional p(x|θ) and the pmf/pdf p(θ) for the (prior) distribution of θ together specify the joint distribution (product rule):

p(x, θ) = p(x|θ)p(θ)

◮ Bayesian model for x = probabilistic model for (x, θ).
◮ The prior may be parametrised, e.g. p(θ; α). The parameters α are called "hyperparameters".
Graphical models as statistical models
◮ Directed or undirected graphical models are sets of probability distributions, e.g. all p that factorise as

p(x) = ∏_i p(x_i|pa_i)    or    p(x) ∝ ∏_i φ_i(X_i)

They are thus statistical models.
◮ If we consider parametric families for p(x_i|pa_i) and φ_i(X_i), they correspond to parametric statistical models

p(x; θ) = ∏_i p(x_i|pa_i; θ_i)    or    p(x; θ) ∝ ∏_i φ_i(X_i; θ_i)

where θ = (θ1, θ2, . . .).
Cancer-asbestos-smoking example (Barber Figure 9.4)
◮ Very simple toy example about the relationship between lung Cancer, Asbestos exposure, and Smoking.

[DAG: a → c ← s]

◮ Factorisation: p(c, a, s) = p(c|a, s)p(a)p(s)
◮ Parametric models (for binary variables):

p(a = 1; θ_a) = θ_a    p(s = 1; θ_s) = θ_s

a  s  p(c = 1|a, s)
0  0  θ¹_c
1  0  θ²_c
0  1  θ³_c
1  1  θ⁴_c

All parameters are ≥ 0.
◮ Factorisation + parametric models for the factors gives the parametric statistical model

p(c, a, s; θ) = p(c|a, s; θ¹_c, . . . , θ⁴_c)p(a; θ_a)p(s; θ_s)
Cancer-asbestos-smoking example
◮ The model specification p(a = 1; θ_a) = θ_a is equivalent to

p(a; θ_a) = θ_a^a (1 − θ_a)^{1−a} = θ_a^{✶(a=1)} (1 − θ_a)^{✶(a=0)}

Note: the subscript "a" of θ_a is used to label θ and is not a variable.
◮ a is a Bernoulli random variable with "success" probability θ_a.
◮ Equivalently for s.
Cancer-asbestos-smoking example
◮ The table parametrisation p(c|a, s; θ¹_c, . . . , θ⁴_c) can be written in a similar form.
◮ Enumerate the states of the parents of c so that

pa_c = 1 ⇔ (a = 0, s = 0)    . . .    pa_c = 4 ⇔ (a = 1, s = 1)

◮ We then have

p(c|a, s; θ¹_c, . . . , θ⁴_c) = ∏_{j=1}^{4} [(θ^j_c)^c (1 − θ^j_c)^{1−c}]^{✶(pa_c=j)}
                             = ∏_{j=1}^{4} (θ^j_c)^{✶(c=1, pa_c=j)} (1 − θ^j_c)^{✶(c=0, pa_c=j)}

Product over the possible states of the parents and the possible states of c.
Cancer-asbestos-smoking example
◮ Working with the table representation does not shrink the set of probabilistic models here, i.e. for binary variables

{p(c, a, s) : p(c, a, s) = p(c|a, s)p(a)p(s)} = {p(c, a, s; θ) : parametrised as before}

◮ Other parametric models are possible too:
◮ as before but with some parameters tied, e.g. θ²_c = θ³_c
◮ p(c = 1|a, s) = σ(w0 + w1 a + w2 s) where σ() is the sigmoid function (see tutorial 2, and the sketch below)

In both cases, the parametrisation limits the space of possible probabilistic models.
(see slides Basic Assumptions for Efficient Model Representation)
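For illustration, a minimal sketch of the sigmoid parametrisation with hypothetical weights: three parameters (w0, w1, w2) determine all four entries of the table p(c = 1|a, s), so the table can no longer be filled in freely:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Hypothetical weights; the numbers are made up for illustration.
w0, w1, w2 = -2.0, 1.5, 1.0

# All four table entries are induced by the same three weights.
for a in (0, 1):
    for s in (0, 1):
        print(a, s, sigmoid(w0 + w1 * a + w2 * s))
```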
Cancer-asbestos-smoking example
◮ We can turn the table-based parametric model into a Bayesian model by assigning a (prior) probability distribution to θ.
◮ Often we assume independence of the parameters so that the prior pdf/pmf factorises, e.g.

p(θ) = p(θ_a)p(θ_s) ∏_{j=1}^{4} p(θ^j_c)

◮ With the correspondence p(x; θ) = p(x|θ), the Bayesian model is

p(x, θ) = p(x|θ)p(θ)
        = θ_a^{✶(a=1)} (1 − θ_a)^{✶(a=0)} p(θ_a) · θ_s^{✶(s=1)} (1 − θ_s)^{✶(s=0)} p(θ_s) · ∏_{j=1}^{4} (θ^j_c)^{✶(c=1, pa_c=j)} (1 − θ^j_c)^{✶(c=0, pa_c=j)} p(θ^j_c)

◮ Note the factorisation.
Program
- 1. Basic concepts
Observed data as a sample drawn from an unknown data generating distribution
Probabilistic, statistical, and Bayesian models
Partition function and unnormalised statistical models
Learning = parameter estimation or learning = Bayesian inference
- 2. Learning by maximum likelihood estimation
- 3. Learning by Bayesian inference
Partition function
◮ pdfs/pmfs integrate/sum to one.
◮ Parametrised Gibbs distributions

p(x; θ) ∝ ∏_i φ_i(X_i; θ_i)

typically do not integrate/sum to one.
◮ For normalisation, we can divide the unnormalised model p̃(x; θ) = ∏_i φ_i(X_i; θ_i) by the partition function Z(θ),

Z(θ) = ∫ p̃(x; θ) dx    or    Z(θ) = ∑_x p̃(x; θ)

◮ By construction,

p(x; θ) = p̃(x; θ) / Z(θ)

sums/integrates to one for all values of θ.
Unnormalised statistical models
◮ If each element of {p(x; θ)}_θ integrates/sums to one,

∫ p(x; θ) dx = 1    or    ∑_x p(x; θ) = 1

for all θ, we say that the statistical model is normalised.
◮ If not, the statistical model is unnormalised.
◮ Undirected graphical models generally correspond to unnormalised models.
◮ Unnormalised models can always be normalised by means of the partition function.
◮ But: the partition function may be hard to evaluate, which is an issue for likelihood-based learning (see later).
Reading off the partition function from a normalised model
◮ Consider

p̃(x; θ) = exp(−(1/2) x^⊤ Σ^{−1} x)

where x ∈ R^m and Σ is symmetric.
◮ The parameters θ are the lower (or upper) triangular part of Σ, including the diagonal.
◮ Corresponds to an unnormalised Gaussian.
◮ The partition function can be computed in closed form:

Z(θ) = |det 2πΣ|^{1/2}    p(x; θ) = |det 2πΣ|^{−1/2} exp(−(1/2) x^⊤ Σ^{−1} x)

◮ This also means that given a normalised model p(x; θ), you can read off the partition function as the inverse of the part that does not depend on x, i.e. you can split a normalised p(x; θ) into an unnormalised model and the partition function:

p(x; θ) → p(x; θ) = p̃(x; θ) / Z(θ)
The domain matters
◮ Consider

p̃(x; θ) = exp(−(1/2) x^⊤ A x)

where x ∈ {0, 1}^m and A is symmetric.
◮ The parameters θ are the lower (or upper) triangular part of A, including the diagonal.
◮ The model is known as the Ising model or Boltzmann machine (see Tutorial 2).
◮ Differences to the previous slide:
◮ notation/parametrisation: A vs Σ^{−1} (does not matter)
◮ x ∈ {0, 1}^m vs x ∈ R^m (does matter!)
◮ The partition function is defined via a sum rather than an integral:

Z(θ) = ∑_{x ∈ {0,1}^m} exp(−(1/2) x^⊤ A x)

◮ There is no analytical expression for Z(θ). It is expensive to compute if m is large, as the brute-force sketch below illustrates.
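A brute-force sketch of this sum, assuming a random symmetric A made up for illustration; enumerating all 2^m binary vectors is feasible only for small m:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

m = 12                                   # 2^12 = 4096 states; the count doubles per extra dimension
A = rng.standard_normal((m, m))
A = (A + A.T) / 2                        # make A symmetric

# Partition function: sum exp(-x^T A x / 2) over all binary vectors x.
Z = sum(np.exp(-0.5 * np.array(x) @ A @ np.array(x))
        for x in itertools.product((0, 1), repeat=m))
print(Z)
```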
Learning
We consider two approaches to learning:
- 1. Learning with statistical models = parameter estimation
(or: estimation of the model)
- 2. Learning with Bayesian models = Bayesian inference
Learning with statistical models = parameter estimation
◮ We use data to pick one element p(x; θ̂) from the set of probabilistic models {p(x; θ)}_θ:

{p(x; θ)}_θ  --data D-->  p(x; θ̂)

◮ In other words, we use data to select the estimate θ̂ from the possible values of the parameters θ.
Learning with Bayesian models = Bayesian inference
◮ We use data to determine the plausibility (posterior pdf/pmf) of all possible values of the parameters θ:

p(x|θ)p(θ)  --data D-->  p(θ|D)

◮ Instead of picking one value from the set of possible values of θ, we here assess all of them.
◮ Reduces learning to inference.
◮ "Inverts" the data generating process.

[DAGs: θ → D (general case); θ → x1, x2, x3, . . . (iid data)]
Predictive distribution
◮ Given data D, we would like to predict the next value x.
◮ If we take the parameter estimation approach, the predictive distribution is p(x; θ̂).
◮ In the Bayesian inference approach, we compute

p(x|D) = ∫ p(x, θ|D) dθ
       = ∫ p(x|θ, D)p(θ|D) dθ
       = ∫ p(x|θ)p(θ|D) dθ    (if x ⊥⊥ D | θ)

[Visualisation as a DAG: θ → D, θ → x]

An average of the predictions p(x|θ), weighted by p(θ|D).
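A minimal Monte Carlo sketch of this average, assuming the conjugate Beta-Bernoulli model treated later in these slides: draw θ^{(i)} ∼ p(θ|D) and average the predictions p(x = 1|θ^{(i)}) = θ^{(i)}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta-Bernoulli setup (made-up numbers): with a Beta(a0, b0) prior and
# n1 ones / n0 zeros observed, the posterior is Beta(a0 + n1, b0 + n0).
a0, b0, n1, n0 = 2.0, 2.0, 3, 7

theta = rng.beta(a0 + n1, b0 + n0, size=100_000)  # theta^(i) ~ p(theta | D)
p_next_is_1 = theta.mean()                        # average of p(x = 1 | theta^(i))

# Closed-form check: the posterior mean (a0 + n1)/(a0 + b0 + n1 + n0).
print(p_next_is_1, (a0 + n1) / (a0 + b0 + n1 + n0))
```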
Some methods for parameter estimation
◮ There is a multitude of methods to estimate the parameters.
◮ Many correspond to solving an optimisation problem, e.g.

θ̂ = argmax_θ J(θ, D)

for some objective function J. This is called M-estimation in the statistics literature.
◮ Maximum likelihood estimation (MLE) is popular (see next).
◮ Moment matching: identify the parameter configuration where the moments under the model are equal to the moments computed from the data (empirical moments).
◮ Maximum-a-posteriori estimation means estimating θ by computing the maximiser of the posterior: θ̂ = argmax_θ p(θ|D).
◮ Score matching is a method suitable for unnormalised models (Gibbs distributions), see later.
Program
- 1. Basic concepts
Observed data as a sample drawn from an unknown data generating distribution
Probabilistic, statistical, and Bayesian models
Partition function and unnormalised statistical models
Learning = parameter estimation or learning = Bayesian inference
- 2. Learning by maximum likelihood estimation
- 3. Learning by Bayesian inference
Program
- 1. Basic concepts
- 2. Learning by maximum likelihood estimation
The likelihood function and the maximum likelihood estimate
MLE for Gaussian, Bernoulli, and fully observed directed graphical models of discrete random variables
Maximum likelihood estimation is a form of moment matching
The likelihood function is informative and more than just an objective function to optimise
- 3. Learning by Bayesian inference
The likelihood function L(θ)
◮ Measures agreement between θ and the observed data D
◮ Probability that sampling from the model with parameter value θ generates data like D
◮ Exact match for discrete random variables

[Diagram: the model M(θ) generates data in the data space; the likelihood measures how probable it is to hit the observation exactly.]
The likelihood function L(θ)
◮ Measures agreement between θ and the observed data D
◮ Probability that sampling from the model with parameter value θ generates data like D
◮ Small neighbourhood for continuous random variables

[Diagram: as before, but the model now needs to generate data within an ε-neighbourhood of the observation.]
The likelihood function L(θ)
◮ Probability that the model generates data like D for parameter value θ:

L(θ) = p(D; θ)

where p(D; θ) is the parametrised model pdf/pmf.
◮ The likelihood function indicates the likelihood of the parameter values, and not of the data.
◮ For iid data x1, . . . , xn:

L(θ) = ∏_{i=1}^{n} p(x_i; θ)

◮ Log-likelihood function ℓ(θ) = log L(θ). For iid data:

ℓ(θ) = ∑_{i=1}^{n} log p(x_i; θ)
Maximum likelihood estimate
◮ The maximum likelihood estimate (MLE) is

θ̂ = argmax_θ ℓ(θ) = argmax_θ L(θ)

◮ Numerical methods are typically needed for the optimisation.
◮ We typically only find local optima (sub-optimal but often useful).
◮ In simple cases, a closed-form solution is possible.
Gaussian example
◮ Model:

p(x; θ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))    θ = (µ, σ²)

◮ Data D: n iid observations x1, . . . , xn
◮ Log-likelihood function:

ℓ(θ) = ∑_{i=1}^{n} log p(x_i; θ) = −(1/(2σ²)) ∑_{i=1}^{n} (x_i − µ)² − (n/2) log(2πσ²)

◮ Maximum likelihood estimates (see tutorial 7):

µ̂ = (1/n) ∑_{i=1}^{n} x_i    σ̂² = (1/n) ∑_{i=1}^{n} (x_i − µ̂)²
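A quick numerical check of these closed-form estimates, assuming synthetic data with made-up true parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with made-up true parameters mu = 2, sigma^2 = 9.
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_hat = x.mean()                          # (1/n) sum_i x_i
sigma2_hat = ((x - mu_hat) ** 2).mean()    # (1/n) sum_i (x_i - mu_hat)^2

print(mu_hat, sigma2_hat)                  # close to 2 and 9 for large n
```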
Bernoulli example
◮ Model (for x ∈ {0, 1}):

p(x; θ) = θ^x (1 − θ)^{1−x} = θ^{✶(x=1)} (1 − θ)^{✶(x=0)}

◮ Equivalent to p(x = 1; θ) = θ, or the table

x        0      1
p(x; θ)  1 − θ  θ

◮ Data D: n iid observations x1, . . . , xn
◮ Log-likelihood function:

ℓ(θ) = ∑_{i=1}^{n} log p(x_i; θ) = ∑_{i=1}^{n} x_i log(θ) + (1 − x_i) log(1 − θ)
Bernoulli example
Log-likelihood function:

ℓ(θ) = ∑_{i=1}^{n} x_i log(θ) + (1 − x_i) log(1 − θ) = n_{x=1} log(θ) + n_{x=0} log(1 − θ)

where n_{x=1} is the number of times x_i = 1, i.e.

n_{x=1} = ∑_{i=1}^{n} x_i = ∑_{i=1}^{n} ✶(x_i = 1)

and n_{x=0} = n − n_{x=1} is the number of times x_i = 0, i.e.

n_{x=0} = ∑_{i=1}^{n} (1 − x_i) = ∑_{i=1}^{n} ✶(x_i = 0)
Bernoulli example
◮ Optimisation problem:

θ̂ = argmax_{θ∈[0,1]} n_{x=1} log(θ) + n_{x=0} log(1 − θ)

a constrained optimisation problem.
◮ Reformulation as an unconstrained optimisation problem: write

η = g(θ) = log(θ/(1 − θ))    θ = g^{−1}(η) = exp(η)/(1 + exp(η))

Note: η ∈ R.
◮ With log(θ) = η − log(1 + exp(η)), log(1 − θ) = −log(1 + exp(η)), and n_{x=1} + n_{x=0} = n, we have

η̂ = argmax_η n_{x=1} η − n log(1 + exp(η))

◮ Because g(θ) is an invertible function, θ̂ = g^{−1}(η̂).
Bernoulli example
◮ Setting the derivative with respect to η to zero gives the necessary condition

n_{x=1} − n exp(η)/(1 + exp(η)) = 0    ⇔    n_{x=1}/n = exp(η)/(1 + exp(η))

The second derivative is negative for all η, so the maximiser η̂ satisfies

n_{x=1}/n = exp(η̂)/(1 + exp(η̂))

Hence:

θ̂ = g^{−1}(η̂) = exp(η̂)/(1 + exp(η̂)) = n_{x=1}/n

◮ Corresponds to counting: n_{x=1}/n is the fraction of ones in the observed data x1, . . . , xn.
◮ Note: the same result could have been obtained here by differentiating ℓ(θ) with respect to θ.
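A minimal sketch of the reparametrised optimisation on synthetic data: gradient ascent on η recovers the counting solution n_{x=1}/n.

```python
import numpy as np

rng = np.random.default_rng(0)

x = (rng.random(1000) < 1/3).astype(int)   # synthetic Bernoulli(1/3) data
n, n1 = len(x), x.sum()

eta = 0.0
for _ in range(2000):
    grad = n1 - n * np.exp(eta) / (1 + np.exp(eta))  # gradient of n1*eta - n*log(1 + exp(eta))
    eta += 0.1 * grad / n                            # small step, scaled by n

theta_hat = np.exp(eta) / (1 + np.exp(eta))          # map back: theta = g^{-1}(eta)
print(theta_hat, n1 / n)                             # matches the counting estimate
```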
Invariance of the MLE to re-parametrisation
◮ We re-parametrised the likelihood function using η = log(θ/(1 − θ)).
◮ This generalises: for η = g(θ), where g is invertible, we can optimise

J(η) = ℓ(g^{−1}(η))

instead of ℓ(θ).
◮ This is because

max_η J(η) = max_θ ℓ(θ)    argmax_θ ℓ(θ) = g^{−1}(argmax_η J(η))

◮ This sometimes simplifies the optimisation.
Cancer-asbestos-smoking example
[DAG: a → c ← s]

◮ Statistical model:

p(c, a, s; θ) = p(c|a, s; θ¹_c, . . . , θ⁴_c)p(a; θ_a)p(s; θ_s)

with p(a = 1; θ_a) = θ_a, p(s = 1; θ_s) = θ_s, and p(c = 1|a, s; θ¹_c, . . . , θ⁴_c) given by the table

a  s  p(c = 1|a, s)
0  0  θ¹_c
1  0  θ²_c
0  1  θ³_c
1  1  θ⁴_c

◮ Data D: n iid observations x1, . . . , xn, where x_i = (a_i, s_i, c_i)
◮ The MLE of the parameters is again given by the fraction of occurrences. (see tutorial 7)
Cancer-asbestos-smoking example
◮ The random variables a and s are Bernoulli distributed, so their parameters are estimated as before.
◮ For the parameters of the conditional p(c|a, s),

p̂(c = 1|a = 0, s = 0) = θ̂¹_c = ∑_{i=1}^{n} ✶(c_i = 1, a_i = 0, s_i = 0) / ∑_{i=1}^{n} ✶(a_i = 0, s_i = 0)

and equivalently for the other parameters.
◮ Denominator: the number of data points that satisfy the specifications (constraints) given by the conditioning set.
◮ The estimate is the fraction of times c = 1 among the data points that satisfy the constraints given by the conditioning set.
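A minimal sketch of these counting estimates, assuming synthetic (a, s, c) data generated from made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Made-up ground-truth parameters for the toy DAG a -> c <- s.
a = (rng.random(n) < 0.3).astype(int)
s = (rng.random(n) < 0.4).astype(int)
p_c1 = {(0, 0): 0.05, (1, 0): 0.4, (0, 1): 0.3, (1, 1): 0.7}
c = np.array([int(rng.random() < p_c1[(ai, si)]) for ai, si in zip(a, s)])

# MLE by counting: fraction of c = 1 among data points with matching (a, s).
for (av, sv), true_val in p_c1.items():
    mask = (a == av) & (s == sv)
    print((av, sv), c[mask].mean(), "(true:", true_val, ")")
```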
Maximum likelihood as moment matching
◮ Likelihood of θ: probability that sampling from the model with parameter value θ generates data like the observed data D.
◮ MLE: the parameter configuration for which the probability to generate similar data is highest.
◮ Alternative interpretation: the parameter configuration for which some specific moments under the model are equal to the empirical moments.
◮ With

p(x; θ) = p̃(x; θ) / Z(θ)

the MLE θ̂ satisfies

∫ m(x; θ̂)p(x; θ̂) dx = (1/n) ∑_{i=1}^{n} m(x_i; θ̂)

where the "moments" m(x; θ) are m(x; θ) = ∇_θ log p̃(x; θ).
Maximum likelihood as moment matching (proof)
A necessary condition for the MLE to satisfy is

∇_θ ℓ(θ)|_{θ̂} = 0

We can write the gradient of the log-likelihood function as follows:

∇_θ ℓ(θ) = ∇_θ ∑_{i=1}^{n} log p(x_i; θ)
         = ∇_θ ∑_{i=1}^{n} log (p̃(x_i; θ)/Z(θ))
         = ∇_θ ∑_{i=1}^{n} log p̃(x_i; θ) − ∇_θ n log Z(θ)
         = ∑_{i=1}^{n} ∇_θ log p̃(x_i; θ) − n∇_θ log Z(θ)
         = ∑_{i=1}^{n} m(x_i; θ) − n∇_θ log Z(θ)
Maximum likelihood as moment matching (proof)
The gradient ∇_θ log Z(θ) is

∇_θ log Z(θ) = (1/Z(θ)) ∇_θ Z(θ) = (1/Z(θ)) ∇_θ ∫ p̃(x; θ) dx = (∫ ∇_θ p̃(x; θ) dx) / Z(θ)

Since (log f(x))′ = f′(x)/f(x), we also have f′(x) = (log f(x))′ f(x), so that

∇_θ log Z(θ) = (∫ ∇_θ [log p̃(x; θ)] p̃(x; θ) dx) / Z(θ) = ∫ ∇_θ [log p̃(x; θ)] p(x; θ) dx = ∫ m(x; θ)p(x; θ) dx
Maximum likelihood as moment matching (proof)
The gradient of the log-likelihood function ℓ(θ) is thus

∇_θ ℓ(θ) = ∑_{i=1}^{n} m(x_i; θ) − n ∫ m(x; θ)p(x; θ) dx

The necessary condition that the gradient is zero at the MLE θ̂ yields the desired result:

∑_{i=1}^{n} m(x_i; θ̂) − n ∫ m(x; θ̂)p(x; θ̂) dx = 0

implies that

∫ m(x; θ̂)p(x; θ̂) dx = (1/n) ∑_{i=1}^{n} m(x_i; θ̂)
What we miss with maximum likelihood estimation
◮ The likelihood function indicates to which extent various parameter values are congruent with the observed data.
◮ It establishes an ordering of relative preferences for different parameter values, i.e. θ1 with L(θ1) > L(θ2) is preferred over θ2.
◮ Maximum likelihood estimation ignores information contained in the data.
◮ Example: likelihood for the Bernoulli model with D = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1, . . .) generated with parameter value 1/3 (green line)

[Plots of the likelihood function over θ for (a) n = 2, (b) n = 5, and (c) n = 10 observations]
What we miss with maximum likelihood estimation
◮ A compromise between considering the whole (log-)likelihood function and only its maximum is the computation of the curvature at the maximum:
◮ strong curvature: the maximum likelihood estimate is clearly to be preferred
◮ shallow curvature: several other parameter values are nearly equally in line with the data
◮ The negative of the curvature of ℓ(θ) (at the maximum) is known as the observed Fisher information.
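For the Bernoulli example, ℓ''(θ) = −n_{x=1}/θ² − n_{x=0}/(1 − θ)², so the observed Fisher information at θ̂ = n_{x=1}/n is n/(θ̂(1 − θ̂)). A small sketch on synthetic data, showing how it grows with n in line with the sharpening likelihoods in the plots above:

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (2, 5, 10, 100):
    x = (rng.random(n) < 1/3).astype(int)
    theta_hat = np.clip(x.mean(), 1e-6, 1 - 1e-6)   # guard against an estimate on the boundary
    # Observed Fisher information: -l''(theta_hat) = n / (theta_hat * (1 - theta_hat))
    info = n / (theta_hat * (1 - theta_hat))
    print(n, theta_hat, info)
```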
Program
- 1. Basic concepts
- 2. Learning by maximum likelihood estimation
The likelihood function and the maximum likelihood estimate
MLE for Gaussian, Bernoulli, and fully observed directed graphical models of discrete random variables
Maximum likelihood estimation is a form of moment matching
The likelihood function is informative and more than just an objective function to optimise
- 3. Learning by Bayesian inference
Program
- 1. Basic concepts
- 2. Learning by maximum likelihood estimation
- 3. Learning by Bayesian inference
Bayesian approach reduces learning to probabilistic inference
Different views of the posterior distribution
Conjugate priors
Posterior for Gaussian, Bernoulli, and fully observed directed graphical models of discrete random variables
Reduces learning to probabilistic inference
◮ We use data to determine the plausibility (posterior pdf/pmf) of all possible values of the parameters θ:

p(x|θ)p(θ)  --data D-->  p(θ|D)

◮ Same framework for learning and inference.
◮ In some cases, closed-form solutions can be obtained (e.g. for conjugate priors).
◮ In some cases, the exact inference methods that we discussed earlier can be used.
◮ If closed-form solutions are not possible and exact inference is computationally too costly, we have to resort to approximate inference via e.g. sampling or variational methods (see later).
The posterior combines likelihood function and prior
◮ Bayesian inference takes the whole likelihood function into account:

p(θ|D) = p(θ, D)/p(D) = p(D|θ)p(θ)/p(D) ∝ p(D|θ)p(θ) ∝ L(θ)p(θ)

◮ For iid data D = (x1, . . . , xn):

p(θ|D) ∝ [∏_{i=1}^{n} p(x_i|θ)] p(θ)

◮ For large n, the likelihood dominates: argmax_θ p(θ|D) ≈ MLE (assuming the prior is non-zero at the MLE)
The posterior distribution is a conditional
p(θ|D) = p(θ, D)/p(D)

◮ Consider discrete-valued data, so that

p(θ|D) = p(θ|x = D) = p(θ, x = D)/p(D)

◮ Assume we can sample tuples (θ^{(i)}, x^{(i)}) from the joint p(θ, x) using e.g. ancestral sampling:

θ^{(i)} ∼ p(θ)    x^{(i)} ∼ p(x|θ^{(i)})

◮ Conditioning on x = D then corresponds to only retaining those samples (θ^{(i)}, x^{(i)}) where x^{(i)} = D.
◮ Samples from the posterior = samples from the prior that produce data equal to the observed one.
◮ Remark: This view of Bayesian inference forms the basis of a class of approximate methods known as approximate Bayesian computation.
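A minimal sketch of this rejection view, assuming a small Beta-Bernoulli setup with n = 3 observations so that exact matches x^{(i)} = D are not too rare:

```python
import numpy as np

rng = np.random.default_rng(0)

D = np.array([1, 0, 1])                        # tiny observed dataset (n = 3)
N = 200_000                                    # number of prior draws

# Ancestral sampling from the joint p(theta, x) with a B(2, 2) prior.
theta = rng.beta(2, 2, size=N)                 # theta^(i) ~ p(theta)
x = rng.random((N, len(D))) < theta[:, None]   # x^(i) ~ p(x | theta^(i)), iid Bernoulli

# Conditioning on x = D: keep only draws whose generated data equal D exactly.
accepted = theta[(x == D).all(axis=1)]

# Conjugate check: the posterior is B(2 + 2, 2 + 1), with mean 4/7 ≈ 0.571.
print(accepted.mean(), 4 / 7)
```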
Conjugate priors
◮ Assume the prior is part of a parametric family with hyperparameters α, i.e. the prior is an element of {p(θ; α)}_α, so that p(θ) = p(θ; α0) for some fixed α0.
◮ If the posterior p(θ|D) is part of the same family as the prior,
◮ the prior and posterior are called conjugate distributions
◮ the prior is said to be a conjugate prior for p(x|θ) or for the likelihood function.
◮ Learning then corresponds to updating the hyperparameters:

α0  --data D-->  α(D)

◮ Models p(x|θ) that are part of the exponential family always have a conjugate prior (see Barber 8.5).
Gaussian example (posterior of the mean for known variance)
(for more general cases, see optional reading)
◮ Denote the pdf of a Gaussian random variable x with mean µ and variance σ² by N(x; µ, σ²).
◮ Bayesian model:

p(x|θ) = N(x; θ, σ²)    p(θ; α0) = N(θ; µ0, σ²_0)

with hyperparameters α0 = (µ0, σ²_0).
◮ Data D: n iid observations x1, . . . , xn
◮ Posterior for the mean θ (see tutorial 7):

p(θ|D) = N(θ; µ_n, σ²_n)

µ_n = σ²_0/(σ²_0 + σ²/n) · x̄ + (σ²/n)/(σ²_0 + σ²/n) · µ0    1/σ²_n = 1/(σ²/n) + 1/σ²_0

where x̄ = (1/n) ∑_i x_i is the sample average (the MLE).
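A numerical sketch of this update, assuming made-up values for σ², the prior, and the data:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 4.0                     # known observation variance (made up)
mu0, sigma2_0 = 0.0, 10.0        # prior hyperparameters (made up)

x = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=20)
n, xbar = len(x), x.mean()

# Posterior of the mean: a precision-weighted combination of xbar and mu0.
w = sigma2_0 / (sigma2_0 + sigma2 / n)
mu_n = w * xbar + (1 - w) * mu0
sigma2_n = 1 / (n / sigma2 + 1 / sigma2_0)   # from 1/sigma2_n = 1/(sigma2/n) + 1/sigma2_0

print(mu_n, sigma2_n)            # the posterior mean shrinks xbar towards mu0
```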
Bernoulli example
◮ Recall: the Beta distribution with parameters α, β is

B(f; α, β) ∝ f^{α−1} (1 − f)^{β−1}    f ∈ [0, 1]

(see the background document Introduction to Probabilistic Modelling)
◮ Bayesian model:

p(x|θ) = θ^x (1 − θ)^{1−x}    p(θ; α0) = B(θ; α0, β0)

where x ∈ {0, 1}, θ ∈ [0, 1], and α0 = (α0, β0)
◮ Data D: n iid observations x1, . . . , xn
◮ Posterior for θ (see tutorial 7):

p(θ|D) = B(θ; α_n, β_n)    α_n = α0 + n_{x=1}    β_n = β0 + n_{x=0}

where n_{x=1} is the number of ones and n_{x=0} the number of zeros in the data.
Examples of the Beta distribution B(f ; α, β) (Figures courtesy C. Williams)
Expected value: α/(α + β)    Variance: (α/(α + β)) · (β/(α + β)) · (1/(α + β + 1))

[Plots of the Beta pdf: (a) B(f; 0.5, 0.5), (b) B(f; 1, 1), (c) B(f; 3, 2), (d) B(f; 15, 10)]
Bernoulli example
◮ Bernoulli model with D = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1, . . .) generated with parameter value 1/3 (green line)
◮ Posterior in blue, B(2, 2) prior in black
◮ Compare with the earlier likelihood plots. Note the "pull" towards the prior when n is small.

[Plots of the posterior pdf and the prior pdf over θ for (a) n = 2, (b) n = 5, and (c) n = 10 observations]
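A minimal sketch reproducing the posterior parameters behind these plots, assuming the first ten entries of D follow the pattern shown:

```python
D = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]    # first ten observations, as in the plots
alpha0, beta0 = 2.0, 2.0               # B(2, 2) prior

for n in (2, 5, 10):
    x = D[:n]
    alpha_n = alpha0 + sum(x)          # alpha_n = alpha_0 + n_{x=1}
    beta_n = beta0 + n - sum(x)        # beta_n  = beta_0  + n_{x=0}
    post_mean = alpha_n / (alpha_n + beta_n)
    print(n, alpha_n, beta_n, post_mean)
```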
Cancer-asbestos-smoking example
◮ Bayesian model:

p(c, a, s|θ) = p(c|a, s, θ¹_c, . . . , θ⁴_c)p(a|θ_a)p(s|θ_s)
             = ∏_{j=1}^{4} (θ^j_c)^{✶(c=1, pa_c=j)} (1 − θ^j_c)^{✶(c=0, pa_c=j)} · θ_a^{✶(a=1)} (1 − θ_a)^{✶(a=0)} · θ_s^{✶(s=1)} (1 − θ_s)^{✶(s=0)}

◮ Assume the prior factorises (independence assumption):

p(θ_a, θ_s, θ¹_c, . . . , θ⁴_c; α0) = [∏_j B(θ^j_c; α^j_{c,0}, β^j_{c,0})] B(θ_a; α_{a,0}, β_{a,0}) B(θ_s; α_{s,0}, β_{s,0})

◮ Data D: n iid observations x1, . . . , xn, where x_i = (a_i, s_i, c_i)
Cancer-asbestos-smoking example
(see tutorial 7)
◮ The posterior factorises.
◮ The posterior for θ_a and θ_s is given by the posterior for a Bernoulli random variable.
◮ Posterior for the parameters of the conditional p(c|a, s):

p(θ^j_c|D) = B(θ^j_c; α^j_{c,n}, β^j_{c,n})    α^j_{c,n} = α^j_{c,0} + n^j_{c=1}    β^j_{c,n} = β^j_{c,0} + n^j_{c=0}

and equivalently for the other parameters.
◮ n^j_{c=1} is the number of occurrences of (c = 1, pa_c = j) in the data and n^j_{c=0} the number of occurrences of (c = 0, pa_c = j).
(As before, pa_c = j refers to state j of the parent variables.)
Program recap
- 1. Basic concepts
Observed data as a sample drawn from an unknown data generating distribution
Probabilistic, statistical, and Bayesian models
Partition function and unnormalised statistical models
Learning = parameter estimation or learning = Bayesian inference
- 2. Learning by maximum likelihood estimation
The likelihood function and the maximum likelihood estimate
MLE for Gaussian, Bernoulli, and fully observed directed graphical models of discrete random variables
Maximum likelihood estimation is a form of moment matching
The likelihood function is informative and more than just an objective function to optimise
- 3. Learning by Bayesian inference
Bayesian approach reduces learning to probabilistic inference
Different views of the posterior distribution
Conjugate priors
Posterior for Gaussian, Bernoulli, and fully observed directed graphical models of discrete random variables