

SLIDE 1

CSE446: Point Estimation, Spring 2017

Ali Farhadi

Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer

SLIDE 2

Your first consulting job

  • A billionaire from the suburbs of Seattle asks you a question:

– He says: I have a thumbtack; if I flip it, what’s the probability it will fall with the nail up?
– You say: Please flip it a few times:
– You say: The probability is:

  • P(H) = 3/5

– He says: Why???
– You say: Because…

SLIDE 3

Thumbtack – Binomial Distribution

  • P(Heads) = θ, P(Tails) = 1-θ
  • Flips are i.i.d.:

– Independent events
– Identically distributed according to Binomial distribution

  • Sequence D of αH Heads and αT Tails

D = {xi | i = 1…n},   P(D | θ) = Πi P(xi | θ)
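Since each flip contributes a factor of θ or (1 − θ), the product collapses to P(D | θ) = θ^αH (1 − θ)^αT. A minimal Python sketch of that computation (the encoding and function name are mine, not from the slides):

```python
import numpy as np

def sequence_likelihood(flips, theta):
    """P(D | theta) for i.i.d. thumbtack flips, encoded 1 = heads, 0 = tails."""
    flips = np.asarray(flips)
    alpha_H = flips.sum()            # number of heads
    alpha_T = len(flips) - alpha_H   # number of tails
    return theta**alpha_H * (1 - theta)**alpha_T

# The billionaire's data: 3 heads, 2 tails
print(sequence_likelihood([1, 1, 0, 1, 0], theta=0.6))  # 0.6^3 * 0.4^2 ≈ 0.0346
```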

SLIDE 4

Maximum Likelihood Estimation

  • Data: Observed set D of αH Heads and αT Tails
  • Hypothesis space: Binomial distributions
  • Learning: finding θ is an optimization problem

– What’s the objective function?

  • MLE: Choose θ to maximize probability of D
SLIDE 5

Your first parameter learning algorithm

  • Set derivative to zero, and solve!
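The algebra itself appeared on the slide as an image; reconstructing the standard steps from the log-likelihood of αH heads and αT tails:

```latex
\ln P(D \mid \theta) = \alpha_H \ln \theta + \alpha_T \ln (1 - \theta)

\frac{d}{d\theta} \ln P(D \mid \theta)
  = \frac{\alpha_H}{\theta} - \frac{\alpha_T}{1 - \theta} = 0
\;\Longrightarrow\;
\hat{\theta}_{\mathrm{MLE}} = \frac{\alpha_H}{\alpha_H + \alpha_T}
```

For the billionaire’s 3 heads and 2 tails this gives θ̂ = 3/5, matching the answer on slide 2.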
SLIDE 6

But, how many flips do I need?

  • Billionaire says: I flipped 3 heads and 2 tails.
  • You say: θ = 3/5, I can prove it!
  • He says: What if I flipped 30 heads and 20 tails?
  • You say: Same answer, I can prove it!
  • He says: What’s better?
  • You say: Umm… The more the merrier???
  • He says: Is this why I am paying you the big bucks???
SLIDE 7

A bound (from Hoeffding’s inequality)

[Figure: probability of mistake vs. N, showing exponential decay]

  • For N = αH + αT, let θ* be the true parameter; then for any ε > 0:

P(|θ̂ − θ*| ≥ ε) ≤ 2e^(−2Nε²)
SLIDE 8

PAC Learning

  • PAC: Probably Approximately Correct
  • Billionaire says: I want to know the thumbtack parameter θ, within ε = 0.1, with probability at least 1 − δ = 0.95.
  • How many flips? Or, how big do I set N?

Interesting! Let’s look at some numbers!

  • ε = 0.1, δ = 0.05
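The numbers on the slide follow from inverting the Hoeffding bound above: require 2e^(−2Nε²) ≤ δ and solve for N. A small sketch of that calculation (the function name is mine):

```python
import math

def flips_needed(eps, delta):
    """Smallest N with 2 * exp(-2 * N * eps**2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

print(flips_needed(0.1, 0.05))  # 185 flips suffice for eps = 0.1, delta = 0.05
```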
SLIDE 9

What if I have prior beliefs?

  • Billionaire says: Wait, I know that the thumbtack is “close” to 50-50. What can you do for me now?
  • You say: I can learn it the Bayesian way…
  • Rather than estimating a single θ, we obtain a distribution over possible values of θ

[Figure: distribution over θ in the beginning vs. after observations; observe flips, e.g. {tails, tails}]

SLIDE 10

Bayesian Learning

  • Use Bayes rule:

P(θ | D) = P(D | θ) P(θ) / P(D)
(posterior = data likelihood × prior / normalization)

  • Or equivalently: P(θ | D) ∝ P(D | θ) P(θ)
  • Also, for uniform priors (P(θ) ∝ 1), maximizing the posterior reduces to the MLE objective

SLIDE 11

Bayesian Learning for Thumbtacks

Likelihood function is Binomial:

P(D | θ) = θ^αH (1 − θ)^αT

  • What about the prior?

– Represents expert knowledge
– Simple posterior form

  • Conjugate priors:

– Closed-form representation of the posterior
– For the Binomial, the conjugate prior is the Beta distribution

SLIDE 12

Beta prior distribution – P(θ)

P(θ) = Beta(βH, βT) ∝ θ^(βH − 1) (1 − θ)^(βT − 1)

  • Likelihood function: P(D | θ) = θ^αH (1 − θ)^αT
  • Posterior: P(θ | D) ∝ P(D | θ) P(θ) ∝ θ^(αH + βH − 1) (1 − θ)^(αT + βT − 1)
SLIDE 13

Posterior distribution

  • Prior: P(θ) = Beta(βH, βT)
  • Data: αH heads and αT tails
  • Posterior distribution: P(θ | D) = Beta(αH + βH, αT + βT)
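In code, the conjugate update just adds the observed counts to the prior’s pseudo-counts. A sketch assuming scipy and a hypothetical Beta(5, 5) prior (the billionaire’s “close to 50-50” belief):

```python
from scipy import stats

beta_H, beta_T = 5, 5        # hypothetical prior pseudo-counts
alpha_H, alpha_T = 3, 2      # observed: 3 heads, 2 tails

posterior = stats.beta(beta_H + alpha_H, beta_T + alpha_T)
print(posterior.mean())      # posterior mean of theta ≈ 0.533, pulled toward 0.5
```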
SLIDE 14

MAP for Beta distribution

  • MAP: use the most likely parameter:

θ̂MAP = arg maxθ P(θ | D) = (αH + βH − 1) / (αH + βH + αT + βT − 2)
  • Beta prior equivalent to extra thumbtack flips
  • As N → ∞, prior is “forgotten”
  • But, for small sample size, prior is important!
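A quick numerical illustration of the last two bullets, again with a hypothetical Beta(5, 5) prior:

```python
def theta_map(alpha_H, alpha_T, beta_H=5, beta_T=5):
    """Mode of the Beta(alpha_H + beta_H, alpha_T + beta_T) posterior."""
    return (alpha_H + beta_H - 1) / (alpha_H + beta_H + alpha_T + beta_T - 2)

print(theta_map(3, 2))        # ≈ 0.538: small sample, prior pulls toward 0.5
print(theta_map(3000, 2000))  # ≈ 0.600: large sample, prior is "forgotten"
```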
SLIDE 15

What about continuous variables?

  • Billionaire says: If I am measuring a continuous variable, what can you do for me?
  • You say: Let me tell you about Gaussians…

SLIDE 16

Some properties of Gaussians

  • Affine transformations (multiplying by a scalar and adding a constant) of a Gaussian are Gaussian

– X ~ N(μ, σ²)
– Y = aX + b ⇒ Y ~ N(aμ + b, a²σ²)

  • Sum of independent Gaussians is Gaussian

– X ~ N(μX, σ²X)
– Y ~ N(μY, σ²Y)
– Z = X + Y ⇒ Z ~ N(μX + μY, σ²X + σ²Y)

  • Easy to differentiate, as we will see soon!
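Both properties are easy to sanity-check by sampling; a quick sketch (the constants are arbitrary, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=1_000_000)  # X ~ N(2, 9)
y = rng.normal(1.0, 2.0, size=1_000_000)  # Y ~ N(1, 4), independent of X

# Affine: 5X + 1 should be ~ N(5*2 + 1, 5^2 * 9) = N(11, 225)
print((5 * x + 1).mean(), (5 * x + 1).var())

# Sum: X + Y should be ~ N(2 + 1, 9 + 4) = N(3, 13)
print((x + y).mean(), (x + y).var())
```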
SLIDE 17

Learning a Gaussian

  • Collect a bunch of data

– Hopefully, i.i.d. samples
– e.g., exam scores

  • Learn parameters

– Mean: μ
– Variance: σ²

  i   xi = Exam Score
  1   85
  2   95
  3   100
  4   12
  …   …
      99
      89

SLIDE 18

MLE for Gaussian:

  • Prob. of i.i.d. samples D = {x1, …, xN}:

P(D | μ, σ) = Πi (1 / (σ√(2π))) exp(−(xi − μ)² / (2σ²))

  • Log-likelihood of data:

ln P(D | μ, σ) = −N ln(σ√(2π)) − Σi (xi − μ)² / (2σ²)
SLIDE 19

Your second learning algorithm: MLE for mean of a Gaussian

  • What’s the MLE for the mean? (See the derivation below.)
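Reconstructing the standard derivation (the slide showed it as an image): differentiate the log-likelihood from slide 18 with respect to μ and set it to zero.

```latex
\frac{\partial}{\partial \mu} \ln P(D \mid \mu, \sigma)
  = \sum_{i=1}^{N} \frac{x_i - \mu}{\sigma^2} = 0
\;\Longrightarrow\;
\hat{\mu}_{\mathrm{MLE}} = \frac{1}{N} \sum_{i=1}^{N} x_i
```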
SLIDE 20

MLE for variance

  • Again, set the derivative to zero, and solve:

σ̂²MLE = (1/N) Σi (xi − μ̂MLE)²
SLIDE 21

Learning Gaussian parameters

  • MLE:

μ̂MLE = (1/N) Σi xi,    σ̂²MLE = (1/N) Σi (xi − μ̂MLE)²

  • BTW, the MLE for the variance of a Gaussian is biased

– Expected result of estimation is not the true parameter!
– Unbiased variance estimator:

σ̂²unbiased = (1/(N − 1)) Σi (xi − μ̂MLE)²
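In numpy terms the two estimators differ only in the ddof argument; a sketch using the exam scores from slide 17:

```python
import numpy as np

scores = np.array([85, 95, 100, 12, 99, 89])  # exam scores from slide 17

mu_mle = scores.mean()               # MLE for the mean
var_mle = scores.var(ddof=0)         # biased MLE: divides by N
var_unbiased = scores.var(ddof=1)    # unbiased: divides by N - 1

print(mu_mle, var_mle, var_unbiased)
```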

SLIDE 22

Bayesian learning of Gaussian parameters

  • Conjugate priors

– Mean: Gaussian prior
– Variance: Wishart Distribution

  • Prior for mean: Gaussian, e.g. μ ~ N(η, σ0²)