Maximum Likelihood (ML), Expectation Maximization (EM)

Pieter Abbeel, UC Berkeley EECS
Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Outline
- Maximum likelihood (ML)
- Priors, and maximum a posteriori (MAP)
- Cross-validation
- Expectation Maximization (EM)

Thumbtack
- Let θ = P(up), 1-θ = P(down)
- How to determine θ?
- Empirical estimate: 8 up, 2 down → θ = 8/(8+2) = 0.8

http://web.me.com/todd6ton/Site/Classroom_Blog/Entries/2009/10/7_A_Thumbtack_Experiment.html

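Maximum Likelihood
- θ = P(up), 1-θ = P(down)
- Observe: 8 up, 2 down
- Likelihood of the observation sequence depends on θ: $\theta^{8}(1-\theta)^{2}$
- Maximum likelihood finds $\theta_{ML} = \arg\max_\theta \theta^{8}(1-\theta)^{2}$
- Setting the derivative $8\theta^{7}(1-\theta)^{2} - 2\theta^{8}(1-\theta) = \theta^{7}(1-\theta)(8-10\theta)$ to zero → extrema at θ = 0, θ = 1, θ = 0.8
- → Inspection of each extremum yields $\theta_{ML} = 0.8$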

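Maximum Likelihood
- More generally, consider a binary-valued random variable with θ = P(1), 1-θ = P(0), and assume we observe $n_1$ ones and $n_0$ zeros
- Likelihood: $\theta^{n_1}(1-\theta)^{n_0}$
- Derivative: $n_1\theta^{n_1-1}(1-\theta)^{n_0} - n_0\theta^{n_1}(1-\theta)^{n_0-1} = \theta^{n_1-1}(1-\theta)^{n_0-1}\left(n_1(1-\theta) - n_0\theta\right)$
- Hence we have for the extrema: θ = 0, θ = 1, θ = $n_1/(n_0+n_1)$
- $n_1/(n_0+n_1)$ is the maximum
- = empirical counts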
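
As a quick numerical check of the closed form n1/(n0+n1), the sketch below compares it with a brute-force grid search over the likelihood for the thumbtack data. The function names are illustrative and do not come from the slides.

```python
def bernoulli_likelihood(theta, n1, n0):
    """Likelihood theta^n1 * (1 - theta)^n0 of observing n1 ones and n0 zeros."""
    return theta ** n1 * (1.0 - theta) ** n0

def bernoulli_mle(n1, n0):
    """Closed-form maximizer: the empirical frequency of ones."""
    return n1 / (n0 + n1)

def grid_argmax(n1, n0, steps=1000):
    """Brute-force maximizer over an evenly spaced grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: bernoulli_likelihood(t, n1, n0))

theta_closed = bernoulli_mle(8, 2)  # thumbtack data: 8 up, 2 down
theta_grid = grid_argmax(8, 2)
```

Both values agree at 0.8, which is the interior extremum identified by inspection.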

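Log-likelihood
- The function $\log: \mathbb{R}^{+} \to \mathbb{R}$, $x \mapsto \log x$, is a monotonically increasing function of x
- Hence for any (positive-valued) function f: $\arg\max_\theta f(\theta) = \arg\max_\theta \log f(\theta)$
- Often more convenient to optimize log-likelihood rather than likelihood
- Example: $\log\left(\theta^{n_1}(1-\theta)^{n_0}\right) = n_1\log\theta + n_0\log(1-\theta)$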

Log-likelihood ↔ Likelihood
- Reconsider thumbtacks: 8 up, 2 down
- Likelihood: $\theta^{8}(1-\theta)^{2}$ (not concave)
- Log-likelihood: $8\log\theta + 2\log(1-\theta)$ (concave)
- Definition: A function f is concave if and only if $f(\lambda x_1 + (1-\lambda)x_2) \ge \lambda f(x_1) + (1-\lambda) f(x_2)$ for all $x_1$, $x_2$ and all $\lambda \in [0,1]$
- Concave functions are generally easier to maximize than non-concave functions
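
The two claims on this slide (same maximizer, different curvature) can be checked numerically. The following is a minimal sketch for the 8 up, 2 down data; the finite-difference curvature test and the helper names are assumptions made for illustration.

```python
import math

def likelihood(theta):
    return theta ** 8 * (1.0 - theta) ** 2

def log_likelihood(theta):
    return 8 * math.log(theta) + 2 * math.log(1.0 - theta)

def second_difference(f, x, h=1e-3):
    """Finite-difference estimate of the second derivative f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

grid = [i / 100 for i in range(1, 100)]  # open interval (0, 1)

# Taking the log does not move the maximizer.
argmax_lik = max(grid, key=likelihood)
argmax_loglik = max(grid, key=log_likelihood)

# A concave function has non-positive curvature everywhere.
loglik_concave = all(second_difference(log_likelihood, t) <= 0 for t in grid)
lik_concave = all(second_difference(likelihood, t) <= 0 for t in grid)
```

The log-likelihood passes the curvature test on the whole grid, while the likelihood fails it (for instance at θ = 0.5, where it curves upward).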

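Concavity and Convexity
- f is concave if and only if $f(\lambda x_1 + (1-\lambda)x_2) \ge \lambda f(x_1) + (1-\lambda) f(x_2)$ for all $x_1$, $x_2$, $\lambda \in [0,1]$: "Easy" to maximize
- f is convex if and only if $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$ for all $x_1$, $x_2$, $\lambda \in [0,1]$: "Easy" to minimize
(Figures: each function plotted with the points $x_1$, $x_2$ and $\lambda x_1 + (1-\lambda)x_2$ marked.)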

ML for Multinomial
- Consider having received samples
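
A minimal sketch of the multinomial case, assuming the standard result that the ML estimate of each outcome probability is its empirical frequency (the multi-outcome analogue of n1/(n0+n1)). The sample values and the function name are hypothetical.

```python
from collections import Counter

def multinomial_mle(samples):
    """ML estimate of outcome probabilities: normalized empirical counts."""
    counts = Counter(samples)
    total = len(samples)
    return {outcome: n / total for outcome, n in counts.items()}

# Hypothetical samples over three outcomes.
theta = multinomial_mle(["a", "b", "a", "c", "a", "b"])
```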

ML for Fully Observed HMM
- Given samples
- Dynamics model:
- Observation model:
- → Independent ML problems for each conditional distribution of the dynamics model and each conditional distribution of the observation model
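
When both states and observations are recorded, the estimation decouples into one counting problem per state. The sketch below assumes discrete states and observations, so each conditional distribution is a multinomial estimated by normalized counts; the toy sequences and all names are hypothetical.

```python
from collections import Counter, defaultdict

def normalize(counts):
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def fully_observed_hmm_mle(states, observations):
    """Estimate P(next state | state) and P(observation | state) by counting."""
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    for x, x_next in zip(states, states[1:]):
        transitions[x][x_next] += 1
    for x, z in zip(states, observations):
        emissions[x][z] += 1
    dynamics = {x: normalize(c) for x, c in transitions.items()}
    observation = {x: normalize(c) for x, c in emissions.items()}
    return dynamics, observation

# Hypothetical fully observed sequence.
states = ["A", "A", "B", "A", "B", "B"]
observations = ["u", "u", "v", "u", "v", "u"]
dynamics, observation = fully_observed_hmm_mle(states, observations)
```

Each row of `dynamics` and `observation` is its own independent multinomial ML problem.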

ML for Exponential Distribution
(Figure source: Wikipedia)
- Consider having received samples 3.1, 8.2, 1.7

ML for Exponential Distribution
(Figure source: Wikipedia)
- Consider having received samples
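
For the three samples 3.1, 8.2, 1.7, a minimal sketch of the ML fit follows. It assumes the rate parameterization $p(x; \lambda) = \lambda e^{-\lambda x}$, for which setting the derivative of the log-likelihood to zero gives $\lambda_{ML} = m / \sum_i x^{(i)}$; the function names are illustrative.

```python
import math

def exponential_log_likelihood(rate, samples):
    """Sum over samples of log(rate * exp(-rate * x))."""
    return len(samples) * math.log(rate) - rate * sum(samples)

def exponential_mle(samples):
    """Closed-form maximizer: the reciprocal of the sample mean."""
    return len(samples) / sum(samples)

samples = [3.1, 8.2, 1.7]
rate_ml = exponential_mle(samples)  # 3 / 13, roughly 0.23
```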

Uniform
- Consider having received samples
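
A minimal sketch for the uniform case, assuming the distribution is uniform on an interval [a, b] with both endpoints unknown. The likelihood is $(b-a)^{-m}$ when every sample lies in [a, b] and zero otherwise, so the tightest interval containing the data maximizes it. The samples are reused from the exponential example purely for illustration.

```python
def uniform_likelihood(a, b, samples):
    """Likelihood of samples under Uniform[a, b]; zero if any sample falls outside."""
    if a >= b or min(samples) < a or max(samples) > b:
        return 0.0
    return (1.0 / (b - a)) ** len(samples)

def uniform_mle(samples):
    """Tightest interval that still contains every sample."""
    return min(samples), max(samples)

samples = [3.1, 8.2, 1.7]
a_ml, b_ml = uniform_mle(samples)
```

Widening the interval lowers the density on every sample, and shrinking it drives the likelihood to zero.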

ML for Gaussian
- Consider having received samples
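
A minimal sketch for the univariate Gaussian, assuming the standard result that the ML estimates are the sample mean and the average squared deviation (dividing by m, not m - 1). The data and names are hypothetical.

```python
import math

def gaussian_mle(samples):
    m = len(samples)
    mu = sum(samples) / m
    var = sum((x - mu) ** 2 for x in samples) / m  # ML divides by m, not m - 1
    return mu, var

def gaussian_log_likelihood(mu, var, samples):
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x in samples
    )

# Hypothetical samples with mean 5 and ML variance 4.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_ml, var_ml = gaussian_mle(samples)
```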

ML for Conditional Gaussian
- Equivalently:
- More generally:

ML for Conditional Gaussian
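
A minimal sketch of the univariate conditional Gaussian, assuming the model $y \mid x \sim \mathcal{N}(a_0 + a_1 x, \sigma^2)$. Under that assumption, maximizing the likelihood over $a_0$, $a_1$ is ordinary least squares, and $\sigma^2_{ML}$ is the mean squared residual. The symbols $a_0$, $a_1$ and the data are hypothetical.

```python
def conditional_gaussian_mle(xs, ys):
    """Least-squares fit of y = a0 + a1 * x, plus the ML noise variance."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    var = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys)) / m
    return a0, a1, var

# Hypothetical noise-free data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a0, a1, var = conditional_gaussian_mle(xs, ys)
```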

ML for Conditional Multivariate Gaussian

Aside: Key Identities for Derivation on Previous Slide

ML Estimation in Fully Observed Linear Gaussian Bayes Filter Setting
- Consider the Linear Gaussian setting:
- Fully observed, i.e., given
- → Two separate ML estimation problems for conditional multivariate Gaussian:
  1:
  2:

Priors --- Thumbtack
- Let θ = P(up), 1-θ = P(down)
- How to determine θ?
- ML estimate: 5 up, 0 down → $\theta_{ML} = 5/(5+0) = 1$
- Laplace estimate: add a fake count of 1 for each outcome

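Priors --- Thumbtack
- Alternatively, consider θ to be a random variable
- Prior: P(θ) = C θ(1-θ)
- Measurements: P(x | θ)
- Posterior: $P(\theta \mid x) = \frac{P(x \mid \theta)\,P(\theta)}{P(x)} \propto P(x \mid \theta)\,P(\theta)$
- Maximum A Posteriori (MAP) estimation = find θ that maximizes the posterior
- → For 5 up, 0 down: $P(\theta \mid x) \propto \theta^{5} \cdot \theta(1-\theta) = \theta^{6}(1-\theta)$, so $\theta_{MAP} = 6/7$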
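
The sketch below compares the three thumbtack estimates for 5 up, 0 down: ML, the Laplace estimate with one fake count per outcome, and a grid-search MAP under the prior C θ(1-θ). The function names and the grid resolution are assumptions made for illustration.

```python
def ml_estimate(n_up, n_down):
    return n_up / (n_up + n_down)

def laplace_estimate(n_up, n_down, fake=1):
    """Add `fake` pseudo-observations of each outcome before normalizing."""
    return (n_up + fake) / (n_up + n_down + 2 * fake)

def map_grid(n_up, n_down, steps=7000):
    """Grid maximizer of the unnormalized posterior: likelihood times theta * (1 - theta)."""
    def posterior(t):
        return t ** n_up * (1 - t) ** n_down * t * (1 - t)
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=posterior)

theta_ml = ml_estimate(5, 0)            # 1.0: "down" is judged impossible
theta_laplace = laplace_estimate(5, 0)  # 6/7
theta_map = map_grid(5, 0)              # also 6/7
```

The MAP estimate under this prior coincides with the Laplace estimate, which is why the prior can be read as one fake count per outcome.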

Priors --- Beta Distribution
(Figure source: Wikipedia)

Priors --- Dirichlet Distribution
- Generalizes Beta distribution
- MAP estimate corresponds to adding fake counts $n_1, \ldots, n_K$
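
A minimal sketch of MAP estimation with fake counts for K outcomes. Under a Dirichlet prior with parameters $\alpha_k$, the fake count for outcome k is $\alpha_k - 1$, matching the Beta case where the prior C θ(1-θ) contributes one fake count per outcome. The observed counts and the function name are hypothetical.

```python
def map_with_fake_counts(counts, fake_counts):
    """Add the prior's fake counts to the observed counts, then normalize."""
    totals = [n + f for n, f in zip(counts, fake_counts)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# Hypothetical data: only the first of three outcomes was ever observed.
theta_map = map_with_fake_counts([5, 0, 0], [1, 1, 1])
```

Unlike the ML estimate, no outcome receives probability zero.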

MAP for Mean of Univariate Gaussian
- Assume variance known. (Can be extended to also find MAP for variance.)
- Prior:
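
A minimal sketch of the MAP estimate of a Gaussian mean, assuming a Gaussian prior $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and known data variance $\sigma^2$. Under these assumptions the posterior is Gaussian and its mode is a precision-weighted average of the prior mean and the data. The prior form, the symbols $\mu_0$, $\sigma_0^2$ and the data are assumptions for illustration.

```python
def map_gaussian_mean(samples, var, prior_mean, prior_var):
    """Posterior mode of the mean under a Gaussian prior with known data variance."""
    m = len(samples)
    precision = m / var + 1.0 / prior_var
    return (sum(samples) / var + prior_mean / prior_var) / precision

samples = [5.0, 5.0, 5.0, 5.0]  # hypothetical data, sample mean 5
mu_map = map_gaussian_mean(samples, var=1.0, prior_mean=0.0, prior_var=1.0)
mu_weak_prior = map_gaussian_mean(samples, var=1.0, prior_mean=0.0, prior_var=1e9)
```

With a prior as confident as a single observation, the estimate is pulled from 5 to 4; with a very broad prior it approaches the ML estimate.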

MAP for Univariate Conditional Linear Gaussian
- Assume variance known. (Can be extended to also find MAP for variance.)
- Prior:
- [Interpret!]

MAP for Univariate Conditional Linear Gaussian: Example
(Figure: the true function, the samples, the ML fit and the MAP fit.)

Cross Validation
- Choice of prior will heavily influence quality of result
- Fine-tune choice of prior through cross-validation:
  1. Split data into "training" set and "validation" set
  2. For a range of priors,
     - Train: compute $\theta_{MAP}$ on training set
     - Cross-validate: evaluate performance on validation set by evaluating the likelihood of the validation data under the $\theta_{MAP}$ just found
  3. Choose prior with highest validation score
     - For this prior, compute $\theta_{MAP}$ on (training+validation) set
- Typical training / validation splits:
  - 1-fold: 70/30, random split
  - 10-fold: partition into 10 sets, average performance for each set being the validation set and the other 9 being the training set
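
The three steps can be sketched for the thumbtack model, where each candidate prior is a fake count added to both outcomes. This is a minimal 1-fold (70/30) illustration; the data, the candidate fake counts, the seed and all function names are hypothetical.

```python
import math
import random

def map_theta(data, fake):
    """MAP estimate of P(1) with `fake` pseudo-counts per outcome (fake > 0)."""
    n1 = sum(data)
    return (n1 + fake) / (len(data) + 2 * fake)

def log_likelihood(theta, data):
    n1 = sum(data)
    n0 = len(data) - n1
    return n1 * math.log(theta) + n0 * math.log(1.0 - theta)

def choose_prior(data, fakes, train_fraction=0.7, seed=0):
    # Step 1: random split into training and validation sets.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    split = int(train_fraction * len(shuffled))
    train, valid = shuffled[:split], shuffled[split:]
    # Step 2: train on the training set, score on the validation set.
    scores = {f: log_likelihood(map_theta(train, f), valid) for f in fakes}
    # Step 3: keep the best prior and refit on all of the data.
    best = max(scores, key=scores.get)
    return best, map_theta(data, best), scores

data = [1] * 16 + [0] * 4  # hypothetical tosses: 16 up, 4 down
fakes = [0.5, 1, 2, 5]     # candidate prior strengths
best_fake, theta_final, scores = choose_prior(data, fakes)
```

Keeping every fake count strictly positive ensures the validation log-likelihood stays finite even if the training split contains only one outcome.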

Outline
- Maximum likelihood (ML)
- Priors, and maximum a posteriori (MAP)
- Cross-validation
- Expectation Maximization (EM)

Mixture of Gaussians
- Generally: $p(z) = \sum_{k=1}^{K} \theta_k \, \mathcal{N}(z; \mu_k, \Sigma_k)$
- Example:
- ML objective: given data $z^{(1)}, \ldots, z^{(m)}$, $\max_{\theta, \mu, \Sigma} \sum_{i=1}^{m} \log \sum_{k=1}^{K} \theta_k \, \mathcal{N}(z^{(i)}; \mu_k, \Sigma_k)$
- Setting derivatives w.r.t. θ, µ, Σ equal to zero does not allow us to solve for their ML estimates in closed form
- We can evaluate the objective → we can in principle perform local optimization. In this lecture: the "EM" algorithm, which is typically used to efficiently optimize the objective (locally)

Expectation Maximization (EM)
- Example:
- Model:
- Goal:
  - Given data $z^{(1)}, \ldots, z^{(m)}$ (but no $x^{(i)}$ observed)
  - Find maximum likelihood estimates of $\mu_1$, $\mu_2$
- EM basic idea: if the $x^{(i)}$ were known → two easy-to-solve separate ML problems
- EM iterates over
  - E-step: For i = 1, …, m fill in missing data $x^{(i)}$ according to what is most likely given the current model µ
  - M-step: run ML for completed data, which gives new model µ
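
The basic idea (fill in the most likely x, then rerun ML) can be sketched as follows. The sketch assumes a two-component model in which both components are equally likely a priori and $z \mid x = k \sim \mathcal{N}(\mu_k, 1)$, so the most likely component for a point is the one with the nearer mean. That model, the data and the initial values are assumptions for illustration.

```python
def hard_em(data, mu1, mu2, iterations=20):
    for _ in range(iterations):
        # E-step: fill in each missing label with the more likely component.
        labels = [1 if abs(z - mu1) <= abs(z - mu2) else 2 for z in data]
        # M-step: with labels filled in, ML for each mean is a plain average.
        group1 = [z for z, x in zip(data, labels) if x == 1]
        group2 = [z for z, x in zip(data, labels) if x == 2]
        if group1:
            mu1 = sum(group1) / len(group1)
        if group2:
            mu2 = sum(group2) / len(group2)
    return mu1, mu2

# Hypothetical data drawn around -2 and +2.
data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
mu1, mu2 = hard_em(data, mu1=-1.0, mu2=1.0)
```

Once the labels are fixed, the two means are estimated independently, which is the "two easy-to-solve separate ML problems" observation.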

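EM Derivation
- EM solves a Maximum Likelihood problem of the form: $\max_\theta \log \int_x p(x, z; \theta)\, dx$
  - θ: parameters of the probabilistic model we try to find
  - x: unobserved variables
  - z: observed variables
- For any distribution q(x), Jensen's Inequality gives the lower bound $\log \int_x q(x) \frac{p(x, z; \theta)}{q(x)}\, dx \ge \int_x q(x) \log \frac{p(x, z; \theta)}{q(x)}\, dx$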

Jensen's inequality
- For a concave function f: $f(E[X]) \ge E[f(X)]$
- Illustration: $P(X = x_1) = \lambda$, $P(X = x_2) = 1 - \lambda$, so $E[X] = \lambda x_1 + (1-\lambda) x_2$
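
A small numerical illustration of Jensen's inequality for the concave function log, using a two-point random variable with $E[X] = \lambda x_1 + (1-\lambda) x_2$. The helper name and the chosen values are hypothetical.

```python
import math

def jensen_gap(x1, x2, lam, f=math.log):
    """f(E[X]) - E[f(X)] for X equal to x1 with probability lam, else x2."""
    mean = lam * x1 + (1 - lam) * x2
    return f(mean) - (lam * f(x1) + (1 - lam) * f(x2))

gap = jensen_gap(1.0, 4.0, 0.5)       # log(2.5) - log(2) = log(1.25) > 0
no_spread = jensen_gap(3.0, 3.0, 0.3)  # X is constant, so the gap vanishes
```

The gap is non-negative for concave f and closes when X does not vary, which is the case EM exploits.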

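EM Derivation (ctd)
- Jensen's Inequality: equality holds when the function is affine over the values its argument takes, in particular when $p(x, z; \theta)/q(x)$ does not depend on x. This is achieved for $q(x) = p(x \mid z; \theta) \propto p(x, z; \theta)$
- EM Algorithm: Iterate
  1. E-step: Compute $q(x) = p(x \mid z; \theta)$ for the current parameters θ
  2. M-step: Compute $\theta = \arg\max_\theta \int_x q(x) \log p(x, z; \theta)\, dx$
- M-step optimization can be done efficiently in most cases
- E-step is usually the more expensive step
- It does not fill in the missing data x with hard values, but finds a distribution q(x)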
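
A minimal sketch of the E-step / M-step iteration with soft assignments. It assumes a two-component mixture with equal weights and $z \mid x = k \sim \mathcal{N}(\mu_k, 1)$, so the only parameters are the two means. The model, data and initial values are assumptions for illustration.

```python
import math

def normal_pdf(z, mu):
    """Density of N(mu, 1) at z."""
    return math.exp(-0.5 * (z - mu) ** 2) / math.sqrt(2 * math.pi)

def log_likelihood(data, mu1, mu2):
    return sum(
        math.log(0.5 * normal_pdf(z, mu1) + 0.5 * normal_pdf(z, mu2))
        for z in data
    )

def em_step(data, mu1, mu2):
    # E-step: q_i(x = 1) = p(x = 1 | z_i; mu), a distribution rather than a hard label.
    q1 = []
    for z in data:
        p1, p2 = normal_pdf(z, mu1), normal_pdf(z, mu2)
        q1.append(p1 / (p1 + p2))
    # M-step: maximizing the expected complete log-likelihood gives weighted means.
    w1 = sum(q1)
    w2 = len(data) - w1
    new_mu1 = sum(q * z for q, z in zip(q1, data)) / w1
    new_mu2 = sum((1 - q) * z for q, z in zip(q1, data)) / w2
    return new_mu1, new_mu2

data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]  # hypothetical observations
mu1, mu2 = -1.0, 1.0
history = [log_likelihood(data, mu1, mu2)]
for _ in range(50):
    mu1, mu2 = em_step(data, mu1, mu2)
    history.append(log_likelihood(data, mu1, mu2))
```

Because the E-step makes the Jensen bound tight and the M-step maximizes that bound, the values in `history` never decrease.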
