Parametric Methods
Steven J Zeil, Old Dominion Univ., Fall 2010

Outline
1. Distributions
2. Estimating Distribution Parameters
   Maximum Likelihood Estimation
   Evaluating an Estimator: Bias and Variance
   Bayes’ Estimator
3. Parametric Classification
4. Regression
   Regression Error
   Linear Regression
   Polynomial Regression
   Tuning Model Complexity
   Model Selection

Parametric Methods
Assume that the sample is drawn from a known distribution
Advantage: the model can be formed from a small number of parameters, e.g., mean & variance
Estimate the parameters from the sample to get an estimated distribution
Then use that distribution to make decisions

Distributions
Classification discriminant function: $g_i(x) = P(C_i)\, p(x \mid C_i)$
For classification, we need to estimate the class-conditional densities $p(x \mid C_i)$ and the priors $P(C_i)$
For regression, we need to estimate $p(y \mid x)$
In this chapter, we use single variables ($\mathbf{x} = [x]$)
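As a rough illustration of the discriminant function on this slide, the sketch below classifies a scalar input by comparing $g_i(x) = P(C_i)\, p(x \mid C_i)$ across two classes. The class names, priors, and Gaussian class-conditional parameters are made-up values chosen for illustration; in practice they would be estimated from a sample.

```python
from statistics import NormalDist

# Hypothetical two-class problem: prior P(C_i) and an assumed
# Gaussian class-conditional density p(x | C_i) for each class.
CLASSES = {
    "C1": {"prior": 0.7, "density": NormalDist(mu=0.0, sigma=1.0)},
    "C2": {"prior": 0.3, "density": NormalDist(mu=3.0, sigma=1.0)},
}

def discriminant(x, label):
    """g_i(x) = P(C_i) * p(x | C_i)."""
    c = CLASSES[label]
    return c["prior"] * c["density"].pdf(x)

def classify(x):
    """Pick the class whose discriminant value is largest."""
    return max(CLASSES, key=lambda label: discriminant(x, label))

print(classify(0.0), classify(3.0))
```

Inputs near 0 fall to C1 and inputs near 3 fall to C2; the priors shift the decision boundary slightly toward the less probable class.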

Example Distributions
Bernoulli: $x \in \{0, 1\}$
  $P(x) = p_0^{x} (1 - p_0)^{(1 - x)}$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
  $P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
Gaussian (Normal):
  $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$
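The three example distributions can be written down directly from their formulas. This is a minimal sketch; the function names are illustrative, and the multinomial version assumes the one-hot indicator encoding ($x_i \in \{0, 1\}$ with exactly one $x_i = 1$) used on the slide.

```python
import math

def bernoulli_pmf(x, p0):
    """P(x) = p0^x * (1 - p0)^(1 - x) for x in {0, 1}."""
    return p0 ** x * (1 - p0) ** (1 - x)

def multinomial_pmf(x, p):
    """P(x_1, ..., x_K) = prod_i p_i^(x_i) for a one-hot vector x."""
    return math.prod(p_i ** x_i for p_i, x_i in zip(p, x))

def gaussian_pdf(x, mu, sigma):
    """p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(bernoulli_pmf(1, 0.3))
print(multinomial_pmf([0, 1, 0], [0.2, 0.5, 0.3]))
print(gaussian_pdf(0.0, 0.0, 1.0))
```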

Likelihood
iid sample $\mathcal{X} = \{x^t\}_t$, drawn from $p(x \mid \theta)$
How do we find the $\theta$ that makes our sample as likely as possible?
Because the $x^t$ are independent, the likelihood of $\theta$ given $\mathcal{X}$ is
  $l(\theta \mid \mathcal{X}) \equiv p(\mathcal{X} \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)$

Maximum Likelihood Estimation (MLE)
Likelihood:
  $l(\theta \mid \mathcal{X}) \equiv p(\mathcal{X} \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)$
In MLE, find the $\theta$ that makes $\mathcal{X}$ the most likely to be seen
Search for the $\theta$ that maximizes $l(\theta \mid \mathcal{X})$
To simplify, we often instead maximize the log likelihood:
  $L(\theta \mid \mathcal{X}) \equiv \log l(\theta \mid \mathcal{X}) = \sum_{t=1}^{N} \log p(x^t \mid \theta)$
Maximum Likelihood Estimator:
  $\theta^* = \arg\max_{\theta} L(\theta \mid \mathcal{X})$
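To make the argmax concrete, the sketch below evaluates the log likelihood of a Bernoulli parameter over a grid of candidate values and keeps the best one. The sample, the grid resolution, and the helper names are assumptions for illustration; a brute-force grid search is used only because it mirrors the definition, not because it is how MLEs are normally computed.

```python
import math

def log_likelihood(theta, sample, log_density):
    """L(theta | X) = sum_t log p(x^t | theta)."""
    return sum(log_density(x, theta) for x in sample)

def bernoulli_log_density(x, p0):
    """log of p0^x * (1 - p0)^(1 - x)."""
    return x * math.log(p0) + (1 - x) * math.log(1 - p0)

# Made-up sample: 7 ones out of 10 observations.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded to avoid log(0)).
grid = [i / 100 for i in range(1, 100)]
theta_star = max(grid, key=lambda p: log_likelihood(p, sample, bernoulli_log_density))
print(theta_star)
```

The grid maximizer coincides with the fraction of ones in the sample, which is the closed-form Bernoulli MLE.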

Example MLEs
Bernoulli: $x \in \{0, 1\}$
  $P(x) = p_0^{x} (1 - p_0)^{(1 - x)}$
  $L(p_0 \mid \mathcal{X}) = \log \prod_t p_0^{x^t} (1 - p_0)^{(1 - x^t)}$
  MLE: $p_0 = \frac{\sum_t x^t}{N}$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
  $P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
  $L(p_1, p_2, \ldots, p_K \mid \mathcal{X}) = \log \prod_t \prod_i p_i^{x_i^t}$
  MLE: $p_i = \frac{\sum_t x_i^t}{N}$
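Both closed-form estimators reduce to relative frequencies. A minimal sketch, assuming the sample is a list of 0/1 values for the Bernoulli case and a list of one-hot indicator vectors for the multinomial case (the function names and data are illustrative):

```python
def bernoulli_mle(sample):
    """p0 = (sum_t x^t) / N."""
    return sum(sample) / len(sample)

def multinomial_mle(sample):
    """p_i = (sum_t x_i^t) / N for one-hot vectors x^t of length K."""
    n = len(sample)
    k = len(sample[0])
    return [sum(x[i] for x in sample) / n for i in range(k)]

print(bernoulli_mle([1, 0, 1, 1]))
print(multinomial_mle([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]))
```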

Example MLEs
Gaussian (Normal):
  $p(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$
  MLE for $\mu$: $m = \frac{\sum_t x^t}{N}$
  MLE for $\sigma^2$: $s^2 = \frac{\sum_t (x^t - m)^2}{N}$
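The Gaussian estimators are the sample mean and the average squared deviation from it. The sketch below uses a made-up sample; note that the maximum likelihood variance divides by $N$, not $N - 1$.

```python
def gaussian_mle(sample):
    """Return (m, s2): the MLEs for the mean and the variance."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n  # divides by N, not N - 1
    return m, s2

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
m, s2 = gaussian_mle(sample)
print(m, s2)
```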

Bias and Variance
Population $\mathcal{X}$ drawn from $p(x \mid \theta)$
Estimator of $\theta$: $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E\left[ (d - E[d])^2 \right]$
Mean square error:
  $r(d, \theta) = E\left[ (d - \theta)^2 \right] = (E[d] - \theta)^2 + E\left[ (d - E[d])^2 \right] = \text{Bias}^2 + \text{Variance}$
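A small simulation can make these quantities tangible. The sketch below repeatedly draws samples from a normal population, applies the divide-by-$N$ variance estimator to each, and then measures the estimator's bias, variance, and mean square error empirically. The population parameters, sample size, trial count, and seed are arbitrary choices for illustration.

```python
import random

def variance_mle(sample):
    """Divide-by-N variance estimate, used here as the estimator d(X)."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / n

def simulate(n=5, sigma=2.0, trials=20000, seed=0):
    """Estimate bias, variance, and MSE of variance_mle by repeated sampling."""
    rng = random.Random(seed)
    theta = sigma ** 2  # the true parameter being estimated
    d = [variance_mle([rng.gauss(0.0, sigma) for _ in range(n)])
         for _ in range(trials)]
    mean_d = sum(d) / trials                                # approximates E[d]
    bias = mean_d - theta                                   # E[d] - theta
    variance = sum((di - mean_d) ** 2 for di in d) / trials # E[(d - E[d])^2]
    mse = sum((di - theta) ** 2 for di in d) / trials       # E[(d - theta)^2]
    return bias, variance, mse

bias, variance, mse = simulate()
print(bias, variance, mse, bias ** 2 + variance)
```

The empirical MSE matches Bias² + Variance up to rounding, and the bias comes out close to $-\sigma^2 / N$, reflecting that the divide-by-$N$ estimator underestimates the variance on average.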

Estimators
If we have prior knowledge of $p(\theta)$, apply Bayes’ rule:
  $p(\theta \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \theta)\, p(\theta)}{p(\mathcal{X})} = \frac{p(\mathcal{X} \mid \theta)\, p(\theta)}{\int p(\mathcal{X} \mid \theta')\, p(\theta')\, d\theta'}$
Problem: except in special cases, this won’t have a nice, closed-form solution
  Numerical estimation
  Use simpler “point” estimators
  If the form is tractable, we can do Bayes’ estimator

Simpler Estimators
Maximum a Posteriori (MAP): $\theta_{MAP} = \arg\max_{\theta} p(\theta \mid \mathcal{X})$
Maximum Likelihood (ML): $\theta_{ML} = \arg\max_{\theta} p(\mathcal{X} \mid \theta)$
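One way to see the difference between the two point estimators is to approximate the posterior numerically on a grid. The sketch below assumes a Bernoulli likelihood, a made-up sample, and a hypothetical unnormalized prior $p(\theta) \propto \theta (1 - \theta)$ that favors values near 0.5; none of these come from the slides.

```python
def likelihood(theta, sample):
    """Bernoulli likelihood p(X | theta) of a 0/1 sample."""
    result = 1.0
    for x in sample:
        result *= theta if x == 1 else 1.0 - theta
    return result

def prior(theta):
    """Assumed unnormalized prior, largest at theta = 0.5."""
    return theta * (1.0 - theta)

sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # illustrative: 7 ones out of 10

steps = 1000
h = 1.0 / steps
grid = [(i + 0.5) * h for i in range(steps)]  # midpoints of [0, 1]

# Numerator of Bayes' rule at each grid point (up to the prior's constant).
unnormalized = [likelihood(t, sample) * prior(t) for t in grid]
# Midpoint-rule approximation of the integral in the denominator.
normalizer = sum(unnormalized) * h
posterior = [u / normalizer for u in unnormalized]

theta_map = grid[max(range(steps), key=lambda i: posterior[i])]
theta_ml = grid[max(range(steps), key=lambda i: likelihood(grid[i], sample))]
print(theta_map, theta_ml)
```

With this prior the MAP estimate lands near 2/3, pulled toward 0.5 relative to the ML estimate near 0.7.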

Bayes’ Estimator
Bayes: $\theta_{Bayes} = E[\theta \mid \mathcal{X}] = \int \theta\, p(\theta \mid \mathcal{X})\, d\theta$
Example: $x^t \sim \mathcal{N}(\theta, \sigma_0^2)$ and $\theta \sim \mathcal{N}(\mu, \sigma^2)$
Let $m$ be the mean of the sample
By the central limit theorem, the sample mean of even a non-normal population is approximately normally distributed, centered on the population mean, with a standard deviation of $\sigma_0 / \sqrt{N}$ (the population standard deviation divided by $\sqrt{N}$)
  $\theta_{ML} = m$
  $\theta_{MAP} = \theta_{Bayes} = E[\theta \mid \mathcal{X}] = \frac{N/\sigma_0^2}{N/\sigma_0^2 + 1/\sigma^2}\, m + \frac{1/\sigma^2}{N/\sigma_0^2 + 1/\sigma^2}\, \mu$
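The estimator in this example is a weighted average of the sample mean $m$ and the prior mean $\mu$, with weights proportional to $N / \sigma_0^2$ and $1 / \sigma^2$. A minimal sketch, with an illustrative function name and made-up numbers:

```python
def bayes_estimate_normal(sample, sigma0_sq, mu, sigma_sq):
    """Posterior mean of theta for x^t ~ N(theta, sigma0_sq), theta ~ N(mu, sigma_sq)."""
    n = len(sample)
    m = sum(sample) / n          # sample mean, which is also theta_ML
    w_data = n / sigma0_sq       # N / sigma_0^2
    w_prior = 1.0 / sigma_sq     # 1 / sigma^2
    total = w_data + w_prior
    return (w_data / total) * m + (w_prior / total) * mu

# Four observations with mean 5, data variance 4, prior N(0, 1).
small = bayes_estimate_normal([4.0, 6.0, 5.0, 5.0], sigma0_sq=4.0, mu=0.0, sigma_sq=1.0)
# Same setting with 400 observations: the data dominates the prior.
large = bayes_estimate_normal([5.0] * 400, sigma0_sq=4.0, mu=0.0, sigma_sq=1.0)
print(small, large)
```

With four observations the two weights are equal, so the estimate sits halfway between the prior mean and the sample mean; as $N$ grows, the estimate moves toward $m$.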
