SLIDE 1 Bias-Variance Theory
Decompose the error rate into components, some of which can be measured on unlabeled data
- Bias-Variance Decomposition for Regression
- Bias-Variance Decomposition for Classification
- Bias-Variance Analysis of Learning Algorithms
- Effect of Bagging on Bias and Variance
- Effect of Boosting on Bias and Variance
- Summary and Conclusion
SLIDE 2 Bias-Variance Analysis in Regression
True function is y = f(x) + ε
– where ε is normally distributed with zero mean and standard deviation σ.
Given a set of training examples {(x_i, y_i)}, we fit a hypothesis h(x) = w · x + b to the data to minimize the squared error Σ_i [y_i – h(x_i)]²
SLIDE 3
Example: 20 points, y = x + 2 sin(1.5x) + N(0, 0.2)
SLIDE 4
50 fits (20 examples each)
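To make slides 3–4 concrete, here is a minimal simulation sketch (my own code, not from the slides; the sampling range of x and all function names are assumptions), using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # True function from slide 3: f(x) = x + 2 sin(1.5 x)
    return x + 2 * np.sin(1.5 * x)

def training_set(n=20, sigma=0.2):
    # y = f(x) + eps, eps ~ N(0, sigma); x drawn from an assumed range [0, 10]
    x = rng.uniform(0.0, 10.0, size=n)
    return x, f(x) + rng.normal(0.0, sigma, size=n)

# 50 linear fits h(x) = w*x + b, one per fresh training set of 20 examples;
# np.polyfit minimizes the squared error sum_i (y_i - h(x_i))^2
fits = [np.polyfit(*training_set(), deg=1) for _ in range(50)]
```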
SLIDE 5 Bias-Variance Analysis
Now, given a new data point x* (with observed value y* = f(x*) + ε), we would like to understand the expected prediction error
E[ (y* – h(x*))² ]
SLIDE 6 Classical Statistical Analysis
Imagine that our particular training sample S is drawn from some population of possible training samples according to P(S).
Compute E_P[ (y* – h(x*))² ]
Decompose this into “bias”, “variance”, and “noise”
SLIDE 7 Lemma
Let Z be a random variable with probability distribution P(Z)
Let Z̄ = E_P[ Z ] be the average value of Z.
Lemma: E[ (Z – Z̄)² ] = E[Z²] – Z̄²
E[ (Z – Z̄)² ] = E[ Z² – 2 Z Z̄ + Z̄² ]
             = E[Z²] – 2 E[Z] Z̄ + Z̄²
             = E[Z²] – 2 Z̄² + Z̄²
             = E[Z²] – Z̄²
Corollary: E[Z²] = E[ (Z – Z̄)² ] + Z̄²
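A quick Monte Carlo check of the lemma (my own illustration, not from the slides; any distribution for Z will do):

```python
import numpy as np

Z = np.random.default_rng(1).gamma(2.0, 3.0, size=1_000_000)
Zbar = Z.mean()
print(((Z - Zbar) ** 2).mean())     # E[(Z - Z̄)²]
print((Z ** 2).mean() - Zbar ** 2)  # E[Z²] - Z̄², identical up to rounding
```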
SLIDE 8 Bias-Variance-Noise Decomposition
E[ (h(x*) – y*)² ]
  = E[ h(x*)² – 2 h(x*) y* + y*² ]
  = E[ h(x*)² ] – 2 E[ h(x*) ] E[y*] + E[y*²]
    (h(x*) and y* are independent, and E[y*] = f(x*) since E[ε] = 0)
  = E[ (h(x*) – h̄(x*))² ] + h̄(x*)²      (lemma)
    – 2 h̄(x*) f(x*)
    + E[ (y* – f(x*))² ] + f(x*)²        (lemma)
  = E[ (h(x*) – h̄(x*))² ]               [variance]
    + (h̄(x*) – f(x*))²                  [bias²]
    + E[ (y* – f(x*))² ]                 [noise]
SLIDE 9 Derivation (continued)
E[ (h(x*) – y*)² ]
  = E[ (h(x*) – h̄(x*))² ] + (h̄(x*) – f(x*))² + E[ (y* – f(x*))² ]
  = Var(h(x*)) + Bias(h(x*))² + E[ ε² ]
  = Var(h(x*)) + Bias(h(x*))² + σ²
Expected prediction error = Variance + Bias² + Noise²
SLIDE 10 Bias, Variance, and Noise
Variance: E[ (h(x*) – h̄(x*))² ]
– Describes how much h(x*) varies from one training set S to another
Bias: [h̄(x*) – f(x*)]
– Describes the average error of h(x*).
Noise: E[ (y* – f(x*))² ] = E[ε²] = σ²
– Describes how much y* varies from f(x*)
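Because f and σ are known in this synthetic example, all three terms can be measured directly by simulation. A sketch (mine, continuing the code above; the grid point x* = 2.0 matches slide 16):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x + 2 * np.sin(1.5 * x)
sigma = 0.2

def fit_once(n=20):
    x = rng.uniform(0.0, 10.0, size=n)
    y = f(x) + rng.normal(0.0, sigma, size=n)
    return np.polyfit(x, y, deg=1)                # linear fit (w, b)

x_star = 2.0
preds = np.array([np.polyval(fit_once(), x_star) for _ in range(1000)])

h_bar = preds.mean()                              # average prediction h̄(x*)
variance = ((preds - h_bar) ** 2).mean()          # E[(h(x*) - h̄(x*))²]
bias2 = (h_bar - f(x_star)) ** 2                  # (h̄(x*) - f(x*))²
noise = sigma ** 2                                # E[(y* - f(x*))²] = σ²
print(variance + bias2 + noise)                   # ≈ E[(y* - h(x*))²]
```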
SLIDE 11
50 fits (20 examples each)
SLIDE 12
Bias
SLIDE 13
Variance
SLIDE 14
Noise
SLIDE 15
50 fits (20 examples each)
SLIDE 16
Distribution of predictions at x=2.0
SLIDE 17
50 fits (20 examples each)
SLIDE 18
Distribution of predictions at x=5.0
SLIDE 19
Measuring Bias and Variance
In practice (unlike in theory), we have only ONE training set S.
We can simulate multiple training sets by bootstrap replicates
– S′ = {x | x is drawn at random with replacement from S} and |S′| = |S|.
SLIDE 20 Procedure for Measuring Bias and Variance
Construct B bootstrap replicates of S (e.g., B = 200): S_1, …, S_B
Apply the learning algorithm to each replicate S_b to obtain hypothesis h_b
Let T_b = S \ S_b be the data points that do not appear in S_b (the out-of-bag points)
Compute the predicted value h_b(x) for each x in T_b
SLIDE 21 Estimating Bias and Variance (continued)
For each data point x, we will now have the observed corresponding value y and several predictions y_1, …, y_K.
Compute the average prediction h̄.
Estimate bias as (h̄ – y)
Estimate variance as Σ_k (y_k – h̄)²/(K – 1)
Assume the noise is 0
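A compact sketch of the procedure in slides 20–21 (my own; the base learner, here scikit-learn's DecisionTreeRegressor, is an assumed choice, as are the function names):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bootstrap_bias_variance(X, y, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = [[] for _ in range(n)]             # out-of-bag predictions per point
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap replicate S_b
        oob = np.setdiff1d(np.arange(n), idx)  # T_b = S \ S_b
        if len(oob) == 0:
            continue
        h_b = DecisionTreeRegressor().fit(X[idx], y[idx])
        for i, p in zip(oob, h_b.predict(X[oob])):
            preds[i].append(p)
    bias2, var = [], []
    for i, p in enumerate(preds):
        if len(p) < 2:
            continue                           # need several predictions y_1..y_K
        p = np.asarray(p)
        h_bar = p.mean()                       # average prediction h̄
        bias2.append((h_bar - y[i]) ** 2)      # bias² estimate (noise assumed 0)
        var.append(((p - h_bar) ** 2).sum() / (len(p) - 1))
    return np.mean(bias2), np.mean(var)
```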
SLIDE 22
Approximations in this Procedure
Bootstrap replicates are not real data
We ignore the noise
– If we have multiple data points with the same x value, then we can estimate the noise
– We can also estimate noise by pooling y values from nearby x values
SLIDE 23 Ensemble Learning Methods
Given training sample S
Generate multiple hypotheses h_1, h_2, …, h_L.
Optionally: determine corresponding weights w_1, w_2, …, w_L
Classify new points according to Σ_l w_l h_l(x) > θ
SLIDE 24 Bagging: Bootstrap Aggregating
For b = 1, …, B do
– S_b = bootstrap replicate of S
– Apply the learning algorithm to S_b to learn h_b
Classify new points by unweighted vote:
– [Σ_b h_b(x)]/B > 0
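A minimal sketch of this algorithm for ±1 labels (mine; the slides leave the base learner open, so scikit-learn's DecisionTreeClassifier is an assumption):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=30, seed=0):
    # y is assumed to be a numpy array with values in {-1, +1}
    rng = np.random.default_rng(seed)
    n = len(X)
    hypotheses = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap replicate S_b
        hypotheses.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return hypotheses

def bagging_predict(hypotheses, X):
    # Unweighted vote: sign of the average prediction [sum_b h_b(x)] / B
    votes = np.mean([h.predict(X) for h in hypotheses], axis=0)
    return np.where(votes > 0, 1, -1)
```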
SLIDE 25 Bagging
Bagging makes predictions according to
y = Σ_b h_b(x) / B
Hence, bagging’s predictions are h̄(x)
SLIDE 26 Estimated Bias and Variance of Bagging
If we estimate bias and variance using the same B bootstrap samples, we will have:
– Bias = (h̄ – y) [same as before]
– Variance = Σ_k (h̄ – h̄)²/(K – 1) = 0
Hence, according to this approximate way of estimating variance, bagging removes the variance while leaving bias unchanged.
In reality, bagging only reduces variance and tends to slightly increase bias.
SLIDE 27
Bias/Variance Heuristics
Models that fit the data poorly have high bias: “inflexible models” such as linear regression and regression stumps
Models that can fit the data very well have low bias but high variance: “flexible” models such as nearest neighbor regression and regression trees
This suggests that bagging of a flexible model can reduce the variance while benefiting from the low bias
SLIDE 28 Bias-Variance Decomposition for Classification
Can we extend the bias-variance decomposition to classification problems?
Several extensions have been proposed; we will study the extension due to Pedro Domingos (2000a; 2000b)
Domingos developed a unified decomposition that covers both regression and classification
SLIDE 29 Classification Problems: Noisy Channel Model
Data points are generated by y_i = n(f(x_i)), where
– f(x_i) is the true class label of x_i
– n(·) is a noise process that may change the true label f(x_i).
Given a training set {(x_1, y_1), …, (x_m, y_m)}, our learning algorithm produces a hypothesis h.
Let y* = n(f(x*)) be the observed label of a new data point x*. h(x*) is the predicted label. The error (“loss”) is defined as L(h(x*), y*)
SLIDE 30 Loss Functions for Classification
The usual loss function is 0/1 loss: L(y′, y) is 0 if y′ = y and 1 otherwise.
Our goal is to decompose E_P[ L(h(x*), y*) ] into bias, variance, and noise terms
SLIDE 31 Discrete Equivalent of the Mean: The Main Prediction
As before, we imagine that our observed training set S was drawn from some population according to P(S)
Define the main prediction to be
y_m(x*) = argmin_y′ E_P[ L(y′, h(x*)) ]
For 0/1 loss, the main prediction is the most common vote of h(x*) (taken over all training sets S weighted according to P(S))
For squared error, the main prediction is h̄(x*)
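Given a sample of predictions h(x*) from many training sets, the main prediction is easy to compute for both losses. A small sketch (mine, not from the slides):

```python
import numpy as np
from collections import Counter

def main_prediction_01(predictions):
    # 0/1 loss: the most common vote of h(x*) across training sets
    return Counter(predictions).most_common(1)[0][0]

def main_prediction_squared(predictions):
    # Squared error: the mean prediction h̄(x*)
    return np.mean(predictions)

print(main_prediction_01([1, 1, -1, 1, -1]))     # -> 1
print(main_prediction_squared([0.9, 1.1, 1.0]))  # -> 1.0
```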
SLIDE 32 Bias, Variance, Noise
Bias: B(x*) = L(y_m, f(x*))
– This is the loss of the main prediction with respect to the true label of x*
Variance: V(x*) = E[ L(h(x*), y_m) ]
– This is the expected loss of h(x*) relative to the main prediction
Noise: N(x*) = E[ L(y*, f(x*)) ]
– This is the expected loss of the noisy observed value y* relative to the true label of x*
SLIDE 33 Squared Error Loss
These definitions give us the results we have already derived for squared error loss L(y′, y) = (y′ – y)²
– Main prediction: y_m = h̄(x*)
– Bias²: L(h̄(x*), f(x*)) = (h̄(x*) – f(x*))²
– Variance: E[ L(h(x*), h̄(x*)) ] = E[ (h(x*) – h̄(x*))² ]
– Noise: E[ L(y*, f(x*)) ] = E[ (y* – f(x*))² ]
SLIDE 34 0/1 Loss for 2 Classes
There are three components that determine whether y* = h(x*)
– Noise: y* = f(x*)?
– Bias: f(x*) = y_m?
– Variance: y_m = h(x*)?
Bias is either 0 or 1, because neither f(x*) nor y_m is a random variable
SLIDE 35 Case Analysis of Error
f(x*) = y_m? yes (unbiased):
– y_m = h(x*)? yes:
  – y* = f(x*)? yes: correct
  – y* = f(x*)? no: error [noise]
– y_m = h(x*)? no:
  – y* = f(x*)? yes: error [variance]
  – y* = f(x*)? no: correct [noise cancels variance]
f(x*) = y_m? no (biased):
– y_m = h(x*)? yes:
  – y* = f(x*)? yes: error [bias]
  – y* = f(x*)? no: correct [noise cancels bias]
– y_m = h(x*)? no:
  – y* = f(x*)? yes: correct [variance cancels bias]
  – y* = f(x*)? no: error [noise cancels variance cancels bias]
SLIDE 36 Unbiased Case
Let P(y* ≠ f(x*)) = N(x*) = τ
Let P(y_m ≠ h(x*)) = V(x*) = σ
If f(x*) = y_m, then we suffer a loss if exactly one of these events occurs:
E[ L(h(x*), y*) ] = τ(1 – σ) + σ(1 – τ)
                  = τ + σ – 2τσ
                  = N(x*) + V(x*) – 2 N(x*) V(x*)
SLIDE 37 Biased Case
Let P(y* ≠ f(x*)) = N(x*) = τ
Let P(y_m ≠ h(x*)) = V(x*) = σ
If f(x*) ≠ y_m, then we suffer a loss if either both or neither of these events occurs:
E[ L(h(x*), y*) ] = τσ + (1 – σ)(1 – τ)
                  = 1 – (τ + σ – 2τσ)
                  = B(x*) – [N(x*) + V(x*) – 2 N(x*) V(x*)]
SLIDE 38
Decomposition for 0/1 Loss (2 classes)
We do not get a simple additive decomposition in the 0/1 loss case:
E[ L(h(x*), y*) ] =
  if B(x*) = 1: B(x*) – [N(x*) + V(x*) – 2 N(x*) V(x*)]
  if B(x*) = 0: B(x*) + [N(x*) + V(x*) – 2 N(x*) V(x*)]
In the biased case, noise and variance reduce error; in the unbiased case, noise and variance increase error
SLIDE 39
Summary of 0/1 Loss
A good classifier will have low bias, in which case the expected loss will approximately equal the variance
The interaction terms will usually be small, because both noise and variance will usually be < 0.2, so the interaction term 2 V(x*) N(x*) will be < 0.08
SLIDE 40 0/1 Decomposition in Practice
In the noise-free case:
E[ L(h(x*), y*) ] =
  if B(x*) = 1: B(x*) – V(x*)
  if B(x*) = 0: B(x*) + V(x*)
It is usually hard to estimate N(x*), so we will use this formula
SLIDE 41 Decomposition over an Entire Data Set
Given a set of test points T = {(x*_1, y*_1), …, (x*_n, y*_n)}, we want to decompose the average loss:
L̄ = Σ_i E[ L(h(x*_i), y*_i) ] / n
We will write it as L̄ = B̄ + V̄u – V̄b, where B̄ is the average bias, V̄u is the average unbiased variance, and V̄b is the average biased variance (we ignore the noise)
V̄u – V̄b will be called the “net variance”
SLIDE 42 Classification Problems: Overlapping Distributions Model
Suppose that at each point x, the label is generated according to a probability distribution y ~ P(y|x)
The goal of learning is to discover this probability distribution
The loss function L(p, h) = KL(p, h) is the Kullback-Leibler divergence between the true distribution p and our hypothesis h.
SLIDE 43 Kullback-Leibler Divergence
For simplicity, assume only two classes: y ∈ {0, 1}
Let p be the true probability P(y=1|x) and h be our hypothesis for P(y=1|x).
The KL divergence is
KL(p, h) = p log p/h + (1 – p) log (1 – p)/(1 – h)
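A direct transcription of this formula (a sketch of mine; the clipping guard against 0/1 endpoints is my addition):

```python
import numpy as np

def kl_binary(p, h, eps=1e-12):
    # KL(p, h) = p log(p/h) + (1-p) log((1-p)/(1-h)) for two classes
    p = np.clip(p, eps, 1 - eps)
    h = np.clip(h, eps, 1 - eps)
    return p * np.log(p / h) + (1 - p) * np.log((1 - p) / (1 - h))

print(kl_binary(0.7, 0.7))  # 0.0: no loss when h matches p
print(kl_binary(0.7, 0.5))  # > 0
```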
SLIDE 44 Bias-Variance-Noise Decomposition for KL
Goal: decompose E_S[ KL(y, h) ] into noise, bias, and variance terms
Compute the main prediction:
h̄ = argmin_u E_S[ KL(u, h) ]
This turns out to be the geometric mean:
log(h̄/(1 – h̄)) = E_S[ log(h/(1 – h)) ]
h̄ = 1/Z · exp( E_S[ log h ] )
SLIDE 45 Computing the Noise
Obviously the best estimator h would be p. What loss would it receive?
E[ KL(y, p) ] = E[ y log y/p + (1 – y) log (1 – y)/(1 – p) ]
             = E[ y log y – y log p + (1 – y) log (1 – y) – (1 – y) log (1 – p) ]
             = –p log p – (1 – p) log (1 – p)
             = H(p)
SLIDE 46 Bias, Variance, Noise
Variance: E_S[ KL(h̄, h) ]
Bias: KL(p, h̄)
Noise: H(p)
Expected loss = Noise + Bias + Variance
E[ KL(y, h) ] = H(p) + KL(p, h̄) + E_S[ KL(h̄, h) ]
SLIDE 47 Consequences of this Definition
If our goal is probability estimation and we want to do bagging, then we should combine the individual probability estimates using the geometric mean
log(h̄/(1 – h̄)) = E_S[ log(h/(1 – h)) ]
In this case, bagging will produce pure variance reduction (as in regression)!
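A sketch of this combination rule (mine, not from the slides): average the log-odds of the individual estimates, then map back through the sigmoid.

```python
import numpy as np

def geometric_mean_combine(probs, eps=1e-12):
    # probs: individual estimates h_b of P(y=1|x), one per bagged model.
    # Averaging log-odds implements log(h̄/(1-h̄)) = E_S[ log(h/(1-h)) ].
    probs = np.clip(np.asarray(probs), eps, 1 - eps)
    mean_logit = np.mean(np.log(probs / (1 - probs)))
    return 1 / (1 + np.exp(-mean_logit))

print(geometric_mean_combine([0.6, 0.7, 0.9]))  # vs. arithmetic mean 0.733...
```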
SLIDE 48 Experimental Studies of Bias and Variance
Artificial data: we can generate multiple training sets S and measure bias and variance directly
Benchmark data sets: generate bootstrap replicates and measure bias and variance on a separate test set
SLIDE 49 Algorithms to Study
K-nearest neighbors: what is the effect of K?
Decision trees: what is the effect of pruning?
Support vector machines: what is the effect of kernel width σ?
SLIDE 50 K-Nearest Neighbor (Domingos, 2000)
Chess (left): increasing K primarily reduces Vu
Audiology (right): increasing K primarily increases B.
SLIDE 51
Size of Decision Trees
Glass (left), primary tumor (right): deeper trees have lower B, higher Vu
SLIDE 52 Example: 200 linear SVMs (training sets of size 20)
Error: 13.7% Bias: 11.7% Vu: 5.2% Vb: 3.2%
SLIDE 53 Example: 200 RBF SVMs, σ = 5
Error: 15.0% Bias: 5.8% Vu: 11.5% Vb: 2.3%
SLIDE 54 Example: 200 RBF SVMs, σ = 50
Error: 14.9% Bias: 10.1% Vu: 7.8% Vb: 3.0%
SLIDE 55 SVM Bias and Variance
Bias-variance tradeoff is controlled by σ
The biased classifier (linear SVM) gives better results than a classifier that can represent the true decision boundary!
SLIDE 56 B/V Analysis of Bagging
Under the bootstrap assumption, bagging reduces only variance
– Removing Vu reduces the error rate
– Removing Vb increases the error rate
Therefore, bagging should be applied to low-bias classifiers, because then Vb will be small
Reality is more complex!
SLIDE 57 Bagging Nearest Neighbor
Bagging first-nearest neighbor is equivalent (in the limit) to a weighted majority vote in which the k-th neighbor receives a weight of exp(-(k-1)) – exp(-k) Since the first nearest neighbor gets more than half of the vote, it will always win this vote. Therefore, Bagging 1-NN is equivalent to 1-NN.
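A quick numeric check of this claim (my own): the first neighbor's weight is 1 – e⁻¹ ≈ 0.632, more than half of the total vote, which sums to 1 in the limit.

```python
import math

weights = [math.exp(-(k - 1)) - math.exp(-k) for k in range(1, 30)]
print(weights[0])    # 0.632... > 0.5, so neighbor 1 always wins the vote
print(sum(weights))  # telescopes to ~1.0
```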
SLIDE 58 Bagging Decision Trees
Consider unpruned trees of depth 2 on the Glass data set. In this case, the error is almost entirely due to bias
Perform 30-fold bagging (replicated 50 times; 10-fold cross-validation)
What will happen?
SLIDE 59
Bagging Primarily Reduces Bias!
SLIDE 60
Questions
Is this due to the failure of the bootstrap assumption in bagging?
Is this due to the failure of the bootstrap assumption in estimating bias and variance?
Should we also think of bagging as a simple additive model that expands the range of representable classifiers?
SLIDE 61 Bagging Large Trees?
Now consider unpruned trees of depth 10 on the Glass data set. In this case, the trees have much lower bias.
What will happen?
SLIDE 62
Answer: Bagging Primarily Reduces Variance
SLIDE 63 Bagging of SVMs
We will choose a low-bias, high-variance SVM to bag: an RBF SVM with σ = 5
SLIDE 64
RBF SVMs again: σ = 5
SLIDE 65 Effect of 30-fold Bagging: Variance is Reduced
SLIDE 66 Effects of 30-fold Bagging
Vu is decreased by 0.010; Vb is unchanged
Bias is increased by 0.005
Error is reduced by 0.005
SLIDE 67
Bagging Decision Trees (Freund & Schapire)
SLIDE 68
Boosting
SLIDE 69 Bias-Variance Analysis of Boosting
Boosting seeks to find a weighted combination of classifiers that fits the data well
Prediction: boosting will primarily act to reduce bias
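For concreteness, here is a minimal sketch of AdaBoost, the canonical boosting algorithm (the slides do not spell out the variant, so this choice is mine), assuming ±1 labels and scikit-learn decision stumps as the weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds=50):
    # y: numpy array with values in {-1, +1}
    n = len(X)
    d = np.full(n, 1.0 / n)              # example weights
    classifiers, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=d)
        pred = stump.predict(X)
        err = d[pred != y].sum()
        if err >= 0.5:                   # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        d *= np.exp(-alpha * y * pred)   # upweight misclassified examples
        d /= d.sum()
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # Weighted vote of the learned classifiers
    votes = sum(a * h.predict(X) for a, h in zip(alphas, classifiers))
    return np.sign(votes)
```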
SLIDE 70
Boosting DNA splice (left) and Audiology (right)
Early iterations reduce bias. Later iterations also reduce variance
SLIDE 71
Boosting vs. Bagging (Freund & Schapire)
SLIDE 72 Review and Conclusions
For regression problems (squared error loss), the expected error rate can be decomposed into
– Bias(x*)² + Variance(x*) + Noise(x*)
For classification problems (0/1 loss), the expected error rate depends on whether bias is present:
– if B(x*) = 1: B(x*) – [V(x*) + N(x*) – 2 V(x*) N(x*)]
– if B(x*) = 0: B(x*) + [V(x*) + N(x*) – 2 V(x*) N(x*)]
– or B(x*) + Vu(x*) – Vb(x*) [ignoring noise]
SLIDE 73 Review and Conclusions (2)
For classification problems with log loss, the expected loss can be decomposed into noise + bias + variance:
E[ KL(y, h) ] = H(p) + KL(p, h̄) + E_S[ KL(h̄, h) ]
SLIDE 74
Sources of Bias and Variance
Bias arises when the classifier cannot represent the true function – that is, the classifier underfits the data
Variance arises when the classifier overfits the data
There is often a tradeoff between bias and variance
SLIDE 75 Effect of Algorithm Parameters on Bias and Variance
k-nearest neighbor: increasing k typically increases bias and reduces variance
Decision trees of depth D: increasing D typically increases variance and reduces bias
RBF SVM with parameter σ: increasing σ increases bias and reduces variance
SLIDE 76 Effect of Bagging
If the bootstrap replicate approximation were correct, then bagging would reduce variance without changing bias
In practice, bagging can reduce both bias and variance
– For high-bias classifiers, it can reduce bias (but may increase Vu)
– For high-variance classifiers, it can reduce variance
SLIDE 77 Effect of Boosting
In the early iterations, boosting is primarily a bias-reducing method
In later iterations, it appears to be primarily a variance-reducing method