Bias, Variance and Error
Bias and Variance
Given an algorithm that outputs an estimate θ̂ for θ*, we define:
– the bias of the estimator: E[θ̂] − θ*
– the variance of the estimator: E[(θ̂ − E[θ̂])²]
E.g., θ̂ = estimator for the probability of heads, based on n independent coin flips. What is its bias? Its variance?
Bias and Variance
Given an algorithm that outputs an estimate θ̂ for θ*, we define:
– the bias of the estimator: E[θ̂] − θ*
– the variance of the estimator: E[(θ̂ − E[θ̂])²]
Comparing the MLE and MAP estimators for the probability of heads: which estimator has higher bias? Higher variance?
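A minimal simulation sketch of these questions (not from the slides), assuming the MLE estimator #heads/n and a MAP estimator derived from a Beta prior; the prior counts γ1 = γ0 = 5 and the true θ* are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 0.3          # true probability of heads
n = 20                    # coin flips per training set
trials = 100_000          # number of independent training sets
g1, g0 = 5, 5             # illustrative Beta prior "hallucinated" counts

heads = rng.binomial(n, theta_star, size=trials)       # number of heads in each trial
theta_mle = heads / n                                   # MLE estimate
theta_map = (heads + g1 - 1) / (n + g1 + g0 - 2)        # MAP estimate (mode of Beta posterior)

for name, est in [("MLE", theta_mle), ("MAP", theta_map)]:
    bias = est.mean() - theta_star      # approximates E[theta_hat] - theta*
    variance = est.var()                # approximates E[(theta_hat - E[theta_hat])^2]
    print(f"{name}: bias = {bias:+.4f}, variance = {variance:.5f}")
```

With small n the MLE is unbiased but has higher variance, while the MAP estimate is pulled toward the prior and trades some bias for lower variance; its bias shrinks as n grows.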
Bias – Variance decomposition of error
Reading: Bishop chapter 9.1, 9.2
- Consider a simple regression problem f: X → Y
  y = f(x) + ε, where the noise ε is distributed N(0, σ) and f(x) is deterministic
- Define the expected prediction error:
  E_D[(y − ĥ(x))²]
  where ĥ(x) is the learned estimate of f(x) and the expectation is taken over training sets D
Sources of error
- What if we have a perfect learner and infinite data?
– Our learned h(x) satisfies h(x) = f(x)
– We still have remaining, unavoidable error σ²
Sources of error
- What if we have only n training examples?
- What is our expected error?
– Taken over random training sets of size n, drawn from distribution D=p(x,y)
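Combining the two slides above: the expected error decomposes as E_D[(y − ĥ(x))²] = σ² + (E_D[ĥ(x)] − f(x))² + E_D[(ĥ(x) − E_D[ĥ(x)])²], i.e., unavoidable noise + bias² + variance. Below is a small simulation sketch of that decomposition (not from the slides); the true f, σ, n, and polynomial degree are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)         # illustrative true (deterministic) f(x)
sigma, n, degree, trials = 0.3, 20, 3, 2000
x_test = 0.35                               # single test input

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)                # random training set of size n
    y = f(x) + rng.normal(0, sigma, n)      # y = f(x) + eps, eps ~ N(0, sigma)
    coeffs = np.polyfit(x, y, degree)       # learned estimate h(x)
    preds[t] = np.polyval(coeffs, x_test)

bias_sq = (preds.mean() - f(x_test)) ** 2   # (E_D[h(x)] - f(x))^2
variance = preds.var()                      # E_D[(h(x) - E_D[h(x)])^2]
print(f"noise sigma^2 = {sigma**2:.4f}, bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
print(f"expected error at x_test ~ {sigma**2 + bias_sq + variance:.4f}")
```

Increasing n shrinks the variance term; the bias term reflects how well the chosen hypothesis class (here degree-3 polynomials) can represent f.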
Sources of error
L2 vs. L1 Regularization
- Gaussian P(W) → L2 regularization
- Laplace P(W) → L1 regularization
[Figure: contours of constant P(W) and constant P(Data|W) in the (w1, w2) plane for the two priors]
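A brief sketch of the contrast (not from the slides, and assuming scikit-learn is available; the synthetic data and regularization strengths are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 0.5]                    # only three features actually matter
y = X @ true_w + rng.normal(0, 0.5, n)

ridge = Ridge(alpha=1.0).fit(X, y)               # L2 penalty <-> Gaussian prior P(W)
lasso = Lasso(alpha=0.1).fit(X, y)               # L1 penalty <-> Laplace prior P(W)

print("L2 (Ridge) weights:", np.round(ridge.coef_, 3))   # all shrunk, rarely exactly 0
print("L1 (Lasso) weights:", np.round(lasso.coef_, 3))   # many exactly 0
```

The L2/Gaussian prior shrinks all weights toward zero, while the L1/Laplace prior tends to zero out the irrelevant ones, matching the summary below.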
Summary
- Bias of parameter estimators
- Variance of parameter estimators
- We can define analogous notions for estimators (learners) of functions
- Expected error in learned functions comes from
– unavoidable error (invariant of training set size, due to noise)
– bias (can be caused by incorrect modeling assumptions)
– variance (decreases with training set size)
- MAP estimates generally more biased than MLE
– but bias vanishes as training set size → ∞
- Regularization corresponds to producing MAP estimates
– L2 / Gaussian prior leads to smaller weights
– L1 / Laplace prior leads to fewer non-zero weights
Machine Learning 10-601
Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 18, 2015
Today:
- Graphical models
- Bayes Nets:
  - Representing distributions
  - Conditional independencies
  - Simple inference
  - Simple learning
Readings:
- Bishop chapter 8, through 8.2
Graphical Models
- Key Idea:
– Conditional independence assumptions useful – but Naïve Bayes is extreme!
– Graphical models express sets of conditional independence assumptions via graph structure
– Graph structure plus associated parameters define joint probability distribution over set of variables
- Two types of graphical models:
– Directed graphs (aka Bayesian Networks)
– Undirected graphs (aka Markov Random Fields)
Graphical Models – Why Care?
- Among most important ML developments of the decade
- Graphical models allow combining:
– Prior knowledge in form of dependencies/independencies
– Prior knowledge in form of priors over parameters
– Observed training data
- Principled and ~general methods for
– Probabilistic inference
– Learning
- Useful in practice
– Diagnosis, help systems, text analysis, time series models, ...
Conditional Independence
Definition: X is conditionally independent of Y given Z, if the probability distribution governing X is independent of the value of Y, given the value of Z:
(∀ i, j, k) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)
which we often write
P(X | Y, Z) = P(X | Z)
E.g., P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
Marginal Independence
Definition: X is marginally independent of Y if
(∀ i, j) P(X = xi | Y = yj) = P(X = xi)
Equivalently, if
(∀ i, j) P(Y = yj | X = xi) = P(Y = yj)
Equivalently, if
(∀ i, j) P(X = xi, Y = yj) = P(X = xi) P(Y = yj)
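A tiny numerical check of the conditional-independence definition (not from the slides); the joint distribution below is an illustrative one constructed so that X is conditionally independent of Y given Z:

```python
import numpy as np

# Illustrative joint over binary Z, X, Y, built as P(Z, X, Y) = P(Z) P(X|Z) P(Y|Z),
# so X and Y are conditionally independent given Z by construction.
p_z = np.array([0.6, 0.4])
p_x_given_z = np.array([[0.9, 0.1],      # row z: [P(X=0|z), P(X=1|z)]
                        [0.3, 0.7]])
p_y_given_z = np.array([[0.8, 0.2],      # row z: [P(Y=0|z), P(Y=1|z)]
                        [0.5, 0.5]])
joint = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

# Check P(X | Y, Z) == P(X | Z) for every value combination.
p_x_given_yz = joint / joint.sum(axis=1, keepdims=True)      # normalize over X
p_zx = joint.sum(axis=2)                                     # P(Z, X)
p_x_given_z_marg = p_zx / p_zx.sum(axis=1, keepdims=True)    # P(X | Z)
print(np.allclose(p_x_given_yz, p_x_given_z_marg[:, :, None]))   # True
```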
Represent Joint Probability Distribution over Variables
Describe network of dependencies
Bayes Nets define Joint Probability Distribution in terms of this graph, plus parameters
Benefits of Bayes Nets:
- Represent the full joint distribution in fewer parameters, using prior knowledge about dependencies
- Algorithms for inference and learning
Bayesian Networks Definition
A Bayes network represents the joint probability distribution over a collection of random variables.
A Bayes network is a directed acyclic graph and a set of conditional probability distributions (CPD’s)
- Each node denotes a random variable
- Edges denote dependencies
- For each node Xi its CPD defines P(Xi | Pa(Xi))
- The joint distribution over all variables is defined to be
  P(X1, ..., Xn) = ∏i P(Xi | Pa(Xi))
Pa(X) = immediate parents of X in the graph
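A minimal sketch (not from the slides) of this factorized representation for the StormClouds network used on the following slides. Only the WindSurf CPD values come from the slide's table; every other number here is an illustrative assumption:

```python
from itertools import product

# Network: StormClouds -> {Lightning, Rain}, Lightning -> Thunder, {Lightning, Rain} -> WindSurf.
# Each CPD stores P(node = 1 | parent values).
p_S = 0.5                                        # illustrative
p_L_given_S = {1: 0.6, 0: 0.05}                  # illustrative
p_R_given_S = {1: 0.7, 0: 0.1}                   # illustrative
p_T_given_L = {1: 0.9, 0: 0.05}                  # illustrative
p_W_given_LR = {(1, 1): 0.0, (1, 0): 0.0,        # from the slide's WindSurf CPD table
                (0, 1): 0.2, (0, 0): 0.9}

def bern(p, value):
    """P(X = value) for a binary variable with P(X = 1) = p."""
    return p if value == 1 else 1 - p

def joint(s, l, r, t, w):
    """P(S=s, L=l, R=r, T=t, W=w) = product over nodes of P(Xi | Pa(Xi))."""
    return (bern(p_S, s)
            * bern(p_L_given_S[s], l)
            * bern(p_R_given_S[s], r)
            * bern(p_T_given_L[l], t)
            * bern(p_W_given_LR[(l, r)], w))

# Sanity check: the factorized joint sums to 1 over all 2^5 assignments.
print(sum(joint(*a) for a in product([0, 1], repeat=5)))   # ~1.0
```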
Bayesian Network
[Figure: Bayes net with edges StormClouds → Lightning, StormClouds → Rain, Lightning → Thunder, and Lightning, Rain → WindSurf]
- Nodes = random variables
- A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N))
- The joint distribution over all variables:
  P(X1, ..., Xn) = ∏i P(Xi | Pa(Xi))

CPD for WindSurf (W), with parents Lightning (L) and Rain (R):

Parents | P(W|Pa) | P(¬W|Pa)
L, R    | 0       | 1.0
L, ¬R   | 0       | 1.0
¬L, R   | 0.2     | 0.8
¬L, ¬R  | 0.9     | 0.1
Bayesian Network
What can we say about conditional independencies in a Bayes net? One thing is this: each node is conditionally independent of its non-descendents, given only its immediate parents. E.g., in this network Thunder is conditionally independent of Rain, given Lightning.
[Figure: same network and WindSurf CPD as above]
Some helpful terminology
- Parents = Pa(X) = immediate parents
- Antecedents = parents, parents of parents, ...
- Children = immediate children
- Descendents = children, children of children, ...
Bayesian Networks
- CPD for each node Xi describes P(Xi | Pa(Xi))
- Chain rule of probability says that in general:
  P(X1, ..., Xn) = ∏i P(Xi | X1, ..., Xi−1)
- But in a Bayes net:
  P(X1, ..., Xn) = ∏i P(Xi | Pa(Xi))
[Figure: same network and WindSurf CPD as above]
How Many Parameters?
- To define the joint distribution in general?
- To define the joint distribution for this Bayes net?
[Figure: same network and WindSurf CPD as above]
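A short sketch of the counting (assuming all five variables are Boolean, as in the slides):

```python
# Independent parameters needed for Boolean variables.
n_vars = 5
full_joint = 2 ** n_vars - 1        # any full joint over 5 Boolean variables: 2^5 - 1 = 31

# In a Bayes net, each node needs 2^(#parents) numbers: one P(X=1 | parent setting) per setting.
num_parents = {"StormClouds": 0, "Lightning": 1, "Rain": 1, "Thunder": 1, "WindSurf": 2}
bayes_net = sum(2 ** k for k in num_parents.values())   # 1 + 2 + 2 + 2 + 4 = 11

print(f"general joint distribution: {full_joint} parameters")
print(f"this Bayes net:             {bayes_net} parameters")
```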
Inference in Bayes Nets
P(S=1, L=0, R=1, T=0, W=1) = P(S=1) · P(L=0 | S=1) · P(R=1 | S=1) · P(T=0 | L=0) · P(W=1 | L=0, R=1)
[Figure: same network and WindSurf CPD as above]
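Reusing the joint() sketch from the Bayes net definition above (with its illustrative CPD values), this query is just one product of local probabilities, with no summation needed since every variable is observed:

```python
# P(S=1, L=0, R=1, T=0, W=1), using joint() from the earlier sketch.
# Only the WindSurf CPD comes from the slide; the remaining CPD values are illustrative.
print(joint(s=1, l=0, r=1, t=0, w=1))
# = P(S=1) * P(L=0|S=1) * P(R=1|S=1) * P(T=0|L=0) * P(W=1|L=0,R=1)
# = 0.5   * 0.4        * 0.7        * 0.95       * 0.2  = 0.0266
```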