

SLIDE 1

Inductive Bias: How to generalize on novel data

CS 478

SLIDE 2

Non-Linear Tasks

• Linear regression will not generalize well to the task below; it needs a non-linear surface
  – Could use one of our future models
• Could also do a feature pre-process, as with the quadric machine
  – For example, we could use an arbitrary polynomial in x
  – The model is still linear in the coefficients, so it can be solved with the delta rule
  – What order polynomial should we use? Overfit issues can occur


Y = β₀ + β₁X + β₂X² + … + βₙXⁿ
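As a concrete illustration (a sketch of mine, not code from the course), the polynomial pre-process can be solved with ordinary least squares, since the model stays linear in the coefficients β. The target function, noise level, and orders below are arbitrary choices:

```python
import numpy as np

# Noisy samples from a non-linear target function.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)

def poly_features(x, order):
    """Map each scalar x to the feature vector [1, x, x^2, ..., x^order]."""
    return np.vander(x, order + 1, increasing=True)

# Still linear in beta, so ordinary least squares (or the delta rule) solves it.
for order in (1, 3, 9):
    X = poly_features(x, order)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = np.mean((X @ beta - y) ** 2)
    print(f"order {order}: training MSE = {mse:.4f}")
# Training error always improves as the order grows -- the overfit risk the slide notes.
```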

SLIDE 3


Overfitting

Noise vs. Exceptions

SLIDE 4

Regression Regularization

• How to avoid overfit
  – Keep the model simple
  – For regression, keep the function smooth
• Assume sample points are drawn from f(x) with added noise
• Regularization approach: model (h) selection
  – Minimize F(h) = Error(h) + λ·Complexity(h)
  – Trade off accuracy vs. complexity
• Ridge regression (L2 regularization)
  – Minimize F(w) = TSS(w) + λ‖w‖² = Σᵢ(predictedᵢ − actualᵢ)² + λΣᵢwᵢ²
  – The gradient of F(w) gives a delta-rule update with weight decay:

    Δwᵢ = c(t − net)xᵢ − λwᵢ

  – Especially useful when the features are a non-linear transform of the initial features (e.g. polynomials in x)
  – Also useful when the number of initial features is greater than the number of examples
  – Lasso regression uses an L1 rather than an L2 weight penalty: F(w) = TSS(w) + λΣᵢ|wᵢ|. The decay is then just a constant λ (times the sign of wᵢ), since the derivative of |wᵢ| drops wᵢ from the term
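Here is a minimal sketch of ridge regression both ways (an illustration under assumed hyperparameter values, not the course's implementation): the closed-form minimizer of TSS(w) + λ‖w‖², and the per-pattern delta rule with the weight-decay update above.

```python
import numpy as np

def ridge_closed_form(X, t, lam=0.1):
    """Exact minimizer of F(w) = TSS(w) + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

def ridge_delta_rule(X, t, lam=0.001, c=0.01, epochs=500, seed=0):
    """Stochastic version: the delta rule plus weight decay,
    dw_i = c * (t - net) * x_i - lam * w_i, as in the update above."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x_n, t_n in zip(X, t):
            net = x_n @ w
            w += c * (t_n - net) * x_n - lam * w  # L2 decay shrinks every weight
            # Lasso (L1) would instead subtract the constant lam * np.sign(w).
    return w
```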

SLIDE 5

Hypothesis Space

• The hypothesis space H is the set of all possible models h which can be learned by the current learning algorithm
  – e.g. the set of possible weight settings for a perceptron
• Restricted hypothesis space
  – Can be easier to search
  – May avoid overfit since the hypotheses are usually simpler (e.g. linear or low-order decision surface)
  – Often will underfit
• Unrestricted hypothesis space
  – Can represent any possible function and thus can fit the training set well
  – Mechanisms must be used to avoid overfit


SLIDE 6


Avoiding Overfit - Regularization

• Regularization: any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error
• Occam’s Razor – William of Ockham (c. 1287–1347)
  – Favor the simplest explanation which fits the data
• Simplest accurate model: an accuracy vs. complexity trade-off. Find the h ∈ H which minimizes an objective function of the form F(h) = Error(h) + λ·Complexity(h)
  – Complexity could be the number of nodes, the size of a tree, the magnitude of the weights, the order of the decision surface, etc. L2 and L1 penalties are common.
• More training data (vs. overtraining on the same data)
  – Also data set augmentation – fake data can be very effective (e.g. jitter), but take care… (see the sketch after this list)
  – Denoising – adding random noise to inputs during training can act as a regularizer
  – Adding noise to nodes, weights, outputs, etc., e.g. Dropout (discussed with ensembles)
• Most common regularization approach: early stopping
  – Start with a simple model (small parameters/weights) and stop training as soon as we attain good generalization accuracy (before the parameters get large)
  – Use a validation set (next slide; requires a separate test set)
• Will discuss other approaches with specific models
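As a concrete example of the jitter idea above, here is a minimal augmentation sketch (mine; the function name, noise level, and copy count are arbitrary assumptions):

```python
import numpy as np

def jitter(X, y, copies=4, sigma=0.05, seed=0):
    """Data set augmentation by input noise ('jitter'): replicate each
    training instance with small Gaussian perturbations, keeping its label.
    sigma controls the noise level -- too much distorts the task (take care)."""
    rng = np.random.default_rng(seed)
    X_aug = np.vstack([X] + [X + rng.normal(scale=sigma, size=X.shape)
                             for _ in range(copies)])
    y_aug = np.concatenate([y] * (copies + 1))
    return X_aug, y_aug
```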

SLIDE 7


Stopping/Model Selection with Validation Set

• There is a different model h after each epoch
• Select a model in the area where the validation set accuracy flattens
  – e.g. when no improvement occurs over m epochs
• The validation set comes out of the training set data
• Still need a separate test set to use after selecting model h, to predict future accuracy
• Simple and unobtrusive; does not change the objective function, etc.
  – Can be done in parallel on a separate processor
  – Can be used alone or in conjunction with other regularizers

[Plot: SSE vs. epochs (a new h at each epoch); the training set curve keeps decreasing while the validation set curve flattens]
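A generic sketch of this stopping rule (the callables train_one_epoch and validation_error are hypothetical placeholders, not names from the course): train epoch by epoch, keep the best model seen on the validation set, and stop after m epochs without improvement.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              m=10, max_epochs=1000):
    """Keep the snapshot h with the best validation error, and stop once
    no improvement has been seen for m consecutive epochs."""
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_one_epoch(model)              # produces a new h
        err = validation_error(model)       # error on the held-out validation set
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= m:      # validation curve has flattened
                break
    return best_model, best_err
```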

SLIDE 8


Inductive Bias

• The approach used to decide how to generalize novel cases
• One common approach is Occam’s Razor – the simplest hypothesis which explains/fits the data is usually the best
• Many other rational biases and variations exist
• When you get the new input Ā B C, what is your output?

[Training examples: mappings from combinations of A, B, C and their complements to outputs Z or Z̄, followed by the novel query Ā B C ⇒ ?]

SLIDE 9


One Definition for Inductive Bias

Inductive Bias: any basis for choosing one generalization over another, other than strict consistency with the observed training instances

Sometimes this is just called the bias of the algorithm (don’t confuse it with the bias weight in a neural network). Related is the bias–variance trade-off, which we will discuss in more detail when we discuss ensembles.

SLIDE 10


Some Inductive Bias Approaches

• Restricted hypothesis space – can just try to minimize error, since the hypotheses are already simple
  – Linear or low-order threshold function
  – k-DNF, k-CNF, etc.
  – Low-order polynomial
• Preference bias – prefer one hypothesis over another even though they have similar training accuracy
  – Occam’s Razor – the “smallest” DNF representation which matches well
  – A shallow decision tree with high information gain
  – A neural network with low validation error and small magnitude weights

SLIDE 11


Need for Bias

There are 2^(2^n) possible Boolean functions of n inputs

[Table: truth table over x1, x2, x3 with the Class specified for only a few rows; every Boolean function that agrees with those rows remains a possible consistent function hypothesis]


SLIDE 14


Need for Bias

There are 2^(2^n) possible Boolean functions of n inputs

[Table: truth table over x1, x2, x3 in which several rows now have a specified Class; the possible consistent function hypotheses are all Boolean functions that agree on those rows, and a queried novel row is marked “?”]

Without an Inductive Bias we have no rationale to choose one hypothesis over another and thus a random guess would be as good as any other option.
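This claim can be checked by brute force. The sketch below (mine; the three labeled rows are invented for illustration) enumerates all 2^(2^3) = 256 Boolean functions of three inputs and keeps those consistent with the labeled rows:

```python
from itertools import product

n = 3
rows = list(product([0, 1], repeat=n))            # the 8 possible inputs
# Hypothetical training data: the Class is known for these rows only.
labeled = {(1, 1, 1): 1, (1, 1, 0): 0, (1, 0, 1): 1}

# A Boolean function of n inputs is one way to fill in all 2**n outputs,
# so there are 2**(2**n) = 256 possible functions in total.
hypotheses = [dict(zip(rows, outs)) for outs in product([0, 1], repeat=2 ** n)]
consistent = [h for h in hypotheses
              if all(h[r] == c for r, c in labeled.items())]
print(len(consistent))                             # 2**(8 - 3) = 32 remain

for r in rows:
    if r not in labeled:
        votes = sum(h[r] for h in consistent)
        print(r, f"{votes}/{len(consistent)} consistent hypotheses output 1")
# Every unlabeled row splits exactly 16/32: with consistency as the only
# criterion, predicting a novel case is a coin flip.
```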

SLIDE 15


Need for Bias

[Same truth table as on the previous slide]

Inductive bias guides which hypothesis we should prefer. What happens in this case if we use simplicity (Occam’s Razor) as our inductive bias?
SLIDE 16


Learnable Problems

• The “Raster Screen” problem
• Pattern theory
  – Regularity in a task
  – Compressibility
• Don’t-care features and impossible states
• Interesting/learnable problems
  – What we actually deal with
  – Can we formally characterize them?
• Learning a training set vs. generalizing
  – Consider a function where each output is set randomly (by a coin flip)
  – The output class is then independent of all other instances in the data set
• Computability vs. learnability (optional)

SLIDE 17

Computable and Learnable Functions

• Can represent any function with a look-up table (e.g. addition)
  – Finite function/table – fixed/capped input size
  – Infinite function/table – arbitrary finite input size
  – All finite functions are computable – why?
  – Infinite addition is computable because it has regularity which allows us to represent the infinite table with a finite representation/program
• Random function – outputs are set randomly
  – Can we compute these? Can we learn these? (See the experiment sketched below.)
• Assume learnability means we can do better than random when classifying novel examples
• Arbitrary functions – which are computable? Which are learnable?
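The random-function case can be tested empirically. Below is a small experiment (my sketch, not from the slides): a Boolean function on 10 inputs whose outputs are coin flips is memorized on half the input space, then queried on the other half with a nearest-neighbor similarity bias.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 10
X = np.array(list(product([0, 1], repeat=n)))   # all 1024 possible inputs
y = rng.integers(0, 2, size=len(X))             # every output set by a coin flip

# Memorize half of the table, then try to generalize to the other half.
idx = rng.permutation(len(X))
train, test = idx[:512], idx[512:]

def predict(x):
    """1-nearest-neighbor by Hamming distance: a strong similarity bias."""
    dists = np.abs(X[train] - x).sum(axis=1)
    return y[train[np.argmin(dists)]]

accuracy = np.mean([predict(X[i]) == y[i] for i in test])
print(f"novel-case accuracy: {accuracy:.2f}")   # hovers around 0.5
# The training set itself is fit perfectly, but with no regularity to
# exploit, no bias beats random guessing on novel examples.
```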



SLIDE 19

Computability and Learnability – Finite Problems

• Finite problems assume a finite number of mappings (a finite table)
  – Fixed input size arithmetic
  – Random memory in a RAM
• Learnable: can do better than random on novel examples


Finite problems: all are computable; the learnable ones are those with regularity


SLIDE 21

Computability and Learnability – Infinite Problems

• Infinite number of mappings (an infinite table)
  – Arbitrary input size arithmetic
  – The Halting Problem (no limit on input size)
  – Do two arbitrary strings match?


Infinite problems:
– Computable: only those where all but a finite set of mappings have regularity
– Learnable: those where a reasonably queried infinite subset has sufficient regularity to be represented with a finite model

SLIDE 22


No Free Lunch

• Any inductive bias chosen will have equal accuracy compared to any other bias over all possible functions/tasks, assuming all functions are equally likely. If a bias is correct on some cases, it must be incorrect on equally many cases.
• Is this a problem?
  – Random vs. regular functions
  – An anti-bias? (even though the task is regular)
  – The “interesting” problems – a subset of the learnable ones?
• Are all functions equally likely in the real world?

SLIDE 23


Interesting Problems and Biases

[Diagram: nested sets – All Problems ⊃ Problems with Regularity ⊃ Interesting Problems – with several ovals labeled “Inductive Bias”, each covering a different region of the interesting problems]

SLIDE 24


More on Inductive Bias

• Inductive bias requires some set of prior assumptions about the tasks being considered and the learning approaches available
• Tom Mitchell’s definition: the inductive bias of a learner is the set of additional assumptions sufficient to justify its inductive inferences as deductive inferences
• We consider standard ML algorithms/hypothesis spaces to be different inductive biases: C4.5 (greedily choose the best attributes), backpropagation (move from simple to complex hypotheses), etc.

SLIDE 25


Which Bias is Best?

• No one bias is best on all problems
• Our experiments
  – Over 50 real-world problems
  – Over 400 inductive biases – mostly variations on critical-variable biases vs. similarity biases
• Different biases were a better fit for different problems
• Given a data set, which learning model (inductive bias) should be chosen?

SLIDE 26


Automatic Discovery of Inductive Bias

• Defining and characterizing the set of interesting/learnable problems
• To what extent do current biases cover the set of interesting problems?
• Automatic feature selection
• Automatic selection of bias (before and/or during learning), including all learning parameters
• Dynamic inductive biases (in time and space)
• Combinations of biases – ensembles, oracle learning

SLIDE 27


Dynamic Inductive Bias in Time

• Can be discovered as you learn
• May want to learn general rules first, followed by true exceptions
• Can be based on the ease of learning the problem
• Example: SoftProp – from lazy learning to backprop

SLIDE 28


Dynamic Inductive Bias in Space

SLIDE 29


ML Holy Grail: We want all aspects of the learning mechanism automated, including the Inductive Bias

[Diagram: an Automated Learner takes as input features just a data set (or just an explanation of the problem) and outputs a hypothesis]

SLIDE 30


BYU Neural Network and Machine Learning Laboratory Work on Automatic Discovery of Inductive Bias

• Proposing new learning algorithms (inductive biases)
• Theoretical issues
  – Defining the set of interesting/learnable problems
  – Analytical/empirical studies of differences between biases
• Ensembles – wagging, mimicking, oracle learning, etc.
• Meta-learning – an a priori decision regarding which learning model to use
  – Features of the data set/application
  – Learning from model experience
• Automatic selection of parameters
  – Constructive algorithms – ASOCS, DMPx, etc.
  – Learning parameters – windowed momentum, automatically improved distance functions (IVDM)
• Automatic bias in time – SoftProp
• Automatic bias in space – overfitting, sensitivity to complex portions of the space: DMP, higher-order features

SLIDE 31

Your Project Proposals

• See the description in Learning Suite
  – Remember your example instance!
• Examples – browse the Irvine data sets to get a feel for what data sets look like
  – Stick with supervised classification data sets for the most part
• Choose tasks which interest you
• Too hard vs. too easy
  – Data can be gathered in a relatively short time
  – We want you to have to battle with the data/features a bit


SLIDE 32


Feature Selection, Preparation, and Reduction

• Learning accuracy depends on the data!
  – Is the data representative of future novel cases? (critical)
  – Relevance
  – Amount
  – Quality
    • Noise, missing data, skew
  – Proper representation
  – How much of the data is labeled (has the output target) vs. unlabeled?
  – Is the number of features/dimensions reasonable?
• Reduction

SLIDE 33

Gathering Data

• Consider the task – what kinds of features could help?
• Data availability
  – Significant diversity in the cost of gathering different features
  – The more the better (in terms of number of instances, not necessarily in terms of number of dimensions/features)
• The more features you have, the more data you need
  – Data augmentation and jitter – increased data can help with overfit, but handle with care!
• Labeled data is best
• If the data is not labeled
  – Could set up studies/experts to obtain labeled data
  – Use unsupervised and semi-supervised techniques
    • Clustering
    • Active learning, bootstrapping, oracle learning, etc.
