CSCE 478/878 Lecture 2: Supervised Learning - PowerPoint PPT Presentation



SLIDE 1


CSCE 478/878 Lecture 2: Supervised Learning

Stephen Scott

(Adapted from Ethem Alpaydin)

sscott@cse.unl.edu


SLIDE 2


Introduction

Supervised learning is the most fundamental, “classic” form of machine learning

The “supervised” part comes from the presence of labels for the examples (instances)


SLIDE 3


Outline

Learning a class from labeled examples
  Definitions
  Thinking about C
  Hypotheses and error
  Margin

Noise and other problems
  Noise
  Model selection
  Inductive bias

Regression

Multi-class problems

General steps of machine learning


SLIDE 4


Learning a Class from Examples

Let C be the target concept to be learned

Think of C as a function that takes as input an example (or instance) and outputs a label

Goal: Given a training set X = {(x^t, r^t)}, t = 1, …, N, where r^t = C(x^t), output a hypothesis h ∈ H that approximates C in its classifications of new instances

Each instance x is represented as a vector of attributes or features

E.g., let each x = (x1, x2) be a vector describing attributes of a car; x1 = price and x2 = engine power

In this example, the label is binary (positive/negative, yes/no, 1/0, +1/−1), indicating whether instance x is a “family car”
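To make this setup concrete, here is a minimal sketch in Python (the attribute values and box bounds are invented for illustration, not taken from the slides):

```python
# Each instance is a feature vector (x1 = price, x2 = engine power); each label r
# is 1 ("family car") or 0 ("not family car"). All values are illustrative only.
X = [
    ((25_000, 150), 1),
    ((90_000, 400), 0),
    ((30_000, 180), 1),
    ((12_000,  70), 0),
]

def h(x, p1=20_000, p2=40_000, e1=100, e2=250):
    """An axis-parallel-box hypothesis: predict 1 iff x lies inside the box."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

print([h(x) for x, r in X])   # this h happens to classify all four examples correctly
```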


SLIDE 5


Learning a Class from Examples (cont’d)

[Figure: training instances plotted with x1 = price on the horizontal axis and x2 = engine power on the vertical axis; a single training instance x^t has coordinates (x1^t, x2^t)]

SLIDE 6


Thinking about C

Can think of target concept C as a function

In the example, C is an axis-parallel box, equivalent to upper and lower bounds on each attribute

Might decide to set H (the set of candidate hypotheses) to the same family that C comes from; not required to do so

Can also think of target concept C as a set of positive instances

In the example, C is the continuous set of all positive points in the plane

Use whichever is convenient at the time


SLIDE 7


Thinking about C (cont’d)

[Figure: the target concept C drawn as an axis-parallel rectangle in the plane, with x1 = price on the horizontal axis (bounded by p1 and p2) and x2 = engine power on the vertical axis (bounded by e1 and e2)]

SLIDE 8


Hypotheses and Error

A learning algorithm uses the training set X and finds a hypothesis h ∈ H that approximates C

In the example, H can be the set of all axis-parallel boxes

If C is guaranteed to come from H, then we know that a perfect hypothesis exists

In this case, we choose h from the version space = the subset of H consistent with X

What learning algorithm can you think of to learn C? (One natural answer is sketched below)

Can think of two types of error (or loss) of h

Empirical error is the fraction of X that h gets wrong

Generalization error is the probability that a new, randomly selected instance is misclassified by h

Depends on the probability distribution over instances

Can further classify error as false positive and false negative
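One natural learner for the box example (a minimal sketch, reusing X from the earlier sketch; this "tightest box around the positives" rule is one possible answer to the question above, not the only one):

```python
def tightest_box(X):
    """Smallest axis-parallel box (p1, p2, e1, e2) containing all positive instances."""
    positives = [x for x, r in X if r == 1]
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return min(prices), max(prices), min(powers), max(powers)

def predict(box, x):
    p1, p2, e1, e2 = box
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

def empirical_error(box, X):
    """Fraction of the training set that the hypothesis gets wrong."""
    return sum(predict(box, x) != r for x, r in X) / len(X)

box = tightest_box(X)                 # X as defined in the earlier sketch
print(box, empirical_error(box, X))   # a consistent hypothesis has empirical error 0.0
```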


SLIDE 9


Hypotheses and Error (cont’d)

[Figure: a hypothesis h and the target concept C plotted in the (x1 = price, x2 = engine power) plane]

SLIDE 10


Margin

Since we will have many (perhaps infinitely many) choices of h, we will often choose one with maximum margin (the minimum distance from h's boundary to any point in X)

[Figure: a maximum-margin hypothesis in the (x1 = price, x2 = engine power) plane]

Why?
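One way to make the margin concrete for box hypotheses (a sketch; the distance-to-boundary computation is my own illustration and assumes the box and X from the earlier sketches):

```python
import math

def distance_to_boundary(box, x):
    """Distance from point x to the boundary of the axis-parallel box (p1, p2, e1, e2)."""
    p1, p2, e1, e2 = box
    px, py = x
    if p1 <= px <= p2 and e1 <= py <= e2:
        # Inside the box: distance to the nearest side.
        return min(px - p1, p2 - px, py - e1, e2 - py)
    # Outside the box: Euclidean distance to the closest point of the box.
    dx = max(p1 - px, 0, px - p2)
    dy = max(e1 - py, 0, py - e2)
    return math.hypot(dx, dy)

def margin(box, X):
    """Minimum distance from any training point to the hypothesis boundary."""
    return min(distance_to_boundary(box, x) for x, _ in X)

# e.g., margin(tightest_box(X), X), with X and tightest_box from the earlier sketches
```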


SLIDE 11


Noise and Other Problems

In reality, it’s unlikely that there exists an h ∈ H that is perfect on X

Could be noise in the data (attribute errors, labeling errors)

Could be attributes that are hidden or latent, which impact the label but are unobserved

Could find a better (or even perfect) fit to X if we choose a more powerful (expressive) hypothesis class H

Is this a good idea?


SLIDE 12


Noise and Other Problems (cont’d)

[Figure: a data set in the (x1, x2) plane with two candidate hypotheses, h1 and h2]

For what reasons might we prefer h1 over h2?


SLIDE 13


Model Selection

Might prefer simpler hypothesis because it is:

Easier/more efficient to evaluate

Easier to train (fewer parameters)

Easier to describe/justify its predictions

Better fits Occam's Razor: tend to prefer the simpler explanation among similar ones

Model selection is the act of choosing a hypothesis class H

Need to balance H’s complexity with that of the model that labels the data:

If H is not sophisticated enough, might underfit and not generalize well (e.g., fit a line to data from a cubic model)

If H is too sophisticated, might overfit and not generalize well (e.g., fit the noise)

Can validate choice of h (and H) if some data held back from X to serve as validation set

Still part of the training data, but not used directly to fit h

Independent test set often used to do final evaluation of chosen h
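A minimal sketch of this workflow, using polynomial degree as a stand-in for the complexity of H (the data, split sizes, and candidate degrees are invented for illustration; numpy's polyfit is used as the fitting routine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 30)
y = x**3 - 2 * x + rng.normal(scale=0.5, size=x.size)    # noisy data from a cubic model

# Hold some data back from X as a validation set, and keep an independent test set.
idx = rng.permutation(x.size)
train, val, test = idx[:18], idx[18:24], idx[24:]

def mean_squared_error(w, xs, ys):
    return float(np.mean((np.polyval(w, xs) - ys) ** 2))

# Model selection: fit each candidate class on the training split,
# then choose the degree with the lowest validation error.
candidates = {d: np.polyfit(x[train], y[train], d) for d in (1, 2, 3, 6)}
best = min(candidates, key=lambda d: mean_squared_error(candidates[d], x[val], y[val]))

print("chosen degree:", best)
print("test error:", mean_squared_error(candidates[best], x[test], y[test]))
```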


SLIDE 14


Inductive Bias

Must assume something about the learning task

Otherwise, learning becomes rote memorization

Imagine allowing H to be the set of arbitrary functions over the set of all possible instances

Every hypothesis in the version space V ⊆ H is consistent with all instances in X

For every other instance, exactly half the hypotheses in V will predict positive, the rest negative (see next slide)

⇒ No way to generalize on new, unseen instances without a way to favor one hypothesis over another

Inductive bias is a set of assumptions that we make to enable generalization over rote memorization

Manifests in the choice of H

Instead (or in addition), can have a bias in the form of a preference for some hypotheses over others (e.g., based on specificity or simplicity)


SLIDE 15


Inductive Bias (cont’d)

E.g., if X = {(0, 0, 0, +), (1, 1, 0, +), (0, 1, 0, −), (1, 0, 1, −)}, then the version space V is the set of truth tables satisfying

x1 x2 x3 | label
 0  0  0 |   +
 0  0  1 |   ?
 0  1  0 |   −
 0  1  1 |   ?
 1  0  0 |   ?
 1  0  1 |   −
 1  1  0 |   +
 1  1  1 |   ?

Since there are 4 holes (the “?” rows), |V| = 2^4 = 16 = the number of ways to fill the holes, and for any yet-unclassified example x, exactly half of the hypotheses in V classify x as + and half as −
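A quick way to check this counting argument (a minimal sketch that enumerates the version space for the slide's example, encoding + as 1 and − as 0):

```python
from itertools import product

# Training set from the slide: three binary attributes, binary label (+ -> 1, - -> 0).
X = {(0, 0, 0): 1, (1, 1, 0): 1, (0, 1, 0): 0, (1, 0, 1): 0}

all_instances = list(product([0, 1], repeat=3))
holes = [x for x in all_instances if x not in X]          # the 4 unlabeled rows

# A hypothesis here is a complete truth table that agrees with X on the labeled rows;
# the version space V is obtained by filling the holes in every possible way.
version_space = []
for filling in product([0, 1], repeat=len(holes)):
    h = dict(X)
    h.update(zip(holes, filling))
    version_space.append(h)

print(len(version_space))                                 # 2**4 = 16
unseen = (1, 0, 0)                                        # any yet-unclassified instance
print(sum(h[unseen] for h in version_space))              # 8: exactly half predict +
```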


SLIDE 16


Regression

When labels f(x) are real-valued rather than discrete, we call it regression

Error of a hypothesis g is measured by squared error instead of number of misclassifications: (f(x) − g(x))^2

Empirical error is now average squared error and generalization performance is expected squared error

Model selection now consists of choosing the complexity of hypothesis g, e.g., degree of polynomial:

Linear: g(x) = w_1 x + w_0

Quadratic: g(x) = w_2 x^2 + w_1 x + w_0

And so on, where higher-order polynomials can better fit data based on more complex models, but are also more inclined to overfit

Learning consists of inferring the parameters w_i
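A minimal sketch of these definitions for a linear hypothesis (data points and parameter values are invented for illustration):

```python
# Empirical error of a linear hypothesis g(x) = w1*x + w0: the average squared
# difference between the observed label and the prediction over the training data.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]   # (x, f(x)) pairs
w1, w0 = 2.0, 0.0                                          # candidate parameters

def g(x):
    return w1 * x + w0

empirical_error = sum((y - g(x)) ** 2 for x, y in data) / len(data)
print(empirical_error)
```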


SLIDE 17


Regression (cont’d)

[Figure: polynomials of degree 1, 2, and 6 fit to data with x = mileage and y = price]


SLIDE 18


Multi-Class Problems

Some classification problems have discrete-valued labels, but not binary

E.g., instead of “family car” versus “not family car”, have labels {“family car”, “luxury sedan”, “sports car”}

How we handle this depends on the type of hypothesis/learning algorithm we use

Some hypothesis classes (e.g., decision trees, k nearest neighbor) naturally have the ability to classify with non-binary labels

Some are binary only (e.g., artificial neural networks, support vector machines, axis-parallel boxes)

In this case, can cast the multi-class problem as a collection of binary problems

In a K-class problem, can give each instance a vector of K binary labels


SLIDE 19


Multi-Class Problems (cont’d)

E.g., if the original training set is Y = {(x^t, s^t)}, t = 1, …, N, where each s^t ∈ {C_1, …, C_K}, then map it to X = {(x^t, r^t)}, t = 1, …, N, where each r^t is a K-dimensional binary vector with

r^t_i = 1 if x^t ∈ C_i, and r^t_i = 0 if x^t ∈ C_j for j ≠ i

Can then train K separate binary classifiers in a one-versus-rest scheme

(Other encodings of r are also possible)
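A minimal sketch of this label mapping (class names, instances, and the trainer call are placeholders, not an actual library API):

```python
classes = ["family car", "luxury sedan", "sports car"]            # C_1, ..., C_K
Y = [((25_000, 150), "family car"),
     ((90_000, 300), "luxury sedan"),
     ((60_000, 400), "sports car")]                               # (x^t, s^t) pairs

# Map each label s^t to a K-dimensional binary vector r^t with r^t_i = 1 iff s^t = C_i.
X = [(x, [int(s == c) for c in classes]) for x, s in Y]
print(X[0])   # ((25000, 150), [1, 0, 0])

# One-versus-rest: the i-th binary classifier is trained on the labels r^t_i.
for i, c in enumerate(classes):
    binary_labels = [r[i] for _, r in X]
    # train_binary_classifier([x for x, _ in X], binary_labels)   # hypothetical trainer
```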


SLIDE 20


Multi-Class Problems (cont’d)

[Figure: the (price, engine power) plane with the classes “family car”, “sports car”, and “luxury sedan”; some regions are marked “?”]

Three axis-parallel boxes as three binary classifiers, one per class


SLIDE 21


General Steps of Machine Learning

Acquire training set X = {(x^t, r^t)}, t = 1, …, N

Assume examples are independent and identically distributed (iid)

Assume the probability distribution on X is the same as what we will see in practice

Labels r^t could be binary, multi-valued, or real-valued

Choose hypothesis class H

Choose loss function L

0-1 loss versus hinge loss versus squared loss, ... (see the sketch after this list)

Choose optimization procedure to find h

E.g., analytic solution for linear regression, backpropagation for artificial neural network, sequential minimal optimization for SVM

Evaluate quality of h via estimation of generalization performance using independent test set
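A minimal sketch of the three loss functions named above (conventions assumed here: labels r in {−1, +1} for 0-1 and hinge loss, and y = g(x) is the hypothesis's real-valued output):

```python
def zero_one_loss(r, y):
    """1 if the sign of the prediction disagrees with the label r in {-1, +1}, else 0."""
    return int(r * y <= 0)

def hinge_loss(r, y):
    """max(0, 1 - r*y): zero only when the prediction has the correct sign and margin >= 1."""
    return max(0.0, 1.0 - r * y)

def squared_loss(r, y):
    """(r - y)^2, the loss typically used for regression."""
    return (r - y) ** 2

print(zero_one_loss(+1, -0.3), hinge_loss(+1, 0.4), squared_loss(2.0, 1.5))
```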
