
CSCE 478/878 Lecture 2: Supervised Learning



  1. CSCE 478/878 Lecture 2: Supervised Learning. Stephen Scott (adapted from Ethem Alpaydin), sscott@cse.unl.edu.

  2. Introduction. Supervised learning is the most fundamental, “classic” form of machine learning. The “supervised” part comes from the presence of labels for the examples (instances).

  3. Outline. Learning a class from labeled examples (definition; thinking about C; hypotheses and error; margin). Noise and other problems (noise; model selection; inductive bias). Regression. Multi-class problems. General steps of machine learning.

  4. Learning a Class from Examples. Let C be the target concept to be learned. Think of C as a function that takes as input an example (or instance) and outputs a label. Goal: given a training set X = {(x^t, r^t)}_{t=1}^N, where r^t = C(x^t), output a hypothesis h ∈ H that approximates C in its classifications of new instances. Each instance x is represented as a vector of attributes or features; e.g., let each x = (x_1, x_2) be a vector describing attributes of a car, with x_1 = price and x_2 = engine power. In this example the label is binary (positive/negative, yes/no, 1/0, +1/−1), indicating whether instance x is a “family car”.
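To make the training-set notation concrete, here is a minimal Python sketch of X = {(x^t, r^t)}; the prices, engine powers, and labels are invented values used only for illustration.

    # Minimal sketch of a training set X = {(x^t, r^t)}_{t=1}^N for the
    # family-car example. Each instance x^t is a (price, engine_power)
    # pair; r^t is its binary label (+1 = family car, -1 = not).
    # All numbers below are made up.
    X = [
        ((18_000, 120), +1),   # x^1, r^1
        ((22_000, 150), +1),   # x^2, r^2
        ((55_000, 300), -1),   # x^3, r^3
        (( 9_000,  60), -1),   # x^4, r^4
    ]
    N = len(X)                 # number of training examples (here, 4)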

  5. Learning a Class from Examples (cont’d). [Figure: training examples x^t plotted in the plane, with x_1 (price) on the horizontal axis and x_2 (engine power) on the vertical axis.]

  6. Thinking about C. Can think of the target concept C as a function. In the example, C is an axis-parallel box, equivalent to upper and lower bounds on each attribute. Might decide to set H (the set of candidate hypotheses) to the same family that C comes from, but this is not required. Can also think of the target concept C as a set of positive instances; in the example, C is the continuous set of all positive points in the plane. Use whichever view is convenient at the time.
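As a rough illustration of the two views, the sketch below encodes a hypothetical family-car concept as a function defined by an axis-parallel box, and notes the equivalent set-of-positives view; the bound values P1, P2, E1, E2 are assumptions made for the example.

    # Sketch of C as an axis-parallel box: lower/upper bounds on each
    # attribute. The bounds below are hypothetical.
    P1, P2 = 12_000, 30_000    # price bounds
    E1, E2 = 90, 200           # engine-power bounds

    def C(x):
        """View 1: C as a function from an instance to a binary label."""
        price, power = x
        return +1 if (P1 <= price <= P2 and E1 <= power <= E2) else -1

    # View 2: C as the set of positive instances, i.e. the continuous region
    # {(price, power) : P1 <= price <= P2 and E1 <= power <= E2}.
    # The two views agree on any instance we test:
    print(C((18_000, 120)))    # +1 (inside the box)
    print(C((55_000, 300)))    # -1 (outside the box)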

  7. Thinking about C (cont’d). [Figure: the concept C drawn as an axis-parallel rectangle in the (x_1 = price, x_2 = engine power) plane, bounded by p_1 and p_2 on price and by e_1 and e_2 on engine power.]

  8. Hypotheses and Error. A learning algorithm uses the training set X and finds a hypothesis h ∈ H that approximates C. In the example, H can be the set of all axis-parallel boxes. If C is guaranteed to come from H, then we know that a perfect hypothesis exists; in this case we choose h from the version space = the subset of H consistent with X. What learning algorithm can you think of to learn C? Can think of two types of error (or loss) of h: empirical error is the fraction of X that h gets wrong; generalization error is the probability that a new, randomly selected instance is misclassified by h (this depends on the probability distribution over instances). Can further classify errors as false positives and false negatives.
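One natural answer to the question on this slide is the most specific consistent hypothesis: the tightest axis-parallel box that contains all positive training examples. The sketch below (with invented data) implements that learner and computes its empirical error on X; it is offered as one plausible algorithm under these assumptions, not as the lecture's prescribed one.

    # Sketch of one possible learner for the box example: output the
    # tightest axis-parallel box around the positive training examples,
    # then measure its empirical error on X. Data are invented.
    X = [((18_000, 120), +1), ((22_000, 150), +1), ((25_000, 110), +1),
         ((55_000, 300), -1), (( 9_000,  60), -1)]

    pos = [x for x, r in X if r == +1]
    p1, p2 = min(p for p, _ in pos), max(p for p, _ in pos)   # price bounds
    e1, e2 = min(e for _, e in pos), max(e for _, e in pos)   # power bounds

    def h(x):
        price, power = x
        return +1 if (p1 <= price <= p2 and e1 <= power <= e2) else -1

    # Empirical error: fraction of the training set that h gets wrong.
    empirical_error = sum(h(x) != r for x, r in X) / len(X)
    print(empirical_error)     # 0.0 here: h is consistent with this X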

  9. Hypotheses and Error (cont’d). [Figure: a hypothesis plotted in the x_2 (engine power) vs. x_1 (price) plane.]

  10. Margin. Since we will have many (infinitely many?) choices of h, we will often choose one with maximum margin (minimum distance to any point in X). [Figure: a box hypothesis and its margin in the x_2 (engine power) vs. x_1 (price) plane.] Why?
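As one way to see what the margin measures here, the sketch below computes the margin of an axis-parallel box hypothesis as the minimum distance from any training point to the box's boundary; the geometry and the data are assumptions made purely for illustration.

    # Sketch: margin of a box hypothesis = min distance from any training
    # point to the box boundary. Box and points are made up.
    import math

    def distance_to_boundary(x, box):
        """Distance from point x = (x1, x2) to the boundary of box = (a1, b1, a2, b2)."""
        (x1, x2), (a1, b1, a2, b2) = x, box
        if a1 <= x1 <= b1 and a2 <= x2 <= b2:        # inside: nearest face
            return min(x1 - a1, b1 - x1, x2 - a2, b2 - x2)
        dx = max(a1 - x1, 0, x1 - b1)                # outside: clamp to box
        dy = max(a2 - x2, 0, x2 - b2)
        return math.hypot(dx, dy)

    def margin(box, X):
        return min(distance_to_boundary(x, box) for x, _ in X)

    X = [((3, 3), +1), ((4, 5), +1), ((9, 9), -1), ((1, 1), -1)]
    print(margin((2, 6, 2, 6), X))    # margin of the box [2,6] x [2,6]: 1.0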

  11. Noise and Other Problems. In reality, it is unlikely that there exists an h ∈ H that is perfect on X. Could be noise in the data (attribute errors, labeling errors). Could be attributes that are hidden or latent, which impact the label but are unobserved. Could find a better (or even perfect) fit to X if we choose a more powerful (expressive) hypothesis class H. Is this a good idea?

  12. Noise and Other Problems (cont’d). [Figure: two candidate hypotheses, h_1 and h_2, fit to the same training data in the (x_1, x_2) plane.] For what reasons might we prefer h_1 over h_2?

  13. Model Selection. Might prefer a simpler hypothesis because it is: easier/more efficient to evaluate; easier to train (fewer parameters); easier to describe/justify a prediction; better fits Occam’s Razor (tend to prefer the simpler explanation among similar ones). Model selection is the act of choosing a hypothesis class H. Need to balance H’s complexity with that of the model that labels the data: if H is not sophisticated enough, we might underfit and not generalize well (e.g., fit a line to data from a cubic model); if H is too sophisticated, we might overfit and not generalize well (e.g., fit the noise). Can validate the choice of h (and H) if some data are held back from X to serve as a validation set; the validation set is still part of training, but is not directly used to select h. An independent test set is often used to do the final evaluation of the chosen h.
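A minimal sketch of validation-based model selection, using the slide's line-vs-cubic example: the candidate classes are polynomials of increasing degree, data come from a noisy cubic, each candidate is fit on the training portion, and the degree with the lowest validation error is kept (an independent test set would then give the final evaluation). The data, the NumPy-based fitting, and all constants are assumptions made for illustration.

    # Sketch of model selection with a held-out validation set.
    # H_d = polynomials of degree d; the "true" model is a noisy cubic.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 60)
    y = 2.0 * x**3 - x + 0.1 * rng.normal(size=x.size)   # cubic + noise

    x_train, y_train = x[:40], y[:40]    # used to fit each candidate h
    x_val,   y_val   = x[40:], y[40:]    # held back to compare choices of H

    def validation_error(degree):
        coeffs = np.polyfit(x_train, y_train, degree)    # train h in H_degree
        return float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

    errors = {d: validation_error(d) for d in range(1, 10)}
    best_degree = min(errors, key=errors.get)
    # Low degrees underfit, high degrees overfit; a degree near 3 usually wins.
    print(best_degree, errors[best_degree])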

  14. Inductive Bias. Must assume something about the learning task; otherwise, learning becomes rote memorization. Imagine allowing H to be the set of arbitrary functions over the set of all possible instances. Every hypothesis in the version space V ⊆ H is consistent with all instances in X, but for every other instance, exactly half the hypotheses in V will predict positive and the rest negative (see next slide). ⇒ No way to generalize on new, unseen instances without a way to favor one hypothesis over another. Inductive bias is a set of assumptions that we make to enable generalization over rote memorization. It manifests in the choice of H. Instead (or in addition), can have a bias in the preference of some hypotheses over others (e.g., based on specificity or simplicity).

  15. Inductive Bias (cont’d). E.g., if X = {(⟨0,0,0⟩, +), (⟨1,1,0⟩, +), (⟨0,1,0⟩, −), (⟨1,0,1⟩, −)}, then the version space V is the set of truth tables satisfying:
      000: +    001: ?    010: −    011: ?
      100: ?    101: −    110: +    111: ?
Since there are 4 holes, |V| = 2^4 = 16 = the number of ways to fill the holes, and for any yet-unclassified example x, exactly half of the hypotheses in V classify x as + and half as −.
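The counting argument on this slide can be checked by brute force. In the sketch below, H is the set of all 2^8 = 256 truth tables on three Boolean attributes, V is the subset consistent with the four training examples, and the final line confirms that an unseen instance (here ⟨1,1,1⟩, chosen arbitrarily) is labeled + by exactly half of V.

    # Brute-force check of the version-space argument from the slide.
    from itertools import product

    X = {(0, 0, 0): '+', (1, 1, 0): '+', (0, 1, 0): '-', (1, 0, 1): '-'}
    inputs = list(product([0, 1], repeat=3))     # the 8 possible instances

    # A hypothesis is a full truth table: one label per possible instance.
    H = product('+-', repeat=8)                  # |H| = 2^8 = 256
    V = [h for h in H
         if all(h[inputs.index(x)] == r for x, r in X.items())]
    print(len(V))                                # 16 = 2^4 (four "holes")

    # For an instance not in X, exactly half of V predicts '+':
    x_new = (1, 1, 1)
    print(sum(h[inputs.index(x_new)] == '+' for h in V))   # 8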
