
Learning from Examples: Supervised Learning
Steven J Zeil, Old Dominion Univ., Fall 2010



Outline
1 Learning from Examples: Motivation, Supervised Learning, Aspects of Supervised Learning
2 Classification: Hypotheses, Noise, Multiple Classes
3 Regression
4 Model Selection & Generalization
5 Recap

Motivation
We have a collection of data, some instances of which are known to be examples of an interesting class. We wish to learn a rule by which we can determine membership in that class.

Example: Spam Detection
Training set: a collection of email messages. Humans have flagged some as spam.
Knowledge extraction: we scan the messages for "typical" spam vocabulary (names of drugs, "mortgage", Nigeria, ...). We also scan for irregularities in the formal content: obfuscated URLs, routing via open SMTP relays, mismatches between the "From:" domain and the IP address of origin, etc.
Input representation: a vector $\vec{x}$ with
- $x_i$, $0 \le i < k$: count of the number of occurrences of spam term $w_i$
- $x_i$, $k \le i < n$: 1 if irregularity $z_{i-k}$ is present, 0 otherwise
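A minimal sketch of this input representation in Python; the term list, the two irregularity checks, and the name `featurize` are illustrative assumptions, not part of the slides:

```python
import re

# Hypothetical spam vocabulary (w_0 .. w_{k-1}) and irregularity checks
# (z_0 .. z_{n-k-1}); the slides do not give concrete lists.
SPAM_TERMS = ["viagra", "mortgage", "nigeria"]
IRREGULARITY_CHECKS = [
    # z_0: URL given as a raw IP address (one form of obfuscated URL)
    lambda msg: bool(re.search(r"https?://\d{1,3}(\.\d{1,3}){3}", msg)),
    # z_1: placeholder for a routing/header check (e.g., an open-relay marker)
    lambda msg: "x-open-relay" in msg.lower(),
]

def featurize(message: str) -> list:
    """Map an email to the feature vector x from the slide:
    k spam-term counts followed by n-k binary irregularity flags."""
    words = message.lower().split()
    counts = [words.count(term) for term in SPAM_TERMS]             # x_i, 0 <= i < k
    flags = [int(check(message)) for check in IRREGULARITY_CHECKS]  # x_i, k <= i < n
    return counts + flags
```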

Example: "Family" Cars
Is car $\vec{x}$ a "family car"?
Knowledge extraction: what features define a "family" car?
Training set: positive (+) and negative ($-$) examples.
Input representation: $x_1$: price, $x_2$: engine power.

Aspects of Supervised Learning
We have a sample $X = \{\vec{x}^t, r^t\}_{t=1}^N$ that is independent and identically distributed:
- ordering is not important
- all instances are drawn from the same joint distribution $p(\vec{x}, r)$
A model $g(\vec{x} \mid \theta)$, where $\theta$ are the parameters.
A procedure to find the "best" parameter values $\theta^*$.

Training Set X: Family Cars
$X = \{\vec{x}^t, r^t\}$
Input: $\vec{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
Label: $r = 1$ if $\vec{x}$ is a family car, $0$ otherwise.

Hypothesis Class
We have a hypothesis class $H$ parameterized by $(p_1, p_2, e_1, e_2)$:
$(p_1 \le \text{price} \le p_2) \wedge (e_1 \le \text{engine power} \le e_2)$
We want to "learn" a hypothesis $h \in H$ to approximate the true class.
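A sketch of one hypothesis from this class; the predicate follows the slide, while the function itself and the example values are my illustration:

```python
def h(x, p1, p2, e1, e2):
    """Rectangle hypothesis from the slide: predict 1 (family car)
    iff p1 <= price <= p2 and e1 <= engine_power <= e2."""
    price, engine_power = x
    return int(p1 <= price <= p2 and e1 <= engine_power <= e2)

# e.g. h((23000, 150), p1=18000, p2=32000, e1=110, e2=190) returns 1
```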

Empirical Error
Given a hypothesis $h$ that returns 1 for a positive prediction and 0 for a negative prediction, the error of the hypothesis on training set $X$ is
$E(h \mid X) = \sum_{t=1}^{N} 1\big(h(\vec{x}^t) \ne r^t\big)$

Version Space
In this case, there are an infinite number of hypotheses (the version space) for which $E(h \mid X) = 0$.
Suppose that the blue area is the "true" class. If we choose the yellow region as our $h$, the pure yellow bars represent false positives and the pure blue bars represent false negatives.

Doubt
The most specific hypothesis $S$ is the tightest rectangle enclosing the positive examples: prone to false negatives.
The most general hypothesis $G$ is the largest rectangle enclosing the positive examples but containing no negative examples: prone to false positives.
Perhaps we should choose something in between? $S \subseteq G$, and $G - S$ is the region of doubt.

Model Complexity
What if not every point can be accounted for by $H$? Use a more complicated hypothesis class, or stay with the simpler one and tolerate a non-zero error? This depends partly on whether we believe this is noise or a failure of modeling.
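The empirical-error formula above transcribes directly into code; a sketch, with the function name my own:

```python
def empirical_error(h, X, r):
    """E(h|X) = sum over t of 1(h(x^t) != r^t): the number of
    training examples that the hypothesis h misclassifies."""
    return sum(1 for x_t, r_t in zip(X, r) if h(x_t) != r_t)
```

With the rectangle class, the argument would be a closure such as `lambda x: h(x, p1, p2, e1, e2)`; a hypothesis in the version space is exactly one for which this function returns 0.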

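Finding $S$ is a single pass over the positive examples, which makes a useful contrast with $G$; a sketch for the two-feature car setup (the helper name is mine, and $G$ is omitted because the largest consistent rectangle requires a search over the negatives):

```python
def most_specific_hypothesis(X, r):
    """S: the tightest rectangle (p1, p2, e1, e2) enclosing the
    positive examples; points outside it are predicted negative."""
    positives = [x for x, label in zip(X, r) if label == 1]
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return min(prices), max(prices), min(powers), max(powers)
```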
Coping with Noise
Arguments in favor of staying with the simpler model:
- easier/faster to work with
- easier to train (fewer examples needed)
- easier to explain
- may generalize better (Occam's razor)

Learning Multiple Classes
The training set is $X = \{\vec{x}^t, \vec{r}^t\}_{t=1}^N$ where
$r_i^t = 1$ if $\vec{x}^t \in C_i$, and $r_i^t = 0$ if $\vec{x}^t \in C_j$, $j \ne i$.
Notice that $\vec{r}^t$ partitions the input space (even if the $\{C_i\}$ do not).

Multiple Class Hypotheses
Train hypotheses $h_i(\vec{x})$, $i = 1, \dots, k$:
$h_i(\vec{x}^t) = 1$ if $\vec{x}^t \in C_i$, and $h_i(\vec{x}^t) = 0$ if $\vec{x}^t \in C_j$, $j \ne i$.
We train $k$ hypotheses to recognize $k$ classes, each class against the world.

Regression
$X = \{\vec{x}^t, r^t\}$, with $r^t \in \mathbb{R}$ and $r^t = f(x^t) + \epsilon$
$E(g \mid X) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - g(x^t)\big]^2$
For example, $g(x) = w_1 x + w_0$ or $g(x) = w_2 x^2 + w_1 x + w_0$.
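A sketch of the "each class against the world" (one-vs-rest) scheme; the trainer interface here is my assumption, with any binary learner standing in for `train_binary`:

```python
def train_one_vs_rest(X, R, train_binary):
    """Train k hypotheses h_i, i = 1..k, where h_i is trained to output
    1 on examples of class C_i and 0 on every other class
    (each class against the world).  R[t] is the 0/1 label vector r^t."""
    k = len(R[0])
    return [train_binary(X, [r_t[i] for r_t in R]) for i in range(k)]
```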

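The squared-error criterion and the two model forms can be sketched with numpy's least-squares polynomial fit; the fitting routine and the toy data are my choices, since the slide gives only the error and model definitions:

```python
import numpy as np

def regression_error(g, x, r):
    """E(g|X) = (1/N) * sum over t of (r^t - g(x^t))^2."""
    x, r = np.asarray(x), np.asarray(r)
    return np.mean((r - g(x)) ** 2)

# Toy data: r^t = f(x^t) + noise (f and the noise level are illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
r = 2.0 * x + 1.0 + rng.normal(0.0, 0.25, x.size)

w1, w0 = np.polyfit(x, r, deg=1)         # linear: g(x) = w1*x + w0
w2, w1q, w0q = np.polyfit(x, r, deg=2)   # quadratic: g(x) = w2*x^2 + w1*x + w0
print(regression_error(lambda v: w1 * v + w0, x, r))
print(regression_error(lambda v: w2 * v**2 + w1q * v + w0q, x, r))
```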
Regression
Learning is an ill-posed problem: the data is usually not sufficient to find a unique solution. Our assumptions about $H$ form our inductive bias.
Generalization: how well does a model perform on new data?
Overfitting: $H$ is more complex than $C$ (or $f$).
Underfitting: $H$ is less complex than $C$ (or $f$).
Compare $g(x) = w_1 x + w_0$ with $g(x) = w_2 x^2 + w_1 x + w_0$.

Triple Trade-Off
There is a trade-off between three factors:
1 the complexity of $H$, $c(H)$
2 the training set size, $N$
3 the generalization error on new data, $E$
As $N$ increases, $E$ decreases. As $c(H)$ increases, $E$ initially decreases and then increases.

Recap: Aspects of Supervised Learning
We have a sample $X = \{\vec{x}^t, r^t\}_{t=1}^N$ that is independent and identically distributed:
- ordering is not important
- all instances are drawn from the same joint distribution $p(\vec{x}, r)$
A model $g(\vec{x} \mid \theta)$, where $\theta$ are the parameters.
A loss function $L(\cdot)$ computing the difference between the desired output $r^t$ and our approximation to it, $g(\vec{x}^t \mid \theta)$.
Approximation error (or loss):
$E(\theta \mid X) = \sum_t L\big(r^t, g(\vec{x}^t \mid \theta)\big)$
An optimization procedure to find $\theta^*$:
$\theta^* = \arg\min_\theta E(\theta \mid X)$
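The recap's ingredients, as a minimal sketch: the slides specify only the general forms $E(\theta \mid X) = \sum_t L(r^t, g(\vec{x}^t \mid \theta))$ and $\theta^* = \arg\min_\theta E(\theta \mid X)$, so the squared-error loss, the linear model, and the grid-search optimizer below are illustrative assumptions:

```python
import numpy as np

def L(r_t, prediction):
    """Loss for one example; squared error is one common choice of L."""
    return (r_t - prediction) ** 2

def E(theta, X, r, g):
    """E(theta|X) = sum over t of L(r^t, g(x^t | theta))."""
    return sum(L(r_t, g(x_t, theta)) for x_t, r_t in zip(X, r))

def g(x, theta):
    """Model g(x|theta); here a line with parameters theta = (w0, w1)."""
    w0, w1 = theta
    return w0 + w1 * x

# theta* = argmin over theta of E(theta|X), here by brute-force grid search.
X = [0.0, 1.0, 2.0, 3.0]
r = [1.1, 2.9, 5.2, 7.1]
grid = [(w0, w1) for w0 in np.linspace(-2, 2, 41) for w1 in np.linspace(0, 3, 31)]
theta_star = min(grid, key=lambda theta: E(theta, X, r, g))
```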
