  1. Algorithms for Machine Learning Chiranjib Bhattacharyya Dept of CSA, IISc chibha@chalmers.se January 17, 2012

  2. Agenda Introduction to classification Bayes Classifier

  3. Who is the person? Images of one person

  4. Who is the person? Images of one person Is he the same person?

  5. Who is the person? Images of one person Is he the same person? easy

  6. Who is the person? Images of one person Is he the same person?

  7. Who is the person? Images of one person Is he the same person? not so easy

  8. Who is the person? Images of one person Is he the same person? not so easy But who is he? ALFRED NOBEL

  9. Introduction to Classification Lots of scope for improvement.

  10. The classification problem setup Alfred Nobel Bertha Von Suttner Objective: from these images, create a function (a classifier) that can automatically recognize images of Nobel and Suttner.

  11. The steps Step 1: Create a representation from the image, sometimes called a feature map. Step 2: From a training set and a feature map, create a classifier. Step 3: Evaluate the goodness of the classifier. We will be concerned with Steps 2 and 3.

  12. The classification problem setup Let (X, Y) ∼ P, where P is a distribution, and let D_m = { (X_i, Y_i) : X_i, Y_i i.i.d. ∼ P, i = 1, ..., m } be a random sample. Probability of misclassification: R(f) = P(f(X) ≠ Y).
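
The misclassification probability R(f) can be approximated by the error rate on a sample. Below is a minimal sketch in Python; the data and the candidate classifier f are made-up placeholders for illustration, not taken from the slides.

```python
import numpy as np

# Illustrative sample (X, y); in practice these come from the training/test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                 # 1000 points, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # true labels

def f(X):
    # Some candidate classifier (illustrative only).
    return (X[:, 0] > 0).astype(int)

# Empirical estimate of R(f) = P(f(X) != Y): fraction of misclassified points.
empirical_risk = np.mean(f(X) != y)
print(empirical_risk)
```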

  13. Finding the best classifier If P(Y = y | X = x) is high, then it is very likely that x has the label y. Define η(x) = P(Y = 1 | X = x), the posterior probability, computed by Bayes rule from the class-conditional densities P(X = x | Y = y). For 2 classes, f*(x) = sign(2η(x) − 1) is the Bayes classifier.
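
A minimal sketch of this rule in Python, assuming two illustrative 1-D Gaussian class-conditional densities and equal priors (these choices are assumptions, not from the slides): the posterior η(x) is computed by Bayes rule, and the prediction is sign(2η(x) − 1).

```python
import numpy as np
from scipy.stats import norm

# Assumed class-conditional densities and priors (illustrative).
prior1, prior2 = 0.5, 0.5
p_x_given_1 = norm(loc=+1.0, scale=1.0).pdf    # P(X = x | Y = 1)
p_x_given_2 = norm(loc=-1.0, scale=1.0).pdf    # P(X = x | Y = 2)

def eta(x):
    # Posterior P(Y = 1 | X = x) via Bayes rule.
    num = prior1 * p_x_given_1(x)
    return num / (num + prior2 * p_x_given_2(x))

def f_star(x):
    # Bayes classifier: sign(2 * eta(x) - 1), i.e. pick the class with higher posterior.
    return np.sign(2 * eta(x) - 1)

print(f_star(np.array([0.3, -0.7])))   # e.g. [ 1., -1.]
```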

  14. Finding the best classifier The objective should be to choose f so as to minimize R(f). Theorem: Let f be any other classifier and f* be the Bayes classifier; then R(f) ≥ R(f*). A very important result: the Bayes classifier has the least error rate. R(f*) is called the Bayes error rate.

  15. Review Maximum likelihood estimation. Try to construct the Bayes classifier.

  16. Naive Bayes Classifier Assume that the features are independent. Works well for many problems, especially text classification.

  17. Spam Emails

  18. Spam Emails

  19. Naive Bayes Classifier: Bernoulli model Create a feature list where each feature is on/off. Denote the feature map x = [f_1, ..., f_d]^⊤. Then P(X = x | Y = y) = ∏_{i=1}^{d} P(F_i = f_i | Y = y), with p_{1i} = P(F_i = 1 | Y = 1) and p_{2i} = P(F_i = 1 | Y = 2). Bayes classifier: output the class with the higher score, where score_1(x) = ∑_i ( f_i log p_{1i} + (1 − f_i) log(1 − p_{1i}) ), and similarly for score_2(x).

  20. Naive Bayes: Bernoulli Source: Introduction to Information Retrieval (Manning, Raghavan, Schütze), Section 13.3, The Bernoulli model.

TrainBernoulliNB(C, D)
 1  V ← ExtractVocabulary(D)
 2  N ← CountDocs(D)
 3  for each c ∈ C
 4  do N_c ← CountDocsInClass(D, c)
 5     prior[c] ← N_c / N
 6     for each t ∈ V
 7     do N_ct ← CountDocsInClassContainingTerm(D, c, t)
 8        condprob[t][c] ← (N_ct + 1) / (N_c + 2)
 9  return V, prior, condprob

ApplyBernoulliNB(C, V, prior, condprob, d)
 1  V_d ← ExtractTermsFromDoc(V, d)
 2  for each c ∈ C
 3  do score[c] ← log prior[c]
 4     for each t ∈ V
 5     do if t ∈ V_d
 6        then score[c] += log condprob[t][c]
 7        else score[c] += log(1 − condprob[t][c])
 8  return arg max_{c ∈ C} score[c]

Figure 13.3: NB algorithm (Bernoulli model): training and testing. The add-one smoothing in line 8 (top) is in analogy to Equation (13.7) with B = 2.
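
A rough Python transcription of the pseudocode above, assuming documents are given as (class label, set of terms) pairs; the function and variable names are illustrative, not from the book.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(classes, docs):
    # docs: list of (class_label, set_of_terms) pairs.
    vocab = set(t for _, terms in docs for t in terms)
    n_docs = len(docs)
    prior, condprob = {}, defaultdict(dict)
    for c in classes:
        n_c = sum(1 for label, _ in docs if label == c)
        prior[c] = n_c / n_docs
        for t in vocab:
            n_ct = sum(1 for label, terms in docs if label == c and t in terms)
            condprob[t][c] = (n_ct + 1) / (n_c + 2)   # add-one smoothing, B = 2
    return vocab, prior, condprob

def apply_bernoulli_nb(classes, vocab, prior, condprob, doc_terms):
    score = {}
    for c in classes:
        score[c] = math.log(prior[c])
        for t in vocab:
            # Bernoulli model: absent terms also contribute evidence.
            p = condprob[t][c]
            score[c] += math.log(p) if t in doc_terms else math.log(1 - p)
    return max(score, key=score.get)
```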

  21. Discriminant functions Bayes classifier: h(x) = sign( ∑_{i=1}^{d} f_i θ_i − b ), where θ_i = log[ p_{1i}(1 − p_{2i}) / ((1 − p_{1i}) p_{2i}) ]. h(x) is sometimes called a discriminant function.
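
A small sketch of this linear view in Python: θ_i is computed from assumed values of p_{1i} and p_{2i} (illustrative numbers only), and b is read off as the constant term of score_1(x) − score_2(x) from the previous slide; the slide itself does not spell b out.

```python
import numpy as np

# Illustrative per-feature probabilities p1[i] = P(F_i = 1 | Y = 1),
# p2[i] = P(F_i = 1 | Y = 2).
p1 = np.array([0.8, 0.1, 0.6])
p2 = np.array([0.2, 0.5, 0.4])

theta = np.log(p1 * (1 - p2) / ((1 - p1) * p2))
b = np.sum(np.log((1 - p2) / (1 - p1)))      # constant term of score_1 - score_2

def h(f):
    # f is the on/off feature vector x = [f_1, ..., f_d]^T.
    return np.sign(f @ theta - b)

print(h(np.array([1, 0, 1])))
```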

  22. Gaussian class conditional distributions Let the class-conditional distributions be N(µ_1, Σ) and N(µ_2, Σ). The Bayes classifier is given by h(x) = sign(w^⊤ x − b), where w = Σ^{−1}(µ_1 − µ_2).
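
A short numpy sketch with illustrative µ_1, µ_2, Σ. The threshold b is not given on the slide, so the midpoint rule b = w^⊤(µ_1 + µ_2)/2, which corresponds to equal class priors, is assumed here.

```python
import numpy as np

# Illustrative class means and shared covariance.
mu1 = np.array([1.0, 2.0])
mu2 = np.array([-1.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

w = np.linalg.solve(Sigma, mu1 - mu2)    # w = Sigma^{-1} (mu1 - mu2)
b = w @ (mu1 + mu2) / 2                  # midpoint threshold (equal-prior assumption)

def h(x):
    return np.sign(w @ x - b)

print(h(np.array([0.5, 1.0])))
```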

  23. Fisher Discriminant Source: Pattern Recognition and Machine Learning (Chris Bishop). [Figure not reproduced: two scatter plots of the two classes.]

  24. Fisher Discriminant Let (µ_1, Σ_1) be the mean and covariance of class 1 and (µ_2, Σ_2) be the mean and covariance of class 2. J(w) = ( w^⊤(µ_1 − µ_2) )^2 / ( w^⊤ S w ); maximizing J(w) over w gives w = S^{−1}(µ_1 − µ_2), where S = Σ_1 + Σ_2.
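
A short numpy sketch with two illustrative samples: the class means and covariances are estimated from data, and the Fisher direction is w = S^{−1}(µ_1 − µ_2) with S = Σ_1 + Σ_2.

```python
import numpy as np

# Illustrative two-class data.
rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], size=200)
X2 = rng.multivariate_normal([-1.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], size=200)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)   # S = Sigma1 + Sigma2
w = np.linalg.solve(S, mu1 - mu2)                         # Fisher discriminant direction

# Projecting onto w separates the two classes; a threshold on w @ x gives a classifier.
print(w, (X1 @ w).mean(), (X2 @ w).mean())
```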
