
Linear Discrimination
Steven J Zeil, Old Dominion Univ., Fall 2010



  1. Linear Discrimination
     Steven J Zeil, Old Dominion Univ., Fall 2010

  2. Outline
     1. Discriminant-Based Classification
        Linearly Separable Systems
        Pairwise Separation
     2. Posteriors
     3. Logistic Discrimination

  3. Discriminant-Based Classification
     Likelihood-based: assume a model for $p(\vec{x} \mid C_i)$, use Bayes' rule to calculate $P(C_i \mid \vec{x})$, and let $g_i(\vec{x}) = \log P(C_i \mid \vec{x})$.
     Discriminant-based: assume a model for $g_i(\vec{x} \mid \phi_i)$ directly.
     Vapnik: estimating the class densities is a harder problem than estimating the class discriminants. It does not make sense to solve a hard problem in order to solve an easier one.

  4. Linear Discrimination
     Linear discriminant:
     $g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$
     Advantages:
     - Simple: O(d) space/computation (see the sketch below)
     - Knowledge extraction: the sizes of the weights indicate the significance of each attribute's contribution
     - Optimal when the $p(\vec{x} \mid C_i)$ are Gaussian with a shared covariance matrix
     - Useful when classes are (almost) linearly separable
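
A minimal Python sketch of evaluating such a discriminant, to make the O(d) claim concrete. The function name and the toy weights are illustrative, not from the slides.

    import numpy as np

    def linear_discriminant(x, w, w0):
        # g(x | w, w0) = w . x + w0: O(d) time and space in the input dimension d
        return np.dot(w, x) + w0

    # Toy example with d = 2; the weights here are arbitrary illustrations.
    w = np.array([1.5, -0.5])
    w0 = -1.0
    x = np.array([2.0, 1.0])
    print(linear_discriminant(x, w, w0))  # 1.5*2.0 - 0.5*1.0 - 1.0 = 1.5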

  5. More General Linear Models
     $g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$
     We can replace the $x_j$ on the right by any linearly independent set of basis functions.
     For two classes, define $g(\vec{x}) = g_1(\vec{x}) - g_2(\vec{x}) = \vec{w}^T \vec{x} + w_0$ and choose $C_1$ if $g(\vec{x}) > 0$, $C_2$ otherwise.
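
A sketch of the basis-function idea, assuming a quadratic expansion of a 2-d input; the expansion and the weights are illustrative choices, not the slides' own.

    import numpy as np

    def quadratic_basis(x):
        # Replace the raw inputs by a linearly independent set of basis
        # functions; here a quadratic expansion of a 2-d input (illustrative).
        x1, x2 = x
        return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

    # g(x) = w . phi(x) + w0 stays linear in the weights but is nonlinear in x.
    w = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # illustrative: x1^2 + x2^2
    w0 = -1.0
    x = np.array([0.5, 0.5])
    print("C1" if w @ quadratic_basis(x) + w0 > 0 else "C2")   # C2: inside the unit circle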

  6. Geometric Interpretation
     Rewrite $\vec{x}$ as $\vec{x} = \vec{x}_p + r \frac{\vec{w}}{\|\vec{w}\|}$, where
     - $\vec{x}_p$ is the projection of $\vec{x}$ onto the hyperplane $g(\vec{x}) = 0$,
     - $\vec{w}$ is normal to the hyperplane, and
     - $r = \frac{g(\vec{x})}{\|\vec{w}\|}$ is the (signed) distance, verified numerically in the sketch below.
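
A small sketch (assumed names, illustrative hyperplane) that computes the projection and signed distance and checks the decomposition numerically.

    import numpy as np

    def project_to_hyperplane(x, w, w0):
        # Decompose x as x_p + r * w/||w||, where x_p lies on g(x) = 0.
        norm = np.linalg.norm(w)
        r = (np.dot(w, x) + w0) / norm    # signed distance g(x)/||w||
        x_p = x - r * w / norm            # projection onto the hyperplane
        return x_p, r

    w, w0 = np.array([3.0, 4.0]), -5.0    # illustrative hyperplane
    x = np.array([4.0, 3.0])
    x_p, r = project_to_hyperplane(x, w, w0)
    print(r)                              # (12 + 12 - 5) / 5 = 3.8
    print(np.dot(w, x_p) + w0)            # ~0: x_p is on g(x) = 0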

  7. Linearly Separable Systems
     For multiple classes with $g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0}$ and the $\vec{w}_i$ normalized:
     Choose $C_i$ if $g_i(\vec{x}) = \max_{j=1}^{K} g_j(\vec{x})$.
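
A sketch of the max rule; the weight matrix here is an arbitrary illustration.

    import numpy as np

    def classify_argmax(x, W, w0):
        # Choose C_i maximizing g_i(x) = w_i . x + w_i0.
        # W is a (K, d) matrix of weight vectors, w0 a length-K bias vector.
        return int(np.argmax(W @ x + w0))

    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # illustrative weights
    w0 = np.zeros(3)
    print(classify_argmax(np.array([2.0, 0.5]), W, w0))    # class 0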

  8. Pairwise Separation
     If the classes are not linearly separable, compute discriminants between each pair of classes:
     $g_{ij}(\vec{x} \mid \vec{w}_{ij}, w_{ij0}) = \vec{w}_{ij}^T \vec{x} + w_{ij0}$
     Choose $C_i$ if $g_{ij}(\vec{x}) > 0$ for all $j \neq i$.
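
A sketch of the pairwise decision rule with hand-picked toy discriminants; the names and the reject behavior when no class wins all its contests are assumptions for illustration.

    def classify_pairwise(x, g_pair):
        # g_pair[i][j] is the discriminant between C_i and C_j; choose C_i
        # if g_ij(x) > 0 for every j != i, otherwise fall through.
        K = len(g_pair)
        for i in range(K):
            if all(g_pair[i][j](x) > 0 for j in range(K) if j != i):
                return i
        return None   # no class wins all of its pairwise contests ("reject")

    # Toy 3-class example: each pair separated by a hand-picked line.
    g01 = lambda x: x[0] - x[1]
    g02 = lambda x: x[0] - 1.0
    g12 = lambda x: x[1] - 1.0
    g_pair = [[None, g01, g02],
              [lambda x: -g01(x), None, g12],
              [lambda x: -g02(x), lambda x: -g12(x), None]]
    print(classify_pairwise([2.0, 0.5], g_pair))   # 0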

  9. Revisiting Parametric Methods
     When $p(\vec{x} \mid C_i) \sim \mathcal{N}(\vec{\mu}_i, \Sigma)$ with a shared covariance matrix, the discriminant is linear:
     $g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0}$, where
     $\vec{w}_i = \Sigma^{-1} \vec{\mu}_i$ and $w_{i0} = -\frac{1}{2} \vec{\mu}_i^T \Sigma^{-1} \vec{\mu}_i + \log P(C_i)$
     Let $y \equiv P(C_1 \mid \vec{x})$; then $P(C_2 \mid \vec{x}) = 1 - y$. We choose $C_1$ if $y > 0.5$, or equivalently if $\frac{y}{1-y} > 1$, or equivalently if $\log \frac{y}{1-y} > 0$.
     The quantity $\log \frac{y}{1-y}$ is called the log odds of $y$, or the logit.
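
A sketch computing $\vec{w}_i, w_{i0}$ from (assumed known) Gaussian parameters; the function name is illustrative.

    import numpy as np

    def gaussian_linear_discriminant(mu_i, Sigma, prior_i):
        # For p(x | C_i) ~ N(mu_i, Sigma) with a shared covariance matrix:
        #   w_i  = Sigma^{-1} mu_i
        #   w_i0 = -(1/2) mu_i . Sigma^{-1} mu_i + log P(C_i)
        w_i = np.linalg.solve(Sigma, mu_i)    # Sigma^{-1} mu_i, no explicit inverse
        w_i0 = -0.5 * mu_i @ w_i + np.log(prior_i)
        return w_i, w_i0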

  10. Log Odds
      For two normal classes with a shared covariance matrix, the log odds is linear:
      $\mathrm{logit}(P(C_1 \mid \vec{x})) = \log \frac{P(C_1 \mid \vec{x})}{P(C_2 \mid \vec{x})} = \log \frac{p(\vec{x} \mid C_1)}{p(\vec{x} \mid C_2)} + \log \frac{P(C_1)}{P(C_2)}$
      The $p(\vec{x} \mid C_i)$ terms are exponential in $\vec{x}$ (Gaussian pdfs), so the log is linear:
      $\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0$
      with $\vec{w} = \Sigma^{-1}(\vec{\mu}_1 - \vec{\mu}_2)$ and $w_0 = -\frac{1}{2}(\vec{\mu}_1 + \vec{\mu}_2)^T \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_2) + \log \frac{P(C_1)}{P(C_2)}$.
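
A direct transcription of these two formulas into a sketch; the default equal priors are an assumption for illustration.

    import numpy as np

    def two_class_logit_weights(mu1, mu2, Sigma, p1=0.5, p2=0.5):
        # w  = Sigma^{-1} (mu1 - mu2)
        # w0 = -(1/2) (mu1 + mu2) . Sigma^{-1} (mu1 - mu2) + log(P(C1)/P(C2))
        w = np.linalg.solve(Sigma, mu1 - mu2)
        w0 = -0.5 * (mu1 + mu2) @ w + np.log(p1 / p2)
        return w, w0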

  11. The Logistic Function
      The inverse of the logit function $\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0$ is called the logistic, a.k.a. the sigmoid:
      $P(C_1 \mid \vec{x}) = \mathrm{sigmoid}(\vec{w}^T \vec{x} + w_0) = \frac{1}{1 + \exp\left[-(\vec{w}^T \vec{x} + w_0)\right]}$
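
A one-line sketch of the sigmoid for reference:

    import numpy as np

    def sigmoid(z):
        # Logistic (sigmoid) function, the inverse of the logit.
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(np.array([0.0, 4.0, -4.0])))   # [0.5, ~0.982, ~0.018]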

  12. Using the Sigmoid
      During training, estimate $\vec{m}_1, \vec{m}_2, S$ (the sample means and shared covariance), then compute $\vec{w}, w_0$.
      During testing, either
      - calculate $g(\vec{x} \mid \vec{w}, w_0) = \vec{w}^T \vec{x} + w_0$ and choose $C_1$ if $g(\vec{x}) > 0$, or
      - calculate $y = \mathrm{sigmoid}(\vec{w}^T \vec{x} + w_0)$ and choose $C_1$ if $y > 0.5$.
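
An end-to-end sketch of this train/test recipe, under the assumption that priors are estimated from class counts; all names are illustrative.

    import numpy as np

    def fit_two_gaussians(X1, X2):
        # Estimate m1, m2 and the pooled covariance S from labeled samples,
        # then compute (w, w0) as on the preceding slides.
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        n1, n2 = len(X1), len(X2)
        S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2)
        w = np.linalg.solve(S, m1 - m2)
        w0 = -0.5 * (m1 + m2) @ w + np.log(n1 / n2)   # priors estimated as n1/n, n2/n
        return w, w0

    def predict(x, w, w0):
        # The two tests are equivalent: g(x) > 0 iff sigmoid(g(x)) > 0.5.
        return 1 if (w @ x + w0) > 0 else 2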

  13. Logistic Discrimination
      For two classes, assume the log likelihood ratio is linear:
      $\log \frac{p(\vec{x} \mid C_1)}{p(\vec{x} \mid C_2)} = \vec{w}^T \vec{x} + w_0$
      so that the logit is also linear (the log prior ratio is absorbed into $w_0$):
      $\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0$
      $y = \hat{P}(C_1 \mid \vec{x}) = \frac{1}{1 + \exp\left[-(\vec{w}^T \vec{x} + w_0)\right]}$
      Likelihood: $l(\vec{w}, w_0 \mid \mathcal{X}) = \prod_t (y^t)^{r^t} (1 - y^t)^{1 - r^t}$
      Error ("cross-entropy"): $E(\vec{w}, w_0 \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]$
      Train by numerical optimization to minimize $E$.
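
A sketch of the cross-entropy error as defined above; the eps clipping is an implementation detail, not from the slides.

    import numpy as np

    def cross_entropy(r, y, eps=1e-12):
        # E(w, w0 | X) = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]
        # eps clips y away from 0 and 1 so the logs stay finite.
        y = np.clip(y, eps, 1.0 - eps)
        return -np.sum(r * np.log(y) + (1.0 - r) * np.log(1.0 - y))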

  14. Estimating $\vec{w}$
      (Figure slide; a gradient-descent sketch follows below.)
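
The body of this slide appears to be a figure or derivation that did not survive extraction. As a stand-in, here is a common way to estimate $\vec{w}$ for the cross-entropy error above: batch gradient descent with the update $\Delta \vec{w} = \eta \sum_t (r^t - y^t)\vec{x}^t$ that follows from differentiating $E$. The learning rate and epoch count are arbitrary assumptions.

    import numpy as np

    def train_logistic(X, r, eta=0.1, epochs=1000):
        # Batch gradient descent on the cross-entropy E.
        # X: (n, d) inputs; r: (n,) labels in {0, 1}. Returns (w, w0).
        n, d = X.shape
        w, w0 = np.zeros(d), 0.0
        for _ in range(epochs):
            y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))   # sigmoid outputs
            w += eta * X.T @ (r - y) / n              # -dE/dw  = sum_t (r - y) x
            w0 += eta * np.sum(r - y) / n             # -dE/dw0 = sum_t (r - y)
        return w, w0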

  15. Multiple Classes
      For $K$ classes, take $C_K$ as a reference class:
      $\log \frac{p(\vec{x} \mid C_i)}{p(\vec{x} \mid C_K)} = \vec{w}_i^T \vec{x} + w_{i0}$
      $\frac{P(C_i \mid \vec{x})}{P(C_K \mid \vec{x})} = \exp\left[\vec{w}_i^T \vec{x} + w_{i0}\right]$
      $y_i = \hat{P}(C_i \mid \vec{x}) = \frac{\exp\left[\vec{w}_i^T \vec{x} + w_{i0}\right]}{1 + \sum_{j=1}^{K-1} \exp\left[\vec{w}_j^T \vec{x} + w_{j0}\right]}$
      This is called the softmax function, because exponentiation combined with normalization tends to exaggerate the weight of the maximum term.
      Likelihood: $l(\{\vec{w}_i, w_{i0}\}_i \mid \mathcal{X}) = \prod_t \prod_i (y_i^t)^{r_i^t}$
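
A sketch of the reference-class softmax: giving $C_K$ a fixed score of 0 reproduces the $1 + \sum_j \exp[\cdot]$ denominator above. The max-shift is a standard numerical-stability detail, not from the slides.

    import numpy as np

    def softmax_posteriors(x, W, w0):
        # W: (K-1, d) and w0: (K-1,) score classes C_1..C_{K-1} against the
        # reference class C_K, whose score is fixed at 0; this yields
        # y_i = exp(w_i.x + w_i0) / (1 + sum_j exp(w_j.x + w_j0)).
        z = np.append(W @ x + w0, 0.0)
        e = np.exp(z - z.max())        # shift by the max for numerical stability
        return e / e.sum()             # length K, sums to 1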

  16. Multiple Classes (cont.)
      Error ("cross-entropy"): $E(\{\vec{w}_i, w_{i0}\}_i \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$
      Train by numerical optimization to minimize $E$.
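
A sketch of one such numerical optimization: batch gradient descent on the multiclass cross-entropy. For simplicity it parameterizes all $K$ classes directly rather than using a reference class; both forms give the same posteriors. Learning rate and epochs are assumptions.

    import numpy as np

    def train_softmax(X, R, eta=0.1, epochs=1000):
        # Batch gradient descent on E = -sum_t sum_i r_i^t log y_i^t.
        # X: (n, d) inputs; R: (n, K) one-hot labels. Returns W: (K, d), w0: (K,).
        n, d = X.shape
        K = R.shape[1]
        W, w0 = np.zeros((K, d)), np.zeros(K)
        for _ in range(epochs):
            Z = X @ W.T + w0
            E = np.exp(Z - Z.max(axis=1, keepdims=True))
            Y = E / E.sum(axis=1, keepdims=True)      # softmax posteriors, (n, K)
            W += eta * (R - Y).T @ X / n              # -dE/dW
            w0 += eta * (R - Y).sum(axis=0) / n       # -dE/dw0
        return W, w0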

  17. Softmax Classification
      (Figure slide.)

  18. Softmax Discriminants
      (Figure slide.)
