
Linear Discrimination: Discriminant-Based Classification



  1. Linear Discrimination
Steven J Zeil, Old Dominion Univ., Fall 2010

Outline: 1. Discriminant-Based Classification (Linearly Separable Systems, Pairwise Separation); 2. Posteriors; 3. Logistic Discrimination

Discriminant-Based Classification

Likelihood-based classification: assume a model for $p(\vec{x}|C_i)$ and use Bayes' rule to calculate $P(C_i|\vec{x})$, taking $g_i(\vec{x}) = \log P(C_i|\vec{x})$ as the discriminant.

Discriminant-based classification: assume a model for the discriminant $g_i(\vec{x}|\Phi_i)$ directly. Vapnik: estimating the class densities is a harder problem than estimating the class discriminants, and it does not make sense to solve a hard problem as a step toward solving an easier one.

Linear discriminant:
$$g_i(\vec{x}|\vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$

Advantages:
- Simple: O(d) space and computation.
- Knowledge extraction: the sizes of the weights indicate how much each attribute contributes.
- Optimal when the $p(\vec{x}|C_i)$ are Gaussian with a shared covariance matrix.
- Useful when classes are (almost) linearly separable.
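As a concrete illustration (not part of the original slides), here is a minimal numpy sketch that evaluates the linear discriminant $g_i(\vec{x})$ for every class and chooses the largest; the weights below are made-up numbers.

```python
import numpy as np

def linear_discriminants(x, W, w0):
    """Evaluate g_i(x) = w_i^T x + w_i0 for every class.

    W  : (K, d) array, one weight vector w_i per class
    w0 : (K,) array of bias terms w_i0
    Returns a (K,) array of discriminant values.
    """
    return W @ x + w0

# Toy usage with invented weights (illustration only)
W = np.array([[1.0, -0.5],
              [-0.3, 0.8]])        # K = 2 classes, d = 2 attributes
w0 = np.array([0.1, -0.2])
x = np.array([0.4, 1.2])

g = linear_discriminants(x, W, w0)
print(g, np.argmax(g))             # choose the C_i with the largest g_i(x)
```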

  2. More General Linear Models
Rewrite
$$g_i(\vec{x}|\vec{w}_i, w_{i0}) = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$
We can replace the $x_j$ on the right by any linearly independent set of basis functions.

Geometric Interpretation

For two classes, define $g(\vec{x}) = g_1(\vec{x}) - g_2(\vec{x}) = \vec{w}^T\vec{x} + w_0$ and choose $C_1$ if $g(\vec{x}) > 0$, $C_2$ otherwise. Rewrite $\vec{x}$ as
$$\vec{x} = \vec{x}_p + r\,\frac{\vec{w}}{\|\vec{w}\|}$$
where $\vec{x}_p$ is the projection of $\vec{x}$ onto the hyperplane $g(\vec{x}) = 0$. Then $\vec{w}$ is normal to the hyperplane, and $r = g(\vec{x})/\|\vec{w}\|$ is the (signed) distance of $\vec{x}$ from it.

Linearly Separable Systems

For multiple classes, use one discriminant per class,
$$g_i(\vec{x}|\vec{w}_i, w_{i0}) = \vec{w}_i^T\vec{x} + w_{i0}$$
with the $\vec{w}_i$ normalized. Choose $C_i$ if $g_i(\vec{x}) = \max_{j=1}^{K} g_j(\vec{x})$.

Pairwise Separation

If the classes are not linearly separable, compute discriminants between each pair of classes:
$$g_{ij}(\vec{x}|\vec{w}_{ij}, w_{ij0}) = \vec{w}_{ij}^T\vec{x} + w_{ij0}$$
Choose $C_i$ if $g_{ij}(\vec{x}) > 0$ for all $j \neq i$.
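A brief sketch (my own, not from the slides) of the two rules above: the signed distance $r = g(\vec{x})/\|\vec{w}\|$, and the pairwise-separation decision; `g_pair` is a hypothetical table of pairwise discriminant functions.

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    return (w @ x + w0) / np.linalg.norm(w)

def pairwise_choose(x, g_pair):
    """Pairwise separation: g_pair[i][j] is a callable computing g_ij(x),
    with g_ji(x) = -g_ij(x). Choose C_i if g_ij(x) > 0 for all j != i."""
    K = len(g_pair)
    for i in range(K):
        if all(g_pair[i][j](x) > 0 for j in range(K) if j != i):
            return i
    return None  # x lies in a region claimed by no single class
```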

  3. Revisiting Parametric Methods
When $p(\vec{x}|C_i) \sim N(\vec{\mu}_i, \Sigma)$ with a shared covariance matrix, the discriminant is linear:
$$g_i(\vec{x}|\vec{w}_i, w_{i0}) = \vec{w}_i^T\vec{x} + w_{i0}$$
$$\vec{w}_i = \Sigma^{-1}\vec{\mu}_i \qquad w_{i0} = -\frac{1}{2}\vec{\mu}_i^T\Sigma^{-1}\vec{\mu}_i + \log P(C_i)$$

Log Odds

Let $y \equiv P(C_1|\vec{x})$; then $P(C_2|\vec{x}) = 1 - y$. We choose $C_1$ if $y > 0.5$, or alternatively if $\frac{y}{1-y} > 1$. Equivalently, if $\log\frac{y}{1-y} > 0$. The latter quantity is called the log odds of $y$, or logit.

For two normal classes with a shared covariance matrix, the log odds is linear:
$$\mathrm{logit}(P(C_1|\vec{x})) = \log\frac{P(C_1|\vec{x})}{P(C_2|\vec{x})} = \log\frac{p(\vec{x}|C_1)}{p(\vec{x}|C_2)} + \log\frac{P(C_1)}{P(C_2)}$$
The $p(\vec{x}|C_i)$ terms are exponential in $\vec{x}$ (Gaussian pdf), so the log is linear:
$$\mathrm{logit}(P(C_1|\vec{x})) = \vec{w}^T\vec{x} + w_0$$
with $\vec{w} = \Sigma^{-1}(\vec{\mu}_1 - \vec{\mu}_2)$ and $w_0 = -\frac{1}{2}(\vec{\mu}_1 + \vec{\mu}_2)^T\Sigma^{-1}(\vec{\mu}_1 - \vec{\mu}_2) + \log\frac{P(C_1)}{P(C_2)}$.

Logistic

The inverse of the logit function is called the logistic, a.k.a. the sigmoid:
$$P(C_1|\vec{x}) = \mathrm{sigmoid}(\vec{w}^T\vec{x} + w_0) = \frac{1}{1 + \exp[-(\vec{w}^T\vec{x} + w_0)]}$$

Using the Sigmoid

During training, estimate $\vec{m}_1$, $\vec{m}_2$, $S$, then compute $\vec{w}$ and $w_0$ from them. During testing, either calculate $g(\vec{x}|\vec{w}, w_0) = \vec{w}^T\vec{x} + w_0$ and choose $C_1$ if $g(\vec{x}) > 0$, or calculate $y = \mathrm{sigmoid}(\vec{w}^T\vec{x} + w_0)$ and choose $C_1$ if $y > 0.5$.
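As a sketch of that training recipe (assuming class priors are estimated from class proportions; the function name is mine), $\vec{w}$ and $w_0$ follow directly from the sample means and the pooled covariance:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_shared_cov(X1, X2):
    """Estimate w, w0 for two Gaussian classes with a shared covariance.

    X1 : (n1, d) examples from C1;  X2 : (n2, d) examples from C2.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled estimate S of the shared covariance matrix
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2)
    S_inv = np.linalg.inv(S)
    w = S_inv @ (m1 - m2)
    w0 = -0.5 * (m1 + m2) @ S_inv @ (m1 - m2) + np.log(n1 / n2)
    return w, w0

# Testing: choose C1 if w @ x + w0 > 0, or equivalently
# if sigmoid(w @ x + w0) > 0.5.
```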

  4. Logistic Discrimination
For two classes, assume the log likelihood ratio is linear:
$$\log\frac{p(\vec{x}|C_1)}{p(\vec{x}|C_2)} = \vec{w}^T\vec{x} + w_0^0$$
so that, absorbing the log prior ratio into $w_0$,
$$\mathrm{logit}(P(C_1|\vec{x})) = \vec{w}^T\vec{x} + w_0$$
$$y = \hat{P}(C_1|\vec{x}) = \frac{1}{1 + \exp[-(\vec{w}^T\vec{x} + w_0)]}$$

Estimating w

Likelihood, for a sample $X = \{\vec{x}^t, r^t\}_t$ with labels $r^t \in \{0, 1\}$:
$$l(\vec{w}, w_0|X) = \prod_t (y^t)^{r^t}(1 - y^t)^{1-r^t}$$
Error ("cross-entropy"):
$$E(\vec{w}, w_0|X) = -\sum_t \left[ r^t \log y^t + (1 - r^t)\log(1 - y^t) \right]$$
Train by numerical optimization to minimize $E$ (see the sketch after this block).

Multiple Classes

For K classes, take $C_K$ as a reference class and assume
$$\log\frac{p(\vec{x}|C_i)}{p(\vec{x}|C_K)} = \vec{w}_i^T\vec{x} + w_{i0}$$
Then $\frac{P(C_i|\vec{x})}{P(C_K|\vec{x})} = \exp(\vec{w}_i^T\vec{x} + w_{i0})$ and
$$y_i = \hat{P}(C_i|\vec{x}) = \frac{\exp(\vec{w}_i^T\vec{x} + w_{i0})}{1 + \sum_{j=1}^{K-1}\exp(\vec{w}_j^T\vec{x} + w_{j0})}$$
This is called the softmax function, because exponentiation combined with normalization tends to exaggerate the weight of the maximum term.

Likelihood:
$$l(\{\vec{w}_i, w_{i0}\}_i|X) = \prod_t \prod_i (y_i^t)^{r_i^t}$$

Multiple Classes (cont.)

Error ("cross-entropy"):
$$E(\{\vec{w}_i, w_{i0}\}_i|X) = -\sum_t \sum_i r_i^t \log y_i^t$$
Train by numerical optimization to minimize $E$.
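Here is a minimal sketch of that numerical optimization for the two-class case, using plain batch gradient descent (one choice among many optimizers; the learning rate and epoch count are arbitrary). The gradient of $E$ is $\partial E/\partial\vec{w} = -\sum_t (r^t - y^t)\vec{x}^t$ and $\partial E/\partial w_0 = -\sum_t (r^t - y^t)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, r, eta=0.1, epochs=1000):
    """Minimize the cross-entropy E by batch gradient descent.

    X : (n, d) inputs;  r : (n,) labels in {0, 1};
    eta, epochs : arbitrary illustrative hyperparameters.
    """
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(epochs):
        y = sigmoid(X @ w + w0)   # y^t = sigmoid(w^T x^t + w0)
        resid = r - y             # per-example residual r^t - y^t
        w += eta * (X.T @ resid)  # descend: w -= eta * dE/dw
        w0 += eta * resid.sum()
    return w, w0
```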

  5. Softmax Classification; Softmax Discriminants
(Figure slides; no recoverable text beyond the titles.)
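To make those closing figures concrete, here is a sketch (my own) of the softmax posteriors with $C_K$ as the reference class, as defined on the previous slide, including the usual max-shift for numerical stability:

```python
import numpy as np

def softmax_posteriors(x, W, w0):
    """Posterior estimates y_1..y_K with C_K as the reference class.

    W : (K-1, d) weight vectors;  w0 : (K-1,) biases.
    y_i = exp(z_i) / (1 + sum_j exp(z_j)) for i < K,
    y_K = 1 / (1 + sum_j exp(z_j)).
    """
    z = W @ x + w0             # z_i = w_i^T x + w_i0, i = 1..K-1
    m = max(z.max(), 0.0)      # shift by the max term (incl. the implicit 1)
    e = np.exp(z - m)
    e_ref = np.exp(-m)         # the "1" for the reference class, shifted
    y = np.append(e, e_ref) / (e_ref + e.sum())
    return y                   # sums to 1; np.argmax(y) picks the class
```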
