CS 4100: Artificial Intelligence
Perceptrons and Logistic Regression


  1. CS 4100: Artificial Intelligence
     Perceptrons and Logistic Regression
     Jan-Willem van de Meent, Northeastern University
     [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
     Linear Classifiers

  2. Feature Vectors
     An email ("Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ...") maps to word features: # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ... with label SPAM.
     An image of a handwritten digit maps to pixel features: PIXEL-7,12 : 1, PIXEL-7,13 : 0, ..., NUM_LOOPS : 1, ... with label "2".

     Some (Simplified) Biology
     • Very loose inspiration: human neurons

  3. Linear Classifiers
     • Inputs are feature values
     • Each feature has a weight
     • Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
     • If the activation is:
       • Positive, output +1
       • Negative, output -1

     Weights
     • Binary case: compare features to a weight vector
     • Learning: figure out the weight vector from examples
     Example weight vector: # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ...
     Example inputs: (# free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...) and (# free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...)
     • A positive dot product means the positive class (a code sketch follows below)
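A minimal sketch of this decision rule in Python, using sparse feature dictionaries. The weight and feature values are the slide's illustrative numbers, not a trained model:

def activation(weights, features):
    # Dot product w · f(x) over sparse feature dictionaries
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def classify(weights, features):
    # Output +1 if the activation is positive, -1 otherwise
    return +1 if activation(weights, features) > 0 else -1

w = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
x_spam = {"# free": 2, "MISSPELLED": 2}                        # activation = 10
x_ham  = {"YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}   # activation = -3

print(classify(w, x_spam))  # +1 (positive class, SPAM)
print(classify(w, x_ham))   # -1 (negative class, HAM)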

  4. Decision Rules
     Binary Decision Rule
     • In the space of feature vectors:
       • Examples are points
       • Any weight vector is a hyperplane
       • One side corresponds to Y = +1
       • The other corresponds to Y = -1
     [Figure: a 2D plot over the features "free" and "money"; the weights BIAS : -3, free : 4, money : 2, ... define a line separating +1 = SPAM from -1 = HAM]

  5. Weight Updates
     Learning: Binary Perceptron
     • Start with weights = 0
     • For each training instance:
       • Classify with current weights
       • If correct (i.e., y = y*): no change!
       • If wrong: adjust the weight vector

  6. Learning: Binary Perceptron
     • Start with weights = 0
     • For each training instance:
       • Classify with current weights: y = +1 if w · f(x) ≥ 0, otherwise y = -1
       • If correct (i.e., y = y*): no change!
       • If wrong: adjust the weight vector by adding or subtracting the feature vector, i.e., w = w + y* · f(x); subtract if y* is -1 (a code sketch follows below)

     Examples: Perceptron
     • Separable Case
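A minimal sketch of this algorithm, assuming dense numpy feature vectors X (one row per example) and labels y in {+1, -1}. The fixed pass count is an assumption; the slide loops over instances without saying when to stop:

import numpy as np

def train_binary_perceptron(X, y, passes=10):
    w = np.zeros(X.shape[1])                    # start with weights = 0
    for _ in range(passes):
        for f_x, y_star in zip(X, y):
            y_hat = 1 if w @ f_x >= 0 else -1   # classify with current weights
            if y_hat != y_star:                 # wrong: adjust the weight vector
                w = w + y_star * f_x            # subtract f(x) if y* is -1
    return w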

  7. Multiclass Decision Rule
     • If we have multiple classes:
       • A weight vector for each class: w_y
       • Score (activation) of a class y: w_y · f(x)
       • Prediction: the highest score wins, y = argmax_y w_y · f(x)
     • Binary = multiclass where the negative class has weight zero

     Learning: Multiclass Perceptron
     • Start with all weights = 0
     • Pick training examples one by one
     • Predict with current weights
     • If correct: no change!
     • If wrong: lower the score of the wrong answer and raise the score of the right answer: w_y = w_y - f(x), w_{y*} = w_{y*} + f(x) (see the sketch after this list)
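A minimal sketch of multiclass perceptron learning, under the same assumptions as the binary sketch (numpy arrays, integer class labels, a fixed number of passes):

import numpy as np

def train_multiclass_perceptron(X, y, num_classes, passes=10):
    W = np.zeros((num_classes, X.shape[1]))   # one weight vector per class, all zeros
    for _ in range(passes):
        for f_x, y_star in zip(X, y):
            y_hat = int(np.argmax(W @ f_x))   # highest score wins
            if y_hat != y_star:               # wrong answer:
                W[y_hat] -= f_x               #   lower score of wrong answer
                W[y_star] += f_x              #   raise score of right answer
    return W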

  8. Example: Multiclass Perceptron
     Question: What will the weights w be for each class after 3 updates?
       y_1 = "politics", x_1 = "win the vote"
       y_2 = "politics", x_2 = "win the election"
       y_3 = "sports",   x_3 = "win the game"
     Features: BIAS, win, game, vote, the, ...
     Initial weights: w_sports = (BIAS : 1, win : 0, game : 0, vote : 0, the : 0); w_politics and w_tech are all zeros.

     Update 1: f(x_1) = (BIAS : 1, win : 1, game : 0, vote : 1, the : 1)
       Scores: w_sports · f(x_1) = 1, w_politics · f(x_1) = 0, w_tech · f(x_1) = 0
       Prediction: "sports" (wrong), so w_sports = w_sports - f(x_1) and w_politics = w_politics + f(x_1)
       New weights: w_sports = (0, -1, 0, -1, -1), w_politics = (1, 1, 0, 1, 1)

  9. Example: Multiclass Perceptron (continued)
     Update 2: f(x_2) = (BIAS : 1, win : 1, game : 0, vote : 0, the : 1)
       Scores: w_sports · f(x_2) = -2, w_politics · f(x_2) = 3, w_tech · f(x_2) = 0
       Prediction: "politics" (correct), so no change
     Update 3: f(x_3) = (BIAS : 1, win : 1, game : 1, vote : 0, the : 1)
       Scores: w_sports · f(x_3) = -2, w_politics · f(x_3) = 3, w_tech · f(x_3) = 0
       Prediction: "politics" (wrong, y_3 is "sports"), so w_politics = w_politics - f(x_3) and w_sports = w_sports + f(x_3)
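A short script that replays these three examples and reproduces the final weights shown on the next slide; the bag-of-words featurizer is a straightforward reading of the slide's feature tables:

VOCAB = ["win", "game", "vote", "the"]

def f(sentence):
    # BIAS plus one indicator feature per vocabulary word
    words = sentence.split()
    return [1] + [1 if v in words else 0 for v in VOCAB]

classes = ["sports", "politics", "tech"]
W = {"sports":   [1, 0, 0, 0, 0],   # initial weights from the slide
     "politics": [0, 0, 0, 0, 0],
     "tech":     [0, 0, 0, 0, 0]}

data = [("politics", "win the vote"),
        ("politics", "win the election"),
        ("sports",   "win the game")]

for y_star, x in data:
    f_x = f(x)
    scores = {c: sum(wi * fi for wi, fi in zip(W[c], f_x)) for c in classes}
    y_hat = max(classes, key=scores.get)   # highest score wins
    if y_hat != y_star:                    # wrong: update both classes
        W[y_hat]  = [wi - fi for wi, fi in zip(W[y_hat], f_x)]
        W[y_star] = [wi + fi for wi, fi in zip(W[y_star], f_x)]

print(W)  # sports: [1, 0, 1, -1, 0], politics: [0, 0, -1, 1, 0], tech: [0, 0, 0, 0, 0]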

  10. Example: Multiclass Perceptron (result)
      Final weights after the three examples:
        w_sports   : BIAS : 1, win : 0, game : 1,  vote : -1, the : 0, ...
        w_politics : BIAS : 0, win : 0, game : -1, vote : 1,  the : 0, ...
        w_tech     : BIAS : 0, win : 0, game : 0,  vote : 0,  the : 0, ...

      Properties of Perceptrons
      • Separability: true if there exist weights w that get the training set perfectly correct
      • Convergence: if the training data are separable, a perceptron will eventually converge (binary case)
      • Mistake Bound: in the binary case, the maximum number of mistakes (updates) is related to the number of features k and the margin δ, the degree of separability: mistakes < k/δ²
      [Figures: a separable and a non-separable set of points, with the margin δ marked]

  11. Problems with the Perceptron
      • Noise: if the data isn't separable, the weights might thrash
        • Averaging weight vectors over time can help (averaged perceptron; see the sketch after this list)
      • Mediocre generalization: finds a "barely" separating solution
      • Overtraining: test / held-out accuracy usually rises, then falls
        • Overtraining is a kind of overfitting

      Improving the Perceptron
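A sketch of the averaged perceptron mentioned above: the mistake-driven update is unchanged, but the returned weights are the average of the weight vector over all examples seen, which damps thrashing. This running-sum formulation is one common variant, an assumption rather than anything specified on the slide:

import numpy as np

def train_averaged_perceptron(X, y, passes=10):
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    for _ in range(passes):
        for f_x, y_star in zip(X, y):
            y_hat = 1 if w @ f_x >= 0 else -1
            if y_hat != y_star:
                w = w + y_star * f_x      # ordinary perceptron update
            w_sum += w                    # accumulate after every example
    return w_sum / (passes * len(X))      # averaged weight vector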

  12. Non-Separable Case: Deterministic Decision
      Even the best linear boundary makes at least one mistake.
      Non-Separable Case: Probabilistic Decision
      [Figure: the same data with probability contours 0.9 | 0.1, 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7, 0.1 | 0.9 instead of a hard boundary]

  13. How to get probabilistic decisions?
      • Perceptron scoring: z = w · f(x)
      • If z = w · f(x) is very positive, we want a probability going to 1
      • If z = w · f(x) is very negative, we want a probability going to 0
      • Sigmoid function: φ(z) = 1 / (1 + e^(-z))

      Best w?
      • Maximum likelihood estimation:
          max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
        with:
          P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^(-w · f(x^(i))))
          P(y^(i) = -1 | x^(i); w) = 1 - 1 / (1 + e^(-w · f(x^(i))))
      • This is called Logistic Regression
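A minimal sketch of these formulas, assuming numpy arrays and labels in {+1, -1}. It uses the identity 1 - φ(z) = φ(-z), so both label cases collapse to φ(y · w · f(x)):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # ll(w) = Σ_i log P(y^(i) | x^(i); w), using P(y | x; w) = φ(y · w · f(x))
    return np.sum(np.log(sigmoid(y * (X @ w))))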

  14. Separable Case: Deterministic Decision – Many Options
      Separable Case: Probabilistic Decision – Clear Preference
      [Figures: two separable datasets; probability contours 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7 show that one boundary is clearly preferred]

  15. Multiclass Logistic Regression
      • Recall Perceptron:
        • A weight vector for each class: w_y
        • Score (activation) of a class y: w_y · f(x)
        • Prediction: the highest score wins
      • How to turn scores into probabilities? Use the softmax:
          z_1, z_2, z_3 → e^(z_1) / (e^(z_1) + e^(z_2) + e^(z_3)), e^(z_2) / (e^(z_1) + e^(z_2) + e^(z_3)), e^(z_3) / (e^(z_1) + e^(z_2) + e^(z_3))
        (original activations → softmax activations)

      Best w?
      • Maximum likelihood estimation:
          max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
        with:
          P(y^(i) | x^(i); w) = e^(w_{y^(i)} · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))
      • This is called Multi-Class Logistic Regression
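A minimal sketch of the softmax and this log-likelihood; the max-subtraction is a standard numerical-stability trick, not something on the slide:

import numpy as np

def softmax(z):
    z = z - np.max(z)     # softmax is shift-invariant; avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

def log_likelihood(W, X, y):
    # ll(W) = Σ_i log P(y^(i) | x^(i); W), with P given by softmax of the scores
    total = 0.0
    for f_x, y_i in zip(X, y):
        total += np.log(softmax(W @ f_x)[y_i])   # scores: one w_y · f(x) per class
    return total

print(softmax(np.array([1.0, 2.0, 3.0])))  # ≈ [0.09, 0.245, 0.665]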

  16. Next Lecture
      • Optimization
      • i.e., how do we solve:
          max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
