BBM406 Fundamentals of Machine Learning
Lecture 9: Logistic Regression, Discriminative vs. Generative Classification
Aykut Erdem // Hacettepe University // Fall 2019
Illustration: Theodore Modis
BBM406 Fundamentals of Machine Learning Lecture 9: Logistic - - PowerPoint PPT Presentation
Illustration: Theodore Modis BBM406 Fundamentals of Machine Learning Lecture 9: Logistic Regression Discriminative vs. Generative Classification Aykut Erdem // Hacettepe University // Fall 2019 Last time Nave Bayes Classifier Given
Aykut Erdem // Hacettepe University // Fall 2019
Lecture 9:
Logistic Regression Discriminative vs. Generative Classification
Illustration: Theodore Modis
Last time… Naïve Bayes Classifier
Given:
– Class prior P(Y)
– d conditionally independent features X1, …, Xd given the class label Y
– For each feature Xi, we have the conditional likelihood P(Xi|Y)

Naïve Bayes decision rule:
y* = argmax_y P(Y = y) ∏_{i=1}^{d} P(Xi = xi | Y = y)

slide by Barnabás Póczos & Aarti Singh
Last time… Naïve Bayes Algorithm for discrete features

We need to estimate these probabilities!

Estimators:
– For the class prior: P̂(Y = y) = (# training examples with label y) / (total # training examples)
– For the likelihood: P̂(Xi = x | Y = y) = (# examples with Xi = x and label y) / (# examples with label y)

NB prediction for test data:
y* = argmax_y P̂(Y = y) ∏_{i=1}^{d} P̂(Xi = xi | Y = y)

slide by Barnabás Póczos & Aarti Singh
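To make the counting estimators and the decision rule concrete, here is a minimal Python sketch (my own illustration, not from the slides; all function and variable names are made up for this example):

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate the class prior P(Y) and the per-feature likelihoods P(Xi|Y) by counting."""
    n = len(y)
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    counts = {c: defaultdict(Counter) for c in prior}          # counts[c][i][v]
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[c][i][v] += 1
    n_per_class = Counter(y)
    likelihood = {c: {i: {v: cnt / n_per_class[c] for v, cnt in counts[c][i].items()}
                      for i in counts[c]}
                  for c in counts}
    return prior, likelihood

def predict_nb(xs, prior, likelihood):
    """NB decision rule: argmax_y P(y) * prod_i P(Xi = xi | y)."""
    def score(c):
        return prior[c] * math.prod(likelihood[c][i].get(v, 1e-9)   # tiny value for unseen feature values
                                    for i, v in enumerate(xs))
    return max(prior, key=score)
```

In practice one would also smooth the counts (e.g. Laplace smoothing) so that unseen feature values do not zero out the product.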
Last time… Bag of words model

How to represent a text document?
[Figure: a MEDLINE article to be assigned a category from the MeSH Subject Category Hierarchy]

slide by Dan Jurafsky
Typical additional assumption: position in the document doesn't matter: P(Xi = xi|Y = y) = P(Xk = xi|Y = y)
– "Bag of words" model – order of words on the page is ignored
– The document is just a bag of words: i.i.d. words
– Sounds really silly, but often works very well!

The probability of a document with words x1, x2, … given class y is ∏_i P(xi|Y = y)
→ K(50,000 − 1) parameters to estimate (for a 50,000-word vocabulary and K classes)
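To illustrate the bag-of-words representation (a sketch of my own, not from the slides), a document is reduced to its word counts, so word order is discarded:

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Represent a document only by how often each word occurs; word order is ignored."""
    return Counter(document.lower().split())

# Two documents with the same words in a different order get the same representation.
print(bag_of_words("the cat sat on the mat") == bag_of_words("on the mat the cat sat"))  # True
```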
Last time… What if features are continuous?

Gaussian Naïve Bayes (GNB): P(Xi = x | Y = yk) = N(µik, σik)
Different mean and variance for each class k and each pixel i (e.g., character recognition, where Xi is the intensity of pixel i). Sometimes the variance is assumed to be independent of the class k, of the feature i, or of both.
slide by Barnabás Póczos & Aarti Singh

Estimating parameters: Y discrete, Xi continuous
Maximum likelihood estimates:
– mean µ̂ik: the sample mean of Xi over the training examples of class k
– variance σ̂²ik: the sample variance of Xi over the training examples of class k
slide by Barnabás Póczos & Aarti Singh
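A minimal sketch of these maximum likelihood estimates in code (my own illustration, not from the slides; names are made up):

```python
import numpy as np

def fit_gnb(X, y):
    """MLE for Gaussian Naive Bayes: class prior plus a per-class, per-feature mean and variance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    prior = {c: float(np.mean(y == c)) for c in classes}
    mu = {c: X[y == c].mean(axis=0) for c in classes}    # mu_{i,c}: sample mean per feature
    var = {c: X[y == c].var(axis=0) for c in classes}    # sigma^2_{i,c}: sample variance per feature
    return prior, mu, var

def gnb_log_posterior(x, prior, mu, var):
    """Unnormalized log P(Y = c | x) = log P(c) + sum_i log N(x_i; mu_ic, sigma^2_ic)."""
    return {c: np.log(prior[c])
               - 0.5 * np.sum(np.log(2 * np.pi * var[c]) + (x - mu[c]) ** 2 / var[c])
            for c in prior}
```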
Gaussian class conditional densities
[Figure: one-dimensional Gaussian class conditional densities for the two classes; the decision boundary lies where the class posteriors are equal, i.e. P(Y = 0|X) = P(Y = 1|X)]
slide by Aarti Singh & Barnabás Póczos
GNB with equal variance is a Linear Classifier!

Decision boundary:
∏_{i=1}^{d} P(Xi|Y = 0) P(Y = 0) = ∏_{i=1}^{d} P(Xi|Y = 1) P(Y = 1)
slide by Aarti Singh & Barnabás Póczos
GNB with equal variance is a Linear Classifier!

Decision boundary:
∏_{i=1}^{d} P(Xi|Y = 0) P(Y = 0) = ∏_{i=1}^{d} P(Xi|Y = 1) P(Y = 1)

Taking the log of the ratio of the two sides (with π = P(Y = 1)):
log [ P(Y = 0) ∏_{i=1}^{d} P(Xi|Y = 0) / ( P(Y = 1) ∏_{i=1}^{d} P(Xi|Y = 1) ) ]
  = log((1 − π)/π) + Σ_{i=1}^{d} log( P(Xi|Y = 0) / P(Xi|Y = 1) )
slide by Aarti Singh & Barnabás Póczos
Decision boundary: the log ratio reduces to a constant term plus a first-order (linear) term in the features, i.e. w0 + Σ_{i=1}^{d} wi Xi = 0.
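Spelling out the step this slide annotates (a sketch under the slide's assumption of Gaussian class conditionals with class-independent variances σi²; the symbols wi and w0 are introduced here only to name the two kinds of terms):

```latex
% For each feature, the quadratic X_i^2 terms cancel, leaving a first-order term:
\log\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)}
  = \frac{(X_i-\mu_{i,1})^2 - (X_i-\mu_{i,0})^2}{2\sigma_i^2}
  = \underbrace{\frac{\mu_{i,0}-\mu_{i,1}}{\sigma_i^2}}_{w_i}\,X_i
  + \frac{\mu_{i,1}^2-\mu_{i,0}^2}{2\sigma_i^2}

% Substituting into the decision boundary from the previous slide gives a linear equation:
w_0 + \sum_{i=1}^{d} w_i X_i = 0,
\qquad
w_0 = \log\frac{1-\pi}{\pi} + \sum_{i=1}^{d}\frac{\mu_{i,1}^2-\mu_{i,0}^2}{2\sigma_i^2}
```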
Decision Boundary
X = (x1, x2)
P1 = P(Y = 0), P2 = P(Y = 1)
p1(X) = p(X|Y = 0) ∼ N(M1, Σ1)
p2(X) = p(X|Y = 1) ∼ N(M2, Σ2)
slide by Aarti Singh & Barnabás Póczos
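As an illustrative sketch of this two-class setup (my own, not from the slides; the particular means and covariances below are arbitrary), the decision boundary can be evaluated numerically as the set where the two class posteriors are equal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Class priors and Gaussian class conditionals, mirroring the notation on the slide.
P1, P2 = 0.5, 0.5
p1 = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])   # p(X|Y=0) ~ N(M1, Sigma1)
p2 = multivariate_normal(mean=[2.0, 1.0], cov=[[1.5, 0.3], [0.3, 0.8]])   # p(X|Y=1) ~ N(M2, Sigma2)

def discriminant(X):
    """Positive where class 0 is more probable, negative where class 1 is; zero on the boundary."""
    return np.log(P1) + p1.logpdf(X) - np.log(P2) - p2.logpdf(X)

print(discriminant(np.array([1.0, 0.5])))
```

With Σ1 ≠ Σ2 this boundary is quadratic in X; with equal covariances it reduces to the linear boundary discussed above.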
Generative vs. Discriminative Classifiers

Generative classifiers (e.g. Naïve Bayes): assume some functional form for P(X|Y) and P(Y), estimate their parameters from training data, and use Bayes rule to compute P(Y|X).

Discriminative classifiers (e.g. Logistic Regression): why not learn P(Y|X), or the decision boundary, directly? Assume some functional form for P(Y|X) or for the decision boundary and estimate its parameters directly from training data.
slide by Aarti Singh & Barnabás Póczos
Logistic Regression

Assumes the following functional form for P(Y|X):
P(Y = 0|X) = 1 / (1 + exp(w0 + Σ_{i=1}^{d} wi Xi))
Logistic function applied to a linear function of the data.

Logistic function (or Sigmoid): logit(z) = 1 / (1 + exp(−z))
[Figure: the sigmoid curve logit(z) as a function of z]

Features can be discrete or continuous!
slide by Aarti Singh & Barnabás Póczos
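A minimal sketch of this functional form (my own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def lr_posterior(x, w0, w):
    """P(Y=1|X) under logistic regression: the sigmoid of a linear function of the data."""
    z = w0 + np.dot(w, x)
    return sigmoid(z)            # and P(Y=0|X) = 1 - sigmoid(z) = 1 / (1 + exp(z))
```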
Logistic Regression is a Linear Classifier!

Assumes the following functional form for P(Y|X):
P(Y = 0|X) = 1 / (1 + exp(w0 + Σ_{i=1}^{d} wi Xi))

Decision boundary: predict Y = 0 if P(Y = 0|X) > P(Y = 1|X), i.e. if w0 + Σ_{i=1}^{d} wi Xi < 0
(Linear Decision Boundary)
slide by Aarti Singh & Barnabás Póczos
Logistic regression in the more general case, where Y ∈ {y1, …, yK}:

for k < K:
P(Y = yk|X) = exp(wk0 + Σ_{i=1}^{d} wki Xi) / (1 + Σ_{j=1}^{K−1} exp(wj0 + Σ_{i=1}^{d} wji Xi))

for k = K (normalization, so no weights for this class):
P(Y = yK|X) = 1 / (1 + Σ_{j=1}^{K−1} exp(wj0 + Σ_{i=1}^{d} wji Xi))
slide by Aarti Singh & Barnabás Póczos
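A sketch of this K-class form (my own illustration, not from the slides), where class K is the reference class and carries no weights:

```python
import numpy as np

def multiclass_lr_probs(x, W, b):
    """P(Y = y_k | X) for k = 1..K, with class K as the reference class (no weights).

    W: (K-1, d) weight matrix, b: (K-1,) intercepts, x: (d,) feature vector.
    """
    scores = np.exp(b + W @ x)                       # exp(w_k0 + sum_i w_ki x_i) for k < K
    denom = 1.0 + scores.sum()                       # the "1" accounts for the reference class
    return np.append(scores / denom, 1.0 / denom)    # last entry is P(Y = y_K | X)
```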
We'll focus on binary classification.

How to learn the parameters w0, w1, …, wd?
Training data: {(X^l, Y^l)}, l = 1, …, n
Maximum Likelihood Estimates:
ŵ_MLE = argmax_w ∏_{l=1}^{n} P(X^l, Y^l | w)

But there is a problem… we don't have a model for P(X) or P(X|Y), only for P(Y|X).
slide by Aarti Singh & Barnabás Póczos
How to learn the parameters w0, w1, …, wd?
Training data: {(X^l, Y^l)}, l = 1, …, n
Maximum (Conditional) Likelihood Estimates:
ŵ_MCLE = argmax_w ∏_{l=1}^{n} P(Y^l | X^l, w)

Discriminative philosophy – don't waste effort learning P(X); focus on P(Y|X), which is all that matters for classification!
slide by Aarti Singh & Barnabás Póczos
Expressing Conditional log Likelihood

l(W) = Σ_l [ Y^l ln P(Y^l = 1|X^l, W) + (1 − Y^l) ln P(Y^l = 0|X^l, W) ]

where
P(Y = 0|X) = 1 / (1 + exp(w0 + Σ_{i=1}^{d} wi Xi))
P(Y = 1|X) = exp(w0 + Σ_{i=1}^{d} wi Xi) / (1 + exp(w0 + Σ_{i=1}^{d} wi Xi))
slide by Aarti Singh & Barnabás Póczos

Y can take only the values 0 or 1, so only one of the two terms in the expression is non-zero for any given Y^l. Using the model above, we can re-express the log of the conditional likelihood:
l(W) = Σ_l [ Y^l ln P(Y^l = 1|X^l, W) + (1 − Y^l) ln P(Y^l = 0|X^l, W) ]

     = Σ_l [ Y^l ln ( P(Y^l = 1|X^l, W) / P(Y^l = 0|X^l, W) ) + ln P(Y^l = 0|X^l, W) ]

     = Σ_l [ Y^l (w0 + Σ_{i=1}^{d} wi X^l_i) − ln(1 + exp(w0 + Σ_{i=1}^{d} wi X^l_i)) ]

slide by Aarti Singh & Barnabás Póczos
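The final expression is easy to evaluate directly; here is a small sketch (my own, not from the slides):

```python
import numpy as np

def conditional_log_likelihood(w0, w, X, Y):
    """l(W) = sum_l [ Y^l (w0 + w.x^l) - ln(1 + exp(w0 + w.x^l)) ].

    X: (n, d) feature matrix, Y: (n,) labels in {0, 1}.
    """
    z = w0 + X @ w
    # np.logaddexp(0, z) computes ln(1 + exp(z)) in a numerically stable way.
    return np.sum(Y * z - np.logaddexp(0, z))
```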
Maximizing Conditional log Likelihood
Bad news: no closed-form solution to maximize l(W).
Good news: l(W) is a concave function of W! Concave functions are easy to optimize (unique maximum).
slide by Aarti Singh & Barnabás Póczos
Optimizing concave/convex functions
Gradient Ascent (concave) / Gradient Descent (convex)

Gradient: ∇_W l(W) = [∂l(W)/∂w0, …, ∂l(W)/∂wd]
Learning rate: η > 0
Update rule: W^(t+1) ← W^(t) + η ∇_W l(W^(t)), i.e. wi^(t+1) ← wi^(t) + η ∂l(W)/∂wi
slide by Aarti Singh & Barnabás Póczos
Gradient Ascent for Logistic Regression
Gradient ascent algorithm: iterate until change < ε

w0^(t+1) ← w0^(t) + η Σ_l [ Y^l − P̂(Y^l = 1|X^l, W^(t)) ]
For i = 1, …, d:
wi^(t+1) ← wi^(t) + η Σ_l X^l_i [ Y^l − P̂(Y^l = 1|X^l, W^(t)) ]
repeat

P̂(Y^l = 1|X^l, W) is the prediction of what the current weights think label Y should be.

More sophisticated optimizers can also be used, e.g. Newton's method, conjugate gradient ascent, IRLS (see Bishop 4.3.3).
slide by Aarti Singh & Barnabás Póczos

Effect of the step size η:
Large η → fast convergence but larger residual error; also possible oscillations.
Small η → slow convergence but small residual error.
slide by Aarti Singh & Barnabás Póczos
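Putting the update rules together, a minimal gradient ascent sketch (my own illustration, not from the slides; the step size, tolerance, and names are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, eta=0.1, eps=1e-6, max_iters=10000):
    """Gradient ascent on the conditional log likelihood l(W).

    X: (n, d) feature matrix, Y: (n,) labels in {0, 1}. Stops when the change in weights is < eps.
    """
    n, d = X.shape
    w0, w = 0.0, np.zeros(d)
    for _ in range(max_iters):
        p1 = sigmoid(w0 + X @ w)          # what the current weights think P(Y=1|X) is
        residual = Y - p1
        new_w0 = w0 + eta * residual.sum()
        new_w = w + eta * (X.T @ residual)
        change = abs(new_w0 - w0) + np.abs(new_w - w).sum()
        w0, w = new_w0, new_w
        if change < eps:
            break
    return w0, w
```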
Naïve Bayes vs. Logistic Regression

Set of Gaussian Naïve Bayes parameters (feature variance independent of class label) ↔ Set of Logistic Regression parameters
− The representation is equivalent, but only in a special case!!! (GNB with class-independent variances)
− They optimize different functions! They obtain different solutions.
slide by Aarti Singh & Barnabás Póczos
Naïve Bayes vs. Logistic Regression
Consider Y Boolean, Xi continuous, X = <X1 … Xd>

Number of parameters:
− NB: π, (µ1,y, µ2,y, …, µd,y), (σ²1,y, σ²2,y, …, σ²d,y) for each class y: 4d + 1 parameters
− LR: w0, w1, …, wd: d + 1 parameters

Estimation method:
− NB parameter estimates are uncoupled (computed per class and per feature); LR parameter estimates are coupled (fit jointly by maximizing the conditional likelihood).
slide by Aarti Singh & Barnabás Póczos

Given infinite data (asymptotically):
− If the conditional independence assumption holds, discriminative and generative NB perform similarly.
− If the conditional independence assumption does NOT hold, discriminative outperforms generative NB.
[Ng & Jordan, NIPS 2001]
slide by Aarti Singh & Barnabás Póczos

Given finite data (n data points, d features):
Naïve Bayes (generative) requires n = O(log d) samples to converge to its asymptotic error, whereas Logistic Regression (discriminative) requires n = O(d).
Why? "Independent class conditional densities": the NB parameters are estimated independently, not jointly, from the training data.
Verdict
Both learn a linear decision boundary. Naïve Bayes makes more restrictive assumptions and has a higher asymptotic error, BUT it converges faster to its (less accurate) asymptotic error.
slide by Aarti Singh & Barnabás Póczos
Naïve Bayes vs. Logistic Regression
Experimental Comparison (Ng-Jordan’01)
[Figure: test error vs. training set size m for Naïve Bayes and Logistic Regression on UCI Machine Learning Repository datasets, e.g. pima (continuous), adult (continuous), and boston (predict if > median price, continuous); 15 datasets in total, 8 with continuous features and 7 with discrete features. More in the paper…]
slide by Aarti Singh & Barnabás Póczos
Summary

− Logistic Regression is a linear classifier: the decision rule is a hyperplane.
− There is no closed-form solution for its weights, but the conditional likelihood is concave → global optimum with gradient ascent.
− GNB with class-independent variances is representationally equivalent to LR.
− Their solutions differ because of the objective (loss) function.
− NB: features independent given class → an assumption on P(X|Y). LR: functional form of P(Y|X), no assumption on P(X|Y).
− GNB (usually) needs less data; LR (usually) gets to better solutions in the limit.
slide by Aarti Singh & Barnabás Póczos
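As a hedged, illustrative way to see these trade-offs (my own sketch on synthetic data, not the Ng-Jordan experiments), Gaussian Naïve Bayes and Logistic Regression from scikit-learn can be trained on growing subsets of the same data and compared on a held-out test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for m in [20, 100, 1000]:                                  # growing training set size
    nb = GaussianNB().fit(X_tr[:m], y_tr[:m])
    lr = LogisticRegression(max_iter=1000).fit(X_tr[:m], y_tr[:m])
    print(m, nb.score(X_te, y_te), lr.score(X_te, y_te))   # test accuracy of each model
```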