Lecture 9:
− Naïve Bayes Classifier (cont'd.)
− Logistic Regression
− Discriminative vs. Generative Classification
− Linear Discriminant Functions
Aykut Erdem
October 2016 Hacettepe University
Last time… Naïve Bayes Classifier
Given:
– Class prior P(Y)
– d conditionally independent features X1, …, Xd given the class label Y
– For each feature Xi, the conditional likelihood P(Xi | Y)

Naïve Bayes decision rule (discrete features):

y* = argmax_y P(Y = y) ∏_{i=1}^d P(Xi = xi | Y = y)

The NB prediction for test data plugs in the class prior and the likelihoods. We need to estimate these probabilities!

Estimators (relative frequencies from the training data):
– Class prior: P̂(Y = y) = #{examples with label y} / #{examples}
– Likelihood: P̂(Xi = x | Y = y) = #{examples with Xi = x and label y} / #{examples with label y}

slide by Barnabás Póczos & Aarti Singh
Last time… Naïve Bayes Algorithm for discrete features
slide by Barnabás Póczos & Aarti Singh

Last time… Text Classification

(figure: MeSH Subject Category Hierarchy; which category should a given MEDLINE article be assigned to?)

slide by Dan Jurafsky

Last time… Bag of words model
Typical additional assumption: position in the document doesn't matter: P(Xi = xi | Y = y) = P(Xk = xi | Y = y)
– "Bag of words" model: the order of words on the page is ignored. The document is just a bag of i.i.d. words.
– Sounds really silly, but often works very well!

The probability of a document with words x1, x2, … given class y factorizes as ∏_i P(xi | y). With a 50,000-word vocabulary, this leaves K(50000 − 1) parameters to estimate.

slide by Barnabás Póczos & Aarti Singh
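To make the bag-of-words counting concrete, here is a minimal Python sketch (an illustration, not from the original slides); the whitespace tokenizer is a simplifying assumption:

from collections import Counter

def bag_of_words(document):
    # Lowercase, split on whitespace, and count occurrences.
    # Word order is discarded; only the counts survive.
    return Counter(document.lower().split())

print(bag_of_words("to be or not to be"))
# Counter({'to': 2, 'be': 2, 'or': 1, 'not': 1})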
The bag of words representation

I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet.

The bag of words representation: using a subset of words

(figure: the same review with everything masked out except a subset of words: love, sweet, satirical, great, fun, whimsical, romantic, laughing, recommend, several, happy, again)

The bag of words representation

great      2
love       2
recommend  1
laugh      1
happy      1
…          …

slide by Dan Jurafsky
A worked example:

Doc  Words                                Class
Training:
 1   Chinese Beijing Chinese              c
 2   Chinese Chinese Shanghai             c
 3   Chinese Macao                        c
 4   Tokyo Japan Chinese                  j
Test:
 5   Chinese Chinese Chinese Tokyo Japan  ?

Priors: P(c) = 3/4, P(j) = 1/4

Smoothed estimates (add-one):

P̂(c) = Nc / N
P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|)

Conditional probabilities (|V| = 6):
P(Chinese | c) = (5 + 1) / (8 + 6) = 6/14 = 3/7
P(Tokyo | c)   = (0 + 1) / (8 + 6) = 1/14
P(Japan | c)   = (0 + 1) / (8 + 6) = 1/14
P(Chinese | j) = (1 + 1) / (3 + 6) = 2/9
P(Tokyo | j)   = (1 + 1) / (3 + 6) = 2/9
P(Japan | j)   = (1 + 1) / (3 + 6) = 2/9

Choosing a class:
P(c | d5) ∝ 3/4 × (3/7)³ × 1/14 × 1/14 ≈ 0.0003
P(j | d5) ∝ 1/4 × (2/9)³ × 2/9 × 2/9 ≈ 0.0001

Since 0.0003 > 0.0001, class c is chosen.

slide by Dan Jurafsky
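The arithmetic above can be checked with a short Python sketch (an illustration, not part of the original slides) that implements the multinomial Naïve Bayes estimates with add-one smoothing:

from collections import Counter

# Training documents and labels from the table above
train = [("Chinese Beijing Chinese", "c"), ("Chinese Chinese Shanghai", "c"),
         ("Chinese Macao", "c"), ("Tokyo Japan Chinese", "j")]
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

labels = [y for _, y in train]
prior = {y: labels.count(y) / len(labels) for y in set(labels)}  # P(c)=3/4, P(j)=1/4

counts = {y: Counter() for y in prior}              # per-class word counts
for doc, y in train:
    counts[y].update(doc.split())
V = {w for doc, _ in train for w in doc.split()}    # vocabulary, |V| = 6

def likelihood(w, y):
    # Add-one smoothing: (count(w,c) + 1) / (count(c) + |V|)
    return (counts[y][w] + 1) / (sum(counts[y].values()) + len(V))

for y in sorted(prior):
    score = prior[y]
    for w in test_doc:
        score *= likelihood(w, y)
    print(y, round(score, 5))   # c: ~0.0003, j: ~0.0001 -> choose c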
Twenty Newsgroups results

Naïve Bayes: 89% accuracy

slide by Barnabás Póczos & Aarti Singh
What if features are continuous? (e.g., image pixel intensities)

Gaussian Naïve Bayes (GNB):

P(Xi = x | Y = k) = (1 / √(2π σik²)) exp(−(x − μik)² / (2σik²))

Different mean μik and variance σik² for each class k and each pixel i. Sometimes the variance is assumed to be independent of the class k, of the pixel i, or of both.

Estimating parameters: Y discrete, Xi continuous. Maximum likelihood estimates:

μ̂ik = (1 / #{j : yʲ = k}) ∑_{j : yʲ = k} xiʲ

σ̂ik² = (1 / #{j : yʲ = k}) ∑_{j : yʲ = k} (xiʲ − μ̂ik)²

where j indexes training images, xiʲ is the ith pixel in the jth training image, and k indexes classes.

slide by Barnabás Póczos & Aarti Singh
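A minimal Gaussian Naïve Bayes sketch of these MLE formulas (the synthetic data are an assumption for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),    # class 0 samples
               rng.normal(2.0, 1.0, (100, 5))])   # class 1 samples
y = np.array([0] * 100 + [1] * 100)

classes = np.unique(y)
prior = {k: np.mean(y == k) for k in classes}        # class priors
mu = {k: X[y == k].mean(axis=0) for k in classes}    # per-class, per-feature means
var = {k: X[y == k].var(axis=0) for k in classes}    # per-class, per-feature variances

def log_posterior(x, k):
    # log P(Y=k) + sum_i log N(x_i; mu_ik, sigma_ik^2), up to a constant
    return np.log(prior[k]) - 0.5 * np.sum(np.log(2 * np.pi * var[k])
                                           + (x - mu[k]) ** 2 / var[k])

x_new = np.full(5, 1.5)
print(max(classes, key=lambda k: log_posterior(x_new, k)))   # predicts class 1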
Case Study: Classifying Mental States [Mitchell et al.]

Example: GNB for classifying mental states from fMRI images:
– ~1 mm resolution, ~2 images per sec., 15,000 voxels/image
– non-invasive, safe
– measures the Blood Oxygen Level Dependent (BOLD) response
– can track activation with precision and sensitivity

slide by Barnabás Póczos & Aarti Singh
Learned Naïve Bayes Models – Means for P(BrainActivity | WordCategory)

Pairwise classification accuracy: 78-99%, 12 participants

(figure: mean activation maps for "Tool words" vs. "Building words") [Mitchell et al.]

slide by Barnabás Póczos & Aarti Singh
What you should know…
– Naïve Bayes classifier
– Text classification
– Gaussian NB

Logistic Regression
Last time… Naïve Bayes

Gaussian Naïve Bayes (GNB): Gaussian class conditional densities. What does the decision boundary look like?

slide by Aarti Singh & Barnabás Póczos
GNB with equal variance is a Linear Classifier!

Decision boundary:

∏_{i=1}^d P(Xi | Y = 0) P(Y = 0) = ∏_{i=1}^d P(Xi | Y = 1) P(Y = 1)

Taking logs, with π = P(Y = 1):

log [ P(Y = 0) ∏_{i=1}^d P(Xi | Y = 0) ] / [ P(Y = 1) ∏_{i=1}^d P(Xi | Y = 1) ]
  = log((1 − π)/π) + ∑_{i=1}^d log [ P(Xi | Y = 0) / P(Xi | Y = 1) ] = 0

Substituting the Gaussian class conditionals with equal (class-independent) variances, the quadratic terms in Xi cancel, and the boundary reduces to a constant term plus a first-order term: a linear function of X.

slide by Aarti Singh & Barnabás Póczos
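The cancellation can be verified numerically; the sketch below (with made-up parameters) compares the GNB log-odds computed directly against the linear form w0 + wᵀx:

import numpy as np

mu0 = np.array([0.0, 1.0, -1.0])     # class-0 means (illustrative)
mu1 = np.array([2.0, 0.0, 1.0])      # class-1 means
sigma2 = np.array([1.0, 2.0, 0.5])   # shared, class-independent variances
pi = 0.4                              # pi = P(Y = 1)

# Expanding the Gaussians, the x^2 terms cancel, leaving an affine function:
w = (mu1 - mu0) / sigma2
w0 = np.log(pi / (1 - pi)) + np.sum((mu0 ** 2 - mu1 ** 2) / (2 * sigma2))

def log_odds(x):
    # log [P(Y=1)p(x|Y=1)] - log [P(Y=0)p(x|Y=0)]; normalizers cancel
    ll = lambda mu: -np.sum((x - mu) ** 2 / (2 * sigma2))
    return np.log(pi) + ll(mu1) - np.log(1 - pi) - ll(mu0)

x = np.array([0.3, -1.2, 2.0])
print(log_odds(x), w0 + w @ x)   # the two values agree: the boundary is linear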
Gaussian Naïve Bayes (GNB) Decision Boundary

X = (x1, x2), P1 = P(Y = 0), P2 = P(Y = 1)
p1(X) = p(X | Y = 0) ∼ N(M1, Σ1)
p2(X) = p(X | Y = 1) ∼ N(M2, Σ2)

slide by Aarti Singh & Barnabás Póczos
Generative vs. Discriminative Classifiers

Generative classifiers (e.g., Naïve Bayes) model P(Y) and P(X | Y), then use Bayes rule to obtain P(Y | X). Discriminative classifiers ask: why not estimate P(Y | X), or the decision boundary, directly?
Logistic Regression

Assumes the following functional form for P(Y | X):

P(Y = 1 | X) = 1 / (1 + exp(−(w0 + ∑_i wi Xi)))

This is the logistic function (or sigmoid), logit(z) = 1 / (1 + e^(−z)), applied to a linear function of the data. (figure: the sigmoid curve, z vs. logit(z))

Features can be discrete or continuous!

slide by Aarti Singh & Barnabás Póczos
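In code, this functional form is one line (a sketch with made-up weights):

import numpy as np

def sigmoid(z):
    # logistic function: squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def p_y1_given_x(x, w, w0):
    # P(Y = 1 | X): sigmoid applied to a linear function of the data
    return sigmoid(w0 + w @ x)

w, w0 = np.array([1.5, -2.0]), 0.5                # illustrative parameters
print(p_y1_given_x(np.array([1.0, 0.5]), w, w0))  # sigmoid(1.0) ≈ 0.73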
Logistic Regression is a Linear Classifier!

Assuming the functional form for P(Y | X) above, the decision boundary follows from comparing the two posteriors. Predict Y = 0 iff

P(Y = 0 | X) > P(Y = 1 | X), i.e. 1 > exp(w0 + ∑_i wi Xi), i.e. 0 > w0 + ∑_i wi Xi

so the decision boundary w0 + ∑_i wi Xi = 0 is a hyperplane (Linear Decision Boundary).

slide by Aarti Singh & Barnabás Póczos
Logistic Regression for more than 2 classes

Y ∈ {y1, …, yK}. In the more general case,

P(Y = yk | X) = exp(wk0 + ∑_i wki Xi) / (1 + ∑_{j=1}^{K−1} exp(wj0 + ∑_i wji Xi))   for k < K

P(Y = yK | X) = 1 / (1 + ∑_{j=1}^{K−1} exp(wj0 + ∑_i wji Xi))   for k = K (normalization, so no weights for this class)

slide by Aarti Singh & Barnabás Póczos
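A sketch of this K-class form (weights are illustrative; class K gets no weights and serves as the reference for normalization):

import numpy as np

def multiclass_lr(x, W, b):
    # P(Y = y_k | X) for k < K uses exp(w_k0 + sum_i w_ki x_i);
    # class K takes whatever probability mass remains.
    scores = np.exp(W @ x + b)             # one score per class k < K
    Z = 1.0 + scores.sum()                 # normalization constant
    return np.append(scores / Z, 1.0 / Z)  # last entry is class K

W = np.array([[1.0, -0.5],
              [0.2, 0.3]])                 # K = 3 classes, d = 2 features
b = np.array([0.1, -0.2])
p = multiclass_lr(np.array([0.5, 1.0]), W, b)
print(p, p.sum())                          # a valid distribution, sums to 1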
Training Logistic Regression

We'll focus on binary classification:

P(Y = 0 | X, W) = 1 / (1 + exp(w0 + ∑_i wi Xi))
P(Y = 1 | X, W) = exp(w0 + ∑_i wi Xi) / (1 + exp(w0 + ∑_i wi Xi))

How to learn the parameters w0, w1, …, wd? Given training data, a first idea is Maximum Likelihood Estimation:

W_MLE = argmax_W ∏_l P(X^l, Y^l | W)

But there is a problem… we don't have a model for P(X) or P(X | Y), only for P(Y | X). So instead use Maximum (Conditional) Likelihood Estimation:

W_MCLE = argmax_W ∏_l P(Y^l | X^l, W)

Discriminative philosophy: don't waste effort learning P(X); focus on P(Y | X), which is all that matters for classification!

slide by Aarti Singh & Barnabás Póczos
Expressing Conditional log Likelihood

l(W) = ∑_l [ Y^l ln P(Y^l = 1 | X^l, W) + (1 − Y^l) ln P(Y^l = 0 | X^l, W) ]

Y can take only the values 0 or 1, so only one of the two terms in the expression is non-zero for any given Y^l.

Using

P(Y = 0 | X) = 1 / (1 + exp(w0 + ∑_{i=1}^n wi Xi))
P(Y = 1 | X) = exp(w0 + ∑_{i=1}^n wi Xi) / (1 + exp(w0 + ∑_{i=1}^n wi Xi))

we can re-express the log of the conditional likelihood:

l(W) = ∑_l [ Y^l ln ( P(Y^l = 1 | X^l, W) / P(Y^l = 0 | X^l, W) ) + ln P(Y^l = 0 | X^l, W) ]
     = ∑_l [ Y^l (w0 + ∑_{i=1}^n wi Xi^l) − ln (1 + exp(w0 + ∑_{i=1}^n wi Xi^l)) ]

slide by Aarti Singh & Barnabás Póczos
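The final expression is easy to compute; a minimal sketch (the toy data are an assumption):

import numpy as np

def cond_log_likelihood(X, y, w, w0):
    # l(W) = sum_l [ y^l (w0 + w.x^l) - ln(1 + exp(w0 + w.x^l)) ]
    z = w0 + X @ w
    return np.sum(y * z - np.log1p(np.exp(z)))

X = np.array([[0.5, 1.0], [-1.0, 0.2], [2.0, -0.3]])  # 3 examples, 2 features
y = np.array([1, 0, 1])
print(cond_log_likelihood(X, y, w=np.array([0.3, -0.1]), w0=0.0))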
Maximizing Conditional log Likelihood

Bad news: no closed-form solution to maximize l(W).
Good news: l(W) is a concave function of W! Concave functions are easy to optimize (unique maximum).

slide by Aarti Singh & Barnabás Póczos
Optimizing concave/convex functions

Gradient Ascent (concave) / Gradient Descent (convex)

Gradient: ∇_W l(W) = [∂l(W)/∂w0, …, ∂l(W)/∂wd]
Update rule: ΔW = η ∇_W l(W), i.e. wi ← wi + η ∂l(W)/∂wi
Learning rate η > 0.

slide by Aarti Singh & Barnabás Póczos
Gradient Ascent for Logistic Regression

Gradient ascent algorithm: iterate until change < ε:

w0 ← w0 + η ∑_l [ Y^l − P̂(Y^l = 1 | X^l, W) ]
For i = 1, …, d:  wi ← wi + η ∑_l Xi^l [ Y^l − P̂(Y^l = 1 | X^l, W) ]
repeat

The term P̂(Y^l = 1 | X^l, W) predicts what the current weights think the label Y should be.

For faster convergence: e.g., Newton's method, conjugate gradient ascent, IRLS (see Bishop 4.3.3).

slide by Aarti Singh & Barnabás Póczos
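Putting the update rule into a complete training loop (a sketch; data, learning rate, and stopping constants are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, eta=0.01, eps=1e-6, max_iters=10000):
    # Gradient ascent on l(W): w_i <- w_i + eta * sum_l x_i^l (y^l - P(Y=1|x^l))
    n, d = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])   # prepend constant 1 for the bias w0
    w = np.zeros(d + 1)
    for _ in range(max_iters):
        grad = Xa.T @ (y - sigmoid(Xa @ w))
        w_new = w + eta * grad
        if np.max(np.abs(w_new - w)) < eps:   # iterate until change < eps
            break
        w = w_new
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w = train_logistic(X, y)
preds = sigmoid(np.hstack([np.ones((100, 1)), X]) @ w) > 0.5
print(w, np.mean(preds == y))   # learned weights and training accuracy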
Effect of step-size η

Large η → fast convergence, but larger residual error; also possible oscillations.
Small η → slow convergence, but small residual error.

slide by Aarti Singh & Barnabás Póczos
Naïve Bayes vs. Logistic Regression

The set of Gaussian Naïve Bayes parameters (with feature variance independent of the class label) can be mapped to a set of Logistic Regression parameters.
– But only in a special case!!! (GNB with class-independent variances)
– In general the two optimize different functions and obtain different solutions.

slide by Aarti Singh & Barnabás Póczos
Consider Y Boolean, Xi continuous, X = <X1 … Xd>.

Number of parameters:
– NB: π, (μ1,y, μ2,y, …, μd,y), (σ²1,y, σ²2,y, …, σ²d,y): 4d + 1 in total
– LR: w0, w1, …, wd: d + 1 in total

Estimation method:
– NB: parameter estimates are uncoupled (closed form)
– LR: parameter estimates are coupled (iterative optimization)

slide by Aarti Singh & Barnabás Póczos
Generative vs. Discriminative

Given infinite data (asymptotically):
– If the conditional independence assumption holds, discriminative and generative NB perform similarly.
– If the conditional independence assumption does NOT hold, discriminative (logistic regression) outperforms generative NB.

[Ng & Jordan, NIPS 2001]

Given finite data (n data points, d features):

Naïve Bayes (generative) requires n = O(log d) to converge to its asymptotic error, whereas logistic regression (discriminative) requires n = O(d).

Why? "Independent class conditional densities": the parameters for each feature can be estimated independently, not jointly, from the training data.

slide by Aarti Singh & Barnabás Póczos
Naïve Bayes vs. Logistic Regression: Verdict

Both learn a linear decision boundary. Naïve Bayes makes more restrictive assumptions and has a higher asymptotic error, BUT it converges faster to its (less accurate) asymptotic error.
Experimental Comparison (Ng-Jordan '01)

UCI Machine Learning Repository: 15 datasets, 8 continuous features, 7 discrete features. (figures: test error vs. number of training examples m for Naïve Bayes and Logistic Regression on pima, adult, and boston (predict if > median price); more in the paper…)

slide by Aarti Singh & Barnabás Póczos
What you should know

– LR is a linear classifier: the decision rule is a hyperplane.
– LR is trained by maximizing the conditional likelihood: no closed-form solution, but l(W) is concave → global optimum with gradient ascent.
– GNB with class-independent variances is representationally equivalent to LR; the solutions differ because of the objective (loss) function.
– NB: features independent given class → assumption on P(X|Y). LR: functional form of P(Y|X), no assumption on P(X|Y).
– GNB (usually) needs less data; LR (usually) gets to better solutions in the limit.

slide by Aarti Singh & Barnabás Póczos
Linear Discriminant Functions

Linear Discriminant Function:

y(x) = wᵀx + w0

where w is called the weight vector, and w0 is a bias. The classifier is

C(x) = sign(wᵀx + w0)

where the step function sign(·) is defined as

sign(a) = +1 if a > 0; −1 if a < 0

slide by Ce Liu
Properties of Linear Discriminant Functions

– The decision surface y(x) = 0 is perpendicular to w, and its displacement from the origin is controlled by the bias parameter w0: for any x on the surface, wᵀx/‖w‖ = −w0/‖w‖, so the distance from the origin to the decision surface is −w0/‖w‖.
– The perpendicular distance r of a general point x from the decision surface is given by y(x)/‖w‖.

To see this, write x = x⊥ + r w/‖w‖, where x⊥ is the projection of x onto the decision surface. Then

wᵀx = wᵀx⊥ + r wᵀw/‖w‖
wᵀx + w0 = wᵀx⊥ + w0 + r‖w‖
y(x) = r‖w‖   (since y(x⊥) = 0)
r = y(x)/‖w‖

With the augmented notation w̃ = (w0, w) and x̃ = (1, x), we can write ỹ(x) = w̃ᵀx̃.

slide by Ce Liu
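These properties are straightforward to check numerically (w and x below are made up for illustration):

import numpy as np

w, w0 = np.array([3.0, 4.0]), -5.0     # illustrative weight vector and bias

def y_lin(x):
    return w @ x + w0                  # y(x) = w^T x + w0

x = np.array([2.0, 2.0])
r = y_lin(x) / np.linalg.norm(w)       # signed perpendicular distance to surface
print(np.sign(y_lin(x)), r)            # y(x) = 9 > 0, so class +1; r = 9/5 = 1.8

# Augmented notation: w~ = (w0, w), x~ = (1, x), so y(x) = w~^T x~
w_aug, x_aug = np.append(w0, w), np.append(1.0, x)
print(w_aug @ x_aug)                   # same value: 9.0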
Multiple Classes: Simple Extension

– One-versus-the-rest: K − 1 classifiers, each separating points in class Ck from points not in Ck.
– One-versus-one: one binary classifier for every pair of classes.
– Both leave ambiguous regions of the input space. (figures: regions R1, R2, R3 with an ambiguous region "?", for the "C1 / not C1, C2 / not C2" case and for the pairwise "C1/C2, C1/C3, C2/C3" case)

slide by Ce Liu
Multiple Classes: K-Class Discriminant

Use K linear functions

yk(x) = wkᵀx + wk0

and classify according to

C(x) = k, if yk(x) > yj(x) ∀ j ≠ k

The decision boundary between classes Ck and Cj is given by yk(x) = yj(x), i.e. the hyperplane

(wk − wj)ᵀx + (wk0 − wj0) = 0

slide by Ce Liu
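A sketch of this K-class decision rule (parameters are illustrative):

import numpy as np

W = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, -1.0]])            # row k holds w_k (K = 3, d = 2)
w0 = np.array([0.0, 0.5, -0.5])        # biases w_k0

def classify(x):
    scores = W @ x + w0                # y_k(x) for every class at once
    return int(np.argmax(scores))      # C(x) = k iff y_k(x) > y_j(x), all j != k

print(classify(np.array([2.0, 0.0])))  # scores [2.0, -1.5, -0.5] -> class 0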
Property of the Decision Regions

Theorem: The decision regions of the K-class discriminant yk(x) = wkᵀx + wk0 are singly connected and convex.

Proof. Suppose two points xA and xB both lie inside decision region Rk. Any point x̂ on the line between xA and xB can be expressed as

x̂ = λxA + (1 − λ)xB,   0 ≤ λ ≤ 1

By linearity,

yk(x̂) = λyk(xA) + (1 − λ)yk(xB)
       > λyj(xA) + (1 − λ)yj(xB)   (∀ j ≠ k)
       = yj(x̂)   (∀ j ≠ k)

Therefore x̂ also lies in Rk, and hence the region Rk is singly connected and convex. (figure: regions Ri, Rj, Rk with points xA, xB, x̂)

In other words: if two points xA and xB both lie inside the same decision region Rk, then any point x̂ on the line connecting them must also lie in Rk.

slide by Ce Liu
Fisher's Linear Discriminant

A way to view a linear classification model is in terms of dimensionality reduction: project x down to one dimension,

y = wᵀx

choosing the direction w so that the classes can be maximally separated. (figures: projections onto the difference of the means vs. onto Fisher's linear discriminant)

Class means:

m1 = (1/N1) ∑_{n∈C1} xn,   m2 = (1/N2) ∑_{n∈C2} xn

slide by Ce Liu
What's a Good Projection?

Maximize the separation of the projected class means:

(wᵀ(m1 − m2))² = wᵀ(m1 − m2)(m1 − m2)ᵀw = wᵀSBw

where SB = (m1 − m2)(m1 − m2)ᵀ is called the between-class covariance matrix.

Minimize the within-class variance of the projected data, wᵀSWw, where

SW = ∑_{n∈C1} (xn − m1)(xn − m1)ᵀ + ∑_{n∈C2} (xn − m2)(xn − m2)ᵀ

slide by Ce Liu
Fisher's Linear Discriminant

J(w) = Between-class variance / Within-class variance = wᵀSBw / wᵀSWw

Maximize J(w) by setting its derivative to zero, using the quotient rule: for f(x) = g(x)/h(x),

f′(x) = (g′(x)h(x) − g(x)h′(x)) / h²(x)

This gives

(wᵀSBw) SWw = (wᵀSWw) SBw
(wᵀSBw) SWw = (wᵀSWw) (m2 − m1) ((m2 − m1)ᵀw)

We only care about the direction of w, so the scalar factors are dropped. Therefore

w ∝ SW⁻¹ (m2 − m1)

slide by Ce Liu
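The closed-form direction is two lines of NumPy; a sketch on synthetic two-class data (the data generation is an assumption):

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0, 0], [1.0, 0.3], (100, 2))   # class 1 samples
X2 = rng.normal([2, 1], [1.0, 0.3], (100, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter: sum of the two classes' centered outer products
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w = np.linalg.solve(Sw, m2 - m1)         # w ∝ Sw^{-1} (m2 - m1)
w /= np.linalg.norm(w)
print(w)
print((X1 @ w).mean(), (X2 @ w).mean())  # projected class means separate well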
From Fisher's Linear Discriminant to Classifiers

Fisher's linear discriminant reduces the classification problem to 1D: project onto w, then choose a threshold on the projected value (more elaborate thresholds lead to nonlinear classifiers). The final classifier has the form

y(x) = sign(wᵀx + w0)

where the nonlinear activation function sign(·) is a step function:

sign(a) = +1 if a > 0; −1 if a < 0

slide by Ce Liu