PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, LOGISTIC REGRESSION


SLIDE 1

PLUGIN CLASSIFIERS: NAIVE BAYES, LDA, LOGISTIC REGRESSION

Matthieu R Bloch
Tuesday, January 28, 2020

SLIDE 2

LOGISTICS

TAs and Office hours
  • Monday: Mehrdad (TSRB 523a), 2pm-3:15pm
  • Tuesday: TJ (VL C449 Cubicle D), 1:30pm-2:45pm
  • Wednesday: Matthieu (TSRB 423), 12:00pm-1:15pm
  • Thursday: Hossein (VL C449 Cubicle B), 10:45am-12:00pm
  • Friday: Brighton (TSRB 523a), 12pm-1:15pm

Homework 1 posted on Canvas
  • Due Wednesday January 29, 2020 (11:59PM EST) (Wednesday February 5, 2020 for DL)

Lecture notes updated
  • Versions 1.1 posted for lectures 1, 3, 4, 5 (small typos)

Logistics for homework submission
  • Upload a separate PDF file
  • Put problems in order
  • Show your work ("Similar to above, etc." does not show work)
  • Include a listing of code (example on Overleaf)

SLIDE 3

RECAP: NAIVE BAYES

Consider a (random) feature vector $x = [x_1, \cdots, x_d]^\intercal \in \mathbb{R}^d$ and the label $y$.

Naive assumption: given $y$, the features $\{x_i\}_{i=1}^d$ are independent, i.e., $P_{x|y} = \prod_{i=1}^d P_{x_i|y}$.

Main benefit: we only need the univariate densities $P_{x_i|y}$, and we can combine discrete and continuous features.

Procedure:
  • Estimate the a priori class probabilities $\pi_k$ for $0 \leq k \leq K-1$
  • Estimate the class conditional densities $P_{x_i|y}(\cdot|k)$ for $1 \leq i \leq d$ and $0 \leq k \leq K-1$

Lemma. The maximum likelihood estimate of $\pi_k$ is $\hat{\pi}_k = \frac{N_k}{N}$, where $N_k \triangleq |\{y_i : y_i = k\}|$.

What about $P_{x|y}$?
  • Continuous features: often Gaussian, use the ML estimate
  • Discrete (categorical) features: often multinomial, use the ML estimate
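The estimation procedure above takes only a few lines of NumPy. The sketch below (the function names are ours, not from the slides) assumes Gaussian class-conditional densities for every feature and works in log space to avoid underflow:

```python
import numpy as np

def fit_naive_bayes_gaussian(X, y, K):
    """ML estimates for a Gaussian naive Bayes model:
    priors pi_k = N_k / N and per-feature means/variances."""
    N, d = X.shape
    priors = np.array([np.sum(y == k) / N for k in range(K)])
    means = np.array([X[y == k].mean(axis=0) for k in range(K)])
    variances = np.array([X[y == k].var(axis=0) for k in range(K)])
    return priors, means, variances

def predict_naive_bayes_gaussian(X, priors, means, variances):
    # log p(x|k) = sum_j log N(x_j; mu_jk, var_jk), by the independence assumption
    log_post = np.log(priors)[None, :] + np.stack([
        -0.5 * np.sum(np.log(2 * np.pi * variances[k])
                      + (X - means[k]) ** 2 / variances[k], axis=1)
        for k in range(priors.size)], axis=1)
    return np.argmax(log_post, axis=1)
```

The same skeleton carries over to discrete features by swapping the per-feature Gaussian log-densities for count-based estimates.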


SLIDE 4

NAIVE BAYES (CT’D)

Assume the $j$th feature $x_j$ takes $J$ distinct values in $\{0, \ldots, J-1\}$.

Lemma. The maximum likelihood estimate of $P_{x_j|y}(\ell|k)$ is
$\hat{P}_{x_j|y}(\ell|k) = \frac{N^{(j)}_{\ell,k}}{N_k}$, where $N^{(j)}_{\ell,k} \triangleq |\{x : y = k \text{ and } x_j = \ell\}|$.

The naive Bayes estimator is
$h_{NB}(x) = \operatorname{argmax}_k \hat{\pi}_k \prod_{j=1}^d \hat{P}_{x_j|y}(x_j|k)$.

Naive Bayes can be completely wrong! Example: the bivariate Gaussian case.
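The count-based estimates from the lemma can be sketched as follows; this is an illustrative implementation (helper names are ours), assuming every feature takes values in $\{0, \ldots, J-1\}$:

```python
import numpy as np

def fit_categorical_nb(X, y, K, J):
    """ML estimate P_hat(l|k) = N^{(j)}_{l,k} / N_k for categorical
    features; X[i, j] is in {0, ..., J-1}."""
    priors = np.array([np.mean(y == k) for k in range(K)])
    d = X.shape[1]
    P = np.zeros((d, J, K))
    for k in range(K):
        Xk = X[y == k]
        for j in range(d):
            # N^{(j)}_{l,k}: number of class-k samples with x_j = l
            P[j, :, k] = np.bincount(Xk[:, j], minlength=J) / len(Xk)
    return priors, P

def predict_categorical_nb(X, priors, P):
    # h_NB(x) = argmax_k pi_k * prod_j P_hat(x_j | k), computed in log space
    N = X.shape[0]
    logp = np.log(priors)[None, :].repeat(N, axis=0)
    for j in range(P.shape[0]):
        logp += np.log(P[j, X[:, j], :])
    return np.argmax(logp, axis=1)
```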


SLIDE 5

NAIVE BAYES AND BAG OF WORDS

Classification of documents into categories (politics, sports, etc.).

Represent a document as a vector $x = [x_1, \cdots, x_d]$ with $x_j$ the number of occurrences of word $j$ in the document.

Model documents of length $n$ and assume the $n$ words are distributed among the $d$ words independently at random (multinomial distribution).

Estimate parameters:
  • Compute the ML estimate of the document classes: $\hat{\pi}_k = \frac{N_k}{N}$
  • Compute the ML estimate of the probability that word $j$ occurs in class $k$ across all documents: $\hat{\mu}_{j,k} = \frac{\sum_\ell \ell N^{(j)}_{\ell,k}}{\sum_{j'=1}^d \sum_\ell \ell N^{(j')}_{\ell,k}}$

Run the classifier: $\hat{h}_{NB} = \operatorname{argmax}_k \hat{\pi}_k \prod_{j=1}^d (\hat{\mu}_{j,k})^{x_j}$.

Weakness of the approach: some words may not show up at training but show up at testing. Use Laplace smoothing:
$\hat{\mu}_{j,k} = \frac{1 + \sum_\ell \ell N^{(j)}_{\ell,k}}{d + \sum_{j'=1}^d \sum_\ell \ell N^{(j')}_{\ell,k}}$
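A minimal multinomial bag-of-words classifier with Laplace smoothing, under the assumptions above (illustrative code, not from the slides):

```python
import numpy as np

def fit_multinomial_nb(X, y, K, laplace=True):
    """X[i, j] = count of word j in document i. Returns class priors
    and (optionally Laplace-smoothed) word probabilities mu[j, k]."""
    N, d = X.shape
    priors = np.array([np.mean(y == k) for k in range(K)])
    # counts[j, k]: total count of word j across all class-k documents
    counts = np.stack([X[y == k].sum(axis=0) for k in range(K)], axis=1)
    if laplace:
        # Laplace smoothing: add 1 to every count, d to every class total
        mu = (counts + 1) / (counts.sum(axis=0) + d)
    else:
        mu = counts / counts.sum(axis=0)
    return priors, mu

def predict_multinomial_nb(X, priors, mu):
    # argmax_k log pi_k + sum_j x_j log mu_{j,k}
    return np.argmax(np.log(priors) + X @ np.log(mu), axis=1)
```

With `laplace=True` no word probability is ever zero, so a word unseen at training no longer annihilates the whole product.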

SLIDE 6

LINEAR DISCRIMINANT ANALYSIS (LDA)

Consider a (random) feature vector $x = [x_1, \cdots, x_d]^\intercal \in \mathbb{R}^d$ and the label $y$.

Assumption: given $y$, the feature vector has a Gaussian distribution $P_{x|y} \sim \mathcal{N}(\mu_k, \Sigma)$ with density
$\phi(x; \mu, \Sigma) \triangleq \frac{1}{(2\pi)^{\frac{d}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x - \mu)^\intercal \Sigma^{-1} (x - \mu)\right)$

The mean is class dependent but the covariance matrix is not.

Estimate the parameters from data (recall the assumption about the covariance matrix):
$\hat{\pi}_k = \frac{N_k}{N}$, $\quad \hat{\mu}_k = \frac{1}{N_k} \sum_{i: y_i = k} x_i$, $\quad \hat{\Sigma} = \frac{1}{N} \sum_{k=0}^{K-1} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\intercal$

Lemma. The LDA classifier is
$h_{LDA}(x) = \operatorname{argmin}_k \left(\frac{1}{2}(x - \hat{\mu}_k)^\intercal \hat{\Sigma}^{-1} (x - \hat{\mu}_k) - \log \hat{\pi}_k\right)$

For $K = 2$, the LDA is a linear classifier.
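The parameter estimates and the classifier from the lemma can be sketched as follows (illustrative helper names; note the covariance estimate is pooled across classes, as the assumption requires):

```python
import numpy as np

def fit_lda(X, y, K):
    """ML estimates for LDA: class priors, per-class means,
    and a single covariance matrix shared by all classes."""
    N, d = X.shape
    priors = np.array([np.mean(y == k) for k in range(K)])
    means = np.array([X[y == k].mean(axis=0) for k in range(K)])
    Sigma = np.zeros((d, d))
    for k in range(K):
        diff = X[y == k] - means[k]
        Sigma += diff.T @ diff   # sum of (x_i - mu_k)(x_i - mu_k)^T over class k
    Sigma /= N
    return priors, means, Sigma

def predict_lda(X, priors, means, Sigma):
    # argmin_k 1/2 (x - mu_k)^T Sigma^{-1} (x - mu_k) - log pi_k
    Sinv = np.linalg.inv(Sigma)
    scores = np.stack([
        0.5 * np.sum((X - means[k]) @ Sinv * (X - means[k]), axis=1)
        - np.log(priors[k]) for k in range(len(priors))], axis=1)
    return np.argmin(scores, axis=1)
```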

SLIDE 7

(figure slide)

SLIDE 8

(figure slide)

SLIDE 9

LINEAR DISCRIMINANT ANALYSIS (CT’D)

The generative model is rarely accurate.

Number of parameters to estimate: $K-1$ class priors, $Kd$ means, $\frac{1}{2}d(d+1)$ elements of the covariance matrix.
  • Works well if $N \gg d$
  • Works poorly if $N \ll d$ without other tricks (dimensionality reduction, structured covariance)

Biggest concern: “one should solve the [classification] problem directly and never solve a more general problem as an intermediate step [such as modeling p(x|y)].” (Vapnik, 1998)

Revisit the binary classifier with LDA:
$\eta_1(x) = \frac{\pi_1 \phi(x; \mu_1, \Sigma)}{\pi_1 \phi(x; \mu_1, \Sigma) + \pi_0 \phi(x; \mu_0, \Sigma)} = \frac{1}{1 + \exp(-(w^\intercal x + b))}$

We do not need to estimate the full joint distribution!
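The identity between the class-1 posterior and the logistic form can be checked numerically; the parameter values below are arbitrary illustrations, not values from the slides:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # phi(x; mu, Sigma) for a d-dimensional Gaussian
    d = mu.size
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

# Arbitrary illustrative parameters
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
pi0, pi1 = 0.4, 0.6

# w = Sigma^{-1}(mu1 - mu0)
# b = -1/2 mu1^T Sigma^{-1} mu1 + 1/2 mu0^T Sigma^{-1} mu0 + log(pi1/pi0)
w = np.linalg.solve(Sigma, mu1 - mu0)
b = (-0.5 * mu1 @ np.linalg.solve(Sigma, mu1)
     + 0.5 * mu0 @ np.linalg.solve(Sigma, mu0)
     + np.log(pi1 / pi0))

def eta1(x):
    # posterior probability of class 1 under the generative model
    p1 = pi1 * gaussian_pdf(x, mu1, Sigma)
    p0 = pi0 * gaussian_pdf(x, mu0, Sigma)
    return p1 / (p1 + p0)

def sigmoid_form(x):
    # the same posterior written as a logistic function of w^T x + b
    return 1 / (1 + np.exp(-(w @ x + b)))
```

Both expressions agree to numerical precision at any test point, since the log-likelihood ratio of two Gaussians with a shared covariance is exactly linear in $x$.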


SLIDE 10

LOGISTIC REGRESSION

Assume that $\eta(x)$ is of the form $\frac{1}{1 + \exp(-(w^\intercal x + b))}$.

Estimate $\hat{w}$ and $\hat{b}$ from the data directly, and plug in the result to obtain
$\hat{\eta}(x) = \frac{1}{1 + \exp(-(\hat{w}^\intercal x + \hat{b}))}$

The function $x \mapsto \frac{1}{1 + e^{-x}}$ is called the logistic function.

The binary logistic classifier is $h_{LC}(x) = \mathbf{1}\{\hat{\eta}(x) \geq \frac{1}{2}\} = \mathbf{1}\{\hat{w}^\intercal x + \hat{b} \geq 0\}$ (linear).

How do we estimate $\hat{w}$ and $\hat{b}$?
  • From the LDA analysis: $\hat{w} = \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0)$ and $\hat{b} = -\frac{1}{2}\hat{\mu}_1^\intercal \hat{\Sigma}^{-1} \hat{\mu}_1 + \frac{1}{2}\hat{\mu}_0^\intercal \hat{\Sigma}^{-1} \hat{\mu}_0 + \log\frac{\hat{\pi}_1}{\hat{\pi}_0}$
  • Direct estimation of $(\hat{w}, \hat{b})$ from maximum likelihood


SLIDE 11

MLE FOR LOGISTIC REGRESSION

We have a parametric density model for the label: $p_\theta(y|x)$ with $p_\theta(1|x) = \hat{\eta}(x)$.

Standard trick: set $\tilde{x} = [1\; x^\intercal]^\intercal$ and $\theta = [b\; w^\intercal]^\intercal$. This allows us to lump in the offset and write
$\eta(x) = \frac{1}{1 + \exp(-\theta^\intercal \tilde{x})}$

Given our dataset $\{(\tilde{x}_i, y_i)\}_{i=1}^N$, the likelihood is $L(\theta) \triangleq \prod_{i=1}^N P_\theta(y_i | \tilde{x}_i)$.

For $K = 2$ with $\mathcal{Y} = \{0, 1\}$ we obtain
$L(\theta) = \prod_{i=1}^N \eta(\tilde{x}_i)^{y_i} (1 - \eta(\tilde{x}_i))^{1 - y_i}$
$\ell(\theta) = \sum_{i=1}^N \left(y_i \log \eta(\tilde{x}_i) + (1 - y_i) \log(1 - \eta(\tilde{x}_i))\right)$
$\ell(\theta) = \sum_{i=1}^N \left(y_i \theta^\intercal \tilde{x}_i - \log(1 + e^{\theta^\intercal \tilde{x}_i})\right)$
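The log-likelihood $\ell(\theta)$ has no closed-form maximizer, but its gradient is $\sum_i (y_i - \eta(\tilde{x}_i))\tilde{x}_i$, so simple gradient ascent works. The sketch below is ours, not from the slides (in practice a Newton/IRLS step is the standard choice):

```python
import numpy as np

def logistic_mle(X, y, steps=5000, lr=0.1):
    """Maximize l(theta) = sum_i (y_i theta^T x_i - log(1 + exp(theta^T x_i)))
    by batch gradient ascent; a minimal sketch with no line search."""
    Xt = np.hstack([np.ones((X.shape[0], 1)), X])  # x_tilde = [1, x^T]^T
    theta = np.zeros(Xt.shape[1])                  # theta = [b, w^T]^T
    for _ in range(steps):
        eta = 1 / (1 + np.exp(-Xt @ theta))        # eta(x_tilde_i) for all i
        theta += lr * Xt.T @ (y - eta) / len(y)    # averaged gradient of l(theta)
    return theta
```

On linearly separable data $\theta$ diverges slowly (the likelihood keeps improving as the sigmoid sharpens), which is why regularization is usually added in practice.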
