Algorithms for Machine Learning

Algorithms for Machine Learning
Chiranjib Bhattacharyya, Dept of CSA, IISc
chibha@chalmers.se
January 17, 2012

Agenda
Introduction to classification
Bayes Classifier

Who is the person?
Images of one person. Is he the same person? For some pairs of images this is easy; for others it is not so easy.
But who is he? ALFRED NOBEL

Introduction to Classification
Lots of scope for improvement.

The classification problem setup
Classes: Alfred Nobel, Bertha von Suttner
Objective: From these images, create a function (a classifier) that can automatically recognize images of Nobel and Suttner.

The steps
Step 1: Create a representation of the image, sometimes called a feature map.
Step 2: From a training set and a feature map, create a classifier.
Step 3: Evaluate the goodness of the classifier.
We will be concerned with Step 2 and Step 3.

The classification problem setup
Let $(X, Y) \sim P$, where $P$ is a distribution, and let $D_m = \{(X_i, Y_i) \mid (X_i, Y_i) \overset{\text{i.i.d.}}{\sim} P,\ i = 1, \ldots, m\}$ be a random sample.
Probability of misclassification: $R(f) = P(f(X) \neq Y)$
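
The misclassification probability is defined with respect to the unknown distribution P, so in practice it is usually estimated on a sample. A minimal sketch, assuming labels in {-1, +1}; the function name `empirical_risk`, the toy sample, and the threshold rule are all made up for illustration and do not come from the slides:

```python
import numpy as np

def empirical_risk(f, X, y):
    """Fraction of sample points (X_i, Y_i) with f(X_i) != Y_i."""
    predictions = np.array([f(x) for x in X])
    return float(np.mean(predictions != y))

# Toy sample, invented for illustration only.
X = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([-1, -1, 1, -1, 1])

def threshold_classifier(x):
    """Hypothetical classifier: predict +1 when the feature is positive."""
    return 1 if x > 0 else -1

risk = empirical_risk(threshold_classifier, X, y)
print(risk)
```

The estimate is the sample average of the indicator that f(X_i) differs from Y_i, which is the plug-in counterpart of R(f).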

Finding the best classifier
Suppose $P(Y = y \mid X = x)$ is high; then it is very likely that $x$ has the label $y$.
Define $\eta(x) = P(Y = 1 \mid X = x)$, the posterior probability computed by Bayes' rule from the class-conditional densities $P(X = x \mid Y = y)$.
For 2 classes (labels $+1$ and $-1$), $f^*(x) = \mathrm{sign}(2\eta(x) - 1)$ is the Bayes classifier.
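
The rule above can be sketched for a small discrete example. The class-conditional probabilities and the prior below are hypothetical, and sending the tie case (posterior exactly 0.5) to +1 is an arbitrary choice made here, not something the slide specifies:

```python
import numpy as np

def posterior_eta(lik_pos, lik_neg, prior_pos):
    """eta(x) = P(Y=+1 | X=x) via Bayes' rule for two classes."""
    joint_pos = lik_pos * prior_pos
    joint_neg = lik_neg * (1.0 - prior_pos)
    return joint_pos / (joint_pos + joint_neg)

def bayes_classifier(eta):
    """f*(x) = sign(2*eta(x) - 1); ties are sent to +1 by assumption."""
    return np.where(2.0 * eta - 1.0 >= 0.0, 1, -1)

# Hypothetical class-conditional probabilities P(X=x | Y=y) for x in {0, 1, 2}.
lik_pos = np.array([0.1, 0.3, 0.6])   # P(X=x | Y=+1)
lik_neg = np.array([0.5, 0.4, 0.1])   # P(X=x | Y=-1)
prior_pos = 0.5                        # assumed P(Y=+1)

eta = posterior_eta(lik_pos, lik_neg, prior_pos)
labels = bayes_classifier(eta)
print(eta, labels)
```

The classifier predicts +1 exactly where the posterior of class +1 is at least one half.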

Finding the best classifier
The objective is to choose $f$ to solve $\min_f R(f)$.
Theorem: Let $f$ be any classifier and $f^*$ be the Bayes classifier. Then $R(f) \geq R(f^*)$.
A very important result: the Bayes classifier has the least error rate. $R(f^*)$ is called the Bayes error rate.
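
For a finite feature space the theorem can be checked by brute force: enumerate every possible classifier and compare risks. This is only a numerical illustration under an invented distribution (`p_x` and `eta` are hypothetical values), not a proof:

```python
import itertools
import numpy as np

# Hypothetical distribution over x in {0, 1, 2} with labels in {-1, +1}.
p_x = np.array([0.3, 0.35, 0.35])            # marginal P(X=x)
eta = np.array([1.0 / 6.0, 3.0 / 7.0, 6.0 / 7.0])  # eta(x) = P(Y=+1 | X=x)

def risk(labels):
    """R(f) = sum_x P(x) * P(Y != f(x) | X=x), with f given as one label per x."""
    labels = np.asarray(labels)
    # Predicting +1 errs with probability 1 - eta(x); predicting -1 errs with eta(x).
    err_given_x = np.where(labels == 1, 1.0 - eta, eta)
    return float(np.sum(p_x * err_given_x))

bayes_labels = np.where(2.0 * eta - 1.0 >= 0.0, 1, -1)
bayes_risk = risk(bayes_labels)

# All 2^3 deterministic classifiers on this three-point feature space.
all_risks = [risk(f) for f in itertools.product([-1, 1], repeat=len(p_x))]
print(bayes_risk, min(all_risks))
```

No labelling of the three points achieves a lower risk than the one produced by thresholding the posterior at one half.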

Review
Maximum likelihood estimation
Try to construct the Bayes classifier

Naive Bayes Classifier
Assume that the features are independent given the class.
Works well for many problems, especially text classification.

Spam Emails

Naive Bayes Classifier: Bernoulli model
Create a feature list where each feature is on/off. Denote the feature map $x = [f_1, \ldots, f_d]^\top$.
$P(X = x \mid Y = y) = \prod_{i=1}^{d} P(F_i = f_i \mid Y = y)$
$p_{1i} = P(F_i = 1 \mid Y = 1)$, $p_{2i} = P(F_i = 1 \mid Y = 2)$
Bayes classifier: output the class with the higher score,
$\mathrm{score}_1(x) = \sum_{i} \left( f_i \log p_{1i} + (1 - f_i) \log(1 - p_{1i}) \right)$
and similarly $\mathrm{score}_2(x)$.
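
The score comparison can be sketched directly. The parameter values `p1`, `p2` and the feature vector `x` are hypothetical, and, as in the scores on the slide, no class prior term is included (which corresponds to assuming equal priors):

```python
import numpy as np

def bernoulli_score(f, p):
    """score(x) = sum_i f_i*log(p_i) + (1 - f_i)*log(1 - p_i)."""
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(f * np.log(p) + (1.0 - f) * np.log(1.0 - p)))

# Hypothetical per-feature probabilities for d = 3 on/off features.
p1 = np.array([0.8, 0.1, 0.5])   # p_1i = P(F_i = 1 | Y = 1)
p2 = np.array([0.2, 0.6, 0.5])   # p_2i = P(F_i = 1 | Y = 2)

x = np.array([1, 0, 1])          # feature 1 on, feature 2 off, feature 3 on
score1 = bernoulli_score(x, p1)
score2 = bernoulli_score(x, p2)
predicted_class = 1 if score1 >= score2 else 2
print(score1, score2, predicted_class)
```

Working in log space turns the product over features into a sum and avoids numerical underflow when d is large.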

Naive Bayes: Bernoulli
Source: Introduction to Information Retrieval (Manning, Raghavan, Schütze), Section 13.3, The Bernoulli model.

TRAINBERNOULLINB(C, D)
1  V ← EXTRACTVOCABULARY(D)
2  N ← COUNTDOCS(D)
3  for each c ∈ C
4  do N_c ← COUNTDOCSINCLASS(D, c)
5     prior[c] ← N_c / N
6     for each t ∈ V
7     do N_ct ← COUNTDOCSINCLASSCONTAININGTERM(D, c, t)
8        condprob[t][c] ← (N_ct + 1) / (N_c + 2)
9  return V, prior, condprob

APPLYBERNOULLINB(C, V, prior, condprob, d)
1  V_d ← EXTRACTTERMSFROMDOC(V, d)
2  for each c ∈ C
3  do score[c] ← log prior[c]
4     for each t ∈ V
5     do if t ∈ V_d
6        then score[c] += log condprob[t][c]
7        else score[c] += log(1 − condprob[t][c])
8  return arg max_{c ∈ C} score[c]

Figure 13.3 NB algorithm (Bernoulli model): Training and testing. The add-one smoothing in Line 8 (top) is in analogy to Equation (13.7) with B = 2.
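
The training and testing procedures in the figure translate fairly directly into Python. The sketch below is one possible rendering, not the book's code: it assumes documents are given as lists of tokens paired with a label, that every class has at least one training document (otherwise the log prior is undefined), and the four-document corpus is invented for illustration.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(classes, docs):
    """docs is a list of (tokens, label) pairs. Returns (vocab, prior, condprob)."""
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    n_docs = len(docs)
    prior, condprob = {}, defaultdict(dict)
    for c in classes:
        class_docs = [set(tokens) for tokens, label in docs if label == c]
        n_c = len(class_docs)
        prior[c] = n_c / n_docs
        for t in vocab:
            n_ct = sum(1 for doc in class_docs if t in doc)
            # Add-one smoothing with B = 2 outcomes (term present or absent).
            condprob[t][c] = (n_ct + 1) / (n_c + 2)
    return vocab, prior, condprob

def apply_bernoulli_nb(classes, vocab, prior, condprob, tokens):
    """Return the class with the highest log score for one document."""
    present = set(tokens) & set(vocab)
    score = {}
    for c in classes:
        score[c] = math.log(prior[c])
        for t in vocab:
            p = condprob[t][c]
            # Absent terms contribute too: this is what makes the model Bernoulli.
            score[c] += math.log(p) if t in present else math.log(1.0 - p)
    return max(classes, key=lambda c: score[c])

# Toy corpus, invented for illustration only.
classes = ["spam", "ham"]
docs = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "tomorrow"], "ham"),
    (["project", "meeting", "now"], "ham"),
]
vocab, prior, condprob = train_bernoulli_nb(classes, docs)
prediction = apply_bernoulli_nb(classes, vocab, prior, condprob, ["free", "money", "now"])
print(prediction)
```

Every vocabulary term is visited at test time, so the cost of classifying a document grows with the vocabulary size rather than with the document length.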

Discriminant functions
Bayes classifier:
$h(x) = \mathrm{sign}\left( \sum_{i=1}^{d} f_i \theta_i - b \right)$
$\theta_i = \log \frac{p_{1i} (1 - p_{2i})}{(1 - p_{1i}) p_{2i}}$
$h(x)$ is sometimes called a discriminant function.
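
The linear form follows from subtracting the two Bernoulli scores: the terms multiplying each on/off feature collect into a weight, and the remaining terms form the offset. The slide does not spell out b; the sketch below assumes equal class priors, which gives b as the sum over features of log((1 - p_2i) / (1 - p_1i)), and checks numerically that the linear rule agrees with the score comparison. The parameter values are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical Bernoulli parameters p_1i and p_2i for d = 3 features.
p1 = np.array([0.8, 0.1, 0.5])
p2 = np.array([0.2, 0.6, 0.5])

theta = np.log(p1 * (1 - p2) / ((1 - p1) * p2))
# Assumed offset (equal priors): b = sum_i log((1 - p_2i) / (1 - p_1i)).
b = np.sum(np.log((1 - p2) / (1 - p1)))

def h(f):
    """Linear discriminant: +1 means class 1, -1 means class 2."""
    return 1 if f @ theta - b >= 0 else -1

def score(f, p):
    """Bernoulli naive Bayes log score without a prior term."""
    return np.sum(f * np.log(p) + (1 - f) * np.log(1 - p))

# Compare both rules on all 2^3 on/off feature vectors.
agree = all(
    (h(np.array(f)) == 1) == bool(score(np.array(f), p1) >= score(np.array(f), p2))
    for f in itertools.product([0, 1], repeat=3)
)
print(agree)
```

With unequal priors, the log prior ratio would be absorbed into b as well.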

Gaussian class conditional distributions
Let the class-conditional distributions be $N(\mu_1, \Sigma)$ and $N(\mu_2, \Sigma)$. The Bayes classifier is given by
$h(x) = \mathrm{sign}(w^\top x - b)$
$w = \Sigma^{-1} (\mu_1 - \mu_2)$
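
With a shared covariance the quadratic terms of the two Gaussian log densities cancel, leaving a linear rule. The slide does not give b; the sketch below assumes equal class priors, in which case b = (1/2) w^T (mu_1 + mu_2), and verifies numerically that w^T x - b equals the log-likelihood ratio. The means and covariance are made-up values.

```python
import numpy as np

# Hypothetical parameters of the two class-conditional Gaussians.
mu1 = np.array([2.0, 0.0])
mu2 = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

# w = Sigma^{-1} (mu1 - mu2), computed with a linear solve rather than an explicit inverse.
w = np.linalg.solve(Sigma, mu1 - mu2)
# Assumed offset for equal priors: b = 0.5 * w^T (mu1 + mu2).
b = 0.5 * w @ (mu1 + mu2)

def h(x):
    """Linear Bayes rule: +1 for class 1, -1 for class 2."""
    return 1 if w @ x - b >= 0 else -1

def log_density_unnormalized(x, mu):
    """Gaussian log density up to the normalizer shared by both classes."""
    diff = x - mu
    return -0.5 * diff @ np.linalg.solve(Sigma, diff)

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
gaps = [
    (w @ x - b) - (log_density_unnormalized(x, mu1) - log_density_unnormalized(x, mu2))
    for x in points
]
print(np.max(np.abs(gaps)))
```

The decision boundary is the hyperplane through the midpoint of the two means, tilted by the inverse covariance.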

Fisher Discriminant
Source: Pattern Recognition and Machine Learning (Chris Bishop)
[Figure: two-panel plot]

Fisher Discriminant
Let $(\mu_1, \Sigma_1)$ be the mean and covariance of class 1 and $(\mu_2, \Sigma_2)$ be the mean and covariance of class 2.
$\max_{w} J(w), \quad J(w) = \frac{\left( w^\top (\mu_1 - \mu_2) \right)^2}{w^\top S w}$
The maximizer is $w = S^{-1} (\mu_1 - \mu_2)$, with $S = \Sigma_1 + \Sigma_2$.
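
A minimal numerical sketch of the Fisher direction, assuming the class means and covariances are replaced by sample estimates. The synthetic data, the function names, and the comparison against random directions are illustrative choices, not part of the slides:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return w = S^{-1}(mu1 - mu2) with S = Sigma1 + Sigma2 from sample estimates."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    w = np.linalg.solve(S, mu1 - mu2)
    return w, mu1, mu2, S

def fisher_criterion(w, mu1, mu2, S):
    """J(w) = (w^T (mu1 - mu2))^2 / (w^T S w)."""
    return (w @ (mu1 - mu2)) ** 2 / (w @ S @ w)

# Synthetic two-class data, generated only for illustration.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([0.0, 1.0], cov, size=200)

w, mu1, mu2, S = fisher_direction(X1, X2)
j_opt = fisher_criterion(w, mu1, mu2, S)

# No randomly drawn direction should score higher than the Fisher direction.
j_random = [fisher_criterion(rng.normal(size=2), mu1, mu2, S) for _ in range(100)]
print(j_opt, max(j_random))
```

J is unchanged when w is rescaled, so only the direction of w matters; the closed form picks one convenient scaling.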
