CS480/680 Lecture 7 (May 29, 2019): Classification with Mixture of Gaussians
SLIDE 1

CS480/680 Lecture 7: May 29, 2019

Classification with Mixture of Gaussians [B] Sections 4.2, [M] Section 4.2

CS480/680 Spring 2019 Pascal Poupart 1 University of Waterloo

SLIDE 2

Linear Models

  β€’ Regression
  β€’ Classification
      – Probabilistic Generative Models

SLIDE 3

Probabilistic Generative Model

  β€’ $\Pr(c)$: prior probability of class $c$
  β€’ $\Pr(\mathbf{x} \mid c)$: class conditional distribution of $\mathbf{x}$
  β€’ Classification: compute the posterior $\Pr(c \mid \mathbf{x})$ according to Bayes' theorem:

$$\Pr(c_k \mid \mathbf{x}) = \frac{\Pr(\mathbf{x} \mid c_k)\,\Pr(c_k)}{\sum_l \Pr(\mathbf{x} \mid c_l)\,\Pr(c_l)} \propto \Pr(\mathbf{x} \mid c_k)\,\Pr(c_k)$$

SLIDE 4

Assumptions

  β€’ In classification, the number of classes is finite, so a natural prior $\Pr(c)$ is the multinomial: $\Pr(c = c_l) = \pi_l$
  β€’ When $\mathbf{x} \in \Re^d$, it is often OK to assume that $\Pr(\mathbf{x} \mid c)$ is Gaussian.
  β€’ Furthermore, assume that the same covariance matrix $\boldsymbol{\Sigma}$ is used for each class:

$$\Pr(\mathbf{x} \mid c_l) \propto e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}_l)}$$

SLIDE 5

Posterior Distribution

$$\Pr(c_1 \mid \mathbf{x}) = \frac{\pi_1 e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}}{\sum_l \pi_l e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_l)}} = \frac{\pi_1 e^{-\frac{1}{2}\left(\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - 2\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\mathbf{x} + \boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1\right)}}{\sum_l \pi_l e^{-\frac{1}{2}\left(\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - 2\boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\mathbf{x} + \boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_l\right)}}$$

Consider two classes $c_1$ and $c_2$:

$$= \frac{1}{1 + \dfrac{\pi_2\, e^{\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2}}{\pi_1\, e^{\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1}}}$$

SLIDE 6

Posterior Distribution

$$= \frac{1}{1 + e^{-\left[(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 + \ln\frac{\pi_1}{\pi_2}\right]}} = \frac{1}{1 + e^{-(\mathbf{w}^T\mathbf{x} + w_0)}}$$

where

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 + \ln\frac{\pi_1}{\pi_2}$$
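The two-class reduction can be sketched numerically. This is a minimal illustration, not course code; all parameter values (means, shared covariance, priors, and the query point) are invented. It builds $\mathbf{w}$ and $w_0$ from the formulas on this slide and confirms that the sigmoid of the linear function matches the posterior computed directly from Bayes' theorem:

```python
import numpy as np

# All numbers below are invented for illustration.
mu1 = np.array([1.0, 0.0])      # class-1 mean
mu2 = np.array([-1.0, 0.0])     # class-2 mean
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])  # shared covariance
pi1, pi2 = 0.6, 0.4             # class priors

Sigma_inv = np.linalg.inv(Sigma)

# Linear discriminant from the slide:
#   w  = Sigma^{-1} (mu1 - mu2)
#   w0 = -1/2 mu1^T Sigma^{-1} mu1 + 1/2 mu2^T Sigma^{-1} mu2 + ln(pi1/pi2)
w = Sigma_inv @ (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2
      + np.log(pi1 / pi2))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([0.5, -0.2])       # query point
post1_sigmoid = sigmoid(w @ x + w0)

# Cross-check against Bayes' theorem with unnormalized Gaussian densities;
# the normalization constant is shared by both classes, so it cancels.
def gauss(x, mu):
    d = x - mu
    return np.exp(-0.5 * d @ Sigma_inv @ d)

post1_bayes = (pi1 * gauss(x, mu1)
               / (pi1 * gauss(x, mu1) + pi2 * gauss(x, mu2)))
print(post1_sigmoid, post1_bayes)   # the two values agree
```

The agreement is exact up to floating-point error because the algebra above only rearranges the ratio of the two class terms.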

SLIDE 7

Logistic Sigmoid

  β€’ Let $\sigma(a) = \frac{1}{1+e^{-a}}$
  β€’ Then $\Pr(c_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$
  β€’ Picture: [plot of the logistic sigmoid]

SLIDE 8

Logistic Sigmoid

[Figure: Gaussian class conditionals (left) and the resulting sigmoidal posterior (right)]

SLIDE 9

Prediction

$$\text{best class} = \arg\max_l\, \Pr(c_l \mid \mathbf{x}) = \begin{cases} c_1 & \text{if } \sigma(\mathbf{w}^T\mathbf{x} + w_0) \ge 0.5 \\ c_2 & \text{otherwise} \end{cases}$$

  β€’ Class boundary:

$$\sigma(\mathbf{w}^T\mathbf{x} + w_0) = 0.5 \implies \frac{1}{1+e^{-(\mathbf{w}^T\mathbf{x} + w_0)}} = 0.5 \implies \mathbf{w}^T\mathbf{x} + w_0 = 0 \quad \therefore \text{ linear separator}$$
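A brief sketch of this decision rule, with made-up weights: any point satisfying $\mathbf{w}^T\mathbf{x} + w_0 = 0$ lies exactly on the linear separator, where the sigmoid returns 0.5.

```python
import numpy as np

# Hypothetical weights for illustration (not derived from real data).
w = np.array([2.0, -1.0])
w0 = 0.5

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(x):
    # best class = c1 when sigma(w^T x + w0) >= 0.5, i.e. when w^T x + w0 >= 0
    return "c1" if w @ x + w0 >= 0 else "c2"

# A point with w^T x + w0 = 0 sits on the linear separator:
x_on_boundary = np.array([0.0, 0.5])    # 2*0 - 1*0.5 + 0.5 = 0
print(sigmoid(w @ x_on_boundary + w0))  # 0.5: exactly on the boundary
print(predict(np.array([1.0, 0.0])))    # c1
```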

SLIDE 10

Multi-class Problems

  β€’ Consider Gaussian class conditional distributions with identical covariance $\boldsymbol{\Sigma}$:

$$\Pr(c_k \mid \mathbf{x}) = \frac{\Pr(c_k)\Pr(\mathbf{x} \mid c_k)}{\sum_l \Pr(c_l)\Pr(\mathbf{x} \mid c_l)} = \frac{\pi_k e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)}}{\sum_l \pi_l e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_l)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_l)}}$$

$$= \frac{\pi_k e^{\boldsymbol{\mu}_k^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_k^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k}}{\sum_l \pi_l e^{\boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_l}} \quad \text{(the } \mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} \text{ term cancels)}$$

$$= \frac{e^{\mathbf{w}_k^T\bar{\mathbf{x}}}}{\sum_l e^{\mathbf{w}_l^T\bar{\mathbf{x}}}} \implies \text{softmax}$$

where $\mathbf{w}_l^T = \left(-\frac{1}{2}\boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_l + \ln\pi_l,\ \boldsymbol{\mu}_l^T\boldsymbol{\Sigma}^{-1}\right)$ and $\bar{\mathbf{x}} = (1, \mathbf{x})$ is the input augmented with a constant 1.
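As a sketch (the three means, priors, and shared covariance are all invented), the augmented weight vectors $\mathbf{w}_l$ can be assembled exactly as defined above and pushed through a softmax:

```python
import numpy as np

# Hypothetical 3-class problem in 2-D; every number here is invented.
mus = [np.array([2.0, 0.0]), np.array([0.0, 2.0]), np.array([-2.0, -1.0])]
pis = [0.5, 0.3, 0.2]
Sigma = np.eye(2)                # shared covariance
Sigma_inv = np.linalg.inv(Sigma)

# Augmented weight vectors, as defined on the slide:
#   w_l^T = ( -1/2 mu_l^T Sigma^{-1} mu_l + ln(pi_l),  mu_l^T Sigma^{-1} )
W = np.array([
    np.concatenate(([-0.5 * mu @ Sigma_inv @ mu + np.log(pi)], Sigma_inv @ mu))
    for mu, pi in zip(mus, pis)
])

def posterior(x):
    x_bar = np.concatenate(([1.0], x))   # augmented input (1, x)
    a = W @ x_bar
    e = np.exp(a - a.max())              # softmax, stabilized by subtracting max
    return e / e.sum()

p = posterior(np.array([1.0, 1.0]))
print(p)   # a proper distribution over the 3 classes
```

For the query point above, the class whose mean is nearest (after weighting by the prior) receives the largest posterior, as expected for shared-covariance Gaussians.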

SLIDE 11

Softmax

  β€’ When there are several classes, the posterior is a softmax (a generalization of the sigmoid)
  β€’ Softmax distribution: $\Pr(c_k \mid \mathbf{x}) = \frac{e^{g_k(\mathbf{x})}}{\sum_l e^{g_l(\mathbf{x})}}$
  β€’ Argmax distribution:

$$\Pr(c_k \mid \mathbf{x}) = \begin{cases} 1 & \text{if } k = \arg\max_l g_l(\mathbf{x}) \\ 0 & \text{otherwise} \end{cases} = \lim_{t\to\infty} \frac{e^{t\,g_k(\mathbf{x})}}{\sum_l e^{t\,g_l(\mathbf{x})}} \approx \frac{e^{g_k(\mathbf{x})}}{\sum_l e^{g_l(\mathbf{x})}} \quad \text{(softmax approximation)}$$
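One way to see the limit is to scale the scores by a temperature parameter $t$ (an illustrative device, with made-up scores): as $t$ grows, the scaled softmax concentrates on the argmax, while $t = 1$ gives the softmax approximation:

```python
import numpy as np

def softmax(a, t=1.0):
    # Temperature-scaled softmax: exp(t*a_k) / sum_l exp(t*a_l).
    # As t -> infinity this approaches the argmax (one-hot) distribution.
    z = t * np.asarray(a, dtype=float)
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

g = np.array([1.0, 2.5, 0.3])   # hypothetical class scores g_l(x)

print(softmax(g))            # smooth distribution (softmax approximation, t = 1)
print(softmax(g, t=100.0))   # nearly one-hot: approaches the argmax distribution
```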

SLIDE 12

Softmax

[Figure: Gaussian class conditionals (left) and the resulting softmax posterior (right)]

SLIDE 13

Parameter Estimation

  β€’ Where do $\Pr(c_l)$ and $\Pr(\mathbf{x} \mid c_l)$ come from?
  β€’ Parameters: $\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}$

$$\Pr(c_1) = \pi \qquad \Pr(\mathbf{x} \mid c_1) = k_{\boldsymbol{\Sigma}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}$$

$$\Pr(c_2) = 1 - \pi \qquad \Pr(\mathbf{x} \mid c_2) = k_{\boldsymbol{\Sigma}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)}$$

where $k_{\boldsymbol{\Sigma}}$ is the normalization constant that depends on $\boldsymbol{\Sigma}$

  β€’ Estimate parameters by
    – Maximum likelihood
    – Maximum a posteriori
    – Bayesian learning

SLIDE 14

Maximum Likelihood Solution

  β€’ Likelihood:

$$L(\mathbf{X},\mathbf{z}) = \Pr(\mathbf{X},\mathbf{z} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) = \prod_n \left[\pi\, N(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma})\right]^{z_n} \left[(1-\pi)\, N(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma})\right]^{1-z_n}, \quad z_n \in \{0,1\}$$

  β€’ ML hypothesis:

$$\langle \pi^*, \boldsymbol{\mu}_1^*, \boldsymbol{\mu}_2^*, \boldsymbol{\Sigma}^* \rangle = \arg\max_{\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}} \sum_n z_n \left[\ln\pi + \ln k_{\boldsymbol{\Sigma}} - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_1)\right] + (1-z_n)\left[\ln(1-\pi) + \ln k_{\boldsymbol{\Sigma}} - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_2)\right]$$

SLIDE 15

Maximum Likelihood Solution

  β€’ Set the derivative to 0:

$$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{z})}{\partial \pi} \implies 0 = \sum_n z_n \frac{1}{\pi} - (1-z_n)\frac{1}{1-\pi}$$

$$\implies 0 = \sum_n z_n(1-\pi) - (1-z_n)\pi \implies \sum_n z_n = \pi\left[\sum_n z_n + \sum_n (1-z_n)\right] \implies \sum_n z_n = \pi N$$

(where $N$ is the number of training points)

$$\therefore\ \pi = \frac{\sum_n z_n}{N}$$

SLIDE 16

Maximum Likelihood Solution

$$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{z})}{\partial \boldsymbol{\mu}_1} \implies 0 = \sum_n z_n\left[-\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n - \boldsymbol{\mu}_1)\right] \implies \sum_n z_n \mathbf{x}_n = \sum_n z_n \boldsymbol{\mu}_1 = N_1 \boldsymbol{\mu}_1$$

$$\therefore\ \boldsymbol{\mu}_1 = \frac{\sum_n z_n \mathbf{x}_n}{N_1} \qquad \text{Similarly:} \quad \boldsymbol{\mu}_2 = \frac{\sum_n (1-z_n)\mathbf{x}_n}{N_2}$$

where $N_1$ is the number of data points in class 1 and $N_2$ is the number of data points in class 2.

SLIDE 17

Maximum Likelihood

$$\frac{\partial \ln L(\mathbf{X},\mathbf{z})}{\partial \boldsymbol{\Sigma}} = 0 \implies \cdots \implies \boldsymbol{\Sigma} = \frac{N_1}{N}\mathbf{S}_1 + \frac{N_2}{N}\mathbf{S}_2$$

where

$$\mathbf{S}_1 = \frac{1}{N_1}\sum_{n \in C_1} (\mathbf{x}_n - \boldsymbol{\mu}_1)(\mathbf{x}_n - \boldsymbol{\mu}_1)^T \qquad \mathbf{S}_2 = \frac{1}{N_2}\sum_{n \in C_2} (\mathbf{x}_n - \boldsymbol{\mu}_2)(\mathbf{x}_n - \boldsymbol{\mu}_2)^T$$

($\mathbf{S}_l$ is the empirical covariance matrix of class $l$)
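The closed-form ML estimates from slides 15-17 can be sketched end to end on synthetic labeled data (the generating means, unit covariance, and sample sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labeled sample: z_n = 1 for class 1, z_n = 0 for class 2.
true_mu1, true_mu2 = np.array([2.0, 0.0]), np.array([-2.0, 1.0])
X = np.vstack([rng.normal(true_mu1, 1.0, size=(600, 2)),    # class 1
               rng.normal(true_mu2, 1.0, size=(400, 2))])   # class 2
z = np.concatenate([np.ones(600), np.zeros(400)])

N = len(z)
N1 = z.sum()    # number of points in class 1
N2 = N - N1     # number of points in class 2

# Closed-form ML estimates:
pi_hat = N1 / N                                      # pi = sum_n z_n / N
mu1_hat = (z[:, None] * X).sum(axis=0) / N1          # mu_1 = sum_n z_n x_n / N_1
mu2_hat = ((1 - z)[:, None] * X).sum(axis=0) / N2    # mu_2 = sum_n (1-z_n) x_n / N_2

d1 = X[z == 1] - mu1_hat
d2 = X[z == 0] - mu2_hat
S1 = d1.T @ d1 / N1                          # empirical covariance of class 1
S2 = d2.T @ d2 / N2                          # empirical covariance of class 2
Sigma_hat = (N1 / N) * S1 + (N2 / N) * S2    # pooled ML covariance

print(pi_hat)               # 0.6
print(mu1_hat, mu2_hat)     # close to the true means
```

With a few hundred points per class, the estimates land close to the generating parameters, and the pooled covariance is near the identity matrix used to draw the data.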