Pattern Recognition: Linear Models for Classification (Extra Slides)



SLIDE 1

Pattern Recognition Linear Models for Classification Extra Slides

Ad Feelders

Universiteit Utrecht

Ad Feelders ( Universiteit Utrecht ) Pattern Recognition 1 / 9

SLIDE 2

Maximum Likelihood Estimation: Coin Tossing

Y = 1 if heads, Y = 0 if tails. $\beta = \Pr(Y = 1)$.

In a sequence of 10 coin flips we observe y = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0).

The likelihood function is

$$\ell(\beta) = \beta \cdot (1-\beta) \cdot \beta \cdot \beta \cdot (1-\beta) \cdot \beta \cdot \beta \cdot \beta \cdot \beta \cdot (1-\beta) = \beta^{7}(1-\beta)^{3}$$

The corresponding log-likelihood function is

$$\ln \ell(\beta) = \ln\left(\beta^{7}(1-\beta)^{3}\right) = 7 \ln \beta + 3 \ln(1-\beta)$$
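As a quick numerical check of the formulas above, the following sketch evaluates the Bernoulli log-likelihood directly from the observed flips and compares it with the closed form 7 ln β + 3 ln(1 − β). The function names `loglik` and `closed_form` and the grid `betas` are illustrative choices, not part of the slides.

```r
# Observed coin flips from the slide: 1 = heads, 0 = tails.
y <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0)

# Bernoulli log-likelihood: sum over flips of y*log(beta) + (1 - y)*log(1 - beta).
loglik <- function(beta) sum(y * log(beta) + (1 - y) * log(1 - beta))

# Closed form given on the slide for this particular sample (7 heads, 3 tails).
closed_form <- function(beta) 7 * log(beta) + 3 * log(1 - beta)

# A few illustrative values of beta (arbitrary grid, chosen for this sketch).
betas <- c(0.3, 0.5, 0.7, 0.9)
comparison <- data.frame(
  beta        = betas,
  loglik      = sapply(betas, loglik),
  closed_form = sapply(betas, closed_form)
)
print(comparison)
```

The two columns should agree for every β, and among these four grid values the log-likelihood is largest at β = 0.7.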

SLIDE 3

Computing the maximum

To determine the maximum we take the derivative and equate it to zero:

$$\frac{d \ln \ell(\beta)}{d\beta} = \frac{7}{\beta} - \frac{3}{1-\beta} = 0$$

which yields the maximum likelihood estimate $\hat{\beta} = 0.7$. This is the relative frequency of heads in the sample.

Show that in general $\hat{\beta} = n_1 / n$, where $n$ is the number of coin tosses and $n_1$ is the number of times heads comes up.
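One possible route to the general result is sketched below. It assumes 0 < n_1 < n, so that both logarithms are finite, and simply repeats the steps of the 10-flip example with 7 replaced by n_1 and 3 replaced by n − n_1.

```latex
\begin{align*}
\ell(\beta) &= \beta^{n_1}(1-\beta)^{n-n_1} \\
\ln \ell(\beta) &= n_1 \ln \beta + (n-n_1)\ln(1-\beta) \\
\frac{d \ln \ell(\beta)}{d\beta} &= \frac{n_1}{\beta} - \frac{n-n_1}{1-\beta} = 0 \\
n_1(1-\beta) &= (n-n_1)\,\beta
  \quad\Longrightarrow\quad \hat{\beta} = \frac{n_1}{n} \\
\frac{d^2 \ln \ell(\beta)}{d\beta^2} &= -\frac{n_1}{\beta^2} - \frac{n-n_1}{(1-\beta)^2} < 0
\end{align*}
```

The negative second derivative indicates that the stationary point is a maximum. With n = 10 and n_1 = 7 this gives 0.7, matching the estimate above.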

SLIDE 4

ML estimation for logistic regression

In the logistic regression model, the probability of “heads” is assumed to depend on x in the following way:

$$\Pr(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \tag{1}$$

$$\Pr(Y = 0 \mid X = x) = \frac{1}{1 + e^{\beta_0 + \beta_1 x}} \tag{2}$$

Given observations $Y_i$ and $X_i$ ($i = 1, \ldots, n$), if $Y_i = 1$ then (1) enters into the likelihood function, and if $Y_i = 0$ then (2) enters into the likelihood function.

There is no closed-form solution for the maximum likelihood estimates of $\beta_0$ and $\beta_1$ in this case. The log-likelihood function is concave, so, except for some pathological cases, there is a unique global maximum.
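Because there is no closed form, the estimates have to be found numerically. The sketch below is one way to illustrate this, not the method used in the slides: it simulates data from the model with made-up coefficients (`beta_true`, the sample size and the seed are assumptions for illustration), builds the negative log-likelihood from (1) and (2), minimizes it with `optim`, and compares the result with `glm`.

```r
set.seed(1)

# Simulated data (hypothetical coefficients, chosen only for this sketch).
n <- 200
x <- rnorm(n)
beta_true <- c(-0.5, 1.5)
y <- rbinom(n, size = 1, prob = plogis(beta_true[1] + beta_true[2] * x))

# Negative log-likelihood. With eta = beta0 + beta1 * x:
#   log Pr(Y = 1 | x) = eta - log(1 + exp(eta))   (from equation 1)
#   log Pr(Y = 0 | x) =     - log(1 + exp(eta))   (from equation 2)
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log1p(exp(eta)))
}

# Numerical maximization of the likelihood, starting from (0, 0).
fit_optim <- optim(c(0, 0), negloglik, method = "BFGS")

# Reference fit with R's built-in logistic regression.
fit_glm <- glm(y ~ x, family = binomial)

print(fit_optim$par)
print(coef(fit_glm))
```

Since the log-likelihood is concave, the starting point should not matter much here; both approaches are expected to land on essentially the same estimates.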

SLIDE 5

Multinomial Logit in R

    # load training data
    > optdigits.train <- read.csv("D:/Pattern Recognition/Datasets/optdigits-tra.txt", header=F)
    # convert class label to factor
    > optdigits.train[,65] <- as.factor(optdigits.train[,65])
    # same for test data
    > optdigits.test <- read.csv("D:/Pattern Recognition/Datasets/optdigits-tes.txt", header=F)
    > optdigits.test[,65] <- as.factor(optdigits.test[,65])

SLIDE 6

Multinomial Logit in R

    # load nnet library
    > library(nnet)
    # fit multinomial logistic regression model
    # column 1 and 40 are not used (always 0)
    > optdigits.multinom <- multinom(V65 ~ ., data = optdigits.train[,-c(1,40)], maxit = 1000)
    # weights: 640 (567 variable)
    initial value 8802.782811
    ...
    converged
    # predict class label on training data
    > optdigits.multinom.pred <- predict(optdigits.multinom, optdigits.train[,-c(1,40,65)], type="class")
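The model call drops columns 1 and 40 because they are always 0. A minimal sketch of how such constant columns could be detected is given below. It uses a small made-up data frame `toy` as a stand-in for the training data, and `is_constant` and `constant_cols` are hypothetical names.

```r
# Toy stand-in for a training set: V1 and V3 never vary, V4 is the class label.
toy <- data.frame(
  V1 = c(0, 0, 0, 0),
  V2 = c(1, 3, 2, 5),
  V3 = c(0, 0, 0, 0),
  V4 = factor(c("a", "b", "a", "b"))
)

# A column carries no information for the model if it takes a single value.
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- which(is_constant)
print(constant_cols)

# Drop the constant columns before fitting, analogous to [,-c(1,40)] above.
toy_reduced <- toy[, -constant_cols]
print(names(toy_reduced))
```

Applied to the real training data, the same check would be expected to flag columns 1 and 40, if they are indeed all zero as the comment states.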

SLIDE 7

Multinomial Logit in R

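    # make confusion matrix: true label vs. predicted label
    > table(optdigits.train[,65], optdigits.multinom.pred)
       optdigits.multinom.pred
          0   1   2   3   4   5   6   7   8   9
      0 376   0   0   0   0   0   0   0   0   0
      1   0 389   0   0   0   0   0   0   0   0
      2   0   0 380   0   0   0   0   0   0   0
      3   0   0   0 389   0   0   0   0   0   0
      4   0   0   0   0 387   0   0   0   0   0
      5   0   0   0   0   0 376   0   0   0   0
      6   0   0   0   0   0   0 377   0   0   0
      7   0   0   0   0   0   0   0 387   0   0
      8   0   0   0   0   0   0   0   0 380   0
      9   0   0   0   0   0   0   0   0   0 382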
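The numbers on these slides can be cross-checked with a little arithmetic. The sketch below assumes that the training confusion matrix is purely diagonal, so the diagonal counts add up to the training set size. It also assumes that the fit starts from weights under which every class has probability 1/10. Under those assumptions, the "initial value" reported by `multinom` should equal n · ln 10.

```r
# Diagonal of the training confusion matrix (correctly classified cases per digit).
train_diag <- c(376, 389, 380, 389, 387, 376, 377, 387, 380, 382)

# Training set size, assuming there are no off-diagonal entries.
n_train <- sum(train_diag)

# Negative log-likelihood when each case gets probability 1/10 for its true class.
uniform_nll <- -n_train * log(1 / 10)

print(n_train)
print(uniform_nll, digits = 10)
```

The result agrees with the reported initial value of 8802.782811 to the printed precision, which is consistent with both assumptions.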

SLIDE 8

Multinomial Logit in R

    # predict class label on test data
    > optdigits.multinom.test.pred <- predict(optdigits.multinom, optdigits.test[,-c(1,40,65)], type="class")
    > table(optdigits.test[,65], optdigits.multinom.test.pred)
       optdigits.multinom.test.pred

Rows are true labels 0 to 9 and columns are predicted labels. Empty cells are not shown, so each row lists only its printed counts, in column order:

      0 | 170 1 1 6
      1 | 1 170 4 1 3 1 1 1
      2 | 4 7 157 1 6 1 1
      3 | 10 155 2 2 8 3 3
      4 | 8 0 153 1 9 3 1 6
      5 | 1 5 1 173 1 1
      6 | 4 2 4 3 168
      7 | 4 2 17 2 149 5
      8 | 2 5 7 3 5 5 4 142 1
      9 | 1 6 2 5 4 3 159

SLIDE 9

Multinomial Logit in R

    # make confusion matrix for predictions on test data
    > confmat <- table(optdigits.test[,65], optdigits.multinom.test.pred)
    # use it to compute accuracy on test data
    > sum(diag(confmat))/sum(confmat)
    [1] 0.888147

The accuracy on the test sample is about 89%.
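The accuracy can be reproduced from the counts printed in the test confusion matrix, without the data files. The sketch below rests on two assumptions: the printed counts split into rows as listed in `rows`, and the largest count in each row is the diagonal (correctly classified) cell. The names `rows`, `correct`, `total` and `recall` are illustrative.

```r
# Printed counts per true label, read off the test confusion matrix.
# Assumption: this is the correct split of the printed numbers into rows.
rows <- list(
  "0" = c(170, 1, 1, 6),
  "1" = c(1, 170, 4, 1, 3, 1, 1, 1),
  "2" = c(4, 7, 157, 1, 6, 1, 1),
  "3" = c(10, 155, 2, 2, 8, 3, 3),
  "4" = c(8, 0, 153, 1, 9, 3, 1, 6),
  "5" = c(1, 5, 1, 173, 1, 1),
  "6" = c(4, 2, 4, 3, 168),
  "7" = c(4, 2, 17, 2, 149, 5),
  "8" = c(2, 5, 7, 3, 5, 5, 4, 142, 1),
  "9" = c(1, 6, 2, 5, 4, 3, 159)
)

# Assumption: the largest count in each row is the diagonal cell.
correct <- vapply(rows, max, numeric(1))
total   <- vapply(rows, sum, numeric(1))

# Overall accuracy: correctly classified cases divided by all cases.
accuracy <- sum(correct) / sum(total)

# Per-class fraction of test cases classified correctly.
recall <- correct / total

print(round(accuracy, 6))
print(round(recall, 3))
```

Under these assumptions the overall accuracy comes out at 1596 / 1797, which rounds to the 0.888147 reported above. The per-class fractions show which digits the model handles least well on the test sample.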
