Classification: Introduction, David Dalpiaz, STAT 430, Fall 2017

SLIDE 1

Classification: Introduction

David Dalpiaz STAT 430, Fall 2017

SLIDE 2

Announcements

  • BVT Review
  • Regression Questions?
  • General Questions?

SLIDE 3

Statistical Learning

  • Supervised Learning
      • Regression
      • Classification
  • Unsupervised Learning

SLIDE 4

Classification versus Regression

  • Regression: $Y \in \mathbb{R}$
  • Classification: $Y \in G = \{1, 2, 3, \ldots, g\}$

SLIDE 5

American Bulldog

Figure 1: A Good Dog

SLIDE 6

American Bulldog

Figure 2: A Good Dog

SLIDE 7

American Bulldog

Figure 3: A Good Dog

SLIDE 8

American Bulldog

Figure 4: A Good Dog

SLIDE 9

Golden Retriever

Figure 5: A Good Dog

SLIDE 10

Golden Retriever

Figure 6: A Good Dog

SLIDE 11

Golden Retriever

Figure 7: A Good Dog

SLIDE 12

Golden Retriever

Figure 8: Some Good Dogs

SLIDE 13

Dog Probabilities

Let’s make some assumptions about dogs. Let

  • Y be the species, either B for American Bulldog or R for Golden Retriever
  • W be the weight of the dog, in kg
  • H be the height of the dog, in cm

Both classes are equally likely a priori:

$\pi_B = P(Y = B) = 0.5 = P(Y = R) = \pi_R$

SLIDE 14

Dog Probabilities

American Bulldog

  • $H \mid B \sim N(\mu = 56, \sigma^2 = 1.5^2)$
  • $W \mid B \sim N(\mu = 34, \sigma^2 = 2.25^2)$

Golden Retriever

  • $H \mid R \sim N(\mu = 53, \sigma^2 = 1.5^2)$
  • $W \mid R \sim N(\mu = 32, \sigma^2 = 0.75^2)$

Let’s also assume that W and H are conditionally independent given B or R. (This is unrealistic, since height and weight are typically correlated.)
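Under the conditional independence assumption, the joint density of height and weight within a class is just the product of the two univariate normal densities. A minimal sketch in base R follows. The names `params` and `joint_density` are illustrative and not from the slides, and the Golden Retriever weight mean is taken as 32, the value used in the parameter code on the Dog Parameters slide.

```r
# Class-conditional parameters (height in cm, weight in kg).
# `params` and `joint_density` are illustrative names, not from the slides.
params = list(
  B = list(h_mu = 56, h_sd = 1.5, w_mu = 34, w_sd = 2.25),
  R = list(h_mu = 53, h_sd = 1.5, w_mu = 32, w_sd = 0.75)
)

# Joint density of (height, weight) given class k, assuming conditional
# independence: f(h, w | k) = f(h | k) * f(w | k)
joint_density = function(h, w, k) {
  p = params[[k]]
  dnorm(h, mean = p$h_mu, sd = p$h_sd) * dnorm(w, mean = p$w_mu, sd = p$w_sd)
}

# Compare how plausible a 55 cm, 33 kg dog is under each class
joint_density(h = 55, w = 33, k = "B")
joint_density(h = 55, w = 33, k = "R")
```

The same factorization is what a diagonal covariance matrix encodes in the bivariate normal parameterization.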

SLIDE 15

Dog Parameters

# ht, wt
# cm, kg
b_mu = c(56, 34)
b_sigma = matrix(c(1.5 ^ 2, 0, 0, 2.25 ^ 2), 2, 2)
r_mu = c(53, 32)
r_sigma = matrix(c(1.5 ^ 2, 0, 0, 0.75 ^ 2), 2, 2)

SLIDE 16

Weight Distribution

[Figure: density versus weight, one curve each for Golden Retriever and American Bulldog]

SLIDE 17

Bayes Classifier

$$C^B(x) = \underset{k}{\operatorname{argmax}} \; P(Y = k \mid X = x)$$

Decision Boundary

$$\{ x : P(Y = B \mid X = x) = P(Y = R \mid X = x) \}$$
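Because the class-conditional distributions and priors are assumed known here, the Bayes classifier can be written out directly: by Bayes' rule the posterior is proportional to prior times class-conditional density, so the argmax can be taken over those products. A minimal sketch in base R, assuming the slide parameters (with the retriever weight mean of 32 from the parameter code) and conditional independence of height and weight. The names `class_params`, `priors`, and `bayes_classifier` are illustrative, not from the slides.

```r
# Illustrative names; parameter values are the ones assumed on the slides.
class_params = list(
  "American Bulldog" = list(h_mu = 56, h_sd = 1.5, w_mu = 34, w_sd = 2.25),
  "Golden Retriever" = list(h_mu = 53, h_sd = 1.5, w_mu = 32, w_sd = 0.75)
)
priors = c("American Bulldog" = 0.5, "Golden Retriever" = 0.5)

bayes_classifier = function(height, weight) {
  # prior * class-conditional density, one column per class; the shared
  # normalizing constant P(X = x) does not change which class is largest
  scores = sapply(names(class_params), function(k) {
    p = class_params[[k]]
    priors[[k]] * dnorm(height, p$h_mu, p$h_sd) * dnorm(weight, p$w_mu, p$w_sd)
  })
  scores = matrix(scores, ncol = length(class_params))
  # pick the class with the largest score in each row
  names(class_params)[max.col(scores, ties.method = "first")]
}

bayes_classifier(height = c(56, 53), weight = c(34, 32))
```

A dog sitting exactly at a class mean is assigned to that class, as expected.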

SLIDE 18

Weight Bayes, Decision Boundary

[Figure: density versus weight for Golden Retriever and American Bulldog, with the Bayes decision boundary]
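For weight alone, the decision boundary can be located numerically by finding where the posterior probability of American Bulldog equals 0.5. The sketch below uses `uniroot` from base R. The name `post_bulldog_wt` and the search intervals are assumptions made for illustration, and the retriever weight mean is taken as 32, as in the parameter code.

```r
# Posterior probability of American Bulldog given weight alone (equal priors).
# `post_bulldog_wt` is an illustrative name, not from the slides.
post_bulldog_wt = function(w) {
  b = 0.5 * dnorm(w, mean = 34, sd = 2.25)
  r = 0.5 * dnorm(w, mean = 32, sd = 0.75)
  b / (b + r)
}

# The two normals have different variances, so the posteriors can cross twice.
# The search intervals are chosen from the plotted weight range (an assumption).
gap = function(w) post_bulldog_wt(w) - 0.5
lower = uniroot(gap, interval = c(28, 32), tol = 1e-8)$root
upper = uniroot(gap, interval = c(32, 36), tol = 1e-8)$root
c(lower = lower, upper = upper)
```

Under these assumptions the rule predicts Golden Retriever for weights between the two roots (roughly 30.4 kg and 33.1 kg) and American Bulldog outside that interval, because the bulldog distribution is much more spread out.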

SLIDE 19

Height Distribution

[Figure: density versus height, one curve each for Golden Retriever and American Bulldog]

SLIDE 20

Height Distribution, Decision Boundary

[Figure: density versus height for Golden Retriever and American Bulldog, with the Bayes decision boundary]
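For height alone, the boundary can be worked out by hand. With equal priors and a common variance of $1.5^2$, the posteriors are equal exactly when the two class-conditional densities are equal, which reduces to equal squared distances from the two means. A short derivation under the assumed parameters:

```latex
\begin{aligned}
(h - 56)^2 &= (h - 53)^2 \\
-112h + 56^2 &= -106h + 53^2 \\
6h &= 56^2 - 53^2 = 327 \\
h &= 54.5
\end{aligned}
```

So the height-only Bayes rule predicts American Bulldog above 54.5 cm, the midpoint of the two class means, and Golden Retriever below it.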

SLIDE 21

Height and Weight Distribution

[Figure: joint distribution of height (horizontal axis) and weight (vertical axis) for Golden Retriever and American Bulldog]

SLIDE 22

Let’s Make Some Dogs

sim_dog_data = function(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma) {
  # first half of the rows are bulldogs, second half retrievers
  species = c(rep("American Bulldog", n_obs / 2),
              rep("Golden Retriever", n_obs / 2))
  # draw (height, weight) pairs from each class's bivariate normal
  ht_wt = rbind(mvtnorm::rmvnorm(n = n_obs / 2, mean = b_mu, sigma = b_sigma),
                mvtnorm::rmvnorm(n = n_obs / 2, mean = r_mu, sigma = r_sigma))
  data.frame(species, height = ht_wt[, 1], weight = ht_wt[, 2])
}
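Because both covariance matrices are diagonal, the same kind of data can also be drawn with independent `rnorm` calls, which avoids the `mvtnorm` dependency. The sketch below is an alternative, not the slide's function: `sim_dog_data_indep` is a hypothetical name, it only handles diagonal covariance matrices, and it will not reproduce the same random draws as `sim_dog_data` for a given seed.

```r
# Hypothetical base-R alternative; valid only for diagonal covariance matrices.
sim_dog_data_indep = function(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma) {
  n_half = n_obs / 2
  # standard deviations are the square roots of the diagonal variances
  b_sd = sqrt(diag(b_sigma))
  r_sd = sqrt(diag(r_sigma))
  data.frame(
    species = rep(c("American Bulldog", "Golden Retriever"), each = n_half),
    height  = c(rnorm(n_half, b_mu[1], b_sd[1]), rnorm(n_half, r_mu[1], r_sd[1])),
    weight  = c(rnorm(n_half, b_mu[2], b_sd[2]), rnorm(n_half, r_mu[2], r_sd[2]))
  )
}

# parameter values from the Dog Parameters slide
b_mu = c(56, 34)
b_sigma = matrix(c(1.5 ^ 2, 0, 0, 2.25 ^ 2), 2, 2)
r_mu = c(53, 32)
r_sigma = matrix(c(1.5 ^ 2, 0, 0, 0.75 ^ 2), 2, 2)

set.seed(1)
dogs = sim_dog_data_indep(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma)
table(dogs$species)
```

The returned data frame has the same three columns as the original simulator, so downstream code that expects `species`, `height`, and `weight` should work unchanged.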

SLIDE 23

Let’s Make Some Dogs

set.seed(66)
dog_trn = sim_dog_data(n_obs = 200, b_mu, b_sigma, r_mu, r_sigma)
dog_tst = sim_dog_data(n_obs = 800, b_mu, b_sigma, r_mu, r_sigma)

SLIDE 24

Simulated Dogs, Univariate Density Estimates

[Figure: univariate density estimates of height and weight, by species (American Bulldog, Golden Retriever)]

SLIDE 25

Simulated Dogs, Bivariate Density Estimates

[Figure: scatter plot matrix of height and weight, by species (American Bulldog, Golden Retriever)]

SLIDE 26

Simulated Train Dogs, Decision?

[Figure: Train Data, weight versus height, points labeled Golden Retriever or American Bulldog]

SLIDE 27

Simulated Test Dogs, Decision?

[Figure: Test Data, weight versus height, points labeled Golden Retriever or American Bulldog]

SLIDE 28

Classification Error

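$$
I(y_i \neq \hat{C}(x_i)) =
\begin{cases}
1 & y_i \neq \hat{C}(x_i) \\
0 & y_i = \hat{C}(x_i)
\end{cases}
$$

$$
\text{Err}(\hat{C}, \text{Data}) = \frac{1}{n} \sum_{i = 1}^{n} I(y_i \neq \hat{C}(x_i))
$$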
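In R, the average of the indicator is simply the mean of a logical comparison. A minimal sketch follows; the function name `calc_class_err` and the toy vectors are illustrative, not from the slides.

```r
# Proportion of observations whose predicted class differs from the actual class.
# `calc_class_err` is an illustrative name, not from the slides.
calc_class_err = function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(actual != predicted)
}

# toy example: one of the four predictions is wrong
actual    = c("American Bulldog", "Golden Retriever", "American Bulldog", "Golden Retriever")
predicted = c("American Bulldog", "American Bulldog", "American Bulldog", "Golden Retriever")
calc_class_err(actual, predicted)
```

Applied to a training set this gives the train error, and applied to held-out data it gives the test error.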

SLIDE 29

Decision Boundary?

$w = 142.1745763 - 2.0006502 \cdot h$

C = function(data) {
  with(data, ifelse(weight > (142.1746 - 2.00065 * height),
                    "American Bulldog", "Golden Retriever"))
}

# train error
mean(C(dog_trn) != dog_trn$species)
## [1] 0.125

# test error
mean(C(dog_tst) != dog_tst$species)
## [1] 0.11
