
SLIDE 1

klaR: A Package Including Various Classification Tools

Christian Röver, Nils Raabe, Karsten Luebke and Uwe Ligges

Universität Dortmund, 44221 Dortmund, Germany

May 21, 2004

SLIDE 2

Overview:

1. Example data
2. Classification tools
3. Comparing classification results
4. Variable selection
5. Illustrating discrimination
6. Visualization of data structure

C. Röver, N. Raabe, K. Luebke and U. Ligges: klaR: A Package Including Various Classification Tools

SLIDE 3

B3 data: “West German business cycles”

data on 14 economic variables observed quarterly over 39 years (157 observations)

each quarter was assigned to one out of 4 phases:
1. upswing
2. upper turning point
3. downswing
4. lower turning point
wanted: classification rule for phases

SLIDE 4

RDA: Regularized Discriminant Analysis¹

generalization of LDA and QDA
assumptions similar to QDA (differences in means and covariances)

covariance matrices are manipulated using two parameters (γ and λ)
more robust against multicollinearity
parameters are determined by minimizing the (estimated) misclassification rate

¹ Friedman, J.H. (1989): Regularized Discriminant Analysis. Journal of the American Statistical Association 84, 165-175.


SLIDE 5

RDA: special cases

(γ=0, λ=0): QDA, individual covariances for each group.
(γ=0, λ=1): LDA, a common covariance matrix.
(γ=1, λ=0): conditional independence, identical variances within class (similar to Naive Bayes).
(γ=1, λ=1): objects are assigned to the class with the nearest mean (Euclidean distance).
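The four special cases can be checked numerically. The following is only a sketch: it assumes the simple unweighted shrinkage form Σk(λ) = (1-λ)Σk + λΣ followed by Σk(λ,γ) = (1-γ)Σk(λ) + γ·tr[Σk(λ)]/p·I, uses the iris data as a stand-in for B3, and the helper name `regularize_cov` is hypothetical (it is not a klaR function).

```r
# Sketch of the covariance regularization behind RDA (hypothetical helper,
# simplified unweighted form; not the klaR implementation).
regularize_cov <- function(S_k, S_pooled, gamma, lambda) {
  p <- ncol(S_k)
  # lambda shrinks the group covariance towards the pooled covariance
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled
  # gamma shrinks the result towards a multiple of the identity matrix
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / p) * diag(p)
}

# Toy data: the iris measurements that ship with R (not the B3 data)
X <- as.matrix(iris[, 1:4])
g <- iris$Species
S_list <- lapply(split(as.data.frame(X), g), cov)   # one covariance per group
n_k <- as.vector(table(g))
# pooled within-group covariance matrix
S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_list, n_k)) /
  (nrow(X) - nlevels(g))

S_qda <- regularize_cov(S_list$setosa, S_pooled, gamma = 0, lambda = 0)
S_lda <- regularize_cov(S_list$setosa, S_pooled, gamma = 0, lambda = 1)
S_nm  <- regularize_cov(S_list$setosa, S_pooled, gamma = 1, lambda = 1)
```

With γ=0 the result is the group covariance (λ=0) or the pooled covariance (λ=1); with γ=1 and λ=1 it is a multiple of the identity, so the discriminant rule reduces to Euclidean distance to the class means.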


SLIDE 6

RDA: examples

set parameters manually...

> x <- rda(PHASEN~., data=B3[train,], gamma=0.05, lambda=0.1)

...or optimize misclassification rate.

> x <- rda(PHASEN~., data=B3[train,])

prediction etc. as usual

> predict(x, B3[test,])
$class
[1] 3 3 3 4 4 4 4 1 3 1 1 1 1 1 1 1 4 4 4 1 1 4 4 4 1 1


SLIDE 7

SVMlight²

interface to T. Joachims’ Support Vector Machine implementation
supports loss parameters and 1-against-all classification
returns comparable membership scores (‘posterior probabilities’)
example:

> x <- svmlight(PHASEN ~ ., data=B3[train,])
> predict(x, B3[test,])

² Joachims, T. (2004): SVMlight. http://svmlight.joachims.org/


SLIDE 8

Comparing classifications

looking at misclassifications:

> errormatrix(true.phase, rda.prediction)
       predicted
true    dn ltp  up utp -SUM-
  dn     2   7   0   0     7
  ltp    2   4   0   0     2
  up     1  12  14   0    13
  utp    0   5   0   1     5
  -SUM-  3  24   0   0    27

27 out of 48 are misclassified, worst rates for (true) “utp”, most misclassifications go into class “ltp”, ...
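How such a matrix is read can be illustrated in base R. This is a minimal sketch with made-up labels (not the B3 results) and it is not the code of `errormatrix`; it assumes, as the totals on this slide suggest, that the diagonal holds the correct assignments while the -SUM- margins count errors only.

```r
# Hypothetical toy labels; not the B3 results
lev  <- c("dn", "ltp", "up", "utp")
true <- factor(c("dn", "dn", "ltp", "up", "up", "up", "utp", "utp"), levels = lev)
pred <- factor(c("dn", "ltp", "ltp", "up", "ltp", "up", "ltp", "utp"), levels = lev)

# Cross-tabulate true against predicted classes
tab <- unclass(table(true = true, predicted = pred))

# Keep only the misclassifications by zeroing the diagonal
errors <- tab
diag(errors) <- 0

# Margins count errors only, so the bottom-right cell is the total number of errors
em <- rbind(cbind(tab, "-SUM-" = rowSums(errors)),
            "-SUM-" = c(colSums(errors), sum(errors)))
print(em)

# Overall misclassification rate
error_rate <- sum(errors) / length(true)
```

In this toy example all three errors fall into the predicted class "ltp", which shows up in the bottom margin of that column.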


SLIDE 9

Comparing classifications

looking at posterior assignments:

$posterior
        up   utp    dn   ltp
[1,] 0.000 0.000 0.978 0.022
[2,] 0.001 0.000 0.995 0.005
[3,] 0.077 0.000 0.151 0.772
[4,] 0.249 0.000 0.000 0.750
[5,] 0.256 0.000 0.005 0.739

each observation is assigned to every class with a certain posterior probability or membership


SLIDE 10

Comparing classifications

a probability distribution over 4 classes may be illustrated by a point in a 3-dimensional simplex (tetrahedron, ‘barycentric plot’):
– each corner corresponds to one class,
– the probability for a certain class is proportional to the distance to the opposite side

example:

> quadplot(rdapred$posterior, [...] )
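The mapping behind such a plot can be sketched in a few lines of base R. The corner coordinates and the helper `to_simplex` are illustrative choices only, not what `quadplot` uses internally; the first posterior row is taken from the RDA output shown earlier.

```r
# Corners of a regular tetrahedron, one per class (coordinates are an arbitrary choice)
corners <- rbind(c( 1,  1,  1),
                 c( 1, -1, -1),
                 c(-1,  1, -1),
                 c(-1, -1,  1))
rownames(corners) <- c("up", "utp", "dn", "ltp")

# A membership vector maps to the probability-weighted average of the corners
to_simplex <- function(posterior) posterior %*% corners

post <- rbind(c(0.000, 0.000, 0.978, 0.022),  # close to the "dn" corner
              c(0.25, 0.25, 0.25, 0.25),      # complete uncertainty: the centre
              c(0, 0, 1, 0))                  # certain "dn": exactly on the corner
xyz <- to_simplex(post)

# Distance of each point to the face opposite each corner
# (for these corners the face opposite corner c is the plane x . c = -1)
dist_to_faces <- (xyz %*% t(corners) + 1) / sqrt(3)
```

For this choice of corners the distances equal 4/sqrt(3) times the memberships, which is the proportionality stated above.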


SLIDE 11

RDA posterior assignments

[Figure: barycentric plot of the RDA posterior assignments; corners labeled 1 to 4]


SLIDE 12

SVMlight posterior assignments

[Figure: barycentric plot of the SVMlight posterior assignments; corners labeled 1 to 4]


SLIDE 13

Comparing classifications

RDA: greater posterior probabilities (points on edges and corners)

SVMlight: more uncertainty (points inside simplex)

➜ measure these features for comparison


SLIDE 14

Comparing classifications

derive³

– Correctness rate: 1 - error rate
– Accuracy: distance to ‘true’ corner
– Ability to separate: distance to classified corner
– Confidence: mean membership of assigned class (either by class or average)

³ Garczarek, U. and Weihs, C. (2003): Standardizing the Comparison of Partitions. Computational Statistics 18, 143-162.
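Two of these measures follow directly from their descriptions and can be sketched in base R. The posterior rows are the five shown on the earlier posterior slide; the true classes in `truth` are hypothetical, and this is not the code of `ucpm`. Accuracy and ability to separate are left out because their exact distance scaling is not given here.

```r
# Posterior rows copied from the earlier RDA output; the true classes are hypothetical
lev  <- c("up", "utp", "dn", "ltp")
post <- rbind(c(0.000, 0.000, 0.978, 0.022),
              c(0.001, 0.000, 0.995, 0.005),
              c(0.077, 0.000, 0.151, 0.772),
              c(0.249, 0.000, 0.000, 0.750),
              c(0.256, 0.000, 0.005, 0.739))
colnames(post) <- lev
truth <- factor(c("dn", "dn", "up", "ltp", "ltp"), levels = lev)

# Each observation is assigned to the class with the largest membership
assigned   <- factor(lev[max.col(post, ties.method = "first")], levels = lev)
membership <- post[cbind(seq_len(nrow(post)), as.integer(assigned))]

CR <- mean(assigned == truth)                      # correctness rate = 1 - error rate
CF <- mean(membership)                             # confidence, averaged over observations
CF_by_class <- tapply(membership, assigned, mean)  # confidence per assigned class
```

Classes that are never assigned ("up" and "utp" here) get `NA` in `CF_by_class`.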


SLIDE 15

> ucpm(m=rdapred$posterior, tc=B3$PHASEN[test])
$CR
[1] 0.5833333

$AC
[1] 0.3250307

$AS
[1] 0.981954

$CF
[1] 0.9889456

$CFvec
        1         2         3         4
0.9912088 1.0000000 0.9999684 0.9511723


SLIDE 16

Comparing classifications

                                                      LDA   RDA   SVM
Correctness rate (1 - error rate)                    0.44  0.58  0.54
Accuracy (distance to true corner)                   0.03  0.33  0.17
Ability to separate (distance to classified corner)  0.75  0.98  0.29
Confidence (mean membership of assigned class)       0.83  0.99  0.47


SLIDE 17

Variable selection

stepclass: stepwise selection using (estimated) misclassification rate

– forward selection: add variables to model
– backward selection: throw variables out
– or both directions

works for most classification methods
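The forward direction can be sketched as a greedy loop. This is an illustration under stated assumptions, not the `stepclass` implementation: it uses `lda` from the MASS package with leave-one-out predictions as the error estimate, the iris data instead of B3, and the helpers `loo_error` and `forward_select` as well as the `improvement` threshold value are hypothetical choices.

```r
library(MASS)  # provides lda(); a recommended package shipped with R

# Leave-one-out misclassification rate of LDA on a subset of variables
loo_error <- function(vars, data, class) {
  fit <- lda(data[, vars, drop = FALSE], grouping = class, CV = TRUE)
  mean(fit$class != class)
}

# Greedy forward selection: add the variable that lowers the estimated error
# most; stop when no candidate improves it by more than `improvement`
forward_select <- function(data, class, improvement = 0.01) {
  selected <- character(0)
  best <- 1  # error rate before any variable is in the model
  repeat {
    candidates <- setdiff(names(data), selected)
    if (length(candidates) == 0) break
    errs <- vapply(candidates,
                   function(v) loo_error(c(selected, v), data, class),
                   numeric(1))
    if (best - min(errs) <= improvement) break
    best <- min(errs)
    selected <- c(selected, names(which.min(errs)))
  }
  list(model = selected, error = best)
}

sel <- forward_select(iris[, 1:4], iris$Species)
print(sel)
```

Backward selection would start from the full variable set and drop the variable whose removal lowers the estimated error most, using the same stopping rule.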

SLIDE 18

Variable selection

example:

> x <- stepclass(PHASEN~., data=B3[train,],
+     method="qda", prior=rep(1/4,4))
> x
method      : qda
final model : EWAJW, LSTKJW, ZINSLR
error rate  : 0.3265

error rate for test set is 29% (71% correct)

SLIDE 19

Visualization of partitionings

how are classes located / separated?
look at partitioning for every pair of variables...

> partimat(B3[,x$model$name], B3[,"PHASEN"],
+     method="qda", plot.matrix=TRUE)


SLIDE 20

[Figure: partimat plot matrix (method "qda") over the pairs of EWAJW, LSTKJW and ZINSLR; observations are drawn as class numbers 1 to 4. Each pair appears in two mirrored panels; the panel error rates are 0.287, 0.459 and 0.42.]


SLIDE 21

Visualization of data structure

EDAM: Eight Directions Arranged Map⁴
similar to Self Organizing Maps
observations (and gaps between) are arranged on a 2D-grid in order to reflect distances

example:

> lcEDAM <- EDAM(B3[test,-1], classes=B3$PHASEN[test],
+     standardize = TRUE, iter.max = 20)

⁴ Raabe, N. (2003): Vergleich von Kohonen Self-Organizing-Maps mit einem nichtsimultanen Klassifikations- und Visualisierungsverfahren. Diploma Thesis, University of Dortmund.


SLIDE 22

[Figure: EDAM map of the test data on an 8 x 6 grid (Dimension 1 by Dimension 2); legend: upswing, upper turning point, downswing, lower turning point]


SLIDE 23

[Figure: EDAM map on an 8 x 6 grid (Dimension 1 by Dimension 2) with each cell labeled by its quarter, from 1982,2 to 1994,1; legend: upswing, upper turning point, downswing, lower turning point]


SLIDE 24

[Figure: EDAM map on an 8 x 6 grid with quarter labels (1982,2 to 1994,1) and the phase legend, shown alongside a plot labeled 1 to 4]


SLIDE 25

> require(klaR)

seen:

– Classification tools: rda, svmlight
– Comparing classifications: errormatrix, ucpm
– 3D barycentric plots: quadplot
– Variable selection: stepclass
– Illustrating classifications: partimat
– Data visualization: EDAM

further features:

– 2D barycentric plots: triplot
– Hidden Markov Modelling: hmm.sop
– Simple k-Nearest Neighbour: sknn
