

SLIDE 1

A Decision Tree for Interval-valued Data with Modal Dependent Variable

Djamal Seck 1, Lynne Billard 2, Edwin Diday 3 and Filipe Afonso 4

1 Département de Mathématiques et Informatique, Université Cheikh Anta Diop de Dakar, Senegal, djamal.seck@ucad.edu.sn
2 Department of Statistics, University of Georgia, Athens GA 30605, USA, lynne@stat.uga.edu
3 CEREMADE, University of Paris Dauphine, 75775 Paris Cedex 16, France, edwin.diday@ceremade.dauphine.fr
4 Syrokko, Aéropôle de Roissy, Bât. Aéronef, 5 rue de Copenhague, 95731 Roissy Charles de Gaulle Cedex, France, afonso@syrokko.com

COMPSTAT - August 2010

SLIDE 2

The Future

Schweizer (1985): "Distributions are the numbers of the future."

SLIDES 3-6

Types of Data

Classical data value X:
  • A single point in p-dimensional space.
  • E.g., X = 17, X = 2.1, X = blue.

Symbolic data value Y:
  • A hypercube, or a Cartesian product of distributions, in p-dimensional space.
  • I.e., Y has a list, interval, or modal structure.
  • Modal data: histogram, empirical distribution function, probability distribution, model, ...
  • Weights: relative frequencies, capacities, credibilities, necessities, possibilities, ...

SLIDE 7

Symbolic Data

How do symbolic data arise?

  1. Aggregated data, by classes or groups; the research interest is in the classes or groups.
  2. Natural symbolic data. E.g., pulse rate: 64 ± 2 = [62, 66]; daily temperature: [55, 67].
  3. Published data, e.g., census data.
  4. Symbolic data proper: range, list, distribution, etc.

SLIDES 8-9

Literature Review

  1. Clustering for classical data: CART, Breiman et al. (1984).
  2. Clustering for symbolic data:
     • Agglomerative algorithm and dissimilarity measures for non-modal categorical and interval-valued data: Gowda and Diday (1991)
     • Pyramid clustering: Brito (1991, 1994), Brito and Diday (1990)
     • Spatial pyramids: Raoul Mohamed (2009)
     • Divisive monothetic algorithm for intervals: Chavent (1998, 2000)
     • Divisive algorithms for histograms: Kim (2009)
     • Decision trees for non-modal dependent variables: Périnel (1996, 1999), Limam (2005), Winsberg et al. (2006), ...
     • Decision tree for interval data and a modal dependent variable (STREE): Seck (2010), a CART methodology for symbolic data.

SLIDE 10

The Data

We have observations $\Omega = \{\omega_1, \ldots, \omega_n\}$, where $\omega_i$ has realization $Y_i = (Y_{i1}, \ldots, Y_{ip})$, $i = 1, \ldots, n$.

  • Modal multinomial (modal categorical): $Y_{ij} = \{m_{ijk}, p_{ijk};\ k = 1, \ldots, s_i\}$ with $\sum_{k=1}^{s_i} p_{ijk} = 1$ and $m_{ijk} \in O_j = \{m_{j1}, \ldots, m_{js}\}$, $j = 1, \ldots, p$, $i = 1, \ldots, n$. (Take $s_i = s$, w.l.g.)
  • Multi-valued (non-modal): $Y_{ij} = \{m_{ijk},\ k = 1, \ldots, s_i\}$, i.e., $p_{ijk} = 1/s$ or $0$, with $m_{ijk} \in O_j$.
  • Intervals: $Y_i = ([a_{i1}, b_{i1}], \ldots, [a_{ip}, b_{ip}])$, with $a_{ij}, b_{ij} \in \mathcal{R}_j$.
  • Nominal (classical categorical): special case of the modal multinomial with $s_i = 1$, $p_{ij1} = 1$; write $Y_{ij} \equiv m_{ij1} = \delta_{ij}$, $\delta_{ij} \in O_j$.
  • Classical continuous variable: special case of an interval with $Y_{ij} = [a_{ij}, a_{ij}]$ for $a_{ij} \in \mathcal{R}_j$.
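A minimal sketch of how one such symbolic realization can be held in code; this is illustrative only, and the class and field names are mine, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class SymbolicObservation:
    """One concept: a modal categorical value (category -> probability)
    plus p interval-valued variables [a_ij, b_ij]."""
    species: dict[str, float]             # modal categorical; weights sum to 1
    intervals: list[tuple[float, float]]  # one [a_ij, b_ij] pair per variable j

# Example mirroring concept ω12 of Table 1 (shown later in the talk):
w12 = SymbolicObservation(
    species={"versicolor": 0.9, "virginica": 0.1},
    intervals=[(4.9, 5.7), (2.5, 3.0), (4.1, 4.5), (1.2, 1.7)],
)
assert abs(sum(w12.species.values()) - 1.0) < 1e-9  # the p_ijk sum to 1
```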

SLIDES 11-12

STREE Algorithm

At the r-th stage we have the partition $P_r = (C_1, \ldots, C_r)$.

  • Discrimination criterion D(N): explains the partition of node N, as in a CART analysis.
  • Homogeneity criterion H(N): the inertia associated with the explanatory variables, as in a pure hierarchy tree analysis.

We take the mixture, for $\alpha \ge 0$, $\beta \ge 0$ with $\alpha + \beta = 1$ (the pure cases $\alpha = 0$ and $\alpha = 1$ appear below),
$$I = \alpha D(N) + \beta H(N).$$

D(N) is taken as the Gini measure (as in CART):
$$D(N) = \sum_{i \ne f} p_i p_f = 1 - \sum_{i=1}^{r} p_i^2,$$
with $p_i = n_i/n$, $n_i = \mathrm{card}(N \cap C_i)$, $n = \mathrm{card}(N)$. H(N) is
$$H(N) = \sum_{\omega_{i_1} \in \Omega} \sum_{\omega_{i_2} \in \Omega} \frac{p_{i_1} p_{i_2}}{2\mu}\, d^2(\omega_{i_1}, \omega_{i_2}),$$
where $d(\omega_{i_1}, \omega_{i_2})$ is a distance measure between $\omega_{i_1}$ and $\omega_{i_2}$, $p_i$ is the weight associated with $\omega_i$, and $\mu = \sum_{i=1}^{N} p_i$.

Select the partition C = {C1, C2} for which the reduction in I is greatest; i.e., maximize $\Delta I = I(C) - I(C_1, C_2)$.
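The two criteria transcribe directly into code. A minimal sketch, with all names mine: `dist` is any pairwise distance (next slides), `weights` holds the $p_i$, and the mixture follows $I = \alpha D(N) + \beta H(N)$ with $\beta = 1 - \alpha$; the talk's pure trees are the extreme settings α = 0 and α = 1.

```python
import itertools

def gini(class_counts: list[int]) -> float:
    """Gini measure D(N) = 1 - sum_i p_i^2, with p_i = n_i / n."""
    n = sum(class_counts)
    return 1.0 - sum((ni / n) ** 2 for ni in class_counts)

def inertia(node: list, dist, weights: dict) -> float:
    """H(N): sum over pairs of p_i1 * p_i2 / (2 mu) * d^2(i1, i2),
    where mu is the total weight of the node."""
    mu = sum(weights[i] for i in node)
    return sum(weights[i1] * weights[i2] * dist(i1, i2) ** 2 / (2.0 * mu)
               for i1, i2 in itertools.product(node, node))

def mixed_criterion(node, class_counts, dist, weights, alpha: float) -> float:
    """I = alpha * D(N) + (1 - alpha) * H(N)."""
    return alpha * gini(class_counts) + (1.0 - alpha) * inertia(node, dist, weights)
```

A candidate split is then scored by ΔI = I(C) − I(C1, C2) over all cut points, and the split with the largest reduction wins.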

SLIDES 13-15

Decision Tree - Distance Measures

The homogeneity criterion H(N) is
$$H(N) = \sum_{\omega_{i_1} \in \Omega} \sum_{\omega_{i_2} \in \Omega} \frac{p_{i_1} p_{i_2}}{2\mu}\, d^2(\omega_{i_1}, \omega_{i_2}),$$
where $d(\omega_{i_1}, \omega_{i_2})$ is a distance measure between $\omega_{i_1}$ and $\omega_{i_2}$, $p_i$ is the weight associated with $\omega_i$, and $\mu = \sum_{i=1}^{N} p_i$.

The STREE algorithm uses:

  • Modal categorical variables, L1 distance: $d_j(\omega_{i_1}, \omega_{i_2}) = \sum_{k \in O_j} |p_{i_1jk} - p_{i_2jk}|$; or, L2 distance: $d_j(\omega_{i_1}, \omega_{i_2}) = \sum_{k \in O_j} (p_{i_1jk} - p_{i_2jk})^2$.
  • Interval variables, Hausdorff distance: $d_j(\omega_{i_1}, \omega_{i_2}) = \max(|a_{i_1j} - a_{i_2j}|,\ |b_{i_1j} - b_{i_2j}|)$.
  • Classical categorical variables, (0, 1) distance: $d_j(\omega_{i_1}, \omega_{i_2}) = 0$ if $m_{i_1j} = m_{i_2j}$, and $1$ if $m_{i_1j} \neq m_{i_2j}$.
  • Classical continuous variables, Euclidean distance: $d_j(\omega_{i_1}, \omega_{i_2}) = \sqrt{(a_{i_1j} - a_{i_2j})^2}$.

Hence, $d(\omega_{i_1}, \omega_{i_2}) = \sum_{j=1}^{p} d_j(\omega_{i_1}, \omega_{i_2})$.
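A minimal sketch of these per-variable distances, assuming modal values are stored as dicts (category → probability) and intervals as (a, b) pairs; the function names are mine, not from STREE:

```python
def l1_modal(p1: dict, p2: dict) -> float:
    """L1 distance between two modal categorical values."""
    return sum(abs(p1.get(k, 0.0) - p2.get(k, 0.0)) for k in set(p1) | set(p2))

def hausdorff(iv1: tuple, iv2: tuple) -> float:
    """Hausdorff distance between intervals [a1, b1] and [a2, b2]."""
    (a1, b1), (a2, b2) = iv1, iv2
    return max(abs(a1 - a2), abs(b1 - b2))

def zero_one(m1, m2) -> float:
    """(0, 1) distance for classical categorical values."""
    return 0.0 if m1 == m2 else 1.0

def total_distance(x1: list, x2: list) -> float:
    """d(w1, w2) = sum_j d_j(w1, w2); here over interval variables."""
    return sum(hausdorff(i1, i2) for i1, i2 in zip(x1, x2))
```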

SLIDES 16-18

Decision Tree - Cut Points

Cut points, modal categorical case. Recall $Y_{ij} = \{m_{ijk}, p_{ijk};\ k = 1, \ldots, s_i\}$ with $\sum_{k=1}^{s_i} p_{ijk} = 1$ and $m_{ijk} \in O_j = \{m_{j1}, \ldots, m_{js}\}$, $j = 1, \ldots, p$, $i = 1, \ldots, n$. (Take $s_i = s$, w.l.g.)

First: for each k in turn, order the $p_{ijk}$ from smallest to largest. There are $L_k \le n$ distinct values $p_{jkr}$, $r = 1, \ldots, L_k$. The cut points for this modality $m_{jk}$ are then the probabilities
$$c_{jkr} = (p_{jkr} + p_{jk,r+1})/2, \qquad r = 1, \ldots, L_k - 1,\ k = 1, \ldots, s.$$
There are $\sum_{k=1}^{s} (L_k - 1)$ possible partitions for each j.

Similarly, take pairs $(m_{ijk_1}, m_{ijk_2})$ with probability $p_{ijk_1} + p_{ijk_2} = p_{ijk_1k_2}$, and repeat the previous process using these probabilities $p_{ijk_1k_2}$, for the $L_{k_1k_2}$ distinct probabilities among the $s(s-1)/2$ possible pairs.

Likewise, take sets of three, four, ..., $(s - 1)$ of the s values $m_{ijk}$, $k = 1, \ldots, s$, in $O_j$. The total number of possible cut points is L, and it can be shown that
$$\max L = (n - 1) \sum_{q=1}^{s-1} \binom{s}{q} = (n - 1)\, 2(2^{s-1} - 1).$$
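This enumeration (singletons, pairs, and so on up to subsets of size s − 1) can be sketched as follows; the function and its argument layout are my own illustration, assuming each observation's probabilities for variable j are held in a dict:

```python
from itertools import combinations

def modal_cut_points(prob_rows: list[dict], categories: list[str]) -> list:
    """For one modal variable j: for every non-empty proper subset S of the
    categories, pool the aggregated probabilities sum_{k in S} p_ijk across
    observations, sort the distinct values, and cut at midpoints."""
    cuts = []
    for q in range(1, len(categories)):              # subset sizes 1 .. s-1
        for subset in combinations(categories, q):
            probs = sorted({sum(row.get(k, 0.0) for k in subset)
                            for row in prob_rows})
            cuts += [(subset, (lo + hi) / 2.0)
                     for lo, hi in zip(probs, probs[1:])]
    return cuts
```

With s = 3 species there are 2(2^{3-1} − 1) = 6 non-empty proper subsets, hence at most 6(n − 1) candidate cuts, matching the max L bound on the slide.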

SLIDE 19

Decision Tree - Cut Points

Cut points, interval case. Recall $Y_i = ([a_{i1}, b_{i1}], \ldots, [a_{ip}, b_{ip}])$, with $a_{ij}, b_{ij} \in \mathcal{R}_j$, $j = 1, \ldots, p$, $i = 1, \ldots, n$.

First: for each j, let $D_j = \{d_{jr}, r = 1, \ldots, L\}$ be the set of the n $a_{ij}$ and n $b_{ij}$ values, ordered from smallest to largest. Thus, e.g., $d_{j1} = \min_{i \in \Omega}(a_{ij})$ and $d_{jL} = \max_{i \in \Omega}(b_{ij})$, $j = 1, \ldots, p$. There are $L \le 2n$ distinct values $d_{jr}$. The cut points are
$$c_{jr} = (d_{jr} + d_{j,r+1})/2, \qquad r = 1, \ldots, L - 1.$$
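Under the same illustrative representation as before, the interval cut points are just midpoints between consecutive pooled endpoints:

```python
def interval_cut_points(intervals: list[tuple[float, float]]) -> list[float]:
    """Pool all endpoints a_ij, b_ij of one variable, sort the distinct
    values (L <= 2n of them), and cut midway between neighbours."""
    d = sorted({v for a, b in intervals for v in (a, b)})
    return [(lo + hi) / 2.0 for lo, hi in zip(d, d[1:])]

# E.g., three sepal-length intervals:
interval_cut_points([(4.8, 5.4), (4.5, 4.5), (4.9, 5.7)])
# -> [4.65, 4.85, 5.15, 5.55]
```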

SLIDES 20-23

Decision Tree - Example

Fisher (1936): IRIS dataset; 150 observations, 50 for each of the species setosa, versicolor, virginica. Y1 = Sepal Length, Y2 = Sepal Width, Y3 = Petal Length, Y4 = Petal Width. The 150 observations were clustered into 30 sets of observations by the k-means clustering method.

Table 1: Fisher's Iris Data as Intervals

Concept  Species (a)      Sepal length  Sepal width  Petal length  Petal width
ω1       {1, 1.0}         [4.8, 5.4]    [3.3, 3.8]   [1.5, 1.9]    [0.2, 0.6]
...
ω4       {1, 1.0}         [4.5, 4.5]    [2.3, 2.3]   [1.3, 1.3]    [0.3, 0.3]
...
ω12      {2, .9; 3, .1}   [4.9, 5.7]    [2.5, 3.0]   [4.1, 4.5]    [1.2, 1.7]
...
ω30      {2, 1.0}         [6.2, 6.3]    [2.2, 2.3]   [4.4, 4.5]    [1.3, 1.5]

(a) Species identified by 1, 2, 3 for setosa, versicolor, virginica, respectively.

Species is modal categorical data: $Y_u = \{y_k, p_k;\ k = 1, \ldots, s_u\}$, $u = 1, \ldots, m$.
Y1, Y2, Y3, Y4 are interval data: $Y_{uj} = [a_{uj}, b_{uj}]$, $j = 1, \ldots, p$, $u = 1, \ldots, m$.
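For concreteness, one way such interval-valued concepts can be built with scikit-learn; a sketch under my own assumptions (the talk does not give the k-means settings, so this will not reproduce Table 1 exactly):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
labels = KMeans(n_clusters=30, n_init=10, random_state=0).fit_predict(iris.data)

concepts = []
for c in range(30):
    mask = labels == c
    # Interval [a_uj, b_uj] per variable j: the min/max within the cluster.
    intervals = [(iris.data[mask, j].min(), iris.data[mask, j].max())
                 for j in range(4)]
    # Modal species value: relative frequency of each species in the cluster.
    counts = np.bincount(iris.target[mask], minlength=3)
    concepts.append((counts / counts.sum(), intervals))
```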

SLIDE 24

Decision Tree

Pure Decision Tree on the 30 IRIS intervals: α = 0.
[Tree diagram, nodes 1-7; leaves labelled by species: setosa, versicolor, virginica.]

SLIDE 25

Decision Tree

Pure CART Tree on the original 150 IRIS observations: α = 0.
[Tree diagram, nodes 1-8; leaves labelled by species: setosa, versicolor, virginica.]

SLIDE 26

Decision Tree

Pure DIV Tree on the 30 IRIS intervals: α = 1.
[Tree diagram, nodes 1-8; leaves labelled by species: setosa, versicolor, virginica.]

SLIDE 27

Decision Tree

Pure DIV Tree on the original 150 IRIS observations: α = 1.
[Tree diagram, nodes 1-8; leaves labelled by species: setosa, versicolor, virginica.]

SLIDE 28

Decision Tree

Trees on the 30 IRIS intervals:
[Two tree diagrams: Pure Decision Tree (α = 0), nodes 1-7; Pure DIV Tree (α = 1), nodes 1-8.]

SLIDE 29

Decision Tree

Trees on the original 150 IRIS observations:
[Two tree diagrams: Pure Decision Tree (α = 0), nodes 1-8; Pure DIV Tree (α = 1), nodes 1-8.]

SLIDE 30

Decision Tree

Comparison of the STREE and CART algorithms. The 150 observations are randomly divided into a Training subset (size n1) and a Test subset (size n2), with n1 + n2 = n = 150, for several sets of (n1, n2). The protocol, sketched in code below, is:

  1. For CART: run the CART algorithm on the n1 observations in the Training subset. For STREE: first find 30 clusters from the n1 observations in the Training subset, then run the decision tree analysis (α = 0) on the 30 clusters.
  2. Test the tree on the n2 observations in the Test subset.
  3. Obtain the number of misclassifications.
  4. Repeat 10 times for each (n1, n2).
  5. Calculate the average number of misclassifications for each (n1, n2) and for each algorithm.
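A minimal sketch of the CART arm of this protocol, using scikit-learn's DecisionTreeClassifier as the CART implementation; the STREE arm is not reproduced here, and the (n1, n2) values are illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for n1 in (50, 75, 100):                    # training sizes; n2 = 150 - n1
    errors = []
    for rep in range(10):                   # repeat 10 times per (n1, n2)
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, train_size=n1, random_state=rep, stratify=y)
        tree = DecisionTreeClassifier(random_state=rep).fit(Xtr, ytr)
        errors.append(int((tree.predict(Xte) != yte).sum()))
    print(f"n1={n1}, n2={150 - n1}: mean misclassifications {np.mean(errors):.1f}")
```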

SLIDE 31

Decision Tree

Comparison of STREE and CART: average misclassifications for the Test subsets (n2); n1 = size of Training subset, n2 = size of Test subset.
[Results table.]

SLIDES 32-33

~ Merci Bien ~ ~ Thank You ~

Partial support from the National Science Foundation is gratefully acknowledged.