Analysis of sorting data using multiple correspondence analysis and - - PowerPoint PPT Presentation

SLIDE 1

Analysis of sorting data using multiple correspondence analysis and a related method

E.M. Qannari, Ph. Courcoux, V. Cariou

ONIRIS, Nantes, F-44322, France

SLIDE 2

Sorting data: Procedure

n stimuli evaluated by m subjects:

“Please sort the stimuli into as many groups as you consider necessary, with the understanding that stimuli in the same group are perceived as similar.”

[Figure: example partitions of the stimuli by Subject 1, Subject 2, ..., Subject m, with group labels such as Acid, Salty, Fresh, Bitter, Sweet]

SLIDE 3

General setting and notations

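  • m categorical variables, one per subject, represented by their indicator variables
  • Xj (j = 1, ..., m): n × Kj matrix of the Kj group indicators of subject j
  • The blocks X1, X2, ..., Xj, ..., Xm have K1, K2, ..., Kj, ..., Km columns and share the same n rows (the stimuli)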
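As an illustration of this notation, the sketch below turns each subject's partition into an indicator block Xj and places the blocks side by side to form X = [X1, ..., Xm]. The partitions and the function name `indicator_matrix` are hypothetical, chosen only for the example; centering of the columns, which later slides assume, is left out here.

```python
def indicator_matrix(labels):
    """Return the n x K indicator block of one partition (one column per group)."""
    groups = sorted(set(labels))
    return [[1 if lab == g else 0 for g in groups] for lab in labels]

# Hypothetical partitions of n = 5 stimuli by m = 2 subjects
partitions = [
    ["a", "a", "b", "c", "b"],   # subject 1 forms K1 = 3 groups
    [1, 2, 1, 1, 2],             # subject 2 forms K2 = 2 groups
]

blocks = [indicator_matrix(p) for p in partitions]

# Concatenate the blocks row by row: X = [X1, X2], an n x (K1 + K2) table
n = len(partitions[0])
X = [sum((blk[i] for blk in blocks), []) for i in range(n)]

for row in X:
    print(row)
```

Each row of X contains exactly m ones, one per subject, since every stimulus belongs to exactly one group of each partition.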

SLIDE 4

Beer data

Data from Abdi H., Chollet S., Valentin D. and Chréa C. (2007). Analysing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Quality and Preference.

SLIDE 5

Data from Abdi et al. (2007)

  • The data relate to an experiment where ten consumers were instructed to sort eight commercial beers.

#  Beer             Subj1  Subj2  Subj3  Subj4  Subj5  Subj6  Subj7  Subj8  Subj9  Subj10
1  Affligen             1      4      3      4      1      1      2      2      1       3
2  Budweiser            4      5      2      5      2      3      1      1      4       3
3  BucklerBlonde        3      1      2      3      2      4      3      1      1       2
4  Killian              4      2      3      3      1      1      1      2      1       4
5  StLandelin           1      5      3      5      2      1      1      2      1       3
6  BucklerHighland      2      3      1      1      3      5      4      4      3       1
7  FruitDefendu         1      4      3      4      1      1      2      2      2       4
8  EKU28                5      2      4      2      4      2      5      3      4       5

SLIDE 6

Discrimination indices and MCA

  • Given a (quantitative) variable z and a (categorical) variable Xj, the discrimination index η²(z/j) is the between-groups to total variance ratio associated with z and Xj.
  • We seek z so as to maximize:

$$I(z) = \sum_{j=1}^{m} \eta^2(z/j)$$

  • It is known that this problem leads to MCA.
  • Subsequent z variables (factors) are sought following the same strategy, under orthogonality constraints.
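To make the criterion concrete, here is a minimal sketch that computes the discrimination index of a score vector z for one partition and sums it over the subjects. The scores and partitions are hypothetical toy values and `discrimination_index` is an illustrative name; the search for the z that maximizes the criterion (which is what MCA does) is not shown.

```python
def discrimination_index(z, labels):
    """Between-groups to total variance ratio of z for one partition."""
    n = len(z)
    mean = sum(z) / n
    total = sum((v - mean) ** 2 for v in z)
    between = 0.0
    for g in set(labels):
        members = [v for v, lab in zip(z, labels) if lab == g]
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - mean) ** 2
    return between / total

# Hypothetical scores for n = 4 stimuli and partitions from m = 2 subjects
z = [1.0, 2.0, 3.0, 4.0]
partitions = [[1, 1, 2, 2], [1, 2, 1, 2]]

indices = [discrimination_index(z, p) for p in partitions]
criterion = sum(indices)   # the quantity I(z) to be maximized over z
print(indices, criterion)
```

The first partition separates low from high scores and gets an index of about 0.8, while the second mixes them and gets about 0.2, so the criterion rewards a z that agrees with as many partitions as possible.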

SLIDE 7

Standardized MCA

  • Alternatively, we seek z so as to maximize:

$$I(z) = \sum_{j=1}^{m} \frac{1}{K_j}\,\eta^2(z/j)$$

SLIDE 8

MCA applied to beer data

[Figure: representation of the beers on axes 1 & 2 and on axes 3 & 4. Points: Affligen, Budweiser, Buckler Blonde, Killian, St Landelin, Buckler Highland, Fruit Defendu, EKU28]

SLIDE 9

Alternative method: maximizing the between groups variances

  • X = [X1, X2, …, Xm] (the indicator variables are assumed to be centered)
  • Let z = Xu and denote by B(z/j) the between-groups variance of z with respect to Xj.
  • We define the total between-groups variance as:

$$B(z) = \sum_{j=1}^{m} B(z/j)$$

SLIDE 10

An alternative method to MCA

  • We can show that the vector of loadings u is an eigenvector (associated with the largest eigenvalue) of the matrix:

$$\sum_{j=1}^{m} X^T X_j \left(X_j^T X_j\right)^{-1} X_j^T X = X^T P X \quad \text{with} \quad P = \sum_{j=1}^{m} X_j \left(X_j^T X_j\right)^{-1} X_j^T$$

  • Subsequent z variables can be sought following the same strategy, under orthogonality constraints.

SLIDE 11

The rationale behind the method of analysis

  • In addition to investigating the relationships between the categorical variables, we take account of the variances of the indicator variables.
  • VAR(Indicator) = p(1-p), where p is the proportion of stimuli in the category.

[Figure: variance of an indicator variable, p(1-p), plotted against p, annotated "Presence of rare categories"]
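A small check of the variance formula, as a sketch: for n = 8 stimuli (the size of the beer data), a group containing a single stimulus has p = 1/8 and a much smaller indicator variance than a group containing half of the stimuli. The function names are illustrative.

```python
def indicator_variance(p):
    """Variance of a 0/1 indicator whose category has proportion p."""
    return p * (1 - p)

def empirical_variance(x):
    """Population variance (divisor n) of a list of numbers."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

n = 8
for size in (1, 2, 4):
    indicator = [1] * size + [0] * (n - size)
    p = size / n
    # The formula and the direct computation agree
    print(size, indicator_variance(p), empirical_variance(indicator))
```

Rare categories therefore contribute columns with small variance, which is the feature the alternative method takes into account.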

SLIDE 12

Alternative method applied to beer data

[Figure: representation of the beers on axes 1 & 2 and on axes 3 & 4. Points: Affligen, Budweiser, Buckler Blonde, Killian, St Landelin, Buckler Highland, Fruit Defendu, EKU28]

SLIDE 13

A continuum approach

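  • MCA: z = Xu with u eigenvector of:

$$\left(X^T X\right)^{-1} X^T P X$$

  • Alternative method: z = Xu with u eigenvector of:

$$X^T P X$$

  • Regularized MCA: z = Xu with u eigenvector of:

$$\left[(1-\lambda)\,X^T X + \lambda I\right]^{-1} X^T P X$$

With 0 ≤ λ ≤ 1, λ = 0 gives MCA and λ = 1 gives the alternative method.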

SLIDE 14

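Continuum approach and ridge regularization

The eigenvectors of:

$$\left[(1-\lambda)\,X^T X + \lambda I\right]^{-1} X^T P X$$

are also eigenvectors of:

$$\left[X^T X + k I\right]^{-1} X^T P X \quad \text{with} \quad k = \frac{\lambda}{1-\lambda}$$

which corresponds to ridge regularization.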
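The step behind this equivalence can be written out explicitly. The short derivation below assumes 0 ≤ λ < 1, so that k = λ/(1-λ) is finite:

```latex
\begin{align*}
(1-\lambda)\,X^T X + \lambda I
  &= (1-\lambda)\left[X^T X + k I\right],
  \qquad k = \frac{\lambda}{1-\lambda},\\[4pt]
\left[(1-\lambda)\,X^T X + \lambda I\right]^{-1} X^T P X
  &= \frac{1}{1-\lambda}\,\left[X^T X + k I\right]^{-1} X^T P X .
\end{align*}
```

The two matrices differ only by the positive factor 1/(1-λ), so they have the same eigenvectors, and the eigenvalues are simply rescaled by that factor.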

SLIDE 15

RMCA (lambda=0.95)

[Figure: representation of the products (beers) on axes 1 & 2 and on axes 3 & 4. Points: Affligen, Budweiser, Buckler Blonde, Killian, St Landelin, Buckler Highland, Fruit Defendu, EKU28]

SLIDE 16

Property 1 illustrated on beer data

The variance of z increases with λ.

[Figure: variance of z plotted against λ, with the MCA and Alternative end points labelled]

SLIDE 17

Property 2 illustrated on beer data

The between-groups variance of z increases with λ.

[Figure: between-groups variance of z plotted against λ, with the MCA and Alternative end points labelled]

SLIDE 18

Property 3 illustrated on beer data

The discrimination index (between-groups to total variance ratio) of z decreases with λ.

[Figure: discrimination index of z plotted against lambda (0.0 to 1.0), with the MCA and Alternative end points labelled]
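The three properties can be checked numerically on the beer data. The sketch below is not the authors' code and rests on several assumptions: it uses NumPy, takes the regularized matrix to be [(1-λ) X'X + λI]^(-1) X'PX, replaces (Xj'Xj)^(-1) by a pseudo-inverse because centered indicator blocks are rank deficient, normalizes u to unit length (the slides do not state the normalization), and restricts λ to (0, 1] so that the regularized matrix is invertible. Only the first axis is computed, and the function names are illustrative.

```python
import numpy as np

# Sorting data from the beer table (rows: beers, columns: subjects 1 to 10)
beer = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],
])

def centered_indicators(labels):
    """Centered indicator block Xj for one subject's partition."""
    groups = np.unique(labels)
    Xj = (labels[:, None] == groups[None, :]).astype(float)
    return Xj - Xj.mean(axis=0)

blocks = [centered_indicators(beer[:, j]) for j in range(beer.shape[1])]
X = np.hstack(blocks)
# P = sum_j Xj (Xj'Xj)^+ Xj' (pseudo-inverse: centered blocks are rank deficient)
P = sum(Xj @ np.linalg.pinv(Xj.T @ Xj) @ Xj.T for Xj in blocks)
S = X.T @ X          # X'X
A = X.T @ P @ X      # X'PX

def first_loading(lam):
    """Unit-norm leading eigenvector of [(1-lam) X'X + lam I]^(-1) X'PX, 0 < lam <= 1."""
    M = (1 - lam) * S + lam * np.eye(S.shape[0])   # positive definite for lam > 0
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    C = L_inv @ A @ L_inv.T                        # symmetric form of the same problem
    _, V = np.linalg.eigh(C)                       # eigenvalues in ascending order
    u = L_inv.T @ V[:, -1]                         # back-transform the top eigenvector
    return u / np.linalg.norm(u)

lams = np.linspace(0.05, 1.0, 20)
loadings = [first_loading(lam) for lam in lams]
var_z = np.array([u @ S @ u for u in loadings])      # proportional to the variance of z = Xu
between_z = np.array([u @ A @ u for u in loadings])  # proportional to the between-groups variance
ratio = between_z / var_z                            # sum over subjects of the discrimination indices

for lam, v, b, r in zip(lams, var_z, between_z, ratio):
    print(f"lambda={lam:.2f}  var={v:.3f}  between={b:.3f}  ratio={r:.3f}")
```

Working with the symmetric matrix C rather than the non-symmetric product keeps the eigenvalues real and lets `numpy.linalg.eigh` be used; λ = 1 reproduces the alternative method, and small λ approaches MCA.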

SLIDE 19

Conclusion

  • Proposal of an alternative method that handles the problem of rare categories.
  • Further research is needed to investigate this alternative method.
  • Proposal of a continuum approach whose end points are MCA and the alternative method.
  • This approach enjoys interesting properties and can easily be extended to the framework of Generalized Canonical Correlation Analysis.
  • See how it relates to Regularized MCA by Takane and Hwang.

SLIDE 20

TRUGAREZ! (Thank you!)

SLIDE 21

Co-occurrence matrix

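Beers (rows) × Beers (columns); each entry counts the subjects who placed the two beers in the same group.

      1   2   3   4   5   6   7   8
1    10   1   1   5   6   0   8   0
2     1  10   3   2   5   0   0   1
3     1   3  10   2   2   0   0   0
4     5   2   2  10   5   0   5   1
5     6   5   2   5  10   0   4   0
6     0   0   0   0   0  10   0   0
7     8   0   0   5   4   0  10   0
8     0   1   0   1   0   0   0  10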
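The co-occurrence matrix can be recomputed from the sorting data of Abdi et al. (2007) shown earlier. A minimal sketch in plain Python, with variable names chosen for the example: entry (i, k) counts the subjects who gave beers i and k the same group label.

```python
beers = ["Affligen", "Budweiser", "BucklerBlonde", "Killian",
         "StLandelin", "BucklerHighland", "FruitDefendu", "EKU28"]

# Group labels given by subjects 1 to 10 (one row per beer)
sorts = [
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],
]

n = len(sorts)
# cooc[i][k] = number of subjects who put beers i and k in the same group
cooc = [[sum(a == b for a, b in zip(sorts[i], sorts[k])) for k in range(n)]
        for i in range(n)]

for name, row in zip(beers, cooc):
    print(f"{name:16s}", row)
```

The diagonal equals the number of subjects (10), and the row of all off-diagonal zeros for BucklerHighland shows that no subject grouped it with any other beer.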