SLIDE 1

Descriptive and Exploratory Methods

Léon Bottou

largely copied from Mireille Summa-Gettler's lectures (in French)

COS 424 – 3/23/2010

SLIDE 2

Agenda

Goals
– Classification, clustering, regression, other.
Representation
– Parametric vs. kernels vs. nonparametric
– Probabilistic vs. nonprobabilistic
– Linear vs. nonlinear
– Deep vs. shallow
Capacity Control
– Explicit: architecture, feature selection
– Explicit: regularization, priors
– Implicit: approximate optimization
– Implicit: Bayesian averaging, ensembles
Operational Considerations
– Loss functions
– Budget constraints
– Online vs. offline
Computational Considerations
– Exact algorithms for small datasets.
– Stochastic algorithms for big datasets.
– Parallel algorithms.

Today’s topic fits poorly in this picture.

SLIDE 3

Introduction

Predictive methods
– Construct models using examples (the training set).
– Hope that they work well in future situations (e.g. on a testing set).
Descriptive methods
– Describe the distribution of examples.
– Investigate the geometry of the data.
– Hope to acquire insights about the underlying phenomenon.

SLIDE 4

A catalog of descriptive methods

Clustering methods
– K-means, K-medoids, Gaussian mixtures...
– Hierarchical clustering...
Projection methods
– Principal component analysis (PCA) [Hotelling, 30s]
– Correspondence analysis (CA) [Benzécri, 60s]
– Multiple correspondence analysis (MCA)
– Canonical correlation analysis (CCA), ...

Embedding methods
– Kernel PCA
– Locally linear embedding (LLE)
– ISOMAP

SLIDE 5

I. Principal Component Analysis

SLIDE 6

Sparkling water springs

Observations
– 21 sparkling water springs in France.
Continuous variables
– 8 ion concentrations (calcium, magnesium, ...)
– Price per liter.
Categorical variables
– Total minerality (low, medium, high)
– Compliance with regulations (yes, no)
– Region (Alps, Auvergne, Languedoc, ...)

SLIDE 7

Sparkling water springs

[Data table, unreadable in this transcript: the 21 springs and their measurements. The eight ion concentrations are the active variables; the remaining columns are supplementary variables.]

SLIDE 8

Elementary planes

Pairwise graphs are not informative

SLIDE 9

Approximate a data cloud by its projection

High dimensional cloud. Low dimensional projection.

SLIDE 10

Some projections are more informative

The main idea of PCA is the determination of a good projection.

SLIDE 11

One data table, two data clouds

SLIDE 12

PCA projection of the 21 rows

SLIDE 13

PCA projection of the 8 columns

SLIDE 14

Summary

Principal component analysis
– Table of n observations represented by p continuous variables.
– Cloud of n row-points (observations) in dimension p.
– Cloud of p column-points (variables) in dimension n.
– Search for the "best" projection of each cloud.
Interpretation
– Identify similar observations.
– Identify similar variables.
Best projection?

SLIDE 15

Distance

Distances
– A good projection reveals whether two points were close or distant.
– We would like to use the convenient Euclidean distance.
– Variables often have very different numerical ranges.

Correlation PCA
– Normalize the mean and standard deviation of each variable: x_ij = (z_ij − z̄_j) / σ_j.
– This is the default and this is what we discuss today.
Covariance PCA
– Normalize the mean of each variable, x_ij = z_ij − z̄_j, but not the standard deviation.
– This is sometimes useful.
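The two normalizations can be sketched in a few lines of NumPy; the small matrix Z below is a hypothetical stand-in for a table of n observations by p continuous variables.

```python
import numpy as np

# Hypothetical data table Z: n observations (rows) x p continuous variables (columns).
Z = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 620.0]])

# Correlation PCA (the default): center each variable and divide by its
# standard deviation, so variables with large numerical ranges do not dominate.
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# Covariance PCA: center only, keep the original scales.
X_cov = Z - Z.mean(axis=0)
```

After this step every column of X has mean 0 and standard deviation 1, which is what the distance arguments on the next slides assume.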

SLIDE 16

Normed centered data

[Data table, unreadable in this transcript: the springs table after centering and normalizing each variable.]

SLIDE 17

Preserve distances

Projection contracts the distances

PCA criterion: maximize the sum of squared projected distances

max_H  Σ_{i=1}^n Σ_{i'=1}^n d²_H(i, i')
SLIDE 18

Maximize dispersion

Observe

Σ_{i,i'} (x_i − x_{i'})² = Σ_{i,i'} ((x_i − x̄) − (x_{i'} − x̄))² = ··· = 2n Σ_i (x_i − x̄)² = 2n² Var(x).

Equivalent PCA criterion: maximize dispersion
– Maximize the average squared distance to the cloud mean G.

max_H  (1/n) Σ_{i=1}^n d²_H(i, G)

Equivalent PCA criterion: maximize variance
– Maximize the variance of the projected points.

SLIDE 19

First factorial axis

– Pick a unit vector u.
– Project the x_i on the line with direction u.
– Find u that maximizes the dispersion.

max_u  C(u) = Σ_{i=1}^n (u⊤x_i − u⊤x̄)²   subject to   u⊤u = 1

The constraint means that u lives on the unit sphere. At the optimum u*, the gradient of the dispersion must be orthogonal to the surface of the sphere; otherwise we would be able to find a better solution by slightly moving u* along the projection of the gradient. Therefore there exists a "Lagrange multiplier" λ such that

dC(u*)/du − λ u* = 0.

This leads to the necessary condition

( (1/n) Σ_i (x_i − x̄)(x_i − x̄)⊤ ) u* = λ u*.

Therefore u* must be an eigenvector of the covariance matrix. The best one is associated with the largest eigenvalue. The orientation is arbitrary!
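This eigenvector condition can be checked numerically. Below is a sketch with a synthetic toy cloud (my own stand-in data): the direction of maximal dispersion comes out as the eigenvector of the covariance matrix with the largest eigenvalue, up to an arbitrary sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cloud: strong spread along one direction, then rotated by 45 degrees,
# so the true first factorial axis is (1, 1)/sqrt(2) up to sign.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(500, 2)) * np.array([3.0, 0.3])) @ R.T

# Covariance matrix of the centered cloud.
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / len(Xc)

# First factorial axis: eigenvector with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
u1 = eigvecs[:, -1]                        # orientation is arbitrary!
```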

SLIDE 20

Successive factorial axes

First factorial axis
– Direction with maximal dispersion.
– Eigenvector of the covariance matrix Σ with the highest eigenvalue.
Second factorial axis
– Direction with maximal dispersion orthogonal to the first axis.
– Eigenvector of Σ with the second highest eigenvalue.
Third factorial axis
– Direction with maximal dispersion orthogonal to the first two axes.
– Eigenvector of Σ with the third highest eigenvalue.
etc.
Basis change
– The basis formed by the p variables is replaced by the basis formed by the p principal axes.

SLIDE 21

Factorial coordinates of the rows

[Table, unreadable in this transcript: the factorial coordinates ψ_αi of the 21 springs.]

SLIDE 22

First factorial plane

[Figure, unreadable in this transcript: the 21 springs plotted in the first factorial plane.]

SLIDE 23

Approximate reconstruction of distances

Distance computed with the first two axes:

d²(Arcens, Arvie) ≈ (−0.14 − 2.02)² + (1.41 + 1.81)² = 15.03

Distance computed with the first three axes:

d²(Arcens, Arvie) ≈ (−0.14 − 2.02)² + (1.41 + 1.81)² + (−0.04 − 2.02)² = 19.28

Distance computed with all eight axes:

d²(Arcens, Arvie) = (−0.14 − 2.02)² + ··· + (0.01 − 0.07)² = 21.93

Same as the distance computed on the normed centered data.

SLIDE 24

Factorial axes and factors

Two views of a factorial axis α
– Factorial axis unit vector u_α.
– Factor ψ_αi = Σ_{j=1}^p u_αj x_ij.
– E(ψ_α) = 0,  Var(ψ_α) = λ_α.

[Table, unreadable in this transcript: the factors Ψ of the springs data.]
SLIDE 25

Reconstruction

– On the factorial axis α:

Σ_{i=1}^n Σ_{i'=1}^n d²_α(i, i') = 2n Σ_{i=1}^n d²_α(i, G) = 2n Σ_{i=1}^n ψ²_αi = 2n² Var(ψ_α) = 2n² λ_α

– In the full space R^p:

Σ_{i=1}^n Σ_{i'=1}^n d²(i, i') = Σ_{α=1}^p Σ_{i=1}^n Σ_{i'=1}^n d²_α(i, i') = 2n² Σ_{α=1}^p λ_α = 2n² p

– Percent reconstruction on the first axis: λ₁ / p.
– Percent reconstruction on the first two axes: (λ₁ + λ₂) / p.
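For correlation PCA the eigenvalues of the correlation matrix sum to p, so the percent-reconstruction formulas are one line each. A sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 4))
Z[:, 1] += 2.0 * Z[:, 0]          # correlate two variables so the first axis dominates

# Correlation PCA: eigenvalues of the correlation matrix of the normed data.
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)
lam = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]   # descending lambda_alpha

p = Z.shape[1]
pct_axis1 = lam[0] / p                 # percent reconstruction on the first axis
pct_plane = (lam[0] + lam[1]) / p      # percent reconstruction on the first plane
```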

SLIDE 26

Computation of the PCA

Singular Value Decomposition of X = V D U⊤
– X: n × p data matrix of rank r.
– V: n × r orthogonal matrix.
– D: r × r diagonal matrix with elements (... √λ_α ...).
– U: p × r orthogonal matrix.
Row PCA
– Σ_row = X⊤X = U D V⊤ V D U⊤ = U D² U⊤.
– The unit vectors u_α are the columns of U.
– The factors ψ_α are the columns of X U = V D U⊤U = V D.
Column PCA
– Σ_col = X X⊤ = V D U⊤U D V⊤ = V D² V⊤.
– The unit vectors v_α are the columns of V.
– The factors ϕ_α are the columns of X⊤V = U D V⊤V = U D.
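This computation can be sketched directly with NumPy's thin SVD (synthetic data standing in for the 21 × 8 normed table); the row factors come out as the columns of V D and the column factors as the columns of U D.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(21, 8))                # hypothetical stand-in data
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)    # normed centered table

# Thin SVD: X = V D U^T with D = diag(sqrt(lambda_alpha)).
V, d, Ut = np.linalg.svd(X, full_matrices=False)
U = Ut.T

psi = X @ U        # row factors: the columns of V D
phi = X.T @ V      # column factors: the columns of U D
```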

SLIDE 27

Transition relations

Relation between row PCA and column PCA
– The following relations can be derived from the SVD equations.

ϕ_α = (1/√λ_α) X⊤ψ_α        ψ_α = (1/√λ_α) X ϕ_α        u_α = (1/√λ_α) ϕ_α        v_α = (1/√λ_α) ψ_α

Proof example
– Let e_α be a basis vector of R^r and write

X ϕ_α = (V D U⊤)(U D e_α) = V D D e_α = √λ_α V D e_α = √λ_α ψ_α.

SLIDE 28

Summary

Normalization
– Correlation PCA: normalize mean and sdev (the default).
– Covariance PCA: normalize mean only (sometimes).
PCA is a change of basis
– First factorial axis: direction with maximal dispersion.
– First factorial plane: plane with maximal dispersion.
Factor α
– Coordinates ψ_αi of all the observations on the axis α.
Dispersion
– Dispersion of the cloud on the first principal axis = variance of the first factor.
– Dispersion of the cloud on the first principal plane = sum of the variances of the first and second factors.

SLIDE 29

Contributions

Contribution of an observation to an axis
– How much does an observation contribute to the definition of the axis?

CTR_α(i) = ψ²_αi / Σ_{i'=1}^n ψ²_αi'

Contribution of a variable to an axis
– How much does a variable contribute to the definition of the axis?
– Same thing using the PCA of the column points.

CTR_α(j) = ϕ²_αj / Σ_{j'=1}^p ϕ²_αj'

Both types of contributions help interpret the axes.
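A minimal sketch of the contribution formula, with a hypothetical factor vector for one axis; the contributions are fractions that sum to one.

```python
import numpy as np

# Hypothetical coordinates psi_alpha,i of five observations on axis alpha.
psi_alpha = np.array([0.5, -2.0, 0.1, 1.0, -0.3])

# CTR_alpha(i): squared coordinate as a fraction of the axis dispersion.
ctr = psi_alpha**2 / np.sum(psi_alpha**2)
```

The same formula applied to the column factors ϕ gives the variable contributions.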

SLIDE 30

Contributions of observations

[Table, unreadable in this transcript: the contributions CTR of each spring.]

– St Yorre and Vichy-Célestins contribute most to the first axis.
– Quézac, San Pellegrino, and Puits St-Georges contribute most to the second axis.

SLIDE 31

Contributions of variables

[Table, unreadable in this transcript: the contributions of each variable.]

– Bicarbonates, Fluorures, Sodium, and Potassium contribute most to the first axis.
– Calcium and Magnesium contribute most to the second axis.

SLIDE 32

Simultaneously plotting rows and columns

Projecting the original axes
– Project the points e_j = (..., 0, 1, 0, 0, ...), i.e. the tips of the corresponding unit vectors.

ψ_α,[j] = Σ_{j'=1}^p 1{j = j'} u_αj' = u_αj

– Relation with the column PCA:

ψ_α,[j] = u_αj = (1/√λ_α) ϕ_αj

Correlation between variables and factorial axes
– The projected original axes reveal, up to a factor √λ_α, the correlation between the factorial axes and the original variables:

Correl(x_·j, ψ_α) / √λ_α = Covar(x_·j, ψ_α) / λ_α = e⊤_j Σ u_α / λ_α = e⊤_j λ_α u_α / λ_α = u_αj = ψ_α,[j]

SLIDE 33

Simultaneously plotting rows and columns

– This particular plot uses a different scale for the points and the axes! The circle indicates the location of the "unit" sphere.
– Compare with the column PCA (slide 13).

SLIDE 34

Continuous supplementary variables

[Table, unreadable in this transcript: the springs table with price as a supplementary variable.]

Price axis
– Use the correlation formula:

ψ_α,[price] = Correl(price, ψ_α) / √λ_α

Relation with the unit sphere:

Σ_{α=1}^p ψ²_α,[j] = 1        Σ_{α=1}^p ψ²_α,[price] < 1

SLIDE 35

Continuous supplementary variables

Price

– The price of a sparkling spring water bears little relation to its mineral content.

SLIDE 36

Categorical supplementary variables

[Figure residue, unreadable in this transcript.]

Partition the observations in groups
– e.g. one group per region β.

Per-category barycenter:

ψ̄_α,[β] = (1/n_β) Σ_{i∈β} ψ_αi

Per-category variance ellipsoid:

Σ_[β] = (1/n_β) Σ_{i∈β} (ψ_i − ψ̄_[β])(ψ_i − ψ̄_[β])⊤

Use the central limit theorem to ascertain whether a group effect is significant!

SLIDE 37

Groups are often more interesting

Rhône-Alpes, Auvergne

– The barycenter of the six "Rhône-Alpes" springs is close to the origin.
– The barycenter of the five "Auvergne" springs is high on the first axis.
SLIDE 38

Supplementary observations

What about Champagne?

Calcium        86 mg/l    (Jos et al., Talanta 63, 2004)
Magnesium      83 mg/l    (Jos et al., Talanta 63, 2004)
Potassium     339 mg/l    (Jos et al., Talanta 63, 2004)
Bicarbonate  1229 mg/l    (estimated: mean bicarbonates in water)
Sulfates       80 mg/l    (estimated: typical sulfites in wine)
Fluorures     1.5 mg/l    (estimated: mean fluorures in water)
Sodium         10 mg/l    (Jos et al., Talanta 63, 2004)
Nitrates        2 mg/l    (estimated: mean nitrates in spring water)

Changes of basis

[Figure, unreadable in this transcript: Champagne expressed in the factorial basis Ψ.]

SLIDE 39

Supplementary observations

Price

Champagne

– The price of Champagne is not related to its mineral content (surprise!)
– How do we know if the Champagne point is meaningful?

SLIDE 40

Squared cosines

Quality of the projection on a factorial axis

QLT_α(i) = cos²(a) = ψ²_αi / d²(i, G) = ψ²_αi / Σ_{α'=1}^p ψ²_α'i

Quality of the projection on a factorial plane
– Squared cosines add up:

QLT_αα'(i) = (ψ²_αi + ψ²_α'i) / d²(i, G) = QLT_α(i) + QLT_α'(i)

SLIDE 41

Squared cosines

[Table, unreadable in this transcript: the squared cosines of each spring.]

– Pyrénées, Quézac, St-Yorre, Vernière, and Vichy-Célestins are well represented in the first factorial plane.
– Champagne is far behind but not the worst.

SLIDE 42

Squared cosines

Champagne


SLIDE 43

How many axes?

For visualisation
– Always investigate at least the first three factorial axes.
– Plot multiple factorial planes (1×2, 2×3, 1×3, ...)
Heuristics
– Search for the "elbow" in the plot of decreasing eigenvalues.
– Discard axes with eigenvalue smaller than 1.
Stability
– Evaluated with bootstrap procedures.

SLIDE 44

How many axes?

Percentages of variance can be misleading
– Two examples with the same information but different noise levels: the original data, and the same data with 10 additional random columns.

SLIDE 45

II. Correspondence Analysis

SLIDE 46

Hair and eye colors

We know, for 592 English women,
– the color of their eyes,
– the color of their hair.

[Contingency table [x_ij], unreadable in this transcript.]

Marginal totals:

x_i• = Σ_{j=1}^p x_ij        x_•j = Σ_{i=1}^n x_ij        x_•• = Σ_{i=1}^n Σ_{j=1}^p x_ij

SLIDE 47

Row and column profiles

Rows and columns are best described by their histograms.

Row profiles

r_ij = x_ij / x_i•        m_i = x_i• / x_••        c_j = x_•j / x_••

Column profiles
– Column profile of column j: the histogram x_ij / x_•j, i = 1 ... n, with the same masses m_i and c_j.

SLIDE 48

Row and column profiles

Row profiles

r_ij = x_ij / x_i•        m_i = x_i• / x_••        c_j = x_•j / x_••

Remarks
– The "mass" m_i indicates the relative importance of each row.
– The "average row profile" is not the plain mean of the row profiles.
– The "average row profile" is the weighted mean of the row profiles:

(1/n) Σ_{i=1}^n r_ij ≠ c_j        Σ_{i=1}^n m_i r_ij = Σ_{i=1}^n (x_i• / x_••)(x_ij / x_i•) = c_j

– PCA on the row profiles?
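These definitions can be sketched on a small hypothetical contingency table; the weighted mean of the row profiles recovers the average row profile c_j, while the plain mean generally does not.

```python
import numpy as np

# Hypothetical contingency table of counts.
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])

x_i = X.sum(axis=1)        # row totals x_i.
x_tot = X.sum()            # grand total x_..

R = X / x_i[:, None]       # row profiles r_ij (each row is a histogram)
m = x_i / x_tot            # row masses m_i
c = X.sum(axis=0) / x_tot  # column masses c_j = average row profile
```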

SLIDE 49

Centering the columns

Subtracting the average row profile

r_ij − c_j = x_ij / x_i• − x_•j / x_••

– This is smarter than subtracting the column average: aggregating two identical rows does not change the result.
– This table is centered only if we take the masses into account. From now on we must always take the masses into account, for instance when computing covariances.

SLIDE 50

Rescaling the columns (1)

Standard deviation of the columns?
– The standard deviation of a column is a bad measure: the difference between 43% and 44% is a small difference, while the difference between 1% and 2% is a big difference.
The binomial argument
– The x_ij count events whose probability is roughly c_j.
– The standard deviation of the r_ij would then be √(c_j (1 − c_j)).
– The c_j are usually well below one because Σ_j c_j = 1.
– Conclusion: divide the columns by √c_j.
– This will all make more sense later...

SLIDE 51

Rescaling the columns (2)

Normalized row profiles

y_ij = (r_ij − c_j) / √c_j = (1/√c_j) (x_ij / x_i• − x_•j / x_••)

Euclidean distance between two normalized rows:

d²(i, i') = Σ_{j=1}^p (r_ij − r_i'j)² / c_j = Σ_{j=1}^p (x_•• / x_•j) (x_ij / x_i• − x_i'j / x_i'•)²

– This is called the χ² distance.
– This will all make more sense later...
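The χ² distance is a one-liner once the profiles and the column masses are in hand; the table below is hypothetical.

```python
import numpy as np

# Hypothetical contingency table.
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0],
              [ 5.0, 10.0, 85.0]])

R = X / X.sum(axis=1)[:, None]   # row profiles
c = X.sum(axis=0) / X.sum()      # column masses

def chi2_dist2(i, i2):
    """Squared chi-squared distance between the profiles of rows i and i2."""
    return float(np.sum((R[i] - R[i2])**2 / c))
```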

SLIDE 52

Principal component analysis

– Compute the covariance matrix (weighted by the masses).
– Diagonalize and project on the first two axes.
– This does not seem very useful...

SLIDE 53

Principal component analysis

– Compute the covariance matrix (weighted by the masses).
– Diagonalize and project on the first two axes.
– This does not seem very useful...

[Figure, unreadable in this transcript.]

SLIDE 54

Placing the columns in the row space

SLIDE 55

Placing the columns in the row space

SLIDE 56

Placing the columns in the row space

SLIDE 57

Placing the rows in the column space

– Same thing with the columns, including centering and rescaling.
– Same shapes, different scale...

SLIDE 58

Simultaneous representation

SLIDE 59

Summary

[Diagram, unreadable in this transcript: an n × p contingency table yields a cloud G_p of p column-points in R^n and a cloud G_n of n row-points in R^p; each cloud is centered and rescaled (quite differently!) before the simultaneous representation of rows and columns.]

SLIDE 60

Duality of the row and column analysis

Standard PCA
– Center and rescale the columns (mean and sdev).
– Diagonalize the row covariance of the normalized table.
– Diagonalize the column covariance of the same normalized table.
– Dual representations arise from the properties of diagonalization.
Correspondence Analysis
– Center and rescale the columns, diagonalize the row covariance.
– Center and rescale the rows, diagonalize the column covariance.
– Why do we get a dual representation?

SLIDE 61

Duality of the row and column analysis

Weighted covariance
– We diagonalize the weighted covariance matrix Σ = Y⊤ D_m Y, where Y is the normalized row profile matrix and D_m = diag(m_1 ... m_n).
– We can write Σ = Z⊤Z with Z = D_m^{1/2} Y.

Divergence matrix Z for the row analysis:

z_ij = √(m_i / c_j) (x_ij / x_i• − x_•j / x_••) = √(x_i• / x_•j) (x_ij / x_i• − x_•j / x_••) = (x_ij − x_i• x_•j / x_••) / √(x_i• x_•j) = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

Divergence matrix Z for the column analysis:

z_ij = √(c_j / m_i) (x_ij / x_•j − x_i• / x_••) = √(x_•j / x_i•) (x_ij / x_•j − x_i• / x_••) = (x_ij − x_i• x_•j / x_••) / √(x_i• x_•j) = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

The dual representation exists because this is the same matrix.

SLIDE 62

The χ2 test of independence

– The real data table: x_ij.
– The theoretical data table assuming independence: x_•• × m_i c_j.
– The inertia

I = Σ_ij z²_ij = Σ_ij (1 / (m_i c_j)) (x_ij / x_•• − m_i c_j)²

measures how dependent the rows and columns are; up to the factor x_••, it is the χ² statistic of the independence test.
– Correspondence analysis finds the axes that best display this dependence!

SLIDE 63

The numerical recipe

Compute the divergence matrix:

z_ij = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

Singular value decomposition:

Z = V D U⊤    with    D = diag(√λ_α)

Compute the factors:

ψ = D_m^{−1/2} V D        ϕ = D_c^{−1/2} U D

Transition relations:

ψ_αi = (1/√λ_α) Σ_{j=1}^p (x_ij / x_i•) ϕ_αj        ϕ_αj = (1/√λ_α) Σ_{i=1}^n (x_ij / x_•j) ψ_αi
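The whole recipe fits in a few NumPy lines; the contingency table below is hypothetical, and the factors are scaled exactly as in the formulas above.

```python
import numpy as np

# Hypothetical contingency table (e.g. hair color x eye color counts).
X = np.array([[20.0,  5.0,  2.0],
              [10.0, 30.0,  8.0],
              [ 3.0, 12.0, 40.0]])

P = X / X.sum()                  # x_ij / x_..
m = P.sum(axis=1)                # row masses m_i
c = P.sum(axis=0)                # column masses c_j

# Divergence matrix z_ij = (x_ij/x_.. - m_i c_j) / sqrt(m_i c_j).
Z = (P - np.outer(m, c)) / np.sqrt(np.outer(m, c))

# SVD: Z = V D U^T with D = diag(sqrt(lambda_alpha)).
V, d, Ut = np.linalg.svd(Z, full_matrices=False)

# Factors psi = D_m^{-1/2} V D and phi = D_c^{-1/2} U D.
psi = V * d / np.sqrt(m)[:, None]
phi = Ut.T * d / np.sqrt(c)[:, None]
```

The total inertia is Σ_α λ_α = Σ_ij z²_ij, and the transition relations hold on the non-trivial axes (the smallest singular value of Z is always zero).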

SLIDE 64

III. Multiple Correspondence Analysis

SLIDE 65

Polling n subjects with a questionnaire

[Figure, unreadable in this transcript: the questionnaire answers encoded as an indicator table, one row per subject and one 0/1 column per answer modality.]
SLIDE 66

Burt table

Multiple contingency table

SLIDE 67

Encoding the Burt table

[Figure, unreadable in this transcript: how the Burt table is built from the questionnaire blocks.]

SLIDE 68

Multiple correspondence analysis

SLIDE 69

Transition relations for MCA

ϕ_αj = (1/√λ_α) × (mean of the coordinates ψ_αi of the subjects who chose modality j)

SLIDE 70

Transition relations for MCA

ψ_αi = (1/√λ_α) × (mean of the coordinates ϕ_αj of the modalities selected by subject i)

SLIDE 71

Essential properties

1. A modality is the mean of the subjects that selected it, up to the √λ_α coefficient.
2. The weighted barycenter of the modalities of a question is the origin.

SLIDE 72

Inertia

Matrix X contains only zeroes and ones:

x_i• = Q (the number of questions),    x_•• = nQ,    m_i = 1/n,    c_j = x_•j / (nQ)

Inertia for modality j:

I_j = Σ_i (c_j / m_i) (x_ij / x_•j − m_i)² = (1/Q) (1 − x_•j / n).

The inertia increases when few subjects pick the modality.

Inertia for variable q:

I_q = Σ_{j∈q} I_j = (1/Q) Σ_{j∈q} (1 − x_•j / n) = (p_q − 1) / Q,

where p_q is the number of modalities for the question. The inertia increases when the variable has many modalities.

Total inertia:

I = Σ_q I_q = p/Q − 1.
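These inertia formulas can be checked on a tiny hypothetical disjunctive table (n = 6 subjects, Q = 2 questions with 3 and 2 modalities, so p = 5 columns):

```python
import numpy as np

# Hypothetical 0/1 disjunctive table: each row has exactly Q = 2 ones.
X = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1]], dtype=float)

n, p = X.shape
Q = 2

P = X / X.sum()                  # x_ij / x_.. with x_.. = nQ
m = P.sum(axis=1)                # row masses, all equal to 1/n
c = P.sum(axis=0)                # column masses c_j = x_.j / (nQ)

# Divergence matrix as in correspondence analysis.
Z = (P - np.outer(m, c)) / np.sqrt(np.outer(m, c))

# Per-modality inertias I_j; their sum is the total inertia p/Q - 1.
I_j = (Z**2).sum(axis=0)
```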

SLIDE 73

Practical consequences

Tricks that improve the MCA results

The inertia of a modality increases when few subjects pick it.
⇒ Aggregate rare modalities, or make them supplementary.
⇒ Bin continuous variables using quantiles.

The inertia of a question increases when it has many possible answers.
⇒ Balance the number of modalities across variables.

SLIDE 74

Equivalences for the case of two variables

SLIDE 75

Supplementary elements

[Figure, unreadable in this transcript: the data table split into active elements and supplementary elements (rows of supplementary subjects; columns of supplementary categorical and continuous variables), all projected into the factorial plane (F1, F2).]
SLIDE 76

IV. Application example: Semiometrie

SLIDE 77

Semiometry

Introduced by J.-F. Steiner in the 70s, semiometry is the use of words to describe lifestyles and values. [Photo: Ludovic Lebart.]

SLIDE 78

Semiometry

The basic idea is to insert in a marketing questionnaire a series of questions consisting uniquely of words.

Questionnaires in 5 languages

FRENCH          ENGLISH       GERMAN             SPANISH        ITALIAN
l'absolu        absolute      absolut            el absoluto    l'assoluto
l'acharnement   persistence   hartnaeckig        el empeno      l'accanimento
acheter         to buy        kaufen             comprar        comprare
admirer         to admire     bewundern          admirar        ammirare
adorer          to love       anbeten            adorar         adorare
l'ambition      ambition      der ehrgeiz        la ambicion    l'ambizione
l'âme           soul          die seele          el alma        l'anima
l'amitié        friendship    die freundschaft   la amistad     l'amicizia
l'angoisse      anguish       die angst          la angustia    l'angoscia
un animal       animal        ein tier           un animal      un animale
un arbre        tree          ein baum           un arbol       un albero
l'argent        silver        das geld           el dinero      il denaro
une armure      armour        die ruestung       una armadura   un'armatura
l'art           art           die kunst          el arte        l'arte

SLIDE 79

Semiometry

The subjects must rate these words on a seven-level scale,
– from most disagreeable or unpleasant
– to most agreeable or pleasant.

[Facsimile of a questionnaire, not reproduced in this transcript.]

SLIDE 80

Principal component analysis

– The first axis is not interesting: it just orders words from "bad" to "good", and we already know which words are "bad" or "good".
– The next five axes are highly meaningful. They are robust across studies, across languages, and across countries.

SLIDE 81

Semiometric plane (2,3)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 3 opposes Attachment and Detachment.]

SLIDE 82

Semiometric plane (2,4)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 4 opposes Mind and Matter.]

SLIDE 83

Semiometric plane (2,5)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 5 opposes Heart and Reason.]

SLIDE 84

Semiometric plane (2,6)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 6 opposes Humility and Sovereignty.]

SLIDE 85

How is this useful?

Politics
Plot groups of voters as supplementary variables:
– those who say they vote for you;
– those who say they are undecided;
– those who say they'll never vote for you;
– those who say they vote for someone else...
Determine a target population of voters to convert. Read which keywords make them tick...
Your competitors are also using the same methods. Your competitors are tracking your moves. The semiotic space is a political chess board.

SLIDE 86

How is this useful?

Marketing
Plot groups of customers as supplementary variables:
– those who buy your product again and again;
– those who buy your product sometimes;
– those who buy your competitors' products;
– etc.
Determine a target population of customers to convince. Read how to advertise most effectively...
Your competitors are also using the same methods. Your competitors are tracking your moves. The semiotic space is a marketing chess board.
