slide-1
SLIDE 1

“THANK YOU FOR BEING HERE”

http://www.ted.com/talks/john_francis_walks_the_earth.html

slide-2
SLIDE 2

Welcome to the 6th CARME conference

HMFA ;)

slide-3
SLIDE 3

The applied mathematics department

Jérôme PAGES: Professor, Head of the laboratory

Marine CADORET: Associate Professor (fixed-term contract)

David CAUSEUR: Professor

Thibaut DUTRION: Research Engineer

Magalie HOUEE: Research Engineer

François HUSSON: Associate Professor

Julie JOSSE: Associate Professor

Sébastien LÊ: Associate Professor

Marie VERBANCK: PhD student

Elisabeth LENAULD, Karine BAGORY: Secretaries

slide-4
SLIDE 4

TODAY’S TUTORIALS

The tutorials of the day

slide-5
SLIDE 5

Tutorials

• Lê Sébastien: From one to multiple data tables with FactoMineR

• Dray Stéphane: Multivariate analysis of ecological data with ade4

• Mair Patrick, de Leeuw Jan: Multidimensional scaling using majorization with smacof

• Nenadic Oleg, Greenacre Michael: Correspondence analysis with ca

slide-6
SLIDE 6

What is common to all those presentations?

slide-7
SLIDE 7

I READ THE NEWS TODAY OH BOY…

The Beatles, “A Day in the Life”

slide-8
SLIDE 8

Data Analysts Captivated by R’s Power (The New York Times, January 6, 2009)

• “The popularity of R at universities could threaten SAS Institute (…)”

slide-9
SLIDE 9

R You Ready for R? (The New York Times, January 8, 2009)

• “Intel Capital has placed the number of R users at 1 million, and Revolution kicks the estimate all the way up to 2 million.”

slide-10
SLIDE 10

What is R?

• R is a free software environment for statistical computing and graphics. (http://www.r-project.org/)

• R is a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. (http://cran.r-project.org/)

• R was designed by Ross Ihaka and Robert Gentleman.

slide-11
SLIDE 11

How did R become a must in a decade?

• Its economic model: it’s free.

• Mr. Ihaka said: “We could have chosen to be commercial, and we would have sold five copies of the software.”

slide-12
SLIDE 12

How did R become a must in a decade?

• The snowball effect: a figurative term for a process that starts from an initial state of small significance and builds upon itself, becoming larger (graver, more serious), and perhaps potentially dangerous or disastrous (a vicious circle, a "spiral of decline"), though it might be beneficial instead (a virtuous circle) (Wikipedia)

slide-13
SLIDE 13

The snowball effect

Version  Packages  Date (dd/mm/yyyy)
1.3      110       21/06/2001
1.4      129       17/12/2001
1.5      161       12/06/2002
1.6      163
1.7      219       25/05/2003
1.8      273       16/11/2003
1.9      357       05/06/2004
2.0      406       12/10/2004
2.1      548       18/06/2005
2.2      647       16/12/2005
2.3      739       31/05/2006
2.4      911       12/12/2006
2.5      1000      12/04/2007
2.6      1300      16/11/2007
2.7      1495      18/03/2008
2.8      1614      20/10/2008
2.9      1907      17/04/2009
2.10     2008      26/10/2009

slide-14
SLIDE 14

Why create an R package? (P. Rossi)

There are three good reasons:

• Creating an R package forces you to document your code and provide test examples to ensure that it actually works. It will also be much easier to use your code, as documentation will only be a ? command away and all of your functions and shared libraries will be available for use.

• If your goal is to disseminate your research, this is an ideal way of making sure others have access to your work. It will also increase the probability that eventually your work will be correct. You will also learn more about the properties of your research ideas through the experience of others.

• Ease your sense of guilt by giving back something to this amazing community of volunteers!

slide-15
SLIDE 15

The R packages

slide-16
SLIDE 16

The R packages

• There’s an old Bulgarian proverb that says « всеки проблем има R решение »

• In other words, « each problem has its R package »

slide-17
SLIDE 17

The « sudoku » package

slide-18
SLIDE 18

http://factominer.free.fr

Journal of statistical software FactoMineR: an R package for multivariate analysis

The « FactoMineR » package

slide-19
SLIDE 19

FROM MULTIVARIATE TO MULTIPLE TABLES DATA ANALYSIS...AN OVERVIEW

Sébastien Lê and Jérôme Pagès

slide-20
SLIDE 20

Objectives

• To understand what can be expected from multiway data methods

• To understand the motivations and the framework of Canonical Analysis (CA)

• To understand Generalized Canonical Analysis (GCA)

• To be able to position Multiple Factor Analysis (MFA) relative to GCA

slide-21
SLIDE 21

Outline

• What you know

• What you want to do and why you want to do it

• How to do it

• How to do it with R

slide-22
SLIDE 22

The data

• 40 mice

• 2 genotypes (wild, PPARα-deficient)

• 5 diets (dha, efad, lin, ref, tsol)

• 120 genes (expression)

• 21 hepatic fatty acid concentrations

• Thanks to Sandrine Lagarrigue, Genetics department, INRA Rennes, for making the data available
slide-23
SLIDE 23

PPARα

• In the field of molecular biology, the peroxisome proliferator-activated receptors (PPARs) are a group of nuclear receptor proteins that function as transcription factors regulating the expression of genes.

• PPARs play essential roles in the regulation of cellular differentiation, development, and metabolism (carbohydrate, lipid, and protein) of higher organisms.

• PPARα is expressed in liver, kidney, heart, muscle, adipose tissue, and others.

slide-24
SLIDE 24

The diets

• dha: diet rich in fatty acids of the Omega 3 family, particularly docosahexaenoic acid (DHA), based on fish oil;

• efad (Essential Fatty Acid Deficient): diet based on saturated fatty acids only, made from hydrogenated coconut oil;

• lin: diet rich in Omega 3, made from linseed oil;

• ref: reference diet whose Omega 6 and Omega 3 content is adapted to the French population, with seven times more Omega 6 than Omega 3;

• tsol: diet rich in Omega 6, based on sunflower oil.

slide-25
SLIDE 25

Issues

slide-26
SLIDE 26

MY FIRST PCA

with supplementary qualitative variables (and FactoMineR)

slide-27
SLIDE 27

The dataset

X

slide-28
SLIDE 28

The dataset

X: data table, M(n,p)

Σ = (r(i,j)): matrix of the correlations between variables, M(p,p)
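
The two objects named on this slide can be sketched in base R. This is a minimal illustration on simulated data (the mice data are not reproduced here, and the names `X`, `Sigma`, `eig` and `pca` are chosen for the example): a standardized PCA of the n × p table X amounts to the eigendecomposition of its p × p correlation matrix.

```r
# Simulated stand-in for the data table X (n individuals, p variables).
set.seed(1)
n <- 40; p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[, 2] <- X[, 1] + 0.3 * X[, 2]    # make two variables correlated

Sigma <- cor(X)                     # p x p matrix of the r(i, j)
eig   <- eigen(Sigma, symmetric = TRUE)

# Standardized PCA with base R; sdev^2 holds the eigenvalues of Sigma.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

dim(X)        # n x p
dim(Sigma)    # p x p
all.equal(eig$values, pca$sdev^2)
```

The eigenvalues sum to p because each standardized variable contributes an inertia of 1.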

slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34

MY SECOND PCA

with supplementary qualitative and quantitative variables (and FactoMineR)
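
A minimal base R sketch of what "supplementary" means here, on simulated data with made-up labels (`genotype`, `X1`, `X2` are illustrative, not the mice data): only the active variables build the axes, a supplementary category is placed at the barycentre of its individuals, and a supplementary quantitative variable is placed through its correlations with the components.

```r
set.seed(2)
n <- 40
genotype <- factor(rep(c("wt", "ppar"), each = n / 2))          # hypothetical labels
X1 <- matrix(rnorm(n * 6), n, 6) + 1.5 * (genotype == "ppar")   # active variables
X2 <- X1[, 1:2] + matrix(rnorm(n * 2), n, 2)                    # supplementary quantitative

pca    <- prcomp(X1, center = TRUE, scale. = TRUE)   # only X1 builds the axes
scores <- pca$x[, 1:2]

# Supplementary qualitative variable: each category sits at the barycentre
# of the individuals that carry it.
quali_sup <- apply(scores, 2, function(f) tapply(f, genotype, mean))

# Supplementary quantitative variables: their coordinates are their
# correlations with the principal components.
quanti_sup <- cor(X2, scores)

quali_sup
quanti_sup
```

In FactoMineR the same roles are declared through the `quali.sup` and `quanti.sup` arguments of `PCA()`; the sketch above only reproduces the idea with base functions.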

slide-35
SLIDE 35

The dataset(s)

X1 X2

slide-36
SLIDE 36

The dataset(s)

X1 X2

slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41
slide-42
SLIDE 42

The dataset(s)

X1 X2

slide-43
SLIDE 43

The dataset(s)

X1 X2

slide-44
SLIDE 44
slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47
slide-48
SLIDE 48
slide-49
SLIDE 49

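The dataset(s)

X = [X1 X2], with X in M(n,p), X1 in M(n,p1) and X2 in M(n,p2)

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Σ11 in M(p1,p1), Σ12 in M(p1,p2), Σ21 in M(p2,p1), Σ22 in M(p2,p2)

Σ12 (highlighted on the slide)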
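
The block structure can be checked numerically. The sketch below uses simulated tables (the sizes `p1`, `p2` and all variable names are illustrative) and extracts the four blocks of the correlation matrix of the juxtaposed table; Σ12 holds the correlations between the variables of X1 and those of X2.

```r
set.seed(3)
n <- 40; p1 <- 4; p2 <- 3
X1 <- matrix(rnorm(n * p1), n, p1)
X2 <- X1[, 1:p2] + matrix(rnorm(n * p2), n, p2)   # X2 partly related to X1
X  <- cbind(X1, X2)                               # n x (p1 + p2)

Sigma <- cor(X)
i1 <- 1:p1            # columns of X1
i2 <- p1 + 1:p2       # columns of X2

Sigma11 <- Sigma[i1, i1]   # p1 x p1: within X1
Sigma12 <- Sigma[i1, i2]   # p1 x p2: between X1 and X2
Sigma21 <- Sigma[i2, i1]   # p2 x p1: transpose of Sigma12
Sigma22 <- Sigma[i2, i2]   # p2 x p2: within X2

all.equal(Sigma12, cor(X1, X2))
```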

slide-50
SLIDE 50

Close Encounters of the Third Kind

X1 X2

slide-51
SLIDE 51

FROM MULTIVARIATE TO MULTIPLE TABLES DATA ANALYSIS

slide-52
SLIDE 52

Rashomon (1950, Kurosawa)

slide-53
SLIDE 53

Rashomon (1950, Kurosawa)

• The film depicts the rape of a woman and the apparent murder of her husband through the widely differing accounts of four witnesses, including the rapist and, through a medium, the dead man.

• The stories are mutually contradictory, leaving the viewer to determine which, if any, is the truth.

slide-54
SLIDE 54

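Objectives underlying the study of several groups of variables

[Figure: data table with I individuals in rows (1, …, I) and J groups of variables in columns, X1, …, Xj, …, XJ, containing K1, …, Kj, …, KJ variables]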

slide-55
SLIDE 55

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

slide-56
SLIDE 56

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

Weighting of the variables: to balance the influence of each group in a simultaneous analysis

slide-57
SLIDE 57

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

Looking for common factors: to search for factors that are common to the groups of variables

slide-58
SLIDE 58

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

Comparison of factors: to compare the factors of several groups of variables

slide-59
SLIDE 59

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

Overall representation of groups: two groups are all the closer as they induce the same structure

slide-60
SLIDE 60

Objectives underlying the study of several groups of variables

Weighting of the variables

Looking for common factors

Comparison of factors

Overall representation of groups

Superimposed representation

Superimposed representation: an individual is all the more “homogeneous” as its superimposed representations are close to one another

slide-61
SLIDE 61

WEIGHTING OF THE VARIABLES

slide-62
SLIDE 62

On the interest of balancing the influence of each group of variables

By analogy with the individuals: weighting sample surveys, balanced data

Same weight for each variable of a given group

• Number of variables

• Structure of each group

slide-63
SLIDE 63

Reference example

[Figure: in $\mathbb{R}^I$, set 1 (2 variables) and set 2 (3 variables)]

On the interest of balancing the influence of each group of variables

slide-64
SLIDE 64

PCA of the 5 variables, without considering the sets

[Figure: in $\mathbb{R}^I$, set 1 (2 variables), set 2 (3 variables) and the 1st principal component]

Reference example

On the interest of balancing the influence of each group of variables

slide-65
SLIDE 65

Balancing the sets by the total “inertia”

[Figure: in $\mathbb{R}^I$, set 1 (2 variables, weight 0.5 each) and set 2 (3 variables, weight 0.3 each)]

Reference example

On the interest of balancing the influence of each group of variables

slide-66
SLIDE 66

Balancing the sets of variables in MFA

Each variable of the set j is weighted by $1/\lambda_1^j$

$\lambda_1^j$: 1st eigenvalue of the PCA applied to set j.

[Figure: in $\mathbb{R}^I$, set 1 (2 variables, weight 0.5 each) and set 2 (3 variables, weight 1 each)]

Reference example

On the interest of balancing the influence of each group of variables
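
The MFA weighting can be sketched in base R on a simulated version of the reference example. The data and the helper names `first_eig` and `max_inertia` are assumptions made for the illustration; FactoMineR's `MFA()` performs this weighting internally.

```r
set.seed(4)
n  <- 50
z1 <- rnorm(n)
set1 <- cbind(z1 + 0.1 * rnorm(n), z1 + 0.1 * rnorm(n))  # 2 strongly correlated variables
set2 <- matrix(rnorm(n * 3), n, 3)                        # 3 weakly correlated variables

# First eigenvalue of the standardized PCA of a set.
first_eig <- function(K) eigen(cor(K), symmetric = TRUE)$values[1]
lambda1 <- c(first_eig(set1), first_eig(set2))

# Dividing each standardized variable of set j by sqrt(lambda_1^j)
# is the same as giving it the weight 1 / lambda_1^j.
w1 <- scale(set1) / sqrt(lambda1[1])
w2 <- scale(set2) / sqrt(lambda1[2])

# After weighting, the main axis of variability of each set has variance 1.
max_inertia <- function(K) eigen(cov(K), symmetric = TRUE)$values[1]
c(max_inertia(w1), max_inertia(w2))

# Global analysis: unstandardized PCA of the weighted, juxtaposed sets.
global <- prcomp(cbind(w1, w2), center = TRUE, scale. = FALSE)
global$sdev^2
```

With two sets, the first global eigenvalue lies between 1 and 2: no set can build the first axis on its own, and a value near 2 would mean the axis is a main direction of both sets.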

slide-67
SLIDE 67

On the interest of balancing the influence of each group of variables

• For each group, the variance of the main axis of variability is equal to 1

• No group can generate the first global axis all by itself

• A “multidimensional” group will contribute to the construction of more axes than a “one-dimensional” group

• This weighting is a specific characteristic of MFA; it induces many properties described later

slide-68
SLIDE 68

MFA is based on a “factorial analysis” applied to all active sets of variables

Weighted factorial analysis

slide-69
SLIDE 69

De facto: MFA benefits from the transition formulae and from the duality between individuals and variables.

MFA is based on a “factorial analysis” applied to all active sets of variables

Weighted factorial analysis

slide-70
SLIDE 70

Quantitative variables: MFA is based on a weighted PCA

• standardized variables

• unstandardized variables

• mixed

Equivalence: when each set is composed of 1 quantitative variable, MFA = PCA

MFA is based on a “factorial analysis” applied to all active sets of variables

Weighted factorial analysis

slide-71
SLIDE 71

MFA provides:

Firstly: classical results of factorial analysis

For each axis:

• Coordinates, contributions and squared cosines of individuals

• Correlation coefficients between factors and continuous variables

Weighted factorial analysis

MFA is based on a “factorial analysis” applied to all active sets of variables

slide-72
SLIDE 72
slide-73
SLIDE 73
slide-74
SLIDE 74
slide-75
SLIDE 75
slide-76
SLIDE 76

SETTING UP COMMON FACTORS

slide-77
SLIDE 77

Looking for factors common to TWO sets of variables

X1 X2

Reference method: Canonical Analysis (Hotelling, 1936)

slide-78
SLIDE 78

Looking for factors common to TWO sets of variables

• The word “canonical” comes from the Greek κανών / kanôn, which means “ruler”

• The purpose of Canonical Analysis is to find the relationship between two groups of variables

• It works by finding two linear combinations of variables, one for each group, which are most highly correlated

• Hotelling, H. (1936) Relations between two sets of variates. Biometrika, 28, 321-377
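
A minimal sketch of canonical analysis with `cancor()` from base R, on simulated data in which the two sets share one hidden factor (the names `common`, `X1`, `X2`, `u`, `v` are illustrative). It rebuilds the first pair of canonical variables from the returned coefficients and recomputes their correlation.

```r
set.seed(5)
n <- 100
common <- rnorm(n)                                   # hidden common factor
X1 <- cbind(common + rnorm(n), rnorm(n), rnorm(n))   # set 1: 3 variables
X2 <- cbind(rnorm(n), common + rnorm(n))             # set 2: 2 variables

cc <- cancor(X1, X2)

# First pair of canonical variables: one linear combination per set.
u <- scale(X1, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1]
v <- scale(X2, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1]

cc$cor[1]          # first canonical correlation
abs(cor(u, v))     # same value, recomputed from the canonical variables
max(abs(cor(X1, X2)))   # no pair of original variables is more correlated
```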
slide-79
SLIDE 79

Looking for factors common to TWO sets of variables

A factor common to two clouds?

A B C A B C

slide-80
SLIDE 80

Looking for factors common to TWO sets of variables

A factor common to two clouds!

A B C A B C

slide-81
SLIDE 81

Looking for factors common to TWO sets of variables

[Figure: in $\mathbb{R}^I$, the subspace spanned by the variables of set 1 and the subspace spanned by the variables of set 2]

Looking jointly for linear combinations of the variables of sets 1 and 2

slide-82
SLIDE 82

cancor(lip,gen)

Beware: canonical variables

L1 = 0.028 c18.1.n-9 + 0.032 c18.1.n-7 + … + 0.012 c18.3.n-3

G1 = 0.51 PMDCI + 0.63 THIOL + … - 0.30 CYP4A14

slide-83
SLIDE 83

G1 = 0.51 PMDCI + 0.63 THIOL + … + 0.26 LPIN - 0.27 LPIN1 + … - 0.30 CYP4A14

r(LPIN, LPIN1) = 0.97

cancor(lip,gen)

slide-84
SLIDE 84

Looking for factors common to SEVERAL sets of variables

slide-85
SLIDE 85

Generalized Canonical Analysis

X1 X2 XJ

slide-86
SLIDE 86

Generalized Canonical Analysis

X1 X2 XJ

Find $z_s$ such that $\sum_j R^2(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

slide-87
SLIDE 87

Generalized Canonical Analysis

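Generalized canonical analysis (Carroll, 1968)

For each step s:

Firstly: general variable (related to all the sets of variables)

$F_s$: the variable $z$ that maximises $\sum_j R^2(z, K_j)$, where $R^2(z, K_j)$ is the determination coefficient

Secondly: canonical variables (linear combinations of the variables of set j related to the general variable)

$F_s^j$: projection of $F_s$ on $E_j$, the subspace of $\mathbb{R}^I$ spanned by the variables of set j

[Figure: in $\mathbb{R}^I$, the general variable $F_s$, the subspace $E_j$ and the projection $F_s^j$]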
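
One standard way to compute the general variable is as the leading eigenvector of the sum of the orthogonal projectors onto the subspaces $E_j$; the eigenvalue is then the maximised $\sum_j R^2$. The sketch below assumes this formulation and uses simulated sets; the helper `projector` and all data are illustrative.

```r
set.seed(6)
n <- 60
common <- rnorm(n)
sets <- list(
  cbind(common + rnorm(n), rnorm(n)),
  cbind(rnorm(n), common + rnorm(n), rnorm(n)),
  cbind(common + rnorm(n), rnorm(n))
)

# Orthogonal projector onto E_j, the space spanned by the centred variables of set j.
projector <- function(K) {
  Q <- qr.Q(qr(scale(K, center = TRUE, scale = FALSE)))
  tcrossprod(Q)
}
P <- Reduce(`+`, lapply(sets, projector))

eig <- eigen(P, symmetric = TRUE)
z1  <- eig$vectors[, 1]          # first general variable, up to scaling

# R^2 between z1 and each set, obtained by regressing z1 on the set.
r2 <- sapply(sets, function(K) summary(lm(z1 ~ K))$r.squared)

# Canonical variable of set j: projection of the general variable on E_j.
F1_j <- lapply(sets, function(K) projector(K) %*% z1)

sum(r2)          # equals the first eigenvalue of the summed projectors
eig$values[1]
```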

slide-88
SLIDE 88

Generalized Canonical Analysis

Find $z_s$ such that $\sum_j R^2(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

[Figure: the sets K1, K2, …, KJ and the variables V1, V2, V3]

slide-89
SLIDE 89

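Generalized Canonical Analysis

Find $z_s$ such that $\sum_j R^2(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

[Figure: the sets K1, K2, …, KJ, the variables V1, V2, V3 and the subspace $E_J$]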

slide-90
SLIDE 90

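Generalized Canonical Analysis

Find $z_s$ such that $\sum_j R^2(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

[Figure: the sets K1, K2, …, KJ, the variables V1, V2, V3, the subspace $E_J$ and the general variable $z_s$]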

slide-91
SLIDE 91

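Generalized Canonical Analysis

Find $z_s$ such that $\sum_j R^2(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

[Figure: the sets K1, K2, …, KJ, the variables V1, V2, V3, the subspace $E_J$ and the general variable $z_s$]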

slide-92
SLIDE 92

Multiple Factor Analysis

X1 X2 XJ

slide-93
SLIDE 93

Multiple Factor Analysis

X1 X2 XJ

Find $z_s$ such that $\sum_j Lg(z_s, K_j)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$

slide-94
SLIDE 94

A measure of relationship between one variable z and a set of variables Kj = {vk ; k = 1, …, Kj}

Lg(z, Kj) = projected inertia of the whole set of the variables vk onto z

Case of standardized variables weighted in an MFA:

$$Lg(z, K_j) = \frac{1}{\lambda_1^j} \sum_{k \in K_j} r^2(z, v_k)$$

In every case, owing to the weighting of MFA:

$$0 \le Lg(z, K_j) \le 1$$

Multiple Factor Analysis
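
The measure Lg for standardized variables can be sketched in base R. The function name `Lg` follows the slide, but its signature and the simulated set `K` are assumptions made for the illustration. The first principal component of the set reaches the upper bound 1, while an arbitrary variable falls between 0 and 1.

```r
set.seed(7)
n <- 50
K <- matrix(rnorm(n * 4), n, 4)      # one set of 4 variables
K[, 2] <- K[, 1] + 0.5 * K[, 2]      # with some internal correlation

lambda1 <- eigen(cor(K), symmetric = TRUE)$values[1]

# Lg(z, K_j) for standardized variables: sum of the squared correlations
# between z and the variables of the set, divided by lambda_1^j.
Lg <- function(z, K, lambda1) sum(cor(z, K)^2) / lambda1

pc1 <- prcomp(K, scale. = TRUE)$x[, 1]   # first principal component of the set
z   <- rnorm(n)                          # an arbitrary variable

Lg(pc1, K, lambda1)   # reaches the upper bound 1
Lg(z, K, lambda1)     # lies between 0 and 1
```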

slide-95
SLIDE 95

Multiple Factor Analysis

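[Figure: the sets K1, K2, …, KJ]

$$Lg(z, K_j) = \text{inertia of } K_j \text{ projected on } z = \frac{1}{\lambda_1^j} \sum_{k \in K_j} \mathrm{Cor}^2(z, k)$$

$$Lg(z, K_j) = 1 \iff z \text{ is the 1st principal component of } K_j$$

Find $z_s$ such that $\sum_j (\text{inertia of } K_j \text{ projected on } z_s)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$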

slide-96
SLIDE 96

Multiple Factor Analysis

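[Figure: the sets K1, K2, …, KJ, the variables V1, V2, V3, the subspace $E_J$ and the general variable $z_s$]

Find $z_s$ such that $\sum_j (\text{inertia of } K_j \text{ projected on } z_s)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$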

slide-97
SLIDE 97

Multiple Factor Analysis

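[Figure: the sets K1, K2, …, KJ, the variables V1, V2, V3, the subspace $E_J$ and the general variable $z_s$]

Find $z_s$ such that $\sum_j (\text{inertia of } K_j \text{ projected on } z_s)$ is maximum, with $\mathrm{Var}(z_s) = 1$ and $\mathrm{Cor}(z_s, z_t) = 0 \;\; \forall\, t < s$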

slide-98
SLIDE 98

SUPERIMPOSED REPRESENTATION OF THE J CLOUDS OF INDIVIDUALS

slide-99
SLIDE 99

Superimposed representation of the J clouds of individuals

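[Figure: data table with I individuals in rows and J sets of variables in columns (K1, …, Kj, …, KJ variables); row i splits into the partial individuals $i^1$, …, $i^j$, …, $i^J$]

$N_I^j$: partial cloud (of individuals, relative to the set j)

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ (points $i^1$, …, $i^j$, …, $i^J$) in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$]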

slide-100
SLIDE 100

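Superimposed representation of the J clouds of individuals

How to compare clouds representing the same objects but in different spaces?

Reference method: Procrustes analysis (Green, 1952; Gower, 1975)

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ (points $i^1$, …, $i^j$, …, $i^J$) in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$]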

slide-101
SLIDE 101

Superimposed representation of the J clouds of individuals

• Procrustes was a character of Greek myth. An innkeeper who plied his trade in Attica, he put his victims on an iron bed. If they were longer than the bed, he cut off their feet. If they were shorter, he stretched them…

slide-102
SLIDE 102

Superimposed representation of the J clouds of individuals

Make the configurations fit each other

• do this by moving them to a common origin

• stretch or shrink each configuration in order to make it fit as well as possible

• if needed, flip them around
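
The three steps above can be sketched in base R for two configurations. This is a minimal version of the classical orthogonal Procrustes fit computed through a singular value decomposition, not the generalized procedure for J clouds; the function name `procrustes_fit` and the simulated configurations are hypothetical.

```r
set.seed(8)
n <- 10
X <- matrix(rnorm(n * 2), n, 2)                     # reference configuration
theta <- pi / 3
R0 <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
# Y is a rotated, stretched and shifted copy of X.
Y <- 2 * X %*% R0 + matrix(c(5, -3), n, 2, byrow = TRUE)

procrustes_fit <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # 1) move both to a common origin
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  s  <- svd(crossprod(Yc, Xc))                   # Yc' Xc = U D V'
  Q  <- s$u %*% t(s$v)                           # 3) best rotation or flip
  k  <- sum(s$d) / sum(Yc^2)                     # 2) best stretching factor
  list(target = Xc, fitted = k * Yc %*% Q, scale = k, rotation = Q)
}

fit <- procrustes_fit(X, Y)
fit$scale                            # Y was stretched by 2, so it is shrunk by 0.5
max(abs(fit$fitted - fit$target))    # close to 0: the configurations coincide
```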

slide-103
SLIDE 103

Superimposed representation of the J clouds of individuals

slide-104
SLIDE 104

Superimposed representation of the J clouds of individuals

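Geometrical framework

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ (points $i^1$, …, $i^j$, …, $i^J$) in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$]

1) Each $N_I^j$ must be well represented

2) The J points representing the same individual must be close to one another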

slide-105
SLIDE 105

Superimposed representation of the J clouds of individuals

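Geometrical framework

$$\mathbb{R}^K = \bigoplus_j \mathbb{R}^{K_j}$$

$N_I^j$: partial cloud

$N_I$: mean cloud

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$, then in $\mathbb{R}^K$ the mean cloud $N_I$ (point $i$) together with the partial clouds $N_I^1$ and $N_I^j$ (points $i^1$ and $i^j$)]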

slide-106
SLIDE 106

Superimposed representation of the J clouds of individuals

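Principle

The partial clouds are projected onto the principal components of the mean cloud

[Figure: in $\mathbb{R}^K$, the mean cloud $N_I$ (point $i$) and the partial clouds $N_I^1$ and $N_I^j$ (points $i^1$ and $i^j$, from $\mathbb{R}^{K_1}$ and $\mathbb{R}^{K_j}$), projected on the axis $u_s$]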

slide-107
SLIDE 107

Superimposed representation of the J clouds of individuals

• The superimposed representation and the canonical variables provided by MFA address the very same question: both correspond to the same solution to two apparently different problems

slide-108
SLIDE 108

Superimposed representation of the J clouds of individuals

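Usual transition relationship in PCA

$$F_s(i) = \frac{1}{\sqrt{\lambda_s}} \sum_{k \in K} x_{ik}\, G_s(k)$$

$F_s(i)$: coordinate of individual i along the axis s

$G_s(k)$: coordinate of variable k along the axis s

$\lambda_s$: eigenvalue associated with the axis s

$x_{ik}$: data (value of variable k for individual i)

[Figure: individuals A, B, C on the plane (F1, F2) and variables 1, 2 on the plane (F1, F2)]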

slide-109
SLIDE 109

Superimposed representation of the J clouds of individuals

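Usual transition relationship in PCA

$$F_s(i) = \frac{1}{\sqrt{\lambda_s}} \sum_{k \in K} x_{ik}\, G_s(k)$$

If the variable k has the weight $m_k$

$$F_s(i) = \frac{1}{\sqrt{\lambda_s}} \sum_{k \in K} x_{ik}\, m_k\, G_s(k)$$

Usual transition relationship applied to the mean cloud in MFA

$$F_s(i) = \frac{1}{\sqrt{\lambda_s}} \sum_{j \in J} \frac{1}{\lambda_1^j} \sum_{k \in K_j} x_{ik}\, G_s(k)$$

Partial transition relationship

$$F_s^j(i) = \frac{1}{\sqrt{\lambda_s}}\, J\, \frac{1}{\lambda_1^j} \sum_{k \in K_j} x_{ik}\, G_s(k)$$

$$F_s(i) = \frac{1}{J} \sum_{j \in J} F_s^j(i)$$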
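
These relationships can be checked numerically. The base R sketch below runs on two simulated sets and assumes standardized variables and a variance divisor of n - 1 throughout; the names `coord_ind`, `coord_var`, `F_trans`, `F_part` and `F_bary` are illustrative. It verifies the weighted transition relationship on the first axis, then that each mean point is the barycentre of its J partial points.

```r
set.seed(9)
n <- 30
sets <- list(matrix(rnorm(n * 3), n, 3), matrix(rnorm(n * 4), n, 4))
J <- length(sets)

Z       <- do.call(cbind, lapply(sets, scale))             # standardized variables
lambda1 <- sapply(sets, function(K) eigen(cor(K), symmetric = TRUE)$values[1])
group   <- rep(seq_along(sets), times = sapply(sets, ncol))  # set of each variable
m       <- 1 / lambda1[group]                              # MFA weight m_k

# Global analysis: PCA of the variables rescaled by sqrt(m_k).
pca       <- prcomp(sweep(Z, 2, sqrt(m), `*`), center = FALSE)
lambda    <- pca$sdev^2
coord_ind <- pca$x               # F_s(i): coordinates of the mean individuals
coord_var <- cor(Z, coord_ind)   # G_s(k): correlations between variables and factors

s <- 1   # the formulas are checked on the first axis

# Transition relationship with the weights m_k.
F_trans <- as.numeric(Z %*% (m * coord_var[, s])) / sqrt(lambda[s])

# Partial transition relationship: variables of set j only, dilated by J.
F_part <- sapply(seq_len(J), function(j) {
  k <- group == j
  J * as.numeric(Z[, k, drop = FALSE] %*% (m[k] * coord_var[k, s])) / sqrt(lambda[s])
})

# The mean point is the barycentre of its J partial points.
F_bary <- rowMeans(F_part)

all.equal(F_trans, as.numeric(coord_ind[, s]))
all.equal(F_bary,  as.numeric(coord_ind[, s]))
```

The factor J in the partial relationship is what makes the average of the partial coordinates fall back on the mean coordinate.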

slide-110
SLIDE 110
slide-111
SLIDE 111
slide-112
SLIDE 112

GLOBAL REPRESENTATION OF SETS OF VARIABLES

slide-113
SLIDE 113

Global representation of sets of variables

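[Figure: data table with I individuals in rows and J sets of variables in columns (K1, …, Kj, …, KJ variables); row i splits into the partial individuals $i^1$, …, $i^j$, …, $i^J$]

$N_I^j$: partial cloud (of individuals, associated with the set j)

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ (points $i^1$, …, $i^j$, …, $i^J$) in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$]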

slide-114
SLIDE 114

Global representation of sets of variables

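[Figure: for each set j = 1, …, J, the I × I matrix $W_j$ with general term $W_j(i,l)$]

Matrices of scalar products between individuals for each set of variables

$$W_j = X_j X_j'$$

How to measure the global resemblance of the $N_I^j$?

[Figure: the partial clouds $N_I^1$, …, $N_I^j$, …, $N_I^J$ (points $i^1$, …, $i^j$, …, $i^J$) in the spaces $\mathbb{R}^{K_1}$, …, $\mathbb{R}^{K_j}$, …, $\mathbb{R}^{K_J}$]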
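
One common answer to the question above is sketched here in base R: build each $W_j$ and compare two of them through their scalar product, normalized into the RV coefficient. The RV coefficient is not named on these slides and is brought in as an assumption; the helpers `hs` and `rv` and the simulated sets are illustrative.

```r
set.seed(10)
n  <- 20
X1 <- scale(matrix(rnorm(n * 3), n, 3))
X2 <- scale(X1[, 1:2] + 0.2 * matrix(rnorm(n * 2), n, 2))   # built from X1
X3 <- scale(matrix(rnorm(n * 4), n, 4))                     # drawn independently of X1

# W_j = X_j X_j': scalar products between individuals, one n x n matrix per set.
W <- lapply(list(X1, X2, X3), tcrossprod)

# Scalar product between two W matrices, and its normalized version.
hs <- function(A, B) sum(A * B)      # equals trace(A B) for symmetric A and B
rv <- function(A, B) hs(A, B) / sqrt(hs(A, A) * hs(B, B))

rv(W[[1]], W[[1]])   # a set compared with itself gives 1
rv(W[[1]], W[[2]])   # resemblance between the structures induced by X1 and X2
rv(W[[1]], W[[3]])   # resemblance between the structures induced by X1 and X3
```

Working on the $W_j$ rather than on the $X_j$ makes sets with different numbers of variables comparable, since every $W_j$ has the same I × I shape.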

slide-115
SLIDE 115

Global representation of sets of variables

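[Figure: the data table of set j (general term $x_{ik}$, I rows, $K_j$ columns) gives the cloud of variables $N_K^j$ in $\mathbb{R}^I$; the matrix of scalar products $W_j$ (general term $W_j(i,l)$, I × I) gives one point of the cloud $N_J$ in $\mathbb{R}^{I^2}$]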

slide-116
SLIDE 116

Global representation of sets of variables

[Figure: the cloud $N_J$ of the $W_j$ in $\mathbb{R}^{I^2}$]

Studying the cloud $N_J$

Reference method: STATIS (Escoufier Y., Lavit C.)

slide-117
SLIDE 117

Global representation of sets of variables

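$w_s$: W associated with $v_s$

Inertia of $N_K^j$ projected upon $v_s$ = coordinate of $W_j$ upon $w_s$

[Figure: the data table of set j (general term $x_{ik}$) gives the cloud $N_K^j$ and the axis $v_s$ in $\mathbb{R}^I$; the matrix of scalar products $W_j$ (general term $W_j(i,l)$) gives the cloud $N_J$ and the axis $w_s$ in $\mathbb{R}^{I^2}$]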

slide-118
SLIDE 118
slide-119
SLIDE 119

“THANK YOU FOR BEING HERE”

http://www.ted.com/talks/john_francis_walks_the_earth.html