SLIDE 1

Three versions of manifolds

Sayan Mukherjee

Departments of Statistical Science, Computer Science, Mathematics Institute for Genome Sciences & Policy, Duke University Joint work with: Part I – F. Liang (UIUC), Q. Wu (MTSU), K. Mao (LYZ Capital) Part II – J. Guinney (Fred Hutchinson CC), Q. Wu (MTSU), D.X. Zhou (City University Hong Kong), M. Maggioni (Duke University) Part III – P.R. Hahn (University of Chicago)

October 3, 2011


SLIDE 4

Outline: Supervised dimension reduction · Geometric analysis for SDR · Generative model for manifold learning · Acknowledgements

A play in three acts

(1) Bayesian model for supervised dimension reduction.
(2) Geometric analysis for SDR based on gradients.
(3) Generative model for manifold learning using Lie groups.


SLIDE 6

Section: Supervised dimension reduction (Introduction · Unsupervised dimension reduction · Likelihood based SDR · Results on data · Challenges)

Information and sufficiency

A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher and goes back to at least Adcock 1878 and Edgeworth 1884. X1, ..., Xn drawn iid from a Gaussian can be reduced to (µ, σ²).


SLIDE 8


Regression

Assume the model Y = f(X) + ε with E[ε] = 0, X ∈ 𝒳 ⊂ R^p and Y ∈ R.

Data: D = {(x_i, y_i)}_{i=1}^n, drawn iid ∼ ρ(X, Y).


SLIDE 11


Dimension reduction

If the data lives in a p-dimensional space X ∈ R^p, replace X with Θ(X) ∈ R^d, p ≫ d.

My belief: physical, biological, and social systems are inherently low-dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold: ρ_X is concentrated on a manifold M ⊂ R^p of dimension d ≪ p.


SLIDE 13


Supervised dimension reduction (SDR)

Given response variables Y1, ..., Yn ∈ R and explanatory variables (covariates) X1, ..., Xn ∈ 𝒳 ⊂ R^p,

Y_i = f(X_i) + ε_i,  ε_i iid ∼ N(0, σ²).

Is there a submanifold S ≡ S_{Y|X} such that Y ⊥⊥ X | P_S(X)?


SLIDE 15


Linear projections capture nonlinear manifolds

In this talk P_S(X) = B^T X where B = (b1, ..., bd). Semiparametric model:

Y_i = f(X_i) + ε_i = g(b1^T X_i, ..., bd^T X_i) + ε_i,

where span(B) is the dimension reduction (d.r.) subspace.

SLIDE 16


Show video


SLIDE 17


Visualization of SDR

[Figure: four two-dimensional embeddings of the same data set: (a) Data, (b) Diffusion map, (c) GOP, (d) GDM.]


SLIDE 19


Principal components analysis (PCA)

Algorithmic view of PCA:

1. Given X = (X1, ..., Xn), a p × n matrix, construct Σ̂ = (X − X̄)(X − X̄)^T.

2. Eigen-decomposition of Σ̂: λ_i v_i = Σ̂ v_i.
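The two-step recipe above can be sketched in NumPy. This is an illustrative sketch, not code from the talk; the data, dimensions, and names are invented.

```python
import numpy as np

def pca(X, d):
    """Top-d eigenpairs of the scatter matrix Sigma = (X - Xbar)(X - Xbar)^T
    for a p x n data matrix X (columns are observations)."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each variable
    Sigma = Xc @ Xc.T                        # p x p scatter matrix
    lam, V = np.linalg.eigh(Sigma)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:d]        # keep the d largest
    return lam[order], V[:, order]

rng = np.random.default_rng(0)
Z = rng.normal(size=(2, 200))                   # 2-d latent signal
A = rng.normal(size=(5, 2))                     # embedded into 5 dimensions
X = A @ Z + 0.01 * rng.normal(size=(5, 200))    # small isotropic noise
lam, V = pca(X, 2)                              # recovers the signal plane
```

The leading eigenvectors V span (approximately) the plane col(A) in which the signal lives.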


SLIDE 21


Probabilistic PCA

X ∈ R^p is characterized by a multivariate normal:

X ∼ N(µ + Aν, ∆),  ν ∼ N(0, I_d),

with µ ∈ R^p, A ∈ R^{p×d}, ∆ ∈ R^{p×p}, ν ∈ R^d. Here ν is a latent variable.
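Sampling from this generative model makes the latent-variable structure concrete; a minimal sketch, with all dimensions and parameter values invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, d, n = 10, 3, 5000                     # hypothetical dimensions

mu = rng.normal(size=p)                   # mean mu in R^p
A = rng.normal(size=(p, d))               # loadings A in R^{p x d}
Delta = 0.1 * np.eye(p)                   # noise covariance Delta

nu = rng.normal(size=(n, d))              # latent nu ~ N(0, I_d)
eps = rng.multivariate_normal(np.zeros(p), Delta, size=n)
X = mu + nu @ A.T + eps                   # X ~ N(mu + A nu, Delta)

# Marginalizing over nu gives cov(X) = A A^T + Delta
emp_cov = np.cov(X, rowvar=False)
rel_err = (np.linalg.norm(emp_cov - (A @ A.T + Delta))
           / np.linalg.norm(A @ A.T + Delta))
```

The empirical covariance approaches A Aᵀ + ∆, which is the marginal covariance implied by the model.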

SLIDE 22


SDR model

Semiparametric model:

Y_i = f(X_i) + ε_i = g(b1^T X_i, ..., bd^T X_i) + ε_i,

where span(B) is the dimension reduction (d.r.) subspace.


SLIDE 25


Principal fitted components (PFC)

Define X_y ≡ (X | Y = y) and specify a multivariate normal distribution:

X_y ∼ N(µ_y, ∆),  µ_y = µ + A ν_y,

with µ ∈ R^p, A ∈ R^{p×d}, ν_y ∈ R^d, and B = ∆⁻¹A.

This captures global linear predictive structure. It does not generalize to clusters and manifolds.


SLIDE 28


Mixture models and localization

A driving idea in manifold learning is that manifolds are locally Euclidean. A driving idea in probabilistic modeling is that mixture models are flexible and can capture "nonparametric" distributions. Mixture models can therefore capture local nonlinear predictive manifold structure.

SLIDE 29


Model specification

X_y ∼ N(µ_yx, ∆),  µ_yx = µ + A ν_yx,  ν_yx ∼ G_y,

where G_y is a density indexed by y having multiple clusters, µ ∈ R^p, ε ∼ N(0, ∆) with ∆ ∈ R^{p×p}, A ∈ R^{p×d}, and ν_yx ∈ R^d.

SLIDE 30


Dimension reduction space

Proposition. For this model the d.r. space is the span of B = ∆⁻¹A:

Y | X =_d Y | (∆⁻¹A)^T X.

SLIDE 31


Sampling distribution

Define ν_i ≡ ν_{y_i x_i}. Sampling distribution for the data:

x_i | (y_i, µ, ν_i, A, ∆) ∼ N(µ + A ν_i, ∆),  ν_i ∼ G_{y_i}.
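A generative sketch of this sampling distribution; the class-specific cluster centers and all sizes below are invented solely to illustrate a G_y with multiple clusters:

```python
import numpy as np

rng = np.random.default_rng(2)
p, d = 6, 2
mu = np.zeros(p)
A = rng.normal(size=(p, d))
Delta = 0.05 * np.eye(p)

# Hypothetical G_y: each class y has its own two-cluster density in R^d
centers = {0: [np.array([-2.0, 0.0]), np.array([2.0, 0.0])],
           1: [np.array([0.0, -2.0]), np.array([0.0, 2.0])]}

def draw_x(y):
    """nu_i ~ G_{y_i} (pick a cluster, add local scatter), then
    x_i | (y_i, mu, nu_i, A, Delta) ~ N(mu + A nu_i, Delta)."""
    c = centers[y][rng.integers(2)]
    nu = c + 0.1 * rng.normal(size=d)
    return mu + A @ nu + rng.multivariate_normal(np.zeros(p), Delta)

ys = rng.integers(2, size=200)
X = np.array([draw_x(y) for y in ys])

# The x_i concentrate near the d-dimensional plane mu + span(A)
Q, _ = np.linalg.qr(A)
resid = X - (X @ Q) @ Q.T
```

The residuals orthogonal to span(A) are driven only by the ∆-noise, which is the sense in which the predictive structure is low-dimensional.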


SLIDE 33


Categorical response: modeling Gy

Y = {1, ..., C}, so each category has a distribution:

ν_i | (y_i = c) ∼ G_c,  c = 1, ..., C.

ν_i is modeled as a mixture of C distributions G1, ..., G_C, with a Dirichlet process model for each distribution: G_c ∼ DP(α0, G0).
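One standard way to realize a draw G_c ∼ DP(α0, G0) is the truncated stick-breaking construction; this sketch is illustrative (the talk does not specify its computational representation), with the base measure and truncation level chosen arbitrarily:

```python
import numpy as np

def dp_stick_breaking(alpha0, base_draw, n_atoms, rng):
    """Truncated stick-breaking construction of G ~ DP(alpha0, G0):
    v_k ~ Beta(1, alpha0), w_k = v_k * prod_{j<k}(1 - v_j),
    atoms drawn iid from the base measure G0."""
    v = rng.beta(1.0, alpha0, size=n_atoms)
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    atoms = np.array([base_draw(rng) for _ in range(n_atoms)])
    return w, atoms

rng = np.random.default_rng(3)
# hypothetical base measure G0 = N(0, I_2); one such G_c per category c
w, atoms = dp_stick_breaking(1.0, lambda r: r.normal(size=2), 100, rng)
nu = atoms[rng.choice(100, size=500, p=w / w.sum())]   # draws nu_i ~ G_c
```

Because the random measure is almost surely discrete, the ν_i cluster on a finite set of atoms, which is what lets the model capture multiple clusters per class.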

SLIDE 34


Likelihood

Lik({x_i} | {y_i}, θ) ≡ Lik({x_i} | {y_i}, A, ∆, ν1, ..., νn, µ),

Lik({x_i} | {y_i}, θ) ∝ det(∆⁻¹)^{n/2} × exp( −(1/2) Σ_{i=1}^n (x_i − µ − A ν_i)^T ∆⁻¹ (x_i − µ − A ν_i) ).
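The log of this likelihood is what a sampler actually evaluates; a minimal sketch (names and the sanity-check values are invented):

```python
import numpy as np

def log_lik(X, mu, A, nu, Delta):
    """log of the likelihood above, up to an additive constant:
    (n/2) log det(Delta^-1) - (1/2) sum_i r_i^T Delta^-1 r_i,
    where r_i = x_i - mu - A nu_i.  X is n x p, nu is n x d."""
    n = X.shape[0]
    R = X - mu - nu @ A.T                       # residuals r_i as rows
    Dinv = np.linalg.inv(Delta)
    _, logdet = np.linalg.slogdet(Dinv)
    quad = np.einsum('ij,jk,ik->', R, Dinv, R)  # sum_i r_i^T Dinv r_i
    return 0.5 * n * logdet - 0.5 * quad

# Sanity check against a hand-computable case: p = 1, Delta = 1,
# mu = 0, A nu = 0, so log-lik = -(1/2) * sum_i x_i^2
val = log_lik(np.array([[1.0], [2.0]]), np.zeros(1),
              np.zeros((1, 1)), np.zeros((2, 1)), np.eye(1))
```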


SLIDE 38


Posterior inference

Given data,

P_θ ≡ Post(θ | data) ∝ Lik(data | θ) × π(θ).

1. P_θ provides an estimate of (un)certainty on θ.
2. Requires a prior on θ.
3. How do we sample from P_θ?


SLIDE 40


Sampling from the posterior

Inference consists of drawing samples θ(t) = (µ(t), A(t), ∆⁻¹(t), ν(t)) from the posterior. Define

θ/µ(t) ≡ (A(t), ∆⁻¹(t), ν(t)),
θ/A(t) ≡ (µ(t), ∆⁻¹(t), ν(t)),
θ/∆⁻¹(t) ≡ (µ(t), A(t), ν(t)),
θ/ν(t) ≡ (µ(t), A(t), ∆⁻¹(t)).


SLIDE 44


Gibbs sampling

Conditional probabilities can be used to sample µ, ∆⁻¹, A:

µ(t+1) | (data, θ/µ(t)) ∼ Normal(data, θ/µ(t)),
∆⁻¹(t+1) | (data, θ/∆⁻¹(t)) ∼ InvWishart(data, θ/∆⁻¹(t)),
A(t+1) | (data, θ/A(t)) ∼ Normal(data, θ/A(t)).

Sampling ν(t) is more involved.
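The slides do not spell out the exact conditional parameters, so here is the same alternating-conditionals structure on a deliberately simpler univariate model, x_i ∼ N(µ, 1/τ) with a Jeffreys-style prior, where both conditionals are conjugate (Normal and Gamma, the univariate analogue of the Normal and inverse-Wishart draws above). Everything here is a hedged illustration, not the talk's sampler:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=0.5, size=500)      # synthetic observed data
n, xbar = len(x), x.mean()

mu, tau = 0.0, 1.0                                # arbitrary initialization
draws = []
for t in range(2000):
    # mu | tau, data ~ Normal(xbar, 1/(n tau))
    mu = rng.normal(xbar, 1.0 / np.sqrt(n * tau))
    # tau | mu, data ~ Gamma(n/2, rate = sum (x_i - mu)^2 / 2)
    resid = np.sum((x - mu) ** 2)
    tau = rng.gamma(shape=n / 2.0, scale=2.0 / resid)
    draws.append((mu, tau))

draws = np.array(draws)[500:]                     # discard burn-in
mu_hat = draws[:, 0].mean()
sigma_hat = 1.0 / np.sqrt(draws[:, 1].mean())
```

Each sweep conditions one block on the current values of the others, exactly the θ/µ, θ/∆⁻¹, θ/A cycling above.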


SLIDE 46


Posterior draws from the Grassmann manifold

Given samples (∆⁻¹(t), A(t))_{t=1}^m, compute B(t) = ∆⁻¹(t) A(t).

Each B(t) spans a subspace, which is a point on the Grassmann manifold G(d, p). There is a Riemannian metric on this manifold. This has two implications.


SLIDE 48


Posterior mean and variance

Given draws (B(t))_{t=1}^m, the posterior mean and variance should be computed with respect to the Riemannian metric. Given two subspaces 𝒲 and 𝒱 spanned by orthonormal bases X and Y, the distance (which defines the Karcher mean) comes from the principal angles:

(I − X(X^T X)⁻¹X^T) Y (X^T Y)⁻¹ = U Σ V^T,  Θ = atan(Σ),

dist(𝒲, 𝒱) = √(Tr(Θ²)).
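An equivalent and numerically simpler route to the same distance takes the principal angles from the SVD of Q1ᵀQ2 for orthonormalized bases; this sketch is an illustration, not the talk's code:

```python
import numpy as np

def grassmann_dist(B1, B2):
    """Geodesic distance between span(B1) and span(B2) on G(d, p):
    principal angles theta_k = arccos of the singular values of Q1^T Q2,
    distance sqrt(Tr(Theta^2)) = sqrt(sum_k theta_k^2)."""
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return np.sqrt(np.sum(theta ** 2))

# Orthogonal lines in R^3 are a quarter-turn apart
e = np.eye(3)
d_orth = grassmann_dist(e[:, [0]], e[:, [1]])    # pi / 2
```

The distance depends only on the spans, so any change of basis B ↦ BM with M invertible leaves it unchanged.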


SLIDE 50


Posterior mean and variance

The posterior mean subspace:

B_Bayes = arg min_{B ∈ G(d,p)} Σ_{i=1}^m dist(B_i, B).

Uncertainty:

var({B1, ..., Bm}) = (1/m) Σ_{i=1}^m dist(B_i, B_Bayes).
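A crude stand-in for this arg min restricts the minimization to the draws themselves (a medoid rather than the true Karcher mean); this is an invented illustration of the summary, not the talk's procedure:

```python
import numpy as np

def grassmann_dist(B1, B2):
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.sqrt(np.sum(np.arccos(np.clip(s, -1.0, 1.0)) ** 2))

def posterior_summary(Bs):
    """Medoid approximation to B_Bayes: pick the draw minimizing the sum
    of distances to the others, and report the mean distance to it as
    the spread var({B_1, ..., B_m})."""
    D = np.array([[grassmann_dist(Bi, Bj) for Bj in Bs] for Bi in Bs])
    k = int(np.argmin(D.sum(axis=1)))
    return Bs[k], D[k].mean()

rng = np.random.default_rng(5)
B0 = rng.normal(size=(6, 2))                      # "true" subspace basis
Bs = [B0 + 0.05 * rng.normal(size=(6, 2)) for _ in range(20)]  # fake draws
B_hat, spread = posterior_summary(Bs)
```

With tightly concentrated draws the medoid sits close to the common subspace and the spread is small, mirroring a low posterior variance.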

SLIDE 51


Swiss roll

X1 = t cos(t),  X2 = h,  X3 = t sin(t),  X4, ..., X10 iid ∼ N(0, 1),

where t = (3π/2)(1 + 2θ), θ ∼ U(0, 1), h ∼ U(0, 1), and

Y = sin(5πθ) + h² + ε,  ε ∼ N(0, 0.01).
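Generating this data set is a few lines; a sketch following the formulas above, with the sample size chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
theta = rng.uniform(0.0, 1.0, size=n)
h = rng.uniform(0.0, 1.0, size=n)
t = 1.5 * np.pi * (1.0 + 2.0 * theta)             # t = (3 pi / 2)(1 + 2 theta)

X = np.column_stack([t * np.cos(t),               # X1
                     h,                           # X2
                     t * np.sin(t),               # X3
                     rng.normal(size=(n, 7))])    # X4, ..., X10 iid N(0, 1)
# eps ~ N(0, 0.01) is variance 0.01, i.e. standard deviation 0.1
Y = np.sin(5.0 * np.pi * theta) + h ** 2 + rng.normal(0.0, 0.1, size=n)
```

Coordinates 1 and 3 trace the roll while coordinates 4 through 10 are pure noise, so the predictive structure lives on a 2-dimensional manifold inside R^10.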

SLIDE 52


Pictures


SLIDE 53


Metric

Projection of the estimated d.r. space B̂ = (b̂1, ..., b̂d) onto B:

(1/d) Σ_{i=1}^d ||P_B b̂_i||² = (1/d) Σ_{i=1}^d ||(B B^T) b̂_i||².
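This accuracy metric is straightforward to compute; an illustrative sketch (assuming, as the formula requires, orthonormalized bases), not the talk's code:

```python
import numpy as np

def projection_accuracy(B_hat, B):
    """(1/d) sum_i ||P_B b_hat_i||^2 with P_B = B B^T, after
    orthonormalizing both bases; equals 1 when span(B_hat) = span(B)
    and 0 when the subspaces are orthogonal."""
    Q, _ = np.linalg.qr(B)                    # orthonormal basis for B
    Qh, _ = np.linalg.qr(B_hat)               # unit-norm estimated directions
    proj = Q @ (Q.T @ Qh)                     # P_B applied to each b_hat_i
    return float(np.mean(np.sum(proj ** 2, axis=0)))

rng = np.random.default_rng(8)
B = rng.normal(size=(5, 2))
acc_same = projection_accuracy(B, B)                                 # 1.0
acc_orth = projection_accuracy(np.eye(5)[:, 2:4], np.eye(5)[:, :2])  # 0.0
```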

SLIDE 54


Comparison of algorithms

[Figure: accuracy versus sample size (200 to 600) for BMI, BAGL, SIR, LSIR, PHD and SAVE.]

SLIDE 55


Posterior variance

[Figure: boxplot of the distances.]

SLIDE 56


Error as a function of d

[Figure: error as a function of the number of e.d.r. directions (1 to 10), two panels.]

SLIDE 57


Digits

[Figure: example digit images.]


SLIDE 59


Two classification problems

3 vs. 8 and 5 vs. 8. 100 training samples from each class.


SLIDE 60


BMI

[Figure: two-dimensional BMI projections for 3 versus 8 and for 5 versus 8 (axis scale ×10⁻⁴).]

SLIDE 61


All ten digits

digit     Nonlinear         Linear
0         0.04 (± 0.01)     0.05 (± 0.01)
1         0.01 (± 0.003)    0.03 (± 0.01)
2         0.14 (± 0.02)     0.19 (± 0.02)
3         0.11 (± 0.01)     0.17 (± 0.03)
4         0.13 (± 0.02)     0.13 (± 0.03)
5         0.12 (± 0.02)     0.21 (± 0.03)
6         0.04 (± 0.01)     0.0816 (± 0.02)
7         0.11 (± 0.01)     0.14 (± 0.02)
8         0.14 (± 0.02)     0.20 (± 0.03)
9         0.11 (± 0.02)     0.15 (± 0.02)
average   0.09              0.14

Table: Average classification error rate and standard deviation on the digits data.

SLIDE 62


Cancer classification

n = 38 samples with expression levels for p = 7129 genes or ESTs. 19 samples are Acute Myeloid Leukemia (AML); 19 are Acute Lymphoblastic Leukemia, and these fall into two subclusters, B-cell and T-cell.

SLIDE 63


Substructure captured

[Figure: two-dimensional projection showing the subcluster structure (axis scale ×10⁴).]


SLIDE 69


Challenges

(1) Direct inference on the Grassmann manifold.
(2) Mixtures of subspaces.
(3) Differential geometry between subspaces of different dimensions.
(4) Implications for covariance matrix estimation.
(5) Randomized algorithms for massive data (n, p ≈ 10⁶).
(6) Mixtures of manifolds.


SLIDE 71

Section: Geometric analysis for SDR (Learning gradients · Estimating gradients · Rates of convergence)

Recall SDR model

Semiparametric model:

Y_i = f(X_i) + ε_i = g(b1^T X_i, ..., bd^T X_i) + ε_i,

where span(B) is the dimension reduction (d.r.) subspace. Assume the marginal distribution ρ_X is concentrated on a manifold M ⊂ R^p of dimension d ≪ p.


SLIDE 73


Gradients and outer products

Given a smooth function f, the gradient is

∇f(x) = (∂f(x)/∂x1, ..., ∂f(x)/∂xp)^T.

Define the gradient outer product matrix Γ:

Γ_ij = ∫_𝒳 (∂f/∂x_i)(x) (∂f/∂x_j)(x) dρ_X(x),  i.e.  Γ = E[(∇f) ⊗ (∇f)].
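When the gradient is available, Γ can be estimated by a Monte Carlo average of the outer products; an illustrative sketch with an invented one-direction example f(x) = (bᵀx)², whose gradient 2(bᵀx)b lies entirely along b:

```python
import numpy as np

def gop_matrix(grad_f, X_samples):
    """Monte Carlo estimate of Gamma = E[(grad f)(grad f)^T] under rho_X,
    given a callable returning the gradient at a point."""
    G = np.array([grad_f(x) for x in X_samples])   # n x p matrix of gradients
    return G.T @ G / len(X_samples)

# Hypothetical example: f(x) = (b^T x)^2, so grad f(x) = 2 (b^T x) b and
# the column space of Gamma is exactly span(b), the d.r. direction
rng = np.random.default_rng(7)
p = 5
b = np.zeros(p); b[0] = 1.0
grad_f = lambda x: 2.0 * (b @ x) * b
Gamma = gop_matrix(grad_f, rng.normal(size=(2000, p)))
```

The estimated Γ is rank one with column space span(b), so its leading eigenvector recovers the d.r. direction.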


SLIDE 76


GOP captures the d.r. space

Suppose y = f(X) + ε = g(b1^T X, ..., bd^T X) + ε. Note that for B = (b1, ..., bd),

λ_i b_i = Γ b_i.

Since ∇f(x) lies in span(B), any w with w ⊥ b_i for all i satisfies ∂f(x)/∂w = w^T ∇f(x) = 0, and hence w^T Γ w = 0.


SLIDE 78


Statistical interpretation

Linear case: y = β^T x + ε, ε iid ∼ N(0, σ²). Let Ω = cov(E[X|Y]), Σ_X = cov(X), σ²_Y = var(Y). Then

Γ = σ²_Y (1 − σ²/σ²_Y)² Σ_X⁻¹ Ω Σ_X⁻¹ ≈ σ²_Y Σ_X⁻¹ Ω Σ_X⁻¹.

SLIDE 79


Statistical interpretation

For smooth f(x): y = f(x) + ε, ε iid ∼ N(0, σ²). Here the meaning of Ω = cov(E[X|Y]) is not so clear.


slide-85
SLIDE 85

Supervised dimension reduction Geometric analysis for SDR Generative model for manifold learning Acknowledgements Learning gradients Estimating gradients Rates of convergence

Nonlinear case

Partition into sections and compute local quantities X =

I

  • i=1

χi Ωi = cov (I E[Xχi |Yχi ]) Σi = cov (Xχi ) σ2

i

= var (Yχi ) mi = ρ

X (χi).

Γ ≈

I

  • i=1

mi σ2

i Σ−1 i

Ωi Σ−1

i

.

85,
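The localized construction can be sketched numerically. This is an illustration, not from the slides: the partition by quantiles of $b^T x$ (oracle knowledge of the true direction) and the crude two-slice estimate of each $\Omega_i$ are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, I = 6000, 3, 6
b = np.array([1.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = np.sin(X @ b) + 0.05 * rng.standard_normal(n)

# partition into I sections chi_i by quantiles of b^T x (oracle choice)
proj = X @ b
edges = np.quantile(proj, np.linspace(0, 1, I + 1))
labels = np.clip(np.digitize(proj, edges[1:-1]), 0, I - 1)

Gamma = np.zeros((p, p))
for i in range(I):
    Xi, yi = X[labels == i], y[labels == i]
    m_i = (labels == i).mean()                  # m_i = rho_X(chi_i)
    s2_i = yi.var()                             # sigma_i^2 = var(Y_chi)
    Sigma_i = np.cov(Xi.T)
    # crude local Omega_i: split yi at its median into two slices
    med = np.median(yi)
    mu_lo, mu_hi = Xi[yi <= med].mean(0), Xi[yi > med].mean(0)
    mu = Xi.mean(0)
    Omega_i = 0.5 * np.outer(mu_lo - mu, mu_lo - mu) \
            + 0.5 * np.outer(mu_hi - mu, mu_hi - mu)
    Si = np.linalg.inv(Sigma_i)
    Gamma += m_i * s2_i * Si @ Omega_i @ Si     # localized contribution

vals, vecs = np.linalg.eigh(Gamma)
print(abs(vecs[:, -1] @ b))                     # near 1: recovers b
```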

SLIDE 86

Estimating the gradient

Taylor expansion: $y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j), x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j), x_i - x_j \rangle$ if $x_i \approx x_j$.

SLIDE 87

Estimating the gradient

Taylor expansion: $y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j), x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j), x_i - x_j \rangle$ if $x_i \approx x_j$. Let $\vec{f} \approx \nabla f$; then the following should be small:

$\sum_{i,j} w_{ij}\, \big(y_i - y_j - \langle \vec{f}(x_j), x_i - x_j \rangle\big)^2, \qquad w_{ij} = \frac{1}{s^{p+2}} \exp\!\big(-\|x_i - x_j\|^2 / 2s^2\big),$

where the weight $w_{ij}$ enforces $x_i \approx x_j$.

SLIDE 88

Estimating the gradient

The gradient estimate

$\vec{f}_D = \arg\min_{\vec{f} \in \mathcal{F}^p} \left[ \frac{1}{n^2} \sum_{i,j=1}^{n} w_{ij} \big( y_i - y_j - \vec{f}(x_j)^T (x_i - x_j) \big)^2 + \lambda J(\vec{f}\,) \right],$

where $J(\vec{f}\,)$ is a smoothness penalty.
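The regularized objective can be approximated directly. A minimal sketch, and an assumption rather than the slides' RKHS algorithm: replace the penalty $J(\vec f\,)$ with a pointwise ridge penalty, so that each $\vec f(x_j)$ becomes a separate weighted least-squares fit; the empirical GOP of the estimates then recovers the index direction:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, s, lam = 400, 4, 0.5, 1e-3
b = np.array([1.0, 0.5, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = np.tanh(X @ b)

diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
sq = (diff ** 2).sum(-1)
W = np.exp(-sq / (2 * s**2)) / s**(p + 2)      # w_ij localizes x_i ~ x_j

# pointwise-ridge surrogate: for each j solve
#   min_g  sum_i w_ij (y_i - y_j - g^T (x_i - x_j))^2 + lam ||g||^2
G = np.zeros((n, p))
for j in range(n):
    D, w = diff[:, j, :], W[:, j]              # rows are x_i - x_j
    A = D.T @ (w[:, None] * D) + lam * np.eye(p)
    G[j] = np.linalg.solve(A, D.T @ (w * (y - y[j])))

Gamma = G.T @ G / n                            # empirical GOP of estimates
v = np.linalg.eigh(Gamma)[1][:, -1]
align = abs(v @ b) / np.linalg.norm(b)
print(align)                                   # near 1
```

Since $\nabla f(x) = \mathrm{sech}^2(b^T x)\, b$ points along $b$ everywhere, the top eigenvector of the estimated GOP aligns with $b$ even though each local gradient estimate is noisy.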

SLIDE 89

Estimates on manifolds

The marginal distribution $\rho_X$ is concentrated on a compact Riemannian manifold $\mathcal{M}$ of dimension $d$, with isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$, geodesic metric $d_{\mathcal{M}}$, and uniform measure $d\mu$ on $\mathcal{M}$. Assume a regular distribution:

(i) The density $\nu(x) = \frac{d\rho_X(x)}{d\mu}$ exists and is Hölder continuous ($c_1 > 0$ and $0 < \theta \le 1$):

$|\nu(x) - \nu(u)| \le c_1\, d_{\mathcal{M}}^{\theta}(x, u) \quad \forall\, x, u \in \mathcal{M}.$

(ii) The measure along the boundary is small ($c_2 > 0$):

$\rho_X\{x \in \mathcal{M} : d_{\mathcal{M}}(x, \partial\mathcal{M}) \le t\} \le c_2\, t \quad \forall\, t > 0.$

SLIDE 90

Convergence to gradient on manifold

Theorem. Under the above regularity conditions on $\rho_X$, for $f \in C^2(\mathcal{M})$, with probability $1 - \delta$,

$\big\| (d\varphi)^* \vec{f}_D - \nabla_{\mathcal{M}} f \big\|_{L^2_{\rho_{\mathcal{M}}}}^2 \le C \log\!\big(\tfrac{1}{\delta}\big)\, n^{-k_1/(2d + k_2)},$

where $(d\varphi)^*$ (projection onto the tangent space) is the dual of the map $d\varphi$ and $k_1, k_2$ are smoothness constants.

SLIDE 91

Handwritten twos

SLIDE 92

Stochastic model generating digits

Given observations of a series of handwritten digits, can we build a stochastic process that generates digits?

SLIDE 93

Parameterizing the model

Each $z_i = \binom{x_i}{y_i} \in \mathbb{R}^2$ is indexed by an unobserved $t_i$:

$\binom{x(t_i)}{y(t_i)} = f(t_i) + \varepsilon_i = \binom{f_1(t_i)}{f_2(t_i)} + \binom{\varepsilon_{x_i}}{\varepsilon_{y_i}}.$

SLIDE 94

Parameterizing the model

Each $z_i = \binom{x_i}{y_i} \in \mathbb{R}^2$ is indexed by an unobserved $t_i$:

$\binom{x(t_i)}{y(t_i)} = f(t_i) + \varepsilon_i = \binom{f_1(t_i)}{f_2(t_i)} + \binom{\varepsilon_{x_i}}{\varepsilon_{y_i}}.$

We model the dependence between $f_1(t) = f(t; \theta_1)$ and $f_2(t) = f(t; \theta_2)$ via $f(t; \theta) = f(t(\theta)) := f(\alpha + \beta t)$, $\theta = \{\alpha, \beta\}$, so the parameters $\theta$ belong to a group (shifts and scales), giving a Lie group structure.

SLIDE 95

The stochastic model

Gaussian process for the mapping $t \to Z$: $f \sim \mathcal{GP}(\mu(t), \phi, K)$, with mean function $\mu(t)$, scale parameter $\phi$, and covariance operator $K$.

SLIDE 96

The stochastic model

Gaussian process for the mapping $t \to Z$: $f \sim \mathcal{GP}(\mu(t), \phi, K)$, with mean function $\mu(t)$, scale parameter $\phi$, and covariance operator $K$. Denote

$Z := (x_1 \cdots x_N \;\; y_1 \cdots y_N)^T, \qquad t := (\alpha_1 + \beta_1 t_1 \,\cdots\, \alpha_1 + \beta_1 t_N \;\; \alpha_2 + \beta_2 t_1 \,\cdots\, \alpha_2 + \beta_2 t_N)^T.$

SLIDE 97

The stochastic model

Gaussian process for the mapping $t \to Z$: $f \sim \mathcal{GP}(\mu(t), \phi, K)$, with mean function $\mu(t)$, scale parameter $\phi$, and covariance operator $K$. Denote

$Z := (x_1 \cdots x_N \;\; y_1 \cdots y_N)^T, \qquad t := (\alpha_1 + \beta_1 t_1 \,\cdots\, \alpha_1 + \beta_1 t_N \;\; \alpha_2 + \beta_2 t_1 \,\cdots\, \alpha_2 + \beta_2 t_N)^T.$

Then $Z \mid t \sim \mathrm{MVN}(\mu_\omega(t), \phi, K)$, with $K_{gh} := \exp(-\phi (t_g - t_h)^2)$ and parameters $\omega, \alpha, \beta, \phi$.
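A small generative sketch (not the slides' exact joint model): draw each coordinate function independently from a GP with the squared-exponential covariance $K_{gh} = \exp(-\phi (t_g - t_h)^2)$ evaluated at its own warped index $\alpha + \beta t$. The warp parameters below are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
N, phi = 100, 2.0
t = np.linspace(0.0, 1.0, N)

def gp_sample(tw, jitter=1e-6):
    # covariance K_gh = exp(-phi (t_g - t_h)^2) at warped inputs tw
    K = np.exp(-phi * (tw[:, None] - tw[None, :]) ** 2)
    return rng.multivariate_normal(np.zeros(N), K + jitter * np.eye(N))

# x and y share the hidden index t but have their own (alpha, beta) warp
a1, b1, a2, b2 = 0.0, 1.0, 0.3, 1.5            # illustrative warp values
x = gp_sample(a1 + b1 * t)
y = gp_sample(a2 + b2 * t)
Z = np.stack([x, y])                           # one random planar stroke
print(Z.shape)                                 # (2, 100)
```

Each draw traces a smooth planar curve $(x(t), y(t))$, the kind of object the model treats as a generated digit stroke.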

SLIDE 98

Generator of a semi-group

Given the stochastic model specified above, we can define the following generator of the semi-group:

$D_{\theta_1, \theta_2} = \lim_{\Delta \to 0} \frac{1}{\Delta} \left[ \binom{f(t + \Delta;\, \theta_1)}{f(t + \Delta;\, \theta_2)} - \binom{f(t;\, \theta_1)}{f(t;\, \theta_2)} \right].$

We should be able to compute this numerically.
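The difference quotient can indeed be evaluated numerically. A sketch assuming a concrete choice $f = \sin$ for the warped family $f(t; \theta) = f(\alpha + \beta t)$ (an illustration, not the slides' GP draw):

```python
import numpy as np

def f(t, alpha, beta):
    # warped family f(t; theta) = sin(alpha + beta t), theta = (alpha, beta)
    return np.sin(alpha + beta * t)

def generator(t, theta1, theta2, delta=1e-6):
    # componentwise difference quotient approximating the generator
    pair = np.array([f(t, *theta1), f(t, *theta2)])
    pair_d = np.array([f(t + delta, *theta1), f(t + delta, *theta2)])
    return (pair_d - pair) / delta

t = 0.5
theta1, theta2 = (0.0, 1.0), (0.3, 1.5)        # illustrative (alpha, beta)
num = generator(t, theta1, theta2)

# closed form: d/dt sin(alpha + beta t) = beta cos(alpha + beta t)
exact = np.array([1.0 * np.cos(0.0 + 1.0 * t),
                  1.5 * np.cos(0.3 + 1.5 * t)])
print(np.allclose(num, exact, atol=1e-4))      # True
```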

SLIDE 99

Acknowledgements

Partha Niyogi, Mike West, Carlos Carvalho, Natesh Pillai, Merlise Clyde

SLIDE 100

Acknowledgements

Partha Niyogi, Mike West, Carlos Carvalho, Natesh Pillai, Merlise Clyde

Funding: Center for Systems Biology at Duke, NSF DMS and CCF, AFOSR, NIH
