Low rank Interaction Log-linear Model for Contingency Table Analysis - - PowerPoint PPT Presentation

low rank interaction log linear model for contingency
SMART_READER_LITE
LIVE PREVIEW

Low rank Interaction Log-linear Model for Contingency Table Analysis - - PowerPoint PPT Presentation

Low rank Interaction Log-linear Model for Contingency Table Analysis Genevi` eve Robin Ecole Polytechnique genevieve.robin@polytechnique.edu advisors: Julie Josse and Eric Moulines February 16, 2017 Genevi` eve Robin (Polytechnique)


slide-1
SLIDE 1

Low rank Interaction Log-linear Model for Contingency Table Analysis

Genevi` eve Robin

´ Ecole Polytechnique genevieve.robin@polytechnique.edu advisors: Julie Josse and ´ Eric Moulines

February 16, 2017

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 1 / 39

slide-2
SLIDE 2

Overview

1

Motivations

2

Generalized additive main effects & multiplicative interaction thresholded (GAMMIT) statistical guarantees

  • ptimization algorithm

3

Automatic selection of the regularization parameter cross validation quantile universal threshold

4

Experiments

5

Data analyses

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 2 / 39

slide-3
SLIDE 3

Motivations

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 3 / 39

slide-4
SLIDE 4

High dimensional count data

Single-cell RNA sequencing (counts of genes in cells) Image processing (number of photons on a grid) Ecological data (abundance of species across environments) Y =   2 17 5 4 1 3 9 23 7 7 2   ∈ Nm1×m2 Yij counts occurrences of (i, j) Yij independent. Estimate E[Yij] = µij ⇒ Denoise and visualize data

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 4 / 39

slide-5
SLIDE 5

Ecological data

Alop.alpi Alch.pent Geum.mont Pote.aure Sali.herb AR26 2 2 AR08 1 2 1 AR05 3 3 AR06 3 AR69 1 2 2 2 AR32 2 3 3 1 AR40 2 3 3 4

Table: Excerpt of Aravo dataset. 82 species of plants across 75 environments in the French Alps (Dray and Dufour, 2007).

How do species interact with environments ?

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 5 / 39

slide-6
SLIDE 6

Log-linear model

Observation matrix Y ∈ Nm1×m2, E[Yij] = µij, Xij := log(µij). Xij = αi + βj + Θij (1) αi effect of i-th environment βj effect of j-th species Θij interaction between i-th environment and j-th species

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 6 / 39

slide-7
SLIDE 7

Log-linear model

Observation matrix Y ∈ Nm1×m2, E[Yij] = µij, Xij := log(µij). Xij = αi + βj + Θij (1) αi effect of i-th environment βj effect of j-th species Θij interaction between i-th environment and j-th species Θ has rank K < min(m1 − 1, m2 − 1) Xij = αi + βj + (UDV ⊤)ij, UDV ⊤, the SVD of Θ.

(RC model, Goodman, 1985; log-bilinear model, Falguerolle, 1998; GAMMI, Gower, 2011)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 6 / 39

slide-8
SLIDE 8

Log-linear model with known covariates

Environment characteristics, species traits are known.

Aspect Slope Form PhysD ZoogD Snow AR26 5 3 20 no 140 AR08 8 20 3 60 some 160 AR05 9 10 4 20 high 150 AR06 8 20 3 40 high 160 AR69 8 30 2 30 high 160 AR32 8 10 5 20 some 160 AR40 8 15 4 10 some 180 Height Spread Angle Area Thick SLA N mass Seed Alop.alpi 5.00 20 20 190.90 0.20 15.10 203.85 0.21 Poa.alpi 8.00 15 45 160.00 0.18 10.70 204.37 0.32 Alch.pent 2.00 20 15 218.10 0.16 23.70 364.98 0.31 Geum.mont 5.00 10 15 852.60 0.20 11.30 223.74 1.67 Plan.alpi 0.50 10 20 40.00 0.22 11.90 242.76 0.33 Pote.aure 3.00 20 15 264.50 0.10 17.50 253.75 0.24 Sali.herb 1.00 50 60 82.50 0.18 14.70 367.50 0.05

Figure: Environment (left) an species (right) covariates for Aravo data (excerpt)

Xij = (αR)ij + (Cβ)ij + Θij (2) Known parameters: R ∈ RK2×m2 matrix of row covariates, C ∈ Rm1×K1 matrix of column covariates Unknown parameters: α ∈ Rm1×K2, β ∈ RK1×m2, Θij

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 7 / 39

slide-9
SLIDE 9

Generalized additive main effects and multiplicative interaction thresholded (GAMMIT)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 8 / 39

slide-10
SLIDE 10

Model

We can re-write model X = αR + Cβ + Θ as X = X0 + Θ, X0 ∈ V, Θ ∈ V⊥, (3) V1 linear span of C, V2 linear span of R; Π1 orthogonal projection on V1, Π2 orthogonal projection on V2; V = {X ∈ Rm1×m2; X = Π1X + XΠ2 − Π1XΠ2}; T orthogonal projector on V⊥; ⇒ Main effects X0 = Π1X + XΠ2 − Π1XΠ2, ⇒ Interaction Θ = T (X)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 9 / 39

slide-11
SLIDE 11

Low-rank interaction log-linear model

Penalized negative Poisson quasi-log-likelihood for λ > 0 (relaxation of the rank constraint) Φλ

Y (X)

= −(m1m2)−1

m1

  • i=1

m2

  • j=1

(YijXij − exp(Xij)) + λ T (X)σ,1 ˆ X λ = argmin

X∈K

Φλ

Y (X),

(4) K = [ ¯ γ, ¯ γ]m1×m2, ¯ γ > 0, ¯ γ < ∞ compact set. ⇒ We recover ˆ Θλ = T ( ˆ X λ),

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 10 / 39

slide-12
SLIDE 12

Statistical guarantees

¯ X true parameter matrix, ¯ Xij = log(E[Yij]).

Theorem

Assume ¯ µ ≤ E[Yij] ≤ ¯ µ, ¯ µ > 0, ¯ µ < ∞ and λ ≥ 2

  • ∇ΦY ( ¯

X)

  • σ,∞, then
  • ˆ

X λ − ¯ X

  • 2

σ,2

m1m2 ≤ λ2/ ¯ µ2m1m2

  • 18 rk T ( ¯

X) + K1 + K2

  • ,

(5) K1, K2 number of column and row covariates. Strong convexity of ΦY ΦY ( ˆ X λ) + λ

  • T ( ˆ

X λ)

  • σ,1 ≤ ΦY ( ¯

X) + λ

  • T ( ¯

X)

  • σ,1

Klopp et al. (2015)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 11 / 39

slide-13
SLIDE 13

Statistical guarantees

Theorem (Risk bound)

Assume ¯ µ ≤ E[Yij] ≤ ¯ µ, ¯ µ > 0, ¯ µ < ∞, ¯ σ2 ≤ Var(Yij) ≤ ¯ σ2, ¯ σ2, ¯ σ2 < ∞; There exists δ > 0, such that for all i, j, E[exp (|Yij|/δ)] < ∞; m1 + m2 ≥ max

  • δ2(2¯

σ2 ¯ σ2)−1, (4δ2/¯ σ2)4 . Set λ = 2cδ¯ σ

  • 2(m1 ∨ m2) log(m1 + m2)/(m1m2). Then
  • ¯

X − ˆ Xλ

  • 2

σ,2

m1m2 ≤ 4cδ¯ σ2/ ¯ µ2 (m1 + m2)

  • 18 rk T ( ¯

X) + K1 + K2

  • log(m1 + m2)

m1m2 , (6) with probability at least 1 − (m1 + m2)−1, where cδ is a numerical constant that depends only on δ.

Improves rate of Cao and Xie (2016)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 12 / 39

slide-14
SLIDE 14

Optimization problem

ΦY (X) = −(m1m2)−1

m1

  • i=1

m2

  • j=1

(YijXij − exp(Xij)) Reparametrize ˆ X λ = argmin

X∈K

ΦY (X) + λ T (X)σ,1 in ˆ X λ = argmin

X∈K, U∈KT

ΦY (X) + λ Uσ,1 s.t. T (X) = U, (7) where KT image of K by T is compact. (7) is a separable, linearly constrained, strongly convex program on a compact set. ⇒ admits a unique solution.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 13 / 39

slide-15
SLIDE 15

Alternating direction method of multipliers (ADMM)

Augmented Lagrangian indexed by τ, Γ dual variable: Lτ (X, U, Γ) = ΦY (X) + λ Uσ,1 + Γ, T (X) − U + τ 2 T (X) − U2

2 .

At iteration k + 1 ADMM update rules are given by X k+1 = argminX∈K Lτ

  • X, Uk, Γk

Uk+1 = argminU∈KT Lτ

  • X k+1, U, Γk

Γk+1 = Γk + τ

  • T (X k+1) − Uk+1

.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 14 / 39

slide-16
SLIDE 16

Update rules

X update: gradient descent X k+1 = argminX∈K ΦY (X) + λ

  • Uk
  • σ,1 + Γk, T (X) − Uk

+ τ 2

  • T (X) − Uk
  • 2

2

∇XLτ

  • X, Uk, Γk

= ∇ΦY (X) + Γk + τ

  • T (X) − Uk

.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 15 / 39

slide-17
SLIDE 17

Update rules

X update: gradient descent X k+1 = argminX∈K ΦY (X) + λ

  • Uk
  • σ,1 + Γk, T (X) − Uk

+ τ 2

  • T (X) − Uk
  • 2

2

∇XLτ

  • X, Uk, Γk

= ∇ΦY (X) + Γk + τ

  • T (X) − Uk

. U update: closed form Uk+1 = Dλ/τ

  • T (X k+1) + Γk/τ
  • ,

Dλ/τ operator for soft-thresholding of singular values at level λ/τ (Cai et al., 2010).

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 15 / 39

slide-18
SLIDE 18

Alternating Direction Method of Multipliers (ADMM)

Convergence is guaranteed by (Boyd et al., 2011, Theorem 3.2.1) Warm start strategy (Hastie et al., 2015) Run the algo using λ0 such that T ( ˆ X λ0) = 0 For decreasing values of λ initialize with previous run

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 16 / 39

slide-19
SLIDE 19

Automatic selection of λ

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 17 / 39

slide-20
SLIDE 20

Cross validation

⇒ EM algorithm Y = (Yobs, Ymis). At iteration k E step EYmis[Φλ

(Yobs,Ymis)(X)|Yobs; ˆ

X λk] Y k+1

  • bs

= Yobs Y k+1

mis

= E[Ymis; ˆ X λk] = exp( ˆ X λ

mis k).

M step ˆ X λk+1 = argmax EYmis[Φλ

(Yobs,Ymis)(X)|Yobs; ˆ

X λk] Run the ADMM algorithm

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 18 / 39

slide-21
SLIDE 21

Cross-validation

Remove a fraction of the entries of Y Compute ˆ X λ

mis for all λ with EM

Compute

  • exp( ˆ

X λ

mis) − Ymis

  • 2

2 for each λ

Repeat N times Select λCV = argmin

λ

1/N N

i=1

  • exp( ˆ

X λ

mis (i)) − Ymis

  • 2

2.

⇒ time consuming

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 19 / 39

slide-22
SLIDE 22

Quantile universal threshold (QUT)

CV designed to minimize prediction error. What about selecting the rank ? (number of non-zero singular values) Extend the work of Giacobino et al. (2016) on zero-thresholding statistics.

Theorem (Zero-thresholding statistics)

The interaction estimator T ( ˆ Xλ) associated to regularization parameter λ is null if and only if λ ≥ λ0(Y ), where λ0 is the zero-thresholding statistics given by λ0(Y ) = (m1m2)−1

  • T (Y − exp( ˆ

X0))

  • σ,∞ ,

where ˆ X0 = argmin

X∈K, T (X)=0

ΦY (X).

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 20 / 39

slide-23
SLIDE 23

Quantile universal threshold (QUT)

Example: Xij = αi + βj + Θij, ˆ X0 can be computed in closed form

( ˆ X0)ij = 1 m1

m1

  • i=1

log(

m2

  • j=1

Yij) + 1 m2

m2

  • j=1

log(

m1

  • i=1

Yij) − log(

m1

  • i=1

m2

  • j=1

Yij) + log(

m2

  • j=1

Yij) − 1 m1

m1

  • i=1

log(

m2

  • j=1

Yij) + log(

m1

  • i=1

Yij) − 1 m2

m2

  • j=1

log(

m1

  • i=1

Yij), if no row/column of Y is exclusively 0. λ0(Y ) = (m1m2)−1

  • T (Y − exp( ˆ

X0))

  • σ,∞ .

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 21 / 39

slide-24
SLIDE 24

Thresholding test

Test null hypothesis H0 : T ( ¯ X) = 0 against H1 : T ( ¯ X) = 0 T ( ˆ X λ) = 0 ⇔ λ0(Y ) ≤ λ PH0(T ( ˆ X λ) = 0) = PH0(λ0(Y ) ≤ λ). For 0 < ǫ < 1, PH0(λ0(Y ) ≤ λε) > 1 − ε then the test φ(Y ) =

  • 1

if T ( ˆ X λε) = 0

  • therwise,

defines a test of level 1 − ε for H0.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 22 / 39

slide-25
SLIDE 25

Quantile universal threshold

In practice PH0(λ0(Y ) < λε) is unknown. Compute ˆ X0 Generate M1 Poisson matrices Yℓ ∼ P(exp( ˆ X0)), 1 ≤ ℓ ≤ M1 For all ℓ compute λ0(Yℓ) Set λQUT to the 1 − ε quantile of the λ0(Yℓ). ⇒ Monte Carlo simulation of the distribution of the largest singular value

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 23 / 39

slide-26
SLIDE 26

Experiments

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 24 / 39

slide-27
SLIDE 27

Existing method

⇒ RC model Simulation under model Xij = αi + βj + (UDV ⊤)ij Estimation through maximization of a Poisson quasi-log-likelihood ΦY (X) = (m1m2)−1

m1

  • i=1

m2

  • j=1

(YijXij − exp(Xij)) ˆ X MLE = argmax ΦY (X) s.t. rk(Θ) = K Implemented in the R package gnm (Turner and Firth, 2015) Requires to know the rank K Fails with large values of K, m1, m2

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 25 / 39

slide-28
SLIDE 28

Choice of λ

Figure: L2 loss (black triangles) of ADMM estimator for λ ∈ [1e − 4, 0.2] m1 = 20, m2 = 15, K = 3. Comparison of λCV (cyan dashed line) and λQUT (red dashed line) with the independence model RC(0) (purple squares) and the MLE with oracle rank RC(K) (blue points).

⇒ Two-step approach

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 26 / 39

slide-29
SLIDE 29

Regularization grids

Figure: 50 × 20 matrices. Comparison of the L2 error of GAMMIT (black triangles) with the independence model (purple squares), the rank oracle RC(K) model (blue points) and the RC(KQUT) (green diamonds). Results are drawn for a grid of λ with λQUT (red dashed line). The rank of the interaction is written on the top axis for every λ. K = 2, SNR = 0.2, 0.7, 1.7 (left to right).

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 27 / 39

slide-30
SLIDE 30

Regularization grids

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 28 / 39

slide-31
SLIDE 31

Thresholding test

size rank SNR 0.2 0.3 0.4 0.5 0.7 1.7 50 × 20 2 0 (0) 57 (2) 99 (2) 100 (2) 100 (2) 0 (4) 50 × 20 10 0 (0) 0 (0) 0 (1) 0 (1) 0 (2) 0 (5) Table: Number of times λQUT returned the right rank over 100 simulations for different interaction intensities. The mode of the estimated rank is given between parenthesis.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 29 / 39

slide-32
SLIDE 32

Thresholding test

N chisq thresh 13 1.00 1.00 673 0.95 0.96 4537 0.95 0.95 89556 0.95 0.94 990027 0.95 0.95

Table: Comparison of the levels of the thresholding and χ2 tests for M1 = M2 = 1e5.

⇒ needs further investigation (power, etc.)

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 30 / 39

slide-33
SLIDE 33

Data analyses

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 31 / 39

slide-34
SLIDE 34

Mortality data

Crosses 65 causes of death over 12 age categories in 2006 in France. Use GAMMIT for biplot visualization ¯ Xij = αi + βj + ˜ Ui,. ˜ V.,j = ˜ αi + ˜ βj − 1 2˜ UT

i,. − ˜

V.,j2, (8) ˜ U = UD1/2 and ˜ V = VD1/2. Represent data points on axis (˜ U, ˜ V ): two close points interact highly.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 32 / 39

slide-35
SLIDE 35

Mortality data

Figure: Visualization of the 10 largest interactions between age categories (red) and mortality causes (blue) in the two first dimensions of interaction with the RC(3) model (left) and GAMMIT (right).

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 33 / 39

slide-36
SLIDE 36

Aravo data

Crosses 82 species and 75 environments Environments and species covariates are known ⇒ Compare the results of GAMMIT with Xij = αi + βj + Θij and Xij = (αR)ij + (Cβ)ij + Θij.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 34 / 39

slide-37
SLIDE 37

Aravo data

Figure: Correlation between environment (left) and species (middle) covariates with the two first GAMMIT dimensions. Scatterplot (right) in the two first dimensions of interaction.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 35 / 39

slide-38
SLIDE 38

Aravo data

Figure: Visualization of the 10 largest interactions between environments (blue) and species (red) in the two first dimensions of interaction with GAMMIT for row-column indices (left) and explanatory covariates (right).

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 36 / 39

slide-39
SLIDE 39

Conclusion

Summary Low-rank model for contingency table analysis with known covariates Statistical guarantees, optimization algorithm, automatic choice of λ Visualization and interpretation through biplots Perspectives Adaptive regularization of singular values (Josse and Sardy, 2015) Add regularization of X0 Use GAMMIT to impute contingency tables Other sparsity inducing penalties Define a pivotal test statistic for QUT test

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 37 / 39

slide-40
SLIDE 40

Thank you for your attention !

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 38 / 39

slide-41
SLIDE 41

References

Boyd, S., N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–22. Cai, J.-F., E. J. Cand` es, and Z. Shen (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4), 1956–1982. Cao, Y. and Y. Xie (2016, March). Poisson matrix recovery and

  • completion. IEEE Transactions on Signal Processing 64(6).

Dray, S. and A. Dufour (2007). The ade4 package: implementing the duality diagram for ecologists. Journal of Statistical Software 22(4), 1–20. Giacobino, C., S. Sardy, J. Diaz Rodriguez, and N. Hengardner (2016). Quantile universal threshold for model selection. arXiv:1511.05433v2. Hastie, T., R. Mazumder, J. Lee, and R. Zadeh (2015, January). Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research 16, 3367–3402. Josse, J. and S. Sardy (2015). Adaptive shrinkage of singular values.

Genevi` eve Robin (Polytechnique) Low rank Interaction Log-linear Models February 16, 2017 39 / 39