Low rank Interaction Log-linear Model for Contingency Table Analysis

Low rank Interaction Log-linear Model for Contingency Table Analysis
Geneviève Robin, École Polytechnique
genevieve.robin@polytechnique.edu
Advisors: Julie Josse and Éric Moulines
February 16, 2017
Geneviève Robin (Polytechnique), Low rank Interaction Log-linear Models, February 16, 2017

Overview
1. Motivations
2. Generalized additive main effects & multiplicative interaction thresholded (GAMMIT)
   - statistical guarantees
   - optimization algorithm
3. Automatic selection of the regularization parameter
   - cross validation
   - quantile universal threshold
4. Experiments
5. Data analyses

Motivations

High dimensional count data
- Single-cell RNA sequencing (counts of genes in cells)
- Image processing (number of photons on a grid)
- Ecological data (abundance of species across environments)

$$Y = \begin{pmatrix} 2 & 0 & 17 & 5 \\ 4 & 1 & 3 & 9 \\ 23 & 7 & 7 & 2 \end{pmatrix} \in \mathbb{N}^{m_1 \times m_2}$$

$Y_{ij}$ counts occurrences of $(i, j)$; the $Y_{ij}$ are independent.
Estimate $E[Y_{ij}] = \mu_{ij}$ ⇒ denoise and visualize data.

Ecological data

        Alop.alpi  Alch.pent  Geum.mont  Pote.aure  Sali.herb
AR26    0          0          2          2          0
AR08    1          0          2          1          0
AR05    0          0          3          3          0
AR06    0          0          3          0          0
AR69    1          0          2          2          2
AR32    2          0          3          3          1
AR40    2          3          3          4          0

Table: Excerpt of Aravo dataset. 82 species of plants across 75 environments in the French Alps (Dray and Dufour, 2007).

How do species interact with environments?

Log-linear model
Observation matrix $Y \in \mathbb{N}^{m_1 \times m_2}$, $E[Y_{ij}] = \mu_{ij}$, $X_{ij} := \log(\mu_{ij})$.

$$X_{ij} = \alpha_i + \beta_j + \Theta_{ij} \qquad (1)$$

- $\alpha_i$: effect of the $i$-th environment
- $\beta_j$: effect of the $j$-th species
- $\Theta_{ij}$: interaction between the $i$-th environment and the $j$-th species

$\Theta$ has rank $K < \min(m_1 - 1, m_2 - 1)$:

$$X_{ij} = \alpha_i + \beta_j + (U D V^\top)_{ij},$$

with $U D V^\top$ the SVD of $\Theta$.
(RC model, Goodman, 1985; log-bilinear model, Falguerolle, 1998; GAMMI, Gower, 2011)
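To make model (1) concrete, here is a minimal simulation sketch in NumPy. The sizes, seed, effect scales and singular values are arbitrary assumptions chosen for illustration; they do not come from the slides, and the interaction is not centered here.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
m1, m2, K = 8, 6, 2              # hypothetical sizes: environments, species, rank

# Main effects alpha_i (rows) and beta_j (columns).
alpha = rng.normal(scale=0.3, size=m1)
beta = rng.normal(scale=0.3, size=m2)

# Rank-K interaction Theta = U D V^T with orthonormal U, V.
U, _ = np.linalg.qr(rng.normal(size=(m1, K)))
V, _ = np.linalg.qr(rng.normal(size=(m2, K)))
D = np.diag(np.linspace(2.0, 1.0, K))
Theta = U @ D @ V.T

# Log-linear model: X_ij = alpha_i + beta_j + Theta_ij, mu_ij = exp(X_ij).
X = alpha[:, None] + beta[None, :] + Theta
mu = np.exp(X)

# Independent Poisson counts with mean mu_ij.
Y = rng.poisson(mu)
```

The matrix `Y` plays the role of the observed contingency table, and `mu` is the quantity the method tries to recover.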

Log-linear model with known covariates
Environment characteristics and species traits are known.

Environment covariates:

        Aspect  Slope  Form  PhysD  ZoogD  Snow
AR26    5       0      3     20     no     140
AR08    8       20     3     60     some   160
AR05    9       10     4     20     high   150
AR06    8       20     3     40     high   160
AR69    8       30     2     30     high   160
AR32    8       10     5     20     some   160
AR40    8       15     4     10     some   180

Species covariates:

            Height  Spread  Angle  Area    Thick  SLA    N mass  Seed
Alop.alpi   5.00    20      20     190.90  0.20   15.10  203.85  0.21
Poa.alpi    8.00    15      45     160.00  0.18   10.70  204.37  0.32
Alch.pent   2.00    20      15     218.10  0.16   23.70  364.98  0.31
Geum.mont   5.00    10      15     852.60  0.20   11.30  223.74  1.67
Plan.alpi   0.50    10      20     40.00   0.22   11.90  242.76  0.33
Pote.aure   3.00    20      15     264.50  0.10   17.50  253.75  0.24
Sali.herb   1.00    50      60     82.50   0.18   14.70  367.50  0.05

Figure: Environment and species covariates for Aravo data (excerpt).

$$X_{ij} = (\alpha R)_{ij} + (C \beta)_{ij} + \Theta_{ij} \qquad (2)$$

Known parameters: $R \in \mathbb{R}^{K_2 \times m_2}$ matrix of row covariates, $C \in \mathbb{R}^{m_1 \times K_1}$ matrix of column covariates.
Unknown parameters: $\alpha \in \mathbb{R}^{m_1 \times K_2}$, $\beta \in \mathbb{R}^{K_1 \times m_2}$, $\Theta_{ij}$.

Generalized additive main effects and multiplicative interaction thresholded (GAMMIT)

Model
We can rewrite the model $X = \alpha R + C \beta + \Theta$ as

$$X = X_0 + \Theta, \qquad X_0 \in V, \quad \Theta \in V^\perp, \qquad (3)$$

where
- $V_1$ is the linear span of $C$ and $V_2$ the linear span of $R$;
- $\Pi_1$ is the orthogonal projection on $V_1$ and $\Pi_2$ the orthogonal projection on $V_2$;
- $V = \{ X \in \mathbb{R}^{m_1 \times m_2} ;\ X = \Pi_1 X + X \Pi_2 - \Pi_1 X \Pi_2 \}$;
- $\mathcal{T}$ is the orthogonal projector on $V^\perp$.

⇒ Main effects: $X_0 = \Pi_1 X + X \Pi_2 - \Pi_1 X \Pi_2$
⇒ Interaction: $\Theta = \mathcal{T}(X)$
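The decomposition (3) can be sketched numerically as follows. This is an illustrative sketch, not the author's code: the function names are hypothetical, $\Pi_1$ is built from the column space of $C$ and $\Pi_2$ from the row space of $R$ via pseudo-inverses, and the interaction is computed as $X - X_0$, which equals $(I - \Pi_1) X (I - \Pi_2)$.

```python
import numpy as np

def projectors(C, R):
    """Orthogonal projectors: Pi1 onto the column span of C (m1 x K1),
    Pi2 onto the row span of R (K2 x m2). Names are illustrative."""
    Pi1 = C @ np.linalg.pinv(C)      # m1 x m1
    Pi2 = np.linalg.pinv(R) @ R      # m2 x m2
    return Pi1, Pi2

def main_effects(X, Pi1, Pi2):
    """X0 = Pi1 X + X Pi2 - Pi1 X Pi2, the component of X in V."""
    return Pi1 @ X + X @ Pi2 - Pi1 @ X @ Pi2

def interaction(X, Pi1, Pi2):
    """Theta = T(X) = X - X0 = (I - Pi1) X (I - Pi2)."""
    return X - main_effects(X, Pi1, Pi2)
```

With $C$ a column of ones and $R$ a row of ones, `interaction` reduces to double centering (removing row and column means), which corresponds to the covariate-free model (1).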

Low-rank interaction log-linear model
Penalized negative Poisson quasi-log-likelihood for $\lambda > 0$ (relaxation of the rank constraint):

$$\Phi_Y^\lambda(X) = -(m_1 m_2)^{-1} \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \left( Y_{ij} X_{ij} - \exp(X_{ij}) \right) + \lambda \|\mathcal{T}(X)\|_{\sigma,1} \qquad (4)$$

$$\hat{X}^\lambda = \operatorname*{argmin}_{X \in \mathcal{K}} \Phi_Y^\lambda(X),$$

where $\mathcal{K} = [\underline{\gamma}, \bar{\gamma}]^{m_1 \times m_2}$, $\underline{\gamma} > 0$, $\bar{\gamma} < \infty$, is a compact set.

⇒ We recover $\hat{\Theta}^\lambda = \mathcal{T}(\hat{X}^\lambda)$.
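As an illustration, objective (4) can be evaluated as below. This sketch reads $\|\cdot\|_{\sigma,1}$ as the nuclear norm (sum of singular values) and, as an assumption, uses double centering for $\mathcal{T}$, i.e. the covariate-free case of model (1). The function names are hypothetical.

```python
import numpy as np

def double_center(X):
    """T for the covariate-free model: remove row and column means."""
    return (X - X.mean(axis=0, keepdims=True)
              - X.mean(axis=1, keepdims=True) + X.mean())

def penalized_objective(Y, X, lam, T=double_center):
    """Negative Poisson quasi-log-likelihood plus lam * nuclear norm of T(X)."""
    m1, m2 = Y.shape
    fit = -np.sum(Y * X - np.exp(X)) / (m1 * m2)
    nuclear = np.linalg.norm(T(X), ord="nuc")  # sum of singular values
    return fit + lam * nuclear
```

Any other projector $\mathcal{T}$ can be passed through the `T` argument as a callable.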

Statistical guarantees
$\bar{X}$ true parameter matrix, $\bar{X}_{ij} = \log(E[Y_{ij}])$.

Theorem. Assume $\underline{\mu} \le E[Y_{ij}] \le \bar{\mu}$, $\underline{\mu} > 0$, $\bar{\mu} < \infty$, and $\lambda \ge 2 \|\nabla \Phi_Y(\bar{X})\|_{\sigma,\infty}$. Then

$$\frac{\|\hat{X}^\lambda - \bar{X}\|_{\sigma,2}^2}{m_1 m_2} \le \frac{\lambda^2}{\underline{\mu}^2}\, m_1 m_2 \left( 18 \operatorname{rk} \mathcal{T}(\bar{X}) + K_1 + K_2 \right), \qquad (5)$$

with $K_1$, $K_2$ the numbers of column and row covariates.

- Strong convexity of $\Phi_Y$
- $\Phi_Y(\hat{X}^\lambda) + \lambda \|\mathcal{T}(\hat{X}^\lambda)\|_{\sigma,1} \le \Phi_Y(\bar{X}) + \lambda \|\mathcal{T}(\bar{X})\|_{\sigma,1}$
- Klopp et al. (2015)

Statistical guarantees
Theorem (Risk bound). Assume
- $\underline{\sigma}^2 \le \mathrm{Var}(Y_{ij}) \le \bar{\sigma}^2 < \infty$;
- $\underline{\mu} \le E[Y_{ij}] \le \bar{\mu}$, $\underline{\mu} > 0$, $\bar{\mu} < \infty$;
- there exists $\delta > 0$ such that for all $i, j$, $E[\exp(|Y_{ij}| / \delta)] < \infty$;
- $m_1 + m_2 \ge \max\left\{ \delta^2 (2 \bar{\sigma}^2 \underline{\sigma}^2)^{-1},\ (4 \delta^2 / \bar{\sigma}^2)^4 \right\}$.

Set $\lambda = 2 c_\delta \bar{\sigma} \sqrt{2 (m_1 \vee m_2) \log(m_1 + m_2)} / (m_1 m_2)$. Then

$$\frac{\|\bar{X} - \hat{X}^\lambda\|_{\sigma,2}^2}{m_1 m_2} \le 4 c_\delta \frac{\bar{\sigma}^2}{\underline{\mu}^2} \cdot \frac{(m_1 + m_2) \left( 18 \operatorname{rk} \mathcal{T}(\bar{X}) + K_1 + K_2 \right) \log(m_1 + m_2)}{m_1 m_2}, \qquad (6)$$

with probability at least $1 - (m_1 + m_2)^{-1}$, where $c_\delta$ is a numerical constant that depends only on $\delta$.

Improves the rate of Cao and Xie (2016).

Optimization problem

$$\Phi_Y(X) = -(m_1 m_2)^{-1} \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \left( Y_{ij} X_{ij} - \exp(X_{ij}) \right)$$

Reparametrize $\hat{X}^\lambda = \operatorname*{argmin}_{X \in \mathcal{K}} \Phi_Y(X) + \lambda \|\mathcal{T}(X)\|_{\sigma,1}$ as

$$\hat{X}^\lambda = \operatorname*{argmin}_{X \in \mathcal{K},\ U \in \mathcal{K}_\mathcal{T}} \Phi_Y(X) + \lambda \|U\|_{\sigma,1} \quad \text{s.t. } \mathcal{T}(X) = U, \qquad (7)$$

where $\mathcal{K}_\mathcal{T}$, the image of $\mathcal{K}$ by $\mathcal{T}$, is compact.

(7) is a separable, linearly constrained, strongly convex program on a compact set ⇒ it admits a unique solution.

Alternating direction method of multipliers (ADMM)
Augmented Lagrangian indexed by $\tau$, with dual variable $\Gamma$:

$$L_\tau(X, U, \Gamma) = \Phi_Y(X) + \lambda \|U\|_{\sigma,1} + \langle \Gamma, \mathcal{T}(X) - U \rangle + \frac{\tau}{2} \|\mathcal{T}(X) - U\|_2^2.$$

At iteration $k + 1$ the ADMM update rules are

$$X^{k+1} = \operatorname*{argmin}_{X \in \mathcal{K}} L_\tau(X, U^k, \Gamma^k)$$
$$U^{k+1} = \operatorname*{argmin}_{U \in \mathcal{K}_\mathcal{T}} L_\tau(X^{k+1}, U, \Gamma^k)$$
$$\Gamma^{k+1} = \Gamma^k + \tau \left( \mathcal{T}(X^{k+1}) - U^{k+1} \right).$$

Update rules
X update: gradient descent

$$X^{k+1} = \operatorname*{argmin}_{X \in \mathcal{K}} \Phi_Y(X) + \lambda \|U^k\|_{\sigma,1} + \langle \Gamma^k, \mathcal{T}(X) - U^k \rangle + \frac{\tau}{2} \|\mathcal{T}(X) - U^k\|_2^2$$

$$\nabla_X L_\tau(X, U^k, \Gamma^k) = \nabla \Phi_Y(X) + \Gamma^k + \tau \left( \mathcal{T}(X) - U^k \right).$$

U update: closed form

$$U^{k+1} = D_{\lambda/\tau}\left( \mathcal{T}(X^{k+1}) + \Gamma^k / \tau \right),$$

with $D_{\lambda/\tau}$ the operator for soft-thresholding of singular values at level $\lambda/\tau$ (Cai et al., 2010).
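The update rules can be put together in a short NumPy sketch. This is an illustration under stated assumptions, not the author's implementation: $\mathcal{T}$ is taken to be double centering (covariate-free case), the X update is solved inexactly by a few projected gradient steps, the box bounds `lo`, `hi`, the step size and the iteration counts are arbitrary choices, and the constraint $U \in \mathcal{K}_\mathcal{T}$ is not enforced in the closed-form U update.

```python
import numpy as np

def center(X):
    """Assumed T: double centering (remove row and column means)."""
    return (X - X.mean(axis=0, keepdims=True)
              - X.mean(axis=1, keepdims=True) + X.mean())

def svt(M, level):
    """D_level: soft-threshold the singular values of M at `level`."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - level, 0.0)) @ Vt

def admm(Y, lam, tau=1.0, n_iter=100, n_grad=10, lo=-5.0, hi=5.0):
    """ADMM sketch for problem (7); all defaults are illustrative."""
    n = Y.size
    # Start from a constant matrix at the log of the overall mean count.
    X = np.full(Y.shape, np.clip(np.log(max(Y.mean(), 1e-3)), lo, hi))
    U = np.zeros(Y.shape)
    Gamma = np.zeros(Y.shape)
    # Step 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (np.exp(hi) / n + tau)
    for _ in range(n_iter):
        # X update: projected gradient steps on L_tau(., U, Gamma).
        for _ in range(n_grad):
            grad = -(Y - np.exp(X)) / n + Gamma + tau * (center(X) - U)
            X = np.clip(X - step * grad, lo, hi)
        # U update: singular value soft-thresholding at level lam / tau.
        TX = center(X)
        U = svt(TX + Gamma / tau, lam / tau)
        # Dual update.
        Gamma = Gamma + tau * (TX - U)
    return X, U
```

A call such as `X_hat, Theta_hat = admm(Y, lam=0.1)` returns the fitted log-means and the thresholded interaction estimate; larger `lam` values tend to give a lower-rank `Theta_hat`.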

Alternating Direction Method of Multipliers (ADMM)
- Convergence is guaranteed by (Boyd et al., 2011, Theorem 3.2.1).
- Warm start strategy (Hastie et al., 2015):
  - run the algorithm using $\lambda_0$ such that $\mathcal{T}(\hat{X}^{\lambda_0}) = 0$;
  - for decreasing values of $\lambda$, initialize with the solution of the previous run.

Automatic selection of λ
