Penalized Fits to a Multiway Layout with Multivariate Responses


1. Penalized Fits to a Multiway Layout with Multivariate Responses. Rudolf Beran, University of California, Davis. Workshop on Model Selection and Related Areas, University of Vienna, 24 July 2008.

2. Multivariate Linear Model

Y = CM + E, where
• the rows of the n × d matrix Y are d-variate responses;
• the n × p design matrix C has rank p ≤ n;
• the p × d matrix M is unknown;
• the n × d error matrix E = VΣ^{1/2}, where Σ is an unknown p.d. covariance matrix and the elements of V are iid with mean 0, variance 1, and finite 4th moment.

The least squares estimator of M is M̂_ls = C⁺Y.

Let y = vec(Y), m = vec(M), e = vec(E), and C̃ = I_d ⊗ C. The vectorized model asserts y = C̃m + e. The least squares estimator of m is m̂_ls = C̃⁺y = vec(M̂_ls). For now, assume Σ = I_d.
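The equivalence between the matrix form and the vectorized form of the least squares fit can be checked numerically. The following NumPy sketch uses arbitrary toy dimensions and simulated data (not from the talk); it relies on the column-stacking vec convention, which matches C̃ = I_d ⊗ C.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 12, 3, 2

# Design matrix of full column rank p, plus a true coefficient matrix M.
C = rng.standard_normal((n, p))
M = rng.standard_normal((p, d))
Y = C @ M + 0.1 * rng.standard_normal((n, d))

# Least squares fit in matrix form: M_ls = C^+ Y.
M_ls = np.linalg.pinv(C) @ Y

# Vectorized form: y = (I_d ⊗ C) m + e, with m_ls = (I_d ⊗ C)^+ y.
y = Y.flatten(order="F")          # vec(Y): stack columns
C_tilde = np.kron(np.eye(d), C)   # I_d ⊗ C
m_ls = np.linalg.pinv(C_tilde) @ y

# The two routes agree: m_ls = vec(M_ls).
assert np.allclose(m_ls, M_ls.flatten(order="F"))
```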

3. Quadratic Loss and Risk

Let η̂ be any estimator of η = C̃m = E(y). The loss of η̂ is L(η̂, η) = p⁻¹|η̂ − η|², and the corresponding risk is R(η̂, η) = E L(η̂, η). Equivalently, these are loss and risk functions on estimators of m through the 1-to-1 map η̂ = C̃m̂.

The least squares estimator η̂_ls = C̃m̂_ls = C̃C̃⁺y has risk R(η̂_ls, η) = d.

Biased estimators of η can reduce risk substantially: Stein (1956), James and Stein (1961), Stein (1966); also papers on symmetric linear estimators such as Stein (1981), Li and Hwang (1984), Buja, Hastie and Tibshirani (1989), Kneip (1994), Beran (2007), ...

Penalized least squares (PLS) generates promising, biased, candidate symmetric linear estimators of η.

4. General Structure of PLS for the Multivariate Linear Model

Let S be an index set of fixed cardinality. Let {Q_s : s ∈ S} be p × p p.s.d. penalty matrices and N = {N_s : s ∈ S} be d × d p.s.d. affine penalty weights.

PLS criterion: G(m, N) = |y − C̃m|² + m′Q(N)m, where Q(N) = Σ_{s∈S} (N_s ⊗ Q_s).

The PLS estimators of m and η are then
m̂_pls(N) = argmin_m G(m, N) = [C̃′C̃ + Q(N)]⁻¹ C̃′y,
η̂_pls(N) = C̃ m̂_pls(N) = C̃ [C̃′C̃ + Q(N)]⁻¹ C̃′y,
a symmetric linear estimator (generalized ridge). These estimators can be derived as Bayes estimators in a normal error version of the multivariate linear model; Kimeldorf and Wahba (1970) make the general point.
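A minimal sketch of the PLS estimator with a single penalty term (|S| = 1). The particular penalty matrix (a first-difference roughness penalty) and the weight N = 2 I_d are illustrative assumptions, not choices made in the talk; the sketch also confirms that setting the weight to zero recovers least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 12, 4, 2

C = rng.standard_normal((n, p))
Y = C @ rng.standard_normal((p, d)) + rng.standard_normal((n, d))
y = Y.flatten(order="F")
C_tilde = np.kron(np.eye(d), C)

# One penalty term: a p.s.d. p x p penalty matrix Q and a p.s.d.
# d x d affine penalty weight N, giving Q(N) = N ⊗ Q.
D = np.diff(np.eye(p), axis=0)   # first-difference matrix (assumed penalty)
Q = D.T @ D                      # p.s.d.
N = 2.0 * np.eye(d)              # assumed affine penalty weight
Q_N = np.kron(N, Q)

# PLS estimator (generalized ridge): m_pls(N) = [C'C + Q(N)]^{-1} C'y.
m_pls = np.linalg.solve(C_tilde.T @ C_tilde + Q_N, C_tilde.T @ y)
eta_pls = C_tilde @ m_pls

# With N = 0 the PLS fit reduces to ordinary least squares.
m_pls0 = np.linalg.solve(C_tilde.T @ C_tilde, C_tilde.T @ y)
m_ls = np.linalg.pinv(C_tilde) @ y
assert np.allclose(m_pls0, m_ls)
```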

5. Related work:
• When d = 1, the penalty weights are non-negative scalars. E.g. Wood (2000) and Beran (2005) use multiple penalty terms with scalar weights.
• Functional data analysis treats penalized estimation of a function m of continuous covariates. E.g. Wahba, Wang, Gu, Klein, Klein (1995), Li (2000), Ramsay and Silverman (2002).

To be considered:
• Data-based choice of the affine penalty weights {N_s : s ∈ S};
• Supporting asymptotic theory for the foregoing, as p → ∞;
• Penalty matrices {Q_s : s ∈ S} suitable for the multiway layout with d-variate responses;
• Modifications for the case of a general unknown covariance matrix Σ.

6. Canonical Form and Risk of η̂_pls(N)

Let R̃ = I_d ⊗ C′C, a pd × pd matrix of full rank, and let Ũ = I_d ⊗ C(C′C)^{−1/2}, an nd × pd matrix. Then C̃ = I_d ⊗ C = Ũ R̃^{1/2} and Ũ′Ũ = I_pd. Hence,
η̂_pls(N) = C̃ [C̃′C̃ + Q(N)]⁻¹ C̃′y = Ũ S(N) Ũ′y,
where S(N) = [I_pd + R̃^{−1/2} Q(N) R̃^{−1/2}]⁻¹ is symmetric.

Because range(C̃) = range(Ũ) and Ũ′Ũ = I_pd, η = C̃m = Ũξ, with ξ = Ũ′η. Let z = Ũ′y. Then η̂_pls(N) = Ũ S(N) z. This is the canonical form of η̂_pls(N).

The risk of η̂_pls(N) is thus
R(η̂_pls(N), η) = p⁻¹ E|S(N)z − ξ|² = p⁻¹ [tr(T(N)) + tr(T̄(N) ξξ′)],
where T(N) = S²(N) and T̄(N) = [I_pd − S(N)]².
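The identity η̂_pls(N) = Ũ S(N) Ũ′y can be verified numerically. This sketch uses the symmetric matrix square roots implied by the eigendecomposition of C′C, with an arbitrary toy penalty (Q = I_p, N = 0.5 I_d are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 10, 3, 2
C = rng.standard_normal((n, p))
y = rng.standard_normal(n * d)
C_tilde = np.kron(np.eye(d), C)

# Symmetric inverse square root of C'C via its eigendecomposition.
w, V = np.linalg.eigh(C.T @ C)
CtC_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

R_inv_sqrt = np.kron(np.eye(d), CtC_inv_sqrt)    # R^{-1/2}, R = I_d ⊗ C'C
U_tilde = np.kron(np.eye(d), C @ CtC_inv_sqrt)   # U = I_d ⊗ C (C'C)^{-1/2}

Q_N = np.kron(0.5 * np.eye(d), np.eye(p))        # toy Q(N), |S| = 1

# S(N) = [I_pd + R^{-1/2} Q(N) R^{-1/2}]^{-1} is symmetric.
S = np.linalg.inv(np.eye(p * d) + R_inv_sqrt @ Q_N @ R_inv_sqrt)
assert np.allclose(S, S.T)

# Canonical form: U S(N) U'y equals the direct generalized-ridge formula.
eta_direct = C_tilde @ np.linalg.solve(C_tilde.T @ C_tilde + Q_N,
                                       C_tilde.T @ y)
eta_canon = U_tilde @ S @ (U_tilde.T @ y)
assert np.allclose(eta_direct, eta_canon)
```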

7. Estimated Risk

The estimated risk of η̂_pls(N) is
R̂(N) = p⁻¹ [tr(T(N)) + tr(T̄(N)(zz′ − I_pd))]
(cf. Mallows (1973), Stein (1981)). Let N̂ = argmin_N R̂(N). E.g., parametrize each weight through its Cholesky factorization N_s = L_s L_s′ with diagonal entries {l_{s,i,i} ≥ 0}.

The adaptive PLS estimators of η and of m are
η̂_apls = η̂_pls(N̂) and m̂_apls = C̃⁺ η̂_apls.

Supporting Asymptotics

Let |·|_sp denote the spectral matrix norm: |B|_sp = sup_{x≠0} [|Bx|/|x|].
• Let W(N) denote either the loss or the estimated risk of η̂_pls(N). Let 𝒩 = {N : max_{s∈S} |N_s|_sp ≤ b}. Then, for every finite a > 0,
lim_{p→∞} sup_{p⁻¹|η|² ≤ a} E[sup_{N∈𝒩} |W(N) − R(η̂_pls(N), η)|] = 0.
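A toy sketch of the adaptive tuning step, in the simplest canonical setting: d = 1 and an orthonormal design, so each coordinate of z is shrunk by s_i = 1/(1 + ν q_i) for a scalar weight ν. The eigenvalues q_i and the "true" mean ξ used below are illustrative assumptions; the grid search stands in for the Cholesky-parametrized minimization described above.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 200

# Assumed toy ingredients: penalty eigenvalues q_i and a smooth mean xi.
q = np.arange(p, dtype=float) ** 2
xi = 3.0 * np.exp(-0.05 * np.arange(p))
z = xi + rng.standard_normal(p)          # z = U'y with unit noise

def est_risk(nu):
    """p^{-1}[tr(S^2) + sum_i (1 - s_i)^2 (z_i^2 - 1)] (Mallows/Stein form)."""
    s = 1.0 / (1.0 + nu * q)
    return (np.sum(s ** 2) + np.sum((1.0 - s) ** 2 * (z ** 2 - 1.0))) / p

# Data-based tuning: minimize the estimated risk over a grid of weights.
grid = np.concatenate(([0.0], np.logspace(-6.0, 2.0, 200)))
nu_hat = grid[np.argmin([est_risk(nu) for nu in grid])]

# The grid contains nu = 0 (least squares), so the adaptive choice can
# never have larger *estimated* risk than the unpenalized fit.
assert est_risk(nu_hat) <= est_risk(0.0)
```

The asymptotic results quoted on this and the next slide are what justify treating this minimized estimated risk as a surrogate for the unobservable loss.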

8. • For every finite a > 0,
lim_{p→∞} sup_{p⁻¹|η|² ≤ a} |R(η̂_apls, η) − min_{N∈𝒩} R(η̂_pls(N), η)| = 0.
• Let V denote either the loss or the risk of η̂_apls. Then, for every finite a > 0,
lim_{p→∞} sup_{p⁻¹|η|² ≤ a} E|R̂(N̂) − V| = 0.

The loss, risk, and estimated risk of the candidate estimator η̂_pls(N) converge together, as p → ∞, uniformly over N ∈ 𝒩. Estimated risk is here a trustworthy surrogate for loss or risk.

The risk of η̂_apls converges, as p → ∞, to the minimal risk achievable by the PLS candidate estimators.

The plug-in risk estimator R̂(N̂) converges to the loss or risk of η̂_apls as p → ∞.

9. Complete k₀-way Layout with Multivariate Responses

Now the d-dimensional responses depend on k₀ covariates. Covariate k has p_k distinct levels x_{k,1} < x_{k,2} < … < x_{k,p_k}.

Let I denote the set of all k₀-tuples i = (i₁, i₂, …, i_{k₀}), where 1 ≤ i_k ≤ p_k for 1 ≤ k ≤ k₀. Thus i_k indexes the levels of covariate k, and I lists all possible covariate-level combinations. We put the elements of I in mirror-dictionary order.

We observe Y = CM + E, with the assumptions on E as before. Here C is the n × p data-incidence matrix of 0s and 1s that suitably replicates rows of the p × d matrix M into the rows of E(Y) = CM. The design is complete: rank(C) = p.

Row i ∈ I of M equals f(x_{1,i₁}, x_{2,i₂}, …, x_{k₀,i_{k₀}}), where f is an unknown vector-valued function.

10. Constructing Penalty Matrices {Q_s : s ∈ S}

We devise a scheme that penalizes individually the main effects and interactions in the MANOVA decomposition of M.

For 1 ≤ k ≤ k₀, define the p_k × 1 vector u_k = p_k^{−1/2}(1, 1, …, 1)′. Let A_k be an annihilator: a matrix such that A_k u_k = 0. Let S denote the set of all subsets of {1, 2, …, k₀}, including ∅. Let Q_{s,k} = u_k u_k′ if k ∉ s, and Q_{s,k} = A_k′A_k if k ∈ s. Define
Q_s = ⊗_{k=1}^{k₀} Q_{s,k₀−k+1},  s ∈ S.

Special case: A_k = I_{p_k} − u_k u_k′. Denote Q_s in this case by P_{AN,s}. The matrices {P_{AN,s} : s ∈ S} are mutually orthogonal, orthogonal projections such that Σ_{s∈S} P_{AN,s} = I_p.

MANOVA decomposition: M = Σ_{s∈S} P_{AN,s} M.
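The special-case construction can be sketched directly. For an assumed two-way layout (k₀ = 2 with 3 and 4 levels), the code builds P_AN,s for every subset s from the Kronecker recipe above, using A_k = I − u_k u_k′, and checks that the results are mutually complementary orthogonal projections summing to I_p.

```python
import numpy as np
from itertools import chain, combinations

# Assumed toy layout: k0 = 2 factors with these level counts.
p_levels = [3, 4]
k0 = len(p_levels)

def block(k, in_s):
    """Q_{s,k}: u_k u_k' if k not in s, else A_k'A_k with A_k = I - u_k u_k'."""
    pk = p_levels[k]
    u = np.full((pk, 1), pk ** -0.5)
    return (np.eye(pk) - u @ u.T) if in_s else (u @ u.T)

def P_AN(s):
    # Mirror-dictionary order: factor k0 supplies the leftmost Kronecker
    # factor, i.e. P_AN,s = Q_{s,k0} ⊗ ... ⊗ Q_{s,1}.
    out = np.eye(1)
    for k in reversed(range(k0)):
        out = np.kron(out, block(k, k in s))
    return out

subsets = list(chain.from_iterable(combinations(range(k0), r)
                                   for r in range(k0 + 1)))
p = int(np.prod(p_levels))

# {P_AN,s} are symmetric idempotents that sum to the identity I_p.
assert np.allclose(sum(P_AN(s) for s in subsets), np.eye(p))
for s in subsets:
    P = P_AN(s)
    assert np.allclose(P @ P, P) and np.allclose(P, P.T)
```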

11. From the foregoing definitions, P_{AN,s} Q_s = Q_s P_{AN,s} = Q_s for every s ∈ S, and P_{AN,s₁} Q_{s₂} = Q_{s₂} P_{AN,s₁} = 0 if s₁ ≠ s₂. Thus,
m′(N_s ⊗ Q_s)m = |Q_s^{1/2} M N_s^{1/2}|² = |Q_s^{1/2} (P_{AN,s} M) N_s^{1/2}|².
The penalty term in the PLS criterion is seen to operate on the summands in the MANOVA decomposition of M:
m′Q(N)m = Σ_{s∈S} m′(N_s ⊗ Q_s)m = Σ_{s∈S} |Q_s^{1/2} (P_{AN,s} M) N_s^{1/2}|².

Spectral Form of the Penalty Matrices {Q_s}

Write A_k′A_k = U_k Λ_k U_k′, where Λ_k = diag{λ_{k,i_k} : 1 ≤ i_k ≤ p_k} and 0 = λ_{k,1} ≤ λ_{k,2} ≤ … ≤ λ_{k,p_k}. The first column of U_k is chosen to be u_k. Then u_k u_k′ = U_k E_k U_k′, where E_k = diag{e_{k,i_k} : 1 ≤ i_k ≤ p_k}, with e_{k,1} = 1 and e_{k,i_k} = 0 if i_k ≥ 2. Hence, Q_{s,k} = U_k Γ_{s,k} U_k′, where Γ_{s,k} = diag{γ_{s,k,i_k} : 1 ≤ i_k ≤ p_k}, with γ_{s,k,i_k} = e_{k,i_k} if k ∉ s and γ_{s,k,i_k} = λ_{k,i_k} if k ∈ s.

12. Write U_k = [u_{k,1}, …, u_{k,p_k}]. Then Q_{s,k} = Σ_{i_k=1}^{p_k} γ_{s,k,i_k} P_{k,i_k}, where P_{k,i_k} = u_{k,i_k} u_{k,i_k}′ is a rank-one orthogonal projection.

For i ∈ I, let P_i = P_{k₀,i_{k₀}} ⊗ P_{k₀−1,i_{k₀−1}} ⊗ … ⊗ P_{1,i₁} and γ_{s,i} = Π_{k=1}^{k₀} γ_{s,k,i_k}.

Let I_s = {i ∈ I : i_k = 1 if k ∉ s and i_k ≥ 2 if k ∈ s}. This defines a partition of I. Then
Q_s = ⊗_{k=1}^{k₀} Q_{s,k₀−k+1} = Σ_{i∈I_s} γ_{s,i} P_i.
Here γ_{∅,i} = 1 if i ∈ I_∅, and γ_{s,i} = Π_{k∈s} λ_{k,i_k} if s ≠ ∅ and i ∈ I_s.

Note: the {P_i} are mutually orthogonal projections such that Σ_{i∈I} P_i = I_p, and the MANOVA projection P_{AN,s} = Σ_{i∈I_s} P_i.

Next steps:
• Structure of the PLS estimators in balanced layouts.
• Construction of suitable annihilator matrices.
• Extension of PLS estimators to a general covariance matrix Σ.

13. Balanced k₀-way Layout with Multivariate Responses

In a balanced layout, C′C = n₀ I_p for some n₀ ≥ 1. Then m̂_ls = (C̃′C̃)⁻¹ C̃′y = n₀⁻¹ C̃′y (averaging responses over replications) and, for Q(N) = Σ_{s∈S} (N_s ⊗ Q_s),
m̂_pls(N) = [C̃′C̃ + Q(N)]⁻¹ C̃′y = [I_pd + n₀⁻¹ Q(N)]⁻¹ m̂_ls.

Using also Q_s = Σ_{i∈I_s} γ_{s,i} P_i yields
I_pd + n₀⁻¹ Q(N) = Σ_{s∈S} Σ_{i∈I_s} [(I_d + n₀⁻¹ γ_{s,i} N_s) ⊗ P_i].
Hence, for a balanced layout,
m̂_pls(N) = Σ_{s∈S} Σ_{i∈I_s} [(I_d + n₀⁻¹ γ_{s,i} N_s)⁻¹ ⊗ P_i] m̂_ls.
In matrix form,
M̂_pls(N) = Σ_{s∈S} Σ_{i∈I_s} P_i M̂_ls (I_d + n₀⁻¹ γ_{s,i} N_s)⁻¹.

The annihilators determine the projections {P_i} and the {γ_{s,i}} in the affine shrinkage factors. Estimated risk also simplifies.
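The balanced-layout shortcut m̂_pls(N) = [I_pd + n₀⁻¹Q(N)]⁻¹ m̂_ls can be confirmed against the general PLS formula. This sketch uses an assumed balanced one-way layout with n₀ replications per cell and illustrative penalty ingredients (first-difference Q, an arbitrary p.s.d. weight N).

```python
import numpy as np

rng = np.random.default_rng(5)
p, d, n0 = 4, 2, 3
n = p * n0

# Balanced one-way layout: each of the p cells replicated n0 times,
# so that C'C = n0 * I_p.
C = np.kron(np.eye(p), np.ones((n0, 1)))
assert np.allclose(C.T @ C, n0 * np.eye(p))

Y = C @ rng.standard_normal((p, d)) + rng.standard_normal((n, d))
y = Y.flatten(order="F")
C_tilde = np.kron(np.eye(d), C)

D = np.diff(np.eye(p), axis=0)
Q = D.T @ D                               # assumed p.s.d. penalty matrix
N = np.array([[1.0, 0.3], [0.3, 2.0]])    # assumed p.s.d. penalty weight
Q_N = np.kron(N, Q)

# LS fit averages responses over replications: m_ls = n0^{-1} C'y.
m_ls = C_tilde.T @ y / n0

# Balanced-layout shortcut: m_pls = [I_pd + n0^{-1} Q(N)]^{-1} m_ls ...
m_short = np.linalg.solve(np.eye(p * d) + Q_N / n0, m_ls)

# ... agrees with the general formula [C'C + Q(N)]^{-1} C'y.
m_direct = np.linalg.solve(C_tilde.T @ C_tilde + Q_N, C_tilde.T @ y)
assert np.allclose(m_short, m_direct)
```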
