 
Attribute-Efficient Learning of Monomials over Highly-Correlated Variables • Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli • Columbia University • Algorithmic Learning Theory 2019
Learning Sparse Monomials: A Simple Nonlinear Function Class • In $p$ dimensions, $k$-sparse • Ex: $f(x_1, \dots, x_p) := x_3 \cdot x_{17} \cdot x_{44} \cdot x_{79}$, with $k = 4$
The Learning Problem • Given: $\{(x^{(i)}, f(x^{(i)}))\}_{i=1}^{m}$, drawn i.i.d. • Assumption 1: $f$ is a $k$-sparse monomial function • Assumption 2: $x^{(i)} \sim \mathcal{N}(0, \Sigma)$ • Goal: Recover $f$ exactly
Attribute-Efficient Learning • Sample efficiency: $m = \mathrm{poly}(\log p, k)$ • Runtime efficiency: $\mathrm{poly}(p, k, m)$ ops • Goal: achieve both!
Motivation • Boolean features, $x_i \in \{\pm 1\}$: monomials ≡ parity functions; no attribute-efficient algs! [Helmbold+ ’92, Blum ’98, Klivans & Servedio ’06, Kalai+ ’09, Kocaoglu+ ’14…] • Real features, $x_i \in \mathbb{R}$: sparse linear regression [Candes+ ’04, Donoho+ ’04, Bickel+ ’09…]; sparse polynomials (sparse sums of monomials) [Andoni+ ’14], for uncorrelated features, i.e. diagonal $\mathbb{E}[xx^\top]$ • Question: what if the features are correlated, i.e. $\mathbb{E}[xx^\top]$ is not diagonal?
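Potential Degeneracy of $\Sigma = \mathbb{E}[xx^\top]$ • Ex: $x_1, x_2, x_4, \dots, x_p \sim \mathcal{N}(0, 1)$ independent, and $x_3 = (x_1 + x_2)/\sqrt{2}$ • Then $\Sigma = \begin{pmatrix} A & 0 \\ 0 & I_{p-3} \end{pmatrix}$ with $A = \begin{pmatrix} 1 & 0 & 1/\sqrt{2} \\ 0 & 1 & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} & 1 \end{pmatrix}$ • $\Sigma$ is singular: the covariance matrix can be low-rank!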
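The degeneracy in this example can be checked numerically. The sketch below is illustrative only: the dimension `p = 6` and the mixing matrix `B` are assumptions introduced here, chosen so that the third feature equals $(x_1 + x_2)/\sqrt{2}$ and every other feature is an independent standard Gaussian.

```python
import numpy as np

p = 6  # small illustrative dimension (an assumption, not from the talk)

# Write x = B z with z ~ N(0, I_{p-1}); then Sigma = E[x x^T] = B B^T.
B = np.zeros((p, p - 1))
B[0, 0] = 1.0                          # x_1 = z_1
B[1, 1] = 1.0                          # x_2 = z_2
B[2, 0] = B[2, 1] = 1.0 / np.sqrt(2)   # x_3 = (x_1 + x_2) / sqrt(2)
for j in range(3, p):                  # remaining features are independent N(0, 1)
    B[j, j - 1] = 1.0

Sigma = B @ B.T
eigvals = np.linalg.eigvalsh(Sigma)    # eigenvalues in ascending order
rank = np.linalg.matrix_rank(Sigma)

print("smallest eigenvalue:", eigvals[0])
print("rank:", rank, "out of", p)
```

Every feature has unit variance, yet the covariance loses one rank because one feature is a linear combination of two others.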
Rest of the Talk 1. Algorithm 2. Intuition 3. Analysis 4. Conclusion
1. Algorithm
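The Algorithm • Ex: $f(x_1, \dots, x_p) := x_3 \cdot x_{17} \cdot x_{44} \cdot x_{79}$ • Step 1: apply $\log|\cdot|$ to map the Gaussian data $\{(x^{(i)}, f(x^{(i)}))\}_{i=1}^{m}$ to the log-transformed data $\{(\log|x^{(i)}|, \log|f(x^{(i)})|)\}_{i=1}^{m}$ • Step 2: sparse regression (Ex: basis pursuit) • [Figure: regression output plotted against feature index, with ticks at features 3, 17, 44, 79]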
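The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Basis pursuit is written as a linear program and solved with `scipy.optimize.linprog`. The helper names `basis_pursuit` and `learn_monomial`, the sizes `p = 100` and `m = 50`, the support threshold `tol`, and the choice to make one relevant feature a combination of two others are all hypothetical choices made here.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, b):
    """Solve min ||w||_1 subject to A w = b, written as a linear program."""
    p = A.shape[1]
    # Split w = u - v with u, v >= 0; at the optimum sum(u) + sum(v) = ||w||_1.
    cost = np.ones(2 * p)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


def learn_monomial(X, y, tol=1e-6):
    """Step 1: log|.| transform. Step 2: sparse regression via basis pursuit."""
    w = basis_pursuit(np.log(np.abs(X)), np.log(np.abs(y)))
    support = np.flatnonzero(np.abs(w) > tol)
    return support, w


rng = np.random.default_rng(0)
p, m = 100, 50                    # fewer samples than features (illustrative sizes)
S = np.array([2, 16, 43, 78])     # 0-based columns for features 3, 17, 44, 79

X = rng.standard_normal((m, p))
X[:, 2] = (X[:, 0] + X[:, 1]) / np.sqrt(2)   # feature 3 is correlated with 1 and 2
y = np.prod(X[:, S], axis=1)                 # noiseless monomial responses

support, w = learn_monomial(X, y)
print("recovered support (0-based):", support)
print("recovered exponents:", np.round(w[support], 4))
```

The nonzero coordinates of `w` identify the relevant features and their values are the exponents, so the same sketch applies unchanged to monomials with exponents other than 1.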
2. Intuition
Why is our Algorithm Attribute-Efficient? • Runtime: basis pursuit is efficient • Sample complexity? • Sparse linear regression? E.g., $\log|f(x_1, \dots, x_p)| = \log|x_3| + \log|x_{17}| + \log|x_{44}| + \log|x_{79}|$ • But: sparse recovery properties may not hold…
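Degenerate High Correlation • Recall the example: $\Sigma = \mathbb{E}[xx^\top] = \begin{pmatrix} A & 0 \\ 0 & I_{p-3} \end{pmatrix}$ with $A = \begin{pmatrix} 1 & 0 & 1/\sqrt{2} \\ 0 & 1 & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} & 1 \end{pmatrix}$ • 0-eigenvectors can be $k$-sparse: $v = (-1/2,\ -1/2,\ 1/\sqrt{2},\ 0,\ \dots,\ 0)^\top$ is 3-sparse and $\Sigma v = 0$ • Sparse recovery conditions false!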
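The claim that a sparse vector lies in the null space can be verified directly. The sketch below assumes the covariance implied by $x_3 = (x_1 + x_2)/\sqrt{2}$ with unit-variance features, takes `p = 6` as an illustrative dimension, and uses the standard cone $\{v : \|v_S\|_1 \ge \|v_{S^c}\|_1\}$ with $S$ the first three features. These choices are made here for illustration.

```python
import numpy as np

p = 6                      # illustrative dimension (assumption)
c = 1.0 / np.sqrt(2)

# Covariance of the example: unit variances, Cov(x_1, x_3) = Cov(x_2, x_3) = 1/sqrt(2).
Sigma = np.eye(p)
Sigma[0, 2] = Sigma[2, 0] = c
Sigma[1, 2] = Sigma[2, 1] = c

# Candidate 3-sparse direction supported on the first three features.
v = np.zeros(p)
v[:3] = [-0.5, -0.5, c]

S = [0, 1, 2]
in_cone = np.abs(v[S]).sum() >= np.abs(np.delete(v, S)).sum()
quad = v @ Sigma @ v       # the quadratic form a restricted eigenvalue would lower-bound

print("Sigma @ v:", np.round(Sigma @ v, 12))
print("v in cone:", bool(in_cone), "| v^T Sigma v:", quad)
```

Because `v` sits inside the cone and makes the quadratic form vanish, any restricted eigenvalue defined over that cone is zero for the untransformed features.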
Summary of Challenges • Highly correlated features • Nonlinearity of log | ⋅ | • Need a recovery condition…
Log-Transform affects Data Covariance • $\mathbb{E}[xx^\top] \succeq 0 \ \xrightarrow{\ \log|\cdot|\ }\ \mathbb{E}[\log|x| \log|x|^\top] \succ 0$ (“inflating the balloon”) • Spectral view: destroys correlation structure
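A quick Monte Carlo experiment illustrates the effect. This is a sketch under assumptions made here: `p = 6`, `n = 200_000` samples, and the degenerate example in which the third feature equals $(x_1 + x_2)/\sqrt{2}$. It compares the smallest eigenvalue of the empirical second-moment matrix before and after applying $\log|\cdot|$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 200_000          # illustrative sizes (assumptions)

X = rng.standard_normal((n, p))
X[:, 2] = (X[:, 0] + X[:, 1]) / np.sqrt(2)   # exact linear dependence among features

L = np.log(np.abs(X))      # entrywise log|.| transform

M_x = X.T @ X / n          # empirical E[x x^T]
M_log = L.T @ L / n        # empirical E[log|x| log|x|^T]

lam_x = np.linalg.eigvalsh(M_x)[0]      # numerically zero: singular
lam_log = np.linalg.eigvalsh(M_log)[0]  # bounded away from zero

print("min eigenvalue before transform:", lam_x)
print("min eigenvalue after transform: ", lam_log)
```

The linear dependence among the raw features does not survive the transform, since $\log|x_1 + x_2|$ is not a linear function of $\log|x_1|$ and $\log|x_2|$.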
3. Analysis
Restricted Eigenvalue Condition [Bickel, Ritov, & Tsybakov ‘09] • Ex: $S = \{3, 17, 44, 79\}$ • Restricted eigenvalue: $\mathrm{RE}(S) = \min_{v \in C(S)} \dfrac{v^\top \mathbb{E}[xx^\top]\, v}{\|v\|_2^2}$ (“restricted strong convexity”) • Cone restriction: $C(S) = \{v : \|v_S\|_1 \ge \|v_{S^c}\|_1\}$ • Note: $\mathrm{RE}(S) \ge \lambda_{\min}(\mathbb{E}[xx^\top])$ • $\mathrm{RE}(S) > 0$ is sufficient to prove exact recovery for basis pursuit!
Sample Complexity Analysis • Population transformed eigenvalue: $\lambda_{\min}(\mathbb{E}[\log|x| \log|x|^\top]) > \epsilon > 0$ • Concentration of restricted eigenvalue: $|\widehat{\mathrm{RE}}(S) - \mathrm{RE}(S)| < \epsilon$ with probability $\ge 1 - \delta$ • Hence $\widehat{\mathrm{RE}}(S) > 0$ with high probability, which gives exact recovery for basis pursuit with high probability • Sample complexity bound: $m = O\!\left(k^2 \log(2k) \cdot \log^2(2p/\delta)\right)$, up to a factor depending on the correlation between features
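Population Minimum Eigenvalue • Notation: $\Sigma = \mathbb{E}[xx^\top]$ with unit-variance features, and $\Sigma^{(2l)}$ its entrywise $2l$-th power • Hermite expansion of $\log|\cdot|$: $\mathbb{E}[\log|x| \log|x|^\top] \succeq \sum_{l=1}^{\infty} c_{2l}^2\, \Sigma^{(2l)}$, with $c_{2l}^2 \asymp l^{-3/2}$ for $l \ge 1$ • Apply $\lambda_{\min}$ to the Hermite formula: $\lambda_{\min}(\mathbb{E}[\log|x| \log|x|^\top]) \ge \sum_{l=1}^{\infty} c_{2l}^2\, \lambda_{\min}(\Sigma^{(2l)})$ • Apply Gershgorin Circle Theorem: $\lambda_{\min}(\Sigma^{(2l)}) \ge 1 - (p - 1) \max_{i \ne j} |\Sigma_{ij}|^{2l}$; off-diagonals decay fast! • So $\lambda_{\min}(\Sigma^{(2l)}) > 0$ (for large enough $l$)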
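The Gershgorin step can be illustrated on the degenerate example. The sketch below rests on assumptions made here: `p = 6`, unit-variance features with two off-diagonal correlations equal to $1/\sqrt{2}$, and a hypothetical helper `gershgorin_lower_bound`. It compares the Gershgorin lower bound with the exact smallest eigenvalue of the entrywise powers $\Sigma^{(2l)}$.

```python
import numpy as np


def gershgorin_lower_bound(A):
    """Gershgorin lower bound on the smallest eigenvalue of a symmetric matrix."""
    diag = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(diag)   # off-diagonal absolute row sums
    return float(np.min(diag - radii))


p = 6                      # illustrative dimension (assumption)
c = 1.0 / np.sqrt(2)
Sigma = np.eye(p)
Sigma[0, 2] = Sigma[2, 0] = c
Sigma[1, 2] = Sigma[2, 1] = c

results = []
for l in range(1, 5):
    H = Sigma ** (2 * l)                  # entrywise power, not a matrix power
    bound = gershgorin_lower_bound(H)
    exact = float(np.linalg.eigvalsh(H)[0])
    results.append((l, bound, exact))
    print(f"l={l}: Gershgorin bound={bound:.4f}, exact min eigenvalue={exact:.4f}")
```

Although $\Sigma$ itself is singular, its entrywise powers already have a strictly positive smallest eigenvalue at $l = 1$, and the Gershgorin bound grows toward 1 as the off-diagonal entries shrink.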
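Concentration of Restricted Eigenvalue • $|\widehat{\mathrm{RE}}(S) - \mathrm{RE}(S)| \lesssim k \cdot \big\| \widehat{\mathbb{E}}[\log|x| \log|x|^\top] - \mathbb{E}[\log|x| \log|x|^\top] \big\|_\infty$, where $\widehat{\mathbb{E}}$ denotes the empirical average • Log-transformed variables are sub-exponential • Elementwise $\ell_\infty$ error concentrates [Kuchibhotla & Chakrabortty ‘18]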
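One standard way to see why an elementwise bound suffices is the cone argument sketched below. It assumes $|S| = k$ and the cone $C(S) = \{v : \|v_S\|_1 \ge \|v_{S^c}\|_1\}$. The symbol $\Delta$ is a name introduced here for the difference between the empirical and population second-moment matrices of the log-transformed features, and $\|\Delta\|_\infty = \max_{i,j} |\Delta_{ij}|$. The constant 4 comes from this particular argument and is not taken from the slides.

```latex
\begin{align*}
\|v\|_1 &= \|v_S\|_1 + \|v_{S^c}\|_1
        \le 2\,\|v_S\|_1
        \le 2\sqrt{k}\,\|v_S\|_2
        \le 2\sqrt{k}\,\|v\|_2
        && \text{for } v \in C(S), \\
|v^\top \Delta v| &\le \sum_{i,j} |v_i|\,|\Delta_{ij}|\,|v_j|
        \le \|\Delta\|_\infty\, \|v\|_1^2
        \le 4k\, \|\Delta\|_\infty\, \|v\|_2^2, \\
\bigl|\widehat{\mathrm{RE}}(S) - \mathrm{RE}(S)\bigr|
        &\le \sup_{v \in C(S),\ v \ne 0} \frac{|v^\top \Delta v|}{\|v\|_2^2}
        \le 4k\, \|\Delta\|_\infty .
\end{align*}
```

The last line uses the fact that two minima over the same set differ by at most the largest pointwise gap between the two objectives.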
4. Conclusion
Recap • Attribute-efficient algorithm for monomials • Prior (nonlinear) work: uncorrelated features • This work: allow highly correlated features • Works beyond multilinear monomials • Blessing of nonlinearity log | ⋅ |
Future Work • Rotations of product distributions • Additive noise • Sparse polynomials with correlated features Thanks! Questions?