  1. Attribute-Efficient Learning of Monomials over Highly-Correlated Variables. Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli (Columbia University). Algorithmic Learning Theory 2019.

  2. Learning Sparse Monomials: a simple nonlinear function class in d dimensions. Ex: f(x_1, …, x_d) := x_3 · x_17 · x_44 · x_79, which is k-sparse with k = 4. [Figure: a simple example in 3 dimensions.]

  3. The Learning Problem. Given: {(x^(i), f(x^(i)))}_{i=1}^n drawn i.i.d. Assumption 1: f is a k-sparse monomial function. Assumption 2: x^(i) ~ N(0, Σ). Goal: recover f exactly.
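
To make the setup concrete, here is a minimal data-generation sketch (not from the talk); the dimension, sample size, and identity covariance are illustrative placeholders, and the support {3, 17, 44, 79} is the running example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 500                  # illustrative dimension and sample size
S = [3, 17, 44, 79]              # support of the k-sparse monomial (k = 4)
Sigma = np.eye(d)                # stand-in covariance; any valid Sigma works here

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows x^(i) ~ N(0, Sigma)
y = np.prod(X[:, S], axis=1)                             # y^(i) = f(x^(i)) = product over S
```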

  4. Attribute-Efficient Learning • Sample efficiency: n = poly(log d, k) • Runtime efficiency: poly(d, k, n) operations • Goal: achieve both!

  5. Motivation. Two regimes: x_j ∈ {±1} and x_j ∈ ℝ. Boolean features: monomials ≡ parity functions; for sparse sums of monomials [Helmbold+ '92, Blum '98, Klivans & Servedio '06, Kalai+ '09, Kocaoglu+ '14…], no attribute-efficient algorithms are known! Real features: sparse linear regression [Candes+ '04, Donoho+ '04, Bickel+ '09…]; [Andoni+ '14] learns sparse polynomials, but only for uncorrelated features (Σ_XX = E[xx^T] = I).

  6. Motivation (continued). Same two regimes as the previous slide. Question: what if the features are correlated, i.e., Σ_XX = E[xx^T] is not the identity?

  7. Potential Degeneracy of Σ_XX. Ex: x_1, …, x_d ~ N(0, 1) with x_3 = (x_1 + x_2)/√2 and all other coordinates independent, so that
     Σ_XX =
       [  1     0    1/√2   0  …  0 ]
       [  0     1    1/√2   0  …  0 ]
       [ 1/√2  1/√2   1     0  …  0 ]
       [  0     0     0     1  …  0 ]
       [  ⋮                    ⋱  ⋮ ]
       [  0     0     0     0  …  1 ]
     (the off-diagonal entries 1/√2 ≈ 0.71 follow from the construction). Σ_XX is singular: the covariance matrix can be low-rank!
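
A quick numerical check of this slide's example (a sketch; the 1/√2 entries are implied by x_3 = (x_1 + x_2)/√2, and the ambient dimension is arbitrary):

```python
import numpy as np

d, r = 10, 1 / np.sqrt(2)
Sigma = np.eye(d)
Sigma[0, 2] = Sigma[2, 0] = r    # Cov(x_1, x_3)
Sigma[1, 2] = Sigma[2, 1] = r    # Cov(x_2, x_3)

print(np.linalg.eigvalsh(Sigma).min())  # ~0: Sigma_XX is singular (low-rank)
```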

  8. Rest of the Talk 1. Algorithm 2. Intuition 3. Analysis 4. Conclusion

  9. 1. Algorithm

  10. The Algorithm. Ex: f(x_1, …, x_d) := x_3 · x_17 · x_44 · x_79. Step 1 (log|·| transform): turn the Gaussian data {(x^(i), f(x^(i)))}_{i=1}^n into the log-transformed data {(log|x^(i)|, log|f(x^(i))|)}_{i=1}^n. Step 2 (sparse regression, e.g. basis pursuit): regress the transformed responses on the transformed features; the recovered coefficient vector has value 1 at features 3, 17, 44, 79 and 0 elsewhere.
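
A minimal sketch of the two-step procedure, assuming noiseless samples; this is not the authors' code, and basis pursuit is solved here as a standard linear program via SciPy.

```python
import numpy as np
from scipy.optimize import linprog

def learn_monomial(X, y, threshold=0.5):
    """Step 1: log|.| transform. Step 2: basis pursuit, min ||w||_1 s.t. A w = b."""
    A = np.log(np.abs(X))            # log-transformed features
    b = np.log(np.abs(y))            # log-transformed responses
    n, d = A.shape
    # Split w = u - v with u, v >= 0, so that ||w||_1 = sum(u) + sum(v) is linear.
    res = linprog(c=np.ones(2 * d), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=(0, None))
    w = res.x[:d] - res.x[d:]
    return np.flatnonzero(np.abs(w) > threshold)   # recovered support (entries ~1)
```

On the synthetic data sketched earlier, learn_monomial(X, y) should return the support [3, 17, 44, 79].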

  11. 2. Intuition

  12. Why is our Algorithm Attribute-Efficient? • Runtime: basis pursuit is efficient • Sample complexity? • Sparse linear regression? E.g., log|f(x_1, …, x_d)| = log|x_3| + log|x_17| + log|x_44| + log|x_79| • But: sparse recovery properties may not hold…

  13. Degenerate High Correlation. Recall the example: Σ_XX has the 3×3 block [[1, 0, 1/√2], [0, 1, 1/√2], [1/√2, 1/√2, 1]] in its top-left corner and the identity elsewhere. Its 0-eigenvectors can be sparse: v = (−1/2, −1/2, 1/√2, 0, …, 0) is 3-sparse and satisfies Σ_XX v = 0. The sparse recovery conditions are false!
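
The sparse 0-eigenvector can be verified directly on the 3×3 correlated block (a small sketch consistent with x_3 = (x_1 + x_2)/√2):

```python
import numpy as np

r = 1 / np.sqrt(2)
Sigma3 = np.array([[1, 0, r],
                   [0, 1, r],
                   [r, r, 1]])
v = np.array([-0.5, -0.5, r])       # the 3-sparse direction from the slide
print(np.allclose(Sigma3 @ v, 0))   # True: a sparse eigenvector with eigenvalue 0
```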

  14. Summary of Challenges • Highly correlated features • Nonlinearity of log|·| • Need a recovery condition…

  15. Log-Transform Affects the Data Covariance. Applying log|·|: E[log|x| log|x|^T] ≻ 0 even though E[xx^T] ≽ 0 may be singular. Spectral view: "inflating the balloon" — the transform destroys the degenerate correlation structure.
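
A Monte Carlo sketch (not from the talk) of the "inflating the balloon" effect, reusing the degenerate covariance from the earlier example; the sample size is simply chosen large enough to make the estimate stable.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 10, 1 / np.sqrt(2)
Sigma = np.eye(d)
Sigma[0, 2] = Sigma[2, 0] = Sigma[1, 2] = Sigma[2, 1] = r   # singular E[x x^T]

X = rng.multivariate_normal(np.zeros(d), Sigma, size=100_000)
L = np.log(np.abs(X))
print(np.linalg.eigvalsh(Sigma).min())        # ~0: original covariance is singular
print(np.linalg.eigvalsh(np.cov(L.T)).min())  # strictly positive after log|.| transform
```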

  16. 3. Analysis

  17. Restricted Eigenvalue Condition [Bickel, Ritov, & Tsybakov '09]. Ex: S = {3, 17, 44, 79}, k = 4. Restricted eigenvalue: RE(S) := min_{v ∈ C} (v^T Σ̂ v) / ||v||_2^2, where Σ̂ is the empirical covariance of the (log-transformed) features and the minimum is over the cone restriction C = {v : ||v_S||_1 ≥ ||v_{S^c}||_1} ("restricted strong convexity"). Note: RE(S) ≥ λ_min(Σ̂). Sufficient to prove exact recovery for basis pursuit!
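
For completeness, here is the standard argument (a sketch, not verbatim from the talk) for why RE(S) > 0 gives exact recovery in the noiseless case, with A the log-transformed design, w* the true k-sparse coefficient vector, b = A w*, and Σ̂ = A^T A / n:

```latex
\[
  A\hat{w} = A w^{*},\quad \|\hat{w}\|_{1} \le \|w^{*}\|_{1}
  \;\Longrightarrow\;
  v := \hat{w} - w^{*} \in \mathcal{C} = \{v : \|v_{S}\|_{1} \ge \|v_{S^{c}}\|_{1}\},
\]
\[
  0 = \tfrac{1}{n}\|A v\|_{2}^{2} = v^{\top}\hat{\Sigma}\,v
  \;\ge\; \mathrm{RE}(S)\,\|v\|_{2}^{2}
  \;\Longrightarrow\; v = 0,\ \text{i.e., } \hat{w} = w^{*}.
\]
```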

  18. Sample Complexity Analysis (for Σ_log := E[log|x| log|x|^T]). Population transformed eigenvalue: λ_min(Σ_log) > ε > 0. Concentration of the restricted eigenvalue: |RE_n(S) − RE(S)| < ε with probability ≥ 1 − δ. Together these give RE_n(S) > 0 with high probability, and hence exact recovery for basis pursuit with high probability.

  19. Sample Complexity Analysis (continued). Combining λ_min(Σ_log) > ε > 0 with the concentration bound |RE_n(S) − RE(S)| < ε (probability ≥ 1 − δ) yields the sample complexity bound n = O(k^2 log(2d) · log^2(2/δ)). With that many samples, RE_n(S) > 0 with high probability, and basis pursuit recovers f exactly with high probability.

  20. Population Minimum Eigenvalue (of Σ_log := E[log|x| log|x|^T], with Σ_XX := E[xx^T]). • Hermite expansion of log|·|: the transformed covariance becomes a power series in the correlations, (Σ_log)_ij = Σ_q c_q^2 (Σ_XX)_ij^q. • Apply λ_min to the Hermite formula: every term is PSD, so λ_min(Σ_log) ≥ c_q^2 · λ_min(Σ_XX^∘q) for each q (Σ_XX^∘q is the entrywise q-th power), and the coefficients c_q decay only polynomially in q. • Apply the Gershgorin circle theorem: λ_min(Σ_XX^∘q) ≥ 1 − (d − 1) ρ_max^q — the off-diagonals decay fast, so the bound is positive for large enough q.
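
The Gershgorin step, spelled out under the reading that Σ_XX has unit diagonal and maximum off-diagonal correlation ρ_max < 1 (an assumption of this sketch, not stated verbatim on the slide):

```latex
\[
  \lambda_{\min}\!\big(\Sigma_{XX}^{\circ q}\big)
  \;\ge\; \min_{i}\Big((\Sigma_{XX}^{\circ q})_{ii} - \sum_{j \neq i}\big|(\Sigma_{XX}^{\circ q})_{ij}\big|\Big)
  \;\ge\; 1 - (d-1)\,\rho_{\max}^{\,q} \;>\; 0
  \quad\text{whenever } q > \frac{\log(d-1)}{\log(1/\rho_{\max})}.
\]
```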

  21. Concentration of the Restricted Eigenvalue (for Σ̂_log, the empirical version of Σ_log := E[log|x| log|x|^T]). • |RE_n(S) − RE(S)| ≲ k · ||Σ̂_log − Σ_log||_∞ (elementwise max norm) • The log-transformed variables are sub-exponential • So the elementwise ℓ_∞ error concentrates [Kuchibhotla & Chakrabortty '18]

  22. 4. Conclusion

  23. Recap • Attribute-efficient algorithm for monomials • Prior (nonlinear) work: uncorrelated features • This work: allows highly correlated features • Works beyond multilinear monomials • Blessing of the nonlinearity log|·|

  24. Future Work • Rotations of product distributions • Additive noise • Sparse polynomials with correlated features Thanks! Questions?
