Attribute-Efficient Learning of Monomials over Highly-Correlated Variables
Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli
Columbia University
Algorithmic Learning Theory 2019
Learning Sparse Monomials
A Simple Nonlinear Function Class
Ex: f(x_1, …, x_p) := x_3 ⋅ x_17 ⋅ x_44 ⋅ x_79
[Figure: illustration of a monomial in 3 dimensions]
In p dimensions: k = 4, and f is k-sparse.
The Learning Problem
Given: {(x^(i), y^(i) = f(x^(i)))}_{i=1}^n, drawn i.i.d.
Goal: Recover f exactly.
Assumption 1: f is a k-sparse monomial function.
Assumption 2: x^(i) ~ N(0, Σ).
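To make the setup concrete, here is a minimal data-generation sketch in Python (our illustration; the dimension p = 100, the sample size n = 80, and the support {3, 17, 44, 79} of the running example are assumptions for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, support = 100, 80, [3, 17, 44, 79]      # support is 1-indexed, as on the slides

Sigma = np.eye(p)                              # any covariance is allowed,
                                               # including singular ones
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # rows are x^(i)
y = np.prod(X[:, [j - 1 for j in support]], axis=1)       # y^(i) = x_3 x_17 x_44 x_79
```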
Attribute-Efficient Learning
- Sample efficiency: n = poly(log p, k) samples
- Runtime efficiency: poly(p, k, n) ops
- Goal: achieve both!
Motivation
x_j ∈ {±1}:
- Monomials ≡ Parity functions
- No attribute-efficient algorithms!
[Helmbold+ '92, Blum '98, Klivans & Servedio '06, Kalai+ '09, Kocaoglu+ '14, …]
x_j ∈ ℝ:
- Sparse linear regression
[Candès+ '04, Donoho+ '04, Bickel+ '09, …]
- Sparse sums of monomials
[Andoni+ '14]
For uncorrelated features: E[xxᵀ] = diag(E[x_1²], …, E[x_p²]), a diagonal matrix.
Question: What if E[xxᵀ] is dense, with highly correlated features?
Potential Degeneracy of Σ
Ex: x_1, x_2, x_4, …, x_p ~ N(0, 1) i.i.d., and x_3 = (x_1 + x_2)/√2.
Σ = E[xxᵀ] can be low-rank! Its top-left 3×3 block is
  [ 1      0      1/√2 ]
  [ 0      1      1/√2 ]
  [ 1/√2   1/√2   1    ]
(and identity elsewhere): a singular matrix.
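A quick numerical check of this example (a sketch with our own variable names); it also exhibits a sparse null vector, which becomes important below:

```python
import numpy as np

p = 6
Sigma = np.eye(p)
Sigma[0, 2] = Sigma[2, 0] = 1 / np.sqrt(2)   # Cov(x_1, x_3)
Sigma[1, 2] = Sigma[2, 1] = 1 / np.sqrt(2)   # Cov(x_2, x_3)

print(np.linalg.eigvalsh(Sigma)[0])          # smallest eigenvalue ≈ 0: singular

v = np.zeros(p)
v[:3] = [-1/2, -1/2, 1/np.sqrt(2)]           # a 3-sparse 0-eigenvector
print(Sigma @ v)                             # ≈ the zero vector
```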
Rest of the Talk
- 1. Algorithm
- 2. Intuition
- 3. Analysis
- 4. Conclusion
- 1. Algorithm
The Algorithm
Ex: f(x_1, …, x_p) := x_3 ⋅ x_17 ⋅ x_44 ⋅ x_79
Step 1: Apply log|⋅| coordinate-wise, turning the Gaussian data {(x^(i), y^(i))}_{i=1}^n into the log-transformed data {(log|x^(i)|, log|y^(i)|)}_{i=1}^n.
Step 2: Run sparse regression on the log-transformed data (Ex: Basis Pursuit).
[Figure: recovered regression coefficients, nonzero exactly at features 3, 17, 44, 79]
A code sketch of both steps follows.
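A minimal end-to-end sketch of the two steps (our illustration, not the authors' released code), solving basis pursuit as a linear program with SciPy; all names and sizes are assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
p, n, support = 100, 80, [3, 17, 44, 79]
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
y = np.prod(X[:, [j - 1 for j in support]], axis=1)

# Step 1: coordinate-wise log|.| transform.
A = np.log(np.abs(X))                        # n x p design
b = np.log(np.abs(y))                        # note: b = A @ beta* exactly, where
                                             # beta* is the indicator of the support

# Step 2: basis pursuit, min ||beta||_1 s.t. A beta = b, written as an LP over
# beta = u - w with u, w >= 0 (linprog's default bounds are nonnegative).
res = linprog(np.ones(2 * p), A_eq=np.hstack([A, -A]), b_eq=b)
beta = res.x[:p] - res.x[p:]

print(np.nonzero(np.abs(beta) > 1e-6)[0] + 1)   # expect: [ 3 17 44 79]
```

With enough samples (per the sample complexity analysis later in the talk) the printed support matches {3, 17, 44, 79}; Σ = I is used here only for brevity and can be swapped for a correlated covariance.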
- 2. Intuition
Why is our Algorithm Attribute-Efficient?
- Runtime: basis pursuit is efficient
- Sample complexity?
- Sparse linear regression in the log domain? E.g.,
  log |f(x_1, …, x_p)| = log |x_3| + log |x_17| + log |x_44| + log |x_79|
- But: sparse recovery properties may not hold…
Degenerate High Correlation
Recall the example: Σ = E[xxᵀ] is singular, and
  Σ v = 0 for the 3-sparse vector v = (−1/2, −1/2, 1/√2, 0, …, 0)ᵀ.
0-eigenvectors can themselves be sparse, so the standard sparse recovery conditions are false!
Summary of Challenges
- Highly correlated features
- Nonlinearity of log | ⋅ |
- Need a recovery condition…
Log-Transform affects Data Covariance
E[xxᵀ] ≽ 0, but E[log|x| log|x|ᵀ] ≻ 0.
Spectral view: applying log|⋅| is like "inflating the balloon": it destroys the degenerate correlation structure.
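A Monte Carlo sanity check of this claim on the singular example (a sketch; the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
Sigma = np.eye(p)
Sigma[0, 2] = Sigma[2, 0] = Sigma[1, 2] = Sigma[2, 1] = 1 / np.sqrt(2)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=200_000)
Z = np.log(np.abs(X))
M = Z.T @ Z / len(Z)                  # estimates M = E[log|x| log|x|^T]

print(np.linalg.eigvalsh(Sigma)[0])   # ≈ 0: degenerate before the transform
print(np.linalg.eigvalsh(M)[0])       # clearly > 0: strictly PD after
```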
- 3. Analysis
Restricted Eigenvalue Condition [Bickel, Ritov, & Tsybakov ‘09]
C = {v : ‖v_S‖_1 ≥ ‖v_{S^c}‖_1}   (cone restriction)
Sufficient to prove exact recovery for basis pursuit!
Restricted eigenvalue:
  γ_S(M) := min_{v ∈ C} (vᵀ M v) / ‖v‖_2²
Require γ_S(M) > 0 ("restricted strong convexity")
Ex: S = {3, 17, 44, 79}, k = 4
Note: γ_S(M) ≥ λ_min(M)
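For intuition, a crude Monte Carlo probe of γ_S (our sketch; since sampling explores only part of the cone, it returns an upper estimate, complementing the λ_min lower bound in the Note above):

```python
import numpy as np

def re_upper_estimate(M, S, trials=50_000, seed=0):
    """Monte Carlo upper estimate of gamma_S(M) = min over the cone
    C = {v : ||v_S||_1 >= ||v_Sc||_1} of (v^T M v) / ||v||_2^2."""
    rng = np.random.default_rng(seed)
    p = M.shape[0]
    Sc = np.setdiff1d(np.arange(p), S)
    best = np.inf
    for _ in range(trials):
        v = np.zeros(p)
        v[S] = rng.standard_normal(len(S))
        u = rng.standard_normal(len(Sc))
        u *= rng.uniform() * np.abs(v[S]).sum() / np.abs(u).sum()  # keep v in C
        v[Sc] = u
        best = min(best, v @ M @ v / (v @ v))
    return best
```

On the singular Σ above with S = [0, 1, 2] (features x_1, x_2, x_3) this finds values near 0, since the sparse null vector lies in the cone, while on the log-transformed moment matrix M it stays bounded away from 0.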
Sample Complexity Analysis
For M = E[log|x| log|x|ᵀ] and its empirical version M̂:
- Population Transformed Eigenvalue: λ_min(M) > ε > 0
- Concentration of Restricted Eigenvalue: |γ_S(M̂) − γ_S(M)| < ε with probability ≥ 1 − δ
⇒ γ_S(M̂) > 0 with high probability
⇒ Exact Recovery for Basis Pursuit, with high probability
Sample Complexity Bound: n = O(k² log(2k/(1 − δ)) ⋅ log²(2p))
Population Minimum Eigenvalue
- Hermite expansion of log|⋅|: only the even coefficients are nonzero, and for q ≥ 1,
  c_{2q}² ~ (√π/4) ⋅ q^{−3/2}
- Off-diagonals of Σ^{∘2q} decay fast! (∘ = entrywise power)
- Apply the expansion to M = E[zzᵀ] = E[log|x| log|x|ᵀ]:
  M = c_0² 11ᵀ + Σ_{q≥1} c_{2q}² Σ^{∘2q}
- Apply λ_min to the Hermite formula:
  λ_min(M) ≥ Σ_{q≥1} c_{2q}² λ_min(Σ^{∘2q})
- Apply the Gershgorin Circle Theorem:
  λ_min(Σ^{∘2q}) ≥ 1 − (p − 1) ρ̄^{2q} > 0 for large enough q, where ρ̄ < 1 is the largest off-diagonal correlation
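A numerical check of the coefficient decay (our sketch). It estimates the normalized Hermite coefficients of log|⋅| by Gauss–Hermite quadrature and compares them to the q^{−3/2} rate quoted above; the closed form in the last column is our own derivation, so treat it as an assumption:

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as H

x, w = H.hermegauss(400)        # nodes/weights for the weight exp(-x^2/2)
w = w / sqrt(2 * pi)            # normalize into an expectation over N(0, 1)

for q in range(1, 7):
    e = np.zeros(2 * q + 1); e[-1] = 1.0   # coefficient vector selecting He_{2q}
    # normalized coefficient c_{2q} = E[log|X| He_{2q}(X)] / sqrt((2q)!)
    c2q = (w * np.log(np.abs(x)) * H.hermeval(x, e)).sum() / sqrt(factorial(2 * q))
    closed = 4 ** (q - 1) * factorial(q - 1) ** 2 / factorial(2 * q)
    print(q, c2q ** 2, sqrt(pi) / 4 * q ** -1.5, closed)
```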
Concentration of Restricted Eigenvalue
- Perturbation bound: |γ_S(M̂) − γ_S(M)| ≲ k ⋅ ‖M̂ − M‖_∞ (entrywise max norm)
- Log-transformed variables are sub-exponential
- Elementwise ℓ_∞ error concentrates [Kuchibhotla & Chakrabortty '18]
- Applied to M̂, the empirical version of M = E[log|x| log|x|ᵀ]
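An empirical illustration of this concentration (a sketch; the reference matrix is itself a large-sample estimate, and all sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 6
Sigma = np.eye(p)
Sigma[0, 2] = Sigma[2, 0] = Sigma[1, 2] = Sigma[2, 1] = 1 / np.sqrt(2)

def M_hat(n):
    Z = np.log(np.abs(rng.multivariate_normal(np.zeros(p), Sigma, size=n)))
    return Z.T @ Z / n

M_ref = M_hat(1_000_000)        # large-sample stand-in for the population M
for n in (100, 1_000, 10_000, 100_000):
    print(n, np.abs(M_hat(n) - M_ref).max())   # shrinks roughly like 1/sqrt(n)
```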
- 4. Conclusion
Recap
- Attribute-efficient algorithm for monomials
- Prior (nonlinear) work: uncorrelated features
- This work: allow highly correlated features
- Works beyond multilinear monomials
- Blessing of nonlinearity: the log|⋅| transform
Future Work
- Rotations of product distributions
- Additive noise
- Sparse polynomials with correlated features