SLIDE 1

Attribute-Efficient Learning of Monomials over Highly-Correlated Variables

Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli Columbia University Algorithmic Learning Theory 2019

SLIDE 2

A Simple Nonlinear Function Class

Learning Sparse Monomials

Ex: $f(x_1, \dots, x_n) := x_3 \cdot x_{17} \cdot x_{44} \cdot x_{79}$

In $n$ dimensions, with $k = 4$: the monomial is $k$-sparse.

SLIDE 3

The Learning Problem

Given: $\{(x^{(i)}, f(x^{(i)}))\}_{i=1}^{m}$, drawn i.i.d.

Goal: Recover $f$ exactly

Assumption 1: $f$ is a $k$-sparse monomial function

Assumption 2: $x^{(i)} \sim \mathcal{N}(0, \Sigma)$
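To make the setup concrete, here is a minimal data-generation sketch in NumPy. The sizes $n = 100$ and $m = 50$, the identity covariance, and the helper name `f` are illustrative choices, not from the talk; the support $\{3, 17, 44, 79\}$ is the running example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 100, 50                                # illustrative sizes
S = np.array([3, 17, 44, 79]) - 1             # running-example support (0-indexed)
Sigma = np.eye(n)                             # placeholder covariance; any valid Sigma works

def f(x):
    """k-sparse multilinear monomial: the product of the coordinates in S."""
    return np.prod(x[..., S], axis=-1)

X = rng.multivariate_normal(np.zeros(n), Sigma, size=m)   # x^(i) ~ N(0, Sigma), i.i.d.
y = f(X)                                                  # exact labels, no noise
```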

SLIDE 4

Attribute-Efficient Learning

  • Sample efficiency: $m = \mathrm{poly}(\log n, k)$ samples
  • Runtime efficiency: $\mathrm{poly}(n, k, m)$ ops
  • Goal: achieve both!
SLIDE 5

Motivation

$x_i \in \{\pm 1\}$:

  • Monomials ≡ parity functions
  • No attribute-efficient algorithms!

[Helmbold+ '92, Blum '98, Klivans & Servedio '06, Kalai+ '09, Kocaoglu+ '14, …]

$x_i \in \mathbb{R}$:

  • Sparse linear regression

[Candès+ '04, Donoho+ '04, Bickel+ '09, …]

  • Sparse sums of monomials

[Andoni+ '14]

For uncorrelated features: $\mathbb{E}[xx^\top]$ is diagonal.
SLIDE 6


Motivation (cont.)

Question: What if $\mathbb{E}[xx^\top]$ is not diagonal, i.e., the features are correlated?
SLIDE 7

Potential Degeneracy of $\Sigma$

Ex:

$$x_1 \sim \mathcal{N}(0,1), \quad x_2 \sim \mathcal{N}(0,1), \quad x_3 = (x_1 + x_2)/\sqrt{2}, \quad x_4, \dots, x_n \sim \mathcal{N}(0,1)$$

$\Sigma = \mathbb{E}[xx^\top]$ can be low-rank:

$$\Sigma = \begin{pmatrix}
1 & 0 & 1/\sqrt{2} & & \\
0 & 1 & 1/\sqrt{2} & & \\
1/\sqrt{2} & 1/\sqrt{2} & 1 & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}$$

A singular matrix!
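A quick NumPy check of this construction (a minimal sketch with $n = 6$ for display; the off-diagonal value $1/\sqrt{2}$ follows from $x_3 = (x_1 + x_2)/\sqrt{2}$):

```python
import numpy as np

n = 6
Sigma = np.eye(n)
Sigma[[0, 1], 2] = Sigma[2, [0, 1]] = 1 / np.sqrt(2)   # Cov(x_1, x_3) = Cov(x_2, x_3) = 1/sqrt(2)

print(np.linalg.eigvalsh(Sigma).min())                 # ~0: Sigma is singular
```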

SLIDE 8

Rest of the Talk

  • 1. Algorithm
  • 2. Intuition
  • 3. Analysis
  • 4. Conclusion
SLIDE 9
  • 1. Algorithm
SLIDE 10

The Algorithm

Ex: $f(x_1, \dots, x_n) := x_3 \cdot x_{17} \cdot x_{44} \cdot x_{79}$

Step 1: Apply $\log|\cdot|$ to the Gaussian data:

$$\{(x^{(i)}, f(x^{(i)}))\}_{i=1}^{m} \;\longrightarrow\; \{(\log|x^{(i)}|, \log|f(x^{(i)})|)\}_{i=1}^{m} \quad \text{(log-transformed data)}$$

Step 2: Sparse regression on the log-transformed data (Ex: basis pursuit).

[Figure: recovered coefficients vs. feature index, with spikes at features 3, 17, 44, 79.]
SLIDE 11
  • 2. Intuition
SLIDE 12

Why is our Algorithm Attribute-Efficient?

  • Runtime: basis pursuit is efficient
  • Sample complexity? After the transform this looks like sparse linear regression, e.g.:

$$\log|f(x_1, \dots, x_n)| = \log|x_3| + \log|x_{17}| + \log|x_{44}| + \log|x_{79}|$$

  • But: sparse recovery properties may not hold…
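A one-line sanity check of this identity (the specific sample is arbitrary):

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(100)
S = [2, 16, 43, 78]                                      # {3, 17, 44, 79}, 0-indexed
print(np.isclose(np.log(np.abs(np.prod(x[S]))),
                 np.sum(np.log(np.abs(x[S])))))          # True: log|product| = sum of logs
```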

SLIDE 13

Degenerate High Correlation

Recall the example:

$$\Sigma = \mathbb{E}[xx^\top] = \begin{pmatrix}
1 & 0 & 1/\sqrt{2} & & \\
0 & 1 & 1/\sqrt{2} & & \\
1/\sqrt{2} & 1/\sqrt{2} & 1 & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}, \qquad
v = \begin{pmatrix} -1/2 \\ -1/2 \\ 1/\sqrt{2} \\ 0 \\ \vdots \end{pmatrix}
\ \text{is 3-sparse, and } \Sigma v = 0.$$

0-eigenvectors can be $O(1)$-sparse, so the sparse recovery conditions are false!
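Continuing the same $n = 6$ sketch, the 3-sparse vector above is indeed a 0-eigenvector:

```python
import numpy as np

n = 6
Sigma = np.eye(n)
Sigma[[0, 1], 2] = Sigma[2, [0, 1]] = 1 / np.sqrt(2)

v = np.zeros(n)
v[:3] = [-0.5, -0.5, 1 / np.sqrt(2)]                     # 3-sparse, unit norm
print(np.allclose(Sigma @ v, 0))                         # True: Sigma v = 0
```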

SLIDE 14

Summary of Challenges

  • Highly correlated features
  • Nonlinearity of $\log|\cdot|$
  • Need a recovery condition…
SLIDE 15

Log-Transform affects Data Covariance

$$\mathbb{E}[xx^\top] \succeq 0 \quad \xrightarrow{\ \log|\cdot|\ } \quad \mathbb{E}\big[\log|x|\,\log|x|^\top\big] \succ 0$$

Spectral view: "inflating the balloon". The transform destroys the degenerate correlation structure.
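A Monte Carlo illustration on the singular example (the sample size 200,000 and $n = 6$ are just for a stable, small demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Sigma = np.eye(n)
Sigma[[0, 1], 2] = Sigma[2, [0, 1]] = 1 / np.sqrt(2)     # the singular example

X = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
L = np.log(np.abs(X))
T = L.T @ L / len(L)                                     # estimate of E[log|x| log|x|^T]

print(np.linalg.eigvalsh(Sigma).min())                   # ~0: degenerate before the transform
print(np.linalg.eigvalsh(T).min())                       # > 0: strictly positive definite after
```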

SLIDE 16
  • 3. Analysis
SLIDE 17

Restricted Eigenvalue Condition [Bickel, Ritov, & Tsybakov '09]

Cone restriction: $C = \{v : \|v_S\|_1 \ge \|v_{S^c}\|_1\}$

Restricted eigenvalue ("restricted strong convexity"):

$$\lambda_S(\Sigma) := \min_{v \in C} \frac{v^\top \Sigma\, v}{\|v\|_2^2} > 0$$

Sufficient to prove exact recovery for basis pursuit!

Ex: $S$ = support of the monomial $= \{3, 17, 44, 79\}$, $k = 4$.

Note: $\lambda_S(\Sigma) \ge \lambda_{\min}(\Sigma)$.
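The algorithm never computes $\lambda_S$, but a crude random search illustrates the definition above. This sketch (the helper name and trial count are arbitrary) only yields an upper bound on $\lambda_S$, since sampling cannot certify a minimum over the cone:

```python
import numpy as np

def restricted_eig_upper_bound(Sigma, S, trials=100_000, seed=0):
    """Random-search upper bound on lambda_S(Sigma), the minimum of
    v' Sigma v / ||v||_2^2 over the cone C = {v : ||v_S||_1 >= ||v_{S^c}||_1}."""
    rng = np.random.default_rng(seed)
    n = Sigma.shape[0]
    Sc = np.setdiff1d(np.arange(n), S)
    best = np.inf
    for _ in range(trials):
        v = rng.standard_normal(n)
        nS, nSc = np.abs(v[S]).sum(), np.abs(v[Sc]).sum()
        if nSc > nS:
            v[Sc] *= nS / nSc                  # shrink the off-support part into the cone
        best = min(best, (v @ Sigma @ v) / (v @ v))
    return best
```

As a sanity check, for $\Sigma = I$ the Rayleigh quotient is identically 1, matching $\lambda_S(I) = 1$.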

SLIDE 18

Sample Complexity Analysis

Population transformed eigenvalue: $\lambda_{\min}(\tilde{\Sigma}) > \epsilon > 0$

Concentration of restricted eigenvalue: $|\lambda_S(\hat{\Sigma}) - \lambda_S(\tilde{\Sigma})| < \epsilon$ with probability $\ge 1 - \delta$

Together: $\lambda_S(\hat{\Sigma}) > 0$ with high probability, hence exact recovery for basis pursuit with high probability.

Here $\tilde{\Sigma} = \mathbb{E}\big[\log|x|\,\log|x|^\top\big]$ and $\hat{\Sigma} = \frac{1}{m}\sum_{i=1}^{m}\log|x^{(i)}|\,\log|x^{(i)}|^\top$.

SLIDE 19

Sample Complexity Analysis (cont.)

Combining the population bound with the concentration bound gives the sample complexity bound:

$$m = O\!\left(\frac{k^2 \log(2k)}{1 - \rho} \cdot \log^2(2n)\right)$$

samples suffice for exact recovery with high probability (here $\rho$ is the largest off-diagonal correlation in $\Sigma$).
SLIDE 20

Population Minimum Eigenvalue

  • Hermite expansion of $\log|\cdot|$: $\log|x| = \sum_{\ell \ge 0} c_\ell\, h_\ell(x)$, with coefficients decaying polynomially, on the order of $|c_\ell| \sim \ell^{-3/4}$ for $\ell \ge 1$
  • The Hermite formula gives $\tilde{\Sigma} = \sum_{\ell \ge 0} c_\ell^2\, \Sigma^{\circ \ell}$, where $\Sigma^{\circ \ell}$ is the $\ell$-th Hadamard (entrywise) power; its off-diagonals $\rho^{\ell}$ decay fast!
  • Apply $\lambda_{\min}$ to the Hermite formula: $\lambda_{\min}(\tilde{\Sigma}) \ge \sum_{\ell \ge 1} c_\ell^2\, \lambda_{\min}(\Sigma^{\circ \ell})$
  • Apply the Gershgorin circle theorem: $\lambda_{\min}(\Sigma^{\circ \ell}) \ge 1 - (n-1)\rho^{\ell} > 0$ (for large enough $\ell$)

Here $\Sigma = \mathbb{E}[xx^\top]$ and $\tilde{\Sigma} = \mathbb{E}\big[\log|x|\,\log|x|^\top\big]$.
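A small numeric check of the Gershgorin step on the running singular example ($\rho = 1/\sqrt{2}$, $n = 6$): the bound is vacuous for small $\ell$ but becomes positive once $\rho^{\ell}$ is small, even though $\Sigma$ itself has $\lambda_{\min} = 0$.

```python
import numpy as np

n, rho = 6, 1 / np.sqrt(2)
Sigma = np.eye(n)
Sigma[[0, 1], 2] = Sigma[2, [0, 1]] = rho                # max off-diagonal correlation rho

for ell in [1, 2, 4, 8, 16]:
    H = Sigma ** ell                                     # Hadamard (entrywise) power
    bound = 1 - (n - 1) * rho ** ell                     # Gershgorin lower bound
    print(ell, round(float(np.linalg.eigvalsh(H).min()), 3), round(bound, 3))
```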

SLIDE 21

Concentration of Restricted Eigenvalue

  • Perturbation bound: $|\lambda_S(\hat{\Sigma}) - \lambda_S(\tilde{\Sigma})| \lesssim k \cdot \|\hat{\Sigma} - \tilde{\Sigma}\|_{\infty}$ (elementwise max norm)
  • Log-transformed variables are sub-exponential
  • Elementwise $\ell_\infty$ error concentrates [Kuchibhotla & Chakrabortty '18]

Again $\hat{\Sigma} = \frac{1}{m}\sum_{i=1}^{m}\log|x^{(i)}|\,\log|x^{(i)}|^\top$.
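A quick illustration that the elementwise error shrinks with $m$ (a minimal sketch; the "truth" is itself a large Monte Carlo estimate, so a small error floor remains):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Sigma = np.eye(n)
Sigma[[0, 1], 2] = Sigma[2, [0, 1]] = 1 / np.sqrt(2)

def transformed_cov(m):
    L = np.log(np.abs(rng.multivariate_normal(np.zeros(n), Sigma, size=m)))
    return L.T @ L / m

T = transformed_cov(1_000_000)                           # proxy for E[log|x| log|x|^T]
for m in [100, 1_000, 10_000]:
    print(m, np.abs(transformed_cov(m) - T).max())       # elementwise l_inf error decays
```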

SLIDE 22
  • 4. Conclusion
SLIDE 23

Recap

  • Attribute-efficient algorithm for monomials
  • Prior (nonlinear) work: uncorrelated features
  • This work: allows highly correlated features
  • Works beyond multilinear monomials
  • Blessing of nonlinearity: the $\log|\cdot|$ transform

SLIDE 24

Future Work

  • Rotations of product distributions
  • Additive noise
  • Sparse polynomials with correlated features

Thanks! Questions?