  1. Attribute-Efficient Learning of Monomials over Highly-Correlated Variables
     Alexandr Andoni, Rishabh Dudeja, Daniel Hsu, Kiran Vodrahalli (Columbia University)

  2. Problem Statement
     Given i.i.d. samples (x, y) where x ∈ R^p is Gaussian and y is a k-sparse monomial in the coordinates of x, recover the monomial attribute-efficiently, i.e., with sample complexity polynomial in k and log p.

  3. Prior work: Attribute-efficient learning of polynomials

     Boolean domain
     - Learning sparse parities is a hard problem!
     - Parity ⇔ monomial over {-1, +1}^p (checked concretely in the snippet below)
     - Many papers: [Helmbold et al. '92, Blum '98, Klivans & Servedio '06, Kalai et al. '09, Kocaoglu et al. '14, ...]
     - Most results assume a product distribution (often uniform)
     - Runtime ~ dimension^(c · sparsity) for c < 1; compare to the naive dimension^degree
     - Takeaway: the Boolean setting is well-studied and difficult!

     Real domain
     - Sparse linear regression is attribute-efficient under RIP, REC, or NSP assumptions on the data [Candès '04, Donoho '04, Bickel '09, ...]
     - General polynomials: NOT attribute-efficient
     - Sparse polynomials [Andoni et al. '14]: product distribution (Gaussian or uniform data); runtime & sample complexity poly(dimension, 2^degree, sparsity), so NOT attribute-efficient
     - Takeaway: most work is linear; the rest assumes a product distribution.
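The parity ⇔ monomial correspondence is easy to check directly: mapping each bit b ∈ {0, 1} to the sign (-1)^b turns the XOR of a subset of bits into a product of signs. A minimal sketch in Python; the dimension and subset are illustrative choices, not from the slides:

```python
import itertools

# Map each bit b in {0, 1} to the sign s = (-1)**b in {+1, -1}:
# the XOR (parity) of a subset of bits becomes a product of signs.
p, subset = 4, (0, 2)  # illustrative sizes, not from the slides

for bits in itertools.product([0, 1], repeat=p):
    parity = sum(bits[i] for i in subset) % 2    # XOR over the subset
    signs = [(-1) ** b for b in bits]            # {0, 1} -> {+1, -1}
    monomial = 1
    for i in subset:
        monomial *= signs[i]                     # monomial over {-1, +1}^p
    assert monomial == (-1) ** parity            # the monomial encodes the parity

print("checked all", 2 ** p, "inputs: subset parity = sign monomial")
```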

  4. This work: Non-product distributions for monomials
     ● One weird trick: take the log of the features and responses, then run Lasso! ⇒ an attribute-efficient algorithm that learns k-sparse monomials (see the sketch below)
     ● Gaussian data: variance 1, pairwise covariance at most 1 - ε
       ○ Arbitrarily high correlation between features!
     ● Runtime: poly(samples, dimension, sparsity)
     ● Sample complexity: poly(sparsity, log dimension)
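Here is a minimal sketch of the log-transform trick, assuming correlated Gaussian features and a noiseless k-sparse monomial response: since |y| = ∏_{i∈S} |x_i|, taking logs gives log|y| = Σ_{i∈S} log|x_i|, a k-sparse linear model in the transformed features that Lasso can fit. The problem sizes, correlation level, Lasso penalty, and support threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the slides).
p, k, n = 200, 3, 500

# Correlated Gaussian features: unit variance, all pairwise covariances rho.
rho = 0.9
cov = np.full((p, p), rho)
np.fill_diagonal(cov, 1.0)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Response: a k-sparse monomial, the product of k hidden coordinates.
support = rng.choice(p, size=k, replace=False)
y = np.prod(X[:, support], axis=1)

# The trick: log|y| = sum over the support of log|x_i|, i.e. a k-sparse
# linear model in the transformed features log|x_j|.
Z = np.log(np.abs(X))
t = np.log(np.abs(y))

# Fit Lasso on the transformed data; alpha and the 0.5 threshold are
# hypothetical choices, not the tuning from the paper.
coef = Lasso(alpha=0.05).fit(Z, t).coef_
recovered = np.flatnonzero(coef > 0.5)

print("true support:     ", np.sort(support))
print("recovered support:", recovered)
```

The transformed design is no longer Gaussian, and this snippet only illustrates the reduction; the content of the analysis is that sparse recovery still succeeds even as the feature correlations approach 1.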

  5. Binary Data Setting (reference for details)
     ● Boolean features (Valiant '84, Littlestone '88, Helmbold et al. '92, Klivans et al. '06, Valiant '15):
       ○ Conjunctions over {0, 1}^p are learnable efficiently
       ○ Monomials over {+1, -1}^p are parity functions and are PAC-learnable
       ○ k-sparse parities: sample-efficient (O(k log p) samples) but computationally inefficient (~ p^k time)
         ■ Runtime improvement over the naive case: an improper learner runs in p^(c·k) time for c < 1 [Valiant '15]
       ○ Attribute-inefficient noisy parity: 2^(O(p / log p)) time for data under the uniform distribution, where η is the noise parameter
     ● Average-case analysis for learning parity (Kalai et al. '09, Kocaoglu et al. '14): learn DNFs / functions defined on {+1, -1}^p
       ○ Can learn over an adversarial + perturbed product distribution
       ○ Can learn in smoothed-analysis settings (adversarial + perturbed function)
