

SLIDE 1

Distribution Regression

Zoltán Szabó (École Polytechnique)

Joint work with

  • Bharath K. Sriperumbudur (Department of Statistics, PSU),
  • Barnabás Póczos (ML Department, CMU),
  • Arthur Gretton (Gatsby Unit, UCL)

Dagstuhl Seminar 16481, December 1, 2016


SLIDE 2

Example: sustainability

Goal: aerosol prediction → climate modelling. Prediction using labelled bags:

  • bag := multi-spectral satellite measurements over an area,
  • label := local aerosol value.


SLIDE 3

Example: existing methods

Multi-instance learning [Haussler, 1999, Gärtner et al., 2002] (set kernel): sensible methods in regression are few;

1. restrictive technical conditions,

2. super-high-resolution satellite images would be needed.


SLIDES 4–5

One-page summary

Contributions:

1. Practical: state-of-the-art accuracy (aerosol).

2. Theoretical:
  • General bags: graphs, time series, texts, ...
  • Consistency of the set kernel in regression (a 17-year-old open problem).
  • How many samples/bag? → [Szabó et al., 2016].


SLIDE 6

Objects in the bags

  • time-series modelling: user = set of time series,
  • computer vision: image = collection of patch vectors,
  • NLP: corpus = bag of documents,
  • network analysis: group of people = bag of friendship graphs, ...


SLIDES 7–10

Regression on labelled bags

Given:

  • labelled bags: $\hat z = \{(\hat P_i, y_i)\}_{i=1}^{\ell}$, where $\hat P_i$ is a bag sampled from $P_i$ and $N := |\hat P_i|$,
  • test bag: $\hat P$.

Estimator:

$f_{\hat z}^{\lambda} = \arg\min_{f \in \mathcal{H}(K)} \frac{1}{\ell} \sum_{i=1}^{\ell} \big[ f(\mu_{\hat P_i}) - y_i \big]^2 + \lambda \|f\|_{\mathcal{H}}^2$,

where $\mu_{\hat P_i}$ is the feature of $\hat P_i$.

Prediction:

$\hat y_{\hat P} = g^T (G + \ell \lambda I)^{-1} y$, with $g = [K(\mu_{\hat P}, \mu_{\hat P_i})]$, $G = [K(\mu_{\hat P_i}, \mu_{\hat P_j})]$, $y = [y_i]$.

Challenge: how many samples/bag?

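To make the estimator concrete, here is a minimal NumPy sketch (an illustration only, not the authors' implementation; the Gaussian bandwidth `sigma`, the regularization `lam`, and all function names are our assumptions). It computes the empirical mean-embedding kernel and the prediction $\hat y_{\hat P} = g^T (G + \ell \lambda I)^{-1} y$ exactly as above:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel values k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def embedding_kernel(bags_a, bags_b, sigma=1.0):
    """Linear kernel between empirical mean embeddings (the set kernel):
    K(mu_A, mu_B) = mean of k(a_i, b_j) over all cross-bag pairs."""
    return np.array([[gaussian_gram(A, B, sigma).mean() for B in bags_b]
                     for A in bags_a])

def fit_predict(train_bags, y, test_bags, lam=1e-3, sigma=1.0):
    """Distribution regression prediction: y_hat = g^T (G + l*lam*I)^{-1} y."""
    l = len(train_bags)
    G = embedding_kernel(train_bags, train_bags, sigma)  # (l, l) Gram matrix
    g = embedding_kernel(test_bags, train_bags, sigma)   # (m, l) test rows
    return g @ np.linalg.solve(G + l * lam * np.eye(l), np.asarray(y, float))

# Toy usage: 5 training bags of 100 points in R^3 with scalar labels.
rng = np.random.default_rng(0)
train_bags = [rng.normal(m, 1.0, size=(100, 3)) for m in range(5)]
print(fit_predict(train_bags, np.arange(5.0), [rng.normal(2.5, 1.0, (100, 3))]))
```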

SLIDES 11–12

Regression on labelled bags: similarity

Let us define an inner product on distributions [$\tilde K(P, Q)$]:

1. Set kernel: $A = \{a_i\}_{i=1}^{N}$, $B = \{b_j\}_{j=1}^{N}$,

$\tilde K(A, B) = \frac{1}{N^2} \sum_{i,j=1}^{N} k(a_i, b_j) = \Big\langle \underbrace{\frac{1}{N} \sum_{i=1}^{N} \varphi(a_i)}_{\text{feature of bag } A}, \; \frac{1}{N} \sum_{j=1}^{N} \varphi(b_j) \Big\rangle.$

2. Taking the 'limit' [Berlinet and Thomas-Agnan, 2004, Altun and Smola, 2006, Smola et al., 2007]: for $a \sim P$, $b \sim Q$,

$\tilde K(P, Q) = \mathbb{E}_{a,b}\, k(a, b) = \Big\langle \underbrace{\mathbb{E}_a \varphi(a)}_{\text{feature of distribution } P \,=:\, \mu_P}, \; \mathbb{E}_b \varphi(b) \Big\rangle.$

Example (Gaussian kernel): $k(a, b) = e^{-\|a - b\|_2^2 / (2\sigma^2)}$.
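As a quick numerical check of this identity (a sketch; the random-Fourier-feature approximation of $\varphi$ and all sizes and seeds below are our choices, not from the talk), one can replace $\varphi$ by an explicit finite-dimensional feature map approximating the Gaussian kernel and compare the set kernel with the inner product of the bags' feature means:

```python
import numpy as np

rng = np.random.default_rng(0)

def set_kernel(A, B, sigma=1.0):
    """K~(A, B) = (1 / (N_A * N_B)) * sum_{i,j} k(a_i, b_j), Gaussian k."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2)).mean()

# Random Fourier features: an explicit finite-dimensional stand-in for phi,
# with E[phi(a) . phi(b)] = k(a, b) for the Gaussian kernel.
d, D, sigma = 3, 20000, 1.0
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

A, B = rng.normal(size=(50, d)), rng.normal(size=(60, d))
mu_A, mu_B = phi(A).mean(axis=0), phi(B).mean(axis=0)  # empirical embeddings

print(set_kernel(A, B, sigma))  # the two numbers agree up to
print(mu_A @ mu_B)              # Monte Carlo error in the features
```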

SLIDE 13

Regression on labelled bags: baseline

Quality of an estimator, baseline: $R(f) = \mathbb{E}_{(\mu_P, y) \sim \rho} [f(\mu_P) - y]^2$, with $f_\rho$ the best regressor. How many samples/bag are needed to reach the accuracy of $f_\rho$? Is this possible at all? Assume (for a moment): $f_\rho \in \mathcal{H}(K)$.

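A step the slide leaves implicit (standard for squared loss; our addition, not on the slide): with $f_\rho(\mu_P) = \mathbb{E}[y \mid \mu_P]$, the cross term vanishes and the excess risk is a squared $L^2$ distance, which is why $f_\rho$ is the natural baseline:

```latex
\[
  R(f) - R(f_\rho)
    = \mathbb{E}_{(\mu_P, y) \sim \rho} \big[ f(\mu_P) - f_\rho(\mu_P) \big]^2
    = \| f - f_\rho \|_{L^2(\rho)}^2 .
\]
```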

SLIDES 14–16

Our result: how many samples/bag

Known [Caponnetto and De Vito, 2007]: best/achieved rate

$R(f_z^{\lambda}) - R(f_\rho) = O\big(\ell^{-\frac{bc}{bc+1}}\big)$,

where $b$ measures the size of the input space and $c$ the smoothness of $f_\rho$.

Let $N = \tilde O(\ell^a)$; $N$: size of the bags, $\ell$: number of bags.

Our result: if $a \ge 2$, then $f_{\hat z}^{\lambda}$ attains the best achievable rate. In fact, $a = \frac{b(c+1)}{bc+1} < 2$ is enough.

Consequence: regression with the set kernel is consistent.

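For concreteness, a worked instance of the rate (the values $b = 1$, $c = 1$ are illustrative choices of ours, not from the talk):

```latex
\[
  R(f_z^{\lambda}) - R(f_\rho)
    = O\!\left( \ell^{-\frac{bc}{bc+1}} \right)
    = O\!\left( \ell^{-1/2} \right),
  \qquad
  a = \frac{b(c+1)}{bc+1} = \frac{1 \cdot 2}{1 + 1} = 1,
\]
```

so in this instance bags of size $N = \tilde O(\ell)$ already attain the minimax rate, strictly below the always-sufficient sub-quadratic exponent $a = 2$.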

SLIDES 17–19

Extensions

1. $K$: linear → Hölder, e.g. RBF [Christmann and Steinwart, 2010].

2. Misspecified setting ($f_\rho \in L^2 \setminus \mathcal{H}$):
  • Consistency: convergence to $\inf_{f \in \mathcal{H}} \|f - f_\rho\|_{L^2}$.
  • Smoothness on $f_\rho$: computational & statistical tradeoff.

3. Vector-valued output (see the sketch after this list):
  • $Y$: separable Hilbert space ⇒ $K(\mu_P, \mu_Q) \in L(Y)$.
  • Prediction on a test bag $\hat P$: $\hat y_{\hat P} = g^T (G + \ell \lambda I)^{-1} y$, with $g = [K(\mu_{\hat P}, \mu_{\hat P_i})]$, $G = [K(\mu_{\hat P_i}, \mu_{\hat P_j})]$, $y = [y_i]$.
  • Specifically: $Y = \mathbb{R}$ ⇒ $L(Y) = \mathbb{R}$; $Y = \mathbb{R}^d$ ⇒ $L(Y) = \mathbb{R}^{d \times d}$.

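A minimal sketch of the vector-valued prediction (our illustration under an assumed special case: the separable operator-valued kernel $K(\mu_P, \mu_Q) = k(\mu_P, \mu_Q)\, I_d$, under which the $d$ output coordinates decouple; function and argument names are made up):

```python
import numpy as np

def fit_predict_vector(G, g, Y, lam):
    """Vector-valued distribution regression with the separable kernel
    K(mu_P, mu_Q) = k(mu_P, mu_Q) * I_d (an assumed special case).
    The prediction g^T (G + l*lam*I)^{-1} Y then solves each of the
    d output coordinates as an independent scalar ridge problem.

    G   -- (l, l) scalar Gram matrix [k(mu_Pi, mu_Pj)] on training bags
    g   -- (m, l) scalar kernel values between test and training bags
    Y   -- (l, d) matrix whose rows are the vector labels y_i
    lam -- ridge regularization parameter
    """
    l = G.shape[0]
    return g @ np.linalg.solve(G + l * lam * np.eye(l), Y)  # (m, d) predictions
```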

SLIDE 20

Aerosol prediction result (100 × RMSE)

We perform on par with the state-of-the-art, hand-engineered method:

  • [Wang et al., 2012]: 7.5–8.5, hand-crafted features.
  • Ours: 7.81, no expert knowledge.

Code in ITE: https://bitbucket.org/szzoli/ite/


SLIDE 21

Summary

Problem: distribution regression.

Contributions:

  • computational & statistical tradeoff analysis;
  • specifically, the set kernel is consistent;
  • the minimax-optimal rate is achievable with sub-quadratic bag size.

Open question: optimal bag size.


SLIDE 22

Thank you for your attention!

Acknowledgments: This work was supported by the Gatsby Charitable Foundation, and by NSF grants IIS1247658 and IIS1250350. A part of the work was carried out while Bharath K. Sriperumbudur was a research fellow in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge, UK.


SLIDES 23–25

References

Altun, Y. and Smola, A. (2006). Unifying divergence minimization and statistical inference via convex duality. In Conference on Learning Theory (COLT), pages 139–153.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer.

Caponnetto, A. and De Vito, E. (2007). Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368.

Christmann, A. and Steinwart, I. (2010). Universal kernels on non-standard input spaces. In Advances in Neural Information Processing Systems (NIPS), pages 406–414.

Gärtner, T., Flach, P. A., Kowalczyk, A., and Smola, A. (2002). Multi-instance kernels. In International Conference on Machine Learning (ICML), pages 179–186.

Haussler, D. (1999). Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz. (http://cbse.soe.ucsc.edu/sites/default/files/convolutions.pdf)

Smola, A., Gretton, A., Song, L., and Schölkopf, B. (2007). A Hilbert space embedding for distributions. In Algorithmic Learning Theory (ALT), pages 13–31.

Szabó, Z., Sriperumbudur, B., Póczos, B., and Gretton, A. (2016). Learning theory for distribution regression. Journal of Machine Learning Research, 17(152):1–40.

Wang, Z., Lan, L., and Vucetic, S. (2012). Mixture model for multiple instance regression and applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 50:2226–2237.
