

  1. Scalable Non-Parametric Statistical Estimation. Aymeric Dieuleveut, ENS Paris, INRIA. February 6, 2017.

  2. Statistics: statistical model; performance measure; estimator; convergence as a function of the number of observations, F(#obs).

  3. Optimization vs. Statistics.
     Statistics: statistical model; performance measure; estimator; convergence: F(#obs).
     Optimization: minimize a given function; algorithm focused; scales with dimension and number of observations; convergence: F(#iter).

  4. (Same two-column comparison as slide 3, adding the goal at the intersection:) Accurate and efficient: scalable estimators with optimal statistical properties.

  5. (Adds to slide 4 the statistical ingredient:) Non-parametric regression: square loss, Tikhonov regularization.

  6. (Adds to slide 5 the optimization ingredient:) Stochastic algorithms: first-order methods, few passes on the data.

  7. (Adds to slide 6 the combination of the two ingredients:) Non-parametric Stochastic Approximation, AOS, 2015.
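The two statistical ingredients on this slide, square loss and Tikhonov regularization over a function space, amount to kernel ridge regression. A minimal sketch, assuming a Gaussian kernel and illustrative names (`gaussian_kernel`, `kernel_ridge_fit`, `lam`) that are not from the talk:

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2)), for rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, bandwidth=1.0):
    """Solve min_f (1/n) sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2.
    By the representer theorem f = sum_i a_i K(x_i, .), with
    a = (K + n * lam * I)^{-1} y."""
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(alpha, X_train, X_test, bandwidth=1.0):
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage: regress y = sin(2*pi*x) + noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
alpha = kernel_ridge_fit(X, y, lam=1e-3, bandwidth=0.2)
X_grid = np.linspace(0, 1, 50)[:, None]
pred = kernel_ridge_predict(alpha, X, X_grid, bandwidth=0.2)
```

Note the n x n linear solve: this O(n^3) cost is exactly the scalability problem that the stochastic algorithms of the next slides address.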

  8. Non-parametric Stochastic Approximation with large step sizes (1/2). Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Setting: random design least-squares regression.

  9. (Adds to slide 8:) Performance measure: the risk ε(f) := E_{(X,Y)}[(f(X) − Y)²].

  10. (Adds:) Within a reproducing kernel Hilbert space H: min_{f ∈ H} ε(f), from (x_i, y_i) i.i.d. observations.

  11. (Adds:) Sequence of estimators f_t ∈ H.

  12. (Adds:) Update after each observation.

  13. (Adds:) Using unbiased gradients of the loss function.

  14. (Adds the recursion:) f_{t+1} = f_t − γ_t (f_t(x_t) − y_t) K_{x_t}, where K is the kernel and K_x = K(x, ·).

  15. (Adds:) ⇒ Stochastic Approximation.

  16. (Adds:) Convergence rates depend on assumptions on:
     ◮ the Gaussian complexity of the unit ball of the kernel space,
     ◮ the smoothness in H of the optimal predictor f*(X) = E[Y | X].
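The recursion on slide 14 can be sketched in a few lines: each step adds one kernel atom, so f_t is stored through coefficients on the past points. A minimal sketch under assumptions not in the talk (Gaussian kernel, constant step size γ, toy data); the iterate averaging used later in the talk is included:

```python
import numpy as np

def gaussian_kernel(x, X, bandwidth=0.2):
    # K(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2)), vectorized over rows of X.
    return np.exp(-((X - x) ** 2).sum(-1) / (2 * bandwidth ** 2))

def kernel_sgd(X, y, gamma=0.5, bandwidth=0.2):
    """One pass of f_{t+1} = f_t - gamma * (f_t(x_t) - y_t) * K_{x_t}.
    Since f_t = sum_{s<t} a[s] * K(x_s, .), each step sets one new coefficient."""
    n = len(X)
    a = np.zeros(n)
    for t in range(n):
        f_xt = a[:t] @ gaussian_kernel(X[t], X[:t], bandwidth)  # f_t(x_t)
        a[t] = -gamma * (f_xt - y[t])
    # Polyak averaging of the iterates f_1, ..., f_n: the atom created at
    # step s appears in the last (n - s) iterates, hence the weights below.
    a_bar = a * (n - np.arange(n)) / n
    return a, a_bar

def predict(a, X_train, X_test, bandwidth=0.2):
    return np.array([a @ gaussian_kernel(x, X_train, bandwidth) for x in X_test])

# Toy usage: a single pass over the data with a large constant step size.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (500, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)
a, a_bar = kernel_sgd(X, y)
X_grid = np.linspace(0, 1, 50)[:, None]
pred_avg = predict(a_bar, X, X_grid)
```

Each update costs O(t) kernel evaluations and no matrix inversion, which is what makes the approach scalable compared with a direct regularized solve.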

  17. [Figure: the target function f* relative to the RKHS H and the space L²(ρ_X).]

  18-19. [Figures: further panels with the same labels f*, H, L²(ρ_X), illustrating different positions of f* relative to H.]

  20. [Figures as in slides 17-19.] Theorem: the averaged, unregularized least-mean-squares algorithm, with large step sizes, achieves the statistically optimal rate of convergence.

  21. (Adds:)
     ⇒ Recovers the finite-dimensional situation with rate O(σ²d/n).
     ⇒ Optimal rates both in the well-specified regime and in some mis-specified situations.

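To see the finite-dimensional O(σ²d/n) claim concretely, here is a toy experiment (an illustration of the claim, not code from the paper): averaged least-mean-squares with a constant step size on a d-dimensional Gaussian design. The step-size choice γ = 1/(4d) is a heuristic for this sketch, not the paper's prescription:

```python
import numpy as np

# Well-specified linear model: y = <theta_star, x> + noise, identity design covariance.
rng = np.random.default_rng(1)
d, n, sigma = 10, 20000, 0.5
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + sigma * rng.standard_normal(n)

gamma = 1 / (4 * d)          # large constant step size (heuristic choice)
theta = np.zeros(d)
theta_bar = np.zeros(d)
for t in range(n):
    # Least-mean-squares step on observation t.
    theta -= gamma * (X[t] @ theta - y[t]) * X[t]
    # Running Polyak average of the iterates.
    theta_bar += (theta - theta_bar) / (t + 1)

# With identity covariance, the excess risk of the averaged iterate is
# ||theta_bar - theta_star||^2; it should be on the order of sigma^2 * d / n.
excess = float(np.sum((theta_bar - theta_star) ** 2))
```

The last iterate `theta` keeps oscillating at a level set by γ and the noise; it is the averaging that brings the excess risk down toward the σ²d/n scale.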

  24. [Outline slide, identical to slide 7.]

  25. (Outline as in slide 7, adding the second contribution:) Faster Rates for Least-Squares Regression, technical report, 2016.
