Average-case Acceleration Through Spectral Density Estimation


  1. Average-case Acceleration Through Spectral Density Estimation Fabian Pedregosa (Google Research) Damien Scieur (Samsung SAIT AI Lab, Montréal) International Conference on Machine Learning 2020

  2. Complexity Analysis in Optimization Worst-case analysis ✓ Bound on the complexity for any input. ✗ Potentially worse than observed runtime. Simplex method (Dantzig, '98, Spielman & Teng '04) ✗ Exponential worst-case. ✓ Runtime typically polynomial.

  3. Average-case Complexity ✓ Complexity averaged over all problem instances. ✓ Representative of the typical complexity. Better bounds, sometimes better algorithms → Quicksort (Hoare '62): fast average-case sorting. Rarely used in optimization.

  4. Main contributions Average-case analysis for optimization on quadratics. Optimal methods under this analysis.

  5. Problem Distribution: Random Quadratics, where H , x ★ are a random matrix and vector. ✓ Exact runtime is known and depends on eigenvalues( H ). ✓ Shares (some) dynamics of real problems, e.g., the Neural Tangent Kernel (Jacot et al., 2018).
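As a concrete illustration of this setting, here is a minimal sketch. It assumes the standard random-quadratic form f(x) = ½ (x − x★)ᵀ H (x − x★) with H = AᵀA/n for Gaussian A; the paper's exact construction may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50

# Hypothetical random quadratic: H = A^T A / n with iid Gaussian A,
# minimizer x_star drawn at random (a modeling assumption).
A = rng.standard_normal((n, d))
H = A.T @ A / n
x_star = rng.standard_normal(d)

def f(x):
    """Quadratic objective 0.5 * (x - x*)^T H (x - x*)."""
    r = x - x_star
    return 0.5 * r @ H @ r

# Plain gradient descent; its trajectory depends only on eigenvalues(H)
# and the initial error, which is what makes exact runtime analysis possible.
step = 1.0 / np.linalg.eigvalsh(H).max()
x = np.zeros(d)
for _ in range(500):
    x = x - step * H @ (x - x_star)

print(f(np.zeros(d)), f(x))  # error decreases sharply
```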

  6. Example: Random Least Squares. When the elements of A are iid and standardized, the spectrum of H is close to the Marchenko-Pastur distribution.
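This claim is easy to check numerically. Assuming unit-variance iid Gaussian entries, the extreme eigenvalues of H = AᵀA/n should land near the Marchenko-Pastur support edges (1 ± √r)² for ratio r = d/n:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 500                    # aspect ratio r = d / n
A = rng.standard_normal((n, d))     # iid, mean 0, variance 1
H = A.T @ A / n

eigs = np.linalg.eigvalsh(H)
r = d / n
# Marchenko-Pastur support for unit-variance entries: [(1-sqrt(r))^2, (1+sqrt(r))^2]
lo, hi = (1 - np.sqrt(r)) ** 2, (1 + np.sqrt(r)) ** 2
print(eigs.min(), lo)   # empirical min vs. lower edge
print(eigs.max(), hi)   # empirical max vs. upper edge
```

With n = 2000 the edge fluctuations are small, so the empirical extremes sit within a few percent of the theoretical edges.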

  7. Expected Error For Gradient-Based Methods. R 2 is the (fixed) distance to the optimum at initialization. Problem difficulty is represented by the expected Hessian eigenvalue density d 𝜈 . P t is a polynomial of degree t determined by the optimization algorithm. Flexible: enables algorithm design.
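The polynomial view can be verified directly: for gradient descent with step size γ, the residual polynomial is P_t(λ) = (1 − γλ)^t, and applying P_t(H) to the initial error reproduces the error at step t exactly. The quadratic and step size below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 30
A = rng.standard_normal((100, d))
H = A.T @ A / 100
x_star = rng.standard_normal(d)
x0 = np.zeros(d)

gamma = 1.0 / np.linalg.eigvalsh(H).max()
t = 10

# Run t steps of gradient descent on 0.5 * (x - x*)^T H (x - x*).
x = x0.copy()
for _ in range(t):
    x -= gamma * H @ (x - x_star)

# Gradient descent's residual polynomial: P_t(lambda) = (1 - gamma*lambda)^t.
# Applying P_t(H) (via the eigendecomposition) to the initial error
# reproduces the error at step t.
lam, U = np.linalg.eigh(H)
P_t = (1 - gamma * lam) ** t
err_poly = U @ (P_t * (U.T @ (x0 - x_star)))

print(np.linalg.norm((x - x_star) - err_poly))  # ~ 0
```

Averaging the squared norm of this polynomial error over random H and x★ is exactly what produces the R² ∫ P_t(λ)² dν(λ) structure of the expected error.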

  8. Average-case Optimal Method. Goal : find the method with minimal expected error. Algorithms ↔ polynomials of degree t: find the polynomial P t that minimizes the expected error (with proper normalization). Solution: the polynomial of degree t orthogonal wrt λ d 𝜈 (λ) .
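A rough numerical sketch of this idea: parametrize degree-t polynomials normalized so that P(0) = 1, minimize an empirical version of the expected error over sampled eigenvalues by least squares, and compare against gradient descent's residual polynomial. The eigenvalue density and the λ-weighting below are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(3)
# Sampled Hessian eigenvalues stand in for the expected density d(nu).
lam = np.sort(rng.uniform(0.1, 2.0, size=400))
t = 5

# Residual polynomials satisfy P(0) = 1, so write P(l) = 1 + sum_k c_k l^k
# and minimize the empirical weighted error sum_i lam_i * P(lam_i)^2.
# This is a weighted linear least-squares problem in the coefficients c.
V = np.vander(lam, t + 1, increasing=True)[:, 1:]   # columns l, l^2, ..., l^t
w = np.sqrt(lam)
c, *_ = np.linalg.lstsq(w[:, None] * V, -w, rcond=None)

P_opt = 1 + V @ c
# Compare with gradient descent's residual polynomial (1 - gamma*l)^t.
gamma = 1.0 / lam.max()
P_gd = (1 - gamma * lam) ** t
print((lam * P_opt**2).sum(), (lam * P_gd**2).sum())
```

Since gradient descent's polynomial is feasible for the same minimization, the optimized polynomial can only do better, which is the sense in which the orthogonal-polynomial method is average-case optimal.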

  9. Marchenko-Pastur Acceleration. Model for d 𝜈 = Marchenko-Pastur(r, 𝛕 ). r and 𝛕 are estimated from the largest eigenvalue and the trace of H ; no need to know the strong convexity constant. Algorithm: a simple momentum-like method with low memory requirements.
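A sketch of the parameter-estimation step, assuming the standard Marchenko-Pastur parametrization in which the mean eigenvalue equals the variance σ² and the top support edge is σ²(1 + √r)²; the paper's estimator may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1500, 500
A = rng.standard_normal((n, d))
H = A.T @ A / n          # true ratio r = d/n = 1/3, variance sigma^2 = 1

# Estimate the two Marchenko-Pastur parameters from quantities that are
# cheap to obtain: the trace and the largest eigenvalue (computed exactly
# here; power iteration would suffice in practice).
sigma2_hat = np.trace(H) / d                  # MP mean eigenvalue = sigma^2
lam_max = np.linalg.eigvalsh(H).max()         # top edge = sigma^2 (1 + sqrt(r))^2
r_hat = (np.sqrt(lam_max / sigma2_hat) - 1) ** 2

print(sigma2_hat, r_hat)   # close to 1 and d/n
```

Note that neither estimate requires the smallest eigenvalue, which is why the method needs no strong convexity constant.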

  10. Decaying Exponential Acceleration. Model for d 𝜈 = decaying exponential(λ 0 ). Unbounded largest eigenvalue; only access to Tr( H ). Algorithm: decaying step-size, similar to Polyak averaging.
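A minimal sketch of this regime, where both the eigenvalue model and the 1/t step-size schedule are hypothetical stand-ins for the paper's method (the averaging step is omitted):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 200
# Illustrative model (an assumption, not the paper's construction):
# Hessian eigenvalues drawn from a decaying exponential density, so the
# spectrum is unbounded in principle and has no strong convexity floor.
lam = rng.exponential(1.0, size=d)
H = np.diag(lam)
x_star = rng.standard_normal(d)

# Only Tr(H) is assumed known; it sets the step-size scale, and the
# step decays as 1/t over iterations (a hypothetical schedule).
scale = d / np.trace(H)
x = np.zeros(d)
for t in range(1, 3001):
    x -= (scale / t) * H @ (x - x_star)

err0 = np.linalg.norm(x_star)          # ||x_0 - x*|| with x_0 = 0
err = np.linalg.norm(x - x_star)
print(err0, err)                       # error shrinks despite no lambda_max bound
```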

  11. Benchmarks: Least Squares

  12. Conclusions Average-case analysis based on random quadratics. Optimal methods under different eigenvalue distributions. ✓ Acceleration without knowledge of strong convexity. In paper + More methods, convergence rates, empirical extension to non-quadratic objectives. Follow-up work on asymptotic analysis (Scieur and P., "Universal Average-Case Optimality of Polyak Momentum" )
