SWALP: Stochastic Weight Averaging in Low-Precision Training



SLIDE 1

SWALP: Stochastic Weight Averaging
 in Low-Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

SLIDE 2

Low-precision Computation
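Low-precision training typically stores weights and gradients in a fixed-point format and rounds stochastically so the quantization is unbiased in expectation. A minimal sketch of such a quantizer (our own illustration, not code from the paper; `word_len` and `frac_len` are assumed parameter names):

```python
import numpy as np

def quantize_stochastic(x, word_len=8, frac_len=6, rng=None):
    """Fixed-point quantization with stochastic rounding.

    Rounds x up with probability equal to its fractional remainder,
    so the result is unbiased: E[quantize(x)] = x (within range).
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = 2.0 ** -frac_len                  # quantization gap between representable values
    scaled = np.asarray(x, dtype=float) / delta
    floor = np.floor(scaled)
    # Round up with probability (scaled - floor), down otherwise.
    rounded = floor + (rng.random(floor.shape) < (scaled - floor))
    # Clip to the range of a signed word_len-bit fixed-point number.
    lo, hi = -2.0 ** (word_len - 1), 2.0 ** (word_len - 1) - 1
    return np.clip(rounded, lo, hi) * delta
```

With `frac_len=6` the gap is δ = 2⁻⁶; every output is a multiple of δ, and averaging many quantizations of the same input recovers it.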

SLIDE 3

We study how to leverage low-precision training to

  • obtain a high-accuracy model.

Problem Statement

SLIDE 4

We study how to leverage low-precision training to

  • obtain a high-accuracy model.

The output model can be higher-precision.

Problem Statement

SLIDE 5
SLIDE 6
SLIDE 7

SLIDE 8

SWALP

[Diagram: the SGD-LP model alongside the SWALP model]

SLIDE 9

SWALP

[Diagram: the SGD-LP model is updated each iteration]

SLIDE 10

SWALP

[Diagram: every c iterations, the SGD-LP weights are averaged into the SWALP model]

SLIDE 11

SWALP

[Diagram: the SGD-LP model updates every iteration; every c iterations its weights are averaged, infrequently, into the SWALP model]
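The scheme on this slide — low-precision SGD steps, with the iterates folded into a full-precision running average every c iterations — can be written as a short loop. This is a minimal sketch, not the paper's implementation; the names `grad_fn`, `cycle`, and `quantize` are our own, and passing `quantize=None` degenerates to full-precision SGD with averaging:

```python
import numpy as np

def swalp(grad_fn, w0, lr=0.1, steps=1000, cycle=10, quantize=None):
    """Sketch of the SWALP loop: low-precision SGD plus infrequent,
    high-precision weight averaging every `cycle` iterations."""
    quantize = quantize if quantize is not None else (lambda w: w)  # identity = full precision
    w = quantize(np.asarray(w0, dtype=float))
    w_avg = np.zeros_like(w)
    n_avg = 0
    for t in range(1, steps + 1):
        w = quantize(w - lr * grad_fn(w))   # SGD-LP step: update, then re-quantize
        if t % cycle == 0:                  # every c iterations: fold into the average
            n_avg += 1
            w_avg += (w - w_avg) / n_avg    # running mean kept in full precision
    return w_avg                            # the averaged model can be higher-precision
```

Because the average is accumulated outside the low-precision format, the returned model is not restricted to the quantization grid that the SGD-LP iterates live on.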

SLIDE 12

Convergence Analysis

Let T be the number of iterations.
 
 Theorem 1 (quadratic)
 SWALP converges to the optimal solution
 at an O(1/T) rate.

SLIDE 13

Convergence Analysis

Let T be the number of iterations.
 
 Theorem 1 (quadratic)
 SWALP converges to the optimal solution
 at an O(1/T) rate. SWALP has the same convergence rate
 as full-precision SGD.

SLIDE 14

Convergence Analysis

Let δ be the quantization gap.
 
 Theorem 2 (strongly convex)
 The expected distance between the SWALP solution
 and the optimal one is bounded by O(δ^2).

SLIDE 15

Convergence Analysis

Let δ be the quantization gap.
 
 Theorem 2 (strongly convex)
 The expected distance between the SWALP solution
 and the optimal one is bounded by O(δ^2).

  • The best bound for SGD-LP is O(δ) (Li et al., NeurIPS 2017).

  • SWALP requires half the number of bits to reduce the noise ball by the same factor.
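One way to read the bit-saving claim, assuming (our illustration) a fixed-point format with f fractional bits, so that the quantization gap is δ = 2⁻ᶠ:

```latex
\delta = 2^{-f}
\quad\Rightarrow\quad
\text{SGD-LP noise ball: } O(\delta) = O(2^{-f}),
\qquad
\text{SWALP noise ball: } O(\delta^2) = O(2^{-2f}).
```

Hence reaching a target error ε takes about log₂(1/ε) fractional bits for SGD-LP, but only half that, ½·log₂(1/ε), for SWALP.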

SLIDE 16

Experiments

SLIDE 17

Experiments

[Results table: values 1.3, 2.9, 0.8, 2.3; row and column labels not recovered]

SLIDE 18

Experiments

SLIDE 19

Poster @ Pacific Ballroom #58

SWALP code
QPyTorch: A Low-Precision Framework