
Semi-Cyclic SGD, by Hubert Eichner, Tomer Koren, Brendan McMahan, Kunal Talwar, and Nati Srebro (PowerPoint PPT presentation)



  1. Semi-Cyclic SGD. Hubert Eichner (Google), Tomer Koren (Google), Brendan McMahan (Google), Kunal Talwar (Google), Nati Srebro

  2. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 π‘₯ π‘ˆ ෝ SGD is great ……

  3. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 A b A g A z B e B o B y C h C l C o C o C u D a D e D i D o D r D y E d E f π‘₯ π‘ˆ ෝ E l E n E p E r E s E t E x F a F i F l F o F r F u G e G i SGD is great …… G l G m G r H a H i H o if you run on iid (randomly shuffled) data

  4. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 1 𝑛 Οƒ 𝑗 𝒠 𝑗 overall distribution: 𝒠 = π‘₯ π‘ˆ ෝ SGD is great …… if you run on iid (randomly shuffled) data Cyclically varying (not fully shuffled) data

  5. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 1 𝑛 Οƒ 𝑗 𝒠 𝑗 overall distribution: 𝒠 = π‘₯ π‘ˆ ෝ SGD is great …… if you run on iid (randomly shuffled) data Cyclically varying (not fully shuffled) data, e.g. in Federated Learning β€’ Train model by executing SGD steps on user devices when device available (plugged in, idle, on WiFi) β€’ Diurnal variations (e.g. Day vs night available devices; US vs UK vs India)

  6. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 1 𝑛 Οƒ 𝑗 𝒠 𝑗 overall distribution: 𝒠 = π‘₯ π‘ˆ ෝ β€’ Train ෝ π‘₯ π‘ˆ by running block-cyclic SGD βž” could be MUCH slower, by an arbitrary large factor

  7. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 π‘₯ 1 ෝ π‘₯ 2 ෝ β€’ Train ෝ π‘₯ π‘ˆ by running block-cyclic SGD βž” could be MUCH slower, by an arbitrary large factor π‘₯ 𝑗 for each block 𝑗 = 1. . 𝑛 Pluralistic approach: learn different ෝ π‘₯ 𝑗 separately on data from that block (across all cycles) β€’ Train each ෝ βž” could be slower/less efficient by a factor of 𝑛

  8. π‘₯ 𝑒+1 ← π‘₯ 𝑒 βˆ’ πœƒβˆ‡π‘” π‘₯ 𝑒 , 𝑨 𝑒 Samples in block 𝑗 = 1. . 𝑛 are sampled from as 𝑨 𝑒 ∼ 𝒠 𝑗 π‘₯ 1 ΰ·₯ π‘₯ 2 ΰ·₯ β€’ Train ෝ π‘₯ π‘ˆ by running block-cyclic SGD βž” could be MUCH slower, by an arbitrary large factor π‘₯ 𝑗 for each block 𝑗 = 1. . 𝑛 Pluralistic approach: learn different ෝ π‘₯ 𝑗 separately on data from that block (across all cycles) β€’ Train each ෝ βž” could be slower/less efficient by a factor of 𝑛 π‘₯ 𝑗 using single SGD chain+ β€œ pluralistic averaging ” β€’ Our solution: train ΰ·₯ βž” exactly same guarantee as if using random shuffling (no degradation) βž” no extra comp. cost, no assumptions about 𝓔 𝒋 nor relatedness
