
Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching - PowerPoint PPT Presentation



  1. Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching Peter Richtárik Randomized Numerical Linear Algebra and Applications (Program: Foundations of Data Science) Simons Institute for the Theory of Computing, UC Berkeley September 24-27, 2018

  2. Outline 1. Introduction 2. Jacobian Sketching 3. Controlled Stochastic Reformulations 4. JacSketch and SAGA 5. Iteration Complexity of JacSketch 6. Experiments

  3. 1. Introduction

  4. Finite Sum Minimization Problem

  $$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^n f_i(x)$$

  Here $a_i \in \mathbb{R}^d$ is the data vector of example $i$, $y_i$ its label, and $\frac{\lambda}{2}\|x\|_2^2$ the L2 regularizer.

  L2-regularized least squares (ridge regression): $f_i(x) = \frac{1}{2}(a_i^\top x - y_i)^2 + \frac{\lambda}{2}\|x\|_2^2$

  L2-regularized logistic regression: $f_i(x) = \log\left(1 + e^{-y_i a_i^\top x}\right) + \frac{\lambda}{2}\|x\|_2^2$
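
A minimal NumPy sketch of this setup, with ridge regression as the running example; the data A, y, the sizes, and the regularizer lam are illustrative choices, not values from the talk:

```python
import numpy as np

# Hypothetical toy problem: n examples, d features, L2 regularizer lam.
n, d, lam = 100, 10, 0.1
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))   # row i is the data vector a_i
y = rng.standard_normal(n)        # y_i is the label of example i

def f_i(x, i):
    # One summand: 0.5 * (a_i^T x - y_i)^2 + (lam/2) * ||x||^2
    r = A[i] @ x - y[i]
    return 0.5 * r ** 2 + 0.5 * lam * (x @ x)

def grad_f_i(x, i):
    # Gradient of f_i: (a_i^T x - y_i) * a_i + lam * x
    return (A[i] @ x - y[i]) * A[i] + lam * x

def f(x):
    # Full objective: f(x) = (1/n) * sum_i f_i(x)
    return np.mean([f_i(x, i) for i in range(n)])
```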

  5. Stochastic Gradient Methods

  $$x^{k+1} = x^k - \alpha g^k$$

  Here $x^k$ is the current iterate, $\alpha$ the stepsize, and $x^{k+1}$ the next iterate; $g^k$ is an unbiased estimator of the gradient: $\mathbb{E}[g^k] = \nabla f(x^k)$.
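
The update is one line of code. A sketch reusing grad_f_i and rng from the example above; sampling $i$ uniformly makes $g^k$ an unbiased gradient estimator:

```python
def sgd_step(x, alpha):
    # With i uniform on {0, ..., n-1}, E[grad f_i(x)] = grad f(x),
    # since f is the average of the f_i.
    i = rng.integers(n)
    return x - alpha * grad_f_i(x, i)
```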

  6. Variance Matters

  $$\mathbb{V}[g^k] := \mathbb{E}\left[\|g^k - \nabla f(x^k)\|^2\right]$$

  Gradient Descent (GD): $g^k = \nabla f(x^k)$, so $\mathbb{V}[g^k] = 0$.

  Stochastic Gradient Descent (SGD): $g^k = \nabla f_i(x^k)$, so $\mathbb{V}[g^k]$ is BIG.
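
The variance $\mathbb{V}[g^k]$ can be estimated by Monte Carlo; a sketch building on the functions above:

```python
def full_grad(x):
    # grad f(x) = (1/n) * sum_i grad f_i(x): the GD direction, variance 0.
    return np.mean([grad_f_i(x, i) for i in range(n)], axis=0)

def sgd_variance(x, num_samples=1000):
    # Monte-Carlo estimate of V[g] = E[||g - grad f(x)||^2] for g = grad f_i(x).
    gf = full_grad(x)
    idx = rng.integers(n, size=num_samples)
    return np.mean([np.sum((grad_f_i(x, i) - gf) ** 2) for i in idx])
```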

  7. GD vs SGD

  [Figure: iterate trajectories of Gradient Descent (GD) and Stochastic Gradient Descent (SGD), each starting at $x^0$ and heading toward the minimizer $x^*$]

  8. Variance Reduction

  - Decreasing stepsizes. How does it work? Scaling down the noise. CONS: slows the method down; the stepsize is hard to tune. PROS: still converges; widely known.
  - Mini-batching. How does it work? More samples, less variance. CONS: more work per iteration. PROS: parallelizable.
  - Importance sampling. How does it work? Sample more important data (or parameters) more often. CONS: the probabilities are hard to tune; might overfit to outliers. PROS: improved condition number.
  - Adjusting the direction. How does it work? Duality (SDCA) or control variates (SVRG, S2GD, SAGA). CONS: a bit (SVRG, S2GD) or a lot (SDCA, SAGA) more memory needed. PROS: improved dependence on epsilon.

  All tricks can be combined! (A minimal sketch of the control-variate idea follows below.)
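
The promised sketch of the control-variate trick, in the style of SAGA; this is a simplified uniform-sampling variant for the toy problem above, not the exact pseudocode from the talk. Keep a table J whose column i stores the most recently seen gradient of f_i, and use it to cancel most of the noise:

```python
def saga(x0, alpha, num_iters=1000):
    x = x0.copy()
    J = np.stack([grad_f_i(x, i) for i in range(n)], axis=1)  # d x n gradient table
    jbar = J.mean(axis=1)                                     # average of the table
    for _ in range(num_iters):
        i = rng.integers(n)
        g_new = grad_f_i(x, i)
        g = g_new - J[:, i] + jbar         # unbiased: E[g] = grad f(x)
        jbar += (g_new - J[:, i]) / n      # keep the running average in sync
        J[:, i] = g_new                    # refresh column i of the table
        x -= alpha * g
    return x
```

As x approaches the optimum, the columns of J approach the true gradients, so the variance of g vanishes and a constant stepsize suffices.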

  9. 2. Jacobian Sketching (JacSketch as a Stochastic Quasi-Gradient Method) Robert M Gower, Peter Richtárik and Francis Bach Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching arXiv:1805.02632, 2018

  10. Lift and Sketch

  11. Lift and Sketch

  LIFT (1): stack the summands into $F(x) = (f_1(x), f_2(x), \ldots, f_n(x))^\top \in \mathbb{R}^n$, whose Jacobian is $\nabla F(x) = [\nabla f_1(x), \nabla f_2(x), \ldots, \nabla f_n(x)] \in \mathbb{R}^{d \times n}$.

  SKETCH (2): with $e$ the vector of all ones, $\frac{1}{n} \nabla F(x) e = \nabla f(x)$ leads to Gradient Descent; with $e_i$ the $i$th unit basis vector, $\nabla F(x) e_i = \nabla f_i(x)$ leads to Stochastic Gradient Descent.
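
Both classical methods are thus special cases of sketching the Jacobian from the right. A quick numerical check, reusing the functions defined earlier:

```python
def jacobian(x):
    # Lifted Jacobian grad F(x): column i is grad f_i(x); shape d x n.
    return np.stack([grad_f_i(x, i) for i in range(n)], axis=1)

x = rng.standard_normal(d)
G = jacobian(x)
e = np.ones(n)                       # vector of all ones
e_i = np.zeros(n); e_i[3] = 1.0      # a unit basis vector (here i = 3)
assert np.allclose(G @ (e / n), full_grad(x))   # sketch with e/n: GD direction
assert np.allclose(G @ e_i, grad_f_i(x, 3))     # sketch with e_i: SGD direction
```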

  12. Introducing General Sketches

  We would like to solve the linear matrix equation $J = \nabla F(x^k)$ for $J \in \mathbb{R}^{d \times n}$. Too expensive to solve!

  Solve a random linear matrix equation instead: $J S^k = \nabla F(x^k) S^k$, where $S^k \sim \mathcal{D}$ is a random $n \times q$ matrix and $\nabla F(x^k) S^k$ is the Jacobian sketch. It has many solutions: which solution to pick?

  13. Sketch and Project

  14. Sketch and Project

  The new Jacobian estimate $J^{k+1}$ is the closest matrix (in Frobenius norm) to the current estimate $J^k$ among those consistent with the Jacobian sketch:

  $$J^{k+1} := \arg\min_{J \in \mathbb{R}^{d \times n}} \|J - J^k\| \quad \text{subject to} \quad J S^k = \nabla F(x^k) S^k$$

  The constraint is a random linear matrix equation (LME) ensuring consistency with the Jacobian sketch. Solution:

  $$J^{k+1} = J^k + \left(\nabla F(x^k) - J^k\right) \Pi_{S^k}, \qquad \Pi_{S^k} \stackrel{\text{def}}{=} S^k \left((S^k)^\top S^k\right)^\dagger (S^k)^\top$$
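
The closed-form update is easy to verify numerically. A sketch with a Gaussian $S$ (an illustrative choice of $\mathcal{D}$); forming the dense Jacobian here is for checking only, since the method itself only ever needs the sketch $\nabla F(x^k) S^k$:

```python
def sketch_and_project(J, x, S):
    # J_new = J + (grad F(x) - J) @ Pi_S, with Pi_S = S (S^T S)^+ S^T.
    G = jacobian(x)
    Pi = S @ np.linalg.pinv(S.T @ S) @ S.T
    return J + (G - J) @ Pi

S = rng.standard_normal((n, 5))                  # q = 5 sketch columns
J_new = sketch_and_project(np.zeros((d, n)), x, S)
assert np.allclose(J_new @ S, jacobian(x) @ S)   # consistent with the sketch
```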

  15. Sketch and Project I

  - Original sketch and project (2017 IMA Fox Prize, 2nd Prize, in Numerical Analysis; most downloaded SIMAX paper of 2017): Robert Mansel Gower and P.R. Randomized Iterative Methods for Linear Systems. SIAM J. on Matrix Analysis and Applications 36(4):1660-1690, 2015.
  - Removal of the full rank assumption + duality: Robert Mansel Gower and P.R. Stochastic Dual Ascent for Solving Linear Systems. arXiv:1512.06890, 2015.
  - Inverting matrices & connection to quasi-Newton updates: Robert Mansel Gower and P.R. Randomized Quasi-Newton Methods are Linearly Convergent Matrix Inversion Algorithms. SIAM J. on Matrix Analysis and Applications 38(4):1380-1409, 2017.
  - Computing the pseudoinverse: Robert Mansel Gower and P.R. Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse. arXiv:1612.06255, 2016.
  - Application to machine learning: Robert Mansel Gower, Donald Goldfarb and P.R. Stochastic Block BFGS: Squeezing More Curvature out of Data. ICML 2016.
  - Sketch and project revisited, via stochastic reformulations of linear systems: P.R. and Martin Takáč. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory. arXiv:1706.01108, 2017.
