SLIDE 1

Learning Unitaries with gradient descent optimization

Reevu Maity (Oxford). In progress with Bobak Kiani (MIT), Zi-Wen Liu (Perimeter), Seth Lloyd (MIT) & Milad Marvian (MIT). It from Qubit 2019, June 13.

SLIDE 2

Outline

  • We consider classical optimization algorithms to learn / simulate parametrized unitary transformations generated by two Hamiltonians applied in alternation (QAOA).
  • Gradient descent algorithms are first-order classical methods widely studied in the machine learning community for convex optimization.
  • Recently, it was shown that QAOA can be used for universal quantum computation. [S. Lloyd (2018)]
  • Any unitary can be simulated with O(d²) parameters via the alternating operator method. [S. Lloyd & RM (2019)]
  • Aim: study the learnability / simulability of unitaries under the alternating operator / QAOA formalism with gradient descent.

SLIDE 3

Problem

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}, where A and B are random d × d Hermitian matrices sampled from the GUE and d = 2ⁿ.

Learning problem

  • Given access to a target unitary U_T and knowledge of (A, B), can we simulate U_T by a sequence U(t, τ) using gradient descent on all parameters (t, τ) such that ‖U_T − U(t, τ)‖ ≤ ε? What is the time complexity = minimum number of parameters + total number of gradient descent steps?
  • Suppose U_T is a shallow-depth unitary (say, depth 4 with 8 parameters); can we find a sequence U(t, τ) such that ‖U_T − U(t, τ)‖ ≤ ε?
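To make the parametrization concrete, here is a minimal NumPy/SciPy sketch that samples A and B from the GUE and builds the alternating-operator unitary U(t, τ). The function names (gue, qaoa_unitary) are illustrative, not from the talk.

```python
# Sketch: constructing the alternating-operator (QAOA) unitary
# U(t, tau) = e^{-iB tau_p} e^{-iA t_p} ... e^{-iB tau_1} e^{-iA t_1}
# with A, B drawn from the GUE.
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    """Random d x d Hermitian matrix from the Gaussian Unitary Ensemble."""
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def qaoa_unitary(A, B, ts, taus):
    """Apply p alternating layers: e^{-iA t_k} followed by e^{-iB tau_k}."""
    U = np.eye(A.shape[0], dtype=complex)
    for t_k, tau_k in zip(ts, taus):
        U = expm(-1j * B * tau_k) @ expm(-1j * A * t_k) @ U
    return U

rng = np.random.default_rng(0)
d, p = 8, 4                          # dimension d = 2^n, depth p
A, B = gue(d, rng), gue(d, rng)
U = qaoa_unitary(A, B, rng.normal(size=p), rng.normal(size=p))
print(np.allclose(U.conj().T @ U, np.eye(d)))   # unitarity check -> True
```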

SLIDE 4

Non-Convex Optimization

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • The space of the set of unitaries {U(t, τ)} is in general non-convex.
  • Standard gradient descent algorithms are not guaranteed to converge to a global minimum in non-convex spaces.
  • Gradient descent usually gets stuck at some local critical point where the gradient of the loss vanishes, ∇L(t, τ) = 0.

SLIDE 5

Non-Convex Optimization

  • Second-order optimization techniques (e.g. Newton's method: calculate the Hessian and then its inverse) require, for N parameters, a) at least O(N³) time to invert an N × N Hessian matrix, and b) fine tuning of hyperparameters.
  • Gradient descent methods can be powerful due to their computational efficiency from the above perspectives: they require only O(N) time to calculate gradients for N parameters, and fine tuning is not required.
  • Can gradient descent optimization enable us to learn parametrized / QAOA unitaries?
SLIDE 6

Results so far

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • We find that gradient descent optimization requires at least d² parameters in U(t, τ) to approximate U_T with accuracy ε, where U_T is sampled from a parameter manifold of dimension d². The rate of learning increases when gradient descent is done in overparametrized spaces with dimension > d².
  • We propose a greedy algorithm for learning low-depth U_T in time ≪ that of full gradient descent on d² parameters. However, the success probability of efficient learning in non-convex spaces is not ideal.

SLIDE 7

Gradient Descent

Some basic equations for gradient descent optimization:

  • Loss function: L(t, τ)
  • Gradients: ∂L/∂t_k, ∂L/∂τ_k
  • Learning rate: η, fixed to a certain value during the entire iteration, e.g. η = 0.001
  • Parameter update: t_k → t_k − η ∂L/∂t_k, τ_k → τ_k − η ∂L/∂τ_k

Aim: optimize L(t, τ) to a desired accuracy ε.
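A toy illustration of the update rule above on a simple quadratic loss, just to fix the roles of the learning rate η, the gradient, and the stopping accuracy ε. The quadratic loss is a stand-in, not the unitary-learning loss used later.

```python
# Minimal gradient descent loop: theta <- theta - eta * dL/dtheta.
import numpy as np

def loss(theta, target):
    return float(np.sum((theta - target) ** 2))

def grad(theta, target):
    return 2.0 * (theta - target)

eta = 0.001                              # fixed learning rate, as on the slide
target = np.array([0.3, -1.2, 0.7])
theta = np.zeros(3)
for step in range(20000):
    theta -= eta * grad(theta, target)   # parameter update
    if loss(theta, target) < 1e-8:       # stop at desired accuracy epsilon
        break
print(step, loss(theta, target))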

SLIDE 8

Learning with gradient descent

In this work, the loss function L(t, τ) measures the distance between U(t, τ) and the target U_T. Aim: learn U_T to an accuracy ε. Simulations are for 32-dimensional target unitaries with 512 parameters, while varying the number of learning parameters in U(t, τ).
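A sketch of the learning loop, assuming a fidelity-type loss L = 1 − |Tr(U_T† U(t, τ))| / d; the exact loss used in the talk may differ. Gradients are taken by central finite differences for simplicity, and the toy sizes (d = 4) are far smaller than the d = 32 experiments.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def qaoa_unitary(A, B, params):
    # params = [t_1, tau_1, ..., t_p, tau_p]
    U = np.eye(A.shape[0], dtype=complex)
    for t_k, tau_k in params.reshape(-1, 2):
        U = expm(-1j * B * tau_k) @ expm(-1j * A * t_k) @ U
    return U

def loss(params, A, B, U_T):
    d = U_T.shape[0]
    return 1.0 - abs(np.trace(U_T.conj().T @ qaoa_unitary(A, B, params))) / d

def num_grad(params, A, B, U_T, h=1e-5):
    """Central finite-difference gradient of the loss."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        e = np.zeros_like(params); e[i] = h
        g[i] = (loss(params + e, A, B, U_T) - loss(params - e, A, B, U_T)) / (2 * h)
    return g

rng = np.random.default_rng(1)
d, p_target, p_learn = 4, 2, 10          # toy sizes; overparametrized since 2*p_learn > d^2
A, B = gue(d, rng), gue(d, rng)
U_T = qaoa_unitary(A, B, rng.normal(size=2 * p_target))   # known-generator target

params, eta = 0.1 * rng.normal(size=2 * p_learn), 0.05
for step in range(2000):
    params -= eta * num_grad(params, A, B, U_T)
    if loss(params, A, B, U_T) < 1e-3:
        break
print(step, loss(params, A, B, U_T))
```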

SLIDE 9

Gradient Descent Numerics

A 'transition' occurs when gradient descent is performed in the overparametrized domain, with more than d² parameters. The rate of learning increases as we do gradient descent on more parameters beyond d².

SLIDE 10

Gradient Descent Numerics (Contd.)

α = rate of learning. For the first 200 gradient descent steps, Loss = κ · (no. of gradient descent steps)^(−α). The underparametrized models learn following a power law, while the overparametrized models learn faster than the power law.

[Figure: loss vs. gradient descent steps for underparametrized and overparametrized models.]
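One way to extract the exponent α is a linear fit on a log-log scale over the first 200 steps. The loss_history array below is synthetic stand-in data, not the talk's numerics.

```python
# Fit Loss ~ kappa * step^(-alpha) via linear regression in log-log coordinates.
import numpy as np

steps = np.arange(1, 201)
loss_history = 0.8 * steps ** -0.4 * (1 + 0.02 * np.random.default_rng(2).normal(size=200))

slope, intercept = np.polyfit(np.log(steps), np.log(loss_history), 1)
alpha, kappa = -slope, np.exp(intercept)
print(f"alpha ~ {alpha:.3f}, kappa ~ {kappa:.3f}")
```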

SLIDE 11

A Greedy Algorithm for low depth QAOA unitary

Can we learn a low-depth U_T with ≪ d² parameters? A layer = e^{-iB τ} e^{-iA t}.

Pseudocode, given access to U_T with A and B known (a runnable sketch follows the list):

  • 1. b₀ = initial loss, evaluated before any layer is applied.
  • 2. Add a layer to the circuit with parameters (t₁, τ₁). Cost function = loss between the one-layer circuit and U_T.
  • 3. Perform gradient descent on (t₁, τ₁) to obtain optimized values. b₁ = updated loss, and require b₁ < b₀.
  • 4. Add a new layer with parameters (t₂, τ₂) to the layer from the previous step. Updated cost function = loss between the two-layer circuit and U_T.
  • 5. Perform gradient descent on (t₂, τ₂) to obtain optimized values. b₂ = updated loss, and require b₂ < b₁.
  • 6. Repeat the above for n steps till convergence, i.e. bₙ < ε.
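A minimal sketch of the greedy procedure above: each iteration appends one layer e^{-iB τ} e^{-iA t} and runs gradient descent only on the newly added (t, τ), accepting the layer only if the loss drops. Optimizing only the new layer's parameters (rather than re-optimizing all earlier layers) is an assumption made here, and the fidelity-type loss is the same assumed choice as before.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def layer(A, B, t, tau):
    return expm(-1j * B * tau) @ expm(-1j * A * t)

def loss(U, U_T):
    return 1.0 - abs(np.trace(U_T.conj().T @ U)) / U_T.shape[0]

def optimize_new_layer(U_prev, A, B, U_T, eta=0.05, steps=500, h=1e-5):
    """Gradient descent on the (t, tau) of the newly appended layer only."""
    t, tau = 0.0, 0.0
    for _ in range(steps):
        g = np.zeros(2)
        for i, dx in enumerate([(h, 0.0), (0.0, h)]):
            up = loss(layer(A, B, t + dx[0], tau + dx[1]) @ U_prev, U_T)
            dn = loss(layer(A, B, t - dx[0], tau - dx[1]) @ U_prev, U_T)
            g[i] = (up - dn) / (2 * h)
        t, tau = t - eta * g[0], tau - eta * g[1]
    return t, tau

rng = np.random.default_rng(3)
d = 4
A, B = gue(d, rng), gue(d, rng)
U_T = layer(A, B, 0.7, -0.3) @ layer(A, B, 0.2, 0.5)   # shallow-depth target

U = np.eye(d, dtype=complex)
b_prev = loss(U, U_T)            # b_0 = initial loss
for k in range(1, 21):           # greedily add up to 20 layers
    t, tau = optimize_new_layer(U, A, B, U_T)
    U_new = layer(A, B, t, tau) @ U
    b_k = loss(U_new, U_T)
    if b_k < b_prev:             # accept the layer only if the loss drops
        U, b_prev = U_new, b_k
    if b_prev < 1e-2:            # convergence: b_n < epsilon
        break
print(k, b_prev)
```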
SLIDE 12

Greedy Algorithm Performance

  • Approximating depth-4 targets U_T for n = 2, 3, 4, 5, 6 qubits.
  • The greedy algorithm succeeds in finding a sequence U(t, τ) with at most 20-24 parameters and loss below the target accuracy ε.
  • The success probability of learning in non-convex spaces is not ideal, between 0.1 and 0.15.
  • It usually gets stuck at some local critical point or saddle point.

Can we learn a low-depth U_T with ≪ d² parameters?

SLIDE 13

Learning with random local circuits

A general learning setting.

  • Motivation: study many-body dynamics / MBL.
  • Goal: learn / simulate U_T with a local circuit V₁ W₁ V₂ W₂ ⋯ V_O W_O, without assuming knowledge of (A, B). The generators of the local gates are random matrices sampled from the GUE.
  • Result: the local circuit simulates depth-4 U_T when gradient descent is done on all parameters.
  • Can the local circuit model simulate low-depth U_T with ≪ d² parameters? Can it simulate Haar random unitaries?

[Figure: brick-wall local circuit on qubits 1-6, with alternating layers V₁ W₁ V₂ W₂ ⋯ V_O W_O built from two-qubit gates u_i and υ_j.]
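A sketch of one plausible reading of the local-circuit ansatz in the figure: alternating brick-wall layers of two-qubit gates, each generated by its own GUE Hamiltonian and a single parameter. The structure and names (two_qubit_gate, brickwall) are assumptions based on the diagram, not code from the talk.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def two_qubit_gate(H, theta, pair, n):
    """Embed exp(-i theta H) on adjacent qubits `pair` into an n-qubit unitary."""
    g = expm(-1j * theta * H)                # 4 x 4 two-qubit unitary
    U = np.eye(1, dtype=complex)
    i = 0
    while i < n:
        if i == pair[0]:
            U = np.kron(U, g)                # assumes pair = (i, i+1)
            i += 2
        else:
            U = np.kron(U, np.eye(2, dtype=complex))
            i += 1
    return U

def brickwall(n, layers, thetas, hams):
    """Alternate even-bond and odd-bond layers of two-qubit gates."""
    U = np.eye(2 ** n, dtype=complex)
    k = 0
    for l in range(layers):
        start = 0 if l % 2 == 0 else 1
        for q in range(start, n - 1, 2):
            U = two_qubit_gate(hams[k], thetas[k], (q, q + 1), n) @ U
            k += 1
    return U

rng = np.random.default_rng(4)
n, layers = 4, 3
n_gates = sum(len(range(0 if l % 2 == 0 else 1, n - 1, 2)) for l in range(layers))
hams = [gue(4, rng) for _ in range(n_gates)]
thetas = rng.normal(size=n_gates)
U = brickwall(n, layers, thetas, hams)
print(np.allclose(U.conj().T @ U, np.eye(2 ** n)))   # unitarity check -> True
```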

SLIDE 14

Remarks

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • Numerical simulations of the learnability of U_T with at least d² parameters by gradient descent.
  • A greedy algorithm for simulating short-depth U_T with ≪ d² parameters. Success probability is not ideal.

In progress

  • A rigorous justification of the requirement of more than d² parameters for learning U_T. Investigate the distribution of critical points in the loss-function landscape.
  • A local circuit model algorithm that can efficiently simulate low-depth U_T with a higher success probability than the greedy one.
  • Noise resilience of simulating constant-depth QAOA unitaries on NISQ devices.