SLIDE 1

Learning Unitaries with gradient descent optimization

Reevu Maity (Oxford). In progress with Bobak Kiani (MIT), Zi-Wen Liu (Perimeter), Seth Lloyd (MIT) & Milad Marvian (MIT). It from Qubit 2019, June 13.

SLIDE 2

Outline

  • We consider classical optimization algorithms to learn / simulate parametrized unitary transformations generated by two Hamiltonians applied in alternation (QAOA).
  • Gradient descent algorithms are first-order classical methods widely studied in the machine learning community for convex optimization.
  • Recently, it was shown that QAOA can be used for universal quantum computation. [S. Lloyd (2018)]
  • Any unitary can be simulated with O(d²) parameters via the alternating operator method. [S. Lloyd & RM (2019)]
  • Aim: study the learnability / simulability of unitaries under the alternating operator / QAOA formalism with gradient descent.

SLIDE 3

Problem

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}, where A and B are random d × d Hermitian matrices sampled from the GUE and d = 2ⁿ.

Learning problem

  • Given access to a target unitary U_T and knowledge of (A, B), can we simulate U_T by a sequence U(t, τ) using gradient descent on all parameters (t, τ) such that ‖U_T − U(t, τ)‖ ≤ ε? What is the time complexity = minimum number of parameters + total number of gradient descent steps?
  • Suppose U_T is a shallow-depth unitary (say, depth 4 with 8 parameters); can we find a sequence U(t, τ) such that ‖U_T − U(t, τ)‖ ≤ ε?
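To make the parametrization concrete, here is a minimal NumPy/SciPy sketch that samples A and B from the GUE and builds the alternating-operator unitary U(t, τ). The function names (gue, qaoa_unitary) are illustrative, not from the talk.

```python
# Sketch: constructing the alternating-operator (QAOA) unitary
# U(t, tau) = e^{-iB tau_p} e^{-iA t_p} ... e^{-iB tau_1} e^{-iA t_1}
# with A, B drawn from the GUE.
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    """Random d x d Hermitian matrix from the Gaussian Unitary Ensemble."""
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def qaoa_unitary(A, B, ts, taus):
    """Apply p alternating layers: e^{-iA t_k} followed by e^{-iB tau_k}."""
    U = np.eye(A.shape[0], dtype=complex)
    for t_k, tau_k in zip(ts, taus):
        U = expm(-1j * B * tau_k) @ expm(-1j * A * t_k) @ U
    return U

rng = np.random.default_rng(0)
d, p = 8, 4                          # dimension d = 2^n, depth p
A, B = gue(d, rng), gue(d, rng)
U = qaoa_unitary(A, B, rng.normal(size=p), rng.normal(size=p))
print(np.allclose(U.conj().T @ U, np.eye(d)))   # unitarity check -> True
```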

SLIDE 4

Non-Convex Optimization

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • The space of the set of unitaries {U(t, τ)} is in general non-convex.
  • Standard gradient descent algorithms are not guaranteed to converge to a global minimum in non-convex spaces.
  • Gradient descent usually gets stuck at some local critical point where the gradient of the loss vanishes, ∇L(t, τ) = 0.

SLIDE 5

Non-Convex Optimization

  • Second-order optimization techniques (e.g. Newton's method: calculate the Hessian and then its inverse) require, for N parameters, a) at least O(N³) time to invert an N × N Hessian matrix, and b) fine tuning of hyperparameters.
  • Gradient descent methods can be powerful due to their computational efficiency from the above perspectives: they require only O(N) time to calculate gradients for N parameters, and fine tuning is not required.
  • Can gradient descent optimization enable us to learn parametrized / QAOA unitaries?
SLIDE 6

Results so far

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • We find that gradient descent optimization requires at least d² parameters in U(t, τ) to approximate U_T with accuracy ε, where U_T is sampled from a parameter manifold of dimension d². The rate of learning increases when gradient descent is done in overparametrized spaces with dimension > d².
  • We propose a greedy algorithm for learning low-depth U_T in time ≪ that of full gradient descent on d² parameters. However, the success probability of efficient learning in non-convex spaces is not ideal.

SLIDE 7

Gradient Descent

Some basic equations for gradient descent optimization:

  • Loss function: L(t, τ)
  • Gradients: ∂L/∂t_k, ∂L/∂τ_k
  • Learning rate: η, fixed to a certain value during the entire iteration, e.g. η = 0.001
  • Parameter update: t_k → t_k − η ∂L/∂t_k, τ_k → τ_k − η ∂L/∂τ_k

Aim: optimize L(t, τ) to a desired accuracy ε.
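A toy illustration of the update rule above on a simple quadratic loss, just to fix the roles of the learning rate η, the gradient, and the stopping accuracy ε. The quadratic loss is a stand-in, not the unitary-learning loss used later.

```python
# Minimal gradient descent loop: theta <- theta - eta * dL/dtheta.
import numpy as np

def loss(theta, target):
    return float(np.sum((theta - target) ** 2))

def grad(theta, target):
    return 2.0 * (theta - target)

eta = 0.001                              # fixed learning rate, as on the slide
target = np.array([0.3, -1.2, 0.7])
theta = np.zeros(3)
for step in range(20000):
    theta -= eta * grad(theta, target)   # parameter update
    if loss(theta, target) < 1e-8:       # stop at desired accuracy epsilon
        break
print(step, loss(theta, target))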

SLIDE 8

Learning with gradient descent

In this work, the loss function L(t, τ) measures the distance between U(t, τ) and the target U_T. Aim: learn U_T to an accuracy ε. Simulations are for 32-dimensional target unitaries with 512 parameters, while varying the number of learning parameters in U(t, τ).
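A sketch of the learning loop, assuming a fidelity-type loss L = 1 − |Tr(U_T† U(t, τ))| / d; the exact loss used in the talk may differ. Gradients are taken by central finite differences for simplicity, and the toy sizes (d = 4) are far smaller than the d = 32 experiments.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def qaoa_unitary(A, B, params):
    # params = [t_1, tau_1, ..., t_p, tau_p]
    U = np.eye(A.shape[0], dtype=complex)
    for t_k, tau_k in params.reshape(-1, 2):
        U = expm(-1j * B * tau_k) @ expm(-1j * A * t_k) @ U
    return U

def loss(params, A, B, U_T):
    d = U_T.shape[0]
    return 1.0 - abs(np.trace(U_T.conj().T @ qaoa_unitary(A, B, params))) / d

def num_grad(params, A, B, U_T, h=1e-5):
    """Central finite-difference gradient of the loss."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        e = np.zeros_like(params); e[i] = h
        g[i] = (loss(params + e, A, B, U_T) - loss(params - e, A, B, U_T)) / (2 * h)
    return g

rng = np.random.default_rng(1)
d, p_target, p_learn = 4, 2, 10          # toy sizes; overparametrized since 2*p_learn > d^2
A, B = gue(d, rng), gue(d, rng)
U_T = qaoa_unitary(A, B, rng.normal(size=2 * p_target))   # known-generator target

params, eta = 0.1 * rng.normal(size=2 * p_learn), 0.05
for step in range(2000):
    params -= eta * num_grad(params, A, B, U_T)
    if loss(params, A, B, U_T) < 1e-3:
        break
print(step, loss(params, A, B, U_T))
```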

SLIDE 9

Gradient Descent Numerics

A 'transition' occurs when gradient descent is performed in the overparametrized domain, with more than d² parameters. The rate of learning increases as we do gradient descent on more parameters beyond d².

SLIDE 10

Gradient Descent Numerics (Contd.)

α = rate of learning. For the first 200 gradient descent steps, Loss = κ · (no. of gradient descent steps)^(−α). The underparametrized models learn following a power law, while the overparametrized models learn faster than the power law.

[Figure: loss vs. gradient descent steps for underparametrized and overparametrized models.]
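One way to extract the exponent α is a linear fit on a log-log scale over the first 200 steps. The loss_history array below is synthetic stand-in data, not the talk's numerics.

```python
# Fit Loss ~ kappa * step^(-alpha) via linear regression in log-log coordinates.
import numpy as np

steps = np.arange(1, 201)
loss_history = 0.8 * steps ** -0.4 * (1 + 0.02 * np.random.default_rng(2).normal(size=200))

slope, intercept = np.polyfit(np.log(steps), np.log(loss_history), 1)
alpha, kappa = -slope, np.exp(intercept)
print(f"alpha ~ {alpha:.3f}, kappa ~ {kappa:.3f}")
```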

SLIDE 11

A Greedy Algorithm for low depth QAOA unitary

Can we learn a low-depth U_T with ≪ d² parameters? A layer = e^{-iB τ} e^{-iA t}.

Pseudocode, given access to U_T with A and B known (a runnable sketch follows the list):

  • 1. b₀ = initial loss, evaluated before any layer is applied.
  • 2. Add a layer to the circuit with parameters (t₁, τ₁). Cost function = loss between the one-layer circuit and U_T.
  • 3. Perform gradient descent on (t₁, τ₁) to obtain optimized values. b₁ = updated loss, and require b₁ < b₀.
  • 4. Add a new layer with parameters (t₂, τ₂) to the layer from the previous step. Updated cost function = loss between the two-layer circuit and U_T.
  • 5. Perform gradient descent on (t₂, τ₂) to obtain optimized values. b₂ = updated loss, and require b₂ < b₁.
  • 6. Repeat the above for n steps till convergence, i.e. bₙ < ε.
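A minimal sketch of the greedy procedure above: each iteration appends one layer e^{-iB τ} e^{-iA t} and runs gradient descent only on the newly added (t, τ), accepting the layer only if the loss drops. Optimizing only the new layer's parameters (rather than re-optimizing all earlier layers) is an assumption made here, and the fidelity-type loss is the same assumed choice as before.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def layer(A, B, t, tau):
    return expm(-1j * B * tau) @ expm(-1j * A * t)

def loss(U, U_T):
    return 1.0 - abs(np.trace(U_T.conj().T @ U)) / U_T.shape[0]

def optimize_new_layer(U_prev, A, B, U_T, eta=0.05, steps=500, h=1e-5):
    """Gradient descent on the (t, tau) of the newly appended layer only."""
    t, tau = 0.0, 0.0
    for _ in range(steps):
        g = np.zeros(2)
        for i, dx in enumerate([(h, 0.0), (0.0, h)]):
            up = loss(layer(A, B, t + dx[0], tau + dx[1]) @ U_prev, U_T)
            dn = loss(layer(A, B, t - dx[0], tau - dx[1]) @ U_prev, U_T)
            g[i] = (up - dn) / (2 * h)
        t, tau = t - eta * g[0], tau - eta * g[1]
    return t, tau

rng = np.random.default_rng(3)
d = 4
A, B = gue(d, rng), gue(d, rng)
U_T = layer(A, B, 0.7, -0.3) @ layer(A, B, 0.2, 0.5)   # shallow-depth target

U = np.eye(d, dtype=complex)
b_prev = loss(U, U_T)            # b_0 = initial loss
for k in range(1, 21):           # greedily add up to 20 layers
    t, tau = optimize_new_layer(U, A, B, U_T)
    U_new = layer(A, B, t, tau) @ U
    b_k = loss(U_new, U_T)
    if b_k < b_prev:             # accept the layer only if the loss drops
        U, b_prev = U_new, b_k
    if b_prev < 1e-2:            # convergence: b_n < epsilon
        break
print(k, b_prev)
```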
SLIDE 12

Greedy Algorithm Performance

  • Approximating depth-4 targets U_T for n = 2, 3, 4, 5, 6 qubits.
  • The greedy algorithm succeeds in finding a sequence U(t, τ) with at most 20-24 parameters and loss below the target accuracy ε.
  • The success probability of learning in non-convex spaces is not ideal, between 0.1 and 0.15.
  • It usually gets stuck at some local critical point or saddle point.

Can we learn a low-depth U_T with ≪ d² parameters?

SLIDE 13

Learning with random local circuits

A general learning setting.

  • Motivation: study many-body dynamics / MBL.
  • Goal: learn / simulate U_T with a local circuit V₁ W₁ V₂ W₂ ⋯ V_O W_O, without assuming knowledge of (A, B). The generators of the local gates are random matrices sampled from the GUE.
  • Result: the local circuit simulates depth-4 U_T when gradient descent is done on all parameters.
  • Can the local circuit model simulate low-depth U_T with ≪ d² parameters? Can it simulate Haar random unitaries?

[Figure: brick-wall local circuit on qubits 1-6, with alternating layers V₁ W₁ V₂ W₂ ⋯ V_O W_O built from two-qubit gates u_i and υ_j.]
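A sketch of one plausible reading of the local-circuit ansatz in the figure: alternating brick-wall layers of two-qubit gates, each generated by its own GUE Hamiltonian and a single parameter. The structure and names (two_qubit_gate, brickwall) are assumptions based on the diagram, not code from the talk.

```python
import numpy as np
from scipy.linalg import expm

def gue(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2.0

def two_qubit_gate(H, theta, pair, n):
    """Embed exp(-i theta H) on adjacent qubits `pair` into an n-qubit unitary."""
    g = expm(-1j * theta * H)                # 4 x 4 two-qubit unitary
    U = np.eye(1, dtype=complex)
    i = 0
    while i < n:
        if i == pair[0]:
            U = np.kron(U, g)                # assumes pair = (i, i+1)
            i += 2
        else:
            U = np.kron(U, np.eye(2, dtype=complex))
            i += 1
    return U

def brickwall(n, layers, thetas, hams):
    """Alternate even-bond and odd-bond layers of two-qubit gates."""
    U = np.eye(2 ** n, dtype=complex)
    k = 0
    for l in range(layers):
        start = 0 if l % 2 == 0 else 1
        for q in range(start, n - 1, 2):
            U = two_qubit_gate(hams[k], thetas[k], (q, q + 1), n) @ U
            k += 1
    return U

rng = np.random.default_rng(4)
n, layers = 4, 3
n_gates = sum(len(range(0 if l % 2 == 0 else 1, n - 1, 2)) for l in range(layers))
hams = [gue(4, rng) for _ in range(n_gates)]
thetas = rng.normal(size=n_gates)
U = brickwall(n, layers, thetas, hams)
print(np.allclose(U.conj().T @ U, np.eye(2 ** n)))   # unitarity check -> True
```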

SLIDE 14

Remarks

QAOA Unitary: U(t, τ) = e^{-iB τ_p} e^{-iA t_p} ⋯ e^{-iB τ_1} e^{-iA t_1}

  • Numerical simulations of the learnability of U_T with at least d² parameters by gradient descent.
  • A greedy algorithm for simulating short-depth U_T with ≪ d² parameters. Success probability is not ideal.

In progress

  • A rigorous justification of the requirement of more than d² parameters for learning U_T. Investigate the distribution of critical points in the loss-function landscape.
  • A local circuit model algorithm that can efficiently simulate low-depth U_T with a higher success probability than the greedy one.
  • Noise resilience of simulating constant-depth QAOA unitaries on NISQ devices.