
slide-1
SLIDE 1

Statistical Aspects of Quantum Computing

Yazhen Wang

Department of Statistics, University of Wisconsin-Madison
http://www.stat.wisc.edu/∼yzwang

Near-term Applications of Quantum Computing, Fermilab, December 6-7, 2017

Yazhen (at UW-Madison) 1 / 40

slide-2
SLIDE 2

Outline

  • Statistical learning with quantum annealing
  • Statistical analysis of quantum computing data

Yazhen (at UW-Madison) 2 / 40

slide-3
SLIDE 3

Statistics and Optimization

MLE/M-estimation, Non-parametric smoothing, · · ·

  • Stochastic optimization problem:

    min_θ L(θ; Xn) = (1/n) ∑_{i=1}^{n} ℓ(θ; Xi)

  • Minimization solution gives an estimator or a classifier.

Examples: ℓ(θ; Xi) = −log pdf (MLE); squared residual; loss + penalty

Yazhen (at UW-Madison) 3 / 40

slide-4
SLIDE 4

Statistics and Optimization

MLE/M-estimation, Non-parametric smoothing, · · ·

  • Stochastic optimization problem:

    min_θ L(θ; Xn) = (1/n) ∑_{i=1}^{n} ℓ(θ; Xi)

  • Minimization solution gives an estimator or a classifier.

Examples: ℓ(θ; Xi) = −log pdf (MLE); squared residual; loss + penalty

Take g(θ) = E[L(θ; Xn)] = E[ℓ(θ; X1)]

  • Optimization problem:  min_θ g(θ)

  • Minimization solution defines a true parameter value.

Yazhen (at UW-Madison) 3 / 40

slide-5
SLIDE 5

Statistics and Optimization

MLE/M-estimation, Non-parametric smoothing, · · ·

  • Stochastic optimization problem:

    min_θ L(θ; Xn) = (1/n) ∑_{i=1}^{n} ℓ(θ; Xi)

  • Minimization solution gives an estimator or a classifier.

Examples: ℓ(θ; Xi) = −log pdf (MLE); squared residual; loss + penalty

Take g(θ) = E[L(θ; Xn)] = E[ℓ(θ; X1)]

  • Optimization problem:  min_θ g(θ)

  • Minimization solution defines a true parameter value.

Goals: Use data Xn to do the following

(i) Evaluate estimators/classifiers (minimization solutions): Computing
(ii) Statistical study of estimators/classifiers: Inference
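A concrete (made-up) illustration of ℓ and L for a least-squares problem; the data-generating model and constants below are placeholders, not from the slides:

```python
import numpy as np

# Hypothetical illustration: squared-residual loss l(theta; X_i) and its
# empirical risk L(theta; X_n) = (1/n) sum_i l(theta; X_i).
rng = np.random.default_rng(0)
n = 1000
theta_true = np.array([1.0, -2.0])
U = rng.normal(size=(n, 2))
V = U @ theta_true + rng.normal(scale=0.5, size=n)      # data X_i = (U_i, V_i)

def ell(theta, Ui, Vi):
    return (Vi - Ui @ theta) ** 2                        # squared residual

def L(theta):
    return np.mean([ell(theta, U[i], V[i]) for i in range(n)])

# Minimizing L over theta gives the least-squares estimator;
# minimizing g(theta) = E[ell(theta; X_1)] recovers the true parameter value.
print(L(theta_true), L(np.zeros(2)))
```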

Yazhen (at UW-Madison) 3 / 40

slide-6
SLIDE 6

Computer Power Demand

Yazhen (at UW-Madison) 4 / 40

slide-7
SLIDE 7

Computer Power Demand

BIG DATA

Yazhen (at UW-Madison) 4 / 40

slide-8
SLIDE 8

Computer Power Demand

BIG DATA

Scientific Studies and Computational Applications

Yazhen (at UW-Madison) 4 / 40

slide-9
SLIDE 9

Learning examples

Machine learning and compressed sensing

  • Matrix completion, matrix factorization, tensor decomposition,

phase retrieval, neural network.

Yazhen (at UW-Madison) 5 / 40

slide-10
SLIDE 10

Learning examples

Machine learning and compressed sensing

  • Matrix completion, matrix factorization, tensor decomposition,

phase retrieval, neural network.

History

Yazhen (at UW-Madison) 5 / 40

slide-11
SLIDE 11

Learning examples

Machine learning and compressed sensing

  • Matrix completion, matrix factorization, tensor decomposition,

phase retrieval, neural network.

History Dog vs cat

Yazhen (at UW-Madison) 5 / 40

slide-12
SLIDE 12

Learning examples

Machine learning and compressed sensing

  • Matrix completion, matrix factorization, tensor decomposition,

phase retrieval, neural network.

Neural network: Layers in a chain structure

Each layer is a function of the layer preceding it. Layer j: hj = gj(aj hj−1 + bj), where (aj, bj) = weights and gj = activation function (sigmoid, softmax, or rectifier)

History Dog vs cat
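A minimal sketch of the layer recursion hj = gj(aj hj−1 + bj), with sigmoid activations; the dimensions and weights are made-up placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(h0, weights, biases):
    """Chain of layers: h_j = g_j(a_j h_{j-1} + b_j), with a sigmoid g_j here."""
    h = h0
    for a, b in zip(weights, biases):
        h = sigmoid(a @ h + b)
    return h

# Made-up dimensions (3 -> 4 -> 2) and random weights, purely for illustration.
rng = np.random.default_rng(1)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward(np.array([0.5, -1.0, 2.0]), weights, biases))
```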

Yazhen (at UW-Madison) 5 / 40

slide-13
SLIDE 13

Gradient Descent Algorithms: Solve minθ g(θ)

Gradient descent algorithm

  • Start at initial value x0,

xk = xk−1 − δ∇g(xk−1), δ = learning rate, ∇ = derivative operator

Yazhen (at UW-Madison) 6 / 40

slide-14
SLIDE 14

Gradient Descent Algorithms: Solve minθ g(θ)

Gradient descent algorithm

  • Start at initial value x0,

xk = xk−1 − δ∇g(xk−1), δ = learning rate, ∇ = derivative operator

Accelerated Gradient descent algorithm (Nesterov)

  • Start at initial values x0 and y0 = x0,

xk = yk−1 − δ∇g(yk−1),   yk = xk + ((k − 1)/(k + 2)) (xk − xk−1)

Yazhen (at UW-Madison) 6 / 40

slide-15
SLIDE 15

Gradient Descent Algorithms: Solve minθ g(θ)

Gradient descent algorithm

  • Start at initial value x0,

xk = xk−1 − δ∇g(xk−1), δ = learning rate, ∇ = derivative operator

Continuous curve Xt to approximate discrete {xk : k ≥ 0}

Differential equation: Ẋt + ∇g(Xt) = 0,   Ẋt = dXt/dt

Accelerated Gradient descent algorithm (Nesterov)

  • Start at initial values x0 and y0 = x0,

xk = yk−1 − δ∇g(yk−1),   yk = xk + ((k − 1)/(k + 2)) (xk − xk−1)

Yazhen (at UW-Madison) 6 / 40

slide-16
SLIDE 16

Gradient Descent Algorithms: Solve minθ g(θ)

Gradient descent algorithm

  • Start at initial value x0,

xk = xk−1 − δ∇g(xk−1), δ = learning rate, ∇ = derivative operator

Continuous curve Xt to approximate discrete {xk : k ≥ 0}

Differential equation: Ẋt + ∇g(Xt) = 0,   Ẋt = dXt/dt

Accelerated Gradient descent algorithm (Nesterov)

  • Start at initial values x0 and y0 = x0,

xk = yk−1 − δ∇g(yk−1),   yk = xk + ((k − 1)/(k + 2)) (xk − xk−1)

Continuous curve Xt to approximate discrete {xk : k ≥ 0}

Differential equation: Ẍt + (3/t) Ẋt + ∇g(Xt) = 0,   Ẍt = d²Xt/dt²

Yazhen (at UW-Madison) 6 / 40

slide-17
SLIDE 17

Gradient Descent Algorithms: Solve minθ g(θ)

Gradient descent algorithm

  • Start at initial value x0,

xk = xk−1 − δ∇g(xk−1), δ = learning rate, ∇ = derivative operator

Continuous curve Xt to approximate discrete {xk : k ≥ 0}

Differential equation: Ẋt + ∇g(Xt) = 0,   Ẋt = dXt/dt

Convergence to the minimization solution at rate 1/k or 1/t as t, k → ∞ (plain gradient descent). For the accelerated case: rate = 1/k² or 1/t².

Accelerated Gradient descent algorithm (Nesterov)

  • Start at initial values x0 and y0 = x0,

xk = yk−1 − δ∇g(yk−1),   yk = xk + ((k − 1)/(k + 2)) (xk − xk−1)

Continuous curve Xt to approximate discrete {xk : k ≥ 0}

Differential equation: Ẍt + (3/t) Ẋt + ∇g(Xt) = 0,   Ẍt = d²Xt/dt²
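A minimal Python sketch of both updates on a generic smooth g; the quadratic test function, step size, and iteration count are made-up placeholders:

```python
import numpy as np

def gradient_descent(grad, x0, delta=0.1, iters=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - delta * grad(x)                    # x_k = x_{k-1} - delta * grad g(x_{k-1})
    return x

def nesterov(grad, x0, delta=0.1, iters=200):
    x_prev = np.asarray(x0, dtype=float)           # x_{k-1}
    y = x_prev.copy()                              # y_0 = x_0
    for k in range(1, iters + 1):
        x = y - delta * grad(y)                    # x_k = y_{k-1} - delta * grad g(y_{k-1})
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # y_k = x_k + ((k-1)/(k+2))(x_k - x_{k-1})
        x_prev = x
    return x

# Example on a made-up quadratic g(x) = 0.5 x'Ax whose minimizer is the origin.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
grad_g = lambda x: A @ x
print(gradient_descent(grad_g, [5.0, -3.0]), nesterov(grad_g, [5.0, -3.0]))
```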

Yazhen (at UW-Madison) 6 / 40

slide-18
SLIDE 18

Stochastic Gradient Descent

Stochastic optimization: minθ L(θ; Xn), Xn = (X1, · · · , Xn)

  • Gradient descent algorithm to compute xk iteratively

xk = xk−1 − δ∇L(xk−1; Xn),   ∇L(θ; Xn) = (1/n) ∑_{i=1}^{n} ∇ℓ(θ; Xi)

Yazhen (at UW-Madison) 7 / 40

slide-19
SLIDE 19

Stochastic Gradient Descent

Stochastic optimization: minθ L(θ; Xn), Xn = (X1, · · · , Xn)

  • Gradient descent algorithm to compute xk iteratively

xk = xk−1 − δ∇L(xk−1; Xn),   ∇L(θ; Xn) = (1/n) ∑_{i=1}^{n} ∇ℓ(θ; Xi)

BigData: expensive to evaluate all ∇ℓ(θ; Xi) at each iteration

  • Replace ∇L(θ; Xn) by

    ∇L̂m(θ; X*m) = (1/m) ∑_{j=1}^{m} ∇ℓ(θ; X*j),   m ≪ n,

    X*m = (X*1, · · · , X*m) = subsample of Xn (minibatch or bootstrap sample).

Yazhen (at UW-Madison) 7 / 40

slide-20
SLIDE 20

Stochastic Gradient Descent

Stochastic optimization: minθ L(θ; Xn), Xn = (X1, · · · , Xn)

  • Gradient descent algorithm to compute xk iteratively

xk = xk−1 − δ∇L(xk−1; Xn),   ∇L(θ; Xn) = (1/n) ∑_{i=1}^{n} ∇ℓ(θ; Xi)

BigData: expensive to evaluate all ∇ℓ(θ; Xi) at each iteration

  • Replace ∇L(θ; Xn) by

    ∇L̂m(θ; X*m) = (1/m) ∑_{j=1}^{m} ∇ℓ(θ; X*j),   m ≪ n,

    X*m = (X*1, · · · , X*m) = subsample of Xn (minibatch or bootstrap sample).

Stochastic gradient descent algorithm

    x*k = x*k−1 − δ∇L̂m(x*k−1; X*m)

Yazhen (at UW-Madison) 7 / 40

slide-21
SLIDE 21

Stochastic Gradient Descent

Stochastic optimization: minθ L(θ; Xn), Xn = (X1, · · · , Xn)

  • Gradient descent algorithm to compute xk iteratively

xk = xk−1 − δ∇L(xk−1; Xn),   ∇L(θ; Xn) = (1/n) ∑_{i=1}^{n} ∇ℓ(θ; Xi)

BigData: expensive to evaluate all ∇ℓ(θ; Xi) at each iteration

  • Replace ∇L(θ; Xn) by

    ∇L̂m(θ; X*m) = (1/m) ∑_{j=1}^{m} ∇ℓ(θ; X*j),   m ≪ n,

    X*m = (X*1, · · · , X*m) = subsample of Xn (minibatch or bootstrap sample).

Stochastic gradient descent algorithm

    x*k = x*k−1 − δ∇L̂m(x*k−1; X*m)

Continuous curve X*t to approximate the discrete {x*k : k ≥ 0}

X*t obeys a stochastic differential equation.
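A minimal sketch of the minibatch update x*k = x*k−1 − δ∇L̂m(x*k−1; X*m); the least-squares loss, data, and constants below are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, delta = 5000, 2, 50, 0.05                 # placeholder sample sizes and step size
theta_true = np.array([2.0, -1.0])
U = rng.normal(size=(n, d))
V = U @ theta_true + rng.normal(size=n)            # made-up regression data

def grad_ell(theta, Ui, Vi):
    return -2.0 * (Vi - Ui @ theta) * Ui           # grad of l(theta; X_i) = (V_i - U_i theta)^2

x = np.zeros(d)                                    # x*_0
for k in range(1000):
    idx = rng.choice(n, size=m, replace=True)      # minibatch / bootstrap subsample X*_m
    g_hat = np.mean([grad_ell(x, U[i], V[i]) for i in idx], axis=0)   # grad Lhat_m
    x = x - delta * g_hat                          # x*_k = x*_{k-1} - delta * grad Lhat_m(x*_{k-1})
print(x)                                           # close to theta_true
```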

Yazhen (at UW-Madison) 7 / 40

slide-22
SLIDE 22

Gradient Descent vs Stochastic Gradient Descent

Gradient Descent

Yazhen (at UW-Madison) 8 / 40

slide-23
SLIDE 23

Gradient Descent vs Stochastic Gradient Descent

Gradient Descent Stochastic gradient descent

Yazhen (at UW-Madison) 8 / 40

slide-24
SLIDE 24

Statistical Analysis of Gradient Descent (Wang, 2017)

Continuous curve model

Stochastic differential equation:  dX*t + ∇g(X*t) dt + σ(X*t) dWt = 0,   Wt = Brownian motion

For the accelerated case: a second-order stochastic differential equation

Yazhen (at UW-Madison) 9 / 40

slide-25
SLIDE 25

Statistical Analysis of Gradient Descent (Wang, 2017)

Continuous curve model

Stochastic differential equation:  dX*t + ∇g(X*t) dt + σ(X*t) dWt = 0,   Wt = Brownian motion

For the accelerated case: a second-order stochastic differential equation

Study the SGD iterates and their asymptotic distribution as m, n → ∞ via stochastic differential equations

Yazhen (at UW-Madison) 9 / 40

slide-26
SLIDE 26

Statistical Analysis of Gradient Descent (Wang, 2017)

Continuous curve model

Stochastic differential equation:  dX*t + ∇g(X*t) dt + σ(X*t) dWt = 0,   Wt = Brownian motion

For the accelerated case: a second-order stochastic differential equation

Study the SGD iterates and their asymptotic distribution as m, n → ∞ via stochastic differential equations

Example: Xi = (Ui, Vi), i = 1, · · · , n = 10000

Vi = Ui θ + εi,   Ui ∼ i.i.d. bivariate N(0, Σ),   εi ∼ i.i.d. N(0, τ²)

ℓ(θ; Xi) = (Vi − Ui θ)²,   m = 200,   true θ = (0, 0).

Yazhen (at UW-Madison) 9 / 40

slide-27
SLIDE 27

Statistical Analysis of Gradient Descent (Wang, 2017)

Continuous curve model

Stochastic differential equation:  dX*t + ∇g(X*t) dt + σ(X*t) dWt = 0,   Wt = Brownian motion

For the accelerated case: a second-order stochastic differential equation

Study the SGD iterates and their asymptotic distribution as m, n → ∞ via stochastic differential equations

Example: Xi = (Ui, Vi), i = 1, · · · , n = 10000

Vi = Ui θ + εi,   Ui ∼ i.i.d. bivariate N(0, Σ),   εi ∼ i.i.d. N(0, τ²)

ℓ(θ; Xi) = (Vi − Ui θ)²,   m = 200,   true θ = (0, 0).
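A sketch of an Euler-Maruyama discretization of the stochastic differential equation above, specialized to the g of this regression example; the covariance Σ, the constant noise level σ (in place of the state-dependent σ(X*t)), and the step size are illustrative assumptions:

```python
import numpy as np

# Euler-Maruyama for dX*_t + grad g(X*_t) dt + sigma dW_t = 0:
#   X_{t+dt} = X_t - grad g(X_t) dt - sigma * sqrt(dt) * Z,  Z ~ N(0, I).
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])        # covariance of U_i (made up)
grad_g = lambda x: 2.0 * Sigma @ x                # g(theta) = E[(V - U theta)^2] up to a constant
sigma, dt, horizon = 0.2, 0.01, 10.0
X = np.array([3.0, -3.0])
for _ in range(int(horizon / dt)):
    X = X - grad_g(X) * dt - sigma * np.sqrt(dt) * rng.normal(size=2)
print(X)                                          # hovers near the true theta = (0, 0)
```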

Yazhen (at UW-Madison) 9 / 40

slide-28
SLIDE 28

Deep Learning

Boltzmann Machine (BM) on graph G = (V, E)

  • P(s) = exp[−E(s)] / Z,   Z = ∑_s exp[−E(s)]

  • Energy  E(s) = −∑_{(i,j)∈E} Wij si sj − ∑_{i∈V} bi si,   s = (s1, · · · , s|V|) ∈ {−1, 1}^{|V|}

Yazhen (at UW-Madison) 10 / 40

slide-29
SLIDE 29

Deep Learning

Boltzmann Machine (BM) on graph G = (V, E)

  • P(s) = exp[−E(s)] / Z,   Z = ∑_s exp[−E(s)]

  • Energy  E(s) = −∑_{(i,j)∈E} Wij si sj − ∑_{i∈V} bi si,   s = (s1, · · · , s|V|) ∈ {−1, 1}^{|V|}

Take s = (v, h)

v = (v1, · · · , vn): visible nodes (observed variables)
h = (h1, · · · , hm): hidden nodes (latent variables)

Boltzmann distribution models data v:  P(v) = ∑_h P(v, h)

Yazhen (at UW-Madison) 10 / 40

slide-30
SLIDE 30

Deep Learning

Boltzmann Machine (BM) on graph G = (V, E)

  • P(s) = exp[−E(s)] / Z,   Z = ∑_s exp[−E(s)]

  • Energy  E(s) = −∑_{(i,j)∈E} Wij si sj − ∑_{i∈V} bi si,   s = (s1, · · · , s|V|) ∈ {−1, 1}^{|V|}

Take s = (v, h)

v = (v1, · · · , vn): visible nodes (observed variables)
h = (h1, · · · , hm): hidden nodes (latent variables)

Boltzmann distribution models data v:  P(v) = ∑_h P(v, h)

Learning

Use training data v to learn model parameters Wij & bi.
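A brute-force sketch of P(s) = exp[−E(s)]/Z on a tiny made-up graph (three ±1 spins), enumerating all states to compute Z; the graph and weights are placeholders:

```python
import itertools
import numpy as np

# Tiny made-up graph: 3 nodes, edges (0,1) and (1,2), illustrative weights.
W = {(0, 1): 0.8, (1, 2): -0.5}
b = np.array([0.1, 0.0, -0.2])

def energy(s):
    # E(s) = - sum_{(i,j) in E} W_ij s_i s_j - sum_i b_i s_i
    return -sum(w * s[i] * s[j] for (i, j), w in W.items()) - float(b @ s)

states = [np.array(cfg) for cfg in itertools.product([-1, 1], repeat=3)]
Z = sum(np.exp(-energy(s)) for s in states)        # partition function
P = {tuple(s): float(np.exp(-energy(s)) / Z) for s in states}
print(P)                                           # Boltzmann probabilities P(s) = exp[-E(s)]/Z
```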

Yazhen (at UW-Madison) 10 / 40

slide-31
SLIDE 31

Restricted Boltzmann Machine (RBM)

Bipartite undirected graph G

Connections between hidden layer and visible layer but not within each layer

Yazhen (at UW-Madison) 11 / 40

slide-32
SLIDE 32

Restricted Boltzmann Machine (RBM)

Bipartite undirected graph G

Connections between hidden layer and visible layer but not within each layer

Model

Variables in visible layer: v = (v1, · · · , vn);  variables in hidden layer: h = (h1, · · · , hm)

P(v, h) = exp{−E(v, h)} / Z

Yazhen (at UW-Madison) 11 / 40

slide-33
SLIDE 33

Restricted Boltzmann Machine (RBM)

Bipartite undirected graph G

Connections between hidden layer and visible layer but not within each layer

Model

Variables in visible layer: v = (v1, · · · , vn);  variables in hidden layer: h = (h1, · · · , hm)

P(v, h) = exp{−E(v, h)} / Z

E(v, h) = −∑_{i=1}^{n} ∑_{j=1}^{m} wij vi hj − ∑_{i=1}^{n} bi vi − ∑_{j=1}^{m} cj hj

Yazhen (at UW-Madison) 11 / 40

slide-34
SLIDE 34

Deep Neural Network: Restricted Boltzmann Machine

Yazhen (at UW-Madison) 12 / 40

slide-35
SLIDE 35

Deep Neural Network: Restricted Boltzmann Machine

Conditional independence within each layer given the others

P(h|v) = ∏_{j=1}^{m} P(hj|v),   P(v|h) = ∏_{i=1}^{n} P(vi|h)

Yazhen (at UW-Madison) 12 / 40

slide-36
SLIDE 36

Deep Neural Network: Restricted Boltzmann Machine

Conditional independence within each layer given the others

P(h|v) = ∏_{j=1}^{m} P(hj|v),   P(v|h) = ∏_{i=1}^{n} P(vi|h)

Sigmoid activation function for forward and backward conditional probabilities: sigmoid(x) = 1/[1 + e^{−x}]

P(hj = 1|v) = sigmoid(∑_{i=1}^{n} wij vi + cj),   P(vi = 1|h) = sigmoid(∑_{j=1}^{m} wij hj + bi)
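A minimal sketch of one Gibbs pass v → h → v using these conditionals; the layer sizes and weights are made-up placeholders, and binary 0/1 units are assumed (the coding under which the sigmoid formulas hold):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3                                # made-up layer sizes
W = rng.normal(scale=0.1, size=(n_vis, n_hid))     # weights w_ij
b, c = np.zeros(n_vis), np.zeros(n_hid)            # visible and hidden biases

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v):                           # P(h_j = 1 | v) = sigmoid(sum_i w_ij v_i + c_j)
    p = sigmoid(v @ W + c)
    return (rng.random(n_hid) < p).astype(float), p

def sample_v_given_h(h):                           # P(v_i = 1 | h) = sigmoid(sum_j w_ij h_j + b_i)
    p = sigmoid(W @ h + b)
    return (rng.random(n_vis) < p).astype(float), p

v0 = rng.integers(0, 2, size=n_vis).astype(float)  # a made-up binary visible vector
h0, _ = sample_h_given_v(v0)
v1, _ = sample_v_given_h(h0)                       # one Gibbs pass v -> h -> v
print(v0, h0, v1)
```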

Yazhen (at UW-Madison) 12 / 40

slide-37
SLIDE 37

Deep Learning

Gradient ascent/descent to compute model parameters wij, bi and cj.

Yazhen (at UW-Madison) 13 / 40

slide-38
SLIDE 38

Deep Learning

Gradient ascent/descent to compute model parameters wij, bi and cj.

Parameter updates with learning rate η

wij^(t+1) = wij^(t) + η ∂log P/∂wij,   bi^(t+1) = bi^(t) + η ∂log P/∂bi,   cj^(t+1) = cj^(t) + η ∂log P/∂cj

Yazhen (at UW-Madison) 13 / 40

slide-39
SLIDE 39

Deep Learning

Gradient ascent/descent to compute model parameters wij, bi and cj.

Gradient

∂log P/∂wij = ⟨vi hj⟩data − ⟨vi hj⟩model,   ∂log P/∂bi = ⟨vi⟩data − ⟨vi⟩model,   ∂log P/∂cj = ⟨hj⟩data − ⟨hj⟩model

  • ⟨vi hj⟩data: the clamped expectation with v fixed

Bottleneck: ⟨vi hj⟩model = ∑_{v,h} vi hj P(v, h)

Parameter updates with learning rate η

wij^(t+1) = wij^(t) + η ∂log P/∂wij,   bi^(t+1) = bi^(t) + η ∂log P/∂bi,   cj^(t+1) = cj^(t) + η ∂log P/∂cj
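A sketch of the parameter update, with the intractable model expectation ⟨vi hj⟩model replaced by a one-step Gibbs estimate (a contrastive-divergence-style approximation that is not stated on the slide); sizes, data, and learning rate are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vis, n_hid, eta = 6, 3, 0.05                     # made-up sizes and learning rate
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

V_data = rng.integers(0, 2, size=(200, n_vis)).astype(float)   # made-up binary training data

for v in V_data:
    p_h = sigmoid(v @ W + c)                       # clamped phase: <.>_data with v fixed
    h = (rng.random(n_hid) < p_h).astype(float)
    v_neg = (rng.random(n_vis) < sigmoid(W @ h + b)).astype(float)  # one Gibbs step
    p_h_neg = sigmoid(v_neg @ W + c)               # crude stand-in for the model expectation
    W += eta * (np.outer(v, p_h) - np.outer(v_neg, p_h_neg))  # <v h>_data - <v h>_model
    b += eta * (v - v_neg)                         # <v>_data - <v>_model
    c += eta * (p_h - p_h_neg)                     # <h>_data - <h>_model
print(W.round(3))
```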

Yazhen (at UW-Madison) 13 / 40

slide-40
SLIDE 40

Markov Chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm/Gibbs sampler

Sample from the Boltzmann distribution

P(s) = exp[−HIsing(s)/T] / ZT,   ZT = ∑_s exp[−HIsing(s)/T],   T = temperature

Yazhen (at UW-Madison) 14 / 40

slide-41
SLIDE 41

Markov Chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm/Gibbs sampler

Sample from the Boltzmann distribution

P(s) = exp[−HIsing(s)/T] / ZT,   ZT = ∑_s exp[−HIsing(s)/T],   T = temperature

Simulated annealing: Thermal Fluctuation

Slowly lower the temperature to reduce the chance of getting trapped in local minima. Annealing schedule: Ti ∝ 1/(i + 1) or 1/log(i + 1)

Yazhen (at UW-Madison) 14 / 40

slide-42
SLIDE 42

Markov Chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm/Gibbs sampler

Sample from the Boltzmann distribution

P(s) = exp[−HIsing(s)/T] / ZT,   ZT = ∑_s exp[−HIsing(s)/T],   T = temperature

Simulated annealing: Thermal Fluctuation

Slowly lower the temperature to reduce the chance of getting trapped in local minima. Annealing schedule: Ti ∝ 1/(i + 1) or 1/log(i + 1)

BigData

Issues: not easy to parallelize; very hard to scale up!
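A minimal Metropolis / simulated-annealing sketch on a made-up Ising instance; the couplings, size, and iteration count are placeholders, and the 1/log schedule is shifted to avoid log 1 = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12
J = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), 1)   # made-up couplings J_ij, i < j

def H_ising(s):
    return -float(s @ J @ s)                            # H_Ising(s) = -sum_{i<j} J_ij s_i s_j

s = rng.choice([-1, 1], size=N)
for i in range(20000):
    T = 1.0 / np.log(i + 2)                             # schedule T_i ~ 1/log(i+1), shifted
    k = rng.integers(N)
    s_prop = s.copy()
    s_prop[k] = -s_prop[k]                              # propose a single spin flip
    dE = H_ising(s_prop) - H_ising(s)
    if dE <= 0 or rng.random() < np.exp(-dE / T):       # Metropolis acceptance
        s = s_prop
print(H_ising(s), s)
```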

Yazhen (at UW-Madison) 14 / 40

slide-43
SLIDE 43

Quantum Annealing (QA): Basic Idea

Classical optimization: Min{HIsing(s) : s ∈ {−1, 1}^N}

Yazhen (at UW-Madison) 15 / 40

slide-44
SLIDE 44

Quantum Annealing (QA): Basic Idea

Classical optimization: Min{HIsing(s) : s ∈ {−1, 1}^N}

Find a target quantum system with Hamiltonian H(1) whose energies match HIsing(s):
H(1) = diag{HIsing(s^1), · · · , HIsing(s^{2^N})}.

Yazhen (at UW-Madison) 15 / 40

slide-45
SLIDE 45

Quantum Annealing (QA): Basic Idea

Classical optimization: Min{HIsing(s) : s ∈ {−1, 1}^N}

Find a target quantum system with Hamiltonian H(1) whose energies match HIsing(s):
H(1) = diag{HIsing(s^1), · · · , HIsing(s^{2^N})}.

Create an initial quantum system with Hamiltonian H(0)

whose lowest energy state is known and easy to prepare.

Yazhen (at UW-Madison) 15 / 40

slide-46
SLIDE 46

Quantum Annealing (QA): Basic Idea

Classical optimization: Min{HIsing(s) : s ∈ {−1, 1}^N}

Find a target quantum system with Hamiltonian H(1) whose energies match HIsing(s):
H(1) = diag{HIsing(s^1), · · · , HIsing(s^{2^N})}.

Create an initial quantum system with Hamiltonian H(0)

whose lowest energy state is known and easy to prepare.

QA: Engineer H(0) in its lowest energy state and gradually move H(0) → H(1)

[Figure: energy states vs. time during the anneal, interpolating from H(0) to H(1) while following the ground state]

Yazhen (at UW-Madison) 15 / 40
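A toy numerical illustration (not from the slides) of this interpolation for two qubits: H(0) is a transverse field, H(1) is a made-up Ising diagonal, and a linear schedule A(t) = 1 − t, B(t) = t is assumed:

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
# H(0): transverse field on 2 qubits (ground state easy to prepare)
H0 = -(np.kron(sx, I2) + np.kron(I2, sx))
# H(1): diagonal of Ising energies H_Ising(s) over the 4 configurations (made up)
H1 = np.diag([1.0, -1.0, -1.0, 2.0])

for t in np.linspace(0.0, 1.0, 6):
    H = (1.0 - t) * H0 + t * H1                   # assumed linear schedule A(t)=1-t, B(t)=t
    evals = np.linalg.eigvalsh(H)
    print(f"t={t:.1f}  energies={np.round(evals, 3)}")
# At t=1 the lowest energy equals min_s H_Ising(s); QA tries to stay in the ground state.
```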

slide-47
SLIDE 47

Simulated Quantum Annealing (SQA)

Spin glass in transverse field

H = A(t)HX + B(t)HIsing, two parts non-commuting

Yazhen (at UW-Madison) 18 / 40

slide-48
SLIDE 48

Simulated Quantum Annealing (SQA)

Spin glass in transverse field

H = A(t)HX + B(t)HIsing, two parts non-commuting

Path integral representation via Suzuki-Trotter expansion

H ≈ H2+1 = classical (2+1)-dimensional anisotropic Ising system

Yazhen (at UW-Madison) 18 / 40

slide-49
SLIDE 49

Simulated Quantum Annealing (SQA)

Spin glass in transverse field

H = A(t)HX + B(t)HIsing, two parts non-commuting

Path integral representation via Suzuki-Trotter expansion

H ≈ H2+1 = classical (2+1)-dimensional anisotropic Ising system

(2 + 1)-dimensional system

Two directions: along the original 2-dimensional direction spins have Chimera graph couplings, and along the extra (imaginary-time) direction spins have uniform couplings

Yazhen (at UW-Madison) 18 / 40

slide-50
SLIDE 50

Simulated Quantum Annealing (SQA)

Spin glass in transverse field

H = A(t)HX + B(t)HIsing, two parts non-commuting

Path integral representation via Suzuki-Trotter expansion

H ≈ H2+1 = classical (2+1)-dimensional anisotropic Ising system

(2 + 1)-dimensional system

Two directions: along the original 2-dimensional direction spins have Chimera graph couplings, and along the extra (imaginary-time) direction spins have uniform couplings

Quantum Monte Carlo

H2+1: a collection of 2-dimensional classical Ising systems that can be simulated by MCMC with moves in both directions

Yazhen (at UW-Madison) 18 / 40

slide-51
SLIDE 51

SSSV Annealing Model

Magnet i points in a direction at angle θi w.r.t. the z-axis in the xz plane; an external magnetic field with intensity A(t) points along the x-axis. With Jij = coupling of magnets i and j, the Hamiltonian is

H(t) = −A(t) ∑_{i=1}^{N} sin θi − B(t) ∑_{1≤i<j≤N} Jij cos θi cos θj

Yazhen (at UW-Madison) 22 / 40

slide-52
SLIDE 52

SSSV Annealing Model

Magnet i points in a direction at angle θi w.r.t. the z-axis in the xz plane; an external magnetic field with intensity A(t) points along the x-axis. With Jij = coupling of magnets i and j, the Hamiltonian is

H(t) = −A(t) ∑_{i=1}^{N} sin θi − B(t) ∑_{1≤i<j≤N} Jij cos θi cos θj

The model can be simulated by the Metropolis algorithm with temperature T = 0.22 and initial condition θi = π/2.

Yazhen (at UW-Madison) 22 / 40

slide-53
SLIDE 53

SSSV Annealing Model

Magnet i points in a direction at angle θi w.r.t. the z-axis in the xz plane; an external magnetic field with intensity A(t) points along the x-axis. With Jij = coupling of magnets i and j, the Hamiltonian is

H(t) = −A(t) ∑_{i=1}^{N} sin θi − B(t) ∑_{1≤i<j≤N} Jij cos θi cos θj

The model can be simulated by the Metropolis algorithm with temperature T = 0.22 and initial condition θi = π/2.

Interpretation: angle θi is read as state |↑⟩ (= +1) or state |↓⟩ (= −1) according to the sign of cos θi (its projection on the z direction).

Yazhen (at UW-Madison) 22 / 40

slide-54
SLIDE 54

SSSV Annealing Model

Magnet i points in a direction at angle θi w.r.t. the z-axis in the xz plane; an external magnetic field with intensity A(t) points along the x-axis. With Jij = coupling of magnets i and j, the Hamiltonian is

H(t) = −A(t) ∑_{i=1}^{N} sin θi − B(t) ∑_{1≤i<j≤N} Jij cos θi cos θj

The model can be simulated by the Metropolis algorithm with temperature T = 0.22 and initial condition θi = π/2.

Interpretation: angle θi is read as state |↑⟩ (= +1) or state |↓⟩ (= −1) according to the sign of cos θi (its projection on the z direction).

Use the converted states to evaluate HIsing(s) and find its minimizer.
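A minimal Metropolis sketch of the SSSV dynamics; T = 0.22 and the initial angles θi = π/2 follow the slide, while the couplings Jij, the schedules A(t), B(t), the proposal distribution, and the sweep count are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, n_sweeps = 8, 0.22, 2000
J = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), 1)    # made-up couplings
A = lambda t: 1.0 - t                                    # assumed annealing schedules
B = lambda t: t

def H(theta, t):
    c = np.cos(theta)
    return -A(t) * np.sum(np.sin(theta)) - B(t) * float(c @ J @ c)

theta = np.full(N, np.pi / 2)                            # initial condition theta_i = pi/2
for sweep in range(n_sweeps):
    t = sweep / (n_sweeps - 1)
    for i in range(N):
        prop = theta.copy()
        prop[i] = rng.uniform(0.0, np.pi)                # propose a new angle for magnet i
        dE = H(prop, t) - H(theta, t)
        if dE <= 0 or rng.random() < np.exp(-dE / T):    # Metropolis acceptance at T = 0.22
            theta = prop

s = np.where(np.cos(theta) >= 0, 1, -1)                  # project angles to spins +/-1
print(s)                                                 # evaluate H_Ising(s) on these states
```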

Yazhen (at UW-Madison) 22 / 40

slide-55
SLIDE 55

DW Signal vs Background Noise

[Figure: background noise in percentage vs. problem instance; series s0, s10, ..., s99]

Yazhen (at UW-Madison) 23 / 40

slide-56
SLIDE 56

DW Signal vs Background Noise

[Figure: background noise in percentage vs. energy level]

Yazhen (at UW-Madison) 24 / 40

slide-57
SLIDE 57

Correlation of Ground State Success Probability Data

[Figure: scatter plots of ground-state success probabilities: (a) SA vs DW, (b) SQA vs DW, (c) SSSV vs DW, (d) SSSV vs SQA]

Yazhen (at UW-Madison) 25 / 40

slide-58
SLIDE 58

Multiple Statistical Tests

For the r-th instance, repeat the annealing m times; let p̂0rm be the DW success frequency out of m repetitions and q̂ℓrm, ℓ = 1, 2, 3, the success frequencies for SA, SQA & SSSV.

Yazhen (at UW-Madison) 26 / 40

slide-59
SLIDE 59

Multiple Statistical Tests

For the r-th instance, repeat the annealing m times; let p̂0rm be the DW success frequency out of m repetitions and q̂ℓrm, ℓ = 1, 2, 3, the success frequencies for SA, SQA & SSSV.

H0r : p0r∞ = qℓr∞   vs   Har : p0r∞ ≠ qℓr∞

Trℓ = m (p̂r − q̂ℓ,r)² / [p̂r(1 − p̂r) + q̂ℓ,r(1 − q̂ℓ,r)]

Yazhen (at UW-Madison) 26 / 40

slide-60
SLIDE 60

Multiple Statistical Tests

For the r-th instance, repeat the annealing m times; let p̂0rm be the DW success frequency out of m repetitions and q̂ℓrm, ℓ = 1, 2, 3, the success frequencies for SA, SQA & SSSV.

H0r : p0r∞ = qℓr∞   vs   Har : p0r∞ ≠ qℓr∞

Trℓ = m (p̂r − q̂ℓ,r)² / [p̂r(1 − p̂r) + q̂ℓ,r(1 − q̂ℓ,r)]

T*rℓ = 2m [arcsin(√p̂r) − arcsin(√q̂ℓ,r)]²

Yazhen (at UW-Madison) 26 / 40

slide-61
SLIDE 61

Multiple Statistical Tests

For the r-th instance, repeat the annealing m times; let p̂0rm be the DW success frequency out of m repetitions and q̂ℓrm, ℓ = 1, 2, 3, the success frequencies for SA, SQA & SSSV.

H0r : p0r∞ = qℓr∞   vs   Har : p0r∞ ≠ qℓr∞

Trℓ = m (p̂r − q̂ℓ,r)² / [p̂r(1 − p̂r) + q̂ℓ,r(1 − q̂ℓ,r)]

T*rℓ = 2m [arcsin(√p̂r) − arcsin(√q̂ℓ,r)]²

Asymptotic distribution under H0r

As m, n → ∞, if (log n)/m → 0, then Trℓ → χ²₁ and T*rℓ → χ²₁, uniformly over r = 1, · · · , n

Yazhen (at UW-Madison) 26 / 40

slide-62
SLIDE 62

Multiple Statistical Tests

For the r-th instance, repeat the annealing m times; let p̂0rm be the DW success frequency out of m repetitions and q̂ℓrm, ℓ = 1, 2, 3, the success frequencies for SA, SQA & SSSV.

H0r : p0r∞ = qℓr∞   vs   Har : p0r∞ ≠ qℓr∞

Trℓ = m (p̂r − q̂ℓ,r)² / [p̂r(1 − p̂r) + q̂ℓ,r(1 − q̂ℓ,r)]

T*rℓ = 2m [arcsin(√p̂r) − arcsin(√q̂ℓ,r)]²

Asymptotic distribution under H0r

As m, n → ∞, if (log n)/m → 0, then Trℓ → χ²₁ and T*rℓ → χ²₁, uniformly over r = 1, · · · , n

p-values & FDR

H0r vs Har:  p-value = P(χ²₁ ≥ Trℓ)  or  p-value = P(χ²₁ ≥ T*rℓ)
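A sketch of the two statistics, their χ²₁ p-values, and a Benjamini-Hochberg step (one standard FDR procedure, assumed here); scipy supplies the tail probabilities, and the success frequencies are simulated placeholders rather than the actual DW/SQA data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m = 1000, 500                                   # instances and repetitions (placeholders)
p_hat = rng.uniform(0.05, 0.95, size=n)            # DW success frequencies (simulated stand-ins)
q_hat = np.clip(p_hat + rng.normal(scale=0.03, size=n), 0.01, 0.99)  # comparison method

T = m * (p_hat - q_hat) ** 2 / (p_hat * (1 - p_hat) + q_hat * (1 - q_hat))
T_star = 2 * m * (np.arcsin(np.sqrt(p_hat)) - np.arcsin(np.sqrt(q_hat))) ** 2
pvals = stats.chi2.sf(T, df=1)                     # p-value = P(chi2_1 >= T_rl)

# Benjamini-Hochberg step-up at FDR level alpha.
alpha = 0.05
order = np.argsort(pvals)
passed = pvals[order] <= alpha * np.arange(1, n + 1) / n
n_reject = passed.nonzero()[0].max() + 1 if passed.any() else 0
print(f"{n_reject} of {n} instance-level nulls rejected at FDR {alpha}")
```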

Yazhen (at UW-Madison) 26 / 40

slide-63
SLIDE 63

Goodness-of-fit test

H0 : p0r∞ = qℓr∞ for all 1 ≤ r ≤ n   vs   Ha : p0r∞ ≠ qℓr∞ for some r

Uℓ = (2n)^{−1/2} (∑_{r=1}^{n} Trℓ − n),   U*ℓ = (2n)^{−1/2} (∑_{r=1}^{n} T*rℓ − n)

Yazhen (at UW-Madison) 27 / 40

slide-64
SLIDE 64

Goodness-of-fit test

H0 : p0r∞ = qℓr∞ for all 1 ≤ r ≤ n   vs   Ha : p0r∞ ≠ qℓr∞ for some r

Uℓ = (2n)^{−1/2} (∑_{r=1}^{n} Trℓ − n),   U*ℓ = (2n)^{−1/2} (∑_{r=1}^{n} T*rℓ − n)

Asymptotic distribution under H0 as m, n → ∞

Uℓ → N(0, 1),   U*ℓ → N(0, 1)

Conditions

(1) √n/m → 0.
(2) p0r∞ = qℓr∞ = true success probability for method ℓ on the r-th instance, bounded away from 0 and 1.

Yazhen (at UW-Madison) 27 / 40

slide-65
SLIDE 65

Goodness-of-fit test

H0 : p0r∞ = qℓr∞ for all 1 ≤ r ≤ n   vs   Ha : p0r∞ ≠ qℓr∞ for some r

Uℓ = (2n)^{−1/2} (∑_{r=1}^{n} Trℓ − n),   U*ℓ = (2n)^{−1/2} (∑_{r=1}^{n} T*rℓ − n)

Asymptotic distribution under H0 as m, n → ∞

Uℓ → N(0, 1),   U*ℓ → N(0, 1)

p-value = 2[1 − Φ(|Uℓ|)]   or   p-value = 2[1 − Φ(|U*ℓ|)]

Conditions

(1) √n/m → 0.
(2) p0r∞ = qℓr∞ = true success probability for method ℓ on the r-th instance, bounded away from 0 and 1.
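A sketch of the aggregate statistics and their normal p-values, again with simulated placeholder frequencies in place of the actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m = 1000, 500                                   # placeholders
p_hat = rng.uniform(0.05, 0.95, size=n)
q_hat = np.clip(p_hat + rng.normal(scale=0.03, size=n), 0.01, 0.99)

T = m * (p_hat - q_hat) ** 2 / (p_hat * (1 - p_hat) + q_hat * (1 - q_hat))
T_star = 2 * m * (np.arcsin(np.sqrt(p_hat)) - np.arcsin(np.sqrt(q_hat))) ** 2

U = (T.sum() - n) / np.sqrt(2 * n)                 # U_l = (2n)^{-1/2} (sum_r T_rl - n)
U_star = (T_star.sum() - n) / np.sqrt(2 * n)
print(2 * (1 - stats.norm.cdf(abs(U))),            # p-value = 2[1 - Phi(|U_l|)]
      2 * (1 - stats.norm.cdf(abs(U_star))))
```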

Yazhen (at UW-Madison) 27 / 40

slide-66
SLIDE 66

Multiple Tests: FDR

[Figure: success-probability scatter plots: (a) DW vs SA, (b) DW vs SQA, (c) DW vs SSSV, (d) SSSV vs SQA]

Yazhen (at UW-Madison) 28 / 40

slide-67
SLIDE 67

Multiple Tests: FDR

[Figure: success-probability scatter plots: (a) DW vs SA, (b) DW vs SQA, (c) DW vs SSSV, (d) SSSV vs SQA]

p-values

Yazhen (at UW-Madison) 28 / 40

slide-68
SLIDE 68

Multiple Tests: FDR

[Figure: success-probability scatter plots: (a) DW vs SA, (b) DW vs SQA, (c) DW vs SSSV, (d) SSSV vs SQA]

p-values and FDR

q-value = essentially zero

Yazhen (at UW-Madison) 28 / 40

slide-69
SLIDE 69

Goodness-of-fit test

Yazhen (at UW-Madison) 29 / 40

slide-70
SLIDE 70

Goodness-of-fit test

SQA vs DW

p-values = 0

Yazhen (at UW-Madison) 29 / 40

slide-71
SLIDE 71

Goodness-of-fit test

SQA vs DW

p-values = 0

SSSV vs DW

p-values = 0

Yazhen (at UW-Madison) 29 / 40

slide-72
SLIDE 72

Goodness-of-fit test

SA vs DW

p-values = 0

SQA vs DW

p-values = 0

SSSV vs DW

p-values = 0

Yazhen (at UW-Madison) 29 / 40

slide-73
SLIDE 73

Goodness-of-fit test

Reject null hypothesis

all p-values ≤ 3.87 × 10⁻⁶

SA vs DW

p-values = 0

SQA vs DW

p-values = 0

SSSV vs DW

p-values = 0

Yazhen (at UW-Madison) 29 / 40

slide-74
SLIDE 74

Goodness-of-fit test

Reject null hypothesis

all p-values ≤ 3.87 × 10⁻⁶

SA vs DW

p-values = 0

SQA vs DW

p-values = 0

SSSV vs DW

p-values = 0

Conclusion: Overwhelming rejection

Overwhelming evidence to reject that DW is statistically consistent with SQA or SSSV in terms of ground state success probability

Yazhen (at UW-Madison) 29 / 40

slide-75
SLIDE 75

Histogram of Ground State Success Probability Data

[Figure: histograms of ground-state success probability for (a) DW, (b) SA, (c) SQA, (d) SSSV; x-axis: success probability, y-axis: number of instances]

Yazhen (at UW-Madison) 30 / 40

slide-76
SLIDE 76

SA Histograms for different annealing times

[Figure: SA success-probability histograms with (a) 100, (b) 1000, (c) 10000, (d) 50000 sweeps]

Yazhen (at UW-Madison) 31 / 40

slide-77
SLIDE 77

SQA Histograms

Various annealing times

[Figure: SQA success-probability histograms with (a) 3000, (b) 5000, (c) 7000, (d) 10000 sweeps]

Yazhen (at UW-Madison) 32 / 40

slide-78
SLIDE 78

SQA Histograms

Various annealing times

[Figure: SQA success-probability histograms with (a) 3000, (b) 5000, (c) 7000, (d) 10000 sweeps]

Various temperatures

[Figure: SQA success-probability histograms with T = 0.1, 0.2, 0.3, 0.5, 1]

Yazhen (at UW-Madison) 32 / 40

slide-79
SLIDE 79

SSSV Histograms

Various annealing times

[Figure: SSSV success-probability histograms with (a) 5000, (b) 75000, (c) 15000, (d) 150000 sweeps]

Yazhen (at UW-Madison) 33 / 40

slide-80
SLIDE 80

SSSV Histograms

Various annealing times

[Figure: SSSV success-probability histograms with (a) 5000, (b) 75000, (c) 15000, (d) 150000 sweeps]

Various temperatures

[Figure: SSSV success-probability histograms with T = 0.1, 0.2, 0.3, 0.5, 1]

Yazhen (at UW-Madison) 33 / 40

slide-81
SLIDE 81

DIP Test for Shape Patterns

DIP(Fn) = max_{0≤p≤1} |Fn(p) − F̂n(p)|,   Fn = empirical DF,   F̂n = DF estimator under unimodality or U-shape

Under the uniform null (asymptotic least favorable) distribution, as n → ∞, √n DIP(Fn) → DIP(B), B(t) = Brownian bridge on [0, 1]

Yazhen (at UW-Madison) 34 / 40

slide-82
SLIDE 82

DIP Test for Shape Patterns

DIP(Fn) = max_{0≤p≤1} |Fn(p) − F̂n(p)|,   Fn = empirical DF,   F̂n = DF estimator under unimodality or U-shape

Under the uniform null (asymptotic least favorable) distribution, as n → ∞, √n DIP(Fn) → DIP(B), B(t) = Brownian bridge on [0, 1]

Unimodality (including monotone):  DW: no,  SA: yes,  SQA: no,  SSSV: no

Yazhen (at UW-Madison) 34 / 40

slide-83
SLIDE 83

DIP Test for Shape Patterns

DIP(Fn) = max_{0≤p≤1} |Fn(p) − F̂n(p)|,   Fn = empirical DF,   F̂n = DF estimator under unimodality or U-shape

Under the uniform null (asymptotic least favorable) distribution, as n → ∞, √n DIP(Fn) → DIP(B), B(t) = Brownian bridge on [0, 1]

Unimodality (including monotone):  DW: no,  SA: yes,  SQA: no,  SSSV: no
U-shape:  DW: no,  SA: no,  SQA: yes,  SSSV: yes

Yazhen (at UW-Madison) 34 / 40

slide-84
SLIDE 84

Histogram of Success Probability

[Figure: histograms of success probability for (a) DW, (b) SA, (c) SQA, (d) SSSV; x-axis: success probability, y-axis: number of instances]

Yazhen (at UW-Madison) 35 / 40

slide-85
SLIDE 85

Histogram of Success Probability

[Figure: histograms of success probability for (a) DW, (b) SA, (c) SQA, (d) SSSV; x-axis: success probability, y-axis: number of instances]

Yazhen (at UW-Madison) 36 / 40

slide-86
SLIDE 86

Shape Pattern Analysis by Regression

Covariates

Energy gap & Hamming distance between ground state and 1st excited state

Yazhen (at UW-Madison) 37 / 40

slide-87
SLIDE 87

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success probability vs. (c) energy gap and (d) Hamming distance]

Covariates

Energy gap & Hamming distance between ground state and 1st excited state

Yazhen (at UW-Madison) 37 / 40

slide-88
SLIDE 88

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success probability vs. (c) energy gap and (d) Hamming distance]

SSSV

[Figure: SSSV success probability vs. (e) energy gap and (f) Hamming distance]

Covariates

Energy gap & Hamming distance between ground state and 1st excited state

Yazhen (at UW-Madison) 37 / 40

slide-89
SLIDE 89

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success probability vs. (c) energy gap and (d) Hamming distance]

SA

[Figure: SA success probability vs. (a) energy gap and (b) Hamming distance]

SSSV

[Figure: SSSV success probability vs. (e) energy gap and (f) Hamming distance]

Covariates

Energy gap & Hamming distance between ground state and 1st excited state

Yazhen (at UW-Madison) 37 / 40

slide-90
SLIDE 90

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

Yazhen (at UW-Madison) 38 / 40

slide-91
SLIDE 91

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

SSSV

[Figure: SSSV success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

Yazhen (at UW-Madison) 38 / 40

slide-92
SLIDE 92

Shape Pattern Analysis by Regression

SQA

[Figure: SQA success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

SA

[Figure: SA success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

SSSV

[Figure: SSSV success-probability histograms for instances with Hamming distance < 5 and ≥ 5]

Yazhen (at UW-Madison) 38 / 40

slide-93
SLIDE 93

Concluding Remarks

Both inference and computing are important for big data.

Yazhen (at UW-Madison) 39 / 40

slide-94
SLIDE 94

Concluding Remarks

Both inference and computing are important for big data.

Interface

  • Computing for conducting statistical inference, and statistics for analyzing computational algorithms.
  • Statistics for quantum technology (e.g. quantum computing & tomography), and quantum computing for statistical computing and machine learning.

Yazhen (at UW-Madison) 39 / 40