1. Statistical Aspects of Quantum Computing
Yazhen Wang, Department of Statistics, University of Wisconsin-Madison
http://www.stat.wisc.edu/~yzwang
Near-term Applications of Quantum Computing, Fermilab, December 6-7, 2017

2. Outline
• Statistical learning with quantum annealing
• Statistical analysis of quantum computing data

3-5. Statistics and Optimization
MLE/M-estimation, non-parametric smoothing, ...
• Stochastic optimization problem: $\min_\theta L(\theta; X_n) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; X_i)$
• The minimization solution gives an estimator or a classifier.
Examples: $\ell(\theta; X_i)$ = log pdf; residual sum of squares; loss + penalty
Take $g(\theta) = E[L(\theta; X_n)] = E[\ell(\theta; X_1)]$
• Optimization problem: $\min_\theta g(\theta)$
• The minimization solution defines the true parameter value.
Goals: use the data $X_n$ to
(i) evaluate estimators/classifiers (the minimization solutions) - Computing
(ii) study estimators/classifiers statistically - Inference
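To make the estimator-versus-true-parameter distinction concrete, here is a minimal sketch (not from the slides; the data, the loss, and the grid search are illustrative choices) that minimizes the empirical loss $\frac{1}{n}\sum_i (X_i - \theta)^2$ and compares the result with the population minimizer of $g(\theta) = E[(X_1 - \theta)^2]$:

```python
# Minimal sketch (illustrative, not from the slides): the stochastic
# optimization problem min_theta (1/n) sum_i ell(theta; X_i) versus the
# population problem min_theta g(theta) = E[ell(theta; X_1)],
# with ell(theta; x) = (x - theta)^2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=1000)   # data X_1, ..., X_n

def empirical_loss(theta):
    """L(theta; X_n) = (1/n) sum_i (X_i - theta)^2."""
    return np.mean((X - theta) ** 2)

# Empirical minimizer (the estimator): grid minimization for transparency.
grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmin([empirical_loss(t) for t in grid])]

# Population minimizer (the true parameter value): the argmin of
# g(theta) = E[(X_1 - theta)^2] is E[X_1] = 2.0.
print("estimator (empirical minimizer):", theta_hat)
print("true parameter (population minimizer): 2.0")
```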

6-8. Computer Power Demand
[Figure slides: Computer Power Demand, driven by scientific studies, computational applications, and BIG DATA.]

9-12. Learning Examples
Machine learning and compressed sensing
• Matrix completion, matrix factorization, tensor decomposition, phase retrieval, neural networks.
Neural network: layers in a chain structure; each layer is a function of the layer preceding it.
Layer $j$: $h_j = g_j(a_j h_{j-1} + b_j)$, where $(a_j, b_j)$ are the weights and $g_j$ is the activation function (sigmoid, softmax, or rectifier).
[Figures: History; Dog vs cat.]
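A minimal sketch of the chain structure $h_j = g_j(a_j h_{j-1} + b_j)$, with made-up layer sizes, random weights, and the activations named on the slide (no training is performed):

```python
# Minimal sketch (illustrative): a chain-structured network where layer j
# computes h_j = g_j(A_j h_{j-1} + b_j). Weights, shapes, and the choice of
# activations are made up for the example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                      # "rectifier" in the slide's terminology
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]                      # input -> hidden -> hidden -> output
activations = [relu, sigmoid, softmax]          # g_1, g_2, g_3
weights = [(rng.normal(size=(m, k)), rng.normal(size=m))  # (A_j, b_j)
           for k, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Apply the layers in order: h_0 = x, h_j = g_j(A_j h_{j-1} + b_j)."""
    h = x
    for (A, b), g in zip(weights, activations):
        h = g(A @ h + b)
    return h

x = rng.normal(size=layer_sizes[0])
print(forward(x))                 # output of the final (softmax) layer
```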

13-17. Gradient Descent Algorithms: solve $\min_\theta g(\theta)$
Gradient descent algorithm
• Start at an initial value $x_0$: $x_k = x_{k-1} - \delta \nabla g(x_{k-1})$, where $\delta$ is the learning rate and $\nabla$ is the gradient operator.
• A continuous curve $X_t$ approximating the discrete iterates $\{x_k : k \ge 0\}$ obeys the differential equation $\dot{X}_t + \nabla g(X_t) = 0$, where $\dot{X}_t = dX_t/dt$.
Accelerated gradient descent algorithm (Nesterov)
• Start at initial values $x_0$ and $y_0 = x_0$: $x_k = y_{k-1} - \delta \nabla g(y_{k-1})$, $y_k = x_k + \frac{k-1}{k+2}(x_k - x_{k-1})$.
• A continuous curve $X_t$ approximating the discrete iterates $\{x_k : k \ge 0\}$ obeys the differential equation $\ddot{X}_t + \frac{3}{t} \dot{X}_t + \nabla g(X_t) = 0$, where $\ddot{X}_t = d^2 X_t/dt^2$.
Convergence to the minimization solution as $k, t \to \infty$: rate $1/k$ (or $1/t$) for plain gradient descent; rate $1/k^2$ (or $1/t^2$) for the accelerated case.
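The two recursions above can be checked numerically. A minimal sketch, assuming a simple quadratic objective $g(x) = \|x - c\|^2$ whose minimizer is known (the objective, step size, and iteration count are illustrative):

```python
# Minimal sketch (illustrative): plain and Nesterov-accelerated gradient
# descent, as written on the slide, applied to g(x) = ||x - c||^2.
import numpy as np

c = np.array([1.0, -3.0])

def grad_g(x):
    """Gradient of g(x) = ||x - c||^2."""
    return 2.0 * (x - c)

delta = 0.05          # learning rate
K = 200               # number of iterations

# Plain gradient descent: x_k = x_{k-1} - delta * grad g(x_{k-1})
x = np.zeros(2)
for k in range(1, K + 1):
    x = x - delta * grad_g(x)

# Nesterov acceleration: x_k = y_{k-1} - delta * grad g(y_{k-1}),
#                        y_k = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})
x_acc = np.zeros(2)
y = x_acc.copy()
x_prev = x_acc.copy()
for k in range(1, K + 1):
    x_acc = y - delta * grad_g(y)
    y = x_acc + (k - 1) / (k + 2) * (x_acc - x_prev)
    x_prev = x_acc

print("plain GD:      ", x)       # both should be close to c = (1, -3)
print("accelerated GD:", x_acc)
```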

18-21. Stochastic Gradient Descent
Stochastic optimization: $\min_\theta L(\theta; X_n)$, $X_n = (X_1, \ldots, X_n)$
• Gradient descent algorithm to compute $x_k$ iteratively: $x_k = x_{k-1} - \delta \nabla L(x_{k-1}; X_n)$, where $\nabla L(\theta; X_n) = \frac{1}{n} \sum_{i=1}^{n} \nabla \ell(\theta; X_i)$.
Big data: it is expensive to evaluate all $\nabla \ell(\theta; X_i)$ at each iteration.
• Replace $\nabla L(\theta; X_n)$ by $\nabla \hat{L}_m(\theta; X^*_m) = \frac{1}{m} \sum_{j=1}^{m} \nabla \ell(\theta; X^*_j)$, $m \ll n$, where $X^*_m = (X^*_1, \ldots, X^*_m)$ is a subsample of $X_n$ (a minibatch or bootstrap sample).
Stochastic gradient descent algorithm: $x^*_k = x^*_{k-1} - \delta \nabla \hat{L}_m(x^*_{k-1}; X^*_m)$
A continuous curve $X^*_t$ approximating the discrete iterates $\{x^*_k : k \ge 0\}$ obeys a stochastic differential equation.
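A minimal sketch of the minibatch update $x^*_k = x^*_{k-1} - \delta \nabla \hat{L}_m(x^*_{k-1}; X^*_m)$, assuming a least-squares loss and a fresh random subsample of size $m \ll n$ at each iteration (all data and tuning choices are illustrative):

```python
# Minimal sketch (illustrative): minibatch stochastic gradient descent for
# the least-squares loss ell(theta; X_i) = (y_i - z_i . theta)^2, replacing
# the full-sample gradient by a minibatch average with m << n.
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 10_000, 5, 32                     # sample size, dimension, minibatch size
theta_true = rng.normal(size=p)
Z = rng.normal(size=(n, p))
y = Z @ theta_true + rng.normal(scale=0.5, size=n)

def minibatch_grad(theta, idx):
    """(1/m) * sum over the minibatch of grad ell(theta; X_j)."""
    resid = y[idx] - Z[idx] @ theta
    return -2.0 * Z[idx].T @ resid / len(idx)

delta = 0.01                                # learning rate
theta = np.zeros(p)
for _ in range(3000):
    idx = rng.choice(n, size=m, replace=False)   # subsample (minibatch)
    theta -= delta * minibatch_grad(theta, idx)

print("max abs error:", np.max(np.abs(theta - theta_true)))
```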

22-23. Gradient Descent vs Stochastic Gradient Descent
[Figure slides: Gradient Descent; Stochastic Gradient Descent.]

24-25. Statistical Analysis of Gradient Descent (Wang, 2017)
Continuous curve model: the stochastic differential equation $dX^*_t + \nabla g(X^*_t)\,dt + \sigma(X^*_t)\,dW_t = 0$, where $W_t$ is Brownian motion.
For the accelerated case: a second-order stochastic differential equation.
The algorithms and their asymptotic distributions as $m, n \to \infty$ are studied via these stochastic differential equations.
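For intuition about the continuous-time model, here is a minimal Euler-Maruyama simulation sketch; it is not the paper's implementation, and it assumes a quadratic $g$ and a constant diffusion coefficient $\sigma$:

```python
# Minimal sketch (not the paper's implementation): Euler-Maruyama simulation
# of the continuous-time model dX_t = -grad g(X_t) dt - sigma(X_t) dW_t,
# assuming a quadratic g and a constant diffusion coefficient sigma.
import numpy as np

rng = np.random.default_rng(0)

c = np.array([1.0, -3.0])

def grad_g(x):
    return 2.0 * (x - c)           # g(x) = ||x - c||^2

def sigma(x):
    return 0.1                     # constant noise level (an assumption)

dt = 1e-3
T = 5.0
steps = int(T / dt)
X = np.zeros(2)                    # X_0
path = [X.copy()]
for _ in range(steps):
    dW = rng.normal(scale=np.sqrt(dt), size=2)   # Brownian increment
    X = X - grad_g(X) * dt - sigma(X) * dW
    path.append(X.copy())

path = np.array(path)
print("X_T:", path[-1])            # drifts toward the minimizer c, plus noise
```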
