SLIDE 1

Probabilistic Modelling with Tensor Networks: From Hidden Markov Models to Quantum Circuits

Ryan Sweke, Freie Universität Berlin

SLIDE 2

The Big Picture

“Machine Learning” splits into classical ML and quantum ML, each studied either through heuristics or through statistical learning theory:

  • Classical ML, heuristics: sophisticated models, incredible results, very little understanding.
  • Classical ML, statistical learning theory: simplified models, often loose bounds, hard!
  • Quantum ML, heuristics: few models, very little understanding.
  • Quantum ML, statistical learning theory: abstract settings, very few results, Q vs C?!

Tensor networks provide a nice language to bridge heuristics with theory, and quantum with classical!

SLIDE 3

What is this talk about?

This talk is about probabilistic modelling…

Given: Samples {d⃗_1, …, d⃗_M} from an unknown discrete multivariate probability distribution P(X1, …, XN), where Xi ∈ {1, …, d} and d⃗_j = (X^j_1, …, X^j_N).

Task: “Learn” a parameterized model P(X1, …, XN | θ⃗).

This may mean many different things, depending on the task you are interested in:

  • Performing inference (i.e. calculating marginals).
  • Calculating expectation values.
  • Generating samples.

Depending on your goal, your model/approach may differ significantly!

SLIDE 4

Probabilistic Modelling

I like to think of there being three distinct elements:

(1) The model P(X1, …, XN | θ⃗). Key question: expressivity?

(2) The learning algorithm: {d⃗_1, …, d⃗_M} → θ⃗. Model dependent! Typically by maximising the (log) likelihood: ℒ = ∑_i log[P(d⃗_i | θ⃗)].

(3) The “task” algorithm. Model dependent! For example: performing inference via belief propagation for probabilistic graphical models, computing expectation values via sampling for Boltzmann machines, or generating samples directly via a GAN.
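To make the learning algorithm in (2) concrete, here is a minimal sketch (my own illustration, not from the talk) of maximum-likelihood learning for the simplest possible model: a fully parameterized categorical distribution over all d^N outcomes, fitted by plain gradient ascent on the log-likelihood ℒ.

```python
# Hypothetical toy example: maximum-likelihood fit of P(x | theta) = softmax(theta)[x].
import numpy as np

rng = np.random.default_rng(0)
d, N = 2, 3                                   # three binary variables -> d**N = 8 outcomes
true_p = rng.dirichlet(np.ones(d**N))         # the unknown distribution P(X1, ..., XN)
data = rng.choice(d**N, size=1000, p=true_p)  # samples, each outcome encoded as an integer

theta = np.zeros(d**N)                        # model parameters
counts = np.bincount(data, minlength=d**N)
for _ in range(500):
    p = np.exp(theta - theta.max())
    p /= p.sum()                              # P(x | theta)
    grad = counts - len(data) * p             # gradient of L = sum_i log P(d_i | theta)
    theta += 0.01 * grad / len(data)

p = np.exp(theta - theta.max()); p /= p.sum()
print(np.sum(counts * np.log(p)))             # final log-likelihood on the training samples
```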

SLIDE 5

Probabilistic Modelling

This overall picture is summarised quite nicely by the following “hierarchy of generative models” for P(X1, …, XN | θ⃗):

  • Maximum likelihood
    • Explicit density
      • Tractable density: (some) probabilistic graphical models. We focus here!
      • Approximate density: Boltzmann machines, VAE.
    • Implicit density: GANs.

SLIDE 6

Probabilistic Graphical Models

We will see that tensor networks provide a unifying framework for analyzing probabilistic graphical models:

Probabilistic Graphical Models → Bayesian Networks (directed acyclic graphs) and Markov Random Fields (general graphs) → Factor Graphs → Tensor Networks

SLIDE 7

Tensor Networks

Tensor network notation provides a powerful and convenient diagrammatic language for tensor manipulation...

  • A vector is a 1-tensor:
  • A matrix is a 2-tensor:
  • A shared index denotes a contraction over that index:

We represent tensors as boxes, with an “open leg” for each tensor index. An element of a vector is a scalar (“close” the index). “Vectorization” is very natural in this notation…
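As a concrete companion to the diagrams, here is a hypothetical numpy rendering (mine, not from the slides) of the same three rules:

```python
# Tensors as multi-dimensional arrays; shared indices are summed (contracted) via einsum.
import numpy as np

v = np.random.rand(4)            # a vector is a 1-tensor: one open leg
M = np.random.rand(4, 4)         # a matrix is a 2-tensor: two open legs

w = np.einsum('ij,j->i', M, v)   # a shared index j denotes contraction: matrix-vector product
s = np.einsum('i,i->', v, v)     # contracting all legs leaves a scalar (an inner product)
element = v[2]                   # fixing ("closing") an index picks out a scalar element
print(w.shape, s, element)
```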

SLIDE 8

Tensor Networks

A discrete multivariate probability distribution P(X1, …, XN) is naturally represented as an N-tensor, with d^N parameters!

A tensor network decomposition of P is a decomposition into a network of contracted tensors. For example, a Matrix Product State (MPS) decomposition requires only of order N·d·r² parameters. We call r the bond dimension; it is directly related to the underlying correlation structure, e.g. r = 1 for independent (uncorrelated) random variables. These representations are very well understood in the context of many-body quantum physics.
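A minimal sketch of such a decomposition in code (my own toy example; the helper `mps_value` and all parameter choices are assumptions, not from the slides): a non-negative MPS over N = 4 binary variables with bond dimension r = 3, from which individual probabilities are read off by contracting along the chain.

```python
import numpy as np
from itertools import product

N, d, r = 4, 2, 3
rng = np.random.default_rng(1)

# Non-negative MPS cores A_i of shape (left bond, physical index, right bond),
# with trivial bonds of size 1 at the two ends of the chain.
cores = [rng.random((1 if i == 0 else r, d, 1 if i == N - 1 else r)) for i in range(N)]

def mps_value(cores, x):
    """Contract the chain for one configuration x = (x_1, ..., x_N)."""
    m = cores[0][:, x[0], :]
    for A, xi in zip(cores[1:], x[1:]):
        m = m @ A[:, xi, :]
    return m[0, 0]

# Normalise by summing over all d**N configurations (brute force, fine for this toy size).
Z = sum(mps_value(cores, x) for x in product(range(d), repeat=N))
print(mps_value(cores, (0, 1, 1, 0)) / Z)    # P(X1=0, X2=1, X3=1, X4=0)
```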

SLIDE 9

Probabilistic Graphical Models: Bayesian Networks

Given a probability distribution P(X1, …, XN), a Bayesian Network (BN) models this distribution via a directed acyclic graph expressing the structure of conditional dependencies.

For example, a Hidden Markov Model:

P(X1, X2, X3, H1, H2, H3) = P(X1) P(H1|X1) P(H2|H1) P(X2|H2) P(H3|H2) P(X3|H3)

The probability of the “visible” variables is obtained via marginalisation:

P(X1, X2, X3) = ∑_{H1,H2,H3} P(X1, X2, X3, H1, H2, H3)

This requires fewer than d^N parameters!
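A hypothetical worked example of exactly this factorisation and marginalisation, with binary variables and random conditional probability tables (the tables and the helper `cpt` are my own; only the factorisation structure comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(2)

def cpt(shape):
    """Random conditional probability table, normalised over its last axis."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

P_X1 = cpt((2,))       # P(X1)
P_H1 = cpt((2, 2))     # P(H1 | X1)
P_H2 = cpt((2, 2))     # P(H2 | H1)
P_X2 = cpt((2, 2))     # P(X2 | H2)
P_H3 = cpt((2, 2))     # P(H3 | H2)
P_X3 = cpt((2, 2))     # P(X3 | H3)

# P(X1, X2, X3) = sum_{H1,H2,H3} P(X1) P(H1|X1) P(H2|H1) P(X2|H2) P(H3|H2) P(X3|H3)
# Index letters: a=X1, b=H1, c=H2, d=X2, e=H3, f=X3.
P_visible = np.einsum('a,ab,bc,cd,ce,ef->adf', P_X1, P_H1, P_H2, P_X2, P_H3, P_X3)
print(P_visible.sum())   # ~1.0: a valid distribution over (X1, X2, X3)
```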

SLIDE 10

Probabilistic Graphical Models: Markov Random Fields

Given a probability distribution P(X1, …, XN), a Markov Random Field models the distribution via the product of clique potentials defined by a generic graph (a clique is a maximal fully-connected subgraph).

For example, on the chain X1, H1, X2, H2, X3:

P(X1, H1, X2, H2, X3) = (1/Z) g1(X1, H1) g2(H1, X2, H2) g3(H2, X3)

NB: clique potentials are not normalised, so explicit normalisation by the factor 1/Z is necessary!

SLIDE 11

Probabilistic Graphical Models: Factor Graphs

Bayesian Networks and Markov Random Fields are unified via Factor Graphs…

P(X1, …, XN) = (1/Z) ∏_j f_j(X⃗_j)

Bayesian Networks: factors are conditional probability distributions (inherently normalised).
Markov Random Fields: factors are clique potentials (explicit normalisation necessary).

Explicitly: the HMM example becomes a factor graph with factors f1, …, f5 over X1, X2, X3, H1, H2, H3, and the MRF example becomes a factor graph with factors f1, f2, f3 over X1, H1, X2, H2, X3.

SLIDE 12

Probabilistic Graphical Models: Factor Graphs to Tensor Networks

Let’s consider the Hidden Markov Model in more detail. Marginalizing out the hidden variables, ∑_{H1, H2, H3}, means contracting the connected factor tensors!

The resulting probability distribution over the visible variables X1, X2, X3 is exactly equivalent to an MPS decomposition of the global probability tensor, with non-negative tensors!
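As a small illustrative check (my own construction, not the talk’s code), one can group the HMM factors, sum out the hidden variables, and recover MPS cores whose contraction reproduces the marginal over the visible variables:

```python
import numpy as np

rng = np.random.default_rng(3)

def cpt(shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

P_X1, P_H1 = cpt((2,)), cpt((2, 2))            # P(X1), P(H1|X1)
P_H2, P_X2 = cpt((2, 2)), cpt((2, 2))          # P(H2|H1), P(X2|H2)
P_H3, P_X3 = cpt((2, 2)), cpt((2, 2))          # P(H3|H2), P(X3|H3)

# MPS cores: one open (visible) index each, with bond indices carrying hidden variables.
A1 = np.einsum('a,ab->ab', P_X1, P_H1)             # indices (X1, H1)
A2 = np.einsum('bc,cd,ce->bde', P_H2, P_X2, P_H3)  # indices (H1, X2, H3), H2 summed out
A3 = P_X3                                          # indices (H3, X3)

P_mps = np.einsum('ab,bde,ef->adf', A1, A2, A3)    # contract the bonds H1 and H3

P_brute = np.einsum('a,ab,bc,cd,ce,ef->adf', P_X1, P_H1, P_H2, P_X2, P_H3, P_X3)
print(np.allclose(P_mps, P_brute))                 # True: the contraction is an MPS for P(X1,X2,X3)
```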

SLIDE 13

Probabilistic Graphical Models: Factor Graphs to Tensor Networks

The other direction also holds: via an exact non-negative canonical polyadic decomposition of its cores, a non-negative MPS over X1, X2, X3 can be rewritten as a Hidden Markov Model with hidden variables H1, H2, H3, which contracts back to the original tensor.

Hidden detail: the bond dimension may grow, r′ ≤ min(dr, r²). Hidden Markov Models and non-negative MPS are almost exactly equivalent.

SLIDE 14

Probabilistic Graphical Models: Factor Graphs to Tensor Networks

Take-home message: we can use tensor networks to study and to generalise probabilistic graphical models! Any tensor network which yields a non-negative tensor when contracted defines a valid model, and this class includes all probabilistic graphical models.

Goal: by studying MPS-based decompositions, can we make rigorous claims concerning expressivity, draw connections to quantum circuits, and make claims concerning the expressivity of classical vs quantum models? (Yes!)

See I. Glasser et al., “Supervised learning with generalised tensor networks” (formal connection and heuristic algorithms).

SLIDE 15

Tensor Network Models: HMM are MPS

The first model we consider is the non-negative MPS, which we already showed is (almost) equivalent to an HMM: the probability tensor T over X1, …, XN is factorised into cores A1, …, AN of bond dimension r. NB: all tensors have only non-negative (real) entries!

We call the minimal bond dimension r necessary to factorise T exactly the TT-rank_ℝ≥0 (“tensor-train” rank). The bond dimension necessary to represent a class of tensors characterises the expressivity of the model!

SLIDE 16

Tensor Network Models: HMM are MPS

Note that for probability distributions over two variables (matrices), the TT-rank_ℝ≥0 is the non-negative rank: i.e. the smallest r such that T = AB with both A (of size d × r) and B (of size r × d) non-negative.

Not such an easy rank to determine! (It is NP-hard to determine whether the rank is equal to the non-negative rank.)
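One heuristic way to probe the non-negative rank numerically is non-negative matrix factorisation; the following sketch (my own, using scikit-learn’s `NMF`, not from the talk) only gives upper bounds, consistent with the hardness statement above:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
T = rng.random((6, 6))
T /= T.sum()                      # a two-variable distribution P(X1, X2) as a non-negative matrix

for r in range(1, 7):
    model = NMF(n_components=r, init='random', random_state=0, max_iter=2000)
    A = model.fit_transform(T)    # shape (6, r), non-negative
    B = model.components_         # shape (r, 6), non-negative
    # The smallest r with (near-)zero residual upper-bounds the non-negative rank of T.
    print(r, np.linalg.norm(T - A @ B))
```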

SLIDE 17

Tensor Network Models: Born Machines

The second model we consider is the Born Machine: the probability tensor T is the entry-wise modulus squared of an MPS with cores A1, …, AN, i.e. the MPS contracted against its conjugate A1†, …, AN†. We can use either real or complex tensors! We call the minimal bond dimension r necessary to factorise T exactly the Born-rank_ℝ/ℂ.
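A minimal sketch of a Born machine in code (my construction, not the talk’s): probabilities are the modulus squared of complex MPS amplitudes, normalised here by brute force over all outcomes.

```python
import numpy as np
from itertools import product

N, d, r = 4, 2, 3
rng = np.random.default_rng(5)

def core(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

cores = [core((1 if i == 0 else r, d, 1 if i == N - 1 else r)) for i in range(N)]

def amplitude(cores, x):
    """Contract the complex MPS along the chain for one configuration x."""
    m = cores[0][:, x[0], :]
    for A, xi in zip(cores[1:], x[1:]):
        m = m @ A[:, xi, :]
    return m[0, 0]

# Born rule: P(x) is the modulus squared of the amplitude, normalised over all outcomes.
Z = sum(abs(amplitude(cores, x)) ** 2 for x in product(range(d), repeat=N))
print(abs(amplitude(cores, (0, 1, 0, 1))) ** 2 / Z)
```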

SLIDE 18

Tensor Network Models: Born Machines

In the case of only two variables this is the real/complex Hadamard (entry-wise) square-root rank, i.e. the smallest r such that T = |AB|^∘2, so that AB is an entry-wise square root of T.

In the real case:

r = min_± rank [ ±√t_11 ⋯ ±√t_1d ; ⋮ ; ±√t_d1 ⋯ ±√t_dd ]

i.e. a minimisation over 2^(d²) sign combinations, which gets bad fast!

In the complex case:

r = min_θ⃗ rank [ e^{iθ_11}√t_11 ⋯ e^{iθ_1d}√t_1d ; ⋮ ; e^{iθ_d1}√t_d1 ⋯ e^{iθ_dd}√t_dd ]

which is even worse :(

SLIDE 19

Tensor Network Models: Born Machines

Outcome probabilities of a 2-local quantum circuit of depth D on d-dimensional qudits (all initialised in |0⟩) are described exactly by a Born Machine of bond dimension d^(D+1): via SVDs, the circuit state is contracted into an MPS, and the probability of a measurement outcome, P(X1, …, XN), is the Born Machine defined by that circuit MPS.

SLIDE 20

Tensor Network Models: Locally Purified States

The final model we consider is the Locally Purified State (LPS): as for the Born Machine, T is obtained by contracting cores A1, …, AN against their conjugates A1†, …, AN†, but each core additionally carries a purification index μ which is contracted locally. We can use either real or complex tensors!

We call the minimal bond dimension r necessary to factorise T exactly the Puri-rank_ℝ/ℂ. In the case of only two variables this is the positive semidefinite rank.

SLIDE 21

Tensor Network Models: Locally Purified States

In the case of only two variables this is the positive semidefinite (PSD) rank: given a matrix M, the PSD rank is the smallest r for which there exist positive semidefinite r × r matrices A_i and B_j such that M_ij = Tr(A_i B_j).
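A small hypothetical example of this definition (mine, not from the slides): any collection of positive semidefinite r × r matrices A_i, B_j generates, via M_ij = Tr(A_i B_j), an entry-wise non-negative matrix of PSD rank at most r.

```python
import numpy as np

rng = np.random.default_rng(6)
d, r = 4, 2

def random_psd(r):
    """A random r x r positive semidefinite matrix C C^dagger."""
    C = rng.normal(size=(r, r)) + 1j * rng.normal(size=(r, r))
    return C @ C.conj().T

A = [random_psd(r) for _ in range(d)]   # one PSD matrix per value of X1
B = [random_psd(r) for _ in range(d)]   # one PSD matrix per value of X2

M = np.array([[np.trace(Ai @ Bj).real for Bj in B] for Ai in A])
print(M.min() >= 0)                     # True: Tr(A_i B_j) >= 0 for PSD matrices
print((M / M.sum()).round(3))           # normalised: a two-variable distribution of PSD rank <= r
```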

SLIDE 22

Tensor Network Models: Locally Purified States

LPS are equivalent to 2-local circuits with local ancillas: P(X1, …, XN) is obtained by running the circuit on system and ancilla registers (all initialised in |0⟩_S, |0⟩_A) and measuring only the system registers, i.e. tracing out the ancillas.

Crux: we can sample LPS by partial measurements of quantum circuits!

SLIDE 23

Tensor Network Models Summary

Note that as classical models:
  • Learning is efficient, i.e. tractable likelihood and gradients.
  • Inference is efficient: marginalization is a simple, efficient contraction.
  • Sampling is easy! Efficient sampling algorithms also exist (e.g. ancestral sampling, sketched below).

However, as quantum models (i.e. in a hybrid quantum-classical, HQC, setting):
  • Learning is not straightforward: likelihood and gradients need to be estimated or bounded.
  • But an exponential bond dimension of the classical models requires only linear depth of the quantum models!

Independent of the learning and task algorithms, we are interested in the relative expressivity!
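As a concrete illustration of the sampling point, here is a hedged sketch (my own, for a non-negative MPS as defined earlier, not the talk’s algorithm) of ancestral sampling: sample X1 from its marginal, then X2 given X1, and so on, using right environments in which the remaining variables have been summed out.

```python
import numpy as np

N, d, r = 4, 2, 3
rng = np.random.default_rng(7)
cores = [rng.random((1 if i == 0 else r, d, 1 if i == N - 1 else r)) for i in range(N)]

# Right environments: R[i] is the chain from core i to the end with every physical
# index summed out, so that (left message) @ R[i] marginalises over X_{i+1}, ..., X_N.
R = [None] * (N + 1)
R[N] = np.ones(1)
for i in reversed(range(N)):
    R[i] = cores[i].sum(axis=1) @ R[i + 1]

def ancestral_sample():
    left = np.ones(1)                      # trivial left boundary message
    x = []
    for i in range(N):
        # Unnormalised weights proportional to P(x_1, ..., x_{i-1}, X_i = k).
        weights = np.array([left @ cores[i][:, k, :] @ R[i + 1] for k in range(d)])
        k = rng.choice(d, p=weights / weights.sum())
        x.append(int(k))
        left = left @ cores[i][:, k, :]
    return tuple(x)

print(ancestral_sample())
```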

SLIDE 24

Expressivity Results

We first ask: for a fixed bond dimension, how are all the representations related?

[Diagram: set relations between MPSℝ≥0, BMℝ, BMℂ, MPSℝ = MPSℂ, LPSℝ and LPSℂ at fixed bond dimension.]

Much more interesting, though, is the following question: given one representation of bond dimension r (e.g. BMℂ), what bond dimension r′ is necessary to write this tensor using another representation (e.g. BMℝ)? We know that in the worst case r′ > r, but by how much? The answer is surprising!

SLIDE 25

Expressivity Results

We answer the question of relative overheads as follows (an entry “≤ f(x)” in a given row and column means the row rank is at most f(x) whenever the column rank is x; “No” means no such bound exists):

              | TT-rankℝ | TT-rankℝ≥0 | Born-rankℝ | Born-rankℂ | puri-rankℝ | puri-rankℂ
TT-rankℝ      |    =     |    ≤ x     |   ≤ x²     |   ≤ x²     |   ≤ x²     |   ≤ x²
TT-rankℝ≥0    |    No    |     =      |    No      |    No      |    No      |    No
Born-rankℝ    |    No    |    No      |     =      |    No      |    No      |    No
Born-rankℂ    |    No    |    No*     |   ≤ x      |     =      |    No*     |    No*
puri-rankℝ    |    No    |    ≤ x     |   ≤ x      |   ≤ 2x     |     =      |   ≤ 2x
puri-rankℂ    |    No    |    ≤ x     |   ≤ x      |   ≤ x      |   ≤ x      |     =

We find two very distinct types of result:

1) Controlled overheads, e.g. puri-rankℝ ≤ 2·(Born-rankℂ).

2) Unbounded overheads, e.g. there exists a family of probability distributions over an increasing number of random variables N with constant Born-rankℂ, whose Born-rankℝ scales with N. For Born machines, complex numbers provide an unbounded amount of expressive power!

SLIDE 26

Expressivity Results

Some other results to highlight (from the same table as above):

1) Neither real Born Machines nor HMMs should be preferred over the other!

2) Conjecture: there exists a family of probability distributions which requires constant circuit depth with local ancillas, but unbounded circuit depth without ancillas!

3) Locally purified states should always be preferred over all other models (and might exhibit an unbounded expressive advantage!).

SLIDE 27

Expressivity Results

These are exact results! In practice we are interested in approximations… We can explore this numerically:

SLIDE 28

Expressivity Results

In addition, how well do these models perform as hypothesis classes?

SLIDE 29

Future Directions + Vision

1) We need good algorithms for learning in an HQC setting, to turn these results into good heuristics!

2) Of course, we would like to prove the conjectures :) Help is welcome!

3) Overheads in the approximate case? New techniques are needed!

4) Can the general strategy be expanded to other rigorous quantum/classical comparisons?
  • Deep neural networks (already some ideas)
  • More complicated circuit topologies (also some ideas)

5) Even more generally, can we identify well-posed mathematical questions in statistical learning theory which
  • lie at the quantum/classical interface?
  • would lead to enhanced heuristics if solved?
Already some ideas here…

(Also, thanks to Ivan Glasser, Nicola Pancotti, Ignacio Cirac and Jens Eisert!)