
SECOND YEAR REPORT 1

Quantum Factor Graphs: Closing-the-Box Operation and Variational Approaches

Michael X. CAO, PhD Pre-Candidacy Student

Abstract: The factor graph model is a popular statistical graphical model in which a number of practical problems, including problems from statistical physics, machine learning, coding theory, and signal processing, can be abstracted as marginal problems on factor graphs. The sum-product algorithm is a powerful algorithm for solving marginal problems on factor graphs. The algorithm has been justified using a number of different approaches, including the closing-the-box notion and the variational approach. In this report, we consider a generalization of factor graphs known as quantum factor graphs, along with a generalization of the sum-product algorithm known as the quantum sum-product algorithm. Our work migrates the notion of closing-the-box operations and the method of the variational approach to this new quantum setup. In particular, we consider a generalization of the Bethe free energy and related concepts on quantum factor graphs. Some expressions that hold exactly in the classical case hold only approximately in the quantum case; we give some analytical and numerical characterizations of these approximations.

I. INTRODUCTION

Factor graphs [1], [2], which we refer to as classical factor graphs (CFGs) in this report, are graphical models representing factorizations of multivariate functions over the real or complex domain. In particular, as a popular variant of probabilistic graphical models [3], factor graphs have proven useful for describing factorizations of probability distributions and for solving the related marginal problems. The latter problem captures the essence of many practical problems in a number of scientific and engineering fields, including statistical physics, machine learning, coding theory, and signal processing. Famous applications include the Ising model [4] and LDPC codes [5].

As a brief introduction to CFGs, we associate the factorization
$$ g(\mathbf{x}) \triangleq \prod_{a\in F} f_a(\mathbf{x}_a) \tag{1} $$
with the CFG with variable node set $V$, function node set $F$, and edge set $E\subseteq V\times F$ given by
$$ E = \{(i,a)\in V\times F : i\in\partial a\}. \tag{2} $$
Here, $\mathbf{x}\triangleq(x_i)_{i\in V}$, $\mathbf{x}_a\triangleq(x_i)_{i\in\partial a}$, $\partial a\subseteq V$, and $x_i\in\mathcal{X}_i$. A fundamental problem is to calculate the so-called partition sum of the CFG, which is defined as

$$ Z \triangleq \begin{cases} \sum_{\mathbf{x}} g(\mathbf{x}), & \mathcal{X}_V \text{ is finite;}\\ \int g(\mathbf{x})\,d\mathbf{x}, & \mathcal{X}_V \text{ is continuous.} \end{cases} \tag{3} $$
In this report, we only consider the finite case with non-negative local functions, i.e., $f_a(\mathbf{x}_a)\in\mathbb{R}_{\ge 0}$ for all $a\in F$. In this case, the global function $g$ is always a measure function of $\mathbf{x}$. In general, calculating the partition sum is an NP-hard problem. However, for acyclic CFGs, $Z$ can always be computed efficiently by the so-called sum-product algorithm (SPA). The main idea is to take advantage of the distributive law of multiplication over addition in the field of real numbers ($\mathbb{R}$).

In the following examples, we use rectangular and circular nodes to represent factor nodes and variable nodes in CFGs, respectively. We also introduce the notion of normal CFGs, where variables are represented by edges [1], [6], [7]. For example, in Fig. 1, CFGs (b) and (d) are the normal versions of CFGs (a) and (c), respectively.

Example 1. Consider the CFG (a) (or (b)) in Fig. 1 with variable node set $V=\{1,2,3,4\}$ and function node set $F=\{A,B,C\}$. This CFG depicts a global function factorized as
$$ g(x_1,x_2,x_3,x_4) = f_A(x_1)\cdot f_B(x_1,x_2,x_3)\cdot f_C(x_3,x_4). \tag{4} $$
With respect to the above factorization, the corresponding partition sum is given by
$$ Z = \sum_{x_1,x_2,x_3,x_4} f_A(x_1)\cdot f_B(x_1,x_2,x_3)\cdot f_C(x_3,x_4) \tag{5} $$
$$ = \sum_{x_3}\Big(\sum_{x_1} f_A(x_1)\cdot\sum_{x_2} f_B(x_1,x_2,x_3)\Big)\cdot\Big(\sum_{x_4} f_C(x_3,x_4)\Big). \tag{6} $$
Assuming each alphabet $\mathcal{X}_i$ ($i=1,\dots,4$) to be binary, evaluating (5) takes 16 steps of summation, whereas evaluating (6) takes 8 steps of summation, which is clearly more efficient than the direct evaluation of (5).
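To make Example 1 concrete, here is a minimal Python sketch, assuming binary alphabets and randomly drawn non-negative tables `fA`, `fB`, `fC` (hypothetical data, not taken from the report). It evaluates the partition sum once directly as in (5) and once with the inner boxes closed first as in (6); both orders give the same $Z$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical non-negative local functions on binary alphabets.
fA = rng.random(2)          # fA[x1]
fB = rng.random((2, 2, 2))  # fB[x1, x2, x3]
fC = rng.random((2, 2))     # fC[x3, x4]

def z_direct():
    """Evaluate (5): one product per configuration (2**4 = 16 terms)."""
    return sum(fA[x1] * fB[x1, x2, x3] * fC[x3, x4]
               for x1, x2, x3, x4 in itertools.product(range(2), repeat=4))

def z_closed_boxes():
    """Evaluate (6): close the inner boxes (sums over x2, x1, x4) before summing over x3."""
    total = 0.0
    for x3 in range(2):
        left = sum(fA[x1] * sum(fB[x1, x2, x3] for x2 in range(2)) for x1 in range(2))
        right = sum(fC[x3, x4] for x4 in range(2))
        total += left * right
    return total

print(z_direct(), z_closed_boxes())
```

The nested version is exactly the distributive law at work: each inner sum is computed once and reused, instead of being re-expanded for every full configuration.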


Fig. 1. CFGs appearing in Examples 1 and 2, panels (a) to (d), with factor nodes $f_A$, $f_B$, $f_C$ and variables $X_1,\dots,X_4$.

The reformulation from (5) to (6) is easier to understand in terms of the closing-the-box notation [1], where we repeatedly “close” the innermost box by replacing it with the result of summing over its interior variable(s). The process ends when the outermost box is closed, which yields the partition sum $Z$ (we refer to [1] for details). Interestingly, this notation can also be applied to graphs with cycles, which allows the method of the SPA to be applied to more general setups.

Example 2. The CFG (c) (or (d)) in Fig. 1 is not acyclic and has the global function
$$ g(\mathbf{x}) = f_A(x_1,x_3)\cdot f_B(x_1,x_2,x_3)\cdot f_C(x_3,x_4). \tag{7} $$
However, we can still simplify the expression of the partition sum using the same technique:
$$ Z = \sum_{x_1,x_2,x_3,x_4} f_A(x_1,x_3)\cdot f_B(x_1,x_2,x_3)\cdot f_C(x_3,x_4) \tag{8} $$
$$ = \sum_{x_3}\Big(\sum_{x_1} f_A(x_1,x_3)\cdot\sum_{x_2} f_B(x_1,x_2,x_3)\Big)\cdot\Big(\sum_{x_4} f_C(x_3,x_4)\Big). \tag{9} $$

It seems that the closing-the-box notation helps to generalize the SPA so that we can apply the algorithm to CFGs with cycles. However, this technique fails in more general settings, especially in large-scale setups (just consider a (nearly) fully connected normal CFG with $n$ factors). On the other hand, the SPA can also be interpreted as a message-passing algorithm, where the partial results in each step are represented as messages sent along the edges of the CFG. The rules according to which the messages (i.e., partial results) are combined are called the SPA message update rules. Since the update rules are applied locally in CFGs, they can also be applied to a CFG with cycles, yielding a straightforward generalization of the SPA. Although the original justification of the algorithm, as illustrated in Examples 1 and 2, is no longer valid, the SPA and its variations still yield rather promising results in a number of real-life applications, including the decoding of LDPC codes. Thus, understanding (and possibly improving) such algorithms has become a focus of research. Related work includes the variational approach [8], the loop calculus [9], [10], and graph covers.

In this report, we are interested in a generalization of CFGs called quantum factor graphs (QFGs) [11]. In QFGs, we consider “factorizations” in the following sense:
$$ \rho \triangleq \bigodot_{a\in F}\rho_a = \exp\Big(\sum_{a\in F}\log(\rho_a)\Big), \tag{10} $$
where $\{\rho_a\}_{a\in F}$ are positive definite operators. The concept of the partition sum is generalized as
$$ Z = \operatorname{Tr}(\rho). \tag{11} $$
(The formal definitions of $\odot$ and $\rho_a$ in (10) are given in Section II.) In this sense, CFGs can be treated as a special case of QFGs in which all the involved local operators $\{\rho_a\}_{a\in F}$ are diagonal. We are especially interested in suitable generalizations of the aforementioned CFG techniques to QFGs. In particular, we study the closing-the-box notion and the variational approach in the QFG setup. Although there are potentially interesting quantum-mechanical uses for these findings, we are more broadly interested in exploring the power of generalizations of CFGs and the sum-product algorithm. Let us note that some other quantum-mechanics-inspired generalizations of factor graphs were studied in [12], [13], [14].
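The statement that CFGs are the diagonal special case of QFGs can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical two-variable chain $A - x_1 - B - x_2 - C$ with random positive tables. It builds diagonal local operators on $\mathbb{C}^2\otimes\mathbb{C}^2$ (basis $|x_1x_2\rangle$), forms $\rho$ as in (10), and compares $\operatorname{Tr}(\rho)$ with the classical partition sum. The helper `fun_h`, which applies a scalar function to a Hermitian matrix through its eigendecomposition, is our own and is not from the report.

```python
import itertools
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

rng = np.random.default_rng(1)
# Hypothetical positive local functions: f_A(x1), f_B(x1, x2), f_C(x2).
fA = rng.random(2) + 0.1
fB = rng.random((2, 2)) + 0.1
fC = rng.random(2) + 0.1

# Diagonal local operators, embedded into C^2 (x) C^2 with identities as in (16).
rhoA = np.kron(np.diag(fA), np.eye(2))
rhoB = np.diag(fB.reshape(-1))          # index 2*x1 + x2 matches the Kronecker ordering
rhoC = np.kron(np.eye(2), np.diag(fC))

rho = fun_h(sum(fun_h(r, np.log) for r in (rhoA, rhoB, rhoC)), np.exp)  # (10)
Z_qfg = np.trace(rho).real                                              # (11)
Z_cfg = sum(fA[x1] * fB[x1, x2] * fC[x2]
            for x1, x2 in itertools.product(range(2), repeat=2))
print(Z_qfg, Z_cfg)
```

Because diagonal operators commute, $\exp(\sum_a\log\rho_a)$ reduces to the ordinary product $\prod_a\rho_a$. The trace then sums exactly the classical products.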


CAO: QUANTUM FACTOR GRAPHS: CLOSING-THE-BOX OPERATION AND VARIATIONAL APPROACHES 3

The rest of this report is organized as follows. Section II introduces QFGs and studies the corresponding closing-the-box operations. Section III defines the quantum Bethe approximation and derives the quantum sum-product algorithm (QSPA). Section IV presents a numerical example illustrating the performance of the QSPA. Section V concludes the report.

Throughout this report, we use the notations $\mathcal{L}(\mathcal{H})$, $\mathcal{L}^{\mathrm{H}}(\mathcal{H})$, $\mathcal{L}^{+}(\mathcal{H})$, and $\mathcal{L}^{++}(\mathcal{H})$ to denote the sets of linear operators, Hermitian operators, positive semi-definite (PSD) operators, and strictly positive definite operators on the Hilbert space $\mathcal{H}$, respectively. Additionally, the sets of density operators and strictly positive definite density operators on $\mathcal{H}$ are denoted by $\mathcal{L}^{+}_{1}(\mathcal{H})$ and $\mathcal{L}^{++}_{1}(\mathcal{H})$, where the subscript ‘1’ indicates the trace-1 requirement. The trace of $\rho\in\mathcal{L}^{+}(\mathcal{H})$ is defined in the standard way. The partial trace of $\rho\in\mathcal{L}^{+}(\mathcal{H}_1\otimes\mathcal{H}_2)$ over $\mathcal{H}_1$ is denoted by $\operatorname{Tr}_1(\rho)$, with an analogous notation for the partial trace over $\mathcal{H}_2$. Note that $\operatorname{Tr}(\rho) = \operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho)\big) = \operatorname{Tr}_2\big(\operatorname{Tr}_1(\rho)\big)$. The inner product on operators is defined as $\langle A,B\rangle \triangleq \operatorname{Tr}(A^{\mathrm{H}}B)$, where $A^{\mathrm{H}}$ denotes the adjoint (Hermitian conjugate) of the operator $A$. Moreover, $S(\rho)\triangleq-\langle\rho,\log\rho\rangle$ and $S(\sigma\,\|\,\rho)\triangleq\langle\sigma,\log\sigma\rangle-\langle\sigma,\log\rho\rangle$ denote the von Neumann entropy and the quantum relative entropy, respectively.

A large portion of the results presented in this report has been published in the conference paper of the same title coauthored with Prof. Pascal Vontobel [15]. The paper has been accepted by the International Symposium on Information Theory and Its Applications (ISITA) and will be presented in October 2016. In particular, Sections II and III of this report contain almost the same technical content as the corresponding sections of the conference paper. Section IV is identical to Section IV in [15]. Section I of this report (excluding the introduction of the notation and the structure) has been rewritten by Mr. Cao, though a few sentences may still be identical or partially identical to sentences in the conference version. This report contains much more detailed proofs than the conference paper, including (but not limited to) Appendices A, C, and D. In addition, Appendix B and the outlook in Section V contain some of the latest results.
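The operator notation of Section I maps directly onto small dense matrices. Below is a minimal NumPy sketch, assuming finite-dimensional spaces $\mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}$ in the standard Kronecker ordering and natural logarithms. All function names are hypothetical helpers introduced for illustration only. The sketch implements the partial traces, the inner product $\langle A,B\rangle$, $S(\rho)$, and $S(\sigma\,\|\,\rho)$.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def partial_trace(rho, d1, d2, over):
    """Tr_1(rho) if over == 1, Tr_2(rho) if over == 2, for rho on C^d1 (x) C^d2."""
    T = rho.reshape(d1, d2, d1, d2)  # T[i1, i2, j1, j2]
    return np.einsum('ijik->jk', T) if over == 1 else np.einsum('ijkj->ik', T)

def inner(A, B):
    """<A, B> = Tr(A^H B)."""
    return np.trace(A.conj().T @ B)

def entropy(rho):
    """Von Neumann entropy S(rho) = -<rho, log rho> (natural logarithm)."""
    return -inner(rho, fun_h(rho, np.log)).real

def rel_entropy(sigma, rho):
    """Quantum relative entropy S(sigma || rho) = <sigma, log sigma> - <sigma, log rho>."""
    return (inner(sigma, fun_h(sigma, np.log)) - inner(sigma, fun_h(rho, np.log))).real

def random_density(d, rng):
    """A random strictly positive definite density operator (illustrative only)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    P = G @ G.conj().T + 0.1 * np.eye(d)
    return P / np.trace(P).real

rng = np.random.default_rng(2)
rho, sigma = random_density(6, rng), random_density(6, rng)  # on C^2 (x) C^3
r1 = partial_trace(rho, 2, 3, over=2)  # operator on C^2
r2 = partial_trace(rho, 2, 3, over=1)  # operator on C^3
print(np.trace(r1).real, np.trace(r2).real, entropy(rho), rel_entropy(sigma, rho))
```

Both reduced operators keep unit trace, consistent with $\operatorname{Tr}(\rho)=\operatorname{Tr}_1(\operatorname{Tr}_2(\rho))=\operatorname{Tr}_2(\operatorname{Tr}_1(\rho))$. The relative entropy of two distinct density operators is strictly positive.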

II. QUANTUM FACTOR GRAPHS (QFGs) AND THE CLOSING-THE-BOX NOTATION

The factorization of interest in a QFG, as given in (10), is based on the special matrix/operator “product” $\odot$ [16]. For any operators $\rho_A,\rho_B\in\mathcal{L}^{++}(\mathcal{H})$, $\rho_A\odot\rho_B$ is defined as
$$ \rho_A\odot\rho_B \triangleq \exp\big(\log(\rho_A)+\log(\rho_B)\big), \tag{12} $$
where $\exp$ and $\log$ denote the operator exponential and the operator natural logarithm, respectively. By the Lie product formula [17], one can rewrite (12) as
$$ \rho_A\odot\rho_B = \lim_{n\to\infty}\big(\rho_A^{1/n}\,\rho_B^{1/n}\big)^{n}. \tag{13} $$
Equation (13) extends the $\odot$ product to operators $\rho_A,\rho_B\in\mathcal{L}^{+}(\mathcal{H})$.¹ Notice that the $\odot$ product is both commutative and associative, i.e.,
$$ \rho_A\odot\rho_B = \rho_B\odot\rho_A \quad \forall\,\rho_A,\rho_B\in\mathcal{L}^{+}(\mathcal{H}), \tag{14} $$
$$ (\rho_A\odot\rho_B)\odot\rho_C = \rho_A\odot(\rho_B\odot\rho_C) \quad \forall\,\rho_A,\rho_B,\rho_C\in\mathcal{L}^{+}(\mathcal{H}). \tag{15} $$
In the following, we also use the notation $\rho_A\odot\rho_B$ when $\rho_A$ and $\rho_B$ are defined on different Hilbert spaces. For example, if $\rho_A\in\mathcal{L}^{++}(\mathcal{H}_1\otimes\mathcal{H}_2)$ and $\rho_B\in\mathcal{L}^{++}(\mathcal{H}_2\otimes\mathcal{H}_3)$, then
$$ \rho_A\odot\rho_B \triangleq (\rho_A\otimes I_3)\odot(I_1\otimes\rho_B), \tag{16} $$
where $I_1$ and $I_3$ are the identity operators on $\mathcal{H}_1$ and $\mathcal{H}_3$, respectively. Note that $\rho_A\odot\rho_B$ is an operator on the Hilbert space $\mathcal{H}_1\otimes\mathcal{H}_2\otimes\mathcal{H}_3$. Similarly, we adopt this convention for expressions like $\bigodot_{a\in F}\rho_a$, where (16) is applied repeatedly. In this case, note that $\log(\rho\otimes I) = \log(\rho)\otimes I$.
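The definitions (12) and (13) can be explored numerically. Below is a minimal sketch, assuming strictly positive definite matrices so that the operator logarithm is defined. The helper names `fun_h`, `odot`, `lie_product`, and `random_pd` are illustrative and not from the report. The sketch checks commutativity (14), associativity (15), convergence of the finite-$n$ Lie product toward (12), the reduction to the ordinary product for commuting (diagonal) operators, and the embedding identity $\log(\rho\otimes I)=\log(\rho)\otimes I$ used with (16).

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def odot(*ops):
    """The ⊙ product (12) of strictly positive definite operators on one space."""
    return fun_h(sum(fun_h(r, np.log) for r in ops), np.exp)

def lie_product(A, B, n):
    """Finite-n term (A^{1/n} B^{1/n})^n of the limit in (13)."""
    root = lambda X: fun_h(X, lambda w: w ** (1.0 / n))
    return np.linalg.matrix_power(root(A) @ root(B), n)

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(3)
A, B, C = (random_pd(3, rng) for _ in range(3))
lie_err = {n: np.linalg.norm(lie_product(A, B, n) - odot(A, B)) for n in (1, 10, 100, 1000)}
print(lie_err)  # the error should shrink as n grows

# Embedding convention (16): log(rho (x) I) = log(rho) (x) I.
embedded = fun_h(np.kron(A, np.eye(2)), np.log)
print(np.allclose(embedded, np.kron(fun_h(A, np.log), np.eye(2))))
```

Computing the matrix functions through `numpy.linalg.eigh` is a design choice: it is accurate for Hermitian inputs and avoids a SciPy dependency. A general-purpose routine such as a dedicated matrix exponential would work equally well here.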

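Definition 3. A QFG [11] is a bipartite graph with variable node set $V$, function node set $F$, and edge set $E\subseteq V\times F$, where with every $i\in V$ we associate a Hilbert space $\mathcal{H}_i$, and with every $a\in F$ we associate the Hilbert space $\mathcal{H}_a\triangleq\bigotimes_{i\in\partial a}\mathcal{H}_i$ and some local operator $\rho_a\in\mathcal{L}^{+}(\mathcal{H}_a)$. The QFG’s global function is then defined to be
$$ \rho \triangleq \bigodot_{a\in F}\rho_a = \exp\Big(\sum_{a\in F}\log(\rho_a)\Big), \tag{17} $$
and its partition sum is defined to be
$$ Z \triangleq \operatorname{Tr}(\rho). \tag{18} $$

Note that the global operator $\rho$ is always a PSD operator on $\bigotimes_{i\in V}\mathcal{H}_i$ (i.e., $\rho\in\mathcal{L}^{+}\big(\bigotimes_{i\in V}\mathcal{H}_i\big)$). Moreover, if all the local operators are strictly positive definite (i.e., $\rho_a\in\mathcal{L}^{++}(\mathcal{H}_a)$ for all $a\in F$), we can further conclude that $\rho\in\mathcal{L}^{++}\big(\bigotimes_{i\in V}\mathcal{H}_i\big)$. In the remainder of this report, all Hilbert spaces $\mathcal{H}_i$, $i\in V$, are assumed to be finite-dimensional.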

¹To be rigorous, one still needs to check the convergence of the limit on the right-hand side of (13). For such details, we refer the reader to [17] and Theorem 1.2 in [18].


Fig. 2. QFG used in Example 4 (local operators $\rho_A$, $\rho_B$, $\rho_C$; variable spaces $\mathcal{H}_1$, $\mathcal{H}_2$).

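Example 4. Consider the QFG in Fig. 2 with variable node set $V=\{1,2\}$, function node set $F=\{A,B,C\}$, and local operators $\rho_A\in\mathcal{L}^{+}(\mathcal{H}_1)$, $\rho_B\in\mathcal{L}^{+}(\mathcal{H}_1\otimes\mathcal{H}_2)$, and $\rho_C\in\mathcal{L}^{+}(\mathcal{H}_2)$. This QFG has the global function $\rho=\rho_A\odot\rho_B\odot\rho_C$ and the partition sum $Z=\operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C)$. Notice that we have adopted the notion of normal factor graphs in Fig. 2.

The price that we pay for going from CFGs to QFGs is that the operation $\odot$ does not, in general, distribute over the partial trace. In terms of the QFG in Example 4, this means that in general
$$ \operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C) = \operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho_A\odot\rho_B\odot\rho_C)\big) \neq \operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big). \tag{19} $$

Example 5. Consider the Hilbert spaces $\mathcal{H}_1=\mathcal{H}_2\triangleq\mathbb{C}^2$ and the operators $X$ and $Y$ acting on $\mathcal{H}_1$ and $\mathcal{H}_1\otimes\mathcal{H}_2$, respectively, where
$$ X \triangleq \frac{1}{2}\begin{pmatrix} +1 & -1\\ -1 & +1 \end{pmatrix}, \qquad Y \triangleq (\,\cdots\,). $$
In this case, $\operatorname{Tr}_1\big(\operatorname{Tr}_2(X\odot Y)\big)=0$ and $\operatorname{Tr}_1\big(X\odot\operatorname{Tr}_2(Y)\big)=1$. Clearly, they are far from being equal.

Oftentimes, however, $\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big)$ approximates $\operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho_A\odot\rho_B\odot\rho_C)\big)$ reasonably well. A central topic of this report is to understand the cases where this happens, so that an approximate notion of closing the box can be salvaged.

Lemma 6. Given $\rho_A\in\mathcal{L}^{++}(\mathcal{H}_1)$, $\rho_B\in\mathcal{L}^{++}(\mathcal{H}_1\otimes\mathcal{H}_2)$, and $\rho_C\in\mathcal{L}^{++}(\mathcal{H}_2)$, the quantities appearing in (19) satisfy
$$ S\big(\kappa(\rho_A)\big)^{-1} \le \frac{\operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C)}{\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big)} \le S\big(\kappa(\rho_A)\big), \tag{20} $$
where $\kappa(\rho_A)\ge 1$ is the condition number of the operator $\rho_A$, and $S(\cdot)$ is the Specht ratio function defined as
$$ S(r) \triangleq \frac{(r-1)\cdot r^{\frac{1}{r-1}}}{e\cdot\log r}. \tag{21} $$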
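Lemma 6 can be probed empirically. The following is a minimal sketch, assuming $\mathcal{H}_1=\mathcal{H}_2=\mathbb{C}^2$ and randomly generated strictly positive definite operators. The generator and helper names are our own illustrative choices and are not the sampling scheme used for Fig. 4. For each sample, the sketch computes the exact partition sum of Example 4 and the “closed-box” surrogate from (19), then compares their ratio with the Specht-ratio bounds (20).

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def odot(*ops):
    """⊙ product (12) of operators already embedded in one common space."""
    return fun_h(sum(fun_h(r, np.log) for r in ops), np.exp)

def tr2(M, d1, d2):
    """Partial trace over the second factor of C^d1 (x) C^d2."""
    return np.einsum('ijkj->ik', M.reshape(d1, d2, d1, d2))

def specht(r):
    """Specht ratio (21); S(1) = 1 by continuity."""
    if np.isclose(r, 1.0):
        return 1.0
    return (r - 1) * r ** (1 / (r - 1)) / (np.e * np.log(r))

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(4)
d1 = d2 = 2
I1, I2 = np.eye(d1), np.eye(d2)
results = []
for _ in range(100):
    rA, rB, rC = random_pd(d1, rng), random_pd(d1 * d2, rng), random_pd(d2, rng)
    exact = np.trace(odot(np.kron(rA, I2), rB, np.kron(I1, rC))).real         # Tr(ρA ⊙ ρB ⊙ ρC)
    closed = np.trace(odot(rA, tr2(odot(rB, np.kron(I1, rC)), d1, d2))).real  # Tr1(ρA ⊙ Tr2(ρB ⊙ ρC))
    w = np.linalg.eigvalsh(rA)
    results.append((exact / closed, specht(w[-1] / w[0])))
print(min(r for r, _ in results), max(r for r, _ in results))
```

When $\rho_A$ is well conditioned, $S(\kappa(\rho_A))$ is close to 1, and the two quantities in (19) must then be close. This is the regime in which the approximate closing-the-box operation is useful.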

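Proof. Consider the Golden–Thompson inequality and its reverse [19], namely,
$$ \operatorname{Tr}\big(e^{V+W}\big) \le \operatorname{Tr}\big(e^{V}e^{W}\big) \le S(\alpha)\cdot\operatorname{Tr}\big(e^{V+W}\big), \tag{22} $$
where $V$ and $W$ are Hermitian operators, and $\alpha$ is the condition number of $e^{V}$. Notice that for strictly positive definite operators $\rho_1,\rho_2$, the operators $\log\rho_1$ and $\log\rho_2$ are Hermitian. Thus, by substituting $V\triangleq\log\rho_1$ and $W\triangleq\log\rho_2$ into (22), we obtain
$$ \operatorname{Tr}(\rho_1\odot\rho_2) \le \operatorname{Tr}(\rho_1\rho_2) \le S\big(\kappa(\rho_1)\big)\cdot\operatorname{Tr}(\rho_1\odot\rho_2). \tag{23} $$
The first inequality in (20) follows from
$$ \begin{aligned} \operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big) &\le \operatorname{Tr}_1\big(\rho_A\cdot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big) && \text{(24)}\\ &= \operatorname{Tr}\big(\rho_A\cdot(\rho_B\odot\rho_C)\big) && \text{(25)}\\ &\le S\big(\kappa(\rho_A)\big)\cdot\operatorname{Tr}\big(\rho_A\odot(\rho_B\odot\rho_C)\big) && \text{(26)}\\ &= S\big(\kappa(\rho_A)\big)\cdot\operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C), && \text{(27)} \end{aligned} $$
where we apply the left inequality of (23) in (24) with $\rho_1=\rho_A$ and $\rho_2=\operatorname{Tr}_2(\rho_B\odot\rho_C)$, and the right inequality of (23) in (26) with $\rho_1=\rho_A$ and $\rho_2=\rho_B\odot\rho_C$. Similarly, the second inequality in (20) can be justified via
$$ \begin{aligned} \operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C) &\le \operatorname{Tr}\big(\rho_A\cdot(\rho_B\odot\rho_C)\big) && \text{(28)}\\ &= \operatorname{Tr}_1\big(\rho_A\cdot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big) && \text{(29)}\\ &\le S\big(\kappa(\rho_A)\big)\cdot\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big), && \text{(30)} \end{aligned} $$
where (23) is used in (28) and (30). Finally, (20) is obtained by dividing both sides of inequalities (27) and (30) by $\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big)$. Note that $\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big)>0$ since $\rho_A$, $\rho_B$, and $\rho_C$ are strictly positive definite.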
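The two-sided inequality (22) that drives this proof can be spot-checked on random Hermitian matrices. This is a minimal sketch, assuming $3\times 3$ complex Hermitian samples and reusing the Specht ratio (21). The sampling scheme and helper names are illustrative. A passing check is evidence for (22) on these samples, not a proof.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def specht(r):
    """Specht ratio (21); S(1) = 1 by continuity."""
    if np.isclose(r, 1.0):
        return 1.0
    return (r - 1) * r ** (1 / (r - 1)) / (np.e * np.log(r))

def random_herm(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(5)
triples = []
for _ in range(200):
    V, W = random_herm(3, rng), random_herm(3, rng)
    lower = np.trace(fun_h(V + W, np.exp)).real                  # Tr e^{V+W}
    middle = np.trace(fun_h(V, np.exp) @ fun_h(W, np.exp)).real  # Tr e^V e^W
    w = np.linalg.eigvalsh(V)
    alpha = np.exp(w[-1] - w[0])                                 # condition number of e^V
    triples.append((lower, middle, specht(alpha) * lower))
ok = all(lo <= mid * (1 + 1e-9) and mid <= up * (1 + 1e-9) for lo, mid, up in triples)
print(ok)
```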


Fig. 3. A chain QFG with local operators $\rho_1,\rho_2,\rho_3,\dots,\rho_{N-1},\rho_N$ ($N\ge 3$).

Lemma 6 indicates that $\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B\odot\rho_C)\big)$ should approximate $\operatorname{Tr}(\rho_A\odot\rho_B\odot\rho_C)$ reasonably well when $\rho_A$ (or $\rho_B\odot\rho_C$) is close to the identity matrix. The following theorem quantifies this approximation when $\rho_A$ and $\rho_B\odot\rho_C$ are close to the identity matrix in a linear fashion, i.e., $\rho_A=I+tX$ and $\rho_B\odot\rho_C=I+tY$ for some real number $t$ close to 0. Another approach to studying this approximation is to assume $\rho_A=e^{tX}$ and $\rho_B\odot\rho_C=e^{tY}$; we present this second approach in Appendix B.

Theorem 7. Consider finite-dimensional Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$. Given $X\in\mathcal{L}^{\mathrm{H}}(\mathcal{H}_1)$ and $Y\in\mathcal{L}^{\mathrm{H}}(\mathcal{H}_1\otimes\mathcal{H}_2)$, it holds that
$$ \operatorname{Tr}\big((I+tX)\odot(I+tY)\big) = \operatorname{Tr}_1\big((I+tX)\odot\operatorname{Tr}_2(I+tY)\big) + O(t^4), \tag{31} $$
where the real number $t$ lies in a neighborhood of 0 such that $I+tX$ and $I+tY$ are positive definite. In other words, $\operatorname{Tr}_1\big((I+tX)\odot\operatorname{Tr}_2(I+tY)\big)$ approximates $\operatorname{Tr}\big((I+tX)\odot(I+tY)\big)$ when $t$ is small, and the error is of fourth order in $t$.

Proof. The statement can be justified by a Taylor series expansion. Since the proof is tedious and purely computational, we omit it here; interested readers may find the detailed proof in Appendix A.
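The fourth-order error in (31) can be observed numerically by fitting the slope of $\log|\text{error}|$ against $\log t$. Below is a minimal sketch, assuming $\mathcal{H}_1=\mathcal{H}_2=\mathbb{C}^2$ and random Hermitian $X$, $Y$ normalized to unit spectral norm so that $I+tX$ and $I+tY$ stay positive definite. The helper names and the chosen $t$ grid are illustrative assumptions. A fitted slope near 4 is consistent with Theorem 7 but does not replace the proof in Appendix A.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def odot(*ops):
    return fun_h(sum(fun_h(r, np.log) for r in ops), np.exp)

def tr2(M, d1, d2):
    return np.einsum('ijkj->ik', M.reshape(d1, d2, d1, d2))

def unit_herm(d, rng):
    """Random Hermitian matrix scaled to spectral norm 1, so I + tH > 0 for |t| < 1."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    H = (G + G.conj().T) / 2
    return H / np.linalg.norm(H, 2)

d1, d2 = 2, 2
rng = np.random.default_rng(6)
X, Y = unit_herm(d1, rng), unit_herm(d1 * d2, rng)
I1, I2, I12 = np.eye(d1), np.eye(d2), np.eye(d1 * d2)

def gap(t):
    """|LHS - RHS| of (31) for a given t."""
    lhs = np.trace(odot(np.kron(I1 + t * X, I2), I12 + t * Y)).real
    rhs = np.trace(odot(I1 + t * X, tr2(I12 + t * Y, d1, d2))).real
    return abs(lhs - rhs)

ts = np.array([0.08, 0.04, 0.02, 0.01])
errs = np.array([gap(t) for t in ts])
slope = np.polyfit(np.log(ts), np.log(errs), 1)[0]  # about 4 if the O(t^4) term dominates
print(errs, slope)
```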

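Similarly, we also have the following approximation.

Corollary 8. Following the same setup as in Theorem 7, we have
$$ \operatorname{Tr}_2\big((I+tX)\odot(I+tY)\big) = (I+tX)\odot\operatorname{Tr}_2(I+tY) + O(t^3). \tag{32} $$
The proof of Corollary 8 can also be found in Appendix A.

Corollary 9. Let $N\ge 3$. Consider a chain QFG as in Fig. 3, where $\rho_1\in\mathcal{L}^{++}(\mathcal{H}_1)$, $\rho_N\in\mathcal{L}^{++}(\mathcal{H}_{N-1})$, and $\rho_k\in\mathcal{L}^{++}(\mathcal{H}_{k-1}\otimes\mathcal{H}_k)$ for each $k=2,\dots,N-1$. Suppose all $\{\rho_k\}_{k=1}^{N}$ are close to the identity matrix $I$ in the sense that $\rho_k=I+t\cdot\chi_k$ for some real number $t$ close to 0 and some Hermitian operators $\chi_k$. Then Theorem 7 implies the following estimate:
$$ \operatorname{Tr}\big[\rho_1\odot\rho_2\odot\cdots\odot\rho_{N-1}\odot\rho_N\big] = \operatorname{Tr}_{N-1}\Big\{\operatorname{Tr}_{N-2}\Big[\operatorname{Tr}_{N-3}\big(\cdots\operatorname{Tr}_1(\rho_1\odot\rho_2)\cdots\big)\odot\rho_{N-1}\Big]\odot\rho_N\Big\} + O(t^4). \tag{33} $$

Proof. We justify (33) by mathematical induction on $N$. For $N=3$, (33) is just an instance of (31). Now, suppose (33) holds for $N=K\ge 3$. Then, for $N=K+1$, we have
$$ \begin{aligned} \operatorname{Tr}\big[\rho_1\odot\cdots\odot\rho_{K+1}\big] &= \operatorname{Tr}_{>1}\big[\operatorname{Tr}_1(\rho_1\odot\rho_2)\odot(\rho_3\odot\rho_4\odot\cdots\odot\rho_{K+1})\big] + O(t^4) && \text{(34)}\\ &= \operatorname{Tr}_{K}\Big\{\operatorname{Tr}_{K-1}\Big[\operatorname{Tr}_{K-2}\big(\cdots\operatorname{Tr}_2\big(\operatorname{Tr}_1(\rho_1\odot\rho_2)\odot\rho_3\big)\cdots\big)\odot\rho_K\Big]\odot\rho_{K+1}\Big\} + O(t^4), && \text{(35)} \end{aligned} $$
where (34) follows from Theorem 7 and (35) from the induction hypothesis. This proves the corollary.

Theorem 7 and Corollary 8 establish the following approximate distributive laws of the $\odot$ operation over the (partial) trace:
$$ \operatorname{Tr}(A\odot B) \approx \operatorname{Tr}_{\partial b\setminus I}\big(A\odot\operatorname{Tr}_I(B)\big), \tag{36} $$
$$ \operatorname{Tr}_I(A\odot B) \approx A\odot\operatorname{Tr}_I(B), \tag{37} $$
where $A\in\mathcal{L}^{+}\big(\bigotimes_{i\in\partial a}\mathcal{H}_i\big)$ and $B\in\mathcal{L}^{+}\big(\bigotimes_{i\in\partial b}\mathcal{H}_i\big)$ are close to the identity matrix $I$, and the index sets satisfy $I\subseteq\partial b\setminus\partial a$ and $\partial a\subseteq\partial b$.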
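The partial-trace law (37), i.e., Corollary 8, can be checked in the same way as Theorem 7. The error is now a matrix, so we measure it in the Frobenius norm and expect a slope of about 3 instead of 4. The sketch below is minimal and rests on the same assumptions as before: qubit spaces, random unit-norm Hermitian $X$, $Y$, and illustrative helper names.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def odot(*ops):
    return fun_h(sum(fun_h(r, np.log) for r in ops), np.exp)

def tr2(M, d1, d2):
    return np.einsum('ijkj->ik', M.reshape(d1, d2, d1, d2))

def unit_herm(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    H = (G + G.conj().T) / 2
    return H / np.linalg.norm(H, 2)

d1, d2 = 2, 2
rng = np.random.default_rng(7)
X, Y = unit_herm(d1, rng), unit_herm(d1 * d2, rng)
I1, I2, I12 = np.eye(d1), np.eye(d2), np.eye(d1 * d2)

def gap(t):
    """Frobenius norm of LHS - RHS in (32)."""
    lhs = tr2(odot(np.kron(I1 + t * X, I2), I12 + t * Y), d1, d2)
    rhs = odot(I1 + t * X, tr2(I12 + t * Y, d1, d2))
    return np.linalg.norm(lhs - rhs)

ts = np.array([0.08, 0.04, 0.02, 0.01])
errs = np.array([gap(t) for t in ts])
slope = np.polyfit(np.log(ts), np.log(errs), 1)[0]  # about 3 if the O(t^3) term dominates
print(errs, slope)
```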

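For the rest of this report, we use the notation “$\hat{\approx}$” or “$\hat{\propto}$” whenever (36) or (37) is used to derive an approximate equality or an approximate proportionality, respectively.

It is worthwhile to carry out a brief numerical comparison between $\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B)\big)$ and $\operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho_A\odot\rho_B)\big)$. Namely, we randomly generate $\rho_A\in\mathcal{L}^{+}(\mathbb{C}^2)$ and $\rho_B\in\mathcal{L}^{+}(\mathbb{C}^4)$ and plot in Fig. 4 the statistical distribution of
$$ \eta \triangleq \frac{\operatorname{Tr}_1\big(\rho_A\odot\operatorname{Tr}_2(\rho_B)\big) - \operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho_A\odot\rho_B)\big)}{\operatorname{Tr}_1\big(\operatorname{Tr}_2(\rho_A\odot\rho_B)\big)}. \tag{38} $$
Here, $\rho_A\triangleq U_A^{\mathrm{H}}\Lambda_AU_A$ and $\rho_B\triangleq U_B^{\mathrm{H}}\Lambda_BU_B$, where the unitary matrices $U_A$ and $U_B$ consist of random orthonormal vectors uniformly distributed on the corresponding complex unit spheres, and each of the diagonal entries of the diagonal matrices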


Fig. 4. Distribution of $\eta$ for different random PSD operators $\rho_A$ and $\rho_B$. [Plot: relative error $\eta$ (about 0.02 to 0.2) on the x-axis; frequency density = frequency / interval length (logarithmic, $10^{-4}$ to $10^{3}$) on the y-axis; curves for eigenvalues distributed as $|\mathcal{N}(\mu,\sigma^2)|$ with $\mu=1$ and $\sigma=0.25$, $0.5$, $1$, and uniformly distributed eigenvalues with $(a,b)=(0,1)$.]

$\Lambda_A$ and $\Lambda_B$ is independently generated according to one of the distributions marked in the legend. Note that in Fig. 4, $|\mathcal{N}(\mu,\sigma^2)|$ stands for a random variable distributed as the absolute value of a Gaussian random variable with parameters $(\mu,\sigma^2)$.

An important consequence of Theorem 7 and Corollary 8 is the ability to construct global density operators from local ones under suitable conditions. This is stated in the lemma below, which serves as a quantum analog of Lemma 2 in [9].

Lemma 10. Consider a QFG with no cycles. Assume that some density operators $\big\{\sigma_a\in\mathcal{L}^{+}_{1}(\mathcal{H}_a)\big\}_{a\in F}$ and $\big\{\sigma_i\in\mathcal{L}^{+}_{1}(\mathcal{H}_i)\big\}_{i\in V}$ satisfy the local marginal conditions
$$ \sigma_i = \operatorname{Tr}_{\partial a\setminus i}(\sigma_a) \quad \forall\,(i,a)\in E. \tag{39} $$
Then there exists a global density operator $\sigma$ such that
$$ \operatorname{Tr}_{V\setminus\partial a}(\sigma) \mathrel{\hat{\approx}} \sigma_a \quad \forall\,a\in F, \tag{40} $$
$$ \operatorname{Tr}_{V\setminus i}(\sigma) \mathrel{\hat{\approx}} \sigma_i \quad \forall\,i\in V. \tag{41} $$

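Proof. Define the density operator $\sigma\in\mathcal{L}^{+}_{1}\big(\bigotimes_{i\in V}\mathcal{H}_i\big)$ by letting
$$ \sigma \propto \exp\Big(\sum_{a\in F}\log(\sigma_a) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big), \tag{42} $$
where $d_i$ is the degree of variable node $i\in V$. To prove (40), we have
$$ \begin{aligned} \operatorname{Tr}_{V\setminus\partial a}(\sigma) &\propto \operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\sum_{a'\in F}\log(\sigma_{a'}) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big)\Big] && \text{(43)}\\ &= \operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\log(\sigma_a) + \sum_{n=0}^{N}\sum_{i\in\partial^{n}a}\sum_{c\in\partial^{*}i}\big[\log(\sigma_c)-\log(\sigma_i)\big]\Big)\Big] && \text{(44)}\\ &= \operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\log(\sigma_a) + \sum_{n=0}^{N-1}\sum_{i\in\partial^{n}a}\sum_{c\in\partial^{*}i}\big[\log(\sigma_c)-\log(\sigma_i)\big]\Big)\odot\bigodot_{i\in\partial^{N}a,\;c\in\partial^{*}i}\big(\sigma_c\odot\sigma_i^{-1}\big)\Big] && \text{(45)}\\ &\mathrel{\hat{\approx}} \operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\log(\sigma_a) + \sum_{n=0}^{N-1}\sum_{i\in\partial^{n}a}\sum_{c\in\partial^{*}i}\big[\log(\sigma_c)-\log(\sigma_i)\big]\Big)\odot\bigodot_{i\in\partial^{N}a,\;c\in\partial^{*}i}\big(\operatorname{Tr}_{\partial c\setminus i}(\sigma_c)\odot\sigma_i^{-1}\big)\Big] && \text{(46)}\\ &= \operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\log(\sigma_a) + \sum_{n=0}^{N-1}\sum_{i\in\partial^{n}a}\sum_{c\in\partial^{*}i}\big[\log(\sigma_c)-\log(\sigma_i)\big]\Big)\Big] && \text{(47)}\\ &\mathrel{\hat{\approx}} \cdots \mathrel{\hat{\approx}} \operatorname{Tr}_{V\setminus\partial a}\big\{\exp\big[\log(\sigma_a)\big]\big\} = \sigma_a, && \text{(48)} \end{aligned} $$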


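where $\partial^{n}a$ denotes the set of variables reachable from $a$ after walking through $n$ factors, and $\partial^{*}i$ denotes the neighbor set of $i$ excluding the factor through which $a$ reaches $i$. The expansion (44) is due to the tree structure of the QFG, and (46) follows directly from Corollary 8; (47) then uses the marginal condition (39), i.e., $\operatorname{Tr}_{\partial c\setminus i}(\sigma_c)=\sigma_i$. Here, the number $N$ in (44) is the depth (height) of the tree rooted at the factor node $a$. The justification for (41) is similar and is omitted.

Notice that if we consider
$$ \tilde{\sigma} \triangleq \exp\Big(\sum_{a\in F}\log(\sigma_a) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big), \tag{49} $$
we may not have $\operatorname{Tr}(\tilde{\sigma})=1$, even though a similar result holds on acyclic CFGs, namely
$$ \sum_{\mathbf{x}}\Bigg(\prod_{a\in F}\frac{b_a(\mathbf{x}_{\partial a})}{\prod_{i\in\partial a}b_i(x_i)}\cdot\prod_{i\in V}b_i(x_i)\Bigg) = 1, \tag{50} $$
where $b_a$ and $b_i$ are marginal distributions on $\mathbf{x}_{\partial a}$ and $x_i$, respectively, which satisfy the local marginal constraints, i.e.,
$$ \sum_{\mathbf{x}_{\partial a\setminus\{i\}}} b_a(\mathbf{x}_{\partial a}) = b_i(x_i) \quad \forall\,x_i\in\mathcal{X}_i,\ \forall\,(i,a)\in E. \tag{51} $$
However, by (48), we can still conclude that
$$ \operatorname{Tr}(\tilde{\sigma}) = \operatorname{Tr}_{\partial a}\Big\{\operatorname{Tr}_{V\setminus\partial a}\Big[\exp\Big(\sum_{a'\in F}\log(\sigma_{a'}) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big)\Big]\Big\} \mathrel{\hat{\approx}} \operatorname{Tr}_{\partial a}\{\sigma_a\} = 1. \tag{52} $$
In this case, we often regard $\tilde{\sigma}$ as an approximation of the corresponding global density operator.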
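The contrast between the exact classical identity (50) and the approximate quantum statement (52) can be seen on a small tree. Below is a minimal sketch, assuming a hypothetical three-qubit chain $x_0 - a - x_1 - b - x_2$ (so only $x_1$ has degree 2) and a global state $\sigma=(I+tH)/\operatorname{Tr}(I+tH)$ near the maximally mixed state. The construction and helper names are our own illustrations. The sketch takes the marginals of $\sigma$, forms $\tilde{\sigma}$ as in (49), and reports $|\operatorname{Tr}(\tilde{\sigma})-1|$. With a diagonal perturbation (the classical case), the deviation vanishes up to rounding. With a generic Hermitian perturbation, it is nonzero but shrinks as $t\to 0$.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def reduce_to(sigma, keep, n=3):
    """Partial trace of an n-qubit operator onto the (sorted) qubits listed in `keep`."""
    letters = 'abcdefghijkl'
    rows, cols = list(letters[:n]), list(letters[n:2 * n])
    for q in range(n):
        if q not in keep:
            cols[q] = rows[q]  # contract the row and column index of qubit q
    out = ''.join(rows[q] for q in keep) + ''.join(cols[q] for q in keep)
    T = np.einsum(''.join(rows) + ''.join(cols) + '->' + out, sigma.reshape([2] * (2 * n)))
    return T.reshape(2 ** len(keep), 2 ** len(keep))

def sigma_tilde(sigma):
    """(49) for the chain x0 - a - x1 - b - x2, where only x1 has degree 2."""
    I2 = np.eye(2)
    log_a = fun_h(reduce_to(sigma, [0, 1]), np.log)
    log_b = fun_h(reduce_to(sigma, [1, 2]), np.log)
    log_1 = fun_h(reduce_to(sigma, [1]), np.log)
    L = np.kron(log_a, I2) + np.kron(I2, log_b) - np.kron(np.kron(I2, log_1), I2)
    return fun_h(L, np.exp)

rng = np.random.default_rng(8)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (G + G.conj().T) / 2
H /= np.linalg.norm(H, 2)
D = np.diag(np.diag(H).real)  # diagonal ("classical") part of H

def density(t, K):
    """Density operator (I + tK) / Tr(I + tK), close to the maximally mixed state."""
    M = np.eye(8) + t * K
    return M / np.trace(M).real

devs = {t: abs(np.trace(sigma_tilde(density(t, H))).real - 1) for t in (0.5, 0.2, 0.05)}
dev_classical = abs(np.trace(sigma_tilde(density(0.5, D))).real - 1)
print(devs, dev_classical)
```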

III. VARIATIONAL APPROACHES

In the case of CFGs, the negative log partition sum can be written as the minimum of the so-called Gibbs free energy function [8]. Although this reformulation does not, in general, yield a tractable optimization problem, it suggests the introduction of other free energy functions (like the Bethe free energy function) that approximate the Gibbs free energy function. The Bethe free energy function is particularly interesting because the stationary points of the Bethe free energy function correspond to the fixed points of the SPA [8]. The minimum of the Bethe free energy function can be used as an approximation of the negative log partition sum. In this section, we present a QFG analog of the Bethe free energy function and then use it to derive the quantum SPA (QSPA), called quantum belief propagation in [11]. We start by defining a quantum analog of the Gibbs free energy function.

Definition 11. Given a QFG $G$ with variable set $V$, factor operators $\{\rho_a\}_{a\in F}$, and global Hilbert space $\mathcal{H}=\bigotimes_{i\in V}\mathcal{H}_i$, we define the quantum Helmholtz free energy and the quantum Gibbs free energy function with respect to the density operator $\sigma\in\mathcal{L}^{+}_{1}(\mathcal{H})$ to be, respectively,²
$$ F_{\mathrm{H}} \triangleq -\log(Z), \tag{53} $$
$$ F_{\mathrm{Gibbs}}(\sigma) \triangleq -\sum_{a\in F}\langle\sigma,\log\rho_a\rangle - S(\sigma) \tag{54} $$
$$ = -\sum_{a\in F}\big\langle\operatorname{Tr}_{V\setminus\partial a}(\sigma),\log\rho_a\big\rangle - S(\sigma). \tag{55} $$
Here, we recall the definitions of $S(\cdot)$ and the operator inner product $\langle\cdot,\cdot\rangle$ from the end of Section I.

Theorem 12. The quantum Gibbs free energy function and the quantum Helmholtz free energy are related by
$$ F_{\mathrm{Gibbs}}(\sigma) = F_{\mathrm{H}} + S(\sigma\,\|\,\tilde{\rho}), \tag{56} $$
where $\sigma$ and $\tilde{\rho}\triangleq\rho/Z = Z^{-1}\cdot\exp\big(\sum_{a\in F}\log\rho_a\big)$ are density operators.
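Theorem 12 is an exact identity, so it can be verified to machine precision on a small example. The sketch below is minimal and assumes an Example-4-style QFG on $\mathbb{C}^2\otimes\mathbb{C}^2$ with random, well-conditioned local operators (our own illustrative choice, with embeddings as in (16)). It evaluates $F_{\mathrm{Gibbs}}$ from (54) at a random density operator $\sigma$ and compares it with $-\log Z + S(\sigma\,\|\,\tilde{\rho})$. It also checks that $\tilde{\rho}$ attains $F_{\mathrm{H}}=-\log Z$.

```python
import numpy as np

def fun_h(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * f(w)) @ U.conj().T

def random_pd(d, rng):
    """Random strictly positive definite matrix with eigenvalues >= 0.5 (illustrative)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T / d + 0.5 * np.eye(d)

def entropy(s):
    return -np.trace(s @ fun_h(s, np.log)).real

def rel_entropy(s, r):
    return np.trace(s @ (fun_h(s, np.log) - fun_h(r, np.log))).real

rng = np.random.default_rng(9)
I2 = np.eye(2)
# rho_A on H1, rho_B on H1 (x) H2, rho_C on H2, all embedded into H1 (x) H2.
logs = [np.kron(fun_h(random_pd(2, rng), np.log), I2),
        fun_h(random_pd(4, rng), np.log),
        np.kron(I2, fun_h(random_pd(2, rng), np.log))]
rho = fun_h(sum(logs), np.exp)
Z = np.trace(rho).real
rho_t = rho / Z

def F_gibbs(sigma):
    """(54): -sum_a <sigma, log rho_a> - S(sigma)."""
    return -sum(np.trace(sigma @ L).real for L in logs) - entropy(sigma)

sigma = random_pd(4, rng)
sigma /= np.trace(sigma).real
print(F_gibbs(sigma), -np.log(Z) + rel_entropy(sigma, rho_t))
```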

²If $\rho_A$ and $\rho_B$ in $\langle\rho_A,\rho_B\rangle$ are over different Hilbert spaces, then both $\rho_A$ and $\rho_B$ are implicitly embedded in the smallest Hilbert space that contains both Hilbert spaces.


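Proof. The proof is straightforward. Namely, we have
$$ \begin{aligned} F_{\mathrm{Gibbs}}(\sigma) - F_{\mathrm{H}} &= -\sum_{a\in F}\langle\sigma,\log\rho_a\rangle + \langle\sigma,\log\sigma\rangle + \log(Z)\\ &= \langle\sigma,\log\sigma\rangle - \Big\langle\sigma,\sum_{a\in F}\log\rho_a\Big\rangle - \big\langle\sigma,-\log(Z)\cdot I\big\rangle\\ &= \langle\sigma,\log\sigma\rangle - \big\langle\sigma,\log\rho-\log(Z)\cdot I\big\rangle\\ &= \langle\sigma,\log\sigma\rangle - \langle\sigma,\log(\rho/Z)\rangle\\ &= \langle\sigma,\log\sigma\rangle - \langle\sigma,\log\tilde{\rho}\rangle = S(\sigma\,\|\,\tilde{\rho}), \end{aligned} $$
where the second step uses $\operatorname{Tr}(\sigma)=1$.

It is a well-known result [20] that for density operators $\rho_A,\rho_B\in\mathcal{L}^{+}_{1}(\mathcal{H})$, the quantum relative entropy satisfies
$$ S(\rho_A\,\|\,\rho_B) \ge 0, \tag{57} $$
with equality if and only if $\rho_A=\rho_B$. Consequently, the optimization problem
$$ \min\ F_{\mathrm{Gibbs}}(\sigma) \quad \text{s.t. } \sigma\in\mathcal{L}^{+}_{1}(\mathcal{H}) \tag{58} $$
has the unique minimizer $\sigma^{*}=\tilde{\rho}$, and the minimum value is $F_{\mathrm{Gibbs}}(\sigma^{*})=F_{\mathrm{H}}$. Thus, (58) can be viewed as a reformulation of the partition sum defined in (18). However, this reformulation does not, in general, yield a tractable minimization problem because of the number of dimensions involved. Therefore, we introduce the following quantum Bethe free energy function as an approximation of the Gibbs free energy function.

Definition 13. We define the quantum Bethe free energy function of a QFG to be
$$ F_{\mathrm{Bethe}}\big((\sigma_a)_{a\in F},(\sigma_i)_{i\in V}\big) \triangleq -\sum_{a\in F}\langle\sigma_a,\log\rho_a\rangle - \sum_{a\in F}S(\sigma_a) + \sum_{i\in V}(d_i-1)\cdot S(\sigma_i), \tag{59} $$
where $(\sigma_a)_{a\in F}$ and $(\sigma_i)_{i\in V}$ are density operators.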

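Indeed, the Bethe free energy function can be viewed as an approximation of the Gibbs free energy function, as shown in the next theorem.

Theorem 14. Consider a QFG with no cycles. For some global density operator $\sigma$, let $\sigma_a\triangleq\operatorname{Tr}_{V\setminus\partial a}(\sigma)$ for all $a\in F$, and $\sigma_i\triangleq\operatorname{Tr}_{V\setminus i}(\sigma)$ for all $i\in V$. Then, using the approximation notation introduced after (36) and (37), we have
$$ F_{\mathrm{Gibbs}}(\sigma) \mathrel{\hat{\approx}} F_{\mathrm{Bethe}}\big((\sigma_a)_{a\in F},(\sigma_i)_{i\in V}\big). \tag{60} $$
More precisely, if $\sigma$ is $t$-close to the identity matrix $I$ in a linear fashion, i.e., $\sigma=I+t\cdot\chi$ for some real number $t$ close to 0 and some Hermitian operator $\chi$, then
$$ F_{\mathrm{Gibbs}}(\sigma) = F_{\mathrm{Bethe}}\big((\sigma_a)_{a\in F},(\sigma_i)_{i\in V}\big) + O(t^3). \tag{61} $$

Proof. Consider the definition of $F_{\mathrm{Gibbs}}$. First, we have
$$ \sum_{a\in F}\langle\sigma_a,\log\rho_a\rangle = \sum_{a\in F}\big\langle\operatorname{Tr}_{V\setminus\partial a}(\sigma),\log\rho_a\big\rangle = \sum_{a\in F}\langle\sigma,\log\rho_a\rangle. \tag{62} $$
Now, let
$$ \tilde{\sigma} \triangleq \exp\Big(\sum_{a\in F}\log(\sigma_a) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big). \tag{63} $$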


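Rearranging the terms, we have
$$ \begin{aligned} \sum_{a\in F}S(\sigma_a) - \sum_{i\in V}(d_i-1)\cdot S(\sigma_i) &= -\sum_{a\in F}\operatorname{Tr}\big(\sigma_a\log(\sigma_a)\big) + \sum_{i\in V}(d_i-1)\cdot\operatorname{Tr}\big(\sigma_i\log(\sigma_i)\big) && \text{(64)}\\ &= -\operatorname{Tr}\Big(\sigma\cdot\Big(\sum_{a\in F}\log(\sigma_a) - \sum_{i\in V}(d_i-1)\log(\sigma_i)\Big)\Big) && \text{(65)}\\ &= -\operatorname{Tr}\big(\sigma\cdot\log(\tilde{\sigma})\big) && \text{(66)}\\ &= -\operatorname{Tr}\Big(\sigma\cdot\log\frac{\tilde{\sigma}}{\operatorname{Tr}(\tilde{\sigma})}\Big) - \log\big(\operatorname{Tr}(\tilde{\sigma})\big) && \text{(67)}\\ &= -\operatorname{Tr}\Big(\sigma\cdot\log\frac{\tilde{\sigma}}{\operatorname{Tr}(\tilde{\sigma})}\Big) + O(t^3), && \text{(68)} \end{aligned} $$
where we applied (52) in deriving (68). On the other hand, Lemma 10 indicates that $\operatorname{Tr}_{V\setminus i}(\tilde{\sigma})\mathrel{\hat{\approx}}\sigma_i$ and $\operatorname{Tr}_{V\setminus\partial a}(\tilde{\sigma})\mathrel{\hat{\approx}}\sigma_a$ for each $i\in V$ and $a\in F$. In terms of the dual variables (see Appendix C), this can be written as $\tilde{\eta}\triangleq\eta(\tilde{\sigma}) = \eta + O(t^3) = \eta + t^3\cdot\Delta\eta + O(t^4)$, where $\eta\triangleq\eta(\sigma)$ and $\Delta\eta$ is some real vector. In this case, denoting $f(\eta)\triangleq-\operatorname{Tr}\big(\sigma\cdot\log\rho(\eta)\big)$, we can further conclude that
$$ \lim_{t\to 0}\frac{-\operatorname{Tr}\Big(\sigma\cdot\log\frac{\tilde{\sigma}}{\operatorname{Tr}(\tilde{\sigma})}\Big) - \big(-\operatorname{Tr}(\sigma\cdot\log(\sigma))\big)}{t^3} = \lim_{t\to 0}\frac{f(\tilde{\eta})-f(\eta)}{t^3} = \lim_{t\to 0}\frac{f\big(\eta+t^3\cdot\Delta\eta+O(t^4)\big)-f(\eta)}{t^3} \tag{69} $$
$$ = \nabla f^{\mathsf{T}}\cdot\Delta\eta < \infty. \tag{70} $$
Here, we assume $f$ to be differentiable, which is justified in Appendix D. Thus, we can write
$$ -\operatorname{Tr}\Big(\sigma\cdot\log\frac{\tilde{\sigma}}{\operatorname{Tr}(\tilde{\sigma})}\Big) = -\operatorname{Tr}\big(\sigma\cdot\log(\sigma)\big) + O(t^3) = S(\sigma) + O(t^3). \tag{71} $$
Therefore, combining (62), (68), and (71), we have
$$ \begin{aligned} F_{\mathrm{Bethe}}\big((\sigma_a)_{a\in F},(\sigma_i)_{i\in V}\big) &= -\sum_{a\in F}\langle\sigma_a,\log\rho_a\rangle - \sum_{a\in F}S(\sigma_a) + \sum_{i\in V}(d_i-1)\cdot S(\sigma_i)\\ &= -\sum_{a\in F}\langle\sigma,\log\rho_a\rangle - S(\sigma) + O(t^3) = F_{\mathrm{Gibbs}}(\sigma) + O(t^3). \end{aligned} $$

Theorem 14 allows us to treat the Bethe free energy $F_{\mathrm{Bethe}}$ as an approximation of the Gibbs free energy $F_{\mathrm{Gibbs}}$. This motivates us to define the following optimization problem as an “approximate” version of the optimization problem (58).

Definition 15. We call the optimization problem
$$ \begin{aligned} \min\quad & F_{\mathrm{Bethe}}\big((\sigma_a)_{a\in F},(\sigma_i)_{i\in V}\big)\\ \text{s.t.}\quad & \sigma_a\in\mathcal{L}^{+}_{1}(\mathcal{H}_a) \quad \forall\,a\in F, && \text{(72)}\\ & \sigma_i\in\mathcal{L}^{+}_{1}(\mathcal{H}_i) \quad \forall\,i\in V, && \text{(73)}\\ & \sigma_i=\operatorname{Tr}_{\partial a\setminus i}(\sigma_a) \quad \forall\,(i,a)\in E && \text{(74)} \end{aligned} $$
the constrained quantum Bethe free energy minimization problem.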

In the case of CFGs without cycles, the minimum of the Bethe free energy function equals the minimum of the Gibbs free energy function, and hence the Bethe approximation of the partition sum is exact. However, in the case of QFGs without cycles, the Bethe approximation of the partition sum is in general only approximately equal to the partition sum. In the rest of this section, we derive the QSPA from the constrained quantum Bethe free energy minimization problem.

Theorem 16. The positive definite density operators $\big((\sigma^{*}_a)_{a\in F},(\sigma^{*}_i)_{i\in V}\big)$ represent an interior stationary point of the constrained Bethe approximation if and only if, for each $a\in F$ and $i\in V$, we have
$$ \sigma^{*}_a \propto \exp\Big(\log(\rho_a) + \sum_{i\in\partial a}\log(m_{i\to a})\Big), \tag{75} $$
$$ \sigma^{*}_i \propto \exp\Big(\sum_{a\in\partial i}\log(m_{a\to i})\Big), \tag{76} $$


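where $\{m_{i\to a},m_{a\to i}\}_{(i,a)\in E}$ are some positive definite operators satisfying
$$ m_{i\to a} \propto \exp\Big(\sum_{c\in\partial i\setminus a}\log(m_{c\to i})\Big), \tag{77} $$
$$ m_{a\to i} \propto \operatorname{Tr}_{\partial a\setminus i}\Big[\exp\Big(\log(\rho_a) + \sum_{j\in\partial a}\log(m_{j\to a})\Big)\Big]\odot m_{i\to a}^{-1}, \tag{78} $$
for all $(i,a)\in E$.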

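Proof. This theorem is a quantum analog of its classical version in [8], which provides part of the ideas in this proof. Suppose that $\big((\sigma^{*}_a)_{a\in F},(\sigma^{*}_i)_{i\in V}\big)$ is an interior stationary point of the constrained quantum Bethe approximation problem. Since we only consider interior points, the Lagrangian can be written as
$$ L \triangleq F_{\mathrm{Bethe}} + \sum_{a\in F}\gamma_a\cdot\big(\operatorname{Tr}(\sigma_a)-1\big) + \sum_{i\in V}\gamma_i\cdot\big(\operatorname{Tr}(\sigma_i)-1\big) + \sum_{(i,a)\in E}\operatorname{Tr}\Big(\lambda_{a,i}\cdot\big(\sigma_i-\operatorname{Tr}_{\partial a\setminus i}(\sigma_a)\big)\Big), \tag{79} $$
with the dual variables $\{\gamma_a\}_{a\in F},\{\gamma_i\}_{i\in V}\subset\mathbb{R}$ and $\lambda_{a,i}\in\mathcal{L}(\mathcal{H}_i)$ for $(i,a)\in E$. Thus, there must exist some $\{\gamma^{*}_a\}_{a\in F}$, $\{\gamma^{*}_i\}_{i\in V}$, and $\{\lambda^{*}_{a,i}\}_{(i,a)\in E}$ such that $L$ satisfies the following conditions:
$$ \frac{\partial L}{\partial\gamma_a} = 0, \quad \forall\,a\in F; \tag{80} $$
$$ \frac{\partial L}{\partial\gamma_i} = 0, \quad \forall\,i\in V; \tag{81} $$
$$ \frac{d}{dt}L\big(\lambda^{*}_{a,i}+tC\big)\Big|_{t=0} = 0, \quad \forall\,C\in\mathcal{L}^{\mathrm{H}}(\mathcal{H}_i),\ \forall\,(i,a)\in E; \tag{82} $$
$$ \frac{d}{dt}L\big(\sigma^{*}_a+tC\big)\Big|_{t=0} = 0, \quad \forall\,C\in\mathcal{L}^{\mathrm{H}}(\mathcal{H}_a),\ \forall\,a\in F; \tag{83} $$
$$ \frac{d}{dt}L\big(\sigma^{*}_i+tC\big)\Big|_{t=0} = 0, \quad \forall\,C\in\mathcal{L}^{\mathrm{H}}(\mathcal{H}_i),\ \forall\,i\in V. \tag{84} $$
Notice that (80), (81), and (82) are equivalent to (72), (73), and (74), respectively. Also, by the spectral theorem and first-order perturbation theory [21], (83) and (84) can be expanded as
$$ -\operatorname{Tr}(C\cdot\log\rho_a) + \operatorname{Tr}\big(C\cdot(I+\log\sigma^{*}_a)\big) + \operatorname{Tr}(C\cdot\gamma^{*}_aI) - \sum_{i\in\partial a}\operatorname{Tr}\big(\lambda^{*}_{a,i}\cdot\operatorname{Tr}_{\partial a\setminus i}C\big) = 0, \tag{85} $$
$$ (1-d_i)\cdot\operatorname{Tr}\big(C\cdot(I+\log\sigma^{*}_i)\big) + \operatorname{Tr}(C\cdot\gamma^{*}_iI) + \sum_{a\in\partial i}\operatorname{Tr}\big(C\cdot\lambda^{*}_{a,i}\big) = 0. \tag{86} $$
Solving the above equations for $(\sigma^{*}_a)_{a\in F}$ and $(\sigma^{*}_i)_{i\in V}$, respectively, we have
$$ \sigma^{*}_a = \exp\Big(\log\rho_a + \sum_{i\in\partial a}\lambda^{*}_{a,i} - (1+\gamma^{*}_a)I\Big) \quad \forall\,a\in F, \tag{87} $$
$$ \sigma^{*}_i = \exp\Big(\frac{1}{d_i-1}\cdot\Big((1+\gamma^{*}_i)I + \sum_{a\in\partial i}\lambda^{*}_{a,i}\Big)\Big) \quad \forall\,i\in V. \tag{88} $$
Now, define $\{m_{i\to a}\}_{(i,a)\in E}$ and $\{m_{a\to i}\}_{(i,a)\in E}$ implicitly by requiring
$$ \lambda^{*}_{a,i} = \log m_{i\to a}, \qquad \lambda^{*}_{a,i} = \sum_{c\in\partial i\setminus a}\log m_{c\to i} \qquad \forall\,(i,a)\in E. \tag{89} $$
In this case, (75) and (76) follow directly from (87) and (88), respectively, and (78) follows from (74). Additionally, the positive definiteness of $\{m_{i\to a}\}_{(i,a)\in E}$ and $\{m_{a\to i}\}_{(i,a)\in E}$ follows from the same property of $\{\sigma_a\}_{a\in F}$ and $\{\sigma_i\}_{i\in V}$.

For the converse, suppose there exist some $\{m_{i\to a}\}_{(i,a)\in E}$ and $\{m_{a\to i}\}_{(i,a)\in E}$ satisfying (75), (76), (77), and (78). By choosing $\{\gamma^{*}_a\}_{a\in F}$, $\{\gamma^{*}_i\}_{i\in V}$, and $\{\lambda^{*}_{a,i}\}_{(i,a)\in E}$ satisfying (87), (88), and (89), respectively, one can easily check (85) and (86) (or, equivalently, (83) and (84)). Given that (80), (81), and (82) are equivalent to (72), (73), and (74), we have verified that the chosen $\big((\sigma^{*}_a)_{a\in F},(\sigma^{*}_i)_{i\in V}\big)$ is a stationary point.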

Fig. 5. QFG used in Section IV, with local operators $\rho_1,\dots,\rho_6$.

Notice that, by Theorem 7, Eq. (78) can be approximated by
$$m_{a\to i} \propto \mathrm{Tr}_{\partial a\setminus i}\Big(\exp\Big(\log(\rho_a) + \sum_{j\in\partial a\setminus i}\log(m_{j\to a})\Big)\Big). \tag{90}$$
Thus, the QSPA is defined as follows.

Definition 17. The Quantum Sum-Product Algorithm (QSPA) [11] is an iterative method defined by the following message update rules:
$$m^{(t+1)}_{i\to a} \propto \exp\Big(\sum_{c\in\partial i\setminus a}\log\big(m^{(t)}_{c\to i}\big)\Big), \tag{91}$$
$$m^{(t+1)}_{a\to i} \propto \mathrm{Tr}_{\partial a\setminus i}\Big(\exp\Big(\log(\rho_a) + \sum_{j\in\partial a\setminus i}\log\big(m^{(t)}_{j\to a}\big)\Big)\Big). \tag{92}$$
Here, we assume the initial messages $\big(m^{(0)}_{i\to a}, m^{(0)}_{a\to i}\big)_{(i,a)\in E}$ to be the identity matrices on the corresponding Hilbert spaces.
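The update rules (91) and (92) can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions, not the implementation behind the results in Section IV: the dictionary-based containers, the synchronous update schedule, the trace normalization of the messages, and all helper names (`herm_fn`, `lift`, `ptrace_keep`, `qspa`) are hypothetical. Matrix logarithms and exponentials are computed through the eigendecomposition, which is valid because the factors and messages are positive definite, and the returned beliefs use $\sigma_i \propto \exp\big(\sum_{a\in\partial i}\log m_{a\to i}\big)$, the form implied by (88) and (89).

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the report).
def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.conj().T

def mlog(m):
    return herm_fn(m, np.log)

def lift(op, dims, pos):
    """Embed an operator on tensor factor `pos` into the product space (identity elsewhere)."""
    out = np.eye(1)
    for idx, d in enumerate(dims):
        out = np.kron(out, op if idx == pos else np.eye(d))
    return out

def ptrace_keep(op, dims, keep):
    """Partial trace over every tensor factor except the one with index `keep`."""
    t = op.reshape(tuple(dims) * 2)
    for idx in reversed(range(len(dims))):  # last-to-first keeps earlier axes in place
        if idx != keep:
            n = t.ndim // 2
            t = np.trace(t, axis1=idx, axis2=idx + n)
    return t

def normalize(m):
    return m / np.trace(m).real

def qspa(dims, factors, iters=20):
    """Hypothetical synchronous QSPA following (91)-(92).

    dims:    {variable i: dimension of H_i}
    factors: {factor a: (tuple of variables in the neighborhood of a,
              positive definite rho_a on their tensor product, in tuple order)}
    """
    edges = [(i, a) for a, (vs, _) in factors.items() for i in vs]
    nbrs = {i: [a for (j, a) in edges if j == i] for i in dims}
    m_ia = {e: np.eye(dims[e[0]]) for e in edges}  # m_{i->a}, identity at t = 0
    m_ai = {e: np.eye(dims[e[0]]) for e in edges}  # m_{a->i}, identity at t = 0
    for _ in range(iters):
        new_ia, new_ai = {}, {}
        for (i, a) in edges:
            # (91): combine the messages reaching i from every factor except a
            zero_i = np.zeros((dims[i], dims[i]))
            s = sum((mlog(m_ai[(i, c)]) for c in nbrs[i] if c != a), zero_i)
            new_ia[(i, a)] = normalize(herm_fn(s, np.exp))
            # (92): add the other incoming messages to log(rho_a), then trace out all but i
            vs, rho = factors[a]
            fdims = [dims[v] for v in vs]
            h = mlog(rho) + sum((lift(mlog(m_ia[(j, a)]), fdims, p)
                                 for p, j in enumerate(vs) if j != i), np.zeros(rho.shape))
            new_ai[(i, a)] = normalize(ptrace_keep(herm_fn(h, np.exp), fdims, vs.index(i)))
        m_ia, m_ai = new_ia, new_ai
    # beliefs sigma_i proportional to exp(sum over a of log m_{a->i})
    return {i: normalize(herm_fn(sum((mlog(m_ai[(i, a)]) for a in nbrs[i]),
                                     np.zeros((dims[i], dims[i]))), np.exp))
            for i in dims}

# Usage: diagonal (commuting) factors on a tree, where QSPA reduces to the classical SPA.
rng = np.random.default_rng(1)
fa = rng.uniform(0.5, 2.0, (2, 3))
fb, fc = rng.uniform(0.5, 2.0, 2), rng.uniform(0.5, 2.0, 3)
dims = {1: 2, 2: 3}
factors = {"a": ((1, 2), np.diag(fa.ravel())), "b": ((1,), np.diag(fb)), "c": ((2,), np.diag(fc))}
beliefs = qspa(dims, factors)
joint = fa * fb[:, None] * fc[None, :]
print(np.diag(beliefs[1]).real, joint.sum(axis=1) / joint.sum())
```

Because the example factors are diagonal and the graph is a tree, the printed belief diagonal should coincide with the exact marginal; with non-commuting factors the same code runs, but its beliefs inherit the approximation behind (90).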
It may seem that Theorem 16 provides a good lead in solving the constrained Bethe approximation problem, but the situation is more complicated. On the one hand, solving Equations (77) and (78) for the messages $\{m_{i\to a}, m_{a\to i}\}_{(i,a)\in E}$ is intrinsically of the same complexity as the original problem. At the time of this writing, we know of no explicit solution or practical algorithm that solves (77) and (78) exactly, which is also the case for CFGs. Though the QSPA given by Definition 17 provides an iterative method to estimate $\{m_{i\to a}, m_{a\to i}\}_{(i,a)\in E}$, there is no guarantee that this algorithm converges. On the other hand, even if we can find such messages, the obtained density operators $\big((\sigma^*_a)_{a\in F}, (\sigma^*_i)_{i\in V}\big)$ are merely an approximate stationary point of the constrained Bethe approximation problem, which is not necessarily a minimizer, or even an approximate minimizer, of the problem. Despite these concerns, the QSPA still shows rather promising numerical performance, as illustrated in the next section.

IV. NUMERICAL EXAMPLE

In this section, we apply the QSPA to the QFG in Fig. 5 and consider the relative error between the QSPA-based estimate and the true partition sum, i.e.,
$$\eta \triangleq \frac{Z_{\mathrm{QSPA}} - Z}{Z}, \tag{93}$$
where $Z_{\mathrm{QSPA}}$ is calculated via the Bethe free energy function (59) at the estimated density operators $\big((\sigma^*_a)_{a\in F}, (\sigma^*_i)_{i\in V}\big)$ given by (75) and (76). Here, for each test, the factors $\{\rho_a\}_{a=1,\ldots,6}$ are generated randomly in a similar fashion as described in the example depicted in Fig. 4. Fig. 6 plots the statistics of $\eta$ based on $10^4$ simulated cases for two different eigenvalue distributions. Note that the y-axis in Fig. 6 is scaled to match the definition of a density function (i.e., the area under each curve is 1).
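For small QFGs, the true partition sum $Z$ in (93) can be computed by brute force. The sketch below is a hypothetical set-up, not the procedure behind Fig. 6. It assumes $Z = \mathrm{Tr}\big(\bigodot_{a\in F}\rho_a\big) = \mathrm{Tr}\exp\big(\sum_{a\in F}\log\rho_a\big)$, consistent with the definition of $Z$ in (97) when every $\tau_i$ is the identity, and it draws each factor with a Haar-random eigenbasis and i.i.d. eigenvalues from one of the two distributions in the legend of Fig. 6; the exact procedure of Fig. 4 is not reproduced. Because the Bethe free energy (59) lies outside this section, $Z_{\mathrm{QSPA}}$ is treated as an input. The names `random_factor`, `embed`, `exact_partition_sum`, and `relative_error` are illustrative.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the report).
def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.conj().T

def random_factor(d, rng, kind="abs_normal"):
    """Assumed generator: Haar-random eigenbasis and i.i.d. eigenvalues drawn
    from |N(1, 1)| ("abs_normal") or Uniform(0, 1) ("uniform")."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(g)
    q = q * (np.diag(r) / np.abs(np.diag(r)))  # phase correction for the Haar measure
    w = np.abs(rng.normal(1.0, 1.0, d)) if kind == "abs_normal" else rng.uniform(0.0, 1.0, d)
    return (q * w) @ q.conj().T

def embed(op, vs, dims):
    """Embed an operator on the variables `vs` (in that order) into the global space."""
    n = len(dims)
    rest = [k for k in range(n) if k not in vs]
    full = np.kron(op, np.eye(int(np.prod([dims[k] for k in rest]))))
    order = list(vs) + rest                    # current ordering of tensor factors
    t = full.reshape([dims[k] for k in order] * 2)
    perm = [order.index(k) for k in range(n)]  # move factor k back to position k
    size = int(np.prod(dims))
    return t.transpose(perm + [p + n for p in perm]).reshape(size, size)

def exact_partition_sum(dims, factors):
    """Z = Tr exp(sum_a log rho_a), the trace of the product of all factors."""
    h = sum(embed(herm_fn(rho, np.log), vs, dims) for vs, rho in factors)
    return float(np.trace(herm_fn(h, np.exp)).real)

def relative_error(z_est, z):
    """Relative error eta of (93)."""
    return (z_est - z) / z

rng = np.random.default_rng(0)
dims = [2, 2, 2]
factors = [((0, 1), random_factor(4, rng)), ((1, 2), random_factor(4, rng)),
           ((2,), random_factor(2, rng, kind="uniform"))]
Z = exact_partition_sum(dims, factors)
print(Z)
```

Brute force scales with the dimension of the global Hilbert space, so this is only practical for small graphs such as the one in Fig. 5.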

V. CONCLUSIONS AND OUTLOOK

In this report, we have considered the generalization of the closing-the-box notion and the variational approach to a quantum setup known as quantum factor graphs (QFGs). In particular, we justified the closing-the-box operations on QFGs as an approximate method to calculate the quantum partition sum. We also studied the relationship between the Bethe free energy minimization problem and the quantum sum-product algorithm (QSPA). It turns out that the fixed points of the QSPA are approximate interior stationary points of the Bethe free energy minimization problem.


Fig. 6. Relative error of the QSPA partition-sum estimate. [Plot of frequency density versus relative error $\eta$ for two eigenvalue distributions of the factors: $|\mathcal{N}(\mu,\sigma)|$ with $\mu = 1$, $\sigma = 1$, and uniform on $[a, b]$ with $a = 0$, $b = 1$.]

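Currently, we are also considering the generalization of the method of loop calculus to QFGs. As an example, Theorem 7 is also useful in deriving a quantum version of the Holant theorem [22], [23], namely:

Theorem 18 (Holant Theorem for QFGs). Consider a QFG with variable node set $V$, function node set $F$, edge set $E$, and local operators $\rho_a \in \mathcal{L}_{++}\big(\bigotimes_{i\in\partial a}\mathcal{H}_i\big)$ for each $a\in F$ and $\tau_i\in\mathcal{L}_{++}(\mathcal{H}_i)$ for each $i\in V$. Suppose that for each $(i,a)\in E$ there exist some Hilbert spaces $\mathcal{J}_{i,a}$ and $\mathcal{K}_i\,(=\mathcal{H}_i)$, and some operators $\phi_{i,a}\in\mathcal{L}(\mathcal{H}_i\otimes\mathcal{J}_{i,a})$ and $\hat\phi_{i,a}\in\mathcal{L}(\mathcal{J}_{i,a}\otimes\mathcal{K}_i)$ satisfying
$$\mathrm{Tr}_{\mathcal{J}_{i,a}}\big(\phi_{i,a}\odot\hat\phi_{i,a}\big) = \iota_i, \tag{94}$$
where $\iota_i$ is the identification mapping from $\mathcal{H}_i$ to $\mathcal{K}_i$. Furthermore, define the operators
$$\hat\rho_a \triangleq \mathrm{Tr}_{\mathcal{K}_{\partial a}}\Big(\rho_a\odot\bigodot_{i\in\partial a}\hat\phi_{i,a}\Big)\in\mathcal{L}\Big(\bigotimes_{i\in\partial a}\mathcal{J}_{i,a}\Big), \quad\forall a\in F, \tag{95}$$
$$\hat\tau_i \triangleq \mathrm{Tr}_{\mathcal{H}_i}\Big(\tau_i\odot\bigodot_{a\in\partial i}\phi_{i,a}\Big)\in\mathcal{L}\Big(\bigotimes_{a\in\partial i}\mathcal{J}_{i,a}\Big), \quad\forall i\in V, \tag{96}$$
where we treat $\rho_a$ as an operator on $\bigotimes_{i\in\partial a}\mathcal{K}_i$ in (95). In this case, we have
$$Z \triangleq \mathrm{Tr}\Big(\bigodot_{a\in F}\rho_a\odot\bigodot_{i\in V}\tau_i\Big) \approx \mathrm{Tr}\Big(\bigodot_{a\in F}\hat\rho_a\odot\bigodot_{i\in V}\hat\tau_i\Big). \tag{97}$$

The proof of the above theorem is beyond the scope of this report and is omitted. For future work, it would be interesting to generalize more techniques from classical factor graphs to quantum factor graphs in order to better understand the obtained approximations. We would also welcome further studies of practical applications based on our work.

REFERENCES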

[1] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal Process. Mag., vol. 21, no. 1, pp. 28–41, 2004.
[2] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, 2001.
[3] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends® in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[4] M. Molkaraie and H.-A. Loeliger, “Partition function of the Ising model via factor graph duality,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2013, pp. 2304–2308.
[5] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[6] G. D. Forney Jr, “Codes on graphs: Normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, 2001.
[7] A. Al-Bashabsheh, Y. Mao, and P. O. Vontobel, “Normal factor graphs: A diagrammatic approach to linear algebra,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2011, pp. 2178–2182.
[8] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282–2312, 2005.
[9] R. Mori, “Loop calculus for nonbinary alphabets using concepts from information geometry,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1887–1904, 2015.


[10] V. Y. Chernyak and M. Chertkov, “Loop calculus and belief propagation for q-ary alphabet: Loop tower,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2007, pp. 316–320.
[11] M. Leifer and D. Poulin, “Quantum graphical models and belief propagation,” Annals of Physics, vol. 323, no. 8, pp. 1899–1946, 2008.
[12] H.-A. Loeliger and P. O. Vontobel, “Factor graphs for quantum probabilities,” 2015, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1508.00689
[13] ——, “A factor-graph representation of probabilities in quantum mechanics,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2012, pp. 656–660.
[14] R. Mori, “Holographic transformation, belief propagation and loop calculus for generalized probabilistic theories,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2015, pp. 1099–1103. [Online]. Available: http://arxiv.org/abs/1501.04183v2
[15] M. X. Cao and P. O. Vontobel, “Quantum factor graphs: Closing-the-box operation and variational approach,” to appear in Proc. Int. Symp. Inf. Theory Appl. IEICE, 2016.
[16] M. K. Warmuth, “A Bayes rule for density matrices,” in Advances in Neural Information Processing Systems, 2005, pp. 1457–1464.
[17] R. Bhatia, Matrix Analysis. Springer Science & Business Media, 2013, vol. 169.
[18] B. Simon, Functional Integration and Quantum Physics, ser. AMS Chelsea Publishing Series. AMS Chelsea Pub., American Mathematical Society, 2005.
[19] J.-C. Bourin and Y. Seo, “Reverse inequality to Golden–Thompson type inequalities: Comparison of $e^{A+B}$ and $e^Ae^B$,” Linear Algebra and its Applications, vol. 426, no. 2-3, pp. 312–316, 2007.
[20] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th Anniversary ed. UK: Cambridge University Press, 2011.
[21] T. Kato, Perturbation Theory for Linear Operators, 1st ed. New York: Springer Science+Business Media, 1966.
[22] A. Al-Bashabsheh and Y. Mao, “Normal factor graphs and holographic transformations,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 752–763, 2011.
[23] L. G. Valiant, “Holographic algorithms,” SIAM Journal on Computing, vol. 37, no. 5, pp. 1565–1594, 2008.
[24] E. Carlen, “Trace inequalities and quantum entropy: an introductory course,” Entropy and the quantum: Arizona School of Analysis with Applications, vol. 529, pp. 73–140, 2010.

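APPENDIX A
PROOF OF THEOREM 7 AND COROLLARY 8

Proof of Theorem 7. We justify the theorem by Taylor series expansion. Note that, for a Hermitian matrix $V$ and a Hermitian matrix $W$ with spectral radius less than 1, we have
$$\exp(V) = I + V + \frac{1}{2}V^2 + \frac{1}{3!}V^3 + \cdots + \frac{1}{n!}V^n + \cdots, \tag{98}$$
$$\log(I+W) = W - \frac{1}{2}W^2 + \frac{1}{3}W^3 - \cdots + \frac{(-1)^{n-1}}{n}W^n + \cdots. \tag{99}$$
To simplify the proof, we introduce the normalized (partial) trace functions
$$\overline{\mathrm{Tr}}(A) \triangleq \frac{\mathrm{Tr}(A)}{\mathrm{Tr}(I)}, \quad \overline{\mathrm{Tr}}_i(A) \triangleq \frac{\mathrm{Tr}_i(A)}{\mathrm{Tr}_i(I)}, \quad \overline{\mathrm{Tr}}_{\partial a}(A) \triangleq \frac{\mathrm{Tr}_{\partial a}(A)}{\mathrm{Tr}_{\partial a}(I)}. \tag{100}$$
With this notation, it is straightforward to see that (31) can be rewritten as
$$\overline{\mathrm{Tr}}\big((I+tX)\odot(I+tY)\big) = \overline{\mathrm{Tr}}_1\big((I+tX)\odot\overline{\mathrm{Tr}}_2(I+tY)\big) + O(t^4). \tag{101}$$
Now consider the Taylor series expansions of both sides of (101). Let $\tilde X \triangleq X\otimes I \in \mathcal{L}_{\mathrm{H}}(\mathcal{H}_1\otimes\mathcal{H}_2)$. We have
$$\begin{aligned}
&\overline{\mathrm{Tr}}\big((I+tX)\odot(I+tY)\big) \\
&\quad= \overline{\mathrm{Tr}}(I) + t\cdot\overline{\mathrm{Tr}}(\tilde X+Y) + t^2\cdot\frac{\overline{\mathrm{Tr}}(\tilde XY+Y\tilde X)}{2} + t^3\cdot\overline{\mathrm{Tr}}\Big(\frac{2\tilde XY\tilde X + 2Y\tilde XY - \tilde X^2Y - Y^2\tilde X - \tilde XY^2 - Y\tilde X^2}{12}\Big) \\
&\qquad + t^4\cdot\overline{\mathrm{Tr}}\Big(\frac{\tilde XY\tilde XY + Y\tilde XY\tilde X + \tilde X^3Y + Y^3\tilde X + \tilde XY^3 + Y\tilde X^3 - \tilde X^2Y\tilde X - \tilde XY^2\tilde X - \tilde XY\tilde X^2 - Y^2\tilde XY - Y\tilde X^2Y - Y\tilde XY^2}{24}\Big) + O(t^5) \\
&\quad= 1 + t\cdot\overline{\mathrm{Tr}}(\tilde X+Y) + t^2\cdot\frac{\overline{\mathrm{Tr}}(\tilde XY+Y\tilde X)}{2} + t^4\cdot\frac{\overline{\mathrm{Tr}}\big(\tilde XY\tilde XY - \tilde X^2Y^2\big)}{12} + O(t^5);
\end{aligned}\tag{102}$$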


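$$\begin{aligned}
&\overline{\mathrm{Tr}}_1\big((I+tX)\odot\overline{\mathrm{Tr}}_2(I+tY)\big) = \overline{\mathrm{Tr}}_1\big((I+tX)\odot(I+t\,\overline{\mathrm{Tr}}_2(Y))\big) \\
&\quad= \overline{\mathrm{Tr}}_1(I) + t\cdot\overline{\mathrm{Tr}}_1\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + t^2\cdot\overline{\mathrm{Tr}}_1\Big(\frac{X\,\overline{\mathrm{Tr}}_2(Y)+\overline{\mathrm{Tr}}_2(Y)\,X}{2}\Big) \\
&\qquad + t^3\cdot\overline{\mathrm{Tr}}_1\Big(\frac{2X\overline{\mathrm{Tr}}_2(Y)X + 2\overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y) - X^2\overline{\mathrm{Tr}}_2(Y) - \overline{\mathrm{Tr}}_2(Y)^2X - X\overline{\mathrm{Tr}}_2(Y)^2 - \overline{\mathrm{Tr}}_2(Y)X^2}{12}\Big) \\
&\qquad + t^4\cdot\overline{\mathrm{Tr}}_1\Big(\frac{X\overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y)X + X^3\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)^3X + X\overline{\mathrm{Tr}}_2(Y)^3 + \overline{\mathrm{Tr}}_2(Y)X^3}{24} \\
&\qquad\qquad - \frac{X^2\overline{\mathrm{Tr}}_2(Y)X + X\overline{\mathrm{Tr}}_2(Y)^2X + X\overline{\mathrm{Tr}}_2(Y)X^2 + \overline{\mathrm{Tr}}_2(Y)^2X\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)X^2\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y)^2}{24}\Big) + O(t^5) \\
&\quad= 1 + t\cdot\overline{\mathrm{Tr}}_1\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + t^2\cdot\overline{\mathrm{Tr}}_1\Big(\frac{X\,\overline{\mathrm{Tr}}_2(Y)+\overline{\mathrm{Tr}}_2(Y)\,X}{2}\Big) + t^4\cdot\frac{\overline{\mathrm{Tr}}_1\big(X\overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y) - X^2\overline{\mathrm{Tr}}_2(Y)^2\big)}{12} + O(t^5).
\end{aligned}\tag{103}$$
Notice that $\overline{\mathrm{Tr}}_2(\tilde X\cdot Z) = X\cdot\overline{\mathrm{Tr}}_2(Z)$ for any operator $Z\in\mathcal{L}_{\mathrm{H}}(\mathcal{H}_1\otimes\mathcal{H}_2)$. Thus, one can easily check that (102) and (103) agree up to $t^3$. (Note that the coefficients of $t^3$ are 0 in both (102) and (103).) For matrices $V, W$, denote $[V,W]\triangleq VW - WV$. Using this notation, we have
$$\overline{\mathrm{Tr}}\big((I+tX)\odot(I+tY)\big) = \overline{\mathrm{Tr}}_1\big((I+tX)\odot\overline{\mathrm{Tr}}_2(I+tY)\big) + \frac{\overline{\mathrm{Tr}}_1\big(X\cdot[X,\overline{\mathrm{Tr}}_2(Y)]\cdot\overline{\mathrm{Tr}}_2(Y)\big) - \overline{\mathrm{Tr}}\big(\tilde X\cdot[\tilde X,Y]\cdot Y\big)}{12}\cdot t^4 + O(t^5), \tag{104}$$
which justifies (101), and therefore also justifies (31).

Proof of Corollary 8. Applying the same method used to derive (102) and (103) in the above proof, we obtain the following Taylor series expansions:
$$\begin{aligned}
\overline{\mathrm{Tr}}_2\big((I+tX)\odot(I+tY)\big) &= I + t\cdot\overline{\mathrm{Tr}}_2(\tilde X+Y) + t^2\cdot\frac{\overline{\mathrm{Tr}}_2(\tilde XY+Y\tilde X)}{2} \\
&\quad + t^3\cdot\overline{\mathrm{Tr}}_2\Big(\frac{2\tilde XY\tilde X + 2Y\tilde XY - \tilde X^2Y - Y^2\tilde X - \tilde XY^2 - Y\tilde X^2}{12}\Big) + O(t^4),
\end{aligned}\tag{105}$$
$$\begin{aligned}
(I+tX)\odot\overline{\mathrm{Tr}}_2(I+tY) &= I + t\cdot\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + t^2\cdot\frac{X\overline{\mathrm{Tr}}_2(Y)+\overline{\mathrm{Tr}}_2(Y)X}{2} \\
&\quad + t^3\cdot\frac{2X\overline{\mathrm{Tr}}_2(Y)X + 2\overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y) - X^2\overline{\mathrm{Tr}}_2(Y) - \overline{\mathrm{Tr}}_2(Y)^2X - X\overline{\mathrm{Tr}}_2(Y)^2 - \overline{\mathrm{Tr}}_2(Y)X^2}{12} + O(t^4).
\end{aligned}\tag{106}$$
Combining (105) and (106), we have
$$\begin{aligned}
\overline{\mathrm{Tr}}_2\big((I+tX)\odot(I+tY)\big) &= (I+tX)\odot\overline{\mathrm{Tr}}_2(I+tY) \\
&\quad + t^3\cdot\frac{2\big(\overline{\mathrm{Tr}}_2(Y\tilde XY) - \overline{\mathrm{Tr}}_2(Y)X\overline{\mathrm{Tr}}_2(Y)\big) + \big(\overline{\mathrm{Tr}}_2(Y)^2 - \overline{\mathrm{Tr}}_2(Y^2)\big)X + X\big(\overline{\mathrm{Tr}}_2(Y)^2 - \overline{\mathrm{Tr}}_2(Y^2)\big)}{12} + O(t^4),
\end{aligned}\tag{107}$$
which proves Corollary 8.

APPENDIX B
APPROXIMATION OF $\mathrm{Tr}\big(e^{tX}\odot e^{tY}\big)$

In this appendix, we consider the relationship between $\mathrm{Tr}\big(e^{tX}\odot e^{tY}\big)$ and $\mathrm{Tr}_1\big(e^{tX}\odot\mathrm{Tr}_2(e^{tY})\big)$. The results we present here are similar to those of Theorem 7 and Corollary 8.

Theorem 19. Consider finite-dimensional Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$. Given $X\in\mathcal{L}_{\mathrm{H}}(\mathcal{H}_1)$ and $Y\in\mathcal{L}_{\mathrm{H}}(\mathcal{H}_1\otimes\mathcal{H}_2)$, it holds that
$$\mathrm{Tr}\big(e^{tX}\odot e^{tY}\big) = \mathrm{Tr}_1\big(e^{tX}\odot\mathrm{Tr}_2(e^{tY})\big) + O(t^4), \tag{108}$$
where the real number $t$ is in a neighborhood of 0.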
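The orders of approximation in (101) and (108) can be checked numerically on random instances. The sketch below is our illustration, not part of the report. It assumes $A\odot B = \exp(\log A + \log B)$, which is consistent with the low-order terms of (102). It draws random Hermitian $X$ and $Y$ with $\dim\mathcal{H}_1 = 2$ and $\dim\mathcal{H}_2 = 3$ (an arbitrary choice), and it estimates the $t^4$ coefficient of the gap in (101) by Richardson extrapolation. That estimate is compared with $\big(\overline{\mathrm{Tr}}_1(X[X,\overline{\mathrm{Tr}}_2(Y)]\overline{\mathrm{Tr}}_2(Y)) - \overline{\mathrm{Tr}}(\tilde X[\tilde X,Y]Y)\big)/12$, the value obtained by subtracting the $t^4$ terms of (103) from those of (102). For (108), the code uses the normalized form and checks that halving $t$ shrinks the gap by roughly $2^4 = 16$. All function names are illustrative.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the report).
def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.conj().T

def odot(a, b):
    """Assumed product A ⊙ B = exp(log A + log B) for positive definite A, B."""
    return herm_fn(herm_fn(a, np.log) + herm_fn(b, np.log), np.exp)

def ntr(m):
    """Normalized trace Tr(m) / Tr(I)."""
    return float((np.trace(m) / m.shape[0]).real)

def ntr2(m, d1, d2):
    """Normalized partial trace over H2 of an operator on H1 ⊗ H2."""
    return np.trace(m.reshape(d1, d2, d1, d2), axis1=1, axis2=3) / d2

def rand_herm(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (g + g.conj().T) / 2
    return h / np.linalg.norm(h, 2)  # operator norm 1, so I + tX stays positive

def comm(a, b):
    return a @ b - b @ a

d1, d2 = 2, 3
rng = np.random.default_rng(0)
X, Y = rand_herm(d1, rng), rand_herm(d1 * d2, rng)
I1, I12 = np.eye(d1), np.eye(d1 * d2)
Xt = np.kron(X, np.eye(d2))  # X ⊗ I
T = ntr2(Y, d1, d2)          # normalized Tr_2(Y)

def gap_101(t):
    """Left side minus right side of (101)."""
    lhs = ntr(odot(np.kron(I1 + t * X, np.eye(d2)), I12 + t * Y))
    rhs = ntr(odot(I1 + t * X, ntr2(I12 + t * Y, d1, d2)))
    return lhs - rhs

def gap_108(t):
    """Left side minus right side of (108), written with normalized traces."""
    lhs = ntr(herm_fn(t * (Xt + Y), np.exp))
    rhs = ntr(odot(herm_fn(t * X, np.exp), ntr2(herm_fn(t * Y, np.exp), d1, d2)))
    return lhs - rhs

# t^4 coefficient predicted from the t^4 terms of (102) and (103)
coef = (ntr(X @ comm(X, T) @ T) - ntr(Xt @ comm(Xt, Y) @ Y)) / 12
c_big, c_small = gap_101(0.02) / 0.02**4, gap_101(0.01) / 0.01**4
coef_est = 2 * c_small - c_big          # Richardson step cancels the O(t) error term
ratio = gap_108(0.01) / gap_108(0.005)  # close to 2**4 = 16 if the gap is O(t^4)
print(coef, coef_est, ratio)
```

A ratio near 8 instead of 16 would indicate an $O(t^3)$ gap, so the last check distinguishes the claimed order from the next-weaker one.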

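Proof. We again apply Taylor series expansions to prove this theorem. Similar to the proof of Theorem 7, we can rewrite (108) in terms of the normalized traces (100) as
$$\overline{\mathrm{Tr}}\big(e^{tX}\odot e^{tY}\big) = \overline{\mathrm{Tr}}_1\big(e^{tX}\odot\overline{\mathrm{Tr}}_2(e^{tY})\big) + O(t^4). \tag{109}$$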


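Again, consider the Taylor series expansions of both sides of (109), and let $\tilde X\triangleq X\otimes I\in\mathcal{L}_{\mathrm{H}}(\mathcal{H}_1\otimes\mathcal{H}_2)$. We have
$$\begin{aligned}
\overline{\mathrm{Tr}}\big(e^{tX}\odot e^{tY}\big) &= \overline{\mathrm{Tr}}\Big(I + t\cdot(\tilde X+Y) + \frac{1}{2!}t^2\cdot(\tilde X+Y)^2 + \frac{1}{3!}t^3\cdot(\tilde X+Y)^3 + \cdots\Big) \\
&= 1 + t\cdot\overline{\mathrm{Tr}}(\tilde X+Y) + \frac{1}{2!}t^2\cdot\overline{\mathrm{Tr}}\big((\tilde X+Y)^2\big) + \frac{1}{3!}t^3\cdot\overline{\mathrm{Tr}}\big((\tilde X+Y)^3\big) + \frac{1}{4!}t^4\cdot\overline{\mathrm{Tr}}\big((\tilde X+Y)^4\big) + O(t^5).
\end{aligned}\tag{110}$$
Moreover, writing $S \triangleq t\cdot\overline{\mathrm{Tr}}_2(Y) + \frac{t^2}{2!}\cdot\overline{\mathrm{Tr}}_2(Y^2) + \frac{t^3}{3!}\cdot\overline{\mathrm{Tr}}_2(Y^3) + \frac{t^4}{4!}\cdot\overline{\mathrm{Tr}}_2(Y^4) + \cdots$, so that $\overline{\mathrm{Tr}}_2(e^{tY}) = I + S$, we have
$$\begin{aligned}
\overline{\mathrm{Tr}}_1\big(e^{tX}\odot\overline{\mathrm{Tr}}_2(e^{tY})\big) &= \overline{\mathrm{Tr}}_1\Big(\exp\big(t\cdot X + \log(I + S)\big)\Big) = \overline{\mathrm{Tr}}_1\Big(\exp\Big(t\cdot X + S - \frac{1}{2}S^2 + \frac{1}{3}S^3 - \frac{1}{4}S^4 + \cdots\Big)\Big) \\
&= \overline{\mathrm{Tr}}_1\Big(\exp\Big(t\cdot\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + \frac{t^2}{2}\cdot\big(\overline{\mathrm{Tr}}_2(Y^2) - \overline{\mathrm{Tr}}_2(Y)^2\big) \\
&\qquad + \frac{t^3}{12}\cdot\big(2\,\overline{\mathrm{Tr}}_2(Y^3) - 3\,\overline{\mathrm{Tr}}_2(Y)\,\overline{\mathrm{Tr}}_2(Y^2) - 3\,\overline{\mathrm{Tr}}_2(Y^2)\,\overline{\mathrm{Tr}}_2(Y) + 4\,\overline{\mathrm{Tr}}_2(Y)^3\big) \\
&\qquad + \frac{t^4}{24}\cdot\big(\overline{\mathrm{Tr}}_2(Y^4) - 3\,\overline{\mathrm{Tr}}_2(Y^2)^2 - 2\,\overline{\mathrm{Tr}}_2(Y)\,\overline{\mathrm{Tr}}_2(Y^3) - 2\,\overline{\mathrm{Tr}}_2(Y^3)\,\overline{\mathrm{Tr}}_2(Y) \\
&\qquad\qquad + 4\,\overline{\mathrm{Tr}}_2(Y)^2\,\overline{\mathrm{Tr}}_2(Y^2) + 4\,\overline{\mathrm{Tr}}_2(Y)\,\overline{\mathrm{Tr}}_2(Y^2)\,\overline{\mathrm{Tr}}_2(Y) + 4\,\overline{\mathrm{Tr}}_2(Y^2)\,\overline{\mathrm{Tr}}_2(Y)^2 - 6\,\overline{\mathrm{Tr}}_2(Y)^4\big) + \cdots\Big)\Big) \\
&= 1 + t\cdot\overline{\mathrm{Tr}}_1\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + \frac{t^2}{2}\cdot\overline{\mathrm{Tr}}_1\big(X^2 + X\,\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)\,X + \overline{\mathrm{Tr}}_2(Y^2)\big) \\
&\quad + \frac{t^3}{6}\cdot\overline{\mathrm{Tr}}_1\big(X^3 + 3\cdot X^2\cdot\overline{\mathrm{Tr}}_2(Y) + 3\cdot X\cdot\overline{\mathrm{Tr}}_2(Y^2) + \overline{\mathrm{Tr}}_2(Y^3)\big) + O(t^4).
\end{aligned}\tag{111}$$
Notice that (110) and (111) agree up to $t^3$. Thus, (109) is justified, and so is (108). We can also prove the following corollary in a similar way.

Corollary 20. Given the same setup as in Theorem 19, we have
$$\mathrm{Tr}_2\big(e^{tX}\odot e^{tY}\big) = e^{tX}\odot\mathrm{Tr}_2(e^{tY}) + O(t^3). \tag{112}$$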

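Proof. Based on the proof of Theorem 19, and again working with the normalized partial trace, we have
$$\overline{\mathrm{Tr}}_2\big(e^{tX}\odot e^{tY}\big) = I + t\cdot\overline{\mathrm{Tr}}_2(\tilde X+Y) + \frac{1}{2!}t^2\cdot\overline{\mathrm{Tr}}_2\big((\tilde X+Y)^2\big) + \frac{1}{3!}t^3\cdot\overline{\mathrm{Tr}}_2\big((\tilde X+Y)^3\big) + O(t^4), \tag{113}$$
$$e^{tX}\odot\overline{\mathrm{Tr}}_2(e^{tY}) = I + t\cdot\big(X+\overline{\mathrm{Tr}}_2(Y)\big) + \frac{t^2}{2}\cdot\big(X^2 + X\,\overline{\mathrm{Tr}}_2(Y) + \overline{\mathrm{Tr}}_2(Y)\,X + \overline{\mathrm{Tr}}_2(Y^2)\big) + O(t^3), \tag{114}$$
which justifies the corollary, since the above two expressions agree up to $t^2$.

APPENDIX C
QUANTUM EXPONENTIAL FAMILY

Definition 21 (Quantum Exponential Family [14]). Similar to classical exponential families, a quantum exponential family (of degree $d$) is a parametric family of quantum operators of the form
$$\rho_\theta \triangleq \exp\Big(\sum_{k=1}^d\theta_k\cdot T^k - \Psi(\theta)\Big) \tag{115}$$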

for natural parameter $\theta$ in some open subset $\Theta\subseteq\mathbb{R}^d$, where $T^k\in\mathcal{L}_{\mathrm{H}}(\mathcal{H}_{\partial k}) = \mathcal{L}_{\mathrm{H}}\big(\bigotimes_{i\in\partial k}\mathcal{H}_i\big)$ are some given Hermitian operators ($\partial k\subseteq\{1,\ldots,N\}$), and the conventions of (16) are applied to the summation in (115). Moreover,
$$\Psi(\theta)\triangleq\log\Big(\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k\Big)\Big)\Big). \tag{116}$$
As a result, $\rho_\theta$ is a density operator on the global Hilbert space $\mathcal{H}_{\cup_k\partial k}$.

Note that if $\{T^k\}_{k=1}^d$ is linearly independent, then the mapping $\theta\mapsto\rho_\theta$ is injective. In this report, we always assume $\{T^k\}_{k=1}^d$ to be linearly independent. In this case, the (strict) convexity of the function $\Psi$ follows naturally from the (strict) convexity of the exponential function.³

Example 22. As an example, consider the following quantum exponential family:
$$\sigma_\theta = \exp\Big(\sum_{a\in F}\sum_k\theta^{(a)}_k\cdot T^{(a)}_k - \Psi(\theta)\Big), \tag{117}$$
where $\theta\in\mathbb{R}^d$ and $\{T^{(a)}_k\}_k$ form a basis of $\mathcal{L}_{\mathrm{H}}(\mathcal{H}_{\partial a})$. Comparing (117) with (17), it is straightforward to see that the parametrization $\sigma_\theta$ covers all the density operators that can be decomposed as
$$\sigma\propto\bigodot_{a\in F}\sigma_a = \exp\Big(\sum_{a\in F}\log(\sigma_a)\Big). \tag{118}$$

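Definition 23. The dual parameter $\eta = (\eta_l)_{l=1}^d$ (w.r.t. $\theta$) of a quantum exponential family is defined as
$$\begin{aligned}
\eta_l = \frac{\partial}{\partial\theta_l}\Psi(\theta) &= \frac{\partial}{\partial\theta_l}\log\Big(\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_kT^k\Big)\Big)\Big) \\
&= \Big(\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_kT^k\Big)\Big)\Big)^{-1}\frac{\partial}{\partial\theta_l}\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_kT^k\Big)\Big) \\
&= \Big(\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_kT^k\Big)\Big)\Big)^{-1}\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_kT^k\Big)\cdot T^l\Big)
\end{aligned}\tag{119}$$
$$= \mathrm{Tr}(\rho_\theta\cdot T^l) = \langle\rho_\theta, T^l\rangle, \quad\forall l = 1, 2, \ldots, d, \tag{120}$$
where (119) was obtained by applying first-order perturbation theory [21].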
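Equation (120) states that the dual parameters are the gradient of the log-partition function (116). The sketch below checks this numerically. It is our illustration rather than part of the report: the operators $T^k$ are taken to be the two-qubit Pauli products $Z\otimes I$, $I\otimes Z$, $X\otimes X$, and $Y\otimes Z$, the value of $\theta$ is arbitrary, the gradient of $\Psi$ is approximated by central differences, and the function names `psi`, `rho`, and `dual` are hypothetical.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the report).
def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.conj().T

def psi(theta, Ts):
    """Log-partition function (116), computed stably from the eigenvalues."""
    w = np.linalg.eigvalsh(sum(t * T for t, T in zip(theta, Ts)))
    return float(w.max() + np.log(np.exp(w - w.max()).sum()))

def rho(theta, Ts):
    """Density operator rho_theta of (115)."""
    e = herm_fn(sum(t * T for t, T in zip(theta, Ts)), np.exp)
    return e / np.trace(e).real

def dual(theta, Ts):
    """Dual parameters (120): eta_l = Tr(rho_theta T^l)."""
    r = rho(theta, Ts)
    return np.array([np.trace(r @ T).real for T in Ts])

I2 = np.eye(2)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
Ts = [np.kron(PZ, I2), np.kron(I2, PZ), np.kron(PX, PX), np.kron(PY, PZ)]

theta = np.array([0.3, -0.5, 0.7, 0.2])
h = 1e-6
grad_fd = np.array([(psi(theta + h * e, Ts) - psi(theta - h * e, Ts)) / (2 * h)
                    for e in np.eye(len(Ts))])
eta = dual(theta, Ts)
print(np.max(np.abs(grad_fd - eta)))
```

The chosen $T^k$ are traceless and mutually orthogonal in the Hilbert–Schmidt inner product, so they are linearly independent, as the report assumes.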

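Due to the strict convexity of $\Psi$, the mapping $\eta(\theta):\theta\mapsto(\langle\rho_\theta,T^k\rangle)_k$ is always injective. On the other hand, by considering the conjugate function of $\Psi$ (which is also strictly convex), given by
$$\Phi(\eta)\triangleq\sup_\theta\Big(\sum_{k=1}^d\theta_k\eta_k - \Psi(\theta)\Big), \tag{121}$$
the inverse mapping can be written as
$$\theta(\eta):\eta\mapsto\Big(\frac{\partial}{\partial\eta_k}\Phi(\eta)\Big)_k. \tag{122}$$
In other words, the correspondence between the natural parameters and the dual parameters is bijective (one-to-one).

Example 24. We continue Example 22. In this case, the dual variables can be written as
$$\eta^{(a)}_k = \mathrm{Tr}\big(\sigma_\theta\cdot T^{(a)}_k\big) = \mathrm{Tr}_{\partial a}\Big(\mathrm{Tr}_{V\setminus\partial a}\big(\sigma_\theta\cdot T^{(a)}_k\big)\Big) = \mathrm{Tr}\big(\sigma_a\cdot T^{(a)}_k\big), \tag{123}$$
where $\sigma_a = \mathrm{Tr}_{V\setminus\partial a}(\sigma_\theta)$. Since $\{T^{(a)}_k\}_k$ is a basis of $\mathcal{L}_{\mathrm{H}}(\mathcal{H}_{\partial a})$, (123) establishes an injection from $\sigma_a$ to $\eta^{(a)}$. In other words, the marginal densities fix the global density if such a global density exists.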
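The inverse mapping (122) can also be evaluated numerically. The supremum in (121) is attained where $\nabla\Psi(\theta) = \eta$. Given the gradient (120), the steps $\theta \leftarrow \theta - s\,(\eta(\theta) - \eta)$ are therefore gradient ascent on $\sum_k\theta_k\eta_k - \Psi(\theta)$, and they recover $\theta$ from $\eta$. The sketch below is our illustration on a single qubit, with $T^1, T^2, T^3$ the Pauli matrices. The step size $s = 1$ and the iteration count are assumptions chosen for this example (here the Hessian of $\Psi$ has eigenvalues at most 1), and the function names are hypothetical. For this family, $\eta(\theta) = \tanh(\|\theta\|)\,\theta/\|\theta\|$, which provides an independent check.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the report).
def herm_fn(a, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fn(w)) @ v.conj().T

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def dual(theta, Ts=PAULI):
    """eta(theta) from (120): eta_k = Tr(rho_theta T^k)."""
    e = herm_fn(sum(t * T for t, T in zip(theta, Ts)), np.exp)
    r = e / np.trace(e).real
    return np.array([np.trace(r @ T).real for T in Ts])

def natural_from_dual(eta, Ts=PAULI, step=1.0, iters=200):
    """theta(eta) of (122): the maximizer in (121), found by gradient ascent on
    theta . eta - Psi(theta), whose gradient is eta - eta(theta)."""
    theta = np.zeros(len(Ts))
    for _ in range(iters):
        theta = theta - step * (dual(theta, Ts) - eta)
    return theta

theta_true = np.array([0.3, -0.2, 0.5])
eta = dual(theta_true)
theta_rec = natural_from_dual(eta)
print(theta_rec)
```

For larger $\|\theta\|$, the Hessian of $\Psi$ becomes small along $\theta$ and plain gradient steps slow down; a Newton step using the Hessian would then be a natural replacement.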

³ This can be derived easily using standard results on trace functions; see, e.g., [24].

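APPENDIX D
JUSTIFICATION OF THE DIFFERENTIABILITY OF $f(\eta)\triangleq-\mathrm{Tr}\big(\sigma\cdot\log\rho(\eta)\big)$

First, we verify that the bijective mapping $\eta:\theta\mapsto(\langle\rho_\theta,T^i\rangle)_i$ is differentiable. Notice that
$$\frac{\partial\eta_i}{\partial\theta_j} = \frac{\partial\,\mathrm{Tr}(\rho_\theta\cdot T^i)}{\partial\theta_j} = \left.\frac{d}{dt}\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j - \Psi(\theta + t\cdot e_j)\Big)\cdot T^i\Big)\right|_{t=0} \tag{124}$$
$$= \left.\frac{d}{dt}\,\frac{\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\Big)\cdot T^i\Big)}{\exp\big(\Psi(\theta + t\cdot e_j)\big)}\right|_{t=0}. \tag{125}$$
Since the function $\exp\circ\Psi$ is differentiable, it suffices to justify the differentiability of $t\mapsto\mathrm{Tr}\big(\exp\big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\big)\cdot T^i\big)$ at $t = 0$. By Taylor series expansion, we can write
$$\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\Big)\cdot T^i\Big) - \mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k\Big)\cdot T^i\Big) \tag{126}$$
$$= \mathrm{Tr}\Big(\sum_{n=0}^\infty\frac{\big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\big)^n}{n!}\cdot T^i\Big) - \mathrm{Tr}\Big(\sum_{n=0}^\infty\frac{\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^n}{n!}\cdot T^i\Big) \tag{127}$$
$$= \mathrm{Tr}\Big(\sum_{n=0}^\infty\frac{\big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\big)^n - \big(\sum_{k=1}^d\theta_k\cdot T^k\big)^n}{n!}\cdot T^i\Big) \tag{128}$$
$$= \mathrm{Tr}\Big(\sum_{n=1}^\infty\frac{t\cdot\Big(\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^{n-1}\cdot T^j + \cdots + T^j\cdot\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^{n-1}\Big) + O(t^2)}{n!}\cdot T^i\Big) \tag{129}$$
$$= t\cdot\sum_{n=1}^\infty\sum_{l=0}^{n-1}\frac{\mathrm{Tr}\Big(\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^{n-1-l}\cdot T^j\cdot\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^l\cdot T^i\Big)}{n!} + O(t^2). \tag{130}$$
Notice that
$$\sum_{n=1}^\infty\sum_{l=0}^{n-1}\frac{\Big|\mathrm{Tr}\Big(\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^{n-1-l}\cdot T^j\cdot\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^l\cdot T^i\Big)\Big|}{n!} \le \sum_{n=1}^\infty\frac{\dim\cdot\big\|\sum_{k=1}^d\theta_k\cdot T^k\big\|^{n-1}\cdot\|T^i\|\cdot\|T^j\|}{(n-1)!} \tag{131}$$
$$= \dim\cdot\exp\Big(\Big\|\sum_{k=1}^d\theta_k\cdot T^k\Big\|\Big)\cdot\|T^i\|\cdot\|T^j\|, \tag{132}$$
where, for any matrix $A$, $\|A\|$ stands for the operator norm of $A$ (which equals the spectral radius when $A$ is Hermitian, as are all the matrices above), and $\dim$ is the dimension of the global Hilbert space of the problem. Here, we apply the following inequality in deriving (131):
$$|\mathrm{Tr}(A_1\cdot A_2\cdots A_m)|\le\dim\mathcal{H}\cdot\|A_1\|\cdot\|A_2\|\cdots\|A_m\|, \quad\forall A_1, A_2, \ldots, A_m\in\mathcal{L}(\mathcal{H}),\ \forall m\in\mathbb{Z}_+. \tag{133}$$
Equation (132) implies that the series
$$\sum_{n=1}^\infty\sum_{l=0}^{n-1}\frac{\mathrm{Tr}\Big(\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^{n-1-l}\cdot T^j\cdot\big(\sum_{k=1}^d\theta_k\cdot T^k\big)^l\cdot T^i\Big)}{n!}$$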

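is absolutely convergent, and thus convergent. Therefore, the following limit exists:
$$\lim_{t\to 0}\frac{\mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\Big)\cdot T^i\Big) - \mathrm{Tr}\Big(\exp\Big(\sum_{k=1}^d\theta_k\cdot T^k\Big)\cdot T^i\Big)}{t}, \tag{134}$$
which justifies the differentiability of $t\mapsto\mathrm{Tr}\big(\exp\big(\sum_{k=1}^d\theta_k\cdot T^k + t\cdot T^j\big)\cdot T^i\big)$ at $t = 0$, and hence the differentiability of $\eta:\theta\mapsto(\langle\rho_\theta,T^i\rangle)_i$.

Second, we consider the function $\hat f(\theta)\triangleq-\mathrm{Tr}\big(\sigma\cdot\log\rho(\theta)\big)$, where $\rho(\theta)$ is the density operator defined by (115). By rewriting $\hat f$ as (note that $\mathrm{Tr}(\sigma) = 1$)
$$\hat f(\theta + h\cdot e_j) = -\mathrm{Tr}\Big(\sigma\cdot\Big(\sum_{k=1}^d\theta_k\cdot T^k + h\cdot T^j - \Psi(\theta + h\cdot e_j)\cdot I\Big)\Big) \tag{135}$$
$$= -\mathrm{Tr}\Big(\sigma\cdot\sum_{k=1}^d\theta_k\cdot T^k\Big) - h\cdot\mathrm{Tr}(\sigma\cdot T^j) + \Psi(\theta + h\cdot e_j), \tag{136}$$
it is then straightforward to verify the differentiability of $\hat f$. Now, note that $f(\eta) = \hat f(\theta(\eta))$. The mapping $\theta\mapsto\eta$ is differentiable, and its Jacobian is the Hessian of $\Psi$; assuming this Hessian is nonsingular, the inverse function theorem implies that the inverse mapping $\eta\mapsto\theta$ is also differentiable. Therefore, the differentiability of $f$ follows immediately from the differentiability of $\hat f$.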
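The first-order coefficient in (130) and the bound (132) can be checked numerically. The sketch below is our illustration rather than part of the report. It uses random Hermitian matrices `A`, `B`, and `C` in place of $\sum_k\theta_kT^k$, the operator along whose direction $\theta$ is perturbed, and the observable $T^i$, respectively. It truncates the series at 40 terms (an arbitrary choice, ample for $\|A\| = 1.5$), compares the result with a central finite difference of $t\mapsto\mathrm{Tr}(\exp(A + tB)\cdot C)$, and evaluates the right-hand side of (132) with the operator norm, which equals the spectral radius for these Hermitian matrices. The helper names are hypothetical.

```python
import numpy as np
from math import factorial

# Illustrative helpers (hypothetical names, not from the report).
def herm_exp(a):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.conj().T

def rand_herm(d, norm, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (g + g.conj().T) / 2
    return norm * h / np.linalg.norm(h, 2)  # rescale to the given operator norm

def series_130(A, B, C, nmax=40):
    """Truncated coefficient of t in (130): sum_n sum_l Tr(A^(n-1-l) B A^l C) / n!."""
    pows = [np.eye(A.shape[0], dtype=complex)]
    for _ in range(nmax):
        pows.append(pows[-1] @ A)
    total = 0.0
    for n in range(1, nmax + 1):
        for l in range(n):
            total += np.trace(pows[n - 1 - l] @ B @ pows[l] @ C).real / factorial(n)
    return total

def f(t):
    return np.trace(herm_exp(A + t * B) @ C).real

rng = np.random.default_rng(3)
d = 4
A, B, C = rand_herm(d, 1.5, rng), rand_herm(d, 1.0, rng), rand_herm(d, 1.0, rng)
h = 1e-5
fd = (f(h) - f(-h)) / (2 * h)  # central finite difference of f at t = 0
series = series_130(A, B, C)
bound = d * np.exp(np.linalg.norm(A, 2)) * np.linalg.norm(B, 2) * np.linalg.norm(C, 2)
print(series, fd, bound)
```

The real part is taken term by term because the full sum is real (it is the derivative of a trace of a product of Hermitian matrices), even though individual terms need not be.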