Kantorovich optimal transport problem and Shannon's optimal channel - PowerPoint PPT Presentation



slide-1
SLIDE 1

Kantorovich optimal transport problem and Shannon’s optimal channel problem

Roman V. Belavkin

School of Science and Technology Middlesex University, London NW4 4BT, UK

13 June 2016

In honor of Shun-ichi Amari

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 1 / 30

slide-2
SLIDE 2

Optimal transportation problems (OTPs)
Information and entropy
Optimal channel problem (OCP)
Geometry of information divergence and optimization
Dynamical OTP: Optimization of evolution


slide-3
SLIDE 3

Optimal transportation problems (OTPs)



slide-6
SLIDE 6

Optimal transportation problems (OTPs)

Kantorovich’s OTP

Optimal Transportation Problem (Kantorovich, 1939, 1942; Vasershtein, 1969; Dobrushin, 1970)

Kc(q, p) ∶= inf { ∫X×Y c(x, y) dw ∶ 휋Xw = q, 휋Yw = p }
where c ∶ X × Y → ℝ is a cost function (e.g. a metric), and the feasible couplings are
Γ(q, p) ∶= { w ∈ 풫(X ⊗ Y) ∶ 휋Xw = q, 휋Yw = p }
[Figure: marginals q, p and the simplex of joint measures, 풫(X) ↔ 풫(X ⊗ Y), with vertices 훿11, 훿12, 훿21, 훿22]

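As a concrete illustration (not from the slides), on two-point spaces the feasible set Γ(q, p) is a one-dimensional segment and the cost is linear in its free parameter, so the infimum is attained at an endpoint. A minimal Python sketch:

```python
# Discrete Kantorovich OTP on two-point spaces X = Y = {0, 1}.
# A coupling w with marginals (q0, 1-q0) and (p0, 1-p0) is determined by
# its single free entry t = w[0][0], feasible on an interval; the cost is
# linear in t, so the infimum sits at an endpoint of that interval.

def kantorovich_2x2(q0, p0, c):
    """Minimal transport cost between (q0, 1-q0) and (p0, 1-p0) for 2x2 cost c."""
    lo, hi = max(0.0, q0 + p0 - 1.0), min(q0, p0)

    def cost(t):
        w = [[t, q0 - t], [p0 - t, 1.0 - q0 - p0 + t]]
        return sum(c[i][j] * w[i][j] for i in range(2) for j in range(2))

    return min(cost(lo), cost(hi))

c = [[0.0, 1.0], [1.0, 0.0]]          # metric cost c(x, y) = |x - y|
print(kantorovich_2x2(0.7, 0.4, c))   # ~0.3: move 0.3 of mass across
```

For the metric cost above this reproduces the expected total-variation-style distance |q0 − p0|.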

slide-10
SLIDE 10

Optimal transportation problems (OTPs)

Optimal Transport Plan

Optimal Transportation Problem

Kc(q, p) ∶= inf { ∫X×Y c(x, y) dw ∶ 휋Xw = q, 휋Yw = p }

Optimal Transport Plan

Linear operator (Markov morphism) T ∶ 풫(X) → 풫(Y):
q ↦ Tq = p = ∫X p(⋅ ∣ x) dq
where p(⋅ ∣ x) is a Markov transition kernel. T is determined by the joint measure w ∈ 풫(X ⊗ Y): w = p(⋅ ∣ x) ⊗ q
[Figure: simplex with vertices 훿11, 훿12, 훿21, 훿22]

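For finite X and Y this can be sketched directly (the kernel and marginal values below are illustrative, not from the talk): the kernel p(⋅ ∣ x) is a row-stochastic matrix, Tq is its action on q, and the joint w = p(⋅ ∣ x) ⊗ q recovers both marginals.

```python
# A transport plan as a Markov morphism on finite spaces:
# K[x][y] = p(y | x) is a row-stochastic kernel (each row sums to 1).

def apply_kernel(K, q):
    """Tq: the Y-marginal produced by pushing q through the kernel."""
    return [sum(K[x][y] * q[x] for x in range(len(q))) for y in range(len(K[0]))]

def joint(K, q):
    """The coupling w(x, y) = p(y | x) q(x) that determines T."""
    return [[K[x][y] * q[x] for y in range(len(K[0]))] for x in range(len(q))]

q = [0.7, 0.3]
K = [[0.5, 0.5], [0.0, 1.0]]   # hypothetical kernel
p = apply_kernel(K, q)          # Y-marginal p = Tq
w = joint(K, q)                 # pi_X w = q and pi_Y w = p by construction
```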

slide-13
SLIDE 13

Optimal transportation problems (OTPs)

Monge OTP

Optimal Transportation Problem (Monge, 1781)

Kc(q, p) ∶= inf { ∫X c(x, f(x)) dq ∶ f ∶ p = q◦f⁻¹ }
where p = q◦f⁻¹ is the push-forward of q under a measurable mapping f ∶ X → Y:
p(E) = q◦f⁻¹(E) = q{x ∶ f(x) ∈ E}

Optimal Transport

p(⋅ ∣ x) has the deterministic form 훿f(x)(E) = 1 if f(x) ∈ E, and 0 otherwise.

wf ∈ 휕풫(X ⊗ Y): wf(X, Y ⧵ f(X)) = 0
[Figure: simplex with vertices 훿11, 훿12, 훿21, 훿22]

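A deterministic (Monge) plan is just a push-forward; a small Python sketch (the map and weights are hypothetical):

```python
from collections import defaultdict

def pushforward(q, f):
    """p = q◦f⁻¹: p(E) = q{x : f(x) in E} for a discrete measure q."""
    p = defaultdict(float)
    for x, mass in q.items():
        p[f(x)] += mass          # all mass at x rides the map to f(x)
    return dict(p)

q = {0: 0.2, 1: 0.3, 2: 0.5}
p = pushforward(q, lambda x: x % 2)   # masses at 0 and 2 merge onto 0
```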

slide-14
SLIDE 14

Information and entropy



slide-18
SLIDE 18

Information and entropy

Shannon’s Information and Entropy

KL-divergence (Kullback & Leibler, 1951)

DKL(p, q) = ∫ [ln p − ln q] dp

Shannon’s information (Shannon, 1948)

For w ∈ Γ(q, p) ⊂ (X ⊗ Y): Iw{x, y} ∶= DKL(w, q ⊗ p) = H(q) − H(q(x ∣ y)) = H(p) − H(p(y ∣ x))

Entropy: H(p) = − ∫ ln p dp
H(p) ∶= sup { Iw{x, y} ∶ 휋Yw = p } = Iw{y, y}

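These quantities are easy to compute for finite distributions; a sketch (the example joint matrix is illustrative):

```python
import math

def dkl(p, q):
    """Discrete KL-divergence D(p, q) = sum p ln(p / q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(w):
    """Iw{x, y} = DKL(w, q (x) p) for a joint probability matrix w."""
    nx, ny = len(w), len(w[0])
    q = [sum(row) for row in w]                               # X-marginal
    p = [sum(w[x][y] for x in range(nx)) for y in range(ny)]  # Y-marginal
    flat_w = [w[x][y] for x in range(nx) for y in range(ny)]
    flat_qp = [q[x] * p[y] for x in range(nx) for y in range(ny)]
    return dkl(flat_w, flat_qp)

i = mutual_information([[0.4, 0.1], [0.1, 0.4]])  # positive: w is not a product
```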

slide-22
SLIDE 22

Information and entropy

Theorem (Shannon-Pythagorean)

w ∈ (X ⊗ Y), 휋Xw = q, 휋Yw = p DKL(w, q ⊗ q) = DKL(w, q ⊗ p) + DKL(p, q) (Belavkin, 2013a) w

D(w,q⊗p)

  • q ⊗ q D(p,q)
  • D(w,q⊗q)
  • q ⊗ p

Proof.

D(w, q ⊗ q) = D(w, q ⊗ p) + D(q ⊗ p, q ⊗ q) − ⟨ln q ⊗ p − ln q ⊗ q, q ⊗ p − w⟩
Here D(w, q ⊗ p) = Iw{x, y} and D(q ⊗ p, q ⊗ q) = D(p, q). The last term vanishes because ln q ⊗ p − ln q ⊗ q = 1X ⊗ (ln p − ln q) depends only on y, and q ⊗ p and w share the same Y-marginal p.

Cross-Information (Belavkin, 2013a)

DKL(w, q ⊗ q) = −⟨ln q, p⟩ − [H(p) − DKL(w, q ⊗ p)]
where −⟨ln q, p⟩ is the cross-entropy and H(p) − DKL(w, q ⊗ p) = H(p(y ∣ x)).

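The Shannon-Pythagorean identity can be checked numerically on a small example with X = Y (the joint values are illustrative):

```python
import math

def dkl(a, b):
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# A joint measure w on X x Y with X = Y = {0, 1}, and its marginals q, p.
w = [[0.30, 0.20], [0.15, 0.35]]
q = [sum(row) for row in w]                 # X-marginal
p = [w[0][0] + w[1][0], w[0][1] + w[1][1]]  # Y-marginal

flat = lambda m: [m[i][j] for i in range(2) for j in range(2)]
qq = [[q[i] * q[j] for j in range(2)] for i in range(2)]
qp = [[q[i] * p[j] for j in range(2)] for i in range(2)]

lhs = dkl(flat(w), flat(qq))
rhs = dkl(flat(w), flat(qp)) + dkl(p, q)
assert abs(lhs - rhs) < 1e-12   # D(w, q x q) = D(w, q x p) + D(p, q)
```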

slide-23
SLIDE 23

Optimal channel problem (OCP)



slide-28
SLIDE 28

Optimal channel problem (OCP)

Shannon’s OCP

Optimal Channel Problem (Shannon, 1948)

Sc(q, 휆) ∶= inf { ∫X×Y c(x, y) dw ∶ 휋Xw = q, Iw{x, y} ≤ 휆 }

Exponential family solutions

Optimal T ∶ (X) → (Y) is defined by w = e−훽c−ln Z q⊗p , 훽−1 = −dSc(q, 휆)∕d휆 Observe that w ∉ 휕(X ⊗ Y), unless q ⊗ p ∈ 휕(X ⊗ Y) or 훽 → ∞. 훿21 훿22 훿12 훿11

Value of Information (Stratonovich, 1965)

V(휆) ∶= Sc(q, 0) − Sc(q, 휆) = sup{피w{u} ∶ Iw{x, y} ≤ 휆}

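The self-consistent pair (w, p) of the exponential-family solution can be computed by a Blahut-Arimoto-style alternation (a standard method, assumed here; the cost matrix is illustrative): iterate the kernel K(y ∣ x) ∝ p(y) e^(−훽c(x, y)) against the output marginal p.

```python
import math

def gibbs_channel(q, c, beta, iters=200):
    """Alternate K(y|x) ~ p(y) exp(-beta c(x, y)) with p(y) = sum_x q(x) K(y|x)."""
    nx, ny = len(q), len(c[0])
    p = [1.0 / ny] * ny                      # initial output marginal
    for _ in range(iters):
        K = []
        for x in range(nx):
            row = [p[y] * math.exp(-beta * c[x][y]) for y in range(ny)]
            z = sum(row)                     # per-row partition function
            K.append([r / z for r in row])
        p = [sum(q[x] * K[x][y] for x in range(nx)) for y in range(ny)]
    return K, p

q = [0.5, 0.5]
c = [[0.0, 1.0], [1.0, 0.0]]
K, p = gibbs_channel(q, c, beta=2.0)
cost = sum(q[x] * K[x][y] * c[x][y] for x in range(2) for y in range(2))
```

As 훽 → ∞ the kernel sharpens toward the deterministic identity map; at 훽 = 0 it ignores the input entirely.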

slide-32
SLIDE 32

Optimal channel problem (OCP)

Relation to Kantorovich OTP

Optimal Channel Problem

Sc(q, 휆) ∶= inf { ∫X×Y c(x, y) dw ∶ 휋Xw = q, Iw{x, y} ≤ 휆 }

Optimal Transportation Problem

Kc(q, p) ∶= inf { ∫X×Y c(x, y) dw ∶ 휋Xw = q, 휋Yw = p }
q and p have entropies H(q) and H(p), and 0 ≤ Iw{x, y} ≤ min[H(q), H(p)].
Thus Kc(q, p) has the implicit constraint Iw{x, y} ≤ 휆 = min[H(q), H(p)], and Sc(q, 휆) ≤ Kc(q, p).


slide-36
SLIDE 36

Optimal channel problem (OCP)

Inverse Optimal Values

Inverse of the OCP Value

Sc⁻¹(q, 휐) ∶= inf { Iw{x, y} ∶ 휋Xw = q, ∫ c dw ≤ 휐 }

Inverse of the OTP Value

Kc⁻¹(q, p, 휐) ∶= inf { Iw{x, y} ∶ 휋Xw = q, 휋Yw = p, ∫ c dw ≤ 휐 }

These inverse values represent the smallest amount of Shannon's information required to achieve the expected cost ∫ c dw = 휐. If 휐 = Kc(q, p), then Sc⁻¹(q, 휐) ≤ Kc⁻¹(q, p, 휐).


slide-37
SLIDE 37

Geometry of information divergence and optimization



slide-42
SLIDE 42

Geometry of information divergence and optimization

Problems on Conditional Extremum

피p{u} = ⟨u, p⟩ expected utility
휐u(휆) = −S−u(q, 휆), the utility of information 휆: 휐u(휆) ∶= sup{ ⟨u, p⟩ ∶ F(p, q) ≤ 휆 }
휆u(휐) = 휐u⁻¹(휐), the information of utility 휐: 휆u(휐) ∶= inf{ F(p, q) ∶ ⟨u, p⟩ ≥ 휐 }
p(훽) optimal solutions: p(훽) ∈ 휕F∗(훽u, q), F(p(훽), q) = 휆
[Figure: simplex with vertices 휔2, 휔3, prior q, optimal p훽, and the sets 피p{f} ≥ 휐, 피p{ln(p∕q)} ≤ 휆]


slide-43
SLIDE 43

Geometry of information divergence and optimization

Problems on Conditional Extremum

피p{u} = ⟨u, p⟩ expected utility, with optimal solutions p(훽) ∈ 휕F∗(훽u, q), F(p(훽), q) = 휆
[Figure: the same picture with a total-variation constraint: simplex with vertices 휔2, 휔3, prior q, p훽, and the sets 피p{f} ≥ 휐, 피p{w} ≥ 휐, ‖p − q‖1 ≤ 휆]


slide-46
SLIDE 46

Geometry of information divergence and optimization

General Solution

Lagrange function for 휐u(휆) ∶= sup{⟨u, p⟩ ∶ F(p, q) ≤ 휆} (resp. for 휆u(휐) ∶= inf{F(p, q) ∶ ⟨u, p⟩ ≥ 휐}):
L(p, 훽⁻¹) = ⟨u, p⟩ + 훽⁻¹[휆 − F(p, q)]   (resp. L(p, 훽) = F(p, q) + 훽[휐 − ⟨u, p⟩])
Necessary and sufficient conditions 휕L ∋ 0:
휕pL(p, 훽⁻¹) = {훽u} − 휕pF(p, q) ∋ 0
휕훽⁻¹L(p, 훽⁻¹) = 휆 − F(p, q) = 0
Optimal solutions are subgradients of F∗(u, q) = sup{⟨u, p⟩ − F(p, q)}:
p(훽) ∈ 휕F∗(훽u), F(p, q) = 휆   (resp. ⟨u, p(훽)⟩ = 휐)


slide-49
SLIDE 49

Geometry of information divergence and optimization

Example: Exponential Solution

For DKL(p, q) = ⟨ln(p∕q), p⟩ − ⟨1, p − q⟩:
L(p, 훽⁻¹) = ⟨u, p⟩ + 훽⁻¹[휆 − ⟨ln(p∕q), p⟩ + ⟨1, p − q⟩]
Necessary and sufficient conditions ∇L(p, 훽⁻¹) = 0:
∇pL(p, 훽⁻¹) = u − 훽⁻¹ ln(p∕q) = 0
휕훽⁻¹L(p, 훽⁻¹) = 휆 − DKL(p, q) = 0
Optimal solutions are gradients of D∗KL(u, q) = ln⟨eᵘ, q⟩:
p(훽) = e^(훽u − ln Z(훽u)) q ,   DKL(p(훽), q) = 휆

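The exponential family p(훽) = e^(훽u − ln Z(훽u)) q is straightforward to evaluate, and the constraint level 휆 = DKL(p(훽), q) grows with 훽; a sketch (the prior and utility values are illustrative):

```python
import math

def p_beta(q, u, beta):
    """Exponential-family solution p(beta) = exp(beta*u - ln Z(beta*u)) q."""
    weights = [qi * math.exp(beta * ui) for qi, ui in zip(q, u)]
    Z = sum(weights)              # Z(beta*u) = <exp(beta*u), q>
    return [w / Z for w in weights]

def dkl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

q = [0.25, 0.25, 0.25, 0.25]
u = [0.0, 1.0, 2.0, 3.0]
# lam traces the information constraint attained at each beta
lam = [dkl(p_beta(q, u, b), q) for b in (0.0, 0.5, 1.0)]
```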

slide-53
SLIDE 53

Geometry of information divergence and optimization

Solution to Shannon’s OCP

The solution for Iw{x, y} = DKL(w, q ⊗ p) ≤ 휆:
w(훽) = e^(훽u − ln Z(훽u)) q ⊗ p
w ∉ 휕풫(X ⊗ Y), so T ∶ 풫(X) → 풫(Y) cannot have the kernel 훿f(x)(⋅).
The dual is strictly convex: D∗KL(u, q ⊗ p) = ln ∫ eᵘ d(q ⊗ p)


slide-57
SLIDE 57

Geometry of information divergence and optimization

Solution to Kantorovich’s OTP

Γ(q, p) is convex: 휋X((1 − t)w1 + tw2) = (1 − t)q + tq = q
There exists a closed convex functional F such that Γ(q, p) = {w ∶ F(w, q ⊗ p) ≤ 1}
Then the solution to the OTP is: w(훽) ∈ 휕F∗(−훽c, q ⊗ p)

Monge-Ampère equation

q = p◦∇휑 |∇²휑|
where 휑 ∶ X → ℝ ∪ {∞} is convex, and ∇휑 ∶ X → Y is such that p = q◦(∇휑)⁻¹ (McCann, 1995; Villani, 2009).

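In one dimension the gradient of a convex 휑 is monotone, so the optimal Monge map matches quantiles of q to quantiles of p. A sketch on equal-weight samples (sample values are hypothetical):

```python
def monotone_map(xs, ys):
    """Pair sorted source points with sorted target points (y = grad(phi)(x))."""
    return dict(zip(sorted(xs), sorted(ys)))

xs = [0.9, 0.1, 0.5]   # equal-weight sample of q
ys = [2.0, 1.0, 3.0]   # equal-weight sample of p
f = monotone_map(xs, ys)   # the monotone rearrangement, smallest to largest
```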

slide-63
SLIDE 63

Geometry of information divergence and optimization

Strict Inequalities

Theorem (Belavkin, 2013b)

Let {w(훽)}u be a family of w(훽) ∈ 풫(X ⊗ Y) maximizing 피w{u} on the sets {w ∶ F(w) ≤ 휆}, for all 휆 = F(w), where F ∶ 풫(X ⊗ Y) → ℝ ∪ {∞} is closed convex and minimized at q ⊗ p ∈ 휕F∗(0) ⊂ Int(풫(X ⊗ Y)). If F∗ is strictly convex, then:

1. w(훽) ∈ 휕풫(X ⊗ Y) iff 휆 ≥ sup F (i.e. 훽 → ∞).

2. For any v ∈ 휕풫(X ⊗ Y) with F(v) = F(w(훽)) = 휆: 피v{u} < 피w(훽){u}

3. For any v ∈ 휕풫(X ⊗ Y) with 피v{u} = 피w(훽){u} = 휐: F(v) > F(w(훽))

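A numerical illustration of claim 2 (the setup is mine, not from the slides): take F = DKL(⋅, q) on a 3-point space with uniform q. The interior Gibbs maximizer strictly beats every boundary distribution of (approximately) equal divergence.

```python
import math

def dkl(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

q = [1/3, 1/3, 1/3]
u = [0.0, 1.0, 2.0]
eu = lambda p: sum(ui * pi for ui, pi in zip(u, p))

# Interior Gibbs maximizer of <u, p> subject to DKL(p, q) <= lam.
beta = 1.5
wt = [qi * math.exp(beta * ui) for qi, ui in zip(q, u)]
z = sum(wt)
p_star = [x / z for x in wt]
lam = dkl(p_star, q)

# Boundary candidates on the highest-utility face (support {1, 2}),
# scanned for roughly the same divergence level lam.
best = 0.0
for k in range(1, 10000):
    a = k / 10000.0
    v = [0.0, a, 1.0 - a]
    if abs(dkl(v, q) - lam) < 1e-3:
        best = max(best, eu(v))

assert best > 0.0          # such boundary points exist for this lam
assert best < eu(p_star)   # and each is strictly worse (claim 2)
```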

slide-67
SLIDE 67

Geometry of information divergence and optimization

Strict Bounds for Monge OTP

Corollary

Let wf ∈ Γ(q, p) be a solution to the Monge OTP Kc(q, p), and let w(훽) be a solution to Shannon's OCP Sc(q, 휆).
If wf and w(훽) carry equal information Iwf{x, y} = Iw(훽){x, y} = 휆 < sup Iw{x, y}, then Kc(q, p) > Sc(q, 휆) > 0.
If wf and w(훽) achieve equal values Kc(q, p) = Sc(q, 휆) = 휐 > 0, then Kc⁻¹(q, p, 휐) > Sc⁻¹(q, 휐).


slide-71
SLIDE 71

Geometry of information divergence and optimization

Optimal Transport and the Expected Utility Principle

Let (X × Y, ≲) be a set with a preference relation (total pre-order). The von Neumann and Morgenstern (1944) EU principle states that there exists u ∶ X × Y → ℝ such that for any v, w ∈ 풫(X ⊗ Y):
v ≲ w ⟺ 피v{u} ≤ 피w{u}
It is based on linearity and Archimedean axioms:
v ≲ w ⟺ 휆v ≲ 휆w , ∀ 휆 > 0   (1)
v ≲ w ⟺ v + r ≲ w + r , ∀ r ∈ L   (2)
nv ≲ w , ∀ n ∈ ℕ ⇒ v ≲ 0   (3)

Theorem

Pre-ordered linear space (L, ≲) satisfies (1), (2) and (3) if and only if (L, ≲) has a utility representation by a closed linear functional u ∶ L → ℝ.

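For finite lotteries the EU representation and axioms (1)-(2) are immediate to check, since 피p{u} = ⟨u, p⟩ is linear in p (utilities and lotteries below are hypothetical):

```python
u = [1.0, 3.0, 0.0]
eu = lambda p: sum(ui * pi for ui, pi in zip(u, p))

v = [0.5, 0.2, 0.3]
w = [0.2, 0.5, 0.3]

# Axiom (1): the EU order is invariant under positive scaling.
assert (eu(v) <= eu(w)) == (eu([2 * x for x in v]) <= eu([2 * x for x in w]))
# Axiom (2): the EU order is invariant under translation by r.
r = [0.1, 0.1, 0.1]
assert (eu(v) <= eu(w)) == (
    eu([a + b for a, b in zip(v, r)]) <= eu([a + b for a, b in zip(w, r)]))
```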

slide-72
SLIDE 72

Dynamical OTP: Optimization of evolution



slide-87
SLIDE 87

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-88
SLIDE 88

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-89
SLIDE 89

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-90
SLIDE 90

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-91
SLIDE 91

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-92
SLIDE 92

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-93
SLIDE 93

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-94
SLIDE 94

Dynamical OTP: Optimization of evolution

Dynamical Problems

q(t) ↦ Tq(t) = q(t + 1) q(t) ↦ Tsq(t) = q(t + s) q(t) → 훿⊤ Expected utility: 피q(t){u} = ∫X u dq(t) q

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30

slide-95
SLIDE 95

Dynamical OTP: Optimization of evolution

Dynamical Problems

One-step evolution: q(t) ↦ T q(t) = q(t + 1)
s-step evolution: q(t) ↦ T^s q(t) = q(t + s)
Convergence to the optimum: q(t) → 훿⊤
Expected utility: 피q(t){u} = ∫X u dq(t)
Why not directly q ↦ Tq = 훿⊤?

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 24 / 30
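The discrete-time evolution above can be sketched numerically. Below is a minimal illustration (the 3-state transition matrix is hypothetical, not from the slides) of a Markov operator T acting on a distribution, showing q(t+1) = T q(t), the s-step map T^s, and convergence toward a fixed point:

```python
import numpy as np

# Hypothetical 3-state Markov operator T (columns sum to 1), so that
# q(t+1) = T q(t) and q(t+s) = T^s q(t).
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

q = np.array([1.0, 0.0, 0.0])  # initial distribution q(0)

def evolve(T, q, s):
    """Apply the transition operator s times: q(t) -> T^s q(t)."""
    return np.linalg.matrix_power(T, s) @ q

q1 = evolve(T, q, 1)      # one step: q(1) = T q(0)
q100 = evolve(T, q, 100)  # long run: q(t) approaches the invariant distribution
```

For this symmetric (doubly stochastic) example the invariant distribution is uniform, so q100 is close to (1/3, 1/3, 1/3); the question on the slide is why evolution proceeds through such iterates rather than jumping directly to 훿⊤.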

slide-101
SLIDE 101

Dynamical OTP: Optimization of evolution

Example: Search in Hamming space {1, … , 훼}^l

Find ⊤ ∈ {1, … , 훼}^l.
Hamming metric: dH(x, y) = ‖y − x‖H = l − ∑_{i=1}^{l} 훿_{xi yi}
|{1, … , 훼}^l| = 훼^l
To find ⊤ in one step we need l log2 훼 bits.
dH(⊤, x) communicates no more than log2(l + 1) bits.
For l ≥ 2, log2(l + 1) < l log2 훼.

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 25 / 30
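The counting argument above is easy to check directly. A small sketch (the values of 훼 and l are illustrative, not from the slides) computing the Hamming metric and comparing the two information quantities:

```python
import math

def d_hamming(x, y):
    # d_H(x, y) = l - sum_i delta_{x_i y_i}: number of differing positions
    return sum(1 for a, b in zip(x, y) if a != b)

def one_step_bits(alpha, l):
    # |{1,...,alpha}^l| = alpha^l, so locating top in one step
    # needs l * log2(alpha) bits
    return l * math.log2(alpha)

def distance_bits(l):
    # d_H(top, x) takes values in {0, 1, ..., l}: at most log2(l+1) bits
    return math.log2(l + 1)

alpha, l = 4, 10  # hypothetical alphabet size and string length
assert distance_bits(l) < one_step_bits(alpha, l)  # holds for l >= 2
```

For 훼 = 4, l = 10 this gives 20 bits to find ⊤ in one step, while the distance alone communicates at most log2(11) ≈ 3.46 bits, which is why search must proceed iteratively.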

slide-106
SLIDE 106

Dynamical OTP: Optimization of evolution

Optimal Control of Mutation Rate

Optimal transition kernels are

P훽(y ∣ x) = e^{−훽‖y−x‖H} ∕ [1 + (훼 − 1)e^{−훽}]^l = ∏_{i=1}^{l} e^{−훽(1−훿_{xi yi})} ∕ [1 + (훼 − 1)e^{−훽}]

훽 = ln(휇^{−1} − 1) + ln(훼 − 1), where 휇 = 휐∕l is the mutation rate, defined by the constraint 피{‖y − x‖H} ≤ 휐.

Optimal randomization corresponds to independent substitution of letters with probability 휇 (the mutation rate):

P휇(r ∣ n) = (l choose r) 휇(n)^r (1 − 휇(n))^{l−r}

Which 휇(n)? The CDF method:

휇(n) = ∑_{m=0}^{n−1} (l choose m) (훼 − 1)^m ∕ 훼^l

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 26 / 30
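The binomial substitution model and the CDF rule for 휇(n) can be sketched as follows (the parameter values are illustrative; the notation follows the slide):

```python
import math

def mutation_distribution(mu, l):
    """P_mu(r | n): number of substituted letters r ~ Binomial(l, mu)."""
    return [math.comb(l, r) * mu**r * (1 - mu)**(l - r)
            for r in range(l + 1)]

def cdf_mutation_rate(n, l, alpha):
    """The CDF method: mu(n) = sum_{m=0}^{n-1} C(l,m) (alpha-1)^m / alpha^l.

    This is the fraction of the space {1,...,alpha}^l lying strictly
    closer than Hamming distance n to a given point.
    """
    return sum(math.comb(l, m) * (alpha - 1)**m for m in range(n)) / alpha**l

l, alpha = 10, 4  # hypothetical string length and alphabet size
mu = cdf_mutation_rate(3, l, alpha)
p = mutation_distribution(mu, l)
assert abs(sum(p) - 1.0) < 1e-12  # a proper probability distribution
```

Because ∑_{m=0}^{l} C(l, m)(훼 − 1)^m = 훼^l by the binomial theorem, 휇(n) always lies in [0, 1], so it is a valid substitution probability.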

slide-107
SLIDE 107

Dynamical OTP: Optimization of evolution

Evolution of Fitness in Information

[Figure: information (bits) versus distance to optimum n = d(⊤, a), comparing mutation-rate schedules: constant 1∕l, step, linear n∕l, max휇 P휇(m < n ∣ n), P0(m < n ∣ n), and the optimal 휙∗(n; 휆).]

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 27 / 30

slide-110
SLIDE 110

Dynamical OTP: Optimization of evolution

Mutation Rate Control in E. coli

Used strains of Escherichia coli K-12 MG1655.
Fluctuation test using media with 50 휇g∕ml of rifampicin.
Estimated mutation rates 휇 in E. coli strains grown in Davis minimal medium with different amounts of glucose.

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 28 / 30

slide-114
SLIDE 114

Dynamical OTP: Optimization of evolution

Experimental Results (Krašovec et al., 2014)

[Figure: mutation rate (10^{−9} ∕ gen) versus population density (10^8 c.f.u.∕ml).]

Strong relationship between 휇 and density of cells (p < .0001).
No such relationship in the luxS quorum-sensing mutant (p = .0234).

Krašovec, R., Belavkin, R. V., Aston, J., Channon, A., Aston, E., Rash, B., Kadirvel, M., Forbes, S., & Knight, C. G. (2014, April). Mutation rate plasticity in rifampicin resistance depends on Escherichia coli cell-cell interactions. Nature Communications, 5(3742).

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 29 / 30

slide-115
SLIDE 115

References

Optimal transportation problems (OTPs) Information and entropy Optimal channel problem (OCP) Geometry of information divergence and optimization Dynamical OTP: Optimization of evolution

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 30 / 30

slide-116
SLIDE 116

References

Belavkin, R. V. (2013a). Law of cosines and Shannon-Pythagorean theorem for quantum information. In F. Nielsen & F. Barbaresco (Eds.), Geometric science of information (Vol. 8085, pp. 369–376). Heidelberg: Springer.
Belavkin, R. V. (2013b). Optimal measures and Markov transition kernels. Journal of Global Optimization, 55, 387–416.
Dobrushin, R. L. (1970). Prescribing a system of random variables by conditional distributions. Theory Probab. Appl., 15(3), 458–486.
Kantorovich, L. V. (1939). Mathematical methods in the organization and planning of production (Tech. Rep.). Leningrad Univ. (English translation: Management Science, 6, 4 (1960), 363–422)
Kantorovich, L. V. (1942). On translocation of masses. USSR AS Doklady, 37(7–8), 227–229. ((in Russian). English translation: J. Math. Sci., 133, 4 (2006), 1381–1382)
Krašovec, R., Belavkin, R. V., Aston, J. A. D., Channon, A., Aston, E., Rash, B. M., et al. (2014, April). Mutation rate plasticity in rifampicin resistance depends on Escherichia coli cell-cell interactions. Nature Communications, 5(3742).

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 30 / 30

slide-117
SLIDE 117

References

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86.
McCann, R. J. (1995). Existence and uniqueness of monotone measure-preserving maps. Duke Math. J., 80(2), 309–323.
Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Paris: Histoire de l'Académie Royale des Sciences avec les Mémoires de Mathématique & de Physique.
Neumann, J. von, & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
Shannon, C. E. (1948, July and October). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423 and 623–656.
Stratonovich, R. L. (1965). On value of information. Izvestiya of USSR Academy of Sciences, Technical Cybernetics, 5, 3–12. (In Russian)
Vasershtein, L. N. (1969). Markov processes over denumerable products of spaces describing large system of automata. Problems Inform. Transmission, 5(3), 47–52.
Villani, C. (2009). Optimal transport: old and new (Vol. 338). Springer-Verlag.

Roman Belavkin (Middlesex University) Kantorovich’s and Shannon’s optimization problems 13 June 2016 30 / 30
