SLIDE 1
Coalgebraic Tools for Randomness-Conserving Protocols
Matvey Soloviev (Cornell University)
joint work with Dexter Kozen
RAMiCS 2018, Groningen
SLIDE 2
This Talk
A coalgebraic model for constructing and reasoning about state-based protocols that implement efficient reductions among random processes (efficient = conserve randomness)
Basic tools that allow efficient protocols to be constructed compositionally
Trade-offs between latency and efficiency
Several examples of efficient reductions
Toward a general coalgebraic semantics of reductions
SLIDE 3
Randomness as a Computational Resource
Randomness is a resource to be conserved
Information and coding [Shannon]
Probabilistic complexity and derandomization [Luby]
Pseudo-random number generation [Yao, Nisan, Wigderson]
Extracting strong randomness from weak sources [von Neumann, Elias, Blum]
A recent application: routing in networks
Randomized routing, gossip protocols, load balancing
Desirable to minimize local state to achieve high throughput
SLIDE 4
Measuring Randomness
Discrete reduction protocol: a procedure that maps an input stream to an output stream
[Diagram: input stream abcbbbcabbaccbacababbcba → discrete reduction protocol → output stream 011000010110010101111010]
If the input sequence comes from a random process, then the statistical properties of the input stream impart statistical properties to the output stream
We can think of the process as a reduction between random sources
But randomness can be lost …
SLIDE 5-6
Shannon Entropy
Entropy of a discrete distribution µ = (p1, . . . , pn): H(µ) = −∑i pi log pi
Usually described as a measure of uncertainty or information content
Represents an absolute limit on lossless compression (Shannon source coding theorem, 1948)
H(µ) = the number of fair coin flips µ is worth
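As a quick sanity check of the formula, here is a minimal Python sketch (ours, not from the talk); the helper name `entropy` is our own.

```python
import math

def entropy(dist):
    """Shannon entropy H(mu) = -sum_i p_i * log2(p_i) of a finite distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))                  # 1.0  -- one fair coin flip
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 -- the 7/4 example used later
```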
SLIDE 7
Entropy as a Measure of Randomness
The entropy of µ is the number of fair coin flips it is worth
[Diagram: fair coin flips 0110010101111010 → discrete reduction protocol → stream of digits distributed as µ]
1/H(µ) is an upper bound on the rate of production achievable asymptotically (requires unbounded latency)
[Diagram: stream of digits distributed as µ → discrete reduction protocol → fair coin flips 0110010101111010]
H(µ) is an upper bound on the rate of production achievable asymptotically (requires unbounded latency)
SLIDE 8
Efficiency of a Simulation
[Diagram: stream of digits over Σ distributed as µ → discrete reduction protocol → stream of digits over Γ distributed as ν]
Efficiency = (Eprod · H(ν)) / (Econs · H(µ)) ≤ 1
Econs = expected number of digits consumed
Eprod = expected number of digits produced
H(µ) = entropy of input distribution
H(ν) = entropy of output distribution
SLIDE 9
Efficiency of a Simulation
[Diagram: stream of digits over Σ distributed as µ → discrete reduction protocol → stream of digits over Γ distributed as ν]
Efficiency = (Eprod · H(ν)) / (Econs · H(µ)) ≤ 1
Measures the amount of randomness lost in the conversion
May vary with time
Cannot exceed unity [Shannon]
Unity is achievable asymptotically [Elias, Cover & Thomas]; requires unbounded latency
SLIDE 10
Sometimes Perfect Efficiency is Achievable
[Diagram: a binary tree of fair coin flips (each branch H : 1/2, T : 1/2) whose leaves are the symbols a : 1/2, b : 1/4, c : 1/8, d : 1/8, with codewords 10, 110, 111 labelling the deeper leaves]
H(1/2, 1/4, 1/8, 1/8) = 7/4        H(1/2, 1/2) = 1
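A small Python sketch of this protocol (ours, not from the slides), assuming the codeword assignment a ↦ 0, b ↦ 10, c ↦ 110, d ↦ 111 suggested by the tree. The expected consumption per output symbol is 1·1/2 + 2·1/4 + 3·1/8 + 3·1/8 = 7/4 = H(1/2, 1/4, 1/8, 1/8), so no entropy is lost.

```python
import random

# Assumed leaf assignment: consume fair bits down the tree until a codeword is complete.
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}

def emit_symbol(flip=lambda: random.choice("01")):
    """Consume fair bits until they form a codeword, then emit the corresponding symbol."""
    word = ""
    while word not in CODE:
        word += flip()
    return CODE[word], len(word)       # symbol produced, bits consumed

counts, consumed = {}, 0
for _ in range(100_000):
    sym, used = emit_symbol()
    counts[sym] = counts.get(sym, 0) + 1
    consumed += used
print(counts)                          # roughly 1/2, 1/4, 1/8, 1/8 of the total
print(consumed / 100_000)              # roughly 7/4 flips per symbol
```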
SLIDE 11
The von Neumann Trick [1951]
[Diagram: two flips of the bias-p coin; the outcomes with one H and one T each occur with probability p(1 − p)]
To simulate a fair coin with a bias-p coin: flip the bias-p coin twice
01 ⇒ H
10 ⇒ T
00 or 11 ⇒ flip twice again
Oblivious to the bias of the input coin, but efficiency is poor: for p = 1/3, Econs/Eprod = 4.5
Shannon says 1/(−(1/3) log(1/3) − (2/3) log(2/3)) ≈ 1.089 flips per output bit should suffice
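A minimal Python sketch of the trick (our own illustration): the bias p is only used to simulate the input coin, never by the protocol itself.

```python
import random

def biased_flip(p):
    """One flip of a coin that shows 1 with probability p."""
    return 1 if random.random() < p else 0

def von_neumann_bit(p):
    """Return one fair bit and the number of biased flips consumed."""
    consumed = 0
    while True:
        a, b = biased_flip(p), biased_flip(p)
        consumed += 2
        if a != b:                    # 01 and 10 each occur with probability p(1-p)
            return a, consumed

bits, flips = [], 0
for _ in range(100_000):
    bit, used = von_neumann_bit(1 / 3)
    bits.append(bit)
    flips += used
print(sum(bits) / len(bits))          # close to 0.5
print(flips / len(bits))              # close to 4.5 biased flips per fair bit
```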
SLIDE 12
A More Efficient Protocol
[Diagram: a small state machine driven by the bias-1/3 coin, with edges labelled 1/3 and 2/3, emitting H and T]
Not oblivious to the bias p = 1/3, but efficiency is better: Econs/Eprod = 2
This is optimal for single-digit-output protocols
SLIDE 13-14
Other Direction (1/2, 1/2 ⇒ 1/3, 2/3)
[Diagram: a two-state protocol driven by the fair coin; every transition has probability 1/2, and the outputs are H and T]
Pr(H) = 1/4 + 1/16 + 1/64 + · · · = 1/3
Pr(T) = 1/2 + 1/8 + 1/32 + · · · = 2/3
Q: Is this optimal? A: No! Econs/Eprod = 2, but H(1/3, 2/3) = −(1/3) log(1/3) − (2/3) log(2/3) ≈ 0.92
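Reading the diagram as the obvious two-state protocol, here is a Python sketch (ours, under that assumption): from the start state, a tails immediately emits the 2/3-symbol; a heads moves to a second state where another heads emits the 1/3-symbol and a tails emits nothing and restarts. This reproduces Pr(H) = 1/3, Pr(T) = 2/3 and Econs/Eprod = 2.

```python
import random

def step(state, flip):
    """One transition of the (assumed) two-state protocol: returns (new state, output string)."""
    if state == "start":
        return ("start", "T") if flip == "T" else ("saw_H", "")
    # state == "saw_H": a second H emits H, a T emits nothing; either way restart
    return ("start", "H" if flip == "H" else "")

state, out, consumed = "start", [], 0
while len(out) < 100_000:
    state, emitted = step(state, random.choice("HT"))
    consumed += 1
    out.extend(emitted)
print(out.count("H") / len(out))      # close to 1/3
print(consumed / len(out))            # close to 2 fair flips per output symbol
```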
SLIDE 15-16
How to Do Better?
[Diagram: the 32 possible blocks of 5 fair coin flips, partitioned into groups that are mapped to output strings over {H, T}]
Q: Is this optimal? A: No, but better! Econs/Eprod = 5/2.625 ≈ 1.905
SLIDE 17
Latency
This protocol has many more states, and we’ll have to read in at least 4 symbols before we output anything.
Define: Latency = expected consumption before producing at least one output symbol.
Generally, higher latency = longer input = finer-grained probability (sub)space = more leeway to carve it up into “correctly sized” chunks for better efficiency.
This trade-off is inevitable whenever no perfect protocol exists.
SLIDE 18
Asymptotic optimality is not everything
It’s known that asymptotically optimal families of reductions exist.
Now we can say that some are better than others: it matters whether efficiency 1 − ε would require latency O(1/ε), O(1/ε²) or even worse.
SLIDE 19
Notation
Σ, Γ finite alphabets
Σ∗ = finite words over Σ; x, y, . . . ∈ Σ∗, Γ∗
Σω = ω-words (streams) over Σ; α, β, . . . ∈ Σω, Γω
⪯ prefix, ≺ proper prefix
µ is a probability measure on Σ; endow Σω with the product measure – each symbol independent and distributed as µ
The measurable sets of Σω are the Borel sets of the Cantor space topology whose basic open sets are the intervals {α ∈ Σω | x ≺ α} for x ∈ Σ∗
µ(a1a2 · · · an) = µ(a1)µ(a2) · · · µ(an)
µ({α ∈ Σω | x ≺ α}) = µ(x)
SLIDE 20
Protocols
A protocol is a coalgebra (S, δ) where δ : S × Σ → S × Γ∗ (a form of Mealy automaton)
Extend δ to domain S × Σ∗ by coinduction:
δ(s, ε) = (s, ε)
δ(s, ax) = let (t, y) = δ(s, a) in let (u, z) = δ(t, x) in (u, yz)
It follows that δ(s, xy) = let (t, z) = δ(s, x) in let (u, w) = δ(t, y) in (u, zw)
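A small Python sketch of this definition (ours): a protocol is just a step function delta(s, a) = (t, y), and the extension to finite words folds it over the input, concatenating the outputs.

```python
def run_word(delta, s, word):
    """Extend a one-step protocol delta : (state, symbol) -> (state, output string) to finite words."""
    out = ""
    for a in word:
        s, y = delta(s, a)
        out += y
    return s, out

# Example: the von Neumann protocol as a coalgebra; the state is "" (no pending flip) or the pending flip.
def vn(state, a):
    if state == "":
        return a, ""                   # remember the first flip of the pair, output nothing
    pair, out = state + a, ""
    if pair == "01":
        out = "H"
    elif pair == "10":
        out = "T"
    return "", out                     # 00 and 11 output nothing; in all cases start a new pair

print(run_word(vn, "", "0110001101"))  # ('', 'HTH')
```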
SLIDE 21
Extension to Streams
A protocol δ also induces a partial map δω : S × Σω ⇀ Γω by coinduction:
δω(s, aα) = let (t, z) = δ(s, a) in z · δω(t, α)
It follows that δω(s, xα) = let (t, z) = δ(s, x) in z · δω(t, α)
Given α ∈ Σω, this defines a unique infinite string δω(s, α) ∈ Γω, except in the degenerate case in which only finitely many output letters are ever produced
A protocol is productive (wrt a given probability measure on input streams) if, starting in any state, an output symbol is produced within finite expected time (therefore w.p. 1)
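The stream extension can be written lazily as a Python generator; this is our own illustration, with a made-up example protocol: it yields output symbols as they are produced, and simply never yields if the protocol stops producing.

```python
import itertools, random

def run_stream(delta, s, stream):
    """delta^omega: lazily map an input stream (any iterable of symbols) to the output stream."""
    for a in stream:
        s, y = delta(s, a)
        yield from y                   # emit each produced output symbol as soon as it is available

# A tiny example protocol: remember one bit, then emit the XOR of each consecutive pair of inputs.
def xor_pairs(state, a):
    if state == "":
        return a, ""
    return "", str(int(state) ^ int(a))

fair_bits = (random.choice("01") for _ in itertools.count())
print("".join(itertools.islice(run_stream(xor_pairs, "", fair_bits), 10)))   # first 10 output symbols
```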
SLIDE 22
Reductions
Let ν be a probability measure on Γ, ν(a1 · · · an) = ν(a1) · · · ν(an)
(S, δ, s) with start state s ∈ S is a reduction from µ to ν if ∀y ∈ Γ∗ µ({α | y ⪯ δω(s, α)}) = ν(y)
This implies that the symbols of δω(s, α) are independent and distributed as ν
SLIDE 23
Advantages of the Coalgebraic View
Many constructions in the information theory literature are expressed in terms of trees – but protocols are coalgebras δ : S × Σ → S × Γ∗, a form of Mealy automata, i.e. not trees
This class admits a final coalgebra D : (Γ∗)^(Σ+) × Σ → (Γ∗)^(Σ+) × Γ∗, where
D( f, a) = ( f @a, f (a))
f @a(x) = f (ax), a ∈ Σ, x ∈ Σ+
Extension to streams Dω : (Γ∗)^(Σ+) × Σω ⇀ Γω
Dω( f, aα) = f (a) · Dω( f @a, α)
SLIDE 24
Advantages of the Coalgebraic View
A state f : Σ+ → Γ∗ can be viewed as a labeled tree with nodes Σ∗ and edge labels Γ∗
The nodes xa are the children of x, for x ∈ Σ∗ and a ∈ Σ
The label on the edge (x, xa) is f (xa)
The tree f @x is the subtree rooted at x ∈ Σ∗, where f @x(y) = f (xy)
For any coalgebra (S, δ), there is a unique coalgebra morphism h : (S, δ) → ((Γ∗)^(Σ+), D) defined coinductively by
(h(s)@a, h(s)(a)) = let (t, z) = δ(s, a) in (h(t), z)
Protocols can inherit structure from the final coalgebra under h−1, thereby providing a mechanism for transferring results on trees to state transition systems
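As a concrete reading of h, here is a Python sketch (ours): h(s) is the tree that assigns to each nonempty input word the output produced on its last symbol when the protocol is run from s. This is exactly the unfolding of a state into the final coalgebra described above.

```python
def unfold(delta, s):
    """The morphism h: map a state s to the tree f : Sigma^+ -> Gamma^*, f(w) = output on the last symbol of w."""
    def f(word):
        state, out = s, ""
        for a in word:
            state, out = delta(state, a)   # keep only the output of the most recent step
        return out
    return f

# Example with a one-state protocol that doubles every input symbol.
double = lambda s, a: (s, a + a)
f = unfold(double, "only_state")
print(f("x"), f("xy"))                     # 'xx' 'yy' -- the labels on the edges into these nodes
```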
SLIDE 25
Restart Protocols
A prefix code is a subset A ⊆ Σ∗ such that every element of Σω has at most one prefix in A
The elements of a prefix code are ⪯-incomparable
A prefix code is exhaustive (wrt µ) if α ∈ Σω has a prefix in A w.p. 1
A restart protocol (S, δ, s) is determined by a function f : A ⇀ Γ∗, where A is an exhaustive prefix code
Intuitively, starting in s, read symbols of Σ from the input stream until encountering a string x ∈ A, output f (x), repeat
SLIDE 26
Restart Protocols
Formally, S = {u ∈ Σ∗ | x ̸⪯ u for any x ∈ A}
δ(u, a) = (ua, ε) if ua ̸∈ A
δ(u, a) = (ε, z) if ua ∈ A and f (ua) = z
with start state ε.
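A Python sketch of this construction (ours): the exhaustive prefix code A and the map f on it are passed as a single dict, the state is the buffer of symbols read so far, and completing a codeword emits f of it and resets the buffer.

```python
def restart_protocol(f):
    """Build the restart protocol of a map f : A -> Gamma^*, where A = f.keys() is an exhaustive prefix code.
    States are the buffers that do not yet extend a codeword; the start state is the empty buffer ''."""
    def delta(buffer, a):
        word = buffer + a
        if word in f:
            return "", f[word]        # completed a codeword: emit and restart
        return word, ""               # keep buffering
    return delta

# The fair-coin -> (1/3, 2/3) protocol from earlier, as a restart protocol on the prefix code {T, HH, HT}.
delta = restart_protocol({"T": "b", "HH": "a", "HT": ""})
print(delta("", "T"))    # ('', 'b')
print(delta("", "H"))    # ('H', '')
print(delta("H", "H"))   # ('', 'a')
```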
SLIDE 27
Convergence
We say that the sequence Xn converges to X in probability, and write Xn →Pr X, if ∀ε > 0, Pr(|Xn − X| > ε) = o(1).
SLIDE 28
Efficiency, revisited
Efficiency = the long-term ratio of entropy production to entropy consumption
Formally, En(α) = (|δ(s, αn)| / n) · (H(ν) / H(µ)), where H is the Shannon entropy H(p1, . . . , pn) = −∑i pi log pi
Intuitively, En measures the ratio of entropy production to consumption after n steps of δ starting in state s
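A Python sketch (ours) that estimates En for a concrete protocol by running it on n random input symbols; the entropy helper and the fair-coin → (1/3, 2/3) protocol are as in the earlier sketches, repeated here so the snippet stands alone.

```python
import math, random

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

def delta(state, a):                  # fair coin -> (1/3, 2/3), as before
    if state == "":
        return ("", "b") if a == "T" else ("H", "")
    return "", ("a" if a == "H" else "")

def E_n(n, h_in, h_out):
    state, produced = "", 0
    for _ in range(n):
        state, y = delta(state, random.choice("HT"))
        produced += len(y)
    return (produced / n) * (h_out / h_in)

print(E_n(100_000, entropy([1/2, 1/2]), entropy([1/3, 2/3])))   # about 0.46: over half the consumed entropy is lost
```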
SLIDE 29
Efficiency, revisited
In most cases of interest, En converges in probability to a unique constant value En →Pr Effδ, independent of start state and history
Well-defined for finite-state protocols
For restart protocols, it is enough to measure the ratio for one iteration of the protocol.
SLIDE 30
Properties of δω
Theorem
(i) The partial function δω(s, −) : Σω ⇀ Γω is continuous, thus Borel measurable
(ii) If δ is productive, then δω(s, α) is almost surely infinite; that is, µ(dom δω(s, −)) = 1
(iii) The measure ν on Γω is the push-forward measure ν = µ ◦ δω(s, −)−1
SLIDE 31
Properties of En
Theorem If δ is a reduction from µ to ν, then the random variables En are continuous and uniformly bounded by a constant R > 0 depending only on µ and ν
SLIDE 32
Sequential Composition
Given δ1 : S × Σ → S × Γ∗ and δ2 : T × Γ → T × ∆∗, define (δ1 ; δ2) : (S × T) × Σ → (S × T) × ∆∗ by
(δ1 ; δ2)((s, t), a) = let (u, y) = δ1(s, a) in let (v, z) = δ2(t, y) in ((u, v), z)
Run δ1 for one step, then run δ2 on the output of δ1
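A direct Python transcription of this definition (ours), reusing the convention that a protocol is a function (state, symbol) -> (state, output string); the composite feeds each output symbol of δ1 through δ2 within the same step.

```python
def compose(delta1, delta2):
    """Sequential composition (delta1 ; delta2) on the product state space."""
    def delta(state, a):
        s, t = state
        s, y = delta1(s, a)            # one step of delta1
        out = ""
        for b in y:                    # run delta2 over everything delta1 produced
            t, z = delta2(t, b)
            out += z
        return (s, t), out
    return delta

# Example: copy each symbol twice, then drop every other symbol -- the composite is the identity stream map.
dup  = lambda s, a: (s, a + a)
half = lambda s, a: ("emit", "") if s == "drop" else ("drop", a)
step = compose(dup, half)
print(step((None, "drop"), "x"))       # ((None, 'drop'), 'x')
```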
SLIDE 33
Correctness of Sequential Composition
Theorem The partial maps (δ1 ; δ2)ω((s, t), −) and δ2ω(t, δ1ω(s, −)), of type Σω ⇀ ∆ω, are defined and agree on all but a µ-nullset
The map on infinite strings induced by the sequential composition of protocols is almost everywhere equal to the functional composition of the induced maps of the component protocols
SLIDE 34
Correctness of Sequential Composition
Proof idea Show that the binary relation
β R γ ⇔ ∃α ∈ Σω ∃s ∈ S ∃t ∈ T. β = (δ1 ; δ2)ω((s, t), α) ∧ γ = δ2ω(t, δ1ω(s, α))
on ∆ω is a bisimulation
SLIDE 35
Reductions Compose
Theorem If δ1(s, −) is a reduction from µ to ν and δ2(t, −) is a reduction from ν to o, then (δ1 ; δ2)((s, t), −) is a reduction from µ to o
Proof. Follows from the previous theorem and ν = µ ◦ δ1ω(s, −)−1, o = ν ◦ δ2ω(t, −)−1
SLIDE 36
Theorem If δ1(s, −) is a reduction from µ to ν and δ2(t, −) is a reduction from ν to o, and if Effδ1 and Effδ2 exist, then Effδ1;δ2 exists and Effδ1;δ2 = Effδ1 · Effδ2
SLIDE 37
Serial Protocols
Consider a sequence (S0, δ0, s0), (S1, δ1, s1), . . . of positive recurrent restart protocols defined in terms of maps fk : Ak → Γ∗, where the Ak are exhaustive prefix codes
These can be combined into a single serial protocol δ that executes one iteration of each δk, then goes on to the next
Formally, the states of δ are the disjoint union of the Sk, and δ is defined so that δ(sk, x) = (sk+1, fk(x)) for x ∈ Ak, and within Sk behaves like δk.
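A Python sketch of the combination (ours), representing each restart component by its prefix-code map fk (a dict, as in the restart-protocol sketch): the serial protocol's state is a pair (k, buffer), and completing a codeword of the k-th component advances to component k + 1. Only finitely many components are supplied in this toy example.

```python
def serial(fs):
    """Combine restart protocols given by prefix-code maps fs[0], fs[1], ... into one serial protocol.
    State = (k, buffer): currently executing component k with partially read buffer."""
    def delta(state, a):
        k, buffer = state
        word = buffer + a
        if word in fs[k]:
            return (k + 1, ""), fs[k][word]   # finish one iteration of component k, move on
        return (k, word), ""
    return delta

# Two toy components over the fair coin: the first maps {H, T} -> {0, 1}, the second copies pairs.
f0 = {"H": "0", "T": "1"}
f1 = {x + y: x + y for x in "HT" for y in "HT"}
delta = serial([f0, f1])
print(delta((0, ""), "H"))      # ((1, ''), '0')
print(delta((1, ""), "T"))      # ((1, 'T'), '')
print(delta((1, "T"), "H"))     # ((2, ''), 'TH')
```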
SLIDE 38
Serial Protocols
Theorem Let δ be a serial protocol with finite-state components δ0, δ1, . . . having subexponential growth: max{|x| : x ∈ An} = o(c0 + · · · + cn−1). Let cn and pn be the expected consumption and production, respectively, of one iteration of δn. If the limit ℓ = lim n→∞ (p0 + · · · + pn) / (c0 + · · · + cn) exists, then the efficiency of the serial protocol exists and is equal to ℓ.
SLIDE 39
A Reduction
d-Uniform ⇒ c-Uniform
Let m = ⌊k logc d⌋ and let the c-ary expansion of d^k be d^k = a0 + a1·c + · · · + am·c^m
Do k calls on the d-uniform distribution. For each 0 ≤ i ≤ m, for ai·c^i of the possible outcomes, emit a c-ary string of length i, every possible such string occurring exactly ai times. For a0 outcomes, nothing is emitted (and this is lost entropy), but this occurs with probability a0·d^−k. Restart.
latency = k, efficiency = 1 − Θ(1/k)
Can combine these into a serial protocol with asymptotically optimal efficiency (at the cost of unbounded latency)
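A Python sketch of one iteration of this reduction (our own illustration; outcomes are bookkept as integers in 0..d^k − 1 rather than as k-tuples): the table realizes the partition described above, and one call consumes k d-uniform digits and emits the assigned c-ary string.

```python
import random

def reduction_table(d, c, k):
    """Partition the d**k equally likely outcomes: for each digit a_i of the c-ary expansion of d**k,
    a_i * c**i outcomes are mapped to length-i c-ary strings, each such string used exactly a_i times."""
    n, digits = d ** k, []
    while n:
        digits.append(n % c)                  # digits[i] = a_i
        n //= c
    table, outcome = {}, 0
    for i, a_i in enumerate(digits):
        for _ in range(a_i):
            for s in range(c ** i):
                # write s as a length-i c-ary string (empty when i = 0: the "lost entropy" outcomes)
                word = tuple((s // c ** j) % c for j in reversed(range(i)))
                table[outcome] = word
                outcome += 1
    return table

def one_iteration(d, c, k, table):
    draws = [random.randrange(d) for _ in range(k)]          # k calls on the d-uniform source
    index = sum(x * d ** i for i, x in enumerate(draws))     # the draws name one of the d**k outcomes
    return table[index]                                      # the emitted c-ary string (possibly empty)

table = reduction_table(d=3, c=2, k=5)                       # 3**5 = 243 = 11110011 in binary
print(one_iteration(3, 2, 5, table))
```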
SLIDE 40
Other Reductions
Uniform ⇒ Rational with efficiency 1 − Θ(1/k)
Uniform ⇒ Arbitrary with efficiency 1 − Θ(1/k)
Arbitrary ⇒ Uniform with efficiency 1 − Θ(log k / k)
(1/r, (r−1)/r) ⇒ (r − 1)-Uniform with efficiency 1 − Θ(1/k) (uses Dirichlet approximation)
SLIDE 41
What We Did and Didn’t Do
What we did
Gave a coalgebraic model for constructing and reasoning about state-based protocols that implement entropy-conserving reductions between random processes
Provided basic tools that allow efficient protocols to be constructed in a compositional way
Analyzed the trade-off between latency and loss of entropy
Illustrated the use of the model in various reductions
SLIDE 42
What We Did and Didn’t Do
What we didn’t do
We have considered only homogeneous measures on Σω and Γω, those induced by Bernoulli processes in which the probabilistic choices are i.i.d., for fixed finite Σ and Γ
However, the coalgebraic definitions of protocol and reduction make sense even if Σ and Γ are countably infinite and even if the measures are non-homogeneous
SLIDE 43
Open questions
Further open questions
Can we do better when converting from non-uniform distributions?
The notion of latency doesn’t feel quite right: e.g. we can only guarantee k ≤ latency of composition ≤ k · k′. Can we do better?
Infinite alphabets? Continuous space?
SLIDE 44
Non-Homogeneous Processes
A fixed measure µ on Σ induces a homogeneous measure (the product measure) on Σω
But in the final coalgebra, we can go the other direction: for an arbitrary µ on Σω and state f : Σ+ → Γ∗, there is a unique assignment of transition probabilities on Σ+ compatible with µ: the probability of reading a after x is µ({α | xa ≺ α}) / µ({α | x ≺ α})
This determines the probabilistic behavior of the final coalgebra as a protocol starting in state f when the input stream is distributed as µ
Any measure µ on Σω induces a push-forward measure µ ◦ (Dω)−1 on Γω. This gives a notion of reduction even in the non-homogeneous case
SLIDE 45
Non-Homogeneous Processes
This behavior would also be reflected in any protocol (S, δ) starting in any state s ∈ h−1( f ) under the same measure on input streams, thus providing a semantics for (S, δ) even under non-homogeneous conditions
Thus we can lift the entire theory to Mealy automata that operate probabilistically relative to an arbitrary µ on Σω
These are essentially discrete Markov transition systems with observations in Γ∗.
SLIDE 46
Continuous Space
The state set S and alphabets Σ and Γ need not be discrete
The appropriate generalization would give reductions between discrete-time, continuous-space Markov transition systems [Panangaden 2009, Doberkat 2007]
Let S, Σ, and Γ be measurable spaces. A reduction protocol is a measurable function δ : S × Σ → S × Γ∗
The final coalgebra is D : (Γ∗)^(Σ+) × Σ → (Γ∗)^(Σ+) × Γ∗, D( f, a) = ( f @a, f (a)), where f @x is the subtree of f at x, f @x(y) = f (xy)
SLIDE 47
Continuous Space
Given a probability measure µ on Σω, the transition kernel at x ∈ Σ∗ is the function Kµ(x, B) = µ(π|x|−1(B) | x), where B is a measurable subset of Σ
µ(A | x) is the conditional probability of A given x, A ⊆ Σω
Can be obtained by disintegration from the joint distribution θ(B × A) = µ(πn−1(B) ∩ A), B ⊆ Σ^n, A ⊆ Σω
The partial function Dω : Σω ⇀ Γω is measurable and induces a push-forward measure µ ◦ (Dω)−1 on Γω. This is a reduction from µ to µ ◦ (Dω)−1
SLIDE 48
Continuous Space
Martingales
Let B and Bω be the Borel sets of Σ and Σω, respectively
For any n and A ∈ Bω, consider the measurable function Xn(α) = µ(A | αn) and the σ-subalgebra Bn ⊆ Bω generated by {πi−1(B) | i ≤ n, B ∈ B}
The sequence (Xn, Bn) forms a martingale that, by the Lévy 0-1 law, converges almost surely to the characteristic function of A
SLIDE 49