How to Compress Hidden Markov Sources
Preetum Nakkiran
Harvard University. Joint works with: Venkatesan Guruswami, Madhu Sudan, + Jarosław Błasiok, Atri Rudra.
Compression problem: given n symbols from a known source, compress down to < n symbols (ideally to the “entropy” of the source), such that decompression succeeds with high probability.
Example: compress n iid samples B(p) B(p) B(p) … down to ≈ h(p)·n symbols. (Symbol alphabet can be arbitrary.)
Example (hidden Markov source): two hidden states, emitting B(0.5) and B(0.1); each state persists with probability 0.9 and switches with probability 0.1.
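A minimal sketch of sampling from this two-state source, assuming a uniformly random starting state (the function name and defaults are illustrative, not from the talk):

```python
import random

def sample_hmm(n, p_stay=0.9, emit=(0.5, 0.1), seed=0):
    """Sample n bits from the two-state hidden Markov source above:
    state 0 emits Bernoulli(0.5), state 1 emits Bernoulli(0.1);
    each step the chain stays put w.p. 0.9 and switches w.p. 0.1."""
    rng = random.Random(seed)
    state = rng.randrange(2)          # illustrative: uniform initial state
    bits = []
    for _ in range(n):
        bits.append(1 if rng.random() < emit[state] else 0)
        if rng.random() >= p_stay:    # switch states w.p. 0.1
            state ^= 1
    return bits

print(sample_hmm(20))
```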
For a source distribution on (X1, X2, …, Xn):
1. How much can we compress? E.g., for iid Bernoulli(p): entropy = n·h(p).
2. Efficiency? Want: n symbols ↦ n·h(p) + o(n) symbols.
Goal: get within ε of the entropy rate (n symbols ↦ n[h(p) + ε] symbols) at blocklength n ≥ poly(1/ε).
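For reference, a small sketch of the benchmark h(p), the binary entropy function:

```python
from math import log2

def h(p):
    """Binary entropy h(p) = -p log2 p - (1-p) log2 (1-p): the
    compression benchmark (bits per symbol) for iid Bernoulli(p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(h(0.1))   # ~0.469: n iid B(0.1) bits compress to ~0.469*n bits
```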
Main result: compression/decompression algorithms which, given the HMM source, achieve
n symbols ↦ (H̄ + ε)·n symbols at blocklength n ≥ poly(1/ε)
(for an HMM with mixing time τ; H̄ denotes the entropy rate).
Compare: non-explicit schemes achieve n ↦ H̄·n + o(n). [Lempel–Ziv]: n ↦ H̄·n + o(n); nonlinear, but works for an unknown HMM.
Connection to channel coding:
Alice sends x ∈ F_2^n; Bob receives y = x + e, for noise e = (e1, e2, … en) ∼ N.
Want: an error-correcting code for the N-channel.
Let P: F_2^n → F_2^m be a compression matrix such that Pe can be decoded to e w.h.p. when e ∼ N. Then the code C = ker(P) corrects N-noise: Bob computes the syndrome Py = Pe, decodes e, and recovers x = y + e.
Efficiency: compression which rapidly approaches entropy rate ⇒ code which rapidly approaches capacity
[Diagram: Alice → ⊕ e → Bob; decoding via the syndrome Pe.]
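A toy illustration of this reduction: a random matrix stands in for a good compression matrix, and brute-force search over low-weight noise stands in for the real decoder. Sizes and names are illustrative assumptions:

```python
import itertools
import numpy as np

n, m, max_wt = 12, 8, 2                   # toy sizes: compress n bits to m bits
rng = np.random.default_rng(0)
P = rng.integers(0, 2, size=(m, n))       # random compression matrix over F_2

def compress(e):
    return (P @ e) % 2                    # syndrome s = P e

def decompress(s):
    # Brute-force decoding: find a lowest-weight e with P e = s.
    for wt in range(max_wt + 1):
        for support in itertools.combinations(range(n), wt):
            e = np.zeros(n, dtype=int)
            e[list(support)] = 1
            if np.array_equal(compress(e), s):
                return e
    return None

e = np.zeros(n, dtype=int); e[[3, 7]] = 1  # a sparse noise pattern
print(decompress(compress(e)))             # w.h.p. over P, this recovers e
```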
Application: error-correcting codes for Markovian errors.
E.g., a channel with two states, “noisy” and “nice”, and transitions between them.
[Diagram: “noisy” channel BSC(0.5) and “nice” channel BSC(0.1); each state persists with probability 0.9 and switches with probability 0.1.]
The plan: polar codes, via strong polarization (extended in [BGNRS ’18]):
polar codes achieve capacity at “small blocklengths”: within ε of capacity at blocklength n ≥ poly(1/ε).
Polar compression: a matrix P such that, on input X = (X1, …, Xn), the output W = PX splits into
Set S: H(W_S) ≈ |S| (high entropy; these symbols are kept), and
Set T: H(W_i | W_{<i}) ≈ 0 (the decoder can guess each such W_i w.h.p., then invert P to decompress).
[Diagram: P applied to B(p) B(p) … B(p).]
The basic transform “polarizes” entropies:
(X, Y) ↦ (X + Y, Y).
If H(X) = H(Y), then H(X + Y) > H(X) and H(Y | X + Y) < H(Y), while the total is conserved: H(X + Y) + H(Y | X + Y) = H(X) + H(Y).
[Plot: H(X) at t = 0 splitting into H(X + Y) and H(Y | X + Y) at t = 1.]
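For X, Y iid B(p) this single step can be computed in closed form: X + Y is Bernoulli(2p(1−p)) over F_2, and H(Y | X + Y) follows from conservation. A small sketch:

```python
from math import log2

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def polarize_step(p):
    """One butterfly on X, Y iid B(p): X+Y (mod 2) is B(2p(1-p)), and
    H(Y | X+Y) follows from conservation: H(X+Y) + H(Y|X+Y) = 2 h(p)."""
    q = 2 * p * (1 - p)
    H_plus = h(q)                  # H(X+Y): moves up toward 1
    H_minus = 2 * h(p) - H_plus    # H(Y | X+Y): moves down toward 0
    return H_plus, H_minus

p = 0.1
print(h(p), polarize_step(p))   # h(0.1)=0.469 splits into ~0.680 and ~0.258
```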
Recursion: apply the transform repeatedly. From X1, X2, X3, X4 iid B(p), p ∈ (0, 1), one level of butterflies produces W1, …, W4; a second level produces Z1, …, Z4.
[Diagram: two levels of (X, Y) ↦ (X + Y, Y) butterflies, X1…X4 → W1…W4 → Z1…Z4.]
Consider H(Z_i | Z_{<i}). Hope: most of these entropies are eventually close to 0 or 1.
[Plots: the conditional entropies H(X_i) at t = 0, {H(W1), H(W2 | W1), …} at t = 1, and {H(Z_i | Z_{<i})} at t = 2, spreading toward 0 and 1.]
Equivalent to: P_n ≝ P_2^{⊗ log n} (tensor powers of the basic 2×2 transform).
Track each output wire's entropy conditioned on the wires above it: H_i ≝ H(W_i | W_{<i}).
Following a uniformly random wire gives a martingale in t, and Σ_i H(W_i | W_{<i}) = H(X1, …, Xn), because entropy is conserved at every level.
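A short sketch of the transform as a recursion (output ordering conventions vary across presentations; this is one valid choice):

```python
def polar_transform(x):
    """Apply the polar transform P_n = P_2 tensored log n times to a list
    of bits, n a power of 2. One level maps pairs (X, Y) -> (X+Y, Y);
    then recurse on the two halves."""
    n = len(x)
    if n == 1:
        return x[:]
    top = [x[2*i] ^ x[2*i + 1] for i in range(n // 2)]   # X + Y wires
    bot = [x[2*i + 1] for i in range(n // 2)]            # Y wires
    return polar_transform(top) + polar_transform(bot)

print(polar_transform([1, 1, 0, 0]))   # [0, 0, 1, 0]
```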
We want fast convergence: to achieve ε-closeness to the entropy rate efficiently, i.e. with blocklength n = 2^t = poly(1/ε), we need all but an ε-fraction of the conditional entropies H(W_i | W_{<i}) to be within n^{-c} of {0, 1} after only t = O(log 1/ε) levels.
[Plot: the conditional entropies at t = 0, 1, 2, …, converging rapidly to {0, 1}.]
The key to fast convergence: “Local Polarization”.
Recall, we want to show: the entropy martingale converges to {0, 1} rapidly.
Properties of the martingale (local polarization):
1. Variance in the middle: while H_t is bounded away from 0 and 1, each step has non-negligible variance.
2. Suction at the ends: once H_t is near 0, it shrinks by a constant factor with constant probability, and symmetrically for the upper end.
(Easy to show these properties.)
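For intuition, here is a simulation of the analogous martingale for the binary erasure channel, where one level maps the conditional entropy H to H² or 2H − H², each with probability 1/2 (a stand-in for illustration: the Bernoulli-source martingale has no such closed form):

```python
import random

def unpolarized_fraction(H0=0.469, t=20, trials=100_000, gap=1e-3, seed=0):
    """Simulate the BEC entropy martingale: one level maps H -> H*H
    (the '+' branch) or H -> 2H - H*H (the '-' branch), each w.p. 1/2.
    Returns the fraction of random wires not yet gap-close to {0, 1}."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        H = H0
        for _ in range(t):
            H = H * H if rng.random() < 0.5 else 2 * H - H * H
        if gap < H < 1 - gap:
            bad += 1
    return bad / trials

for t in (5, 10, 20):
    print(t, unpolarized_fraction(t=t))   # fraction shrinks as t grows
```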
Local polarization ⇒ the code of blocklength n = 2^t = poly(1/ε) has a set T of indices s.t. for every i ∈ T, H(W_i | W_{<i}) ≈ 0.
[Diagram, as before: W = PX splits into Set S: H(W_S) ≈ |S| and Set T: H(W_i | W_{<i}) ≈ 0.]
Theorem: For every distribution D over (X, Y), where X ∈ F_2: let X = (X1, X2, …, Xn) and Y = (Y1, Y2, …, Yn), where the pairs (Xi, Yi) ∼ D are iid. Then the entropies of W ≔ P_n(X) are polarized:
∀ε: if n ≥ poly(1/ε), then all but an ε-fraction of indices i ∈ [n] have entropies H(W_i | W_{<i}, Y) ∉ (n^{-c}, 1 − n^{-c}).
[Diagram: inputs X1, X2, …, Xn fed through P, with auxiliary info Y1, Y2, …, Yn on the side; all H(W_i | W_{<i}, Y) ≈ 0 or ≈ 1, except an ε-fraction of bad indices.]
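For small n, these conditional entropies can be computed exactly by brute-force enumeration; a sketch for iid B(p) inputs, with the side information Y omitted for brevity:

```python
from collections import defaultdict
from itertools import product
from math import log2

def polar_transform(x):
    # Same recursive butterfly as in the sketch above.
    n = len(x)
    if n == 1:
        return list(x)
    top = [x[2*i] ^ x[2*i + 1] for i in range(n // 2)]
    bot = [x[2*i + 1] for i in range(n // 2)]
    return polar_transform(top) + polar_transform(bot)

def entropy(dist):
    return -sum(pr * log2(pr) for pr in dist.values() if pr > 0)

def prefix_probs(joint, i):
    """Marginal distribution of the length-i prefix of W."""
    marg = defaultdict(float)
    for w, pr in joint.items():
        marg[w[:i]] += pr
    return marg

def conditional_entropies(n=8, p=0.1):
    """Exact H(W_i | W_{<i}) for W = P_n(X), X iid B(p), by enumerating
    all 2^n inputs (side information Y omitted for brevity)."""
    joint = defaultdict(float)
    for x in product([0, 1], repeat=n):
        pr = 1.0
        for b in x:
            pr *= p if b else 1.0 - p
        joint[tuple(polar_transform(list(x)))] += pr
    # Chain rule: H(W_i | W_{<i}) = H(W_{<=i}) - H(W_{<i}).
    return [entropy(prefix_probs(joint, i + 1)) - entropy(prefix_probs(joint, i))
            for i in range(n)]

print([round(hi, 3) for hi in conditional_entropies()])
# The entropies spread toward 0 and 1; they sum to n*h(0.1) = 3.752.
```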
Back to HMMs: the theorem needs independent, identical pairs, but neighboring symbols of an HMM are correlated.
Fix: split the input X1, X2, …, Xn into k blocks of k symbols each, and apply P to each block.
Independent across blocks: holds conditioned on the hidden state at each block boundary.
Identical: clear.
Compress each block by outputting its high-entropy set (plus the ε-fraction of unpolarized indices).
[Diagram: X1 X2 … Xn split into blocks, each fed through P.]
Why decode blocks in order? E.g., each X_i can be marginally a uniform bit (a mixture of B(0.9) and B(0.1) looks like B(0.5)), yet have much lower entropy conditioned on the previous block, which reveals the hidden state w.h.p. Without conditioning we must transmit the entire high-entropy set; with conditioning, a smaller set suffices.
[Diagram: blocks P1, P2, P3, … over a chain with emissions B(0.9)/B(0.1) and transition probabilities 0.9/0.1; the marginal is B(0.5).]
Polar-decoder black box:
Input: the compressed symbols of a block, and the distribution of the block's inputs.
Output: the decompressed block (w.h.p.).
Markov decoding (sketched in code below):
1. Decompress P1's outputs.
2. Compute the distribution of P2's inputs, conditioned on the decoded P1 block.
3. Decompress P2's outputs.
4. …
[Diagram: blocks P1, P2, P3, … decoded left to right, with unknown symbols “?” filled in as decoding proceeds.]
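A hedged sketch of this loop. `polar_decode` (the black box above) and `posterior_given` (the HMM state update) are hypothetical stand-ins, not names from the talk:

```python
# Sketch of Markov decoding. `polar_decode` and `posterior_given` are
# hypothetical helpers (black-box polar decoder and HMM posterior update),
# not part of the actual scheme's API.
def markov_decompress(compressed_blocks, hmm, polar_decode, posterior_given):
    decoded = []
    state_dist = hmm.stationary_distribution()    # prior over hidden states
    for comp in compressed_blocks:                # blocks P1, P2, P3, ...
        # Steps 1 & 3: decompress this block under the current distribution.
        block = polar_decode(comp, state_dist)
        decoded.extend(block)
        # Step 2: condition the next block's distribution on what we decoded.
        state_dist = posterior_given(hmm, state_dist, block)
    return decoded
```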
Note: we could have done this with any black-box compression scheme for independent, non-identically distributed symbols. But such schemes are non-linear (and messy): a fixed distribution on the symbols ⇏ overall linear compression. Polar codes are particularly suited for this (the compression stays linear).
[Diagram: blocks compressed by different maps C, C′, C″, …]