Trace reconstruction for deletion channels, Yuval Peres, Microsoft - PowerPoint PPT Presentation



SLIDE 1

Trace reconstruction for deletion channels

Yuval Peres (Microsoft Research)

Based on joint work with Fedor Nazarov (Kent State University) and Alex Zhai (Stanford University)

December 24, 2017

Y. Peres (MSR) · Trace reconstruction for deletion channels · December 24, 2017 · 1 / 28

SLIDE 2

Problem statement

SLIDE 5

Deletion channel

Suppose Alice wants to send Bob an n-bit string x = (x0, . . . , x_{n−1}) ∈ {0, 1}^n. Alice transmits the bits one by one, but each bit is deleted independently with probability q. Bob doesn't know which positions were deleted; all he sees is the shortened string (y0, y1, . . . , y_{ℓ−1}) of retained bits.

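The channel is straightforward to simulate. A minimal sketch (the function name `deletion_channel` and the example string are illustrative, not from the talk):

```python
import random

def deletion_channel(x, q, rng=random):
    """Pass the bit string x through a deletion channel: each bit is
    deleted independently with probability q, and the surviving bits
    keep their relative order."""
    return [b for b in x if rng.random() >= q]

# Example trace with deletion probability q = 1/2.
random.seed(0)
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
trace = deletion_channel(x, 0.5)
```

Bob sees only `trace`; the indices of the deleted positions are lost.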


SLIDE 11

Questions

Notation: D_q(x) denotes the distribution of the string Bob receives after x passes through the deletion channel.

Given T i.i.d. samples ("traces") y^1, y^2, . . . , y^T with each y^t ∼ D_q(x), can Bob reconstruct x (with probability 3/4, say)?

Closely related hypothesis-testing problem: given two strings x and x′, determine whether the samples came from D_q(x) or D_q(x′). If T traces suffice for this (with probability 3/4, say), then O(nT) traces suffice for reconstruction.

We can ask for worst-case x or for "average-case" x (where x is chosen uniformly at random).

The problem arises naturally in various contexts: sensor networks, DNA sequencing. It was raised in this form by Batu, Kannan, Khanna and McGregor (2004), who proved a lower bound: for all n > 1 there exist n-bit strings x, x′ such that Ω(n) traces are needed to distinguish whether the input was x or x′.


SLIDE 14

Results

Observation: if x and x′ are any two n-bit strings with different Hamming weights, then T = O(n) traces suffice to distinguish them, using the Hamming weight of the output as the test statistic.

Previous upper bounds: e^{O(√n)} traces in the worst case, and n^{O(1)} in the random case for q < 1/100 (Holenstein-Mitzenmacher-Panigrahy-Wieder '08).

(Nazarov-P., STOC 2017): For worst-case x, we can reconstruct using e^{O(n^{1/3})} traces. Moreover, this is optimal for linear (mean-based) tests. The same result was obtained simultaneously and independently by De, O'Donnell and Servedio (STOC 2017).

New result (P.-Zhai, FOCS 2017): For q < 1/2, we can reconstruct a uniformly random input x with probability 1 − o(1) using T = e^{C√(log n)} = n^{o(1)} traces.

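The Hamming-weight observation can be sketched in code (a toy illustration, not code from the papers; the helper names are made up). Since each bit survives independently with probability 1 − q, the expected trace weight is (1 − q) · weight(x), so a mean-based comparison separates strings of different weight:

```python
import random

def deletion_channel(x, q, rng):
    return [b for b in x if rng.random() >= q]

def weight_test(traces, x, x_prime, q):
    """Compare the empirical mean trace weight to the two candidate
    expected weights (1-q)*weight(.) and return the closer candidate."""
    mean_w = sum(sum(t) for t in traces) / len(traces)
    mu, mu_p = (1 - q) * sum(x), (1 - q) * sum(x_prime)
    return x if abs(mean_w - mu) <= abs(mean_w - mu_p) else x_prime

rng = random.Random(1)
q = 0.5
x = [1] * 60 + [0] * 40        # Hamming weight 60
x_prime = [1] * 40 + [0] * 60  # Hamming weight 40
traces = [deletion_channel(x, q, rng) for _ in range(200)]
guess = weight_test(traces, x, x_prime, q)
```

With weights differing by 20 and only 200 traces, the empirical mean weight concentrates far closer to the true candidate.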


SLIDE 19

Lower bounds

For the worst case, consider

x  = 00···0 11···1  (n/2 zeroes followed by n/2 ones)
x′ = 00···0 11···1  (n/2 + 1 zeroes followed by n/2 − 1 ones).

Trace reconstruction here is basically equivalent to distinguishing Binom(n/2, p) from Binom(n/2 + 1, p) ⟹ Ω(n) traces are needed.

For the random case, at least Ω(log² n) traces are needed (McGregor-Price-Vorotnikova '14).

SLIDE 20

Reconstruction with bit statistics


SLIDE 25

Bit statistics: the first bit

For simplicity, take q = 1/2 (the general case is similar).

Natural first attempt: suppose y ∼ D_q(x) and y′ ∼ D_q(x′). Does the first bit of y look different from the first bit of y′?

Ey0 = (1/2)x0 + (1/4)x1 + (1/8)x2 + ···
Ey′0 = (1/2)x′0 + (1/4)x′1 + (1/8)x′2 + ···

If x and x′ agree in their first k digits, then |Ey0 − Ey′0| is only ≈ 2^{−k}.

Exponentially many samples needed: at least 2^k traces are required to distinguish them this way.
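The first-bit formula is easy to check exactly: for q = 1/2, bit x_k supplies y0 precisely when x0, …, x_{k−1} are all deleted and x_k is retained, an event of probability 2^{−(k+1)}. A small exact computation (the example strings are illustrative) showing the collapse when two strings share a prefix:

```python
def Ey0(x):
    """P(y0 = 1) for q = 1/2: x_k supplies y0 iff x_0..x_{k-1} are all
    deleted and x_k is retained, which has probability 2^-(k+1)."""
    return sum(xk / 2 ** (k + 1) for k, xk in enumerate(x))

# Two strings agreeing in their first 5 bits:
x  = [1, 0, 1, 1, 0, 1, 0, 0]
xp = [1, 0, 1, 1, 0, 0, 1, 1]
gap = abs(Ey0(x) - Ey0(xp))   # exponentially small in the shared prefix
```

Here the gap is 1/64 − 1/128 − 1/256 = 1/256, so of order 2^{−k} for k = 5 agreement, and a mean-based test on y0 alone would need on the order of 2^{2k} traces to see it.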


SLIDE 28

The key identity

We can try other output bits y_j besides y0. For y_j to come from x_k, this bit must be retained along with exactly j of the bits x0, . . . , x_{k−1}, so

Ey_j = (1/2) Σ_{k≥j} 2^{−k} C(k, j) x_k,

where C(k, j) is the binomial coefficient. The formula for Ey_j is best summarized by a generating function identity:

E[ Σ_{j=0}^{n−1} y_j w^j ] = (1/2) Σ_{k=0}^{n−1} x_k ((w + 1)/2)^k.
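The key identity can be sanity-checked by Monte Carlo (a sketch; the input string and trace count are arbitrary choices, not from the talk):

```python
import math
import random

def deletion_channel(x, q, rng):
    return [b for b in x if rng.random() >= q]

def expected_bit(x, j):
    """E y_j for q = 1/2 via the key identity:
    E y_j = (1/2) * sum_{k >= j} 2^{-k} * C(k, j) * x_k."""
    return 0.5 * sum(x[k] * math.comb(k, j) / 2 ** k
                     for k in range(j, len(x)))

rng = random.Random(42)
x = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
T = 200_000
counts = [0] * len(x)
for _ in range(T):
    for j, b in enumerate(deletion_channel(x, 0.5, rng)):
        counts[j] += b       # y_j counts as 0 when the trace is shorter
predicted = [expected_bit(x, j) for j in range(len(x))]
empirical = [c / T for c in counts]
```

The convention that y_j = 0 when the trace has fewer than j + 1 bits matches the identity, since only retained bits contribute.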


SLIDE 31

Using the key identity

Ψ_y(w) := E[ Σ_{j=0}^{n−1} y_j w^j ] = (1/2) Σ_{k=0}^{n−1} x_k ((w + 1)/2)^k.

Goal: find small w so that Ψ_y(w) and Ψ_{y′}(w) differ substantially.

Letting z = (w + 1)/2, we have

Ψ_y(w) − Ψ_{y′}(w) = (1/2) Σ_{k=0}^{n−1} (x_k − x′_k) z^k.

It suffices to find z, with w small, so that the right-hand side of this expression is large.


SLIDE 34

The maximum of a polynomial on a small arc

Theorem (Borwein-Erdélyi). Let f(z) = Σ_{k=0}^{n−1} a_k z^k be a polynomial with coefficients a0 = 1 and |a_k| ≤ 1. For any arc of length 1/L on the unit circle, there is a point z on the arc such that |f(z)| ≥ e^{−cL}, where c is a universal constant.

Apply this to f(z) = Σ_{j=0}^{n−1} (x_j − x′_j) z^j, dividing out by a power of z if needed.

We can then find z in the given arc so that |Ψ_y(w) − Ψ_{y′}(w)| ≥ e^{−cL}.


SLIDE 37

How to make w small? Choose z near 1. If z = e^{iθ}, then |w| = 1 + O(θ²). With θ = O(1/L), we obtain |w| = 1 + O(1/L²).


SLIDE 40

Using the key identity (cont'd)

Conclusion:

| Σ_{j=0}^{n−1} (Ey_j − Ey′_j) w^j | ≥ e^{−cL},

where |w| = 1 + O(1/L²) ⟹ |w^j| ≤ e^{Cn/L²}. We may assume C > c.

Thus there is some j such that

|Ey_j − Ey′_j| ≥ (1/n) e^{−CL − Cn/L²} ≥ e^{−3Cn^{1/3}} =: ε

(taking L = n^{1/3} to minimize L + n/L², and absorbing the 1/n factor).

⟹ T = e^{7Cn^{1/3}} samples suffice to detect the difference in means: the probability of choosing wrongly between x and x′ is e^{−Ω(Tε²)}, which is much smaller than 2^{−n}.

SLIDE 41

Reducing complexity

To avoid enumerating over all 2^n possible input strings, one can use linear programming, following Holenstein et al. (2008). Suppose that x0, . . . , x_{m−1} have been reconstructed and we wish to determine x_m. Write ȳ_j := (1/T) Σ_{t=1}^{T} y^t_j for the empirical average of the output bits. Let L := n^{1/3} and consider two linear programs (one where x_m = 0 and one where x_m = 1) in the relaxed variables x_{m+1}, . . . , x_n ∈ [0, 1]:

|E(y_j) − ȳ_j| < e^{−cL},   where E(y_j) = (1/2) Σ_{k≥j} 2^{−k} C(k, j) x_k.

Only one of these programs (either the LP determined by x_m = 0 or by x_m = 1) will be feasible if C is large enough and T = e^{7Cn^{1/3}}.
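As a toy stand-in for the LP (feasible only for tiny n, and a brute-force substitute rather than the authors' algorithm): enumerate all 2^n candidates and pick the one whose expected-bit vector, given by the key identity, is closest to the empirical averages ȳ_j:

```python
import itertools
import math
import random

def deletion_channel(x, q, rng):
    return [b for b in x if rng.random() >= q]

def expected_bits(x):
    """Mean trace-bit vector (E y_0, ..., E y_{n-1}) for q = 1/2."""
    n = len(x)
    return [0.5 * sum(x[k] * math.comb(k, j) / 2 ** k for k in range(j, n))
            for j in range(n)]

rng = random.Random(7)
n, T = 8, 200_000
x = [1, 0, 1, 1, 0, 1, 0, 0]
counts = [0] * n
for _ in range(T):
    for j, b in enumerate(deletion_channel(x, 0.5, rng)):
        counts[j] += b
emp = [c / T for c in counts]

def dist2(c):
    # squared distance between a candidate's mean vector and the data
    return sum((a - b) ** 2 for a, b in zip(expected_bits(list(c)), emp))

best = min(itertools.product([0, 1], repeat=n), key=dist2)
```

For n = 8 the mean vectors of distinct inputs are well separated relative to the sampling noise at T = 200,000, so the nearest-mean candidate recovers x; the point of the LP (and of the n^{1/3} analysis) is to get the same effect without the 2^n enumeration.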


SLIDE 43

Borwein-Erdélyi theorem: sketch of proof

Take Γ to be a curve overlapping with the unit circle in an arc of length 1/L, as shown. Since f is analytic, log |f(z)| is subharmonic. Thus

0 = log |f(0)| ≤ ∫_{z∈Γ} log |f(z)| dω(z),

where ω is the harmonic measure on Γ seen from 0.


SLIDE 47

Borwein-Erdélyi theorem: sketch of proof

Rearranging yields

∫_{z ∈ blue} log |f(z)| dω(z) ≥ − ∫_{z ∈ green} log |f(z)| dω(z),

where (in the figure) the blue part of Γ is the arc on the unit circle and the green part is the rest of Γ, inside the unit disk.

For |z| < 1, we have |f(z)| ≤ Σ_{j≥0} |z|^j = 1/(1 − |z|).

One can show this implies the green part contributes only O(1). This means |f(z)| must be at least e^{−O(L)} somewhere on the blue part, or else the integral over the blue part is too negative.

SLIDE 48

The Borwein-Erdélyi theorem is sharp. As shown in [NP] and [DOS], this implies that for some c > 0 and all n large enough, there exist input strings x, x′ of length n such that the corresponding outputs satisfy |Ey_j − Ey′_j| < e^{−cn^{1/3}} for all j. Thus if T = e^{o(n^{1/3})}, we cannot distinguish between x and x′ by a linear test. However, the existence of such a pair x, x′ is proved via a pigeonhole argument, and we are unable to produce them explicitly.

SLIDE 49

Reconstruction of random strings


SLIDE 56

Overview of strategy

From now on, fix q < 1/2 and write p = 1 − q > 1/2.

Given a trace y, figure out roughly which position in y corresponds to the last reconstructed position so far. Two steps:

1. Greedy matching: try to fit y as a subsequence of x; this aligns to within log n.
2. Aligning subsequences: analyze subsequences more carefully to align within log^{1/2} n.

Then use bit statistics as before to reconstruct the next several bits. However, the alignment is not exact! The approach can be modified to tolerate random shifts, but only a small amount of shifting, hence the need to align accurately.


SLIDE 59

Greedy matching

Suppose we see both the input x and the output y. We still don't know which bits came from where. Nevertheless, we can try to fit y as a subsequence of x. Simple approach: just map the bits of y "greedily" to the first possible match in x.

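The greedy rule can be written in a few lines (a sketch; the example strings are made up):

```python
def greedy_match(x, y):
    """Map each bit of y to the first available position in x with the
    same value, scanning left to right; assumes y is a subsequence of x
    (as every deletion-channel trace is)."""
    positions, i = [], 0
    for b in y:
        while x[i] != b:
            i += 1
        positions.append(i)
        i += 1
    return positions

x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [1, 0, 0, 1]           # one possible trace of x
matched = greedy_match(x, y)
```

Greedy matches each bit at or before its true (unknown) origin, so the greedy location always trails the true one.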


SLIDE 75

Greedy matching

The "true location" (gray arrows) advances like a geometric with mean 1/p. The location given by the greedy algorithm (red arrows) advances like a geometric with mean 2 > 1/p, capped at hitting the true location. The gap between the true and greedy locations therefore behaves like a random walk biased towards zero ⟹ it stays O(log n) over the course of the length-n string.
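A quick simulation of this gap (illustrative parameters, with q = 1/4 so that p = 3/4 > 1/2): record the true positions of the retained bits, then compare with the positions the greedy matcher assigns.

```python
import random

def greedy_match(x, y):
    positions, i = [], 0
    for b in y:
        while x[i] != b:
            i += 1
        positions.append(i)
        i += 1
    return positions

rng = random.Random(3)
n, q = 10_000, 0.25
x = [rng.randint(0, 1) for _ in range(n)]
true_pos = [i for i in range(n) if rng.random() >= q]  # retained indices
y = [x[i] for i in true_pos]
greedy_pos = greedy_match(x, y)
# gap between true and greedy location at each output bit
gaps = [t - g for t, g in zip(true_pos, greedy_pos)]
```

Since the greedy position never passes the true one, every gap is nonnegative, and over a length-10,000 input the maximum gap stays of logarithmic size.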


SLIDE 79

Aligning by subsequences

To get subpolynomial trace complexity, we need to align more precisely than log n. Consider a block of length log n and focus on its middle a := log^{1/2}(n) bits. After the deletion channel, this becomes a subsequence of length ≈ pa. But could this subsequence come from elsewhere (the bad event)?


slide-85
SLIDE 85

Aligning by subsequences

Pick b such that (1 + ǫ)a < b < (2 − ǫ)pa. Bad event covered by two unlikely events (of probability ≈ e−const·a)

  • 1. Only pa bits are retained from a block of length > b.
  • 2. A random string of length < b has a specific length pa string as a

substring.

#1 only depends on randomness of deletion, not input #2 only depends on randomness of input, not deletion

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 23 / 28
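Event #2 is about one string embedding into another as a subsequence. A minimal two-pointer check (an illustrative sketch; the function name is ours):

```python
def is_subsequence(s, t):
    """Return True iff s occurs as a (not necessarily contiguous)
    subsequence of t, via a greedy left-to-right scan."""
    it = iter(t)
    # `bit in it` advances the iterator past the first match,
    # so each element of t is consumed at most once.
    return all(bit in it for bit in s)

# The bad event asks whether the surviving ~pa middle bits could
# also embed into some other window of length < b.
assert is_subsequence([1, 0, 1], [1, 1, 0, 0, 1])
assert not is_subsequence([1, 1, 1], [1, 0, 1])
```

The greedy scan is the standard way to decide subsequence containment in linear time, which is all the union bound over windows needs.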

slide-86
SLIDE 86

Aligning by subsequences

By “most” we will mean all but a fraction e^{−const·a}. We say an input is good if most length-pa subsequences of its middle a bits cannot be found elsewhere as subsequences of blocks of length b. For a good input, we can align to the middle a bits by finding a subsequence of length pa. Most inputs are good.

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 24 / 28
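The “good input” property can be probed empirically. A rough Monte Carlo sketch (all names and parameter values here are ours, chosen small purely for illustration):

```python
import random

def is_subsequence(s, t):
    """Greedy test: does s embed into t as a subsequence?"""
    it = iter(t)
    return all(bit in it for bit in s)

def bad_fraction(x, start, a, b, p, trials, rng):
    """Estimate how often a random length-~pa subsequence of the middle
    a bits of x can also be found as a subsequence of some other
    length-b window of x. Small fraction <=> x looks 'good'."""
    middle = x[start:start + a]
    bad = 0
    for _ in range(trials):
        # random subsequence of the middle bits, expected length p * a
        sub = [bit for bit in middle if rng.random() < p]
        for w0 in range(len(x) - b + 1):
            if w0 <= start + a and w0 + b >= start:
                continue  # skip windows touching the middle block
            if is_subsequence(sub, x[w0:w0 + b]):
                bad += 1
                break
    return bad / trials

rng = random.Random(1)
x = [rng.randrange(2) for _ in range(200)]
frac = bad_fraction(x, start=95, a=10, b=14, p=0.8, trials=50, rng=rng)
```

The slide asserts that for most inputs this fraction is exponentially small in a; the sketch above only illustrates the quantity being bounded, not the bound itself.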

slide-90
SLIDE 90

Putting it all together

Greedy matching can align to within log n. In a typical random block of length log n, we can align to within log^{1/2} n; but this fails in a fraction e^{−const·log^{1/2} n} ≫ 1/n of the blocks.

Not all blocks will be good, but among log^{1/2} n consecutive blocks there will (most likely) be a good one.

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 25 / 28

slide-93
SLIDE 93

Putting it all together

Recall: using bit statistics we can recover m bits using e^{O(m^{1/3})} traces. A modification of the proof lets us tolerate random shifts of size O(m^{1/3}). We can align to within log^{1/2} n and want to reconstruct log^{3/2} n bits ahead. The number of traces used is e^{O(log^{1/2} n)} = n^{o(1)}.

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 26 / 28
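For completeness, the identity behind the last equality (a standard computation, not spelled out on the slide):

```latex
e^{C\sqrt{\log n}}
  \;=\; \exp\!\left(\frac{C}{\sqrt{\log n}}\,\log n\right)
  \;=\; n^{C/\sqrt{\log n}}
  \;=\; n^{o(1)},
\qquad\text{since } \frac{C}{\sqrt{\log n}} \longrightarrow 0 .
```

Here m = log^{3/2} n, so m^{1/3} = log^{1/2} n, which is why the per-window trace count e^{O(m^{1/3})} becomes e^{O(log^{1/2} n)}.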

slide-97
SLIDE 97

Random strings and arbitrary deletion probability

Holden-Pemantle-Peres’17: For arbitrary deletion probability q ∈ [0, 1) we can reconstruct random strings with e^{O(log^{1/3} n)} = n^{o(1)} traces. We also allow insertions and substitutions. Further improvement for random strings cannot be obtained without an improvement for worst-case strings.

Nina Holden

MIT

Robin Pemantle

University of Pennsylvania

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 27 / 28

slide-98
SLIDE 98

Alignment with error log(n)

[Diagram: a trace, consisting of unknown bits “?” followed by a window w̃ of length ≈ p·log^{5/3} n; the reconstructed bits w, of length log^{5/3} n, are compared against w̃ block by block (blocks of length log^{2/3} n in w, ≈ p·log^{2/3} n in w̃); final alignment error O(log n).]

Was w̃ likely obtained by sending w through the deletion channel? Divide w and w̃ into log n corresponding blocks. Let S be the number of corresponding blocks of w and w̃ with the same majority bit. Answer YES if S > (1/2 + c) log n; answer NO otherwise. Repeat with all candidate windows w̃ of appropriate length. This gives alignment error O(log n) with probability 1 − exp(−Ω(log^{1/3} n)); the alignment error is then improved by a second, refined alignment step.

  • Y. Peres (MSR)

Trace reconstruction for deletion channels December 24, 2017 28 / 28
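The YES/NO test above can be sketched as follows (an illustrative sketch; function names, the constant c, and all sizes are ours):

```python
import math
import random

def majority(bits):
    """Majority bit of a block (ties broken towards 1)."""
    return 1 if 2 * sum(bits) >= len(bits) else 0

def likely_from(w, w_tilde, num_blocks, c=0.2):
    """The slide's test: split w and w_tilde into num_blocks corresponding
    blocks, count blocks whose majority bits agree, and answer YES iff
    the count S exceeds (1/2 + c) * num_blocks."""
    def blocks(s):
        k = len(s) // num_blocks
        return [s[i * k:(i + 1) * k] for i in range(num_blocks)]
    S = sum(majority(bw) == majority(bt)
            for bw, bt in zip(blocks(w), blocks(w_tilde)))
    return S > (0.5 + c) * num_blocks

rng = random.Random(2)
p = 0.9
w = [rng.randrange(2) for _ in range(4096)]
w_tilde = [b for b in w if rng.random() < p]   # w sent through the channel
unrelated = [rng.randrange(2) for _ in range(len(w_tilde))]

num_blocks = int(math.log(len(w)))             # ~ log n blocks
genuine = likely_from(w, w_tilde, num_blocks)      # typically YES
impostor = likely_from(w, unrelated, num_blocks)   # typically NO
```

Deleting a small fraction of bits rarely flips a block's majority, so a genuine trace window usually passes, while an unrelated window agrees on only about half the blocks and falls below the (1/2 + c) threshold.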