SLIDE 1

Symbolwise MAP Estimation for Multiple-Trace Insertion/Deletion/Substitution Channels

Ryo Sakogawa and Haruhiko Kaneko

Tokyo Institute of Technology

ISIT2020

  • R. Sakogawa, H. Kaneko (TokyoTech)

MAP Estimation for Multiple-Trace IDS ISIT2020 1 / 25

SLIDE 2

Outline

1. Background
2. Model of multiple-trace IDS channel
3. Symbol-wise MAP estimation for multiple-trace IDS channel
4. Simulation results
5. Conclusion

SLIDE 4

Background and objective

Background

  • symbolwise MAP estimation for a multiple-trace channel
  • application: DNA archival storage
      - high durability due to the biochemical properties of DNA
      - high capacity (e.g., $10^{15}$ to $10^{20}$ bytes per gram)
      - prone to synchronization errors
      - multiple-trace readout

Objective

  • symbolwise MAP estimation using $m$ ($\geq 2$) traces
  • channel: insertion/deletion/substitution (IDS) channel

SLIDE 5

Related works: DNA storage

  • major sequencing platforms [1]:
      - Illumina, Sanger, Nanopore
  • insertion/deletion error probabilities in DNA storage [3]:
      - Illumina: around $10^{-3}$
      - Nanopore: around $10^{-2}$
  • channel model and information-theoretic bound for nanopore sequencer [4,5]

SLIDE 6

Related works: DNA storage model

  • DNA storage model
  • coverage $m$ for reliable reconstruction (in DNA storage):
      - several tens to several hundreds [1]

SLIDE 7

Related works: IDS error correction coding

  • examples of IDS error correction codes (single-trace decoding)
      - single IDS error correction code [11]
      - LDPC code + watermark [12]
      - LDPC code + marker [13]
      - spatially-coupled code [14]
      - polar code: for deletion channel [15], for IDS channel [16]
  • coding schemes for DNA storage (multiple-trace decoding)
      - majority voting [6]
      - Reed-Solomon code [7,8]
      - DNA fountain architecture [9]: based on Luby transform code
      - soft-decision decoding [17]

SLIDE 8

Related works: multiple-trace channel

  • minimum number of traces for perfect reconstruction
      - various types of channels including IDS channel [18]
      - probabilistic IDS channel [19]
  • symbolwise MAP estimation using $m$ traces:
      - calculate the posterior probability from a limited number of traces
      - the calculated probability is used as soft input to an outer error-correcting code (e.g., LDPC code, polar code)
      - MAP estimation
          - for deletion channel [21,22]
          - for IDS channel: this work

                           deletion channel     IDS channel
  perfect reconstruction        [18,19] (both channels)
  MAP estimation           [21,22]              (this work)

SLIDE 10

Channel model: outline

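  • error probabilities: $p_{\mathrm{i}}$ (insertion), $p_{\mathrm{d}}$ (deletion), $p_{\mathrm{s}}$ (substitution)
  • input: $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
  • output:

\[
\mathbf{Z} =
\begin{bmatrix} \mathbf{z}^1 \\ \mathbf{z}^2 \\ \vdots \\ \mathbf{z}^m \end{bmatrix}
=
\begin{bmatrix}
(z^1_1, z^1_2, \ldots, z^1_{n_1}) \\
(z^2_1, z^2_2, \ldots, z^2_{n_2}) \\
\vdots \\
(z^m_1, z^m_2, \ldots, z^m_{n_m})
\end{bmatrix}
\]

      - $\mathbf{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$: $k$th trace with length $n_k$
  • at most one insertion per symbol (as in [13])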

SLIDE 11

Channel model: drift vector

  • maximum drift value between input and output symbols: $D$
  • set of drift values: $\mathcal{D} = \{-D, \ldots, -1, 0, 1, \ldots, D\}$
  • drift vector of $k$th output: $\mathbf{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$
      - determined according to a Markov process (with $d^k_1 = 0$)

\[
p(d^k_{i+1} \mid d^k_i) =
\begin{cases}
p_{\mathrm{i}} & (d^k_{i+1} = d^k_i + 1,\ d^k_i < D) \\
p_{\mathrm{d}} & (d^k_{i+1} = d^k_i - 1,\ d^k_i > -D) \\
1 - p_{\mathrm{i}} - p_{\mathrm{d}} & (d^k_{i+1} = d^k_i,\ -D < d^k_i < D) \\
1 - p_{\mathrm{i}} & (d^k_{i+1} = d^k_i,\ d^k_i = -D) \\
1 - p_{\mathrm{d}} & (d^k_{i+1} = d^k_i,\ d^k_i = D) \\
0 & (\text{otherwise})
\end{cases}
\]
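The Markov process above can be sampled directly. The following is a minimal sketch, not the authors' implementation; the function name `sample_drift` and the use of Python's `random.Random` are assumptions made for illustration.

```python
import random

def sample_drift(n, D, p_i, p_d, rng):
    """Sample one drift vector (d_1, ..., d_{n+1}) with d_1 = 0.

    Hypothetical helper: follows the transition probabilities on the slide,
    where a blocked move at the boundary (+D or -D) becomes "stay".
    """
    d = [0]
    for _ in range(n):
        cur = d[-1]
        u = rng.random()  # uniform in [0, 1)
        if u < p_i and cur < D:
            d.append(cur + 1)      # insertion: drift increases by one
        elif p_i <= u < p_i + p_d and cur > -D:
            d.append(cur - 1)      # deletion: drift decreases by one
        else:
            d.append(cur)          # no synchronization error
    return d

# Example: one drift vector for a length-152 input with D = 4.
drift = sample_drift(152, 4, 0.001, 0.005, random.Random(0))
```

The returned list has n + 1 entries, and consecutive entries differ by at most one, which reflects the restriction to at most one insertion per symbol.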

SLIDE 12

Channel model: definition of multiple trace IDS channel

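  • channel input: $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
  • determine drift vector according to the Markov process: $\mathbf{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$ $(k \in [m])$
  • drifted vector: $\mathbf{y}^k = (y^k_1, y^k_2, \ldots, y^k_{n_k}) \in \mathbb{Z}_q^{n_k}$

\[
y^k_j = x_i \quad (j \in \{ j' \mid i + d^k_i \le j' \le i + d^k_{i+1} \})
\]

  • channel output ($k$th trace): $\mathbf{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$

\[
p(z^k_i \mid y^k_i) =
\begin{cases}
1 - p_{\mathrm{s}} & (z^k_i = y^k_i) \\
p_{\mathrm{s}}/(q - 1) & (z^k_i \neq y^k_i)
\end{cases}
\]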
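As an illustration of the definition above, one trace can be generated from an input word and a given drift vector. This is a hedged sketch under the stated model: the names `substitute` and `make_trace` are hypothetical, and the drift vector is passed in as a 0-based Python list `d` with `d[0] = 0`.

```python
import random

def substitute(sym, p_s, q, rng):
    """Keep sym with probability 1 - p_s, else pick one of the other q - 1 symbols uniformly."""
    if rng.random() < p_s:
        return rng.choice([s for s in range(q) if s != sym])
    return sym

def make_trace(x, d, p_s, q, rng):
    """Build the drifted vector y from x and drift vector d, then apply substitutions.

    x[i] is copied d[i + 1] - d[i] + 1 times, matching y_j = x_i for
    i + d_i <= j <= i + d_{i+1}: 0 copies (deletion), 1 copy, or 2 copies (insertion).
    """
    assert len(d) == len(x) + 1 and d[0] == 0
    y = []
    for i, sym in enumerate(x):
        copies = d[i + 1] - d[i] + 1
        y.extend([sym] * copies)
    return [substitute(s, p_s, q, rng) for s in y]

# With p_s = 0 the trace is the drifted vector itself:
# the second symbol is duplicated and the third one is deleted.
trace = make_trace([0, 1, 2, 3], [0, 0, 1, 0, 0], 0.0, 4, random.Random(0))
# trace == [0, 1, 1, 3]
```

The trace length equals n plus the last drift value, which is why each of the m traces can have a different length $n_k$.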

SLIDE 14

Notations

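  • array of drift values:

\[
\mathbf{D} =
\begin{bmatrix} \mathbf{d}^1 \\ \mathbf{d}^2 \\ \vdots \\ \mathbf{d}^m \end{bmatrix}
=
\begin{bmatrix}
d^1_1 & d^1_2 & \cdots & d^1_{n+1} \\
d^2_1 & d^2_2 & \cdots & d^2_{n+1} \\
\vdots & \vdots & & \vdots \\
d^m_1 & d^m_2 & \cdots & d^m_{n+1}
\end{bmatrix}
= [\, \mathbf{d}_1 \ \mathbf{d}_2 \ \cdots \ \mathbf{d}_{n+1} \,] \in \mathcal{D}^{m \times (n+1)}
\]

      - $k$th row $\mathbf{d}^k$: drift vector of $k$th trace $\mathbf{z}^k$
      - $i$th column $\mathbf{d}_i$: drift values corresponding to $i$th input symbol $x_i$
  • $i$th segment of $\mathbf{Z}$ (for given $\mathbf{D}$):

\[
\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} =
\begin{bmatrix}
(z^1_{i+d^1_i}, \ldots, z^1_{i+d^1_{i+1}}) \\
(z^2_{i+d^2_i}, \ldots, z^2_{i+d^2_{i+1}}) \\
\vdots \\
(z^m_{i+d^m_i}, \ldots, z^m_{i+d^m_{i+1}})
\end{bmatrix}
\]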

SLIDE 15

Derivation of factor graph (1/2)

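derive $p(x_i \mid \mathbf{Z})$ using the factor graph of the joint probability $p(\mathbf{Z}, \mathbf{x}, \mathbf{D})$:

\[
\begin{aligned}
p(\mathbf{Z}, \mathbf{x}, \mathbf{D})
&= p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{x}, \mathbf{D}) \\
&= p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{D})\, p(\mathbf{x}) \\
&= p(\mathbf{d}_1) \prod_{i=1}^{n} p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i),
\end{aligned}
\]

where

\[
p(\mathbf{d}_1) = \prod_{k=1}^{m} p(d^k_1) =
\begin{cases}
1 & (\mathbf{d}_1 = (0, \ldots, 0)) \\
0 & (\text{otherwise})
\end{cases}
\]

\[
p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right)
= \prod_{k=1}^{m} p\left( (\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right)
\]

\[
p(\mathbf{d}_{i+1} \mid \mathbf{d}_i) = \prod_{k=1}^{m} p(d^k_{i+1} \mid d^k_i)
\]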

SLIDE 16

Derivation of factor graph (2/2)

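likelihood for $k$th trace = single-trace channel ($m = 1$)

\[
p\left( (\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right) =
\begin{cases}
1 & (d^k_{i+1} = d^k_i - 1) \\
f\left( x_i, z^k_{i+d^k_i} \right) & (d^k_{i+1} = d^k_i) \\
f\left( x_i, z^k_{i+d^k_i} \right) f\left( x_i, z^k_{i+1+d^k_i} \right) & (d^k_{i+1} = d^k_i + 1)
\end{cases}
\]

substitution error probability:

\[
f(x, z) =
\begin{cases}
1 - p_{\mathrm{s}} & (x = z) \\
p_{\mathrm{s}}/(q - 1) & (x \neq z)
\end{cases}
\]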
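The three cases of the per-trace likelihood can be sketched as a small function. This is an illustrative sketch only: the names `f` and `segment_likelihood` are chosen here, the index `i` is 1-based as on the slide while the trace `z` is a 0-based Python list, and returning 0 for a segment that falls outside the trace is an assumption not stated on the slide.

```python
def f(x, z, p_s, q):
    """Substitution likelihood f(x, z)."""
    return 1.0 - p_s if x == z else p_s / (q - 1)

def segment_likelihood(x_i, z, i, d_cur, d_next, p_s, q):
    """Likelihood of the outputs of input symbol i in one trace z, given drifts d_i, d_{i+1}."""
    step = d_next - d_cur
    if step not in (-1, 0, 1):
        return 0.0                 # not a valid drift transition
    start, end = i + d_cur, i + d_next   # 1-based output positions of symbol i
    if start < 1 or end > len(z):
        return 0.0                 # assumption: segment must lie inside the trace
    lik = 1.0                      # deletion (empty segment) keeps the value 1
    for j in range(start, end + 1):
        lik *= f(x_i, z[j - 1], p_s, q)
    return lik

z = [0, 1, 1, 3]
deleted = segment_likelihood(2, z, 3, 1, 0, 0.1, 4)    # deletion case
inserted = segment_likelihood(1, z, 2, 0, 1, 0.1, 4)   # insertion case: two factors of f
```

For m traces, the segment likelihood of the whole array is the product of this quantity over the traces, as in the factorization of the preceding slide.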

SLIDE 17

Factor graph

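  • joint probability:

\[
p(\mathbf{Z}, \mathbf{x}, \mathbf{D}) = p(\mathbf{d}_1) \prod_{i=1}^{n} p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i)
\]

  • factor graph (figure)
  • calculation of posterior probability $p(x_i \mid \mathbf{Z})$:
      - perform sum-product algorithm on the factor graph
  • MAP estimation:

\[
\tilde{x}_i = \arg\max_{x_i \in \mathbb{Z}_q} p(x_i \mid \mathbf{Z}).
\]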
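Because the joint probability factorizes along a chain in i, the sum-product algorithm reduces to a forward-backward pass over joint drift states. The sketch below is one possible reading, not the authors' implementation. Assumptions: a uniform prior $p(x_i) = 1/q$, a final drift of each trace fixed to $n_k - n$ (implied by the trace lengths), no numerical scaling, and all function names (`f`, `trans`, `seg`, `symbol_posteriors`) are hypothetical.

```python
import itertools

def f(x, z, p_s, q):
    return 1.0 - p_s if x == z else p_s / (q - 1)

def trans(d, e, D, p_i, p_d):
    """Single-trace drift transition probability p(e | d)."""
    if e == d + 1 and d < D:
        return p_i
    if e == d - 1 and d > -D:
        return p_d
    if e == d:
        return 1.0 - (p_i if d < D else 0.0) - (p_d if d > -D else 0.0)
    return 0.0

def seg(x, z, i, d, e, p_s, q):
    """Likelihood of the outputs of 1-based input i in trace z for drifts d -> e."""
    lo, hi = i + d, i + e
    if lo < 1 or hi > len(z):
        return 0.0
    out = 1.0
    for j in range(lo, hi + 1):
        out *= f(x, z[j - 1], p_s, q)
    return out

def symbol_posteriors(traces, n, q, D, p_i, p_d, p_s):
    """Return p(x_i | Z) for i = 1..n as a list of length-q lists."""
    m = len(traces)
    steps = list(itertools.product((-1, 0, 1), repeat=m))
    start = (0,) * m
    final = tuple(len(z) - n for z in traces)  # assumption: end drift fixed by lengths

    def edges(i, s):
        """Yield (t, w) with w[x] = p(x) * p(segment | x, s, t) * p(t | s)."""
        for step in steps:
            t = tuple(d + u for d, u in zip(s, step))
            w = [1.0 / q] * q                  # assumption: uniform prior on x_i
            for z, d, e in zip(traces, s, t):
                p = trans(d, e, D, p_i, p_d)
                for x in range(q):
                    w[x] *= p * seg(x, z, i, d, e, p_s, q)
            if any(w):
                yield t, w

    alpha = [{start: 1.0}]                     # forward messages over joint drift states
    for i in range(1, n + 1):
        nxt = {}
        for s, a in alpha[-1].items():
            for t, w in edges(i, s):
                nxt[t] = nxt.get(t, 0.0) + a * sum(w)
        alpha.append(nxt)

    beta = {final: 1.0}                        # backward messages
    post = [None] * n
    for i in range(n, 0, -1):
        prev, px = {}, [0.0] * q
        for s, a in alpha[i - 1].items():
            for t, w in edges(i, s):
                b = beta.get(t, 0.0)
                if b == 0.0:
                    continue
                prev[s] = prev.get(s, 0.0) + b * sum(w)
                for x in range(q):
                    px[x] += a * w[x] * b
        total = sum(px)
        if total == 0.0:
            raise ValueError("traces are not explainable with the given D")
        post[i - 1] = [v / total for v in px]
        beta = prev
    return post

x = [0, 1, 2, 3, 0, 1]
traces = [[0, 1, 2, 3, 0, 1], [0, 1, 3, 0, 1]]   # second trace has one deletion
post = symbol_posteriors(traces, n=6, q=4, D=2, p_i=0.01, p_d=0.05, p_s=0.01)
estimate = [max(range(4), key=p.__getitem__) for p in post]
```

The state space of this sketch grows as $(2D+1)^m$, so it is only practical for small m; the posteriors in `post` could also be passed as soft input to an outer decoder instead of taking the arg max.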

SLIDE 18

Simple heuristic estimation

  • computational complexity for the MAP estimation: $O(D^{2m})$
      - impractical for a large number of traces
  • simple heuristic method based on the MAP estimation for $m = 3$
  • expressed by a ternary tree:
      - leaf nodes: $m'$ traces ($\mathbf{z}_0, \mathbf{z}_1, \ldots$)
      - internal/root nodes: MAP estimation for $m = 3$ traces
      - root node: outputs estimation $\tilde{\mathbf{x}}$
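To get a feeling for the $O(D^{2m})$ growth, one can count the pairs of joint drift states that a naive sum-product step would enumerate per input symbol. This is only a rough proxy under that naive-enumeration assumption; the function name `joint_state_pairs` is hypothetical.

```python
def joint_state_pairs(D, m):
    """Number of (d_i, d_{i+1}) pairs of joint drift states: ((2D + 1)^m)^2."""
    return (2 * D + 1) ** (2 * m)

# Growth with the number of traces for D = 4.
for m in (3, 4, 11):
    print(m, joint_state_pairs(4, m))
```

For D = 4 this gives 531441 pairs at m = 3 and 43046721 at m = 4, which illustrates why combining several m = 3 estimations in a tree is attractive for larger numbers of traces.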

SLIDE 20

Simulation parameters

  • block length: $n = 152$
  • number of traces: $m \in \{3, 4, 11\}$
  • maximum drift value: $D = 4$
  • evaluated error rates:
      - word error rate
      - error rate by Levenshtein distance:

\[
\frac{\text{summation of Levenshtein distance between } \mathbf{x} \text{ and } \tilde{\mathbf{x}}}{\text{total number of estimated symbols}}
\]

        ($\mathbf{x}$: original word, $\tilde{\mathbf{x}}$: estimated word)
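The two error rates can be computed as follows. This is a minimal sketch using the standard dynamic-programming edit distance; the function names are chosen here, and taking the denominator as the total length of the estimated words is an interpretation of "total number of estimated symbols".

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def word_error_rate(pairs):
    """Fraction of (original, estimate) pairs that differ."""
    return sum(x != xt for x, xt in pairs) / len(pairs)

def levenshtein_error_rate(pairs):
    """Sum of edit distances divided by the total number of estimated symbols."""
    total_dist = sum(levenshtein(x, xt) for x, xt in pairs)
    total_syms = sum(len(xt) for _, xt in pairs)
    return total_dist / total_syms

pairs = [([0, 1, 2, 3], [0, 1, 2, 3]), ([0, 1, 2, 3], [0, 1, 3, 3])]
wer = word_error_rate(pairs)          # one of two words is wrong
ler = levenshtein_error_rate(pairs)   # one edit over eight estimated symbols
```

A single wrong symbol makes a whole word count as an error, so the word error rate is always at least as large as the Levenshtein-based rate.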

SLIDE 21

Simulation results

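$p_{\mathrm{d}} = 5 p_{\mathrm{i}}$ (from [17]), $p_{\mathrm{s}} = 10^{-3}$

word error rate

algorithm          $m$   $p_{\mathrm{i}} = 1.0 \times 10^{-3}$   $p_{\mathrm{i}} = 5.0 \times 10^{-3}$
MAP estimation     3     $5.5 \times 10^{-3}$                    $3.1 \times 10^{-1}$
MAP estimation     4     $1.4 \times 10^{-3}$                    $2.9 \times 10^{-1}$
heuristic method   4     $5.4 \times 10^{-3}$                    $2.5 \times 10^{-1}$
heuristic method   11    ∗                                       $2.7 \times 10^{-2}$

∗ no error for 2500 words

error rate by Levenshtein distance

algorithm          $m$   $p_{\mathrm{i}} = 1.0 \times 10^{-3}$   $p_{\mathrm{i}} = 5.0 \times 10^{-3}$
MAP estimation     3     $5.6 \times 10^{-5}$                    $4.2 \times 10^{-3}$
MAP estimation     4     $1.4 \times 10^{-5}$                    $3.9 \times 10^{-3}$
heuristic method   4     $5.9 \times 10^{-5}$                    $3.3 \times 10^{-3}$
heuristic method   11    ∗                                       $3.4 \times 10^{-4}$

∗ no error for 2500 words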

SLIDE 22

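Word error rate¹ ($p_{\mathrm{d}} = 5 p_{\mathrm{i}}$, $p_{\mathrm{s}} = 10^{-3}$)

[Plot: word error rate (log scale, $10^{-4}$ to $10^{0}$) versus $p_{\mathrm{i}}$ ($\times 10^{-3}$, from 1 to 10); curves: m=3, m=4, m=4 (heuristic), m=5 (heuristic)]

¹ This result is not included in the proceedings.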

SLIDE 23

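Word error rate¹ ($p_{\mathrm{d}} = p_{\mathrm{i}}$, $p_{\mathrm{s}} = 0$)

[Plot: word error rate (log scale, $10^{-4}$ to $10^{0}$) versus $p_{\mathrm{i}} = p_{\mathrm{d}}$ ($\times 10^{-3}$, from 5 to 40); curves: m=3, m=4, m=4 (heuristic), m=5 (heuristic)]

¹ This result is not included in the proceedings.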

SLIDE 24

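Word error rate¹ ($p_{\mathrm{d}} = p_{\mathrm{i}} = 10^{-3}$)

[Plot: word error rate (log scale, $10^{-3}$ to $10^{0}$) versus $p_{\mathrm{s}}$ ($\times 10^{-2}$, from 1 to 6); curves: m=3, m=4, m=4 (heuristic), m=5 (heuristic)]

¹ This result is not included in the proceedings.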

SLIDE 25

Conclusion

Conclusion

  • symbol-wise MAP estimation for $m$-trace IDS channel
      - insertion/deletion errors: expressed by a set of $m$ drift vectors
      - derived factor graph of $p(\mathbf{Z}, \mathbf{x}, \mathbf{D})$
      - sum-product algorithm on the factor graph
  • decoded word error rate ($p_{\mathrm{i}} = 3 \times 10^{-3}$, $p_{\mathrm{d}} = 5 p_{\mathrm{i}}$, $p_{\mathrm{s}} = 10^{-3}$):
      - MAP estimation: $5.7 \times 10^{-2}$ ($m = 3$), $3.5 \times 10^{-3}$ ($m = 4$)
      - heuristic method: $6.2 \times 10^{-3}$ ($m = 4$), $6.6 \times 10^{-4}$ ($m = 5$)

Future work

  • estimation for multiple-trace IDS channels in which
      - the error probabilities ($p_{\mathrm{i}}, p_{\mathrm{d}}, p_{\mathrm{s}}$) depend on the input symbol $x_i$
      - the input word has constraints (e.g., run-length and GC-balance)
