Symbolwise MAP Estimation for Multiple-Trace Insertion/Deletion/Substitution Channels
Ryo Sakogawa and Haruhiko Kaneko, Tokyo Institute of Technology (TokyoTech), ISIT2020


  1. Symbolwise MAP Estimation for Multiple-Trace Insertion/Deletion/Substitution Channels
     - Ryo Sakogawa and Haruhiko Kaneko, Tokyo Institute of Technology
     - ISIT2020

  2. Outline
     - Section 1: Background
     - Section 2: Model of multiple-trace IDS channel
     - Section 3: Symbol-wise MAP estimation for multiple-trace IDS channel
     - Section 4: Simulation results
     - Section 5: Conclusion

  4. Background and objective
     - Background
       - symbolwise MAP estimation for multiple-trace channel
       - application: DNA archival storage
         - high durability due to the biochemical properties of DNA
         - high capacity (e.g., $10^{15}$ to $10^{20}$ bytes per gram)
         - prone to synchronization errors
         - multiple-trace readout
     - Objective
       - symbolwise MAP estimation using $m$ ($\geq 2$) traces
       - channel: insertion/deletion/substitution (IDS) channel

  5. Related works: DNA storage
     - major sequencing platforms [1]: Illumina, Sanger, Nanopore
     - insertion/deletion error probabilities in DNA storage [3]:
       - Illumina: around $10^{-3}$
       - Nanopore: around $10^{-2}$
     - channel model and information-theoretic bound for nanopore sequencer [4,5]

  6. Related works: DNA storage model
     - DNA storage model: [figure]
     - coverage $m$ for reliable reconstruction (in DNA storage): several tens to several hundreds [1]

  7. Related works: IDS error correction coding
     - examples of IDS error correction codes (single-trace decoding)
       - single IDS error correction code [11]
       - LDPC code + watermark [12]
       - LDPC code + marker [13]
       - spatially-coupled code [14]
       - polar code: for deletion channel [15], for IDS channel [16]
     - coding schemes for DNA storage (multiple-trace decoding)
       - majority voting [6]
       - Reed-Solomon code [7,8]
       - DNA fountain architecture [9]: based on Luby transform code
       - soft-decision decoding [17]

  8. Related works: multiple-trace channel
     - minimum number of traces for perfect reconstruction
       - various types of channels including IDS channel [18]
       - probabilistic IDS channel [19]
     - symbolwise MAP estimation using $m$ traces:
       - calculate the posterior probability from a limited number of traces
       - the calculated probability is used as soft input to an outer error correcting code (e.g., LDPC code, polar code)
       - MAP estimation for deletion channel [21,22]
       - MAP estimation for IDS channel: this work
     - summary (deletion channel vs. IDS channel)
       - perfect reconstruction: [18,19]
       - MAP estimation: deletion channel [21,22], IDS channel (this work)

  10. Channel model: outline
     - error probabilities: $p_{\mathrm{i}}$ (insertion), $p_{\mathrm{d}}$ (deletion), $p_{\mathrm{s}}$ (substitution)
     - input: $\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathbb{Z}_q^n$
     - output:
       $$\mathbf{Z} = \begin{pmatrix} \mathbf{z}^1 \\ \mathbf{z}^2 \\ \vdots \\ \mathbf{z}^m \end{pmatrix} = \begin{pmatrix} (z^1_1, z^1_2, \dots, z^1_{n_1}) \\ (z^2_1, z^2_2, \dots, z^2_{n_2}) \\ \vdots \\ (z^m_1, z^m_2, \dots, z^m_{n_m}) \end{pmatrix}$$
     - $\mathbf{z}^k = (z^k_1, z^k_2, \dots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$: $k$th trace with length $n_k$
     - at most one insertion per symbol (as in [13])

  11. Channel model: drift vector
     - maximum drift value between input and output symbols: $D$
     - set of drift values: $\mathcal{D} = \{-D, \dots, -1, 0, 1, \dots, D\}$
     - drift vector of the $k$th output: $\mathbf{d}^k = (d^k_1, d^k_2, \dots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$
     - determined according to a Markov process (with $d^k_1 = 0$):
       $$p(d^k_{i+1} \mid d^k_i) = \begin{cases}
       p_{\mathrm{i}} & (d^k_{i+1} = d^k_i + 1,\ d^k_i < D) \\
       p_{\mathrm{d}} & (d^k_{i+1} = d^k_i - 1,\ d^k_i > -D) \\
       1 - p_{\mathrm{i}} - p_{\mathrm{d}} & (d^k_{i+1} = d^k_i,\ -D < d^k_i < D) \\
       1 - p_{\mathrm{i}} & (d^k_{i+1} = d^k_i,\ d^k_i = -D) \\
       1 - p_{\mathrm{d}} & (d^k_{i+1} = d^k_i,\ d^k_i = D) \\
       0 & (\text{otherwise})
       \end{cases}$$
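The Markov process on this slide can be sampled directly. The following is a minimal sketch, not the authors' simulator; the function name `sample_drift` and its parameter names are made up for illustration, and a single uniform draw is split into the insertion, deletion, and "stay" events.

```python
import random

def sample_drift(n, p_ins, p_del, D, rng):
    """Sample a drift vector (d_1, ..., d_{n+1}) with d_1 = 0.

    Hypothetical helper: p_ins, p_del are the insertion/deletion
    probabilities, D is the maximum drift, rng is a random.Random.
    """
    d = [0]
    for _ in range(n):
        cur = d[-1]
        u = rng.random()  # uniform in [0, 1)
        if u < p_ins and cur < D:
            d.append(cur + 1)            # insertion: drift increases
        elif p_ins <= u < p_ins + p_del and cur > -D:
            d.append(cur - 1)            # deletion: drift decreases
        else:
            d.append(cur)                # no synchronization error (or blocked at +-D)
    return d

if __name__ == "__main__":
    print(sample_drift(20, 0.05, 0.05, 4, random.Random(0)))
```

At the boundaries the blocked event falls through to the "stay" branch, which reproduces the $1 - p_{\mathrm{i}}$ and $1 - p_{\mathrm{d}}$ cases of the transition probability.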

  12. Channel model: definition of multiple-trace IDS channel
     - channel input: $\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathbb{Z}_q^n$
     - determine the drift vector according to the Markov process: $\mathbf{d}^k = (d^k_1, d^k_2, \dots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$ ($k \in [m]$)
     - drifted vector: $\mathbf{y}^k = (y^k_1, y^k_2, \dots, y^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, where
       $$y^k_j = x_i \quad (j \in \{ j' \mid i + d^k_i \leq j' \leq i + d^k_{i+1} \})$$
     - channel output ($k$th trace): $\mathbf{z}^k = (z^k_1, z^k_2, \dots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, where
       $$p(z^k_i \mid y^k_i) = \begin{cases} 1 - p_{\mathrm{s}} & (z^k_i = y^k_i) \\ p_{\mathrm{s}} / (q - 1) & (z^k_i \neq y^k_i) \end{cases}$$
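As a rough illustration of this definition, the sketch below turns an input word and a given drift vector into one trace: symbol $x_i$ is copied $d_{i+1} - d_i + 1$ times (0, 1, or 2 copies), and each copy is then substituted with probability $p_{\mathrm{s}}$. The name `apply_ids` and the 0-based list layout are assumptions for this example only.

```python
import random

def apply_ids(x, d, p_sub, q, rng):
    """Generate one trace from input x and drift vector d (len(d) == len(x) + 1).

    Hypothetical helper: symbols are integers in range(q), rng is a random.Random.
    """
    z = []
    for i, sym in enumerate(x):
        # d[i] and d[i + 1] play the roles of d_i and d_{i+1} for the (i+1)th symbol
        copies = d[i + 1] - d[i] + 1  # 0: deletion, 1: transmission, 2: insertion
        for _ in range(copies):
            if rng.random() < p_sub:
                # substitution: uniformly pick one of the q - 1 other symbols
                z.append((sym + rng.randrange(1, q)) % q)
            else:
                z.append(sym)
    return z

if __name__ == "__main__":
    x = [0, 1, 2, 3]
    d = [0, 0, -1, -1, 0]  # delete the 2nd symbol, insert after the 4th
    print(apply_ids(x, d, 0.0, 4, random.Random(0)))  # [0, 2, 3, 3]
```

The resulting trace has length $n + d_{n+1}$, since every insertion adds one symbol and every deletion removes one.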

  14. Notations
     - array of drift values:
       $$\mathbf{D} = \begin{pmatrix} \mathbf{d}^1 \\ \mathbf{d}^2 \\ \vdots \\ \mathbf{d}^m \end{pmatrix} = \begin{pmatrix} d^1_1 & d^1_2 & \cdots & d^1_{n+1} \\ d^2_1 & d^2_2 & \cdots & d^2_{n+1} \\ \vdots & \vdots & & \vdots \\ d^m_1 & d^m_2 & \cdots & d^m_{n+1} \end{pmatrix} = \begin{bmatrix} \mathbf{d}_1 & \mathbf{d}_2 & \cdots & \mathbf{d}_{n+1} \end{bmatrix} \in \mathcal{D}^{m \times (n+1)}$$
       - $k$th row $\mathbf{d}^k$: drift vector of the $k$th trace $\mathbf{z}^k$
       - $i$th column $\mathbf{d}_i$: drift values corresponding to the $i$th input symbol $x_i$
     - $i$th segment of $\mathbf{Z}$ (for given $\mathbf{D}$):
       $$\mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} = \begin{pmatrix} (z^1_{i+d^1_i}, \dots, z^1_{i+d^1_{i+1}}) \\ (z^2_{i+d^2_i}, \dots, z^2_{i+d^2_{i+1}}) \\ \vdots \\ (z^m_{i+d^m_i}, \dots, z^m_{i+d^m_{i+1}}) \end{pmatrix}$$

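  15. Derivation of factor graph (1/2)
     - derive $p(x_i \mid \mathbf{Z})$ using the factor graph of the joint probability $p(\mathbf{Z}, \mathbf{x}, \mathbf{D})$:
       $$\begin{aligned} p(\mathbf{Z}, \mathbf{x}, \mathbf{D}) &= p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{x}, \mathbf{D}) = p(\mathbf{Z} \mid \mathbf{x}, \mathbf{D})\, p(\mathbf{D})\, p(\mathbf{x}) \\ &= p(\mathbf{d}_1) \prod_{i=1}^{n} p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i), \end{aligned}$$
     - where
       $$p(\mathbf{d}_1) = \prod_{k=1}^{m} p(d^k_1) = \begin{cases} 1 & (\mathbf{d}_1 = (0, \dots, 0)) \\ 0 & (\text{otherwise}) \end{cases}$$
       $$p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right) = \prod_{k=1}^{m} p\left( (\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right)$$
       $$p(\mathbf{d}_{i+1} \mid \mathbf{d}_i) = \prod_{k=1}^{m} p(d^k_{i+1} \mid d^k_i)$$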
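For reference, the target posterior is the marginal of this joint probability over all other input symbols and all drift arrays. The symbol $\mathbf{x}_{\setminus i}$ (all input symbols except $x_i$) is notation introduced here for convenience and does not appear in the slides.

```latex
p(x_i \mid \mathbf{Z})
  = \frac{1}{p(\mathbf{Z})}
    \sum_{\mathbf{x}_{\setminus i}} \sum_{\mathbf{D}}
    p(\mathbf{Z}, \mathbf{x}, \mathbf{D})
```

Because the joint probability factorizes into terms that each involve only $x_i$, $\mathbf{d}_i$, and $\mathbf{d}_{i+1}$, these sums can be distributed over the product, which is what message passing on the factor graph exploits.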

  16. Derivation of factor graph (2/2)
     - likelihood for the $k$th trace = single-trace channel ($m = 1$):
       $$p\left( (\mathbf{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right) = \begin{cases} 1 & (d^k_{i+1} = d^k_i - 1) \\ f\left(x_i, z^k_{i+d^k_i}\right) & (d^k_{i+1} = d^k_i) \\ f\left(x_i, z^k_{i+d^k_i}\right) f\left(x_i, z^k_{i+1+d^k_i}\right) & (d^k_{i+1} = d^k_i + 1) \end{cases}$$
     - substitution error probability:
       $$f(x, z) = \begin{cases} 1 - p_{\mathrm{s}} & (x = z) \\ p_{\mathrm{s}} / (q - 1) & (x \neq z) \end{cases}$$
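The three cases of the per-trace likelihood translate into a few lines of code. This is a sketch under assumptions: the names `f` and `segment_likelihood` are illustrative, positions follow the 1-based indexing of the slides, and returning 0 for a segment that would fall outside the trace is a boundary convention chosen here, not something stated in the slides.

```python
def f(x, z, p_sub, q):
    """Substitution factor: 1 - p_s if the symbols match, p_s / (q - 1) otherwise."""
    return 1.0 - p_sub if x == z else p_sub / (q - 1)

def segment_likelihood(x_i, z, i, d_cur, d_next, p_sub, q):
    """Likelihood of the ith segment of one trace z given x_i and drifts (d_cur, d_next).

    i is 1-based as in the slides; z is an ordinary 0-based Python list.
    """
    step = d_next - d_cur
    if step == -1:
        return 1.0                      # deletion: empty segment
    if step not in (0, 1):
        return 0.0                      # not a valid drift transition
    start, end = i + d_cur, i + d_next  # 1-based segment boundaries
    if start < 1 or end > len(z):
        return 0.0                      # assumed convention: segment outside the trace
    like = 1.0
    for j in range(start, end + 1):
        like *= f(x_i, z[j - 1], p_sub, q)
    return like

if __name__ == "__main__":
    z = [0, 1, 1]
    print(segment_likelihood(1, z, 2, 0, 0, 0.1, 4))  # transmission of x_2 = 1
    print(segment_likelihood(1, z, 2, 0, 1, 0.1, 4))  # insertion after x_2
```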

  17. Factor graph
     - joint probability:
       $$p(\mathbf{Z}, \mathbf{x}, \mathbf{D}) = p(\mathbf{d}_1) \prod_{i=1}^{n} p\left( \mathbf{Z}_{i+\mathbf{d}_i}^{i+\mathbf{d}_{i+1}} \,\middle|\, x_i, \mathbf{d}_i, \mathbf{d}_{i+1} \right) p(\mathbf{d}_{i+1} \mid \mathbf{d}_i)\, p(x_i)$$
     - factor graph: [figure]
     - calculation of the posterior probability $p(x_i \mid \mathbf{Z})$: perform the sum-product algorithm on the factor graph
     - MAP estimation:
       $$\tilde{x}_i = \arg\max_{x_i \in \mathbb{Z}_q} p(x_i \mid \mathbf{Z})$$
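Since the factor graph is a chain in the drift columns, the sum-product algorithm reduces to a forward-backward recursion over joint drift states. The sketch below is one possible brute-force realization, not the authors' implementation. It assumes a uniform prior $p(x_i) = 1/q$, assumes $D \geq 1$, and adds a terminal condition that each trace's final drift equals its length minus $n$; the function name and all parameter names are hypothetical.

```python
import itertools

def symbolwise_posteriors(Z, n, q, p_ins, p_del, p_sub, D):
    """Return a list of n rows; row i-1 holds p(x_i = a | Z) for a in range(q)."""
    m = len(Z)

    def f(x, z):
        return 1.0 - p_sub if x == z else p_sub / (q - 1)

    def trans(a, b):
        # single-trace drift transition p(b | a), for b in {a - 1, a, a + 1}
        if b == a + 1:
            return p_ins if a < D else 0.0
        if b == a - 1:
            return p_del if a > -D else 0.0
        return 1.0 - (p_ins if a < D else 0.0) - (p_del if a > -D else 0.0)

    def segment(x, z, i, a, b):
        # single-trace segment likelihood; i is 1-based, z is a 0-based list
        if b == a - 1:
            return 1.0
        if i + a < 1 or i + b > len(z):
            return 0.0  # assumed convention: segment outside the trace
        like = 1.0
        for j in range(i + a, i + b + 1):
            like *= f(x, z[j - 1])
        return like

    def edge(x, i, s, t):
        # product over traces of transition probability times segment likelihood
        w = 1.0
        for k in range(m):
            w *= trans(s[k], t[k]) * segment(x, Z[k], i, s[k], t[k])
        return w

    def successors(s):
        return itertools.product(
            *[[v for v in (a - 1, a, a + 1) if -D <= v <= D] for a in s])

    def summed(i, s, t):
        # edge weight with x_i summed out under the assumed uniform prior
        return sum(edge(x, i, s, t) for x in range(q)) / q

    # forward pass: alpha[i - 1] is a message over the joint drift column d_i
    alpha = [{(0,) * m: 1.0}]
    for i in range(1, n + 1):
        nxt = {}
        for s, a_s in alpha[-1].items():
            for t in successors(s):
                w = summed(i, s, t)
                if w > 0.0:
                    nxt[t] = nxt.get(t, 0.0) + a_s * w
        alpha.append(nxt)

    # backward pass, anchored at the final drifts implied by the trace lengths
    beta = [None] * (n + 1)
    beta[n] = {tuple(len(z) - n for z in Z): 1.0}
    for i in range(n, 0, -1):
        beta[i - 1] = {}
        for s in alpha[i - 1]:
            tot = sum(beta[i].get(t, 0.0) * summed(i, s, t) for t in successors(s))
            if tot > 0.0:
                beta[i - 1][s] = tot

    # combine messages into normalized symbol posteriors
    post = []
    for i in range(1, n + 1):
        scores = [0.0] * q
        for s, a_s in alpha[i - 1].items():
            for t in successors(s):
                b_t = beta[i].get(t, 0.0)
                if b_t > 0.0:
                    for x in range(q):
                        scores[x] += a_s * edge(x, i, s, t) * b_t
        total = sum(scores)
        if total == 0.0:
            raise ValueError("traces are not reachable with maximum drift D")
        post.append([v / total for v in scores])
    return post

if __name__ == "__main__":
    Z = [[0, 1, 2, 3], [0, 2, 3], [0, 1, 2, 3]]
    post = symbolwise_posteriors(Z, n=4, q=4, p_ins=0.05, p_del=0.05, p_sub=0.05, D=1)
    print([max(range(4), key=row.__getitem__) for row in post])
```

The MAP estimate of each symbol is the argmax of its row. The sketch enumerates up to $(2D+1)^m$ joint states per position, so it is only meant for small $m$ and $D$.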

  18. Simple heuristic estimation
     - computational complexity of the MAP estimation: $O(D^{2m})$, impractical for a large number of traces
     - simple heuristic method based on the MAP estimation for $m = 3$
     - expressed by a ternary tree:
       - leaf nodes: $m'$ traces $(\mathbf{z}^0, \mathbf{z}^1, \dots)$
       - internal/root nodes: MAP estimation for $m = 3$ traces
       - root node: outputs the estimate $\tilde{\mathbf{x}}$
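The tree structure can be sketched as repeated grouping in threes. In the sketch below the three-trace estimator is passed in as a function; `positionwise_majority` is only a placeholder so that the example runs (it is not the MAP estimator and assumes equal-length sequences). How groups with fewer than three members are handled is not specified on the slide; passing a lone sequence up unchanged and handing a pair to the estimator are assumptions made here.

```python
from collections import Counter

def ternary_tree_estimate(traces, estimate_group):
    """Combine traces bottom-up in groups of (up to) three until one estimate remains.

    estimate_group: function taking a list of 2 or 3 sequences and returning one
    sequence (in the slides, this role is played by the MAP estimation for m = 3).
    """
    level = list(traces)
    if not level:
        raise ValueError("at least one trace is required")
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level), 3):
            group = level[j:j + 3]
            # assumed handling of a leftover single sequence: pass it up unchanged
            nxt.append(estimate_group(group) if len(group) > 1 else group[0])
        level = nxt
    return level[0]

def positionwise_majority(group):
    """Placeholder estimator (NOT the MAP estimator): majority vote per position."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*group)]

if __name__ == "__main__":
    traces = [
        [0, 1, 2, 3], [0, 1, 2, 3], [1, 1, 2, 3],
        [0, 1, 2, 3], [0, 0, 2, 3], [0, 1, 2, 3],
        [0, 1, 2, 3], [0, 1, 2, 0], [0, 1, 2, 3],
    ]
    print(ternary_tree_estimate(traces, positionwise_majority))
```

To see why the tree helps, count joint drift states: with $D = 4$ each trace has $2D + 1 = 9$ drift values, so a three-trace node works over $9^3 = 729$ joint states, while a direct eleven-trace estimate would need $9^{11} \approx 3.1 \times 10^{10}$.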

  20. Simulation parameters
     - block length: $n = 152$
     - number of traces: $m \in \{3, 4, 11\}$
     - maximum drift value: $D = 4$
     - evaluated error rates:
       - word error rate
       - error rate by Levenshtein distance:
         $$\frac{\text{summation of Levenshtein distance between } \mathbf{x} \text{ and } \tilde{\mathbf{x}}}{\text{total number of estimated symbols}}$$
         ($\mathbf{x}$: original word, $\tilde{\mathbf{x}}$: estimated word)
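The two error rates can be computed as follows. This is a minimal sketch: the function names are illustrative, the Levenshtein distance uses the standard dynamic-programming recurrence, and taking the denominator as the total length of the estimated words is an interpretation of "total number of estimated symbols".

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def levenshtein_error_rate(pairs):
    """pairs: iterable of (original word, estimated word)."""
    pairs = list(pairs)
    total_distance = sum(levenshtein(x, x_hat) for x, x_hat in pairs)
    total_symbols = sum(len(x_hat) for _, x_hat in pairs)
    return total_distance / total_symbols

def word_error_rate(pairs):
    """Fraction of words whose estimate differs from the original."""
    pairs = list(pairs)
    return sum(list(x) != list(x_hat) for x, x_hat in pairs) / len(pairs)

if __name__ == "__main__":
    pairs = [([0, 1, 2, 3], [0, 1, 2, 3]), ([0, 1, 2, 3], [0, 1, 3, 3])]
    print(levenshtein_error_rate(pairs), word_error_rate(pairs))  # 0.125 0.5
```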
