
Symbolwise MAP Estimation for Multiple-Trace Insertion/Deletion/Substitution Channels
Ryo Sakogawa and Haruhiko Kaneko
Tokyo Institute of Technology
ISIT2020
R. Sakogawa, H. Kaneko (TokyoTech), MAP Estimation for Multiple-Trace IDS, ISIT2020

Outline
1. Background
2. Model of multiple-trace IDS channel
3. Symbolwise MAP estimation for multiple-trace IDS channel
4. Simulation results
5. Conclusion

Background and objective
- Background
  - symbolwise MAP estimation for multiple-trace channel
  - application: DNA archival storage
    - high durability due to the biochemical properties of DNA
    - high capacity (e.g., $10^{15}$ to $10^{20}$ bytes per gram)
    - prone to synchronization errors
    - multiple-trace readout
- Objective
  - symbolwise MAP estimation using $m$ ($\ge 2$) traces
  - channel: insertion/deletion/substitution (IDS) channel

Related works: DNA storage
- major sequencing platforms [1]: Illumina, Sanger, Nanopore
- insertion/deletion error probabilities in DNA storage [3]:
  - Illumina: around $10^{-3}$
  - Nanopore: around $10^{-2}$
- channel model and information-theoretic bound for nanopore sequencer [4,5]

Related works: DNA storage model
- DNA storage model: [figure omitted]
- coverage $m$ for reliable reconstruction (in DNA storage): several tens to several hundreds [1]

Related works: IDS error correction coding
- examples of IDS error correction codes (single-trace decoding)
  - single IDS error correction code [11]
  - LDPC code + watermark [12]
  - LDPC code + marker [13]
  - spatially-coupled code [14]
  - polar code: for deletion channel [15], for IDS channel [16]
- coding schemes for DNA storage (multiple-trace decoding)
  - majority voting [6]
  - Reed-Solomon code [7,8]
  - DNA fountain architecture [9]: based on Luby transform code
  - soft-decision decoding [17]

Related works: multiple-trace channel
- minimum number of traces for perfect reconstruction
  - various types of channels including IDS channel [18]
  - probabilistic IDS channel [19]
- symbolwise MAP estimation using $m$ traces:
  - calculate the posterior probability from a limited number of traces
  - the calculated probability is used as soft input to an outer error correcting code (e.g., LDPC code, polar code)
  - MAP estimation for deletion channel [21,22]
  - for IDS channel: this work
- summary:
  - perfect reconstruction: [18,19]
  - MAP estimation: deletion channel [21,22]; IDS channel (this work)

Channel model: outline
- error probabilities: $p_i$ (insertion), $p_d$ (deletion), $p_s$ (substitution)
- input: $\boldsymbol{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
- output:
  \[
  \boldsymbol{Z} =
  \begin{pmatrix} \boldsymbol{z}^1 \\ \boldsymbol{z}^2 \\ \vdots \\ \boldsymbol{z}^m \end{pmatrix}
  =
  \begin{pmatrix}
  (z^1_1, z^1_2, \ldots, z^1_{n_1}) \\
  (z^2_1, z^2_2, \ldots, z^2_{n_2}) \\
  \vdots \\
  (z^m_1, z^m_2, \ldots, z^m_{n_m})
  \end{pmatrix}
  \]
  - $\boldsymbol{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$: $k$th trace with length $n_k$
- at most one insertion per symbol (as in [13])

Channel model: drift vector
- maximum drift value between input and output symbols: $D$
- set of drift values: $\mathcal{D} = \{-D, \ldots, -1, 0, 1, \ldots, D\}$
- drift vector of $k$th output: $\boldsymbol{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$
- determined according to a Markov process (with $d^k_1 = 0$):
  \[
  p(d^k_{i+1} \mid d^k_i) =
  \begin{cases}
  p_i & (d^k_{i+1} = d^k_i + 1,\ d^k_i < D) \\
  p_d & (d^k_{i+1} = d^k_i - 1,\ d^k_i > -D) \\
  1 - p_i - p_d & (d^k_{i+1} = d^k_i,\ -D < d^k_i < D) \\
  1 - p_i & (d^k_{i+1} = d^k_i,\ d^k_i = -D) \\
  1 - p_d & (d^k_{i+1} = d^k_i,\ d^k_i = D) \\
  0 & (\text{otherwise})
  \end{cases}
  \]
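The drift process can be sketched in a few lines of Python. This is only an illustration of the transition rule on this slide, not the authors' implementation; the function names `drift_transition` and `sample_drift_vector` are made up for this sketch.

```python
import random


def drift_transition(d, D, p_i, p_d):
    """Return {next_drift: probability} for current drift d (hypothetical helper)."""
    probs = {}
    stay = 1.0
    if d < D:            # an insertion is allowed only below the upper bound
        probs[d + 1] = p_i
        stay -= p_i
    if d > -D:           # a deletion is allowed only above the lower bound
        probs[d - 1] = p_d
        stay -= p_d
    probs[d] = stay      # remaining mass: the drift does not change
    return probs


def sample_drift_vector(n, D, p_i, p_d, rng):
    """Sample (d_1, ..., d_{n+1}) with d_1 = 0 from the Markov process."""
    d = [0]
    for _ in range(n):
        probs = drift_transition(d[-1], D, p_i, p_d)
        d.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return d


if __name__ == "__main__":
    print(sample_drift_vector(10, 4, 0.05, 0.05, random.Random(0)))
```

At the boundaries the probability of the forbidden move is folded into "stay", which reproduces the $1 - p_i$ and $1 - p_d$ cases.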

Channel model: definition of multiple-trace IDS channel
- channel input: $\boldsymbol{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}_q^n$
- determine drift vector according to the Markov process:
  $\boldsymbol{d}^k = (d^k_1, d^k_2, \ldots, d^k_n, d^k_{n+1}) \in \mathcal{D}^{n+1}$ $(k \in [m])$
- drifted vector: $\boldsymbol{y}^k = (y^k_1, y^k_2, \ldots, y^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, where
  \[
  y^k_j = x_i \quad \left( j \in \{ j' \mid i + d^k_i \le j' \le i + d^k_{i+1} \} \right)
  \]
- channel output ($k$th trace): $\boldsymbol{z}^k = (z^k_1, z^k_2, \ldots, z^k_{n_k}) \in \mathbb{Z}_q^{n_k}$, where
  \[
  p(z^k_i \mid y^k_i) =
  \begin{cases}
  1 - p_s & (z^k_i = y^k_i) \\
  p_s / (q - 1) & (z^k_i \ne y^k_i)
  \end{cases}
  \]
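A minimal simulation sketch of this channel, assuming symbols are integers in `range(q)` with $q \ge 2$. The names `simulate_trace` and `simulate_traces` are hypothetical. The drift is tracked incrementally: a deletion emits no copy of $x_i$, an unchanged drift emits one copy, an insertion emits two, and every emitted copy passes through the substitution channel. By construction the final drift equals $n_k - n$.

```python
import random


def simulate_trace(x, q, D, p_i, p_d, p_s, rng):
    """Pass x through one IDS channel; return (trace, final_drift)."""
    d, z = 0, []
    for x_i in x:
        u = rng.random()
        if u < p_i and d < D:                   # insertion: x_i is emitted twice
            copies, d = 2, d + 1
        elif p_i <= u < p_i + p_d and d > -D:   # deletion: x_i is not emitted
            copies, d = 0, d - 1
        else:                                   # drift unchanged: emitted once
            copies = 1
        for _ in range(copies):
            if rng.random() < p_s:              # substitution to a different symbol
                z.append(rng.choice([s for s in range(q) if s != x_i]))
            else:
                z.append(x_i)
    return z, d


def simulate_traces(x, m, q, D, p_i, p_d, p_s, rng):
    """Generate m independent traces of the same input x."""
    return [simulate_trace(x, q, D, p_i, p_d, p_s, rng)[0] for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randrange(4) for _ in range(12)]
    print(x)
    for z in simulate_traces(x, 3, 4, 4, 0.05, 0.05, 0.02, rng):
        print(z)
```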

Notations
- array of drift values:
  \[
  \boldsymbol{D} =
  \begin{pmatrix} \boldsymbol{d}^1 \\ \boldsymbol{d}^2 \\ \vdots \\ \boldsymbol{d}^m \end{pmatrix}
  =
  \begin{pmatrix}
  d^1_1 & d^1_2 & \cdots & d^1_{n+1} \\
  d^2_1 & d^2_2 & \cdots & d^2_{n+1} \\
  \vdots & \vdots & & \vdots \\
  d^m_1 & d^m_2 & \cdots & d^m_{n+1}
  \end{pmatrix}
  =
  \begin{pmatrix} \boldsymbol{d}_1 & \boldsymbol{d}_2 & \cdots & \boldsymbol{d}_{n+1} \end{pmatrix}
  \in \mathcal{D}^{m \times (n+1)}
  \]
  - $k$th row $\boldsymbol{d}^k$: drift vector of $k$th trace $\boldsymbol{z}^k$
  - $i$th column $\boldsymbol{d}_i$: drift values corresponding to $i$th input symbol $x_i$
- $i$th segment of $\boldsymbol{Z}$ (for given $\boldsymbol{D}$):
  \[
  \boldsymbol{Z}_{i+\boldsymbol{d}_i}^{i+\boldsymbol{d}_{i+1}} =
  \begin{pmatrix}
  (z^1_{i+d^1_i}, \ldots, z^1_{i+d^1_{i+1}}) \\
  (z^2_{i+d^2_i}, \ldots, z^2_{i+d^2_{i+1}}) \\
  \vdots \\
  (z^m_{i+d^m_i}, \ldots, z^m_{i+d^m_{i+1}})
  \end{pmatrix}
  \]

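Derivation of factor graph (1/2)
- derive $p(x_i \mid \boldsymbol{Z})$ using the factor graph of the joint probability $p(\boldsymbol{Z}, \boldsymbol{x}, \boldsymbol{D})$:
  \[
  \begin{aligned}
  p(\boldsymbol{Z}, \boldsymbol{x}, \boldsymbol{D})
  &= p(\boldsymbol{Z} \mid \boldsymbol{x}, \boldsymbol{D})\, p(\boldsymbol{x}, \boldsymbol{D})
   = p(\boldsymbol{Z} \mid \boldsymbol{x}, \boldsymbol{D})\, p(\boldsymbol{D})\, p(\boldsymbol{x}) \\
  &= p(\boldsymbol{d}_1) \prod_{i=1}^{n}
     p\!\left( \boldsymbol{Z}_{i+\boldsymbol{d}_i}^{i+\boldsymbol{d}_{i+1}} \,\middle|\, x_i, \boldsymbol{d}_i, \boldsymbol{d}_{i+1} \right)
     p(\boldsymbol{d}_{i+1} \mid \boldsymbol{d}_i)\, p(x_i),
  \end{aligned}
  \]
  where
  \[
  p(\boldsymbol{d}_1) = \prod_{k=1}^{m} p(d^k_1) =
  \begin{cases}
  1 & (\boldsymbol{d}_1 = (0, \ldots, 0)) \\
  0 & (\text{otherwise})
  \end{cases}
  \]
  \[
  p\!\left( \boldsymbol{Z}_{i+\boldsymbol{d}_i}^{i+\boldsymbol{d}_{i+1}} \,\middle|\, x_i, \boldsymbol{d}_i, \boldsymbol{d}_{i+1} \right)
  = \prod_{k=1}^{m} p\!\left( (\boldsymbol{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right)
  \]
  \[
  p(\boldsymbol{d}_{i+1} \mid \boldsymbol{d}_i) = \prod_{k=1}^{m} p(d^k_{i+1} \mid d^k_i)
  \]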

Derivation of factor graph (2/2)
- likelihood for $k$th trace = single-trace channel ($m = 1$):
  \[
  p\!\left( (\boldsymbol{z}^k)_{i+d^k_i}^{i+d^k_{i+1}} \,\middle|\, x_i, d^k_i, d^k_{i+1} \right) =
  \begin{cases}
  1 & (d^k_{i+1} = d^k_i - 1) \\
  f\!\left(x_i, z^k_{i+d^k_i}\right) & (d^k_{i+1} = d^k_i) \\
  f\!\left(x_i, z^k_{i+d^k_i}\right) f\!\left(x_i, z^k_{i+1+d^k_i}\right) & (d^k_{i+1} = d^k_i + 1)
  \end{cases}
  \]
- substitution error probability:
  \[
  f(x, z) =
  \begin{cases}
  1 - p_s & (x = z) \\
  p_s / (q - 1) & (x \ne z)
  \end{cases}
  \]
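The per-trace segment likelihood translates directly into code. The sketch below uses hypothetical names (`f`, `segment_likelihood`), takes the input index `i` as 1-based as on the slides, and stores the trace in a 0-based Python list. Returning 0 when the segment would run past the end of the trace is an assumption of this sketch; the slides do not state how out-of-range positions are handled.

```python
def f(x, z, q, p_s):
    """Substitution likelihood f(x, z)."""
    return 1.0 - p_s if x == z else p_s / (q - 1)


def segment_likelihood(x_i, z, i, d_cur, d_next, q, p_s):
    """Likelihood of the i-th segment of one trace z, given x_i, d_i and d_{i+1}."""
    j = i + d_cur                      # 1-based position of the first segment symbol
    step = d_next - d_cur
    if step == -1:                     # deletion: empty segment
        return 1.0
    if step == 0:                      # one output symbol z_j
        if not 1 <= j <= len(z):
            return 0.0                 # assumption: out-of-range segments are impossible
        return f(x_i, z[j - 1], q, p_s)
    if step == 1:                      # insertion: two output symbols z_j, z_{j+1}
        if not (1 <= j and j + 1 <= len(z)):
            return 0.0
        return f(x_i, z[j - 1], q, p_s) * f(x_i, z[j], q, p_s)
    return 0.0                         # drift changes by more than one are not allowed


if __name__ == "__main__":
    z = [0, 1, 1, 2]
    print(segment_likelihood(1, z, 2, 0, 1, 4, 0.1))
```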

Factor graph
- joint probability:
  \[
  p(\boldsymbol{Z}, \boldsymbol{x}, \boldsymbol{D}) = p(\boldsymbol{d}_1) \prod_{i=1}^{n}
  p\!\left( \boldsymbol{Z}_{i+\boldsymbol{d}_i}^{i+\boldsymbol{d}_{i+1}} \,\middle|\, x_i, \boldsymbol{d}_i, \boldsymbol{d}_{i+1} \right)
  p(\boldsymbol{d}_{i+1} \mid \boldsymbol{d}_i)\, p(x_i)
  \]
- factor graph: [figure omitted]
- calculation of posterior probability $p(x_i \mid \boldsymbol{Z})$: perform the sum-product algorithm on the factor graph
- MAP estimation:
  \[
  \tilde{x}_i = \operatorname*{arg\,max}_{x_i \in \mathbb{Z}_q} p(x_i \mid \boldsymbol{Z})
  \]
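Because the factor graph is a chain in the drift columns, the sum-product algorithm reduces to a forward-backward recursion over joint drift states. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a uniform prior $p(x_i) = 1/q$, assumes the final drift of each trace is fixed to $n_k - n$, and does no message normalization, so it is suitable only for short blocks. The function name `symbolwise_map` is hypothetical.

```python
from itertools import product


def symbolwise_map(traces, n, q, D, p_i, p_d, p_s):
    """Forward-backward over joint drift states; returns (x_hat, posteriors)."""
    m = len(traces)

    def f(a, z):  # substitution likelihood f(x, z)
        return 1.0 - p_s if a == z else p_s / (q - 1)

    def g(k, i, a, d, e):
        # p(d_{i+1}^k = e | d_i^k = d) times the segment likelihood of trace k.
        z, j = traces[k], i + d  # j: 1-based index of the first segment symbol
        if e == d - 1 and d > -D:   # deletion: empty segment
            return p_d
        if e == d + 1 and d < D:    # insertion: two output symbols
            return p_i * f(a, z[j - 1]) * f(a, z[j]) if j + 1 <= len(z) else 0.0
        if e == d:                  # drift unchanged: one output symbol
            stay = 1.0 - (p_i if d < D else 0.0) - (p_d if d > -D else 0.0)
            return stay * f(a, z[j - 1]) if j <= len(z) else 0.0
        return 0.0

    def weight(i, a, ds, es):  # product of the per-trace factors
        w = 1.0
        for k in range(m):
            w *= g(k, i, a, ds[k], es[k])
        return w

    def successors(ds):  # joint next-drift states reachable from ds
        return product(*[[e for e in (d - 1, d, d + 1) if -D <= e <= D] for d in ds])

    prior = 1.0 / q  # assumption: uniform p(x_i)
    alpha = [dict() for _ in range(n + 2)]  # alpha[i] is indexed by d_i, i = 1..n+1
    beta = [dict() for _ in range(n + 2)]
    alpha[1][(0,) * m] = 1.0
    for i in range(1, n + 1):  # forward pass
        for ds, w in alpha[i].items():
            for es in successors(ds):
                t = prior * sum(weight(i, a, ds, es) for a in range(q))
                if t > 0.0:
                    alpha[i + 1][es] = alpha[i + 1].get(es, 0.0) + w * t
    beta[n + 1][tuple(len(z) - n for z in traces)] = 1.0  # assumed end condition
    for i in range(n, 0, -1):  # backward pass
        for ds in alpha[i]:
            b = 0.0
            for es in successors(ds):
                nb = beta[i + 1].get(es, 0.0)
                if nb > 0.0:
                    b += nb * prior * sum(weight(i, a, ds, es) for a in range(q))
            if b > 0.0:
                beta[i][ds] = b
    posteriors = []
    for i in range(1, n + 1):  # combine messages into p(x_i | Z)
        post = [0.0] * q
        for ds, w in alpha[i].items():
            for es in successors(ds):
                nb = beta[i + 1].get(es, 0.0)
                if nb > 0.0:
                    for a in range(q):
                        post[a] += w * prior * weight(i, a, ds, es) * nb
        s = sum(post)
        posteriors.append([v / s for v in post] if s > 0.0 else [prior] * q)
    x_hat = [max(range(q), key=p.__getitem__) for p in posteriors]
    return x_hat, posteriors
```

The joint state space has $(2D+1)^m$ elements, so this sketch is practical only for a few traces. The returned posteriors are the soft values that could be passed to an outer decoder.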

Simple heuristic estimation
- computational complexity for the MAP estimation: $O(D^{2m})$
  - impractical for a large number of traces
- simple heuristic method based on the MAP estimation for $m = 3$
- expressed by a ternary tree: [figure omitted]
  - leaf nodes: $m'$ traces $(\boldsymbol{z}^0, \boldsymbol{z}^1, \ldots)$
  - internal/root nodes: MAP estimation for $m = 3$ traces
  - root node: outputs estimation $\tilde{\boldsymbol{x}}$

Simulation parameters
- block length: $n = 152$
- number of traces: $m \in \{3, 4, 11\}$
- maximum drift value: $D = 4$
- evaluated error rates:
  - word error rate
  - error rate by Levenshtein distance:
    \[
    \frac{\text{summation of Levenshtein distance between } \boldsymbol{x} \text{ and } \tilde{\boldsymbol{x}}}{\text{total number of estimated symbols}}
    \]
    ($\boldsymbol{x}$: original word, $\tilde{\boldsymbol{x}}$: estimated word)
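Both error rates can be computed as in the sketch below. The function names are hypothetical, and the denominator is taken to be the total length of the estimated words, which is one reading of "total number of estimated symbols".

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def levenshtein_error_rate(pairs):
    """pairs: iterable of (original word x, estimated word x_tilde)."""
    pairs = list(pairs)
    total_distance = sum(levenshtein(x, x_hat) for x, x_hat in pairs)
    total_symbols = sum(len(x_hat) for _, x_hat in pairs)
    return total_distance / total_symbols


def word_error_rate(pairs):
    """Fraction of words whose estimate differs from the original."""
    pairs = list(pairs)
    return sum(list(x) != list(x_hat) for x, x_hat in pairs) / len(pairs)


if __name__ == "__main__":
    results = [([0, 1, 2, 3], [0, 1, 2, 3]), ([0, 1, 2, 3], [0, 1, 3, 3])]
    print(levenshtein_error_rate(results), word_error_rate(results))
```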
