Sample Complexity of Algorithm Configuration for Sequence Alignment
Travis Dick
Nina Balcan, Dan DeBlasio, Ellen Vitercik, Carl Kingsford, Tuomas Sandholm
Sequence alignment
Goal: Line up pairs of strings (DNA, RNA, protein, …)
Uncover …
Alignment:
GRTCP---KPDDLPFSTVVPLKTFYEPGEEITYSCKPGYVSRGGMRKFICPLTGLWPINTLKCTP
EVKCPFPSRPDN-GFVNYPAKPTLYYK-DKATFGCHDGY-SLDGPEEIECTKLGNWS-AMPSCKA

Sequence 1 = GRTCPKPDDLPFSTVVPLKTFYEPGEEITYSCKPGYVSRGGMRKFICPLTGLWPINTLKCTP
Sequence 2 = EVKCPFPSRPDNGFVNYPAKPTLYYKDKATFGCHDGYSLDGPEEIECTKLGNWSAMPSCKA
[Needleman and Wunsch '70; Gotoh '82]
Closest to ground truth, for example
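One way to make "closest to ground truth" concrete is to count how many of the reference alignment's matched positions a predicted alignment recovers. The sketch below is an illustration only: the function names and the choice of this particular score are assumptions, not taken from the slides.

```python
def aligned_pairs(row1, row2, gap="-"):
    """Return the set of (i, j) index pairs that an alignment matches up.

    row1 and row2 are the two gapped rows of a pairwise alignment;
    i and j index into the ungapped sequences.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            pairs.add((i, j))
        # Advance each sequence index only when a real character was consumed.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def similarity_to_ground_truth(predicted, reference):
    """Fraction of the reference's aligned pairs that the prediction recovers.

    This scoring rule is an assumed example, not the slides' definition.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 1.0  # nothing to recover
    return len(ref & aligned_pairs(*predicted)) / len(ref)
```

For example, `similarity_to_ground_truth(("ACG-T", "ACTGT"), ("AC-GT", "ACTGT"))` compares a predicted alignment against a reference that places the gap one column earlier.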
Kim and Kececioglu '07; Xu, Hutter, Hoos, Leyton-Brown '08; Dai, Khalil, Zhang, Dilkina, Song '17 …
Gupta and Roughgarden '16; Kleinberg, Leyton-Brown, Lucier '17; Weisz, György, Szepesvári '18 …
$(S_1, S_1'), \ldots, (S_N, S_N')$
$\left| \frac{1}{N} \sum_{i=1}^{N} u_\rho(S_i, S_i') - \mathbb{E}_{(S, S') \sim \mathcal{D}}\left[u_\rho(S, S')\right] \right| \le\ ?$
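For orientation, here is a standard fact about the easier case of a single parameter setting $\rho$ fixed before the sample is drawn. It assumes (this is not stated on the slide) that the utility $u_\rho$ takes values in $[0, 1]$ and that the $N$ pairs $(S_i, S_i')$ are drawn independently from $\mathcal{D}$. Hoeffding's inequality then gives, for any $\epsilon > 0$:

```latex
\Pr\left[\,\left| \frac{1}{N}\sum_{i=1}^{N} u_\rho(S_i, S_i')
  - \mathbb{E}_{(S,S')\sim\mathcal{D}}\bigl[u_\rho(S,S')\bigr] \right| \ge \epsilon \,\right]
  \le 2\exp\left(-2N\epsilon^{2}\right)
```

The question posed here is harder: the parameters are chosen after looking at the sample, so a bound of this kind has to hold simultaneously for every $\rho$, not just one fixed in advance.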
[Plot: similarity to ground truth as a function of the parameters $\rho_1$, $\rho_2$]
… such that:
[Figure labels: insertion/deletion (indel), match, mismatch, gap]
$\rho_1 \cdot f_1(A) + \cdots + \rho_d \cdot f_d(A)$
$f_1(A), \ldots, f_d(A)$: features of alignment $A$ (e.g., # matches, …)
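To illustrate an alignment algorithm driven by a weighted sum of alignment features, here is a minimal sketch of global alignment by dynamic programming in the style of Needleman and Wunsch. It assumes just three features (number of matches, mismatches, and indels) and linear gap costs; the function name `align` and the weight names are hypothetical, and affine gap penalties as in Gotoh's method are not handled.

```python
def align(s1, s2, w_match, w_mismatch, w_indel):
    """Global alignment maximizing
    w_match * #matches + w_mismatch * #mismatches + w_indel * #indels.

    Returns (row1, row2, features), features = (#matches, #mismatches, #indels).
    Illustrative sketch: three features and linear gap costs are assumptions.
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best objective value for aligning s1[:i] with s2[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * w_indel
    for j in range(1, m + 1):
        score[0][j] = j * w_indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = w_match if s1[i - 1] == s2[j - 1] else w_mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + w_indel,
                              score[i][j - 1] + w_indel)

    # Trace back one optimal alignment and count its features.
    row1, row2 = [], []
    matches = mismatches = indels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = s1[i - 1] == s2[j - 1]
            sub = w_match if same else w_mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row1.append(s1[i - 1])
                row2.append(s2[j - 1])
                matches += same
                mismatches += not same
                i -= 1
                j -= 1
                continue
        if j == 0 or (i > 0 and score[i][j] == score[i - 1][j] + w_indel):
            # Character of s1 aligned to a gap.
            row1.append(s1[i - 1])
            row2.append("-")
            i -= 1
        else:
            # Character of s2 aligned to a gap.
            row1.append("-")
            row2.append(s2[j - 1])
            j -= 1
        indels += 1
    features = (matches, mismatches, indels)
    return "".join(reversed(row1)), "".join(reversed(row2)), features
```

Calling `align` on the same pair of strings with different weights can return different alignments, which is why the choice of parameters matters.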
E-VKCPFPSRPDNGFVNYPAKPTLYYKDKATFGCHDGYSLDGP-EEIECTKLGNWSAMPSC-KA

GRTCP---KPDDLPFSTVVPLKTFYEPGEEITYSCKPGYVSRGGMRKFICPLTGLWPINTLKCTP
EVKCPFPSRPDN-GFVNYPAKPTLYYK-DKATFGCHDGY-SLDGPEEIECTKLGNWS-AMPSCKA

GRTCPKPDDLPFSTV-VPLKTFYEPGEEITYSCKPGYVSRGGMRKFICPLTGLWPINTLKCTP
EVKCPFPSRPDNGFVNYPAKPTLYYKDKATFGCHDGY-SLDGPEEIECTKLGNWSA-MPSCKA
$\sum_i \rho_i \cdot f_i(A) > \sum_i \rho_i \cdot f_i(A')$.
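The comparison above is linear in the parameters: alignment $A$ beats alignment $A'$ exactly when the parameter vector lies on one side of a hyperplane defined by the difference of their feature vectors. A small sketch of that check follows; the function names and the example feature vectors are hypothetical.

```python
def objective(rho, features):
    """Weighted feature sum: rho_1 * f_1 + ... + rho_d * f_d."""
    return sum(r * f for r, f in zip(rho, features))


def prefers_first(rho, features_a, features_b):
    """True iff alignment A scores strictly higher than alignment A' under rho.

    Equivalent to asking which side of the hyperplane
    {rho : sum_i rho_i * (f_i(A) - f_i(A')) = 0} the vector rho lies on.
    """
    diff = [fa - fb for fa, fb in zip(features_a, features_b)]
    return objective(rho, diff) > 0


# Hypothetical feature vectors (#matches, #mismatches, #indels).
features_a = (3, 0, 1)
features_b = (2, 2, 0)
print(prefers_first((1, -1, -1), features_a, features_b))
print(prefers_first((1, -1, -5), features_a, features_b))
```

Because each pairwise comparison is decided by a hyperplane, the winning alignment can only change when the parameter vector crosses one of these hyperplanes.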
[Plot: similarity to ground truth as a function of the parameters $\rho_1$, $\rho_2$]
… is $\epsilon$-optimal for $\mathcal{D}$ w.h.p.
[Gusfield, Balasubramanian, Naor '94; Fernández-Baca, Seppäläinen, Slutzki '04]
… is $\epsilon$-optimal for $\mathcal{D}$ w.h.p.
… such that:
$\rho_1 \cdot f_1(A) + \cdots + \rho_d \cdot f_d(A)$
$f_1(A), \ldots, f_d(A)$: features of alignment $A$ (e.g., # matches, …)
… $\rho_1 \cdot f_1(A) + \cdots + \rho_d \cdot f_d(A)$ is NP-complete!
[Wang and Jiang, 1994; Kececioglu and Starrett, 2004]