ARS-seq: High-Resolution Mapping and Mutational Scanning of Autonomously Replicating Sequences
Uri Keich, School of Mathematics and Statistics, University of Sydney
Joint work with: Ivan Liachko, Rachel A. Youngblood, Maitreya J. Dunham


  3. Where are the indels coming from?
     • Allowing a small number of indels, we assigned 3,748,614 (over 99.5%) of the alignments to genomic fragments flanked by pairs of identical 4-cutter sites
     • Only 1% of the fragments attributed to the sticky-end 4-cutter, GATC, contained indels, whereas the three blunt-end 4-cutters (GTAC, GGCC, AGCT) had much higher indel rates: 29.5-54%
     • If a base is lost at the sticky end, ligation is significantly compromised:
       double stranded vector with ligated insert
       5’ GATC GATC 3’
       3’ CTAG CTAG 5’
     • However, a lost bp at the blunt end has no effect on ligation:
       double stranded vector with ligated insert
       5’ CC GG 3’
       3’ GG CC 5’
     • Consistently, GATC libraries gave much lower cloning efficiencies than the 3 blunt cutters
     • The indels are most likely not generated during sequencing, but then why haven’t we observed them when using Sanger sequencing?

  11. Contigs and functional cores
      • Filtering out low quality reads and genomic fragments supported by a single read pair, we ended up with 720 overlapping genomic fragments
      • Assembled into 366 contigs, each containing 1-5 fragments
      • The minimal functional element, or the “functional core”, is in principle better approximated by the intersection of all the contig’s fragments
      • However, due to erroneous (FP) fragments the intersection might be empty
      • A dynamic programming script finds the smallest number of fragments that need to be omitted so that the intersection of the remaining fragments is at least 50bp long
      • Median lengths in bps: fragment 702, contig 1002, core 387
      • While there are cases of FPs or non-functional cores, in all such cases that we checked the contig is also non-functional, so the cores seem well defined
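
The core-finding step above can be sketched in a few lines. This is only an illustration, not the authors' script: the slides mention dynamic programming, whereas the sketch below swaps in a simple scan over candidate left edges, which solves the same problem for small contigs. The names `best_core`, `Fragment` and `min_len` are hypothetical, and fragments are assumed to be half-open `(start, end)` intervals on a single contig.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int]  # assumed (start, end), half-open, one contig


def best_core(fragments: List[Fragment], min_len: int = 50):
    """Keep as many fragments as possible so that their common
    intersection is at least min_len long.

    Returns (number_omitted, core_or_None, kept_fragments).
    """
    best_kept: List[Fragment] = []
    # The left edge of an optimal intersection can always be taken to be
    # the start of some fragment, so it suffices to try each start.
    for left, _ in fragments:
        kept = [f for f in fragments
                if f[0] <= left and f[1] >= left + min_len]
        if len(kept) > len(best_kept):
            best_kept = kept
    if not best_kept:
        return len(fragments), None, []
    core: Optional[Fragment] = (max(s for s, _ in best_kept),
                                min(e for _, e in best_kept))
    return len(fragments) - len(best_kept), core, best_kept


# Toy contig: the last fragment is a made-up outlier that empties
# the plain intersection of all four fragments.
omitted, core, kept = best_core([(0, 1000), (100, 900), (200, 800), (950, 1500)])
print(omitted, core)
```

The scan is quadratic in the number of fragments, which is harmless for contigs of 1-5 fragments; a sweep over sorted endpoints would be the natural choice for larger inputs.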

  12. Comparison with gold standard
      • Able to identify ~85% of S. cerevisiae ARSs in a single experiment
      • Fairly low FP rate: ~12.5%
      • Confirmed >50 likely ARSs and discovered a handful of new ones
      Ivan Liachko

  13. Which part of the ARS is necessary for function?
      [Figure: deletion constructs of the full ARS (231bp): -40L, -80L, -101L, -120L, -40R, -80R, -120R, -40L-40R, -80L-40R, -120L-40R, -101L-40R (min)]
      Ivan Liachko

  18. miniARS-seq: Defining essential ARS regions
      [Workflow diagram, labels in order]
      • ARS-seq: construct genomic libraries in ARS-less vector (ARS + URA3) -> ARS screen on selective media -> purify and sequence ARS plasmids
      • miniARS-seq: amplify ARS-seq inserts -> shear and clone ARS sub-fragments (ARS + URA3) -> ARS screen -> isolate and sequence miniARS plasmids
      Ivan Liachko

  24. Works great except...
      • The miniARS-seq experiment created quite a few inexplicable FPs
      • Additional technical and biological replicates (4 sequencing runs altogether) did not solve the problem:
        • We observed clearly non-functional fragments with repeatedly substantial read counts
      • How can that be?
      • We’ll get back to it, but in the meantime, what’s with all the reads we can’t map?

  36. Persistent mapping
      [Diagram labels] 5’ arsseq vector | ARSseq insert | 3’ arsseq vector; pairs of DNAseI cuts release a miniARS insert; the cloned construct is 5’ mini vector | S1 primer | miniARS insert | S2 primer | 3’ mini vector, sequenced as an S1 read and an S2 read
      Stats for 1 of 4 runs:
      • 9M 101bp read pairs
      • 3.6M aligned by BT
      • Trim 3’ ends of reads: 300K pairs aligned by BT
      • Trim 5’ prefixes of reads matching 5’ suffixes of vector: 400K pairs aligned by BT
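
The last trimming step above can be illustrated with a short sketch. It rests on assumptions the slides do not spell out: that a read may begin with the tail end of the flanking vector sequence, that exact string matching is good enough, and that very short matches should be ignored. The function name `trim_vector_prefix`, the `min_match` threshold and the example sequences are all hypothetical.

```python
def trim_vector_prefix(read: str, vector_flank: str, min_match: int = 8) -> str:
    """Remove the longest prefix of `read` that equals a suffix of
    `vector_flank`, provided the match is at least `min_match` bases
    (min_match is assumed to be >= 1)."""
    max_k = min(len(read), len(vector_flank))
    # Try the longest possible overlap first and stop at the first hit.
    for k in range(max_k, min_match - 1, -1):
        if read[:k] == vector_flank[-k:]:
            return read[k:]
    return read  # no vector-derived prefix found


flank = "ACGTTGCAGGATCC"            # made-up vector flank
read = "GCAGGATCC" + "TTTAAACCC"    # 9 vector bases followed by insert
print(trim_vector_prefix(read, flank))
```

A real pipeline would probably tolerate a mismatch or two in the overlap; exact matching is used here only to keep the sketch short.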

  42. Persistence pays
      • Still, BT fails to align 4.7M (52%) of the reads, many with good quality scores
      • 2M of those read pairs turn out to be confirmed double inserts: the two reads were mapped to distinct parts of the concatenated genome
        [Diagram labels] 5’ mini vector | S1 primer | miniARS insert 1 | miniARS insert 2 | S2 primer | 3’ mini vector; S1 read, S2 read
      • And probably quite a few more are of this type:
        [Diagram labels] 5’ mini vector | S1 primer | miniARS insert 1 | miniARS insert 2 | S2 primer | 3’ mini vector; S1 read, S2 read
      • But we didn’t pursue this further, as at this point we realized we have a solution to the more important question of the inexplicable FP mini inserts
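
One way to flag the double inserts described above is to align the two reads of a pair separately and check whether they land at the same locus. The sketch below is an assumption-laden illustration, not the procedure used in the talk: the `Hit` record, the `classify_pair` function, the opposite-strand requirement and the `max_span` threshold are all hypothetical.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    chrom: str    # sequence name in the concatenated genome
    pos: int      # leftmost aligned coordinate
    strand: str   # "+" or "-"


def classify_pair(s1: Optional[Hit], s2: Optional[Hit],
                  max_span: int = 1000) -> str:
    """Label a read pair from the independent alignments of its two reads."""
    if s1 is None or s2 is None:
        return "unmapped"
    # A single insert should put both reads on the same sequence, on
    # opposite strands, and within a plausible insert size (assumed here).
    same_locus = (s1.chrom == s2.chrom
                  and s1.strand != s2.strand
                  and abs(s1.pos - s2.pos) <= max_span)
    return "single_insert" if same_locus else "double_insert_candidate"


print(classify_pair(Hit("chrIV", 566000, "+"), Hit("chrIV", 566150, "-")))
print(classify_pair(Hit("chrIV", 566000, "+"), Hit("chrXII", 90210, "-")))
```

Pairs in which only one read reaches its insert, as in the second diagram, would not be caught by a check of this kind.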

  49. Silent double insert may mask the functional and reveal only the non-functional insert
      [Diagram labels]
      • 5’ arsseq vector | S1 primer | ARSseq insert 1 (ACS) | S2 primer | 3’ arsseq vector; DNAseI cuts: non-functional miniARS insert
      • 5’ arsseq vector | S2 primer | ARSseq insert 2 (ACS) | S1 primer | 3’ arsseq vector; DNAseI cuts: functional miniARS insert (ACS)
      • 5’ miniARS vector | S1 primer | non-functional miniARS insert | functional miniARS insert (ACS) | S2 primer | 3’ miniARS vector
      • This is the part that will get sequenced

  56. Nice hypothesis, but where’s the evidence?
      • It’s all circumstantial
      • There is a substantial number of observed double inserts: 22-24% of all reads
      • We observe quite a few miniARS inserts starting with the ARS-seq vector
        • When it’s not long enough to serve as a primer for the mini sequencing reaction
      • Of the 611 miniARS fragments sharing an end with the parent ARS-seq insert, 465 also share the ARS-seq orientation relative to the vector (p-value < 2.2e-16)
      • Filtering out the mini fragments that share an end with the parent ARS-seq insert removes most of the suspected FPs
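
The filter in the last bullet above might look roughly like the following. This is a sketch under assumptions: fragments are taken to be `(chrom, start, end)` records, "sharing an end" is read as coinciding start or end coordinates, and the `tol` slack parameter is a hypothetical addition. None of the names come from the talk.

```python
from typing import List, NamedTuple


class Frag(NamedTuple):
    chrom: str
    start: int
    end: int


def shares_end(mini: Frag, parent: Frag, tol: int = 0) -> bool:
    """True if the mini fragment starts or ends where the parent does
    (within `tol` bases; tol is a hypothetical slack parameter)."""
    if mini.chrom != parent.chrom:
        return False
    return (abs(mini.start - parent.start) <= tol
            or abs(mini.end - parent.end) <= tol)


def filter_suspects(minis: List[Frag], parents: List[Frag],
                    tol: int = 0) -> List[Frag]:
    """Drop mini fragments that share an end with any parent fragment."""
    return [m for m in minis
            if not any(shares_end(m, p, tol) for p in parents)]


parents = [Frag("chrIV", 566000, 567000)]
minis = [Frag("chrIV", 566000, 566150),   # shares the parent's start
         Frag("chrIV", 566400, 566550),   # internal fragment, kept
         Frag("chrIV", 566850, 567000)]   # shares the parent's end
print(filter_suspects(minis, parents))
```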

  65. miniARS contigs and inferred cores
      • After filtering out suspected double inserts, we assembled the remaining 12,338 miniARS genomic fragments (median 148bp) into 181 unique contigs
      • Defining the cores using the same procedure as for ARS-seq was not optimal: average of 68 miniARS vs. 2 ARS-seq fragments per contig
      • We added a statistical aspect to the combinatorial approach:
        • A mini contig’s core is defined essentially by dropping the 5% rightmost fragment starts as well as the leftmost 5% fragment ends
        • If it is shorter than 50bp, the DP approach removes additional fragments
      • The contig median length is 230 bp whereas the core’s is 92 bp
      • We have no evidence of incorrectly defined cores
      • FP rate is estimated at 3.9%: 8 of 181 contigs
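
The trimmed-quantile idea above can be sketched as follows. The slide says the core is defined "essentially" this way, so the exact rounding and indexing below are assumptions, as are the names `quantile_core` and `drop_percent`. Fragments are assumed to be `(start, end)` pairs on one contig.

```python
from typing import List, Tuple


def quantile_core(fragments: List[Tuple[int, int]],
                  drop_percent: int = 5) -> Tuple[int, int]:
    """Core = [largest remaining start, smallest remaining end] after
    ignoring the drop_percent% rightmost starts and the drop_percent%
    leftmost ends. The caller should check the resulting length."""
    n = len(fragments)
    k = n * drop_percent // 100           # values ignored on each side
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    core_start = starts[n - 1 - k]        # skip the k rightmost starts
    core_end = ends[k]                    # skip the k leftmost ends
    return core_start, core_end


# 19 well-behaved toy fragments plus one made-up outlier at (150, 400).
frags = [(i, 200 + i) for i in range(19)] + [(150, 400)]
print(quantile_core(frags))
```

In this toy contig the plain intersection of all 20 fragments would be driven by the single outlier start at 150, while the trimmed version ignores it. A core that still comes out shorter than 50bp would be handed to the fragment-omission step, as the slide describes.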

  66. [Figure: ARS419 region, coordinates 566000-569000; labels: ACS, ARS-seq, miniARS-seq, OriDB; genes: YOS9, TGL2, UBC5]

  71. To boldly go where others have gone before...
      • 71 “left skewed” mini cores vs. 8 “right skewed” mini cores [diagrams: ACS within ≤ 5bp of a core end]
      • 2-sided binomial test p-value = 9.7e-14
      • More information 5’ than 3’ of the oriented ACS
      • Still, few of the 8 are functional: is the 33bp ACS sufficient for ARS function?
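
The quoted p-value can be reproduced with an exact two-sided binomial test of 8 versus 71 under a null of equal probability for either skew. The sketch below uses only the Python standard library; the function name `binom_two_sided_half` is hypothetical, and doubling the smaller tail is valid here only because the null probability of 0.5 makes the distribution symmetric.

```python
from math import comb


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials
    under the null hypothesis p = 0.5."""
    tail = min(k, n - k)
    # Exact integer count of outcomes at least as extreme on one side.
    one_sided = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * one_sided / 2 ** n)


p = binom_two_sided_half(8, 71 + 8)
print(f"{p:.1e}")
```

Applied to the 465 of 611 orientation counts from the earlier evidence slide, the same function returns a value far below 2.2e-16, although the slides do not state which test was used there.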
