Where are the indels coming from?
• Allowing a small number of indels, we assigned 3,748,614 (over 99.5%) of the alignments to genomic fragments flanked by pairs of identical 4-cutter sites
• Only 1% of the fragments attributed to the sticky-end 4-cutter, GATC, contained indels, whereas the three blunt-end 4-cutters (GTAC, GGCC, AGCT) had much higher indel rates of 29.5-54%
• If a base is lost at a sticky end, ligation is significantly compromised:
  [Diagram: double-stranded vector with ligated insert; top strand 5’ GATC ... GATC 3’, bottom strand 3’ CTAG ... CTAG 5’]
• However, a lost bp at a blunt end has no effect on ligation:
  [Diagram: double-stranded vector with ligated insert; top strand 5’ CC ... GG 3’, bottom strand 3’ GG ... CC 5’]
• Consistently, GATC libraries gave much lower cloning efficiencies than the three blunt cutters
• The indels are most likely not generated during sequencing, but then why haven’t we observed them when using Sanger sequencing?

Contigs and functional cores
• After filtering out low-quality reads and genomic fragments supported by a single read pair, we ended up with 720 overlapping genomic fragments
• These were assembled into 366 contigs, each containing 1-5 fragments
• The minimal functional element, or “functional core”, is in principle better approximated by the intersection of all of a contig’s fragments
• However, due to erroneous (FP) fragments, the intersection might be empty
• A dynamic programming script finds the smallest number of fragments that need to be omitted so that the intersection of the remaining fragments is at least 50 bp long
• Median lengths in bp: fragment 702, contig 1002, core 387
• While there are cases of FPs or non-functional cores, in all such cases that we checked the contig is also non-functional, so the cores seem well defined
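The fragment-omission step can be illustrated with a small sketch. This is not the dynamic programming script from the talk, which is not shown; it is a brute-force sweep that solves the same problem under stated assumptions: fragments are half-open `(start, end)` intervals on one contig, and the function name `best_core` is hypothetical.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int]  # (start, end) in bp, half-open; an assumed representation


def best_core(fragments: List[Fragment], min_len: int = 50):
    """Keep as many fragments as possible while their common
    intersection stays at least min_len bp long.

    Returns (kept_fragments, core); core is None if nothing qualifies.
    """
    best_kept: List[Fragment] = []
    for s, _ in fragments:
        # A kept set whose intersection begins at s may only contain
        # fragments that start at or before s and end at or after s + min_len.
        kept = [f for f in fragments if f[0] <= s and f[1] >= s + min_len]
        if len(kept) > len(best_kept):
            best_kept = kept
    if not best_kept:
        return [], None
    core: Optional[Fragment] = (
        max(f[0] for f in best_kept),
        min(f[1] for f in best_kept),
    )
    return best_kept, core


# Toy contig: three overlapping fragments plus one stray (FP-like) fragment.
fragments = [(0, 100), (20, 120), (40, 200), (300, 400)]
kept, core = best_core(fragments)
omitted = len(fragments) - len(kept)
print(omitted, core)
```

The sweep is quadratic in the number of fragments, which is harmless for contigs of 1-5 fragments; for contigs with many fragments a sorted sweep or a DP would be the natural replacement.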

Comparison with gold standard
• Able to identify ~85% of S. cerevisiae ARSs in a single experiment
• Fairly low FP rate: ~12.5%
• Confirmed >50 likely ARSs and discovered a handful of new ones
(Credit: Ivan Liachko)

Which part of the ARS is necessary for function?
[Figure: deletion series of the full ARS (231 bp): -40L, -80L, -101L, -120L, -40R, -80R, -120R, -40L-40R, -80L-40R, -120L-40R, -101L-40R (min). Credit: Ivan Liachko]

miniARS-seq: Defining essential ARS regions
[Workflow diagram: construct genomic libraries in ARS-less vector (+ URA3) → ARS screen on selective media → purify and sequence ARS plasmids → amplify ARS-seq inserts → shear and clone ARS sub-fragments → ARS screen → isolate and sequence miniARS plasmids. Credit: Ivan Liachko]

Works great except...
• The miniARS-seq experiment produced quite a few inexplicable FPs
• Additional technical and biological replicates (4 sequencing runs altogether) did not solve the problem:
• We observed clearly non-functional fragments with repeatedly substantial read counts
• How can that be?
• We’ll get back to it, but in the meantime, what’s with all the reads we can’t map?

Persistent mapping
[Diagram: an ARSseq insert, flanked by the 5’ and 3’ arsseq vector, is cut by DNAseI into miniARS inserts; each miniARS insert sits between the 5’ and 3’ mini vector and is sequenced from the S1 and S2 primers, giving an S1 read and an S2 read]
Stats for 1 of 4 runs: 9M 101 bp read pairs
• 3.6M aligned by BT
• Trim 3’ ends of reads: 300K pairs aligned by BT
• Trim 5’ prefixes of reads matching 5’ suffixes of vector: 400K pairs aligned by BT
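The prefix-trimming step might look roughly like the sketch below. It assumes exact matching between the start of a read and the end of the vector sequence adjacent to the cloning site; the function name, the `min_overlap` threshold, and the toy sequences are all hypothetical, and the talk does not say whether mismatches were tolerated.

```python
def trim_vector_prefix(read: str, vector_flank: str, min_overlap: int = 4) -> str:
    """Remove the longest prefix of `read` that exactly matches a suffix
    of `vector_flank`; return the read unchanged if no such prefix of at
    least `min_overlap` bases exists (min_overlap must be >= 1)."""
    max_k = min(len(read), len(vector_flank))
    # Try the longest candidate overlap first so the maximal vector
    # remnant is removed.
    for k in range(max_k, min_overlap - 1, -1):
        if read[:k] == vector_flank[-k:]:
            return read[k:]
    return read


flank = "ACGTTGCA"  # hypothetical vector sequence next to the insert
trimmed = trim_vector_prefix("TGCAGGGTTT", flank)
print(trimmed)
```

After trimming, the shortened reads would be handed back to the aligner; a very small `min_overlap` risks clipping genuine genomic bases that happen to match the vector tail.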

Persistence pays
• Still, BT fails to align 4.7M (52%) of the read pairs, many with good quality scores
• 2M of those read pairs turn out to be confirmed double inserts: the two reads were mapped to distinct parts of the concatenated genome
  [Diagram: miniARS insert 1 and miniARS insert 2 cloned together between the 5’ and 3’ mini vector, with the S1 read and S2 read coming from the S1 and S2 primers]
• And probably quite a few more are of this type:
  [Diagram: a second configuration of miniARS insert 1 and miniARS insert 2 between the mini vector arms, with S1 and S2 reads]
• But we didn’t pursue this further, because at this point we realized we had a solution to the more important question of the inexplicable FP mini inserts
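A minimal sketch of how a read pair could be flagged as a double insert once each mate has been aligned on its own. The slides only say that the two reads map to "distinct parts" of the genome, so the `Hit` record and the `max_span` distance threshold here are assumptions for illustration, and strand is ignored.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    pos: int  # leftmost aligned coordinate of the mate


def is_double_insert(s1: Hit, s2: Hit, max_span: int = 2000) -> bool:
    """True when the two mates cannot plausibly come from one contiguous
    insert: different chromosomes, or farther apart than max_span bp."""
    if s1.chrom != s2.chrom:
        return True
    return abs(s1.pos - s2.pos) > max_span


pairs = [
    (Hit("chrIV", 1000), Hit("chrIV", 1150)),    # consistent with a single insert
    (Hit("chrIV", 1000), Hit("chrXII", 1200)),   # mates on different chromosomes
    (Hit("chrIV", 1000), Hit("chrIV", 500000)),  # same chromosome, far apart
]
flags = [is_double_insert(a, b) for a, b in pairs]
print(flags)
```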

A silent double insert may mask the functional insert and reveal only the non-functional one
[Diagram: ARSseq insert 1 (with an ACS), flanked by the 5’ and 3’ arsseq vector and the S1 and S2 primers, is cut by DNAseI into a non-functional miniARS insert. ARSseq insert 2 (with an ACS) is cut by DNAseI into a functional miniARS insert that contains the ACS. Both pieces end up between the 5’ and 3’ miniARS vector, flanked by the S1 and S2 primers: the non-functional miniARS insert followed by the functional miniARS insert (ACS). A label marks “the part that will get sequenced”.]

Nice hypothesis, but where’s the evidence?
• It’s all circumstantial
• There is a substantial number of observed double inserts: 22-24% of all reads
• We observe quite a few miniARS inserts starting with the ARS-seq vector
• When it’s not long enough to serve as a primer for the mini sequencing reaction
• Of the 611 miniARS fragments sharing an end with the parent ARS-seq insert, 465 also share the arsseq orientation relative to the vector (p-value < 2.2e-16)
• Filtering out the mini fragments that share an end with the parent ARS-seq insert removes most of the suspected FPs
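The filter in the last bullet can be sketched as follows, assuming each fragment is a `(chromosome, start, end)` tuple and that "sharing an end" means an identical start or end coordinate. The `tol` parameter and both function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end); an assumed representation


def shares_end(mini: Interval, parent: Interval, tol: int = 0) -> bool:
    """True if the mini fragment starts or ends where its parent
    ARS-seq insert does, within tol bp."""
    if mini[0] != parent[0]:
        return False
    return abs(mini[1] - parent[1]) <= tol or abs(mini[2] - parent[2]) <= tol


def drop_suspected_double_inserts(minis: List[Interval], parent: Interval) -> List[Interval]:
    """Keep only the mini fragments that lie strictly inside the parent insert."""
    return [m for m in minis if not shares_end(m, parent)]


parent = ("chrIV", 1000, 1700)
minis = [("chrIV", 1000, 1150), ("chrIV", 1200, 1350), ("chrIV", 1560, 1700)]
kept_minis = drop_suspected_double_inserts(minis, parent)
print(kept_minis)
```

The filter is deliberately conservative: it also discards genuine mini fragments that happen to end at the parent boundary, trading some sensitivity for fewer suspected FPs.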

miniARS contigs and inferred cores
• After filtering out suspected double inserts, we assembled the remaining 12,338 miniARS genomic fragments (median 148 bp) into 181 unique contigs
• Defining the cores with the same procedure as for arsseq was not optimal: an average of 68 miniARS vs. 2 ARSseq fragments per contig
• We added a statistical aspect to the combinatorial approach:
• A mini contig’s core is defined essentially by dropping the rightmost 5% of fragment starts as well as the leftmost 5% of fragment ends
• If it is shorter than 50 bp, the DP approach removes additional fragments
• The median contig length is 230 bp, whereas the median core length is 92 bp
• We have no evidence of incorrectly defined cores
• FP rate is estimated at 3.9%: 8 of 181 contigs
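The percentile-style core definition can be sketched as below. The rounding rule (floor of 5% of the fragment count) and the name `trimmed_core` are assumptions; the slides say only that the core is defined "essentially" by dropping the extreme 5% on each side.

```python
import math
from typing import List, Tuple


def trimmed_core(fragments: List[Tuple[int, int]], frac: float = 0.05) -> Tuple[int, int]:
    """Core boundaries after ignoring the rightmost `frac` of fragment
    starts and the leftmost `frac` of fragment ends."""
    n = len(fragments)
    drop = math.floor(frac * n)  # extreme values ignored on each side
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    core_start = starts[n - 1 - drop]  # largest start that survives
    core_end = ends[drop]              # smallest end that survives
    return core_start, core_end


# 18 well-behaved fragments plus two outliers that would otherwise
# make the plain intersection empty.
mini_fragments = [(i, 200 + i) for i in range(18)] + [(150, 400), (-300, 60)]
mini_core = trimmed_core(mini_fragments)
print(mini_core)
```

If the resulting interval is shorter than 50 bp (or empty, with start beyond end), the fragment-omission procedure described in the bullets would take over.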

[Figure: genome browser view of coordinates 566000-569000 with tracks for ACS, ARS-seq, miniARS-seq and OriDB (ARS419), alongside the genes YOS9, TGL2 and UBC5]

To boldly go where others have gone before...
[Diagram: 71 “left skewed” mini cores vs. 8 “right skewed” mini cores, each drawn with the ACS and a ≤ 5 bp distance marked]
• 2-sided binomial test p-value = 9.7e-14
• More information 5’ than 3’ of the oriented ACS
• Still, few of the 8 are functional: is the 33 bp ACS sufficient for ARS function?
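The two-sided binomial test on the 71 vs. 8 split can be checked with the standard library alone. The sketch assumes the null hypothesis is an equal chance of left and right skew (p = 0.5), in which case doubling the smaller tail is exact; the function name is hypothetical.

```python
from math import comb


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials
    under the null p = 0.5 (the distribution is symmetric, so the
    smaller tail is doubled)."""
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# 8 "right skewed" cores out of 8 + 71 = 79 skewed cores
p_value = binom_two_sided_half(8, 8 + 71)
print(f"{p_value:.2g}")
```

With these counts the result agrees with the p-value quoted on the slide.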
