ARS-seq: High-Resolution Mapping and Mutational Scanning of Autonomously Replicating Sequences

SLIDE 1

ARS-seq: High-Resolution Mapping and Mutational Scanning of Autonomously Replicating Sequences

Uri Keich

School of Mathematics and Statistics, University of Sydney

Joint work with: Ivan Liachko, Rachel A. Youngblood, Maitreya J. Dunham

Department of Genome Sciences, University of Washington

SLIDE 2

The cell cycle

Alberts et al.

  • Growth and development of an organism depend on a series of cell divisions
  • A cell cannot divide before its DNA is replicated

SLIDE 3

[Figure: three origins (labeled origin) and the resulting replication bubbles]

Ivan Liachko

SLIDE 4

Probably more than you wanted to know about it

Holzen and Sclafani, 2007

SLIDE 8

Yeast Origins are Autonomously Replicating Sequences (ARSs)

[Figure: origins (labeled origin, origin, origin); a URA3 plasmid carrying an ARS: growth on selective media; a URA3 plasmid without an ARS: no growth on selective media]

Ivan Liachko

SLIDE 11

Saccharomyces cerevisiae Origins

  • Numerous studies spanning roughly 30 years have located almost all origins in S. cerevisiae (~400)
  • A conserved 11-33bp sequence motif called the ACS (ARS Consensus Sequence) is necessary but considered insufficient for origin activity
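The sketch below makes the idea of a sequence motif concrete by scanning a sequence for ACS matches on both strands. It assumes the commonly cited 11bp core consensus WTTTAYRTTTW (IUPAC codes). The slides give motif lengths of 11-33bp, and ACS searches in practice usually rely on a position weight matrix, so treat this as an illustration only. The names `find_acs` and `ACS_CONSENSUS` are hypothetical.

```python
import re

# Assumption: the commonly cited 11bp ACS core consensus (IUPAC:
# W = A/T, Y = C/T, R = A/G). Real searches typically use a PWM instead.
ACS_CONSENSUS = "WTTTAYRTTTW"

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "S": "[CG]", "Y": "[CT]", "R": "[AG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

# IUPAC-aware complement (Y <-> R, K <-> M; W, S, N are self-complementary).
COMPLEMENT = str.maketrans("ACGTWSYRKMN", "TGCAWSRYMKN")


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_acs(seq: str, motif: str = ACS_CONSENSUS) -> list[tuple[int, str]]:
    """Return sorted (start, strand) pairs for every match of `motif` on either strand.

    The zero-width lookahead lets overlapping matches be reported as well.
    """
    seq = seq.upper()
    fwd = re.compile(f"(?={iupac_to_regex(motif)})")
    rev = re.compile(f"(?={iupac_to_regex(reverse_complement(motif))})")
    hits = [(m.start(), "+") for m in fwd.finditer(seq)]
    hits += [(m.start(), "-") for m in rev.finditer(seq)]
    return sorted(hits)
```

For example, `find_acs("GGATTTATATTTACC")` reports a single forward-strand match at offset 2. Its reverse complement, `"GGTAAATATAAATCC"`, yields the same offset on the `"-"` strand.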

SLIDE 19

ARS-seq: High-throughput mapping of ARSs

[Figure: construct genomic libraries in an ARS-less URA3 vector → ARS screen on selective media → purify and sequence ARS plasmids; the ARSseq insert sits between the 5’ and 3’ vector, and the S1 and S2 primers yield the S1 and S2 reads]

Ivan Liachko

  • The genome is digested using four different 4-cutter restriction enzymes
  • A 12x library of overlapping restriction fragments is cloned into an ARS-less plasmid that contains the URA3 gene, essential for colony formation on selective media
  • Grown colonies are sequenced using Illumina paired-end deep sequencing: 5.2M 76bp read pairs
  • Bowtie: ~3.8M alignments to 6,054 unique contiguous genomic fragments
  • Read pair count per fragment varied between 1 and 227,992
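The step from ~3.8M alignments to 6,054 unique fragments and their read-pair counts can be sketched as follows. Each properly paired alignment defines a fragment running from the start of the forward mate to the end of the reverse mate, and identical fragments are tallied. The `Mate` record and the function names are hypothetical, and coordinates are assumed to be 0-based half-open. Parsing the Bowtie output and quality filtering are not shown.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Mate(NamedTuple):
    """One aligned read of a pair (hypothetical record, 0-based half-open)."""
    chrom: str
    start: int   # leftmost aligned position
    end: int     # one past the rightmost aligned position
    strand: str  # "+" or "-"


def pair_to_fragment(a: Mate, b: Mate) -> Optional[tuple[str, int, int]]:
    """Return the genomic fragment (chrom, start, end) spanned by a read pair.

    Assumes a properly oriented pair: same chromosome, one mate per strand,
    and the "+" mate starting at or before the "-" mate. Otherwise None.
    """
    if a.chrom != b.chrom or {a.strand, b.strand} != {"+", "-"}:
        return None
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if fwd.start > rev.start:
        return None
    return (fwd.chrom, fwd.start, rev.end)


def count_fragments(pairs) -> Counter:
    """Count read pairs supporting each unique contiguous genomic fragment."""
    counts = Counter()
    for a, b in pairs:
        frag = pair_to_fragment(a, b)
        if frag is not None:
            counts[frag] += 1
    return counts
```

With 76bp reads, for instance, a pair whose mates align at `chrI:100-176 (+)` and `chrI:700-776 (-)` counts once toward the fragment `("chrI", 100, 776)`.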
SLIDE 22

Curious edge effect

  • Many of the genomic fragments did not start and end, as expected, with a site of one of the four 4-cutters:

[Figure: genome 5’ ... RE site | ARSseq insert | RE site ... genome 3’; RE sites: GATC, GTAC, GGCC, AGCT]

  • Since such small offsets (indels) at the fragment ends had not been observed previously with Sanger sequencing, suspicion fell on the Illumina sequencing step

SLIDE 32

Where are the indels coming from?

  • Allowing a small number of indels, we assigned 3,748,614 (over 99.5%) of the alignments to genomic fragments flanked by pairs of identical 4-cutter sites
  • Only 1% of the fragments attributed to the sticky-end 4-cutter (GATC) contained indels, whereas the three blunt-end 4-cutters (GTAC, GGCC, AGCT) had much higher indel rates of 29.5-54%
  • If a base is lost at a sticky end, ligation is significantly compromised
  • However, a base lost at a blunt end has no effect on ligation
  • Consistent with this, the GATC libraries gave much lower cloning efficiencies than those of the 3 blunt cutters
  • The indels are therefore most likely not generated during sequencing, but then why haven’t we observed them when using Sanger sequencing?

[Figure: sticky-end (GATC/CTAG overhang) vs. blunt-end (CC/GG) ligation; panels show the double-stranded genomic fragment, the double-stranded vector ready for insert ligation, and the double-stranded vector with ligated insert]
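The "allowing a small number of indels" assignment can be made concrete as follows. Each fragment is assigned to the 4-cutter whose sites lie closest to both of its ends, tolerating a shift of up to `max_shift` bases. The sketch then reports, per site, the fraction of fragments that needed a nonzero shift. This is a hypothetical sketch, not the authors' procedure: the slides do not say how the tolerance was implemented, the "site window" model ignores the exact cut position within each site, and `max_shift=2` is an arbitrary choice.

```python
import bisect
from collections import defaultdict

# The four 4-cutter sites from the slides and their end type.
SITES = {"GATC": "sticky", "GTAC": "blunt", "GGCC": "blunt", "AGCT": "blunt"}


def site_positions(genome: str, site: str) -> list[int]:
    """Sorted start positions of every occurrence of a 4bp site."""
    return [i for i in range(len(genome) - 3) if genome[i:i + 4] == site]


def distance_to_site(p: int, starts: list[int]) -> float:
    """Distance from boundary p to the nearest site window [q, q + 4]; 0 if inside it."""
    i = bisect.bisect_right(starts, p)
    best = float("inf")
    # Only the closest site on each side of p can be the nearest one.
    for q in starts[max(i - 1, 0):i + 1]:
        best = min(best, max(q - p, p - (q + 4), 0))
    return best


def assign_fragment(start: int, end: int, positions: dict, max_shift: int = 2):
    """Return (site, shift) for the site type flanking both ends most closely.

    shift is the larger of the two boundary distances; None if no site type
    flanks both ends within max_shift bases.
    """
    best = None
    for site, starts in positions.items():
        shift = max(distance_to_site(start, starts), distance_to_site(end, starts))
        if shift <= max_shift and (best is None or shift < best[1]):
            best = (site, shift)
    return best


def indel_rates(genome: str, fragments: list[tuple[int, int]], max_shift: int = 2) -> dict:
    """Fraction of assigned fragments per site that needed a nonzero shift."""
    positions = {site: site_positions(genome, site) for site in SITES}
    total, shifted = defaultdict(int), defaultdict(int)
    for start, end in fragments:
        hit = assign_fragment(start, end, positions, max_shift)
        if hit is None:
            continue  # not attributable to any 4-cutter within the tolerance
        site, shift = hit
        total[site] += 1
        shifted[site] += int(shift > 0)
    return {site: shifted[site] / total[site] for site in total}
```

On real data, the per-site rates from a procedure like this would be where a sticky-vs-blunt contrast (1% vs. 29.5-54%) shows up.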

SLIDE 40

Contigs and functional cores

  • After filtering out low-quality reads and genomic fragments supported by a single read pair, we were left with 720 overlapping genomic fragments
  • These were assembled into 366 contigs, each containing 1-5 fragments
  • The minimal functional element, or “functional core”, is in principle better approximated by the intersection of all of the contig’s fragments
  • However, due to erroneous (FP) fragments, the intersection might be empty
  • A dynamic programming script finds the smallest number of fragments that need to be omitted so that the intersection of the remaining fragments is at least 50bp long
  • Median lengths (bp): fragment 702, contig 1002, core 387
  • While there are cases of FPs or non-functional cores, in all such cases that we checked the contig itself was also non-functional, so the cores seem well defined
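Here is a sketch of the two steps on this slide, under stated assumptions. First, fragments on one chromosome are merged into contigs when they overlap. Second, a contig's core is found by omitting the fewest fragments so that the rest intersect in at least 50bp. The slides describe a dynamic programming script; this version instead searches directly over candidate core starts, which solves the same optimization but is not the authors' code. Coordinates are assumed 0-based half-open, and the function names are hypothetical.

```python
from typing import Optional

Interval = tuple[int, int]  # 0-based half-open [start, end) on one chromosome


def assemble_contigs(fragments: list[Interval]) -> list[list[Interval]]:
    """Group overlapping fragments into contigs (simple interval merging)."""
    contigs: list[list[Interval]] = []
    contig_end = None
    for start, end in sorted(fragments):
        if contig_end is not None and start < contig_end:
            contigs[-1].append((start, end))
            contig_end = max(contig_end, end)
        else:
            contigs.append([(start, end)])
            contig_end = end
    return contigs


def core_with_min_omissions(frags: list[Interval], min_len: int = 50) -> tuple[int, Optional[Interval]]:
    """Omit as few fragments as possible so the rest intersect in >= min_len bp.

    For a candidate core start `a`, a fragment can be kept iff it starts at or
    before `a` and ends at or after `a + min_len`. Some optimal solution has
    `a` equal to a fragment start, so trying each start suffices (O(n^2)).
    Returns (number omitted, intersection of the kept fragments).
    """
    best_kept: list[Interval] = []
    for a, _ in frags:
        kept = [(s, e) for s, e in frags if s <= a and e >= a + min_len]
        if len(kept) > len(best_kept):
            best_kept = kept
    if not best_kept:
        return len(frags), None
    core = (max(s for s, _ in best_kept), min(e for _, e in best_kept))
    return len(frags) - len(best_kept), core
```

For example, among the fragments `(0, 500)`, `(100, 600)` and `(560, 900)`, omitting the third one leaves the core `(100, 500)`.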

SLIDE 41

Comparison with gold standard

Ivan Liachko

  • Able to identify ~85% of S. cerevisiae ARSs in a single experiment
  • Fairly low FP rate ~12.5%
  • Confirmed >50 likely ARSs and discovered a handful of new ones
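Sensitivity and FP rates like the ones quoted here can be computed by overlapping the predicted ARS regions with a reference set of known origins. This minimal sketch assumes an "any overlap" match criterion, which the slides do not specify. The function names are hypothetical.

```python
Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare_to_gold(predicted: list[Interval], gold: list[Interval]) -> tuple[float, float]:
    """Return (fraction of gold-standard ARSs found, FP rate among predictions)."""
    found = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    false_pos = sum(not any(overlaps(p, g) for g in gold) for p in predicted)
    return found / len(gold), false_pos / len(predicted)
```

The quadratic scan is fine for a few hundred origins. A sorted sweep would be the obvious change for larger sets.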
SLIDE 42

Which part of the ARS is necessary for function?

[Figure: deletion analysis of an ARS, full length 231bp; constructs 40L, 80L, 120L, 40R, 80R, 120R, 101L, 101L-40R (min), 40L-40R, 80L-40R, 120L-40R, each marked functional or non-functional]

Ivan Liachko

SLIDE 47

miniARS-seq: Defining essential ARS regions

[Figure: construct genomic libraries in an ARS-less URA3 vector → ARS screen on selective media → purify and sequence ARS plasmids → amplify ARS-seq inserts → shear and clone ARS sub-fragments into a URA3 vector → ARS screen on selective media → isolate and sequence miniARS plasmids]

Ivan Liachko

SLIDE 53

Works great except...

  • The miniARS-seq experiment produced quite a few inexplicable FPs
  • Additional technical and biological replicates (4 sequencing runs altogether) did not solve the problem:
  • We observed clearly non-functional fragments that repeatedly had substantial read counts
  • How can that be?
  • We’ll get back to that, but in the meantime, what about all the reads we can’t map?

SLIDE 65

Persistent mapping

[Figure: DNAseI digestion of the ARSseq insert (between the 5’ and 3’ arsseq vector) yields miniARS inserts, which are cloned between the 5’ and 3’ mini vector and sequenced from the S1 and S2 primers; panels show where the S1 and S2 reads fall relative to the insert and the vector]

Stats for 1 of 4 runs:

  • 9M 101bp read pairs; 3.6M aligned by Bowtie (BT)
  • Trimming the 3’ ends of reads: 300K more pairs aligned by BT
  • Trimming 5’ prefixes of reads that match 5’ suffixes of the vector: 400K more pairs aligned by BT
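Both trimming steps amount to finding an overlap between one end of a read and the vector. This sketch uses exact suffix/prefix matching with a minimum overlap. That is an assumption: the slides don't say how the trimming was done, and a production version would tolerate sequencing errors. The vector sequences used in the usage example are made-up placeholders, and the function names are hypothetical.

```python
def longest_suffix_prefix_overlap(left: str, right: str, min_overlap: int = 1) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no such overlap is at least min_overlap long.
    """
    min_overlap = max(min_overlap, 1)  # k = 0 would compare whole strings via left[-0:]
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def trim_3prime_vector(read: str, vector_after_insert: str, min_overlap: int = 5) -> str:
    """Drop read-through: a read suffix matching the start of the downstream vector."""
    k = longest_suffix_prefix_overlap(read, vector_after_insert, min_overlap)
    return read[:len(read) - k]


def trim_5prime_vector(read: str, vector_before_insert: str, min_overlap: int = 5) -> str:
    """Drop a read prefix that matches a suffix of the upstream vector."""
    k = longest_suffix_prefix_overlap(vector_before_insert, read, min_overlap)
    return read[k:]
```

As a usage example, with a placeholder downstream vector `"GGATCCAAGCTT"`, the read `"ACGTACGTTTGAGGATCC"` is trimmed back to `"ACGTACGTTTGA"`. A read with no overlap of at least `min_overlap` bases is returned unchanged.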

SLIDE 71

Persistence pays

  • Still, BT fails to align 4.7M (52%) of the read pairs, many with good quality scores
  • 2M of those read pairs turn out to be confirmed double inserts: the two reads were mapped to distinct parts of the concatenated genome

[Figure: miniARS inserts 1 and 2 cloned together between the 5’ and 3’ mini vector, with each read of the pair coming from a different insert]

  • And probably quite a few more are of this type:

[Figure: another configuration of miniARS inserts 1 and 2 cloned together between the mini vector arms and read from the S1 and S2 primers]

  • But we didn’t pursue this further, as at this point we realized we had an answer to the more important question of the inexplicable FP mini inserts
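A read pair can be flagged as a double-insert candidate when both mates align but not in the configuration expected for one contiguous insert. The sketch below encodes that expectation: facing mates on the same sequence within `max_fragment` bp. The `Hit` record, the labels and the 2,000bp cutoff are hypothetical choices, not taken from the slides.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Alignment of one mate (hypothetical record, 0-based half-open)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(a: Hit, b: Hit, max_fragment: int = 2000) -> str:
    """Label a pair whose mates were aligned separately.

    "single_insert": the mates face each other on the same sequence within
    max_fragment bp, as expected for one contiguous insert.
    "double_insert_candidate": both mates map, but to distinct, incompatible
    places, as expected if two unrelated inserts were cloned together.
    """
    if a.chrom == b.chrom and {a.strand, b.strand} == {"+", "-"}:
        fwd, rev = (a, b) if a.strand == "+" else (b, a)
        if fwd.start <= rev.start and rev.end - fwd.start <= max_fragment:
            return "single_insert"
    return "double_insert_candidate"
```

When the genome is concatenated into a single sequence, as the slide mentions, the chromosome check becomes trivial and the distance and orientation tests do the work.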

SLIDE 78

A silent double insert may mask the functional insert and reveal only the non-functional one

[Figure: DNAseI digestion of ARSseq insert 1 yields a non-functional miniARS insert, and digestion of ARSseq insert 2 yields a functional miniARS insert that retains the ACS; each ARSseq insert sits between the 5’ and 3’ arsseq vector with its S1 and S2 primer sites. When both miniARS inserts are cloned into the same miniARS vector, the functional insert supports replication, but the part that will get sequenced is the non-functional insert.]

SLIDE 85

Nice hypothesis, but where’s the evidence?

  • It’s all circumstantial
  • There is a substantial number of observed double insert: 22-24% of all reads
  • We observe quite a few miniARS inserts starting with the ARS-seq vector
  • When it’s not long enough to serve as a primer for the mini sequencing

reaction

  • Of the 611 miniARS fragments sharing an end with the parent ARS-seq 465

also share the arsseq orientation relative to the vector (p-value < 2.2e-16)

  • Filtering out the mini fragments that share an end with the parent ARS-seq

insert removes most of the suspected FPs
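The filter described above works in two steps. First, a miniARS fragment is flagged when one of its ends coincides with an end of its parent ARS-seq insert. Second, among the flagged fragments we count how many also share the parent's orientation relative to the vector (the 465-of-611 statistic). The `Insert` record, the `tol` parameter and the strand convention are illustrative assumptions. Exact end matching (`tol=0`) is the default only because the slides don't state a tolerance.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    """A cloned insert in genome coordinates (hypothetical record).

    strand: orientation of the insert relative to its vector ("+" or "-").
    """
    chrom: str
    start: int
    end: int
    strand: str


def shares_end(mini: Insert, parent: Insert, tol: int = 0) -> bool:
    """True if the mini insert starts or ends (within tol bp) where its parent does."""
    return mini.chrom == parent.chrom and (
        abs(mini.start - parent.start) <= tol or abs(mini.end - parent.end) <= tol
    )


def filter_suspected_double_inserts(minis: list[Insert], parent: Insert, tol: int = 0):
    """Split mini inserts into (kept, removed) and count orientation agreement among removed."""
    kept, removed = [], []
    for m in minis:
        (removed if shares_end(m, parent, tol) else kept).append(m)
    same_orientation = sum(m.strand == parent.strand for m in removed)
    return kept, removed, same_orientation
```

Summing `same_orientation` and `len(removed)` over all parents gives the two counts that enter the orientation test on this slide.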

SLIDE 94

miniARS contigs and inferred cores

  • After filtering out suspected double inserts, we assembled the remaining 12,338 miniARS genomic fragments (median length 148 bp) into 181 unique contigs
  • Defining the cores using the same procedure as for ARS-seq was not optimal: on average a contig contains 68 miniARS fragments vs. 2 ARS-seq fragments
  • We added a statistical aspect to the combinatorial approach:
  • A miniARS contig’s core is defined essentially by dropping the rightmost 5% of fragment starts as well as the leftmost 5% of fragment ends
  • If the resulting core is shorter than 50 bp, the DP approach removes additional fragments
  • The median contig length is 230 bp, whereas the median core length is 92 bp
  • We have no evidence of incorrectly defined cores
  • False-positive (FP) rate is estimated at 3.9%: 8 of 181 contigs
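The trimming rule on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: fragments are hypothetical `(chrom, start, end)` tuples with half-open coordinates, a contig is assumed to be a maximal run of overlapping fragments (the actual assembly and double-insert filtering are not shown on the slides), and the DP fallback for cores shorter than 50 bp is not reproduced. `assemble_contigs` and `trimmed_core` are illustrative names.

```python
import math
from collections import defaultdict


def assemble_contigs(fragments):
    """Group fragments into contigs of transitively overlapping intervals.

    fragments: iterable of (chrom, start, end) with half-open [start, end).
    Returns a list of contigs, each a list of (chrom, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in fragments:
        by_chrom[chrom].append((start, end))

    contigs = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        for start, end in sorted(by_chrom[chrom]):
            if current and start < current_end:
                # Overlaps the contig being built: extend it.
                current.append((chrom, start, end))
                current_end = max(current_end, end)
            else:
                # Gap: close the previous contig and start a new one.
                if current:
                    contigs.append(current)
                current, current_end = [(chrom, start, end)], end
        if current:
            contigs.append(current)
    return contigs


def trimmed_core(contig, trim=0.05):
    """Return the (start, end) core left after dropping the `trim` fraction
    of rightmost fragment starts and of leftmost fragment ends, or None if
    nothing remains. With trim=0 this is the plain intersection."""
    n = len(contig)
    starts = sorted(start for _, start, _ in contig)
    ends = sorted(end for _, _, end in contig)
    k = math.floor(trim * n)        # number of extreme starts/ends to discard
    core_start = starts[n - 1 - k]  # largest start that survives
    core_end = ends[k]              # smallest end that survives
    return (core_start, core_end) if core_start < core_end else None


# Toy data (hypothetical): 19 fragments plus one outlier starting at 250.
frags = [("chrIV", 100, 300)] * 19 + [("chrIV", 250, 300), ("chrIV", 1000, 1200)]
contigs = assemble_contigs(frags)
print(trimmed_core(contigs[0]))          # (100, 300)
print(trimmed_core(contigs[0], trim=0))  # (250, 300)
```

In the toy contig, trimming discards the single outlier start, so the core stays at 200 bp instead of shrinking to the 50 bp plain intersection. Cores that still come out shorter than 50 bp would be handed to the DP step, which this sketch does not cover.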
slide-95
SLIDE 95

[Figure: ARS-seq, miniARS-seq and OriDB tracks around ARS419 (coordinates 566,000 to 569,000; genes YOS9, TGL2, UBC5; ACS marked). ARS-seq and miniARS-seq refine current ARS knowledge]

slide-103
SLIDE 103

To boldly go where others have gone before...

  • 71 vs. 8: two-sided binomial test p-value = 9.7e-14
  • More information 5’ than 3’ of the oriented ACS
  • Still, a few (8) functional cores are right-skewed: is the 33 bp ACS sufficient for ARS function?
  • Marahrens and Stillman (Science, 1992) demonstrated that, in the case of ARS1, this extended ACS alone cannot initiate replication (common knowledge)
  • We tested 4 fragments derived from these 8 right-skewed mini cores, trimmed to coincide precisely with the 33 bp ACS
  • 3 were found to be functional!

[Figure: 71 “left-skewed” mini cores vs. 8 “right-skewed” mini cores; both panels labeled “ACS ≤ 5bp”]
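The 71 vs. 8 comparison on this slide can be reproduced with an exact binomial test. The sketch below is illustrative only: `skew` is a hypothetical reading of the “ACS ≤ 5bp” panels (coordinates oriented so the ACS reads 5' to 3' from left to right, and a core counted as left-skewed when its 3' edge lies within 5 bp of the ACS), and `two_sided_binomial_p` assumes a 50:50 null, under which doubling the smaller tail gives the exact two-sided p-value.

```python
from math import comb


def skew(core, acs, tol=5):
    """Hypothetical skew rule. core and acs are (start, end) pairs in
    coordinates oriented so the ACS reads 5'->3' from left to right."""
    gap_5p = acs[0] - core[0]  # core sequence 5' of the ACS
    gap_3p = core[1] - acs[1]  # core sequence 3' of the ACS
    if gap_3p <= tol < gap_5p:
        return "left"   # core extends mostly 5' of the ACS
    if gap_5p <= tol < gap_3p:
        return "right"  # core extends mostly 3' of the ACS
    return None         # neither pattern


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial test p-value for k of n under p = 0.5.

    The null is symmetric, so doubling the smaller tail is exact."""
    lo = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lo + 1))  # 2**n * P(X <= lo)
    return min(1.0, 2 * tail / 2**n)


# Counts from the slide: 71 left-skewed vs. 8 right-skewed mini cores.
p = two_sided_binomial_p(8, 71 + 8)
print(f"p = {p:.2g}")  # p = 9.7e-14
```

Summing integer binomial coefficients before the single division keeps the tiny tail probability exact until the final conversion to float, and the result matches the p-value reported on the slide.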

slide-104
SLIDE 104

Fragment length stats

slide-108
SLIDE 108

mutARS-seq: Deep Mutational Scanning of ARSs

[Diagram: Mutagenize ARS → Clone mutant variants → ARS screen; “Sequence” step shown twice]

slide-109
SLIDE 109

Acknowledgments

  • Joint work with Ivan Liachko, Rachel A. Youngblood and Maitreya J. Dunham from the Department of Genome Sciences, University of Washington, Seattle
  • Huge thanks to Ivan Liachko for making his wonderful slides available