Master’s Thesis Genome Assembly: Scaffolding Guided by Related Genomes
Runar Furenes
Department of Informatics University of Oslo
2013-06-05
Scaffolding Guided by Related Genomes 1 / 42
Presentation overview
Introduction
Problem specification
Methods
Materials
Results
Discussion
Questions
Introduction
Introduction Genome assembly
ACTCGCA GGCATGCA GGCTAAGCT CGGATTACC
Introduction Genome assembly
A scaffold consists of at least two contigs
Each contig within a scaffold is ordered and oriented
A gap estimate is provided for each pair of contigs
Mate pairs are commonly used in scaffolding
Introduction Motivation
Scaffolding is an important step in the process of genome assembly
Scaffolding often requires time-consuming and expensive lab work
Using genomes related to the target genome may make this easier
The continuous growth in the number of fully sequenced genomes available makes this increasingly relevant
Introduction Hypotheses
Related genomes can be helpful in scaffolding
Many such related genomes can be preferable to a few
It can be beneficial to use only the ends of contigs
Problem specification
Problem specification Scaffolding problem specific to this thesis
Related genomes may have nucleotide sequence similarities
Can contigs be scaffolded with high accuracy using one or more such related genomes?
More distantly related genomes have more sequence similarity at the protein level than at the nucleotide level
Can the same process run at the protein level instead?
Problem specification Earlier research on this subject
Existing tools:
ABACAS¹
GRASS²
These tools can use additional information, such as reference genome(s), in their scaffolding algorithms.
¹ Assefa et al. 2009
² Gritsenko et al. 2012
Methods
Methods Overview
GuideScaff is a pipeline producing scaffolds from contigs and guiding genomes
Use contigs or contig ends from an assembly
Match contigs with guiding genomes
Use agreeing matches to create scaffolds
Evaluate scaffolds against the target genome, if available
Methods Overview
Contigs are assumed to be more or less correct
Scaffolds consist of entire contigs
Contigs can map to multiple locations in a genome
Using only contig ends could make them easier to place
Methods Description
Manually, based on domain knowledge
Automatically, based on BLAST search or similar
Methods Description
Contigs with length ≥ 2N are replaced by N nucleotides from each end
Smaller contigs are kept intact
N is experimentally set
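The contig-end rule above can be sketched in a few lines of Python. This is a minimal illustration, not GuideScaff's actual code: the function names and the `_left` / `_right` naming of the end sequences are assumptions made for the example.

```python
def contig_ends(contig: str, n: int) -> list[str]:
    """Return the first and last n nucleotides if len(contig) >= 2n, else the contig itself."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(contig) >= 2 * n:
        return [contig[:n], contig[-n:]]
    return [contig]


def split_assembly(contigs: dict[str, str], n: int) -> dict[str, str]:
    """Replace long contigs by their two ends; keep smaller contigs intact.

    The '<id>_left' / '<id>_right' naming is a hypothetical convention.
    """
    result: dict[str, str] = {}
    for name, seq in contigs.items():
        parts = contig_ends(seq, n)
        if len(parts) == 2:
            result[f"{name}_left"] = parts[0]
            result[f"{name}_right"] = parts[1]
        else:
            result[name] = seq
    return result
```

With N = 3, a 10-nucleotide contig is reduced to its two 3-nucleotide ends, while a 4-nucleotide contig is passed through unchanged.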
Methods Description
Contig ends are aligned to each guiding genome using tools from MUMmer³
Initially on a nucleotide level
Re-aligned on a protein level if initial results are unsatisfactory
³ Kurtz et al. 2004
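As a rough illustration of this step, the sketch below builds MUMmer command lines for one guiding genome. It assumes the guiding genome is passed as the reference and the contig ends as the query, that `nucmer` is used at the nucleotide level and `promer` at the protein level, and that `show-tiling` produces the tiling. The slides do not say which options GuideScaff passes, so only the output prefix option `-p` is used here; the function name and file names are hypothetical.

```python
def alignment_commands(guide_fasta: str, contigs_fasta: str, prefix: str,
                       protein_level: bool = False) -> tuple[list[str], list[str]]:
    """Build (align, tiling) command lines for one guiding genome.

    nucmer aligns nucleotide sequences directly; promer compares the
    six-frame translations, which suits more distantly related genomes.
    """
    aligner = "promer" if protein_level else "nucmer"
    # Both aligners write their alignments to '<prefix>.delta'.
    align = [aligner, "-p", prefix, guide_fasta, contigs_fasta]
    # show-tiling reads the delta file and prints a tiling of the query contigs.
    tiling = ["show-tiling", f"{prefix}.delta"]
    return align, tiling
```

Each command list could be handed to `subprocess.run`, capturing the standard output of `show-tiling` as the tiling for that guiding genome. Since the guiding genomes are aligned independently, the commands for different genomes can run in parallel.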
Methods Description
A tiling of contigs is produced for each guiding genome
Tilings are processed to create a distance matrix
Links between contigs are created based on this matrix
Links are created in a greedy manner
At least t guiding genomes must support each link created
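The linking step can be illustrated with a small greedy sketch. Several details are assumptions, since the slides do not specify them: a tiling is modelled as a list of `(contig_id, start, end)` tuples sorted by start with half-open coordinates, only contigs adjacent in a tiling yield candidate links, candidates are ranked by support and then by the smallest median gap, and contig orientation is ignored. The function name `greedy_links` is hypothetical.

```python
from collections import defaultdict
from statistics import median


def greedy_links(tilings, t):
    """Return accepted links as (contig_a, contig_b, median_gap) tuples.

    tilings: one list per guiding genome of (contig_id, start, end),
             sorted by start (assumed input format).
    t:       minimum number of guiding genomes that must support a link.
    """
    # Distance "matrix" stored sparsely: contig pair -> one gap per supporting genome.
    gaps = defaultdict(list)
    for tiling in tilings:
        for (a, _, a_end), (b, b_start, _) in zip(tiling, tiling[1:]):
            if a != b:
                gaps[tuple(sorted((a, b)))].append(b_start - a_end)

    candidates = [(pair, g) for pair, g in gaps.items() if len(g) >= t]
    # Greedy order: most support first, then smallest absolute median gap.
    candidates.sort(key=lambda c: (-len(c[1]), abs(median(c[1])), c[0]))

    parent = {}

    def find(x):
        # Union-find lookup, used to reject links that would close a cycle.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = defaultdict(int)
    links = []
    for (a, b), g in candidates:
        # A contig has two ends, so it can take part in at most two links.
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            links.append((a, b, median(g)))
    return links
```

Raising `t` discards links seen in too few guiding genomes, which trades the number of contigs placed for confidence in each placement.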
Methods Description
Linked contigs are converted to scaffolds
A gap estimate of n corresponds to n N symbols
A negative gap estimate means contig overlap
An attempt is made to merge contigs marked as overlapping
Contigs in the opposite orientation are converted to their reverse complement
ACCGGTTANNNNNNACCAGGTTAACNNNNACGGTTT
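A minimal sketch of this conversion follows, in the spirit of the example sequence above. The input format (a list of `(sequence, is_reversed)` pairs plus one gap estimate per junction) and the function names are assumptions. The fallback used when an estimated overlap does not match, simply placing the contigs next to each other, is also an assumption, since the slides only say that a merge is attempted.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(contigs: list[tuple[str, bool]], gaps: list[int]) -> str:
    """Join oriented contigs, writing a gap estimate of n as n 'N' symbols.

    contigs: (sequence, is_reversed) pairs in scaffold order.
    gaps:    one gap estimate per pair of consecutive contigs.
    """
    if not contigs or len(gaps) != len(contigs) - 1:
        raise ValueError("need at least one contig and one gap per junction")
    oriented = [reverse_complement(s) if rev else s for s, rev in contigs]
    scaffold = oriented[0]
    for gap, seq in zip(gaps, oriented[1:]):
        if gap >= 0:
            scaffold += "N" * gap + seq
            continue
        overlap = -gap
        if scaffold[-overlap:] == seq[:overlap]:
            # The sequences agree over the estimated overlap: merge them.
            scaffold += seq[overlap:]
        else:
            # Assumed fallback: the merge failed, so abut the contigs.
            scaffold += seq
    return scaffold
```

Three contigs joined with gap estimates of 6 and 4 produce a scaffold with runs of six and four `N` symbols between them; a gap estimate of -3 merges two contigs whose last and first three nucleotides match.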
Methods Evaluation
N50
Breakpoints
Methods Evaluation
Defined as “the size of the smallest contig (or scaffold) such that 50% of the genome is contained in contigs [or scaffolds] of size N50 or larger”⁴
Gives information about scaffold sizes only
⁴ Gritsenko et al. 2012
Methods Evaluation
Breakpoints are specific errors made in the resulting scaffolds. These errors are:
Contigs mapping to different chromosomes in the target genome
Incorrect relative orientations of contigs inside a scaffold
Incorrect relative ordering of contigs inside a scaffold
Gap estimates more than a certain number of nucleotides off from the true distance
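The four breakpoint types can be checked for a single pair of neighbouring contigs once their true placements on the target genome are known. The sketch below is one possible formulation, not the thesis's evaluation code: the `Placement` record, the half-open coordinates, the error labels, and the rule that a scaffold may match the target on either strand are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """True position of a contig on the target genome (hypothetical record)."""
    chromosome: str
    start: int            # half-open interval [start, end)
    end: int
    reverse_strand: bool


def link_errors(a: Placement, b: Placement, a_rev: bool, b_rev: bool,
                gap_estimate: int, delta: int) -> list[str]:
    """Classify errors for contig a followed by contig b in a scaffold.

    a_rev / b_rev: orientation of each contig inside the scaffold.
    delta:         tolerated difference between estimated and true gap.
    """
    if a.chromosome != b.chromosome:
        return ["chromosome"]
    errors = []
    # Relative orientation in the scaffold must equal the true relative orientation.
    if (a_rev != b_rev) != (a.reverse_strand != b.reverse_strand):
        errors.append("orientation")
    # If a is placed as in the target, a should come first; otherwise the
    # scaffold is read along the reverse strand and b should come first.
    forward = a_rev == a.reverse_strand
    first, second = (a, b) if forward else (b, a)
    if first.start > second.start:
        errors.append("order")
    left, right = sorted((a, b), key=lambda p: p.start)
    true_gap = right.start - left.end
    if abs(true_gap - gap_estimate) > delta:
        errors.append("gap")
    return errors
```

Calling this once per adjacent pair with `delta` set to 500 or 10,000 gives counts comparable in kind to the gap error rows reported in the results.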
Materials
Materials Chosen target genomes and contigs
Escherichia coli str. K-12 substr. MG1655: 1 chromosome
Pseudoxanthomonas suwonensis 11-1: 1 chromosome
Rhodobacter sphaeroides 2.4.1: 2 chromosomes, 5 plasmids
Staphylococcus aureus subsp. aureus USA300 TCH1516: 1 chromosome, 2 plasmids
Materials Chosen target genomes and contigs
Escherichia coli str. K-12 substr. MG1655: 481 contigs
Pseudoxanthomonas suwonensis 11-1: 303 contigs
Rhodobacter sphaeroides 2.4.1: 809 contigs
Staphylococcus aureus subsp. aureus USA300 TCH1516: 301 contigs
All contigs were produced⁵ with Velvet, using short paired-end reads from Illumina sequencing technologies.
⁵ by Gritsenko et al. 2012 and Salzberg et al. 2012
Materials Chosen guiding genomes
Escherichia coli str. K-12 substr. MG1655: 10 genomes from the same species, but different strains
Pseudoxanthomonas suwonensis 11-1: 10 genomes, none from the same species as the target genome
Rhodobacter sphaeroides 2.4.1: 3 genomes from the same species, 7 genomes from other species
Staphylococcus aureus subsp. aureus USA300 TCH1516: 10 genomes from the same species, but different strains
Results
Results One guiding genome
Entire contigs
# Contigs: 481 303 583 162
# Contigs used: 421 97 387 94
# Scaffolds: 4 3 13 3
N50 scaffolds: 2,465,078 3,169,365 2,730,310 2,016,698
Different chromosomes: 1
Different orientations: 37 3
Different order: 38 2
Gap errors > 500: 15 77 36 12
Gap errors > 10,000: 2 61 18 8

Contig ends with N = 1,000
Contig end length: 1,000 1,000 1,000 1,000
# Contigs: 481 303 583 162
# Contigs used: 433 65 389 98
# Scaffolds: 19 27 64 17
N50 scaffolds: 597,757 46,936 84,362 252,200
Different chromosomes: 2
Different orientations: 6 3
Different order: 5 1 2
Gap errors > 500: 13 27 43 24
Gap errors > 10,000: 6 21 17 16
Results Multiple guiding genomes
The following plots show the effect on different metrics of increasing the threshold value t when running GuideScaff on all datasets.
Results Multiple guiding genomes
[Figure, two panels: N50 of scaffolds produced versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of contigs used in scaffolds versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of contig pairs from different chromosomes versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of contigs placed in wrong order versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of contigs placed in wrong orientations versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of gap estimates exceeding ∆ = 500 versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Results Multiple guiding genomes
[Figure, two panels: number of gap estimates exceeding ∆ = 10,000 versus the number of guiding genomes t required to agree (1 to 10), with one curve each for Escherichia coli, Pseudoxanthomonas suwonensis, Rhodobacter sphaeroides, and Staphylococcus aureus]
Discussion
Discussion Analysis of proposed method
Can handle an arbitrary number of guiding genomes
Alignments can be done independently, and therefore in parallel
Scaffold precision increases with an increasing agreement threshold value
Discussion Analysis of proposed method
GuideScaff can be used as it is to provide scaffolds in a fast and inexpensive way
It could be used as a supplement to other scaffolding algorithms
Discussion Analysis of proposed method
A global optimization could be attempted instead of the greedy algorithm
Mate-pair information could be utilized if available
Contig end length could be set dynamically
Guiding genomes could be weighted differently
Discussion Conclusion
GuideScaff works as a proof of concept
Related genomes can indeed be useful in scaffolding
One guiding genome may suffice
Requiring at least two guiding genomes to agree decreases all types of errors
Using contig ends increases the scaffold correctness when genomes are very dissimilar
Questions
https://github.com/runarfu/GuideScaff