
SLIDE 1

Challenges of ancient genomics and pan-genomics

Kay Nieselt, Center for Bioinformatics Tübingen, University of Tübingen

SLIDE 2

Overview

Introduction:
ancient DNA
(microbial) paleogenomics
Challenges of computational paleogenomics


SLIDE 3

Ancient DNA - Paleogenomics

With the advent of NGS, studying extinct organisms has become possible. The most prominent extinct organisms are:

Neandertals
Other humans, e.g. Denisovans
Mammoth
Lately also: bacteria and viruses

Field of paleogenomics: reconstruction and analysis of ancient genomes


SLIDE 4

Microbial Paleogenomics

Motivation:

Emergence of human diseases
Evolution of Bacteria and Viruses
Co-evolution with host


SLIDE 5

Computational Paleogenomics

1. Genome assembly:
reference-based (reconstruction of genome via mapping)
de novo
2. SNP calling
3. Genome comparison
4. Phylogeny reconstruction


SLIDE 6

Ancient DNA - Workflow

Source of Figure: Janet Kelso, Keynote BioVis 2016

SLIDE 7

Ancient DNA - Issues

Short fragments:

mean fragment length between 60 and 150 bp

Damaged:

aDNA is chemically damaged -> C->T conversion at the 5’ end of the fragment (and G->A at the 3’ end)

Metagenomes:

sample contains complex background (mixture of ancient and modern DNA)

Contamination:

the complex background can also be due to contamination

Low amount of endogenous DNA


SLIDE 8

Ancient DNA - Approaches

Short fragments:

Produce paired-end reads longer than the mean fragment length; after sequencing, the forward and reverse reads are merged into a single read

Figure: forward read and reverse read, their overlap, and the resulting merged read
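
The merging step can be sketched as follows. This is a minimal illustration, not the Clip&Merge implementation: it assumes adapters are already clipped, requires an exact-match overlap, and ignores base qualities. The names `reverse_complement`, `merge_pair` and `min_overlap` are hypothetical.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair into one sequence, or return None if no overlap is found.

    Assumption: the 3' end of the forward read overlaps the 5' end of the
    reverse-complemented reverse read without mismatches.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Usage: a 30 bp fragment sequenced with 20 bp reads from both ends.
fragment = "ACGTTGCAAGGCTTAACCGGTACGATCGAT"
forward_read = fragment[:20]
reverse_read = reverse_complement(fragment[-20:])
merged = merge_pair(forward_read, reverse_read)
```

A real merger would tolerate mismatches in the overlap and pick the base with the higher quality score at each conflicting position.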

SLIDE 9

Ancient DNA - Approaches

Damage:

damage patterns are used for authentication; afterwards, samples are often treated with UDG (uracil-DNA glycosylase)
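
How a damage pattern supports authentication can be sketched by counting C->T mismatches per position from the 5’ end. This is a simplified illustration and not the MapDamage implementation; the input format (gap-free, equal-length reference/read pairs in read orientation) and the name `c_to_t_profile` are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def c_to_t_profile(pairs: Iterable[Tuple[str, str]], n_positions: int = 5) -> List[float]:
    """Fraction of reference C bases read as T, per position from the 5' end.

    pairs: (reference_segment, read) tuples, assumed gap-free and aligned
    base by base, both given in the orientation of the read.
    """
    ref_c = Counter()   # number of reference C bases seen at each position
    c_to_t = Counter()  # number of those that appear as T in the read
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:n_positions], read[:n_positions])):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(n_positions)]


# Toy input: damage concentrated at the first read position.
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGCAT", "CGCAT"),
    ("ACCGT", "ACTGT"),
]
profile = c_to_t_profile(toy_pairs)
```

In authentic aDNA the C->T rate is expected to be highest at the first positions and to decay towards the interior of the fragment; a flat profile suggests modern DNA.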


SLIDE 10

Ancient DNA - Issues

Metagenomes:

sample contains complex background (mixture of ancient and modern DNA)
Possible approach: use MALT¹ to characterize taxonomic content

Low amount of endogenous DNA:

Approach: specific enrichment for target species
Problems: modern reference, no de novo assembly, …

¹ Herbig et al., bioRxiv 2016

SLIDE 11

EAGER¹ – a fully automated (ancient) genome reconstruction pipeline


Figure: EAGER pipeline overview
Preprocessing (FastQ): Clip&Merge, FastQC
Read Mapping (FastQ, SAM, BAM): BWA, BWA-mem, …, Samtools, DeDup, QualiMap, MapDamage, Preseq, CircularMapper
Genotyping (VCF, FastA): GATK (Preprocessing, HaplotypeCaller, UnifiedGenotyper, VariantFiltration), VCF2Genome, Schmutzi, ANGSD (genotype likelihoods)

¹ Peltzer et al., Genome Biol. 2016

SLIDE 12

Challenge 1: Genome Reconstruction

Question: how to reconstruct the genome of interest from the raw reads:

by mapping against a reference (only one?), or
by de novo assembly, or
by a hybrid approach?


SLIDE 13

Challenge 1: Genome Reconstruction

Approaches:

by mapping against a reference:

Problems: 1) single-reference mapping, 2) for aDNA, only modern references are used

by de novo assembly:

Problems: low coverage, almost never mate pairs, ...
Possible approach: MADAM¹ – improved ancient DNA genome assembly

by a hybrid of both: we are not aware of an established method

¹ Seitz and Nieselt, PeerJ Preprints 2016

SLIDE 14

Challenge 2: Calling SNPs

To call SNPs after mapping, the typical (best-practice) approach is:

Apply tools such as GATK, freebayes or ANGSD to mapping assemblies (i.e., BAM files)

Challenge: how to efficiently call SNPs from de novo assemblies when the input is low-coverage (ancient) DNA?


SLIDE 15

Challenge 3: Genome Comparison

Comparison of ancient and modern genomes:

Large-scale variations: genomic rearrangements (translocations, inversions)
Small-scale variations: gene content, insertions, deletions, SNPs


SLIDE 16

Challenge 3: Genome Comparison

Ancient genomes: mostly built from a single common reference
Modern genomes: assembled
Challenge: how to compare?


SLIDE 17

Challenge 3: Genome Comparison

Comparative analysis based on genomic positions is challenging due to different coordinate systems across genomes.
Possible approaches:

compare genomes via one specific reference

But: genomic regions that cannot be aligned to the reference are lost.

or


SLIDE 18

Build a metareference that contains the superset of all genomes

SuperGenome¹ (others call this a pan-genome)

¹ Herbig et al., Bioinformatics 2012

SLIDE 19

SuperGenome


SuperGenome: common coordinate system of all aligned genomes, independent of a prechosen reference genome, together with an injective mapping of each genome onto the SuperGenome
Input: WGA (whole-genome alignment) of all known genomes of the species of interest
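
The idea of a common coordinate system with an injective mapping per genome can be sketched on a toy alignment. This is a simplification under stated assumptions: a single collinear alignment block given as equal-length gapped rows, with no rearrangements. It is not the published SuperGenome algorithm, and the names `build_supergenome_maps` and `translate` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


def build_supergenome_maps(alignment: Dict[str, str], gap: str = "-") -> Dict[str, List[int]]:
    """Map every genome position to its alignment column (the shared coordinate).

    alignment: genome name -> gapped alignment row, all rows of equal length.
    The returned lists are strictly increasing, so each mapping is injective.
    """
    if len({len(row) for row in alignment.values()}) != 1:
        raise ValueError("all alignment rows must have the same length")
    return {
        name: [col for col, base in enumerate(row) if base != gap]
        for name, row in alignment.items()
    }


def translate(maps: Dict[str, List[int]], src: str, pos: int, dst: str) -> Optional[int]:
    """Translate a position in genome src to genome dst via the shared coordinate.

    Returns None if dst has a gap at that column.
    """
    column = maps[src][pos]
    i = bisect_left(maps[dst], column)
    if i < len(maps[dst]) and maps[dst][i] == column:
        return i
    return None


toy_alignment = {"A": "ACG-TA", "B": "A-GCTA"}
maps = build_supergenome_maps(toy_alignment)
```

Because no genome is privileged, regions present in only one genome keep their own columns instead of being lost, which is the drawback of comparing via one specific reference.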

SLIDE 20

The SuperGenome and Pan-Genome

Nice "side effect" of the SuperGenome: it allows

consistent assignment of coordinates to genomic annotations (e.g. genes)
straightforward determination of the pan-genome* by determining the orthologs from overlapping coordinates in the SuperGenome

(* For us the pan-genome (aka supra-genome) is the full complement of all genes of a clade, e.g. species)
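
Determining orthologs from overlapping coordinates might look like the following sketch. The grouping rule (greedy, with an overlap of at least half of the shorter interval) and all names are assumptions chosen for illustration; the criteria used in the actual SuperGenome work may differ.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]  # (genome, gene_id, start, end), end exclusive


def group_by_overlap(genes: List[Gene], min_overlap: float = 0.5) -> List[Dict]:
    """Greedily group genes whose shared-coordinate intervals overlap.

    A gene joins a group if it overlaps the group's span by at least
    min_overlap times the length of the shorter of the two intervals.
    """
    groups: List[Dict] = []
    for genome, gene_id, start, end in sorted(genes, key=lambda g: (g[2], g[3])):
        for group in groups:
            g_start, g_end = group["span"]
            overlap = min(end, g_end) - max(start, g_start)
            shorter = min(end - start, g_end - g_start)
            if overlap > 0 and overlap >= min_overlap * shorter:
                group["members"].append((genome, gene_id))
                group["span"] = (min(start, g_start), max(end, g_end))
                break
        else:
            groups.append({"span": (start, end), "members": [(genome, gene_id)]})
    return groups


toy_genes = [
    ("A", "a1", 0, 100), ("B", "b1", 5, 105), ("C", "c1", 10, 95),
    ("A", "a2", 200, 300), ("B", "b2", 400, 500),
]
groups = group_by_overlap(toy_genes)
all_genomes = {g[0] for g in toy_genes}
pan_genome_size = len(groups)
core = [g for g in groups if {m[0] for m in g["members"]} == all_genomes]
```

Here every group counts towards the pan-genome, while only groups with a member in every genome form the core.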


SLIDE 21

Challenge 3A: WGA

Sheer number of available genomes of a single species: currently up to thousands
Challenge: how to compute a WGA of thousands of genomes?


SLIDE 22

Challenge 4: Phylogenetics

Phylogenetic trees from aDNA and modern genomes are reconstructed

1. to assess history of evolution,
2. to date the 'root' and/or the emergence of specific clades.


SLIDE 23

Challenge 4: Phylogenetics

How reliable are phylogenetic trees built from genomic data reconstructed from short reads?
Typical workflow:

1. map short reads to a single reference sequence,
2. extract single nucleotide polymorphisms (SNPs),
3. infer phylogenetic tree from the aligned SNP positions.
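
Step 2 of this workflow, reduced to its core, can be sketched as follows. The sketch assumes that every sample has already been turned into a sequence in reference coordinates (one string per sample, all of equal length, 'N' for uncovered positions); real pipelines call SNPs from BAM files with a genotyper instead. The function names are hypothetical.

```python
from typing import Dict, List


def snp_columns(aligned: Dict[str, str], ignore: str = "N-") -> List[int]:
    """Return the columns at which at least two different bases are observed.

    Characters in ignore (missing data, gaps) do not count as alleles.
    """
    length = len(next(iter(aligned.values())))
    ignored = set(ignore)
    columns = []
    for i in range(length):
        alleles = {seq[i] for seq in aligned.values()} - ignored
        if len(alleles) > 1:
            columns.append(i)
    return columns


def snp_alignment(aligned: Dict[str, str]) -> Dict[str, str]:
    """Reduce every sequence to its bases at the variable columns."""
    columns = snp_columns(aligned)
    return {name: "".join(seq[i] for i in columns) for name, seq in aligned.items()}


toy = {"ref": "ACGTACGT", "s1": "ACGTACGA", "s2": "ATGTNCGA"}
variable = snp_columns(toy)
reduced = snp_alignment(toy)
```

The reduced alignment is what typically goes into tree inference in step 3, which is exactly where the question of SNP-only bias arises.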


SLIDE 24

Biased phylogenies? - I

1. Source of common alignment?

Reads mapped to one reference -> bias?
Possible answer: REALPHY¹ (Reference sequence Alignment-based Phylogeny builder)

¹ Bertels et al., Mol. Biol. Evol. 2014

SLIDE 25

Biased phylogenies? - II

2. SNPs versus whole genome-based phylogeny?

Bias of ML trees reconstructed from SNP-only positions was shown by Bertels et al.¹
Possible approach to investigate further: TreeToReads² - a pipeline for simulating raw reads from phylogenies

¹ Bertels et al., Mol. Biol. Evol. 2014
² McTavish et al., bioRxiv 2016

SLIDE 26

Biased phylogenies? - III

3. Influence of 'Ns' on phylogeny?

Many aDNA samples suffer from a low amount of endogenous DNA -> low-coverage genomes -> missing positions ('Ns')
Questions / challenges:

impact of 'complete column deletion' versus 'partial column deletion'
threshold of coverage for a 'good' phylogeny
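
The two deletion strategies can be sketched as one column filter with a tunable threshold. The sketch assumes the common meaning of the terms: complete deletion drops every column with at least one missing character, partial deletion drops a column only when the fraction of missing characters exceeds a cutoff. The name `filter_columns` and its parameters are hypothetical.

```python
from typing import Dict


def filter_columns(
    aligned: Dict[str, str],
    max_missing_fraction: float = 0.0,
    missing: str = "N-",
) -> Dict[str, str]:
    """Keep alignment columns whose missing fraction is at most the threshold.

    max_missing_fraction = 0.0 corresponds to complete column deletion;
    larger values correspond to partial column deletion.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    keep = [
        i for i in range(length)
        if sum(aligned[n][i] in missing for n in names) / len(names) <= max_missing_fraction
    ]
    return {n: "".join(aligned[n][i] for i in keep) for n in names}


toy = {"a": "ACGTN", "b": "ANGTA", "c": "ACNTA", "d": "ACGTA"}
complete = filter_columns(toy)          # complete deletion
partial = filter_columns(toy, 0.25)     # tolerate up to 25% missing per column
```

With many low-coverage samples, complete deletion can remove most of the alignment, while partial deletion keeps more columns at the cost of trees built on incomplete data.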


SLIDE 27

Biased phylogenies? - IV

4. SNP-based phylogenies without alignment and/or genome reconstruction?
Possible approach: kSNP3.0³

5. General challenge: alignment-free versus alignment-based phylogenies?⁴
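
To make the term alignment-free concrete, here is a generic k-mer-based distance. This is not the kSNP3.0 algorithm, only an illustration of comparing genomes without aligning them; the choice of the Jaccard distance, the value of k and all names are assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers of seq, skipping k-mers that contain 'N'."""
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs: Dict[str, str], k: int = 4) -> Dict[Tuple[str, str], float]:
    """Pairwise distances, usable as input for a distance-based tree method."""
    return {
        (x, y): jaccard_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }


toy = {"g1": "ACGTACGT", "g2": "ACGTACGA", "g3": "CCCCCCCC"}
dist = distance_matrix(toy)
```

Such a matrix could be passed to a distance-based method such as neighbor joining; how trees obtained this way compare with alignment-based ones is the open question raised on this slide.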

³ Gardner et al., Bioinformatics 2015
⁴ Haubold, Brief. Bioinformatics 2013

SLIDE 28

Summary:

Challenges of ancient (pan-)genomics:

1. Reconstruction of ancient genomes from bacteria (or viruses) via mapping, assembly or a hybrid approach

2. SNP calling of low coverage genomes
3. WGA of ancient and modern genomes
4. Phylogenetic tree reconstruction from ancient and modern genomes


SLIDE 29

References

Bertels F, Silander OK, Pachkov M, Rainey PB & van Nimwegen E (2014). Automated reconstruction of whole-genome phylogenies from short-sequence reads. Molecular Biology and Evolution, 31, 1077-1088.

Gardner SN, Slezak T & Hall BG (2015). kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics, 31, 2877.

Haubold B (2014). Alignment-free phylogenetics and population genetics. Briefings in Bioinformatics, 15, 407-418.

Herbig A, Jäger G, Battke F & Nieselt K (2012). GenomeRing: alignment visualization based on SuperGenome coordinates. Bioinformatics, 28, i7-i15.

Herbig A, Maixner F, Bos KI, Zink A, Krause J & Huson DH (2016). MALT: Fast alignment and analysis of metagenomic DNA sequence data applied to the Tyrolean Iceman. bioRxiv, 050559.

McTavish EJ, Pettengill J, Davis S, Rand H, Strain E, Allard M & Timme RE (2016). TreeToReads - a pipeline for simulating raw reads from phylogenies. bioRxiv, 037655.

Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J & Nieselt K (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17, 1.

Seitz A & Nieselt K (2016). Improving ancient genome assembly. PeerJ Preprints.
