Apicomplexan Genome Sequencing in Sanger: Arnab Pain, The Pathogen Sequencing Unit (PowerPoint presentation)



SLIDE 1

Apicomplexan Genome Sequencing in Sanger

Arnab Pain, The Pathogen Sequencing Unit (PSU) 2nd November, 2005; ICGEB, New Delhi

SLIDE 2

Overview

  • Overview of Apicomplexan genome sequencing projects in Sanger

  • Update on Plasmodium genome projects
  • Theileria and Tropical Theileriosis
  • Sequencing & annotation strategies
  • Genome architecture
  • Gene families
  • Metabolic reconstruction
  • Comparative genomics (e.g. dN/dS analysis, synteny)
  • SNPs/INDELs
SLIDE 3

Apicomplexans and ongoing genome sequencing projects

SLIDE 4

Overview of Eukaryotic Pathogen Sequencing in PSU

SLIDE 5

The World of Apicomplexans!

[Figure: apicomplexan parasites and the diseases they cause. Labels: intestinal disease in mammals; parasitises brain/kidney of rodents; malaria; tropical theileriosis; babesiosis; peripheral eosinophilia (30%); brain infection (chinchilla); parasitises heart muscle/brain; toxoplasmosis; parasitic disease of falcons; bowel disease in humans; coccidiosis; cryptosporidiosis; insect gut parasites; dog parasites; oyster parasite]

SLIDE 6

Update on Plasmodium genome projects

  • P. falciparum (Gardner et al. 2002)

– 3D7: 3 gaps
– Clinical isolate: 8x
– IT strain: 31,000 reads (0.8x)

  • P. reichenowi (3x shotgun in progress)
  • P. knowlesi (3x shotgun)
  • P. berghei (3x shotgun complete)
  • P. yoelii (5.6x, Carlton et al. 2002)
  • P. chabaudi (3x shotgun complete)
  • P. gallinaceum (3x shotgun in progress)
  • P. vivax (complete sequencing)

Additional status labels: 8x complete, prefinishing, annotation; 8x complete, some finishing; 8x complete, prefinish

SLIDE 7

Theileria and Tropical Theileriosis

SLIDE 8

Genus-Theileria

Theileria parva Parasite of cattle: East/Central Africa

‘East-Coast Fever’

Theileria annulata Parasite of cattle: S. Europe, North Africa, Middle East, Asia

‘Tropical Theileriosis’

Theileria hirci Parasite of sheep/goats: S. Europe, North Africa, Middle East & Asia

(B. Shiels)

SLIDE 9

Theileria annulata

  • Disease: Tropical Theileriosis
  • 250 million cattle at risk
  • Pathogenic in exotic animals, up to 70% mortality
  • Mild to moderate pathogenicity in indigenous breeds, but productivity loss

SLIDE 10

Theileria parva

  • Disease: East Coast Fever
  • 50 million cattle at risk
  • Highly pathogenic in naïve animals: 97-100% mortality rates

SLIDE 11

Theileria Life Cycle

[Figure: Theileria life cycle. Labels: clonal expansion of infected cells; H.nuc; macroschizont; merozoite production; merozoites; piroplasm-infected erythrocytes]

SLIDE 12

Clinical Pathology: Tropical Theileriosis (T. annulata)

  • Fever follows lymph node enlargement
  • Marked anaemia: pale mucous membranes, which may become jaundiced
  • Diarrhoea/blood-stained faeces common
  • Sub-acute/chronic cases show intermittent fever, anaemia and jaundice
  • Poor condition; convalescence is protracted

SLIDE 13

Sequencing & annotation strategies

SLIDE 14

Shotgun sequencing

[Figure: shotgun assembly schematic. Labels: DNA; contiguous sequence; STS-1, STS-2, STS-3, STS-4; pUC clone end sequence; physical gap; sequence gap; large clone end sequence; “scaffold”]

SLIDE 15

Strategy

  • Separate chromosomes by PFGE.
  • Shotgun sequence individual chromosomes.
  • Align contigs to the map and close gaps using PCR/primer walking.

SLIDE 16

Map Resources

  • Mapped STS markers

– Short sequence markers, mapped genetically.

  • Mapped YAC clones

– Reads from mapped YAC clones align with contigs and thus position them.

  • Optical map

– DNA fragments from partial digestion of the genome are sized optically and tiled, providing ordered restriction fragments.

  • HAPPY map

– Fragmented DNA is diluted and replicated; STS markers are detected by PCR.
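All of these map resources serve one purpose: placing and orienting contigs along a chromosome. A minimal Python sketch of the STS case follows, assuming each contig has already been searched for marker sequences and each marker has a known map position. The names `order_contigs`, `marker_map` and `contig_hits` are hypothetical, and a real pipeline would also have to reconcile conflicting evidence from YAC, optical and HAPPY maps.

```python
from statistics import median


def order_contigs(marker_map, contig_hits):
    """Place and orient contigs along a chromosome using mapped STS markers.

    marker_map:  STS marker name -> map position (any monotonic unit).
    contig_hits: contig name -> list of (marker name, offset within the contig).
    Returns (ordered, unplaced): ordered is a list of (contig, orientation)
    sorted by map position; unplaced lists contigs without a mapped marker.
    """
    placed, unplaced = [], []
    for contig, hits in contig_hits.items():
        # Keep only hits to markers that have a map position.
        mapped = [(marker_map[m], off) for m, off in hits if m in marker_map]
        if not mapped:
            unplaced.append(contig)
            continue
        position = median(pos for pos, _ in mapped)
        orientation = "+"
        if len(mapped) >= 2:
            # Reverse orientation if map position decreases along the contig.
            mapped.sort(key=lambda pair: pair[1])
            if mapped[-1][0] < mapped[0][0]:
                orientation = "-"
        placed.append((position, contig, orientation))
    placed.sort()
    return [(contig, orient) for _, contig, orient in placed], unplaced


# Toy example with hypothetical marker positions and contig hits.
marker_map = {"STS-1": 10, "STS-2": 55, "STS-3": 120, "STS-4": 180}
contig_hits = {
    "contigA": [("STS-3", 500), ("STS-2", 9000)],
    "contigB": [("STS-1", 200)],
    "contigC": [("STS-4", 100)],
    "contigD": [],
}
ordered, unplaced = order_contigs(marker_map, contig_hits)
```

Here contigA carries STS-3 before STS-2, so it is placed between the other two contigs and reverse-complemented; contigD has no marker and would need another map resource or PCR to place it.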

SLIDE 17

Curating gene models in Artemis

Use of multiple lines of evidence

SLIDE 18
T. annulata Genome

From Karyotype:

  • Four chromosomes

– 2.6 Mb (3 gaps)
– 2 Mb (finished)
– 1.9 Mb (2 small gaps)
– 1.8 Mb (finished)

From Sequencing:

  • Number of bases: 8,351,610
  • Gene number: 3792
  • Genes with orthologues in T. parva: 3265
  • GC percentage: 32.5
  • Unique T. annulata genes: 34 (60 in T. parva)
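The GC percentage above is the fraction of G and C among the called bases of the assembly. A minimal Python sketch, assuming the chromosome sequences are already loaded as plain strings (FASTA parsing omitted; `gc_percent` is a hypothetical helper name):

```python
def gc_percent(sequences):
    """Return the GC percentage over an iterable of DNA sequences.

    Only unambiguous A/C/G/T bases are counted, so runs of N in gapped
    scaffolds do not dilute the value.
    """
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


# Toy example: 4 G/C out of 10 called bases (the N run is ignored).
example = gc_percent(["ATGCAT", "NNNNGCTA"])
```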

SLIDE 19

Gene finding & Annotation

SLIDE 20

e.g. Theileria annulata (~8.4 Mb)

[Figure: total contig length vs. number of reads, and contig number vs. number of reads, with 3X, 4X, 6X and 8X coverage marked]
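The coverage levels marked on these plots relate read count to genome size. The classic Lander-Waterman model gives a rough expectation of how completeness and contig number change with coverage. A minimal Python sketch, assuming uniformly random read starts, no minimum-overlap requirement, and a hypothetical mean read length of 650 bp (not stated on the slides); `shotgun_stats` is a hypothetical helper name:

```python
import math


def shotgun_stats(n_reads, read_len, genome_size):
    """Lander-Waterman estimates for a random shotgun project.

    Assumes reads start uniformly at random and that any overlap is enough to
    join two reads (minimum-overlap fraction of zero).
    """
    coverage = n_reads * read_len / genome_size
    return {
        "coverage": coverage,
        # Probability that a given base is covered by at least one read.
        "fraction_covered": 1 - math.exp(-coverage),
        # Expected number of contigs ("islands").
        "expected_contigs": n_reads * math.exp(-coverage),
    }


GENOME = 8_351_610   # T. annulata assembly size from the genome summary
READ_LEN = 650       # hypothetical mean read length, for illustration only

for target in (3, 4, 6, 8):
    reads = round(target * GENOME / READ_LEN)
    stats = shotgun_stats(reads, READ_LEN, GENOME)
    print(f"{target}X: {reads} reads, "
          f"{stats['fraction_covered']:.4f} covered, "
          f"~{stats['expected_contigs']:.0f} contigs")
```

These are idealised numbers; repeats and cloning bias usually leave real assemblies more fragmented than the model predicts.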

SLIDE 21

Genome architecture

SLIDE 22

The Chromosomes of P. falciparum

[Figure: schematic of P. falciparum chromosomes. Labels: telomere; subtelomeric repeats (TARE2-5, Rep20); var genes; rifin/stevor genes; other gene families; regions labelled antigenic variation, house-keeping, antigenic variation]

SLIDE 23

Chromosome Structure (Theileria)

[Figure: schematic of a Theileria chromosome. Labels: (T)TTAGGG telomere; repeats; gene families with copy-number ranges Family_1 (up to 28), Family_3 (0-3), Family_5 (1-3), other families (0-2); secreted antigens; house-keeping genes; putative centromere]
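Telomeric arrays like the (T)TTAGGG unit in this figure can be located at contig ends with a simple regular-expression scan. This sketch assumes the repeat unit is an optional T followed by TTAGGG (as labelled above) and also searches its reverse complement, CCCTAA(A); the window size, the three-copy minimum and the function name `telomeric_arrays` are hypothetical choices.

```python
import re

# Forward-strand unit: an optional extra T followed by TTAGGG, 3+ copies.
FORWARD = re.compile(r"(?:T?TTAGGG){3,}")
# Reverse complement of the same unit, as read on the other strand.
REVERSE = re.compile(r"(?:CCCTAAA?){3,}")


def telomeric_arrays(seq, window=2000):
    """Report tandem telomeric-repeat arrays near either end of a sequence.

    Only the first and last `window` bases are scanned. Returns sorted
    (start, end, strand) tuples in 0-based, end-exclusive coordinates.
    """
    seq = seq.upper()
    left = seq[:window]
    right_offset = max(len(seq) - window, 0)
    right = seq[right_offset:]
    hits = []
    for pattern, strand in ((FORWARD, "+"), (REVERSE, "-")):
        for m in pattern.finditer(left):
            hits.append((m.start(), m.end(), strand))
        for m in pattern.finditer(right):
            hits.append((m.start() + right_offset, m.end() + right_offset, strand))
    return sorted(set(hits))   # set() drops duplicates when the windows overlap


# Toy chromosome: reverse-strand repeats at the left end, forward at the right.
example = "CCCTAA" * 4 + "GCGT" * 100 + "TTTAGGG" * 5
hits = telomeric_arrays(example, window=30)
```

With `window=30` the right-hand window starts inside the first TTTAGGG unit, so only the four complete units in the window are reported.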

SLIDE 24

Telomeres

[Figure: panels A), B) and C) comparing T. parva and T. annulata chromosome ends. Labels: [(T)TTTAGGG]n telomeric repeats; species-specific subtelomeric repeats (T. annulata: TaSR3, TaSrpt2, TaSrpt1; T. parva: TpSR3, TpSR2, TpSrpt1); Fam-1 (up to 28); Fam-3 (0 to 4; 0 to 3); Fam-5 (1-3); other families (0-2)]

SLIDE 25
P. falciparum & P. vivax: sub-telomeric species-specific gene families

  • P. falciparum
  • P. vivax
SLIDE 26

Centromeres

  • P. falciparum: chromosome 3, chromosome 2
  • T. annulata
SLIDE 27

Synteny: TA & TP ACT comparison

SLIDE 28

Comparative Genomics: synteny

SLIDE 29

Comparing genomes with Artemis Comparison Tool (ACT): Chr 02

BlastN & TBlastX

  • T. annulata
  • T. parva
SLIDE 30

Pain et al. Science (2005)

SLIDE 31
T. parva TPR loci

[Figure: ACT comparison of T. parva and T. annulata Chr_02; TPR-related family shown in pink]

SLIDE 32

ACT comparison: 3 malaria species


  • P. knowlesi
SLIDE 33

“Species-specific” genes at interruptions in synteny

Plasmodium falciparum Plasmodium yoelii Plasmodium knowlesi
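The observation that “species-specific” genes sit where synteny is interrupted can be checked programmatically once orthologues are known. ACT itself only displays BLAST comparisons, so the following is a hypothetical, simplified Python sketch (not ACT's method): it walks along one chromosome, groups consecutive genes whose orthologues are adjacent in the other genome into collinear blocks, and reports orthologue-less genes lying between two different blocks. It assumes a one-to-one orthologue map and tolerates no gaps inside blocks; all names are hypothetical.

```python
def synteny_blocks(genes_a, order_b, orthologues):
    """Split genome A's gene order into blocks collinear with genome B.

    genes_a:     gene IDs of one genome-A chromosome, in order.
    order_b:     genome-B gene ID -> index along its chromosome.
    orthologues: genome-A gene ID -> genome-B orthologue ID (one-to-one).
    Returns (blocks, at_breaks): blocks are runs of A genes whose orthologues
    are adjacent in B, in the same or reversed direction; at_breaks lists A
    genes without an orthologue that sit between two different blocks.
    """
    blocks, block_of = [], {}
    prev_idx, direction = None, 0
    for gene in genes_a:
        partner = orthologues.get(gene)
        if partner not in order_b:
            continue                      # no orthologue: handled below
        idx = order_b[partner]
        step = None if prev_idx is None else idx - prev_idx
        if blocks and step in (1, -1) and direction in (0, step):
            blocks[-1].append(gene)       # extends the current collinear run
            direction = step
        else:
            blocks.append([gene])         # collinearity broken: new block
            direction = 0
        block_of[gene] = len(blocks) - 1
        prev_idx = idx

    at_breaks = []
    for i, gene in enumerate(genes_a):
        if gene in block_of:
            continue
        before = next((block_of[g] for g in reversed(genes_a[:i]) if g in block_of), None)
        after = next((block_of[g] for g in genes_a[i + 1:] if g in block_of), None)
        if before is not None and after is not None and before != after:
            at_breaks.append(gene)
    return blocks, at_breaks


# Toy example: x1 and x2 lack orthologues and fall at synteny breaks;
# x3 lacks an orthologue but lies inside a collinear block.
genes_a = ["a1", "x3", "a2", "a3", "x1", "a4", "a5", "x2", "a6"]
order_b = {"b1": 0, "b2": 1, "b3": 2, "b4": 10, "b5": 11, "b6": 12}
orthologues = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b5", "a5": "b4", "a6": "b6"}
blocks, at_breaks = synteny_blocks(genes_a, order_b, orthologues)
```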

SLIDE 34

Plasmodium core proteome and “species-specific” genes

Hall et al. Science (2005)

SLIDE 35

ACT Comparison: 3D7 vs PFCLIN

SLIDE 36

Gene Families

SLIDE 37

Clustering Theileria proteins

  • All peptide sets from TA and TP combined.
  • BLASTed against itself with a cutoff of E = 10^-5.
  • TRIBE-MCL run with an inflation value of 5 (quite stringent).
  • Each cluster checked for the number of peptides from each organism, to identify which gene families have expanded in which organisms.
  • Clusters annotated using predicted products in TA & TP.
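The clustering steps above can be sketched in a few dozen lines of NumPy. This is a minimal dense-matrix illustration of the Markov Cluster (MCL) algorithm, not the TRIBE-MCL code: edges are weighted by -log10(E) in the spirit of TRIBE-MCL, but keeping the stronger of two reciprocal hits, the self-loop weights, the pruning threshold and all function names are assumptions made for the sketch. The per-organism count relies on the TA/TP identifier prefixes used in this study.

```python
import numpy as np


def similarity_from_blast(proteins, hits, e_cutoff=1e-5, max_score=200.0):
    """Symmetric similarity matrix from all-vs-all BLAST hits.

    hits: iterable of (query, subject, evalue). Edges are weighted by
    -log10(E), capped at max_score (E = 0 gets the cap). Self-hits and hits
    above the E-value cutoff are ignored.
    """
    index = {name: i for i, name in enumerate(proteins)}
    sim = np.zeros((len(proteins), len(proteins)))
    for query, subject, evalue in hits:
        if query == subject or evalue > e_cutoff:
            continue
        score = max_score if evalue == 0 else min(-np.log10(evalue), max_score)
        i, j = index[query], index[subject]
        # Simplification: keep the stronger of the two reciprocal hits.
        sim[i, j] = sim[j, i] = max(sim[i, j], score)
    return sim


def mcl(sim, inflation=5.0, expansion=2, max_iter=100, tol=1e-9, min_mass=1e-3):
    """Markov Cluster algorithm; returns clusters as sorted index lists."""
    m = np.array(sim, dtype=float)
    # Self-loops: strongest edge of each node (1 for isolated nodes).
    loops = m.max(axis=0)
    loops[loops == 0] = 1.0
    m = m + np.diag(loops)
    m = m / m.sum(axis=0, keepdims=True)        # column-stochastic
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = np.power(m, inflation)                 # inflation: favour strong flow
        m = m / m.sum(axis=0, keepdims=True)
        m[m < 1e-12] = 0.0                         # prune negligible entries
        m = m / m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    clusters = []
    for row in m:                                  # attractor rows define clusters
        members = sorted(np.flatnonzero(row > min_mass).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy data: two families, each with TA and TP members (hypothetical IDs).
proteins = ["TA001", "TA002", "TP001", "TA003", "TP002", "TP003"]
hits = [
    ("TA001", "TA002", 1e-50), ("TA001", "TP001", 1e-40), ("TA002", "TP001", 1e-45),
    ("TA003", "TP002", 1e-30), ("TP002", "TP003", 1e-35), ("TA003", "TP003", 1e-20),
    ("TA001", "TA003", 1e-3),   # weaker than the 1e-5 cutoff, so ignored
]
clusters = mcl(similarity_from_blast(proteins, hits), inflation=5.0)
# Count members per organism from the TA/TP name prefixes.
counts = [{org: sum(proteins[i].startswith(org) for i in c) for org in ("TA", "TP")}
          for c in clusters]
```

Higher inflation values split the graph into smaller, tighter clusters, which is why an inflation of 5 is described as stringent. For genome-scale data a sparse-matrix implementation would be needed.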
SLIDE 38

Theileria-specific gene families: Family 1 (SVSP)

  • Exclusively subtelomeric
  • Contain one or more DUF529 (now called FAINT) domains
  • Majority contain signal peptides and conserved C-termini
  • Unequally expanded (48 in TA, 85 in TP)
  • Expressed during the macroschizont stage (EST evidence)

SLIDE 39

DUF529 domain-containing proteins: Frequently Appears IN Theileria (FAINT)

  • Only found in Theileria proteins
  • Highly diverged ~70 residue domain
  • The majority of FAINT-domain-containing proteins have signal peptides
  • > 900 copies per genome (in at least 166 Theileria annulata proteins)
  • Many are expressed at least at the macroschizont stage
SLIDE 40

Comparative Genomics: protein domains

SLIDE 41

Architecture of Theileria proteins with FAINT domain

[Figure: domain architectures of FAINT-domain proteins. Proteins shown: TA03125 / TP01_0608 (Tash1); TA20095 / TP01_0602 (TashAT2); TA20082 (TashAT3); TA17375 / TP03_0861 (polymorphic antigen precursor / P150); TA20085 / TP01_0604 (TashAT1); TA08425 / TP04_0437 (microneme-rhoptry protein); TA17505 (SfiI-fragment-related hypothetical protein, family 3); TA20090 / TP01_0603 (TashHN); TA18865 and TA18950 (subtelomeric hypothetical proteins (SVSP), family 1). Protein lengths shown: 332, 416, 465, 994, 1163, 893, 1338, 2732, 502 and 605 aa. Domain keys: FAINT, AT-hook, PEST, PT, signal peptide]

SLIDE 42

Whole-genome domain organisation of PfEMP1 proteins

  • 59 var genes in total.
  • Expressed on the red cell surface and involved in sequestration.
  • 3 types of domain:

– DBL: Duffy-binding-like
– CIDR: cysteine-rich interdomain region
– C2: constant 2

SLIDE 43

Comparative Genomics: metabolic reconstruction

SLIDE 44
SLIDE 45

KEGG: Phospholipid metabolism

X8

SLIDE 46

[Figure: metabolic reconstruction of P. falciparum, from Gardner et al. Nature (2002). Compartments: apicoplast, mitochondrion, food vacuole. Pathways: glycolysis; pentose phosphate pathway; purine salvage and pyrimidine synthesis; folate biosynthesis; shikimic acid pathway; DOXP pathway; fatty acid biosynthesis and elongation; haem biosynthesis; tricarboxylic acid cycle; glycerolipid metabolism; amino compounds, including the methionine salvage pathway; GPI anchor biosynthesis; haemoglobin digestion to haemozoin in the food vacuole. Transporters: F-, V- and P-type ATPases; ABC transporters (13). Drugs shown beside their target pathways: chloroquine, artemisinin, quinine, pyrimethamine, cycloguanil, sulfonamides, atovaquone, fosmidomycin, triclosan, thiolactomycin. Several steps are marked as targets for novel inhibitors and protease inhibitors.]

SLIDE 47

Transcription-Associated Proteins (TAPs)

[Figure: TAP comparison across Ta, Pf, Sp, Sc, At, Ce, Ag, Dm and Hs]

SLIDE 48

Transcription Associated Proteins (TAPs)

SLIDE 49

Comparative Genomics: genes under selection

SLIDE 50

The dN/dS ratio is higher on average in genes with signal peptides

[Figure: dN/dS ratio of genes with and without signal peptides in T. parva and T. annulata. Histogram of the proportion of genes (y-axis, 0 to 0.35) in each dN/dS bin (x-axis), for genes with (sigp) and without (non_sigp) signal peptides]

The dN/dS distributions differ. Proteins with signal peptides may interact with the host immune system; hence dN/dS may be an indicator of antigenicity.
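The histogram in this slide can be rebuilt from per-gene dN/dS values. A minimal NumPy sketch, assuming the values and the signal-peptide calls are already available; the bin layout, the helper name `proportion_histogram` and the example values are hypothetical (made up for illustration, not data from the study):

```python
import numpy as np


def proportion_histogram(values, upper=1.0, n_bins=10):
    """Proportion of genes per dN/dS bin on [0, upper].

    NaN values (e.g. genes with dS = 0) are dropped; values above `upper`
    are pooled into the last bin so that the proportions sum to 1.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    values = np.minimum(values, upper)
    edges = np.linspace(0.0, upper, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return edges, counts / counts.sum()


# Made-up dN/dS values, for illustration only.
sigp = [0.05, 0.12, 0.35, 0.48, 0.91, 1.7]
non_sigp = [0.02, 0.04, 0.08, 0.11, 0.15, 0.27]

edges, p_sig = proportion_histogram(sigp)
_, p_non = proportion_histogram(non_sigp)
for lo, hi, a, b in zip(edges[:-1], edges[1:], p_sig, p_non):
    print(f"{lo:.1f}-{hi:.1f}  sigp {a:.2f}  non_sigp {b:.2f}")
```

Plotting proportions rather than raw counts makes the two gene sets comparable even though far fewer genes carry signal peptides.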

SLIDE 51

Orthologous genes under ‘selection’

(Pb, P. berghei; Py, P. yoelii; Pc, P. chabaudi; Pf, P. falciparum)

                                 Pb vs Py  Pb vs Pc  Pc vs Py  Pb vs Pf  Pc vs Pf  All rodent  All
Av. protein identity             88%       83%       85%       63%       62%
Av. nucleotide identity          91%       87%       88%       70%       70%
Median dN                        4.2%      7%        6.7%      27%       26.9%
Median dS                        26%       48%       53%       26%       26.5%
Mean dN/dS                       0.16      0.13      0.11      0.009     0.0092
Orthologous gene pairs detected  3153      4678      3318      3890      3842      2528        2125

Most orthologues are easily identifiable, so where do the differences lie?
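The table summarises nonsynonymous (dN) and synonymous (dS) divergence per orthologue pair. The slides do not say how these were estimated, so the following is only an illustration: a compact, unoptimised Python version of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, assuming codon-aligned orthologue sequences without frameshifts. Function names are hypothetical, and counting mutations to stop codons as nonsynonymous when computing sites is one common convention, not the only one.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites of one codon (changes to stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences averaged over pathways.

    Pathways that pass through a stop codon are ignored, as in NG86.
    """
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        syn = nonsyn = 0
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break                      # pathway through a stop codon
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        else:                              # pathway completed without a stop
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p == 0:
        return 0.0
    arg = 1 - 4 * p / 3
    return -0.75 * math.log(arg) if arg > 0 else math.inf


def ng86(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected aligned coding sequences of equal length")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped/ambiguous codons and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(n_diffs / n_sites), jukes_cantor(s_diffs / s_sites)


# One synonymous (AAA->AAG) and one nonsynonymous (GAT->GAA) change.
dn, ds = ng86("ATGAAACTGGAT", "ATGAAGCTGGAA")
ratio = dn / ds
```

A dN/dS well below 1, as in every column of the table, indicates that most amino-acid changes have been removed by purifying selection. Real orthologue sets are long enough that the Jukes-Cantor correction stays finite, unlike very short toy sequences.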

SLIDE 52

Comparative Genomics: genes under selection

SLIDE 53

Comparing P. falciparum strains / isolates

  • Genes not present in 3D7
  • Genes unique to 3D7
  • Rapidly evolving genes
  • SNP detection
  • Indels (insertions, deletions)
  • Any other noteworthy differences from 3D7

SLIDE 54

Data

  • Plasmodium falciparum (3D7) genome

– Laboratory-adapted isolate, completed 2002
– 23 Mb, 14 chromosomes

  • Sanger Pathogen Group projects

  • P. falciparum Ghanaian isolate (PFCLIN)

– 266K Plasmodium reads (~8x)
– Only just completed; all preliminary analysis to date on the 5X assembly

  • P. falciparum IT strain (IT)

– Ex. Brazil, laboratory-adapted
– 36K Plasmodium reads (~1X)

  • P. reichenowi (REICH)

– Chimp parasite
– 51K Plasmodium reads (~1x)

SLIDE 55

Analysis

  • Genome reconstruction by mapping PFCLIN (5X assembly) contigs onto the 3D7 scaffold (ACT)
  • Comparative annotation
  • Mapping WGS reads onto the 3D7 scaffold (SSAHA2: Sequence Search and Alignment by Hashing Algorithm)

– SNPs
– INDELs
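Once reads are aligned to 3D7 (with SSAHA2 in this work), SNPs and indels are read off each gapped alignment. A minimal Python sketch for a single pairwise alignment, assuming the reference and read rows are equal-length strings with '-' for gaps; the function name `call_variants` is hypothetical, and a real pipeline would also apply base-quality and read-depth thresholds before calling a variant.

```python
def call_variants(ref_row, read_row, ref_start=1):
    """List SNPs and indels in one gapped pairwise alignment.

    ref_row, read_row: aligned rows of equal length; '-' marks a gap.
    ref_start: reference coordinate (1-based) of the first reference base.
    Returns (kind, position, ref_allele, read_allele) tuples. For SNPs and
    deletions, position is the first affected reference base; for insertions
    it is the reference base immediately before the inserted bases.
    """
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have equal length")
    ref_row, read_row = ref_row.upper(), read_row.upper()
    variants = []
    pos = ref_start - 1        # last reference base consumed so far
    i = 0
    while i < len(ref_row):
        r, q = ref_row[i], read_row[i]
        if r == "-" and q == "-":          # gap in both rows: ignore the column
            i += 1
        elif r != "-" and q != "-":        # aligned bases: match or SNP
            pos += 1
            if r != q:
                variants.append(("SNP", pos, r, q))
            i += 1
        elif r == "-":                     # insertion in the read
            j = i
            while j < len(ref_row) and ref_row[j] == "-":
                j += 1
            variants.append(("INS", pos, "-", read_row[i:j].replace("-", "")))
            i = j
        else:                              # deletion in the read
            j = i
            while j < len(ref_row) and read_row[j] == "-" and ref_row[j] != "-":
                j += 1
            variants.append(("DEL", pos + 1, ref_row[i:j], "-"))
            pos += j - i
            i = j
    return variants


ref = "ACGTAC--GTTACG"
read = "ACTTACAAGT--CG"
variants = call_variants(ref, read)
```

In this toy alignment the read carries a G-to-T SNP at reference position 3, an AA insertion after position 6, and a two-base deletion (TA) starting at position 9.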

SLIDE 56

Contig ordering & annotation of PFCLIN

[Figure: PFCLIN contigs ordered and annotated against a PF3D7 chromosome. Tracks: PF3D7 (chromosome); PFCLIN (ordered contigs, gaps shown as runs of N); PFCLIN (annotated contigs)]
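The ordered-contig track, with runs of N marking gaps, amounts to concatenating contigs in their 3D7-derived order. A minimal Python sketch, assuming each contig already has a start coordinate on the 3D7 chromosome and an orientation from the alignment step; the fixed-length N gap and the function names are hypothetical conventions, not necessarily those used for PFCLIN.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudochromosome(placements, gap_len=100):
    """Join contigs into one sequence ordered by their 3D7 start coordinate.

    placements: list of (ref_start, contig_sequence, strand), strand '+'/'-'.
    Consecutive contigs are separated by a run of `gap_len` N characters;
    '-' strand contigs are reverse-complemented to match the reference.
    """
    parts = []
    for _, seq, strand in sorted(placements, key=lambda p: p[0]):
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return ("N" * gap_len).join(parts)


# Toy placements (hypothetical coordinates and sequences).
placements = [(5000, "GGCC", "+"), (100, "AACG", "-"), (9000, "TTAA", "+")]
pseudo = build_pseudochromosome(placements, gap_len=3)
```

Using a fixed gap length keeps the sketch simple; gap sizes could instead be estimated from the distance between neighbouring contigs on the 3D7 scaffold.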

SLIDE 57

Contig ordering I: Chr 1

SLIDE 58

Mapping reads onto 3D7 assembly using SSAHA

SLIDE 59

SNP detection (SSAHA2): Chr1 & Chr2

(1135) (738, 633) (576, 559) (446) (1140) (963, 789) (809, 601) (464)

SLIDE 60

Acknowledgements