Apicomplexan Genome Sequencing in Sanger
Arnab Pain, The Pathogen Sequencing Unit (PSU) 2nd November, 2005; ICGEB, New Delhi
Overview
– Overview of Apicomplexan genome sequencing projects in Sanger
– Update on Plasmodium genome projects
Intestinal disease in mammals; Parasitise brain/kidney of rodent; Malaria; Tropical theileriosis; Babesiosis
Brain infection (Chinchilla); Parasitises heart muscle/brain; Toxoplasmosis; Parasitic disease of falcons; Bowel disease in human; Coccidiosis; Insect gut parasites; Cryptosporidiosis; Insect gut parasites; Dog parasites; Oyster parasite
3D7: 3 gaps Clinical isolate: 8x IT strain: 31,000 reads (0.8x) 8x complete, prefinishing, annotation 8x complete, some finishing 8x complete, prefinish
Theileria parva Parasite of cattle: East/Central Africa
‘East-Coast Fever’
Theileria annulata Parasite of cattle: S. Europe, North Africa, Middle East, Asia
‘Tropical Theileriosis’
Theileria hirci Parasite of sheep/goats: Middle East & Asia
(B. Shiels)
Clonal expansion of infected cells
H.nuc
Macroschizont Merozoite production merozoites Piroplasm infected erythrocytes
DNA Contiguous sequence
STS-1 STS-2 STS-3 STS-4
pUC clone end sequence physical gap sequence gap large clone end sequence “scaffold”
– Short sequence markers, mapped genetically.
– reads from mapped YAC clones align with contigs and thus position.
– DNA fragments of partial digestion of genome are sized
– Fragmented DNA diluted and replicated. STS markers detected by PCR.
Use of multiple lines of evidence
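As a rough illustration of how mapped markers can position contigs, the sketch below orders contigs by the mean map position of the STS markers found on each one. The marker names reuse the STS-1 to STS-4 labels from the figure; the contig names, the map positions and the `order_contigs` helper are hypothetical and are not taken from the project pipeline.

```python
from statistics import mean

def order_contigs(marker_positions, contig_markers):
    """Order contigs by the mean map position of the STS markers they contain.

    marker_positions: dict, marker name -> position on the map (arbitrary units)
    contig_markers:   dict, contig name -> list of marker names found on it
    Contigs carrying no mapped marker are returned separately as unplaced.
    """
    placed, unplaced = [], []
    for contig, markers in contig_markers.items():
        positions = [marker_positions[m] for m in markers if m in marker_positions]
        if positions:
            placed.append((mean(positions), contig))
        else:
            unplaced.append(contig)
    placed.sort()
    return [contig for _, contig in placed], sorted(unplaced)

# Hypothetical map positions and marker content, for illustration only.
marker_positions = {"STS-1": 10.0, "STS-2": 35.0, "STS-3": 60.0, "STS-4": 90.0}
contig_markers = {
    "contig_A": ["STS-3"],
    "contig_B": ["STS-1", "STS-2"],
    "contig_C": ["STS-4"],
    "contig_D": [],
}
scaffold, unplaced = order_contigs(marker_positions, contig_markers)
print(scaffold, unplaced)
```

In practice, each line of evidence listed above would contribute its own positions, and conflicts between them would need manual review.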
– 2.6 Mb (3 gaps)
– 2 Mb (Finished)
– 1.9 Mb (2 small gaps)
– 1.8 Mb (Finished)
parva)
[Figure: total contig length vs. number of reads, and contig number vs. number of reads, at 3X, 4X, 6X and 8X coverage]
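The relationship between read count, coverage and contig number can be approximated with the idealised Lander-Waterman model. The sketch below is only that approximation, not the analysis behind the plots: it assumes uniformly random reads of fixed length and ignores the minimum overlap needed to join two reads. The genome size (8 Mb) and read length (500 bp) are hypothetical values chosen for illustration.

```python
import math

def lander_waterman(genome_size, read_length, coverage):
    """Idealised shotgun statistics for a given fold coverage.

    Returns (number of reads, expected number of contigs,
    expected fraction of the genome covered by at least one read).
    """
    n_reads = coverage * genome_size / read_length
    expected_contigs = n_reads * math.exp(-coverage)
    covered_fraction = 1.0 - math.exp(-coverage)
    return n_reads, expected_contigs, covered_fraction

# Hypothetical genome size and read length, for illustration only.
for c in (3, 4, 6, 8):
    n, contigs, frac = lander_waterman(8_000_000, 500, c)
    print(f"{c}X: {n:.0f} reads, ~{contigs:.0f} contigs, {frac:.1%} covered")
```

Under this model the expected contig count falls steeply between 3X and 8X, while the covered fraction approaches 1.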
Rep20
TARE2-5
telomere Subtelomeric repeats
VAR genes
Rifin/stevor genes Other gene families
antigenic variation House-keeping antigenic variation
VAR genes Rifin
Other families (0-2) Family_3 (0-3) Family_1 (up to 28), Family_3 (0-3) Repeats Family_5 (1-3)
(T)TTAGGG telomere
Secreted antigens House-keeping genes Secreted antigens Putative centromere
A) B) C)
Other fam (0-2) [Fam-1 (up to 28), Fam-3 (0 to 4)] [Fam-3 (0 to 3)]
Subtelomeric repeats (species-specific)
Fam-5 (1-3)
[(T)TTTAGGG]n
TaSR3 [TaSrpt2,TaSrpt1] TpSR3 TpSR2 TpSrpt1
Chromosome 3 P falciparum Chromosome 2
Pain et al. Science (2005)
T PARVA T ANN Chr_02
TPR_related family shown in pink
Plasmodium falciparum Plasmodium yoelii Plasmodium knowlesi
Hall et al. Science (2005)
– Exclusively sub-telomeric
– Contain 1 or more DUF529 (now called FAINT) domains
– Majority contain signal peptides and conserved C-termini
– Unequally expanded (48 in TA, 85 in TP)
– Expressed during macroschizont stage (EST evidence)
[TA03125 / TP01_0608, Tash1] [TA20095 / TP01_0602, TashAT2] [TA20082, TashAT3] [TA17375 / TP03_0861, Polymorphic antigen precursor / P150] [TA20085 / TP01_0604, TashAT1] [TA08425 / TP04_0437, Microneme-rhoptry protein] [TA17505, SfiI-fragment-related hypothetical protein, family 3] [TA20090 / TP01_0603, TashHN]
(332 aa) (416 aa) (465 aa) (994 aa) (1163 aa) (893 aa) (1338 aa) (2732 aa) [TA18865, Subtelomeric hypothetical protein (SVSP), family 1] [TA18950, Subtelomeric hypothetical protein (SVSP), family 1] (502 aa) (605 aa) Domain keys:
X8
[Figure: metabolic pathway and transporter map, annotated with drug targets.
Compartments: apicoplast, food vacuole, mitochondrion.
Pathways: glycolysis; pentose phosphate pathway; shikimic acid pathway; purine salvage and pyrimidine synthesis; folate biosynthesis; methionine salvage pathway; amino compounds; glycerolipid metabolism; haem biosynthesis; fatty acid biosynthesis and elongation; DOXP pathway; tricarboxylic acid cycle; haemoglobin digestion to haemozoin; glycosyl phosphatidylinositol (GPI anchors).
Transporters: F-, V- and P-type ATPases; ABC transporters (13); cation/H+ exchangers; water/glycerol, nucleotide and nucleoside/base transporters.
Drugs shown: pyrimethamine, cycloguanil, sulfonamides, triclosan, thiolactomycin, fosmidomycin, atovaquone, chloroquine, artemisinin, quinine, protease inhibitors. Several pathways are marked as targets for novel inhibitors.]
Gardner et al. Nature (2002)
Ta Pf Sp Sc At Ce Ag Dm Hs Ta Pf
dN/dS ratio of genes with and without signal peptides in T. parva and T. annulata
[Histogram: proportion of genes (y-axis, 0.05 to 0.35) by dN/dS bin (x-axis) for two series, sigp and non_sigp]
The dN/dS distributions of the two gene sets differ. Proteins with a signal peptide possibly interact with the host immune system, so dN/dS may be an indicator of antigenicity.
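A comparison of this kind can be sketched by binning per-gene dN/dS values and reporting the proportion of genes per bin for each group. The values below are invented for illustration and are not data from the T. parva / T. annulata comparison; the bin width, the number of bins and the `binned_proportions` helper are likewise assumptions.

```python
from statistics import median

def binned_proportions(values, bin_width=0.05, n_bins=10):
    """Proportion of genes falling in each dN/dS bin.

    Bin i covers [i * bin_width, (i + 1) * bin_width); the last bin also
    collects every value above the upper edge.
    """
    if not values:
        raise ValueError("no dN/dS values supplied")
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v / bin_width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

# Hypothetical per-gene dN/dS values, for illustration only.
sigp = [0.02, 0.12, 0.18, 0.27, 0.31, 0.44]
non_sigp = [0.01, 0.03, 0.04, 0.08, 0.11, 0.22]

for name, values in (("sigp", sigp), ("non_sigp", non_sigp)):
    props = binned_proportions(values)
    print(name, "median:", median(values), "proportions:", props)
```

A formal comparison would follow this with a two-sample test on the two sets of values.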
                 Pb vs Py  Pb vs Pc  Pc vs Py  Pb vs Pf  Pc vs Pf  All rodent  All
Identity         88%       83%       85%       63%       62%
Identity         91%       87%       88%       70%       70%
(unlabelled)     4.2%      7%        6.7%      27%       26.9%
(unlabelled)     26%       48%       53%       26%       26.5%
(unlabelled)     0.16      0.13      0.11      0.009     0.0092
pairs detected   3153      4678      3318      3890      3842      2528        2125
Most orthologues are easily identifiable, so where do the differences lie?
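One common way to detect orthologue pairs between two genomes is the reciprocal best hit criterion. The slides do not state which method was used, so the sketch below is an assumption shown only to make the idea concrete. The gene names, the similarity scores and the `reciprocal_best_hits` helper are hypothetical.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    scores_ab: dict, (gene_a, gene_b) -> similarity score, genome A searched against B
    scores_ba: dict, (gene_b, gene_a) -> similarity score, genome B searched against A
    """
    def best_hits(scores):
        best = {}
        for (query, subject), score in scores.items():
            if query not in best or score > best[query][1]:
                best[query] = (subject, score)
        return {query: subject for query, (subject, _) in best.items()}

    best_ab = best_hits(scores_ab)
    best_ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical search results, for illustration only.
scores_ab = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b2"): 85}
scores_ba = {("b1", "a1"): 88, ("b2", "a3"): 85, ("b2", "a2"): 80}
pairs = reciprocal_best_hits(scores_ab, scores_ba)
print(pairs)
```

Genes left without a reciprocal partner, such as `a2` here, are the natural place to start looking for species-specific differences.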
– 266K Plasmodium reads (~8x)
– Only just completed; all preliminary analysis to date is on 5X
– 36K Plasmodium reads (~1X)
– chimp parasite
– 51K Plasmodium reads (~1x)
PF3D7 (chromosome)
NNNNNNNNNN NNN NNN NNN
PFCLIN (ordered contigs) PFCLIN (annotated contigs)
Mapping reads onto 3D7 assembly using SSAHA
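SSAHA (Sequence Search and Alignment by Hashing Algorithm) places query sequences by looking up their k-mers in a hash table built from the reference. The sketch below is a heavily simplified illustration of that idea, not the SSAHA implementation: it indexes non-overlapping k-mers of a toy reference, looks up every overlapping k-mer of a read, and votes on the implied offset. The sequences, the value of `k` and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(reference, k):
    """Hash table of non-overlapping reference k-mers: k-mer -> start positions."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, k):
        index[reference[pos:pos + k]].append(pos)
    return index

def best_placement(read, index, k):
    """Vote on the read's offset in the reference.

    Every overlapping k-mer of the read is looked up; each hit votes for
    (reference position - read position). Returns (offset, supporting hits)
    for the most supported offset, or None if no k-mer matches.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]

# Toy reference and a read cut from it, for illustration only.
reference = "ACGTTGCAAGGCTTACCGATGCAT"
index = build_index(reference, 4)
placement = best_placement(reference[6:18], index, 4)
print(placement)
```

A real mapper would follow the seed lookup with an alignment step and would handle mismatches, repeats and both strands.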
(1135) (738, 633) (576, 559) (446) (1140) (963, 789) (809, 601) (464)