Around the resistome in 80 ways:
an empirical evaluation of antimicrobial resistance gene detection methods
Finlay Maguire finlaymaguire@gmail.com December 2, 2019
Faculty of Computer Science, Dalhousie University
Table of contents
1
Background
Evolution of Eukaryotic Endosymbioses
(Maguire, 2016)
2
Antimicrobial Resistance
(Matthews et al., 2018)
3
Epidemiology
(Stairs et al., 2019)
4
Sociology?
5
Why do we care about AMR?
AMR is currently a problem
2015 EU/EEA: 33,110 deaths attributable to antibiotic-resistant infections (Cassini et al., 2019).
6
AMR is growing
WHO Global Health Observatory Data Repository.
7
What can we do about it?
Improve surveillance
8
Improve diagnostics
(Goossens et al., 2005)
9
How do we do this?
Phenotypically?
(Bradley et al., 2015)
10
DNA sequencing
11
Downside of DNA: capacity not expression
mutation
12
Which DNA sequencing method?
Choosing a method
Biological Sample → Sequencing → Analysis → AMR Genes
AMR Database → Simulated Data → Analysis → AMR Genes
Additional factors:
13
Targeted sequencing
Targeted sequencing
Biological Sample → Oligonucleotide Probes → Enriched DNA Fragment → Gel/Cloning/Sequencing
14
Choosing and evaluating primers
Testing primers computationally
github.com/mwhall/VAware: Mike Hall
Needleman-Wunsch alignments:
15
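The computational primer test above rests on Needleman-Wunsch global alignment. A minimal sketch of that algorithm follows, assuming illustrative scoring (match +1, mismatch -1, gap -1) and made-up sequences. It is not the VAware implementation, whose parameters are not given on the slide; the function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not those used by VAware.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def count_differences(aligned_a, aligned_b):
    """Number of alignment columns that are mismatches or gaps."""
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


if __name__ == "__main__":
    primer, target = "GATTACA", "GATCACA"  # toy sequences
    s, top, bottom = needleman_wunsch(primer, target)
    print(s, top, bottom, count_differences(top, bottom))
```

Counting differences per primer-allele pair in this way is one plausible route to statistics such as the share of alleles with serious mismatches; in practice the scoring scheme and any end-gap handling would need to match the tool actually used.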
Which primers?
European Committee on Antimicrobial Susceptibility Testing: 78 PCR Primer Sets
16
Which AMR genes?
[Figure: CARD Resistomes & Variants, per-species counts across pathogens (e.g. baumannii, jejuni, pneumoniae, aeruginosa, tuberculosis, aureus)]
CARD-prevalence: 85 pathogens, 116,914 resistomes (chromosome, plasmid, and WGS assembly). Brian Alcock/McArthur Lab
17
How well do these primers work?
Surprisingly poorly
missed
18
Lots of serious mismatches
No primer alignment in 27.58% of tetD alleles
19
Stagnation of primers
20
Can we improve on this?
Designing probes with up-to-date AMR allele diversity
(Guitor et al., 2019)
21
Downsides of targeted-approaches
22
Why do we care about context?
Genomics
Case-study on strengths of genomics
[Figure: isolate phylogeny (scale bar: 0.056 substitutions per site) annotated with Serotype (Kentucky, Hadar, Heidelberg, I:4,[5],12:i:, Enteritidis, Typhimurium, Thompson) and SIR Status (Resistant, Susceptible, Intermediate resistance) for AMOCLA, AMPICI, AZITHR, CEFOXI, CEFTIF, CEFTRI, CHLORA, CIPROF, GENTAM, NALAC, STRETP, SULFIZ, TETRA, TRISUL]
(Maguire et al., 2019)
23
Phenotype prediction modelling
Genomes → RGI+CARD → AMR Genes → Tallying → Logistic Regression → Phenotype
Genomes → Decompose → K-mers → Set-Covering Machines → Phenotype
(Maguire et al., 2019)
24
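The step in the diagram above that decomposes genomes into K-mers can be illustrated with a short sketch. It assumes plain overlapping k-mers and a presence/absence encoding; the function names, toy sequences and choice of k are hypothetical, and the actual feature encoding used in (Maguire et al., 2019) is not described on the slide.

```python
def kmer_set(sequence, k):
    """Return the set of overlapping k-mers in a sequence (upper-cased)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_matrix(genomes, k):
    """Build a presence/absence table of k-mers across genomes.

    genomes: dict mapping genome name -> sequence string.
    Returns (sorted k-mer vocabulary, dict of name -> 0/1 feature list).
    """
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*sets.values()))
    matrix = {
        name: [int(kmer in kmers) for kmer in vocabulary]
        for name, kmers in sets.items()
    }
    return vocabulary, matrix


if __name__ == "__main__":
    toy = {"isolate_1": "ACGT", "isolate_2": "CGTT"}  # made-up sequences
    vocab, features = presence_matrix(toy, k=3)
    print(vocab)
    print(features)
```

A table like this, paired with resistant/susceptible labels, is the kind of input a rule-based learner such as a set-covering machine could consume without any prior gene annotation.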
Genomes allow gene-free models
(Maguire et al., 2019)
25
Generate co-selection hypotheses
ISEcp1 CMY-2 Blc sugE
(Maguire et al., 2019)
26
Downsides of genomics
We need genomes to identify previously unknown factors, but:
27
Metagenomics
Read-based AMR Metagenomics
Genomes → (Sequencing) → Reads → (AMR detection) → AMR Genes
28
Difficulties of metagenomics
AMR genes are rare genomically
[Figure: AMR Reads in Metagenome (0.643%); log(Read Count) for All (~324M) vs AMR (~2.1M) reads]
2184 CARD-prevalence genomes at 1-10X abundance
29
AMR genes have wildly different abundances
1236 AMR PATRIC genomes
30
AMR sequence space overlaps
[Figure: MDS of CARD Proteins BLASTP-%ID; panels: Actual Families, Affinity Clusters (Adj. Rand=0.30041)]
31
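The Adj. Rand value quoted above is the adjusted Rand index, which scores the agreement between two partitions (here, curated families versus sequence-similarity clusters) after correcting for chance. A minimal sketch of the standard formula follows; the toy labels are hypothetical and the clustering itself is not reproduced.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items that share a cluster in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster within each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in its own cluster
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    families = ["TEM", "TEM", "OXA", "OXA"]  # toy curated families
    clusters = [1, 1, 0, 0]                  # toy similarity clusters
    print(adjusted_rand_index(families, clusters))
```

A value of 1 means the two partitions are identical up to relabeling, while values near 0 mean agreement no better than chance, so 0.30 indicates that similarity clusters recover the curated families only weakly.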
Choosing an analysis approach
Simulate data and compare tools
ESKAPE Genomes → Resistance Gene Identifier + CARD
ESKAPE Genomes → ART Read Simulator → Labeled Simulated Metagenome
Labeled Simulated Metagenome → NT Query & NT CARD Database Methods
Labeled Simulated Metagenome → NT Query & AA CARD Database Methods
Labeled Simulated Metagenome → ORFM → Predicted ORF Protein Sequences → AA Query & AA CARD Database Methods
32
Terminology refresher
bit.ly/2pZzxJU
33
How well do different methods do?
We can find reads from AMR genes
34
We can mostly identify which family
35
We cannot identify which specific gene
36
Highly similar families to blame
37
Is there any way to improve this?
Statistical/Machine-Learning Correction
DIAMOND-BLASTX Output → Classifier → AMR Gene Predictions
Average Precision: 0.63 %
38
Revised classifier structure: exploiting the ARO
DIAMOND-BLASTX Output → AMR Family Classifier → AMR Families
Family 1 Reads → Family 1 Classifier
Family ... Reads → Family ... Classifier
Family N Reads → Family N Classifier
All family classifiers → AMR Gene Predictions
39
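The two-stage structure above (assign a read to an AMR family first, then hand it to that family's own gene-level classifier) can be sketched as follows. This is only an illustration of the routing: simple bitscore rules stand in for the trained classifiers, and the function names, the `gene_to_family` mapping and the toy scores are all hypothetical.

```python
def classify_family(hits, gene_to_family):
    """Stage 1 stand-in: pick the family with the largest summed bitscore.

    hits: dict mapping gene name -> bitscore for one read.
    """
    totals = {}
    for gene, bitscore in hits.items():
        family = gene_to_family[gene]
        totals[family] = totals.get(family, 0.0) + bitscore
    return max(totals, key=totals.get)


def best_hit_gene(family_hits):
    """Stage 2 stand-in: pick the gene with the highest bitscore."""
    return max(family_hits, key=family_hits.get)


def classify_read(hits, gene_to_family, family_classifiers=None):
    """Route a read to a family, then to that family's gene classifier."""
    family_classifiers = family_classifiers or {}
    family = classify_family(hits, gene_to_family)
    # Only hits within the chosen family are passed to the second stage.
    family_hits = {g: s for g, s in hits.items() if gene_to_family[g] == family}
    gene_classifier = family_classifiers.get(family, best_hit_gene)
    return family, gene_classifier(family_hits)


if __name__ == "__main__":
    gene_to_family = {"TEM-1": "TEM", "TEM-2": "TEM", "OXA-1": "OXA"}
    read_hits = {"TEM-1": 90.0, "TEM-2": 95.0, "OXA-1": 120.0}  # toy bitscores
    print(classify_read(read_hits, gene_to_family))
```

The design point is that each second-stage classifier only has to separate genes within one family, which is a much smaller problem than separating every gene in the database at once.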
Slightly improved family performance
[Figure: Family Test Performance; y-axis: Proportion (0.00-1.00); Precision and Recall for Normalised Bitscore vs Random Forest]
Mean Precision: 0.995, Mean Recall: 0.985
40
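The precision and recall figures reported above follow the usual definitions: precision is the share of predictions for a label that are correct, and recall is the share of true members of a label that are found. A minimal per-label sketch, using hypothetical toy labels rather than the talk's data:

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels):
    """Per-label (precision, recall) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == predicted:
            tp[truth] += 1
        else:
            fp[predicted] += 1  # predicted label gains a false positive
            fn[truth] += 1      # true label gains a false negative
    results = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        results[label] = (precision, recall)
    return results


if __name__ == "__main__":
    truth = ["OXA", "OXA", "TEM", "TEM"]      # toy read labels
    predicted = ["OXA", "TEM", "TEM", "TEM"]  # toy classifier output
    for label, (p, r) in sorted(precision_recall(truth, predicted).items()):
        print(label, round(p, 3), round(r, 3))
```

Averaging these per-label values across families gives summary figures of the kind quoted on the slide, although the exact averaging used in the talk is not stated.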
Greatly improved gene performance
41
Gains not evenly distributed
[Figure: Median Precision-Recall Within Families; x-axis: Ordered AMR Family Index; y-axis: Proportion (0.00-1.00); series: Precision, Recall]
42
Metagenomic resistome profile
[Figure: AMR hits related to Drug Class; x-axis: Normalised Read Proportion (log scale); y-axis: Drug Class as ARO terms (e.g. ARO:3000007 beta-lactam antibiotic, ARO:3000050 tetracycline derivative, ARO:0000016 aminoglycoside antibiotic), plus an Indeterminate Class]
47 human gut metagenome profiles
43
Great, but...
44
Can we get the best of metagenomics and genomics?
Metagenomic-Assembled Genomes
MAG binning
Genomes → (Sequencing) → Reads → (Assembly) → Contigs → (Binning) → Metagenome-Assembled Genomes
45
MAGs are popular
Figure from (Parks et al., 2017)
46
What about plasmids?
Figure from (Antipov et al., 2016)
47
Or genomic islands
www.pathogenomics.sfu.ca/islandviewer
and prophages
IslandPath-DIMOB)
48
How well do MAGs recover these sequences?
Time to start simulating again
from difficult genomes
Megahit
dastool
49
Chromosomes fairly well binned
26-94.3% median chromosomal coverage (Pre-print draft github.com/fmaguire/mag_sim_paper)
50
Plasmids are not
1.5-29.2% plasmids binned
51
Genomic islands are better but bad
28-42% GIs binned
52
What about AMR genes?
24-43% AMR genes binned
53
Which AMR genes are lost?
54
Be cautious with MAGs
55
Conclusions
Conclusions
Method | Strengths | Weaknesses
Targeted | Cheap, easy analysis | a priori, stagnation
Genomics | Context, moderate analysis | Isolation, throughput
Metagenomics | Many genomes at once | Fragmented, no context, difficult analysis
Metagenomic-Assembled Genomes | Context for many genomes | Lose key data, complex analysis
strengths
gene-free AST prediction models)
56
Acknowledgements
Andrew McArthur
Brinkman
57
Questions?
57
References
Antipov, D., Hartwick, N., Shen, M., Raiko, M., Lapidus, A., and Pevzner, P. (2016). plasmidSPAdes: assembling plasmids from whole genome sequencing data. bioRxiv, page 048942.
Bradley, P., Gordon, N. C., Walker, T. M., Dunn, L., Heys, S., Huang, B., Earle, S., Pankhurst, L. J., Anson, L., De Cesare, M., et al. (2015). Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Communications, 6:10063.
58
Cassini, A., Hogberg, L. D., Plachouras, D., Quattrocchi, A., Hoxha, A., Simonsen, G. S., Colomb-Cotinat, M., Kretzschmar, M. E., Devleesschauwer, B., Cecchini, M., et al. (2019). Attributable deaths and disability-adjusted life-years caused by infections with antibiotic-resistant bacteria in the EU and the European Economic Area in 2015: a population-level modelling analysis. The Lancet Infectious Diseases, 19(1):56–66.
de Kraker, M. E., Stewardson, A. J., and Harbarth, S. (2016). Will 10 million people die a year due to antimicrobial resistance by 2050? PLoS Medicine, 13(11):e1002184.
Goossens, H., Ferech, M., Vander Stichele, R., Elseviers, M., Group,
association with resistance: a cross-national database study. The Lancet, 365(9459):579–587.
59
Guitor, A. K., Raphenya, A. R., Klunk, J., Kuch, M., Alcock, B., Surette,
Capturing the resistome: A targeted capture method to reveal antibiotic resistance determinants in metagenomes. Antimicrobial Agents and Chemotherapy, pages AAC–01324.
Maguire, F. (2016). A multi-omic analysis of the photosynthetic endosymbioses of Paramecium bursaria. PhD Thesis.
Maguire, F., Rehman, M. A., Carrillo, C., Diarra, M. S., and Beiko, R. G. (2019). Identification of primary antimicrobial resistance drivers in agricultural nontyphoidal Salmonella enterica serovars by using machine
Matthews, T. C., Bristow, F. R., Griffiths, E. J., Petkau, A., Adam, J., Dooley, D., Kruczkiewicz, P., Curatcha, J., Cabral, J., Fornika, D., et al. (2018). The Integrated Rapid Infectious Disease Analysis (IRIDA)
60
globally: final report and recommendations. Review on Antimicrobial Resistance.
Parks, D. H., Rinke, C., Chuvochina, M., Chaumeil, P.-A., Woodcroft,
Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology, 2(11):1533.
Stairs, J., Bal, N., Maguire, F., and Scott, H. (2019). A resident-led clinic that promotes the health of refugee women through advocacy and partnership. Canadian Medical Education Journal.
61
Backup
10 million deaths?
(Review on Antimicrobial Resistance, 2016), (de Kraker et al., 2016)
62
Where does 10 million come from?
For 3rd-generation cephalosporin resistant E. coli, K. pneumoniae, and MRSA:
European hospitals by global population).
2 studies n=16 BSIs)
unspecified manner.
doubled infection rates by 2050.
63