AMRtime: Precise identification of antimicrobial resistance determinants from metagenomic data
Finlay Maguire (finlaymaguire@gmail.com)
December 3, 2019
Faculty of Computer Science, Dalhousie University

Table of contents
1. Background
2. AMRtime Overview
3. Filtering out non-AMR reads
4. Sensitive Homology Classification

Background

AMR-metagenomics
[Diagram: Genomes, Sequencing, Reads, AMR detection, AMR Genes]

Comprehensive Antibiotic Resistance Database
card.mcmaster.ca

Why is AMR metagenomics difficult?

AMR genes are rare genomically
[Figure: AMR Reads in Metagenome (0.643%); log(Read Count) for All reads (~324M) and AMR reads (~2.1M).]
2184 CARD-Prevalence Genomes at 1-10X abundance

AMR genes have wildly different abundances
1236 AMR PATRIC genomes

AMR sequence space overlaps
[Figure: MDS of CARD Proteins by BLASTP %ID, in two panels: Actual Families and Affinity Clusters (Adj. Rand = 0.30041).]
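The adjusted Rand score on this slide measures how well the clusters agree with the curated families. As a minimal sketch of how such a score is computed (the family and cluster labels below are toy values, not CARD data, and the function name is an illustrative choice), the standard pair-counting formula can be written as:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical
    # Pairs of items placed together in both labelings.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy example: hypothetical family labels versus hypothetical cluster ids.
families = ["TEM", "TEM", "SHV", "SHV"]
clusters = [0, 1, 0, 1]
score = adjusted_rand_index(families, clusters)
```

A score of 1 means the clustering reproduces the families exactly, values near 0 mean agreement no better than chance, and negative values mean worse than chance.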

AMRtime Overview

AMRtime structure
[Pipeline diagram. Legend: Input files, Processes, Intermediate files, Output files. Nodes: Metagenomic Reads, AMR Filtering, Filtered reads, CARD, Sensitive Homology Classification, Homology predictions, Variant Identification, Variant predictions, Metamodels, Metamodel predictions.]

AMRtime structure
[Pipeline diagram. Legend: Input Files, Processes, Intermediate Files, Output Files. Nodes: Metagenomic Reads, CARD, Read Filtering, Filtered Reads, Features, Sensitive AMR Classification, ARO Predictions.]

Filtering out non-AMR reads

Testing sequence similarity search tools
[Workflow diagram: ESKAPE Genomes, Resistance Gene Identifier + CARD, ART Read Simulator, Labeled Simulated Metagenome, ORFM, Predicted ORF Protein Sequences]
Database and methods:
• NT Query & NT CARD: BLASTN, bowtie2, BWA-MEM, biobloom*, groot, HMMSearch
• NT Query & AA CARD: BLASTX, DIAMOND BLASTX, PALADIN
• AA Query & AA CARD: BLASTP, DIAMOND BLASTP, HMMSearch

Terminology refresher interlude
https://commons.wikimedia.org/wiki/File:Precisionrecall.svg

DNA subject best for precision, Protein subject best for recall
[Figure: precision against recall by domain: DNA Query/DB, DNA Query with Protein DB, Protein Query/DB.]
Simulated MiSeq v3 250bp reads, 30.31M reads (7.21M AMR derived)

K-mer methods perform poorly
[Figure: precision against recall by paradigm: BWT, BLAST, k-mer, HMM.]
BWT: bowtie2, bwa-mem, paladin; BLAST: blast, diamond; HMM: hmmsearch; K-MER: biobloom, groot.

DIAMOND-BLASTX best compromise
[Figure: precision against recall (both axes 0.90 to 1.00) by tool: blastx, bwa, diamond_blastx, paladin, blastp, diamond_blastp.]
DIAMOND-BLASTX 'more sensitive' setting (min < 1e-10): 4.926 hours with 2 cores and 8.3Gb of memory.
AMR Reads: 7.15M detected, 59.26K missed, 1.87M false positives.
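For readers who want to connect the read counts above to the precision and recall axes, here is a small sketch. It assumes that "detected" reads are true positives and "missed" reads are false negatives, which is an interpretation of the slide rather than something it states; the function name is illustrative.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Counts as reported on the slide (assumed mapping: detected = TP, missed = FN).
detected = 7.15e6
missed = 59.26e3
false_positives = 1.87e6

precision, recall = precision_recall(detected, false_positives, missed)
print(f"precision={precision:.3f} recall={recall:.3f}")
```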

Why not just use these sequence searches?

Poor gene-level accuracy
[Figure: ARO Accuracy; proportion of reads per ARO correct (0.0 to 1.0) by tool: groot, diamond_blastp, diamond_blastx, blastp, blastx, paladin, blastn, bowtie2, bwa.]
Performance at optimal settings for ARO accuracy

Good family-level accuracy
[Figure: Correct Family; proportion of reads per family correct (0.0 to 1.0) by tool: groot, hmmsearch_nt, bowtie2, bwa, hmmsearch_aa, blastn, paladin, diamond_blastp, diamond_blastx, blastx, blastp.]
Performance at optimal settings for Family accuracy

Sensitive Homology Classification

Initial classifier
[Diagram: Training Data, Classifier, ARO predictions]
NB 7-mer Average Precision: 0.63
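To make the "NB 7-mer" baseline concrete, the following is a minimal sketch that assumes "NB" means a multinomial naive Bayes classifier over 7-mer counts. The class name, the Laplace smoothing, and the toy reads and ARO labels are all illustrative assumptions, not the implementation behind the reported score.

```python
from collections import Counter, defaultdict
from math import log


def kmer_counts(seq, k=7):
    """Count overlapping k-mers in a read."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=7, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, reads, labels):
        self.class_counts = Counter(labels)
        self.kmer_totals = defaultdict(Counter)
        self.vocab = set()
        for read, label in zip(reads, labels):
            counts = kmer_counts(read, self.k)
            self.kmer_totals[label].update(counts)
            self.vocab.update(counts)
        return self

    def predict(self, read):
        counts = kmer_counts(read, self.k)
        n_reads = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_label in self.class_counts.items():
            total = sum(self.kmer_totals[label].values())
            score = log(n_label / n_reads)  # class prior
            for kmer, count in counts.items():
                if kmer not in self.vocab:
                    continue  # ignore k-mers never seen in training
                p = (self.kmer_totals[label][kmer] + self.alpha) / (
                    total + self.alpha * len(self.vocab)
                )
                score += count * log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data with hypothetical ARO labels.
model = KmerNaiveBayes().fit(
    ["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"], ["ARO_A", "ARO_B"]
)
predicted = model.predict("ACGTACGTAC")
```

With a 250bp read, a single sequencing error changes up to seven overlapping 7-mers, which is one way the k-mer signal can degrade.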

Revised classifier structure: exploiting the ARO
[Diagram: Training Data feeds an AMR Family Classifier, which predicts AMR Families. For each family (Family 1, ..., Family N): SMOTE, Family Data, Family Classifier. The family classifiers give the ARO predictions.]
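The per-family SMOTE step oversamples rare classes by interpolating between existing samples. A minimal sketch of that interpolation idea follows; the function name, the neighbour count `k`, and the toy feature vectors are assumptions for illustration, and a real pipeline would more likely use an existing library implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (at least two).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # Rank the other samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical feature rows for an under-represented ARO within one family.
rare_aro = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]
new_rows = smote(rare_aro, n_new=4)
```

Each synthetic row lies on a line segment between two real rows, so it stays inside the range the rare class already occupies.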

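Read encoding
Sequence bitscore matrix:

$$
\begin{array}{l|ccccc}
 & \text{gene } 1 & \text{gene } 2 & \cdots & \text{gene } j-1 & \text{gene } j \\
\hline
\text{read } 1 & 1256 & 0 & \cdots & 0 & 63 \\
\text{read } 2 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\text{read } i-1 & 0 & 512 & \cdots & 0 & 0 \\
\text{read } i & 0 & 0 & \cdots & 785 & 129
\end{array}
$$

Advantages: read length invariant, low dimensionality, uses filtering data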
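As a sketch of how such a matrix could be assembled from the filtering step's search hits: the `(read, gene, bitscore)` tuple layout, the function names, and the row-maximum normalisation are all assumptions for illustration; the slides do not specify the normalisation used. The bitscores are the example values from the matrix above.

```python
def bitscore_matrix(hits, genes):
    """Build {read_id: [bitscore per gene]} from (read_id, gene_id, bitscore) hits.

    Genes without a hit keep a score of 0; repeated hits keep the best score.
    """
    column = {gene: j for j, gene in enumerate(genes)}
    rows = {}
    for read_id, gene_id, bitscore in hits:
        row = rows.setdefault(read_id, [0.0] * len(genes))
        j = column[gene_id]
        row[j] = max(row[j], float(bitscore))
    return rows


def normalise_rows(rows):
    """Scale each read's scores by its best hit (one possible normalisation)."""
    scaled = {}
    for read_id, row in rows.items():
        top = max(row)
        scaled[read_id] = [value / top if top else 0.0 for value in row]
    return scaled


genes = ["gene_1", "gene_2", "gene_j-1", "gene_j"]
hits = [
    ("read_1", "gene_1", 1256),
    ("read_1", "gene_j", 63),
    ("read_i-1", "gene_2", 512),
    ("read_i", "gene_j-1", 785),
    ("read_i", "gene_j", 129),
]
matrix = bitscore_matrix(hits, genes)
features = normalise_rows(matrix)
```

Every read maps to a vector with one entry per reference gene, regardless of read length, and the scores are already available from the filtering search.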

Held-out test results
[Figure: Normalised Bitscore Random Forest; precision and recall shown as proportion (0.00 to 1.00).]
Family Test Performance: Mean Precision: 0.995, Mean Recall: 0.985

ARO level classification more variable
[Figure: Median Precision-Recall Within Families; precision and recall as proportion (0.00 to 1.00) against Ordered AMR Family Index (0 to 225).]

On-going work
• Soft-threshold (i.e. propagating probabilities through layers)
• Multiset labels based on sequence redundancy within families.
• Threshold identification for variant model counts.
• Metamodel rule parsing.
• Galaxy bindings (CARD/IRIDA integration).
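One possible reading of the soft-threshold item is to weight each family-level classifier's ARO probabilities by the family classifier's probability, instead of committing to a single family. The sketch below illustrates that reading only; the function name, the dictionary layout, and the probabilities are hypothetical.

```python
def propagate(family_probs, aro_probs_by_family):
    """Combine P(family | read) with P(ARO | family, read) into P(ARO | read)."""
    joint = {}
    for family, p_family in family_probs.items():
        for aro, p_aro in aro_probs_by_family.get(family, {}).items():
            joint[aro] = joint.get(aro, 0.0) + p_family * p_aro
    return joint


# Hypothetical classifier outputs for a single read.
family_probs = {"TEM beta-lactamase": 0.7, "SHV beta-lactamase": 0.3}
aro_probs_by_family = {
    "TEM beta-lactamase": {"TEM-1": 0.6, "TEM-2": 0.4},
    "SHV beta-lactamase": {"SHV-1": 1.0},
}
aro_probs = propagate(family_probs, aro_probs_by_family)
best_aro = max(aro_probs, key=aro_probs.get)
```

Under a hard threshold the SHV branch would be discarded once TEM wins at the family level; here it still contributes to the final ranking.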

Summary

Conclusions
• Direct homology searches are surprisingly poor for AMR metagenomics.
• K-mer based approaches fall flat with sequencing error, low coverage and sparse labels.
• Direct homology search results ARE useful when combined with machine learning.
• The Antibiotic Resistance Ontology provides useful structure to improve predictions.
• AMRtime: coming soon to CARD and your local government genomic epidemiology platform.

Acknowledgements

• McMaster University: Brian Alcock and Andrew McArthur
• Simon Fraser University: Fiona Brinkman
• Dalhousie University: Robert Beiko
• Funding: Donald Hill Family Fellowship, Genome Canada Grant.

Questions?

Insufficient Intrafamily Signal
[Figure: Intra-Family Shared 250mers; Number of Shared 250mers (0 to 800) per AMR Family: TEM, SHV, OCH, MIR, LEN, GES, PDC, NDM, GOB, KPC, SME, GIM, TMB, BEL, CfxA and VEB beta-lactamases.]
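A rough sketch of how shared k-mers within a family could be counted is given below. It assumes "shared" means a distinct k-mer that occurs in at least two member sequences; the slide does not define the measure, and the toy sequences use k=3 rather than 250 to stay readable.

```python
def kmer_set(seq, k):
    """Distinct overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_shared_kmers(sequences, k=250):
    """Number of distinct k-mers present in two or more of the sequences."""
    seen = set()
    shared = set()
    for seq in sequences:
        kmers = kmer_set(seq, k)
        shared |= kmers & seen  # k-mers already seen in an earlier sequence
        seen |= kmers
    return len(shared)


# Toy family of three short, hypothetical sequences.
family = ["ACGTA", "CGTAC", "TTTTT"]
n_shared = count_shared_kmers(family, k=3)
```

Because a 250-mer spans a whole read, any single difference between two family members inside that window means they share no k-mer covering it.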

Interfamily Collisions
