Hypothesis Generation for Antibiotic Resistance using Machine Learning Techniques
Nicholas Joodi, Minseung Kim, Ilias Tagkopoulos
Tagkopoulos Lab
Antibiotic Resistance
Medicines for treating infection lose effect because microbes change:
○ Mutation
○ Acquire new genetic information to develop resistance
○ Study in the United States (CDC 2013)[2]
  ■ 2 million people infected by bacteria resistant to antibiotics
  ■ 23,000 deaths
○ Overall societal costs[2]
  ■ Up to $20 billion direct
  ■ Up to $35 billion indirect
Predict the Antibiotic Resistant Genes (ARG)
○ Leverage known ARG sequences from within genomic or metagenomic sequence libraries
○ Commonly used approach: “Best Hit”
○ A machine learning approach over sequencing data
○ Improvements to the “Best Hit” approach
Graph Inference
integrated/discrepancy resolved E. coli knowledge base to predict antibiotic resistance
○ Composed of entities (nodes) and relations between entities (edges)
○ Combine the powers of two disparate approaches to predict new facts
resistance to an antibiotic
Entity Type | Node Count
gene | 4769
antibiotic | 109
cellular component | 152
biological process | 1522
molecular function | 1782
○ 5 groups
Domain | Relation Type | Range | Edge Count
gene | activates | gene | 2549
gene | is | cellular component | 4325
gene | represses | gene | 2473
gene | is involved in | biological process | 6508
gene | upregulated by antibiotic | antibiotic | 159
gene | confers resistance to antibiotic | antibiotic | 902
gene | has | molecular function | 7835
gene | targeted by | antibiotic | 31
gene | not upregulated by antibiotic | antibiotic | 338124
gene | not confers resistance to antibiotic | antibiotic | 422899
gene | not activates | gene | 48312
gene | not represses | gene | 48544
12 relation types
○ 4 negatives
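A knowledge graph of this shape is commonly stored as (subject, relation, object) triples. The following is a minimal sketch under that assumption; the gene and antibiotic names are placeholders and are not entries from the actual E. coli knowledge base.

```python
from collections import Counter

# Hypothetical triples (subject, relation, object); all names are placeholders.
triples = [
    ("geneA", "activates", "geneB"),
    ("geneB", "confers resistance to antibiotic", "antibioticX"),
    ("geneA", "represses", "geneC"),
    ("geneC", "not confers resistance to antibiotic", "antibioticX"),
]

# Count edges per relation type, the same kind of summary the relation table reports.
edge_counts = Counter(relation for _, relation, _ in triples)

# Separate negative relation types (prefixed with "not ") from positive ones.
negative_types = {r for r in edge_counts if r.startswith("not ")}
positive_types = set(edge_counts) - negative_types
```

Keeping negative facts as explicit relation types, as the table does, lets them be counted and sampled like any other edge.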
1. Score edge using PRA and ER-MLP
2. Calibrate scores
3. Majority vote using boosted decision stumps
4. Boolean prediction
relation embeddings
1. With ReLU activation
2. Dropout with ReLU activation
3. Dropout with Sigmoid activation
the confidence score
types
averaging the constituent word embeddings
entities is established after training
and object entities
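The ER-MLP fragments above (entity and relation embeddings, word-embedding averaging, a confidence score) can be illustrated with a rough forward pass. This is only a sketch: the layer sizes, the random untrained weights, and all names (`embed`, `er_mlp_score`, `geneA`, `antibioticX`) are assumptions, and dropout is omitted because it only applies during training.

```python
import math
import random

random.seed(0)
DIM, HIDDEN = 4, 3  # assumed sizes, chosen only for illustration

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

# Hypothetical, untrained word embeddings.
WORDS = ["confers", "resistance", "to", "antibiotic", "geneA", "antibioticX"]
word_emb = {w: rand_vec(DIM) for w in WORDS}

def embed(name):
    """Average the embeddings of the words that make up a name."""
    vecs = [word_emb[w] for w in name.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

W1 = [rand_vec(3 * DIM) for _ in range(HIDDEN)]  # hidden-layer weights
w2 = rand_vec(HIDDEN)                            # output-layer weights

def er_mlp_score(subject, relation, obj):
    """Concatenate the three embeddings, apply a ReLU layer, squash to (0, 1)."""
    x = embed(subject) + embed(relation) + embed(obj)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logit = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

score = er_mlp_score("geneA", "confers resistance to antibiotic", "antibioticX")
```

With untrained weights the score is meaningless; it only shows how a triple is mapped to a single confidence value.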
○ Paths are the features
○ Activates → Confers Resistance to Antibiotic
○ Activates⁻¹ → Confers Resistance to Antibiotic
○ Represses
○ Activates → Represses
○ Activates⁻¹ → Represses
○ [(1,0,0,0,0), 1], [(0,1,0,0,0), 1], [(0,0,1,1,0), 0], [(0,0,1,0,1), 0]
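The idea of using paths as features can be sketched as follows: each path type becomes one binary feature that records whether a path of that type connects a gene to an antibiotic. The toy graph, the node names, and the `^-1` suffix for inverse relations are assumptions made for this illustration, and only two of the path types are used.

```python
from collections import defaultdict

# Hypothetical toy graph; node names are placeholders.
triples = [
    ("g1", "activates", "g2"),
    ("g2", "confers resistance to antibiotic", "abx"),
    ("g2", "activates", "g5"),
]

# Index edges in both directions; "^-1" marks the inverse of a relation.
edges = defaultdict(set)
for s, r, o in triples:
    edges[(s, r)].add(o)
    edges[(o, r + "^-1")].add(s)

def path_exists(source, path, target):
    """True if following the relation sequence `path` from source reaches target."""
    frontier = {source}
    for relation in path:
        frontier = {nxt for node in frontier
                    for nxt in edges.get((node, relation), ())}
    return target in frontier

PATHS = [
    ("activates", "confers resistance to antibiotic"),
    ("activates^-1", "confers resistance to antibiotic"),
]

def features(source, target):
    """Binary feature vector with one entry per path type."""
    return tuple(int(path_exists(source, p, target)) for p in PATHS)
```

Here `features("g1", "abx")` evaluates to `(1, 0)` and `features("g5", "abx")` to `(0, 1)`; vectors like these, paired with a label, form the training examples for a per-relation classifier.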
○ Log Loss, Hinge Loss, Exponential Loss
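For reference, the three losses can be written as functions of the margin m = y · f(x), with labels y in {-1, +1}. These are the standard textbook definitions; how they were applied in this work is not stated in the slides.

```python
import math

def log_loss(margin):
    """Logistic loss: log(1 + exp(-m))."""
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    """Hinge loss: max(0, 1 - m)."""
    return max(0.0, 1.0 - margin)

def exponential_loss(margin):
    """Exponential loss: exp(-m)."""
    return math.exp(-margin)
```

All three penalize negative margins (misclassifications); they differ in how sharply the penalty grows and in whether confidently correct predictions still incur a loss.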
be superior in prediction
○ Isotonic Regression
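Isotonic regression calibrates raw scores by fitting a non-decreasing step function from score to probability. Below is a minimal sketch using the Pool Adjacent Violators algorithm; the function names and the toy data are assumptions, not the presenters' implementation.

```python
def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function with Pool Adjacent Violators."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]

def isotonic_predict(model, score):
    """Return the value of the first block whose upper bound covers the score."""
    for top, value in model:
        if score <= top:
            return value
    return model[-1][1]

# Toy data: raw scores and 0/1 labels.
model = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

On the toy data the out-of-order labels at scores 0.2 and 0.3 are pooled into one block with calibrated value 0.5, so the fitted map stays monotone.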
Decision stumps with Adaboost
○ 100 samples of each
  ■ 1 positive edge of confers resistance to antibiotic
  ■ 99 negative edges of confers resistance to antibiotic
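The stacking step can be sketched as AdaBoost over decision stumps, where each training row holds the two calibrated scores (PRA, ER-MLP) for one edge. This is a small from-scratch illustration with made-up scores, not the presenters' code or training data.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """A decision stump: one threshold test on one feature, output in {-1, +1}."""
    return polarity if row[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Up-weight the rows this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Weighted vote of the stumps, returned as a Boolean prediction."""
    return sum(a * stump_predict(row, f, t, p) for a, f, t, p in model) > 0

# Made-up calibrated scores (PRA, ER-MLP) with labels +1 / -1.
X = [(0.9, 0.8), (0.7, 0.3), (0.2, 0.6), (0.1, 0.1)]
y = [1, 1, -1, -1]
model = adaboost(X, y)
```

With a 1:99 class ratio as described above, the initial sample weights would matter a great deal in practice; the uniform initialization here is only the textbook default.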
Preliminary results show that PRA performed best overall, while the stacked model had the highest recall
antibiotic
to antibiotic relation only
○ Training on the scores produced from the other edges could provide more training data
○ Would reduce the size of the knowledge graph to include more edges in the validation set
○ Would require the use of the local closed world assumption
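Under the local closed world assumption, a candidate triple is treated as false only when the knowledge graph already lists some other object for the same subject and relation; otherwise it stays unknown. A minimal sketch, with a hypothetical function name and placeholder triples:

```python
def lcwa_label(triple, known):
    """Label a candidate triple under the local closed world assumption.

    Returns True if the triple is known, False if the same subject and
    relation already have other known objects, and None (unknown) otherwise.
    """
    subject, relation, _ = triple
    if triple in known:
        return True
    if any(s == subject and r == relation for s, r, _ in known):
        return False
    return None

# Placeholder facts for illustration only.
known = {("geneA", "confers resistance to antibiotic", "antibioticX")}
```

For example, `("geneA", "confers resistance to antibiotic", "antibioticY")` is labeled False, while any triple whose subject is an unseen `geneB` is left as None and would be excluded from training.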
ER-MLP/PRA
1. World Health Organization. Antimicrobial resistance: global report on surveillance. World Health Organization, 2014.
2. Centres for Disease Control and Prevention (US). Antibiotic resistance threats in the United States,
Services, 2013.
3. Achenbach, Joel. "CDC comes close to an all-clear on romaine lettuce as E. coli outbreak nears historic level." The Washington Post. The Washington Post Company, 16 May 2018. Web. 28 May 2018.
4. McArthur, Andrew G., and Kara K. Tsang. "Antimicrobial resistance surveillance in the genomic age." Annals of the New York Academy of Sciences 1388.1 (2017): 78-91.
5. Arango-Argoty, Gustavo, et al. "DeepARG: A deep learning approach for predicting antibiotic resistance genes from metagenomic data." Microbiome 6.1 (2018): 23.
6. Dong, X., et al. "Knowledge vault: A web-scale approach to probabilistic knowledge fusion." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data