Kexin Huang
Harvard kexinhuang@hsph.harvard.edu
Tianfan Fu
Georgia Tech tfu42@gatech.edu
Wenhao Gao
MIT whgao@mit.edu
Yue Zhao
CMU zhaoy@cmu.edu
Marinka Zitnik
Harvard marinka@hms.harvard.edu
Retrieving, curating, and processing ML-ready datasets is time-consuming and requires extensive domain expertise. Datasets are scattered across biological repositories, and there is no centralized repository covering the variety of therapeutics tasks. Many tasks are under-explored in the AI/ML community because
https://github.com/mims-harvard/TDC
Domain scientists: identify meaningful therapeutics tasks
ML scientists: design powerful ML models
Advancing algorithms for key therapeutics problems
TDC “Central Dogma”
Single-instance, Multi-instance, Generation
DrugRes, ADME, Tox, HTS, QM, Yields, Paratope, Epitope, Develop, DTI, DDI, PPI, GDA, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, Reaction, MolGen, PairMolGen, RetroSyn
3 Lines of Code
The core TDC library has minimal dependencies, so installation is hassle-free. Data loaders are simplified so that you can access ML-ready datasets in only 3 lines of code.
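As a minimal sketch of the three-line pattern, the function below wraps the loader call for one ADME dataset. It assumes the `PyTDC` package is installed (`pip install PyTDC`) and that the dataset can be downloaded on first use. The wrapper name `load_adme_split` is hypothetical and exists only so that loading the snippet does not trigger a download.

```python
def load_adme_split(name="Caco2_Wang"):
    """Return the train/valid/test split of one ADME dataset."""
    # Imported lazily so the snippet can be loaded without PyTDC present.
    from tdc.single_pred import ADME   # line 1: pick the task loader

    data = ADME(name=name)             # line 2: fetch the named dataset
    return data.get_split()            # line 3: dict of DataFrames
```

Calling `load_adme_split()` should return a dictionary keyed by `train`, `valid`, and `test`, each holding a pandas DataFrame.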
Data sources
Drug Response Prediction (DrugRes): High Response / Low Response
Drug Synergy Prediction (DrugSyn): High Response / Low Response
Paratope, Epitope, Develop, PeptideMHC, AntibodyAff, MTI
Molecule Generation
GuacaMol, MOSES, Literature
Generated Molecules -> Oracle -> Score -> Optimize
GuacaMol: Benchmarking Models for de Novo Molecular Design, J. Chem. Inf. Model., 2019
MOSES: A Benchmarking Platform for Molecular Generation Models, Frontiers in Pharmacology, 2020
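The generate, score, optimize loop on this slide can be sketched in a few lines. Everything here is a hypothetical stand-in: `toy_oracle` and `toy_propose` operate on plain strings rather than real molecules, and neither is a TDC function. In practice the oracle would be a real scoring function, such as one drawn from the benchmarks cited above.

```python
import random


def toy_oracle(candidate: str) -> float:
    """Hypothetical scorer: fraction of characters equal to 'C'."""
    return candidate.count("C") / len(candidate) if candidate else 0.0


def toy_propose(candidate: str, rng: random.Random) -> str:
    """Hypothetical generator: append one random symbol to a candidate."""
    return candidate + rng.choice("CNO")


def optimize(seeds, oracle, propose, rounds=20, keep=3, seed=0):
    """Repeatedly generate candidates, score them, and keep the best."""
    rng = random.Random(seed)
    population = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    for _ in range(rounds):
        # Generate: one new candidate per current member.
        generated = [propose(member, rng) for member in population]
        # Score: rank old and new candidates by oracle value
        # (ties broken by the string itself for reproducibility).
        pool = set(population) | set(generated)
        ranked = sorted(pool, key=lambda c: (oracle(c), c), reverse=True)
        # Optimize: carry only the top scorers into the next round.
        population = ranked[:keep]
    return population
```

Because the previous population stays in the pool, the best score never decreases from one round to the next.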
3 Lines of Code
Tasks and datasets: HTS, ADME, Drug Response, Drug Synergy, Reactions, Antibody affinity, ...
Data functions: Data Wrangling, Data Visualization, Realistic Splits, Molecule Generation Oracles, ...
Clinical Trials, CRISPR, Phenotypic Screening, Protein Contact, Crystal Structure, ...
Fill in this form: rb.gy/ytbyfl
zitniklab.hms.harvard.edu/TDC
@KexinHuang5 @marinkazitnik @TianfanFu @WenhaoGao1 @yzhao062
Website: zitniklab.hms.harvard.edu/TDC
GitHub: github.com/mims-harvard/TDC
groups.io/g/tdc