Kexin Huang Tianfan Fu Wenhao Gao Yue Zhao Marinka Zitnik - - PowerPoint PPT Presentation



SLIDE 1

Kexin Huang

Harvard kexinhuang@hsph.harvard.edu

Tianfan Fu

Georgia Tech tfu42@gatech.edu

Wenhao Gao

MIT whgao@mit.edu

Yue Zhao

CMU zhaoy@cmu.edu

Marinka Zitnik

Harvard marinka@hms.harvard.edu

SLIDE 2

Retrieving, curating, and processing ML-ready datasets is time-consuming and requires extensive domain expertise. Datasets are scattered across biological repositories, and there is no centralized repository covering the variety of therapeutics tasks. Many tasks are under-explored in the AI/ML community because of the lack of data access.

https://github.com/mims-harvard/TDC

SLIDE 3

Machine Learning Datasets for Therapeutics


  • Open-Source ML Datasets for Therapeutics:
      • Wide range of tasks: target discovery, activity screening, efficacy, safety, manufacturing
      • Wide range of products: small molecules, antibodies, vaccines, miRNA
  • Numerous Data Functions:
      • Extensive data functions and model evaluators
      • Data processing and splits, molecule generation oracles, and much more
  • 3 Lines of Code:
      • Minimum package dependency, lightweight loaders

SLIDE 4

Our Vision for TDC

  • Domain scientists: identify meaningful therapeutics tasks
  • ML scientists: design powerful ML models

Advancing algorithms for key therapeutics problems

SLIDE 5

Modular Structure of TDC

TDC “Central Dogma”

Problem categories: single-instance, multi-instance, generation

SLIDE 6

Diverse Coverage of Tasks

SLIDE 7

DrugRes, ADME, Tox, HTS, QM, Yields, Paratope, Epitope, Develop, DTI, DDI, PPI, GDA, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, Reaction, MolGen, PairMolGen, RetroSyn

SLIDE 8

3 Lines of Code

The core TDC library depends on a minimal set of packages, so installation is hassle-free. Data loaders are simplified so that you can access ML-ready datasets in only 3 lines of code.
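As a rough illustration of the three-line pattern described on this slide, the sketch below wraps the loader calls in a function so that nothing is downloaded until the function is called. It assumes the PyTDC package is installed (`pip install PyTDC`). The ADME loader and the `Caco2_Wang` dataset name are illustrative choices that do not appear on the slide, and `load_adme_split` is a hypothetical helper name.

```python
def load_adme_split(name="Caco2_Wang"):
    """Fetch an ADME dataset through TDC and return its default data split."""
    # Line 1: import a loader for one task (assumption: PyTDC is installed).
    from tdc.single_pred import ADME
    # Line 2: instantiate the loader; the dataset is downloaded on first use.
    data = ADME(name=name)
    # Line 3: request the split into train, validation, and test partitions.
    return data.get_split()
```

Calling `load_adme_split()` needs network access the first time it runs. The three commented lines inside the function are the "3 lines of code"; the wrapper exists only to keep the snippet free of side effects on import.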

SLIDE 9

Highlight:


Data sources

SLIDE 10

Highlight:

Figure panels: Drug Response Prediction (DrugRes) and Drug Synergy Prediction (DrugSyn), each contrasting high response with low response.

SLIDE 11

Highlight: 10 Biologics Datasets

Paratope, Epitope, Develop, PeptideMHC, AntibodyAff, MTI

SLIDE 12

Data Functions to Support your Research

SLIDE 13

Molecule Generation Oracles

Oracle sources: GuacaMol, MOSES, literature

Molecule generation loop: generated molecules → oracle → score → optimize

GuacaMol: Benchmarking Models for de Novo Molecular Design, J. Chem. Inf. Model., 2019
MOSES: A Benchmarking Platform for Molecular Generation Models, Frontiers in Pharmacology, 2020

3 Lines of Code
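The generate, score, and optimize loop on this slide can be sketched with a toy example. This is not TDC's oracle implementation: `toy_oracle`, `mutate`, and `optimize` are hypothetical names, the "molecules" are plain character strings rather than chemically valid structures, and the scoring rule is invented purely to make the loop runnable. The only idea carried over from the slide is that an oracle maps a generated molecule to a score that an optimizer then tries to improve.

```python
import random


def toy_oracle(molecule: str) -> float:
    """Hypothetical stand-in oracle: fraction of characters equal to 'C'."""
    return molecule.count("C") / len(molecule) if molecule else 0.0


def mutate(molecule: str, rng: random.Random, alphabet: str = "CNO") -> str:
    """Toy generator step: replace one randomly chosen character."""
    i = rng.randrange(len(molecule))
    return molecule[:i] + rng.choice(alphabet) + molecule[i + 1:]


def optimize(start: str, oracle, steps: int = 200, seed: int = 0) -> str:
    """Greedy hill climbing: keep a candidate only if its score does not drop."""
    rng = random.Random(seed)
    best, best_score = start, oracle(start)
    for _ in range(steps):
        candidate = mutate(best, rng)      # generate
        score = oracle(candidate)          # score with the oracle
        if score >= best_score:            # optimize: accept non-worsening moves
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    start = "NNOONO"
    result = optimize(start, toy_oracle)
    print(start, toy_oracle(start), "->", result, toy_oracle(result))
```

Because candidates are accepted only when the score does not decrease, the final score is never lower than the starting score. Swapping `toy_oracle` for a real scoring function would leave the loop itself unchanged.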

SLIDE 14

You Are Invited to Join TDC! TDC is an Open-Source, Community Effort

Contribute

  • Tasks: Clinical Trials, CRISPR, Phenotypic Screening, Protein Contact, Crystal Structure, …
  • Datasets: HTS, ADME, Drug Response, Drug Synergy, Reactions, Antibody Affinity, …
  • Data Functions: Data Wrangling, Data Visualization, Realistic Splits, Molecule Generation Oracles, …

Fill in this form: rb.gy/ytbyfl

SLIDE 15

zitniklab.hms.harvard.edu/TDC

SLIDE 16

Kexin Huang

Harvard kexinhuang@hsph.harvard.edu

Tianfan Fu

Georgia Tech tfu42@gatech.edu

Wenhao Gao

MIT whgao@mit.edu

Yue Zhao

CMU zhaoy@cmu.edu

Marinka Zitnik

Harvard marinka@hms.harvard.edu

@KexinHuang5 @marinkazitnik @TianfanFu @WenhaoGao1 @yzhao062

Website: zitniklab.hms.harvard.edu/TDC
GitHub: github.com/mims-harvard/TDC
groups.io/g/tdc