High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud


SLIDE 1

High-Throughput Virtual Molecular Docking:
Hadoop Implementation of AutoDock4 on a Private Cloud

Sally R. Ellingson
Graduate Research Assistant
Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT
Scalable Computing and Leading Edge Innovative Technologies (IGERT)

Dr. Jerome Baudry
PhD Advisor
Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT

The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing
June 8, 2011, San Jose, CA

SLIDE 2

Ultimate Goal: Reduce the time and cost of discovering novel drugs

SLIDE 3

1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screenings (VHTS)

2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)

3. AutoDockCloud
a) Current Implementation
b) Future Implementations

SLIDE 4

Virtual Molecular Docking

Given a receptor (protein) and a ligand (small molecule), predict:

1. Bound conformations: a search algorithm explores conformational space
2. Binding affinity: a force field evaluates the energetics

SLIDE 5

Virtual Docking Engine

http://autodock.scripps.edu/wiki/AutoDock4

SLIDE 6

Novel Drug Discovery

Human HDAC4

HA3 crystal structure ZINC03962325

SLIDE 7

Virtual High-Throughput Screening (VHTS)

SLIDE 8

VHTS with AutoDock4

SLIDE 9

Potential advantages of Cloud Computing for VHTS

Affordable access to compute resources (especially for small labs and classrooms).

Easy-to-use interface accessible through the web for non-computer experts.

Software maintained by experts.

Resources that scale with the size of the screening.

SLIDE 10

Kandinsky

Private Cloud Platform at ORNL

Kandinsky, the Systems Biology Knowledgebase Computer
Sponsored by the Office of Biological and Environmental Research in the DOE Office of Science
68 nodes × 16 cores/node = 1088 cores
20 Gbps InfiniBand interconnect
Designed to support Hadoop applications and gain an understanding of the MapReduce paradigm.

57 nodes for MapReduce tasks
1 tasktracker per node
10 map and 6 reduce tasks per node (16 tasks per node)
570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
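The capacity figures on this slide follow directly from the per-node settings. The short Python check below reproduces them; the variable names are illustrative and only the numbers come from the slide.

```python
# Figures from the slide; the variable names are illustrative.
TOTAL_NODES = 68
CORES_PER_NODE = 16
MAPREDUCE_NODES = 57
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 6

total_cores = TOTAL_NODES * CORES_PER_NODE
map_slots = MAPREDUCE_NODES * MAP_SLOTS_PER_NODE
reduce_slots = MAPREDUCE_NODES * REDUCE_SLOTS_PER_NODE
slots_per_node = MAP_SLOTS_PER_NODE + REDUCE_SLOTS_PER_NODE

print(total_cores, map_slots, reduce_slots, slots_per_node)
```

Note that the 16 task slots per node match the 16 cores per node, so each task can have a core to itself.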

SLIDE 11

Hadoop

Scalable
Economical
Efficient
Reliable

http://hadoop.apache.org/common/docs/current/api/overview-summary.html

SLIDE 12

MapReduce

programming paradigm used by Hadoop

people.apache.org

SLIDE 13

Current AutoDockCloud Implementation

input = file names needed for each docking

map(input) {
    copy input to local working directory;
    run AutoDock4 locally;
    copy result file to HDFS;
}

* Pre-docking set-up and post-docking analysis are currently done manually.
* No reduce function is currently being used.
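One way to picture this map-only job is as a Hadoop Streaming style mapper written in Python. This is a minimal sketch under stated assumptions, not the AutoDockCloud code: the slides do not say how the map function was implemented, the HDFS directories are hypothetical, and each input line is assumed to list the files for one docking with the docking parameter file (.dpf) first. The `hadoop fs -get`/`-put` and `autodock4 -p ... -l ...` command lines are the standard ones for those tools.

```python
import subprocess


def docking_commands(file_names, hdfs_in, hdfs_out):
    """Build the commands for one docking job.

    Assumption: file_names lists every input the job needs, and the
    first entry is the AutoDock4 docking parameter file (.dpf).
    """
    dpf = file_names[0]
    dlg = dpf.rsplit(".", 1)[0] + ".dlg"  # AutoDock4 docking log (result file)
    # 1. Copy each input file from HDFS to the local working directory.
    commands = [["hadoop", "fs", "-get", f"{hdfs_in}/{name}", "."]
                for name in file_names]
    # 2. Run AutoDock4 locally.
    commands.append(["autodock4", "-p", dpf, "-l", dlg])
    # 3. Copy the result file back to HDFS.
    commands.append(["hadoop", "fs", "-put", dlg, f"{hdfs_out}/{dlg}"])
    return commands


def map_dockings(lines, hdfs_in, hdfs_out, run=subprocess.check_call):
    """Map step: one docking per non-empty input line; yields finished .dpf names."""
    for line in lines:
        file_names = line.split()
        if not file_names:
            continue
        for command in docking_commands(file_names, hdfs_in, hdfs_out):
            run(command)
        yield file_names[0]
```

In a streaming job the mapper would iterate over `sys.stdin`, for example `for done in map_dockings(sys.stdin, "/docking/in", "/docking/out"): print(done)`, where both paths are placeholders. Passing a different `run` callable (such as a list's `append`) gives a dry run that records the commands without executing them.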

SLIDE 14

Current AutoDockCloud Implementation

ER agonist screening from DUD as benchmark

450-fold speed-up with 570 available map slots on Kandinsky, the private cloud at ORNL
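Reading the reported figure as a speed-up over a single serial run with all 570 map slots in use (an assumption; the slide does not state the baseline), the implied parallel efficiency can be worked out as follows.

```python
# Both numbers are from the slide; the serial baseline is an assumption.
speedup = 450
map_slots = 570

# Parallel efficiency = achieved speed-up / number of parallel workers.
efficiency = speedup / map_slots

print(f"{efficiency:.1%}")  # prints 78.9%
```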

SLIDE 15

Current AutoDockCloud Implementation

Docking enrichment plot for ER agonist using AutoDockCloud and DUD.

[Figure: percent of known ligands found versus percent of ranked database]
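An enrichment curve of this kind can be computed from a ranked screening result. The Python sketch below is illustrative only: the function name and the `(energy, is_known_ligand)` input layout are assumptions, and it ranks by ascending energy because AutoDock reports estimated binding free energies, where lower values indicate better predicted binding.

```python
def enrichment_curve(results):
    """Return (percent of ranked database, percent of known ligands found) points.

    results: iterable of (energy, is_known_ligand) pairs, one per docked
    compound (hypothetical layout). Lower energy ranks first.
    """
    ranked = sorted(results, key=lambda pair: pair[0])
    total = len(ranked)
    n_known = sum(1 for _, is_known in ranked if is_known)
    if total == 0 or n_known == 0:
        raise ValueError("need at least one compound and one known ligand")

    points = [(0.0, 0.0)]
    found = 0
    for position, (_, is_known) in enumerate(ranked, start=1):
        if is_known:
            found += 1
        points.append((100.0 * position / total, 100.0 * found / n_known))
    return points


# Made-up energies for four compounds, two of them known ligands.
demo = [(-9.1, True), (-5.0, False), (-7.5, True), (-8.0, False)]
curve = enrichment_curve(demo)
```

A curve that rises faster than the diagonal means known ligands are concentrated near the top of the ranking, which is what a useful screen should produce.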

SLIDE 16

Future AutoDockCloud Implementation

input = ligand file from chemical compound database

map(input) {
    create pdbqt (AutoDock input file) from input;
    run AutoDock4 locally;
    find best scoring ligand structure;
    save structure to HDFS;
    return <score, ligand>;
}

reduce(<score, ligand>) {
    sort;
    return ranked_database;
}

* Pre-docking and post-docking will be automated and distributed.
* Lower total I/O requirements.
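The planned map and reduce steps can be sketched in plain Python to show the data flow. This is an in-memory illustration under assumptions, not a Hadoop job: `dock` is a hypothetical stand-in for "create pdbqt, run AutoDock4, collect pose energies", the ligand identifiers are made up, and the step that saves the best structure to HDFS is omitted.

```python
from typing import Callable, Iterable, List, Tuple

ScoredLigand = Tuple[float, str]  # (best energy, ligand identifier)


def map_ligand(ligand_id: str, dock: Callable[[str], List[float]]) -> ScoredLigand:
    """Map step: dock one ligand and keep its best (lowest) energy."""
    energies = dock(ligand_id)  # one energy per docked pose (hypothetical helper)
    return (min(energies), ligand_id)


def reduce_ranked(pairs: Iterable[ScoredLigand]) -> List[ScoredLigand]:
    """Reduce step: sort all <score, ligand> pairs into a ranked database."""
    return sorted(pairs)


# Stand-in docking results: made-up ligand names and energies.
fake_energies = {
    "ligand_A": [-6.2, -7.9],
    "ligand_B": [-9.4],
    "ligand_C": [-5.0, -4.1],
}
ranked_database = reduce_ranked(
    map_ligand(ligand, fake_energies.__getitem__) for ligand in fake_energies
)
print(ranked_database)
```

Because each mapper returns only a small `<score, ligand>` pair rather than a full result file, far less data moves between tasks, which is consistent with the lower I/O requirement noted on the slide.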

SLIDE 17

Future Plans

Incorporate additional docking engines

– AutoDock Vina
    Less I/O
    More efficient and accurate algorithm
    No charge information needed

Deploy on Commercial Cloud (EC2)
Develop web interface

SLIDE 18

1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screenings (VHTS)

2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)

3. AutoDockCloud
a) Current Implementation
b) Future Implementations

SLIDE 19

Questions/Comments

Acknowledgements

Dr. Jerome Baudry (advisor)
Center for Molecular Biophysics, UT/ORNL
Genome Science and Technology, UT
Scalable Computing and Leading Edge Innovative Technologies (IGERT)

Avinash Kewalramani, ORNL
ECMLS and HPDC organizers and participants