How is python used in biomolecular sciences? Antonia Mey




SLIDE 1

Europython 2018 — Edinburgh 25/07/2018 Antonia Mey antonia.mey@ed.ac.uk @ppxasjsm

How is python used in biomolecular sciences?


  • L. Hedges
  • C. Woods
SLIDE 2

What is computational biosomething?

X-ray structure of a protein: a set of x, y, z spatial coordinates of atoms. Length scale: 42 × 10⁻¹⁰ m (42 Å).
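To make "a set of x, y, z coordinates" concrete, here is a minimal sketch that reads atom positions from PDB-style ATOM records using the fixed column layout of that format. The three records and their coordinates are made up for illustration, and `read_coordinates` and `extent` are hypothetical helper names, not part of any library mentioned in the talk.

```python
# Three ATOM records in PDB fixed-column format (made-up coordinates, in angstrom).
PDB_TEXT = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147
ATOM      3  C   ALA A   1      10.555   6.345  -4.109
"""


def read_coordinates(pdb_text):
    """Return a list of (atom_name, (x, y, z)) tuples, coordinates in angstrom."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()           # columns 13-16: atom name
            xyz = (
                float(line[30:38]),              # columns 31-38: x
                float(line[38:46]),              # columns 39-46: y
                float(line[46:54]),              # columns 47-54: z
            )
            atoms.append((name, xyz))
    return atoms


def extent(atoms):
    """Largest max-minus-min span over the x, y and z axes, in angstrom."""
    spans = []
    for axis in range(3):
        values = [xyz[axis] for _, xyz in atoms]
        spans.append(max(values) - min(values))
    return max(spans)


atoms = read_coordinates(PDB_TEXT)
for name, (x, y, z) in atoms:
    print(f"{name:>3s}  x={x:8.3f}  y={y:8.3f}  z={z:8.3f}")
print(f"largest extent: {extent(atoms):.3f} angstrom")
```

For a whole protein the same kind of extent calculation gives a size on the order of the tens of angstrom quoted on the slide.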

SLIDE 3

Typical questions tackled with MD

  • How well do small molecules inhibit a protein’s functionality? (many potential compounds …)
  • How do proteins fold?
  • How do proteins interact with each other?
SLIDE 4

Timescales in biology

Timescales of processes

Techniques shown: NMR, Molecular Dynamics, other experimental techniques (e.g. cryo-EM)

SLIDE 5

Generate timeseries to compare to experiments

Figure: time trace of an angle over 200 ns of protein dynamics (label: F113)
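A time trace of an angle can be sketched as follows: compute a dihedral angle from four atom positions in every saved frame and pair it with the frame time. This is a pure-Python illustration on a synthetic trajectory. The frame spacing `DT_NS`, the `synthetic_frame` generator and the oscillating angle are assumptions made up for the example, not data from the talk. For real trajectories, analysis libraries such as MDTraj or MDAnalysis offer this kind of calculation.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def synthetic_frame(phi_deg):
    """Four points whose dihedral angle is phi_deg by construction."""
    phi = math.radians(phi_deg)
    return (
        (1.0, 0.0, 0.0),
        (0.0, 0.0, 0.0),
        (0.0, 0.0, 1.0),
        (math.cos(phi), math.sin(phi), 1.0),
    )


DT_NS = 1.0  # assumed time between saved frames, in nanoseconds

# 200 synthetic frames: an angle oscillating around 60 degrees.
trajectory = [synthetic_frame(60.0 + 20.0 * math.sin(0.1 * i)) for i in range(200)]
time_trace = [(i * DT_NS, dihedral(*frame)) for i, frame in enumerate(trajectory)]

for t, angle in time_trace[:5]:
    print(f"t = {t:5.1f} ns   angle = {angle:7.2f} deg")
```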

SLIDE 6

What is MD?

Figure: [image] + box + integrator + [image]

Physics engine, usually written in C++, that integrates Newton's equations of motion using leapfrog-type algorithms.

$\langle A \rangle_{\text{ensemble}} = \langle A \rangle_{\text{time}}$
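As a rough illustration of what such an integrator does, the sketch below applies a leapfrog scheme (written in its kick-drift-kick form) to a single particle on a harmonic spring and checks that the total energy stays nearly constant. The spring constant, mass, time step and function names are assumptions chosen for the example. Production MD engines do the same kind of update for many atoms with far more elaborate force fields.

```python
def leapfrog(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns a list of (x, v), one per step."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick with the old force
        x = x + dt * v_half                # drift with the half-step velocity
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # half kick with the new force
        trajectory.append((x, v))
    return trajectory


K = 1.0      # assumed spring constant (arbitrary units)
MASS = 1.0   # assumed particle mass (arbitrary units)


def spring_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x


trajectory = leapfrog(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000)
energies = [total_energy(x, v) for x, v in trajectory]
energy_spread = max(energies) - min(energies)

# A time average of an observable A = x^2 along the trajectory.
time_average_x2 = sum(x * x for x, _ in trajectory) / len(trajectory)

print(f"energy spread over the run: {energy_spread:.2e}")
print(f"time average of x^2: {time_average_x2:.3f}")
```

The last quantity is a time average in the sense of the equation above; equating it with an ensemble average only holds when the trajectory samples the relevant states sufficiently.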

SLIDE 7

Typical MD workflow

Stages: Download pdb → Prep for simulation → Simulation setup → Run simulation protocol → Time-series analysis

  • Download pdb: pdb api
  • Prep for simulation and simulation setup: Schrödinger suite, Cresset suite (commercial), RDkit, OpenMM tools, Amber tools, pdb2gmx, […]
  • Run simulation protocol: Gromacs, Amber, NAMD, OpenMM, SOMD, DLPoly, Charmm
  • Time-series analysis: PyEMMA, MDTraj, MDAnalysis, MMTools, md-analysis-tools, […]
  • Interfaces: Python API; Python […]; mostly C++ command line; TCL, bash, python, Perl … get creative; GUI, TCL, bash, python, Perl … get creative

SLIDE 8

Typical file formats

  • coordinates: .pdb .crd .gro .xyz .mol2
  • trajectories: .dcd .xtc .trr
  • forcefields: .psf .parm7 .itp

And of course all MD programs can read all these file formats? — No
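The grouping on this slide can be written down as a small lookup table. The sketch below sorts file names into the three categories by extension. `FORMATS` and `classify` are hypothetical names for illustration, and the table contains only the extensions listed on the slide.

```python
from pathlib import Path

# Extension groups exactly as listed on the slide.
FORMATS = {
    "coordinates": {".pdb", ".crd", ".gro", ".xyz", ".mol2"},
    "trajectories": {".dcd", ".xtc", ".trr"},
    "forcefields": {".psf", ".parm7", ".itp"},
}


def classify(filename):
    """Return the slide's category for a file name, or 'unknown'."""
    suffix = Path(filename).suffix.lower()
    for kind, suffixes in FORMATS.items():
        if suffix in suffixes:
            return kind
    return "unknown"


for name in ("dAla.gro", "ala.parm7", "run.xtc", "notes.txt"):
    print(f"{name:10s} -> {classify(name)}")
```

Knowing the category is the easy part; whether a given MD program can actually read the file is a separate question, as the slide points out.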

SLIDE 9

Scenario: Simulate .gro file with Amber

I have a coordinate .gro (Gromacs) file that I want to simulate with Amber. How do I do this?

  • visualise coordinates
  • convert to pdb format
  • use a setup tool
  • run simulation
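To show what the "convert to pdb format" step involves, here is a minimal pure-Python sketch of a .gro to .pdb conversion. It relies on the fixed-column layout of .gro atom lines and on the fact that .gro coordinates are in nanometres while PDB coordinates are in angstrom. The sample file content, the chain identifier "A" and the function name are assumptions for illustration. In practice an existing tool would do this, and a real converter would also handle elements, box information and velocities.

```python
# A tiny .gro file: title line, atom count, atom lines, box line (made-up values).
GRO_TEXT = """\
Three atoms of an alanine residue
    3
    1ALA      N    1   1.110   0.613  -0.650
    1ALA     CA    2   1.164   0.607  -0.515
    1ALA      C    3   1.056   0.635  -0.411
   3.00000   3.00000   3.00000
"""

NM_TO_ANGSTROM = 10.0


def gro_to_pdb_lines(gro_text, chain="A"):
    """Convert .gro text to a list of PDB ATOM lines (simplified)."""
    lines = gro_text.splitlines()
    n_atoms = int(lines[1])
    pdb_lines = []
    for line in lines[2:2 + n_atoms]:
        resnum = int(line[0:5])           # residue number
        resname = line[5:10].strip()      # residue name
        atomname = line[10:15].strip()    # atom name
        serial = int(line[15:20])         # atom number
        # Positions: three 8-character fields in nm, converted to angstrom.
        x, y, z = (float(line[i:i + 8]) * NM_TO_ANGSTROM for i in (20, 28, 36))
        # Atom-name placement is simplified here (centred in its 4 columns).
        pdb_lines.append(
            f"ATOM  {serial:5d} {atomname:^4} {resname:>3} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}"
        )
    pdb_lines.append("END")
    return pdb_lines


pdb_lines = gro_to_pdb_lines(GRO_TEXT)
print("\n".join(pdb_lines))
```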

SLIDE 10

Scenario: Simulate .gro file with Amber

Step: visualise coordinates

ppxasjsm::azuma { ~/Documents/}-> vmd dAla.gro

or use one of the many other tools…
SLIDE 11

Scenario: Simulate .gro file with Amber

or … or …
SLIDE 12

Scenario: Simulate .gro file with Amber

Step: use a setup tool

ppxasjsm::azuma { ~/Documents/}-> FESetup setup.in

Pretty much all the arrows are bash scripts with input files.

Diagram taken from the AMBER tutorial.

SLIDE 13

Scenario: Simulate .gro file with Amber

Step: use a setup tool

ppxasjsm::azuma { ~/Documents/}-> FESetup setup.in

[globals]
forcefield = amber, ff99SBildn, tip3p

[protein]
basedir = protein
file.name = protein.pdb
molecules = ala
box.type = rectangular
box.length = 10.0
align_axes = True
neutralize = yes
min.nsteps = 1000
min.restr_force = 10.0
min.restraint = notsolvent
md.heat.nsteps = 1000
md.heat.restr_force = 10.0
md.heat.restraint = notsolvent
md.constT.nsteps = 1000
md.constT.restr_force = 10.0
md.constT.restraint = notsolvent
md.press.T = 298.0
md.press.nsteps = 50000
md.press.p = 1.0
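The input file has an INI-like layout of sections and key = value pairs. As a sketch of how such a file can be inspected from Python, the example below parses an excerpt with the standard library `configparser`. This only illustrates the structure: FESetup has its own reader, whose behaviour may differ, and the variable names are made up for the example.

```python
import configparser

# Excerpt of the setup.in content shown on the slide.
SETUP_IN = """\
[globals]
forcefield = amber, ff99SBildn, tip3p

[protein]
basedir = protein
file.name = protein.pdb
molecules = ala
box.type = rectangular
box.length = 10.0
neutralize = yes
min.nsteps = 1000
md.press.T = 298.0
md.press.nsteps = 50000
md.press.p = 1.0
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the case of keys such as md.press.T
parser.read_string(SETUP_IN)

protein = parser["protein"]
box_length = protein.getfloat("box.length")
neutralize = protein.getboolean("neutralize")
n_steps = protein.getint("md.press.nsteps")
forcefield = [item.strip() for item in parser["globals"]["forcefield"].split(",")]

print(f"force field: {forcefield}")
print(f"box length: {box_length}, neutralize: {neutralize}, pressure steps: {n_steps}")
```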

SLIDE 14

Scenario: Simulate .gro file with Amber

SLIDE 15

Scenario: Simulate .gro file with Amber

Step: run simulation

pmemd -O -i ala.in -p ala.parm7 -c ala.rst7 -o myout.mdout -x myout.nc -e myout.mdene -r myout.rst7

And another command line tool…
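Scripted workflows typically end up assembling command lines like this one programmatically. The sketch below builds the same `pmemd` invocation as an argument list in Python. The helper name and its two prefix parameters are hypothetical, and the command is only printed, since running it requires an Amber installation and the input files.

```python
import shlex


def pmemd_command(prefix, out_prefix):
    """Build the pmemd argument list shown on the slide for given file prefixes."""
    return [
        "pmemd", "-O",                    # -O: overwrite existing output files
        "-i", f"{prefix}.in",             # simulation control input
        "-p", f"{prefix}.parm7",          # topology and parameters
        "-c", f"{prefix}.rst7",           # starting coordinates
        "-o", f"{out_prefix}.mdout",      # log output
        "-x", f"{out_prefix}.nc",         # trajectory
        "-e", f"{out_prefix}.mdene",      # energies
        "-r", f"{out_prefix}.rst7",       # restart file
    ]


command = pmemd_command("ala", "myout")
print(shlex.join(command))

# To actually run it (needs pmemd on PATH and the input files present):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems when file names contain spaces.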

SLIDE 16

Zoo of applications leads to hacky workflows

  • Most tools have grown organically from academic software with poor software practices.
  • For tools to work with each other in complicated workflows, academic users write a lot of hacky bash scripts.
  • Users need to be experts in many different software packages with command line interfaces in order to interlink them.
  • Many tools can do similar things, and there may be no obvious solution for a given problem (Google trap: try suggestions until one works).
  • => Loss of focus on the science
SLIDE 17

The same example as before, with BioSimSpace

SLIDE 18

Cloud server demo — using BioSimSpace


http://130.61.69.221/

docker run chryswoods/biosimspace

http://130.61.69.221/hub/tmplogin

SLIDE 19

BioSimSpace phase I

Stages: Download pdb → Prep for simulation → Simulation setup → Run simulation protocol → Time-series analysis

  • Tools: manual (some manual prep), FESetup, Sire/SOMD, Sire/analyse_freenrg (based on pymbar), freenrgworkflows
  • Commercially available software
  • Websites: ccpbiosim.org, siremol.org
  • BioSimSpace: L. Hedges, C. Woods

SLIDE 20

API overview

Modules: MD, IO, Gateway, Process, Protocol, Trajectory

Core: Sire — Molecular library in C++
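A rough sketch of how the IO, Protocol and Process modules might fit together is given below. It assumes an installed BioSimSpace package whose API matches later public releases (`BSS.IO.readMolecules`, `BSS.Protocol.Minimisation`, `BSS.Process.Amber`). The alpha version shown in the talk may have differed, and the wrapper function `minimise_with_amber` is a hypothetical name. The import is placed inside the function so that the definition itself does not require BioSimSpace.

```python
def minimise_with_amber(files, steps=1000):
    """Read a molecular system, minimise it with Amber, return the result.

    Hedged sketch: assumes BioSimSpace and an Amber installation are available.
    `files` is a list of input file names in formats BioSimSpace can read.
    """
    import BioSimSpace as BSS  # lazy import; needs a BioSimSpace installation

    system = BSS.IO.readMolecules(files)                 # IO: read the input files
    protocol = BSS.Protocol.Minimisation(steps=steps)    # Protocol: what to run
    process = BSS.Process.Amber(system, protocol)        # Process: which engine runs it
    process.start()
    process.wait()                                       # block until the engine finishes
    return process.getSystem()
```

The point of the design is that the protocol is described once and the engine is chosen separately, so the same few lines could target a different MD package by swapping the Process class.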

SLIDE 21

What is BioSimSpace? Summary

BioSimSpace: interoperable tool for biomolecular simulations

  • System setup: protein, protein + ligand, ligand in water, ligand in organic solvent, … Software: pdb2gmx, tleap, FESetup, …
  • Trajectory generation: MD, MC, enhanced sampling, … Software: Amber, Gromacs, OpenMM, HTMD, Plumed, …
  • Simulation analysis: MSM, perturbation map, reweighting, … Software: pyemma, gmx wham, pymbar, …

https://github.com/michellab/BioSimSpace

SLIDE 22

Conclusions

BioSimSpace is:

  • A Python API to write workflow components for biomolecular simulations
  • A way to focus on science and not software: no need to become an expert in different MD packages, setup or analysis tools
  • Easy to use in the cloud, with scalable computing resources and a future academic fool-proof pricing model
  • Planned to support the Knime and CWL workflow managers

BioSimSpace is not:

  • A workflow engine
  • A top-down approach that tries to reinvent the wheel again
  • A ‘finished’ piece of software: it is very much in an alpha development stage, with a large list of features and capabilities to be added in the future

SLIDE 23

Questions
