PyEMMA Package Overview and Software Development
Martin K. Scherer, Free University Berlin, February 17, 2019

Outline
◮ Software overview and design patterns: Python; Anaconda stack; package overview; coordinates package; MSM package
◮ PyEMMA development: principles; processes; GitHub; continuous integration services; collaboration

Python in Data Science
◮ Easy to use core libraries (e.g. NumPy, SciPy, Pandas, Jupyter, Matplotlib, ...)
◮ Scientific software for MD, data science, biology, chemistry, ...
◮ Easy to learn general purpose language
◮ Quick prototyping
◮ Glue together software written in faster languages (e.g. C/C++, Fortran)

Anaconda Cloud and Conda package manager
◮ Anaconda is a (Python-based) software stack built for all three major platforms (Linux, OSX, Windows)
◮ Easy installation and upgrading; no need to compile anything yourself
◮ Different software channels for different purposes (e.g. Omnia [MD], BioConda [Bioinformatics], ...)
◮ Automatic handling of dependencies (conflict checking)
◮ Possibility to create isolated work environments (separate package versions etc.)

From MD data to Knowledge
Figure: PyEMMA workflow from MD data to knowledge; bracketed numbers are the reference labels shown on the slide.
◮ Subpackage pyemma.coordinates (MD data → discrete trajs): featurization (feature selection) ➜ [01]; dimension reduction (TICA, VAMP) ➜ [02]; discretization (k-means, regspace) ➜ [02]
◮ Subpackage pyemma.msm, MSM estimation & validation (discrete trajs → Markov model): maximum likelihood (ML) MSM, Bayesian MSM, ML hidden MSM, Bayesian hidden MSM; implied timescales convergence, Chapman-Kolmogorov test, identifying common problems ➜ [03], [04], [07], [08]
◮ MSM analysis (Markov model → knowledge): spectral analysis, metastable states with PCCA++, stationary properties, kinetic properties, TPT, uncertainty estimation, experimental observables ➜ [04], [05], [06]

Package hierarchy - abstracting detailedness
Figure: software layers, ordered from high-level functionality and user-friendliness (top) to implementation detail (bottom).
◮ User interface, high-level API (abstract): PyEMMA with the subpackages coordinates, msm, thermo, plots
◮ Implementation (detailed): MDTraj, MSMTools, BHMM, Thermotools, Matplotlib
◮ Implementation (very detailed): C/C++ extensions, Fortran extensions, NumPy, SciPy

Principles of coordinates package
◮ Streaming data pattern
◮ Avoid the need of dumping intermediate results to disk
◮ Support for multiple data formats
◮ Random access possible (either simulated or IO efficient)
Figure: Workflow: state space discretisation. MD data → featurization (feature selection) ➜ [01] → dimension reduction (TICA, VAMP) ➜ [02] → discretization (k-means, regspace) ➜ [02] → discrete trajs.
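The streaming data pattern can be illustrated with plain Python generators. This is a minimal sketch of the idea only, not PyEMMA's implementation: the names read_chunks, center and streaming_mean are hypothetical, and an in-memory list stands in for a trajectory file.

```python
from typing import Iterable, Iterator, List

Frame = List[float]


def read_chunks(frames: List[Frame], chunksize: int) -> Iterator[List[Frame]]:
    """Yield consecutive chunks of at most `chunksize` frames (stand-in for a file reader)."""
    for start in range(0, len(frames), chunksize):
        yield frames[start:start + chunksize]


def streaming_mean(chunks: Iterable[List[Frame]]) -> Frame:
    """Accumulate a per-coordinate mean while holding only one chunk at a time."""
    total: Frame = []
    count = 0
    for chunk in chunks:
        for frame in chunk:
            if not total:
                total = [0.0] * len(frame)
            total = [t + x for t, x in zip(total, frame)]
            count += 1
    if count == 0:
        raise ValueError("no frames")
    return [t / count for t in total]


def center(chunks: Iterable[List[Frame]], mean: Frame) -> Iterator[List[Frame]]:
    """Transform each chunk as it arrives; no intermediate result is written to disk."""
    for chunk in chunks:
        yield [[x - m for x, m in zip(frame, mean)] for frame in chunk]


# Toy data: five frames with two coordinates each.
data = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0], [3.0, 8.0], [4.0, 10.0]]

# First pass: estimate the mean chunk by chunk.
mean = streaming_mean(read_chunks(data, chunksize=2))

# Second pass: chain reader and transformer lazily, then collect the frames.
centered = [f for chunk in center(read_chunks(data, chunksize=2), mean) for f in chunk]
```

Because every stage is a generator, only one chunk per stage is in memory at any time, which is the property the bullet points above describe.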

Readers / Data sources
◮ All readers are Python iterables, which means you can process data in chunks. The more general concept in PyEMMA is called DataSource.

    import pyemma

    my_source = pyemma.coordinates.source(['traj001.xtc', ...])
    for element in my_source:
        print(element)

Supported reader data formats:
◮ MD simulation data (XTC, DCD, ... via MDTraj)
◮ NumPy (.npy) files
◮ Tabulated ASCII data (around three times more efficient than numpy.loadtxt)
◮ Fragmented trajectories: [('sim_0_part0.xtc', 'sim_0_part1.xtc'), ('sim_1_part0.xtc', 'sim_1_part1.xtc')]
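How a fragmented trajectory list might be consumed in chunks can be sketched in pure Python. This is an illustration under assumptions, not PyEMMA's reader: iter_fragmented and the load callback are hypothetical names, and a dictionary of integer lists stands in for the trajectory files.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, Union

Fragments = Union[str, Tuple[str, ...]]


def iter_fragmented(
    entries: Sequence[Fragments],
    load: Callable[[str], List[int]],  # hypothetical loader: file name -> frames
    chunksize: int,
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (trajectory index, chunk); fragments inside a tuple form one trajectory."""
    for itraj, entry in enumerate(entries):
        parts = (entry,) if isinstance(entry, str) else entry
        buffer: List[int] = []
        for name in parts:
            buffer.extend(load(name))
            # Emit full chunks; leftovers carry over into the next fragment.
            while len(buffer) >= chunksize:
                yield itraj, buffer[:chunksize]
                buffer = buffer[chunksize:]
        if buffer:
            yield itraj, buffer


# Stand-in "files": each maps to a short list of frame ids.
fake_files = {
    "sim_0_part0.xtc": [0, 1, 2],
    "sim_0_part1.xtc": [3, 4],
    "sim_1_part0.xtc": [10, 11],
    "sim_1_part1.xtc": [12],
}
entries = [
    ("sim_0_part0.xtc", "sim_0_part1.xtc"),
    ("sim_1_part0.xtc", "sim_1_part1.xtc"),
]
chunks = list(iter_fragmented(entries, fake_files.__getitem__, chunksize=2))
```

In this toy run the second chunk of trajectory 0 contains frames from both fragments, so downstream code never sees the fragment boundary.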

MDTraj
Python package for reading/writing and analyzing molecular trajectories.
Analysis functions:
◮ distances
◮ bonds/angles/dihedrals
◮ hydrogen bonding identification
◮ secondary structure assignment
◮ NMR observables
◮ ... and many more
Supported formats:
◮ DCD
◮ binpos
◮ XTC
◮ NetCDF
◮ TRR
◮ LH5
◮ PDB
◮ HDF5
◮ XYZ
◮ ...

MSM package
Figure: MSM estimation and analysis workflow; bracketed numbers are the reference labels shown on the slide.
◮ MSM estimation & validation (discrete trajs → Markov model): maximum likelihood (ML) MSM, Bayesian MSM, ML hidden MSM, Bayesian hidden MSM; implied timescales convergence, Chapman-Kolmogorov test, identifying common problems ➜ [03], [04], [07], [08]
◮ MSM analysis (Markov model → knowledge): spectral analysis, metastable states with PCCA++, stationary properties, kinetic properties, TPT, uncertainty estimation, experimental observables ➜ [04], [05], [06]

MSM package: User-API examples
Step  Goal                                      API function (all in pyemma.msm package)
1.a   choose lag time                           its = timescales_msm(dtrajs)
1.b   choose lag time (visual inspection)       pyemma.plots.plot_implied_timescales(its)
2     estimate a model                          msm_obj = estimate_markov_model(dtrajs, lag)
3.a   validate model                            ck_obj = msm_obj.cktest()
3.b   validate model (visual inspection)        pyemma.plots.plot_cktest(ck_obj)
4.a   analyze slow processes                    msm_obj.timescales() etc.
4.b   perform coarse graining                   coarsed = msm_obj.pcca()
4.c   transition path analysis                  coarsed.tpt()
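What steps 1 and 2 compute can be sketched without any library. The following is a simplified illustration, not the estimator PyEMMA uses: it row-normalizes sliding-window transition counts (the non-reversible maximum likelihood estimate) and evaluates the implied timescale t = -lag / ln|λ₂| for a two-state model. All function names are hypothetical.

```python
import math
from typing import List


def count_matrix(dtrajs: List[List[int]], lag: int, nstates: int) -> List[List[int]]:
    """Count transitions i -> j separated by `lag` steps (sliding window)."""
    counts = [[0] * nstates for _ in range(nstates)]
    for dtraj in dtrajs:
        for t in range(len(dtraj) - lag):
            counts[dtraj[t]][dtraj[t + lag]] += 1
    return counts


def transition_matrix(counts: List[List[int]]) -> List[List[float]]:
    """Row-normalize the counts (non-reversible maximum likelihood estimate)."""
    T = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state without outgoing transitions")
        T.append([c / total for c in row])
    return T


def two_state_timescale(T: List[List[float]], lag: int) -> float:
    """Implied timescale of a 2x2 matrix; its second eigenvalue is trace - 1."""
    lam2 = T[0][0] + T[1][1] - 1.0
    return -lag / math.log(abs(lam2))


# One short toy discrete trajectory with two states.
dtrajs = [[0, 0, 0, 1, 1, 1, 0, 0]]
counts = count_matrix(dtrajs, lag=1, nstates=2)
T = transition_matrix(counts)
timescale = two_state_timescale(T, lag=1)
```

Repeating the estimate for several lag times and checking where the timescale levels off is the idea behind the implied timescales plot in step 1.b.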

Principles
◮ Use Python as the glue to faster languages (C/C++, Fortran)
◮ Stable and easy to use high-level user interface
◮ Open source (GNU Lesser General Public License, version 3 or later; minimal restrictions on redistribution)
◮ Open development process on GitHub (everybody can contribute)
◮ Focus on speed and stability (NumPy, SciPy under the hood)
◮ Focus on good documentation (see http://emma-project.org)

Development processes
◮ GitHub as frontend (collect issues/bugs, discuss proposed changes, plan new features, ...)
◮ Continuous integration/deployment (Travis-CI, AppVeyor, custom Jenkins instances)
◮ Unit tests for API and implementation
◮ Integration tests of notebooks
◮ Release bug fixes regularly
◮ Release major/minor versions if the API changes
◮ Preserve API compatibility (deprecate functions first, to notify users that their programs/scripts will not work the same way in the future)
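The "deprecate first" step can be illustrated with the standard library warnings module. This is a generic sketch, not the mechanism PyEMMA uses: the decorator deprecated and the functions estimate and old_estimate are hypothetical.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Mark a function as deprecated and point users to its replacement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # report the caller's line, not this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def estimate(data, lag):
    """Hypothetical new API function."""
    return {"n": len(data), "lag": lag}


@deprecated("estimate")
def old_estimate(data, lag):
    """Hypothetical old name, kept for one release cycle."""
    return estimate(data, lag)
```

The old name keeps working and returns the same result, so existing scripts do not break; users only see a warning until the function is removed in a later major release.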

Releasing and deploying
◮ Before a release we freeze acceptance of new features (their milestone gets postponed to the next release)
◮ Testing sessions: eliminate all bugs found
◮ Deploy source archive to PyPI (installable with pip) and binaries to Anaconda.org binary services
◮ Version scheme: major.minor.micro
    major = major new features (possibly breaking the API)
    minor = new features preserving the existing API
    micro = patches/bug fixes
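The version scheme can be made concrete with a small helper that classifies a release by comparing two version strings. This is a sketch for illustration; parse_version and release_kind are hypothetical names, and the version numbers in the usage lines are made up.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split 'major.minor.micro' into three integers."""
    major, minor, micro = (int(part) for part in text.split("."))
    return major, minor, micro


def release_kind(old: str, new: str) -> str:
    """Return which component changed first: 'major', 'minor' or 'micro'."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return "major"  # may break the API
    if n[1] != o[1]:
        return "minor"  # new features, existing API preserved
    return "micro"      # patches/bug fixes only


# Made-up version numbers for illustration.
kinds = [
    release_kind("2.5.4", "2.5.5"),
    release_kind("2.4.0", "2.5.0"),
    release_kind("2.5.5", "3.0.0"),
]
```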

GitHub
Figure: PyEMMA GitHub page

Collaboration on GitHub
1. Propose a change/feature via an issue
2. Create a local branch in Git to work on
3. Push the (tested) branch to your fork
4. Open a "pull request" (PR) on the main repository (markovmodel/PyEMMA)
5. Discuss changes, possibly add more commits
6. Maintainer merges your PR

Propose file change on GitHub

...continued

Participate
◮ Create a GitHub account to directly post issues (preferred).
◮ Join our channel on Gitter.im
◮ Send mails to the developers (more overhead for us, might not reach somebody in time).

Thank you for your attention! Further questions?
