SLIDE 1

S9277 - OpenACC-Based GPU Acceleration of Chemical Shift Prediction

Eric Wright, Alex Bryer, Sunita Chandrasekaran, and Juan Perilla
{efwright, abryer, schandra, jperilla}@udel.edu
Collaborative project from the Depts. of CIS and Chemistry, University of Delaware
GTC, March 19, 2019

SLIDE 2

Xu, et al. Nature (2018)

SLIDE 3

Proteins are central to biology, physiology and pathology

[Diagram: DNA → mRNA → protein via replication, transcription, and translation; protein roles include information, action, transport, motor, encapsulation … and much more. Hadden, et al. eLife (2018)]

Only 20 unique amino acids... Function arises from structure

SLIDE 4

Hierarchy of protein structure

Primary structure: sequence of amino acids

Phe Ala Met Leu Gln Trp Glu . . .

Sequence is organized into secondary structure. Secondary structure causes the chain to fold into tertiary structure. Quaternary structure complexes multiple folded chains.

SLIDE 5

Structure is essential to function

https://pdb101.rcsb.org/motm/72; Medical Research Council: Mitochondrial Biology Unit (Creative Commons Attribution license)

Determining a protein's native structure is critical. Tools of structure determination:

  • X-Ray crystallography
  • Electron microscopy
  • Nuclear Magnetic Resonance (NMR)

NMR studies proteins with minimal tampering (i.e., no freezing or crystallization required)

SLIDE 6

What does an NMR experiment look like?

Data collection (days/weeks) → chemical shift assignment (months/years) → correlation assignment (months/years) → structural ensemble (repeat for remaining atom types) … then completion:

❑ Validation
❑ Positional restraints
❑ Partial occupancies
❑ ...
❑ Deposition of structure

SLIDE 7

(Same NMR workflow as Slide 6.)

SLIDE 8

Semi-empirical chemical shift prediction: PPM_One

Treats chemical shift as a sum of differentiable functions which depend on internal coordinates. Higher-dimensional data (3D Cartesian) maps to lower-dimensional internal coordinates, e.g., the dihedral angle Ψ between two planes:

(α): a1·x + b1·y + c1·z + d1 = 0
(β): a2·x + b2·y + c2·z + d2 = 0
cos Ψ = (n1 · n2) / (|n1| |n2|), where n1 and n2 are the normals of the two planes.
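For concreteness, here is a minimal, self-contained C++ sketch (not the PPM_One implementation; the names and coordinates are illustrative) of computing such an internal coordinate from Cartesian positions: cos Ψ from the normals of the two planes defined by four atoms.

// Minimal sketch: dihedral cosine from four atom positions.
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 p, Vec3 q)   { return {p.x - q.x, p.y - q.y, p.z - q.z}; }
static Vec3 cross(Vec3 p, Vec3 q) { return {p.y*q.z - p.z*q.y, p.z*q.x - p.x*q.z, p.x*q.y - p.y*q.x}; }
static double dot(Vec3 p, Vec3 q) { return p.x*q.x + p.y*q.y + p.z*q.z; }
static double norm(Vec3 p)        { return std::sqrt(dot(p, p)); }

// cos(psi) = (n1 . n2) / (|n1| |n2|)
double dihedral_cos(Vec3 a, Vec3 b, Vec3 c, Vec3 d) {
    Vec3 n1 = cross(sub(b, a), sub(c, b));   // normal of the plane through a, b, c
    Vec3 n2 = cross(sub(c, b), sub(d, c));   // normal of the plane through b, c, d
    return dot(n1, n2) / (norm(n1) * norm(n2));
}

int main() {
    // Toy coordinates, only to show the call.
    Vec3 a{0,0,0}, b{1,0,0}, c{1,1,0}, d{2,1,1};
    std::printf("cos(psi) = %f\n", dihedral_cos(a, b, c, d));
}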

More familiar challenges: N-body, dense linear algebra, unstructured grid (?)

Dawei Li, Rafael Brüschweiler, J. Biomol. NMR (2012); Dawei Li, Rafael Brüschweiler, J. Biomol. NMR (2015)

SLIDE 9

Takeaway: theoretical biophysics is compute and data intensive

Large systems necessitate high-performance codes and systems

64-million-atom simulation of the HIV-1 virion

Perilla, et al. Nature (2016)

SLIDE 10

Project Motivation

  • Nuclear Magnetic Resonance (NMR) is a vital tool in structural biology and biochemistry
  • Chemical shift gives insight into the physical structure of the protein
  • Predicting chemical shift has important uses in scientific areas such as drug discovery

Our goal:

  • To enable execution of multiple chemical shift predictions repeatedly
  • To allow chemical shift predictions for larger-scale structures

SLIDE 11

Introduction to the PPM_One code

  • Parametrizes a new empirical knowledge-based chemical shift predictor of protein backbone atoms
  • Accepts a single static 3D protein structure (PDB format) as input
  • Emulates local protein dynamics
  • Outputs chemical shift predictions with high accuracy

PPM_One: a static protein structure based chemical shift predictor Dawei Li, Rafael Brüschweiler, Journal of Biomolecular NMR. July 2015, Volume 62, Issue 3, pp 403–409

SLIDE 12

Profile Driven Development

SLIDE 13

Profile Driven Development

  • Tackling a large and unfamiliar code is daunting
  • Advantages of profiling:
    – High-level view of the code
    – Baseline performance metrics
    – Sanity check during the development process

SLIDE 14

Serial Code Profile (Main Function)

Function                        | % Runtime
main()                          | 100%
predict_bb_static_ann(void)     | 81.226%
predict_proton_static_new(void) | 16.276%
load(string)                    | 1.921%

SLIDE 15

Serial Profile Visual

Pie chart: get_contact 35%, getselect 23%, gethbond 5%, getani 14%, getring 4%, Other 19%

"Other" contains:
  • File I/O
  • PDB structure initialization
  • Data error correction

  • Profiled the code using PGPROF, without any optimizations
    – Gave a baseline snapshot of the code
    – Identified hotspots within the code
    – Identified functions that are potential bottlenecks
  • Obtained a large overview without needing to read thousands of lines of code

SLIDE 16

Optimization in steps

getselect 23%

  • getselect()
  • Looking into optimizing the serial code prior to parallelizing it

SLIDE 17

Serial Optimization (getselect)

Reusing the same flags results in the function returning the same set of atoms.

// Pseudocode: getselect call site inside a large loop
for( ... ) // Large loop
{
    c2=pdb->getselect(":1-%@allheavy");
    traj->get_contact(c1,c2,&result);
}

slide-18
SLIDE 18

Serial Optimization (getselect)

getselect originally accounted for 25% of the codes runtime. After optimization, it takes less than 1%.

23

// Pseudocode for getselect function for( ... ) // Large loop { c2=pdb->getselect(":1-%@allheavy"); traj->get_contact(c1,c2,&result); } // Pseudocode for getselect function c2=pdb->getselect(":1-%@allheavy"); for( ... ) // Large loop { traj->get_contact(c1,c2,&result); }
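The same transformation in compilable form, as a generic toy example (not the PPM_One code; the selection function and sizes are illustrative): because the selection does not change between iterations, it is computed once before the loop.

#include <cstdio>
#include <numeric>
#include <vector>

// Stand-in for getselect: an expensive call whose result never changes here.
std::vector<int> select_atoms(const std::vector<int>& atoms) {
    std::vector<int> out;
    for (int a : atoms) if (a % 2 == 0) out.push_back(a);
    return out;
}

int main() {
    std::vector<int> atoms(100000);
    std::iota(atoms.begin(), atoms.end(), 0);

    // Hoisted out of the loop: computed once instead of once per iteration.
    std::vector<int> selection = select_atoms(atoms);

    long long total = 0;
    for (int iter = 0; iter < 1000; ++iter)
        total += std::accumulate(selection.begin(), selection.end(), 0LL);

    std::printf("total = %lld\n", total);
}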

SLIDE 19

Serial Optimizations (other smaller optimizations)

  • Filtering functions:
    – Filter objects from a large list
    – Written in an inefficient C++ style
    – Runtime for the filtering functions went from 5+ minutes to 1 second for some datasets
  • Replace C++ STL vectors:
    – All data is stored within STL vectors
    – There are a few ways to work around this for GPUs
    – We chose to just replace them with pointers when possible (see the sketch below)
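A hedged sketch of the vector-to-pointer workaround (illustrative names, not the actual PPM_One refactor): contiguous std::vector storage is exposed as a raw pointer so it can appear in OpenACC data clauses with an explicit shape.

#include <cstdio>
#include <vector>

// Scale every element on the device; v.data() gives a plain pointer that
// OpenACC data clauses can describe as p[0:n].
void scale(std::vector<double>& v, double s) {
    double* p = v.data();
    int     n = (int)v.size();

    #pragma acc parallel loop copy(p[0:n])
    for (int i = 0; i < n; ++i)
        p[i] *= s;
}

int main() {
    std::vector<double> v(1000, 1.0);
    scale(v, 2.0);
    std::printf("v[0] = %f\n", v[0]);   // expect 2.0
}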


SLIDE 20

Serial Profile After Optimization

Before: get_contact 35%, getselect 23%, gethbond 5%, getani 14%, getring 4%, Other 19%
After:  get_contact 44%, gethbond 14%, getani 18%, getring 12%, Other 12%

SLIDE 21

Porting PPM to GPUs

SLIDE 22

Our Weapon of Choice

Ways to accelerate applications: Applications, Libraries, Compiler Directives, Programming Languages

  • Libraries: high performance, limited uses
  • Compiler directives: portable, performance based on compiler
  • Programming languages: high performance, most difficult
SLIDE 23

Introduction to OpenACC

  • OpenACC is a directive-based parallel programming model used to accelerate code on heterogeneous systems
  • Implemented by PGI, GCC, and Cray (until 2.0)
  • PGI community editions are freely available: https://www.pgroup.com/products/community.htm

SLIDE 24

Introduction to OpenACC

Benefits:

  • Portable without sacrificing performance
  • Simple, based on directives
  • Ease of code porting (no large code rewrites)

#pragma acc parallel loop
for(int i = 0; i < N; ++i)
    a[i] = a[i]*b[i] + c[i];
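A self-contained version of the same kind of loop; the file name and build line are assumptions shown as comments (standard PGI OpenACC options, not taken from the talk).

// saxpy_like.cpp — build with, e.g.:  pgc++ -acc -ta=tesla -Minfo=accel saxpy_like.cpp
#include <cstdio>
#include <vector>

int main() {
    const int N = 1 << 20;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 3.0f);
    float *pa = a.data(), *pb = b.data(), *pc = c.data();

    // The directive asks the compiler to offload the loop; the clauses
    // describe which arrays move to and from the device.
    #pragma acc parallel loop copy(pa[0:N]) copyin(pb[0:N], pc[0:N])
    for (int i = 0; i < N; ++i)
        pa[i] = pa[i] * pb[i] + pc[i];

    std::printf("a[0] = %f\n", pa[0]);   // expect 1*2 + 3 = 5
}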

SLIDE 25

Most compute intensive

get_contact 44%

SLIDE 26

Accelerating get_contact

  • get_contact is called many times in the code
  • The "pos" vector actually only contains 3 values: the x, y, z coordinates
  • The "used" vector contains all of the atoms in the structure
  • For the GPU, we collapsed the outer loop
  • Now we compute 3 contacts simultaneously
  • We also combined all calls to get_contact into one large function called get_all_contacts

// Call site: one get_contact call per iteration of a large loop
for(i=1;i<index_size-1;i++)
{
    ...
    traj->get_contact(c1,c2,&result);
    ...
}

SLIDE 27

Accelerating get_contact

(Same bullet points as Slide 26.)

Inside of the get_contact function:

// For x,y,z coordinate
for(i=0;i<(int)pos.size();i++)
{
    ...
    // For every atom
    for(j=0;j<(int)used.size();j++)
    {
        // Calculate contact
        ...
    }
    result->push_back(contact);
}

SLIDE 28

Accelerating get_contact

  • Large outer loop covers all individual get_contact calls
  • Inner loop still iterates over all atoms
  • Now calculating 3 different contacts simultaneously
  • Writing contacts to one large results array to be used later

#pragma acc parallel loop private(...) \
    present(..., results[0:results_size]) copyin(...)
for(i=1;i<index_size-1;i++)
{
    ...
    #pragma acc loop reduction(+:contact1,contact2,contact3) private(...)
    for(j=0;j<c2_size;j++)
    {
        // Calculate contact1, contact2, contact3
    }
    ...
    results[((i-1)*3)+0]=contact1;
    results[((i-1)*3)+1]=contact2;
    results[((i-1)*3)+2]=contact3;
}
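A simplified, self-contained example of the same pattern (not the real get_all_contacts; the 1/(1+r²) kernel, names, and sizes are placeholders): a parallel outer loop over query atoms, an inner reduction over all atoms, and results written into one flat array.

#include <cstdio>
#include <vector>

int main() {
    const int nq = 1000;     // "query" atoms (outer loop)
    const int na = 50000;    // all heavy atoms (inner loop)
    std::vector<float> qx(nq, 0.0f), qy(nq, 1.0f), qz(nq, 2.0f);
    std::vector<float> x(na, 3.0f),  y(na, 4.0f),  z(na, 5.0f);
    std::vector<float> results(nq, 0.0f);

    float *pqx = qx.data(), *pqy = qy.data(), *pqz = qz.data();
    float *px  = x.data(),  *py  = y.data(),  *pz  = z.data();
    float *pres = results.data();

    #pragma acc parallel loop copyin(pqx[0:nq], pqy[0:nq], pqz[0:nq], \
                                     px[0:na], py[0:na], pz[0:na]) \
                              copyout(pres[0:nq])
    for (int i = 0; i < nq; ++i) {
        float contact = 0.0f;
        #pragma acc loop reduction(+:contact)
        for (int j = 0; j < na; ++j) {
            float dx = pqx[i] - px[j], dy = pqy[i] - py[j], dz = pqz[i] - pz[j];
            contact += 1.0f / (1.0f + dx*dx + dy*dy + dz*dz);  // placeholder kernel
        }
        pres[i] = contact;   // one flat results array, indexed by i
    }
    std::printf("results[0] = %f\n", pres[0]);
}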

SLIDE 29

Next most compute intensive

get_hbond

SLIDE 30

Acceleration of gethbond

#pragma acc parallel loop gang
for(i=0;i<_hbond_size;i++)
{
    #pragma acc loop vector
    for(j=0;j<hbond_size;j++)
    {
        ...
        #pragma acc loop seq
        for(k=0;k<nframe;k++)
        {
            ...
        }
    }
}

Gang and vector directives allow us to implement multiple levels of loop parallelism. The innermost loop is typically very small and would gain nothing from being parallelized, so we mark it as "sequential".
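As a point of reference (general OpenACC/PGI behavior, not something stated in the talk): when PGI targets NVIDIA GPUs, gangs typically map to CUDA thread blocks and vector lanes to threads within a block, so the scheme above runs roughly one thread block per outer-loop iteration with the inner iterations spread across that block's threads.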

SLIDE 31

Acceleration of gethbond

(Same loop structure as Slide 30.)

SLIDE 32

Acceleration of gethbond

(Outer loop structure same as Slide 30.) Different (i, j) pairs can update the same entry of effect_arr, so those updates are made atomic:

if(hbond[i].type==1){
    #pragma acc atomic update
    effect_arr[nid].n_length+=d;
    #pragma acc atomic update
    effect_arr[nid].n_phi+=phi;
    #pragma acc atomic update
    effect_arr[nid].n_psi+=psi;
}
if(hbond[j].type==1){
    #pragma acc atomic update
    effect_arr[cid].c_lengh+=d;
    #pragma acc atomic update
    effect_arr[cid].c_phi+=phi;
    #pragma acc atomic update
    effect_arr[cid].c_psi+=psi;
}
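A minimal, self-contained illustration of the same atomic-update pattern (a toy histogram, not the gethbond code): many iterations add into the same slot, so each update must be atomic to avoid a race.

#include <cstdio>

int main() {
    const int n = 100000;
    const int nbins = 16;
    float bins[nbins] = {0.0f};

    #pragma acc parallel loop copy(bins[0:nbins])
    for (int i = 0; i < n; ++i) {
        int b = i % nbins;          // many i values share the same bin
        #pragma acc atomic update
        bins[b] += 1.0f;
    }
    std::printf("bins[0] = %.0f (expect %d)\n", bins[0], n / nbins);
}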

SLIDE 33

And the next most… and so on

get_contact 44%, getani 18%, getring 12%

SLIDE 34

Data Movement

[Diagram: CPU memory and GPU memory are separate, each behind its own caches, connected by an IO bus.]

  • CPU and GPU memory is separate in a heterogeneous system
  • Connected via an IO bus (PCI-E or NVLink)
  • Programmer must explicitly manage two separate memory pools

SLIDE 35

Data Movement

  • Allocate memory on the host first (main memory)
  • Create a copy of our data on the device (GPU memory)
  • Ensure that the correct data is on the GPU when we need it
    – And vice versa

// Initialize X, Y, Z on host
...
#pragma acc enter data copyin(x_arr[0:x_size], \
                              y_arr[0:y_size], \
                              z_arr[0:z_size])
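A hedged sketch of the unstructured data lifetime implied above (only x_arr shown; x_size and the loop body are illustrative): enter data creates the device copy, present asserts the data is already resident, update synchronizes, and exit data frees the device copy.

#include <cstdio>
#include <vector>

int main() {
    const int x_size = 1 << 20;
    std::vector<double> x(x_size, 1.0);
    double* x_arr = x.data();

    #pragma acc enter data copyin(x_arr[0:x_size])      // allocate + copy host -> device

    #pragma acc parallel loop present(x_arr[0:x_size])  // no copies: data already resident
    for (int i = 0; i < x_size; ++i)
        x_arr[i] *= 2.0;

    #pragma acc update self(x_arr[0:x_size])            // bring results back when needed
    #pragma acc exit data delete(x_arr[0:x_size])       // free the device copy

    std::printf("x_arr[0] = %f\n", x_arr[0]);           // expect 2.0
}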

SLIDE 36

Parallel Profile

SLIDE 37

Parallel Profile

SLIDE 38

Parallel Profile

SLIDE 39

Results

Was it worth it?

SLIDE 40

Experimental Datasets

SLIDE 41

Experimental Datasets

SLIDE 42

Experimental Datasets

SLIDE 43

Experimental Setup

Machine                       | CPU                           | GPU
NVIDIA PSG (V100)             | Intel Xeon E5-2698 (16 cores) | NVIDIA Tesla V100 (16 GB HBM2)
NVIDIA PSG (P100)             | Intel Xeon E5-2698 (16 cores) | NVIDIA Tesla P100 (16 GB HBM2)
University of Delaware Vader  | Intel i7 990x (12 cores)      | NVIDIA Volta Titan V (12 GB HBM2)
University of Delaware Savina | Intel Xeon E5-2603 (8 cores)  | NVIDIA Maxwell Titan X (12 GB GDDR5)

SLIDE 44

Performance Results

Dataset              | Very Small (100K atoms) | Medium (2.1M atoms) | Large (6.8M atoms) | Very Large (13.3M atoms)
Serial (Unoptimized) | 167.11 s                | 3547.07 s (1 hour)  | 7 hours (approx.)  | 14 hours (approx.)

Intel Xeon E5-2698 (32 cores)

SLIDE 45

Performance Results

Dataset              | Very Small (100K atoms) | Medium (2.1M atoms) | Large (6.8M atoms) | Very Large (13.3M atoms)
Serial (Unoptimized) | 167.11 s                | 3547.07 s (1 hour)  | 7 hours (approx.)  | 14 hours (approx.)
Serial (Optimized)   | 32 s                    | 2209.64 s (37 min)  | 2939 s (48 min)    | 9035 s (2.5 hours)

Intel Xeon E5-2698 (32 cores)

SLIDE 46

Performance Results

Dataset              | Very Small (100K atoms) | Medium (2.1M atoms) | Large (6.8M atoms) | Very Large (13.3M atoms)
Serial (Unoptimized) | 167.11 s                | 3547.07 s (1 hour)  | 7 hours (approx.)  | 14 hours (approx.)
Serial (Optimized)   | 32 s                    | 2209.64 s (37 min)  | 2939 s (48 min)    | 9035 s (2.5 hours)
Multicore (32 cores) | 2.93 s                  | 109 s               | 172 s              | 427 s

Intel Xeon E5-2698 (32 cores)

SLIDE 47

Performance Results

Dataset                | Very Small (100K atoms) | Medium (2.1M atoms) | Large (6.8M atoms) | Very Large (13.3M atoms)
Serial (Unoptimized)   | 167.11 s                | 3547.07 s (1 hour)  | 7 hours (approx.)  | 14 hours (approx.)
Serial (Optimized)     | 32 s                    | 2209.64 s (37 min)  | 2939 s (48 min)    | 9035 s (2.5 hours)
Multicore (32 cores)   | 2.93 s                  | 109 s               | 172 s              | 427 s
NVIDIA Pascal P100 GPU | 1.72 s                  | 36 s                | 69 s               | 170 s

Intel Xeon E5-2698 (32 cores)

SLIDE 48

Performance Results

Dataset                | Very Small (100K atoms) | Medium (2.1M atoms) | Large (6.8M atoms) | Very Large (13.3M atoms)
Serial (Unoptimized)   | 167.11 s                | 3547.07 s (1 hour)  | 7 hours (approx.)  | 14 hours (approx.)
Serial (Optimized)     | 32 s                    | 2209.64 s (37 min)  | 2939 s (48 min)    | 9035 s (2.5 hours)
Multicore (32 cores)   | 2.93 s                  | 109 s               | 172 s              | 427 s
NVIDIA Pascal P100 GPU | 1.72 s                  | 36 s                | 69 s               | 170 s
NVIDIA Volta V100 GPU  | 1.68 s                  | 29 s                | 56 s               | 134 s

Intel Xeon E5-2698 (32 cores)

Speedup callouts (Very Large dataset): 21x (multicore vs. optimized serial), ~3.4x (V100 vs. multicore), 67x (V100 vs. optimized serial)

SLIDE 49

Performance Results

Speedup Compared to Unaccelerated Performance

SLIDE 50

Performance Results (per function)

Function    | Serial
get_contact | 2505 s
gethbond    | 337 s
getani      | 29 s
getring     | 19 s

SLIDE 51

Performance Results (per function)

Function    | Serial | Multicore | Speedup (Multicore vs Serial)
get_contact | 2505 s | 100 s     | 25x
gethbond    | 337 s  | 19 s      | 17x
getani      | 29 s   | 1.5 s     | 19x
getring     | 19 s   | 0.84 s    | 22x

SLIDE 52

Performance Results (per function)

Function    | Serial | Multicore | Speedup (Multicore vs Serial) | V100 GPU | Speedup (V100 vs Serial)
get_contact | 2505 s | 100 s     | 25x                           | 15 s     | 167x
gethbond    | 337 s  | 19 s      | 17x                           | 1.24 s   | 271x
getani      | 29 s   | 1.5 s     | 19x                           | 0.09 s   | 322x
getring     | 19 s   | 0.84 s    | 22x                           | 0.09 s   | 211x

SLIDE 53

Performance Results (per function)

Function    | Serial | Multicore | Speedup (Multicore vs Serial) | V100 GPU | Speedup (V100 vs Serial) | Speedup (V100 vs Multicore)
get_contact | 2505 s | 100 s     | 25x                           | 15 s     | 167x                     | 7x
gethbond    | 337 s  | 19 s      | 17x                           | 1.24 s   | 271x                     | 15x
getani      | 29 s   | 1.5 s     | 19x                           | 0.09 s   | 322x                     | 17x
getring     | 19 s   | 0.84 s    | 22x                           | 0.09 s   | 211x                     | 9x

SLIDE 54

3D printed

SLIDE 55

Conclusions

  • Achieved ~67x performance improvement (in our best case) using a directive-based programming model on GPUs
  • Created a portable code that can run on single core, multicore, and GPU
  • Allowed chemical shift to be estimated for large structures in a much more realistic amount of time
  • Maintained the same accuracy (10e-3) as the base code