Multicore versus FPGA in the Acceleration of Discrete Molecular - PowerPoint PPT Presentation

Multicore versus FPGA in the Acceleration of Discrete Molecular Dynamics* + Tony Dean ~ Josh Model # Martin Herbordt Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University http://www.bu.edu/caadlab * This work supported, in part, by MIT Lincoln Lab and the U.S. NIH/NCRR + Thanks to Nikolay Dokholyan, Shantanu Sharma, Feng Ding, George Bishop, François Kosie Now at General Dynamics ~ # Now at MIT Lincoln Lab HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Overview – mini-talk • FPGAs are effective niche accelerators – especially suited for fine-grained parallelism • Parallel Discrete Event Simulation (PDES) is often not scalable – need ultra-low latency communication • Discrete Event Simulation of Molecular Dynamics (DMD) is – a canonical PDES problem – critical to computational biophysics/biochemistry – not previously shown to be scalable • FPGAs can accelerate DMD by 100x – Configure FPGA into a superpipelined event processor with speculative execution • Multicore DMD by applying FPGA method HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Why Molecular Dynamics Simulation is so important … • Core of Computational Chemistry • Central to Computational Biology, with applications to � Drug design � Understanding disease processes … From DeMarco & Dagett: PNAS 2/24/04 Shows conversion of PrP protein from healthy to harmful isoform. Aggregation of misfolded intermediates appears to be the pathogenic species in amyloid (e.g. “mad cow” & Alzheimer’s) diseases. Note: this could only have been discovered with simulation! HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Why LARGE MD Simulations are so important … MD simulations are often “heroic”: 100 days on 500 nodes … HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Motivation - Why Accelerate MD? One second traditional MD with a PC of modeled reality Heroic* traditional MD with a PC Heroic* traditional MD with a large MPP P. Ding & N. Dokholyan * Heroic ≡ > one month elapsed time Trends in Biotechnology,2005 HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

What is (Traditional) Molecular Dynamics? MD – An iterative application of Newtonian mechanics to ensembles of atoms and molecules Runs in phases � Motion Force Update update (Verlet) state of each particle Initially O(n 2 ), done Generally O(n), on coprocessor done on host is updated every fs − = + + + + Many forces typically computed, total bond angle torsion H non bonded F F F F F F but complexity lies in the non-bonded, spatially extended forces: van der Waals (LJ) and Coulombic (C) ⎧ ⎫ ⎛ ⎞ 14 8 ⎛ ⎞ ⎛ ⎞ ε σ σ ⎪ ⎪ = ∑ � ⎜ ⎟ � ⎜ ⎟ ⎜ ⎟ q ∑ − = LJ C ⎨ ⎬ i ab ab ab � � F 12 6 r F q r ⎜ ⎟ � ⎜ ⎟ ⎜ ⎟ σ i ji i i ji ⎜ ⎟ 2 3 ⎪ r r ⎪ ⎝ ⎠ ⎝ ⎠ ≠ ≠ r ⎝ ⎠ j i ⎩ ⎭ j i ab ji ji ji HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

An Alternative ... Only update particle state when “something happens” • “Something happens” = a discrete event Advantage � DMD runs 10 6 times faster than • tradition MD • Disadvantage � Laws of physics are continuous HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

But the physical world isn’t discrete … DMD force approximation Covalent Bond Hard Sphere Potential Multi-well Single-well Potential Distance Distance HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

While we’re approximating forces … • Traditional MD often uses all-atom models • DMD often models atoms behaviorally 1. Ab initio, assuming no knowledge of specific protein dynamics 2. Go -like models, which use empirical knowledge of the native state Force Models 2 . 1 . Ab initio Go-like 1. Urbanc et al. 2006 2. Dokholyan et al. 1998 HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

After all this approximation … … is there any reality left?? Yes , but requires application-specific model tuning – Using traditional MD – Frequent user feedback � Interactive simulation HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Current DMD Performance One second traditional MD with a PC of modeled reality Heroic* traditional MD with a PC Heroic* traditional MD with a large MPP Heroic* Discrete MD with a PC P. Ding & N. Dokholyan * Heroic ≡ > one month elapsed time Trends in Biotechnology,2005 HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Motivation - Why Accelerate DMD? Example: Model nucleosome dynamics i.e., how DNA is packaged and accessed – three meters of it in every cell! From Steven M. Carr, Memorial University, Newfoundland HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Discrete Event Simulation Event Predictor • Simulation proceeds as a series of (& Remover) discrete element-wise interactions new state – NOT time-step driven info state info System Event events & State Processor invalidations events • Seen in simulations of … – Circuits Time-Ordered – Networks Event Queue – Traffic arbitrary insertions – Systems Biology and deletions – Combat HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

How to make DMD even faster? Parallelize?? Approaches to Parallel DES are well known: • Conservative – Guarantees causal order between processors – Depends on “safe window” to avoid serialization • Optimistic – Allows processors to run (more) independently – Correct resulting causality violations with rollback Neither approach has worked in DMD: – Conservative: no safe window � causal order = serialization – Optimistic: � rollback is frequent and costly No existing production PDMD system! HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

What’s hard about parallelizing DMD? DMD production systems are highly optimized • 100K events/sec for up to millions of particles (10us/event) • Typical message passing latency ~1us-10us • Typical memory access latency ~ 50ns-100ns HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

What’s hard about parallelizing DMD? How about Task-Based Decomposition? New events can Event Predictor – invalidate queued events (& Remover) anywhere in the event queue new state state info – be inserted anywhere in the info System event queue Event events & State Processor invalidations events Time-Ordered Event Queue arbitrary insertions and deletions D A After events AB and CD at t 0 and t 0+ ε , newly C B predicted event BC happens almost E immediately – inserted at head of queue! Also, previously predicted BE gets cancelled. HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

What’s hard about parallelizing DMD? But those events were necessarily local -- Can’t we partition the simulated space? A After event AB, cascade of events causes B OP to happen almost immediately on the P other side of the simulation space. O Yes, but requires speculation and rollback HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Event propagation can Note: “chain” with rigid links be infinitely fast over is analogous and much more any distance! likely to occur in practice Atomic Force Microscope unravels a protein HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Outline • Overview: MD, DMD, DES, PDES • FPGA Accelerator conceptual design – Design overview – Component descriptions • Design Complications • FPGA Implementation and Performance • Multicore DMD • Discussion HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

FPGA Overview - Dataflow Main idea: DMD in one big pipeline • Events processed with a throughput of one event per cycle • Therefore, in a single cycle : • State is updated (event is committed ) • Invalidations are processed • New events are inserted – up to four are possible Event flow Update state Collider On-Chip Event Commit Off-Chip Event Predictor Event Heap Priority Queue Units Invalidations New Event Insertions Stall Inducing Insertions HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

FPGA Overview - Dataflow Main idea: DMD in one big pipeline • Events processed with a throughput of one event per cycle • Three com plications: 1. Processing units must have flexibility of event queue 2. Events cannot be processed using stale state information 3. Off-chip event queue must have same capability as on-chip Event flow Update state Collider On-Chip Event Commit Off-Chip Event Predictor Event Heap Priority Queue Units Invalidations New Event Insertions Stall Inducing Insertions HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Components High-Level DMD Accelerator System Diagram Bead, Cell Event Memory Banks Back Write Insertion Commit Event Priority Buffer Event Processor Queue Event Storage Predictor Units = = = = = Computation Invalidation Broadcast Particle Tags HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

Multicore versus FPGA in the Acceleration of Discrete Molecular - PowerPoint PPT Presentation

Multicore versus FPGA in the Acceleration of Discrete Molecular Dynamics* + Tony Dean ~ Josh Model # Martin Herbordt Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University

An introduction to FPGA-based acceleration of neural networks Marco Pagani 1 What is an FPGA?

State of Multicore OCaml KC Sivaramakrishnan University of OCaml Labs Cambridge Outline

The Why, Where and How of Multicore Anant Agarwal MIT and Tilera Corp. What is Multicore?

Multicore Multicore curiculum 1 Motivation Moores Law: the number of transistors double

Acceleration of Tear Film Map Definition on Multicore Systems Jorge Gonzlez-Domnguez* ,

Open Source FPGA Toolchain FPGA LSE Summer Week 2015 iCE40 Flow Conclusion Vincent Gatine

Tips about an FPGA 02/09/2018 J.C. special topic FPGA ( field-programmable gate array ) FPGA :

FPGA What is a FPGA? How FPGAs work How do they work? Manufacturers

WWW.FPGA What is an FPGA? Field Programmable Gate Array Introduction to FPGA designs

Multicore OCaml GC KC Sivaramakrishnan, Stephen Dolan University of OCaml Labs Cambridge

Multicore Synchronization a pragmatic introduction Multicore Synchronization This is a talk on

RETHINKING OPERATING SYSTEM DESIGNS FOR A Ken Birman Based heavily MULTICORE WORLD on a slide

A GPU-Inspired Soft Processor for High- Throughput Acceleration Throughput Acceleration Jeffrey

Software into GPU code, Multicore Software and FPGA Hardware Satnam Singh Microsoft Research,

Current Trends in Hybrid FPGA/CPU Devices Hybrid FPGA/CPU Devices Xilinx Zynq Series Real

FPGA-CAPELLA: A REAL TIME AUDIO FX UNIT COSMA KUFA AND JUSTIN XIAO WHAT IS FPGA-CAPELLA?

Investigate the History of the Chinese Dialects Improving Phylogeny-Based Network Approaches to

Lecture 1 : Introduction to Statistical Computing Biostatistics 615/815 - Statistical Computing .

Investor Presentation November 2016 TSX.V: INP 1 Forward Looking Information This Presentation

NAYR Denver Presentation The following is an explanation of the: The North American Yak

4th International Conference on Environmental Science and Technology 2008 July 27-31, 2008

Hall of fame A photograph or cartoon drawing hall of fame of people in history who have made a

Assessm ent of Cox Proportional Hazard Model Adequacy Using PROC PHREG and PROC GPLOT Jadwiga

Insights Through March 2019 J o n a t h a n S m o k e & Z o R a h i m 1 T R A N S F O R

Multicore versus FPGA in the Acceleration of Discrete Molecular - PowerPoint PPT Presentation

Multicore versus FPGA in the Acceleration of Discrete Molecular Dynamics* + Tony Dean ~ Josh Model # Martin Herbordt Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University

An introduction to FPGA-based acceleration of neural networks Marco Pagani 1 What is an FPGA?

State of Multicore OCaml KC Sivaramakrishnan University of OCaml Labs Cambridge Outline

The Why, Where and How of Multicore Anant Agarwal MIT and Tilera Corp. What is Multicore?

Multicore Multicore curiculum 1 Motivation Moores Law: the number of transistors double

Acceleration of Tear Film Map Definition on Multicore Systems Jorge Gonzlez-Domnguez* ,

Open Source FPGA Toolchain FPGA LSE Summer Week 2015 iCE40 Flow Conclusion Vincent Gatine

Tips about an FPGA 02/09/2018 J.C. special topic FPGA ( field-programmable gate array ) FPGA :

FPGA What is a FPGA? How FPGAs work How do they work? Manufacturers

WWW.FPGA What is an FPGA? Field Programmable Gate Array Introduction to FPGA designs

Multicore OCaml GC KC Sivaramakrishnan, Stephen Dolan University of OCaml Labs Cambridge

Multicore Synchronization a pragmatic introduction Multicore Synchronization This is a talk on

RETHINKING OPERATING SYSTEM DESIGNS FOR A Ken Birman Based heavily MULTICORE WORLD on a slide

A GPU-Inspired Soft Processor for High- Throughput Acceleration Throughput Acceleration Jeffrey

Software into GPU code, Multicore Software and FPGA Hardware Satnam Singh Microsoft Research,

Current Trends in Hybrid FPGA/CPU Devices Hybrid FPGA/CPU Devices Xilinx Zynq Series Real

FPGA-CAPELLA: A REAL TIME AUDIO FX UNIT COSMA KUFA AND JUSTIN XIAO WHAT IS FPGA-CAPELLA?

Investigate the History of the Chinese Dialects Improving Phylogeny-Based Network Approaches to

Lecture 1 : Introduction to Statistical Computing Biostatistics 615/815 - Statistical Computing .

Investor Presentation November 2016 TSX.V: INP 1 Forward Looking Information This Presentation

NAYR Denver Presentation The following is an explanation of the: The North American Yak

4th International Conference on Environmental Science and Technology 2008 July 27-31, 2008

Hall of fame A photograph or cartoon drawing hall of fame of people in history who have made a

Assessm ent of Cox Proportional Hazard Model Adequacy Using PROC PHREG and PROC GPLOT Jadwiga

Insights Through March 2019 J o n a t h a n S m o k e &amp; Z o R a h i m 1 T R A N S F O R

Insights Through March 2019 J o n a t h a n S m o k e & Z o R a h i m 1 T R A N S F O R