Multicore versus FPGA in the Acceleration of Discrete Molecular Dynamics

Tony Dean, Josh Model, Martin Herbordt. Computer Architecture and Automated Design Laboratory, Department of Electrical and Computer Engineering, Boston University


  1. Multicore versus FPGA in the Acceleration of Discrete Molecular Dynamics*+
     Tony Dean~, Josh Model#, Martin Herbordt
     Computer Architecture and Automated Design Laboratory, Department of Electrical and Computer Engineering, Boston University
     http://www.bu.edu/caadlab
     * This work supported, in part, by MIT Lincoln Lab and the U.S. NIH/NCRR
     + Thanks to Nikolay Dokholyan, Shantanu Sharma, Feng Ding, George Bishop, François Kosie
     ~ Now at General Dynamics
     # Now at MIT Lincoln Lab
     HPEC – 9/23/2008 Discrete MD with FPGAs and Multicore

  2. Overview – mini-talk
     • FPGAs are effective niche accelerators
       – especially suited for fine-grained parallelism
     • Parallel Discrete Event Simulation (PDES) is often not scalable
       – need ultra-low latency communication
     • Discrete Event Simulation of Molecular Dynamics (DMD) is
       – a canonical PDES problem
       – critical to computational biophysics/biochemistry
       – not previously shown to be scalable
     • FPGAs can accelerate DMD by 100x
       – Configure FPGA into a superpipelined event processor with speculative execution
     • Multicore DMD by applying FPGA method

  3. Why Molecular Dynamics Simulation is so important …
     • Core of Computational Chemistry
     • Central to Computational Biology, with applications to
       – Drug design
       – Understanding disease processes …
     [Figure, from DeMarco & Daggett, PNAS 2/24/04: conversion of PrP protein from healthy to harmful isoform.]
     Aggregates of misfolded intermediates appear to be the pathogenic species in amyloid (e.g. “mad cow” & Alzheimer’s) diseases.
     Note: this could only have been discovered with simulation!

  4. Why LARGE MD Simulations are so important …
     MD simulations are often “heroic”: 100 days on 500 nodes …

  5. Motivation - Why Accelerate MD?
     [Figure: amount of modeled reality reachable by each approach, with one second of modeled reality marked. Labels: traditional MD with a PC; heroic* traditional MD with a PC; heroic* traditional MD with a large MPP. Source: F. Ding & N. Dokholyan, Trends in Biotechnology, 2005]
     * Heroic ≡ > one month elapsed time

  6. What is (Traditional) Molecular Dynamics?
     MD – An iterative application of Newtonian mechanics to ensembles of atoms and molecules
     Runs in phases:
       – Force update: initially O(n^2), done on coprocessor
       – Motion update (Verlet): generally O(n), done on host
     State of each particle is updated every fs
     Many forces typically computed,
       F^{total} = F^{bond} + F^{angle} + F^{torsion} + F^{H} + F^{non-bonded}
     but complexity lies in the non-bonded, spatially extended forces: van der Waals (LJ) and Coulombic (C)
       F_i^{LJ} = \sum_{j \neq i} \frac{\epsilon_{ab}}{\sigma_{ab}^{2}} \left\{ 12 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{14} - 6 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{8} \right\} \vec{r}_{ji}
       F_i^{C} = q_i \sum_{j \neq i} \left( \frac{q_j}{|r_{ji}|^{3}} \right) \vec{r}_{ji}
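
A minimal Python sketch of the non-bonded force phase named on this slide, under stated assumptions: a single particle type (one ε and σ for every pair), Gaussian-style units for the Coulomb term, and r_ji read as r_i − r_j. The function name `nonbonded_forces` and its parameters are illustrative and do not come from the talk. It evaluates F_i^LJ = Σ_{j≠i} (ε/σ²){12(σ/r)^14 − 6(σ/r)^8} r_ji and F_i^C = q_i Σ_{j≠i} (q_j/r³) r_ji with a direct O(n^2) double loop, which is the cost that motivates offloading this phase.

```python
import math

def nonbonded_forces(pos, charge, eps=1.0, sigma=1.0):
    """Return the LJ + Coulomb force on every particle (direct O(n^2) sum).

    pos    : list of [x, y, z] positions
    charge : list of charges q_i
    eps, sigma : LJ parameters, assumed identical for all pairs
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # r_ji is taken as the vector from particle j to particle i
            r = [pos[i][k] - pos[j][k] for k in range(3)]
            d2 = sum(c * c for c in r)
            d = math.sqrt(d2)
            s2 = sigma * sigma / d2              # (sigma / |r|)^2
            lj = (eps / sigma**2) * (12.0 * s2**7 - 6.0 * s2**4)
            coul = charge[i] * charge[j] / d**3
            for k in range(3):
                forces[i][k] += (lj + coul) * r[k]
    return forces
```

With this sign convention both terms are repulsive at short range for like charges, and the LJ term vanishes at a separation of 2^(1/6)·σ.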

  7. An Alternative ...
     Only update particle state when “something happens”
     • “Something happens” = a discrete event
     • Advantage: DMD runs 10^6 times faster than traditional MD
     • Disadvantage: Laws of physics are continuous

  8. But the physical world isn’t discrete …
     DMD force approximation
     [Figure: step-function potentials, each plotted as potential versus distance. Panels: Hard Sphere; Single-well; Multi-well; Covalent Bond]
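
The step-function potentials named on this slide can be sketched as piecewise-constant functions of distance. This is an illustration only: the function names, the radii, and the well depths below are hypothetical values chosen for the example, not parameters from the talk.

```python
import math

def step_potential(r, core, wells=()):
    """Piecewise-constant pair potential.

    r     : separation distance
    core  : hard-core radius; closer than this is forbidden (infinite energy)
    wells : (outer_radius, energy) steps sorted by increasing radius;
            an empty sequence gives a pure hard sphere
    """
    if r < core:
        return math.inf
    for outer, energy in wells:
        if r < outer:
            return energy
    return 0.0

def bond_potential(r, r_min, r_max):
    """Covalent bond as an infinite square well: free inside, forbidden outside."""
    return 0.0 if r_min <= r <= r_max else math.inf

# Hypothetical parameter sets for the potential shapes
hard_sphere = lambda r: step_potential(r, 1.0)
single_well = lambda r: step_potential(r, 1.0, [(2.0, -1.0)])
multi_well = lambda r: step_potential(r, 1.0, [(1.5, -2.0), (2.0, -1.0)])
```

Because the potential is constant between steps, particles move ballistically until they reach a step radius, which is what makes an event-driven formulation possible.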

  9. While we’re approximating forces …
     • Traditional MD often uses all-atom models
     • DMD often models atoms behaviorally
       1. Ab initio, assuming no knowledge of specific protein dynamics
       2. Go-like models, which use empirical knowledge of the native state
     [Figure: force models. Ab initio (ref. 1); Go-like (ref. 2)]
     Refs: 1. Urbanc et al. 2006; 2. Dokholyan et al. 1998

  10. After all this approximation … is there any reality left??
      Yes, but requires application-specific model tuning
      – Using traditional MD
      – Frequent user feedback → Interactive simulation

  11. Current DMD Performance
      [Figure: amount of modeled reality reachable by each approach, with one second of modeled reality marked. Labels: traditional MD with a PC; heroic* traditional MD with a PC; heroic* traditional MD with a large MPP; heroic* Discrete MD with a PC. Source: F. Ding & N. Dokholyan, Trends in Biotechnology, 2005]
      * Heroic ≡ > one month elapsed time

  12. Motivation - Why Accelerate DMD?
      Example: Model nucleosome dynamics, i.e., how DNA is packaged and accessed – three meters of it in every cell!
      [Figure from Steven M. Carr, Memorial University, Newfoundland]

  13. Discrete Event Simulation
      • Simulation proceeds as a series of discrete element-wise interactions
        – NOT time-step driven
      • Seen in simulations of …
        – Circuits
        – Networks
        – Traffic
        – Systems Biology
        – Combat
      [Figure: block diagram with System State, Event Predictor (& Remover), Event Processor, and Time-Ordered Event Queue (arbitrary insertions and deletions). Arrow labels: state info; new state info; events & invalidations; events]
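
The loop on this slide (system state, event predictor, event processor, time-ordered event queue) can be sketched in Python on a toy problem. This is a simplification, not the production DMD code: equal-mass point particles on a line, where a collision swaps the two velocities. The name `simulate` and the stamp-based lazy invalidation are assumptions made for this example; a stale event is detected when it is popped rather than removed from the middle of the queue.

```python
import heapq

def simulate(pos, vel, t_end):
    """Event-driven 1D elastic collisions of equal-mass point particles.

    Particles are indexed left to right, so only neighbours can collide.
    Returns (final positions, final velocities, processed-event log, stale count).
    """
    n = len(pos)
    pos, vel = list(pos), list(vel)
    last = [0.0] * n    # time of each particle's last state update
    stamp = [0] * n     # incremented whenever a particle's state changes
    queue = []          # time-ordered event queue (binary heap)
    log, stale = [], 0

    def predict(i, now):
        """Event predictor: queue the next collision of pair (i, i+1), if any."""
        j = i + 1
        if i < 0 or j >= n:
            return
        xi = pos[i] + vel[i] * (now - last[i])
        xj = pos[j] + vel[j] * (now - last[j])
        closing = vel[i] - vel[j]
        if closing > 0:
            t_hit = now + (xj - xi) / closing
            heapq.heappush(queue, (t_hit, i, j, stamp[i], stamp[j]))

    for i in range(n - 1):
        predict(i, 0.0)

    while queue and queue[0][0] <= t_end:
        t, i, j, si, sj = heapq.heappop(queue)
        if (stamp[i], stamp[j]) != (si, sj):
            stale += 1          # event was invalidated by an earlier event
            continue
        for k in (i, j):        # event processor: advance both particles to time t
            pos[k] += vel[k] * (t - last[k])
            last[k] = t
            stamp[k] += 1       # invalidates every queued event involving k
        vel[i], vel[j] = vel[j], vel[i]   # equal-mass elastic collision
        log.append((t, i, j))
        predict(i - 1, t)       # new events with the outer neighbours
        predict(j, t)

    final = [pos[k] + vel[k] * (t_end - last[k]) for k in range(n)]
    return final, vel, log, stale
```

For example, `simulate([0.0, 2.0, 4.0], [1.0, 0.0, -3.0], 3.0)` processes three collisions and discards one queued prediction that an earlier collision made stale, a small instance of new events invalidating entries already in the queue.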

  14. How to make DMD even faster? Parallelize??
      Approaches to Parallel DES are well known:
      • Conservative
        – Guarantees causal order between processors
        – Depends on “safe window” to avoid serialization
      • Optimistic
        – Allows processors to run (more) independently
        – Correct resulting causality violations with rollback
      Neither approach has worked in DMD:
      – Conservative: no safe window → causal order = serialization
      – Optimistic: rollback is frequent and costly
      No existing production PDMD system!

  15. What’s hard about parallelizing DMD?
      DMD production systems are highly optimized
      • 100K events/sec for up to millions of particles (10 µs/event)
      • Typical message passing latency ~1 µs to 10 µs
      • Typical memory access latency ~50 ns to 100 ns
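
The figures on this slide can be turned into a per-event budget with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the function name `latency_budget` and the reading of the result (how much of one event's processing time a single message or memory access consumes) are an interpretation added here, not a claim from the talk.

```python
def latency_budget(events_per_sec, msg_latency_us, mem_latency_ns):
    """Compare communication and memory latencies with the time per event."""
    event_time_us = 1e6 / events_per_sec
    return {
        # serial time available to process one event, in microseconds
        "event_time_us": event_time_us,
        # fraction of one event's time consumed by a single message
        "msg_fraction": [lat / event_time_us for lat in msg_latency_us],
        # number of memory accesses that fit in one event's time
        "mem_accesses_per_event": [event_time_us * 1000.0 / lat for lat in mem_latency_ns],
    }

budget = latency_budget(100_000, (1.0, 10.0), (50.0, 100.0))
```

Under these numbers a single message costs between a tenth of an event's processing time and all of it, and only about 100 to 200 memory accesses fit in one event, so there is little serial work per event over which to amortize communication.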

  16. What’s hard about parallelizing DMD?
      How about Task-Based Decomposition? New events can
      – invalidate queued events anywhere in the event queue
      – be inserted anywhere in the event queue
      [Figure: block diagram with System State, Event Predictor (& Remover), Event Processor, and Time-Ordered Event Queue (arbitrary insertions and deletions). Arrow labels: state info; new state info; events & invalidations; events]
      [Figure: particles A, B, C, D, E] After events AB and CD at t_0 and t_0+ε, newly predicted event BC happens almost immediately – inserted at head of queue! Also, previously predicted BE gets cancelled.

  17. What’s hard about parallelizing DMD?
      But those events were necessarily local -- Can’t we partition the simulated space?
      [Figure: particles A, B and O, P on opposite sides of the simulation space] After event AB, cascade of events causes OP to happen almost immediately on the other side of the simulation space.
      Yes, but requires speculation and rollback

  18. Event propagation can be infinitely fast over any distance!
      Note: “chain” with rigid links is analogous and much more likely to occur in practice
      [Figure: Atomic Force Microscope unravels a protein]

  19. Outline
      • Overview: MD, DMD, DES, PDES
      • FPGA Accelerator conceptual design
        – Design overview
        – Component descriptions
      • Design Complications
      • FPGA Implementation and Performance
      • Multicore DMD
      • Discussion

  20. FPGA Overview - Dataflow
      Main idea: DMD in one big pipeline
      • Events processed with a throughput of one event per cycle
      • Therefore, in a single cycle:
        • State is updated (event is committed)
        • Invalidations are processed
        • New events are inserted – up to four are possible
      [Figure: pipeline dataflow. Blocks: Event Predictor Units; Collider; Commit; On-Chip Event Priority Queue; Off-Chip Event Heap. Flow labels: Event flow; Update state; Invalidations; New Event Insertions; Stall Inducing Insertions]

  21. FPGA Overview - Dataflow
      Main idea: DMD in one big pipeline
      • Events processed with a throughput of one event per cycle
      • Three complications:
        1. Processing units must have flexibility of event queue
        2. Events cannot be processed using stale state information
        3. Off-chip event queue must have same capability as on-chip
      [Figure: pipeline dataflow. Blocks: Event Predictor Units; Collider; Commit; On-Chip Event Priority Queue; Off-Chip Event Heap. Flow labels: Event flow; Update state; Invalidations; New Event Insertions; Stall Inducing Insertions]

  22. Components
      High-Level DMD Accelerator System Diagram
      [Figure: system diagram. Labels: Bead, Cell Memory Banks; Write Back; Event Insertion; Commit Buffer; Event Processor; Event Priority Queue; Event Predictor Units; Invalidation Broadcast; Particle Tags (with “=” comparators). Legend: Computation; Storage]
