Large Efficient Table-Top Teraflop Computing
Victor Basili, Thiago Craveiro, Daniela Cruzes, Kate Despain, Bill Dorland, Lorin Hochstein*, Nico Zazworka, and Marvin Zelkowitz
University of Maryland, College Park (* University of Nebraska, Lincoln)
University of Maryland Slide-1 SECSE

Scientific Computing
• Problem: How can we increase computational power for solving complex scientific problems?
• Solutions:
  – Increase the speed of the processing unit
  – If that is not powerful enough, build networks of processors (the traditional approach to building supercomputers: thousands of communicating processors)
    • Expensive to build
    • Expensive to use: consumes a lot of power for computing and cooling
  – Alternative: add inexpensive processors to current desktop machines to increase computational power
    • Intel: multicore processors
    • Use graphics processing units as general-purpose computers (GPGPU). This is the solution discussed today.

Productivity measures
• Related question: How effectively can we program these machines?
  – Traditionally, the speed of a machine was measured in FLOPS (floating-point operations per second) on specific benchmark programs
    • Real programs rarely achieved those numbers
    • Often only 10-20% of peak performance
  – We have been studying programmer productivity in the High Performance Computing (HPC) domain, as a companion measure to machine performance, as part of the DARPA High Productivity Computing Systems (HPCS) program from 2004 to 2008
  – Can we apply those techniques to measuring productivity in the GPGPU domain?
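The gap between benchmark peak and real-program performance mentioned on this slide is a simple ratio. The sketch below is only an illustration: the function name and the numbers are hypothetical and are not measurements from the studies.

```python
def fraction_of_peak(achieved_flops: float, peak_flops: float) -> float:
    """Return achieved performance as a fraction of the machine's peak rate."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return achieved_flops / peak_flops


# Hypothetical numbers: a machine rated at 1 TFLOPS peak on which
# an application sustains 150 GFLOPS.
peak = 1.0e12
achieved = 1.5e11
print(f"{fraction_of_peak(achieved, peak):.0%} of peak")
```

A value in the 0.10 to 0.20 range corresponds to the "10-20% of peak" case the slide describes.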

Format for rest of talk
• Review aspects of our work on programmer productivity from the DARPA HPCS program
• Introduction to the GPGPU problem
• Initial work on this issue and some thoughts on how we intend to proceed

HPCS Areas of Study
[Diagram relating the following areas]
• Users/Developers
• Effort
• Process flow
• Defects
• Programming models
• Performance
• Tools
• Environment/Hardware
• Cost & benefit, relationships, context variables, predictive models, tradeoffs

Overall research process
• What: Performed several studies of programmers building HPC programs in various environments
  – Replicated studies with graduate students at various universities on a set of standardized programs
  – In-depth observational studies of a few individuals to understand their behavior in solving HPC problems
  – Interviews with developers on their experiences in building HPC codes
• How: Developed a series of tools for collecting development data
  – Effort data for programmers
  – Source files, edits, and test runs
  – System commands and execution times

Studies conducted
[Map of study sites]
• ASC-Alliance sites: UIUC, U Chicago, Stanford U, U Utah, CalTech
• UMD: 11 studies
• USC: 5 studies
• MIT: 3 studies
• UCSB: 3 studies
• UCSD: 1 study
• SDSC, Iowa State, Mississippi State, U Hawaii: 1 to 2 studies each (5 in total)

Sample Results: Characterizing novices (graduate students in classroom assignments)
• OpenMP saves 35-75% of effort vs. MPI on most problems
• Experience with the problem reduces effort, but the effect of the programming model is greater than the effect of experience
• When performance is the goal:
  – Experts and students spend the same amount of time
  – Experts get significantly better performance
• No correlation between effort and performance

Results: Understanding workflow (Observational study)
[Figure: developer activity over elapsed time, from 0:00 to 8:35. Y-axis levels: successful compile-run cycle, successful edit-compile, failed compile-run cycle, failed edit-compile.]
• Observations:
  – A series of successful compile and failed run cycles
  – A series of failed and successful compile-run cycles
  – A series of failed and successful compile cycles with no runs
• Hypotheses:
  – New code is being added and compile-time defects are being fixed
  – Run-time defects are being fixed
  – Developer is unable to fix defects
• Truth (interview): Hypotheses were validated.

Resulting Infrastructure Tools & Packages
For the HPCS studies we built a collection of tools:
• CAPTURE: capture tools help to gather data from study participants and join this data in our common data source, a relational DB
• PROCESS: processing tools calculate / post-process data in the DB to retrieve non-captured and higher-level data
• ANALYZE: analyze tools provide views on the DB in order to support the validation of hypotheses and to gain new insights
• DERIVE: knowledge bases present the derived knowledge of the analyze processes
Information available at: http://hpcs.cs.umd.edu
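The capture, process, and analyze stages named on this slide can be pictured with a small sketch. It is not the HPCS tooling: the table layout, function names, and sample events are hypothetical, and an in-memory SQLite database stands in for the relational DB.

```python
import sqlite3


def capture(db, events):
    """Capture: store raw (participant, time_s, kind, ok) events in the DB."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(participant TEXT, time_s INTEGER, kind TEXT, ok INTEGER)"
    )
    db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)


def process(db):
    """Process: derive higher-level data that was not captured directly."""
    db.execute("DROP TABLE IF EXISTS summary")
    db.execute(
        "CREATE TABLE summary AS "
        "SELECT participant, kind, COUNT(*) AS n, SUM(ok) AS n_ok "
        "FROM events GROUP BY participant, kind"
    )


def analyze(db):
    """Analyze: a view on the DB, here the success rate per event kind."""
    rows = db.execute(
        "SELECT participant, kind, n, n_ok FROM summary "
        "ORDER BY participant, kind"
    ).fetchall()
    return {(p, kind): n_ok / n for p, kind, n, n_ok in rows}


# Hypothetical sample data: one failed compile, one successful compile, one run.
db = sqlite3.connect(":memory:")
capture(db, [("s1", 0, "compile", 0), ("s1", 60, "compile", 1), ("s1", 90, "run", 1)])
process(db)
print(analyze(db))
```

Keeping the derived `summary` table separate from the raw `events` table mirrors the slide's split between captured data and higher-level data computed afterwards.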

GPGPU Solution
• High-end PCs use separate display processors (GPUs, or graphics processing units) for manipulating data on the display for computationally complex applications (e.g., video games)
• GPUs can be separately programmed for many tasks
• Speeds for GPUs are increasing faster than general CPU speeds
[Figure: GFLOPS (0 to 350) by year, 2001 to 2006, for Intel, ATI, and NVIDIA]
Question 1: Can GPUs be used to program solutions in the HPC domain?
  – Can get today GPU boards with 512 or more GPUs
Question 2: Can we apply our approach in the HPCS domain to study GPGPU programming as well?
A group at the University of Maryland was porting an application from a multiprocessing system to a GPGPU system. This provided an environment for testing these ideas.

Initial issues under study
• Domain knowledge (how to solve the underlying problem in physics):
  – What distinguishes porting to a cluster from porting to a GPU?
  – What tools can aid scientists unfamiliar with GPUs when porting?
  – What tools help or are essential for software engineers using that methodology?
• Methodology understanding (how to study productivity issues):
  – What kind of methodology do you need to examine an ongoing port?
  – How important are interviews for analysis?

CodeVizard – Software Evolution Visualization
[Figure: annotated CodeVizard screenshot]
• Compiles: green lines for successful and red lines for failed compiles
• File versions with lifelines: captured at compile time. Black borders indicate that the file has changed relative to the previous version. Lifelines show the first compile of the file.
• Shell events: runs (blue), make (magenta), and others (black)
• Y-axis: folders and files, colored by file type
• X-axis: timeline with hours in the upper row and days in the bottom row
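The color coding described on this slide amounts to a small lookup. The sketch below is not CodeVizard code: the function name, the event-kind strings, and the `succeeded` parameter are hypothetical, and only the colors come from the slide.

```python
def event_color(kind, succeeded=None):
    """Map a captured event to the display color described on the slide."""
    if kind == "compile":
        if succeeded is None:
            raise ValueError("compile events need a success flag")
        return "green" if succeeded else "red"
    if kind == "run":
        return "blue"
    if kind == "make":
        return "magenta"
    return "black"  # all other shell events


# Hypothetical event stream from one work session.
for kind, ok in [("compile", False), ("compile", True), ("make", None), ("run", None), ("ls", None)]:
    print(kind, event_color(kind, ok))
```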

Preliminary GPU study (one-week port of rMHD code)
[Figure: CodeVizard view of the one-week port, annotated with compiles, makes, runs, new files, and work sessions]
• Observation: three work sessions, with no makes in the first two and makes and runs in the last
  – Hypothesis: First two phases: trying something new. Third phase: getting first runs / earlier problems solved.
  – Truth (interview): After meetings with colleagues he got the template code to run in the third phase. Adjustments were still necessary.
• Observation: new files, compiles, makes, and runs at high work density
  – Hypothesis: Adding a new component; dense and successful work points to error-free development.
  – Truth (interview): The subject ported his code to the GPU in little time.

Scaling up: The weekly cycle steps
1. Process collected data: prior to interview
2. Pre-analysis of data: immediately before interview
3. Interview (semi-structured) with the developer
4. Post-analysis of data and interview

Question on Methodology
• Should interviews in a longer study be conducted while it is in progress rather than retrospectively?
  – Hypothesis: A week is a short enough time for the subject to remember details
  – Hypothesis: Regular code inspections (possible with tools) and interview techniques are effective and necessary
• Experiences from each week can help improve both the methodology and the domain knowledge gained for the next one

Second GPU Case Study
• Characteristics:
  – Graduate student porting serial 2D MHD Fortran code to 3D on a GPU
  – Original used OpenMP. OpenMP was removed from the code and CUDA commands were added.
  – Used the DevObject Fortran library; some work still had to be done in CUDA (kernels)
  – Parallelization of the derivative and FFT calculations is suspected to bring the most speedup
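A derivative calculation is a natural candidate for GPU parallelization because each output point depends only on a few neighboring inputs. The sketch below illustrates that independence with a periodic central difference in plain Python. It is a stand-in chosen for brevity, not the student's Fortran/CUDA code or the scheme that code used, and all names are hypothetical.

```python
import math


def derivative_at(u, i, dx):
    """Central difference at index i on a periodic grid.

    Reads only u[i-1] and u[i+1], so every index can be computed
    independently (for example, one GPU thread per grid point).
    """
    n = len(u)
    return (u[(i + 1) % n] - u[(i - 1) % n]) / (2.0 * dx)


def derivative(u, dx):
    """Apply derivative_at to every grid point; the loop order is irrelevant."""
    return [derivative_at(u, i, dx) for i in range(len(u))]


# Check against a known case: d/dx sin(x) = cos(x) on [0, 2*pi).
n = 256
dx = 2.0 * math.pi / n
u = [math.sin(i * dx) for i in range(n)]
du = derivative(u, dx)
max_err = max(abs(du[i] - math.cos(i * dx)) for i in range(n))
print(f"max error: {max_err:.2e}")
```

Because no iteration writes to data that another iteration reads, the list comprehension could be replaced by a parallel map without changing the result.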
