CCDSC 2016 10/4/2016 Equivalent platforms for unmodified - PowerPoint PPT Presentation

Tiziano Passerini, Jaroslaw Slawinski, Umberto Villa, Sofia Guzzetti, Alessandro Veneziani, Vaidy Sunderam
Mathematics & Computer Science, Emory University, Atlanta, USA
CCDSC 2016, 10/4/2016

Equivalent platforms for unmodified application
• I have my application
• I need some CPU(s)
• Do I care about comm/io? Maybe
• Look: soft condition to have a resource like above
• Feel: depends (on coupling)
[Diagram: logical view of the application on alternative platforms. Labels: core; Application; RAM; Intra/Inter-net; IB, low latency; SMP; VO, P2P, etc.; Cluster, supercomputer; Single OS; Heterogeneous CPUs; Homogeneous CPUs; Parallel / Distributed computing; Good network; Soft preconditioned (threads, OpenXYZ, MPI); Virtualization, IaaS clouds; 10 Gb/s eth.]

• If different computational platforms may be used interchangeably …
[Figure (not real data): turnaround time, in time or effort units, for dev cluster, single node, supercomputer, and IaaS cloud. Components: soft preconditioning, waiting for resources, computation.]
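The figure breaks turnaround into three components, so comparing platforms amounts to comparing sums. A minimal sketch, assuming purely hypothetical values in arbitrary units (the slide itself marks its data as not real); the function and dictionary names are illustrative only:

```python
def turnaround(soft_preconditioning, waiting, computation):
    """Total turnaround as the sum of the three components named on the slide."""
    return soft_preconditioning + waiting + computation


# Hypothetical values in arbitrary units:
# (soft preconditioning, waiting for resources, computation)
platforms = {
    "Dev cluster": (0.0, 5.0, 40.0),
    "Single node": (5.0, 0.0, 60.0),
    "Supercomputer": (20.0, 30.0, 10.0),
    "IaaS cloud": (25.0, 2.0, 15.0),
}

totals = {name: turnaround(*parts) for name, parts in platforms.items()}
fastest = min(totals, key=totals.get)
print(fastest, totals[fastest])
```

With other inputs the ranking can flip, which is why the three components are worth tracking separately.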

• Dev environment – no soft conditioning
• "Rented" resources – no up-front costs
[Figure (not real data): distribution of costs per execution, in virtual dollars, for dev cluster, single node, supercomputer/VO, and IaaS cloud. Components: amortized up-front, amortized admin, comp. & storage or energy.]

Case study: LifeV-based hemodynamic simulation
• CFD/FEM MPI parallel code
• LifeV library
• Issues
  – Process placement
  – Turnaround
  – Cost
• Utility

• FEM input mesh partitioned into 8 partitions (8 processes)
• Logical topology graph
• Physical topology
• How to match?
[Figure: logical topology of partitions 0 to 7 (Scotch) against the physical topology: affinity zones, CPU cores, and internode connections of 100 Gb/s, 50 Gb/s, and 1 Gb/s.]

• M – data from the partitioner
• D – data from benchmarks
• I – inverted D
• Round-robin and per-core – input-agnostic allocation
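The two input-agnostic allocations and a cost built from M and I can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: M is taken to be a rank-to-rank communication volume matrix, D a node-to-node bandwidth matrix, and I its elementwise inverse; "per-core" is read as filling all cores of one node before moving to the next. All function names and numbers are hypothetical.

```python
def round_robin(num_procs, num_nodes):
    """Rank r is placed on node r % num_nodes (cyclic over nodes)."""
    return [rank % num_nodes for rank in range(num_procs)]


def per_core(num_procs, cores_per_node):
    """Fill every core of a node before moving on to the next node."""
    return [rank // cores_per_node for rank in range(num_procs)]


def invert(D):
    """Elementwise inverse: high bandwidth becomes low cost (assumed meaning of I)."""
    return [[1.0 / bandwidth for bandwidth in row] for row in D]


def placement_cost(M, I, placement):
    """Sum over ordered rank pairs of traffic volume times link cost."""
    n = len(placement)
    return sum(
        M[a][b] * I[placement[a]][placement[b]]
        for a in range(n)
        for b in range(n)
        if a != b
    )


# Hypothetical input: 8 ranks in a chain, one unit of traffic between neighbours.
n = 8
M = [[1.0 if abs(a - b) == 1 else 0.0 for b in range(n)] for a in range(n)]
# Hypothetical bandwidths in Gb/s: 100 inside a node, 1 between the two nodes.
D = [[100.0, 1.0], [1.0, 100.0]]
I = invert(D)

rr = round_robin(n, 2)
pc = per_core(n, 4)
print("round-robin:", rr, placement_cost(M, I, rr))
print("per-core:   ", pc, placement_cost(M, I, pc))
```

For this chain pattern the per-core placement keeps most neighbours on the same node, so its cost is lower. A placement that actually uses M and I would search for a low-cost assignment rather than apply a fixed pattern.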

• Diagnosis
• Bypass or stent placement
• Cost vs. turnaround

1. Ellipse: university cluster, 256-node, 1k-core; 1 Gb/s ethernet; queue SGE
2. Puma: dev environment cluster, 32-node, 128-core; IB SDR; queue PBS
3. Lonestar: XSEDE supercomputer; IB QDR; queue PBS
4. Rockhopper cluster: On-Demand HPC Cloud Service, Penguin Computing; IB QDR; queue PBS
5. Amazon EC2: 1-16 nodes, cc2.8xlarge, 16 cores per node; 10 Gb/s ethernet

• Aneurysm simulation
• About 1 million elements (FEM)
• Computes pressure and velocity at each 0.01 s time step
• Same problem, varying number of processes (strong scalability test)
• One MPI process per computing core, in round-robin placement
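A strong scalability test of this kind is usually summarized by speedup and parallel efficiency relative to the smallest run. A minimal sketch using the standard definitions; the timings are hypothetical placeholders, not measurements from the study:

```python
def speedup(t_ref, t_p):
    """How many times faster the run on p processes is than the reference run."""
    return t_ref / t_p


def efficiency(t_ref, p_ref, t_p, p):
    """Strong-scaling efficiency: 1.0 means time halves when processes double."""
    return (t_ref * p_ref) / (t_p * p)


# Hypothetical wall-clock times in hours, keyed by number of MPI processes.
timings = {16: 8.0, 32: 4.4, 64: 2.8}

p_ref = min(timings)
t_ref = timings[p_ref]
for p in sorted(timings):
    s = speedup(t_ref, timings[p])
    e = efficiency(t_ref, p_ref, timings[p], p)
    print(f"{p:3d} processes: speedup {s:.2f}, efficiency {e:.2f}")
```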

• A – fastest overall
• B – supercomputer nodes are not the fastest
• C – a single EC2 instance = 16 processes on the supercomputer
• D – fastest EC2 configuration
• EC2 scalability…

Avg is 4h 44m

• Puma and Lonestar – estimated cost based on hardware/operational expenses; typical figures reported in literature
• Ellipse – university pricing
• Rockhopper – actual charges
• EC2 – we used as many cheap spot-request (bid-based) instances as possible (about 6 times cheaper than regular instances)

• Value of simulation results to user over time
• U – utility value (e.g., in $)
• U max – the max value the user is willing to pay (importance of the task)
• T* – expected completion time
• T 0 – latest completion time
• |T* - T 0| – delay tolerance
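One way to turn these definitions into a function of completion time is sketched below. The slide text does not give the shape of the curve, so the linear decay between T* and T 0 is an assumption, as are the function name and the sample value of T 0:

```python
def utility(t, t_star, t_zero, u_max):
    """Value of a result delivered at time t.

    Assumed shape: full value up to the expected completion time t_star,
    linear decay across the delay tolerance, and zero from the latest
    completion time t_zero onward.
    """
    if t_zero < t_star:
        raise ValueError("t_zero must not be earlier than t_star")
    if t <= t_star:
        return u_max
    if t >= t_zero:
        return 0.0
    return u_max * (t_zero - t) / (t_zero - t_star)


# Hypothetical job: expected completion at 4.44 h, worthless after 6.0 h.
for t in (4.0, 5.22, 7.0):
    print(t, round(utility(t, t_star=4.44, t_zero=6.0, u_max=20.62), 2))
```

A higher U max models a more important task; a smaller gap between T* and T 0 models a more time-critical one.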

Range of min. prices per simulation for all architectures: $3.53 to $22.59; avg. $10.30

Low (3), high (1), average (2) priority jobs
• T* = 4.44 hrs
• #3 = $10.31
• #1 = $20.62
• A – overall fastest execution
• C – overall cheapest execution
• D – fastest time for EC2

• Turnaround vs. cost tradeoffs vary considerably across platforms (multiplied by parameter sweeps)
• Some IaaS cloud resources offer superior capabilities compared to cluster/supercomputer nodes (large single instances vs. local clusters)
• Queue waiting time is not considered in this study, but it may significantly change selection decisions for time-critical computation (e.g., medical diagnosis)
