Profiling Composable HPC Data Services WIP@PDSW, 2019

Nov 03, 2022

Profiling Composable HPC Data Services
WIP@PDSW, 2019
Srinivasan Ramesh, Philip H. Carns, Allen D. Malony, Robert Ross, Shane Snyder
University of Oregon, Argonne National Laboratory

Data Services: Managing Heterogeneity and Change
● Difficult to build custom data services efficiently:
  ○ Lots of moving parts
  ○ Need to dynamically adapt to changing application patterns
● Debugging performance problems is hard:
  ○ Numerous attempts at debugging microservices: Dapper@Google, Stardust, X-Trace, etc.
  ○ We take inspiration from these
[Figure: Storage (heterogeneous, multi-layered): SSD, NVM, ARCHIVE, DISKS, MEMORY, “KOVE” DEVICES. Applications (diverse workflows, data-driven): Simulation, Data Analysis, Machine Learning.]

Mochi: Composable Data Services
● Mochi data services are built by composing microservices:
  ○ RPC for control
  ○ RDMA for data movement
● Mochi’s building blocks:
  ○ Mercury, Argobots, Margo
● Performance Analysis in Mochi:
  ○ Build performance analysis capability directly into Mochi:
    ■ Available out-of-the-box!
[Figure: Mobject service: An object store. Image credits: Matthieu Dorier, Argonne National Laboratory]

Mochi: Performance Analysis
Call path profiling:
● We track the time spent in various call paths within the service:
  ○ A->C->D is a different call path from B->C->D
● Key idea: Each microservice stores and forwards RPC call path ancestry
● Time, call count, and resource-level usage statistics are updated at four instrumentation points: client send/receive and server send/receive
● What performance questions do we hope to answer?
[Figure: Mobject service: Call path profiling]
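
The "store and forward call path ancestry" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mochi's or Margo's implementation: the class `CallPathProfile`, its methods, and the RPC names A, B, C, D are hypothetical, and only time and call count are tracked.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PathStats:
    """Accumulated statistics for one call path."""
    count: int = 0
    total_time: float = 0.0


class CallPathProfile:
    """Hypothetical per-process table keyed by the full RPC ancestry."""

    def __init__(self):
        self.stats = defaultdict(PathStats)

    @staticmethod
    def extend(ancestry, rpc_name):
        """Return the ancestry a service forwards when it issues rpc_name."""
        return ancestry + (rpc_name,)

    def record(self, call_path, elapsed):
        """Update call count and cumulative time for one completed call."""
        entry = self.stats[call_path]
        entry.count += 1
        entry.total_time += elapsed


def format_path(call_path):
    """Render a call path tuple in the A->C->D notation used on the slide."""
    return "->".join(call_path)


profile = CallPathProfile()

# Two different origins reach the same C->D suffix; the keys stay distinct.
path_via_a = CallPathProfile.extend(CallPathProfile.extend(("A",), "C"), "D")
path_via_b = CallPathProfile.extend(CallPathProfile.extend(("B",), "C"), "D")

# Illustrative timings in seconds, not measured data.
profile.record(path_via_a, 0.010)
profile.record(path_via_a, 0.030)
profile.record(path_via_b, 0.200)

for path, entry in sorted(profile.stats.items()):
    print(f"{format_path(path)}: count={entry.count}, total={entry.total_time:.3f}s")
```

Keying the table on the whole ancestry tuple rather than on the final RPC name is what keeps A->C->D and B->C->D apart, at the cost of one table entry per distinct path.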

Call Path Profiling: Detecting Load Imbalance
● Performance question: For a given call path, what is the distribution of call path times and counts in origin/target entities?
[Figure: mobject_read_op: Raw distribution of call times across all origin (client) entities. Panels: "Overloaded server: Large variation in response time" and "Multi-threaded server: Better read perf. and response time". Annotations, in extraction order: 15s, 4.8s, read bw: 5700 MiB/s, read bw: 2155 MiB/s, 3.4s, 7s.]
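
One way to ask the load-imbalance question of profile data is to group the samples for a single call path by origin entity and compare the per-origin totals. The sketch below assumes a hypothetical input format of `(origin_id, elapsed_seconds)` pairs and uses max-over-mean as an imbalance indicator; both are illustrative choices, and the sample values are made up rather than taken from the slide.

```python
from statistics import mean


def summarize_by_origin(samples):
    """Group (origin_id, elapsed_seconds) samples for one call path by origin."""
    per_origin = {}
    for origin, elapsed in samples:
        per_origin.setdefault(origin, []).append(elapsed)
    return {
        origin: {"count": len(times), "total": sum(times)}
        for origin, times in per_origin.items()
    }


def imbalance_ratio(summary):
    """Largest per-origin total time divided by the mean; 1.0 means balanced."""
    totals = [entry["total"] for entry in summary.values()]
    return max(totals) / mean(totals)


# Illustrative samples for one call path (hypothetical client names).
samples = [
    ("client0", 1.0),
    ("client0", 1.0),
    ("client1", 2.0),
    ("client2", 6.0),
]

summary = summarize_by_origin(samples)
for origin, entry in sorted(summary.items()):
    print(f"{origin}: count={entry['count']}, total={entry['total']:.1f}s")
print(f"imbalance ratio: {imbalance_ratio(summary):.2f}")
```

The same grouping can be applied on the target (server) side by swapping the origin identifier for a target identifier.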

Tracing: Detecting Resource-Level Inefficiencies
● Margo servers spawn a new Argobots User-Level Thread (ULT) for every incoming RPC request
  ○ The size of the pool of tasks waiting to run is a measure of the load and responsiveness of the system
● We perform request tracing at the 4 instrumentation points previously described:
  ○ We collect Argobots pool size info and memory usage along the request path
  ○ This enables correlation of call path behaviour with resource usage on the node
[Figure: mobject_read_op: Max number of pending Argobots ULTs along request path. Panels: "Overloaded server: Pending tasks are stacking up" (20 pending tasks) and "Multi-threaded server: Reduction in number of pending tasks" (7 pending tasks).]
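
The "max number of pending ULTs along the request path" metric can be derived from trace records that carry a pool-size sample at each instrumentation point. The following is a sketch under assumptions: the `TraceEvent` record, its field names, and the point labels are hypothetical and do not reflect the trace format Mochi actually emits; the events are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the four instrumentation points named on the slides.
POINTS = ("client_send", "server_receive", "server_send", "client_receive")


@dataclass(frozen=True)
class TraceEvent:
    """One sample taken at an instrumentation point (assumed record layout)."""
    request_id: int
    point: str
    timestamp: float
    pending_ults: int  # pool size observed when the event was recorded


def max_pending_per_request(events):
    """Return {request_id: peak pending ULT count seen along its path}."""
    peaks = {}
    for event in events:
        if event.point not in POINTS:
            raise ValueError(f"unknown instrumentation point: {event.point}")
        current = peaks.get(event.request_id, 0)
        peaks[event.request_id] = max(current, event.pending_ults)
    return peaks


# Illustrative events for two requests.
events = [
    TraceEvent(1, "client_send", 0.00, 0),
    TraceEvent(1, "server_receive", 0.01, 5),
    TraceEvent(1, "server_send", 0.05, 3),
    TraceEvent(1, "client_receive", 0.06, 0),
    TraceEvent(2, "client_send", 0.02, 0),
    TraceEvent(2, "server_receive", 0.03, 9),
    TraceEvent(2, "server_send", 0.09, 4),
    TraceEvent(2, "client_receive", 0.10, 0),
]

peaks = max_pending_per_request(events)
for request_id, peak in sorted(peaks.items()):
    print(f"request {request_id}: max pending ULTs = {peak}")
```

Because each event also carries a request identifier and timestamp, the same records can be joined with call path timings to correlate slow requests with a backed-up pool.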
