Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach
Vinícius Garcia Pinto, Lucas Mello Schnorr, Luka Stanisic, Arnaud Legrand, Samuel Thibault, Vincent Danjean
WSPPD Workshop, Porto Alegre, Brazil
Context

Current HPC architectures
- Moving from transistor scaling to heterogeneity
- Hybrid computing resources: CPUs, GPUs, MICs

Programming hybrid platforms
Traditional, explicit programming models (MPI, CUDA, OpenMP, pthreads, ...)
- Perfect control → maximal achievable performance
- Monolithic codes → hard to develop and maintain
- Hard to optimize → performance portability suffers
- Fixed scheduling → sensitive to variability

Recent task-based programming models (PaRSEC, OmpSs, Charm++, StarPU, ...)
- Single, abstract programming model based on a task DAG
- Runtime responsible for dynamic scheduling
- Portability of code and performance
- New challenge → choosing the scheduling heuristic

2 / 11
Visualization of Task Scheduling
Parallel simulation of superscalar scheduling. Haugen, Kurzak, YarKhan, Dongarra. ICPP 2014.

- The QR factorization of a matrix (size: 3960; tile size: 180) with the QUARK scheduler: 48 cores (one node).
- The Cholesky factorization of a matrix (size: 47040; tile size: 960) with the "MPI-aware" DMDAS scheduler of StarPU+MPI: 2 nodes with 4 cores and 4 GPUs each.

3 / 11
Related Work: Classical Analysis Tools
Space/time view (resources may be hierarchically organized) + bonus features:
- Paraver (100K SLOC) – https://tools.bsc.es/paraver
- Projections (35K SLOC) – http://charm.cs.uiuc.edu/software
- FrameSoC (300K SLOC + LTTng) – https://soctrace-inria.github.io/framesoc/
- Ravel (19K SLOC) – https://github.com/LLNL/ravel
- Paje (31K SLOC, in Objective-C) – https://github.com/schnorr/Paje
- ViTE (27K SLOC) – http://vite.gforge.inria.fr/
Tiled Cholesky Factorization from StarPU+MPI visualized with ViTE. 4 / 11
Related Work: Emerging Alternatives
- Ad hoc visualization of task dependencies (??? SLOC) – see VPA 2015
- Exploiting the DAG structure: DAGViz (??? SLOC) – see VPA 2015
- Entropy-aware aggregation: Ocelotl (3K + 300K SLOC) – https://github.com/soctrace-inria/ocelotl
5 / 11
Current Tools for Visual Performance Analysis
- Implemented in C/C++ to scale
- Interactive (depending on scale) and user-friendly (mouse interaction)
- Large and complex source code, difficult to extend
- Generally not designed for hybrid platforms and dynamic runtimes
- Flexible filtering calls for scripting capability
- Lack custom views exploiting application and platform structure
6 / 11
Our (Agile, Scriptable, Flexible) 2-Phase Workflow
Adopt modern data analysis tools for scripting → pj_dump + R + tidyverse + ggplot2 + plotly (≈ 3.5K SLOC) Workflow Execution: screen (1st phase) + org-mode (2nd phase)
[Figure: simplified two-phase workflow. Phase 1 (A: export, B: conversion, C: reading, D: cleaning/filtering/derivation, E: output): Chameleon/Cholesky execution traces (FXT) are converted by starpu_fxt_tool (C) into a Paje trace and a DOT task DAG; pjdump (C++) and dot2csv (shell) turn these into CSV tables (states, entities, links, variables, DAG), which are read into R, cleaned (outlier detection, left joins, tree filtering, Y-coordinate derivation), and written out as Feather files. Phase 2 (A: reading, B: data visualization, C: assembly, D: analysis): the Feather tables are read back for in-memory analysis and visualization, driven by a YAML user configuration (master), producing static plots (ggplot2) and interactive views (plotly): space/time, K-iteration, ABE, idleness, outliers, GPU transfers, GFlops, used memory, ready/submitted tasks, MPI transfers.]
Simplified 2-phase workflow (see our forthcoming paper).
Fail fast if an idea does not work Workflow can be shared to reproduce (and change) the analysis
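The cleaning/derivation phase computes metrics such as per-resource idleness from the state records of the trace. The deck's workflow does this in R with the tidyverse; below is a minimal Python sketch of the same derivation, where the record layout (resource, start, end) is an assumption for illustration, not the exact pj_dump schema:

```python
from collections import defaultdict

def derive_idleness(states):
    """states: list of (resource, start, end) task-execution records.
    Returns {resource: fraction of the makespan that resource was idle},
    assuming task executions on one resource do not overlap."""
    makespan = max(end for _, _, end in states) - min(start for _, start, _ in states)
    busy = defaultdict(float)
    for resource, start, end in states:
        busy[resource] += end - start
    return {resource: 1.0 - b / makespan for resource, b in busy.items()}

# Toy trace: CPU0 runs two tasks, GPU0 one short task.
idle = derive_idleness([("CPU0", 0.0, 4.0), ("CPU0", 5.0, 10.0),
                        ("GPU0", 0.0, 2.0)])
```

In the toy trace the makespan is 10; CPU0 is busy for 9 of it (10% idle) and GPU0 for 2 (80% idle), the kind of per-resource figure the idleness view plots.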
7 / 11
Experimental validation: application and platform
MORSE – Matrices Over Runtime Systems @ Exascale http://icl.cs.utk.edu/projectsdev/morse/ Tiled Cholesky factorization available in Chameleon
for (k = 0; k < N; k++) {
    DPOTRF(RW, A[k][k]);
    for (i = k+1; i < N; i++)
        DTRSM(RW, A[i][k], R, A[k][k]);
    for (i = k+1; i < N; i++) {
        DSYRK(RW, A[i][i], R, A[i][k]);
        for (j = k+1; j < i; j++)
            DGEMM(RW, A[i][j], R, A[i][k], R, A[j][k]);
    }
}
[Figure: task DAG of a 5×5 tiled Cholesky factorization; nodes are dpotrf, dtrsm, dsyrk, and dgemm tasks, each labeled with its iteration k (k = 0..4).]
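The loop nest above only submits tasks; the runtime builds the DAG from the RW/R access modes. A minimal Python sketch that enumerates the submitted tasks for N tiles (labels match the kernels on the slide; dependency edges are left to the runtime):

```python
from collections import Counter

def cholesky_tasks(N):
    """Enumerate (kernel, k) tasks in submission order for an N x N tile matrix,
    mirroring the tiled Cholesky loop nest on the slide."""
    tasks = []
    for k in range(N):
        tasks.append(("dpotrf", k))          # factor diagonal tile A[k][k]
        for i in range(k + 1, N):
            tasks.append(("dtrsm", k))       # solve panel tile A[i][k]
        for i in range(k + 1, N):
            tasks.append(("dsyrk", k))       # update diagonal tile A[i][i]
            for j in range(k + 1, i):
                tasks.append(("dgemm", k))   # update off-diagonal tile A[i][j]
    return tasks

counts = Counter(kernel for kernel, _ in cholesky_tasks(5))
# For N = 5: 5 dpotrf, 10 dtrsm, 10 dsyrk, 10 dgemm = 35 tasks in total.
```

For N = 5 this yields the 35 tasks of the 5×5 DAG figure; the dgemm count grows cubically with N, which is why the 60×60 runs later in the deck stress the scheduler.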
StarPU runtime on these platforms
idcin-2.grenoble.grid5000.fr (Digitalis, phased out in February 2017)
Two 14-core Intel Xeon E5-2697v3 CPUs with three NVIDIA Titan X GPUs
8 / 11
Scheduler Comparison (input: 60×60 tiles of size 960×960)
DMDAS WS Unconstrained Constrained DMDA
Small matrix + interaction (12×12)
→ try yourself at http://perf-ev-runtime.gforge.inria.fr/vpa2016/
9 / 11
Conclusion and Ongoing Work
Achievements
- Flexible analysis workflow in ≈ 3.5K SLOC
- Handles dynamic task-based applications
- Multi-node, multi-core, multi-GPU, ...
- Suitable for scheduling specialists

What's next? Immediate work:
- Investigate data dependencies and (scheduler) anomalies at scale
10 / 11
Thank you for your attention!
Questions? schnorr@inf.ufrgs.br / vgpinto@inf.ufrgs.br

Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach. 3rd Workshop on Visual Performance Analysis (VPA). https://hal.inria.fr/hal-01353962
11 / 11