Overview of Performance Prediction Tools for Better Development and Tuning Support



SLIDE 1

Overview of Performance Prediction Tools for Better Development and Tuning Support

Rommel Anatoli Quintanilla Cruz / Master's Student
Esteban Clua / Associate Professor

Universidade Federal Fluminense

GTC 2016, San Jose, CA, USA, April 7th, 2016

SLIDE 2

What you will learn from this talk ...

SLIDE 3

Outline

  • Motivation
  • Performance models
  • Applications
  • Challenges
SLIDE 4

Performance Optimization Cycle*

  • 1. Profile Application
  • 2. Identify Performance Limiters
  • 3. Analyze Profile & Find Indicators
  • 4. Reflect
  • 5. Change and Test Code

* Adapted from S5173 CUDA Optimization with NVIDIA NSIGHT ECLIPSE Edition – GTC 2015

SLIDE 5

Performance Analysis Tools

  • NVIDIA Visual Profiler
  • The NVIDIA CUDA Profiling Tools Interface (CUPTI)
  • The PAPI CUDA Component

SLIDE 6

Performance tools are still evolving

  • NVIDIA Visual Profiler
  • CUDA 7.5: instruction-level profiling

SLIDE 7

Performance tools are still evolving

But it's still not enough for:

  • Power
  • Concurrent Kernel Execution
  • Streaming

SLIDE 8

Outline

  • Motivation
  • Performance models
  • Applications
  • Challenges
SLIDE 9

Performance models

SLIDE 10

Performance models

  • Input: source code, PTX, pseudocode, CUBIN, target device information

SLIDE 11

Performance models

  • Input: source code, PTX, pseudocode, CUBIN, target device information
  • Output (on a target device): execution time prediction, power consumption estimation, performance bottleneck identification
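The input/output relationship above can be sketched as a small interface: a function that consumes kernel statistics plus target-device information and returns an estimate. Everything below is an illustrative assumption (the names, the fields, and the toy roofline-style formula), not the interface of any real tool.

```python
# Hypothetical performance-model interface: kernel description + device
# information in, execution-time estimate out. The linear "bound by the
# slower of compute and memory" formula is a deliberately simple stand-in.
from dataclasses import dataclass

@dataclass
class DeviceInfo:
    sm_count: int            # number of streaming multiprocessors
    clock_ghz: float         # core clock in GHz
    mem_bandwidth_gbs: float # DRAM bandwidth in GB/s

@dataclass
class KernelStats:
    # A real front end would extract these from source, PTX, or CUBIN.
    compute_ops: int   # total arithmetic instructions
    bytes_moved: int   # total DRAM traffic in bytes

def predict_time_ms(k: KernelStats, d: DeviceInfo) -> float:
    """Toy estimate: the kernel is limited by whichever takes longer,
    raw compute throughput or memory bandwidth."""
    compute_ms = k.compute_ops / (d.sm_count * d.clock_ghz * 1e9) * 1e3
    memory_ms = k.bytes_moved / (d.mem_bandwidth_gbs * 1e9) * 1e3
    return max(compute_ms, memory_ms)
```

The point of the sketch is the shape of the contract, not the formula: analytical, statistical, and simulation-based models all fit behind this kind of signature.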

SLIDE 12

Types of performance models

  • Analytical models
  • Statistical models
  • Simulation

Each type has its own advantages & disadvantages.

SLIDE 13

Analytical models

The MWP-CWP model [Hong & Kim 2009]
  • MWP: memory warp parallelism
  • CWP: computation warp parallelism
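The core of the MWP-CWP idea can be sketched in a few lines. This is only the high-level skeleton of Hong & Kim's model, heavily simplified: the full paper adds memory-bandwidth limits, synchronization costs, and several more terms, so treat the formulas and parameter names below as a rough illustration.

```python
# Simplified sketch of the MWP-CWP intuition from Hong & Kim [2009].
# All inputs are in cycles (or warp counts); the bandwidth-limited MWP
# term from the full model is omitted for brevity.
def mwp_cwp(active_warps, mem_latency, departure_delay,
            comp_cycles, mem_cycles):
    # MWP: how many warps can overlap their memory requests, bounded by
    # how many warps are actually resident.
    mwp = min(mem_latency / departure_delay, active_warps)
    # CWP: how many warps' computation fits under one memory waiting
    # period, again bounded by the resident warp count.
    cwp = min((mem_cycles + comp_cycles) / comp_cycles, active_warps)
    # Roughly: enough memory parallelism to hide latency -> compute-bound.
    limiter = "compute" if mwp >= cwp else "memory"
    return mwp, cwp, limiter
```

Comparing the two quantities is what lets the model classify a kernel as compute- or memory-limited without running it.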

SLIDE 14

Statistical models

* GPGPU performance and power estimation using machine learning – Wu, Gene, et al.

SLIDE 15

Simulation

GPU Ocelot
  • PTX kernel → PTX emulation, LLVM translation, or GPU execution

SLIDE 16

Outline

  • Motivation
  • Performance models
  • Applications
  • Challenges
SLIDE 17

Applications of performance models

Successfully used to …

  • schedule concurrent kernels
  • perform auto-tuning
  • estimate power consumption
  • identify performance bottlenecks
  • balance workloads
SLIDE 18

Auto-tuning

  • Optimization goals
  • Parameters
  • Large search space
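The three bullets above combine into a simple loop: pick an optimization goal, enumerate parameter configurations, keep the best. A minimal sketch, assuming a caller-supplied `measure` function (which in practice would launch and time the kernel, or query a performance model):

```python
# Minimal exhaustive auto-tuning sketch. `measure(block_size, tile)` is a
# hypothetical callback returning the cost (e.g. runtime) of one
# configuration; the parameter names are illustrative.
import itertools

def autotune(measure, block_sizes, tiles):
    best_cfg, best_t = None, float("inf")
    for cfg in itertools.product(block_sizes, tiles):
        t = measure(*cfg)
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t
```

Exhaustive search like this is exactly what the "large search space" bullet rules out at scale, which is why performance models are used to prune or rank configurations instead of measuring every one.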
SLIDE 19

Concurrent Kernel Execution

  • Supported since Fermi
  • Limitations: registers, shared memory, occupancy

* Image from http://www.turkpaylasim.com/cevahir
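The listed limitations can be made concrete with a small occupancy-style calculation: registers and shared memory each cap how many blocks fit on one SM, and the smallest cap wins. The per-SM limits used as defaults below are illustrative placeholders, not the figures for any particular GPU.

```python
# Sketch: how per-SM resources bound resident blocks (and therefore how
# much room is left for a second, concurrent kernel). Default limits are
# illustrative assumptions; real values depend on the compute capability.
def blocks_per_sm(regs_per_thread, threads_per_block, smem_per_block,
                  regs_per_sm=65536, smem_per_sm=49152, max_blocks=16):
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks
    return min(by_regs, by_smem, max_blocks)
```

A kernel that saturates any one of these limits leaves no residency for other kernels, which is why such counts feed into concurrent-kernel scheduling decisions.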

SLIDE 20

Outline

  • Motivation
  • Performance models
  • Applications
  • Challenges
SLIDE 21

Challenges

  • Multiple-GPU systems, heterogeneous systems
  • Each microarchitecture has its own features
  • More complex execution behavior is harder to model accurately

SLIDE 22

References and Further Reading

  • Hong, Sunpyo, and Hyesoon Kim. "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness." ACM SIGARCH Computer Architecture News, Vol. 37, No. 3. ACM, 2009.
  • Kim, Hyesoon, et al. "Performance analysis and tuning for general purpose graphics processing units (GPGPU)." Synthesis Lectures on Computer Architecture 7.2 (2012): 1-96.
  • Lopez-Novoa, Unai, Alexander Mendiburu, and José Miguel-Alonso. "A survey of performance modeling and simulation techniques for accelerator-based computing." IEEE Transactions on Parallel and Distributed Systems 26.1 (2015): 272-281.
  • Zhong, Jianlong, and Bingsheng He. "Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling." IEEE Transactions on Parallel and Distributed Systems 25.6 (2014): 1522-1532.

SLIDE 23

Acknowledgements

SLIDE 24

Thank you!

Contact:
  rquintanillac@ic.uff.br
  esteban@ic.uff.br
  http://medialab.ic.uff.br

#GTC16

SLIDE 25

Questions & Answers

SLIDE 26

Backup Slides

SLIDE 27

Simplified compilation flow

  • nvcc drives the flow; the CUDA front end (cudafe) splits a .cu file into host code (.cpu) and device code (.gpu)
  • cicc, the high-level optimizer and PTX generator, compiles the device code to .ptx (the virtual instruction set)
  • ptxas, the PTX optimizing assembler, produces .cubin (the CUDA binary file)
  • The host compiler builds the host code, and the device binaries are embedded as a .fatbinary in the CUDA executable

SLIDE 28

Concurrent Kernel Execution

Leftover policy:
  Timeline: K1 16 blocks | K1 16 blocks | ... | K1 4 blocks + K2 12 blocks | K2 16 blocks | ...

Kernel slicing:
  Timeline: K1 6 blocks + K2 10 blocks | K1 6 blocks + K2 10 blocks | K1 6 blocks + K2 10 blocks | K1 6 blocks + K2 10 blocks | ...

* Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS – Jiao, Qing, et al.
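The slicing timeline above can be sketched as a scheduling loop: instead of letting K1 fill the GPU until only leftover resources remain for K2, each kernel is cut into fixed-size slices so both co-reside in every round. The slice sizes and function name below are illustrative, not taken from the cited paper.

```python
# Sketch of the kernel-slicing schedule: each round co-schedules one
# slice of K1 and one slice of K2 until both kernels run out of blocks.
def slice_schedule(k1_blocks, k2_blocks, k1_slice, k2_slice):
    rounds = []
    while k1_blocks > 0 or k2_blocks > 0:
        s1 = min(k1_slice, k1_blocks)  # blocks of K1 in this round
        s2 = min(k2_slice, k2_blocks)  # blocks of K2 in this round
        rounds.append((s1, s2))
        k1_blocks -= s1
        k2_blocks -= s2
    return rounds
```

With K1 = 24 blocks sliced by 6 and K2 = 40 blocks sliced by 10, this yields four rounds of (6, 10), matching the slicing timeline sketched on the slide.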