Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA

Dec 18, 2023

GPU architecture
GeForce 8800 GPU
• 16 streaming multiprocessors (SMs)
• 8 streaming processors (SPs) per SM
• 8192 registers per SM
• Up to 768 threads per SM
• Up to 8 blocks resident at a time per SM
• Warps of 32 threads
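
The per-SM limits above determine how many thread blocks can be resident at once. The following is a minimal Python sketch of that bookkeeping using the slide's numbers. The function name `resident_blocks` is hypothetical. The model assumes a partially filled warp still occupies a whole warp slot and ignores shared-memory limits and register allocation granularity, so treat its output as an approximation rather than NVIDIA's exact occupancy rules.

```python
import math

# Per-SM limits of the GeForce 8800, as listed on the slide.
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_SM = 768
REGISTERS_PER_SM = 8192
WARP_SIZE = 32
MAX_WARPS_PER_SM = MAX_THREADS_PER_SM // WARP_SIZE  # 24


def resident_blocks(threads_per_block: int, regs_per_thread: int) -> int:
    """Blocks of this shape that can be resident on one SM at the same time.

    Simplified model: the tightest of the block-slot, warp, and register
    limits. Assumption: a partially filled warp still uses a whole warp
    slot. Shared-memory limits and register allocation granularity are
    ignored.
    """
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    by_slots = MAX_BLOCKS_PER_SM
    by_warps = MAX_WARPS_PER_SM // warps_per_block
    by_registers = REGISTERS_PER_SM // (threads_per_block * regs_per_thread)
    return min(by_slots, by_warps, by_registers)


# Square blocks of tile x tile threads, at 10 registers per thread.
for tile in (4, 8, 12, 16):
    threads = tile * tile
    blocks = resident_blocks(threads, regs_per_thread=10)
    print(f"{tile}x{tile}: {threads / WARP_SIZE:.1f} warps/block, "
          f"{blocks} blocks/SM, {blocks * threads} threads/SM")
```

With 10 registers per thread, this reports 3 resident 16×16 blocks (768 threads). The 8×8 blocks stop at the 8-block limit (512 threads). Calling `resident_blocks(256, 11)` gives 2 blocks instead, which is the register squeeze described on the Tiling slide.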
Example
• 4K × 4K matrix multiplication
• 768 threads per SM
• 10 registers per thread (768 × 10 = 7,680 ≤ 8,192, so all 768 threads fit)
• Potential throughput: 43.2 GFLOPS
• Measured performance: 10.58 GFLOPS
• Bottleneck: global memory access
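
The slide does not show the kernel, so the following rests on an assumption. It is a Python model of the usual untiled scheme, in which each thread computes one element of C and fetches both operands of every multiply-add from global memory. Python lists stand in for global memory, and `naive_matmul` is a hypothetical name. Counting accesses shows why such a kernel is bound by global memory: it performs two loads per multiply-add.

```python
def naive_matmul(a, b):
    """C = A x B, untiled: returns (C, global loads, multiply-adds).

    Each (i, j) output element plays the role of one GPU thread, and every
    operand is read directly from the input lists ("global memory").
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    global_loads = 0
    fmas = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # two global loads feed one multiply-add
                global_loads += 2
                fmas += 1
            c[i][j] = acc
    return c, global_loads, fmas


n = 8
a = [[float(i + k) for k in range(n)] for i in range(n)]
b = [[float(i - k) for k in range(n)] for i in range(n)]
c, loads, fmas = naive_matmul(a, b)
print(f"{loads} global loads for {fmas} multiply-adds ({loads / fmas:.0f} per FMA)")
```

A small `n` keeps the pure-Python loops fast. The load-to-FMA ratio of 2 does not depend on `n`.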
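Tiling
• Compute on smaller sub-matrices (tiles)
• Load the tiles into shared memory
• 4×4: only 16 threads per block, i.e. half a warp (at most 8 blocks × 16 = 128 threads per SM)
• 8×8: 64 threads = 2 warps per block, but using all 768 threads would need 12 blocks, and an SM runs at most 8 (512 threads)
• 12×12: 144 threads, which is not divisible by the 32-thread warp size (4.5 warps)
• 16×16: 256 / 32 = 8 warps per block; three blocks give 3 × 256 = 768 threads
• Global memory loads reduced by a factor of 16
• 46.49 GFLOPS
• Register limit: 3 blocks/SM × 256 threads/block × 11 registers = 8,448 registers > 8,192, so only 2 blocks (512 threads) would fit per SM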
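
Below is a sketch of the tiling idea in plain Python, assuming square matrices whose size is a multiple of the tile width. Each output tile plays the role of one thread block, and the copies `sa` and `sb` stand in for shared memory. `tiled_matmul` is a hypothetical name. Counting global loads shows where the factor of 16 comes from: each element copied into a tile is reused by `tile` multiply-adds, so total loads drop from 2n³ to 2n³/tile.

```python
def tiled_matmul(a, b, tile):
    """C = A x B computed one tile at a time; returns (C, global loads)."""
    n = len(a)
    if n % tile != 0:
        raise ValueError("this sketch assumes n is a multiple of the tile size")
    c = [[0.0] * n for _ in range(n)]
    global_loads = 0
    for ti in range(0, n, tile):          # each output tile ~ one thread block
        for tj in range(0, n, tile):
            acc = [[0.0] * tile for _ in range(tile)]
            for tk in range(0, n, tile):  # walk tiles along the shared dimension
                # Cooperative load into "shared memory": each of the tile*tile
                # threads fetches one element of A and one element of B.
                sa = [[a[ti + y][tk + x] for x in range(tile)] for y in range(tile)]
                sb = [[b[tk + y][tj + x] for x in range(tile)] for y in range(tile)]
                global_loads += 2 * tile * tile
                # All multiply-adds now read the shared copies, not global memory.
                for y in range(tile):
                    for x in range(tile):
                        for k in range(tile):
                            acc[y][x] += sa[y][k] * sb[k][x]
            for y in range(tile):
                for x in range(tile):
                    c[ti + y][tj + x] = acc[y][x]
    return c, global_loads


n, tile = 32, 16
a = [[float((i * 7 + k) % 5) for k in range(n)] for i in range(n)]
b = [[float((i + 3 * k) % 4) for k in range(n)] for i in range(n)]
c, tiled_loads = tiled_matmul(a, b, tile)
naive_loads = 2 * n ** 3  # two global loads per multiply-add in the untiled kernel
print(f"untiled: {naive_loads} loads, tiled: {tiled_loads} loads "
      f"(x{naive_loads // tiled_loads} fewer)")
```

The reduction factor equals the tile width, so 16×16 tiles give the 16× cut listed on the slide.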
Unrolling
• Unroll the inner loop, removing the loop's address-calculation instructions and the iterator variable increments (which also frees the iterator's register)
• Potential throughput: 93.72 GFLOPS
• Measured performance: 91.14 GFLOPS
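
Python cannot show the instruction savings, but it can show the source transformation. This sketch uses hypothetical function names and assumes the inner loop runs over one 16-wide tile. The rolled version pays for an iterator increment and a loop test on every trip. The unrolled version uses constant indices and needs neither. In compiled GPU code, constant indices also let the address arithmetic fold into fixed offsets.

```python
TILE = 16  # assumed inner-loop trip count: one 16-wide tile


def dot_rolled(a, b):
    """Rolled inner loop: returns (result, number of iterator increments)."""
    acc = 0.0
    increments = 0
    k = 0
    while k < TILE:           # loop test on every trip
        acc += a[k] * b[k]    # the only useful work: one multiply-add
        k += 1                # iterator increment; on the GPU, k occupies a register
        increments += 1
    return acc, increments


def dot_unrolled(a, b):
    """Fully unrolled: constant indices, so no iterator, increments, or loop test."""
    return (a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
            + a[4] * b[4] + a[5] * b[5] + a[6] * b[6] + a[7] * b[7]
            + a[8] * b[8] + a[9] * b[9] + a[10] * b[10] + a[11] * b[11]
            + a[12] * b[12] + a[13] * b[13] + a[14] * b[14] + a[15] * b[15])


row = [float(k) for k in range(TILE)]
col = [float(TILE - k) for k in range(TILE)]
rolled, increments = dot_rolled(row, col)
print(rolled == dot_unrolled(row, col), increments)
```

In a CUDA kernel, the same transformation is usually requested with `#pragma unroll` on a loop whose trip count is known at compile time, rather than written out by hand.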
