SOFTWARE ECOSYSTEM MANJU HEGDE, CORPORATE VP, PRODUCTS GROUP, AMD - PowerPoint PPT Presentation

HETEROGENEOUS SYSTEM ARCHITECTURE (HSA) AND THE SOFTWARE ECOSYSTEM MANJU HEGDE, CORPORATE VP, PRODUCTS GROUP, AMD

OUTLINE
- Motivation
- HSA architecture v1
- Software stack
- Workload analysis
- Software ecosystem

PARADIGM SHIFTS
Single-Core Era
- Enabled by: Moore's Law, voltage scaling
- Constrained by: power, complexity
- Programming: Assembly → C/C++ → Java …
- Chart: single-thread performance over time ("we are here")
Multi-Core Era
- Enabled by: Moore's Law, SMP architecture
- Constrained by: power, parallel SW, scalability
- Programming: pthreads → OpenMP / TBB …
- Chart: throughput performance over time (# of processors) ("we are here")
Heterogeneous Systems Era
- Enabled by: abundant data parallelism, power efficient GPUs
- Temporarily constrained by: programming models, comm. overhead
- Programming: Shader → CUDA → OpenCL → !!!
- Chart: modern application performance over time (data-parallel exploitation) ("we are here")

WITNESS DISCRETE CPU AND DISCRETE GPU COMPUTE
[Diagram: CPU 1 … CPU N with coherent CPU memory, connected over PCIe to a GPU with its own GPU memory]
- Compute acceleration works well for large offload
- Slow data transfer between CPU and GPU
- Expert programming necessary to take advantage of the GPU compute

FIRST AND SECOND GENERATION APUS
[Diagram: CPU 1 … CPU N and GPU connected by a high speed internal bus; memory split into a coherent CPU partition and a GPU partition]
- First integration of CPU and GPU on-chip
- Common physical memory but not to programmer
- Faster transfer of data between CPU and GPU to enable more code to run on the GPU

COMMON PHYSICAL MEMORY BUT NOT TO PROGRAMMER
- CPU explicitly copies data to GPU memory
- GPU completes computation
- CPU explicitly copies result back to CPU memory
[Diagram: CPU with CPU memory and GPU with GPU memory, data copied between the two]

WHAT ARE THE PROBLEMS WE ARE TRYING TO SOLVE?
- SoCs are quickly running into the same many-CPU-core bottlenecks as the PC
- To move beyond this, we need to look at the right processor(s) and/or execution device for a given workload at reasonable power
- While addressing the core issues:
  - Easier to program
  - Easier to optimize
  - Easier to load balance
  - High performance
  - Lower power

COMBINE INTO UNIFIED PROGRAMMING MODEL
[Diagram: Audio Processor, Video Hardware, Encode/Decode Engines, CPU, GPU, DSP, Image Signal Processing, and Fixed Function Accelerator, all connected through Shared Memory, Coherency, User Mode Queues]

WHO IS DOING THIS? HSA FOUNDATION MEMBERSHIP – JUNE 2013
[Member logos grouped by tier: Founders, Promoters, Supporters, Contributors, Academic, Associates]

HSA FOUNDATION’S FOCUS
- Identify design features to make accelerators first class processors
- Attract mainstream programmers
- Create a platform architecture for ALL accelerators

HSA ARCHITECTURE V1
- GPU compute C++ support
- User Mode Scheduling
- Fully coherent memory between CPU & GPU
- GPU uses pageable system memory via CPU pointers
- GPU graphics pre-emption
- GPU compute context switch
[Diagram: Audio Processor, Video Hardware, Encode/Decode Engines, CPU, GPU, DSP, Image Signal Processing, and Fixed Function Acctr, all connected through Shared Memory, Coherency, User Mode Queues]

HSA KEY FEATURES
- Coherent memory: ensures CPU and GPU caches both see an up-to-date view of data
- Pageable memory: the GPU can seamlessly access virtual memory addresses that are not (yet) present in physical memory
- Entire memory space: both CPU and GPU can access and allocate any location in the system’s virtual memory space
[Diagram: CPU cache and GPU cache kept in sync by HW coherency, above physical memory and virtual memory]

WITH HSA
- CPU simply passes a pointer to GPU
- GPU completes computation
- CPU can read the result directly, no copying needed!
[Diagram: CPU and GPU both attached to CPU / GPU uniform memory]
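The contrast between the copy-based model and the pointer-passing model can be sketched in a few lines of Python. This is only a toy illustration, not HSA or OpenCL code: `gpu_kernel`, `run_with_copies`, and `run_with_shared_pointer` are hypothetical names, and a `bytearray` stands in for system memory.

```python
# Toy comparison of the two data-sharing models described in the slides.
# All names are illustrative assumptions; none of this is a real GPU API.

def gpu_kernel(buf):
    """Stand-in for a GPU kernel: doubles every element in place."""
    for i in range(len(buf)):
        buf[i] *= 2


def run_with_copies(cpu_data):
    """Copy-based model: copy in, compute, copy out."""
    bytes_copied = 0
    gpu_mem = bytearray(cpu_data)   # CPU explicitly copies data to GPU memory
    bytes_copied += len(gpu_mem)
    gpu_kernel(gpu_mem)             # GPU completes computation
    result = bytearray(gpu_mem)     # CPU explicitly copies result back
    bytes_copied += len(result)
    return result, bytes_copied


def run_with_shared_pointer(cpu_data):
    """Pointer-passing model: the kernel works on the caller's buffer."""
    view = memoryview(cpu_data)     # no copy: a view onto the same bytes
    gpu_kernel(view)                # GPU completes computation
    return cpu_data, 0              # CPU reads the result directly


data = bytearray([1, 2, 3, 4])
copied_result, copied_bytes = run_with_copies(data)
shared_result, shared_bytes = run_with_shared_pointer(data)
```

In the first call the input is duplicated twice and `data` is left untouched; in the second, the result appears in `data` itself and no bytes are moved.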

HSA SOFTWARE STACK
Layers, top to bottom:
- Apps
- HSA Domain Libraries, OpenCL™ 2.x Runtime
- HSA Runtime, Task Queuing Libraries
- HSA JIT
- HSA Kernel Mode Driver
- Hardware: Audio Processor, Video Hardware, Encode/Decode Engines, CPU, GPU, DSP, Image Signal Processing, and Fixed Function Acctr, connected through Shared Memory, Coherency, User Mode Queues
HSA architecture v1 features shown alongside:
- GPU compute C++ support
- User Mode Scheduling
- Fully coherent memory between CPU & GPU
- GPU uses pageable system memory via CPU pointers
- GPU graphics pre-emption
- GPU compute context switch

HETEROGENEOUS COMPUTE DISPATCH
- How compute dispatch operates today in the driver model
- How compute dispatch improves under HSA

TODAY’S COMMAND AND DISPATCH FLOW
[Diagram: Application A → Direct3D → User Mode Driver → Soft Queue → Kernel Mode Driver → GPU hardware queue. The command flow passes through a command buffer and the data flow through a DMA buffer.]

TODAY’S COMMAND AND DISPATCH FLOW (MULTIPLE APPLICATIONS)
[Diagram: Applications A, B, and C each have their own path of Direct3D → User Mode Driver → Soft Queue → Kernel Mode Driver, each with its own command buffer and DMA buffer. All three paths feed the single hardware queue of the GPU hardware.]

TODAY’S COMMAND AND DISPATCH FLOW (SHARED HARDWARE QUEUE)
[Diagram: same three paths for Applications A, B, and C as on the previous slide. The single GPU hardware queue now holds interleaved work from all three applications (A, B, C).]

HSA COMMAND AND DISPATCH FLOW
- Application codes to the hardware
- User mode queuing
- Hardware scheduling
- Low dispatch times
- No APIs
- No soft queues
- No user mode drivers
- No kernel mode transitions
- No overhead!
[Diagram: Applications A, B, and C each write directly into their own hardware queue (with an optional dispatch buffer) on the GPU hardware]
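The user mode queuing idea can be illustrated with a small simulation, assuming each application owns a fixed-size ring buffer that it fills directly while the hardware drains it. The class `UserModeQueue`, its fields (`write_index`, `read_index`, `doorbell`), and the packet layout are hypothetical choices for this sketch, not the HSA runtime interface.

```python
# Minimal single-threaded simulation of per-application user mode queues.
# Names and packet layout are assumptions made for illustration only.

class UserModeQueue:
    """Fixed-size ring buffer shared by an application and a simulated GPU."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.write_index = 0   # advanced by the application
        self.read_index = 0    # advanced by the hardware scheduler
        self.doorbell = 0      # last write index signalled to the hardware

    def enqueue(self, packet):
        """Application side: write a packet, then ring the doorbell."""
        if self.write_index - self.read_index == self.size:
            raise BufferError("queue full")
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1
        self.doorbell = self.write_index

    def hardware_drain(self):
        """Hardware side: run every packet up to the doorbell value."""
        done = []
        while self.read_index < self.doorbell:
            kernel, arg = self.slots[self.read_index % self.size]
            done.append(kernel(arg))
            self.read_index += 1
        return done


# One queue per application; no driver layer sits between app and queue.
queues = {name: UserModeQueue(4) for name in "ABC"}
queues["A"].enqueue((lambda x: x + 1, 1))
queues["B"].enqueue((lambda x: x * 10, 2))
queues["B"].enqueue((lambda x: x * 10, 4))
results = {name: q.hardware_drain() for name, q in queues.items()}
```

The point of the sketch is the shape of the path: `enqueue` is an ordinary memory write plus a doorbell update, with no API call, soft queue, or kernel transition in between.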

COMMAND AND DISPATCH CPU <-> GPU
[Diagram: Application / Runtime dispatching work to CPU1, CPU2, and GPU]

MAKING GPUS AND APUS EASIER TO PROGRAM: TASK QUEUING RUNTIMES
- Popular pattern for task and data parallel programming on SMP systems today
- Characterized by:
  - A work queue per core
  - Runtime library that divides large loops into tasks and distributes to queues
  - A work stealing runtime that keeps the system balanced
- HSA is designed to extend this pattern to run on heterogeneous systems
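The pattern above (a queue per worker, a loop split into tasks, and stealing to rebalance) can be sketched as a deterministic, single-threaded simulation. The helper names `split_loop` and `run_work_stealing` are hypothetical, and the starting distribution is deliberately unbalanced so that stealing has something to do; a real runtime would run workers concurrently.

```python
from collections import deque

# Single-threaded simulation of a work stealing runtime.
# All names are illustrative; workers take turns instead of running in parallel.

def split_loop(n_items, chunk):
    """Divide a loop over range(n_items) into (start, stop) tasks."""
    return [(s, min(s + chunk, n_items)) for s in range(0, n_items, chunk)]


def run_work_stealing(tasks, n_workers, body):
    """Run tasks round-robin; idle workers steal from the fullest queue."""
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(tasks)          # unbalanced start: all work on worker 0
    executed = [0] * n_workers
    total = 0
    while any(queues):
        for w in range(n_workers):
            if not queues[w]:
                victim = max(range(n_workers), key=lambda v: len(queues[v]))
                if not queues[victim]:
                    continue         # nothing left to steal
                # Steal from the end opposite to the one the owner pops from.
                queues[w].append(queues[victim].popleft())
            start, stop = queues[w].pop()
            total += sum(body(i) for i in range(start, stop))
            executed[w] += 1
    return total, executed


tasks = split_loop(100, 10)
total, executed = run_work_stealing(tasks, 4, lambda i: i * i)
```

Even though every task starts on one queue, `executed` shows all four workers doing a share of the work. Extending the pattern to a heterogeneous system amounts to adding a worker whose queue is drained by the GPU instead of a CPU core.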

TASK QUEUING RUNTIME ON CPUS
[Diagram: a work stealing runtime above four queues (Q), each feeding a CPU worker running on an x86 CPU, all sharing memory. Legend: CPU threads, GPU threads, memory.]

TASK QUEUING RUNTIME ON THE HSA PLATFORM
[Diagram: a work stealing runtime above five queues (Q). Four feed CPU workers running on x86 CPUs; the fifth feeds a GPU manager that fetches and dispatches work to the GPU's SIMD units. All share memory. Legend: CPU threads, GPU threads, memory.]

DRIVER STACK VS. HSA SOFTWARE STACK
Driver stack, top to bottom:
- Apps
- Domain Libraries
- OpenCL™ 1.x, DX Runtimes, User Mode Drivers
- Graphics Kernel Mode Driver
HSA software stack, top to bottom:
- Apps
- HSA Domain Libraries, OpenCL™ 2.x Runtime
- HSA Runtime, Task Queuing Libraries
- HSA JIT
- HSA Kernel Mode Driver
Both stacks run on the same hardware: APUs, CPUs, GPUs.
[Legend: user mode component, kernel mode component, components contributed by third parties]

HSA INTERMEDIATE LANGUAGE - HSAIL
- HSAIL is the intermediate language for parallel compute in HSA
  - Generated by a high level compiler (LLVM, gcc, Java VM, etc.)
  - Compiled down to GPU ISA or other parallel processor ISA by an IHV Finalizer
  - Finalizer may execute at run time, install time, or build time, depending on platform type
- HSAIL is a low level instruction set designed for parallel compute in a shared virtual memory environment
- HSAIL is SIMT in form and does not dictate hardware microarchitecture
- HSAIL is designed for fast compile time, moving most optimizations to the HL compiler
- HSAIL is at the same level as PTX: an intermediate assembly or virtual machine target
- Represented as bit-code in a BRIG file format, with support for late binding of libraries

HSA BRINGS A MODERN OPEN COMPILATION FOUNDATION
[Diagram: two compilation stacks side by side.
 OpenCL™ → EDG or CLANG → SPIR → LLVM → HSAIL → Hardware
 CUDA → EDG or CLANG → NVVM IR → LLVM → PTX → Hardware]
- This brings about a fully competitive, rich, and complete compilation stack architecture for creating a broader set of GPU computing tools, languages, and libraries
- HSAIL supports LLVM and other compilers: GCC, Java VM
