- A. Ziabari*, R. Ubal*, D. Schaa**, D. Kaeli*
*NUCAR Group, Northeastern University **AMD
Visualization of OpenCL Application Execution on CPU-GPU Systems
Northeastern University Computer Architecture Research Group
WCAE 2015 2
The Multi2Sim Simulation Framework
Islands, NVIDIA Kepler, HSA intermediate language
Full-system simulation Application-level simulation
Four-Stage Simulation Process
The OpenCL CPU Host Program
performs an OpenCL API call.
The OpenCL Runtime Library
handles the call and communicates with the driver through system calls (ioctl, read, write, etc.). These are referred to as ABI calls.
library, running with guest code, transparently intercepts the call. It communicates with the Multi2Sim driver using system calls with codes not reserved in Linux.
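The interception path described above can be sketched as a toy model. This is an illustration under stated assumptions, not Multi2Sim code: the names `ToyDriver`, `guest_syscall`, `OPENCL_SYSCALL_CODE`, and the ABI call codes are all hypothetical, and the numeric values are placeholders chosen for the sketch.

```python
# Toy model: a guest runtime reaches a simulated driver through a system
# call code that the host OS does not define. All numeric values below are
# placeholders for illustration only.
LINUX_SYSCALLS = {3: "read", 4: "write", 54: "ioctl"}
OPENCL_SYSCALL_CODE = 1000   # hypothetical code, assumed unused by Linux
ABI_MEM_ALLOC = 1            # hypothetical ABI call codes
ABI_MEM_WRITE = 2


class ToyDriver:
    """Stands in for the simulator-side driver that handles ABI calls."""

    def __init__(self):
        self.next_addr = 0
        self.memory = {}

    def abi_call(self, abi_code, *args):
        if abi_code == ABI_MEM_ALLOC:
            return self._mem_alloc(*args)
        if abi_code == ABI_MEM_WRITE:
            return self._mem_write(*args)
        raise ValueError(f"unknown ABI call code {abi_code}")

    def _mem_alloc(self, size):
        # Hand out consecutive device addresses.
        addr = self.next_addr
        self.next_addr += size
        return addr

    def _mem_write(self, addr, data):
        # Record the bytes written at a device address.
        self.memory[addr] = bytes(data)
        return len(data)


def guest_syscall(driver, code, *args):
    """Route one guest system call: ordinary codes go to the OS model,
    the extra code goes to the simulated driver as an ABI call."""
    if code == OPENCL_SYSCALL_CODE:
        abi_code, *abi_args = args
        return driver.abi_call(abi_code, *abi_args)
    if code in LINUX_SYSCALLS:
        return "linux:" + LINUX_SYSCALLS[code]
    raise ValueError(f"unsupported system call code {code}")
```

The guest program never sees the difference: the runtime issues what looks like an ordinary system call, and only the code value decides whether the OS model or the driver handles it.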
The OpenCL Device Driver
(kernel module) handles the ABI call and communicates with the GPU through the PCIe bus
(Multi2Sim code) intercepts the ABI call and communicates with the GPU emulator
The GPU Emulator
GPU handles the messages received from the driver
its internal state based on the message received from the driver
Transferring Control
OpenCL command queue object. A user-level thread associated with the command queue eventually processes the command, performing a LaunchKernel ABI call
launches the GPU emulator
completes
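The hand-off above (enqueue a command, let a thread tied to the command queue pick it up, issue the LaunchKernel ABI call, start the emulator) can be sketched as follows. This is a minimal model, assuming a Python thread as a stand-in for the user-level thread; `ToyEmulator`, `ToyCommandQueue`, and their methods are hypothetical names, not Multi2Sim or OpenCL APIs.

```python
import queue
import threading


class ToyEmulator:
    """Stands in for the GPU emulator; running a kernel just records it."""

    def __init__(self):
        self.launched = []

    def run(self, kernel_name, global_size):
        self.launched.append((kernel_name, global_size))


class ToyCommandQueue:
    """Command queue with one worker thread that processes commands."""

    def __init__(self, emulator):
        self._emulator = emulator
        self._commands = queue.Queue()
        worker = threading.Thread(target=self._process, daemon=True)
        worker.start()

    def enqueue_kernel(self, kernel_name, global_size):
        # The host returns immediately; the event signals completion later.
        done = threading.Event()
        self._commands.put((kernel_name, global_size, done))
        return done

    def _process(self):
        while True:
            kernel_name, global_size, done = self._commands.get()
            self._launch_kernel_abi(kernel_name, global_size)
            done.set()

    def _launch_kernel_abi(self, kernel_name, global_size):
        # Stand-in for the LaunchKernel ABI call: control moves to the emulator.
        self._emulator.run(kernel_name, global_size)
```

A host program would call `enqueue_kernel(...)` and later wait on the returned event, which mirrors how control comes back to the CPU side once the kernel completes.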
Execution Model
communicate efficiently
with each other and executing in any order
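The "executing in any order" property can be illustrated with a small sketch. It assumes a one-dimensional index space whose size is a multiple of the work-group size, and a stand-in kernel that doubles each element; `split_ndrange` and `run_kernel` are hypothetical helper names.

```python
import random


def split_ndrange(global_size, local_size):
    """Split a 1D index space into work-groups of local_size work-items."""
    if global_size % local_size != 0:
        raise ValueError("this sketch requires global_size % local_size == 0")
    return [range(g * local_size, (g + 1) * local_size)
            for g in range(global_size // local_size)]


def run_kernel(data, local_size, seed=None):
    """Run a stand-in kernel (out[i] = 2 * data[i]) one work-group at a time,
    visiting the work-groups in a shuffled order."""
    groups = split_ndrange(len(data), local_size)
    order = list(range(len(groups)))
    random.Random(seed).shuffle(order)
    out = [None] * len(data)
    for group_id in order:
        for global_id in groups[group_id]:
            out[global_id] = 2 * data[global_id]
    return out
```

Because no work-group depends on another, every shuffle order produces the same output, which is what lets a device schedule work-groups freely across its compute units.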
Compute Device
the compute units.
each compute unit contains a copy of the OpenCL kernel
instructions, partly decodes them, and sends them to the appropriate execution unit
following execution units: scalar unit, vector-memory unit, branch unit, LDS (local data store) unit
Compute Unit
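The "partly decode, then send to the appropriate execution unit" step can be sketched as a routing function. This is a simplification under stated assumptions: it routes on Southern Islands-style mnemonic prefixes, the function name `pick_execution_unit` is hypothetical, and anything that does not match the four units named above falls into an "other" bucket. It is not the simulator's decoder.

```python
def pick_execution_unit(mnemonic):
    """Choose an execution unit from an instruction mnemonic.

    Assumption for this sketch: the mnemonic prefix is enough to classify
    the instruction (s_ scalar, ds_ LDS, buffer_/tbuffer_ vector memory,
    s_branch/s_cbranch branch).
    """
    m = mnemonic.lower()
    # Branches share the s_ prefix, so they must be checked first.
    if m.startswith(("s_branch", "s_cbranch")):
        return "branch"
    if m.startswith("s_"):
        return "scalar"
    if m.startswith("ds_"):
        return "lds"
    if m.startswith(("buffer_", "tbuffer_")):
        return "vector-memory"
    # Units not named in the list above are lumped together here.
    return "other"
```

For example, `pick_execution_unit("ds_read_b32")` selects the LDS unit, while `pick_execution_unit("s_cbranch_scc0")` selects the branch unit even though it starts with `s_`.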
window for navigation
shows workgroups mapped to compute units
with a pipeline diagram
Visualization tool - Main Panel
memory, write to register and complete
Visualization tool – GPU pipeline
levels
through default switch cross-bar interconnects,
interconnect configurations
memory module
The Memory Hierarchy
sets
The Memory Hierarchy
The Interconnection Network
is associated with an access from the memory hierarchy
can be followed by clicking the detail button on the main panel
containing the network graph
cycle
to the nature of the OpenCL application
The Interconnection Network
clicking the detail button on the node panel
The Memory Snapshot – Identifying application patterns
The Network Snapshot - Identifying application patterns
Supported Architectures
                            Disasm.  Emulation    Timing Simulation  Graphic Pipelines
ARM                         X        In progress  –                  –
MIPS                        X        In progress  –                  –
x86                         X        X            X                  X
AMD Evergreen               X        X            X                  X
AMD Southern Islands        X        X            X                  X
NVIDIA Fermi                X        X            –                  –
NVIDIA Kepler               X        x            –                  –
HSA Intermediate Language   X        x            –                  –
Supported Benchmarks
Collaboration Opportunities
de Valencia (Spain), Boston University, AMD, NVIDIA
Sponsors