Integrating Acceleration Devices using CometCloud - Thomas Beach



SLIDE 1

Integrating Acceleration Devices using CometCloud

Thomas Beach

School of Computer Science & Informatics Cardiff University

June 2013

Thomas Beach (COMSC) Integrating Acceleration Devices using CometCloud June 2013 1 / 38

SLIDE 2

Introduction

  • A heterogeneous cloud system often contains many different types of computing resources, from standard multi-core systems to many-core systems, GPUs, physics co-processing devices and FPGAs.
  • GPU-accelerated compute nodes are already available from mainstream cloud providers, e.g. Amazon EC2.
  • Making best use of these devices within a large system is often difficult.
  • This presentation gives an overview of our work on utilising CometCloud to integrate these devices into a cloud computing system.

SLIDE 3

  1. CometCloud
  2. Integrating Acceleration Devices
  3. Initial Results
  4. Future Work

SLIDE 4

CometCloud

SLIDE 5

CometCloud

  • CometCloud is an autonomic computing engine for cloud environments.
  • It is self-managing.
  • It supports several programming models, such as Master/Worker, Workflow and MapReduce/Hadoop.

SLIDE 6

CometCloud

  • Structured into a set of masters and secure workers, linked by a distributed coordination space.
  • Unsecured workers are connected to the coordination space via a proxy (Request Handler).
  • A task-based approach is used: each task is an XML tuple, which is inserted into the coordination space.

SLIDE 7

System Architecture

Adopted from Kim et al.

SLIDE 8

CometCloud

  • out (ts, t): a non-blocking operation that inserts tuple t into space ts.
  • in (ts, t): a blocking operation that removes a tuple t matching template t from the space ts and returns it.
  • rd (ts, t): a blocking operation that returns a tuple t matching template t from the space ts. The tuple is not removed from the space.
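
The three operations can be illustrated with a small in-memory stand-in. This is only a sketch, not CometCloud's API: the `TupleSpace` class, the use of dicts for tuples and templates, and the matching rule are all assumptions made for illustration, and the space `ts` becomes the object the methods are called on.

```python
import threading


class TupleSpace:
    """Toy in-memory space; tuples and templates are plain dicts (an assumption)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(tup, template):
        # A tuple matches when it carries every key/value pair of the template.
        return all(key in tup and tup[key] == value
                   for key, value in template.items())

    def out(self, tup):
        """Non-blocking insert of a tuple into the space."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _take(self, template, remove):
        with self._cond:
            while True:
                for index, tup in enumerate(self._tuples):
                    if self._matches(tup, template):
                        return self._tuples.pop(index) if remove else tup
                self._cond.wait()  # block until another thread calls out()

    def in_(self, template):
        """Blocking read that removes the match (`in` is a Python keyword)."""
        return self._take(template, remove=True)

    def rd(self, template):
        """Blocking read that leaves the match in the space."""
        return self._take(template, remove=False)


space = TupleSpace()
space.out({"task": "canny", "device": "gpu"})
peeked = space.rd({"device": "gpu"})   # the tuple stays in the space
taken = space.in_({"task": "canny"})   # the tuple is removed
```

Calling `in_` or `rd` with a template that nothing matches would block the calling thread until a matching `out` arrives, which is the behaviour the slide describes.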

SLIDE 9

CloudBursting and CloudBridging

CometCloud supports:

  • CloudBursting: the ability to dynamically scale out to utilise additional resources. This can be governed by user-defined policies.
  • CloudBridging: the creation of a virtual cloud consisting of multiple smaller clouds, controlled via a scheduler.

SLIDE 10

System Architecture

Adopted from Kim et al.

SLIDE 11

System Architecture

Adopted from Kim et al.

SLIDE 12

Integrating Acceleration Devices

SLIDE 14

Integration of Acceleration Devices

  • Currently, CometCloud attempts to balance the task load across multiple workers.
  • However, many workers have different characteristics, and more intelligent selection is necessary.
  • Selection by worker capability is the first step and works well.
  • There is scope to gain performance improvements from finer-grained decision making.

SLIDE 15

System Architecture

SLIDE 16

Steps 1, 2 - Code Analysis

  • Analyse the code and extract kernels (possible parallelisable loops).
  • Extract metrics from each kernel within the application.
  • Generate an XML tuple for each kernel, containing its metrics and any device restrictions.
  • Parallelisable loops could either be detected automatically or specified by the user.
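
The slides do not show the tuple schema, so the following is only a sketch of what a per-kernel XML tuple might look like. The element names (`kernel`, `metrics`, `restrictions`, `requires`) and the kernel name are hypothetical; the metric values are the ones reported later in the deck for the example application.

```python
import xml.etree.ElementTree as ET


def kernel_tuple(name, metrics, restrictions):
    """Serialise one kernel's metrics and device restrictions as XML text."""
    root = ET.Element("kernel", {"name": name})
    metrics_el = ET.SubElement(root, "metrics")
    for key, value in metrics.items():
        ET.SubElement(metrics_el, key).text = str(value)
    restrictions_el = ET.SubElement(root, "restrictions")
    for requirement in restrictions:
        ET.SubElement(restrictions_el, "requires").text = requirement
    return ET.tostring(root, encoding="unicode")


# Hypothetical kernel name and restriction label.
xml_tuple = kernel_tuple(
    "edge_detect_loop",
    {"intensity": 111, "io": 27, "branching": 1},
    ["cuda"],
)
print(xml_tuple)
```

Keeping restrictions as a separate element lets a worker decide whether it can execute the kernel before it looks at the metrics.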

SLIDE 17

Code Analysis

Analysing the input code (or specifying kernels via annotation) keeps the input device independent:

  • This gives far more flexibility in terms of matching to a device.
  • Utilising a specific API, e.g. CUDA, would restrict which devices can be used.
  • It frees the programmer from having to learn multiple APIs.

SLIDE 18

System Architecture

SLIDE 19

Steps 4, 5 - Task Analysis

  • XML tuples representing kernels are only examined by workers that have the capability to execute them, so only those workers will reply.
  • Decisions are made on simple rules, which could include software versions, the presence of hardware, or a more detailed hardware specification, e.g. a CUDA 5 compatible device.
  • The device must then reply with its estimated time to execute and the estimated time until the device is available.
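
A minimal sketch of this rule-based filtering, assuming capabilities and requirements are simple string labels. The `Worker` class, the labels and the timing figures are hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    capabilities: frozenset  # hypothetical labels such as "cuda5"
    busy_for: float          # estimated seconds until the device is free

    def can_execute(self, requirements):
        # Simple rule: every requirement must be among the capabilities.
        return set(requirements) <= self.capabilities

    def reply(self, requirements, estimated_runtime):
        """Return a reply, or None when this worker ignores the tuple."""
        if not self.can_execute(requirements):
            return None
        return {
            "worker": self.name,
            "estimated_runtime": estimated_runtime,
            "estimated_wait": self.busy_for,
        }


workers = [
    Worker("cpu-node", frozenset({"x86"}), busy_for=0.0),
    Worker("gpu-node", frozenset({"x86", "cuda5"}), busy_for=2.5),
]

replies = []
for worker in workers:
    answer = worker.reply({"cuda5"}, estimated_runtime=1.2)
    if answer is not None:
        replies.append(answer)
```

Here only `gpu-node` replies, because `cpu-node` lacks the `cuda5` label. In the real system the estimated runtime would come from each device's own performance prediction rather than being passed in.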

SLIDE 20

Performance Prediction

In order to undertake performance prediction, a series of metrics is extracted by the code analyser:

  • Intensity: a count of mathematical operations per iteration.
  • I/O: a count of the number of memory accesses per iteration (read and write).
  • Branching: the number of branches that occur per iteration.
  • The number of iterations of the kernel that are performed.
  • The size of the data that must be loaded to/from the device.
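
The analyser itself is not shown in the slides. As a rough analogue only, the sketch below counts the first three metrics for a loop written in Python, using the standard `ast` module. Treating each binary operator as one mathematical operation, each subscript as one memory access and each `if` as one branch is a simplifying assumption, not the authors' definition.

```python
import ast

# Hypothetical kernel: a single parallelisable loop.
SOURCE = """
for i in range(n):
    if a[i] > 0:
        c[i] = a[i] * b[i] + 1.0
"""


def loop_metrics(source):
    """Count per-iteration operations in the first for-loop of `source`."""
    tree = ast.parse(source)
    loop = next(node for node in ast.walk(tree) if isinstance(node, ast.For))
    intensity = io = branching = 0
    for statement in loop.body:
        for node in ast.walk(statement):
            if isinstance(node, ast.BinOp):        # +, -, *, / ...
                intensity += 1
            elif isinstance(node, ast.Subscript):  # a[i] read or write
                io += 1
            elif isinstance(node, ast.If):
                branching += 1
    return {"intensity": intensity, "io": io, "branching": branching}


metrics = loop_metrics(SOURCE)
```

For the loop above this counts two arithmetic operations, four array accesses and one branch. The iteration count and data size depend on run-time values such as `n`, so they cannot be read from the syntax tree alone.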

SLIDE 21

Performance Prediction

  • Performance prediction is made utilising the WEKA library and the K-Nearest Neighbour algorithm.
  • This enables each device to estimate how long a particular application may take and how long its current job may take.
  • Local decisions on whether an application should be accelerated are made with a decision tree.
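
The system uses WEKA, a Java library whose API is not reproduced here. To show the idea only, the sketch below is a plain Python k-nearest-neighbour regressor that averages the runtimes of the k closest training points. The training rows, the feature order and the choice of k = 3 are hypothetical.

```python
import math


def knn_predict(training, query, k=3):
    """Average the runtimes of the k training points closest to `query`."""
    ranked = sorted(training, key=lambda row: math.dist(row[0], query))
    nearest = ranked[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


# Hypothetical rows: (intensity, io, branching, iterations) -> runtime in seconds.
TRAINING = [
    ((100, 20, 1, 1_000_000), 1.0),
    ((110, 30, 1, 1_000_000), 1.2),
    ((120, 25, 2, 1_000_000), 1.4),
    ((400, 90, 4, 4_000_000), 6.0),
]

estimate = knn_predict(TRAINING, (111, 27, 1, 1_000_000), k=3)
```

With these rows the three nearest neighbours are the first three, so the estimate is their mean runtime. In practice the features would need normalising first, because a raw iteration count would otherwise dominate the Euclidean distance.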

SLIDE 22

Performance Prediction

SLIDE 23

System Architecture

SLIDE 24

Steps 6, 7 - Device Selection

  • Currently, the one device that gives the best overall performance for the application is selected, i.e. the device with the lowest total of estimated runtime + waiting time for the device to be free.
  • Other selection criteria could be cost or power consumption.
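
The selection rule can be written in a few lines. In this sketch the reply format, the device names and the timings are all hypothetical; only the rule itself (minimise estimated runtime plus waiting time) comes from the slide.

```python
def select_device(replies):
    """Pick the reply with the lowest estimated runtime plus waiting time."""
    return min(replies, key=lambda r: r["estimated_runtime"] + r["estimated_wait"])


# Hypothetical replies collected from capable workers (times in seconds).
replies = [
    {"device": "quad-core-cpu", "estimated_runtime": 6.0, "estimated_wait": 0.0},
    {"device": "tesla-gpu", "estimated_runtime": 3.0, "estimated_wait": 4.0},
    {"device": "fermi-gpu", "estimated_runtime": 2.0, "estimated_wait": 1.5},
]

chosen = select_device(replies)
```

The totals are 6.0 s, 7.0 s and 3.5 s, so `fermi-gpu` is chosen. Selecting on cost or power consumption instead would only mean changing the `key` function.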

SLIDE 25

Final Step - Code Generation and Execution

Once a device has been selected:

  • CUDA bootstrapping code can be generated.
  • Identified kernels can be wrapped in CUDA method stubs.
  • This allows the extracted kernels to remain device independent.
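
The generator is not shown in the slides. One way such wrapping could work is template substitution, sketched below: a device-independent loop body is dropped into a CUDA kernel stub in which each thread handles one iteration. The template, the function name and the example body are all hypothetical.

```python
from string import Template

# Hypothetical stub: one CUDA thread per loop iteration, guarded by i < n.
STUB = Template("""\
__global__ void ${name}_kernel(${params}, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ${body}
    }
}
""")


def wrap_kernel(name, params, body):
    """Wrap a loop body in a CUDA kernel stub and return the source text."""
    return STUB.substitute(name=name, params=", ".join(params), body=body)


cuda_source = wrap_kernel(
    "scale",
    ["const float *a", "float *c"],
    "c[i] = 2.0f * a[i];",
)
print(cuda_source)
```

Because only the stub is CUDA-specific, the same extracted body could in principle be wrapped by a different template for another device.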

SLIDE 26

Collecting Performance Data

The availability of accurate performance data is therefore critical:

  • The system can self-update.
  • New devices entering the system, or existing devices that may be under-trained, can update their training dataset.
  • Inter-worker communication using the CometSpace is used to achieve this.
  • This takes advantage of idle runtime.

SLIDE 27

Self-Updating

SLIDE 28

Initial Results

SLIDE 29

System Architecture

SLIDE 30

Step 1: Code Analysis - of un-annotated code

Step 2: Metric Extraction

  • Intensity: 111
  • I/O: 27
  • Branching: 1

SLIDE 31

Step 4-5: Performance Prediction on Metrics

The knowledge base of performance data was trained with results from similar applications.

  Total number of performance measures   48
  Measurements within 0.5 s              13
  Measurements within 1.0 s              28
  Measurements within 1.5 s               7
  Mean absolute error                     0.70 s
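
For reference, mean absolute error is conventionally defined as below. The symbols are assumptions introduced here: the predicted execution time of measurement i is written with a hat, the measured time without, and n = 48 is the total number of measures reported above.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{t}_i - t_i \right|, \qquad n = 48
```

The three error counts sum to 13 + 28 + 7 = 48, which suggests that they are disjoint bands (up to 0.5 s, 0.5 s to 1.0 s, 1.0 s to 1.5 s) rather than cumulative totals.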

SLIDE 32

Step 6-7: Decision

  • A decision is made on which device to utilise for the application.
  • Currently, the decision is based on the lowest execution time.
  • In our example, the device offering the best performance is generally the NVIDIA Fermi GPU.

SLIDE 33

Initial Results - Canny Edge Detector

[Figure: execution time (s) against size of N in an NxN dataset, for four devices: CPU, Quad Core CPU, TESLA GPU and FERMI GPU.]

SLIDE 34

Initial Results

  • Utilising the system provides a performance improvement over a single-core implementation of the algorithm.
  • Performance results are comparable with a quad-core implementation.

SLIDE 35

Future Work

SLIDE 37

Key Questions

  • Is the method of performance prediction used here the most appropriate?
  • Is it feasible to execute kernels from the same application across multiple devices?

SLIDE 38

Acceleration Devices as Part of a Larger Federated Cloud

  • When operating in a federated environment, each cloud site would only want to take tasks that it is capable of executing.
  • It is in the interest of each site within the federation to make best use of its resources internally.
  • CometCloud's support for CloudBursting also allows for the possibility of utilising devices from external public clouds.

SLIDE 39

Future Work

  • Multiple cloud systems, each with workers of varying capabilities.

SLIDE 40

Any Questions?
