SLIDE 1

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale

University of Illinois Urbana-Champaign

May 7, 2012

SLIDE 2

◮ Work is overdecomposed in objects
  ◮ Fine-grain task parallelism
  ◮ Ideal for the CPU
  ◮ Overlap of communication and computation
◮ GPUs rely on massive data-parallelism
  ◮ Fine grains decrease performance
  ◮ Each kernel instantiation has substantial overhead
◮ To reduce overhead
  ◮ Combine fine-grain work units for the GPU
  ◮ The delay may be insignificant if the work is low priority
SLIDE 3

Terminology

◮ Agglomeration—composition of distinct work units
◮ Static agglomeration—a fixed number of work units is agglomerated
◮ Dynamic agglomeration—the number of work units agglomerated varies at runtime
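The distinction can be sketched in host-side C++. This is an illustrative sketch, not the runtime's actual API: static agglomeration always packs a fixed, pre-chosen packet size, while dynamic agglomeration takes whatever has accumulated when the scheduler decides to flush.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Static agglomeration: always combine a fixed number of work units
// (the "packet size"), chosen before the run.
std::vector<std::vector<int>> agglomerateStatic(const std::vector<int>& units,
                                                std::size_t packetSize) {
    std::vector<std::vector<int>> batches;
    for (std::size_t i = 0; i < units.size(); i += packetSize) {
        std::size_t end = std::min(i + packetSize, units.size());
        batches.emplace_back(units.begin() + i, units.begin() + end);
    }
    return batches;
}

// Dynamic agglomeration: combine whatever has accumulated when the
// scheduler decides to flush, so batch sizes vary at runtime.
std::vector<int> agglomerateDynamic(std::vector<int>& fifo) {
    std::vector<int> batch;
    batch.swap(fifo);  // take every queued unit, leaving the FIFO empty
    return batch;
}
```

With five units and a packet size of two, the static version always yields batches of sizes 2, 2, 1; the dynamic version yields one batch whose size depends on how much work had queued up.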

SLIDE 4

[Diagram: the scheduler draws from a work unit pool and dispatches to CPUs directly and to accelerators through an accelerator FIFO]

SLIDE 5

[Diagram: scheduleWork() feeds work units into the accelerator FIFO; agglomerateWork() combines them before dispatch to the accelerator]

SLIDE 6

Programmer/Runtime Division

◮ Programmer
  ◮ Writes the GPU kernel for agglomeration
  ◮ Creates an offset array
    ◮ Each task’s input might be a different size
    ◮ Stores the offset of each task’s beginning and ending index in the contiguous data arrays
◮ System
  ◮ Decides what work to execute and when
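A minimal host-side sketch of the offset-array step, assuming each task carries a variable-length float input. The `Task` struct and `buildOffsets` name are ours, not the runtime's:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical task type: each task carries a variable-length input.
struct Task { std::vector<float> input; };

// Pack the tasks' inputs into one contiguous buffer and record each
// task's beginning and ending offset into it, as the programmer must do
// for the agglomerated kernel (names here are illustrative).
void buildOffsets(const std::vector<Task>& tasks,
                  std::vector<float>& packed,
                  std::vector<std::size_t>& begin,
                  std::vector<std::size_t>& end) {
    packed.clear(); begin.clear(); end.clear();
    for (const Task& t : tasks) {
        begin.push_back(packed.size());   // start offset of this task
        packed.insert(packed.end(), t.input.begin(), t.input.end());
        end.push_back(packed.size());     // one-past-end offset of this task
    }
}
```

The `begin`/`end` arrays are what lets a single agglomerated kernel launch find each original task's slice inside the contiguous buffer.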

SLIDE 7

[Diagram: message queue ordering, with the low-priority agglomeration message behind the higher-priority GPU message and the application's messages]

SLIDE 8

Dynamic Agglomeration

◮ Uses the following heuristic
  ◮ If the “accelerator FIFO” reaches a size limit, work is agglomerated
    ◮ The limit is typically set based on memory limitations
  ◮ Otherwise, a low-priority message that triggers agglomeration is enqueued
    ◮ While higher-priority work is being generated, it goes into the FIFO
    ◮ When the higher-priority work lets up, the queued work is agglomerated
◮ Since the work is assumed to be low priority, not agglomerating aggressively should not impact performance
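The heuristic can be sketched as follows. The class and method names are illustrative stand-ins for the runtime's scheduler, and the low-priority agglomeration message is modeled as a callback the scheduler invokes once no higher-priority work remains:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct AcceleratorFifo {
    std::deque<int> fifo;   // queued fine-grain work units
    std::size_t limit = 8;  // size limit, typically set by GPU memory

    // Returns an agglomerated batch when the size limit is hit, else
    // empty (in the real system this is also where the low-priority
    // agglomeration message would be enqueued).
    std::vector<int> enqueue(int unit) {
        fifo.push_back(unit);
        if (fifo.size() >= limit) return flush();
        return {};
    }

    // Invoked when the low-priority message finally runs, i.e. when
    // higher-priority work has let up.
    std::vector<int> onLowPriorityMessage() { return flush(); }

    std::vector<int> flush() {
        std::vector<int> batch(fifo.begin(), fifo.end());
        fifo.clear();
        return batch;
    }
};
```

With `limit = 3`, the first two enqueues return nothing; the third returns all three units at once; a later `onLowPriorityMessage()` drains whatever has accumulated since.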

SLIDE 9

[Diagram: non-agglomerated data (Input A, Input B, Output) versus agglomerated data (Input A', Input B', Output') with offset arrays Offset A and Offset B]

SLIDE 10

Case study: Molecular2D

SLIDE 11

Molecular2D

◮ Cells
  ◮ Execute on the CPU
◮ Interactions
  ◮ Execute on the GPU

SLIDE 12

Molecular 2D Interaction Kernel

__global__ void interact(...) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // For loop added for agglomeration: thread i walks its
    // agglomerated range [start[i], end[i])
    for (int j = start[i]; j < end[i]; j++) {
        // interaction work
    }
}

SLIDE 13

[Plot: execution time (seconds, 20 to 140) vs. number of particles (10,000 to 100,000) for CPU only, GPU without agglomeration, and GPU with agglomeration]

SLIDE 14

[Plot: speedup of agglomeration (1.00 to 1.18) vs. number of particles (10,000 to 100,000)]

SLIDE 15

[Plot: execution time (seconds, 85 to 120) vs. number of particles per work unit (500 to 2,500) for GPU without agglomeration and GPU with agglomeration]

SLIDE 16

[Plot: execution time (seconds, 3.8 to 5.2) vs. static agglomeration packet size (5 to 30), comparing dynamic scheduled agglomeration with static agglomeration]

SLIDE 17

Case study: LU Factorization without pivoting

SLIDE 18

[Diagram: a matrix partitioned into blocks A1,1, A1,2, A2,1, A2,2]

SLIDE 19

LU Factorization

◮ CPU
  ◮ Diagonal block factorization
  ◮ Triangular solves
◮ GPU
  ◮ Matrix-matrix multiplies
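The division of labor can be seen in a runnable sketch, under the simplifying assumption that each "block" is a single scalar, so the triangular solves and matrix-matrix multiplies collapse to scalar operations. The comments mark which phase of right-looking LU without pivoting stays on the CPU and which is offloaded (and agglomerated) on the GPU in this scheme:

```cpp
#include <vector>

// Scalar stand-in for right-looking block LU without pivoting.
// In-place: on return, A holds U on and above the diagonal and the
// strictly lower part of L below it.
void blockLU(std::vector<std::vector<double>>& A) {
    const int n = static_cast<int>(A.size());
    for (int k = 0; k < n; ++k) {
        // Diagonal factorization + triangular solves: CPU work.
        for (int i = k + 1; i < n; ++i)
            A[i][k] /= A[k][k];
        // Trailing updates (the matrix-matrix multiplies): GPU work,
        // one fine-grain unit per (i, j) block, so a natural target
        // for agglomeration.
        for (int i = k + 1; i < n; ++i)
            for (int j = k + 1; j < n; ++j)
                A[i][j] -= A[i][k] * A[k][j];
    }
}
```

The trailing update loop produces many independent fine-grain units per step, which is exactly the kind of work the runtime batches for the GPU.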

SLIDE 20

[Plot: execution time (seconds, 10 to 40) vs. matrix size (4096 to 10240) for CPU, GPU without agglomeration, and GPU with agglomeration]

SLIDE 21

[Plot: execution time (seconds, 36 to 50) vs. static packet size (20 to 120), comparing dynamic agglomeration with static agglomeration]

SLIDE 22

Conclusion

◮ For both benchmarks, agglomerating work increases performance
◮ Agglomeration does not need to be application-specific
◮ Statically selecting the number of work units to agglomerate is difficult and may reduce performance
◮ Runtimes can agglomerate automatically
  ◮ An agglomerating kernel must still be written by the programmer
  ◮ Dynamic agglomeration obtains better performance than static