
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures

Quan Chen, Long Zheng, Minyi Guo. Shanghai Jiao Tong University, China. PMAM 2014.


  1. DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures. Quan Chen, Long Zheng, Minyi Guo, Shanghai Jiao Tong University, China. PMAM 2014

  2. Outline
     • Background
     • Problem & Motivation
     • Demand-aware Work-Stealing (DWS)
     • Evaluation
     • Conclusions

  3. Background
     • Hardware: Multi-core/Many-core Architectures
     • Scenario: Multiple parallel programs (P_1, …, P_i, …, P_n)

  4. Background: parallel programs
     • Traditional parallel programs: hard to adjust the number of threads at runtime
     • Task-based parallel programs: dynamic task scheduling

  5. Work-sharing
     [Figure: Worker 1 to Worker 4 lock and unlock a central task pool that holds the tasks]
     • Lock the central task pool when getting a task
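
The central-pool scheme on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the `CentralPool` class and `run_workers` helper are hypothetical names, and tasks are modelled as plain Python callables. Every worker takes the same lock to get a task, which is the contention point the slide points at.

```python
import threading
from collections import deque


class CentralPool:
    """Hypothetical central task pool shared by all workers."""

    def __init__(self, tasks):
        self._tasks = deque(tasks)
        self._lock = threading.Lock()

    def get_task(self):
        # Lock the pool, take one task, unlock (the `with` block does both).
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


def run_workers(pool, num_workers):
    """Run `num_workers` threads that drain the pool; return task results."""
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = pool.get_task()
            if task is None:  # pool is empty, this worker is done
                return
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_workers(CentralPool([lambda i=i: i * i for i in range(20)]), 4)` returns the 20 squares in whatever order the four workers happened to finish them.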

  6. Work-stealing
     [Figure: Thread 1 to Thread 4, each with its own tasks; Lock/Unlock shown on a task pool]
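
As a contrast with the central pool, a minimal sketch of work-stealing could look like the following. It assumes the common arrangement in which each worker owns a double-ended queue, takes work from one end, and steals from the other end of a randomly chosen victim; the slide itself does not spell out these details. The `Worker` class and `simulate` function are hypothetical, and the simulation is single-threaded and round-based, so the per-pool locking shown in the figure is omitted.

```python
import random
from collections import deque


class Worker:
    """Hypothetical worker that owns a private task deque."""

    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()
        self.done = []

    def next_task(self, workers, rng):
        if self.tasks:
            return self.tasks.pop()  # owner takes its newest task
        victims = [w for w in workers if w is not self and w.tasks]
        if not victims:
            return None  # nothing left to steal anywhere
        # The thief takes the oldest task of a randomly chosen victim.
        return rng.choice(victims).tasks.popleft()


def simulate(num_workers, tasks, seed=0):
    """Round-based simulation: each worker handles one task per round."""
    rng = random.Random(seed)
    workers = [Worker(i) for i in range(num_workers)]
    workers[0].tasks.extend(tasks)  # all work starts on worker 0
    progress = True
    while progress:
        progress = False
        for w in workers:
            task = w.next_task(workers, rng)
            if task is not None:
                w.done.append(task)
                progress = True
    return workers
```

Calling `simulate(4, list(range(12)))` starts with all twelve tasks on worker 0; the other three workers find their own deques empty and steal from it, so the work spreads without any central pool.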

  7. Problem & Motivation
     • Aggressive feature of work-stealing
       • On a k-core computer, k threads/workers are launched
     • Existing solutions
       • Time-sharing: ABP yielding mechanism
       • Space-sharing: Equal-partitioning

  8. Time-sharing
     • ABP yielding mechanism
       • If a thread fails to steal a task, it goes to sleep
     [Figure: Thread 1, Thread 2, and Thread 3 shown as Sleep or Active, with a Cache]

  9. Space-sharing
     • Equal-partitioning mechanism
     • If m programs co-run on a k-core computer, each program is allocated k/m cores
     [Figure: programs P_1, …, P_i, …, P_m]
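
A small sketch of the k/m split, for illustration only. The slide does not say what happens when k is not divisible by m, or when there are more programs than cores; the function below assumes the first `k % m` programs each get one extra core and rejects `k < m`. The function name and the use of contiguous core ids are likewise assumptions.

```python
def equal_partition(k, m):
    """Split core ids 0..k-1 among m programs as evenly as possible.

    Assumption (not from the slides): when k is not a multiple of m,
    the first k % m programs each receive one extra core.
    """
    if m <= 0 or k < m:
        raise ValueError("need at least one program and one core per program")
    base, extra = divmod(k, m)
    partitions = []
    start = 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions
```

With k = 8 and m = 2 this gives each program four cores: `[[0, 1, 2, 3], [4, 5, 6, 7]]`.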

  10. Demand-aware Work-Stealing (DWS)
      • Start from Equal-partitioning
      • Dynamically balance cores at runtime
        • If p_i cannot fully utilize a core, it releases the core
        • If p_i has too many tasks, it tries to obtain more cores
      [Figure: Runtime architecture of DWS, with Obtain and Release]

  11. Stealing algorithm (Release)
      • A worker decides by itself whether to release its core
      • If a worker fails to steal a new task too many times (T_SLEEP), it goes to sleep
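
The release rule can be sketched as a per-worker failure counter. This is an illustration under stated assumptions rather than the DWS implementation: the slide does not say whether failures must be consecutive or whether the threshold is inclusive, so the sketch counts consecutive failed steals, resets on success, and sleeps once the count reaches `t_sleep`. The class and method names are hypothetical.

```python
class ReleasingWorker:
    """Hypothetical worker that releases its core after repeated failed steals."""

    def __init__(self, t_sleep):
        self.t_sleep = t_sleep   # the T_SLEEP threshold from the slide
        self.failed_steals = 0   # consecutive failed steal attempts (assumption)
        self.asleep = False      # True once the worker has released its core

    def on_steal_result(self, got_task):
        if got_task:
            self.failed_steals = 0  # assumption: a successful steal resets the count
            return
        self.failed_steals += 1
        if self.failed_steals >= self.t_sleep:
            self.asleep = True      # go to sleep, which releases the core

    def wake(self):
        # Called when the worker is given a core again.
        self.asleep = False
        self.failed_steals = 0
```

Slide 19 suggests choosing T_SLEEP = k or 2k on a k-core computer, so on an 8-core machine a worker here would be built as `ReleasingWorker(8)` or `ReleasingWorker(16)`.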

  12. Coordinator (Obtain)
      • The coordinator decides whether to obtain more cores
        • If a program has too many queued tasks, it should try to get some free cores
      • How many? Which?
        C1: The more queued tasks a program has, the more cores it should obtain
        C2: A program can take its allocated cores back
        C3: A program cannot obtain busy cores

  13. Coordinator: How many?
      • C1: The more queued tasks a program has, the more cores it should obtain
      • Symbols
        N_a: number of active workers
        N_b: number of queued tasks
        N_f: number of free cores
        N_r: number of released cores
        N_w: number of cores expected
      • How many:

  14. Coordinator: Which?
      • N_w <= N_f
        • Randomly select N_w free cores
      • N_f < N_w <= N_f + N_r (C2)
        • Select the N_f free cores plus (N_w - N_f) of its released cores
      • N_w > N_f + N_r (C3)
        • Select the N_f free cores plus its N_r released cores
      (N_a: number of active workers; N_b: number of queued tasks; N_f: number of free cores; N_r: number of released cores; N_w: number of cores expected)
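
The three cases above can be sketched as one selection function. This is a hedged illustration, not the coordinator's actual code: `select_cores` and its parameters are hypothetical names, the free and released core sets are assumed to be disjoint, and in the second case the slide does not say which released cores are taken back, so the sketch simply takes the first ones in the list.

```python
import random


def select_cores(n_w, free_cores, released_cores, rng=random):
    """Choose cores for a program that expects n_w more cores.

    free_cores:     ids of cores that are currently free (N_f of them)
    released_cores: ids of cores this program released earlier (N_r of them)
    Assumption: the two lists are disjoint.
    """
    n_f, n_r = len(free_cores), len(released_cores)
    if n_w <= 0:
        return []
    if n_w <= n_f:
        # Case N_w <= N_f: randomly select N_w free cores.
        return rng.sample(list(free_cores), n_w)
    if n_w <= n_f + n_r:
        # Case N_f < N_w <= N_f + N_r (C2): all free cores plus
        # (N_w - N_f) of the program's own released cores.
        return list(free_cores) + list(released_cores)[: n_w - n_f]
    # Case N_w > N_f + N_r (C3): busy cores cannot be obtained, so the
    # program gets only the free cores and its own released cores.
    return list(free_cores) + list(released_cores)
```

For instance, with free cores `[0, 1]` and released cores `[5, 6, 7]`, a request for four cores falls into the second case and yields `[0, 1, 5, 6]`, while a request for nine cores is capped at `[0, 1, 5, 6, 7]`.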

  15. Evaluation platform
      • A dual-socket quad-core computer with Hyper-Threading Technology
      • Each socket is a Quad-Core Intel Xeon E5620
      Hardware & Configuration: Size/Version
      L1/L2 cache size (each core): 256 KB/1 MB
      L3 cache size (each socket): 12 MB
      Main memory size: 32 GB
      Operating system: Linux 2.6.32-38

  16. Benchmarks
      Calculate execution time:

  17. Performance of DWS
      DWS can significantly improve the performance of the benchmarks

  18. Effectiveness of the coordinator
      Without the coordinator, the performance of the benchmarks is degraded

  19. Impact of T_SLEEP
      We should choose T_SLEEP = k or 2k on a k-core computer

  20. Contributions & conclusions
      • A modified work-stealing algorithm that enables a program to release its under-utilized cores.
      • A coordinator to manage the workers. It enables a program to grab and use the under-utilized cores released by other programs.
      • We have implemented DWS, which achieves a performance gain of up to 32.3% in the best cases compared to traditional work-stealing schedulers.

  21. Thanks! Questions?
