
Heterogeneous- and NUMA-aware Scheduling for Many-Core Architectures. Panayiotis Petrides (*), Pedro Trancoso (*)(**). (*) Computer Science Department, University of Cyprus. (**) Computer Science and Engineering, Chalmers University of Technology.


  1. Heterogeneous- and NUMA-aware Scheduling for Many-Core Architectures. Panayiotis Petrides (*), Pedro Trancoso (*)(**). (*) Computer Science Department, University of Cyprus. (**) Computer Science and Engineering, Chalmers University of Technology. CASPER: Computer Architecture System Performance Evaluation Research

  2. Outline: Motivation • Scheduling Policy • Experimental Results • Conclusions

  3. Motivation: Dual Core • Quad Core • Multi-core: Array CMP with 10 cores • Many-core: Array CMP with 10s-100s low power cores • Intel SCC: CMP with 48 low power cores

  4. Motivation: (1) Distance of core to the memory controller (Non-Uniform Memory Access) • (2) Resources with different core frequencies • (3) Memory controller access contention

  5. Motivation: Executing the SPEC CPU2006 and NAS benchmark suites on the Intel SCC

  6. Outline: Motivation • Scheduling Policy • Experimental Results • Conclusions

  7. Scheduling Policy – Characterizing Applications: Determine how the distance and core frequency factors influence application execution. Both frequency and distance change in a linear way.

  8. Scheduling Policy – Characterizing Applications: System prerequisites for determining the factors of influence: discrete pairs of cores with one factor varying and the other held constant.
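The pairing idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes execution time is measured on two cores that differ in exactly one factor, and that the "factor of influence" is the slope of the straight line through the two measurements. The `Sample` record, the function names and the units are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One measurement of an application on one core (hypothetical record)."""
    freq_mhz: float  # core frequency during the measurement
    hops: int        # distance from the core to its memory controller
    time_s: float    # measured execution time of the monitored phase


def slope(x0: float, y0: float, x1: float, y1: float) -> float:
    """Slope of the straight line through two points."""
    if x0 == x1:
        raise ValueError("the varying factor must differ between the two cores")
    return (y1 - y0) / (x1 - x0)


def frequency_factor(p: Sample, q: Sample) -> float:
    """Change in time per MHz, from a pair of cores at the same distance."""
    if p.hops != q.hops:
        raise ValueError("distance must be held constant for this pair")
    return slope(p.freq_mhz, p.time_s, q.freq_mhz, q.time_s)


def distance_factor(p: Sample, q: Sample) -> float:
    """Change in time per hop, from a pair of cores at the same frequency."""
    if p.freq_mhz != q.freq_mhz:
        raise ValueError("frequency must be held constant for this pair")
    return slope(p.hops, p.time_s, q.hops, q.time_s)
```

With made-up timings, `frequency_factor(Sample(533, 2, 12.0), Sample(800, 2, 8.0))` yields a negative slope (faster core, shorter time), and `distance_factor(Sample(533, 0, 10.0), Sample(533, 4, 12.0))` yields 0.5 seconds per hop.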

  9. Scheduling Policy: Implementation • To determine application behavior, we monitor their execution • At each monitor phase, construct the corresponding queues of a and b
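The slide does not define a and b or say how the queues are ordered. As a rough sketch, assume a is an application's sensitivity to core frequency, b is its sensitivity to memory-controller distance, and each queue lists applications from most to least sensitive. The function name and the sample values below are invented for illustration.

```python
from typing import Dict, List, Tuple

# (a, b) for one application. ASSUMPTION: a = frequency sensitivity,
# b = distance sensitivity; the slides do not spell this out.
Factors = Tuple[float, float]


def build_queues(factors: Dict[str, Factors]) -> Tuple[List[str], List[str]]:
    """Return (queue_a, queue_b): application names, most sensitive first."""
    # Sort by magnitude so the sign convention of the factors does not matter.
    # Ties are broken by name to keep the order deterministic.
    queue_a = sorted(factors, key=lambda app: (-abs(factors[app][0]), app))
    queue_b = sorted(factors, key=lambda app: (-abs(factors[app][1]), app))
    return queue_a, queue_b


# Invented values for one monitor phase (not measurements from the talk).
phase = {
    "povray": (-0.015, 0.01),
    "sphinx": (-0.006, 0.30),
    "libquantum": (-0.002, 0.55),
}
queue_a, queue_b = build_queues(phase)
print(queue_a)  # ['povray', 'sphinx', 'libquantum']
print(queue_b)  # ['libquantum', 'sphinx', 'povray']
```

Rebuilding both queues at every monitor phase lets the ordering follow an application whose behavior changes over time; how the queues are then mapped to cores is not shown in the transcript and is left out here.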

  10. Scheduling Policy: Implementation

  11. Scheduling Policy: Implementation

  12. Outline: Motivation • Scheduling Policy • Experimental Results • Conclusions

  13. Experimental Setup • Intel SCC processor: 48 cores, P54C core architecture; 4 DDR3 memory controllers, one per 12 cores; Linux kernel running on each core • Applications from SPEC CPU2006 and NAS benchmarks (medium working set sizes): Povray (compute-bound), Sphinx (medium memory-bound), Libquantum (high memory-bound) • Checkpointing/resuming using the CryoPID library • Migration overhead < 1% • [Figure: Intel SCC layout. A grid of tiles, each with a router (R), with corner coordinates 0,0, 5,0, 0,3 and 5,3; memory controllers (MC) with DIMMs; a system interface connected over PCIe through the System FPGA to the Management Console PC. Tile detail: two P54C cores (16K L1), each with a 256KB L2, plus MPB, MIU, traffic generator and router.]
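The distance factor depends on where a tile sits relative to a memory controller. A minimal sketch of a hop count on the 6x4 tile grid from the figure follows. It assumes dimension-ordered mesh routing, so that hops equal the Manhattan distance, and it assumes the four controller attachment points listed in `ASSUMED_MC_TILES`; neither detail is stated in the transcript, and all names are hypothetical.

```python
from typing import Tuple

Tile = Tuple[int, int]  # (x, y), x in 0..5 and y in 0..3 as in the figure

# ASSUMPTION: tiles whose routers the four memory controllers attach to.
# The transcript only shows MCs at the left and right edges of the grid.
ASSUMED_MC_TILES: Tuple[Tile, ...] = ((0, 0), (5, 0), (0, 2), (5, 2))


def hops(tile: Tile, mc: Tile) -> int:
    """Router hops between a tile and a controller's tile (Manhattan distance)."""
    return abs(tile[0] - mc[0]) + abs(tile[1] - mc[1])


def nearest_controller(tile: Tile) -> Tuple[Tile, int]:
    """Closest assumed controller and its hop count (first one wins on ties)."""
    best = min(ASSUMED_MC_TILES, key=lambda mc: hops(tile, mc))
    return best, hops(tile, best)


print(nearest_controller((5, 3)))  # ((5, 2), 1)
print(hops((2, 1), (0, 0)))        # 3
```

On the real system a core's memory controller is set by configuration rather than by proximity, so `nearest_controller` is only a stand-in for whatever mapping the experiments used.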

  14. Evaluating Scheduling Policy • Scenario 1: Compute-bound and memory-bound applications • [Chart: normalized execution time (0% to 120%) under RND and SP for povray, sphinx and combined; legend: Migration, 800MHz, 533MHz, 266MHz]

  15. Evaluating Scheduling Policy • Scenario 2: Compute-bound and memory-bound applications • [Chart: normalized execution time (0% to 120%) under RND and SP for povray, libquantum and combined; legend: Migration, 800MHz, 533MHz, 266MHz]

  16. Evaluating Scheduling Policy • Scenario 3: 1 compute-bound and 2 memory-bound applications • [Chart: normalized execution time (0% to 120%) under RND and SP for povray, libquantum, sphinx and combined; legend: Migration, 800MHz, 533MHz, 266MHz]

  17. Evaluating Scheduling Policy • Scenario 4: 2 memory-bound applications • [Chart: normalized execution time (0% to 120%) under RND and SP for sphinx, libquantum and combined; legend: Migration, 800MHz, 533MHz, 266MHz]

  18. Evaluating Scheduling Policy • Scenario 5: 1 compute-bound application • [Chart: normalized execution time (0% to 120%) under RND and SP for povray and combined; legend: Migration, 800MHz, 533MHz, 266MHz]

  19. Outline: Motivation • Scheduling Policy • Experimental Results • Conclusions

  20. Conclusions and Future Work • We proposed an online scheduling policy that addresses application demands and characteristics • Implementation on a real many-core architecture using real workloads • Performance improvement: compute-bound up to 36%, memory-bound up to 15%

  21. Thank You! CASPER Group, University of Cyprus: Computer Architecture, Systems and Performance Evaluation Research. Visit us: www.cs.ucy.ac.cy/carch/casper
