Heterogeneous- and NUMA-aware Scheduling for Many-Core Architectures
Panayiotis Petrides (*), Pedro Trancoso (*)(**)
(*) Computer Science Department, University of Cyprus
(**) Computer Science and Engineering Department, Chalmers University of Technology
CASPER: Computer Architecture System Performance Evaluation Research
Outline
• Motivation
• Scheduling Policy
• Experimental Results
• Conclusions
Motivation
• From Dual Core and Quad Core processors to multi-core CMPs with ~10s of cores
• Many-core array CMPs with 10s-100s of low-power cores
• Intel SCC: a CMP with 48 low-power cores
Motivation
Factors that affect application performance on a many-core chip:
1. Distance of the core to the memory controller (Non-Uniform Memory Access)
2. Cores running at different frequencies
3. Contention on memory controller accesses
Motivation
Executing the SPEC CPU2006 and NAS benchmark suites on the Intel SCC
Scheduling Policy – Characterizing Applications
• Determine how the distance and core-frequency factors influence application execution
• Execution time varies approximately linearly with both frequency and distance
Scheduling Policy – Characterizing Applications
• System prerequisite for determining the factors of influence: discrete pairs of cores in which one factor varies while the other is held constant (a sketch of the resulting linear model follows below)
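To make the linear characterization concrete, here is a minimal sketch of one possible model, assuming execution time grows linearly with each factor. The names T_0, α, and β are illustrative; the slides only state that the influence is linear and that per-application α and β values are used later, so this is not notation quoted from the paper:

```latex
% Assumed linear model (illustration only):
%   T(f, d)  = execution time of an application on a core running at frequency f,
%              located d router hops from its memory controller
%   T_0      = execution time on the fastest core adjacent to a memory controller
%   \alpha   = per-application sensitivity to core frequency
%   \beta    = per-application sensitivity to memory-controller distance
T(f, d) \approx T_0 + \alpha \left( \frac{f_{\max}}{f} - 1 \right) + \beta \, d
```

The two coefficients can then be estimated from the paired cores described above: two cores at the same distance but different frequencies isolate α, and two cores at the same frequency but different distances isolate β.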
Scheduling Policy: Implementation
• To determine application behavior, we monitor their execution
• At each monitoring phase, we construct the corresponding queues of α and β (a per-phase scheduling sketch follows below)
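The slides do not spell out how the per-phase queues are turned into a placement, so the following C sketch is an assumption-laden illustration rather than the authors' implementation: it assumes each application carries the α and β sensitivities from the monitoring phase, that cores are described by their frequency and hop distance to a memory controller, and that the most sensitive applications are served first. The struct and function names (app_t, core_t, pick_core) are invented for this example.

```c
/*
 * Illustrative per-phase scheduling step (assumed design, not the authors'
 * actual code). Frequency-dominated applications (alpha >= beta) get the
 * fastest free core; distance-dominated applications get the free core
 * with the fewest hops to a memory controller.
 */
#include <stdio.h>
#include <stdlib.h>

#define NAPPS  3
#define NCORES 6

typedef struct { const char *name; double alpha, beta; int core; } app_t;
typedef struct { int id, freq_mhz, hops_to_mc, taken; } core_t;

/* Serve the applications with the highest overall sensitivity first
 * (one of many possible orderings). */
static int cmp_sensitivity_desc(const void *a, const void *b) {
    double sa = ((const app_t *)a)->alpha + ((const app_t *)a)->beta;
    double sb = ((const app_t *)b)->alpha + ((const app_t *)b)->beta;
    return (sb > sa) - (sb < sa);
}

/* Pick the best free core for one application. */
static int pick_core(const app_t *a, const core_t *cores, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (cores[i].taken) continue;
        if (best < 0) { best = i; continue; }
        if (a->alpha >= a->beta) {           /* frequency-dominated */
            if (cores[i].freq_mhz > cores[best].freq_mhz) best = i;
        } else {                             /* distance-dominated  */
            if (cores[i].hops_to_mc < cores[best].hops_to_mc) best = i;
        }
    }
    return best;
}

int main(void) {
    /* alpha/beta values are made up; in the policy they come from monitoring. */
    app_t apps[NAPPS] = {
        { "povray",     0.9, 0.1, -1 },   /* compute-bound           */
        { "sphinx",     0.4, 0.5, -1 },   /* moderately memory-bound */
        { "libquantum", 0.2, 0.8, -1 },   /* highly memory-bound     */
    };
    core_t cores[NCORES] = {
        { 0, 800, 4, 0 }, { 1, 800, 1, 0 }, { 2, 533, 2, 0 },
        { 3, 533, 5, 0 }, { 4, 266, 1, 0 }, { 5, 266, 3, 0 },
    };

    qsort(apps, NAPPS, sizeof apps[0], cmp_sensitivity_desc);
    for (int i = 0; i < NAPPS; i++) {
        int c = pick_core(&apps[i], cores, NCORES);
        if (c < 0) { printf("%s: no free core\n", apps[i].name); continue; }
        apps[i].core = cores[c].id;
        cores[c].taken = 1;
        printf("%-10s -> core %d (%d MHz, %d hops to MC)\n",
               apps[i].name, cores[c].id, cores[c].freq_mhz, cores[c].hops_to_mc);
    }
    return 0;
}
```

An application would only be checkpointed and resumed on a new core when the chosen core differs from its current one, so that migration overhead is paid only when the placement actually changes.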
Experimental Setup
• Intel SCC processor: 48 cores with the P54C core architecture, 4 DDR3 memory controllers (one per 12 cores)
• Linux kernel running on each core
• Applications from the SPEC CPU2006 and NAS benchmarks (medium working set sizes):
  - Povray (compute-bound)
  - Sphinx (moderately memory-bound)
  - Libquantum (highly memory-bound)
• Checkpointing/resuming using the CryoPID library; migration overhead < 1%
(Diagram: 6x4 tile mesh with routers, DIMMs and memory controllers next to the tiles labelled (0,0), (5,0), (0,3), (5,3), and a system interface FPGA connected over PCIe to a management console PC; each tile holds two P54C cores with 16 KB L1 and 256 KB L2 caches, a message-passing buffer, a traffic generator, a mesh interface unit, and a router)
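Since the policy needs each core's distance to its memory controller, here is a minimal sketch of how that distance could be computed for the SCC mesh shown above, assuming the 6x4 tile grid with memory controllers attached next to the corner tiles labelled (0,0), (5,0), (0,3), and (5,3) in the diagram, and assuming each tile simply uses its nearest controller; the real tile-to-controller mapping is configured in the system, so both assumptions are simplifications for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: Manhattan hop count from the tile at mesh position
 * (x, y) to the nearest memory-controller-attached tile. Controller positions
 * follow the corner labels in the slide's SCC diagram; the nearest-controller
 * assignment is an illustrative simplification. */
static int hops_to_nearest_mc(int x, int y) {
    const int mc[4][2] = { {0, 0}, {5, 0}, {0, 3}, {5, 3} };
    int best = 99;
    for (int i = 0; i < 4; i++) {
        int d = abs(x - mc[i][0]) + abs(y - mc[i][1]);
        if (d < best) best = d;
    }
    return best;
}
```

For example, the tile at (2, 1) would be three hops from the controller at (0, 0), while a corner tile is zero hops from its local controller; values like these would populate the hops_to_mc field used in the scheduling sketch above.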
Evaluating Scheduling Policy
Scenario 1: a compute-bound and a memory-bound application (povray + sphinx)
(Chart: normalized execution time for povray, sphinx, and the combined workload, comparing random placement (RND) with the proposed scheduling policy (SP); stacked bars with legend entries Migration, 800 MHz, 533 MHz, 266 MHz)
Evaluating Scheduling Policy
Scenario 2: a compute-bound and a memory-bound application (povray + libquantum)
(Chart: same format as Scenario 1, for povray, libquantum, and the combined workload)
Evaluating Scheduling Policy
Scenario 3: one compute-bound and two memory-bound applications (povray + libquantum + sphinx)
(Chart: same format as Scenario 1, for povray, libquantum, sphinx, and the combined workload)
Evaluating Scheduling Policy
Scenario 4: two memory-bound applications (sphinx + libquantum)
(Chart: same format as Scenario 1, for sphinx, libquantum, and the combined workload)
Evaluating Scheduling Policy
Scenario 5: one compute-bound application (povray)
(Chart: same format as Scenario 1, for povray and the combined workload)
Conclusions and Future Work
• We proposed an online scheduling policy that addresses application demands and characteristics
• Implemented on a real many-core architecture (Intel SCC) with real workloads
• Performance improvement: up to 36% for compute-bound and up to 15% for memory-bound applications
Thank You!
CASPER Group, University of Cyprus
Computer Architecture, Systems and Performance Evaluation Research
Visit us: www.cs.ucy.ac.cy/carch/casper