SLIDE 1

Heterogeneous- and NUMA-aware Scheduling for Many-Core Architectures

Panayiotis Petrides (*), Pedro Trancoso (*)(**)

(*) Computer Science Department University of Cyprus

CASPER: Computer Architecture, Systems and Performance Evaluation Research

(**) Computer Science and Engineering Chalmers University of Technology

SLIDE 2

Outline

  • Motivation
  • Scheduling Policy
  • Experimental Results
  • Conclusions

SLIDE 3

Motivation

  • Multi-core Array (Dual Core, Quad Core): CMP with ~10 cores
  • Many-core Array: CMP with 10s–100s of low-power cores
  • Intel SCC: CMP with 48 low-power cores

SLIDE 4

Motivation

  • 1. Distance of the core to the memory controller (Non-Uniform Memory Access)
  • 2. Cores running at different frequencies
  • 3. Contention on memory-controller accesses

SLIDE 5

Motivation

Executing the SPEC CPU2006 and NAS benchmark suites on the Intel SCC

SLIDE 6

Outline

  • Motivation
  • Scheduling Policy
  • Experimental Results
  • Conclusions

SLIDE 7

Scheduling Policy – Characterizing Applications

Determine how the distance and core-frequency factors influence each application's execution. Both frequency and distance affect execution time in a linear way.
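The linear dependence stated above can be written down as a small per-application model. Everything here (function name, parameters, the encoding of frequency as a slowdown level) is an illustrative assumption, not code from the talk:

```python
def predicted_time(base, a, b, slowdown, hops):
    """Hypothetical linear model: execution time grows linearly with
    the core's frequency slowdown level (e.g. 800 MHz -> 0,
    533 MHz -> 1, 266 MHz -> 2) and with its hop distance to the
    memory controller. `a` and `b` are per-application sensitivities."""
    return base + a * slowdown + b * hops

# a compute-bound app (large a, small b) on a slow but nearby core:
t = predicted_time(10.0, 3.0, 0.2, 2, 1)  # 10 + 3*2 + 0.2*1 = 16.2
```

A compute-bound application has a large `a` and a small `b`; a memory-bound one is the opposite, which is what lets a scheduler treat the two classes differently.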

SLIDE 8

Scheduling Policy – Characterizing Applications

System prerequisite for determining the factors of influence: discrete pairs of cores in which one factor varies while the other is held constant.
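With such core pairs, each coefficient reduces to a slope. A minimal sketch under the linear-response assumption (names are illustrative):

```python
def estimate_sensitivity(times, factor_values):
    """Given measurements from a pair of cores where only ONE factor
    differs (the other held constant), the per-application sensitivity
    to that factor is the slope of execution time over it."""
    (t1, t2), (x1, x2) = times, factor_values
    return (t2 - t1) / (x2 - x1)

# distance varies (2 vs. 6 hops to the memory controller), frequency fixed:
b = estimate_sensitivity((10.0, 14.0), (2, 6))  # slope = 1.0 s per hop
```

Repeating this with a frequency-varying, distance-constant pair yields the frequency sensitivity for the same application.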

SLIDE 9

Scheduling Policy: Implementation

  • To determine each application's behavior, we monitor its execution
  • At each monitoring phase, construct the corresponding queues for the a and b factors
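The queue-building step can be sketched as follows, assuming a and b are the per-application frequency and distance sensitivities measured in the current phase (structure and names are assumptions):

```python
def build_queues(apps):
    """apps maps an application name to its (a, b) sensitivities for
    this monitoring phase. Returns two queues, each ordered with the
    most sensitive application first."""
    by_a = sorted(apps, key=lambda name: apps[name][0], reverse=True)
    by_b = sorted(apps, key=lambda name: apps[name][1], reverse=True)
    return by_a, by_b

# illustrative numbers only, not measured values
apps = {"povray": (0.9, 0.1), "sphinx": (0.4, 0.5), "libquantum": (0.2, 0.8)}
by_a, by_b = build_queues(apps)
# by_a puts povray (compute-bound) first; by_b puts libquantum (memory-bound) first
```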

SLIDE 10

Scheduling Policy: Implementation
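As a rough sketch of how the two queues could drive placement — a greedy scheme assumed here for illustration, not necessarily the authors' exact algorithm — frequency-sensitive applications are matched to the fastest free cores and distance-sensitive ones to free cores nearest a memory controller, alternating between the queues:

```python
def place(by_a, by_b, cores_by_freq, cores_by_dist):
    """Greedy placement sketch. `cores_by_freq` lists cores fastest
    first; `cores_by_dist` lists the same cores nearest-to-controller
    first. The queues are consumed (mutated) most-sensitive first."""
    placement, free = {}, set(cores_by_freq)
    turns = [(by_a, cores_by_freq), (by_b, cores_by_dist)]
    i = 0
    while free and (by_a or by_b):
        queue, order = turns[i % 2]
        i += 1
        while queue and queue[0] in placement:
            queue.pop(0)  # drop apps already placed via the other queue
        if not queue:
            continue
        app = queue.pop(0)
        core = next(c for c in order if c in free)
        placement[app] = core
        free.remove(core)
    return placement

# cores 0..2: core 0 is the fastest, core 2 sits next to a memory controller
schedule = place(["povray", "sphinx", "libquantum"],
                 ["libquantum", "sphinx", "povray"],
                 cores_by_freq=[0, 1, 2], cores_by_dist=[2, 1, 0])
# povray (compute-bound) -> fastest core 0; libquantum (memory-bound) -> core 2
```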

SLIDE 11

Scheduling Policy: Implementation

SLIDE 12

Outline

  • Motivation
  • Scheduling Policy
  • Experimental Results
  • Conclusions

SLIDE 13

Experimental Setup

  • Intel SCC processor
  • 48 cores (P54C core architecture)
  • 4 DDR3 memory controllers, one per 12 cores
  • Linux kernel running on each core
  • Applications from the SPEC CPU2006 and NAS benchmark suites (medium working-set sizes)
  • Povray (compute-bound)
  • Sphinx (medium memory-bound)
  • Libquantum (high memory-bound)
  • Checkpointing/resuming using the CryoPID library
  • Migration overhead < 1%

[Figure: Intel SCC block diagram — a 6×4 mesh of tiles (routers R at positions 0,0 through 5,3), four memory controllers (MC) each attached to a DIMM, and a system interface; each tile holds two P54C cores (16 KB L1, 256 KB L2), an MIU, a message-passing buffer (MPB), a traffic generator, and a router. The system FPGA connects to a management-console PC over PCIe.]

SLIDE 14

Evaluating Scheduling Policy

Scenario 1: Compute-bound and Memory-bound applications

[Chart: normalized execution time of povray, sphinx, and the combined workload under the RND and SP schedules; bar segments: Migration, 800 MHz, 533 MHz, 266 MHz.]

SLIDE 15

Evaluating Scheduling Policy

Scenario 2: Compute-bound and Memory-bound applications

[Chart: normalized execution time of povray, libquantum, and the combined workload under the RND and SP schedules; bar segments: Migration, 800 MHz, 533 MHz, 266 MHz.]

SLIDE 16

Evaluating Scheduling Policy

Scenario 3: 1 Compute-bound and 2 Memory-bound applications

[Chart: normalized execution time of povray, libquantum, sphinx, and the combined workload under the RND and SP schedules; bar segments: Migration, 800 MHz, 533 MHz, 266 MHz.]

SLIDE 17

Evaluating Scheduling Policy

Scenario 4: 2 Memory-bound applications

[Chart: normalized execution time of sphinx, libquantum, and the combined workload under the RND and SP schedules; bar segments: Migration, 800 MHz, 533 MHz, 266 MHz.]

SLIDE 18

Evaluating Scheduling Policy

Scenario 5: 1 Compute-bound application

[Chart: normalized execution time of povray and the combined workload under the RND and SP schedules; bar segments: Migration, 800 MHz, 533 MHz, 266 MHz.]

SLIDE 19

Outline

  • Motivation
  • Scheduling Policy
  • Experimental Results
  • Conclusions

SLIDE 20

Conclusions And Future Work

  • We proposed an online scheduling policy that addresses application demands and characteristics
  • Implemented it on a real many-core architecture using real workloads
  • Performance improvement: up to 36% for compute-bound and up to 15% for memory-bound applications

SLIDE 21


Thank You!

CASPER Group, University of Cyprus
Computer Architecture, Systems and Performance Evaluation Research
Visit us: www.cs.ucy.ac.cy/carch/casper