Snatch : Opportunistically Reassigning Power Allocation between - - PowerPoint PPT Presentation


Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks. Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim, and Josep Torrellas


SLIDE 1

Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks

Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim, and Josep Torrellas

UIUC, OSU, UMN, NVIDIA

SLIDE 2

Motivation: Cost of Power/Ground Pins in 3D stacks

Figure: cross-section of the 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs)

SLIDE 3

Motivation: Cost of Power/Ground Pins in 3D stacks

  • Size & cost of packages are proportional to # of pins

Figure: cross-section of the 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs)

SLIDE 4

Motivation: Cost of Power/Ground Pins in 3D stacks

  • Size & cost of packages are proportional to # of pins
  • 3D Stacks: Disjoint Power/Ground pins for Processor and Memory

Figure: cross-section of the 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs)

SLIDE 5

Motivation: Cost of Power/Ground Pins in 3D stacks

  • Size & cost of packages are proportional to # of pins
  • 3D Stacks: Disjoint Power/Ground pins for Processor and Memory
  • Each dimensioned for the worst case

Figure: cross-section of the 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs)

SLIDE 6

Motivation: Underutilization of Power Budget

  • High Processor or Memory Power phases

SLIDE 7

Contribution: Snatch

  • Dynamically and opportunistically divert power between processor and memory

Figure: conventional design (labels: Processor die, Mem0 die, Mem1 die, Proc PDN, TSVs, Mem PDN, Processor VR, Memory VR)

SLIDE 8

Contribution: Snatch

  • Dynamically and opportunistically divert power between processor and memory
  • On-chip voltage regulator connects the two Power Delivery Networks
  • Processor or Memory can consume more power for the same # of pins

Figure: Snatch design (labels: Processor die, Mem0 die, Mem1 die, Proc PDN, TSVs, Mem PDN, Processor VR, Memory VR, On-chip VR)

SLIDE 9

Impact Compared to Conventional 3D Stacks

  • For same # of power/ground pins:
      • Application can consume more power
      • Up to 23% application speedup
  • For the same maximum power in Processor and Memory:
      • Fewer pins, about 30% package cost reduction

SLIDE 10

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 11

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 12

Conventional Implementation

Figure: cross-section of the conventional 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs; annotations: 0.8-0.95V, 5.5W, 4.5W, 1.1V, 12V, 12V)

SLIDE 13

Snatch implementation

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR; annotations: 12V, 12V, 5.5W, 4.5W, 0.8-0.95V, 1.1V)

SLIDE 14

Snatch implementation

  • Small 2W on-chip bidirectional VR on Proc die
  • Bulk of work from off-chip VRs

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR; annotations: 12V, 12V, 5.5W, 4.5W, 0.8-0.95V, 1.1V)

SLIDE 15

Snatch: Dynamic power reassignment

  • Up/down convert the snatched power

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR)

SLIDE 16

Snatch: Dynamic power reassignment

  • Up/down convert the snatched power

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR; annotations: 0.8-0.95V, 1.1V, 2W)

SLIDE 17

Snatch: Cross-Section

  • Small 2W on-chip bidirectional VR on Proc die

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR)

SLIDE 18

Snatch: Top Down

  • Small 2W on-chip bidirectional VR on Proc die

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 0.8-0.95V, 5.5W, 4.5W, 1.1V)

Figure: cross-section of the Snatch 3D stack (labels: Memory VR, PCB substrate, Processor die, C4 bumps, DRAM0 die, DRAM1 die, microbumps, Processor VR, BGA pins, TSVs, Single on-chip VR)

SLIDE 19

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 20

Snatching Memory Power

  • On processor intensive phase

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W, 4.5W, 5.5W, 4.5W)

SLIDE 21

Snatching Memory Power

  • On processor intensive phase
  • Snatch Memory Power to TurboBoost the Processor

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 7.5W, 2.5W, 5.5W, 4.5W, 2W)

SLIDE 22

Snatching Processor Power

  • On memory intensive phase
  • Snatch Processor Power to TurboBoost the Memory

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 3.5W, 6.5W, 5.5W, 4.5W, 2W)
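
The two snatching slides can be read as a bounded transfer between two fixed power budgets. The sketch below is only an illustration under stated assumptions: the on-chip VR is treated as lossless (the slide figures show the full 2W arriving at the receiver), and the names `snatch` and `VR_LIMIT_W` are hypothetical, not taken from the paper.

```python
VR_LIMIT_W = 2.0  # on-chip bidirectional VR rating, from the slides


def snatch(proc_budget_w, mem_budget_w, requested_w, direction):
    """Return the (processor, memory) power limits after a transfer.

    direction is "M->P" (memory power diverted to the processor) or
    "P->M" (processor power diverted to the memory).
    ASSUMPTION: conversion losses in the on-chip VR are ignored.
    """
    # The transfer can never exceed what the on-chip VR can carry.
    moved = max(0.0, min(requested_w, VR_LIMIT_W))
    if direction == "M->P":
        moved = min(moved, mem_budget_w)   # cannot take more than the donor has
        return proc_budget_w + moved, mem_budget_w - moved
    if direction == "P->M":
        moved = min(moved, proc_budget_w)
        return proc_budget_w - moved, mem_budget_w + moved
    raise ValueError(f"unknown direction: {direction!r}")


# Processor-intensive phase: 5.5W / 4.5W becomes 7.5W / 2.5W.
print(snatch(5.5, 4.5, 2.0, "M->P"))
# Memory-intensive phase: 5.5W / 4.5W becomes 3.5W / 6.5W.
print(snatch(5.5, 4.5, 2.0, "P->M"))
```

The sum of the two limits stays at 10W in both directions, which is why the off-chip VRs and the package pins do not need to change.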

SLIDE 23

Snatching Decisions

  • Processor or Memory Intensive Phase?

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W, 4.5W, 5.5W, 4.5W, ?W)

SLIDE 24

Snatching Decisions

  • Processor or Memory Intensive Phase?
  • How much Power is available?

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W, 4.5W, 5.5W, 4.5W, ?W)

SLIDE 25

Snatching Decisions

  • Processor or Memory Intensive Phase?
  • How much Power is available?
  • How much Power can we Snatch?

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W, 4.5W, 5.5W, 4.5W, ?W)

SLIDE 26

Conservative Snatching Algorithm

  • Keep track of past power values of 10µs epochs

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: ?W, 5.5W, 4.5W)

SLIDE 27

Conservative Snatching Algorithm

  • Keep track of past power values of 10µs epochs
  • Average for activity detection

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: ?W, 5.5W, 4.5W)

SLIDE 28

Conservative Snatching Algorithm

  • Keep track of past power values of 10µs epochs
  • Average for activity detection
  • MAX for power availability

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: ?W, 5.5W, 4.5W)

SLIDE 29

Conservative Snatching Algorithm

  • Keep track of past power values of 10µs epochs
  • Average for activity detection
  • MAX for power availability
  • Avoid hysteresis

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: ?W, 5.5W, 4.5W)
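
The bullets of the conservative algorithm can be sketched as a small controller that keeps a window of per-epoch power readings, uses the window average to classify the phase, and uses the window maximum to bound how much the donor side can give up. This is a rough sketch, not the authors' implementation: the window length `HISTORY`, the threshold `BUSY_FRACTION`, and the class name `SnatchController` are assumptions, and the "avoid hysteresis" step is not modeled because the slides give no detail on it.

```python
from collections import deque

PROC_BUDGET_W = 5.5   # off-chip processor budget, from the slides
MEM_BUDGET_W = 4.5    # off-chip memory budget, from the slides
VR_LIMIT_W = 2.0      # on-chip VR rating, from the slides
HISTORY = 8           # ASSUMPTION: number of past 10µs epochs remembered
BUSY_FRACTION = 0.9   # ASSUMPTION: average >= 90% of budget means "intensive"


class SnatchController:
    """Hypothetical controller following the four bullets above."""

    def __init__(self, history=HISTORY):
        self.proc = deque(maxlen=history)  # past processor power, one value per epoch
        self.mem = deque(maxlen=history)   # past memory power, one value per epoch

    def record_epoch(self, proc_w, mem_w):
        self.proc.append(proc_w)
        self.mem.append(mem_w)

    def decide(self):
        """Return (direction, watts) for the next epoch."""
        if len(self.proc) < self.proc.maxlen:
            return ("none", 0.0)  # not enough history yet: stay conservative
        # Average for activity detection.
        proc_busy = sum(self.proc) / len(self.proc) >= BUSY_FRACTION * PROC_BUDGET_W
        mem_busy = sum(self.mem) / len(self.mem) >= BUSY_FRACTION * MEM_BUDGET_W
        # MAX for power availability: only the headroom above the donor's
        # worst recent epoch is considered spare.
        if proc_busy and not mem_busy:
            spare = MEM_BUDGET_W - max(self.mem)
            return ("M->P", min(VR_LIMIT_W, max(0.0, spare)))
        if mem_busy and not proc_busy:
            spare = PROC_BUDGET_W - max(self.proc)
            return ("P->M", min(VR_LIMIT_W, max(0.0, spare)))
        return ("none", 0.0)
```

Using the maximum rather than the average for availability is what makes the scheme conservative: a single high-power epoch on the donor side shrinks the amount that can be snatched for the whole window.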

SLIDE 30

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 31

Conventional Power Provisioning

  • Processor provisioned for 7.5W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN)

SLIDE 32

Conventional Power Provisioning

  • Processor provisioned for 7.5W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 7.5W, 7.5W)

SLIDE 33

Conventional Power Provisioning

  • Processor provisioned for 7.5W
  • Memory provisioned for 6.5W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 6.5W, 7.5W, 7.5W)

SLIDE 34

Conventional Power Provisioning

  • Processor provisioned for 7.5W
  • Memory provisioned for 6.5W
  • Total = Processor + Memory = 14W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 6.5W, 7.5W, 7.5W)

SLIDE 35

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W
  • Memory provisioned for 6.5W
  • Total = Processor + Memory = 14W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 6.5W, 7.5W, 7.5W)

SLIDE 36

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W
  • Memory provisioned for 6.5W
  • Total = Processor + Memory = 14W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: Snatch 2W, 6.5W, 6.5W, 7.5W, 7.5W)

SLIDE 37

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W - 2W = 5.5W
  • Memory provisioned for 6.5W
  • Total = Processor + Memory = 14W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 7.5W, 5.5W, 5.5W ±2W, Snatch 2W, 6.5W, 6.5W)

SLIDE 38

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W - 2W = 5.5W
  • Memory provisioned for 6.5W - 2W = 4.5W
  • Total = Processor + Memory = 14W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 4.5W ±2W, 7.5W, 5.5W, 5.5W ±2W, Snatch 2W, 4.5W)

SLIDE 39

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W - 2W = 5.5W
  • Memory provisioned for 6.5W - 2W = 4.5W
  • Total = Processor + Memory = 14W - 4W = 10W

Reduce Total Provisioning from 14W to 10W, with approximately the same performance

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 4.5W ±2W, 7.5W, 5.5W, 5.5W ±2W, Snatch 2W, 4.5W)

SLIDE 40

Snatch: Provisioning 3D Stacks Just Right

  • Processor provisioned for 7.5W - 2W = 5.5W
  • Memory provisioned for 6.5W - 2W = 4.5W
  • Total = Processor + Memory = 14W - 4W = 10W

Reduce Total Provisioning from 14W to 10W, with approximately the same performance

30% Reduction in Package Power/Ground Pins

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 6.5W, 4.5W ±2W, 7.5W, 5.5W, 5.5W ±2W, Snatch 2W, 4.5W)
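
The provisioning arithmetic on these slides can be checked in a few lines. The sketch assumes, as the motivation slides suggest, that the number of power/ground pins scales with the provisioned power; under that assumption the 14W to 10W reduction works out to roughly 29%, in line with the "about 30%" figure. The function name `snatch_provisioning` is hypothetical.

```python
VR_W = 2.0  # on-chip VR rating, from the slides


def snatch_provisioning(proc_peak_w, mem_peak_w, vr_w=VR_W):
    """Return (proc, mem, total, reduction) for off-chip provisioning.

    Each side is provisioned for its peak minus what the on-chip VR can
    deliver from the other side. `reduction` is the fractional drop in
    total provisioned power relative to the conventional design.
    ASSUMPTION: pin count is proportional to provisioned power.
    """
    conventional_total = proc_peak_w + mem_peak_w
    proc = proc_peak_w - vr_w
    mem = mem_peak_w - vr_w
    total = proc + mem
    reduction = (conventional_total - total) / conventional_total
    return proc, mem, total, reduction


proc, mem, total, reduction = snatch_provisioning(7.5, 6.5)
print(f"processor {proc}W, memory {mem}W, total {total}W, reduction {reduction:.1%}")
```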

SLIDE 41

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 42

Conventional Power Provisioning

  • Processor & Memory provisioned for 5.5W & 4.5W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN)

SLIDE 43

Conventional Power Provisioning

  • Processor & Memory provisioned for 5.5W & 4.5W

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W, 4.5W, 5.5W, 4.5W)

SLIDE 44

Snatch: Provide Additional Power

  • Processor & Memory provisioned for 5.5W & 4.5W
  • Snatch power

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: Snatch 2W, 4.5W, 5.5W, 4.5W, 5.5W)

SLIDE 45

Snatch: Opportunistically boost performance

  • Processor & Memory provisioned for 5.5W & 4.5W
  • Snatch power and boost performance

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W ±2W, 4.5W ±2W, Snatch 2W, 5.5W, 4.5W)

SLIDE 46

Snatch: Boost Performance with Same # of Pins

  • Processor & Memory provisioned for 5.5W & 4.5W
  • Snatch power and boost performance
  • Same # of pins as conventional

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W ±2W, 4.5W ±2W, Snatch 2W, 5.5W, 4.5W)

SLIDE 47

Snatch: Boost Performance with Same # of Pins

  • Processor & Memory provisioned for 5.5W & 4.5W
  • Snatch power and boost performance
  • Same # of pins as conventional

Higher performance for the same package cost

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W ±2W, 4.5W ±2W, Snatch 2W, 5.5W, 4.5W)

SLIDE 48

Snatch: Boost Performance with Same # of Pins

  • Processor & Memory provisioned for 5.5W & 4.5W
  • Snatch power and boost performance
  • Same # of pins as conventional

Higher performance for the same package cost

IR-drop and EM characteristics remain the same

Figure: top-down block diagram (blocks: Processor, On-chip VR, Memory, Processor VR, Memory VR, PDN, PDN; annotations: 5.5W ±2W, 4.5W ±2W, Snatch 2W, 5.5W, 4.5W)

SLIDE 49

Snatch Outline

  • Implementation
  • Operation
  • Case 1:
      • Same Max Power in Processor and Memory, reduced # of pins
  • Case 2:
      • Same # of pins, improved performance
  • Evaluation

SLIDE 50

Evaluation Methodology

  • Case 2: Same # of pins, improved performance
  • Processor: 22nm LP 8-core w/ SESC + McPAT
  • Memory: 4GB 2-layer WideIO2 w/ DRAMSim2
  • Benchmarks: SPLASH-2, NAS, and SPEC

SLIDE 51

Performance

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch

SLIDE 52

Performance

  • Baseline: P = {5.5W, 1.2GHz}, M = {4.5W, 400MHz}

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch

SLIDE 53

Performance

  • Baseline: P = {5.5W, 1.2GHz}, M = {4.5W, 400MHz}
  • Turbo Boost: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS within power budget

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch

SLIDE 54

Performance

  • Baseline: P = {5.5W, 1.2GHz}, M = {4.5W, 400MHz}
  • Turbo Boost: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS within power budget
  • Snatch: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS and Snatch up to 2W

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch
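
For readers who want to reproduce the comparison, the three evaluated configurations can be written down as plain data. This is only a transcription of the slide values into a hypothetical `Config` record; the field names are assumptions and do not come from SESC, McPAT, or DRAMSim2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One evaluated design point (field names are illustrative)."""
    name: str
    proc_budget_w: float
    proc_freq_ghz: tuple   # (min, max) processor frequency
    mem_budget_w: float
    mem_freq_mhz: tuple    # (min, max) memory frequency
    snatch_w: float        # maximum power moved through the on-chip VR


CONFIGS = {
    # Fixed frequencies, no power reassignment.
    "Baseline": Config("Baseline", 5.5, (1.2, 1.2), 4.5, (400, 400), 0.0),
    # DVFS within each side's own power budget.
    "Turbo Boost": Config("Turbo Boost", 5.5, (1.2, 1.5), 4.5, (400, 900), 0.0),
    # DVFS plus up to 2W snatched from the other side.
    "Snatch": Config("Snatch", 5.5, (1.2, 1.5), 4.5, (400, 900), 2.0),
}

for cfg in CONFIGS.values():
    peak_proc = cfg.proc_budget_w + cfg.snatch_w
    peak_mem = cfg.mem_budget_w + cfg.snatch_w
    print(f"{cfg.name}: processor up to {peak_proc}W, memory up to {peak_mem}W")
```

All three configurations share the same off-chip budgets (5.5W and 4.5W), so any speedup of Snatch over Turbo Boost comes from the extra 2W that can be moved on chip.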

SLIDE 55

Performance

  • Baseline: P = {5.5W, 1.2GHz}, M = {4.5W, 400MHz}
  • Turbo Boost: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS within power budget
  • Snatch: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS and Snatch up to 2W

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch; annotations: 25%, 8%, 10%

On average, Snatch boosts performance by 25% over Baseline and by 8% over Turbo Boost for the Splash and NAS benchmarks

SLIDE 56

Performance

  • Baseline: P = {5.5W, 1.2GHz}, M = {4.5W, 400MHz}
  • Turbo Boost: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS within power budget
  • Snatch: P = {5.5W, 1.2-1.5GHz}, M = {4.5W, 400-900MHz}, DVFS and Snatch up to 2W

Figure: bar chart of speedup (y-axis ticks from 0.4 to 1.3) for Average(Splash + NAS) and Average(SPEC); legend: Baseline, Turbo Boost, Snatch; annotations: 25%, 8%, 10%

On average, Snatch boosts performance by 10% over Baseline for the SPEC benchmarks. Gains over Turbo Boost are negligible.

SLIDE 57

Snatching Activity Overview

Figure: bar chart of % of total time snatching (y-axis ticks 22.5, 45, 67.5, 90); x-axis: Barnes, BT, CG, Cholesky, FFT, FMM, FT, IS, LU, LU(NAS), MG, Radiosity, Radix, Raytrace, SP, W-Nsquared, W-Spatial, AvgM->P, mcf, milc, lbm, bzip2, AvgP->M; legend: M->P, P->M

SLIDE 58

Snatching Activity Overview

Figure: bar chart of % of total time snatching (y-axis ticks 22.5, 45, 67.5, 90); x-axis: Barnes, BT, CG, Cholesky, FFT, FMM, FT, IS, LU, LU(NAS), MG, Radiosity, Radix, Raytrace, SP, W-Nsquared, W-Spatial, AvgM->P, mcf, milc, lbm, bzip2, AvgP->M; legend: M->P, P->M

SLIDE 59

Snatching Activity Overview

Figure: bar chart of % of total time snatching (y-axis ticks 22.5, 45, 67.5, 90); x-axis: Barnes, BT, CG, Cholesky, FFT, FMM, FT, IS, LU, LU(NAS), MG, Radiosity, Radix, Raytrace, SP, W-Nsquared, W-Spatial, AvgM->P, mcf, milc, lbm, bzip2, AvgP->M; legend: M->P, P->M

SLIDE 60

Snatching Activity Overview

Figure: bar chart of % of total time snatching (y-axis ticks 22.5, 45, 67.5, 90); x-axis: Barnes, BT, CG, Cholesky, FFT, FMM, FT, IS, LU, LU(NAS), MG, Radiosity, Radix, Raytrace, SP, W-Nsquared, W-Spatial, AvgM->P, mcf, milc, lbm, bzip2, AvgP->M; legend: M->P, P->M

On average, applications snatch 30% of the total time for Splash+NAS and 9.4% for SPEC

SLIDE 61

More in the Paper

  • Design and Implementation:
      • On-chip Voltage Regulator
      • Snatch Algorithm
  • Additional Evaluation:
      • Snatch Algorithm
      • Power Delivery Network
      • Pin Reliability
      • 3D Stack Thermals

SLIDE 62

Summary

  • Snatch: An opportunistic power reassignment design for 3D Stacked architectures
  • Small on-chip bidirectional VR
  • Processor - Memory phase detection and power availability estimation
  • Up to 23% application speedup
  • Alternatively, about 30% package cost reduction

SLIDE 63

Image Source : http://www.hutui6.com/data/out/178/67609941-snatch-wallpapers.jpg