Safe Limits on Voltage Reduction Efficiency in GPUs: a Direct Measurement Approach

Jingwen Leng, Alper Buyuktosunoglu, Ramon Bertran, Pradip Bose, Vijay Janapa Reddi


slide-1
SLIDE 1

Safe Limits on Voltage Reduction Efficiency in GPUs: a Direct Measurement Approach

Jingwen Leng, Alper Buyuktosunoglu, Ramon Bertran, Pradip Bose, Vijay Janapa Reddi

This work is sponsored in part by Defense Advanced Research Projects Agency (DARPA), Microsystems Technology Office (MTO), under contract number HR0011-13-C-0022, National Science Foundation (NSF), under grant CCF-1218474, and Semiconductor Research Corporation (SRC). The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense, the NSF, the SRC or the U.S. Government. This document is: Approved for Public Release, Distribution Unlimited.

slide-2
SLIDE 2

GPU Energy Efficiency Optimization

Circuit: DVFS, Clock Gating, Power Gating

(Micro)architecture: Control Divergence, Warp Scheduler, Cache Locality

Software/Compiler: Approximation, Data Transfer, Multi-tasking

Prior work: Sethia et al. [MICRO’14], Leng et al. [ISCA’13], Majeed et al. [MICRO’13], Fung et al. [MICRO’07], Rogers et al. [MICRO’13], Rhu et al. [MICRO’13], Samadi et al. [MICRO’14], Rossbach et al. [SOSP’11], Park et al. [ASPLOS’15]

slide-7
SLIDE 7

Energy Inefficiency at the Voltage Guardband

[Figure labels: Required Supply Voltage; Voltage Guardband (Process, Temperature, Voltage); Operating Supply Voltage]

Reduced voltage ➔ energy savings

slide-13
SLIDE 13

Our Contributions

  • Voltage guardband measurement
  • Guardband analysis
  • Program-driven predictive guardbanding

[Figure: cards measured: GTX 480, GTX 580, GTX 680, GTX 780; guardband breakdown marked ?% ?% ?%; per-kernel plot of Nominal Voltage, Predicted Voltage, and Actual Required Voltage with the resulting Energy Saving]

slide-19
SLIDE 19

Voltage Guardband Measurement

  • Eight GPU cards in total
  • Four generations
  • Two different architectures

GTX 480 x1, GTX 580 x1, GTX 680 x1, GTX 780 x5

  • Fifty-seven representative CUDA programs
  • Regular/irregular
  • Memory/arithmetic intensive

slide-22
SLIDE 22

Vmin Measurement

[Figure: the CUDA programs run at the stock setting (GPU VDD at nominal VDD) and under undervolting (GPU VDD below nominal VDD); the two program outputs are compared to check correctness]

Vmin: minimal working voltage at nominal frequency
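
The measurement loop pictured on this slide can be sketched as a simple downward voltage sweep. This is a minimal sketch under stated assumptions, not the authors' tooling: `run_program` is a hypothetical callable that runs one CUDA program at a given voltage and returns its output (or `None` on a crash or hang), and the 0.01 V step, the 0.5 V floor, and the stop-at-first-failure rule are illustrative choices that the slides do not specify.

```python
from typing import Callable, Optional


def find_vmin(
    run_program: Callable[[float], Optional[bytes]],
    reference_output: bytes,
    nominal_vdd: float,
    step: float = 0.01,
    floor_vdd: float = 0.5,
) -> float:
    """Return the lowest stepped voltage whose output still matches the reference."""
    vmin = nominal_vdd
    n = 1
    while True:
        # Recompute from the nominal value each time to avoid accumulating float error.
        vdd = round(nominal_vdd - n * step, 6)
        if vdd < floor_vdd:
            break
        output = run_program(vdd)  # None models a crash or hang
        if output != reference_output:
            break  # first incorrect run: stop lowering the voltage
        vmin = vdd
        n += 1
    return vmin


def fake_gpu(vdd: float) -> Optional[bytes]:
    """Hypothetical stand-in for a real card: correct at 0.95 V and above."""
    return b"ok" if vdd >= 0.95 - 1e-9 else None


measured_vmin = find_vmin(fake_gpu, reference_output=b"ok", nominal_vdd=1.09)
print(f"Vmin = {measured_vmin:.2f} V")
```

The reference output is taken at the stock setting, so correctness is judged against the same program run at nominal VDD rather than against an external golden file.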

slide-34
SLIDE 34

Measurement Results

[Figure: measured Vmin of 57 programs on the GTX 680 card at 1.1 GHz; axis: GPU VDD (V), 0.85 to 1.1; nominal VDD: 1.09 V; annotations: 18% of nominal VDD (myocyte), 9% of nominal VDD (convolutionFFT2D)]

  • Voltage guardband: 9% - 18%
  • Energy savings: 14% - 25% at the card level
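
To make the percentages concrete, the sketch below converts a guardband fraction into a Vmin for the 1.09 V nominal VDD quoted on the slide, and computes an idealized saving. The quadratic model (dynamic energy proportional to V squared at fixed frequency) is an assumption added here for illustration. It ignores static power and every non-GPU component on the card, so it is not expected to reproduce the measured 14% - 25% card-level figures.

```python
def guardband_fraction(nominal_vdd: float, vmin: float) -> float:
    """Guardband expressed as a fraction of the nominal supply voltage."""
    return (nominal_vdd - vmin) / nominal_vdd


def vmin_from_guardband(nominal_vdd: float, fraction: float) -> float:
    """Inverse of guardband_fraction: the Vmin implied by a guardband fraction."""
    return nominal_vdd * (1.0 - fraction)


def ideal_dynamic_energy_saving(nominal_vdd: float, vdd: float) -> float:
    """Assumed model: dynamic energy scales with V^2 at a fixed clock frequency."""
    return 1.0 - (vdd / nominal_vdd) ** 2


NOMINAL_VDD = 1.09  # GTX 680 nominal VDD from the slide

for fraction in (0.09, 0.18):
    vmin = vmin_from_guardband(NOMINAL_VDD, fraction)
    saving = ideal_dynamic_energy_saving(NOMINAL_VDD, vmin)
    print(f"guardband {fraction:.0%}: Vmin {vmin:.3f} V, idealized saving {saving:.1%}")
```
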
slide-40
SLIDE 40

Measurement Results

[Figure: measured Vmin of 57 programs; axis: GPU VDD (V), 0.85 to 1.1; annotation: 0.1 V spread across programs]

Vmin is program dependent

slide-43
SLIDE 43

Executive Summary

  • Guardband measurement
  • Guardband analysis
  • Guardband optimization

slide-44
SLIDE 44

Voltage Guardband Analysis

[Figure: measured Vmin per program (axis: GPU VDD (V)), with the guardband split into Process, Temperature, and Voltage contributions, each marked ?%]

slide-50
SLIDE 50

Process Variation Impact

[Figure: Vmin per program for Card 1 through Card 5; axis: GPU VDD (V), 0.85 to 1.05]

Process variation ➔ 0.07 V maximum difference

slide-58
SLIDE 58

Temperature Variation Impact

[Figure: Vmin per program at 40 °C and 70 °C; axis: GPU VDD (V), 0.8 to 1]

Temperature variation ➔ 0.04 V maximum difference
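
A "maximum difference" of this kind can be computed by taking, for each program, the gap between its highest and lowest Vmin across conditions, and then the largest gap over all programs. The sketch below shows one way to do that; the same function would apply to the card-to-card (process) comparison. The program names and voltages are made-up placeholders, not measurements from the study.

```python
from typing import Dict, Tuple


def max_vmin_difference(
    vmin_by_condition: Dict[str, Dict[str, float]],
) -> Tuple[str, float]:
    """Return (program, gap) for the largest per-program Vmin spread across conditions."""
    # Only compare programs that were measured under every condition.
    common = set.intersection(*(set(m) for m in vmin_by_condition.values()))
    worst_program, worst_gap = "", 0.0
    for program in sorted(common):
        values = [m[program] for m in vmin_by_condition.values()]
        gap = max(values) - min(values)
        if gap > worst_gap:
            worst_program, worst_gap = program, gap
    return worst_program, worst_gap


# Placeholder data: condition -> program -> Vmin in volts (illustrative only).
example = {
    "condition_a": {"prog_x": 0.90, "prog_y": 0.95},
    "condition_b": {"prog_x": 0.93, "prog_y": 0.96},
}
program, gap = max_vmin_difference(example)
print(f"largest difference: {gap:.2f} V ({program})")
```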

slide-63
SLIDE 63

Combined PVT Analysis

[Figure: Vmin of 57 programs; axis: GPU VDD (V), 0.85 to 1; annotation: 0.1 V]

  • Process and temperature variation ➔ relatively uniform impact on ALL programs
  • Voltage variation ➔ 0.1 V difference across programs

slide-68
SLIDE 68

Combined PVT Analysis

Guardband contributions: Process 0.07 V, Temperature 0.04 V, Voltage 0.1 V

slide-70
SLIDE 70

Voltage Noise Background

[Figure: power delivery path from the Voltage Regulator across the PCB Board (Bulk Caps, Board Caps) and the Package (Package Caps) to the Die through the C4s. Waveforms: VVR, Current, and VDD, with VDD showing IR drop and di/dt droop]
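
The two noise components named on this slide can be illustrated with a deliberately simplified lumped model: the die voltage equals the regulator voltage minus a resistive term (IR drop) and an inductive term (di/dt droop). This is a sketch under strong assumptions. A real power delivery network also includes the capacitor stages shown in the figure and behaves as a resonant RLC network, and the resistance, inductance, and current values below are hypothetical round numbers chosen only to make the arithmetic visible.

```python
from typing import List, Sequence


def die_voltage(
    v_vr: float,
    current: Sequence[float],
    dt: float,
    resistance: float,
    inductance: float,
) -> List[float]:
    """V_die[k] = V_VR - R * I[k] - L * (I[k] - I[k-1]) / dt (capacitors ignored)."""
    voltages = []
    previous = current[0]
    for i in current:
        didt = (i - previous) / dt                 # rate of change of current, A/s
        ir_drop = resistance * i                   # steady resistive loss
        didt_droop = inductance * didt             # transient inductive loss
        voltages.append(v_vr - ir_drop - didt_droop)
        previous = i
    return voltages


# Hypothetical current step from 10 A to 50 A within one 1 ns sample.
trace = die_voltage(
    v_vr=1.09,
    current=[10.0, 10.0, 50.0, 50.0],
    dt=1e-9,
    resistance=1e-3,    # 1 milliohm, assumed
    inductance=1e-12,   # 1 picohenry, assumed
)
print([round(v, 3) for v in trace])
```

In this toy trace the lowest voltage occurs at the sample where the current jumps, which is the behavior the guardband must cover: the droop depends on how quickly the activity changes, not only on how much current is drawn.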

slide-77
SLIDE 77

Where does di/dt droop come from?

  • Kernel-based activity patterns
  • Inter-kernel
  • Initial kernel
  • Intra-kernel

[Figure: GPU Program as a sequence of kernels (Kernel, Kernel, Kernel) with the corresponding GPU Current waveform]

slide-90
SLIDE 90

Kernel Level Vmin Measurement

[Figure: GPU Program as a sequence of kernels (Kernel, Kernel, Kernel); undervolting the Supply Voltage during an individual kernel gives the Kernel-Level Vmin]

slide-97
SLIDE 97

Program Level Vmin Measurement

[Figure: GPU Program as a sequence of kernels (Kernel, Kernel, Kernel); undervolting the Supply Voltage across the whole program gives the Program-Level Vmin]

slide-102
SLIDE 102

Program/Kernel Level Vmin Comparison

[Figure: Program-level Vmin versus Max Kernel-level Vmin for each program; axis: GPU VDD (V), 0.88 to 1; concurrentKernels is called out]

  • Program-level Vmin same as maximum kernel-level Vmin
  • Inter-kernel activity does not determine Vmin value
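
Stated as code, the first finding says that a program's Vmin can be derived from its per-kernel values by taking the maximum. The sketch below is only an illustration of that relation; the kernel names and voltages are hypothetical.

```python
from typing import Dict, Tuple


def program_vmin_from_kernels(kernel_vmin: Dict[str, float]) -> Tuple[str, float]:
    """Return the limiting kernel and its Vmin, taken as the program-level Vmin."""
    if not kernel_vmin:
        raise ValueError("at least one kernel-level Vmin is required")
    limiting = max(kernel_vmin, key=kernel_vmin.get)  # kernel needing the highest voltage
    return limiting, kernel_vmin[limiting]


# Hypothetical kernel-level measurements in volts.
kernels = {"kernel_a": 0.91, "kernel_b": 0.97, "kernel_c": 0.94}
limiting_kernel, program_vmin = program_vmin_from_kernels(kernels)
print(f"program-level Vmin = {program_vmin:.2f} V, set by {limiting_kernel}")
```
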
slide-110
SLIDE 110

Executive Summary

  • Guardband measurement
  • Guardband analysis
  • Guardband optimization

slide-111
SLIDE 111

Predictive Guardbanding

  • Exploit program-dependent Vmin behavior
  • Program/kernel level Vmin prediction

[Figure: per-kernel voltage levels (Kernel, Kernel, Kernel): Nominal Voltage, Predicted Voltage, and Actual Required Voltage. The gap between Nominal Voltage and Actual Required Voltage is the Energy Saving Opportunity; the gap between Nominal Voltage and Predicted Voltage is the Energy Saving]

slide-119
SLIDE 119

Performance Counter Based Vmin Prediction

  • Use all available performance counters to construct a Vmin prediction model

Neural network RMSE: 0.5%, max error: 3%
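
The two error figures can be computed from predicted and measured Vmin values as shown below, and a prediction can be turned into an operating voltage by adding a safety margin. This is a sketch with assumptions flagged: the slide does not say what the percentages are relative to (the code uses the measured Vmin), and setting the margin equal to the reported maximum error is an illustrative policy, not one the slides describe. The model itself is not reproduced here.

```python
import math
from typing import List, Sequence


def relative_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Signed prediction error as a fraction of the measured Vmin (assumed base)."""
    return [(p - a) / a for p, a in zip(predicted, actual)]


def rmse(errors: Sequence[float]) -> float:
    """Root-mean-square of the error fractions."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def max_abs_error(errors: Sequence[float]) -> float:
    """Largest error magnitude; an under-prediction of this size risks a failure."""
    return max(abs(e) for e in errors)


def guarded_voltage(predicted_vmin: float, margin: float, nominal_vdd: float) -> float:
    """Predicted Vmin plus a relative margin, never above the nominal voltage."""
    return min(nominal_vdd, predicted_vmin * (1.0 + margin))


# Hypothetical predictions and measurements in volts.
errors = relative_errors(predicted=[1.00, 0.98], actual=[1.00, 1.00])
print(f"RMSE {rmse(errors):.2%}, max error {max_abs_error(errors):.2%}")
print(f"operating voltage: {guarded_voltage(0.95, margin=0.03, nominal_vdd=1.09):.4f} V")
```

The cap at nominal VDD reflects the picture on the predictive guardbanding slides, where the predicted voltage sits between the actual required voltage and the nominal voltage.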

slide-122
SLIDE 122

Energy Efficiency Optimization Potential

[Figure: Energy Savings (%) per program for Oracle and Neural network; axis: 10 to 30]

Average energy savings: Oracle 21%, Neural network 17%

slide-128
SLIDE 128

Conclusion

Large amount (up to 20%) of voltage guardband for GPUs

Intra-kernel di/dt droop is the largest guardband determinant

We show the potential of program-driven predictive guardbanding

[Figure: per-kernel Nominal Voltage, Predicted Voltage, and Actual Required Voltage, with the Energy Saving between nominal and predicted]