Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks
Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim, and Josep Torrellas
UIUC, OSU, UMN, NVIDIA
Motivation: Cost of Power/Ground Pins in 3D Stacks
• Size & cost of packages are proportional to # of pins
• 3D Stacks: Disjoint Power/Ground pins for Processor and Memory
• Each dimensioned for the worst case
[Figure: cross-section of a 3D stack, top to bottom: DRAM1 die, TSVs, DRAM0 die, microbumps, Processor die, C4 bumps, PCB substrate, BGA pins; separate off-chip Processor VR and Memory VR]
Motivation: Underutilization of Power Budget
• High Processor or Memory Power phases
Contribution: Snatch
• Dynamically and opportunistically divert power between processor and memory
• On-chip voltage regulator connects the two Power Delivery Networks
• Processor or Memory can consume more power for the same # of pins
[Figure: Conventional stack (Mem1 die, Mem0 die, TSVs, Processor die) with separate Mem PDN and Proc PDN, each fed by its own off-chip VR, versus the Snatch stack, which adds an on-chip VR between the two PDNs]
Impact Compared to Conventional 3D Stacks
• For the same # of power/ground pins:
  • Application can consume more power
  • Up to 23% application speedup
• For the same maximum power in Processor and Memory:
  • Fewer pins, about 30% package cost reduction
Snatch Outline
• Implementation
• Operation
• Case 1: Same Max Power in Processor and Memory, reduced # of pins
• Case 2: Same # of pins, improved performance
• Evaluation
Conventional Implementation
[Figure: cross-section of the stack: DRAM1 and DRAM0 dies (1.1V) connected by TSVs, microbumps, Processor die (0.8-0.95V), C4 bumps, PCB substrate, BGA pins; off-chip Processor VR (5.5W) and Memory VR (4.5W), each with a 12V input]
Snatch Implementation
• Small 2W on-chip bidirectional VR on Proc die
• Bulk of work from off-chip VRs
[Figure: cross-section of the stack: DRAM1 and DRAM0 dies (1.1V), TSVs, microbumps, Processor die (0.8-0.95V) with a single on-chip VR, C4 bumps, PCB substrate, BGA pins; off-chip Processor VR (5.5W) and Memory VR (4.5W), each with a 12V input]
Snatch: Dynamic Power Reassignment
• Up/down-convert the snatched power
[Figure: cross-section of the stack; the single on-chip VR moves up to 2W between the 1.1V memory supply and the 0.8-0.95V processor supply]
Snatch: Cross-Section
• Small 2W on-chip bidirectional VR on Proc die
[Figure: cross-section of the stack: DRAM1 die, TSVs, DRAM0 die, microbumps, Processor die with a single on-chip VR, C4 bumps, PCB substrate, BGA pins; off-chip Processor VR and Memory VR]
Snatch: Top Down
• Small 2W on-chip bidirectional VR on Proc die
[Figure: top-down view next to the cross-section: Processor VR (5.5W) feeds the Processor PDN at 0.8-0.95V; Memory VR (4.5W) feeds the Memory PDN at 1.1V; the on-chip VR sits between the two PDNs]
Snatching Memory Power
• On processor intensive phase
• Snatch Memory Power: TurboBoost Processor
[Figure: Processor VR supplies 5.5W and the Processor PDN delivers 7.5W; the on-chip VR transfers 2W; Memory VR supplies 4.5W and the Memory PDN delivers 2.5W]
Snatching Processor Power
• On memory intensive phase
• Snatch Processor Power: TurboBoost Memory
[Figure: Processor VR supplies 5.5W and the Processor PDN delivers 3.5W; the on-chip VR transfers 2W; Memory VR supplies 4.5W and the Memory PDN delivers 6.5W]
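The two snatching slides amount to simple bookkeeping across the on-chip VR. A minimal Python sketch of that arithmetic follows; the names `Budget` and `snatch` are hypothetical, and conversion losses in the on-chip VR are ignored as a simplifying assumption.

```python
from dataclasses import dataclass

ON_CHIP_VR_LIMIT_W = 2.0  # on-chip VR rating quoted on the slides


@dataclass
class Budget:
    """Power available to each domain, in watts (hypothetical helper type)."""
    processor_w: float
    memory_w: float


def snatch(base: Budget, to_processor_w: float) -> Budget:
    """Move power across the on-chip VR.

    Positive values move power from memory to processor; negative values
    move it from processor to memory. Losses are ignored (assumption).
    """
    if abs(to_processor_w) > ON_CHIP_VR_LIMIT_W:
        raise ValueError("transfer exceeds the on-chip VR rating")
    return Budget(base.processor_w + to_processor_w,
                  base.memory_w - to_processor_w)


base = Budget(processor_w=5.5, memory_w=4.5)  # off-chip VR budgets
print(snatch(base, +2.0))  # processor-intensive phase: 7.5W / 2.5W
print(snatch(base, -2.0))  # memory-intensive phase: 3.5W / 6.5W
```

The total stays at 10W in both directions; only the split between the two PDNs changes.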
Snatching Decisions
• Processor or Memory Intensive Phase?
• How much Power is available?
• How much Power can we Snatch?
[Figure: Processor VR and PDN at 5.5W, Memory VR and PDN at 4.5W, on-chip VR transfer marked "?W"]
Conservative Snatching Algorithm
• Keep track of past power values of 10µs epochs
• Average for activity detection
• MAX for power availability
• Avoid hysteresis
[Figure: Processor VR and PDN at 5.5W, Memory VR and PDN at 4.5W, on-chip VR transfer marked "?W"]
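The first three bullets can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the history length (`window`), the `busy_fraction` threshold, and the names `DomainHistory` and `decide` are all hypothetical, and the hysteresis bullet is not modeled.

```python
from collections import deque

ON_CHIP_VR_LIMIT_W = 2.0  # on-chip VR rating quoted on the slides


class DomainHistory:
    """Per-epoch power samples for one domain (processor or memory)."""

    def __init__(self, provisioned_w, window=8):  # window is an assumption
        self.provisioned_w = provisioned_w
        self.samples = deque(maxlen=window)  # one sample per 10 µs epoch

    def record(self, power_w):
        self.samples.append(power_w)

    def average(self):
        # Average of recent epochs: used for activity detection.
        return sum(self.samples) / len(self.samples)

    def spare(self):
        # MAX of recent epochs: a conservative estimate of unused power.
        return max(0.0, self.provisioned_w - max(self.samples))


def decide(proc, mem, busy_fraction=0.9):  # threshold is an assumption
    """Return watts to move to the processor (negative: to memory)."""
    if not proc.samples or not mem.samples:
        return 0.0  # no history yet, so do not snatch
    proc_busy = proc.average() >= busy_fraction * proc.provisioned_w
    mem_busy = mem.average() >= busy_fraction * mem.provisioned_w
    if proc_busy and not mem_busy:
        return min(mem.spare(), ON_CHIP_VR_LIMIT_W)
    if mem_busy and not proc_busy:
        return -min(proc.spare(), ON_CHIP_VR_LIMIT_W)
    return 0.0


proc = DomainHistory(provisioned_w=5.5)
mem = DomainHistory(provisioned_w=4.5)
for p, m in [(5.4, 2.0), (5.5, 3.0), (5.5, 2.5)]:
    proc.record(p)
    mem.record(m)
print(decide(proc, mem))  # 1.5
```

In this example the processor averages close to its 5.5W budget while memory peaked at 3.0W, so 1.5W of the 4.5W memory budget is treated as safely available. Sizing the transfer from the maximum rather than the average is what makes the policy conservative.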
Conventional Power Provisioning
• Processor provisioned for 7.5W
• Memory provisioned for 6.5W
• Total = Processor + Memory = 14W
[Figure: Processor VR and PDN at 7.5W; Memory VR and PDN at 6.5W]
Snatch: Provisioning 3D Stacks Just Right
• Processor provisioned for 7.5W - 2W = 5.5W
• Memory provisioned for 6.5W - 2W = 4.5W
• Total = Processor + Memory = 14W - 4W = 10W
• Reduce total provisioning from 14W to 10W, approx. same performance
[Figure: Processor VR at 5.5W, Processor PDN at 5.5W ±2W (peak 7.5W); on-chip VR snatches 2W; Memory VR at 4.5W, Memory PDN at 4.5W ±2W (peak 6.5W)]
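The provisioning arithmetic on this slide can be checked with a short Python sketch. The function name `snatch_provisioning` is hypothetical, and the sketch assumes each domain can borrow the full on-chip VR rating when it needs its peak.

```python
def snatch_provisioning(proc_peak_w, mem_peak_w, on_chip_vr_w):
    """Return (processor, memory, total) off-chip provisioning in watts.

    Each domain is provisioned for its peak minus what it can borrow
    through the on-chip VR (assumption: the full rating is borrowable).
    """
    proc_w = proc_peak_w - on_chip_vr_w
    mem_w = mem_peak_w - on_chip_vr_w
    return proc_w, mem_w, proc_w + mem_w


conventional_total_w = 7.5 + 6.5               # 14.0
proc_w, mem_w, snatch_total_w = snatch_provisioning(7.5, 6.5, 2.0)
print(proc_w, mem_w, snatch_total_w)           # 5.5 4.5 10.0
print(conventional_total_w - snatch_total_w)   # 4.0
```

Each domain still reaches its original peak (5.5W + 2W = 7.5W, 4.5W + 2W = 6.5W), but not both at the same time, which is why the slide describes performance as approximately the same rather than identical.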