[PPT] - The future of graphic and mobile memory for new applications August PowerPoint Presentation

SLIDE 1

August 21st, 2016 | JIN KIM | Samsung Electronics

The future of graphic and mobile memory for new applications

SLIDE 2

2/24

Disclaimer

This presentation is intended to provide information concerning the memory industry. We do our best to make sure that the information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date, or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of the information provided in this presentation. Samsung reserves the right to make improvements, corrections, and/or changes to this presentation at any time.

The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, including statements regarding Samsung Electronics' intentions, beliefs, or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative of developments in future periods.

SLIDE 3

3/24

Memory technology trend

SLIDE 5

5/24

Memory is at the core of new applications

Memory-centric computing: autonomous, artificial intelligence, computer vision, virtual reality

‒ Higher performance: x10 bandwidth (30GB/s GDDR5 vs. 256GB/s HBM2)
‒ Lower power: x0.5 power efficiency (normalized: LP3 1, LP4 0.7, LP4X 0.5)

Source: Samsung

SLIDE 6

6/24

Memory Evolution

Memory-centric system evolution

Extreme B/W, performance/power, data processing, cost-effective solutions

[Figure: system evolution over time from core clock to SoC to multi-core, reaching the memory wall; the next phase targets efficiency (performance/power, cost) for A.I., VR/MR, and vision, constrained by data traffic, cost, and thermal limits]

‒ Performance extension (high speed, noise immune, lower power) for PC/server, mobile, Gfx: DDR5/LP5/GDDR6
‒ Value, UX (customized processing, off-loading): low-cost HBM/PIM

SLIDE 7

7/24

Memory technology trend

GDDR6 with over 14Gbps, beyond 10Gbps GDDR5
LP5, 20% more power-efficient than LP4X

[Figure: performance (Gbps/pin) and power efficiency (mW/GBps, relative) from 2016 to 2020 for DDR3, DDR4, DDR5, LP3, LP4, LP4X, LP5, GDDR5, and GDDR6]

Source: ISCA2016, Samsung

SLIDE 8

8/24

High Bandwidth Memory: HBM

[Figure: HBM stack (DRAM dies on a buffer die, connected by TSVs and microbumps) beside the logic processor on a Si interposer over the PCB; photo of an 8H-stacked 20nm 8GB HBM]

HBM TSV technology, 1,024 I/O architecture

Benefits (HBM vs. GDDR5)
‒ High bandwidth: 1TB/s
‒ x2.7 performance
‒ x0.8 power efficiency

Source: Samsung

SLIDE 9

9/24

Processing In Memory: PIM

Fill the performance gap and deliver energy-efficient solutions

Processing In-Memory
‒ Better parallelism and lower bus traffic
‒ Memory off-loading for lower frequency and power

[Figure: CPU, AP, and GPU/VPU with DRAM; processing in buffer vs. processing in DRAM]

Source: Samsung

SLIDE 10

10/24

High-speed graphic technology (>10Gbps)

Graphic application requirement
Asymmetric System, Crosstalk, EQ tuning
GDDR6, Low cost HBM, PIM

SLIDE 11

11/24

High speed memory requirement

For 4K real infographic virtual reality: 13.2GB capacity and 1TB/s bandwidth needed
For 4K 3D mixed reality: +3.5GB capacity and 151GB/s bandwidth needed

[Figure: estimated graphics memory capacity (GB) and bandwidth (GB/s) for gaming and virtual reality at QHD, 4K UHD, 8K UHD, and Main H/E, plus added capacity (GB) and bandwidth (GB/s) for mixed reality]

Source: Samsung

Variable assumptions: poly count, fps, # of textures per fragment, cache hit rate, tri-linear filtering, # of virtual light sources, reflection/refraction ratio, ray bounce depth
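To show how a few of the listed variables combine into a bandwidth figure, here is a toy estimate of texture-fetch traffic reaching DRAM. It is a minimal sketch under stated assumptions, not Samsung's model: the class name, the 4-byte texels, 90fps, 4 textures per fragment, and 90% cache hit rate are all hypothetical, and overdraw, lighting, and ray bounces are ignored, so the absolute numbers sit far below the slide's estimates. What it does show is how resolution, fps, texture count, tri-linear filtering, and cache hit rate multiply together.

```python
from dataclasses import dataclass


@dataclass
class TextureTrafficModel:
    """Toy DRAM texture-bandwidth estimate; all parameter values are hypothetical."""
    width: int
    height: int
    fps: float
    textures_per_fragment: int
    cache_hit_rate: float
    bytes_per_texel: int = 4      # assumed 32-bit (RGBA8) texels
    texels_per_sample: int = 8    # tri-linear filtering: 2 mip levels x 2x2 texels

    def dram_bandwidth_gb_s(self) -> float:
        # One fragment per pixel per frame (overdraw is not modeled).
        fragments_per_s = self.width * self.height * self.fps
        requested_bytes_per_s = (fragments_per_s * self.textures_per_fragment
                                 * self.texels_per_sample * self.bytes_per_texel)
        # Only texture-cache misses turn into DRAM traffic.
        return requested_bytes_per_s * (1.0 - self.cache_hit_rate) / 1e9


qhd = TextureTrafficModel(2560, 1440, fps=90, textures_per_fragment=4, cache_hit_rate=0.9)
uhd4k = TextureTrafficModel(3840, 2160, fps=90, textures_per_fragment=4, cache_hit_rate=0.9)
print(f"QHD: {qhd.dram_bandwidth_gb_s():.1f} GB/s, 4K UHD: {uhd4k.dram_bandwidth_gb_s():.1f} GB/s")
```

In this toy model the 4K UHD traffic is exactly 2.25 times the QHD traffic (the pixel-count ratio); the slide's steeper growth comes from the additional variables it lists.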

SLIDE 12

12/24

Asymmetric system for higher data rate

Each side focuses on its own dedicated features to maximize data rate
‒ Smart GPU: training (per-bit timing/EQ) to minimize static offset/noise
‒ Noise-immune DRAM: minimizing dynamic noise (jitter, ISI/X-talk, clock duty/skew)

[Figure: GPU and DRAM interface blocks (PLL/DLL, clock phase controller, phase detector, CTLE, data Tx/Rx, DRAM core, EDC pin for calibration data; signals DQ[0:7], WCK_t/WCK_c, CK_t/CK_c, CA[0:9]); GPU-side training (timing/EQ) vs. DRAM-side noise-immune circuit/PKG against jitter, ISI, and X-talk, over board/PKG SI/PI]

Source: Samsung
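Per-bit timing training on the GPU side can be sketched as a delay sweep: for each DQ bit, step a delay tap, record pass/fail for a known pattern, and park the bit in the middle of its widest passing window. This is a minimal sketch assuming the GPU can observe pass/fail per tap (for example from calibration data returned over the EDC pin); the function names and tap counts are hypothetical, and EQ training would sweep equalizer settings in the same way.

```python
from collections.abc import Sequence


def center_of_eye(pass_fail: Sequence[bool]) -> int:
    """Return the delay tap at the center of the widest contiguous passing window."""
    best_start, best_len = -1, 0
    start = None
    for tap, ok in enumerate(list(pass_fail) + [False]):  # sentinel closes a trailing window
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            length = tap - start
            if length > best_len:  # keep the first of equally wide windows
                best_start, best_len = start, length
            start = None
    if best_len == 0:
        raise ValueError("no passing delay tap found")
    return best_start + (best_len - 1) // 2


def train_per_bit(sweep_results: dict[int, Sequence[bool]]) -> dict[int, int]:
    """Pick an independent delay tap for every DQ bit (per-bit deskew)."""
    return {dq: center_of_eye(results) for dq, results in sweep_results.items()}


# Hypothetical sweep of 8 taps for two DQ bits with different static skew.
F, T = False, True
sweep = {0: [F, F, T, T, T, T, T, F], 1: [F, F, F, F, T, T, T, T]}
print(train_per_bit(sweep))
```

Because each bit gets its own tap, a static skew between DQ lanes is removed once at training time, leaving the DRAM to deal with the dynamic noise.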

SLIDE 13

13/24

X-talk reduction for Board/PKG design

Small X-talk package: reduction of X-talk with a better return path
Crosstalk reduction with coding: 3B4B, 8B9B

[Figure: small X-talk PKG requirement (ICR) and crosstalk reduction with 3B4B encoding, compared with GDDR5]

Source: Samsung

ICR: Insertion loss to Crosstalk Ratio
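The slide does not give its 3B4B table, so the sketch below substitutes one well-known kind of crosstalk-avoidance code as an illustration: a forbidden-pattern-free (FPF) code that sends 3 data bits on 4 wires using only codewords with no "010" or "101" on adjacent wires. The specific table is hypothetical. The code checks the worst-case crosstalk class (kC) of every switching wire across all codeword transitions: uncoded 3-bit data can reach 4C, while the FPF codewords stay at 2C.

```python
WIDTH = 4


def has_forbidden_pattern(word: int, width: int = WIDTH) -> bool:
    """True if three adjacent wires hold 010 or 101 (the setups for 3C/4C crosstalk)."""
    bits = format(word, f"0{width}b")
    return "010" in bits or "101" in bits


# Hypothetical 3B4B table: the first 8 of the 10 forbidden-pattern-free 4-bit words.
ENCODE = [w for w in range(2 ** WIDTH) if not has_forbidden_pattern(w)][:8]
DECODE = {codeword: value for value, codeword in enumerate(ENCODE)}


def encode(value: int) -> int:
    return ENCODE[value]


def decode(codeword: int) -> int:
    return DECODE[codeword]


def worst_coupling(prev: int, curr: int, width: int) -> int:
    """Largest crosstalk class k (as in kC) over the wires that switch in one transition.

    Each neighbor adds |delta_i - delta_j|: 1 if it is quiet, 2 if it switches the opposite way.
    """
    delta = [((curr >> k) & 1) - ((prev >> k) & 1) for k in range(width)]
    worst = 0
    for i in range(width):
        if delta[i] == 0:
            continue  # only switching wires suffer crosstalk-induced delay
        factor = sum(abs(delta[i] - delta[j]) for j in (i - 1, i + 1) if 0 <= j < width)
        worst = max(worst, factor)
    return worst


raw = max(worst_coupling(a, b, 3) for a in range(8) for b in range(8))
coded = max(worst_coupling(a, b, WIDTH) for a in ENCODE for b in ENCODE)
print(f"worst-case crosstalk: uncoded 3 bits = {raw}C, 3B4B coded = {coded}C")
```

The cost is one extra wire per 3 bits (33% overhead); a longer code such as 8B9B trades a smaller overhead for a different set of allowed patterns.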

SLIDE 14

14/24

DFE for return-loss reduction on system

Single-ended signaling requires a noise-immune equalizer
‒ DFE* is more suitable than CTLE**

CTLE and DFE periodically calibrated by the GPU
Quarter-rate DFE with summer in sampler
‒ Adopt merged summer/sampler for fast feedback

[Figure: DRAM DQ path (RX EQ, FIFO, MUX, TX); 8GHz WCK/WCKB clock buffer divided by 2 to 4GHz, with 4-way parallel data paths]

* Decision Feedback Equalization ** Continuous Time Linear Equalization

Source: Samsung
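Why DFE suits single-ended signaling can be seen in a behavioral model: instead of boosting high frequencies (which also boosts noise, as a linear equalizer does), a DFE subtracts post-cursor ISI using bits it has already decided. The sketch assumes a hypothetical channel with a single post-cursor tap h1 = 0.4, a perfectly matched coefficient, and a fixed bit pattern; it is a full-rate model, whereas the slide describes a quarter-rate DFE with the summer merged into the sampler.

```python
def channel(bits: list[int], h1: float) -> list[float]:
    """Toy channel: each +/-1 symbol plus h1 times the previous symbol (post-cursor ISI)."""
    out, prev = [], 0.0
    for b in bits:
        s = 1.0 if b else -1.0
        out.append(s + h1 * prev)
        prev = s
    return out


def slicer_only(samples: list[float]) -> tuple[list[int], float]:
    """Decide with a fixed threshold at 0; return decisions and the worst-case eye margin."""
    decisions = [1 if y >= 0 else 0 for y in samples]
    return decisions, min(abs(y) for y in samples)


def dfe_1tap(samples: list[float], c1: float) -> tuple[list[int], float]:
    """1-tap DFE: subtract c1 times the previous decision before slicing."""
    decisions, margin, prev = [], float("inf"), 0.0
    for x in samples:
        y = x - c1 * prev            # cancel post-cursor ISI using the last decision
        d = 1 if y >= 0 else 0
        decisions.append(d)
        margin = min(margin, abs(y))
        prev = 1.0 if d else -1.0    # feed the decided symbol back
    return decisions, margin


bits = [1, 0, 1, 1, 0, 0, 1, 0]
rx = channel(bits, h1=0.4)
print(slicer_only(rx))
print(dfe_1tap(rx, c1=0.4))
```

Both receivers decide every bit correctly here, but the worst-case margin rises from about 0.6 without equalization to about 1.0 with the DFE. In hardware the coefficient has to track the real channel, which the slide handles by periodic calibration from the GPU.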

SLIDE 15

15/24

GDDR6 ideas

High-speed signaling: 14Gbps ~ 16Gbps, 1.35V
‒ Low-jitter clocking with WCK per byte, per-bit RX/TX equalizer training, X-talk reduction
‒ 2 channels with BL16; same clock/address frequency, twice the WCK/DQ frequency

[Figure: WCK clocking from GPU (GPLL, TX/RX) to noise-immune DRAM (WCK tree) for WR and RD; WCK granularity moves from word to byte; DQ at 14Gbps ~ 16Gbps with WCK at 7GHz ~ 8GHz]

Target timing
‒ GDDR5: CK 1.75GHz, CMD 1.75Gbps, ADDR 3.5Gbps, WCK 3.5GHz, DQ 7Gbps
‒ GDDR6: CK 1.75GHz, CA 3.5Gbps, WCK 7GHz, DQ 14Gbps~

Source: Samsung
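The GDDR5 and GDDR6 target timing can be cross-checked with a small model. It assumes DQ runs at twice the WCK frequency (data on both WCK edges, consistent with 7GHz WCK for 14Gbps), that GDDR5 is a single x32 interface with BL8, and that GDDR6 splits the device into two x16 channels with BL16. The channel widths come from the JEDEC device organization rather than from the slide, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicsDramConfig:
    name: str
    ck_ghz: float             # command/address clock
    wck_ghz: float            # data clock
    channels: int
    channel_width_bits: int
    burst_length: int

    @property
    def dq_gbps(self) -> float:
        # DQ is double data rate on WCK.
        return 2 * self.wck_ghz

    @property
    def access_bytes(self) -> int:
        # Bytes delivered by one burst on one channel.
        return self.channel_width_bits * self.burst_length // 8

    @property
    def device_bandwidth_gb_s(self) -> float:
        return self.dq_gbps * self.channels * self.channel_width_bits / 8


GDDR5 = GraphicsDramConfig("GDDR5", ck_ghz=1.75, wck_ghz=3.5, channels=1,
                           channel_width_bits=32, burst_length=8)
GDDR6 = GraphicsDramConfig("GDDR6", ck_ghz=1.75, wck_ghz=7.0, channels=2,
                           channel_width_bits=16, burst_length=16)

for cfg in (GDDR5, GDDR6):
    print(cfg.name, cfg.dq_gbps, "Gbps/pin,", cfg.access_bytes, "B/burst,",
          cfg.device_bandwidth_gb_s, "GB/s, WCK/CK =", cfg.wck_ghz / cfg.ck_ghz)
```

Both configurations return 32 bytes per burst, so BL16 on a x16 channel keeps the GDDR5 access granularity while the doubled WCK doubles per-device bandwidth (28GB/s to 56GB/s) at the same CK frequency.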

SLIDE 16

16/24

Low cost HBM for consumer segment

~200GB/s with fewer TSVs than HBM2

‒ Cost competitiveness: remove buffer die, reduce # of TSVs, organic interposer, etc.
‒ Need inputs from the client segment for specific features

Challenges

1. IO reduction, Smaller # of TSV
2. Remove buffer die
3. Master/Slave structure
4. Remove ECC
5. Si or organic Interposer

Comparison

                 HBM2      Low cost HBM
I/O              1024      ~512
Pin speed        2Gbps     3Gbps ~
BW (GB/s)        256       ~200
Cost/GB          1         0.X

[Figure: challenge for HBM: DRAM dies on a buffer die beside the logic processor on a Si interposer over the PCB, with challenges 1 to 5 marked]

Source: Samsung
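The bandwidth row of the comparison follows from I/O count times pin speed. A minimal sketch (the function name is hypothetical) that reproduces the HBM2 figure and shows how the low cost HBM reaches roughly 200GB/s with half the I/Os; the 4-stack line assumes the 1TB/s quoted on the HBM slide refers to four HBM2 stacks, which the slide does not state.

```python
def peak_bandwidth_gb_s(io_count: int, pin_gbps: float) -> float:
    """Peak interface bandwidth: I/O count x per-pin rate (Gb/s), divided by 8 bits per byte."""
    return io_count * pin_gbps / 8


hbm2 = peak_bandwidth_gb_s(1024, 2.0)       # one HBM2 stack
low_cost = peak_bandwidth_gb_s(512, 3.0)    # low cost HBM at the table's lower bounds
four_stacks = 4 * hbm2                       # assumed 4-stack configuration
print(hbm2, low_cost, four_stacks)
```

At exactly 3Gbps the half-width interface gives 192GB/s; raising the pin speed to 3.2Gbps gives 204.8GB/s, which is how fewer TSVs can still land near the ~200GB/s target.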

SLIDE 17

17/24

PIM, Deep Learning in DRAM

Parallel processing in buffer to reduce the extreme bandwidth requirement
‒ Convolution, subsampling, matrix calculation

Collaborate with accelerator for performance/cost

[Figure: extreme B/W requirement, from CPU + GPU + HBM/GDDRx to CPU + accelerators + xHBMs*, x10 (# of cores); processing in buffer: deep learning (convolution/subsampling) in the buffer die under the DRAM dies, for data movement reduction]

* xHBM: Extreme HBM
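The data-movement argument for processing in buffer can be sketched behaviorally: if the buffer die runs a convolution followed by 2x2 subsampling next to the DRAM, only the subsampled result crosses the memory interface. The function names, the 8x8 input, the 3x3 kernel, and the choice of max pooling are hypothetical; the slide only names convolution and subsampling as candidate operations.

```python
Matrix = list[list[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' convolution (cross-correlation form, as in most deep-learning frameworks)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """Subsampling: keep the maximum of each non-overlapping 2x2 block (odd edges dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def process_in_buffer(image: Matrix, kernel: Matrix) -> Matrix:
    """Hypothetical buffer-die operation: convolve and subsample next to the DRAM."""
    return max_pool_2x2(conv2d_valid(image, kernel))


def element_count(m: Matrix) -> int:
    return sum(len(row) for row in m)


image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]   # hypothetical 3x3 edge filter
result = process_in_buffer(image, kernel)
print(f"host-side: {element_count(image)} elements moved, "
      f"in-buffer: {element_count(result)} elements moved")
```

Here 64 input elements shrink to 9 output elements, so about 7x less data crosses the interface for this layer; the saving grows with larger subsampling factors and is what lets the accelerator keep up without a proportional bandwidth increase.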

SLIDE 18

18/24

Low-power mobile technology (>20%)

Motivation for low power mobile
LP4X / LP5
PIM

SLIDE 19

19/24

Motivation for low power mobile

PC-level graphic performance and mobile power budget
Power is continuously increasing with limited thermal budget

[Figure: power dissipation trend, power dissipation (W, log scale) vs. year ('00 to '20): static and dynamic power rise toward the thermal limit of a hand-held device, leaving a power gap to be closed by lower-power design]

[Figure: performance vs. TDP*, GPU GFLOPS vs. TDP (W) for desktop, notebook, mobile, and Oculus Rift (+GTX card) PC graphic performance]

Source: Samsung

*TDP (Thermal Design Power)

SLIDE 20

20/24

Lower power solution, LP4X

LP4X: 4266Mbps, VDDQ/VDD = 0.6V/1.1V

‒ IO power reduction with 0.6V VDDQ: a good example of a small change with a big gain

Source: Samsung

[Figure: LP4X idea, output driver and channel. LP4: VDDQ = 1.1V, VOH = VDDQ/3, VREF = VOH/2. LP4X: VDDQL = 0.6V supplied from the AP, VOH = VDDQ/2, VREF = VOH/2. Same swing and same VOH with a half-level VDDQ (1.1V to 0.6V)]

[Figure: LP4X power reduction, IO and core power of LP4 3200, LP4 3733, LP4 4266, and LP4X 4266; 18% total power saving (45% also marked)]

Conditions: IDD4R (VDDQ + VDD2) spec value, 50% data change in each burst transfer, process node contribution included
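The "small change, big gain" reasoning can be made explicit. Assuming, as the slide states, the same swing and VOH into the same termination, the driver draws the same current I = VOH / Rterm, so the power taken from VDDQ (P = VDDQ x I) scales linearly with VDDQ. The sketch below computes that ratio; reading the 45% in the chart as the IO saving, and the 40% IO share of IDD4R power used to relate it to the 18% total, are both assumptions, not values given on the slide.

```python
def io_power_ratio(vddq_new: float, vddq_old: float) -> float:
    """IO supply power ratio when VOH and termination (hence driver current) are unchanged.

    P_IO = VDDQ * I with I = VOH / Rterm fixed, so P_IO scales linearly with VDDQ.
    """
    return vddq_new / vddq_old


def total_saving(io_share: float, io_saving: float) -> float:
    """Overall saving when only the IO part (io_share of total power) is reduced."""
    return io_share * io_saving


io_saving = 1 - io_power_ratio(0.6, 1.1)
print(f"IO power saving: {io_saving:.0%}")
# io_share = 0.4 is a hypothetical IO fraction of IDD4R power, not a value from the slide.
print(f"total saving with a 40% IO share: {total_saving(0.4, io_saving):.0%}")
```

Dropping VDDQ from 1.1V to 0.6V removes about 45% of the IO supply power; the extra voltage headroom in LP4 was simply dropped across the pull-up driver.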

SLIDE 21

21/24

LP5 target & ideas

LP5: 6400Mbps, VDDQ/VDD < 0.6V/1.1V

‒ Extremely high bandwidth (~6.4Gbps) and smart power reduction (~20%)

LP5 ideas
‒ CMD-based data clock (WCK)
‒ WCK center-tap termination
‒ Deep Sleep Mode

Current reductions
‒ Over 50% IDD2N reduction
‒ Over 5% IDD4W/4R reduction
‒ Over 30% IDD6 reduction

[Figure: timing diagrams. LP4: differential CK, DQS_C/T, DQ BL0 to BL15. LP5: single-ended CK, WCK, DQ BL0 to BL15]

[Figure: power efficiency trend (mW/Gbps) for LP2, LP3, LP4, LP4X, and LP5; improvements of 20%, 35%, 39%, and 18% marked]

Source: Samsung

* Pin speed
LP2: 800Mbps ~ 1066Mbps
LP3: 1600Mbps ~ 1866Mbps
LP4: 3200Mbps ~ 3733Mbps
LP4X: 4266Mbps

SLIDE 22

22/24

PIM, Lower power processing

Memory off-loading for reduced power consumption

‒ Reduce unnecessary data transfer; frame rate control

Collaborate with SoC/AP for performance/power

‒ PoC with special memory for pre/post-processing

[Figure: memory off-loading solution. Conventional path: CIS, AP, and display over AMBA AHB, with severe data traffic under a limited power budget; proposed path: pre/post-processing in memory (recognition, distortion correction, FRC) to reduce memory B/W traffic to the AP/VPU]

SLIDE 23

23/24

Conclusion

SLIDE 24

24/24

Conclusion

Memory requirements have become more stringent over time with respect to performance, power, and cost

Keep innovating technology to meet those requirements

‒ Make efforts to extend the value of traditional memory
‒ Figure out innovative memory solutions

Close collaboration with partners is essential for delivering the right memory solution.

kjh5555@samsung.com
