Expanding the Boundaries of the AI Revolution:
An In-depth Study of High Bandwidth Memory
Nayoung Lee & Sung Lee | March 2018
Source: Stanford
Deep Neural Network
Compute = activation function applied to a multiply & accumulate: the sum of weights x input
[Figure: layers of neurons, each computing weights x input, feeding the output]
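The multiply & accumulate step can be sketched in a few lines of Python. This is a minimal illustration and not taken from the slides: the function names `mac`, `neuron`, and `layer`, and the choice of ReLU as the activation function, are assumptions made for the example.

```python
def mac(weights, inputs):
    """Multiply & accumulate: sum of weight * input over all pairs."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc


def relu(value):
    """Example activation function (an assumption; the slide names none)."""
    return max(0.0, value)


def neuron(weights, inputs, activation=relu):
    """One neuron output: activation(sum(weights x input))."""
    return activation(mac(weights, inputs))


def layer(weight_rows, inputs, activation=relu):
    """One layer: every neuron sees the same inputs with its own weights."""
    return [neuron(row, inputs, activation) for row in weight_rows]
```

Each multiply & accumulate reads one weight and one input, so the number of memory reads grows with the size of every layer.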
Simple View
[Figure: GPU Computing between MEM Read and MEM Write, annotated "Performance bottleneck"]
Δ2x Bandwidth = Δ1.7x performance
1) In-Datacenter Performance Analysis of a Tensor Processing Unit, Norm P. Jouppi et al. (Google)
GDDR/DDR/LPDDR vs. HBM
[Figure: package cross-sections. GDDR/DDR/LPDDR: molded DRAM on a PCB substrate, soldered on the PCB directly or used as DIMM type. HBM: DRAM slices stacked with TSVs over a PHY and DA ball, with side molding, next to the SoC PHY on an interposer and substrate.]
To Achieve 1TB/s Bandwidth …
160ea of DDR4-3200
40ea of DDR4-3200 Module
4ea HBM2 in a single 50mm x 50mm SiP
Note: Advil is a registered trademark
HBM provides the highest bandwidth per unit area compared to other DRAM memories
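The 160 / 40 / 4 counts can be reproduced with simple arithmetic. The sketch below is an illustration under stated assumptions, not the authors' calculation: it reads "1TB" as 1024 GB/s, treats a single DDR4-3200 device as x16 and a module as a 64-bit data bus (ECC bits ignored), and takes an HBM2 stack as 1024 I/O at 2Gbps. The function names are hypothetical.

```python
def peak_bandwidth_mb_s(io_bits, mbps_per_pin):
    """Peak bandwidth in MB/s: data pins x per-pin rate (Mbps) / 8 bits per byte."""
    return io_bits * mbps_per_pin // 8


def units_needed(target_mb_s, io_bits, mbps_per_pin):
    """Smallest number of units whose combined peak bandwidth reaches the target."""
    per_unit = peak_bandwidth_mb_s(io_bits, mbps_per_pin)
    return -(-target_mb_s // per_unit)  # integer ceiling division


TARGET_MB_S = 1024 * 1000  # "1TB" read as 1024 GB/s (assumption)

# (data width in bits, per-pin rate in Mbps); the widths are assumptions
OPTIONS = {
    "DDR4-3200 x16 device": (16, 3200),
    "DDR4-3200 module": (64, 3200),
    "HBM2 stack": (1024, 2000),
}

counts = {
    name: units_needed(TARGET_MB_S, bits, rate)
    for name, (bits, rate) in OPTIONS.items()
}

for name, count in counts.items():
    print(f"{name}: {count}")
```

Under these assumptions the script prints 160, 40, and 4. These are peak figures; sustained bandwidth in a real system would be lower.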
Spec          GDDR5(X)           HBM2
Density       8Gb x 12 = 12GB    8GB x 4 = 32GB
IO speed      8Gbps - 11Gbps     2Gbps
# of IO       384 bits           1024 x 4 = 4096 bits
Bandwidth     384 - 528GB/s      1TB/s
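The bandwidth figures above follow from the IO speed and # of IO figures. As a worked check, assuming peak bandwidth is the number of I/O times the per-pin rate, divided by 8 bits per byte (the symbols $B$, $N_{IO}$, and $R$ are introduced here for illustration only):

```latex
\begin{align*}
B &= \frac{N_{IO} \times R}{8} \\
B_{\mathrm{GDDR5(X)}} &= \frac{384 \times 8\ \mathrm{Gbps}}{8} = 384\ \mathrm{GB/s}
  \quad\text{up to}\quad \frac{384 \times 11\ \mathrm{Gbps}}{8} = 528\ \mathrm{GB/s} \\
B_{\mathrm{HBM2}} &= \frac{4096 \times 2\ \mathrm{Gbps}}{8} = 1024\ \mathrm{GB/s} \approx 1\ \mathrm{TB/s}
\end{align*}
```

HBM2 reaches the higher total with a much lower per-pin rate because it has roughly ten times as many I/O.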
HBM overcomes all DRAM bandwidth challenges
Bandwidth challenges: high bandwidth + high I/O
HBM's low speed per pin and low Cio reduce power consumption and increase power efficiency
[Chart: Power Efficiency and Power Consumption (mW/Gbps/pin)]
HBM and 2.5D integration unlock new system architectures
HPC & Server (B/W & Capacity)
Network & Graphics (B/W)
Client-DT & NB (B/W & Cost)
[Figure: per segment, a bandwidth solution combined with a second bandwidth solution, a capacity solution (post-DDR4), or a cost solution]
HBM
1) Innovative Design
2) Revolutionary Technological Features
3) Next Generation Line-up Considerations
Introduction
The HBM standard was adopted by the Joint Electron Device Engineering Council (JEDEC) in 2013, and the current 2nd generation HBM in 2016.
The total HBM (+HMC) market is expected to increase from $922.7M in 2018 to $3,842.5M by 2023, a CAGR of 33%. (Source: Research and Markets)
High bandwidth, high power efficiency, and compact form factors have propelled HBM collaboration engagements covering all IT sectors, e.g. Graphics, AI/Deep Learning, HPC, SVR, NTW Router/Switches, etc.
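The quoted growth rate can be checked from the two market figures. This is a minimal sketch using the standard compound annual growth rate formula; the function name `cagr` is introduced here for illustration, and the 5-year span assumes the period runs from 2018 to 2023.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.33 means 33%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $922.7M (2018) to $3,842.5M (2023), i.e. 5 years.
growth = cagr(922.7, 3842.5, 2023 - 2018)
print(f"CAGR: {growth:.1%}")
```

The result rounds to about 33%, consistent with the quoted figure.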
Innovative Design
[Figure: HBM stack with base die and core dies connected by TSVs; channels CH0-CH7 across SID0 and SID1]
PKG dimension: 11.87mm x 7.75mm x 0.72mm
307GB/s B/W performance
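The 307GB/s figure implies a per-pin rate. The sketch below assumes 1024 I/O per stack, taken from the 4096 I/O across four HBM2 stacks quoted earlier; the function names are hypothetical and the result is a peak figure only.

```python
IO_PER_STACK = 1024  # assumption: 4096 I/O across four stacks, as quoted earlier


def stack_bandwidth_gbytes(gbps_per_pin, io_bits=IO_PER_STACK):
    """Peak stack bandwidth in GB/s for a given per-pin rate in Gbps."""
    return io_bits * gbps_per_pin / 8


def pin_rate_gbps(bandwidth_gbytes, io_bits=IO_PER_STACK):
    """Per-pin rate in Gbps needed to reach a given peak bandwidth in GB/s."""
    return bandwidth_gbytes * 8 / io_bits


implied_rate = pin_rate_gbps(307)        # roughly 2.4 Gbps per pin
check = stack_bandwidth_gbytes(2.4)      # roughly 307.2 GB/s

print(f"implied per-pin rate: {implied_rate:.2f} Gbps")
print(f"bandwidth at 2.4 Gbps: {check:.1f} GB/s")
```

Under this assumption, 307GB/s corresponds to about 2.4Gbps per pin.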
Innovative Design
[Figure: die floorplan with pseudo channels PC0 and PC1 in channel groups CH0/1/4/5 and CH2/3/6/7; dimension labels "3mm x 6.65m 5mm" and "7mm x 8.87m 7mm"]
16 banks
Burst Length 4
Built-In Self Test
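The channel and pseudo channel labels, together with Burst Length 4, determine the size of a single access. The sketch below is an illustration under assumptions: 1024 I/O per stack split evenly across 8 channels (CH0-CH7), and each channel split evenly into 2 pseudo channels (PC0, PC1). The function name `organization` and the dictionary keys are hypothetical, and banks are not modeled.

```python
STACK_IO_BITS = 1024   # assumption: per-stack I/O count
CHANNELS = 8           # CH0..CH7
PSEUDO_CHANNELS = 2    # PC0 and PC1 per channel
BURST_LENGTH = 4       # from the slide


def organization(io_bits=STACK_IO_BITS, channels=CHANNELS,
                 pcs=PSEUDO_CHANNELS, burst_length=BURST_LENGTH):
    """Derive per-channel widths and the bytes moved by one burst."""
    channel_bits = io_bits // channels
    pc_bits = channel_bits // pcs
    return {
        "channel_bits": channel_bits,
        "pseudo_channel_bits": pc_bits,
        "bytes_per_burst": pc_bits * burst_length // 8,
        "pseudo_channels_per_stack": channels * pcs,
    }


for key, value in organization().items():
    print(f"{key}: {value}")
```

With these inputs each pseudo channel is 64 bits wide and one burst moves 32 bytes, across 16 independent pseudo channels per stack.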
Revolutionary Technical Features
Underfill, TSV Formation, Temporary Bond/Debonding, Vertical Chip Stacking, Wafer Molding
Revolutionary Technical Features
Wire Bonding vs. Through Silicon Via
Revolutionary Technical Features
Wafer-level Process Qualification      PKG-level Product Qualification
Time Dependent Dielectric Breakdown    EFR, HTOL, LTOL (Lifetime)
Hot Carrier Injection                  TC, THB, HAST, uHAST, HTS w/ Preconditioning (Environmental)
Negative Bias Temp Instability         Electrostatic Discharge
Electro Migration                      Latch-up
Stress Migration                       Package Construction Analysis
TSV, uBump Electromigration            Electrical Characterization
Revolutionary Technical Features
Type        Direction
Core Die    VDD, VSS
Base Die    VDD, VSS
TSV         VDD, VSS
T0.1% Lifetime Criteria: >> 10 years @ use condition
Revolutionary Technical Features
Method                  Target
Human Body Model        ≥ 2,000V
Charged Device Model    ≥ 500V

Method                  Target
VF-TLP* (CDM-like)      It2 ≥ ~ 1.xA
VF-TLP (CDM-like): 1.25ns
* Very Fast Transmission Line Pulse
[Figure labels: Direct Access Bump, PHY Bump]
Revolutionary Technical Features
KGSD HBM Test Flow
Core Die / Base Die: WFBI, Logic Test, Hot & Cold Test, Repair
KGSD: TSV Scan, Built-In Stress, Hot & Cold Test, Speed Test
Revolutionary Technical Features
KGSD HBM Test Coverage
Area     Type            Comment
PHY      Function Test   RD/WT, CL, BL
PHY      Margin Test     Speed, VDD, Setup/Hold Timing
TSV      Function Test   RD/WT, CL, BL, TSV interface
TSV      OS Check        TSV Open/Short Check
Logic    Function Test   IEEE1500, Function, BIST, Repair
Logic    Margin Test     VDD, Speed, Setup/Hold
Core     Function Test   RD/WT, Self Ref, Power Down
Core     Margin Test     Speed, VDD, Async, Refresh
Core     Repair          Cell Repair
Next Generation Line-up
(2.8Gbps~3Gbps may be the realistic max speed on DRAM)
Speed, Power, Density, Scaling
Next Generation Line-up
Cost Effective Solutions
TSVless Si-Interposer: Si interposer (TSVless), BEoL layer (as RDL)
2.1D SiP: HBM and logic on organic substrate (fine pitch), interconnection w/o interposer
Fan Out SiP on Sub.: HBM and logic on organic substrate, pitch RDL trace of Fan Out Package
High Speed Signal Transmission
Si Photonics in 2.5D SiP: through embedded wave guide in Si-interposer
Low Power and Small Form Factor
Heterogeneous 3D Stack: with TSV stack
[Figure labels: Sub, CPU, ROM, DRAM, SRAM, FLASH, Analog, DSP, RF Chip, MEMS, CMOS Image Sensor, Substrate]
Source: CEA-Leti
Come visit us at Booth #711 and learn more about SK hynix memory solutions