Expanding the Boundaries of the AI Revolution
Changyong Ahn & Nayoung Lee | March 2019
Outline
Machine Learning/Deep Learning Use Cases
Source: Stanford
Deep Neural Network
Simple View
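[Figure: simple view of a neuron. Each layer forms the multiply-and-accumulate sum Σ(Weights x Input), then applies the activation function to produce the output for the next layer. Weights and inputs are fetched with MEM Read; outputs are stored with MEM Write.]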
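The multiply-and-accumulate view of a neuron can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the function names, the choice of ReLU as the activation function, and the example values are all assumptions.

```python
def relu(x):
    """Example activation function (an assumption; the slide names none)."""
    return x if x > 0.0 else 0.0


def neuron_output(weights, inputs, bias=0.0, activation=relu):
    """Multiply-and-accumulate over weights x inputs, then apply the activation."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = bias
    for w, x in zip(weights, inputs):
        acc += w * x  # one multiply-accumulate; each weight costs a memory read
    return activation(acc)  # the result is written back as input to the next layer


def layer_output(weight_rows, inputs):
    """One fully connected layer: one neuron per row of weights."""
    return [neuron_output(row, inputs) for row in weight_rows]


# Hypothetical 2-input, 2-neuron layer.
print(layer_output([[1.0, 2.0], [-1.0, 0.5]], [3.0, 4.0]))
```

Every weight is read once per input vector, which is why the memory traffic of a layer grows with its parameter count.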
Year  CNN        # of layers  # of parameters  Memory size (MB)  Top-5 error rate
1998  LeNet      8            60K              -                 -
2012  AlexNet    7            60 million       240               15.3%
2014  GoogLeNet  19           4 million        -                 6.67%
2014  VGG Net    16           138 million      574               7.3%
2015  ResNet     50/152       -                519               3.6%
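The memory sizes in the table follow roughly from the parameter counts. The sketch below assumes 4 bytes per parameter (32-bit floating-point weights) and 1 MB = 10^6 bytes; neither assumption is stated in the slides, and the function name is hypothetical.

```python
BYTES_PER_PARAM_FP32 = 4  # assumption: weights stored as 32-bit floats


def weight_memory_mb(num_params, bytes_per_param=BYTES_PER_PARAM_FP32):
    """Storage needed for the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# Parameter counts taken from the table above.
for name, params in [("AlexNet", 60_000_000), ("VGG Net", 138_000_000)]:
    print(f"{name}: {weight_memory_mb(params):.0f} MB")
```

Under these assumptions AlexNet comes out at 240 MB, matching the table, while VGG Net comes out at 552 MB against the 574 MB listed, so the table may use a slightly different parameter count or unit convention. Activations and training state add to these figures.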
Deep Neural Network Fundamental Concepts
Memory Challenges of Deep Learning
Memory Solution for ML/DL Systems
* Source: SK hynix
Memory Subsystem Hierarchy Change
Item                  Conventional DRAM      IPM
Target Market/Price   Broad & Cheap          Specific & High Premium
Standardization       JEDEC                  Semi Custom
Qualification Period  Relatively short       Relatively long
Key factors           Price Competitiveness  Reliability / Performance
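[Figure: memory hierarchy before and after the change.]
- Conventional hierarchy: LLC, DRAM, Fast Storage (SSD), HDD
- New hierarchy: LLC, IPM, DRAM, Storage Class Memory, Fast Storage (SSD), HDD
- New tiers: 1) In-Package Memory ("IPM"); 2) SCM ("Storage Class Memory"): 3DXP, PCRAM
- Latency scale, fastest to slowest tier: <10ns, 50ns, 100ns to 1us, 50-100us, ~10ms
- In-Package Memory labels: TSV Region, Interposer, Substrate ("PCB")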
HBM, What’s the difference?
GDDR/DDR/LPDDR
- FBGA
- Directly soldered on PCB or used as a DIMM
[Figure: FBGA package cross-section. Labels: Mold, DRAM, DRAM, PCB Substrate]
HBM
- KGSD
- HBM in 2.5D SiP
[Figure: HBM in a 2.5D SiP. Labels: four stacked DRAM Slices, TSV, PHY, DA ball, Side Molding; SoC with PHY; Interposer; Substrate]
HBM Advantages
- More Bandwidth
- High Power Efficiency
- Small Form Factor

Memory         Data rate                    Pin count             Bandwidth       Density (per package)
DDR4           3200Mbps                     x4/x8/x16             5.4GB/s         4Gb/8Gb
LPDDR4(X)      3200Mbps (up to 4266Mbps)    x16/ch (2ch per die)  12.8(17)GB/s    8Gb/16Gb/24Gb/32Gb
GDDR6          14Gbps (up to 16Gbps)        x16/x32               56GB/s          8Gb/16Gb
HBM2           2.4Gbps                      x1024                 307GB/s         4GB/8GB
HBM2E (JEDEC)  2.8Gbps                      x1024                 358GB/s         8GB/16GB
HBM3 (TBD)     >3.2Gbps (TBD)               x1024                 >500GB/s        8GB/16GB/24GB (TBD)
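The bandwidth row can be reproduced from the data rate and pin count rows as data rate per pin times I/O width, divided by 8 bits per byte. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the widths chosen for LPDDR4 (two x16 channels) and GDDR6 (x32) are the ones that reproduce the table's figures.

```python
def peak_bandwidth_gb_s(data_rate_gbps, io_width_bits):
    """Peak bandwidth in GB/s: per-pin data rate (Gbps) x I/O width / 8."""
    return data_rate_gbps * io_width_bits / 8


# (data rate in Gbps per pin, I/O width in bits); widths are assumptions
# picked from the pin count row of the table.
examples = {
    "LPDDR4 (2 x16 channels)": (3.2, 32),
    "GDDR6 (x32)": (14.0, 32),
    "HBM2 (x1024)": (2.4, 1024),
    "HBM2E (x1024)": (2.8, 1024),
}

for name, (rate, width) in examples.items():
    print(f"{name}: {peak_bandwidth_gb_s(rate, width):.1f} GB/s")
```

The HBM figures come from the very wide x1024 interface rather than a high per-pin rate. For a DDR4 x16 device at 3200Mbps the same formula gives 6.4GB/s rather than the 5.4GB/s listed, so that entry may rest on a different assumption.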
To Achieve 1TB/s Bandwidth
- 160ea of DDR4-3200
- 40ea of DDR4-3200 Module
- 4ea HBM2 in a single 50mm x 50mm SiP
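These device counts can be checked with a short calculation. The sketch below is an illustration under assumptions not stated in the slides: 1TB/s is taken as 1024GB/s, the DDR4 device is x16, the DDR4 module has a 64-bit data bus, and HBM2 runs at 2.4Gbps over 1024 I/Os. The function names are hypothetical, and integer arithmetic in MB/s avoids floating-point rounding.

```python
def peak_bandwidth_mb_s(data_rate_mbps, io_width_bits):
    """Peak bandwidth in MB/s: per-pin data rate (Mbps) x I/O width / 8."""
    return data_rate_mbps * io_width_bits // 8


def devices_needed(target_mb_s, device_mb_s):
    """Smallest number of devices whose combined bandwidth meets the target."""
    return -(-target_mb_s // device_mb_s)  # ceiling division on integers


TARGET_MB_S = 1_024_000  # assumption: 1 TB/s taken as 1024 GB/s

options = {
    "DDR4-3200 x16 device": peak_bandwidth_mb_s(3200, 16),
    "DDR4-3200 module (64-bit)": peak_bandwidth_mb_s(3200, 64),
    "HBM2 stack (2.4 Gbps, x1024)": peak_bandwidth_mb_s(2400, 1024),
}

for name, bandwidth in options.items():
    print(f"{name}: {devices_needed(TARGET_MB_S, bandwidth)} needed")
```

Under these assumptions the sketch yields 160 devices, 40 modules, and 4 HBM2 stacks, in line with the counts on the slide.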
Note: Advil is a registered trademark
HBM Architecture
The HBM2 core die supports 4 pseudo channels or 2 channels. Each channel consists of 2 pseudo channels. Only BL4 is supported.
Items               Target
# of Stack          4/8 (Core) + 1 (Base)
Ch./Slice           2
Total Ch. for KGSD  8/16 (8ch based operation)
IO/Ch.              128
Total IO/KGSD       1024 (=128 x 8)
Address/CMD         Dual CMD
Data Rate           DDR
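The arithmetic behind these targets can be laid out as a small model. This is a sketch for a 4-high stack only; the class and attribute names are hypothetical, and the 8-high case is left out because the table states that operation is based on 8 channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hbm2Stack:
    """Organization of a 4-high HBM2 stack, using the targets listed above."""
    core_dies: int = 4                    # 4 core dies + 1 base die
    channels_per_die: int = 2             # Ch./Slice
    io_per_channel: int = 128             # IO/Ch.
    pseudo_channels_per_channel: int = 2
    burst_length: int = 4                 # only BL4 is supported

    @property
    def channels(self) -> int:
        return self.core_dies * self.channels_per_die

    @property
    def total_io(self) -> int:
        return self.channels * self.io_per_channel

    @property
    def pseudo_channels(self) -> int:
        return self.channels * self.pseudo_channels_per_channel

    @property
    def io_per_pseudo_channel(self) -> int:
        return self.io_per_channel // self.pseudo_channels_per_channel

    @property
    def bytes_per_burst(self) -> int:
        """Data moved by one BL4 burst on one pseudo channel."""
        return self.io_per_pseudo_channel * self.burst_length // 8


stack = Hbm2Stack()
print(stack.channels, stack.total_io, stack.pseudo_channels,
      stack.io_per_pseudo_channel, stack.bytes_per_burst)
```

With the defaults this gives 8 channels and 1024 I/Os in total, and each of the 16 pseudo channels has 64 I/Os, so one BL4 burst moves 32 bytes.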
[Figure: core die channel layout. Labels: CH-A, CH-B; banks B0 to B7; 64 I/O, ADD/CMD, 64 I/O]
Next-Gen. System Architecture Leveraging HBM
[Figure: memory combinations by market segment. Labels: HBM, Post-DDR4]
- HPC & Server (B/W & Capacity): Bandwidth Solution + Capacity Solution (Post-DDR4)
- Network & Graphics (B/W): Bandwidth Solution + Bandwidth Solution
- Client-DT & NB (B/W & Cost): Bandwidth Solution + Cost Solution
HBM and 2.5D SiP integration unlock new system architectures
HBM Test Flow
[Figure: General DRAM Test Flow compared with HBM Test Flow]
Quality and Reliability Features
- PMBIST
- Cell Repair
- BISS (Built In Self Stress)
- Microbump Repair
- Error Correcting Code Storage
[Figure: HBM (DRAM dies stacked on a logic die, with TSV, micro bump, and PHY) and SoC (PHY) on an interposer over the package substrate, with callouts 1 to 6; a proxy package is also shown.]
HBM features enable high quality and reliability after 2.5D assembly
Collaterals Available from HBM vendors
Item                          Remarks
Functionality                 Datasheet (JEDEC/Vendor), Verilog (mission mode and DFT), IBIS, HSPICE
Mechanical/Interposer design  GDS, Bump pad netlist, Bump Ballout
Thermal Simulation            FloTHERM, Icepak
Future of HBM Solution
HBM is expected to penetrate various market segments in the near future.
[Figure: HBM roadmap from HBM1 through HBM2 to HBM3. Trend: Expansion to various Applications, Moving to Volume Market.]
- Generations: HBM1, HBM2, HBM3
- Market segments: GFX, NTW, HPC, SVR, Client PC, Consumer, Automotive
- Spec labels, in slide order: 128GBps, 1GB; 256GBps, 1G/2G/4G/8GB, 105C, ECC; 95C; 512GBps, 2G/4G/8G/16GB, 105C, ECC/In-DRAM ECC, Optimized Base die, Low Cost Ver.