  1. Pivotal Memory Technologies Enabling New Generation of AI Workloads Tien Shiah Memory Product Marketing Samsung Semiconductor Inc.

  2. Legal Disclaimer
     This presentation is intended to provide information concerning the memory industry. We do our best to make sure that the information presented is accurate and fully up to date. However, the presentation may be subject to technical inaccuracies, out-of-date information, or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of the information provided in this presentation. The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, including statements regarding Samsung Electronics' intentions, beliefs, or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or in the oral statements is shown to be accurate, those developments may not be indicative of developments in future periods.

  3. Applications drive Changes in Architectures
     [Diagram: computing waves – PC era (MS-DOS), 1st wave (Internet), 2nd wave (Mobile), 3rd wave (AI), and a 4th wave now – with architectures shifting from CPU-centric to data-centric, and processors evolving from x86 to non-x86 apps processors & platforms, GPU/TPU processors, and FPGAs.]

  4. Artificial Intelligence → MAINSTREAM
     • Speech and natural language, powered by deep learning
     • Smart assistants: Amazon Echo & Alexa, Google smart home devices, Siri & Cortana
     • Genomics screening, prediction, game theory
     • Image/facial recognition, autonomous driving

  5. AI – What has Changed?
     Deep learning algorithms require high memory bandwidth. [Figures omitted; sources: Tuples Edu, buzzrobot.com; Nvidia, FMS 2017.]

  6. Faster Computation → Multi-core
     High-performance compute requires high memory bandwidth.

  7. Memory Bandwidth Comparison of HBM, GDDR, and DDR*
     [Chart: memory bandwidth (GB/s) vs. year, 2000–2020 – DDR and GDDR (GDDR5, GDDR6) remain well below 1,000 GB/s, while HBM1, HBM2, HBM2E, and HBM3 climb toward the 2,000+ GB/s range.]
     * Based on high-performance configurations

  8. HBM: High Bandwidth Memory
     • Stacked MPGA (micro-pillar grid array) memory solution for high-performance applications
     • Samsung launched HBM2 in Q1 2016
     • Uses DDR4 die with TSVs (through-silicon vias)
     • Available in 4H or 8H stacks
     • Key features:
       – 1024 I/Os (8 channels, 128 bits per channel)
       – Per stack: 307 GB/s (current generation) – 77X the speed of a PCIe 3.0 x4 slot, or 77 HD movies transferred per second (see the arithmetic sketch below)
     ** Announced HBM2E: +33% throughput (410 GB/s), 2X density (16 GB stack) **
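A quick back-of-the-envelope check on those figures, as a minimal Python sketch – the 2.4 Gbps per-pin rate and the ~0.985 GB/s-per-lane PCIe 3.0 figure are assumptions, not values stated on the slide:

```python
# Rough bandwidth arithmetic for one HBM2 stack vs. a PCIe 3.0 x4 slot.
# Assumed values (not from the slide): 2.4 Gbps per pin for HBM2,
# ~0.985 GB/s of usable bandwidth per PCIe 3.0 lane.
HBM2_PINS = 1024             # 8 channels x 128 bits each
HBM2_GBPS_PER_PIN = 2.4      # assumed per-pin data rate

stack_gbs = HBM2_PINS * HBM2_GBPS_PER_PIN / 8     # bits -> bytes
print(f"HBM2 stack: {stack_gbs:.0f} GB/s")        # ~307 GB/s

PCIE3_X4_GBS = 4 * 0.985                          # assumed usable GB/s for x4
print(f"PCIe 3.0 x4: {PCIE3_X4_GBS:.2f} GB/s")
print(f"Ratio: {stack_gbs / PCIE3_X4_GBS:.0f}x")  # ~78x, close to the slide's 77x
```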

  9. HBM Basics: 2.5D System in Package
     • A typical HBM SiP consists of a processor (or ASIC) and one or more HBM stacks mounted on a silicon interposer
     • The HBM consists of 4 or 8 DRAM die mounted on a buffer die
     [Diagram: SiP (system in package) cross-section – the core DRAM die stack (C1–C4) on a buffer die (B) forms the HBM stack, which sits beside the processor on the Si interposer atop the package PCB. Samsung manufactures and sells the HBM stack.]
     • The entire system (processor + HBM stack + Si interposer) is encapsulated into one larger package by the customer

  10. MPGA: Micro-Pillar Grid Array
      [Diagram: four-high (4H) and eight-high (8H) stacks, each ~720 µm tall.]

  11. Not just about speed: Space Efficiency
      Real-estate savings (see the sanity check below):
      • GDDR5: density 1 GB x 12 = 12 GB; speed/pin 1 GB/s; pin count 384; B/W 384 GB/s
      • HBM2E: density 16 GB x 4 = 64 GB; speed/pin 0.4 GB/s; pin count 4096; B/W 1,640 GB/s
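Both bandwidth figures follow directly from pin count × per-pin speed; a minimal sanity check in Python, pairing each of the slide's pin counts with the configuration whose bandwidth it reproduces:

```python
# Aggregate bandwidth = pin count x per-pin speed.
for name, pins, gb_per_pin in [("GDDR5 (12 chips)", 384, 1.0),
                               ("HBM2E (4 stacks)", 4096, 0.4)]:
    print(f"{name}: {pins} pins x {gb_per_pin} GB/s/pin = {pins * gb_per_pin:,.0f} GB/s")
# GDDR5: 384 GB/s; HBM2E: ~1,638 GB/s (the slide rounds to 1,640)
```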

  12. AI: Compute vs. Memory Constrained
      Roofline model for the TPU ASIC:
      • A point below the sloped part of the roofline = memory-bandwidth constrained
      • A point below the horizontal part = compute constrained
      Neural network types (characteristic / use case):
      • MLP: structured input features / ranking
      • CNN: spatial processing / image recognition
      • RNN: sequence processing / language translation (* LSTM, long short-term memory, is a subset of RNN)
      Many deep learning applications are MEMORY bandwidth constrained → need high-bandwidth memory (see the sketch below). Source: Google, ISCA 2017
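The roofline bound the slide refers to can be written as attainable throughput = min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch, using illustrative accelerator numbers rather than anything from the slide:

```python
# Roofline model: performance is capped either by peak compute (the
# horizontal roof) or by memory bandwidth x arithmetic intensity
# (FLOPs performed per byte moved; the sloped roof).
def roofline_tflops(peak_tflops, bw_gbs, flops_per_byte):
    # bw_gbs * flops_per_byte is in GFLOPS; divide by 1000 for TFLOPS
    return min(peak_tflops, bw_gbs * flops_per_byte / 1000)

# Illustrative numbers only: a 90 TFLOPS accelerator with 900 GB/s of memory bandwidth.
for intensity in [10, 50, 100, 200]:
    t = roofline_tflops(90.0, 900.0, intensity)
    bound = "memory-bandwidth bound" if t < 90.0 else "compute bound"
    print(f"{intensity:>3} FLOP/byte -> {t:5.1f} TFLOPS ({bound})")
```

Workloads with low arithmetic intensity, like the MLPs and RNNs above, sit on the sloped part of the roof, which is why extra memory bandwidth translates directly into delivered TFLOPS.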

  13. Memory Drives AI Performance
      • Faster training → more bandwidth
      • Better accuracy → more capacity
      [Chart: memory allocation size (GB) and required memory bandwidth (GB/s) vs. compute, from 5.2 TFLOPS / 2,880 cores (K110) through 7 / 3,072 (M200), 10 / 3,584 (P100), and 15 / 5,120 (V100) to projected 23 / 7,680 and 38 / 11,520 parts, as networks deepen from 10 to 410+ layers; 4H HBM2 provides 16 GB per stack and 8H HBM2E 32 GB.]

  14. HBM Presence – Some Examples
      • Datacenter (acceleration, AI/ML): Radeon Instinct MI25 and Project 47 (AMD); Tesla P100/V100, DGX Station, DGX-1, DGX-2, GPU Cloud, and Titan V (Nvidia); Nervana Neural Net Processor and Stratix 10 MX FPGA (Intel); Cloud TPU for training & inference (Google) – TPU2: 4 ASICs, 64 GB HBM2; TPU Pod: 4 TB HBM2
      • Professional visualization (VR content creation, graphics rendering, engineering/construction, education): Radeon Pro WX, SSG, Vega architecture; Quadro GP100, GV100
      • Consumer graphics (gaming, AR/VR): Radeon RX Vega 64 and Vega 56; Kaby Lake-G CPU/GPU hybrid with high-end graphics in thin-and-light notebooks and extended battery life
      • Use cases and markets: traffic sign recognition, image synthesis, object classification, model conversion; AI cities, healthcare, retail, robotics, autonomous cars, manufacturing, media & entertainment
      Sources: Tom's Hardware, AnandTech, PC World, Trusted Reviews

  15. HBM2: Market Outlook
      • Bandwidth needs of high-performance computing/AI, high-end graphics, and new applications continue to expand
      • Bandwidth and TAM for HBM are growing rapidly: 179 GB/s → 256 GB/s → 307 GB/s (HBM2) → 410 GB/s (HBM2E) → 512 GB/s (HBM3) over 2016–202X
      • HBM adoption started with HPC, expanding into other markets: HPC/AI, then networking, VGA, and others
      Source: Samsung

  16. AI Inference: GDDR6
      • Inference is less computationally and memory intensive than AI training
      • GDDR6 is a good option – double the bandwidth of GDDR5
      • Up to 16 Gbps per pin → 64 GB/s per device (see the check below)
      • Samsung is first to market with 16Gb GDDR6
      • In the market: Nvidia T4 cards with 16 GB GDDR6, offered in AWS G4 inference instances
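The 64 GB/s figure is simply the per-pin rate times the device interface width; a minimal check, assuming the typical x32 GDDR6 device interface (the width is not stated on the slide):

```python
# GDDR6 per-device bandwidth = per-pin rate x interface width.
GBPS_PER_PIN = 16       # from the slide
PINS_PER_DEVICE = 32    # assumed x32 device interface (typical for GDDR6)

gbs_per_device = GBPS_PER_PIN * PINS_PER_DEVICE / 8   # bits -> bytes
print(f"{gbs_per_device:.0f} GB/s per device")        # 64 GB/s, matching the slide
```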

  17. Foundry Services
      • Latest process nodes, testing, packaging, and design services
      • Worldwide partners complement solutions with IP and EDA tools
      [Diagram: packaging portfolio – mobile packages (PoP, wire-bond FBGA, side-by-side wire-bond stacks, WLP, BOC, FOPLP-PoP, fine-pitch flexible packages); AI/server/HPC packages (HBM + logic on a Si interposer, panel RDL interposer, RDL-interposer, 3D SiP, FO-SiP); core technologies (TSV 3-stacked DRAM, 4H HBM stacking, CIS-CoW, wafer grinding/thinning, large-chip bonding, Si interposer and thermal/PSI simulation, mechanical warpage control).]

  18. Summary
      • AI workloads rely on deep learning algorithms that are memory-bandwidth constrained
      • HBM has become the memory of choice for AI training applications in the data center
      • GDDR6 provides an "off-the-shelf" alternative for AI inference workloads
      Make the smart choice: AI hardware powered by these technologies.

  19. Thank You… Contact: t.shiah@Samsung.com
