Flash Memory Summit 2017 Santa Clara, CA 1
OpenCAPI
Open Coherent Accelerator Processor Interface
OpenCAPI™ Overview
Accelerated Computing and High Performance Bus
Computation Data Access
1. High performance coherent bus needed
in/out very quickly
coherency and performance attributes
with I/O or Accelerator devices
2. These challenges must be addressed in an open architecture allowing full industry participation
to block
memory paradigms
(e.g., compute-in-memory)
Where are we coming from
CAPI Technology Unlocks the Next Level of Performance for Flash
Identical hardware with 3 different paths to data
Paths: Conventional I/O (FC); Legacy CAPI 1.0 – External Flash Drawer; Legacy CAPI 1.0 - Integrated Card
Hardware: FlashSystem, IBM POWER S822L
IBM's Legacy CAPI 1.0 NVMe Flash Accelerator is almost 5X more efficient in performing IO vs traditional storage.
[Chart: Relative CAPI vs. NVMe Instruction Counts per IO. Categories: CAPI NVMe, Traditional NVMe, Traditional Storage IO. Series: Kernel Instructions, User Instructions. Data labels: 21%, 35%, 56%, 100%.]
Legacy CAPI 1.0-accelerated NVMe Flash can issue 3.7X more IOs per CPU thread than regular NVMe flash.
Improves scaling and resiliency
Caching with persistent data frames
New solutions via large scaling
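The "almost 5X" efficiency figure can be sanity-checked against the chart's data labels. The sketch below assumes that 100% is the traditional-storage baseline and 21% is the CAPI NVMe total; the slide text does not state this mapping, so treat it as an illustration rather than the presenters' own calculation.

```python
# Relative instruction counts per IO, taken from the chart's data labels.
# ASSUMPTION: 100% is the traditional-storage baseline and 21% is the
# CAPI NVMe total. The slide does not spell out this mapping.
BASELINE_PCT = 100.0
CAPI_NVME_PCT = 21.0


def efficiency_gain(baseline_pct: float, accelerated_pct: float) -> float:
    """Return how many times fewer instructions per IO the accelerated path needs."""
    if accelerated_pct <= 0:
        raise ValueError("accelerated_pct must be positive")
    return baseline_pct / accelerated_pct


gain = efficiency_gain(BASELINE_PCT, CAPI_NVME_PCT)
print(f"{gain:.2f}x fewer instructions per IO")
```

Under that assumption the ratio comes out a little under five, which is consistent with the wording "almost 5X".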
Needle-in-a-haystack Engine
Main Memory
Example: Basic DDR attach
Processor Chip, DLx/TLx, Data, DDR4/5

Emerging Storage Class Memory
Processor Chip, DLx/TLx, Data, SCM

Tiered Memory
Processor Chip, DLx/TLx, Data, SCM, DDR4/5
OpenCAPI WINS due to Bandwidth, best of breed latency, and flexibility of an Open architecture
JOIN TODAY! www.opencapi.org
Egress Transform (Processor Chip, DLx/TLx, Acc, Data)
Examples: Encryption, Compression, Erasure prior to network or storage

Bi-Directional Transform (Processor Chip, TLx/DLx, Acc, Data, Acc)
Examples: NoSQL such as Neo4J with Graph Node Traversals

Memory Transform (Processor Chip, DLx/TLx, Acc, Data)
Example: Basic work offload
Examples: Machine or Deep Learning potentially using OpenCAPI attached memory

Needle-In-A-Haystack Engine (Processor Chip, DLx/TLx, Acc, Needles, Haystack Data)
Examples: Database searches, joins, intersections, merges

Ingress Transform (Processor Chip, DLx/TLx, Acc, Data)
Examples: Video Analytics, HFT, VPN/IPsec/SSL, Deep Packet Inspection, Data Plane Accelerator (DPA), Video Encoding (H.265), etc
Allan Cantle – CTO & Founder Nallatech a.cantle@Nallatech.com
Server qualified accelerator cards featuring FPGAs, network I/O and an open architecture software/firmware framework. Design Services/Application Optimisation
Data Centric Architectures - Fundamental Principles
Our guiding light……….
& the CPU core can often be effectively free!
Data Center Architectures, Blending Evolutionary with Revolutionary
[Diagram: three nodes, each with OpenCAPI, FPGA, SCM / Flash, CPU and Memory, divided into Existing DataCenter Infrastructure and Emerging Data Centric Enhancements.]
Nallatech HyperConverged & Disaggregatable Server
200GBytes/s 200GBytes/s 170GB/s 170GB/s 4x OpenCAPI Channels 200GBytes/s
Reconfigurable Hardware Dataplane, Flash Storage Accelerator – FSA
128GByte RDIMM DDR4 Memory @ 2400MTPS
PCIe Gen 3 Switch
Zynq US+
ZU19EG FFVC1760
8GByte DDR4
PCIe G2 x 4 Control Plane Interface
x72
X8
X72
SlimSAS Connector
PCIe x16 G3
2x 100GbE QSFP28, each X4
8x M.2 22110 SSD
OpenCAPI Interface
PCIe x16 G3
8x PCIe x4 G3 128GByte DDR4 RDIMM
GenZ Data Plane I/O
x72
MPSoC
*PFD = Peak Full Duplex
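The link labels in the FSA diagram can be turned into rough peak-bandwidth numbers. The sketch below uses publicly known raw rates for PCIe Gen 3, DDR4 and Ethernet. The OpenCAPI lane count and per-lane rate are not stated in the slides and are labeled as assumptions in the code; all function names are illustrative.

```python
# Back-of-envelope peak bandwidths for interfaces named on the FSA slide.
# Raw line rates only; protocol overhead and real workloads reduce these.


def pcie_gen3_gbytes_per_s(lanes: int) -> float:
    """Per-direction PCIe Gen 3 bandwidth: 8 GT/s per lane, 128b/130b encoding."""
    return 8.0 * lanes * (128 / 130) / 8


def ddr4_gbytes_per_s(mtps: int, data_bits: int = 64) -> float:
    """Peak DDR4 bandwidth for a given transfer rate in MT/s.

    ASSUMPTION: the x72 bus in the diagram is 64 data bits plus 8 ECC bits,
    so only 64 bits carry payload.
    """
    return mtps * (data_bits / 8) / 1000


def ethernet_gbytes_per_s(gbits_per_s: float) -> float:
    """Per-direction Ethernet line rate converted to GBytes/s."""
    return gbits_per_s / 8


def opencapi_pfd_gbytes_per_s(channels: int, lanes: int = 8,
                              gbits_per_lane: float = 25.0) -> float:
    """Peak full duplex (PFD) bandwidth summed over all channels.

    ASSUMPTION: 8 lanes per channel at 25 Gbit/s per lane; the slides give
    only the channel count and the aggregate figure.
    """
    per_direction = channels * lanes * gbits_per_lane / 8
    return 2 * per_direction


estimates = {
    "PCIe x16 G3 (per direction)": pcie_gen3_gbytes_per_s(16),
    "8x PCIe x4 G3 (per direction)": 8 * pcie_gen3_gbytes_per_s(4),
    "DDR4 RDIMM @ 2400MTPS": ddr4_gbytes_per_s(2400),
    "2x 100GbE (per direction)": 2 * ethernet_gbytes_per_s(100),
    "4x OpenCAPI channels (PFD)": opencapi_pfd_gbytes_per_s(4),
}
for name, gbytes in estimates.items():
    print(f"{name}: {gbytes:.1f} GBytes/s")
```

With the assumed 8 lanes at 25 Gbit/s, four OpenCAPI channels give 200 GBytes/s peak full duplex, which lines up with the 200GBytes/s label on the server slide.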
you can help to decide on big topics, like whether separate Control and Data Planes are better than converged ones