OpenCAPI™ Overview
Open Coherent Accelerator Processor Interface
Flash Memory Summit 2017, Santa Clara, CA


SLIDE 1

OpenCAPI™ Overview
Open Coherent Accelerator Processor Interface
Flash Memory Summit 2017

SLIDE 2

Accelerated Computing and High Performance Bus

  • Attributes driving Accelerators
    - Emergence of complex storage and memory solutions
    - Introduction of device coherency requirements (IBM’s introduction in 2013)
    - Growing demand for network performance
    - Various form factors (e.g., GPUs, FPGAs, ASICs)
  • Driving factors for a high performance bus: consider the environment
    - Increased industry dependence on hardware acceleration for performance
    - Hyperscale datacenters and HPC are driving the need for much higher network bandwidth
    - Deep learning and HPC require more bandwidth between accelerators and memory
    - New memory/storage technologies are increasing the need for bandwidth with low latency

[Figure: Computation and Data Access]

SLIDE 3

Two Bus Challenges

1. A high performance coherent bus is needed

  • Hardware acceleration will become commonplace, but…
  • If you are going to use advanced memory/storage technology and accelerators, you need to get data in and out very quickly
  • Today’s system interfaces are insufficient to address this requirement
  • Systems must be able to integrate multiple memory technologies with different access methods, coherency, and performance attributes
  • Traditional I/O architecture results in very high CPU overhead when applications communicate with I/O or accelerator devices

2. These challenges must be addressed in an open architecture allowing full industry participation

  • Architecture agnostic, to enable ecosystem growth and adoption
  • Establish a sufficient volume base to drive cost down
  • Support a broad ecosystem of software and attached devices
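The CPU-overhead point above can be sketched in miniature: a block-I/O path pays a syscall (and, on real hardware, an interrupt and a driver pass) on every access, while a coherent load/store path touches device memory directly. A minimal illustrative Python sketch, my own and not from the slides, with `mmap` standing in for coherently attached device memory:

```python
# Illustration only: block I/O (syscall per access) vs. load/store access
# (map once, then plain memory reads). mmap here stands in for the kind of
# direct load/store access a coherent bus like OpenCAPI gives to devices.
import mmap
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.close(fd0)
with open(path, "wb") as f:  # a file plays the role of the "device"
    f.write(b"\x00" * 4096 + b"needle" + b"\x00" * 4090)

fd = os.open(path, os.O_RDONLY)

# Traditional block I/O: every access is a syscall into the kernel stack,
# so per-IO CPU overhead is high.
block = os.pread(fd, 4096, 4096)          # read one 4 KiB block
needle_via_io = block[:6]

# Load/store style: map once, then touch exactly the bytes you want with
# no syscall on the access path.
view = mmap.mmap(fd, 0, prot=mmap.PROT_READ)   # Unix-only prot flag
needle_via_ls = bytes(view[4096:4102])

assert needle_via_io == needle_via_ls == b"needle"
view.close()
os.close(fd)
os.remove(path)
```

The sketch only shows the programming-model difference; the slides' point is that the syscall-per-IO path also costs interrupts and kernel instructions that a coherent attach avoids.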
SLIDE 4

OpenCAPI Advantages for Storage Class Memories

  • Open standard interface enables attaching a wide range of devices
  • Ability to support a wide range of access models, from byte-addressable load/store to block
  • Extreme bandwidth beyond classical storage interfaces
  • The OpenCAPI Home Agent Memory feature is geared specifically for storage class memory paradigms
  • Agnostic interface allows extension to evolving memory technologies in the future (e.g., compute-in-memory)
  • Common physical interface between non-memory and memory devices
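One reason the byte-addressable end of that access-model spectrum matters for storage class memory is access granularity: a block interface moves a whole block even when the application only wants a few bytes. A back-of-envelope sketch with typical sizes (my own assumed numbers, not from the slide):

```python
# Illustration: read amplification of block access vs. byte-addressable
# load/store access for a small read. Sizes are typical assumptions.
want = 8                 # bytes the application actually needs
block_size = 4096        # typical NVMe block
cache_line = 64          # typical CPU cache line, the load/store transfer unit

block_amplification = block_size / want        # data moved via block path
loadstore_amplification = cache_line / want    # data moved via load/store

# The block path moves 64x more data for this access pattern.
assert block_amplification / loadstore_amplification == block_size / cache_line
print(f"block path moves {block_amplification:.0f}x the requested data; "
      f"load/store moves {loadstore_amplification:.0f}x")
```

The exact ratio depends on the workload; the point is that byte-addressable access lets SCM be used like memory rather than like a disk.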
SLIDE 5

Where are we coming from today?
CAPI Technology Unlocks the Next Level of Performance for Flash

Identical hardware with 3 different paths to data

FlashSystem paths: Conventional I/O (FC); Legacy CAPI 1.0 with External Flash Drawer; Legacy CAPI 1.0 with Integrated Card (IBM POWER S822L)

IBM's Legacy CAPI 1.0 NVMe Flash Accelerator is almost 5X more efficient at performing IO than traditional storage.

Relative CAPI vs. NVMe instruction counts per IO (kernel + user instructions):
  • CAPI NVMe: 21%
  • Traditional NVMe: 35%
  • Traditional Storage, Direct IO: 56%
  • Traditional Storage, Filesystem: 100%

Legacy CAPI 1.0 accelerated NVMe Flash can issue 3.7X more IOs per CPU thread than regular NVMe flash. This improves scaling and resiliency, enables caching with persistent data frames, and opens new solutions via large scaling.
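The "almost 5X" headline follows directly from the chart's endpoints, 21% of the per-IO instructions for CAPI NVMe versus 100% for traditional storage through the filesystem:

```python
# Checking the "almost 5X more efficient" claim against the chart's
# relative instruction counts per IO.
capi_nvme = 21          # relative instructions per IO, percent
traditional_fs = 100    # traditional storage via the filesystem

efficiency_gain = traditional_fs / capi_nvme
assert 4.5 < efficiency_gain < 5.0   # ~4.8x, i.e. "almost 5X"
print(f"{efficiency_gain:.1f}x fewer instructions per IO")
```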

SLIDE 6

Comparison of Memory Paradigms

[Diagrams] Three memory attach paradigms, each using DLx/TLx on the device side:
  • Main Memory: Processor Chip to DDR4/5 (example: basic DDR attach)
  • Emerging Storage Class Memory: Processor Chip to SCM
  • Tiered Memory: Processor Chip to SCM plus DDR4/5

OpenCAPI wins due to bandwidth, best-of-breed latency, and the flexibility of an open architecture

JOIN TODAY! www.opencapi.org

SLIDE 7

Acceleration Paradigms with Great Performance

  • Basic work offload: Processor Chip to Acc, data via DLx/TLx
  • Egress Transform: examples: encryption, compression, erasure prior to network or storage
  • Ingress Transform: examples: video analytics, HFT, VPN/IPsec/SSL, Deep Packet Inspection, Data Plane Accelerator (DPA), video encoding (H.265), etc.
  • Bi-Directional Transform: examples: NoSQL such as Neo4J with graph node traversals
  • Memory Transform: examples: machine or deep learning, potentially using OpenCAPI attached memory
  • Needle-in-a-Haystack Engine: haystack data held at the accelerator, only the needles returned; examples: database searches, joins, intersections, merges
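The Needle-in-a-Haystack paradigm is the data-movement win in miniature: the haystack stays in accelerator-attached memory and only the needles cross the bus. A toy Python sketch of the idea (illustrative only; all names are mine):

```python
# Illustration: device-side filtering returns only matches ("needles"),
# so the host never pays to move the non-matching bulk of the haystack.
haystack = [{"id": i, "flag": (i % 97 == 0)} for i in range(10_000)]

def device_side_filter(rows):
    # In the real paradigm this scan runs on the accelerator, next to the
    # data; the host only receives the matching ids.
    return [r["id"] for r in rows if r["flag"]]

needles = device_side_filter(haystack)

rows_moved_naive = len(haystack)   # host-side filtering moves everything
rows_moved_capi = len(needles)     # accelerator returns only the matches
assert rows_moved_capi < rows_moved_naive / 90
print(f"moved {rows_moved_capi} rows instead of {rows_moved_naive}")
```

The same shape applies to the database searches, joins, intersections, and merges the slide lists.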

SLIDE 8

Data Centric Computing with OpenCAPI™
Flash Memory Summit 2017

Allan Cantle, CTO & Founder, Nallatech (a.cantle@Nallatech.com)

SLIDE 9


Server qualified accelerator cards featuring FPGAs, network I/O, and an open architecture software/firmware framework. Design services and application optimisation.

  • Nallatech – a Molex company
  • 24 years of FPGA Computing heritage
  • Data Centric High Performance Heterogeneous Computing
  • Real-time, low latency network and I/O processing
  • Intel PSG (Altera) OpenCL & Xilinx Alliance partner
  • Member of OpenCAPI, GenZ & OpenPOWER
  • Server partners: Cray, DELL, HPE, IBM, Lenovo
  • Application porting & optimization services
  • Successfully deployed high volumes of FPGA accelerators

Nallatech at a Glance

SLIDE 10

Data Centric Architectures: Fundamental Principles

  1. Consume zero power when data is idle
  2. Don't move the data unless you absolutely have to
  3. When data has to move, move it as efficiently as possible

Our guiding light…

The value is in the Data!

& the CPU core can often be effectively free!

SLIDE 11


Data Center Architectures: Blending Evolutionary with Revolutionary

[Diagram] Existing datacenter infrastructure: CPUs, each with local memory. Emerging data centric enhancements: OpenCAPI-attached FPGAs, each with SCM/Flash.

SLIDE 12


Nallatech HyperConverged & Disaggregatable Server

  • Leverages Google & Rackspace's OCP Zaius/Barreleye G2 platform
  • Reconfigurable FPGA fabric with balanced bandwidth to CPU, storage & data plane network
  • OpenCAPI provides a low latency, coherent accelerator/processor interface
  • GenZ memory-semantic fabric provides addressable shared memory of up to 32 Zettabytes

[Diagram: balanced bandwidth, including 4x OpenCAPI channels at 200 GBytes/s, plus 200 GBytes/s and 170 GB/s links]

SLIDE 13
  • Xilinx Zynq US+ 0.5OU High Storage Accelerator Blade
  • 4 FSAs in a 2OU Rackspace Barreleye G2 OCP storage drawer deliver:
    - 152 GByte/s PFD* bandwidth to 1TB of DDR4 memory
    - 256 GByte/s PFD* bandwidth to 64TB of Flash
    - 200 GByte/s PFD* bandwidth through the OpenCAPI channels
    - 200 GByte/s PFD* bandwidth through the GenZ fabric IO
  • Open architecture software/firmware framework

Reconfigurable Hardware Dataplane: Flash Storage Accelerator (FSA)

[Block diagram: FSA card]
  • Zynq US+ MPSoC (ZU19EG FFVC1760) with 8GByte DDR4
  • 2x 128GByte RDIMM DDR4 memory @ 2400MTPS (x72 interfaces)
  • OpenCAPI interface: PCIe x16 G3 over SlimSAS connector
  • PCIe Gen 3 switch, fanning out 8x PCIe x4 G3 to 8x M.2 22110 SSDs
  • PCIe G2 x4 control plane interface
  • GenZ data plane I/O: 2x 100GbE QSFP28

*PFD = Peak Full Duplex
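As a sanity check on the memory figure above, standard DDR4 arithmetic lands close to the quoted numbers. This is my own back-of-envelope, assuming a 64-bit data path per DIMM (the x72 width includes ECC, which does not carry data):

```python
# Back-of-envelope: peak DDR4 bandwidth and capacity for 4 FSA blades,
# each with 2x 128 GByte DDR4-2400 RDIMMs.
mt_per_s = 2400e6         # DDR4-2400: mega-transfers per second
bytes_per_transfer = 8    # 64-bit data path (x72 DIMM including ECC)
dimms_per_fsa = 2
fsas = 4

per_dimm = mt_per_s * bytes_per_transfer / 1e9      # GB/s per DIMM
total = per_dimm * dimms_per_fsa * fsas             # GB/s across drawer
capacity_tb = 128 * dimms_per_fsa * fsas / 1024     # TB across drawer

assert capacity_tb == 1.0                 # matches "1TB of DDR4 Memory"
print(f"{total:.1f} GB/s peak across {capacity_tb:.0f} TB of DDR4")
```

This gives 153.6 GB/s, in line with the slide's quoted 152 GByte/s PFD figure.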

SLIDE 14

Summary

  • OpenCAPI Accelerator to Processor Interface benefits:
    - Coherency
    - Lowest latency
    - Highest bandwidth
    - Open standard
  • Perfect bridge to blend CPU Centric & Data Centric architectures
  • Join the Open Community, where independent experts innovate together and you can help decide big topics, such as whether separate Control and Data Planes are better than Converged ones