Using NVM Express SSDs and CAPI to Accelerate Data Center - - PowerPoint PPT Presentation



SLIDE 1

Using NVM Express SSDs and CAPI to Accelerate Data Center Applications in OpenPOWER Systems

Stephen Bates PhD, Technical Director, PMC

Join the conversation at #OpenPOWERSummit 1

SLIDE 2

Teaser

Process your data at 3 GB/s with minimal CPU loading. And the code is open-source!

SLIDE 3

Outline

  • What is NVM Express?
  • What is CAPI?
  • Hardware Setup
  • Low-level Performance Data
  • NVM Express SSD performance
  • P8<->AFU Performance
  • A Data-Center Application: String Search and Substitution

  • Summary

SLIDE 4

What is NVM Express?

  • NVM Express runs over PCIe
  • High bandwidth and low latency
  • Support for multi-core and virtualization
  • In-box driver in most OSes

[Figure: stack diagram with the labels CPU, Applications, NVMe Driver, File System, Block Layer, PCIe Driver and PCIe]

The Samsung SM1715 NVMe SSD uses a PMC Flashtec controller.

SLIDE 5

What is CAPI?

  • CAPI connects the memory subsystem of a Power8 to IO devices via hardware-assisted PCIe
  • Simplifies the programming model and driver for P8<->AFU communication

The PSL (Power Service Layer) and AFU (Accelerator Functional Unit) can be implemented inside an FPGA (e.g. the Altera Stratix in the Nallatech CAPI card) or inside an ASIC (e.g. the Mellanox ConnectX-4). The AFU can perform any data-manipulation task and return either results or manipulated data to P8 memory.

SLIDE 6

Hardware Setup

  • IBM Power8 Server, S822L
  • Ubuntu, kernel 3.18.0-14-generic
  • Nallatech 385 CAPI card
  • Samsung SM1715 1.6TB NVM Express SSD

[Figure: block diagram with the labels Power8 Processor, CAPP and PCIe]

SLIDE 7

Performance – NVMe SSD

Measured with fio on an ext4 file system using the in-box NVMe driver.

SLIDE 8

Accelerator Functional Unit

  • We wrote an AFU to do low-level performance testing and a simple demo
  • The AFU monitors a queue in memory and processes jobs as they are placed on the queue
  • A snooper allows for debugging and performance analysis
  • Easy to drop in new processing blocks

Our AFU consumes about 30% of the logic resources and 11% of the memory resources on a Stratix V (5SGXMA7H2F35C2).

[Figure: block diagram with the labels PSL, Our AFU, mmio, Snooper, processors, lfsr, memcpy, textswap and wqueue]
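The queue-driven behaviour described on this slide can be illustrated with a small software model. This is only a sketch in Python, not the AFU's hardware design: the `Job` fields, the `PROCESSORS` table and `run_afu` are hypothetical names chosen for illustration, and the three functions are stand-ins that merely borrow the block names lfsr, memcpy and textswap from the diagram.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    kind: str            # name of the processing block to use (hypothetical field)
    payload: bytes       # input data for the job
    result: bytes = b""  # filled in when the job has been processed
    done: bool = False


def lfsr_block(data: bytes) -> bytes:
    """Stand-in: emit one byte per input byte from a 16-bit Fibonacci LFSR."""
    state, out = 0xACE1, bytearray()
    for _ in data:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        out.append(state & 0xFF)
    return bytes(out)


def memcpy_block(data: bytes) -> bytes:
    """Stand-in: copy the input unchanged."""
    return bytes(data)


def textswap_block(data: bytes) -> bytes:
    """Stand-in: swap one fixed string for another of the same length."""
    return data.replace(b"HDD", b"SSD")


# Adding a processing block means adding one entry to this table.
PROCESSORS = {"lfsr": lfsr_block, "memcpy": memcpy_block, "textswap": textswap_block}


def run_afu(queue: deque) -> int:
    """Drain the queue in arrival order, dispatching each job to its block."""
    processed = 0
    while queue:
        job = queue.popleft()
        job.result = PROCESSORS[job.kind](job.payload)
        job.done = True
        processed += 1
    return processed


# Usage: the host places jobs on the queue, the model processes them.
jobs = [Job("memcpy", b"abc"), Job("textswap", b"HDD vs HDD")]
run_afu(deque(jobs))
```

The dispatch table is one simple way to mirror the "easy to drop in new processing blocks" point; the real design may organise this differently.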

SLIDE 9

Performance – P8<->AFU

  • Moving data between P8 memory and the AFU involves AFU-initiated reads and writes
  • CAPI allows out-of-order completions and the AFU must handle this
  • A tag- and credit-based system is used for flow control
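To make the interaction between tags, credits and out-of-order completions concrete, here is a simplified software model. It is a sketch under stated assumptions, not the PSL interface: `transfer`, `num_tags` and the random completion order are all hypothetical, and each tag is treated as one credit.

```python
import random


def transfer(lines, num_tags=4, seed=0):
    """Model AFU-initiated reads of `lines` with a bounded pool of tags.

    Each outstanding command holds one tag (one credit). Completions come
    back in arbitrary order; the tag identifies which request they answer.
    """
    rng = random.Random(seed)
    free_tags = list(range(num_tags))  # credits currently available
    in_flight = {}                     # tag -> index of the requested line
    out = [None] * len(lines)
    next_index = 0
    max_outstanding = 0

    while next_index < len(lines) or in_flight:
        # Issue commands while there is work left and a free tag.
        while next_index < len(lines) and free_tags:
            tag = free_tags.pop()
            in_flight[tag] = next_index
            next_index += 1
        max_outstanding = max(max_outstanding, len(in_flight))

        # One completion arrives for a randomly chosen in-flight tag.
        tag = rng.choice(sorted(in_flight))
        index = in_flight.pop(tag)
        out[index] = lines[index]  # place data by tag, not by arrival order
        free_tags.append(tag)      # return the credit

    return out, max_outstanding


# Usage: 20 lines moved with at most 4 commands outstanding.
data = [bytes([i]) * 4 for i in range(20)]
result, peak = transfer(data, num_tags=4)
```

Because the destination slot is looked up from the tag, the output is in the right order no matter how completions are interleaved, and the tag pool caps the number of commands in flight.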

SLIDE 10

Performance – P8<->AFU

  • Since the data can reside in a cache, in DRAM, or even on another CPU, the command response time can vary
  • Here we plot the probability density function (PDF) of the response time for reads, writes and mixed workloads

90% of reads and writes complete within 1.5 µs.
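A statistic such as "90% within 1.5 µs" can be derived from raw response-time samples as sketched below. The function names and the sample values are hypothetical and are not measurements from this talk; the code only shows how an empirical PDF and a percentile would be computed.

```python
import math
from collections import Counter


def histogram(samples_us, bin_width_us=0.25):
    """Empirical PDF: fraction of samples per bin, keyed by the bin's lower edge."""
    counts = Counter(int(s // bin_width_us) for s in samples_us)
    n = len(samples_us)
    return {b * bin_width_us: c / n for b, c in sorted(counts.items())}


def fraction_within(samples_us, limit_us):
    """Fraction of samples that completed within `limit_us` microseconds."""
    return sum(s <= limit_us for s in samples_us) / len(samples_us)


def percentile(samples_us, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples_us)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Usage with made-up samples (microseconds), for illustration only.
samples = [0.5] * 9 + [3.0]
within = fraction_within(samples, 1.5)
p90 = percentile(samples, 90)
```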

SLIDE 11

Text Search Application

  • We can combine the NVMe SSD and the AFU to perform search on large data sets
  • In our example we augment the AFU to return pointers to string match locations
  • This allows both pattern matching and pattern substitution/annotation to be performed
  • This work is easily extended to more complex data processing (e.g. encryption, DNA sequencing)
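The idea of returning match locations and then using them for substitution can be sketched in software as follows. This is an illustrative Python model, not the code from the capi-textswap repository: `find_matches` and `substitute` are hypothetical names, byte offsets stand in for the pointers the AFU returns, and the substitution is restricted to equal-length replacements to keep the example short.

```python
def find_matches(data: bytes, pattern: bytes) -> list:
    """Return the byte offset of every match, including overlapping ones."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def substitute(data: bytes, pattern: bytes, replacement: bytes, offsets) -> bytes:
    """Overwrite the match at each offset; the replacement must be the same length."""
    if len(replacement) != len(pattern):
        raise ValueError("replacement must have the same length as pattern")
    buf = bytearray(data)
    for off in offsets:
        buf[off:off + len(pattern)] = replacement
    return bytes(buf)


# Usage: search first, then substitute at the reported offsets.
text = b"the cat sat on the mat"
hits = find_matches(text, b"cat")
swapped = substitute(text, b"cat", b"dog", hits)
```

Separating the search from the substitution reflects the slide's point that the same match locations serve both pattern matching and substitution or annotation.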


Device      Throughput
HDD         80 MB/s
SAS-SSD     237 MB/s
NVMe-SSD    2950 MB/s
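The practical effect of these rates is easier to see as scan times. The sketch below is plain arithmetic on the three figures above; the 1.6 TB data-set size is an assumption borrowed from the SSD's capacity, decimal units (1 MB = 10^6 bytes) are assumed, and the function names are hypothetical.

```python
RATES_MB_S = {"HDD": 80, "SAS-SSD": 237, "NVMe-SSD": 2950}


def scan_seconds(size_bytes, rate_mb_s):
    """Time to stream `size_bytes` at `rate_mb_s`, with 1 MB = 10**6 bytes."""
    return size_bytes / (rate_mb_s * 1_000_000)


def speedup(device, baseline, rates=RATES_MB_S):
    """Throughput of `device` relative to `baseline`."""
    return rates[device] / rates[baseline]


# Usage: scan a hypothetical 1.6 TB data set at each rate.
SIZE = 1_600_000_000_000
times = {name: scan_seconds(SIZE, rate) for name, rate in RATES_MB_S.items()}
```

Under these assumptions the HDD rate gives 20000 seconds (about 5.6 hours) and the NVMe-SSD rate gives about 542 seconds (roughly 9 minutes), a ratio of about 37x.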

SLIDE 12

Summary

  • High Throughput
  • Low and Consistent Latency
  • Low CPU Utilization
  • Easy Programming Model

Try for Yourself! https://github.com/sbates130272/capi-textswap.git

See the demo at the Nallatech Booth #1010!