SLIDE 1

Dynamic Near Data Processing Framework for SSDs

Gunjae Koo*, Kiran Kumar Matam*, Te I†, H.V. KrishinaGiri Nara*, Jing Li‡, Hung-Wei Tseng†, Steven Swanson‡, Murali Annavaram*

*University of Southern California

†North Carolina State University ‡University of California, San Diego

SLIDE 2

Conventional Storage = Cheap Passive Devices

Conventional storage devices
  • Slow, limited bandwidth (SATA: 150 to 600 MB/s)
  • Passive devices (read, write, erase)

* Figures from Intel and Western Digital

SLIDE 3

Storage in Modern Server Systems

Storage devices for Big Data
  • Huge volumes of data
  • Slow, slower, much slower
  • Data movement is critical for performance

SLIDE 4

Intelligent Storage

NVM-based storage devices
  • No seek time, higher bandwidth over PCIe
  • Potential to be active systems

* Figures from Intel

SLIDE 5

Intelligent Storage

NVM-based storage devices
  • No seek time, higher bandwidth (PCIe)
  • Potential to be active systems

* Figures from Intel

[Figure: SSD components: SSD processor, DRAM, NAND flash packages]

SLIDE 6

Near Data Processing (NDP)

[Figure: host CPU connected through the storage interface to the storage processor (SP). Legend: data computation @ host; data transfer from storage, internal and external (host – storage)]

SLIDE 7

Near Data Processing (NDP)

[Figure: W/O NDP vs. With NDP. Host CPU connected through the storage interface to the storage processor (SP). Legend: data computation @ host; data computation @ storage; data transfer from storage, internal and external (host – storage)]
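To make the trade-off in this figure concrete, here is a minimal Python sketch of how NDP changes external (host – storage) traffic. It assumes a scan in which the storage processor can apply a user function to every page and ship only that function's output; the names `external_bytes` and `first_8_bytes` and the 4 KiB page size are hypothetical, chosen for illustration only.

```python
from typing import Callable, List


def external_bytes(pages: List[bytes],
                   compute: Callable[[bytes], bytes],
                   ndp: bool) -> int:
    """Bytes crossing the host-storage interface for one scan (illustrative).

    Without NDP, every page read from flash is shipped to the host, which
    then runs `compute` itself. With NDP, the storage processor runs
    `compute` and only its output crosses the interface. Internal traffic
    (flash to SSD DRAM) is identical in both cases and is not counted here.
    """
    if ndp:
        return sum(len(compute(page)) for page in pages)
    return sum(len(page) for page in pages)


def first_8_bytes(page: bytes) -> bytes:
    """Hypothetical reduction: keep an 8-byte summary of the page."""
    return page[:8]


# Hypothetical workload: four 4 KiB pages, each reduced to 8 bytes.
pages = [bytes(4096) for _ in range(4)]
print(external_bytes(pages, first_8_bytes, ndp=False))  # 16384
print(external_bytes(pages, first_8_bytes, ndp=True))   # 32
```

The sketch ignores compute time; as the obstacles listed on SLIDE 9 point out, a less powerful embedded processor can turn in-SSD computation into the bottleneck.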

SLIDE 8

Near Data Processing (NDP) on SSDs

[Figure: W/O NDP vs. With NDP on an SSD. Host CPU connected through the storage interface to the SSD processor (SP), which also handles garbage collection and wear-leveling. Legend: data computation @ host; data computation @ storage; data transfer from storage, internal and external (host – storage)]

SLIDE 9

Near Data Processing (NDP) on SSDs

[Figure: W/O NDP vs. With NDP on an SSD; the SSD processor (SP) also handles garbage collection and wear-leveling]

Obstacles to in-SSD processing
  • Less powerful embedded processors
  • Dynamic computation resource availability
  • Manual workload partitioning is difficult

Summarizer: Dynamic NDP framework for SSDs

SLIDE 10

Summarizer – Basic Concept

[Figure: host CPU connected through the storage interface to the SSD's AP; monitoring resources]

SLIDE 11

Summarizer – Basic Concept

[Figure: host CPU connected through the storage interface to the SSD's AP; monitoring resources]

SLIDE 12

Summarizer – Detailed Firmware Architecture

[Figure: Summarizer firmware architecture. Host side: user applications / operating systems, NVMe host driver, host CPU, and host memory holding the submission queue (SQ) and completion queue (CQ). Storage interface: PCIe / NVMe. SSD firmware on the SSD embedded processors: I/O controller (NVMe command decoder), Summarizer (task controller, task queue (TQ), user functions), flash translation layer (FTL), request queue, and response queue. The SSD SoC interconnection links the DRAM controller (SSD DRAM) and the flash controller (NAND flash).]

SLIDE 13

Summarizer – Initialization (Function Offloading)

[Figure: Summarizer firmware architecture, as on SLIDE 12]

  • INIT(foo): a new NVMe command offloads the user function foo() to the SSD
  • Function registration: foo() is registered as f#1
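The INIT step can be pictured as a registration table in the SSD firmware. The following Python sketch is a hypothetical model, not Summarizer's firmware: the class `FunctionTable` and its methods are invented for illustration, and function IDs simply count up from 1 to mirror the f#1, f#2 labels on the slides.

```python
from typing import Callable, Dict

# A user function consumes one page of data and returns a result.
UserFunction = Callable[[bytes], object]


class FunctionTable:
    """Hypothetical firmware-side table of offloaded user functions."""

    def __init__(self) -> None:
        self._next_id = 1
        self._by_id: Dict[int, UserFunction] = {}
        self._by_name: Dict[str, int] = {}

    def init(self, name: str, fn: UserFunction) -> int:
        """Handle INIT(name): register fn and return its function ID."""
        if name in self._by_name:          # already offloaded: reuse its ID
            return self._by_name[name]
        fid = self._next_id
        self._next_id += 1
        self._by_id[fid] = fn
        self._by_name[name] = fid
        return fid

    def lookup(self, fid: int) -> UserFunction:
        """Resolve the function ID carried by a later RD&PROC command."""
        return self._by_id[fid]


table = FunctionTable()
f1 = table.init("foo", len)   # f#1
f2 = table.init("goo", sum)   # f#2
print(f1, f2, table.lookup(f2)(bytes([1, 2, 3])))  # 1 2 6
```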

SLIDE 14

Summarizer – Computation (Dynamic mode)

[Figure: Summarizer firmware architecture, as on SLIDE 12; registered user functions foo() (f#1) and goo() (f#2)]

  • The host issues RD&PROC(LBA, foo), a new NVMe command
  • The I/O controller decodes the new NVMe command, and the logical address is translated into RD&PROC(PPA, foo)

SLIDE 15

Summarizer – Computation (Dynamic mode)

[Figure: Summarizer firmware architecture, as on SLIDE 12; registered user functions foo() (f#1) and goo() (f#2)]

  • RD&PROC(PPA, foo) is issued as page-level requests RD&P(PPA1, foo) and RD&P(PPA2, foo)
  • Page data is returned for RD&P(PPA1, foo)
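SLIDES 14 and 15 show one RD&PROC command fanning out into page-level RD&P requests. Below is a minimal sketch of that decode step, assuming each logical block maps to exactly one flash page and modelling the FTL's logical-to-physical table as a plain dict; `PageRequest` and `decode_rd_proc` are hypothetical names, not Summarizer's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PageRequest:
    """Hypothetical page-level request, written RD&P(PPA, foo) on the slides."""
    ppa: int   # physical page address
    fid: int   # ID of the registered user function (f#1, f#2, ...)


def decode_rd_proc(start_lba: int, num_blocks: int, fid: int,
                   l2p: Dict[int, int]) -> List[PageRequest]:
    """Split RD&PROC(LBA, foo) into one RD&P(PPA, foo) per page.

    Assumes one logical block per flash page. `l2p` stands in for the FTL
    mapping table; an unmapped LBA raises KeyError.
    """
    return [PageRequest(ppa=l2p[lba], fid=fid)
            for lba in range(start_lba, start_lba + num_blocks)]


# Hypothetical mapping: LBAs 100 and 101 live in physical pages 7 and 42.
reqs = decode_rd_proc(100, 2, fid=1, l2p={100: 7, 101: 42})
print(reqs)  # [PageRequest(ppa=7, fid=1), PageRequest(ppa=42, fid=1)]
```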

SLIDE 16

Summarizer – Computation (Dynamic mode)

[Figure: Summarizer firmware architecture, as on SLIDE 12; registered user functions foo() (f#1) and goo() (f#2)]

  • Page data for RD&P(PPA1, foo) is placed in buf1
  • Register in TQ: the entry (buf1, foo) is queued for in-SSD processing (CC/Proc)

SLIDE 17

Summarizer – Computation (Dynamic mode)

[Figure: Summarizer firmware architecture, as on SLIDE 12; registered user functions foo() (f#1) and goo() (f#2)]

  • Page data arrives for RD&P(PPA1, foo) (CC)
  • TQ is full: the page cannot be registered for in-SSD processing
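The dynamic decision on SLIDES 16 and 17 can be sketched as a bounded task queue. This Python model is hypothetical (`TaskController`, its methods, and the capacity are invented for illustration), and it assumes two things the slides do not spell out: a page that cannot be queued because the TQ is full is returned to the host unprocessed, and the user function's per-page results can be combined by summing.

```python
from collections import deque
from typing import Callable, Deque, List


class TaskController:
    """Hypothetical page-level offload decision in the SSD firmware."""

    def __init__(self, capacity: int, fn: Callable[[bytes], int]) -> None:
        self.capacity = capacity              # TQ entries available
        self.fn = fn                          # registered user function
        self.tq: Deque[bytes] = deque()       # pages waiting for a core
        self.to_host: List[bytes] = []        # raw pages the host must process
        self.partial = 0                      # running in-SSD result

    def on_page_read(self, page: bytes) -> bool:
        """Called when page data for RD&P arrives; True if queued in the TQ."""
        if len(self.tq) < self.capacity:
            self.tq.append(page)              # register in TQ
            return True
        self.to_host.append(page)             # TQ is full: bypass to host
        return False

    def core_step(self) -> None:
        """An idle embedded core processes the oldest queued page, if any."""
        if self.tq:
            self.partial += self.fn(self.tq.popleft())


tc = TaskController(capacity=2, fn=sum)
queued = [tc.on_page_read(p) for p in (bytes([1, 2]), bytes([3]), bytes([4]))]
tc.core_step()
tc.core_step()
print(queued, tc.partial, tc.to_host)  # [True, True, False] 6 [b'\x04']
```

In this model the split adapts at page granularity: the busier the embedded cores, the more pages overflow to the host.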

SLIDE 18

Summarizer – Finalization

[Figure: Summarizer firmware architecture, as on SLIDE 12; registered user functions foo() (f#1) and goo() (f#2)]

  • FINAL(foo): a new NVMe command ends the computation
  • Results are returned to the host
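At FINAL, the SSD's partial result has to be combined with whatever the host computed on pages that bypassed the SSD. Below is a minimal sketch for a filtered sum in the style of TPC-H Query6, assuming a simplified predicate (the shipdate filter is omitted), a hypothetical row layout of (extendedprice, discount, quantity), and an aggregate that can be merged by addition; `q6_like` and `finalize` are illustrative names only.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (extendedprice, discount, quantity).
Row = Tuple[float, float, int]


def q6_like(rows: Iterable[Row]) -> float:
    """Simplified TPC-H Query6 body: sum(price * discount) over matching rows."""
    return sum(price * disc for price, disc, qty in rows
               if 0.05 <= disc <= 0.07 and qty < 24)


def finalize(ssd_partial: float, host_pages: List[List[Row]]) -> float:
    """Merge the partial result returned by FINAL with host-side work."""
    host_partial = sum(q6_like(page) for page in host_pages)
    return ssd_partial + host_partial


# The SSD reports 10.0; the host processes two pages that overflowed the TQ.
pages = [[(100.0, 0.06, 10), (200.0, 0.10, 5)],
         [(50.0, 0.05, 30), (300.0, 0.07, 1)]]
print(round(finalize(10.0, pages), 6))  # 37.0
```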

SLIDE 19

Evaluation Platform

  • LS2085a intelligent SSD development platform
  • ARM cores running FTL and Summarizer firmware
  • FPGA implementing NAND flash controller
  • PCIe Gen. 3 x4 lanes for host communication

[Figure: LS2085a block diagram: ARM CPU cores (48KB L1I and 32KB L1D each, 1MB L2 caches) on the LS2085a interconnection, DDR4 memory controller with DRAM, PCIe (host – LS2085a), and PCIe (LS2085a - FPGA) to an ALTERA Stratix V FPGA attached to NAND flash DIMMs]

SLIDE 20

Evaluation Platform

  • LS2085a intelligent SSD development platform
  • ARM cores running FTL and Summarizer firmware
  • FPGA implementing NAND flash controller
  • PCIe Gen. 3 x4 lanes for host communication

[Figure: LS2085a block diagram: ARM CPU cores (48KB L1I and 32KB L1D each, 1MB L2 caches) on the LS2085a interconnection, DDR4 memory controller with DRAM, PCIe (host – LS2085a), and PCIe (LS2085a - FPGA) to an ALTERA Stratix V FPGA attached to NAND flash DIMMs]

[Photo: development board with ARM processor, DRAM, Altera Stratix V, NAND flash DIMMs, and PCIe (to host)]

SLIDE 21

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

Static workload offloading

SLIDE 22

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

CPU-only processing (baseline)
SSD-only processing

SLIDE 23

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

Summarizer dynamic offloading

SLIDE 24

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

SSD time: SSD processing + transfer time (internal + external + in-SSD processing)
Host time: host CPU processing time

SLIDE 25

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

Execution time normalized to baseline (CPU only)
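Reading the stacked bars as additive, the quantity plotted on these slides can be written as below. This is an interpretation of the chart legends (SSD time covering internal transfer, external transfer, and in-SSD processing; host time covering host CPU processing) under the assumption that the components do not overlap; the symbols are introduced here for illustration only.

```latex
T = \underbrace{T_{\mathrm{int}} + T_{\mathrm{ext}} + T^{\mathrm{SSD}}_{\mathrm{proc}}}_{\text{SSD time}}
  + \underbrace{T^{\mathrm{host}}_{\mathrm{proc}}}_{\text{host time}},
\qquad
\hat{T} = \frac{T}{T_{\text{CPU only}}}
```

By construction the CPU-only baseline plots at $\hat{T} = 1$, and any configuration with $\hat{T} < 1$ finishes faster than the baseline.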

SLIDE 26

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time; y-axis: execution time (normalized to baseline)]

SLIDE 27

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

[Chart: execution time (normalized to baseline) for CPU only and Dynamic; stacked SSD time and host time; data labels 0.70, 0.60, 0.30, 0.24]

SLIDE 28

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

[Chart: execution time (normalized to baseline) for CPU only and Dynamic; stacked SSD time and host time; data labels 0.70, 0.62, 0.30, 0.24]

[Figure: W/O NDP vs. With NDP. Legend: data computation @ host; data computation @ storage; data transfer from storage, internal and external (host – storage)]

SLIDE 29

Evaluation - Performance

[Chart: TPC-H Query6, Static vs. Dynamic; stacked SSD time and host time]

Performance degraded by static NDP

SLIDE 30

Evaluation - Performance

[Charts: four panels, each plotting execution time (normalized to baseline); annotated values 16%, 10%, 20%, 7%]

SLIDE 31

Design Exploration – Better SSD Processor

[Figure: host CPU connected through the storage interface to the SSD's AP]

A better embedded processor is cost-effective

SLIDE 32

Design Exploration – Higher Internal Bandwidth

[Chart: speedup (0% to 120%) for TPC-H Query6, TPC-H Query1, TPC-H Query14, String Similarity Join, and their average, each at X1, X2, X4, X8, and X16]

Embedded processor performance

SLIDE 33

Design Exploration – Higher Internal Bandwidth

[Chart: speedup (0% to 120%) for TPC-H Query6, TPC-H Query1, TPC-H Query14, String Similarity Join, and their average, each at X1, X2, X4, X8, and X16]

Summarizer is a cost-effective NDP solution with powerful storage processors

SLIDE 34

Conclusion

▪ Dynamic NDP framework for SSDs
  • Opportunistically enables in-SSD processing
  • Page-level NDP control
  • Automatic workload partitioning

▪ Summarizer programming model
  • Evaluation on the real development platform
  • Explored design space for future SSDs

SLIDE 35

Thank you

(We thank Dell EMC for supporting the SSD development board.)

Summarizer: Trading Communication with Computing Near Storage (MICRO '17)