SLIDE 1

Csci 5980 Spring 2020

New Storage Technologies/Devices

SLIDE 2

[Figure: storage/memory technology spectrum — Tape, SMR HDD, SSD, NVRAM — with higher performance and smaller density toward the NVRAM end.]

SLIDE 3

Non-Volatile Memory (NVRAM)

SLIDE 4

Examples of non-volatile memory (NVRAM)

3D XPoint (by Intel and Micron), NVDIMM (by HPE), STT-MRAM (by Everspin)

SLIDE 5

Summary of Memory Technologies

                       HDD         DRAM DIMM   Flash SSD   PCM (25nm)
Density (μm²/bit)      0.00006     0.00380     0.00210     0.00250
Read Latency (ns)      3,000,000   55          25,000      48
Write Latency (ns)     3,000,000   55          200,000     150
Read Energy (pJ/bit)   2,500       12.5        250         2
Write Energy (pJ/bit)  2,500       12.5        250         19.2
Static Power           Yes         Yes         No          No
Endurance              >10^15      >10^15      10^4        10^8
Nonvolatility          Yes         No          Yes         Yes

SLIDE 6

Summary of Different Memory Technologies


SLIDE 7

How do we innovate our software, architectures, and systems to exploit NVRAM technologies?

  • Non-volatile
  • Low power consumption
  • Fast (close to DRAM)
  • Byte addressable
  • Memory or Storage?

SLIDE 8

NVM Research Issues

  • Data Consistency and Durability against System and Application Failures
    – Solutions: ACID (Atomicity, Consistency, Isolation, and Durability) transactions, append-only logs, and shadow updates (see the logging sketch after this list)
    – Challenges: guarantee consistency and durability while preserving performance
  • Memory Allocation, De-allocation & Garbage Collection

  • New Programming Models
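As a concrete illustration of the append-log solution mentioned above, here is a minimal sketch of crash-consistent updates. It is illustrative only: it uses a regular file and fsync(), whereas on real NVRAM the durability points would be cache-line write-backs (e.g., clwb) plus store fences, but the ordering rule — persist the log record before the in-place update — is the same.

```python
# Minimal append-log sketch (illustrative assumption, not tied to real NVM hardware).
import os, json, zlib

LOG = "update.log"   # hypothetical log file standing in for an NVM log region

def log_update(key, value):
    rec = json.dumps({"key": key, "value": value}).encode()
    entry = len(rec).to_bytes(4, "little") + zlib.crc32(rec).to_bytes(4, "little") + rec
    with open(LOG, "ab") as f:
        f.write(entry)
        f.flush()
        os.fsync(f.fileno())     # durability point: the record is now persistent

def recover():
    """Replay intact log records after a crash; stop at the first torn entry."""
    updates = []
    try:
        data = open(LOG, "rb").read()
    except FileNotFoundError:
        return updates
    off = 0
    while off + 8 <= len(data):
        n = int.from_bytes(data[off:off+4], "little")
        crc = int.from_bytes(data[off+4:off+8], "little")
        rec = data[off+8:off+8+n]
        if len(rec) < n or zlib.crc32(rec) != crc:
            break                # torn/partial record: ignore the tail
        updates.append(json.loads(rec))
        off += 8 + n
    return updates
```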


SLIDE 9

New Memory/Storage Hierarchy

[Figure: two candidate hierarchies. PCM as main memory: Processor – DRAM and PCM on the memory bus (managed by virtual memory), with Flash and Disk behind the I/O bus (managed by the file system). PCM as secondary storage: Processor – DRAM on the memory bus, with PCM, Flash, and Disk behind the I/O bus.]

PCM as main memory provides: 1) high capacity, 2) low standby power. PCM as secondary storage provides: 1) low access latency.

SLIDE 10

How to Integrate PCM and Flash Memory into Memory/Storage Hierarchies?

SLIDE 11

Storage Layer Management and Caching

[Figure: storage-layer caching — SATA disks behind an SSD organized into read queues (real-time), read queues (prefetch), and write queues (offloading), plus a big memory built with PCM. Open questions: when, where, and how much to cache or offload?]

How can this be done in a HEC environment?

SLIDE 12

Flash Memory-based Solid State Drives

SLIDE 13

Why Flash Memory?

  • Diversified Application Domains
    – Portable Storage Devices
    – Consumer Electronics
    – Industrial Applications
    – Critical System Components
    – Enterprise Storage Systems

SLIDE 14

Flash-based SSD Characteristics

  • Random reads perform the same as sequential reads.
  • Reads and writes are done in units of pages.
  • Overwrite in place is not allowed: a block must be erased before its pages can be rewritten, and erase is performed on whole blocks.
  • Typical block size is 128 KB and page size is 2 KB.
  • Writes are slower than reads, and erase is a very slow operation: a read takes 25 microseconds, a write takes 200 microseconds, and an erase takes 1,500 microseconds (see the cost sketch below).
  • Limited number of writes per cell: ~100K for SLC and ~10K for MLC.
  • The Flash Translation Layer (FTL) sits between the file system and the SSD; it provides remapping and wear-leveling.
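To see why erase-before-write matters, here is a back-of-the-envelope sketch using the latencies above; the 64 pages-per-block figure simply follows from the 128 KB block and 2 KB page sizes on this slide, and the scenario (a forced full-block copy) is an illustrative worst case rather than how every FTL behaves.

```python
# Why small random writes are expensive on flash: updating one page in a full
# block forces the FTL to copy the still-valid pages and erase the old block.
READ_US, WRITE_US, ERASE_US = 25, 200, 1500   # latencies from the slide
PAGES_PER_BLOCK = 64                          # 128 KB block / 2 KB page

def in_place_update_cost(valid_pages: int) -> float:
    """Microseconds to rewrite one page when `valid_pages` others must move."""
    copy = valid_pages * (READ_US + WRITE_US)  # relocate still-valid pages
    return copy + ERASE_US + WRITE_US          # erase old block, write new page

print(in_place_update_cost(0))    # empty block: 1700 us
print(in_place_update_cost(63))   # full block: 15875 us, ~79x a plain page write
```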

Figure Source: “BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage”, Hyojun Kim and Seongjun Ahn, FAST 2008


SLIDE 15

High-Level View of Flash Memory Design

SLIDE 16

FTL (Flash Translation Layer)

SLIDE 17

Flash Translation Layer (FTL)

[Figure: the FTL in the I/O stack — applications issue fwrite(file, data) to the file system, the file system issues block writes (LBA, size) to the Flash Translation Layer, and the FTL (address allocator for address translation and block assignment, garbage collector, wear leveler, hot-data identifier) issues flash writes (block, page) and control signals to the memory technology device layer and the flash memory.]

SLIDE 18

Flash Translation Layer (FTL)

  • Flash Translation Layer
    – Emulates a block device interface
    – Hides the erase operation / erase-before-write constraint
    – Performs address translation, garbage collection, and wear-leveling
  • Address Translation
    – Three types: page-level, block-level, and hybrid mapping FTLs
    – The mapping table is stored in a small RAM inside the flash device


SLIDE 19

Page vs. Block Level Mapping

[Figure: page-level mapping translates a logical page number (LPN) directly to a physical page number (PPN) through the page-level FTL; block-level mapping splits the logical address into (LBN, offset), translates the LBN to a physical block number (PBN), and keeps the page offset within the block.]

Page-level mapping is flexible but requires a lot of RAM (e.g., ~2 MB for a 1 GB SSD); block-level mapping needs far less RAM (e.g., ~32 KB for a 1 GB SSD) but is inflexible in data placement.
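A minimal page-level FTL sketch (hypothetical, for illustration only): every LPN can map to any PPN, and updates go out of place to a fresh page. The block-level alternative is noted in the final comment.

```python
# Toy page-level FTL: LPN -> PPN mapping table plus out-of-place updates.
class PageLevelFTL:
    def __init__(self, pages_per_block: int, num_blocks: int):
        self.free = list(range(pages_per_block * num_blocks))  # free PPNs
        self.l2p = {}        # LPN -> PPN mapping table (held in device RAM)
        self.invalid = set() # PPNs awaiting garbage collection

    def write(self, lpn: int) -> int:
        """Map a logical write to a fresh physical page (out-of-place update)."""
        if not self.free:
            raise RuntimeError("no free pages: garbage collection needed")
        ppn = self.free.pop()
        if lpn in self.l2p:
            self.invalid.add(self.l2p[lpn])   # old copy becomes garbage
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn: int) -> int:
        return self.l2p[lpn]                  # KeyError if never written

# A block-level FTL would instead keep one entry per block:
#   ppn = l2p_block[lpn // pages_per_block] * pages_per_block + (lpn % pages_per_block)
ftl = PageLevelFTL(pages_per_block=64, num_blocks=16)
ftl.write(5); ftl.write(5)                    # second write goes out of place
print(ftl.read(5), len(ftl.invalid))          # new PPN, 1 invalidated page
```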

SLIDE 20

Emerging Disk Drives Including Shingled Magnetic Recording (SMR) Drives and Interlaced Magnetic Recording (IMR) Drives

SLIDE 21

Shingled Magnetic Recording (SMR)

[Figure: a rotational disk platter with read/write head — traditional non-overlapping track design vs. shingled tracks.]

Shingled Magnetic Recording:
  + enables higher data density by overlapping data tracks
  – requires careful data handling when updating old blocks
SLIDE 22

T10 SMR Drive Models

  • Drive Managed
    – Black box / drop-in solution: the drive handles all out-of-order write operations internally.
  • Host Managed
    – White box / application modification needed: the drive reports zone layout information; out-of-order writes are rejected (see the zone sketch after this list).
  • Host Aware
    – Grey box: the drive reports zone layout information; out-of-order writes are still handled internally.
    – Applications can use an HA-SMR drive as is, and also have the opportunity for zone-layout-aware optimizations.
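A toy sketch (not a real T10/ZBC implementation) of the sequential-write rule a host-managed zone enforces: each zone keeps a write pointer, and writes that do not start at the pointer are rejected. The zone and block sizes in the usage lines are made-up examples.

```python
class Zone:
    def __init__(self, start_lba: int, length: int):
        self.start = start_lba
        self.length = length
        self.write_pointer = start_lba      # next LBA that may be written

    def write(self, lba: int, num_blocks: int) -> None:
        if lba != self.write_pointer:
            raise IOError("unaligned write rejected (host-managed zone)")
        if lba + num_blocks > self.start + self.length:
            raise IOError("write crosses zone boundary")
        self.write_pointer += num_blocks    # data is appended sequentially

    def reset(self) -> None:
        """Reset the zone; writing starts over from the front."""
        self.write_pointer = self.start

z = Zone(start_lba=0, length=524288)        # e.g., a 256 MiB zone of 512 B blocks
z.write(0, 8)                                # OK: starts at the write pointer
try:
    z.write(100, 8)                          # rejected: out-of-order write
except IOError as e:
    print(e)
```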

SLIDE 23

Hybrid SMR Basics

  • Google’s Proposal
    – 100 GiB volume creation in < 200 ms, typically < 50 ms; query time < 50 ms.
  • Seagate Flex API
    – Converts in a basic unit of one zone, or a consecutive zone extent.
  • WD Realm API
    – 100 GiB realms: the same SMR size, but a different CMR size.

SLIDE 24

Google’s Proposal [Brewer’16, Tso ‘17]

SLIDE 25

  • Must be usable as a 100% CMR drive by legacy software
  • SMR -> CMR conversion
    – Must support converting a 100 GiB SMR volume back to CMR; an OD -> ID sequence is sufficient
  • CMR / SMR sector addressing (see figure)
  • CMR -> SMR conversion
    – Must support the creation of 100 GiB SMR volumes (400 SMR zones)
    – May support smaller granularity
    – ID -> OD: each new SMR volume is adjacent to the previous one
  • Performance requirements
    – 100 GiB SMR volume creation < 200 ms, with typical conversion time < 50 ms
    – Conversion back to CMR equally quick
    – Query response < 50 ms
  • Conversion atomicity (see the bookkeeping sketch below)

Fig. CMR / SMR sector addressing [Tso ‘17]
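A hypothetical bookkeeping sketch of these requirements; the class and method names are illustrative, not Google's or any vendor's API. It only tracks which zones are currently shingled: 100 GiB volumes of 400 zones (256 MiB each), allocated ID -> OD so each new SMR volume sits next to the previous one.

```python
VOLUME_ZONES = 400                            # 400 x 256 MiB zones = 100 GiB

class HybridSMRDrive:
    def __init__(self, total_zones: int):
        self.total_zones = total_zones
        self.smr_volumes = []                 # list of (first_zone, num_zones)

    def convert_to_smr(self) -> int:
        """Create one 100 GiB SMR volume adjacent to the previous one (ID -> OD)."""
        first = sum(n for _, n in self.smr_volumes)
        if first + VOLUME_ZONES > self.total_zones:
            raise RuntimeError("no CMR capacity left to convert")
        self.smr_volumes.append((first, VOLUME_ZONES))
        return len(self.smr_volumes) - 1      # volume id

    def convert_to_cmr(self) -> None:
        """Convert the most recently created SMR volume back to CMR (OD -> ID)."""
        if self.smr_volumes:
            self.smr_volumes.pop()

    def query(self):
        """Report the current SMR volume layout (the < 50 ms query call)."""
        return list(self.smr_volumes)
```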
SLIDE 26

WD’s Realm API [Boyle’17]

SLIDE 27

SLIDE 28

Seagate Flex API [Feldman’17, Feldman’18]

SLIDE 29

SLIDE 30

[Figure: hard disk drive track layouts under Conventional Magnetic Recording (CMR), Shingled Magnetic Recording (SMR), and Interlaced Magnetic Recording (IMR), which interleaves wide bottom tracks with narrow top tracks.]

IMR: higher areal data density than CMR, lower write amplification (WA) than SMR.

HDD icon image: https://www.flaticon.com/

SLIDE 31

IMR Tracks      Width      Laser Power   Data Density       Data Rate   Track Capacity
Bottom Tracks   wider      higher        higher (+27%) [1]  higher      higher
Top Tracks      narrower   lower         lower              lower       lower

Updating top tracks has no penalty; updating bottom tracks causes write amplification (WA). I/O performance depends on disk usage and layout design. Using only bottom tracks while the disk is not full may reduce WA.

[1] Granz et al., 2017

SLIDE 32

TrackPly: Data and Space Management for IMR

[Figure: IMR disk tracks — an update to a bottom track overlaps the neighboring top tracks.]

Question: how serious is the update overhead?

Updating a bottom track means reading the overlapping top tracks, writing the bottom track, and re-writing the top tracks: 5 operations (see the sketch below)!

Problem: how to efficiently use IMR drives and alleviate the update overhead?
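A worked sketch of that operation count, assuming the common IMR update model in which a bottom track overlaps its two adjacent top tracks; it also shows why keeping top tracks empty (as the previous slide suggests) removes the penalty.

```python
def bottom_track_update_ops(valid_top_neighbors: int) -> int:
    """Disk operations to update one bottom track when `valid_top_neighbors`
    (0, 1, or 2) adjacent top tracks currently hold valid data."""
    reads = valid_top_neighbors          # save the overlapping top tracks
    writes = 1 + valid_top_neighbors     # write the bottom track, restore the tops
    return reads + writes

print(bottom_track_update_ops(2))  # 5 operations when both neighbors are valid
print(bottom_track_update_ops(0))  # 1 operation when the top tracks are empty
```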

SLIDE 33

Design (1/3): Zigzag Allocation

Key idea: data management should depend on disk usage in high-capacity HDDs.

[Figure: IMR top and bottom tracks allocated in three phases — 1st phase (0–56% of capacity), 2nd phase (56–78%), 3rd phase (78–100%).]
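A simplified allocator sketch of the phased idea (my reading of the slide, not the exact TrackPly algorithm): while the disk is lightly used, hand out only bottom tracks so updates never pay the rewrite penalty, and switch to top tracks in later phases as the disk fills.

```python
class ZigzagAllocator:
    def __init__(self, num_bottom: int, num_top: int):
        self.free_bottom = list(range(num_bottom))
        self.free_top = list(range(num_top))

    def allocate_track(self) -> tuple:
        if self.free_bottom:                      # early phase: penalty-free tracks
            return ("bottom", self.free_bottom.pop(0))
        if self.free_top:                         # later phases: top tracks
            return ("top", self.free_top.pop(0))
        raise RuntimeError("disk full")

alloc = ZigzagAllocator(num_bottom=4, num_top=4)
print([alloc.allocate_track() for _ in range(6)])  # bottom tracks first, then top
```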

SLIDE 34

Design (2/3): Top-Buffer

The idea: buffer and accumulate multiple bottom-track writes, then write them back.

[Figure: IMR disk tracks from outer to inner, showing allocated and unallocated tracks, the write buffer, and the write-back path.]
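A simplified sketch of the Top-Buffer idea (my reading of the slide, not the exact TrackPly algorithm): bottom-track updates are first staged in spare top tracks and written back later, so repeated updates to the same bottom track pay the expensive rewrite only once. `disk.rewrite_bottom_track` is a hypothetical stand-in for the real read-modify-write.

```python
class TopBuffer:
    def __init__(self, flush_threshold: int = 4):
        self.staged = {}                      # bottom track -> latest data
        self.flush_threshold = flush_threshold

    def write_bottom(self, track: int, data: bytes, disk) -> None:
        self.staged[track] = data             # cheap write into a spare top track
        if len(self.staged) >= self.flush_threshold:
            self.flush(disk)

    def flush(self, disk) -> None:
        for track, data in self.staged.items():
            disk.rewrite_bottom_track(track, data)  # the IMR penalty is paid here
        self.staged.clear()
```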

SLIDE 35

Design (3/3): Block-Swap

The idea: swap hot bottom-track data with cold top-track data.

[Figure: IMR disk tracks from outer to inner, marking allocated and unallocated tracks, hot bottom-track data, and cold top-track data.]
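A simplified sketch of the Block-Swap idea (illustrative only, with made-up names and thresholds): if a block on a bottom track is updated often while a block on a top track is cold, exchange their locations so the hot data lands where updates are penalty-free.

```python
def maybe_swap(block_heat: dict, placement: dict, hot_threshold: int = 8):
    """block_heat: block -> update count; placement: block -> 'top' or 'bottom'."""
    hot_bottom = [b for b, p in placement.items()
                  if p == "bottom" and block_heat.get(b, 0) >= hot_threshold]
    cold_top = [b for b, p in placement.items()
                if p == "top" and block_heat.get(b, 0) < hot_threshold]
    swaps = list(zip(hot_bottom, cold_top))
    for hot, cold in swaps:
        placement[hot], placement[cold] = "top", "bottom"   # exchange locations
    return swaps

placement = {"A": "bottom", "B": "top"}
print(maybe_swap({"A": 20, "B": 1}, placement), placement)
```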

SLIDE 36

[Figure: recap of the three designs on IMR disk tracks — the zigzag allocation phases, the top-track write buffer with write-back, and swapping hot bottom-track data with cold top-track data.]

  • Zigzag Allocation: data management should depend on disk usage in high-capacity HDDs.
  • Top-Buffer: buffer and accumulate bottom-track write requests in unallocated top tracks.
  • Block-Swap: swap hot bottom-track data with cold top-track data.

SLIDE 37

Object Oriented Store and Active Storage

SLIDE 38

Active/Object Storage Device System Architecture (Internet Model)

[Figure: a user's I/O application accesses OSD-intelligent storage devices directly over the network; a manager handles OPEN/CLOSE requests. The OSD partitions the system, and the manager is not in the data path.]

SLIDE 39

Kinetic Drives: Implementing an Application on the Storage Device

Key-Value Store

SLIDE 40

Kinetic Drives (Key-Value Store)

  • Key-value stores are becoming popular nowadays (e.g., at Amazon, Facebook, LinkedIn).
  • Kinetic drives provide storage for key-value operations via direct Ethernet connections, without storage servers, which can reduce management complexity (see the client sketch below).
  • It is important to scale Kinetic drives into a global key-value store system that can serve worldwide users.

[Figure: key-value interface — traditional storage stack vs. Kinetic storage stack.]
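A hypothetical sketch of what an application sees with a key-value drive: no block layer or file system, just put/get/delete sent over Ethernet to the drive's IP address. The length-prefixed JSON wire format below is invented for illustration; the real Kinetic protocol uses Protocol Buffers over TCP.

```python
import json, socket, struct

class KVDriveClient:
    def __init__(self, host: str, port: int = 8123):
        self.sock = socket.create_connection((host, port))

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("drive closed the connection")
            buf += chunk
        return buf

    def _call(self, op: str, key: str, value: str = None) -> dict:
        msg = json.dumps({"op": op, "key": key, "value": value}).encode()
        self.sock.sendall(struct.pack(">I", len(msg)) + msg)
        (n,) = struct.unpack(">I", self._recv_exact(4))
        return json.loads(self._recv_exact(n))

    def put(self, key, value): return self._call("PUT", key, value)
    def get(self, key):        return self._call("GET", key)
    def delete(self, key):     return self._call("DELETE", key)

# Usage, assuming a drive or emulator listening at this (made-up) address:
# client = KVDriveClient("192.168.1.50")
# client.put("user:42", "alice"); print(client.get("user:42"))
```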

SLIDE 41

Measure Performance of LevelDB

  • Install LevelDB on a server with conventional drives
  • Run a common benchmark and test the performance (a minimal microbenchmark sketch follows the figure)
    – YCSB? Other benchmarks?
  • Performance metrics
    – Throughput, latency, reads, or writes?

[Figure: two test setups — a client over Ethernet to a key-value server running LevelDB on conventional SATA drives, vs. a client over Ethernet using the Kinetic API to Kinetic drives that run LevelDB internally; both driven by YCSB or other benchmarks.]
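A minimal sequential put/get microbenchmark sketch for the conventional-drive setup. It assumes the third-party plyvel binding for LevelDB is installed (pip install plyvel) and uses a throwaway database path; YCSB would exercise richer mixed workloads, but this shows the basic throughput and latency measurement.

```python
import os, time, plyvel

N = 100_000
db = plyvel.DB("/tmp/leveldb-bench", create_if_missing=True)

t0 = time.perf_counter()
for i in range(N):
    db.put(f"key{i:08d}".encode(), os.urandom(100))   # 100-byte values
write_s = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(N):
    db.get(f"key{i:08d}".encode())
read_s = time.perf_counter() - t0
db.close()

print(f"writes: {N/write_s:,.0f} ops/s, avg latency {1e6*write_s/N:.1f} us")
print(f"reads:  {N/read_s:,.0f} ops/s, avg latency {1e6*read_s/N:.1f} us")
```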

SLIDE 42

New Type of Tape Drives

slide-43
SLIDE 43

Why Tape Drives

SLIDE 44

Archival Storage Devices

SLIDE 45

Tape Cartridge

SLIDE 46

Tape Model

SLIDE 47

Write Order Optimization

SLIDE 48

Thank You! Questions?