Data Management Systems, Storage Management: The Memory Hierarchy (PowerPoint presentation)



SLIDE 1

Data Management Systems

  • Storage Management
    • Memory hierarchy
    • Segments and file storage
    • Database buffer cache
    • Storage techniques in context
  • The Memory hierarchy
    • Capacity and latencies
    • Locality and replacement policies
    • Hardware evolution

Gustavo Alonso, Institute of Computing Platforms, Department of Computer Science, ETH Zürich

Storage - Memory Hierarchy 1

SLIDE 2

In an ideal world …

  • The database should have an unlimited amount of memory, with plenty of bandwidth for sequential and concurrent access, very low latencies for random accesses, persistence over time, and all at a low cost
  • Instead, databases provide the illusion of large memory capacity and try to hide the performance problems created by implementing all those desirable properties through complex architectures and optimizations


SLIDE 3

The memory wall

  • Main memory suffers from several issues:
    • There is never enough of it (application growth)
    • Memory outside the CPU chip (DRAM) is much slower than memory located in the CPU => memory wall
    • Processor-memory gap: processor speeds increased much faster than memory speeds
    • Price becomes a problem in the context of data management (DRAM is expensive)
    • Main memory is not persistent
  • Over time, a complex hierarchy evolved trying to address all these issues


SLIDE 4

[Figure: the memory hierarchy, from fastest/smallest to slowest/largest: CPU registers, caches, main memory (DRAM), external storage (local persistent storage), external storage (remote persistent storage), archive storage]

SLIDE 5

Looking at the memory hierarchy

  • The memory hierarchy is a rather complex construct affected by many parameters:
    • Capacity
    • Cost
    • Latency
    • Bandwidth
  • It keeps evolving as the parameters of each component change over time
  • It keeps evolving as new technology becomes available
  • Disclaimer: numbers are provided as a reference (they vary a lot)


SLIDE 6

Capacity

  • CPU registers: 64-bit architecture; 16 x 64-bit general purpose, 32 x 512-bit AVX
  • Caches: L1i 32 KB, L1d 32 KB; L2 256 KB - 1 MB; L3 8 MB - 45 MB
  • Main memory (DRAM): 1 to 1000 GB
  • External storage (local persistent storage): a few terabytes
  • External storage (remote persistent storage): many terabytes
  • Archive storage: petabytes

SLIDE 7

Latency

  • CPU registers: sub-nanosecond (1 cycle)
  • Caches: L1 0.5-1 ns, L2 4-8 ns, L3 15-30 ns
  • Main memory (DRAM): ~100 ns
  • External storage (local persistent storage): microseconds (SSD), milliseconds (HDD)
  • External storage (remote persistent storage): milliseconds
  • Archive storage: seconds to minutes
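To make these gaps concrete, a quick back-of-the-envelope rescaling using the rough reference latencies above (they vary a lot; this is a sketch, not a measurement):

```python
# If one nanosecond of latency were one second of human time,
# how long would an access at each level of the hierarchy take?
# Latencies are the rough reference values from the slide.
LATENCY_NS = {
    "L1 cache": 1,
    "L3 cache": 20,
    "DRAM": 100,
    "SSD": 100_000,       # ~100 microseconds
    "HDD": 10_000_000,    # ~10 milliseconds
}

for level, ns in LATENCY_NS.items():
    seconds = ns  # 1 ns of latency -> 1 s of human time
    if seconds < 3600:
        print(f"{level}: {seconds:,} s")
    else:
        print(f"{level}: {seconds / 86_400:.1f} days")
```

At this scale an L1 hit takes a second, a DRAM access takes well over a minute, and a single HDD seek takes months, which is why hiding lower-layer latency dominates database design.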

SLIDE 8

Access

  • CPU registers, caches, main memory (DRAM): byte addressable, random access
  • External storage (local and remote) and archive storage: block addressable, sequential access

SLIDE 9

What does this all mean?

  • The performance gaps between layers are huge (difficult to imagine at human scales)
  • We process an increasing amount of data, resulting in even more pressure on the memory system
  • Data movement is one of the major sources of energy consumption and inefficiency in modern computers (and data centers)
  • Performance and efficiency are largely determined by how well the database manages the movement of data across the hierarchy


SLIDE 10

Locality (spatial and temporal)

  • The unit of transfer between layers in the memory hierarchy is typically fixed
  • To improve performance, it is important to exploit:
    • Spatial locality (put together what belongs together)
    • Temporal locality (do at the same time things that require the same data)
  • Managing the hierarchy amounts to improving spatial and temporal locality


[Figure: a table T with columns A-E and a fixed transfer unit; the queries SELECT * FROM T, SELECT * FROM T WHERE X > 10, and SELECT * FROM T WHERE Y = 20 touch different subsets of the data]
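The role of the fixed transfer unit can be sketched with a toy model (all sizes invented for illustration): a table stored row by row pays for whole blocks even when a query needs only one column, while a layout with better spatial locality touches far fewer blocks.

```python
# Toy model: how many fixed-size transfer units (blocks) a scan touches.
# All sizes are invented for illustration.
NUM_ROWS = 1000
NUM_COLS = 5        # columns A..E, as in the slide's figure
VALUE_SIZE = 8      # bytes per value
BLOCK_SIZE = 4096   # bytes per transfer unit

def blocks_touched(bytes_needed: int) -> int:
    # A partially used block must still be transferred in full.
    return -(-bytes_needed // BLOCK_SIZE)  # ceiling division

# SELECT * FROM T: every byte is needed regardless of layout.
full_scan = blocks_touched(NUM_ROWS * NUM_COLS * VALUE_SIZE)

# Scanning one column with a row-major layout: the column's values are
# interleaved with the other columns, so the whole table is transferred.
one_col_row_major = blocks_touched(NUM_ROWS * NUM_COLS * VALUE_SIZE)

# Scanning one column when its values are stored contiguously
# (good spatial locality): only that column's blocks move.
one_col_contiguous = blocks_touched(NUM_ROWS * VALUE_SIZE)

print(full_scan, one_col_row_major, one_col_contiguous)  # 10 10 2
```

Here placing the column's values together cuts the transfers by 5x, which is the intuition behind column-oriented storage layouts.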

SLIDE 11

What needs to be done?

  • Enhance temporal and spatial locality (data organization, query scheduling)
  • Make sure the data is available at the layer where it is needed, to hide the latency caused by getting data from lower layers (pre-fetching)
  • Be clever about what to keep at each layer (caching strategies, replacement strategies)
  • Keep track of modifications and write back to the lower layers (all the way to persistent storage) when needed
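The last two points can be sketched together: a minimal buffer layer that caches pages, evicts the least recently used page when full, and writes modified (dirty) pages back to the lower layer on eviction. This is an illustrative sketch of the general technique, not how any particular system implements its buffer cache; `storage` here stands in for any dict-like lower layer.

```python
from collections import OrderedDict

class BufferPool:
    """Minimal LRU cache with write-back of dirty pages (sketch only)."""

    def __init__(self, storage: dict, capacity: int):
        self.storage = storage        # lower, persistent layer
        self.capacity = capacity
        self.pages = OrderedDict()    # page_id -> (data, dirty)

    def read(self, page_id):
        if page_id not in self.pages:                # miss: fetch from below
            self._admit(page_id, self.storage[page_id], dirty=False)
        self.pages.move_to_end(page_id)              # most recently used
        return self.pages[page_id][0]

    def write(self, page_id, data):
        self._admit(page_id, data, dirty=True)       # modified only in cache
        self.pages.move_to_end(page_id)

    def _admit(self, page_id, data, dirty):
        if page_id not in self.pages and len(self.pages) >= self.capacity:
            victim, (vdata, vdirty) = self.pages.popitem(last=False)  # LRU
            if vdirty:                               # write back on eviction
                self.storage[victim] = vdata
        self.pages[page_id] = (data, dirty)

storage = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(storage, capacity=2)
pool.read(1)         # miss: page 1 fetched from the lower layer
pool.write(2, "B")   # page 2 modified only in the cache for now
pool.read(3)         # evicts page 1 (clean: nothing written back)
pool.read(1)         # evicts page 2 (dirty: "B" written back below)
print(storage[2])    # the lower layer now sees the update
```

Real buffer managers add pinning, concurrency control, and smarter replacement and pre-fetching policies on top of this basic miss/evict/write-back cycle.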


SLIDE 12

Reality is complex and getting even more so

  • Managing the memory hierarchy was never easy:
    • No perfect solution
    • Workload dependent
    • Many compromises needed
  • The problem is becoming far more involved due to architectural developments:
    • Multicore and NUMA
    • Non-Volatile Memory
    • Cloud computing and economies of scale
    • Network attached storage
    • Hardware acceleration


SLIDE 13

Multicore and NUMA


AMD Bulldozer

SLIDE 14

Non-Volatile Memory (NVM)

[Figure: the memory hierarchy again, with NVM added between main memory (DRAM) and local persistent storage]

Non-Volatile Memory is a new form of memory combining characteristics of DRAM and persistent storage:

  • Cheaper than DRAM
  • Byte addressable
  • Random access
  • Persistent
  • Faster than disks
  • Can be used as:
    • Memory
    • Local disk
    • Network attached storage


SLIDE 15

Cloud computing

  • The ephemeral nature of the computing infrastructure forces a separation of compute and storage
  • Gives more flexibility to the cloud provider
  • Has changed the nature of “disk” and “storage” in fundamental ways
  • Crucial for cloud native databases


[Figure: a compute layer connected to a storage layer through the network]

SLIDE 16

Network attached storage

  • The bandwidth and latencies of storage devices are not very high
  • Motivated by cloud designs, networks are becoming faster and have more bandwidth
  • The round trip time in a data center is less than a seek operation on an HDD
  • RDMA (Remote Direct Memory Access) reduces latencies by removing OS-related inefficiencies
  • Eventually it might be faster to get data from the memory of a remote machine or from a remote storage device than from a local disk
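A back-of-the-envelope check of the last point, using the ballpark latencies from the earlier slides (assumed reference values, not measurements):

```python
# Ballpark latencies in nanoseconds; assumed reference values only.
HDD_SEEK_NS = 10_000_000        # ~10 ms for a mechanical seek
DATACENTER_RTT_NS = 100_000     # ~100 us round trip inside a data center
REMOTE_DRAM_READ_NS = DATACENTER_RTT_NS + 100   # network hop + DRAM access

# Reading a page from a remote machine's memory vs. a local HDD seek:
speedup = HDD_SEEK_NS / REMOTE_DRAM_READ_NS
print(f"remote memory is roughly {speedup:.0f}x faster than a local HDD seek")
```

Under these assumed numbers the network round trip, not the remote DRAM access, dominates the remote read, and it is still about two orders of magnitude cheaper than a local mechanical seek.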


SLIDE 17

Hardware Acceleration


Oracle M7 SPARC processor

SLIDE 18

Summary

  • Dealing with the memory hierarchy is a key aspect of the architecture of data management systems
  • Very old problem, still relevant
  • Many fundamental concepts are still applicable today due to the way systems are evolving
