
Data Management Systems: Storage Management - The Memory Hierarchy



  1. Data Management Systems • Storage Management • The Memory Hierarchy
     Outline: Memory hierarchy • Capacity and latencies • Segments and file storage • Locality and replacement policies • Database buffer cache • Hardware evolution • Storage techniques in context
     Gustavo Alonso, Institute of Computing Platforms, Department of Computer Science, ETH Zürich

  2. In an ideal world … the database would have an unlimited amount of memory, with plenty of bandwidth for sequential and concurrent access, very low latencies for random accesses, persistence over time, and all of it at a low cost. Instead, databases provide the illusion of a large memory capacity and try to hide the performance problems created by implementing all those desirable properties through complex architectures and optimizations.

  3. The memory wall • Main memory suffers from several issues: • There is never enough of it (application growth) • Memory outside the CPU chip (DRAM) is much slower than memory located in the CPU => memory wall • Processor-memory gap: processor speeds increased much faster than memory speeds • Price becomes a problem in the context of data management (DRAM is expensive) • Main memory is not persistent • Over time, a complex hierarchy evolved trying to address all these issues

  4. The memory hierarchy (top to bottom): CPU registers → caches → main memory (DRAM) → external storage (local persistent storage) → external storage (remote persistent storage) → archive storage

  5. Looking at the memory hierarchy • The memory hierarchy is a rather complex construct affected by many parameters • Capacity • Cost • Latency • Bandwidth • It keeps evolving as the parameters of each component change over time • It keeps evolving as new technology becomes available • Disclaimer: numbers provided as a reference (they vary a lot)

  6. Capacity (64-bit architecture)
     • CPU registers: 16 x 64-bit general purpose, 32 x 512-bit AVX
     • Caches: L1i 32 KB, L1d 32 KB, L2 256 KB - 1 MB, L3 8 MB - 45 MB
     • Main memory (DRAM): 1 to 1000 GB
     • External storage (local persistent storage): a few terabytes
     • External storage (remote persistent storage): many terabytes
     • Archive storage: petabytes

  7. Latency
     • CPU registers: sub-nanosecond (1 cycle)
     • Caches: L1 0.5-1 ns, L2 4-8 ns, L3 15-30 ns
     • Main memory (DRAM): ~100 ns
     • External storage (local persistent storage): microseconds (SSD), milliseconds (HDD)
     • External storage (remote persistent storage): milliseconds
     • Archive storage: seconds to minutes
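To make these latency numbers tangible, here is a minimal C++ sketch (illustrative only, not part of the slides) that contrasts a sequential scan, where hardware prefetching hides most of the DRAM latency, with dependent random accesses (pointer chasing) over an array far larger than the caches. The array size and the single-cycle permutation setup are assumptions chosen for the illustration; absolute numbers vary across machines.

```cpp
// Toy microbenchmark: sequential scan vs. dependent random access (pointer
// chasing) over an array that is much larger than the last-level cache.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 26;   // ~64M entries (~512 MB), >> L3
    std::vector<std::size_t> next(n);

    // Sattolo's algorithm: a random single-cycle permutation, so the chase
    // below visits every element exactly once in a cache-unfriendly order.
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 gen{42};
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(gen)]);
    }

    auto time_ns = [](auto&& work) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::nano>(t1 - t0).count();
    };

    volatile std::size_t sink = 0;   // keep the compiler from removing the loops

    // Sequential scan: contiguous accesses, prefetching hides DRAM latency.
    double seq = time_ns([&] {
        std::size_t s = 0;
        for (std::size_t i = 0; i < n; ++i) s += next[i];
        sink = s;
    });

    // Pointer chasing: each load depends on the previous one, exposing the
    // full main-memory latency on every miss.
    double rnd = time_ns([&] {
        std::size_t p = 0;
        for (std::size_t i = 0; i < n; ++i) p = next[p];
        sink = p;
    });

    std::cout << "sequential:       " << seq / n << " ns per element\n"
              << "dependent random: " << rnd / n << " ns per element\n";
    (void)sink;
}
```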

  8. Access
     • CPU registers, caches, main memory (DRAM): byte addressable, random access
     • External storage (local and remote persistent storage), archive storage: block addressable, sequential access
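A small sketch (not from the slides) of what the byte vs. block addressability difference means for a database: reading one 4-byte value from DRAM is a single load, while reading it from a local disk or SSD means transferring the whole enclosing page first. The file, the 4 KiB page size, and the helper names are assumptions for illustration; error handling is omitted.

```cpp
// Byte-addressable vs. block-addressable access (POSIX I/O assumed).
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

constexpr std::size_t kPageSize = 4096;   // assumed transfer unit of the storage layer

// Main memory: the i-th value of an in-memory column is one load instruction away.
std::uint32_t read_in_memory(const std::vector<std::uint32_t>& column, std::size_t i) {
    return column[i];
}

// Persistent storage: the smallest unit that can move is a page, so the whole
// 4 KiB page containing the value is read and the value is picked out of it.
std::uint32_t read_from_storage(int fd, std::size_t i) {
    const std::size_t byte_offset = i * sizeof(std::uint32_t);
    const std::size_t page_start  = (byte_offset / kPageSize) * kPageSize;
    std::vector<unsigned char> page(kPageSize);
    pread(fd, page.data(), kPageSize, static_cast<off_t>(page_start));  // page-granular I/O
    std::uint32_t value;
    std::memcpy(&value, page.data() + (byte_offset - page_start), sizeof(value));
    return value;
}
```

Calling read_from_storage for a single value moves 4096 bytes to obtain 4 of them, which is exactly why the locality issues discussed on the following slides matter so much.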

  9. What does this all mean? • The performance gaps between layers are huge (difficult to imagine at human scales) • We process an increasing amount of data, resulting in even more pressure on the memory system • Data movement is one of the major sources of energy consumption and inefficiencies in modern computers (and data centers) • Performance and efficiency are largely determined by how well the database manages the movement of data across the hierarchy

  10. Locality (spatial and temporal) • The unit of transfer between layers in the memory hierarchy is typically fixed • To improve performance, it is important to exploit • Spatial locality (put together what belongs together) • Temporal locality (do at the same time things that require the same data) • Managing the hierarchy amounts to improving spatial and temporal locality [Figure: the queries SELECT * FROM T WHERE X > 10, SELECT * FROM T, and SELECT * FROM T WHERE Y = 20 accessing the same table, with the fixed-size transfer unit highlighted]
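As a concrete example of spatial locality (illustrative only, not from the slides): the two loops below compute the same sum over a table stored row by row, but the row-order loop uses every byte of each transfer unit it touches, while the column-order loop jumps through memory and wastes most of every cache line or page it pulls in. The table shape and function names are assumptions.

```cpp
// Spatial locality: same work, very different use of the fixed transfer unit.
#include <cstddef>
#include <vector>

constexpr std::size_t ROWS = 100000, COLS = 128;   // illustrative table shape

// Row order: consecutive accesses fall into the same cache line / page.
long long sum_row_order(const std::vector<int>& t) {
    long long s = 0;
    for (std::size_t r = 0; r < ROWS; ++r)
        for (std::size_t c = 0; c < COLS; ++c)
            s += t[r * COLS + c];
    return s;
}

// Column order: consecutive accesses are COLS * sizeof(int) bytes apart, so
// almost every access brings in a transfer unit that is barely used.
long long sum_column_order(const std::vector<int>& t) {
    long long s = 0;
    for (std::size_t c = 0; c < COLS; ++c)
        for (std::size_t r = 0; r < ROWS; ++r)
            s += t[r * COLS + c];
    return s;
}
```

Temporal locality is the scheduling counterpart: executing the slide's three SELECT statements over a single shared scan of T lets one transfer of each unit serve all of them, instead of fetching the same data once per query.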

  11. What needs to be done? • Enhance temporal and spatial locality (data organization, query scheduling) • Make sure the data is available at the layer where it is needed, to hide the latency of getting data from a lower layer (pre-fetching) • Be clever about what to keep at each layer (caching strategies, replacement strategies) • Keep track of modifications and write back to the lower layers (all the way to persistent storage) when needed; the last two points are sketched below
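The caching, replacement, and write-back bullets above are what a database buffer manager implements. Below is a minimal, illustrative C++ sketch of that idea: a tiny buffer pool with LRU replacement and dirty-page write-back. The frame count, the PageId type, and the function names are assumptions, not the interface of any particular system, and the actual page I/O is stubbed out.

```cpp
// Minimal buffer pool: keep hot pages in DRAM, evict the least recently used
// frame when full, and write modified pages back to persistent storage.
#include <array>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

using PageId = std::uint64_t;
constexpr std::size_t kPageSize  = 4096;
constexpr std::size_t kNumFrames = 64;     // DRAM budget of the cache (assumed)

struct Frame {
    PageId id = 0;
    bool   dirty = false;                  // modified since it was brought in?
    std::array<std::byte, kPageSize> data{};
};

class BufferPool {
public:
    // Return the in-memory frame for a page: reuse it on a hit, otherwise read
    // it from the lower layer, evicting the least recently used frame if full.
    Frame& fetch(PageId id) {
        if (auto it = map_.find(id); it != map_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);   // hit: move to front
            return *it->second;
        }
        if (lru_.size() == kNumFrames) {                   // miss and pool full
            Frame& victim = lru_.back();
            if (victim.dirty) write_to_storage(victim);    // write back changes
            map_.erase(victim.id);
            lru_.pop_back();
        }
        lru_.emplace_front();
        lru_.front().id = id;
        read_from_storage(lru_.front());                   // fill from lower layer
        map_[id] = lru_.begin();
        return lru_.front();
    }

    // Callers that modify a page mark it, so eviction knows to write it back.
    void mark_dirty(PageId id) { map_.at(id)->dirty = true; }

private:
    void read_from_storage(Frame&) { /* pread() of the page would go here  */ }
    void write_to_storage(Frame&)  { /* pwrite() of the page would go here */ }

    std::list<Frame> lru_;                                  // front = most recently used
    std::unordered_map<PageId, std::list<Frame>::iterator> map_;
};
```

Pre-fetching would extend fetch() to also request the pages likely to be needed next (for example, the following pages of a sequential scan) before they are asked for.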

  12. Reality is complex and getting even more so • Managing the memory hierarchy was never easy • No perfect solution • Workload dependent • Many compromises needed • Problem is becoming far more involved due to architectural developments • Multicore and NUMA • Non-Volatile Memory • Cloud computing and economies of scale • Network attached storage • Hardware Acceleration

  13. Multicore and NUMA [Figure: AMD Bulldozer]

  14. Non-Volatile Memory (NVM) • Non-volatile memory is a new form of memory combining characteristics of DRAM and persistent storage: • Cheaper than DRAM • Byte addressable • Random access • Persistent • Faster than disks • Can be used as memory, as a local disk, or network attached [Figure: NVM inserted into the hierarchy between main memory (DRAM) and local persistent storage]
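To illustrate "byte addressable, random access, persistent" in one place (illustrative only, not from the slides): the sketch below keeps a counter in a memory-mapped file and updates it with an ordinary store instruction. On real NVM the file would live on a DAX-mounted persistent-memory device and durability would rely on cache-line flush instructions; here an ordinary file plus msync() stands in for that, the file name is a made-up example, and error handling is omitted.

```cpp
// A persistent, byte-addressable counter via mmap (stand-in for real NVM).
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    // Hypothetical backing file; with NVM this would sit on a DAX mount.
    int fd = open("counter.pmem", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, sizeof(std::uint64_t));            // reserve room for one value

    // Map the file into the address space: from here on it is just memory.
    auto* counter = static_cast<std::uint64_t*>(
        mmap(nullptr, sizeof(std::uint64_t),
             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    ++*counter;                                      // byte-addressable, in-place update
    msync(counter, sizeof(std::uint64_t), MS_SYNC);  // make the update durable

    munmap(counter, sizeof(std::uint64_t));
    close(fd);
}
```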

  15. Cloud computing • The ephemeral nature of the computing infrastructure forces a separation of compute and storage • Gives more flexibility to the cloud provider • Has changed the nature of “disk” and “storage” in fundamental ways • Crucial for cloud-native databases [Figure: a compute layer connected to a storage layer over the network]

  16. Network attached storage • The bandwidth and latency of local storage devices are not particularly good • Motivated by cloud designs, networks are becoming faster and offer more bandwidth • The round-trip time within a data center is shorter than a seek operation on an HDD • RDMA (Remote Direct Memory Access) reduces latencies by removing OS-related inefficiencies • Eventually it might be faster to get data from the memory of a remote machine or from a remote storage device than from a local disk

  17. Hardware Acceleration [Figure: Oracle SPARC M7 processor]

  18. Summary • Dealing with the memory hierarchy is a key aspect of the architecture of data management systems • Very old problem, still relevant • Many fundamental concepts still applicable today due to the way systems are evolving
