
DATA ANALYTICS USING DEEP LEARNING
GT 8803 // FALL 2019 // JOY ARULRAJ
LECTURE #06: DISK-CENTRIC AND IN-MEMORY DATABASE SYSTEMS

ADMINISTRIVIA
• Project ideas
  – List shared on Piazza
  – Start looking for team-mates!
  – Sign up for discussion slots during office hours

LAST CLASS
• History of DBMSs
  – In a way though, it really was a history of data models
• Data Models
  – Hierarchical data model (tree) (IMS)
  – Network data model (graph) (CODASYL)
  – Relational data model (tables) (System R, INGRES)
• Overarching theme about all these systems
  – They were all disk-based DBMSs

TODAY’S AGENDA
• Disk-centric DBMSs
• In-Memory DBMSs

DISK-CENTRIC DBMSs

ANATOMY OF A DATABASE SYSTEM
• Process Manager
  – Connection Manager + Admission Control
• Query Processor
  – Query Parser
  – Query Optimizer
  – Query Executor
• Transactional Storage Manager
  – Lock Manager (Concurrency Control)
  – Access Methods (or Indexes)
  – Buffer Pool Manager
  – Log Manager
• Shared Utilities
  – Memory Manager + Disk Manager
  – Networking Manager
Source: Anatomy of a Database System

ANATOMY OF A DATABASE SYSTEM
• Process Manager
  – Manages client connections
• Query Processor
  – Parse, plan and execute queries on top of storage manager
• Transactional Storage Manager
  – Knits together buffer management, concurrency control, logging and recovery
• Shared Utilities
  – Manage hardware resources across threads

TOPICS
• Implications of availability of large DRAM chips for database systems
  – Buffer Management
  – Query Processing
  – Concurrency Control
  – Logging and Recovery

BACKGROUND
• Much of the history of DBMSs is about dealing with the limitations of hardware.
• Hardware was much different when the original DBMSs were designed:
  – Uniprocessor (single-core CPU)
  – RAM was severely limited (few MB).
  – The database had to be stored on disk.
  – Disk is slow. No seriously, I mean really slow.

BACKGROUND
• But now DRAM capacities are large enough that most databases can fit in memory.
  – Structured data sets are smaller (e.g., tables with numeric data).
  – Unstructured data sets are larger (e.g., videos).
• So why not just use a "traditional" disk-oriented DBMS with a really large cache?

DISK-ORIENTED DBMS OVERHEAD
Measured CPU Instructions
• BUFFER POOL: 34%
• LATCHING: 14%
• LOCKING: 16%
• LOGGING: 12%
• B-TREE KEYS: 16%
• REAL WORK: 7%
Source: OLTP THROUGH THE LOOKING GLASS, AND WHAT WE FOUND THERE, SIGMOD, pp. 981-992, 2008.

BUFFER MANAGEMENT
• The primary storage location of the database is on non-volatile storage (e.g., SSD).
  – The database is stored in a file as a collection of fixed-length blocks called slotted pages on disk.
• The system uses a volatile in-memory buffer pool to cache blocks fetched from disk.
  – Its job is to manage the movement of those blocks back and forth between disk and memory.

BUFFER MANAGEMENT
• When a query accesses a page, the DBMS checks whether that page is already in memory in the buffer pool.
  – If it’s not, then the DBMS has to retrieve it from disk and copy it into a free frame in the buffer pool.
  – If there are no free frames, then the DBMS picks a page to evict, guided by the page replacement policy.
  – If the page being evicted is dirty, then the DBMS has to write it back to disk to ensure the durability (the D in ACID) of data.
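The lookup, fetch, evict and write-back steps above can be sketched as a toy buffer pool. This is only an illustration under stated assumptions: the class name `BufferPool`, the dict standing in for the on-disk file, the 4 KB page size and the choice of LRU as the replacement policy are all hypothetical and not taken from the lecture.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class BufferPool:
    """Toy buffer pool: a fixed number of frames with LRU eviction."""

    def __init__(self, disk, num_frames):
        self.disk = disk             # dict: page_id -> bytes (stand-in for the database file)
        self.num_frames = num_frames
        self.frames = OrderedDict()  # page_id -> bytearray, least recently used first
        self.dirty = set()           # page ids modified since they were loaded
        self.disk_reads = 0
        self.disk_writes = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: the page is already cached, just refresh its recency.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            # No free frame: make room before loading the new page.
            self._evict()
        # Miss: copy the page from "disk" into a free frame.
        self.frames[page_id] = bytearray(self.disk[page_id])
        self.disk_reads += 1
        return self.frames[page_id]

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def _evict(self):
        # The replacement policy (LRU here) chooses the victim.
        victim_id, data = self.frames.popitem(last=False)
        if victim_id in self.dirty:
            # A dirty victim must be written back before its frame is reused.
            self.disk[victim_id] = bytes(data)
            self.disk_writes += 1
            self.dirty.discard(victim_id)


disk = {i: bytes(PAGE_SIZE) for i in range(3)}
pool = BufferPool(disk, num_frames=2)
page = pool.get_page(0)
page[0] = 42
pool.mark_dirty(0)
pool.get_page(1)
pool.get_page(2)  # pool is full: page 0 is evicted and written back
```

After the last call, page 0 is no longer cached and its modified contents are in `disk`. A real system would also have to handle pinned pages, concurrency and logging, which this sketch leaves out.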

BUFFER MANAGEMENT
• Page replacement policy is a differentiating factor between open-source and commercial DBMSs.
  – What kind of data does it contain?
  – Is the page dirty?
  – How likely is the page to be accessed in the near future?
  – Examples: LRU, LFU, CLOCK, ARC
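As one concrete instance of the policies listed above, here is a minimal sketch of CLOCK (second-chance) victim selection. The class and method names are hypothetical, and the sketch only approximates recency of access; it ignores the other questions on the slide, such as whether a page is dirty or what kind of data it holds.

```python
class ClockReplacer:
    """CLOCK (second-chance) victim selection over a fixed set of frames."""

    def __init__(self, num_frames):
        self.ref = [False] * num_frames  # one reference bit per frame
        self.hand = 0                    # position of the clock hand

    def touch(self, frame):
        # Called on every access to the page held in this frame.
        self.ref[frame] = True

    def pick_victim(self):
        # Sweep the hand until a frame with a cleared reference bit is found.
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.ref)
            if self.ref[frame]:
                self.ref[frame] = False  # recently used: give it a second chance
            else:
                return frame


replacer = ClockReplacer(num_frames=3)
replacer.touch(0)
replacer.touch(1)
first_victim = replacer.pick_victim()  # frames 0 and 1 are spared, frame 2 is chosen
```

Compared with exact LRU, CLOCK needs only one bit per frame and no reordering on each access, at the cost of a coarser estimate of which page is least likely to be needed.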

BUFFER MANAGEMENT
• Once the page is in memory, the DBMS translates any on-disk addresses to their in-memory addresses.
  – Example: Page Identifier [#100] → Page Pointer [0x5050]

BUFFER MANAGEMENT
[Figure: Index (entries hold Page Id + Slot #), Page Table, Buffer Pool, and Database (On-Disk) stored as Slotted Pages]

BUFFER MANAGEMENT
• Every tuple access has to go through the buffer pool manager regardless of whether that data will always be in memory.
  – Always have to translate a tuple’s record id to its memory location.
  – Worker thread has to pin pages that it needs to make sure that they are not swapped to disk.
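The per-access cost described above can be made concrete with a small sketch of record id translation and pinning. All names here (`RecordId`, `Frame`, `PageTable`, `read_tuple`) are hypothetical, a dict stands in for a slotted page, and a missing page simply raises `KeyError` instead of triggering a disk read.

```python
from collections import namedtuple

# A record id is a page id plus a slot number within that page.
RecordId = namedtuple("RecordId", ["page_id", "slot"])


class Frame:
    def __init__(self, tuples):
        self.tuples = tuples  # slot number -> tuple (stand-in for a slotted page)
        self.pin_count = 0    # number of workers currently using this page


class PageTable:
    """Maps page ids to in-memory frames and tracks pins."""

    def __init__(self):
        self.frames = {}      # page_id -> Frame

    def pin(self, page_id):
        frame = self.frames[page_id]  # KeyError here would model a buffer pool miss
        frame.pin_count += 1
        return frame

    def unpin(self, page_id):
        frame = self.frames[page_id]
        assert frame.pin_count > 0
        frame.pin_count -= 1

    def evictable(self, page_id):
        # A pinned page must not be chosen as an eviction victim.
        return self.frames[page_id].pin_count == 0


def read_tuple(table, rid):
    # Every access pays for a page table lookup plus a pin and an unpin,
    # even when the page would never leave memory.
    frame = table.pin(rid.page_id)
    try:
        return frame.tuples[rid.slot]
    finally:
        table.unpin(rid.page_id)


table = PageTable()
table.frames[100] = Frame({0: ("alice", 1), 1: ("bob", 2)})
row = read_tuple(table, RecordId(page_id=100, slot=1))
```

Even in this toy form, a single tuple read involves an indirection and two pin count updates; in a multi-threaded system those updates would also need latching.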


BUFFER MANAGEMENT
• Q: What do we gain by managing an in-memory buffer?
  – A: Accelerate query processing by storing frequently-accessed pages in fast memory
• Q: Can we “learn” an optimal page replacement policy?
  – A: Recent paper from Google on learning memory accesses based on LSTM models.
