Storage and Indexing (Appendix D, Chapter 10B Kroenke) 1 Database - PDF document

Storage and Indexing (Appendix D, Chapter 10B – Kroenke) 1 Database Design Process  Requirements analysis  Conceptual design  data model  Logical design  Schema refinement: Normalization  Physical tuning Ramakrishnan, Gehrke: Database Management Systems 2 1

Goals  Query Execution  Indexing Ramakrishnan, Gehrke: Database Management Systems 3 Disks and Files  Basic data abstraction - File - collection of records  DBMS store data on (“hard”) disks  Why not main memory?  Why not tapes?  Data is stored and retrieved in units called disk blocks or pages .  Unlike RAM, time to retrieve a disk page varies depending upon location on disk.  Therefore, relative placement of pages on disk has major impact on DBMS performance!  Major cost component: I/O time Ramakrishnan, Gehrke: Database Management Systems 4 2

Class Exercise  Consider a disk with average I/O time 20msec and page size = 1024 bytes  Table: 200,000 rows of 100 bytes each, no row spans 2 pages  Find:  Number of pages needed to store the table  Time to read all rows sequentially  Worst case time to read all rows in some random order Ramakrishnan, Gehrke: Database Management Systems 5 Class Exercise Solution  Number of pages for table:  Nb rows per page = floor(page size/ row size) = floor(1024/100) = 10  Nb pages to store all rows = ceiling(nb rows/nb rows per page) = ceiling(200000/10) = 20000 pages  Time to read all rows sequentially  nb pages*time per page = 20000*20msec = 400sec ~6.6min  Time to read all rows in some random order:  Note: Since memory is limited, in the worst case, every time we need to read a row, the page holding the row needs to be brought in from disk to memory  Worst case time: nb rows * time per page = 4000 sec ~66.6 min Ramakrishnan, Gehrke: Database Management Systems 6 3

Queries  Equality queries: SELECT * FROM Product WHERE BarCode = 10002121  Range queries: SELECT * FROM Product WHERE Price BETWEEN 5 and 15  Assume: 200,000 rows in table – 20000 pages on disk  Need indexes to allow fast access to data Ramakrishnan, Gehrke: Database Management Systems 7 Indexes  An index on a file speeds up selections on the search key columns  Any subset of the columns of a table can be the search key for an index on the table Ramakrishnan, Gehrke: Database Management Systems 8 4

Hash Index Constant search time Equality queries only Ramakrishnan, Gehrke: Database Management Systems 9 B+ Tree Index O(log d N) search time d – fan-out (~150) N – number of data entries Supports range queries Ramakrishnan, Gehrke: Database Management Systems 10 5

Example B+ Tree  Find 28*? 29*? All > 15* and < 30*  Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.  Change sometimes bubbles up the tree Ramakrishnan, Gehrke: Database Management Systems 11 Index Classification  Clustered vs. unclustered : If order of rows on hard-disk is the same as order of data entries, then called clustered index.  A file can be clustered on at most one search key.  Cost of retrieving data records through index varies greatly based on whether index is clustered or not! Ramakrishnan, Gehrke: Database Management Systems 12 6

Clustered vs. Unclustered Ramakrishnan, Gehrke: Database Management Systems 13 CREATE INDEX in MySQL  CREATE [UNIQUE] INDEX index_name [USING index_type ] ON tbl_name ( col_name ,...) index_type BTREE | HASH Example: CREATE INDEX I_ItemPrice USING BTREE ON Items (Price) SELECT * FROM Product WHERE Price between 5 and 10 SELECT * FROM Product WHERE BarCode = 100111 Ramakrishnan, Gehrke: Database Management Systems 14 7

Use Indexes – Decisions to Make  What indexes should we create?  Which tables should have indexes? What column(s) should be the search key? Should we build several indexes?  For each index, what kind of an index should it be?  Clustered? Hash/tree? Ramakrishnan, Gehrke: Database Management Systems 15 ICE: Compute query time  SELECT * FROM Product WHERE Price between 5 and 10 1. Is a hash index useful in this case? Why? 2. Compute time needed to evaluate query assuming 20% of data satisfies condition, disk with average I/O time 20msec, page size = 1024 bytes,200,000 rows of 100 bytes each, no row spans 2 pages 1. If no index exists 2. Clustered B+tree index on Price exists 3. Not-clustered B+tree index on Price exists 8

Index Selection Guidelines  Columns in WHERE clause are candidates for index keys.  Exact match condition suggests hash index.  Range query suggests tree index.  Try to choose indexes that benefit as many queries as possible.  At most one clustered index per table! Think of trade-offs before creating an index! Ramakrishnan, Gehrke: Database Management Systems 17 Examples  B+ tree index on E.age can be used to get qualifying tuples.  How selective is the condition?  Is the index clustered?  Consider the GROUP BY query.  If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.  Clustered E.dno index may be better!  Equality queries and duplicates:  Clustering on E.hobby helps! Ramakrishnan, Gehrke: Database Management Systems 18 9

Class Exercise What index would you construct? 1. SELECT * FROM Mids WHERE Company = 6 2. SELECT CourseID, Count(*) FROM StudentsEnroll WHERE Company = 6 GROUP BY CourseID Ramakrishnan, Gehrke: Database Management Systems 19 Summary  Indexes are used to speed up queries  They can slow down inserts/deletes/updates  Can have several indexes on a given table, each with a different search key.  Indexes can be  Hash-based vs. Tree-based  Clustered vs. unclustered Ramakrishnan, Gehrke: Database Management Systems 20 10

Storage and Indexing (Appendix D, Chapter 10B Kroenke) 1 Database - PDF document

Storage and Indexing (Appendix D, Chapter 10B Kroenke) 1 Database Design Process Requirements analysis Conceptual design data model Logical design Schema refinement: Normalization Physical tuning Ramakrishnan, Gehrke:

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Storage and Indexing 11/19/2018 1 Overview We covered storage of unstructured files in HDFS

Storage and Indexing 1 Overview We covered storage of unstructured files in HDFS

SUSE Enterprise Storage 6 Darren Soothill EMEA Storage Technical Strategist Agenda

Solar Plus Storage Solar Plus Storage Focus on Storage Benefits Focus on Storage Benefits by

Hybrid SAN & Cluster Enterprise Network Storage Hikvision Enterprise Network Storage

INF5470 Fall 2012 Lecture 10: Analog Storage Content Overview Volatile Short Term Storage

Storage and Indexing DBS Database Systems Reading: R&G Chapters 8, 9 & 10.1 Implementing

Overview of Storage and Indexing [R&G] Chapter 8 CS4320 1 Data on External Storage

Index Unit EBLL case reported for a child under age six in a public housing unit What now?

Production of light flavor hadrons at intermediate and high p T measured with ALICE Michael Linus

Speech Processing 15-492/18-492 Human Speech Processing Phonetics and Phonology This lecture is

fname lname email company country state Kuiwen Ji jikuiwen@huawei.com Huawei CN Yun Pu

Learned Index Structures paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis

MySQL Index Cookbook Deep & Wide Index Tutorial Rick James Feb., 2015 TOC Preface

Architecture Case Study: 1 Key Word in Context (KWIC) Aims: To demonstrate key features

Turbocharge your MySQL analytics with ElasticSearch Guillaume Lefranc Data & Infrastructure

Storage and Indexing (Appendix D, Chapter 10B Kroenke) 1 Database - PDF document

Storage and Indexing (Appendix D, Chapter 10B Kroenke) 1 Database Design Process Requirements analysis Conceptual design data model Logical design Schema refinement: Normalization Physical tuning Ramakrishnan, Gehrke:

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing &amp; Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

&gt; SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Storage and Indexing 11/19/2018 1 Overview We covered storage of unstructured files in HDFS

Storage and Indexing 1 Overview We covered storage of unstructured files in HDFS

SUSE Enterprise Storage 6 Darren Soothill EMEA Storage Technical Strategist Agenda

Solar Plus Storage Solar Plus Storage Focus on Storage Benefits Focus on Storage Benefits by

Hybrid SAN &amp; Cluster Enterprise Network Storage Hikvision Enterprise Network Storage

INF5470 Fall 2012 Lecture 10: Analog Storage Content Overview Volatile Short Term Storage

Storage and Indexing DBS Database Systems Reading: R&amp;G Chapters 8, 9 &amp; 10.1 Implementing

Overview of Storage and Indexing [R&amp;G] Chapter 8 CS4320 1 Data on External Storage

Index Unit EBLL case reported for a child under age six in a public housing unit What now?

Production of light flavor hadrons at intermediate and high p T measured with ALICE Michael Linus

Speech Processing 15-492/18-492 Human Speech Processing Phonetics and Phonology This lecture is

fname lname email company country state Kuiwen Ji jikuiwen@huawei.com Huawei CN Yun Pu

Learned Index Structures paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis

MySQL Index Cookbook Deep &amp; Wide Index Tutorial Rick James Feb., 2015 TOC Preface

Architecture Case Study: 1 Key Word in Context (KWIC) Aims: To demonstrate key features

Turbocharge your MySQL analytics with ElasticSearch Guillaume Lefranc Data &amp; Infrastructure

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Hybrid SAN & Cluster Enterprise Network Storage Hikvision Enterprise Network Storage

Storage and Indexing DBS Database Systems Reading: R&G Chapters 8, 9 & 10.1 Implementing

Overview of Storage and Indexing [R&G] Chapter 8 CS4320 1 Data on External Storage

MySQL Index Cookbook Deep & Wide Index Tutorial Rick James Feb., 2015 TOC Preface

Turbocharge your MySQL analytics with ElasticSearch Guillaume Lefranc Data & Infrastructure