Quiz! How costly is this operation (naive solution)? course per - PDF document

Quiz! How costly is this operation (naive solution)? course per weekday hour room Database Systems TDA357 3 HC1 Monday 15:15 TDA357 3 HC1 Thursday 10:00 TDA357 2 HB1 Tuesday 08:00 n Indexes TDA357 2 HB1 Friday 13:15 TIN090 1 HC1 Wednesday 08:00 DBMSs and ”NoSQL” TIN090 1 HA3 Thursday 13:15 SELECT * Go through all n rows, compare FROM Lectures with the values for course and WHERE course = ’TDA357’ period = 2 n comparisons AND period = 3; 1 2 Quiz! Index • When relations are large, scanning all Can you think of a way to make it faster? rows to find matching tuples becomes very SELECT * expensive. FROM Lectures WHERE course = ’TDA357’ • An index on an attribute A of a relation is a AND period = 3; data structure that makes it efficient to find If rows were stored sorted according to the values those tuples that have a fixed value for course and period, we could get all rows with the given values faster (O(log n) for tree structure). attribute A. Storing rows sorted is expensive, but we can use – Example: a hash table gives amortized O(1) an index that given values of these attributes lookups. points out all sought rows (an index could be a hash map, giving O(1) complexity to lookups). 3 4 Quiz! Typical (abstract) costs • Some typical costs of disk accessing for Asymptotic complexity (O(x) notation) is database operations on a relation stored misleading here. Why? over n blocks: The asymptotic complexity works for data structures in main – Query the full relation: n (disk operations) memory. But when working with stored persistent data, the running time of the data structure, once in main memory, is – Query with the help of index: k, where k is the negligible compared to the time it takes to read data from number of blocks pointed to (1 for key). disk. What really matters to get fast lookups in a database is to minimize the number of disk blocks accessed (could use – Access index: 1 asymptotic complexity over disk block accessing though). – Insert new value: 2 (one read, one write) Indexes help here too though. If a relation is stored over a number of disk blocks, knowing in which of these to look is – Update index: 2 (one read, one write) helpful. 5 6 1

Quiz! Example: SELECT * FROM Lectures How costly is this operation? WHERE course = ’TDA357’ AND period = 3; SELECT * Lectures: n disk blocks FROM Lectures, Courses Assume Lectures is stored in n disk blocks. With no Courses: m disk blocks WHERE course = code; index to help the lookup, we must look at all rows, which means looking in all n disk blocks for a total No index: Index on code in Courses: Go through all n blocks in Lectures, Go through all n blocks in Lectures, cost of n . compare the value for course from compare the value for course from each row with the values for code in each row with the index. Since With an index, we find that there are 2 rows with the all rows of Courses, stored in all m course is a key, each value will exist correct values for the course and period attributes. at most once, so the cost is 2 * n + 1 blocks. The total cost is thus n * m These are stored in two different blocks, so the total accessed disk blocks (1 for fetching accessed disk blocks. the index once). cost is 3 (2 blocks + reading index). 7 8 CREATE INDEX Important properties • Most DBMS support the statement • Indexes are separate data stored by itself. CREATE INDEX index name � Can be created ON table ( attributes ); � on newly created relations � on existing relations – Example: - will take a long time on large relations. CREATE INDEX courseIndex � Can be dropped without deleting any table data. ON Courses (code); • SQL statements do not have to be – Statement not in the SQL standard, but most changed DBMS support it anyway. – Primary keys are given indexes implicitly (by – a DBMS automatically uses any indexes. the SQL standard). 9 10 Quiz! Rule of thumb Why don’t we have indexes on all attributes for • Mostly queries on tables – use indexes for faster lookups? key attributes. – Indexes require disk space. • Mostly updates – be careful with indexes! – Modifications of tables are more expensive. • Need to update both table and index. – Not always useful • The table is very small. • We don’t perform lookups over it (Note: lookups � queries). – Using an index costs extra disk block accesses. 11 12 2

Quiz! Example: Suppose that the Lectures relation is stored in 20 disk blocks, and that we typically perform three operations on this table: Assume we have an index on Lectures for (course, – insert new lectures (Ins) – list all lectures of a particular course (Q1) period, weekday) which is the key. How costly – list all lectures in a given room (Q2) are these queries? Lectures: n disk blocks SELECT * SELECT * Let’s assume that in an average week there are: FROM Lectures FROM Lectures – 2 lectures for each course, and WHERE course = ’TDA357’ WHERE weekday = ’Monday’ – 10 lectures in each room. AND period = 3; AND room = ’HC1’; Let’s also assume that A multi-attribute index is typically organized hierarchically. First the – each course has lectures stored in 2 blocks, and rows are indexed according to the first attribute, then according to the second within each group, and so on. – each room has lectures stored in 7 (some lectures are Thus the left query costs at most k + 1 where k is the number of stored in the same block). rows matching the values. The right query can’t use the index, and thus costs n , where n is the size of the relation in disk blocks. 13 14 Insert new lectures (Ins) Example continued: List all lectures of a particular course (Q1) Quiz! List all lectures in a given room (Q2) Index for Index for Indexes No index Both indexes • Indexes are incredibly useful (although they are (course, period, weekday) room not part of the SQL standard). Ins 2 4 4 6 Q1 20 3 20 3 • Doing it wrong is costly. Q2 20 20 8 8 • Requires knowledge about the internals of a cost 2 + 18p 1 + 18p 2 4 – p 1 + 16p 2 4 + 16p 1 + 4p 2 6 – 3p 1 + 2p 2 DBMS. The amortized cost depends on the distribution of the operations. p 1 is – How is data stored? How large is a block? proportion of operations that are Q 1 queries, p 2 similarly for Q 2 , and thus the proportion of operations that are Ins modifications is 1 – p 1 – p 2 . For some • A DBMS should be able to decide better than different values of p 1 and p 2 we get actual costs of: the user what indexes are needed, from usage analysis. 2 + 18p 1 + 18p 2 4 – p 1 + 16p 2 4 + 16p 1 + 4p 2 6 – 3p 1 + 2p 2 p 1 = p 2 = 0.4 16.4 10 12 5.6 p 1 = p 2 = 0.1 5.6 5.5 6 5.9 So why don’t they?? p 1 = 0.6, p 2 = 0.3 18.2 8.2 14.8 4.8 15 16 Summary – indexes • Indexes make certain lookups and joins more efficient. – Disk block access matters. RDBMSs and beyond – Multi-attribute indexes • CREATE INDEX • Usage analysis The NoSQL movement – What are the expected operations? – How much do they cost? � (cost of operation)x(fraction of time on operation) 17 18 3

(R)DBMSs ”today” Proprietary RDBMSs DMBS Approx. market share • Oracle DB – First commercial SQL database (Microsoft Access) – HUGE market share historically Oracle 40% – Standard incompliant IBM DB2 30% • IBM DB2 Microsoft SQL Server 15% – First SQL database (in-house) Sybase 3% – Near-monopoly on ”mainframes” (…) – Towards Oracle-compliant (!) MySQL 1% • Microsoft SQL Server PostgreSQL 0.5% – Market leading on Windows application platforms 19 20 MySQL PostgreSQL • Open Source – but owned by Oracle since 2010 • Open Source – community development • Historically: fast but feature-poor • Historically: full-featured but (relatively) slow – Subqueries (!) added in recent release • Much faster today – and optimized for complex tasks – FK constraints only with non-standard backend – Efficient support for joins – ACID transactions only (crudely) with non-standard backends • Almost standard-compliant – Not optimized for joins – Full constraint support – except assertions! • Still missing features (but getting closer) – Full ACID transactions – No CHECK constraints (including assertions) – Sequencing (WITH) – No sequencing (WITH) • Prominent users: Yahoo, MySpace, Skype, … • Big on the web: used by Wikipedia, Facebook, … – Early support in PHP helped boost 21 22 Beyond SQL? SQL injection attacks • SQL was first developed around 1970 – Contemporaries: Forth, PL/I, Pascal, C, SmallTalk, ML, … • With one prominent exception – C – these have all been succeded by newer, cooler languages. http://xkcd.com/327/ • Has the time finally come for SQL as well? The possibility for SQL injection attacks has lead development away from literal SQL, towards higher-level interfaces, tools and libraries. 23 24 4

Quiz! How costly is this operation (naive solution)? course per - PDF document

Quiz! How costly is this operation (naive solution)? course per weekday hour room Database Systems TDA357 3 HC1 Monday 15:15 TDA357 3 HC1 Thursday 10:00 TDA357 2 HB1 Tuesday 08:00 n Indexes TDA357 2 HB1 Friday 13:15

Endocrinology: top- decile quiz SBA Quiz Quiz Dr Shuaib Siddiqui, MB BChir MRCP FY3 doctor

PBIO 375 Quiz Section Goals of Quiz Section Website Quiz Section Tests Quiz

Welcome! Happy World Quality Day 2012 Quality Quiz Role Players Quiz Master Neelakanta Ratnam

C4.5 - pruning decision trees Quiz 1 Quiz 1 Q: Is a tree with only pure leafs always the best

Announcements ICS 6B Quiz schedule online * Will allow you to drop 1 quiz Boolean Algebra

San Joaquin Keeping Your Baby Safe While Sleeping 1 PRE-QUIZ 2 Pre-Quiz A. Shaken baby

KNOW THE RISKS A Youth Guide To E-cigarettes Quiz Quiz TOPICS 1 NO MATTER WHAT YOU CALL IT,

Marathon Petroleum Casey Sullivan, Government and Public Affairs Manager 1 Refining 101 2 Quiz

Defining Functions Academic Integrity Quiz Remember : quiz about the course AI policy Have

Defining Functions Academic Integrity Quiz Remember : quiz about the course AI policy Have

The Great DNSSEC Quiz (no DNS records where harmed during the making of this quiz) The rules

Work Instruction Purpose Use this procedure to add a quiz to a presentation. Trigger Perform

Self Reflection https://cmha.ca/work-life-balance-quiz#balance-quiz

VOTING AND ELECTIONS Civic Quiz Challenge Name one of the three requirements needed in order

Quiz! Quiz! If youre on the left side of the classroom : add station FJS to your

Unit 2: Biological basis of life, heredity, and genetics 1 Summary 1. Quiz info - Quiz next

Database Systems Indexes DBMSs and NoSQL 1 Quiz! How costly is this operation (naive

Exodus 1:6-7 6 Now Joseph and all his brothers and all that generation died, 7 but the Israelites

Chapter 10: Later Greek Comedy The Hellenistic Age general chaos and confusion after

Improving Onion Notation Richard Clayton

Weierstra Approximation Theorem (1885) and his 1886 lecture course revisited Reinhard

Devon Middle School Devon Middle School School opened in 1949 Grade 6, 7, 8 50+ Staff

S1 Parents Engagement Night COMPETENT & CONFIDENT LEADER CRIT ITICAL & IN INVENTIVE

Latent Structures for Coreference Resolution Sebastian Martschat and Michael Strube Heidelberg

Quiz! How costly is this operation (naive solution)? course per - PDF document

Quiz! How costly is this operation (naive solution)? course per weekday hour room Database Systems TDA357 3 HC1 Monday 15:15 TDA357 3 HC1 Thursday 10:00 TDA357 2 HB1 Tuesday 08:00 n Indexes TDA357 2 HB1 Friday 13:15

Endocrinology: top- decile quiz SBA Quiz Quiz Dr Shuaib Siddiqui, MB BChir MRCP FY3 doctor

PBIO 375 Quiz Section Goals of Quiz Section Website Quiz Section Tests Quiz

Welcome! Happy World Quality Day 2012 Quality Quiz Role Players Quiz Master Neelakanta Ratnam

C4.5 - pruning decision trees Quiz 1 Quiz 1 Q: Is a tree with only pure leafs always the best

Announcements ICS 6B Quiz schedule online * Will allow you to drop 1 quiz Boolean Algebra

San Joaquin Keeping Your Baby Safe While Sleeping 1 PRE-QUIZ 2 Pre-Quiz A. Shaken baby

KNOW THE RISKS A Youth Guide To E-cigarettes Quiz Quiz TOPICS 1 NO MATTER WHAT YOU CALL IT,

Marathon Petroleum Casey Sullivan, Government and Public Affairs Manager 1 Refining 101 2 Quiz

Defining Functions Academic Integrity Quiz Remember : quiz about the course AI policy Have

Defining Functions Academic Integrity Quiz Remember : quiz about the course AI policy Have

The Great DNSSEC Quiz (no DNS records where harmed during the making of this quiz) The rules

Work Instruction Purpose Use this procedure to add a quiz to a presentation. Trigger Perform

Self Reflection https://cmha.ca/work-life-balance-quiz#balance-quiz

VOTING AND ELECTIONS Civic Quiz Challenge Name one of the three requirements needed in order

Quiz! Quiz! If youre on the left side of the classroom : add station FJS to your

Unit 2: Biological basis of life, heredity, and genetics 1 Summary 1. Quiz info - Quiz next

Database Systems Indexes DBMSs and NoSQL 1 Quiz! How costly is this operation (naive

Exodus 1:6-7 6 Now Joseph and all his brothers and all that generation died, 7 but the Israelites

Chapter 10: Later Greek Comedy The Hellenistic Age general chaos and confusion after

Improving Onion Notation Richard Clayton

Weierstra Approximation Theorem (1885) and his 1886 lecture course revisited Reinhard

Devon Middle School Devon Middle School School opened in 1949 Grade 6, 7, 8 50+ Staff

S1 Parents Engagement Night COMPETENT &amp; CONFIDENT LEADER CRIT ITICAL &amp; IN INVENTIVE

Latent Structures for Coreference Resolution Sebastian Martschat and Michael Strube Heidelberg

S1 Parents Engagement Night COMPETENT & CONFIDENT LEADER CRIT ITICAL & IN INVENTIVE