physical database design
play

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - PowerPoint PPT Presentation

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann) Alternative File Organizations File organization = Method of arranging a file of records on external storage Goal:


  1. Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann)

  2. Alternative File Organizations  File organization = Method of arranging a file of records on external storage • Goal: quickly find records needed by query  Several alternatives • Heap files • Sorted Files • Indexes 340151 Big Databases & Cloud Services (P. Baumann) 2

  3. Index Selection Guidelines  Understand workload: • Queries vs. update • What relations (sizes!), attributes, conditions, joins (selectivity!), …?  Attributes in WHERE clause are candidates for index keys • Exact match condition suggests hash index, range query suggests tree index • Consider multi-attribute search keys for several WHERE clause conditions • Order of attributes important for range queries  Choose indexes that benefit as many queries as possible • impact on updates: Indexes make queries faster, updates slower • require disk space  understand how DBMS evaluates queries & creates query evaluation plans 340151 Big Databases & Cloud Services (P. Baumann) 3

  4. Decisions to Make  What indexes? • Which relations? What field(s) search key? Several indexes? • For each index, what kind of an index should it be?  Change conceptual schema? guided by workload, in addition to redundancy issues • Consider alternative normalized schemas? (many choices!) • “undo’’ some decompositions, settle for a lower normal form, such as 3NF? (denormalization) • Horizontal partitioning, replication, views ...see manuals  If made after a database is in use, called schema evolution 340151 Big Databases & Cloud Services (P. Baumann) 4

  5. Example Schema Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr)  Contracts = CSJDPQV; ICs: JP C, SD P; C is primary key • superkey? • What normal form? 340151 Big Databases & Cloud Services (P. Baumann) 5

  6. Denormalization Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)  Suppose following query is important: • “Value of contract less than department budget?”  To speed up, add field budget B (from Departments) to Contracts • New FD for Dept./Budget: Did B • Contracts no longer in 3NF  might choose to modify Contracts • sufficiently important? No good performance otherwise? • i.e., by indexes, choosing alternative 3NF schema 340151 Big Databases & Cloud Services (P. Baumann) 6

  7. Decomposition of a BCNF Relation  Suppose { SDP, CSJDQV } in BCNF • no reason to decompose further (assuming that all known ICs are FDs)  However, suppose that these queries are important • “Find the contracts held by supplier S” • “Find the contracts that department D is involved in”  Decomposing CSJDQV further into CS, CD and CJQV: • could speed up these queries (Why?) • following query is slower: “Find total value of all contracts held by supplier S.” 340151 Big Databases & Cloud Services (P. Baumann) 7

  8. Masking Conceptual Schema Changes CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM LargeContracts UNION SELECT * FROM SmallContracts  Contracts  LargeContracts + SmallContracts can be masked by view  queries with condition val>10000: preferable LargeContracts for efficient execution 340151 Big Databases & Cloud Services (P. Baumann) 8

  9. Tuning Queries and Views  query runs slower than expected? check if index needs to be re-built or statistics too old  DBMS may not be executing plan you had in mind. Common problems: • Selections involving null values • Selections involving arithmetic or string expressions • Selections involving OR conditions • Lack of evaluation features like index-only strategies or certain join methods or poor size estimation  Check plan used, adjust choice of indexes or rewrite query/view • Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY 340151 Big Databases & Cloud Services (P. Baumann) 9

  10. PS: A Moderately Complex Query 340151 Big Databases & Cloud Services (P. Baumann) 10

  11. Key Performance Factors  Ref: discussion "what are the key points to improve the query performance" on the LinkedIn Database list, 2012-07-20 340151 Big Databases & Cloud Services (P. Baumann) 11

  12. Summary  Many alternative file organizations, each appropriate in some situation  If selection queries frequent: sort file or build an index • Hash vs tree indexes vs sorted files  Understand workload & DBMS query plans 340151 Big Databases & Cloud Services (P. Baumann) 12

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend