Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - - PowerPoint PPT Presentation

physical database design
SMART_READER_LITE
LIVE PREVIEW

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - - PowerPoint PPT Presentation

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann) Alternative File Organizations File organization = Method of arranging a file of records on external storage Goal:


slide-1
SLIDE 1

340151 Big Data & Cloud Services (P. Baumann)

Physical Database Design

Ramakrishnan & Gehrke, Chapter 17 & 18

slide-2
SLIDE 2

2 340151 Big Databases & Cloud Services (P. Baumann)

Alternative File Organizations

  • File organization = Method of arranging a file of records on external storage
  • Goal: quickly find records needed by query
  • Several alternatives
  • Heap files
  • Sorted Files
  • Indexes
slide-3
SLIDE 3

3 340151 Big Databases & Cloud Services (P. Baumann)

  • Understand workload:
  • Queries vs. update
  • What relations (sizes!), attributes, conditions, joins (selectivity!), …?
  • Attributes in WHERE clause are candidates for index keys
  • Exact match condition suggests hash index, range query suggests tree index
  • Consider multi-attribute search keys for several WHERE clause conditions
  • Order of attributes important for range queries
  • Choose indexes that benefit as many queries as possible
  • impact on updates: Indexes make queries faster, updates slower
  • require disk space
  • understand how DBMS evaluates queries & creates query evaluation plans

Index Selection Guidelines

slide-4
SLIDE 4

4 340151 Big Databases & Cloud Services (P. Baumann)

Decisions to Make

  • What indexes?
  • Which relations? What field(s) search key? Several indexes?
  • For each index, what kind of an index should it be?
  • Change conceptual schema?

guided by workload, in addition to redundancy issues

  • Consider alternative normalized schemas? (many choices!)
  • “undo’’ some decompositions, settle for a lower normal form, such as 3NF?

(denormalization)

  • Horizontal partitioning, replication, views ...see manuals
  • If made after a database is in use, called schema evolution
slide-5
SLIDE 5

5 340151 Big Databases & Cloud Services (P. Baumann)

Example Schema

  • Contracts = CSJDPQV; ICs: JP

C, SD P; C is primary key

  • superkey?
  • What normal form?

Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr)

slide-6
SLIDE 6

6 340151 Big Databases & Cloud Services (P. Baumann)

Denormalization

  • Suppose following query is important:
  • “Value of contract less than department budget?”
  • To speed up, add field budget B (from Departments) to Contracts
  • New FD for Dept./Budget: Did

B

  • Contracts no longer in 3NF
  • might choose to modify Contracts
  • sufficiently important? No good performance otherwise?
  • i.e., by indexes, choosing alternative 3NF schema

Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)

slide-7
SLIDE 7

7 340151 Big Databases & Cloud Services (P. Baumann)

Decomposition of a BCNF Relation

  • Suppose { SDP, CSJDQV } in BCNF
  • no reason to decompose further (assuming that all known ICs are FDs)
  • However, suppose that these queries are important
  • “Find the contracts held by supplier S”
  • “Find the contracts that department D is involved in”
  • Decomposing CSJDQV further into CS, CD and CJQV:
  • could speed up these queries (Why?)
  • following query is slower:

“Find total value of all contracts held by supplier S.”

slide-8
SLIDE 8

8 340151 Big Databases & Cloud Services (P. Baumann)

Masking Conceptual Schema Changes

  • Contracts  LargeContracts + SmallContracts can be masked by view
  • queries with condition val>10000:

preferable LargeContracts for efficient execution

CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM LargeContracts UNION SELECT * FROM SmallContracts

slide-9
SLIDE 9

9 340151 Big Databases & Cloud Services (P. Baumann)

  • query runs slower than expected?

check if index needs to be re-built or statistics too old

  • DBMS may not be executing plan you had in mind.

Common problems:

  • Selections involving null values
  • Selections involving arithmetic or string expressions
  • Selections involving OR conditions
  • Lack of evaluation features like index-only strategies or certain join methods or poor size

estimation

  • Check plan used, adjust choice of indexes or rewrite query/view
  • Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT

and GROUP BY

Tuning Queries and Views

slide-10
SLIDE 10

10 340151 Big Databases & Cloud Services (P. Baumann)

PS: A Moderately Complex Query

slide-11
SLIDE 11

11 340151 Big Databases & Cloud Services (P. Baumann)

Key Performance Factors

  • Ref: discussion "what are the key points to improve the query performance"
  • n the LinkedIn Database list, 2012-07-20
slide-12
SLIDE 12

12 340151 Big Databases & Cloud Services (P. Baumann)

Summary

  • Many alternative file organizations, each appropriate in some situation
  • If selection queries frequent: sort file or build an index
  • Hash vs tree indexes vs sorted files
  • Understand workload & DBMS query plans