340151 Big Data & Cloud Services (P. Baumann)
Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - - PowerPoint PPT Presentation
Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - - PowerPoint PPT Presentation
Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann) Alternative File Organizations File organization = Method of arranging a file of records on external storage Goal:
2 340151 Big Databases & Cloud Services (P. Baumann)
Alternative File Organizations
- File organization = Method of arranging a file of records on external storage
- Goal: quickly find records needed by query
- Several alternatives
- Heap files
- Sorted Files
- Indexes
3 340151 Big Databases & Cloud Services (P. Baumann)
- Understand workload:
- Queries vs. update
- What relations (sizes!), attributes, conditions, joins (selectivity!), …?
- Attributes in WHERE clause are candidates for index keys
- Exact match condition suggests hash index, range query suggests tree index
- Consider multi-attribute search keys for several WHERE clause conditions
- Order of attributes important for range queries
- Choose indexes that benefit as many queries as possible
- impact on updates: Indexes make queries faster, updates slower
- require disk space
- understand how DBMS evaluates queries & creates query evaluation plans
Index Selection Guidelines
4 340151 Big Databases & Cloud Services (P. Baumann)
Decisions to Make
- What indexes?
- Which relations? What field(s) search key? Several indexes?
- For each index, what kind of an index should it be?
- Change conceptual schema?
guided by workload, in addition to redundancy issues
- Consider alternative normalized schemas? (many choices!)
- “undo’’ some decompositions, settle for a lower normal form, such as 3NF?
(denormalization)
- Horizontal partitioning, replication, views ...see manuals
- If made after a database is in use, called schema evolution
5 340151 Big Databases & Cloud Services (P. Baumann)
Example Schema
- Contracts = CSJDPQV; ICs: JP
C, SD P; C is primary key
- superkey?
- What normal form?
Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr)
6 340151 Big Databases & Cloud Services (P. Baumann)
Denormalization
- Suppose following query is important:
- “Value of contract less than department budget?”
- To speed up, add field budget B (from Departments) to Contracts
- New FD for Dept./Budget: Did
B
- Contracts no longer in 3NF
- might choose to modify Contracts
- sufficiently important? No good performance otherwise?
- i.e., by indexes, choosing alternative 3NF schema
Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
7 340151 Big Databases & Cloud Services (P. Baumann)
Decomposition of a BCNF Relation
- Suppose { SDP, CSJDQV } in BCNF
- no reason to decompose further (assuming that all known ICs are FDs)
- However, suppose that these queries are important
- “Find the contracts held by supplier S”
- “Find the contracts that department D is involved in”
- Decomposing CSJDQV further into CS, CD and CJQV:
- could speed up these queries (Why?)
- following query is slower:
“Find total value of all contracts held by supplier S.”
8 340151 Big Databases & Cloud Services (P. Baumann)
Masking Conceptual Schema Changes
- Contracts LargeContracts + SmallContracts can be masked by view
- queries with condition val>10000:
preferable LargeContracts for efficient execution
CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM LargeContracts UNION SELECT * FROM SmallContracts
9 340151 Big Databases & Cloud Services (P. Baumann)
- query runs slower than expected?
check if index needs to be re-built or statistics too old
- DBMS may not be executing plan you had in mind.
Common problems:
- Selections involving null values
- Selections involving arithmetic or string expressions
- Selections involving OR conditions
- Lack of evaluation features like index-only strategies or certain join methods or poor size
estimation
- Check plan used, adjust choice of indexes or rewrite query/view
- Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT
and GROUP BY
Tuning Queries and Views
10 340151 Big Databases & Cloud Services (P. Baumann)
PS: A Moderately Complex Query
11 340151 Big Databases & Cloud Services (P. Baumann)
Key Performance Factors
- Ref: discussion "what are the key points to improve the query performance"
- n the LinkedIn Database list, 2012-07-20
12 340151 Big Databases & Cloud Services (P. Baumann)
Summary
- Many alternative file organizations, each appropriate in some situation
- If selection queries frequent: sort file or build an index
- Hash vs tree indexes vs sorted files
- Understand workload & DBMS query plans