Transactions & Indexing
Professor Larry Heimann Carnegie Mellon University Information Systems Program
Transactions & Indexing Professor Larry Heimann Carnegie Mellon - - PowerPoint PPT Presentation
Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example:
Professor Larry Heimann Carnegie Mellon University Information Systems Program
A transaction is a sequence of one or more SQL statements that together form a logical unit of work
Example: Creating an order
quantity-on-hand of the product
salesperson who took the order, and finally...
SELECT * FROM products; INSERT INTO
UPDATE products; UPDATE salesreps; UPDATE offices; SELECT * FROM products; INSERT INTO
UPDATE products; ERROR IN UPDATE salesreps;
unchanged
SELECT * FROM products; INSERT INTO
POWER OUTAGE
unchanged
How do we ensure that database transactions are reliably processed? Four words:
A transaction is an atomic unit of processing, and it either has to be performed in its entirety or not at all Atomicity must be guaranteed in any and all circumstances: user error, power failure, system failure and the like Example of failure of atomicity in the order processing example:
What’s the key idea? Rollback ability
A successful execution of a transaction must take a consistent database state to a (new) consistent database state With various constraints, cascades and triggers in effect, writing new data in one location will be consistent with other data in the database Example of failure of consistency in the order processing example:
What’s the key idea? Integrity constraints
A transaction must not make its modifications visible to other transactions until it is committed, i.e., each transaction is unaware of other transactions executing concurrently in the system If two users are trying to write to the database at the same time, one must wait until the
If there are concurrent transactions, the results should be the same as if transactions are executed serially, i.e. one after the other
What’s the key idea? Concurrency Control
Once a transaction has committed its changes, these changes must never get lost due to subsequent (system) failures Example of failure of durability in the order processing example:
be saved later to the database, but then power goes out and this data is lost.
What’s the key idea? Recovery
BEGIN; UPDATE heroes SET name = 'The Tick' WHERE hero_id = 14; SAVEPOINT first_leg; UPDATE heroes SET age = 42;
ROLLBACK TO first_leg; UPDATE heroes SET age = 42 WHERE hero_id = 14; COMMIT;
Key to transaction processing is the concept of database locking Shared lock is used by the DBMS when a transaction wants to read data from the database. Another concurrent transaction can also acquire a shared lock on the same data, allowing the other transaction to also read the data. Exclusive lock is used by the DBMS when a transaction wants to update data in the database. When a transaction has an exclusive lock on some data, other transactions cannot acquire any type of lock (shared or exclusive)
MVCC provides each user connected to the database with a snapshot of the database for that person to work with Changes made will not be seen by other users of the database until the transaction has been committed Avoids managing locks for read transactions because writes can be isolated by virtue of the old versions being maintained Writes affect future version but at the transaction ID that the read is working at, everything is guaranteed to be consistent because the writes are occurring at a later transaction ID
SELECT * FROM heroes WHERE hero_id=14 FOR UPDATE;
“Indexes add overhead and slow down inserts and updates”
“Index every field to make the database superfast to query”
the idea is that the amount of data on both sides of the tree is roughly the same so the number of levels that must be traversed to find rows is always in the same ballpark. B-Tree indexes can be used for equality and range queries efficiently.
safe, so rarely used.
values to one row, whereas B-Tree indexes are optimized for when a row has a single key value. GINs are good for indexing array values as well as for implementing full-text search.
balanced tree structures, and can be used for operations beyond equality and range comparisons. They are used to index the geometric data types, as well as full-text search. Ideal for smaller sets of records.
CREATE INDEX defects_sources_idx ON defects(source_id); CREATE INDEX users_names_idx ON users(last_name, first_name);
selects
CREATE INDEX defects_faculty_idx ON defects(reporter_id) WHERE reporter_id = 1 or reporter_id = 2
CREATE INDEX users_lastname_idx ON users(lower(last_name));
CREATE INDEX articles_day ON articles(date(published_at)); can be used by a query containing WHERE date(articles.created_at) = date('2011-03-07')
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE summary ILIKE '%concept%'; QUERY PLAN
time=0.028..6615.781 rows=48375 loops=1) Filter: ((summary)::text ~~* '%concept%'::text) Rows Removed by Filter: 4951625 Total runtime: 6620.541 ms (4 rows)
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE to_tsvector('english',summary) @@ to_tsquery('english','concept'); QUERY PLAN
time=0.071..47642.888 rows=48375 loops=1) Filter: (to_tsvector('english'::regconfig, (summary)::text) @@ '''concept'''::tsquery) Rows Removed by Filter: 4951625 Total runtime: 47650.170 ms (4 rows)
bugs_311_fts=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE vector_summary @@ to_tsquery('english','concept'); QUERY PLAN
(actual time=19.171..364.768 rows=48375 loops=1) Recheck Cond: (vector_summary @@ '''concept'''::tsquery) Rows Removed by Index Recheck: 709900
rows=57346 width=0) (actual time=17.547..17.547 rows=48375 loops=1) Index Cond: (vector_summary @@ '''concept'''::tsquery) Total runtime: 367.295 ms (6 rows)
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary, details FROM defects WHERE summary ILIKE 'SQL' OR details ILIKE ‘SQL'; QUERY PLAN
time=23861.346..23861.346 rows=0 loops=1) Filter: (((summary)::text ~~* 'SQL'::text) OR (details ~~* 'SQL'::text)) Rows Removed by Filter: 5000000 Total runtime: 23861.371 ms (4 rows)
bugs_311_fts=# explain analyze SELECT id, summary, details FROM defects WHERE vector_combined @@ to_tsquery(‘english','SQL'); QUERY PLAN
(actual time=156.776..2009.117 rows=238307 loops=1) Recheck Cond: (vector_combined @@ '''sql'''::tsquery) Rows Removed by Index Recheck: 1771575
rows=51734 width=0) (actual time=155.238..155.238 rows=238307 loops=1) Index Cond: (vector_combined @@ '''sql'''::tsquery) Total runtime: 2026.007 ms (6 rows)