 
              Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program
What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example: Creating an order 1. Query the PRODUCTS table to ensure that the product is in stock, 2. Insert the order into the ORDERS table 3. Update the PRODUCTS table, subtracting the quantity ordered from the quantity-on-hand of the product 4. Update the SALESREPS table, adding the order amount to the total sales of the salesperson who took the order, and finally... 5. Update the OFFICES table, adding the order amount to the total sales of the o ffi ce where the salesperson works
SELECT * FROM SELECT * FROM SELECT * FROM products; products; products; INSERT INTO INSERT INTO INSERT INTO orders; orders; orders; POWER UPDATE UPDATE OUTAGE products; products; ERROR IN UPDATE UPDATE salesreps; salesreps; UPDATE offices; unchanged unchanged
The ACID Principle How do we ensure that database transactions are reliably processed? Four words: A tomicity C onsistency I solation D urability
Atomicity A transaction is an atomic unit of processing, and it either has to be performed in its entirety or not at all Atomicity must be guaranteed in any and all circumstances : user error, power failure, system failure and the like Example of failure of atomicity in the order processing example: - inserting new order but being unable to update product inventory records What’s the key idea? Rollback ability
Consistency A successful execution of a transaction must take a consistent database state to a (new) consistent database state With various constraints, cascades and triggers in e ff ect, writing new data in one location will be consistent with other data in the database Example of failure of consistency in the order processing example: - totals of inventory + sales after the transaction doesn’t match totals before What’s the key idea? Integrity constraints
Isolation A transaction must not make its modifications visible to other transactions until it is committed, i.e., each transaction is unaware of other transactions executing concurrently in the system If two users are trying to write to the database at the same time, one must wait until the other is finished to maintain isolation If there are concurrent transactions, the results should be the same as if transactions are executed serially, i.e. one after the other What’s the key idea? Concurrency Control
Durability Once a transaction has committed its changes, these changes must never get lost due to subsequent (system) failures Example of failure of durability in the order processing example: - after an order is executed, the results are saved to a session or added to a bu ff er to be saved later to the database, but then power goes out and this data is lost. What’s the key idea? Recovery
Basic transaction syntax BEGIN; UPDATE heroes SET name = 'The Tick' WHERE hero_id = 14; SAVEPOINT first_leg; UPDATE heroes SET age = 42; -- did't mean to change age of everyone to 42... ROLLBACK TO first_leg; UPDATE heroes SET age = 42 WHERE hero_id = 14; COMMIT;
Transactions in action
Things you can’t rollback • CREATE DATABASE • CREATE INDEX • ALTER DATABASE • DROP INDEX • DROP DATABASE • CREATE EVENT • CREATE TABLE • DROP EVENT • ALTER TABLE • CREATE FUNCTION • DROP TABLE • DROP FUNCTION • RENAME TABLE • CREATE PROCEDURE • TRUNCATE TABLE • DROP PROCEDURE
Database locking Key to transaction processing is the concept of database locking Shared lock is used by the DBMS when a transaction wants to read data from the database. Another concurrent transaction can also acquire a shared lock on the same data, allowing the other transaction to also read the data. Exclusive lock is used by the DBMS when a transaction wants to update data in the database. When a transaction has an exclusive lock on some data, other transactions cannot acquire any type of lock (shared or exclusive) on the data.
Multiversion concurrency control (MVCC) MVCC provides each user connected to the database with a snapshot of the database for that person to work with Changes made will not be seen by other users of the database until the transaction has been committed Avoids managing locks for read transactions because writes can be isolated by virtue of the old versions being maintained Writes a ff ect future version but at the transaction ID that the read is working at, everything is guaranteed to be consistent because the writes are occurring at a later transaction ID
Lock conflicts
Types of locks • Can lock entire tables or just individual rows • Can lock particular rows within a transaction: SELECT * FROM heroes WHERE hero_id=14 FOR UPDATE;
Note: the plural of ‘index’ is traditionally ‘indices’, however today ‘indexes’ is considered appropriate and is the norm among database professionals.
The trouble with indexes • No indexing at all “Indexes add overhead and slow down inserts and updates” • Index shotgun “Index every field to make the database superfast to query” • Finding the middle ground
Types of indexes • B-Tree -- the default that you get in postgres. The B stands for Balanced, and the idea is that the amount of data on both sides of the tree is roughly the same so the number of levels that must be traversed to find rows is always in the same ballpark. B-Tree indexes can be used for equality and range queries e ffi ciently. • Hash Indexes -- only useful for equality comparisons, but are not transaction safe, so rarely used. • Generalized Inverted Indexes (GIN) -- useful when an index must map many values to one row, whereas B-Tree indexes are optimized for when a row has a single key value. GINs are good for indexing array values as well as for implementing full-text search. • Generalized Search Tree (GiST) Indexes -- allow you to build general balanced tree structures, and can be used for operations beyond equality and range comparisons. They are used to index the geometric data types, as well as full-text search. Ideal for smaller sets of records.
Creating indexes • Example: CREATE INDEX defects_sources_idx ON defects(source_id); CREATE INDEX users_names_idx ON users(last_name, first_name); • Creating index can take time depending on the number of records • Creating indexes adds time to insert and update commands, but saves time on selects • Query optimizer will determine if index exists that will help speed up query
Partial indexes • Covers just a subset of the table’s data • Essentially a index with a where clause • Example: CREATE INDEX defects_faculty_idx ON defects(reporter_id) WHERE reporter_id = 1 or reporter_id = 2
Expression indexes • Can create an index which uses a function • Common example CREATE INDEX users_lastname_idx ON users(lower(last_name)); • Another example with dates CREATE INDEX articles_day ON articles(date(published_at)); can be used by a query containing WHERE date(articles.created_at) = date('2011-03-07')
What to index • Unique fields • Foreign keys • Queries commonly run • Fields commonly combined
Full-text indexing and searching
Performance
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE summary ILIKE '%concept%'; QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..236670.88 rows=50153 width=38) (actual time=0.028..6615.781 rows=48375 loops=1) Filter: ((summary)::text ~~* '%concept%'::text) Rows Removed by Filter: 4951625 Total runtime: 6620.541 ms (4 rows)
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE to_tsvector('english',summary) @@ to_tsquery('english','concept'); QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..249168.65 rows=24996 width=38) (actual time=0.071..47642.888 rows=48375 loops=1) Filter: (to_tsvector('english'::regconfig, (summary)::text) @@ '''concept'''::tsquery) Rows Removed by Filter: 4951625 Total runtime: 47650.170 ms (4 rows)
bugs_311_fts=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE vector_summary @@ to_tsquery('english','concept'); QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on defects (cost=8048.43..164050.73 rows=57346 width=38) (actual time=19.171..364.768 rows=48375 loops=1) Recheck Cond: (vector_summary @@ '''concept'''::tsquery) Rows Removed by Index Recheck: 709900 -> Bitmap Index Scan on defects_fti_vector_summary (cost=0.00..8034.10 rows=57346 width=0) (actual time=17.547..17.547 rows=48375 loops=1) Index Cond: (vector_summary @@ '''concept'''::tsquery) Total runtime: 367.295 ms (6 rows)
What about searching two fields?
bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary, details FROM defects WHERE summary ILIKE 'SQL' OR details ILIKE ‘SQL'; QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..249168.65 rows=828 width=200) (actual time=23861.346..23861.346 rows=0 loops=1) Filter: (((summary)::text ~~* 'SQL'::text) OR (details ~~* 'SQL'::text)) Rows Removed by Filter: 5000000 Total runtime: 23861.371 ms (4 rows)
Recommend
More recommend