Transactions & Indexing Professor Larry Heimann Carnegie Mellon - PowerPoint PPT Presentation

Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program

What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example: Creating an order 1. Query the PRODUCTS table to ensure that the product is in stock, 2. Insert the order into the ORDERS table 3. Update the PRODUCTS table, subtracting the quantity ordered from the quantity-on-hand of the product 4. Update the SALESREPS table, adding the order amount to the total sales of the salesperson who took the order, and finally... 5. Update the OFFICES table, adding the order amount to the total sales of the o ffi ce where the salesperson works

SELECT * FROM SELECT * FROM SELECT * FROM products; products; products; INSERT INTO INSERT INTO INSERT INTO orders; orders; orders; POWER UPDATE UPDATE OUTAGE products; products; ERROR IN UPDATE UPDATE salesreps; salesreps; UPDATE offices; unchanged unchanged

The ACID Principle How do we ensure that database transactions are reliably processed? Four words: A tomicity C onsistency I solation D urability

Atomicity A transaction is an atomic unit of processing, and it either has to be performed in its entirety or not at all Atomicity must be guaranteed in any and all circumstances : user error, power failure, system failure and the like Example of failure of atomicity in the order processing example: - inserting new order but being unable to update product inventory records What’s the key idea? Rollback ability

Consistency A successful execution of a transaction must take a consistent database state to a (new) consistent database state With various constraints, cascades and triggers in e ff ect, writing new data in one location will be consistent with other data in the database Example of failure of consistency in the order processing example: - totals of inventory + sales after the transaction doesn’t match totals before What’s the key idea? Integrity constraints

Isolation A transaction must not make its modifications visible to other transactions until it is committed, i.e., each transaction is unaware of other transactions executing concurrently in the system If two users are trying to write to the database at the same time, one must wait until the other is finished to maintain isolation If there are concurrent transactions, the results should be the same as if transactions are executed serially, i.e. one after the other What’s the key idea? Concurrency Control

Durability Once a transaction has committed its changes, these changes must never get lost due to subsequent (system) failures Example of failure of durability in the order processing example: - after an order is executed, the results are saved to a session or added to a bu ff er to be saved later to the database, but then power goes out and this data is lost. What’s the key idea? Recovery

Basic transaction syntax BEGIN; UPDATE heroes SET name = 'The Tick' WHERE hero_id = 14; SAVEPOINT first_leg; UPDATE heroes SET age = 42; -- did't mean to change age of everyone to 42... ROLLBACK TO first_leg; UPDATE heroes SET age = 42 WHERE hero_id = 14; COMMIT;

Transactions in action

Things you can’t rollback • CREATE DATABASE • CREATE INDEX • ALTER DATABASE • DROP INDEX • DROP DATABASE • CREATE EVENT • CREATE TABLE • DROP EVENT • ALTER TABLE • CREATE FUNCTION • DROP TABLE • DROP FUNCTION • RENAME TABLE • CREATE PROCEDURE • TRUNCATE TABLE • DROP PROCEDURE

Database locking Key to transaction processing is the concept of database locking Shared lock is used by the DBMS when a transaction wants to read data from the database. Another concurrent transaction can also acquire a shared lock on the same data, allowing the other transaction to also read the data. Exclusive lock is used by the DBMS when a transaction wants to update data in the database. When a transaction has an exclusive lock on some data, other transactions cannot acquire any type of lock (shared or exclusive) on the data.

Multiversion concurrency control (MVCC) MVCC provides each user connected to the database with a snapshot of the database for that person to work with Changes made will not be seen by other users of the database until the transaction has been committed Avoids managing locks for read transactions because writes can be isolated by virtue of the old versions being maintained Writes a ff ect future version but at the transaction ID that the read is working at, everything is guaranteed to be consistent because the writes are occurring at a later transaction ID

Lock conflicts

Types of locks • Can lock entire tables or just individual rows • Can lock particular rows within a transaction: SELECT * FROM heroes WHERE hero_id=14 FOR UPDATE;

Note: the plural of ‘index’ is traditionally ‘indices’, however today ‘indexes’ is considered appropriate and is the norm among database professionals.

The trouble with indexes • No indexing at all “Indexes add overhead and slow down inserts and updates” • Index shotgun “Index every field to make the database superfast to query” • Finding the middle ground

Types of indexes • B-Tree -- the default that you get in postgres. The B stands for Balanced, and the idea is that the amount of data on both sides of the tree is roughly the same so the number of levels that must be traversed to find rows is always in the same ballpark. B-Tree indexes can be used for equality and range queries e ffi ciently. • Hash Indexes -- only useful for equality comparisons, but are not transaction safe, so rarely used. • Generalized Inverted Indexes (GIN) -- useful when an index must map many values to one row, whereas B-Tree indexes are optimized for when a row has a single key value. GINs are good for indexing array values as well as for implementing full-text search. • Generalized Search Tree (GiST) Indexes -- allow you to build general balanced tree structures, and can be used for operations beyond equality and range comparisons. They are used to index the geometric data types, as well as full-text search. Ideal for smaller sets of records.

Creating indexes • Example: CREATE INDEX defects_sources_idx ON defects(source_id); CREATE INDEX users_names_idx ON users(last_name, first_name); • Creating index can take time depending on the number of records • Creating indexes adds time to insert and update commands, but saves time on selects • Query optimizer will determine if index exists that will help speed up query

Partial indexes • Covers just a subset of the table’s data • Essentially a index with a where clause • Example: CREATE INDEX defects_faculty_idx ON defects(reporter_id) WHERE reporter_id = 1 or reporter_id = 2

Expression indexes • Can create an index which uses a function • Common example CREATE INDEX users_lastname_idx ON users(lower(last_name)); • Another example with dates CREATE INDEX articles_day ON articles(date(published_at)); can be used by a query containing WHERE date(articles.created_at) = date('2011-03-07')

What to index • Unique fields • Foreign keys • Queries commonly run • Fields commonly combined

Full-text indexing and searching

Performance

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE summary ILIKE '%concept%'; QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..236670.88 rows=50153 width=38) (actual time=0.028..6615.781 rows=48375 loops=1) Filter: ((summary)::text ~~* '%concept%'::text) Rows Removed by Filter: 4951625 Total runtime: 6620.541 ms (4 rows)

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE to_tsvector('english',summary) @@ to_tsquery('english','concept'); QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..249168.65 rows=24996 width=38) (actual time=0.071..47642.888 rows=48375 loops=1) Filter: (to_tsvector('english'::regconfig, (summary)::text) @@ '''concept'''::tsquery) Rows Removed by Filter: 4951625 Total runtime: 47650.170 ms (4 rows)

bugs_311_fts=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE vector_summary @@ to_tsquery('english','concept'); QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on defects (cost=8048.43..164050.73 rows=57346 width=38) (actual time=19.171..364.768 rows=48375 loops=1) Recheck Cond: (vector_summary @@ '''concept'''::tsquery) Rows Removed by Index Recheck: 709900 -> Bitmap Index Scan on defects_fti_vector_summary (cost=0.00..8034.10 rows=57346 width=0) (actual time=17.547..17.547 rows=48375 loops=1) Index Cond: (vector_summary @@ '''concept'''::tsquery) Total runtime: 367.295 ms (6 rows)

What about searching two fields?

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary, details FROM defects WHERE summary ILIKE 'SQL' OR details ILIKE ‘SQL'; QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on defects (cost=0.00..249168.65 rows=828 width=200) (actual time=23861.346..23861.346 rows=0 loops=1) Filter: (((summary)::text ~~* 'SQL'::text) OR (details ~~* 'SQL'::text)) Rows Removed by Filter: 5000000 Total runtime: 23861.371 ms (4 rows)

Transactions & Indexing Professor Larry Heimann Carnegie Mellon - PowerPoint PPT Presentation

Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example:

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

13.1 Introduction 13.2 Transactions 13.3 Nested transactions 13.4 Locks 13.5 Optimistic

Module 15: Managing Transactions and Locks Overview Introduction to Transactions and Locks

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

20 0 6 Transactions $1.01 billion in bonds 18 transactions 20 0 6 Transactions By Num

Transactional Recovery Transactional Recovery Transactions: ACID Properties Transactions: ACID

Transactional Recovery Transactional Recovery Transactions: ACID Properties Transactions: ACID

Flat and nested distributed Outline transactions Flat and nested distributed transactions

Database Management Objectives of Lecture 7 Systems Transactions Models Transactions Models

Audio Indexing and Retrieval IT6902; Semester B, 2004/2005; Leung Audio Indexing and Retrieval

C++11 atomics for G4atomic Jonathan R. Madsen Texas A&M University What is an atomic? We

Atomicity and Deadlock November 29, 2007 1 Check-then-Act Race public void transfer (Account

Synchronization presented by Radu Teodorescu CS533 Why we need it? Parallel programs share

Decentralize everything! The Future of Bitcoin The block chain as a vehicle for

Atomic page flip and mode setting Hardware structure and abstraction Atomic page flip The

Operating System Principles: Mutual Exclusion and Asynchronous Completion CS 111 Operating

Databases Standard stuff Class webpage: cs.rhodes.edu/db Textbook: get it somewhere; used

Approche Algorithmique des Syst` emes R epartis (AASR) Guillaume Pierre

Transactions & Indexing Professor Larry Heimann Carnegie Mellon - PowerPoint PPT Presentation

Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example:

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Distributed Indexing Indexing, session 8 CS6200: Information Retrieval Slides by: Jesse Anderton

Indexing Multimedia Multimedia Databases Databases Indexing Indexing Multimedia Databases

13.1 Introduction 13.2 Transactions 13.3 Nested transactions 13.4 Locks 13.5 Optimistic

Module 15: Managing Transactions and Locks Overview Introduction to Transactions and Locks

Indexing Presentation - The Basics Attached is the slide deck for a short presentation on indexing

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing &amp; Searching 3

Bitmap Indexing and related indexing techniques Presented by: El Ghailani Maher Outline I

Chapter 6 Hash-Based Indexing Efficient Support for Equality Search Hash-Based Indexing Static

Indexing December 12, 2008 Indexing Introduction New tuple is stored without any order next

20 0 6 Transactions $1.01 billion in bonds 18 transactions 20 0 6 Transactions By Num

Transactional Recovery Transactional Recovery Transactions: ACID Properties Transactions: ACID

Transactional Recovery Transactional Recovery Transactions: ACID Properties Transactions: ACID

Flat and nested distributed Outline transactions Flat and nested distributed transactions

Database Management Objectives of Lecture 7 Systems Transactions Models Transactions Models

Audio Indexing and Retrieval IT6902; Semester B, 2004/2005; Leung Audio Indexing and Retrieval

C++11 atomics for G4atomic Jonathan R. Madsen Texas A&amp;M University What is an atomic? We

Atomicity and Deadlock November 29, 2007 1 Check-then-Act Race public void transfer (Account

Synchronization presented by Radu Teodorescu CS533 Why we need it? Parallel programs share

Decentralize everything! The Future of Bitcoin The block chain as a vehicle for

Atomic page flip and mode setting Hardware structure and abstraction Atomic page flip The

Operating System Principles: Mutual Exclusion and Asynchronous Completion CS 111 Operating

Databases Standard stuff Class webpage: cs.rhodes.edu/db Textbook: get it somewhere; used

Approche Algorithmique des Syst` emes R epartis (AASR) Guillaume Pierre

Indexing and Searching Indexing and Searching TDT4215 TDT4215 Indexing & Searching 3

C++11 atomics for G4atomic Jonathan R. Madsen Texas A&M University What is an atomic? We