Transactions & Indexing Professor Larry Heimann Carnegie Mellon - - PowerPoint PPT Presentation

transactions indexing
SMART_READER_LITE
LIVE PREVIEW

Transactions & Indexing Professor Larry Heimann Carnegie Mellon - - PowerPoint PPT Presentation

Transactions & Indexing Professor Larry Heimann Carnegie Mellon University Information Systems Program What is a transaction? A transaction is a sequence of one or more SQL statements that together form a logical unit of work Example:


slide-1
SLIDE 1

Transactions & Indexing

Professor Larry Heimann Carnegie Mellon University Information Systems Program

slide-2
SLIDE 2

What is a transaction?

A transaction is a sequence of one or more SQL statements that together form a logical unit of work

Example: Creating an order

  • 1. Query the PRODUCTS table to ensure that the product is in stock,
  • 2. Insert the order into the ORDERS table
  • 3. Update the PRODUCTS table, subtracting the quantity ordered from the

quantity-on-hand of the product

  • 4. Update the SALESREPS table, adding the order amount to the total sales of the

salesperson who took the order, and finally...

  • 5. Update the OFFICES table, adding the order amount to the total sales of the
  • ffice where the salesperson works
slide-3
SLIDE 3

SELECT * FROM products; INSERT INTO

  • rders;

UPDATE products; UPDATE salesreps; UPDATE offices; SELECT * FROM products; INSERT INTO

  • rders;

UPDATE products; ERROR IN UPDATE salesreps;

unchanged

SELECT * FROM products; INSERT INTO

  • rders;

POWER OUTAGE

unchanged

slide-4
SLIDE 4

The ACID Principle

How do we ensure that database transactions are reliably processed? Four words:

Atomicity Consistency Isolation Durability

slide-5
SLIDE 5

Atomicity

A transaction is an atomic unit of processing, and it either has to be performed in its entirety or not at all Atomicity must be guaranteed in any and all circumstances: user error, power failure, system failure and the like Example of failure of atomicity in the order processing example:

  • inserting new order but being unable to update product inventory records

What’s the key idea? Rollback ability

slide-6
SLIDE 6

Consistency

A successful execution of a transaction must take a consistent database state to a (new) consistent database state With various constraints, cascades and triggers in effect, writing new data in one location will be consistent with other data in the database Example of failure of consistency in the order processing example:

  • totals of inventory + sales after the transaction doesn’t match totals before

What’s the key idea? Integrity constraints

slide-7
SLIDE 7

Isolation

A transaction must not make its modifications visible to other transactions until it is committed, i.e., each transaction is unaware of other transactions executing concurrently in the system If two users are trying to write to the database at the same time, one must wait until the

  • ther is finished to maintain isolation

If there are concurrent transactions, the results should be the same as if transactions are executed serially, i.e. one after the other

What’s the key idea? Concurrency Control

slide-8
SLIDE 8

Durability

Once a transaction has committed its changes, these changes must never get lost due to subsequent (system) failures Example of failure of durability in the order processing example:

  • after an order is executed, the results are saved to a session or added to a buffer to

be saved later to the database, but then power goes out and this data is lost.

What’s the key idea? Recovery

slide-9
SLIDE 9

Basic transaction syntax

BEGIN; UPDATE heroes SET name = 'The Tick' WHERE hero_id = 14; SAVEPOINT first_leg; UPDATE heroes SET age = 42;

  • - did't mean to change age of everyone to 42...

ROLLBACK TO first_leg; UPDATE heroes SET age = 42 WHERE hero_id = 14; COMMIT;

slide-10
SLIDE 10

Transactions in action

slide-11
SLIDE 11

Things you can’t rollback

  • CREATE DATABASE
  • ALTER DATABASE
  • DROP DATABASE
  • CREATE TABLE
  • ALTER TABLE
  • DROP TABLE
  • RENAME TABLE
  • TRUNCATE TABLE
  • CREATE INDEX
  • DROP INDEX
  • CREATE EVENT
  • DROP EVENT
  • CREATE FUNCTION
  • DROP FUNCTION
  • CREATE PROCEDURE
  • DROP PROCEDURE
slide-12
SLIDE 12

Database locking

Key to transaction processing is the concept of database locking Shared lock is used by the DBMS when a transaction wants to read data from the database. Another concurrent transaction can also acquire a shared lock on the same data, allowing the other transaction to also read the data. Exclusive lock is used by the DBMS when a transaction wants to update data in the database. When a transaction has an exclusive lock on some data, other transactions cannot acquire any type of lock (shared or exclusive)

  • n the data.
slide-13
SLIDE 13

Multiversion concurrency control (MVCC)

MVCC provides each user connected to the database with a snapshot of the database for that person to work with Changes made will not be seen by other users of the database until the transaction has been committed Avoids managing locks for read transactions because writes can be isolated by virtue of the old versions being maintained Writes affect future version but at the transaction ID that the read is working at, everything is guaranteed to be consistent because the writes are occurring at a later transaction ID

slide-14
SLIDE 14

Lock conflicts

slide-15
SLIDE 15

Types of locks

  • Can lock entire tables or just individual rows
  • Can lock particular rows within a transaction:

SELECT * FROM heroes WHERE hero_id=14 FOR UPDATE;

slide-16
SLIDE 16
slide-17
SLIDE 17

Note: the plural of ‘index’ is traditionally ‘indices’, however today ‘indexes’ is considered appropriate and is the norm among database professionals.

slide-18
SLIDE 18

The trouble with indexes

  • No indexing at all

“Indexes add overhead and slow down inserts and updates”

  • Index shotgun

“Index every field to make the database superfast to query”

  • Finding the middle ground
slide-19
SLIDE 19

Types of indexes

  • B-Tree -- the default that you get in postgres. The B stands for Balanced, and

the idea is that the amount of data on both sides of the tree is roughly the same so the number of levels that must be traversed to find rows is always in the same ballpark. B-Tree indexes can be used for equality and range queries efficiently.

  • Hash Indexes -- only useful for equality comparisons, but are not transaction

safe, so rarely used.

  • Generalized Inverted Indexes (GIN) -- useful when an index must map many

values to one row, whereas B-Tree indexes are optimized for when a row has a single key value. GINs are good for indexing array values as well as for implementing full-text search.

  • Generalized Search Tree (GiST) Indexes -- allow you to build general

balanced tree structures, and can be used for operations beyond equality and range comparisons. They are used to index the geometric data types, as well as full-text search. Ideal for smaller sets of records.

slide-20
SLIDE 20

Creating indexes

  • Example:

CREATE INDEX defects_sources_idx ON defects(source_id); CREATE INDEX users_names_idx ON users(last_name, first_name);

  • Creating index can take time depending on the number of records
  • Creating indexes adds time to insert and update commands, but saves time on

selects

  • Query optimizer will determine if index exists that will help speed up query
slide-21
SLIDE 21

Partial indexes

  • Covers just a subset of the table’s data
  • Essentially a index with a where clause
  • Example:

CREATE INDEX defects_faculty_idx ON defects(reporter_id) WHERE reporter_id = 1 or reporter_id = 2

slide-22
SLIDE 22

Expression indexes

  • Can create an index which uses a function
  • Common example

CREATE INDEX users_lastname_idx ON users(lower(last_name));

  • Another example with dates

CREATE INDEX articles_day ON articles(date(published_at)); can be used by a query containing WHERE date(articles.created_at) = date('2011-03-07')

slide-23
SLIDE 23

What to index

  • Unique fields
  • Foreign keys
  • Queries commonly run
  • Fields commonly combined
slide-24
SLIDE 24

Full-text indexing and searching

slide-25
SLIDE 25

Performance

slide-26
SLIDE 26

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE summary ILIKE '%concept%'; QUERY PLAN

  • Seq Scan on defects (cost=0.00..236670.88 rows=50153 width=38) (actual

time=0.028..6615.781 rows=48375 loops=1) Filter: ((summary)::text ~~* '%concept%'::text) Rows Removed by Filter: 4951625 Total runtime: 6620.541 ms (4 rows)

slide-27
SLIDE 27

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE to_tsvector('english',summary) @@ to_tsquery('english','concept'); QUERY PLAN

  • Seq Scan on defects (cost=0.00..249168.65 rows=24996 width=38) (actual

time=0.071..47642.888 rows=48375 loops=1) Filter: (to_tsvector('english'::regconfig, (summary)::text) @@ '''concept'''::tsquery) Rows Removed by Filter: 4951625 Total runtime: 47650.170 ms (4 rows)

slide-28
SLIDE 28

bugs_311_fts=# EXPLAIN ANALYZE SELECT id, summary FROM defects WHERE vector_summary @@ to_tsquery('english','concept'); QUERY PLAN

  • Bitmap Heap Scan on defects (cost=8048.43..164050.73 rows=57346 width=38)

(actual time=19.171..364.768 rows=48375 loops=1) Recheck Cond: (vector_summary @@ '''concept'''::tsquery) Rows Removed by Index Recheck: 709900

  • > Bitmap Index Scan on defects_fti_vector_summary (cost=0.00..8034.10

rows=57346 width=0) (actual time=17.547..17.547 rows=48375 loops=1) Index Cond: (vector_summary @@ '''concept'''::tsquery) Total runtime: 367.295 ms (6 rows)

slide-29
SLIDE 29

What about searching two fields?

slide-30
SLIDE 30

bugs_311_mega=# EXPLAIN ANALYZE SELECT id, summary, details FROM defects WHERE summary ILIKE 'SQL' OR details ILIKE ‘SQL'; QUERY PLAN

  • Seq Scan on defects (cost=0.00..249168.65 rows=828 width=200) (actual

time=23861.346..23861.346 rows=0 loops=1) Filter: (((summary)::text ~~* 'SQL'::text) OR (details ~~* 'SQL'::text)) Rows Removed by Filter: 5000000 Total runtime: 23861.371 ms (4 rows)

slide-31
SLIDE 31

bugs_311_fts=# explain analyze SELECT id, summary, details FROM defects WHERE vector_combined @@ to_tsquery(‘english','SQL'); QUERY PLAN

  • Bitmap Heap Scan on defects (cost=8128.94..181504.23 rows=51734 width=200)

(actual time=156.776..2009.117 rows=238307 loops=1) Recheck Cond: (vector_combined @@ '''sql'''::tsquery) Rows Removed by Index Recheck: 1771575

  • > Bitmap Index Scan on defects_fti_vector_combines (cost=0.00..8116.01

rows=51734 width=0) (actual time=155.238..155.238 rows=238307 loops=1) Index Cond: (vector_combined @@ '''sql'''::tsquery) Total runtime: 2026.007 ms (6 rows)