SLIDE 1

EDBT Summer School

Database Performance
Pat & Betty (Elizabeth) O’Neil

  • Sept. 6, 2007

SLIDE 2

Database Performance: Outline

  • Here is an outline of our two-lecture course
  • First, we explain why OLTP performance is no longer thought to be an important problem by most researchers and developers
  • But we’ll introduce a popular new transactional approach called Snapshot Isolation that supports updates & concurrent queries
  • Then we look at queries, especially Data Warehouse queries, which get the most performance attention currently
  • We explain early query performance concepts, e.g. index Filter Factor, then look at cubical clustering, the new crucial factor
  • Data Warehousing is where most new purchases are made at present, and we examine the area in detail

SLIDE 3

DB Performance: OLTP

  • Jim Gray said years ago: OLTP is a solved problem
  • Older established companies that do OLTP Order Entry have solutions in place, efficient enough not to need change
  • One reason is that change is very difficult: rewriting a company’s OLTP programs is called the “Dusty Decks” problem
  • Nobody remembers how all the code works!
  • Also, the cost of hardware to support update transactions is low compared to programming costs, or even terminal costs
  • We explain this in the slides immediately following
  • But some new installations use new transactional approaches that better support queries, such as Snapshot Isolation
  • We explain Snapshot Isolation a bit later (starting on Slide 7)

SLIDE 4

OLTP DebitCredit Benchmark

  • The Anon [Jim Gray] et al. paper (in Datamation, 1985) introduced the DebitCredit benchmark

    Bank Transaction tables: Accnt, Teller, Brnch, History, with 100-byte rows
    Read 100 bytes from terminal: aid, tid, bid, delta
    Begin Transaction;
      Update Accnt set Accnt_bal = Accnt_bal + :delta where Accnt_ID = :aid;
      Insert into History values (:aid, :tid, :bid, :delta, Time_stamp);
      Update Teller set Teller_bal = Teller_bal + :delta where Teller_ID = :tid;
      Update Brnch set Brnch_bal = Brnch_bal + :delta where Brnch_ID = :bid;
    Commit;
    Write 200 bytes to terminal including: aid, tid, bid, delta, Accnt_bal

    Transactions per second (tps) scaled with the size of the tables: each tps had 100,000 Accnts, 10 Brnches, 100 Tellers; History held 90 days of inserts

  1 History of DebitCredit/TPC-A in Jim Gray’s “The Benchmark Handbook”, Chapter 2.

SLIDE 5

OLTP DebitCredit Benchmark

  • DebitCredit restrictions; performance features tested
  • Terminal request response time: 95% required under 1 second; the throughput driver had to be titrated to give the needed response time
  • Think time was 100 secs (after response): meant much context had to be held in memory for terminals (10,000 terminals for 100 tps)
  • Terminal simulators were used; the variant TP1 benchmark drove transactions by software loop generation with trivial think time
  • Mirrored Tx logs were required to provide guaranteed recovery
  • History Inserts (50-byte rows) were costly if pages were locked
  • The system had to be priced, measured in $/tps

SLIDE 6

OLTP DebitCredit/TPC-A Benchmark

  • Vendors cheated when running DebitCredit
  • Jim Gray welcomed the new Transaction Processing Performance Council (TPC), a consortium of vendors, which created TPC-A1 in 1989
  • Change from DebitCredit: measurement rules were carefully specified and auditors were required for results to be published
  • Detail changes:
  • For each 1 tps, only one Branch instead of 10: meant concurrency would be tougher: important to update Branch LAST
  • Instead of 100 sec Think Time, used 10 sec. Unrealistic, but the reason was that TERMINAL PRICES WERE TOO HIGH WITH 100 SEC! Terminal prices were more than all other costs for the benchmark. Even in 1989, transactions were cheap to run!

  1 History of DebitCredit/TPC-A in Jim Gray’s “The Benchmark Handbook”, Chapter 2. In ACM SIGMOD Anthology, http://www.sigmod.org/dblp/db/books/collections/gray93.html

SLIDE 7

DB Performance: OLTP & Queries

  • Amazon doesn’t use OLTP of any DBMS to enter purchases in a customer’s shopping cart (it does for dollar transactions though)
  • Each purchase is saved using a PC file system, with mirroring & replication for failover in case of node drop-out: log not needed
  • Classical OLTP is not needed because only one purchase is saved at a time on a single-user shopping cart
  • There is one new approach to transactional isolation adopted for some new applications, mainly used to support faster queries
  • It is called “Snapshot Isolation”1 (SI), and it has been adopted in both the Oracle and Microsoft SQL Server database products
  • SI is not fully Serializable, so it is officially an Isolation Level, but anomalies are rare and Queries aren’t blocked by Update Tx’s

  1 First published in SIGMOD 1995: A Critique of ANSI SQL Isolation Levels, by Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Pat O’Neil and Elizabeth O’Neil

SLIDE 8

DB Performance: Snapshot Isolation (SI)

  • A transaction Ti executing under Snapshot Isolation (SI) reads only committed data as of the time Start(Ti)
  • This doesn’t work by taking a “snapshot” of all data when a new Tx starts, but by taking a timestamp and keeping old versions of data when new versions are written: Histories in SI read & write “versions” of data items
  • Data read, including both row values and indexes, time travels to Start(Ti), so predicates read by Ti also time travel to Start(Ti)
  • A transaction Ti in SI also maintains data it has written in a local cache, so if it rereads this data, it will read what it wrote
  • No Tk concurrent with Ti (overlapping lifetimes) can read what Ti has written (it can only read data committed as of Start(Tk))
  • If concurrent transactions both update the same data item A, then the second one trying to commit will abort (First Committer Wins rule)

SLIDE 9

DB Performance: Snapshot Isolation (SI)

  • SI Histories read and write Versions of data items: (X0, X1, X2); X0 is the value before updates, X2 would be written by T2, X7 by T7
  • Standard Tx’l anomalies of non-SR behavior don’t occur in SI
  • Lost Update anomaly in Isolation Level READ COMMITTED (READ COMMITTED is the default Isolation Level on all DB products; on the slide, T2’s operations are underlined just to group them visually):

    H1: R1(X,50) R2(X,50) W2(X,70) C2 W1(X,60) C1   (X should end at 80)

  • In SI, the First Committer Wins rule prevents this anomaly:

    H1SI: R1(X0,50) R2(X0,50) W2(X2,70) C2 W1(X1,60) A1

  • Inconsistent View anomaly possible in READ COMMITTED:

    H2: R1(X,50) R2(X,50) W2(X,70) R2(Y,50) W2(Y,30) C2 R1(Y,30) C1

  • Here T1 sees X+Y = 80; in SI, the Snapshot Read avoids this anomaly:

    H2SI: R1(X0,50) R2(X0,50) W2(X2,70) R2(Y0,50) W2(Y2,30) C2 R1(Y0,50) C1   (T1 reads the sum X+Y = 100, valid at Start(T1))
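
To make H1 concrete, here is a minimal SQL sketch (not from the slides) of the Lost Update interleaving at READ COMMITTED, assuming a hypothetical table item(id, val) whose row with id = 1 plays the role of X, starting at 50:

    -- Assumed setup:
    --   create table item (id int primary key, val int);
    --   insert into item values (1, 50);

    -- T1:
    select val from item where id = 1;        -- R1(X,50)

    -- T2:
    select val from item where id = 1;        -- R2(X,50)
    update item set val = 70 where id = 1;    -- W2(X,70): T2 adds 20 to the 50 it read
    commit;                                   -- C2

    -- T1, still working from the value 50 it read earlier:
    update item set val = 60 where id = 1;    -- W1(X,60): T1 adds 10 to the stale 50
    commit;                                   -- C1: final value 60, not 80; T2's update is lost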

SLIDE 10

DB Performance: Snapshot Isolation

  • In SI, a READ-ONLY T1 never has to wait for an Update T2, since T1 reads consistent committed data as of Start(T1)
  • Customers LIKE this property!
  • SI was adopted by Oracle to avoid implementing DB2-style Key Value Locking, which prevents predicate anomalies
  • In SI, a new row inserted by T2 into a predicate read by T1 won’t be accessed by T1 (since index entries have versions as well)
  • Customers using Oracle asked Microsoft to adopt SI as well, so Readers wouldn’t have to WAIT
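
Not part of the original slides: a minimal sketch of how a session asks for this snapshot behavior in the two products just mentioned (the database name SalesDB in the SQL Server step is just a placeholder):

    -- Oracle: the SI behavior described here is what Oracle provides when a
    -- transaction requests its "SERIALIZABLE" isolation level:
    set transaction isolation level serializable;

    -- Microsoft SQL Server (2005 and later): snapshot isolation must first be
    -- enabled for the database, then requested per transaction:
    alter database SalesDB set allow_snapshot_isolation on;
    set transaction isolation level snapshot;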

SLIDE 11

DB Performance: Snapshot Isolation

  • Oracle calls the SI Isolation Level “SERIALIZABLE”
  • SI is not truly SERIALIZABLE (SR); anomalies exist
  • Assume husband and wife have bank account balances A and B with starting values 100; the bank allows withdrawals from these accounts to bring balance A or B negative as long as A+B > 0
  • This is a weird constraint; banks probably wouldn’t impose it
  • In SI we can see the following history:

    H3: R1(A0,100) R1(B0,100) R2(A0,100) R2(B0,100) W2(A2,-30) C2 W1(B1,-30) C1

    Both T1 and T2 expect the total balance to end positive, but it goes negative because each Tx has the value it read changed by the other Tx. This is called a “Skew Writes” anomaly: the weird constraint fails

SLIDE 12

Snapshot Isolation Skew-Write Anomaly

    R1(A0,100) R1(B0,100) R2(A0,100) R2(B0,100) W2(A2,-30) C2 W1(B1,-30) C1

    [Figure: the same history laid out on a time axis for T1 and T2, showing the conflict cycle allowed by SI]

SLIDE 13

DB Performance: Snapshot Isolation

  • Here is a Skew Write anomaly with no pre-existing constraint
  • Assume T1 copies data item X to Y and T2 copies Y to X
  • R1(X0,10) R2(Y0,20) W1(Y1,10) C1 W2(X2,20) C2
  • Concurrent T1 & T2 only exchange the values of X and Y, but any serial execution of T1 & T2 would end with X and Y identical, so this history is not SR
  • Once again we see an example of skew writes: each concurrent Tx depends on a value that gets changed
  • The rule (constraint?) broken is quite odd, and unavoidable: it is one that comes into existence only as a post-condition!

SLIDE 14

DB Performance: Snapshot Isolation

  • SI anomalies rarely occur in commercial applications
  • Oracle called SI “SERIALIZABLE” when they learned it had no anomalies in the (Tx’l) TPC-C benchmark
  • It is also usually possible to prevent SI anomalies with proper forethought1; here is the Skew Write anomaly again:

    R1(X0,10) R2(Y0,20) W1(Y1,10) C1 W2(X2,20) C2

  • We can use “Select For Update” in Oracle (much like writing back the same value) so that the crucial read is stable (symbolized below with RU); a SQL sketch follows at the end of this slide:

    RU1(X0,10) RU2(Y0,20) W1(Y1,10) C1 W2(X2,20) A2

  • SI shows there is still transactional research possible, but the ANTS product writeup2 seems to push OLTP performance to its limit!

  1 For details see: A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil and D. Shasha. Making Snapshot Isolation Serializable. ACM TODS, Vol 30, No. 2, June 2005
  2 Download from http://www.cs.umb.edu/~poneil/ANTS.pdf
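
Here is the promised sketch of the “Select For Update” fix, following the history on the slide, and assuming a hypothetical table item(id, val) holding rows 'X' (value 10) and 'Y' (value 20):

    -- T1 (copies X to Y), with its read of X promoted to a locking read:
    select val from item where id = 'X' for update;   -- RU1(X0,10)
    -- T2 (copies Y to X), likewise promoting its read of Y:
    select val from item where id = 'Y' for update;   -- RU2(Y0,20)

    -- T1:
    update item set val = 10 where id = 'Y';           -- W1(Y1,10)
    commit;                                            -- C1
    -- T2:
    update item set val = 20 where id = 'X';           -- W2(X2,20): conflicts with T1's
    -- promoted read of X, so per the history above T2 is aborted (A2) rather
    -- than being allowed to complete the skew write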

SLIDE 15

DB Performance: Queries

  • Companies are spending the most money now on developing query systems, with few, if any, updates
  • I’ll start with some historical development
  • E. F. Codd’s Relational paper was published in 1970; Codd worked at IBM; DB2 was first released in 1982
  • SEQUEL, an early SQL, was published1 in 1974 for use in IBM’s System R, a DB2 precursor
  • ORACLE (then Relational Software, Inc.) put out a commercial version of SQL in 1979

  1 Donald Chamberlin and Raymond Boyce, 1974. SEQUEL: A Structured English Query Language. SIGFIDET (later SIGMOD), pp. 249-264.

SLIDE 16

DB Performance: Early DB2 & M204

  • DB2 Query Indexes were explained in a 1979 paper1
  • Rather primitive: choose the single column restriction that gives the smallest “selectivity factor” F
  • The selectivity factor was later called “Filter Factor” in documentation
  • The ability to AND Filter Factors waited until DB2 V2, 1988
  • After DB2 was introduced, Pat was working at CCA, which sold a DBMS called Model 204 (M204) with efficient bitmap indexing
  • Pat published a paper called “The Set Query Benchmark”2 that compared M204 query performance to that of DB2 on MVS
  • Lucky that DB2 allowed this: DBMS vendors today would sue

  1 P.G. Selinger et al., Access Path Selection in a Relational Database Management System. SIGMOD 1979
  2 P. O’Neil. The Set Query Benchmark. Chapter 6 in The Benchmark Handbook for Database and Transaction Processing Systems (1st Edition, 1989; 2nd Ed., 1993)

SLIDE 17

DB Performance: Set Query Benchmark

  • The Set Query Benchmark (SQB) has one table: BENCH (we tested table join performance using self-joins on the single BENCH table)
  • The BENCH table has 1,000,000 200-byte rows with a clustering column KSEQ having sequential values 1, 2, . . ., 1,000,000
  • Also 12 random-valued columns, with names indicating cardinality: K2, K4, K5, K10, K25, K100, K1K, K10K, K40K, K100K, K250K, K500K
  • E.g.: K10K has 10,000 values, each appearing randomly on about 100 rows, & K5 has 5 values, each on about 200,000 rows
  • Additionally, 8 char string columns bring the row up to 200 bytes (a DDL sketch follows at the end of this slide)
  • Finally, we had 9 Query Suites providing “functional coverage” of different forms of decision support in application use
  • Each Suite has multiple queries with a range of different Filter Factors (FFs) to provide “selectivity coverage”; SQB can be used as a “micro-benchmark” to predict performance of applications
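
A rough DDL sketch of BENCH as described above (not from the original slides; the string-column names and widths are illustrative, chosen only to bring the row to roughly 200 bytes):

    create table BENCH (
      kseq   integer primary key,   -- clustering column, values 1 .. 1,000,000
      k2     integer, k4    integer, k5    integer, k10   integer,
      k25    integer, k100  integer, k1k   integer, k10k  integer,
      k40k   integer, k100k integer, k250k integer, k500k integer,
      s1 char(8),  s2 char(20), s3 char(20), s4 char(20),   -- illustrative filler columns
      s5 char(20), s6 char(20), s7 char(20), s8 char(20)
    );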

SLIDE 18

Set Query Benchmark KNK columns

    [Figure: sample column values, e.g. K5: … 1 5 2 4 3 … and K10K: … 4523 0123 5790 0003 3457 …]

    K5 has 5 values, each appearing randomly on about 200,000 rows
    K10K has 10,000 values, each appearing randomly on about 100 rows
    K5 = 3 on 1/5 of the rows, so we say FF = 1/5 for K5 = 3
    K10K = 3 on 1/10000 of the rows, so its FF = 1/10000
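
As a quick check of these Filter Factors on a loaded BENCH table, the observed fraction can be computed directly (a simple sketch, not part of the benchmark definition):

    -- Observed filter factor of the predicate K5 = 3: should come out near 1/5
    select count(*) * 1.0 / (select count(*) from BENCH) as ff_k5_eq_3
    from BENCH
    where k5 = 3;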

SLIDE 19

DB Performance: Set Query Benchmark

  • Here is an example from Query Suite Q3B:

    SELECT SUM(K1K) FROM BENCH
    WHERE (KSEQ between 400000 and 410000
        OR KSEQ between 420000 and 430000
        OR KSEQ between 440000 and 450000
        OR KSEQ between 460000 and 470000
        OR KSEQ between 480000 and 500000)
      AND KN = 3;    -- KN varies from K5 to K100K

    --------|==|---|==|---|==|---|==|---|====|--------
    KSEQ->  400000                      500000 (the midpoint of KSEQ)

  • This query sums dollar sales of a product of varying popularity sold in a given locale (thus in a broken range of zip codes)
  • DB2 and M204 both ran on the IBM mainframe system MVS; note that the FF for the KSEQ range union is 0.06 and for KN = 3 it is 1/CARD(KN)
  • DB2 did very well on queries requiring sequential or skip-sequential access to data (called sequential prefetch or list prefetch by DB2), useful in Q3B
  • We wish to contrast 1988-9 measurements of MVS DB2 with 2007 measurements of DB2 UDB on Windows

SLIDE 20

DB Performance: Set Query Comparison

  • On Windows we ran DB2 UDB on a 10,000,000-row BENCH table using a very fast modern system (very cheap compared to 1990)
  • Query Plans for UDB are the same as MVS for KN = K100K, K10K & K5, but different for KN = K100, K25 and K10
  • Composite FF for KN = K10K is 0.06*(1/10,000) = 0.000006, so 6 rows are retrieved from MVS’s 1M rows, 62 rows from UDB’s 10M

    (DB2 MVS: 1M rows; DB2 UDB: 10M rows)
    KN Used In Q3B | Rows Read (of 1M) | DB2 MVS Index usage | DB2 UDB Index usage | DB2 MVS Time (secs) | DB2 UDB Time (secs)
    K100K | 1     | K100K      | K100K | 1.4  | 0.7
    K10K  | 6     | K10K       | K10K  | 2.4  | 3.1
    K100  | 597   | K100, KSEQ | KSEQ  | 14.9 | 2.1
    K25   | 2423  | K25, KSEQ  | KSEQ  | 20.8 | 2.4
    K10   | 5959  | K10, KSEQ  | KSEQ  | 31.4 | 2.3
    K5    | 12011 | KSEQ       | KSEQ  | 49.1 | 2.1

SLIDE 21

Set Query Benchmark KNK index example

    [Figure: leaf level of the K10K B-tree index. Each entry is a (Key, RID) pair; there are about 100 entries for key = 3, e.g. (002, 23548901), (003, 56893467), (003, 59856230), (003, 64908876), (004, 45980456), …]

    K10K has 10,000 values, each appearing on about 100 rows

SLIDE 22

DB Performance: Set Query Comparison

  • For both DB2 MVS & DB2 UDB, the Query Plan for KN = K100K & K10K is: access the rows with KN = 3 & test KSEQ in the proper ranges
  • DB2 MVS had .0001*(.06)*1,000,000 = 6 rows accessed for K10K; DB2 UDB, with 10,000,000 rows, had 62 rows accessed
  • In these cases of very low selectivity, index-driven access to individual rows is still the best Query Plan

    KN Used In Q3B | Rows Read (of 1M) | DB2 MVS Index usage | DB2 UDB Index usage | DB2 MVS Time (secs) | DB2 UDB Time (secs)
    K100K | 1 | K100K | K100K | 1.4 | 0.7
    K10K  | 6 | K10K  | K10K  | 2.4 | 3.1

SLIDE 23

DB Performance: Set Query Comparison

  • For KN = K100, K25 and K10, DB2 MVS ANDed the KSEQ and KN = 3 RID-lists, and accessed rows by RID to sum the K1K values

    x x x x   x x x x   x x xx   x x x x   x x x x xx x x xx x x x
    |=====|---|=====|---|=====|---|=====|---|==========|

  • DB2 UDB, on the other hand, scans each of the five KSEQ ranges specified & tests KN = 3 before summing K1K; DB2 MVS and DB2 UDB both use that same plan for K5

    Scan&Test  Scan&Test  Scan&Test  Scan&Test  Scan&Test
    |=====|---|=====|---|=====|---|=====|---|==========|

    KN Used In Q3B | Rows Read (of 1M) | DB2 MVS Index usage | DB2 UDB Index usage | DB2 MVS Time (secs) | DB2 UDB Time (secs)
    K100  | 597   | K100, KSEQ | KSEQ | 14.9 | 2.1
    K25   | 2423  | K25, KSEQ  | KSEQ | 20.8 | 2.4
    K10   | 5959  | K10, KSEQ  | KSEQ | 31.4 | 2.3
    K5    | 12011 | KSEQ       | KSEQ | 49.1 | 2.1

SLIDE 24

DB Performance: Set Query Comparison

  • Both indexed retrieval of individual rows and sequential scan have sped up since DB2 MVS, but sequential scan much more!
  • Sequential scan speed has increased from MVS to UDB by a factor of 88; random row retrieval speed by a factor of 4
  • So FFs, which reduce the number of rows retrieved at random, are now less important for speed, while clustering is more important: it reduces the sequential scan range

    KN Used In Q3B | Rows Read (of 1M) | DB2 MVS Index usage | DB2 UDB Index usage | DB2 MVS Time (secs) | DB2 UDB Time (secs)
    K100K | 1     | K100K      | K100K | 1.4  | 0.7
    K10K  | 6     | K10K       | K10K  | 2.4  | 3.1
    K100  | 597   | K100, KSEQ | KSEQ  | 14.9 | 2.1
    K25   | 2423  | K25, KSEQ  | KSEQ  | 20.8 | 2.4
    K10   | 5959  | K10, KSEQ  | KSEQ  | 31.4 | 2.3
    K5    | 12011 | KSEQ       | KSEQ  | 49.1 | 2.1

SLIDE 25

Disk Performance

  • Our “disktest” program generates read requests to a 20 GB file, as a database would
  • The file sits on a 4-disk RAID system of enterprise disks (15K RPM)
  • Various read request sizes: rsize = 4KB, 16KB, 64KB, 256KB
  • From 1 to 300 read requests outstanding
  • Either read the whole file in order in rsize reads, or only some of it: “skip-sequential” reads
  • We use random numbers to determine successive rsize blocks to read, with probability p, p = 1.0, 0.1, 0.01, 0.001
  • The case p = 1.0 specifies sequential reading, like a table scan
  • The case p < 1.0 specifies skip-sequential reading, like index-driven database access: ordered RIDs access rows left-to-right on disk
  • DB2 orders RIDs before accessing; ORACLE seems not to

SLIDE 26

Disk Performance

  • Rectangles in the disks of the slide figure (not reproduced here) represent requests queued in each individual disk of a RAID set with multiple requests outstanding (SCSI or SATA)
  • Each disk runs the “elevator” algorithm, modified to handle rotational position, to optimize its work
  • For skip-sequential reading, we need 20 requests outstanding for good performance, 5 in each disk
  • This is 3-5 times faster than with 1 request outstanding
  • But not the highest MB/s: the disk has to rotate between reads
  • Note: stripe size doesn’t matter here, as long as multiple stripes of data are in play

SLIDE 27

Sequential Scan Performance

  • Each disk can deliver 100MB/second with sequential scan
  • But throughput from disk to memory is limited by the controller/bus
  • Disktest, reading sequentially, gets a maximum of 300MB/sec
  • Reads must be aligned to stripe boundaries to achieve the maximum 300MB/sec
  • Our 4-disk RAID has 64KB stripe size, the Windows page I/O size
  • With 256KB reads (a 4-disk stripe set), only 2 requests outstanding are needed for 300MB/sec
  • With 64KB reads (1 disk stripe at a time), 20 requests outstanding are needed for 300MB/sec
  • The maximum read speed is not attainable with 4KB or 16KB reads
  • DBs get about 140 MB/second using tablescan (after tuning)
  • They are slowed down by needing to interpret the data read in, extract rows, etc.
  • We need to tune DBs to use large reads and multiple outstanding reads
  • Oracle and Vertica recommend 1MB stripe size for DB RAID

SLIDE 28

Pointers on Setting Up Disks for DBs

  • When experimenting, you should partition the RAID into (say) 5 parts
  • Then you can save tuned databases on different partitions
  • You can reinitialize the file system on an old partition and copy data from another; this avoids the file system fragmentation of multiple loads
  • You can also build a new database on a reinitialized partition
  • Don’t worry about “raw disk”: current OS file I/O is great
  • You will find that the lower-numbered partitions have faster sequential reads, because of disk “zones”
  • The effect is about 30% for enterprise disks, more for cheap disks
  • Just be aware of this effect, so you don’t misinterpret results
  • You can buy enterprise disks to use on an ordinary tower PC
  • We used a Dell PowerEdge 2900, with nice accessible internals
  • We added a SAS RAID controller and SAS disks

SLIDE 29

DB Performance: Outline of What Follows

  • First, we’ll examine Clustering historically, then a new DB2 concept of Multi-Dimensional Clustering (MDC)
  • We show how a form of MDC can be applied to other DBMS products as well, e.g., Oracle, Sybase IQ
  • We want to apply this capability in the case of Star Schema Data Warehousing, but there’s a problem
  • We explain Star Schemas, and how the problem can be addressed by Adjoined Dimension Columns
  • We introduce a “Star Schema Benchmark” and show performance comparisons of three products

SLIDE 30

DB Performance: Linear Clustering

  • Indexed clustering is an old idea; in the 1980s one company had demographic data on 80 million families from warranty cards; they crafted mailing lists for sales promotions of client companies
  • Typically, they concentrated on local areas (store localities) for these companies to announce targeted sales: sports, toys, etc.
  • To speed searches, they kept families in order by zipcode; they also had other data, so they could restrict by income class, hobbies, etc.
  • The result was something like Q3B: a 50-mile-radius circle in a state will result in a small union of ranges on zipcode
  • The Q3B range union was a fraction 0.06 of all possible values in the clustering column KSEQ of BENCH
  • Clustered ranges are all the more important with modern disks

SLIDE 31

Multi-Dimensional Clustering

  • Clustering by a single column of a table is fine if almost all queries restrict by that column, but this is not always true
  • DB2 was the first DBMS to provide a means of clustering by more than one column at a time, using MDC1
  • MDC partitions table data into Cells by treating some columns within the table as axes of a Cube
  • Each Cell in a Cube is defined by a combination of values in the Cube axis columns: each row’s axis values define its Cell
  • Each new row is placed in its Cell, which sits on a sequence of Blocks on disk; Blocks must be large enough that sequential scan speed within a cell swamps the inter-cell access seek
  • Queries with multiple range predicates on axis columns will retrieve only data from Cells in the range intersection

  1 S. Lightstone, T. Teorey and T. Nadeau. Physical Database Design. Morgan Kaufmann

SLIDE 32

DB Performance: Multi-Dim Clustering

  • Example cube on dimensions: Year, Nation, Product Category

    [Figure: a cube with axes labeled Nation, Category and Year]

SLIDE 33

DB Performance: Multi-Dim Clustering

  • Define DB2 MDC Cells in the Create Table statement (a query that exploits the cells follows at the end of this slide):

    create table Sales (
      units int, dollars decimal(9,2), prodid int, category int,
      nation char(12), selldate date, year smallint, . . .
    ) organize by dimensions (category, nation, year);

  • Cells on category, nation and year are created, and sit on blocks
  • Each value of a dimension gives a Slice of the multi-dimensional cube; it has a Block index entry listing the BIDs of Blocks with that value
  • The BIDs in the intersection of all orthogonal Slices are those of a Cell
  • DB2 supports Inserts of new rows of a table in MDC form; a new row is placed in some Block of the valid Cell, with a new Block added if necessary; after Deletes, empty Blocks are freed
  • There is no guarantee that the Blocks for a Cell are near each other, but inter-block seek time is swamped by the speed of sequential access to a block
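
Here is the promised query sketch against the Sales table above (the values are made up): because all three predicates are on MDC dimension columns, DB2 can use the Block indexes to read only the Blocks of Cells in the intersection of the three slices.

    select nation, sum(dollars) as total_dollars
    from Sales
    where category = 7                   -- one slice of the category axis
      and nation in ('France', 'Spain')  -- two slices of the nation axis
      and year between 2005 and 2006     -- two slices of the year axis
    group by nation;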

SLIDE 34

MDC Organization on Disk

    Example: MDC with three axes (AKA dimensions), nation, year, category, each with values 1, 2, 3, 4.
    Let C123 stand for the cell with axis values (1,2,3), i.e., nation = 1, year = 2, category = 3.
    On disk, each block (1 MB, say) belongs to a cell. The sequence of blocks on disk is disordered:
      C321 C411 C342 C433 C412 C143 C411 C231 C112 …
    One cell, say C411 (nation = 4, year = 1, category = 1), has several blocks, in different places, but since each block is 1 MB, it’s worth a seek.

SLIDE 35

DB Performance: Star Schema (SS)

  • A Data Warehouse is a Query-mostly Database, typically made up of multiple Star Schemas, called Data Marts
  • A Star Schema has a Fact table and various Dimension tables

    [Figure: the POS Transaction Fact table, with foreign keys Date Key, Product Key, Store Key and Promotion Key, plus Transaction Number and the measures Sales Quantity, Sales Dollars, Cost Dollars and Profit Dollars; it is surrounded by the Date, Product, Store and Promotion Dimension tables, each with a primary key and many attributes]

SLIDE 36

DB Performance: Star Schema (SS)

  • In general, Dimension tables have a very small number of rows compared to a Fact table
  • The POS Transaction Fact table has 9 columns and 40-byte rows
  • The maximum size of a POS table on an inexpensive PC might be a few hundred GB of disk, which would amount to 5 billion rows
  • Dimension tables are much smaller: a few thousand rows in the Date and Store Dimensions, 100s in Promotion, a few million in Product
  • Practitioners keep Fact tables lean (Foreign Keys, TID and Measures) and put the many descriptive columns in Dimension tables. E.g., the Product Dimension might have: ProdKey (artificial key), Product Name, SKU (natural key), Brand_Name, Category (paper product), Department (sporting goods), Package_Type, Package_Size, etc.

SLIDE 37

DB Performance: Data Warehousing

  • A Data Warehouse is made up of a collection of Star Schemas, often representing a supply chain, on conforming dimensions1

    [Figure: a matrix of supply-chain star schemas (Retail Sales, Retail Inventory, Retail Deliveries, Warehouse Inventory, Warehouse Deliveries, Purchase Orders) against the conforming dimensions they share (Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper, etc.), with an X marking each dimension used by each star schema]

  1 R. Kimball & M. Ross, The Data Warehouse Toolkit, 2nd Ed., Wiley

SLIDE 38

DB Performance: Star Schema Queries

  • Queries on Star Schemas typically retrieve aggregates of Fact table measures (Sales Quantity/Dollars/Cost/Profit)
  • The query Where clauses typically restrict Dimension columns: Product Category, Store Region, Month, etc.
  • Usually there is some Group By for aggregation
  • Example Query 2.1 from the Star Schema Benchmark (SSB):

    select sum(lo_revenue), d_year, p_brand1
    from lineorder, date, part, supplier
    where lo_orderdate = d_datekey
      and lo_partkey = p_partkey
      and lo_suppkey = s_suppkey
      and p_category = 'MFGR#12'
      and s_region = 'AMERICA'
    group by d_year, p_brand1
    order by d_year, p_brand1;

SLIDE 39

DB Performance: Multi-Dim Clustering

  • Recall that DB2’s MDC treats table columns as orthogonal axes of a Cube; these columns are called “Dimensions”
  • But MDC “Dimensions” are columns in a table turned into a Cube, not columns in Dimension tables of a Star Schema
  • Of course we can’t embed all Dimension columns of a Star Schema in the Fact table: that would take up too much space for a huge number of rows
  • How does DB2 MDC handle Star Schemas? Not entirely clear
  • For a foreign key of type Date a user can define a “generated column” YearAndMonth mapped from Date by DB2; this column’s values can then be used to define MDC Cells (a DDL sketch follows below)
  • But not all Dimension columns can have generated columns defined on foreign keys: a Product foreign key will not give the brand
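
A hedged sketch (not from the slides) of that generated-column idea, assuming the order-date column is of type DATE; the table layout and expression are illustrative only:

    create table lineorder_mdc (
      lo_orderkey   integer not null,
      lo_revenue    decimal(12,2),
      lo_orderdate  date,
      -- derived column usable as an MDC dimension; about 12 values per year of data
      yearandmonth  integer generated always as (year(lo_orderdate) * 100 + month(lo_orderdate))
    )
    organize by dimensions (yearandmonth);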

SLIDE 40

Adjoined Dimension Columns (ADC)

  • To use columns of Star Schema Dimension tables in MDC as Cube Axes, we adjoin copies of the Dimension columns to the Fact table
  • We call this “ADC”: Adjoined Dimension Columns for Cube Axes
  • Only 3-5 columns of relatively low cardinality are adjoined, since the number of Cells is the product of the axis columns’ cardinalities
  • Why is low cardinality important? Each Cell (or Block of a Cell) must sit on a MB of disk or more, so that sequential access within a Cell swamps the seek time between Cells (the DB2 MDC documents don’t describe this, but that’s the idea)
  • The ADC approach generalizes to a DBMS without MDC if we sort the Fact table by the concatenation of the adjoined columns
  • We explain all this and measure performance in what follows

SLIDE 41

ADC: Outline of What Follows

  • ADC works well with DB2 MDC or Oracle partitions or even a DBMS that simply has good indexing
  • ADC can have problems with inserts to the base table if Cells are not defined on a Materialized View
  • We will measure performance using a Star Schema Benchmark (SSB): the SSB schema is defined as a transformation of the TPC-H benchmark schema
  • We analyze SSB Queries by the clustering performed: Cluster Factor (CF) instead of Filter Factor (FF)
  • Then we provide experimental results, further analysis, and conclusions

SLIDE 42

Adjoining Dimension Columns: Example

  • Consider adjoining the following dimension columns to the POS Transaction Fact table (from the Star Schema slide, Slide 35) to give POSADC; a DDL sketch follows at the end of this slide
  • Date.Qtr in a 3-year history of transactions, 12 values, as POS.D_Qtr
  • Store.Nation in multi-national sales with 9 nations, as POS.S_Nation
  • Product.Dept, with 18 Departments in the business, as POS.P_Dept
  • Promotion.MediaType, with 9 such types, as POS.PR_Type
  • The number of cells added to make POSADC is: 12*9*18*9 = 17,496
  • We may see skew in the count of rows in cell combinations of dimension column values: say the smallest cell is 1/2 the size of the largest
  • We want each cell to contain at least 1 MB so that sequential scan swamps the seek time between cells: we should be safe assuming 35,000 cells of equal size, which implies we need 70 GB in POSADC

SLIDE 43

Adjoining Dimension Columns

  • Note that POSADC is not a materialized view of POS in all database products; there will be two separate tables
  • Given a row for insert into POS, we use the foreign keys in the row to determine the ADC column values for the insert into POSADC
  • Thus we give up on working with POS and deal only with POSADC, unless we always insert new rows in both POS and POSADC
  • The Vertica1 DBMS supports ordered materialized views created from joins of tables: we’ll see shortly why ordering can be important

  1 C-Store, A Column-Oriented DBMS, M. Stonebraker et al. http://db.csail.mit.edu/projects/cstore/vldb.pdf

SLIDE 44

Considerations and Futures of ADC

  • If POSADC is not a materialized view, updates don’t keep POS up to date: that may be OK if it’s only POSADC we want to work with
  • POSADC isn’t much fatter than POS, just four extra columns with small cardinality: we can make these columns tiny with good compression
  • E.g., character string values can EACH be mapped to small ints 1-12; this approach was used in M204: “Coding” compression
  • POSADC Inserts might be slow if we must look up the ADC values in the Dimensions
  • But it seems possible to encode the column values of a dimension hierarchy in a hierarchical foreign key: Region (4 values), Nation (up to 25 per region), City (up to 25 per Nation): 2 + 5 + 5 bits (see the sketch below)
  • An added advantage is that the DBMS becomes hierarchy-aware with just a foreign key: it can recognize that an equal match on City implies an equal match on the containing Nation and Region
  • This obviates the need to put Nation/Region “hints” in the query
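
A small sketch of that packed hierarchical key (hypothetical column names): Region in the top 2 bits, Nation in the next 5, City in the low 5, so geo_key = region_code*1024 + nation_code*32 + city_code. A restriction on a whole Region then becomes a single range predicate on the packed key:

    -- All rows whose City lies anywhere under region code 2:
    select sum(sales_dollars)
    from POSADC
    where geo_key between 2 * 1024 and 3 * 1024 - 1;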

SLIDE 45

Compressing ADC

  • We won’t have all the tricks of the prior slide in any current DBMS
  • But we can usually compress adjoined dimension columns to small ints: use a UDF for the mapping, so Store.Nation = ‘United States’ maps to the small int 7

    create function NATcode (nat char(18)) returns smallint
      … return case nat
                 when ‘United Kingdom’ then 1
                 …
                 when ‘United States’ then 7
                 …
               end

  • Now a user can write:

    Select … from POS where S_Nation = NATcode(‘United States’) …

  • NATcode will translate ‘United States’ to the int 7; thus all ADC columns can be two-byte (or smaller) small ints with the proper UDFs defined

SLIDE 46

Using ADC With Oracle

  • Oracle has a feature, “Partitioning”, that corresponds to DB2 MDC
  • Given ADCs C1, C2, C3 and C4, create the Cells of a Cube using the Partition clause in Create Table on a concatenation of these columns (assume the column values are 0, 1 & 2, for illustration); a fuller DDL skeleton follows at the end of this slide

    …
    partition C0001 values less than (0,0,0,1) tablespace TS1
    partition C0002 values less than (0,0,0,2) tablespace TS1
    partition C0010 values less than (0,0,1,0) tablespace TS1   -- (0,0,1,0) comes next after (0,0,0,2)
    …
    partition C1000 values less than (1,0,0,0) tablespace TS2

  • NOTE: Inserts of new rows also work with partitioning
  • Map each Dimension value to a number; Oracle compresses it to a byte if it fits

    create or replace function natcode (nat char(18)) return number is
    begin
      return case nat when ‘United Kingdom’ then 1 …
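
For completeness, a minimal sketch (assumed names, only two ADC columns with values 0-2, to keep it short) of the enclosing Oracle DDL that partition clauses like those above would sit in:

    create table POSADC (
      units   number,
      dollars number,
      c1      number(1),   -- adjoined dimension column, coded as a small number
      c2      number(1)
    )
    partition by range (c1, c2) (
      partition cell_0_0  values less than (0, 1) tablespace ts1,
      partition cell_0_1  values less than (0, 2) tablespace ts1,
      partition cell_0_2  values less than (1, 0) tablespace ts1,
      partition cell_1_0  values less than (1, 1) tablespace ts2,
      partition cell_rest values less than (maxvalue, maxvalue) tablespace ts2
    );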

SLIDE 47

Create Cells Using Table Order & ADC

  • For a DBMS with good indexing, define Cells by concatenating the (4) Adjoined Dimension Columns and sorting the table by this order
  • Each cell represents a unique combination of ADC values, e.g. (2,7,4,3)
  • A Where clause with ANDed range predicates on the four ADCs, each range over 3 ADC values, will select 81 cells of the table
  • The Cells selected will not sit close to each other, except for cells differing only in the final ADC value, e.g.: (2,7,4,3) and (2,7,4,4)
  • Obviously this approach doesn’t allow later inserts into given cells
  • The Vertica DBMS uses this approach, but since new inserts go into the WOS (Write-Optimized Store) in memory, later merged out to the disk ROS (Read-Optimized Store), new ad hoc inserts ARE possible

SLIDE 48

Star Schema Benchmark (SSB)

  • Recall: Kimball defines a Data Warehouse made up of Data Marts represented by Star Schemas on conforming dimensions (see the conforming-dimensions slide, Slide 37)
  • To measure query performance on a Star Schema Data Mart with and without ADC, we define the Star Schema Benchmark (SSB)1
  • The SSB design is based on that of the TPC-H benchmark2, used by many vendors to claim superior Data Warehouse performance
  • “[TPC-H] is cleverly constructed to avoid using a Snowflake [Star] Schema; [two dozen CIOs have] never seen a data warehouse that did not use a Snowflake schema”3
  • We base SSB on TPC-H for data warehouse needs; note that the TPC has been trying for years to develop a Star Schema benchmark

  1 P. O’Neil, E. O’Neil, X. Chen. The Star Schema Benchmark. http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
  2 TPC-H, an ad-hoc decision support benchmark. http://www.tpc.org/tpch/
  3 M. Stonebraker et al., One Size Fits All? — Part 2: Benchmarking Results. http://www-db.cs.wisc.edu/cidr/cidr2007/index.html, press ‘Electronic Proceedings’ to download

SLIDE 49

Star Schema Benchmark (SSB)

Created to Measure Star Schema Query Performance

SLIDE 50

SSB is Derived From the TPC-H Benchmark

    [Figure: the TPC-H schema. Tables and row counts: PART (P_, SF*200,000), SUPPLIER (S_, SF*10,000), PARTSUPP (PS_, SF*800,000), CUSTOMER (C_, SF*150,000), NATION (N_, 25), REGION (R_, 5), ORDERS (O_, SF*1,500,000), LINEITEM (L_, SF*6,000,000), linked by PARTKEY, SUPPKEY, CUSTKEY, ORDERKEY, NATIONKEY and REGIONKEY foreign keys]

SLIDE 51

Why Changes to TPC-H Make Sense

  • TPC-H has a PARTSUPP table listing Suppliers of ordered Parts, to provide information on SUPPLYCOST and AVAILQTY
  • This is silly: there are seven years of orders (broken into the ORDERS and LINEITEM tables) and SUPPLYCOST is not stable over that period
  • PARTSUPP looks like an OLTP table, not a Query table: as I fill orders I would want to know AVAILQTY, but it is meaningless over 7 years
  • AVAILQTY and SUPPLYCOST are never Refreshed in TPC-H either; we say PARTSUPP has the wrong temporal “granularity”1
  • One suspects the real reason for PARTSUPP is to break up what would be a Star Schema, so Query Plans become harder
  • Combining LINEITEM and ORDERS in TPC-H to get LINEORDER in SSB, with one row for each row in LINEITEM, is common practice1

  1 The Data Warehouse Toolkit, 2nd Ed., Ralph Kimball and Margy Ross. Wiley

SLIDE 52

Why Changes to TPC-H Make Sense

  • To take the place of PARTSUPP, we have a column SUPPLYCOST in the LINEORDER table of SSB to give the cost of each item when ordered
  • Note that a part order might be broken into two or more lineitems of the order if the lowest-cost supplier didn’t have enough parts
  • Another modification of TPC-H for SSB was that the TPC-H columns SHIPDATE, RECEIPTDATE and RETURNFLAG were all dropped; this information isn’t known until long after the order is placed. We kept ORDERDATE and COMMITDATE (the commit date is a commitment made at order time as to when shipment will occur). One would typically need multiple star schemas to contain the other dates from TPC-H, and we didn’t want SSB that complicated
  • A number of other modifications were made; for a complete list, see: http://www.cs.umb.edu/~poneil/StarSchemaB.PDF

SLIDE 53

Star Schema Benchmark Queries

  • SSB has 13 queries of TPC-H form, with varying numbers of dimension columns restricted & varying selectivity of restrictions; examples follow

  Q1.1: Select the total discount given in 1993 for parts sold in quantity < 25 and with discounts of 1-3%

    select sum(lo_extendedprice*lo_discount) as revenue
    from lineorder, date
    where lo_orderdate = d_datekey
      and d_year = 1993
      and lo_discount between 1 and 3
      and lo_quantity < 25;

  Q2.1: Measure total revenue for a given part category from a supplier in a certain geographical area

    select sum(lo_revenue), d_year, p_brand1
    from lineorder, date, part, supplier
    where lo_orderdate = d_datekey
      and lo_partkey = p_partkey
      and lo_suppkey = s_suppkey
      and p_category = 'MFGR#12'
      and s_region = 'AMERICA'
    group by d_year, p_brand1
    order by d_year, p_brand1;

SLIDE 54

Star Schema Benchmark Queries

  • Successive Queries in a flight (Q1.2, Q1.3 in flight Q1) use tighter restrictions to reduce the Where clause selectivity

  Q1.1: Select the total discount given in 1993 for parts sold in quantity < 25 and with discounts of 1-3%

    select sum(lo_extendedprice*lo_discount) as revenue
    from lineorder, date
    where lo_orderdate = d_datekey
      and d_year = 1993
      and lo_discount between 1 and 3
      and lo_quantity < 25;

  Q1.3: Same as Q1.1, except the parts are sold in a smaller quantity range and the date is limited to one week

    select sum(lo_extendedprice*lo_discount) as revenue
    from lineorder, date
    where lo_orderdate = d_datekey
      and d_weeknuminyear = 6
      and d_year = 1994
      and lo_discount between 5 and 7
      and lo_quantity between 25 and 35;

  • See the list of queries starting on page 9 of the ADC Index preprint handout

SLIDE 55

Star Schema Benchmark Cluster Factors

  CFs of indexable predicates on dimension columns, and the combined CF effect on lineorder:

    Query | CF: discount & quantity | CF: Date | CF: part Brand1 roll-up | CF: supplier city roll-up | CF: customer city roll-up | Combined CF on lineorder
    Q1.1  | .47*3/11                | 1/7      |                         |                           |                           | .019
    Q1.2  | .2*3/11                 | 1/84     |                         |                           |                           | .00065
    Q1.3  | .1*3/11                 | 1/364    |                         |                           |                           | .000075
    Q2.1  |                         |          | 1/25                    | 1/5                       |                           | 1/125 = .0080
    Q2.2  |                         |          | 1/125                   | 1/5                       |                           | 1/625 = .0016
    Q2.3  |                         |          | 1/1000                  | 1/5                       |                           | 1/5000 = .00020
    Q3.1  |                         | 6/7      |                         | 1/5                       | 1/5                       | 6/175 = .034
    Q3.2  |                         | 6/7      |                         | 1/25                      | 1/25                      | 6/4375 = .0014
    Q3.3  |                         | 6/7      |                         | 1/125                     | 1/125                     | 6/109375 = .000055
    Q3.4  |                         | 1/84     |                         | 1/125                     | 1/125                     | 1/1312500 = .000000762
    Q4.1  |                         |          | 2/5                     | 1/5                       | 1/5                       | 2/125 = .016
    Q4.2  |                         | 2/7      | 2/5                     | 1/5                       | 1/5                       | 4/875 = .0046
    Q4.3  |                         | 2/7      | 1/25                    | 1/25                      | 1/5                       | 2/21875 = .000091

SLIDE 56

SSB Test Specifications

  • We measured three DB products we call A, B, and C, using SSB tables at Scale Factor 10, where LINEORDER is about 6 GB
  • Tests were performed on a Dell 2900 running Windows Server 2003, with 8 GB of RAM (all queries had cold starts), two 64-bit dual-core processors (3.2 GHz each), and data loaded on RAID-0 with 4 Seagate 15000 RPM SAS disks (136 GB each), with stripe size 64 KB. This is a fast system in current terms
  • We had to experiment to find good tuning settings for the number of reads outstanding: parallel threads, prefetching
  • The adjoined dimension columns were d.year, s.region, c.region, and p.category (brand hierarchy), with cardinalities 7, 5, 5, and 25
  • In the LINEORDER table these adjoined columns were named year, sregion, cregion, category, and were used in the SSB Queries

SLIDE 57

SSB Load

  • We performed two loads for DBs A, B and C, one with adjoined columns and one with no adjoined columns, called BASIC form
  • First we generated the rows of the SSB LINEORDER and dimension tables using the generator that populates TPC-H, with the needed modifications
  • After loading LINEORDER in BASIC form, we used a query (below) to output the data with adjoined columns in order by their concatenation; we could use this order to load a DB with indexing, or to help speed up the DB2 MDC load and the Oracle Partition load

    select L.*, d_year, s_region, c_region, p_category
    from lineorder, customer, supplier, part, date
    where lo_custkey = c_custkey
      and lo_suppkey = s_suppkey
      and lo_partkey = p_partkey
      and lo_datekey = d_datekey
    order by d_year, s_region, c_region, p_category;

SLIDE 58

DB Products A, B & C Times in Seconds

    [Table of per-query elapsed and CPU times for products A, B and C, BASE and ADC cases; not recoverable from this transcript]

SLIDE 59

A, B & C Geometric Mean of ADC Measures

  • It’s clear that the queries of SSB are best optimized by a column store (also referred to as “vertically partitioned”), DB Product C
  • The normal way to summarize N time measures is a geometric mean (used by the TPC): the Nth root of the product of the N measures
  • The assumption is that faster queries (smaller times) will be used more often, so all queries get equal weight
  • Here are the Geometric Means of elapsed/CPU times for A, B and C:

    Geometric Mean (Elapsed/CPU secs) | Product A (row DB) | Product B (row DB) | Product C (column DB)
    Base Case                         | 44.7 / 3.4         | 36.8 / 1.5         | 12.6 / 1.6
    ADC Case                          | 3.60 / 0.24        | 4.23 / 0.236       | 2.29 / 0.0081
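
As an aside (not from the slides), the geometric mean is easy to compute directly in SQL from a hypothetical results table ssb_times(product, variant, query_id, elapsed_secs):

    -- exp(avg(ln(x))) is the geometric mean of the positive values x
    select product, variant,
           exp(avg(ln(elapsed_secs))) as geo_mean_elapsed
    from   ssb_times
    group by product, variant;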

SLIDE 60

Analysis of SSB and Conclusions

    [Figure: Query Times vs. Cluster Factor, CF (on a log scale)]

SLIDE 61

Analysis of SSB and Conclusions

  • On the plot of the preceding slide (Slide 60), we can differentiate three regions of Cluster Factor (CF)
  • At the low end of the CF axis (CF < 1/10000), secondary indexes are effective at retrieving the few rows that qualify: ADC has little advantage
  • For CF up near 1.0, the whole table is read with a sequential scan regardless of ADC, and the times again group together
  • For CF in the intermediate range, where most queries lie, ADC reduces query times very effectively compared to the BASE case: time is reduced from approximately that required for a sequential scan down to a few seconds (at most ten seconds)
  • We believe this demonstrates the validity of our ADC approach

SLIDE 62

Analysis and Conclusions

  • Readers should bear in mind that the Star Schema Benchmark has quite small dimensions, with rather simple rollup hierarchies
  • With more complex dimensions and query workloads in large commercial applications, no single ADC choice will speed up all possible queries
  • Of course this has always been the case with clustering solutions: they never improve the performance of all possible queries
  • Still, there are many commercial data warehouse applications where clustering of this sort should be an invaluable aid

SLIDE 63

Bibliography

  • H. Berenson, P. Bernstein, J. Gray, J. Melton, P. O’Neil, E. O’Neil. A Critique of ANSI SQL Isolation Levels. SIGMOD 1995.
  • D. Chamberlin and R. Boyce. SEQUEL: A Structured English Query Language. SIGFIDET 1974 (later SIGMOD), pp. 249-264.
  • A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil and D. Shasha. Making Snapshot Isolation Serializable. ACM TODS, Vol 30, No. 2, June 2005.
  • Jim Gray, ed. The Benchmark Handbook. In ACM SIGMOD Anthology, http://www.sigmod.org/dblp/db/books/collections/gray93.html
  • R. Kimball & M. Ross. The Data Warehouse Toolkit, 2nd Ed., Wiley, 2002.
  • S. Lightstone, T. Teorey and T. Nadeau. Physical Database Design. Morgan Kaufmann, 2007.
  • P. O’Neil, E. O’Neil, X. Chen. The Star Schema Benchmark. http://www.cs.umb.edu/~poneil/StarSchemaB.PDF

SLIDE 64

Bibliography

  • P.G. Selinger et al. Access Path Selection in a Relational Database Management System. SIGMOD 1979.
  • M. Stonebraker et al. C-Store: A Column-Oriented DBMS. VLDB 2005. http://db.csail.mit.edu/projects/cstore/vldb.pdf
  • M. Stonebraker et al. One Size Fits All? — Part 2: Benchmarking Results. CIDR 2007. http://www-cs.wisc.edu/cidr/cidr2007/index.html, press ‘Electronic Proceedings’ to download
  • TPC-H, an ad-hoc decision support benchmark. http://www.tpc.org/tpch/