Data Management Systems
Fall Semester 2020 Content Introduction
1
Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich
Course Introduction
Data Management Systems Fall Semester 2020 Gustavo Alonso - - PowerPoint PPT Presentation
Data Management Systems Fall Semester 2020 Gustavo Alonso Institute of Computing Platforms Content Introduction Department of Computer Science ETH Zrich Course Introduction 1 Course contents Intro & basics Storage Access
Fall Semester 2020 Content Introduction
1
Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich
Course Introduction
2 Course Introduction
applications
scientist or as a scientist dealing with data
3 Course Introduction
applications.
consistency) to the database rather than to the program
4 Course Introduction
5 Course Introduction
databases)
forms of data management
6 Course Introduction
From Oracle M7 documentation
7 Course Introduction
8 Course Introduction
9 Course Introduction
Physical data independence
data is actually stored persistently and represented in memory
relations and a schema
schema is the engine’s problem Logical data independence
views over the schema so that different logical interpretations
views and the engine is responsible for maintaining such views
10 Course Introduction
Language) specifies how the result looks like and describes the tuples that should be part of the result
result (no control flow)
to optimize queries
11
TPC-H Query 1 SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE, AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS AVG_DISC, COUNT(*) AS COUNT_ORDER FROM LINEITEM WHERE L_SHIPDATE <= dateadd(dd, -90, cast('1998-12-01' as date)) GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG,L_LINESTATUS
Course Introduction
prove the equivalence of certain transformations.
transformed into an operator tree (a plan), the plan can be
information, heuristics, cost estimations, etc.
involved (hints allowed)
12 Course Introduction
data:
another table)
letting the engine enforce these rules at run time. These rules are declarative.
13 Course Introduction
can be seen using queries
14 Course Introduction
a consistent database
applications/users are also modifying the data
modified and it also ensures your changes make it to the database even when others are also reading and writing the same data
fake news from companies when they do not know how to do it)
15 Course Introduction
when failures occur
consistent stage before shutting down
made, keeping track of all modifications to the data or the system
16 Course Introduction
a wide variety of features
extension and deployments
highly reliable (code tested by very many users)
reliability and hugely optimized code
benchmark in functionality as well as performance
17 Course Introduction
18 Course Introduction
19
Physical storage Blocks, files, segments Pages in memory Physical records Logical records (tuples) Logical data (tables, schemas) Relations, views Queries, Transactions (SQL) Record Interface Record Access Page access File Access Application Logical view (logical data) Access Paths Physical data in memory Page structure Storage allocation
Course Introduction
20
https://docs.oracle.com/cd/E11882_01/server.112/e40540/pro cess.htm#CNCPT902
Course Introduction
21
https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
Course Introduction
neither disk nor memory
to go to where the data is
where the query is
and to minimize data shipping
22
DB Engine Disk NETWORK DB Engine Disk DB Engine Disk
Course Introduction
nodes
node
23
DB Engine Disk NETWORK DB Engine Disk DB Engine Disk
Course Introduction
24
DB Engine Disk DB Engine Disk DB Engine Disk T T1 T3 T2 SELECT * FROM T WHERE Age > 50 SELECT * FROM T WHERE Age > 50 SELECT * FROM T WHERE Age > 50 SELECT * FROM T WHERE Age > 50 Results Results Merge and answer
Course Introduction
25
DB Engine Disk DB Engine Disk DB Engine Disk R S SELECT * FROM R,S WHERE R.name = S.name R S Data Shipping (S) SELECT * FROM R,S WHERE R.name = S.name Process and answer
Course Introduction
26
DB Engine Disk DB Engine Disk DB Engine Disk R S SELECT * FROM R,S WHERE R.id = S.id R S S SELECT * FROM R,S WHERE R.id = S.id Function shipping
Course Introduction
… but databases deal with all such cases:
the tables involved
27 Course Introduction
is provided over the network merging the memory of each node
programming is much easier (the shared memory abstraction is provided at lower levels)
systems
28
DB Engine Disk NETWORK DB Engine Disk DB Engine Disk Shared memory abstraction
Course Introduction
supercomputer or large scale machines
provided by the hardware and highly specialized networks
large scales without having to change the engine
29 Course Introduction
allows a growing number of processors connected to a large pool of memory in a single machine
machine, connection bottlenecks, need backup, price …)
30 Course Introduction
(typically networks attached storage)
go where the query is
partitioning
31
DB Engine Disk NETWORK DB Engine Disk DB Engine Disk STORAGE SYSTEM
Course Introduction
32 Course Introduction
33
⟕ ⟖
Course Introduction
… is that it can be used to prove equivalence
34 Course Introduction
35
SQL STATEMENT OPERATOR TREE OPTIMIZED OPERATOR TREE COMPILED QUERY INTERPRETER
Course Introduction
for all X in R for all Y in S if X = Y { merge X and Y; return X+Y }
times tuples in S)
tables small (fit in cache)
36
R S
Course Introduction
expensive)
the other
the comparison is just a hash lookup
cache when doing the comparison
37
R S
Course Introduction
range, there is no match
matches exists
comparing ranges where there are no matches
against the index
38 Course Introduction
version of every operator:
39 Course Introduction
40 Course Introduction
Example: transfer CHF 50 from A to B
1. Read balance of A from DB into Variable a: read(A,a); 2. Subtract 50.- CHF from the balance: a:= a – 50; 3. Write new balance back into DB: write(A,a); 4. Read balance of B from DB into Variable b: read(B,b); 5. Add 50,- CHF to balance: b := b + 50; 6. Write new balance back into DB: write(B, b); This set of operations is correct if all of them are completed. In between, the state is inconsistent
=> THIS IS NOT HOW BANKS DO WITHDRAWALS AND DEPOSITS
41 Course Introduction
42
Transaction = ordered sequence of read and writes over the database that either commits or aborts T1 = b1 r1[x] r1[y] w1[x] c1 Basic assumptions:
Consistent state Consistent state
transaction
Course Introduction
43
ATOMICITY: a transaction is executed in its entirety or not at all
Consistent state
transaction CONSISTENCY: a transaction executed in its entirety over a consistent DB produces a consistent DB
Consistent state Consistent state
transaction ISOLATION: a transaction executes as if it were alone in the system
Consistent state Consistent state
T1 T2 DURABILITY: committed changes of a transaction are never lost - can be recovered
Consistent state Consistent state
transaction Persistent storage
Course Introduction
management:
consistency
management of any large data collection (replication, cloud)
Course Introduction 44
these issues
tailored to particular applications and use cases (transactions, analytics, warehouses, etc.)
and affect the architectures
ideas when running in the cloud and data centers
45 Course Introduction