A N D R E W C R O T T Y & A L E X G A L A K AT O S
C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K - - PowerPoint PPT Presentation
C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K - - PowerPoint PPT Presentation
C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K AT O S O UTLINE RDBMS SQL Row Store Column Store C-Store Vertica MonetDB Hardware Optimizations F ACULTY M EMBER V ERSION E XPERIMENT Question :
OUTLINE
- RDBMS
- SQL
- Row Store
- Column Store
- C-Store
- Vertica
- MonetDB
- Hardware Optimizations
FACULTY MEMBER VERSION
EXPERIMENT
- Question: How does time spent as a faculty
member affect one's attention span?
HYPOTHESIS
< 1 ≈ 10 > 100 Attention Span Time as Faculty (years)
Attention Span vs Time as Faculty
WHAT IS AN RDBMS?
- Relational database management system
- Standard for information storage/retrieval
- Based on Codd's relational model
- Data structured as relations
- Manipulated using relational operators
- Structure specified by a schema
EXAMPLE
id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k
WHAT IS SQL?
- Structured Query Language
- Used to operate on relations
- Four main operations:
- select
- insert
- update
- delete
SELECT STATEMENT
select <column(s)> from <table> where <column> = <val>
EXAMPLE
id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k
ROW STORE
id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k
ROW STORE
- What is the name, department, and salary of the
employee with id 1757?
select name, dept, salary from employee where id = 1757
ROW STORE
id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k
select name, dept, salary from employee where id = 1757
ROW STORE
- What is the average salary of all employees?
select avg(salary) from employee
ROW STORE
id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k
select avg(salary) from employee
ROW STORE
id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k
select avg(salary) from employee
COLUMN STORE
id 1756 1757 1758 dept Physics Math CS salary 50k 60k 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k name Scott Bob John
COLUMN STORE
- What is the average salary of all employees?
select avg(salary) from employee
COLUMN STORE
id 1756 1757 1758 dept Physics Math CS salary 50k 60k 80k name Scott Bob John
select avg(salary) from employee
HISTORY
- TAXIR
- 1969
- Biology information retrieval
- RAPID
- 1976
- Statistics Canada
- Canadian Census of Population and Housing
- Sybase IQ
- Early 1990s
- Sybase (later SAP)
- Only commercial column DB for many years
C-STORE
- Project from Brown, Brandeis, MIT, and UMass Boston
- Read-optimized
- Contributions:
- Hybrid architecture
- Novel data structures
- Advanced compression
2005: Winner "Best Logo" Award
ARCHITECTURE
2005: Winner "Most Intricate Architecture" Award
TUPLE MOVER
- What does it do???
TUPLE MOVER
- What does it do???
- Moves tuples
2005: Winner "Most Creative Name" Award
PROJECTIONS
name dept John CS Bob Math Scott Physics name salary Scott 50k Bob 60k John 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k
JOIN INDEXES
name dept John CS Bob Math Scott Physics name salary Scott 50k Bob 60k John 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k id 1756 1757 1758
COMPRESSION
- Similar data types
- Less information = less I/O
- Operate on compressed data
- Compression strategies:
- Run length encoding (v,f,n)
- Delta encoding
- Bitmap encoding (v,b)
- Distinct value encoding
RUN LENGTH ENCODING
Original
gender
female male male male male female female female
gender
(f, 0, 1) (m, 1, 4) (f, 5, 3) Result
DELTA ENCODING
salary
70,000 70,500 70,900 71,250 75,000 79,000 81,500 82,000
salary
70,000 500 400 350 4,250 3,500 2,500 500 Original Result
BITMAP ENCODING
dept
CS Math Math English CS CS Biology Math
dept bitmap
CS 10001100 Math 01100001 English 00010000 Biology 00000010 Original Result
DISTINCT VALUE ENCODING
gender age
female 5 male 4 male 4 male 8 female 4 female 5
gender age
00 1 01 1 01 1 10 01 00
gender/age
00 01 01 10 11 00 Stage 2 Stage 1 Original
LATE MATERIALIZATION
- Tuple reconstruction is:
- Costly
- Often unnecessary
- Solution:
- Apply predicates early per column
- Reconstruct tuples as late as possible
- Bring in only relevant columns
- Also improves cache performance
VERTICA
- Commercialization of C-Store
- BI and analytics market
- Acquired by HP in 2011
- Differences from C-Store
- Automatic layout tool
- No join indexes
- Instead requires super projection
MONETDB
- Developed at CWI (Netherlands)
- High performance on complex queries
- Column store architecture
- Uses demand paging
- Exploits CPU caches
- Automatic/self-tuning indexes
- Open source!
HARDWARE OPTIMIZATIONS
- Ocelot
- Hardware-oblivious abstraction
- Built on top of MonetDB
- Easily and efficiently support operations on:
- CPU
- GPU
- FPGA
- Take advantage of optimal memory access
patterns
- CPU: prefetching, cache awareness
- GPU: coalesce memory accesses
HARDWARE OPTIMIZATIONS
SUMMARY
- RDBMS
- SQL
- Row Store
- Column Store
- C-Store
- Vertica
- MonetDB
- Hardware Optimizations