C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K - - PowerPoint PPT Presentation

c olumn d atabases
SMART_READER_LITE
LIVE PREVIEW

C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K - - PowerPoint PPT Presentation

C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K AT O S O UTLINE RDBMS SQL Row Store Column Store C-Store Vertica MonetDB Hardware Optimizations F ACULTY M EMBER V ERSION E XPERIMENT Question :


slide-1
SLIDE 1

A N D R E W C R O T T Y & A L E X G A L A K AT O S

COLUMN DATABASES

slide-2
SLIDE 2

OUTLINE

  • RDBMS
  • SQL
  • Row Store
  • Column Store
  • C-Store
  • Vertica
  • MonetDB
  • Hardware Optimizations
slide-3
SLIDE 3

FACULTY MEMBER VERSION

slide-4
SLIDE 4

EXPERIMENT

  • Question: How does time spent as a faculty

member affect one's attention span?

slide-5
SLIDE 5

HYPOTHESIS

< 1 ≈ 10 > 100 Attention Span Time as Faculty (years)

Attention Span vs Time as Faculty

slide-6
SLIDE 6

WHAT IS AN RDBMS?

  • Relational database management system
  • Standard for information storage/retrieval
  • Based on Codd's relational model
  • Data structured as relations
  • Manipulated using relational operators
  • Structure specified by a schema
slide-7
SLIDE 7

EXAMPLE

id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

slide-8
SLIDE 8

WHAT IS SQL?

  • Structured Query Language
  • Used to operate on relations
  • Four main operations:
  • select
  • insert
  • update
  • delete
slide-9
SLIDE 9

SELECT STATEMENT

select <column(s)> from <table> where <column> = <val>

slide-10
SLIDE 10

EXAMPLE

id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

slide-11
SLIDE 11

ROW STORE

id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

slide-12
SLIDE 12

ROW STORE

  • What is the name, department, and salary of the

employee with id 1757?

select name, dept, salary from employee where id = 1757

slide-13
SLIDE 13

ROW STORE

id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k

select name, dept, salary from employee where id = 1757

slide-14
SLIDE 14

ROW STORE

  • What is the average salary of all employees?

select avg(salary) from employee

slide-15
SLIDE 15

ROW STORE

id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k

select avg(salary) from employee

slide-16
SLIDE 16

ROW STORE

id name dept salary 1756 Scott Physics 50k id name dept salary 1757 Bob Math 60k id name dept salary 1758 John CS 80k

select avg(salary) from employee

slide-17
SLIDE 17

COLUMN STORE

id 1756 1757 1758 dept Physics Math CS salary 50k 60k 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k name Scott Bob John

slide-18
SLIDE 18

COLUMN STORE

  • What is the average salary of all employees?

select avg(salary) from employee

slide-19
SLIDE 19

COLUMN STORE

id 1756 1757 1758 dept Physics Math CS salary 50k 60k 80k name Scott Bob John

select avg(salary) from employee

slide-20
SLIDE 20

HISTORY

  • TAXIR
  • 1969
  • Biology information retrieval
  • RAPID
  • 1976
  • Statistics Canada
  • Canadian Census of Population and Housing
  • Sybase IQ
  • Early 1990s
  • Sybase (later SAP)
  • Only commercial column DB for many years
slide-21
SLIDE 21

C-STORE

  • Project from Brown, Brandeis, MIT, and UMass Boston
  • Read-optimized
  • Contributions:
  • Hybrid architecture
  • Novel data structures
  • Advanced compression

2005: Winner "Best Logo" Award

slide-22
SLIDE 22

ARCHITECTURE

2005: Winner "Most Intricate Architecture" Award

slide-23
SLIDE 23

TUPLE MOVER

  • What does it do???
slide-24
SLIDE 24

TUPLE MOVER

  • What does it do???
  • Moves tuples

2005: Winner "Most Creative Name" Award

slide-25
SLIDE 25

PROJECTIONS

name dept John CS Bob Math Scott Physics name salary Scott 50k Bob 60k John 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

slide-26
SLIDE 26

JOIN INDEXES

name dept John CS Bob Math Scott Physics name salary Scott 50k Bob 60k John 80k id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k id 1756 1757 1758

slide-27
SLIDE 27

COMPRESSION

  • Similar data types
  • Less information = less I/O
  • Operate on compressed data
  • Compression strategies:
  • Run length encoding (v,f,n)
  • Delta encoding
  • Bitmap encoding (v,b)
  • Distinct value encoding
slide-28
SLIDE 28

RUN LENGTH ENCODING

Original

gender

female male male male male female female female

gender

(f, 0, 1) (m, 1, 4) (f, 5, 3) Result

slide-29
SLIDE 29

DELTA ENCODING

salary

70,000 70,500 70,900 71,250 75,000 79,000 81,500 82,000

salary

70,000 500 400 350 4,250 3,500 2,500 500 Original Result

slide-30
SLIDE 30

BITMAP ENCODING

dept

CS Math Math English CS CS Biology Math

dept bitmap

CS 10001100 Math 01100001 English 00010000 Biology 00000010 Original Result

slide-31
SLIDE 31

DISTINCT VALUE ENCODING

gender age

female 5 male 4 male 4 male 8 female 4 female 5

gender age

00 1 01 1 01 1 10 01 00

gender/age

00 01 01 10 11 00 Stage 2 Stage 1 Original

slide-32
SLIDE 32

LATE MATERIALIZATION

  • Tuple reconstruction is:
  • Costly
  • Often unnecessary
  • Solution:
  • Apply predicates early per column
  • Reconstruct tuples as late as possible
  • Bring in only relevant columns
  • Also improves cache performance
slide-33
SLIDE 33

VERTICA

  • Commercialization of C-Store
  • BI and analytics market
  • Acquired by HP in 2011
  • Differences from C-Store
  • Automatic layout tool
  • No join indexes
  • Instead requires super projection
slide-34
SLIDE 34

MONETDB

  • Developed at CWI (Netherlands)
  • High performance on complex queries
  • Column store architecture
  • Uses demand paging
  • Exploits CPU caches
  • Automatic/self-tuning indexes
  • Open source!
slide-35
SLIDE 35

HARDWARE OPTIMIZATIONS

  • Ocelot
  • Hardware-oblivious abstraction
  • Built on top of MonetDB
  • Easily and efficiently support operations on:
  • CPU
  • GPU
  • FPGA
  • Take advantage of optimal memory access

patterns

  • CPU: prefetching, cache awareness
  • GPU: coalesce memory accesses
slide-36
SLIDE 36

HARDWARE OPTIMIZATIONS

slide-37
SLIDE 37

SUMMARY

  • RDBMS
  • SQL
  • Row Store
  • Column Store
  • C-Store
  • Vertica
  • MonetDB
  • Hardware Optimizations

Questions?