c olumn d atabases
play

C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K - PowerPoint PPT Presentation

C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K AT O S O UTLINE RDBMS SQL Row Store Column Store C-Store Vertica MonetDB Hardware Optimizations F ACULTY M EMBER V ERSION E XPERIMENT Question :


  1. C OLUMN D ATABASES A N D R E W C R O T T Y & A L E X G A L A K AT O S

  2. O UTLINE • RDBMS • SQL • Row Store • Column Store • C-Store • Vertica • MonetDB • Hardware Optimizations

  3. F ACULTY M EMBER V ERSION

  4. E XPERIMENT • Question : How does time spent as a faculty member affect one's attention span ?

  5. H YPOTHESIS Attention Span vs Time as Faculty Attention Span 0 < 1 ≈ 10 > 100 Time as Faculty (years)

  6. W HAT IS AN RDBMS? • Relational database management system • Standard for information storage/retrieval • Based on Codd's relational model • Data structured as relations • Manipulated using relational operators • Structure specified by a schema

  7. E XAMPLE id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

  8. W HAT IS SQL? • Structured Query Language • Used to operate on relations • Four main operations: • select • insert • update • delete

  9. S ELECT S TATEMENT select <column(s)> from <table> where <column> = <val>

  10. E XAMPLE id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1758 John CS 80k

  11. R OW S TORE id name dept salary 1756 Scott Physics 50k id name dept salary id name dept salary 1756 Scott Physics 50k 1757 Bob Math 60k 1757 Bob Math 60k 1758 John CS 80k id name dept salary 1758 John CS 80k

  12. R OW S TORE • What is the name, department, and salary of the employee with id 1757 ? select name, dept, salary from employee where id = 1757

  13. R OW S TORE id name dept salary 1756 Scott Physics 50k select name, dept, salary id name dept salary from employee 1757 Bob Math 60k where id = 1757 id name dept salary 1758 John CS 80k

  14. R OW S TORE • What is the average salary of all employees ? select avg(salary) from employee

  15. R OW S TORE id name dept salary 1756 Scott Physics 50k id name dept salary select avg(salary) 1757 Bob Math 60k from employee id name dept salary 1758 John CS 80k

  16. R OW S TORE id name dept salary 1756 Scott Physics 50k id name dept salary select avg(salary) 1757 Bob Math 60k from employee id name dept salary 1758 John CS 80k

  17. C OLUMN S TORE id name 1756 Scott id name dept salary 1757 Bob 1758 John 1756 Scott Physics 50k 1757 Bob Math 60k dept salary 1758 John CS 80k Physics 50k Math 60k CS 80k

  18. C OLUMN S TORE • What is the average salary of all employees ? select avg(salary) from employee

  19. C OLUMN S TORE id name 1756 Scott 1757 Bob 1758 John select avg(salary) from employee dept salary Physics 50k Math 60k CS 80k

  20. H ISTORY • TAXIR • 1969 • Biology information retrieval • RAPID • 1976 • Statistics Canada • Canadian Census of Population and Housing • Sybase IQ • Early 1990s • Sybase (later SAP) • Only commercial column DB for many years

  21. C-S TORE • Project from Brown, Brandeis, MIT, and UMass Boston • Read-optimized • Contributions: • Hybrid architecture • Novel data structures • Advanced compression 2005 : Winner "Best Logo" Award

  22. A RCHITECTURE 2005 : Winner "Most Intricate Architecture" Award

  23. T UPLE M OVER • What does it do ???

  24. T UPLE M OVER • What does it do ??? • Moves tuples 2005 : Winner "Most Creative Name" Award

  25. P ROJECTIONS name dept John CS id name dept salary Bob Math Scott Physics 1756 Scott Physics 50k 1757 Bob Math 60k name salary 1758 John CS 80k Scott 50k Bob 60k John 80k

  26. J OIN I NDEXES name dept John CS Bob Math Scott Physics id name dept salary id 1756 Scott Physics 50k 1756 1757 Bob Math 60k 1757 1758 John CS 80k 1758 name salary Scott 50k Bob 60k John 80k

  27. C OMPRESSION • Similar data types • Less information = less I/O • Operate on compressed data • Compression strategies: • Run length encoding (v,f,n) • Delta encoding • Bitmap encoding (v,b) • Distinct value encoding

  28. R UN L ENGTH E NCODING Original gender female Result male gender male (f, 0, 1) male (m, 1, 4) male (f, 5, 3) female female female

  29. D ELTA E NCODING Result Original salary salary 70,000 70,000 70,500 500 70,900 400 71,250 350 75,000 4,250 79,000 3,500 81,500 2,500 82,000 500

  30. B ITMAP E NCODING Original dept Result CS dept bitmap Math CS 10001100 Math Math 01100001 English English 00010000 CS Biology 00000010 CS Biology Math

  31. D ISTINCT V ALUE E NCODING Stage 1 Stage 2 Original gender age gender age gender/age female 5 0 00 00 male 4 1 01 01 male 4 1 01 01 male 8 1 10 10 female 4 0 01 11 female 5 0 00 00

  32. L ATE M ATERIALIZATION • Tuple reconstruction is: • Costly • Often unnecessary • Solution : • Apply predicates early per column • Reconstruct tuples as late as possible • Bring in only relevant columns • Also improves cache performance

  33. V ERTICA • Commercialization of C-Store • BI and analytics market • Acquired by HP in 2011 • Differences from C-Store • Automatic layout tool • No join indexes • Instead requires super projection

  34. M ONET DB • Developed at CWI (Netherlands) • High performance on complex queries • Column store architecture • Uses demand paging • Exploits CPU caches • Automatic/self-tuning indexes • Open source!

  35. H ARDWARE O PTIMIZATIONS • Ocelot • Hardware-oblivious abstraction • Built on top of MonetDB • Easily and efficiently support operations on: • CPU • GPU • FPGA • Take advantage of optimal memory access patterns • CPU: prefetching, cache awareness • GPU: coalesce memory accesses

  36. H ARDWARE O PTIMIZATIONS

  37. S UMMARY • RDBMS • SQL • Row Store • Column Store • C-Store • Vertica • MonetDB • Hardware Optimizations Questions ?

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend