P o s t g r e S Q L a s a C o l u m n a r S t o r e DCPUG May - - PowerPoint PPT Presentation

p o s t g r e s q l a s a c o l u m n a r s t o r e
SMART_READER_LITE
LIVE PREVIEW

P o s t g r e S Q L a s a C o l u m n a r S t o r e DCPUG May - - PowerPoint PPT Presentation

P o s t g r e S Q L a s a C o l u m n a r S t o r e DCPUG May 2014 Reston, VA Stephen Frost sfrost@snowman.net Resonate, Inc. Digital Media PostgreSQL Hadoop techjobs@resonateinsights.com http://www.resonateinsights.com


slide-1
SLIDE 1

P o s t g r e S Q L a s a C o l u m n a r S t o r e

DCPUG May 2014 Reston, VA

Stephen Frost sfrost@snowman.net

Resonate, Inc. • Digital Media • PostgreSQL • Hadoop • techjobs@resonateinsights.com • http://www.resonateinsights.com

slide-2
SLIDE 2

S t e p h e n F r o s t

  • PostgreSQL
  • Major Contributor, Committer
  • Implemented Roles in 8.3
  • Column-Level Privileges in 8.4
  • Contributions to PL/pgSQL, PostGIS
  • Resonate, Inc.
  • Principal Database Engineer
  • Online Digital Media Company
  • We're Hiring! - techjobs@resonateinsights.com
slide-3
SLIDE 3

D o y o u r e a d . . .

  • planet.postgresql.org
slide-4
SLIDE 4

W h y C o l u m n a r ?

  • Reduced overhead
  • High compressability
  • Min/Max indexes
slide-5
SLIDE 5

P G C o l u m n a r C h a l l e n g e s

  • Large per-row overhead
  • Page header
  • Line pointers
  • 24 bytes per row of visibility information
  • 226 rows/page with single 4-byte integer
  • No Min/Max indexes
  • No compression
slide-6
SLIDE 6

P a g e H e a d e r s p e c i f i c s

  • pd_lsn - Identifies xlog record for last change
  • pd_checksum - Page checksum, if set
  • pd_flags - Set of flags
  • pd_lower - Offset to start of free space
  • pd_upper - Offset to end of free space
  • pd_special - Offset to special area
  • pd_pagesize_version - Size (bytes) & layout version
  • pd_prune_xid - Oldest prunable XID in tuples
slide-7
SLIDE 7

P a g e H e a d e r S i z e s

  • pd_lsn - 8 bytes
  • pd_checksum - 2 bytes
  • pd_flags - 2 bytes
  • pd_lower - 2 bytes
  • pd_upper - 2 bytes
  • pd_special - 2 bytes
  • pd_pagesize_version - 2 bytes
  • pd_prunce_xid - 4 bytes
  • Total: 24 bytes
slide-8
SLIDE 8

L i n e p o i n t e r s

  • One pointer per tuple
  • 4 bytes each
  • 15 bits for offset to start of tuple
  • 2 bits for flags (tuple state)
  • 15 bits for length of tuple (in bytes)
slide-9
SLIDE 9

T u p l e H e a d e r

  • xmin - inserting XID - 4 bytes
  • xmax - deleteing XID - 4 bytes
  • ctid - command ID - 4 bytes
  • tid - current TID - 4 bytes
  • infomask - 4 bytes
  • header size / offset - 1 byte
  • variable length NULL bitmap - 1+ bytes
  • Total: 24 bytes
slide-10
SLIDE 10

T u p l e s

  • Data: Single integer - 4 bytes
  • Padding: 4 bytes (would get bigint for free...)
  • Tuple length total: 32 bytes
  • Stored from the end of the page
  • First tuple at 8160 (8192-32)
  • Last tuple at 960 (960 - 991)
slide-11
SLIDE 11

O v e r a l l P a g e

  • Page header
  • 0 - 24
  • Line pointers
  • 24 - 927 (226x 4 bytes each)
  • Tuples
  • 960 - 8192 (226x 32 bytes each)
  • Free space
  • 928 - 960 (32 bytes)
  • Can't fit another line # + tuple
  • Density: 11% (904 bytes of data in 8192 page)
slide-12
SLIDE 12

W h y P G ?

  • ACID hotness
  • Checksums (tho we don't use them, yet)
  • Already use it for many other things
  • Great platform for development
  • Can "hide" the columnar reality
slide-13
SLIDE 13

A r r a y s i n P G

  • An array adds another header to the mix
  • After the header, tightly packed integers
  • Arrays can be compressed via TOAST
  • Can store mixed arrays / non-arrays in a tuple
slide-14
SLIDE 14

A r r a y H e a d e r

  • vl_len - Array length (req'd for var-length types)
  • ndims - Number of dimensions
  • dataoffset - Offset to array data (0 if no NULLs)
  • elemtype - Oid for array element (eg: integer)
  • (per dimension)
  • ARR_DIMS - Number of entries in the dimension
  • ARR_LBOUND - Lower bound of the dimension
slide-15
SLIDE 15

A r r a y H e a d e r S i z e s

  • vl_len - 4 bytes
  • ndims - 4 bytes
  • dataoffset - 4 bytes
  • elemtype - 4 bytes
  • ARR_DIMS - 4 bytes
  • ARR_LBOUND - 4 bytes
  • Total: 24
  • Tuple + Array headers: 24 + 24 = 48 bytes
slide-16
SLIDE 16

A r r a y T u p l e s

  • Data: Single integer - 4 bytes
  • Padding: 4 bytes (would get bigint for free...)
  • Tuple length total: 56 bytes
  • Stored from the end of the page
  • First tuple at 8136 (8192-56)
  • Last tuple at 576 (576 - 631)
slide-17
SLIDE 17

O v e r a l l P a g e

  • Page header
  • 0 - 24
  • Line pointers
  • 24 - 567 (136x 4 bytes each)
  • Tuples
  • 576 - 8192 (136x 56 bytes each)
  • Free space
  • 568 - 576 (8 bytes)
  • Can't fit another line # + tuple
  • Density: 7% (544 bytes of data in 8192 page)
slide-18
SLIDE 18

W h y A r r a y s ?

  • After array header, the values are tightly packed
  • Consider arrays of 200 integers
  • Big array header, but
  • Much less overhead overall
  • Far fewer line pointers
  • Am I for real? Let's check it out-
slide-19
SLIDE 19

A r r a y T u p l e s x 2 0 0

  • Data: 200 integers - 800 bytes
  • Padding: NONE
  • Tuple length total: 848 bytes
  • Stored from the end of the page
  • First tuple at 7344 (8192-848)
  • Last tuple at 560 (560 - 1407)
slide-20
SLIDE 20

O v e r a l l P a g e

  • Page header
  • 0 - 24
  • Line pointers
  • 24 - 59 (9x 4 bytes each)
  • Tuples
  • 560 - 8192 (9x 848 bytes each)
  • Free space
  • 60 - 560 (500 bytes)
  • Could fit another array if we tried..
  • Density: 88% (7200 bytes of data in 8192 page)
slide-21
SLIDE 21

A r r a y T u p l e s x 2 1 2

  • Squeeze it out...
  • Data: 212 integers - 848 bytes
  • Padding: NONE
  • Tuple length total: 896 bytes
  • Stored from the end of the page
  • First tuple at 7296 (8192-896)
  • Last tuple at 128 (128 - 1023)
slide-22
SLIDE 22

O v e r a l l P a g e

  • Page header
  • 0 - 24
  • Line pointers
  • 24 - 59 (9x 4 bytes each)
  • Tuples
  • 128 - 8192 (9x 896 bytes each)
  • Free space
  • 60 - 128 (68 bytes) - Lots of room...
  • Not that much lost
  • Density: 93% (7632 bytes of data in 8192 page)
slide-23
SLIDE 23

C a v e a t s

  • Have to use unnest()
  • PG may have more difficulty planning
  • Have to unnest() an entire array to extract data
  • No visibility info on each value
  • Stored in individual tables, can be awkward
  • Requires mapping tables
slide-24
SLIDE 24

C o l u m n a r A d v a n t a g e s ?

  • Compression via TOAST
  • Will try to compress arrays > 2k
  • May be unable to
  • Requires more CPU to decompress
  • Min/Max
  • Add 'min_value' and 'max_value columns
  • Use regular btree indexes
  • Include in queries
slide-25
SLIDE 25

Q u e s t i o n s ?

slide-26
SLIDE 26

T h a n k y o u !

Stephen Frost sfrost@snowman.net @net_snow