S h a r d i n g w i t h p o s t g r e s _ f d w Postgres Open 2013 - - PowerPoint PPT Presentation

s h a r d i n g w i t h p o s t g r e s f d w
SMART_READER_LITE
LIVE PREVIEW

S h a r d i n g w i t h p o s t g r e s _ f d w Postgres Open 2013 - - PowerPoint PPT Presentation

S h a r d i n g w i t h p o s t g r e s _ f d w Postgres Open 2013 Chicago Stephen Frost sfrost@snowman.net Resonate, Inc. Digital Media PostgreSQL Hadoop techjobs@resonateinsights.com http://www.resonateinsights.com S t


slide-1
SLIDE 1

S h a r d i n g w i t h p o s t g r e s _ f d w

Postgres Open 2013 Chicago

Stephen Frost sfrost@snowman.net

Resonate, Inc. • Digital Media • PostgreSQL • Hadoop • techjobs@resonateinsights.com • http://www.resonateinsights.com

slide-2
SLIDE 2

S t e p h e n F r o s t

  • PostgreSQL
  • Major Contributor, Committer
  • Implemented Roles in 8.3
  • Column-Level Privileges in 8.4
  • Contributions to PL/pgSQL, PostGIS
  • Resonate, Inc.
  • Principal Database Engineer
  • Online Digital Media Company
  • We're Hiring! - techjobs@resonateinsights.com
slide-3
SLIDE 3

D o y o u r e a d . . .

  • planet.postgresql.org
slide-4
SLIDE 4

W h a t i s a n F D W ?

  • First, SQL/MED
  • SQL/ Management of External Data
  • Standard to allow integration with external data
  • Foreign data can be nearly anything:
  • SQL Databases, CSV Files, Text Files,
  • NoSQL Databases, Cacheing systems, etc..
  • Defines the notion of a 'FOREIGN TABLE'
  • Foreign tables are "views" to external data
  • No data is stored in the DB
slide-5
SLIDE 5

W h a t i s a n F D W ? ( p a r t 2 )

  • FDWs are the back-end piece to support SQL/MED
  • PostgreSQL provides a generic FDW API
  • An FDW is a PG EXTENSION implementing the API
  • PG Extensions already exist for:
  • RDMS's: Oracle, MySQL, ODBC, JDBC
  • NoSQL's: CouchDB, Mongo, Redis
  • Files: CSV, Text, even JSON
  • "Other": Twitter, HTTP
  • Our focus will be on (ab)using postgres_fdw
slide-6
SLIDE 6

B a s i c s o f F D W c o n n e c t i o n s

  • Connecting to another actual RDBMS is complicated
  • CREATE FOREIGN SERVER
  • CREATE USER MAPPING
  • CREATE FOREIGN TABLE
  • 'SERVER' provides a name and options to connect
  • 'USER' maps the local user to the remote user
  • 'TABLE' defines:
  • A local TABLE object, with columns, etc
  • A remote TABLE (through a FOREIGN SERVER)
  • Connecting with a file FDW is simpler (no user map)
slide-7
SLIDE 7

U s i n g p o s t g r e s _ f d w

  • CREATE EXTENSION postgres_fdw;

CREATE FOREIGN SERVER shard01 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'shard01', dbname 'mydb', ...)

  • All libpq options accepted except user/pw
  • User/PW is specified in user mappings
  • Cost options (fdw_startup_cost, fdw_tuple_cost)
slide-8
SLIDE 8

C r e a t e i n g U s e r M a p p i n g s

CREATE USER MAPPING FOR myrole SERVER shard01 OPTIONS (user 'myrole', password 'abc123')

  • Only takes user and password options
  • User mappings are tied to servers
  • User must exist on client and server
  • Must use a password for non-superuser maps
slide-9
SLIDE 9

C r e a t i n g F o r e i g n T a b l e s

CREATE FOREIGN TABLE mytable_shard01 ( a int OPTIONS (column_name 'b'), b int OPTIONS (column_name 'a'), ... SERVER shard01 OPTIONS (table_name 'mytable');

  • Can pick remote schema, remote table, and

remote column

  • These don't have to match the local system
  • Very important for sharding
slide-10
SLIDE 10

R e m o t e Q u e r y E x e c u t i o n

  • Each backend manages its own remote connections
  • When a foreign table is queried:
  • PG opens a connection to the remote server
  • Starts a transaction on the remote server
  • A cursor is created for the query
  • WHERE clauses are pushed to remote server
  • Data is pulled through the remote cursor when

rows are requested during query execution

slide-11
SLIDE 11

M o r e o n Q u e r y E x e c u t i o n

  • The remote transaction ends when the local

transaction ends

  • Rolls back or commits based on local transaction
  • Rows inserted are not visible on remote until the

local transaction completes

  • Be careful of 'idle in transaction' connections..
  • Connections are kept after the foreign query
  • Re-used for later requests to the same server
  • No explicit limit on number of connections
  • Each connection uses up memory, of course.
slide-12
SLIDE 12

Q u e r y c o s t i n g w i t h F D W s

  • Approach to costing can be changed
  • Options can be set at server or table level
  • fdw_startup_cost and fdw_tuple_cost
  • use_remote_estimate - false (default)
  • Looks up statistics for the table locally
  • Statistics updated with ANALYZE
  • use_remote_estimate - true
  • Queries the remote server to determine cost info
  • Uses EXPLAIN on remote side
  • ANALYZE your tables!
slide-13
SLIDE 13

S h a r d i n g

  • What is sharding?
  • Horizontal partitioning across servers
  • Break up large tables based on a key/range
  • Replicate small / common data across nodes
  • Why sharding?
  • Allows (more) parallelization of work
  • Scales beyond a single server
  • Challenges
  • Data consistency
  • Difficult to query aganist
slide-14
SLIDE 14

D e a l i n g w i t h 3 2 s h a r d s

  • Why 32?
  • Pre-sharding
  • Only 8 physical servers
  • Four clusters per node
  • Too many to manage manually
  • Script everything
  • Building the clusters
  • User/role creation
  • Table creation, etc, etc..
  • Use a CM System (Puppet, Chef, etc.)
slide-15
SLIDE 15

S h a r d i n g s u g g e s t i o n s

  • Still partition on shards
  • Smaller tables, smaller indexes
  • Use inheiritance and CHECK constraints
  • Foreign tables can use parent tables
  • Break up sequence spaces
  • Define a range for each shard
  • Put constraints to ensure correct sequence used
  • Consider one global sequence approach
slide-16
SLIDE 16

F D W C h a l l e n g e s

  • Not parallelized!
  • Queries against foreign tables are done serially
  • Transactions commit with the head node
  • What is pushed down and what isn't?
  • Conditionals
  • Only built-in data types, operators, functions
  • Joins aren't (yet...)
  • Not able to call remote functions directly
  • Foreign Tables are one-to-one
  • Inserts go to all columns (can't have defaults..)
slide-17
SLIDE 17

P a r a l l e l i z i n g

  • Need an independent "job starting" process
  • cron
  • pgAgent
  • Daemon w/ LISTEN/NOTIFY
  • Use triggers on remote tables to NOTIFY
  • View / Manage jobs through the head node
  • Custom background worker...?
slide-18
SLIDE 18

W o r k i n g t h r o u g h F D W s

  • Use lots of views
  • Script building them
  • UNION ALL is your friend
  • Add constants/conditionals to view's query
  • Use DO-INSTEAD rules for updates
  • Put them on foreign system too for joins, etc
  • Get friendly with triggers
  • Use them to run remote procedures
  • Remember that everything is serial!
  • Bottlenecks, network latency can be a factor
slide-19
SLIDE 19

V i e w E x a m p l e

C R E A T E F O R E I G N T A B L E w o r k f l o w . j o b s _ s h a r d 1 ( w o r k f l o w _ n a m e t e x t , n a m e t e x t , s t a t e t e x t ) S E R V E R s h a r d 1 O P T I O N S ( s c h e m a _ n a m e ' w o r k f l o w ' , t a b l e _ n a m e ' j o b s ' ) ; . . . C R E A T E F O R E I G N T A B L E w o r k f l o w . w o r k f l o w _ s h a r d 1 ( n a m e t e x t , s t a t e t e x t ) S E R V E R s h a r d 1 O P T I O N S ( s c h e m a _ n a m e ' w o r k f l o w ' , t a b l e _ n a m e ' w o r k f l o w ' ) ; . . . C R E A T E V I E W w o r k f l o w . w o r k f l o w A S S E L E C T ' s h a r d 1 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . w o r k f l o w _ s h a r d 1 U N I O N A L L S E L E C T ' s h a r d 2 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . w o r k f l o w _ s h a r d 2 U N I O N A L L S E L E C T ' s h a r d 3 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . w o r k f l o w _ s h a r d 3 . . . C R E A T E V I E W w o r k f l o w . j o b s A S S E L E C T ' s h a r d 1 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . j o b s _ s h a r d 1 U N I O N A L L S E L E C T ' s h a r d 2 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . j o b s _ s h a r d 2 U N I O N A L L S E L E C T ' s h a r d 3 ' : : t e x t A S s h a r d , * F R O M w o r k f l o w _ s h a r d . j o b s _ s h a r d 3 . . .

slide-20
SLIDE 20

W h a t ' s P G d o ?

  • Let's try a join..

E X P L A I N S E L E C T * F R O M w o r k f l o w J O I N j o b s O N ( w o r k f l o w . s h a r d = j o b s . s h a r d a n d w o r k f l o w . n a m e = w o r k f l o w _ n a m e ) ;

Q U E R Y P L A N

  • - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

M e r g e J o i n ( c o s t = 1 3 2 3 5 . 2 9 . . 4 1 5 5 5 . 0 8 r o w s = 1 8 7 8 6 1 0 w i d t h = 2 2 4 ) M e r g e C o n d : ( ( ( ' s h a r d 1 ' : : t e x t ) = ( ' s h a r d 1 ' : : t e x t ) ) A N D ( w o r k f l o w _ s h a r d 1 . n a m e = j o b s _ s h a r d 1 . w o r k f l o w _ n a m e ) )

  • > S o r t ( c o s t = 6 3 6 7 . 7 5 . . 6 4 1 0 . 7 9 r o w s = 1 7 2 1 6 w i d t h = 1 2 8 )

S o r t K e y : ( ' s h a r d 1 ' : : t e x t ) , j o b s _ s h a r d 1 . w o r k f l o w _ n a m e

  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 4 0 3 6 . 4 8 r o w s = 1 7 2 1 6 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n j o b s _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n j o b s _ s h a r d 2 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n j o b s _ s h a r d 3 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n j o b s _ s h a r d 4 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )

. . . . . .

  • > M a t e r i a l i z e ( c o s t = 6 8 6 7 . 5 4 . . 6 9 7 6 . 6 6 r o w s = 2 1 8 2 4 w i d t h = 9 6 )
  • > S o r t ( c o s t = 6 8 6 7 . 5 4 . . 6 9 2 2 . 1 0 r o w s = 2 1 8 2 4 w i d t h = 9 6 )

S o r t K e y : ( ' s h a r d 1 ' : : t e x t ) , w o r k f l o w _ s h a r d 1 . n a m e

  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 4 1 7 4 . 7 2 r o w s = 2 1 8 2 4 w i d t h = 9 6 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d 2 ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d 3 ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )

. . . . . . ( 7 3 r o w s )

slide-21
SLIDE 21

P l a y i n g w i t h v i e w s

  • Looking at one shard..

E X P L A I N S E L E C T * F R O M w o r k f l o w J O I N j o b s O N ( w o r k f l o w . s h a r d = j o b s . s h a r d a n d w o r k f l o w . n a m e = w o r k f l o w _ n a m e ) W H E R E w o r k f l o w . s h a r d = ' s h a r d 1 ' ; Q U E R Y P L A N

  • - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

H a s h J o i n ( c o s t = 2 3 2 . 8 6 . . 3 2 9 . 4 1 r o w s = 1 8 3 5 w i d t h = 2 2 4 ) H a s h C o n d : ( w o r k f l o w _ s h a r d 1 . n a m e = j o b s _ s h a r d 1 . w o r k f l o w _ n a m e )

  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )
  • > H a s h ( c o s t = 1 2 6 . 1 4 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n j o b s _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )

( 7 r o w s )

  • Much better, but means you have to remember...
  • Still works through prepared queries
slide-22
SLIDE 22

V e r b o s e

  • Shows the query to be sent

E X P L A I N ( v e r b o s e ) S E L E C T * F R O M w o r k f l o w J O I N j o b s O N ( w o r k f l o w . s h a r d = j o b s . s h a r d a n d w o r k f l o w . n a m e = w o r k f l o w _ n a m e ) W H E R E w o r k f l o w . s h a r d = ' s h a r d 1 ' ; Q U E R Y P L A N

  • - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

H a s h J o i n ( c o s t = 2 3 2 . 8 6 . . 3 2 9 . 4 1 r o w s = 1 8 3 5 w i d t h = 2 2 4 ) O u t p u t : ( ' s h a r d 1 ' : : t e x t ) , w o r k f l o w _ s h a r d 1 . n a m e , w o r k f l o w _ s h a r d 1 . s t a t e , ( ' s h a r d 1 ' : : t e x t ) , j o b s _ s h a r d 1 . w o r k f l o w _ n a m e , j o b s _ s h a r d 1 . n a m e , j o b s _ s h a r d 1 . s t a t e H a s h C o n d : ( w o r k f l o w _ s h a r d 1 . n a m e = j o b s _ s h a r d 1 . w o r k f l o w _ n a m e )

  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d . w o r k f l o w _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 3 0 . 4 6 r o w s = 6 8 2 w i d t h = 9 6 )

O u t p u t : ' s h a r d 1 ' : : t e x t , w o r k f l o w _ s h a r d 1 . n a m e , w o r k f l o w _ s h a r d 1 . s t a t e R e m o t e S Q L : S E L E C T n a m e , s t a t e F R O M w o r k f l o w . w o r k f l o w

  • > H a s h ( c o s t = 1 2 6 . 1 4 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )

O u t p u t : ( ' s h a r d 1 ' : : t e x t ) , j o b s _ s h a r d 1 . w o r k f l o w _ n a m e , j o b s _ s h a r d 1 . n a m e , j o b s _ s h a r d 1 . s t a t e

  • > A p p e n d ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )
  • > F o r e i g n S c a n o n w o r k f l o w _ s h a r d . j o b s _ s h a r d 1 ( c o s t = 1 0 0 . 0 0 . . 1 2 6 . 1 4 r o w s = 5 3 8 w i d t h = 1 2 8 )

O u t p u t : ' s h a r d 1 ' : : t e x t , j o b s _ s h a r d 1 . w o r k f l o w _ n a m e , j o b s _ s h a r d 1 . n a m e , j o b s _ s h a r d 1 . s t a t e R e m o t e S Q L : S E L E C T w o r k f l o w _ n a m e , n a m e , s t a t e F R O M w o r k f l o w . j o b s ( 1 3 r o w s )

slide-23
SLIDE 23

F i r i n g a r e m o t e p r o c e d u r e

  • Have to set it up as an INSERT trigger
  • Arguments and result end up being columns

O n t h e s h a r d s : C R E A T E T A B L E f i r e _ f u n c ( i d b i g i n t , a i n t , b i n t , r e s u l t i n t ) ; C R E A T E F U N C T I O N a d d _ t w o ( ) R E T U R N S t r i g g e r A S $ _ $ b e g i n n e w . r e s u l t = n e w . a + n e w . b ; r e t u r n n e w ; e n d ; $ _ $ L A N G U A G E p l p g s q l ; C R E A T E T R I G G E R a d d _ t w o _ t r i g B E F O R E I N S E R T O N f i r e _ f u n c F O R E A C H R O W E X E C U T E P R O C E D U R E a d d _ t w o ( ) ; O n t h e h e a d n o d e : C R E A T E F O R E I G N T A B L E f i r e _ f u n c _ s h a r d 1 ( i d b i g i n t , a i n t , b i n t , r e s u l t i n t ) S E R V E R s h a r d 1 O P T I O N S ( s c h e m a _ n a m e ' w o r k f l o w ' , t a b l e _ n a m e ' f i r e _ f u n c ' ) ; C R E A T E F O R E I G N T A B L E f i r e _ f u n c _ s h a r d 2 ( i d b i g i n t , a i n t , b i n t , r e s u l t i n t ) S E R V E R s h a r d 2 O P T I O N S ( s c h e m a _ n a m e ' w o r k f l o w ' , t a b l e _ n a m e ' f i r e _ f u n c ' ) ; . . . = # i n s e r t i n t o f i r e _ f u n c _ s h a r d 2 ( i d , a , b ) v a l u e s ( 1 0 0 , 1 , 2 ) r e t u r n i n g i d , a , b , r e s u l t ; i d | a | b | r e s u l t

  • - - - - + - - - + - - - + - - - - - - - -

1 0 0 | 1 | 2 | 3 ( 1 r o w )

slide-24
SLIDE 24

M a n a g i n g f o r e i g n t a b l e s

  • Scripts, ideally generalized
  • Generating foreign tables
  • Building views
  • Use a schema migration system
  • Roll-your-own
  • External options (Sqitch, etc)
  • Use "foreign schemas"
slide-25
SLIDE 25

I m p r o v e m e n t s f o r F D W s

  • Parallelize work
  • Make Append() send all FDW queries at once
  • Use a round-robin approach to pulling data
  • Buffer results
  • Better user management
  • Credential proxying
  • Automatic user maps
  • Trusted inter-server connections
slide-26
SLIDE 26

M o r e i d l e t h o u g h t s

  • Make UNION ALL views updatable
  • Inheiritance for foreign tables
  • Auto-discover foreign table definition
  • Join push-down
  • Scripting the server/user map/table creation
  • Building views over the foregn tables
  • How views are implemented / run by PG
  • Build system to trigger actions off of a table update
  • Managing workflows, external processes
  • REAL PARTITIONING
slide-27
SLIDE 27

T h a n k y o u !

Stephen Frost sfrost@snowman.net @net_snow