Distributed Computing on PostgreSQL, Marco Slot (PowerPoint PPT presentation)



SLIDE 1

Distributed Computing on PostgreSQL

Marco Slot <marco@citusdata.com>

SLIDE 2

Small data architecture

SLIDE 3

Big data architecture

SLIDE 4

Big data architecture using postgres

[Diagram: records, messaging, data warehouse, real-time analytics]

SLIDE 5

PostgreSQL is a perfect building block for distributed systems

SLIDE 6

Features!

PostgreSQL contains many useful features for building a distributed system:

  • Well-defined protocol, libpq
  • Crash safety
  • Concurrent execution
  • Transactions
  • Access controls
  • Two-phase commit (2PC)
  • Replication
  • Custom functions
SLIDE 7

Extensions!

Built-in / contrib:

  • postgres_fdw
  • dblink

RPC!

  • plpgsql

Third-party open source:

  • pglogical
  • pg_cron
  • citus
SLIDE 8

Extensions!

Built-in / contrib:

  • postgres_fdw
  • dblink

RPC!

  • plpgsql

Third-party open source:

  • pglogical
  • pg_cron
  • citus

Yours!

SLIDE 9

dblink

Run queries on remote postgres server

SELECT dblink_connect(node_id,
         format('host=%s port=%s dbname=postgres', node_name, node_port))
FROM nodes;

SELECT dblink_send_query(node_id, $$SELECT pg_database_size('postgres')$$)
FROM nodes;

SELECT sum(size::bigint)
FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);

SELECT dblink_disconnect(node_id) FROM nodes;

SLIDE 10

RPC using dblink

For every postgres function, we can create a client-side stub using dblink.

CREATE FUNCTION func(input text) ...

CREATE FUNCTION remote_func(host text, port int, input text)
RETURNS text LANGUAGE sql AS $function$
  SELECT output
  FROM dblink(format('host=%s port=%s', host, port),
              format('SELECT * FROM func(%L)', input)) AS res(output text);
$function$;

SLIDE 11

PL/pgSQL

Procedural language for Postgres:

CREATE FUNCTION distributed_database_size(dbname text)
RETURNS bigint LANGUAGE plpgsql AS $function$
DECLARE
  total_size bigint;
BEGIN
  PERFORM dblink_send_query(node_id,
            format('SELECT pg_database_size(%L)', dbname))
  FROM nodes;

  SELECT sum(size::bigint) INTO total_size
  FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);

  RETURN total_size;
END;
$function$;

SLIDE 12

Distributed system in progress...

With these extensions, we can already create a simple distributed computing system.

[Diagram: parallel operation using dblink: SELECT transform_data() fans out across nodes holding Data 1, Data 2, and Data 3. postgres_fdw?]

SLIDE 13

pglogical / logical replication

Asynchronously replicate changes to another database.


SLIDE 14

pg_paxos

Consistently replicate changes between databases.


SLIDE 15

pg_cron

Cron-based job scheduler for postgres:

CREATE EXTENSION pg_cron;
SELECT cron.schedule('*/10 * * * *', 'SELECT transform_data()');

Internally uses libpq, meaning it can also schedule jobs on other nodes.

pg_cron provides a way for nodes to act autonomously.

SLIDE 16

Citus

Transparently shards tables across multiple nodes

Coordinator

[Diagram: the events table split into shards (E1, E2, E4, E5) placed across nodes]

SELECT create_distributed_table('events', 'event_id');

SLIDE 17

Citus MX

Nodes can have the distributed tables too

Coordinator

[Diagram: every node has the events table along with shards E1, E2, E4, E5]

SLIDE 18

How to build a distributed system using only PostgreSQL & extensions?

SLIDE 19

Building a streaming publish-subscribe system

[Diagram: producers publish to the topic 'adclick' on postgres nodes; consumers subscribe]

SLIDE 20

Storage nodes

Use Citus to create a distributed table.

[Diagram: CREATE TABLE on the coordinator produces an events table sharded across the storage nodes]

SLIDE 21

Distributed Table Creation

$ psql -h coordinator

CREATE TABLE events (
  event_id bigserial,
  ingest_time timestamptz default now(),
  topic_name text not null,
  payload jsonb
);
SELECT create_distributed_table('events', 'event_id');

$ psql -h any-node

INSERT INTO events (topic_name, payload) VALUES ('adclick', '{...}');

SLIDE 22

Sharding strategy

The shard is chosen by hashing the value in the partition column.

Application-defined:

  • stream_id text not null

Optimise data distribution:

  • event_id bigserial

Optimise ingest capacity and availability:

  • sid int default pick_local_value()
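A minimal Python sketch of hash-based shard choice, as a hypothetical model: Citus internally uses PostgreSQL's own hash functions and assigns each shard a range of hash values, but the effect is the same, so equal partition-column values always land on the same shard.

```python
import hashlib

SHARD_COUNT = 32  # assumption: a fixed number of shards chosen at table creation


def pick_shard(partition_value: str, shard_count: int = SHARD_COUNT) -> int:
    # Hash the partition-column value and map it onto one of the shards.
    # Any stable hash works for the sketch; Citus uses its own internally.
    digest = hashlib.sha256(str(partition_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Equal values always map to the same shard, so all events of one stream
# stay together; bigserial values spread roughly evenly across shards.
assert pick_shard("stream-42") == pick_shard("stream-42")
```

This also shows the trade-off on the slide: an application-defined stream_id groups related events on one shard, while a bigserial event_id spreads load evenly.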
SLIDE 23

Producers connect to a random node and perform COPY or INSERT into events

Producers

[Diagram: producers COPY / INSERT into the events table on any node]

SLIDE 24

Consumers in a group together consume events at least / exactly once.

Consumers

[Diagram: a consumer group reading shards matching topic: adclick%]

SLIDE 25

Consumers obtain leases for consuming a shard. Leases are kept in a separate table on each node:

CREATE TABLE leases (
  consumer_group text not null,
  shard_id bigint not null,
  owner text,
  new_owner text,
  last_heartbeat timestamptz,
  PRIMARY KEY (consumer_group, shard_id)
);

Consumer leases

SLIDE 26

Consumers obtain leases for consuming a shard.

SELECT * FROM claim_lease('click-analytics', 'node-2', 102008);

Under the covers: Insert a new lease or set new_owner to steal lease.

CREATE FUNCTION claim_lease(group_name text, node_name text, shard_id int) …

INSERT INTO leases (consumer_group, shard_id, owner, last_heartbeat)
VALUES (group_name, shard_id, node_name, now())
ON CONFLICT (consumer_group, shard_id)
DO UPDATE SET new_owner = node_name
WHERE leases.new_owner IS NULL;

Consumer leases

SLIDE 27

Distributing leases across consumers

Distributed algorithm for distributing leases across nodes

SELECT * FROM obtain_leases('click-analytics', 'node-2');

  • gets all available lease tables
  • claims all unclaimed shards
  • claims random shards until #claims >= #shards/#consumers

Not perfect, but ensures all shards are consumed with load balancing (unless there are more consumers than shards).

SLIDE 28

Consumers

First consumer consumes all: it calls obtain_leases and claims every shard.

[Diagram: one consumer, leases tables on every node]

SLIDE 29

Consumers

First consumer consumes all.

[Diagram: the single consumer reading every shard via its leases]

SLIDE 30

Consumers

Second consumer steals leases from first consumer by calling obtain_leases.

[Diagram: leases tables on every node, some now claimed by the second consumer]

SLIDE 31

Consumers

[Diagram: shards now split between the two consumers]

Second consumer steals leases from first consumer

SLIDE 32

Consuming events

Consumer wants to receive all events once. Several options:

  • SQL level
  • Logical decoding utility functions
  • Use a replication connection
  • PG10 logical replication / pglogical
SLIDE 33

Consuming events

Get a batch of events from a shard:

SELECT * FROM poll_events('click-analytics', 'node-2', 102008, 'adclick',
                          '<last-processed-event-id>');

  • Check if the node has the lease
    (set owner = new_owner if new_owner is set)
  • Get all pending events (pg_logical_slot_peek_changes)
  • Progress the replication slot past the last processed event (pg_logical_slot_get_changes)
  • Return the remaining events if still owner
SLIDE 34

Consumer loop

[Diagram: consumer calling poll_events on each of its leased shards]

1. Call poll_events for each leased shard
2. Process the events from each batch
3. Repeat with the event IDs of the last event in each batch
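The three steps can be sketched as a plain Python loop. Here poll_events is any callable with the signature poll_events(shard_id, last_event_id) -> batch, standing in for the SQL function of the same name (an assumption made for the sketch).

```python
def consumer_loop(poll_events, leased_shards, process):
    # Track the last event ID seen per shard; poll_events only returns
    # events after that ID, so confirmed events are not re-delivered.
    last_ids = {shard: 0 for shard in leased_shards}
    while True:
        got_events = False
        for shard in leased_shards:
            # 1. Call poll_events for each leased shard.
            batch = poll_events(shard, last_ids[shard])
            for event_id, payload in batch:
                # 2. Process the events in the batch.
                process(event_id, payload)
                # 3. Remember the last event ID for the next poll.
                last_ids[shard] = event_id
                got_events = True
        if not got_events:
            # Sketch only: stop once caught up. A real consumer would
            # sleep, keep polling, and periodically refresh its leases.
            return last_ids
```

If the consumer crashes before passing the last event ID back, the batch is delivered again on the next poll, which is the at-least-once behaviour described on the failure-handling slide.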

SLIDE 35

Failure handling

Producer / consumer fails to connect to a storage node:
  → Connect to a different node

Storage node fails:
  → Use pick_local_value() for the partition column, fail over to a hot standby

Consumer fails to consume a batch:
  → Events are repeated until confirmed

Consumer fails and does not come back:
  → Consumers periodically call obtain_leases
  → Old leases expire

SLIDE 36

Maintenance: Lease expiration

Use pg_cron to periodically expire leases on the coordinator:

SELECT cron.schedule('* * * * *', 'SELECT expire_leases()');

CREATE FUNCTION expire_leases() ...
UPDATE leases
SET owner = new_owner, last_heartbeat = now()
WHERE last_heartbeat < now() - interval '2 minutes';

SLIDE 37

Maintenance: Delete old events

Use pg_cron to periodically delete old events:

$ psql -h coordinator

SELECT cron.schedule('* * * * *', 'SELECT expire_events()');

CREATE FUNCTION expire_events() ...
DELETE FROM events WHERE ingest_time < now() - interval '1 day';

SLIDE 38

Prototyped a functional, highly available publish-subscribe system at https://goo.gl/R1suAo

~300 lines of code

SLIDE 39

Demo

SLIDE 40

Big data architecture using postgres

[Diagram: records, messaging, data warehouse, real-time analytics]

SLIDE 41

Questions?

marco@citusdata.com