pg_rewind Heikki Linnakangas Your typical setup Streaming - PowerPoint PPT Presentation

pg_rewind Heikki Linnakangas

Your typical setup Streaming Replication STANDBY Master

Your typical catastrophe Streaming Replication STANDBY Master

Standby takes over STANDBY Master MASTER

Wait, the old master survived after all! STANDBY Master MASTER

How do you turn the old master into standby? Streaming Replication??? STANDBY Master MASTER STANDBY??

WAL Timelines 1 2 # # T T R R E E S S N N I TLI 1 I

WAL Timelines 1 2 # # T T R R Master E E S S N N I TLI 1 I TLI 1 Standby

Promotion 1 2 # # T T R R Master E E S S N N I TLI 1 I Meteor Promotion TLI 1 TLI 2 I N S Standby E R T # 3

Lost transactions Lost transactions, not 1 2 # # streamed to standby T T R R E E S S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3

What about synchronous replication? Lost transactions, not 1 2 # # streamed to standby T T R R E E S S N N I TLI 1 I TLI 1 TLI 2 I N S E R T Nope: # 3 ● only commits are synchronized ● records may hit the disk in master before they're replicated anyway

Even controlled failover is tricky ● How do you verify that the standby got all the WAL?

How to resynchronize? 1 2 # # ??? T T R R E E S S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3

Naive approach ● Just create a recovery.conf file on old master to point to new master ● Will not work: LOG: database system was shut down at 2015-03-05 15:26:37 EET LOG: entering standby mode LOG: consistent recovery state reached at 0/4000098 LOG: invalid record length at 0/4000098 LOG: fetching timeline history file for timeline 2 from primary server FATAL: could not start WAL streaming: ERROR: requested starting point 0/4000000 on timeline 1 is not in this server's history DETAIL: This server's history forked from timeline 1 at 0/3010758. ● Might appear to work, but may silently corrupt your database!

Wrong approach 1 2 # # T T R R E E S WRONG! S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3

Solution 1: Rebuild from scratch ● Erase old master, take new base backup from new master, and copy it over. ● Is slow – Reads all data from disk – Sends all data through the network – Writes all data to disk

Solution 2: rsync ● Call pg_start_backup() in new master ● Use rsync to resynchronize the data dir ● Be careful which options you use ● Still slow – Reads all data from disk

Solution 3: pg_rewind ● Fast – Only reads and copies data that was changed

Terminology Target 1 2 # # T T R R E E S Source S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3 Source : New master. Not modified. Target : Old master. Overwritten with data from source.

How it works ● Find out what blocks the lost transactions modified ● Copy those blocks from source to target ~ rsync on steroids

How it works? 1. Determine point of divergence 1 2 # # T T R R E E S S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3 ● Looks at the pg_control file on both systems

How it works? 2. Scan the old WAL 1 2 # # T T R R E E S S N N I TLI 1 I TLI 1 TLI 2 I N S E R T # 3 ● Build a list of blocks that were changed on TLI 1 – lost transactions

How it works? 3. Copy over all changed blocks ● Copies everything except those blocks of relation files that were not modified – pg_clog, etc. – Configuration files – FSM and VM files

File map backup_label.old ( COPY ) base/1/12454_fsm (COPY) base/1/12454_vm (COPY) base/1/12456_fsm (COPY) ... pg_xlog/archive_status/000000010000000000000003.done (COPY) pg_xlog/archive_status/00000002.history.done (COPY) postgresql.auto.conf (COPY) postgresql.conf (COPY) recovery.done (COPY) base/12726/12475 ( COPY_TAIL ) pg_xlog/archive_status/000000010000000000000003.ready (REMOVE) pg_xlog/archive_status/000000010000000000000002.00000028.backup.done (REMOVE) pg_xlog/archive_status/000000010000000000000001.done (REMOVE) pg_xlog/000000010000000000000004 (REMOVE) pg_xlog/000000010000000000000002.00000028.backup (REMOVE) pg_xlog/000000010000000000000001 (REMOVE) pg_stat/global.stat ( REMOVE ) pg_stat/db_12726.stat (REMOVE) pg_stat/db_0.stat (REMOVE)

How it works? 4. Reset the control file ● Start recovery from the point of divergence, not some later checkpoint. 1 2 # # T T R R E E S S N N I TLI 1 I TLI 2 I N C S H E E R C T K P # O 3 I N T

How it works? 5. Replay new WAL ● On first startup (not by pg_rewind) 1 2 # # T T R R E E S S N N I TLI 1 I TLI 2 I N R S E E D R O T p o # i 3 n t

Usage Usage: pg_rewind [OPTION]... Options: -D, --target-pgdata=DIRECTORY existing data directory to modify --source-pgdata=DIRECTORY source data directory to sync with --source-server=CONNSTR source server to sync with -P, --progress write progress messages -n, --dry-run stop before modifying anything --debug write a lot of debug messages -V, --version output version information, then exit -?, --help show this help, then exit

Example $ pg_rewind --source-server="host=localhost port=5433 dbname=postgres" --target-pgdata=data-master The servers diverged at WAL position 0/3000060 on timeline 1. Rewinding from last common checkpoint at 0/2000060 on timeline 1 Done!

Example: --progress $ pg_rewind --progress --source-server="host=localhost port=5433 dbname=postgres" –target-pgdata=data-master connected to remote server The servers diverged at WAL position 0/3000060 on timeline 1. Rewinding from last common checkpoint at 0/2000060 on timeline 1 reading source file list reading target file list reading WAL in target Need to copy 51 MB (total source directory size is 67 MB) 53071/53071 kB (100%) copied creating backup label and updating control file Done!

Example: clean failover $ pg_rewind --source-server="host=localhost port=5433 dbname=postgres" --target-pgdata=data-master The servers diverged at WAL position 0/4000098 on timeline 1. No rewind required.

Caveats ● Must set wal_log_hints=on in postgresql.conf – before the meteor strikes – or use checksums (initdb -k) ● All WAL needs to be available in the pg_xlog directories

More use cases ● Synchronize new master to old master, instead of the other way 'round ● Synchronize a second standby after failing over ● Rewind back to an earlier base backup (haven't tested those, might not work currently)

Design goals ● Safety – exit gracefully without modifying target if rewind is not possible – dry-run mode – unrecognized files are copied in toto ● Ease of use ● Speed – Faster than reading through all data

In PostgreSQL 9.5 ● Included in PostgreSQL 9.5 ● In src/bin/pg_rewind ● Changed WAL record format in 9.5 – to support pg_rewind among other things

pg_rewind – for 9.3 and 9.4 Stand-alone versions available for 9.3 and 9.4 ● https://github.com/vmware/pg_rewind ● PostgreSQL-licensed

Future development ● Be smarter about what to copy – Free Space Maps, Visibility Maps – pg_clog , pg_subtrans , etc. ● When copying a whole file, use checksums to skip unchanged parts – like rsync does ● Allow using pg_rewind when there have been timeline switches in the target – http://www.postgresql.org/message-id/CAPpHfdtaqYG z6JKvx4AdySA_ceqPH7Lki=F1HxUeNNaBRC7Mtw@mail.gma l.com

Thank you! ● Thanks to Michael Paquier and everyone else involved! ● Questions?

pg_rewind Heikki Linnakangas Your typical setup Streaming - PowerPoint PPT Presentation

pg_rewind Heikki Linnakangas Your typical setup Streaming Replication STANDBY Master Your typical catastrophe Streaming Replication STANDBY Master Standby takes over STANDBY Master MASTER Wait, the old master survived after all!

SCRAM authentication Heikki Linnakangas / Pivotal pg_hba.conf # TYPE DATABASE USER

Fractional L evy processes Heikki Tikanm aki Stockholm, March 15 2010 Heikki Tikanm aki

HistorySampo Heikki Rantala heikki.rantala@aalto.fi Contents Background of HISTO ontology Why

PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 3: Detection Theory October 2019 Heikki

PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 2: Estimation Theory October 2019 Heikki

Impact of algorithmic data analysis Heikki Mannila Helsinki Institute for Information Technology

Play, Pause, Rewind The Era of Archived Lifetimes Dr Cathal Gurrin (Insight Centre for Data

Flashback to the Past, Rewind to the Future The Challenge of Imagining a New National Library

DEVIALET 2013 Range Presentation 1 REWIND 3 years on the market and over 1600

Lets rewind, to Saturday 3 December 2016 and set the scene. Sunshine, a little overcast at 7am

Soyez Efficace, Rembobinez Be efficient, Rewind Stphane Devismes (Verimag, UGA)

Dancing Back to 1914 Rewind Dance Dance 1 20/11/15 2 20/11/15 Heritage g 3 20/11/15 4

BIBLICAL SURVEY Joseph: the Background Story I wish life had a rewind button Our study of Joseph

are delivering, solid view on H2 July 26, 2018 Heikki Takala, President and CEO Normal

Clustering Aggregation Aristides Gionis, Heikki Mannila, and Panayiotis Tsaparas Helsinki

Broad-based improvement and acceleration July 29, 2015 - Heikki Takala, President and CEO

Johan Sverdrup Polymer Force Seminar, Stavanger April 5 th , 2016 Classification: Internal

2018 Annual Assessor Meeting Madison, Milwaukee, Eau Claire, Wausau and Green Bay October and

Ottheinrich Bible, ~1430 Bavarian State Library, Munich Jakob Claesz van Utrecht, Gavn Altar,

Basin Types and Their Exploration and Production Reserves and Resources Petroleum Geology AES/TA

Peninsula Clean Energy Board of Directors Meeting October 24, 2019 Agenda Call to order /

Capital Master Planning Update CWU Board of Trustees Jan 26-27, 2017 Gene Shoda, Vice President

Facilities Master Plan Final Presentation June 13, 2019 Information Gathering We want your

Nor orth O Old Woodw dward/ d/ Bates St Ba Street Parki king a and Redevel elop opmen

Sambuz

Useful Links

Newsletter

Mail Us