Efficiently Backing up Terabytes of Data with pgBackRest
David Steele, Crunchy Data
PGDay Russia 2017, July 6, 2017

Agenda
1 Why Backup?
2 Living Backups
3 Design
4 Features
5 Performance
6 Changes to Core
7 In The Pipeline
8 Questions?

Why Backup?
Hardware Failure: No amount of redundancy can prevent it.
Replication: A WAL archive covers the gap when asynchronous streaming falls behind. Resync a replica from backup instead of from the master.
Corruption: Can be caused by hardware or software. Detection is, of course, a challenge.

Why Backup?
Accidents: So you dropped a table? Deleted your most important account?
Development: There is no more realistic data than production! This may not be practical due to size or privacy issues.
Reporting: Use backups to stand up an independent reporting server. Recover important data that was removed on purpose.

Schrödinger's Backup
The state of any backup is unknown until a restore is attempted.

Making Backups Useful
Find a way to use your backups:
- Syncing / new replicas
- Offline reporting
- Offline data archiving
- Development
Unused code paths will not work when you need them unless they are tested:
- Regularly scheduled automated failover, using backups to restore the old primary.
- Regularly scheduled disaster recovery (during a maintenance window if possible) to test restore techniques.

pgBackRest Design
Rsync powers many database backup solutions, but it has some serious limitations:
- Single-process.
- One-second timestamp resolution.
- Incremental backups require the previous backup to be uncompressed.
pgBackRest does not use rsync, tar, or other typical backup tools:
- Its protocol supports local and remote operation.
- It solves the timestamp resolution issue.
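The timestamp problem is easy to reproduce. The sketch below is a simplified model, not rsync or pgBackRest code: it treats a file as unchanged when its size and whole-second mtime match (a one-second-resolution quick check like the one described above), then rewrites one byte so that the new mtime lands in the same second. The fixed timestamps and the file name are illustrative assumptions. Only a content checksum notices the change, which is one way to sidestep the problem.

```python
import hashlib
import os
import tempfile


def quick_check_unchanged(path, size, mtime_sec):
    """Size plus mtime truncated to whole seconds (one-second resolution)."""
    st = os.stat(path)
    return st.st_size == size and int(st.st_mtime) == mtime_sec


def sha1_file(path):
    """Content checksum; independent of timestamp resolution."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "datafile")  # illustrative file name
        with open(path, "wb") as f:
            f.write(b"A" * 8192)
        # Pin the first write to a known sub-second time.
        os.utime(path, (1_000_000_000.25, 1_000_000_000.25))
        size, sec = os.stat(path).st_size, int(os.stat(path).st_mtime)
        digest = sha1_file(path)

        # Change one byte: the size is unchanged and the mtime stays in the same second.
        with open(path, "r+b") as f:
            f.write(b"B")
        os.utime(path, (1_000_000_000.75, 1_000_000_000.75))

        return quick_check_unchanged(path, size, sec), sha1_file(path) == digest


quick_says_same, checksum_says_same = demo()
print(quick_says_same, checksum_says_same)
```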

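Multi-Process Backup & Restore
Compression is the usual bottleneck, but most PostgreSQL backup solutions are single-process.
pgBackRest solves the problem with multi-processing.
1 TB/hr of raw (uncompressed) throughput is possible even on a 1 Gb/s link, which carries only about 450 GB/hr, because compression on multiple cores shrinks the data before it crosses the network.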
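A minimal sketch of spreading compression across cores follows. It is not pgBackRest's implementation: pgBackRest uses separate worker processes, while this example swaps in a thread pool, assuming CPython's zlib releases the GIL while deflating so threads can compress in parallel. The file names and contents are made up. As a rough check on the slide's figure: 1 Gb/s is about 125 MB/s, or about 450 GB/hr, so 1 TB/hr of raw data implies roughly 2.2:1 compression on the wire.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_file(item):
    """Compress one file's bytes; returns (name, raw size, compressed bytes)."""
    name, data = item
    return name, len(data), zlib.compress(data, 3)


def parallel_compress(files, workers=4):
    # Threads stand in for pgBackRest's worker processes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of files.items().
        return list(pool.map(compress_file, files.items()))


# Hypothetical file names and contents for illustration only.
files = {
    "base/1/1259": b"relation page " * 50000,
    "base/1/2608": b"\x00" * 400000,
}
results = parallel_compress(files, workers=2)
for name, raw, packed in results:
    print(f"{name}: {raw} -> {len(packed)} bytes")
```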

Local or Remote Operation
A custom protocol allows backup, restore, and archiving locally or remotely via SSH with minimal configuration.
No direct access to PostgreSQL is required from the remote server, which enhances security.

Full, Incremental, & Differential Backups
Multiple backup types: full, differential, and incremental.
pgBackRest is not susceptible to the timestamp resolution issues of rsync, making differential and incremental backups safe.
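A differential backup copies files changed since the last full backup, while an incremental backup copies files changed since the last backup of any type. The sketch below models that choice of reference backup. The `Backup` class and the idea that each backup records a checksum for every file in the cluster are simplifying assumptions for illustration, not pgBackRest's manifest format.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    label: str
    kind: str  # "full", "diff" or "incr"
    # Hypothetical: path -> checksum for every file in the cluster at backup time.
    files: dict = field(default_factory=dict)


def reference_for(kind, history):
    """Pick the backup a new backup is compared against (history is oldest first)."""
    if kind == "full" or not history:
        return None
    if kind == "diff":
        fulls = [b for b in history if b.kind == "full"]
        return fulls[-1] if fulls else None
    return history[-1]  # incr: the most recent backup of any type


def files_to_copy(kind, history, current):
    ref = reference_for(kind, history)
    if ref is None:
        return sorted(current)  # no usable reference: copy everything
    # New files (absent from the reference) and changed files are copied.
    return sorted(p for p, c in current.items() if ref.files.get(p) != c)


full = Backup("F1", "full", {"a": "1", "b": "1", "c": "1"})
incr = Backup("I1", "incr", {"a": "2", "b": "1", "c": "1"})
history = [full, incr]
current = {"a": "2", "b": "2", "c": "1"}
print(files_to_copy("incr", history, current))  # only what changed since I1
print(files_to_copy("diff", history, current))  # everything changed since F1
```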

Backup Rotation & Archive Expiration
Retention is based on full or differential backups.
WAL can be retained for all backups or for a configured number of recent backups.
WAL required for the consistency of backups is always preserved.
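Full-based retention can be modeled as "keep the newest N full backups and everything that depends on them". The sketch below assumes backups are listed oldest first and that each differential or incremental depends on a full backup taken before it, so everything older than the oldest retained full can be expired. The labels and the `expire_fulls` function are hypothetical; WAL expiration is not modeled.

```python
def expire_fulls(backups, retention_full):
    """Return labels to expire, keeping the newest `retention_full` full backups
    and every diff/incr taken after the oldest retained full.

    `backups` is a list of (label, kind) pairs, oldest first.
    """
    if retention_full < 1:
        raise ValueError("retention_full must be at least 1")
    full_idx = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if len(full_idx) <= retention_full:
        return []
    cutoff = full_idx[-retention_full]  # index of the oldest full to keep
    return [label for label, _ in backups[:cutoff]]


backups = [
    ("F1", "full"), ("D1", "diff"), ("I1", "incr"),
    ("F2", "full"), ("I2", "incr"),
    ("F3", "full"), ("D3", "diff"),
]
print(expire_fulls(backups, 2))
```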

Backup Integrity
PostgreSQL page checksums are validated if present (≥ 9.3).
Checksums are calculated for every file in the backup and rechecked during a restore.
After a backup, the required WAL segments are checked in the repository.
Simple backup format:
- Backup directories have the same format as a PostgreSQL cluster.
- If compression is disabled, clusters can be brought up in place with snapshots, which is advantageous for terabyte-scale databases.
All operations use file- and directory-level fsync to ensure durability.
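Checking that the required WAL is in the repository means turning the backup's start and stop LSNs into WAL segment file names. The sketch below assumes PostgreSQL 9.3 or later with the default 16 MB segment size (older releases skip the last segment of each log file, so the arithmetic differs) and ignores the edge case of a stop LSN exactly on a segment boundary. The function names are hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 on 9.3+


def parse_lsn(text):
    """Convert an LSN such as '0/3000028' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segments(timeline, start_lsn, stop_lsn):
    """Names of the WAL segment files spanning start_lsn..stop_lsn."""
    first = parse_lsn(start_lsn) // WAL_SEGMENT_SIZE
    last = parse_lsn(stop_lsn) // WAL_SEGMENT_SIZE
    return [
        "%08X%08X%08X" % (timeline, seg // SEGMENTS_PER_XLOGID, seg % SEGMENTS_PER_XLOGID)
        for seg in range(first, last + 1)
    ]


def missing_segments(timeline, start_lsn, stop_lsn, archived_names):
    """Segments the backup needs that are not present in the archive."""
    return [n for n in wal_segments(timeline, start_lsn, stop_lsn) if n not in archived_names]


print(wal_segments(1, "0/2000028", "0/4000130"))
```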

Backup Resume
An aborted backup can be resumed from the point where it stopped.
Files already copied are checksummed on the backup server during the resume.
This saves load on the master, because resumed files are not compressed and transmitted again.

Streaming Compression & Checksums
Compression and checksum calculations are performed in-stream.
Compression is never done more than once.
A lower compression level is used when the destination is uncompressed, to make efficient use of CPU and network bandwidth.
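In-stream processing means the file is read once and each chunk feeds both the checksum and the compressor. The sketch below shows the idea with Python's `hashlib` and `zlib`; the `copy_compressed` name, the chunk size, and the choice of SHA-1 over the raw bytes are illustrative assumptions rather than pgBackRest's code. The `level` parameter is where a lower network compression level would be chosen.

```python
import hashlib
import io
import zlib


def copy_compressed(src, dst, level=6, chunk_size=65536):
    """Read src once; checksum the raw bytes and write compressed bytes to dst."""
    sha1 = hashlib.sha1()
    comp = zlib.compressobj(level)
    raw_size = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        raw_size += len(chunk)
        sha1.update(chunk)               # checksum of the original file contents
        dst.write(comp.compress(chunk))  # compressed output from the same pass
    dst.write(comp.flush())              # emit any buffered compressed data
    return sha1.hexdigest(), raw_size


data = b"pgBackRest " * 10000
out = io.BytesIO()
digest, size = copy_compressed(io.BytesIO(data), out, level=3)
print(digest, size, len(out.getvalue()))
```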

Delta Restore
The backup manifest contains the checksum and size of every file.
On a delta restore, all files that are not present in the backup or that have a different size are removed from PGDATA.
The remaining files are checksummed, and only files with a checksum mismatch are restored.
Multi-processing can lead to dramatic reductions in restore time and network utilization.
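The delta restore decision can be sketched as a planning pass over PGDATA. This is a simplified model, assuming a manifest that maps each relative path to `(size, sha1)`; the function names are hypothetical, directories and links are ignored, and nothing is actually deleted or copied. Files removed for a size mismatch and files missing from disk both end up in the restore list.

```python
import hashlib
import os
import tempfile


def sha1_file(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_delta_restore(pgdata, manifest):
    """manifest: relative path -> (size, sha1). Returns (remove, restore) lists."""
    remove, restore, on_disk = [], [], set()
    for root, _dirs, names in os.walk(pgdata):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, pgdata).replace(os.sep, "/")
            on_disk.add(rel)
            entry = manifest.get(rel)
            if entry is None or os.path.getsize(full) != entry[0]:
                remove.append(rel)   # not in the backup, or size differs
            elif sha1_file(full) != entry[1]:
                restore.append(rel)  # same size, different contents
    restore.extend(rel for rel in manifest if rel not in on_disk)  # missing files
    restore.extend(rel for rel in remove if rel in manifest)       # size mismatches
    return sorted(remove), sorted(restore)


def _write(path, data):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as pgdata:
    # Hypothetical backup contents.
    backup = {"PG_VERSION": b"10\n", "base/1/1259": b"a" * 8192, "global/pg_control": b"c" * 512}
    manifest = {p: (len(d), hashlib.sha1(d).hexdigest()) for p, d in backup.items()}
    _write(os.path.join(pgdata, "PG_VERSION"), b"10\n")       # unchanged
    _write(os.path.join(pgdata, "base/1/1259"), b"b" * 8192)  # same size, new data
    _write(os.path.join(pgdata, "base/1/99999"), b"x")        # not in the backup
    removed, restored = plan_delta_restore(pgdata, manifest)

print(removed, restored)
```

Only the files in `restored` would need to cross the network, and the checksum pass over PGDATA is the part that parallelizes naturally across processes.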

Advanced Parallel Archiving
Dedicated commands are included for both pushing WAL to the archive and retrieving WAL from the archive.
The push command automatically detects WAL segments that are pushed multiple times and de-duplicates them when the segment is identical; otherwise, an error is raised.
The push and get commands both ensure that the database and repository match by comparing PostgreSQL versions and system identifiers, which prevents misconfiguration.
Asynchronous parallel archiving allows compression and transfer to be offloaded to another process, which maintains continuous connections to the remote server and improves throughput significantly.
This is a critical feature for databases with extremely high write volume.
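The push-side checks can be sketched in a few lines. This is a hypothetical model, not pgBackRest's archive-push: `archive_push`, `ArchiveMismatchError`, and the `(version, system identifier)` tuples are illustrative assumptions, the archive is a local directory, and fsync is omitted. An identical re-push succeeds without copying, a different segment with the same name raises, and a mismatched database/archive pair is rejected up front.

```python
import hashlib
import os
import shutil
import tempfile


class ArchiveMismatchError(Exception):
    """Hypothetical error for a conflicting segment or a mismatched archive."""


def sha1_file(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_push(segment_path, archive_dir, db_info, archive_info):
    """db_info / archive_info: hypothetical (pg_version, system_identifier) tuples."""
    if db_info != archive_info:
        raise ArchiveMismatchError(f"database {db_info} does not match archive {archive_info}")
    name = os.path.basename(segment_path)
    dest = os.path.join(archive_dir, name)
    if os.path.exists(dest):
        if sha1_file(dest) == sha1_file(segment_path):
            return "duplicate"  # identical re-push: succeed without copying
        raise ArchiveMismatchError(f"{name} already archived with different contents")
    tmp = dest + ".tmp"
    shutil.copyfile(segment_path, tmp)
    os.replace(tmp, dest)  # rename so a partially written segment is never visible
    return "pushed"


with tempfile.TemporaryDirectory() as work:
    archive = os.path.join(work, "archive")
    os.mkdir(archive)
    seg = os.path.join(work, "000000010000000000000003")
    with open(seg, "wb") as f:
        f.write(b"\x01" * 1024)  # small stand-in for a real WAL segment
    info = ("10", 6437655371098034184)  # hypothetical version and system id
    first = archive_push(seg, archive, info, info)
    second = archive_push(seg, archive, info, info)
    with open(seg, "r+b") as f:
        f.write(b"\x00" * 16)  # same name, different contents
    try:
        archive_push(seg, archive, info, info)
        third = "pushed"
    except ArchiveMismatchError:
        third = "error"

print(first, second, third)
```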

Tablespace & Link Support
Tablespaces are fully supported, and on restore they can be remapped to any location.
All tablespaces can be remapped to one location with a single command, which is useful for development restores.
File and directory links are supported for any file or directory in the PostgreSQL cluster.
On restore, links can be restored to their original locations, some or all links can be remapped, or some or all links can be restored as normal files or directories within the cluster directory.

Selective Restore
Restore only specified databases out of a cluster backup.
Other files are restored as sparse, zeroed files to save space.
All WAL must still be replayed.
Non-restored databases cannot be connected to; they can only be dropped.
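Restoring a file as "sparse and zeroed" can be as simple as creating it and extending it to the right length without writing any data. The sketch below assumes a filesystem with sparse-file support (for example ext4 or XFS), where no data blocks are allocated and reads return zeros; the relation path and size are hypothetical. `st_blocks` is POSIX-only, so it is read defensively.

```python
import os
import tempfile


def restore_as_sparse(path, size):
    """Create a zero-filled file of `size` bytes without writing data blocks."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; the gap reads back as zeros


with tempfile.TemporaryDirectory() as pgdata:
    path = os.path.join(pgdata, "base", "16385", "16390")  # hypothetical OIDs
    restore_as_sparse(path, 8 * 1024 * 1024)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8192)
    # Number of 512-byte blocks actually allocated, where the platform reports it.
    allocated = getattr(os.stat(path), "st_blocks", None)

print(size, allocated)
```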

Backup from Standby
The backup is started on the master.
Copying begins once the replay location on the standby reaches the backup start location.
This reduces load on the master, because replicated files are copied from the standby.
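Waiting for the standby means comparing LSNs numerically, not as strings. The sketch below polls a caller-supplied function for the standby's replay location; in practice that value could come from `pg_last_xlog_replay_location()` (9.x) or `pg_last_wal_replay_lsn()` (10+), but the `wait_for_replay` helper and the fake standby in the example are hypothetical.

```python
import time


def parse_lsn(text):
    """Convert an LSN such as '0/10000028' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_replay(get_replay_lsn, start_lsn, timeout=60.0, poll=0.1):
    """Poll until the standby's replay location reaches start_lsn."""
    target = parse_lsn(start_lsn)
    deadline = time.monotonic() + timeout
    while True:
        current = get_replay_lsn()
        if current is not None and parse_lsn(current) >= target:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"standby replay {current} has not reached {start_lsn}")
        time.sleep(poll)


# Fake standby that advances on each poll (hypothetical values).
replay_positions = iter(["0/9000000", "0/F000000", "0/10000028"])
reached = wait_for_replay(lambda: next(replay_positions), "0/10000000", timeout=5, poll=0)
print(reached)
```

A plain string comparison would get this wrong: `"0/9000000" > "0/10000000"` is true as text, even though the first LSN is numerically smaller.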

S3 Support
Repositories can be stored in S3.
All pgBackRest features are supported.
Efficient implementation.

Compatibility with PostgreSQL ≥ 8.3
Versions down to 8.3 are supported, since older versions of PostgreSQL are still regularly used.

Performance
Processes | Network compression | Destination compression | pgBackRest | rsync | Speedup
1 | level 3 | none | 124 s | 141 s | 1.13X faster
2 | level 3 | none | 84 s | N/A | 1.48X faster
1 | level 6 | level 6 | 334 s | 510 s | 1.52X faster
2 | level 6 | level 6 | 174 s | N/A | 2.93X faster

Changes to Core
Completed:
- Exclude files/directories that are reset or rebuilt on recovery.
- Make the pg_stop_backup() wait optional.
- Non-exclusive backups (Magnus Hagander).
- Archive timeout fix (Michael Paquier).
Planned:
- More exclusions.
- Allow group read on $PGDATA.
- Pass multiple WAL segments to archive_command.
- Configurable WAL segment size (Beena Emerson).

In The Pipeline
- PostgreSQL 10 support.
- Encryption.
- Zstandard compression.
- Parallel archive-get.

Questions?
website: http://www.pgbackrest.org
email: david@pgbackrest.org
email: david@crunchydata.com
releases: https://github.com/pgbackrest/pgbackrest/releases
slides & demo: https://github.com/dwsteele/conference/releases
