POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) - - PowerPoint PPT Presentation

postgresql on aws
SMART_READER_LITE
LIVE PREVIEW

POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) - - PowerPoint PPT Presentation

Please write title, subtitle Please write title, subtitle and speaker name in all and speaker name in all capital letters capital letters POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN PGConf.EU 2017, Warsaw


slide-1
SLIDE 1

Please write title, subtitle and speaker name in all capital letters

POSTGRESQL ON AWS:

TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN PGConf.EU 2017, Warsaw 26-10-2017

Please write title, subtitle and speaker name in all capital letters

slide-2
SLIDE 2

2

Please write the title in all capital letters Put images in the grey dotted box "unsupported placeholder" Use bullet points to summarize information rather than writing long paragraphs in the text box

ABOUT ME

Alexander Kukushkin

Database Engineer @ZalandoTech Email: alexander.kukushkin@zalando.de Twitter: @cyberdemn

slide-3
SLIDE 3

Put images in the grey dotted box "unsupported placeholder" - behind the

  • range box and quote in

capital letters

FACTS & FIGURES

slide-4
SLIDE 4

4

Please write the title in all capital letters Put images in the grey dotted box "unsupported placeholder"

ZALANDO AT A GLANCE

~3.6billion EURO

revenue 2016

~200 million

visits per month

>13,000

employees in Europe

50%

return rate across all categories

~21

million

active customers

~250,000

product choices

~2,000

brands

15

countries

slide-5
SLIDE 5

5

Please write the title in all capital letters Put images in the grey dotted box "unsupported placeholder" Use bullet points to summarize information rather than writing long paragraphs in the text box

ZALANDO TECHNOLOGY

  • > 150 databases in DC
  • > 260 databases on AWS
  • > 1800 tech employees
  • We are hiring!
slide-6
SLIDE 6

6

Please write the title in all capital letters

ZALANDO TECHNOLOGY

BERLIN DORTMUND DUBLIN HELSINKI ERFURT MÖNCHENGLADBACH HAMBURG

slide-7
SLIDE 7

7

Please write the title in all capital letters

WHY WE USE AWS

  • Fast hardware (service) provisioning
  • Easy scale up/down/in/out
  • Pay only for resources you need
slide-8
SLIDE 8

8

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

GLOSSARY

  • RDS - Relational Database Service
  • EC2 - Elastic Compute Cloud
  • EBS - Elastic Block Store
  • S3 - Simple Storage Service
  • ELB - Elastic Load Balancing
  • AZ - Availability Zone
  • ASG - Auto Scaling Group
slide-9
SLIDE 9

9

Please write the title in all capital letters

POSTGRESQL HA ON AWS

  • Shared storage - EBS, attach it to a different EC2 Instance

○ Works within one AZ

  • Storage mirroring - Multi-AZ RDS (EBS is replicated to another AZ)

○ Works between AZ ○ Replicas can’t execute queries

  • Streaming replication - you can start additional RDS replicas

○ Works between AZ ○ Replicas can execute queries ○ Automatic Failover problem (if not Multi-AZ deployment)

slide-10
SLIDE 10

10

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

EC2 VS. RDS: PRICING*

Instance Type RDS EC2 (db.)t2.micro $18/month $10/month (db.)t2.small $32/month $20/month (db.)t2.large $122/month $78/month (db.)m4.large $158/month $87/month (db.)m4.xlarge $317/month $175/month

* storage price is not included

slide-11
SLIDE 11

11

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

EC2 VS. RDS: PRICING*

Instance Type RDS Multi-AZ 2 x EC2 + ELB** 3 x EC2 + ELB** (db.)t2.micro $36.0/month $41.8/month $51.8/month (db.)t2.small $64/month $60.5/month $79.9/month (db.)t2.large $244/month $177.1/month $254.9/month (db.)m4.large $316/month $194.4/month $280.8/month (db.)m4.xlarge $634/month $367.2/month $540.0/month

* storage price is not included ** ELB - ~$21/month

slide-12
SLIDE 12

12

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • No superuser access
  • No replication connection
  • No custom extensions
  • Either you run Multi-AZ RDS deployment (replica can’t receive queries)
  • Or Master + Replica on RDS and don’t have automatic failover

RDS SUMMARY

slide-13
SLIDE 13

13

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Patroni - High-Availability and automatic failover
  • Spilo - Docker package of Patroni and WAL-E for AWS or Kubernetes
  • Use CloudFormation stacks and ASG for deployments
  • One Docker container per EC2 Instance
  • ELB for traffic routing

SPILO AND PATRONI

slide-14
SLIDE 14

14

Please write the title in all capital letters

AWS DEPLOYMENT

slide-15
SLIDE 15

15

Please write the title in all capital letters

Master ELB Security Group Cluster Security Group Auto-Scaling Availability Zone A Data Volume Root volume Master Elastic Load Balancer Cloud Formation Stack

Replica DB

Availability Zone B Data Volume Root volume

Master DB

Availability Zone C Data Volume Root volume

Replica DB

Replica ELB Security Group Replica Elastic Load Balancer

5432, 8008 5432, 8008

GET /master GET /replica db.zalando db-repl.zalando S3 bucket: Backup + WAL

User Data:

  • Docker image
  • Backup schedule
  • Superuser password
  • Replication password
  • Postgres parameters

Etcd

slide-16
SLIDE 16

16

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Instance Types: t2, m4, c4, x1, r4, p2, g3, f1, i3, d2
  • Instance Sizes: nano, micro, small, medium, large,

xlarge, 2xlarge, 4xlarge, 10xlarge, 16xlarge, 32xlarge

  • Performance: Fixed vs. Burstable (T2 Instances)
  • Storage: Instance Store (Ephemeral) vs. EBS
  • EBS-optimized Instances
  • Enhanced Networking

EC2 INSTANCE FEATURES

slide-17
SLIDE 17

17

Put images in the grey dotted box "unsupported placeholder" - behind the

  • range box (left side

stays white) Write the quote in all capital letters

BURSTABLE PERFORMANCE OR WHY IS MY DATABASE VERY SLOW?

slide-18
SLIDE 18

18

Please write the title in all capital letters

BURSTABLE PERFORMANCE

~100% ~10% ~2h 30m t2.micro

slide-19
SLIDE 19

19

Please write the title in all capital letters

BURSTABLE PERFORMANCE

t2.micro

slide-20
SLIDE 20

20

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Designed to provide moderate baseline performance and the capability to burst to significantly higher

performance as required by your workload.

  • If your account is less than 12 months old, you can use a t2.micro instance for free within certain usage

limits.

T2 INSTANCES

Instance Type Initial CPU credit Credits earned per hour vCPUs Base performance (CPU utilization) Max CPU credit balance t2.micro 30 6 1 10% 144 t2.small 30 12 1 20% 288 t2.medium 60 24 2 40% 578 t2.large 60 36 2 60% 864

slide-21
SLIDE 21

21

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • One CPU credit is equal to one vCPU running at 100% utilization for
  • ne minute.
  • When a T2 instance uses fewer CPU resources than its base

performance level allows (such as when it is idle), the unused CPU credits (or the difference between what was earned and what was spent) are stored in the credit balance for up to 24 hours, building CPU credits for bursting.

  • When a T2 instance requires more CPU resources than its base

performance level allows, it uses credits from the CPU credit balance to burst up to 100% utilization.

CPU CREDITS

slide-22
SLIDE 22

22

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • WAL-E wal-fetch/wal-prefetch is terribly slow

○ Prefetch spawns 8 worker processes (by default) and can burn all CPU Credits How we solved it:

  • 1. use “-p 0” to disable prefetch
  • 2. reimplemented ‘wal-fetch’ in bash + curl + openssl

HORROR STORY

slide-23
SLIDE 23

23

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • CPUCreditUsage metric indicates the number of CPU credits used

during the measurement period

  • CPUCreditBalance metric indicates the number of unused CPU credits

a T2 instance has earned

MONITOR CPU CREDITS

t2.small

slide-24
SLIDE 24

24

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • 1 year or 3 year contracts
  • Significant discount compared to On-Demand instance pricing
  • Capacity reservation
  • Customers using both Reserved and On-Demand instances will have

Reserved Instance rates applied first to minimize costs

RESERVED INSTANCES

slide-25
SLIDE 25

25

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

RESERVED INSTANCES

m4.large, Standard 1-Year Term Payment Option Upfront Monthly Effective Hourly Savings On-Demand Hourly No Upfront $0 $59.79 $0.082 32% $0.12 Partial Upfront $342 $28.47 $0.078 35% All Upfront $670 $0 $0.076 36% m4.large, Standard 3-Year Term No Upfront $0 $44.17 $0.061 50% $0.12 Partial Upfront $736 $20.44 $0.056 53% All Upfront $1383 $0 $0.053 56% * Frankfurt region

slide-26
SLIDE 26

26

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Bidding on “AWS overcapacity”
  • Variable price point, save up to 90% vs. on-demand

*Risks* Unavailability or loss of instance if outbid!!! SPOT INSTANCES

slide-27
SLIDE 27

27

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

SPOT MARKET

slide-28
SLIDE 28

28

Please write the title in all capital letters

  • Async processing
  • Reporting
  • CI
  • Staging systems
  • Testing of backups

USE CASES

Everything that can fail or be unavailable for short duration

slide-29
SLIDE 29

29

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • There were only 2 AZ in eu-central-1 until June 2017
  • ASG is trying to do its best to distribute resources across different AZ, but when Spot Price

is high, it might happen that two EC2 Instances will run in the same AZ

  • When price will go down, ASG will rebalance resources, i.e. spawn a new instance in another

AZ and terminate one of the running instances

  • With 50% probability it will kill the “master”

How we solved it: AutoScalingGroup -> “TerminationPolicies”: [“NewestInstance”, “Default”]

HORROR STORY

slide-30
SLIDE 30

30

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

ELASTIC BLOCK STORE

Name io1 gp2 st1 sc1 Size 4GB - 16TB 1GB - 16TB 500GB - 16TB 500GB - 16TB Max IOPS/Volume 20000 10000 500 250 Max Throughput/Volume 320MB/s 160MB/s 500MB/s 250MB/s Price

$0.149/GB-month $0.078/provisioned IOPS $0.119/GB-month $0.054/GB-month $0.03/GB-month

slide-31
SLIDE 31

31

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • ST1 and SC1 - HDD, good throughput, but low IOPS
  • GP2 and IO1 - SSD, usually good choice for databases
  • EBS volumes are created in a specific AZ, and can then be attached to

any instances in that AZ

  • IOPS and throughput depends on Volume Size

ELASTIC BLOCK STORE

slide-32
SLIDE 32

32

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • IO1 gives better IOPS guarantees
  • But you have to pay for it – the same size costs 25% more. Plus you

have to pay for provisioned IOPS

  • For RDS minimum size of IO1 volume is 100 GB + 1000 provisioned
  • IOPS. It will cost you $13.80 + $110 per month
  • You can get 1000 IOPS with 334 GB GP2 volume for $42/month

IO1 VS. GP2

slide-33
SLIDE 33

33

Put images in the grey dotted box "unsupported placeholder" - behind the

  • range box (left side

stays white) Write the quote in all capital letters

SEQUEL: BURSTABLE PERFORMANCE OR WHY IS MY DATABASE SLOW AGAIN?

slide-34
SLIDE 34

34

BURSTABLE PERFORMANCE

~3000 IOPS ~1200 IOPS ~ 45 min 400 GB Volume

slide-35
SLIDE 35

35

BURSTABLE PERFORMANCE

~3000 IOPS ~1200 IOPS ~ 45 min 400 GB Volume

slide-36
SLIDE 36

36

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

I/O CREDITS AND BURST PERFORMANCE

  • The performance of gp2 volumes is tied to volume size (3 IOPS/GiB)
  • The maximum and initial I/O credit balance for a volume is 5.4 million
  • When your volume requires more than the baseline performance I/O

level, it draws on I/O credits in the credit balance to burst to the required performance level, up to a maximum of 3,000 IOPS

  • When your volume uses fewer I/O credits than it earns in a second,

unused I/O credits are added to the I/O credit balance

slide-37
SLIDE 37

37

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

GP2 VOLUMES EXPLAINED

slide-38
SLIDE 38

38

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Especially BurstBalance if GP2 volume is smaller than 1TB

MONITOR I/O WITH CLOUDWATCH

500 GB Volume 2500 1250

slide-39
SLIDE 39

39

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Dedicated bandwidth to Amazon EBS, with options between 500 Mbps

and 12,000 Mbps, depending on the instance type you use

  • Provides the best performance for your EBS volumes by minimizing

contention between Amazon EBS I/O and other traffic from your instance

EBS-OPTIMIZED INSTANCES

slide-40
SLIDE 40

40

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

EBS THROUGHPUT

InstanceType vCPU Memory Max IOPS Throughput (Mb/s) Price (per month)

m4.large 2 8 GB 3600 56.25 $87.6 r4.large 2 15 GB 3000 54 $116.8 m4.xlarge 4 16 GB 6000 93.75 $175.2 r4.xlarge 4 30 GB 6000 109 $233.6 m4.4xlarge 16 64 GB 16000 250 $700.7 r4.4xlarge 16 120 GB 18750 437 $934.4

slide-41
SLIDE 41

41

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • t2.micro, t2.small and t2.medium are showing up to ~60 MB/s on EBS
  • t2.large, t2.xlarge and t2.2xlarge are showing up to ~120 MB/s on EBS
  • I don’t have any numbers about Max. IOPS
  • But! There is no guaranteed throughput for T2 instances!

EBS THROUGHPUT

slide-42
SLIDE 42

42

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • GP2 is MUCH cheaper than IO1
  • To get more than 10000 IOPS or 160MB/s with GP2 -

build a RAID-0 from multiple volumes

  • Choose an EC2 Instance with enough bandwidth

EBS TIPS

slide-43
SLIDE 43

43

Please write the title in all capital letters Put images in the grey dotted box "unsupported placeholder" Use bullet points to summarize information rather than writing long paragraphs in the text box

HORROR STORY

  • AWS didn’t provide a means to

monitor EBS Burst Balance until November 2016

  • RDS still doesn’t provide

information about GP2 Volume Burst Balance

slide-44
SLIDE 44

44

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Pros:

○ Located on disks that are physically attached to the host computer ○ Amazing throughput and latencies compared to EBS

  • Cons:

○ Provides only temporary block-level storage ○ Data in the instance store is lost under the following circumstances: ■ The underlying disk drive fails ■ The instance stops or terminates

INSTANCE STORE VOLUMES

slide-45
SLIDE 45

45

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

PRICE COMPARISON

InstanceType m4.xlarge r4.xlarge i3.xlarge m4.2xlarge r4.2xlarge i3.2xlarge vCPU 4 4 4 8 8 8 Memory 16 GB 30 GB 30 GB 32 GB 60 GB 60 GB Max IOPS 6000 6000 6000 8000 12000 12000 Throughput (Mb/s) 93.75 109 100 125 218 200 Instance Storage (NVMe)

  • 950 GB
  • 1900 GB

Price (per month) $175.2 $233.6 $271.56 $350.4 $467.2 $543.12 Price with 950 GB EBS $288.25 $346.65 $463.45 $580.25 Price with 1900 GB EBS $401.3 $459.7 $576.5 $693.3

slide-46
SLIDE 46

46

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • For high-intensive OLTP, i3 instances are a rescue:

○ r4.2xlarge + 3.3TB EBS ($863.75/month) wasn’t able to keep up ○ The switch to i3.2xlarge solved all problems and saved 37% of costs We run ~30 clusters on i3 instances and only one has failed during last 6 months.

INSTANCE STORE SUMMARY

slide-47
SLIDE 47

47

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

ENHANCED NETWORKING

  • Enhanced networking uses single root I/O virtualization (SR-IOV) to

provide high-performance networking capabilities on supported instance types

  • SR-IOV is a method of device virtualization that provides higher I/O

performance and lower CPU utilization when compared to traditional virtualized network interfaces

  • Enhanced networking provides higher bandwidth, higher packet per

second (PPS) performance, and consistently lower inter-instance latencies

slide-48
SLIDE 48

48

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Heavy loaded M4 instances periodically losing network
  • Heavy loaded R4 instances were simply dying (terminated by AWS)
  • The only visible indicator of a problem was “StatusCheckFailed”

CloudWatch metric

HORROR STORY

slide-49
SLIDE 49

49

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • It took us several days of investigating:

○ Thank you AWS support for pointing to an outdated ixgbevf driver ○ The rest we figured out on our own

  • Our AMI is built on Ubuntu 14.04:

○ Ubuntu 14.04 has an outdated driver ixgbevf ○ Ubuntu 14.04 doesn’t have an ena driver

HORROR STORY

slide-50
SLIDE 50

50

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

1. AMI must contain drivers (linux kernel module) a. Ixgbevf for C3, C4, D2, I2, R3, and M4 (excluding m4.16xlarge) b. ena for F1, I3, P2, R4, X1, and m4.16xlarge 2. You have to explicitly enable enhanced networking for an AMI or a specific instance. If you use AMI from AWS it’s already enabled.

ENABLING ENHANCED NETWORKING

slide-51
SLIDE 51

51

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

INTEL 82599 VF

[ec2-user ~]$ modinfo ixgbevf filename: /lib/modules/3.10.48-55.140.amzn1.x86_64/kernel/drivers/amazon/ixgbevf/ixgbevf.ko version: 2.14.2 $ aws ec2 describe-instance-attribute --instance-id instance_id --attribute sriovNetSupport $ aws ec2 describe-image-attribute --image-id ami_id --attribute sriovNetSupport

  • Verify that the ixgbevf module is installed
  • Check that support is enabled on Instance level
  • Or support is enabled on AMI level

If the attribute isn't set, SriovNetSupport is empty; otherwise, it is set as follows: "SriovNetSupport": { "Value": "simple" },

slide-52
SLIDE 52

52

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

ENA

[ec2-user ~]$ modinfo ena filename: /lib/modules/4.4.11-23.53.amzn1.x86_64/kernel/drivers/amazon/net/ena/ena.ko version: 0.6.6 $ aws ec2 describe-instances --instance-id instance_id --query 'Reservations[].Instances[].EnaSupport' $ aws ec2 describe-images --image-id ami_id --query 'Images[].EnaSupport'

  • Verify that the ena module is installed
  • Check that support is enabled on Instance level
  • Or support is enabled on AMI level

If the attribute is set, the response is true

slide-53
SLIDE 53

53

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

VERIFY THAT THE MODULE IS IN USE

[ec2-user ~]$ ethtool -i eth0 driver: vif version: firmware-version: bus-info: vif-0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no [ec2-user ~]$ ethtool -i eth0 driver: ixgbevf version: 2.14.2 firmware-version: N/A bus-info: 0000:00:03.0 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: yes supports-priv-flags: no [ec2-user ~]$ ethtool -i eth0 driver: ena version: 0.6.6 firmware-version: bus-info: 0000:00:03.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no

slide-54
SLIDE 54

54

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Start with small Instances and Volumes, it’s easy to

scale up later

  • Some of the resources (CPUCreditUsage,

CPUCreditBalance, BurstBalance) it’s possible to monitor only with CloudWatch

  • Always do backups and test them

SUMMARY

slide-55
SLIDE 55

55

Please write the title in all capital letters Use bullet points to summarize information rather than writing long paragraphs in the text box

  • Easy Amazon EC2 Instance Comparison - EC2instances.info
  • Easy Amazon RDS Instance Comparison - RDSInstances.info
  • Simple monthly calculator - calculator.s3.amazonaws.com/index.html
  • Patroni - github.com/zalando/patroni
  • Spilo - github.com/zalando/spilo

USEFUL LINKS

slide-56
SLIDE 56

56

Put images in the grey dotted box "unsupported placeholder" - behind the

  • range box (left side

stays white) Write the quote in all capital letters

QUESTIONS?