Scaling fashionably How PostgreSQL helped Zalando to become one of - - PowerPoint PPT Presentation

scaling fashionably
SMART_READER_LITE
LIVE PREVIEW

Scaling fashionably How PostgreSQL helped Zalando to become one of - - PowerPoint PPT Presentation

Scaling fashionably How PostgreSQL helped Zalando to become one of the biggest online fashion retailers in Europe About me Valentine Gogichashvili Database Engineer @Zalando twitter: @valgog google+: +valgog email: valentine.gogichashvili


slide-1
SLIDE 1
slide-2
SLIDE 2

Scaling fashionably

How PostgreSQL helped Zalando to become one

  • f the biggest online fashion retailers in Europe
slide-3
SLIDE 3

About me Valentine Gogichashvili

Database Engineer @Zalando twitter: @valgog google+: +valgog email: valentine.gogichashvili@zalando.de

slide-4
SLIDE 4
slide-5
SLIDE 5

15 countries 3 fulfillment centers 13.7+ million active customers 1.8 billion € revenue 2013 150,000+ products 640+ million visits in first half-year 2014 One of Europe's largest

  • nline fashion retailers
slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8

Some more numbers

200+ deployment units (WARs) 1300+ production instances 80+ database master instances 90+ different databases 300+ developers 10 database engineers

slide-9
SLIDE 9

Even more numbers

  • > 4.0 TB of PostgreSQL data
  • Biggest instances (not counted before)

○ eventlogdb (3TB)

■ 20 GB per week

○ riskmgmtdb (5TB)

■ 12 GB per day

slide-10
SLIDE 10

Biggest challenges

  • Constantly growing
  • Fast development cycles
  • No downtimes are tolerated
slide-11
SLIDE 11

Agenda

  • access data
  • change data models without downtimes
  • shard without limits
  • monitor

How we

slide-12
SLIDE 12

Agenda

  • access data
  • change data models without downtimes
  • shard without limits
  • monitor

How we

slide-13
SLIDE 13

Accessing data

  • customer
  • bank account
  • order -> bank account
  • order position
  • return order -> order
  • return position -> order position
  • financial document
  • financial transaction -> order
slide-14
SLIDE 14

Accessing data

NoSQL ▶ map your object hierarchy to a document ▶ (de-)serialization is easy ▶ transactions are not needed ▷ No SQL ▷ implicit schemas are tricky

slide-15
SLIDE 15

Accessing data

ORM

▶ is well known to developers ▶ CRUD operations are easy ▶ all business logic inside your application ▶ developers are in their comfort zone

slide-16
SLIDE 16

Accessing data

ORM

▶ is well known to developers ▶ CRUD operations are easy ▶ all business logic inside your application ▶ developers are in their comfort zone ▷ error prone transaction management ▷ you have to reflect your tables in your code ▷ all business logic inside your application ▷ schema changes are not easy

slide-17
SLIDE 17

Accessing data

Are there alternatives to ORM?

slide-18
SLIDE 18

Accessing data

Are there alternatives to ORM? Stored Procedures ▶ return/receive entity aggregates ▶ clear transaction scope ▶ more data consistency checks ▶ independent from underlying data schema

slide-19
SLIDE 19

JDBC Java Application Database Tables

Accessing data

slide-20
SLIDE 20

JDBC Java Application Database Tables Stored Procedure API Database Tables

Accessing data

slide-21
SLIDE 21

Java Sproc Wrapper

JDBC Java Application Java Application Sproc Wrapper Database Tables Stored Procedure API Database Tables

slide-22
SLIDE 22

CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS int AS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id $$ LANGUAGE 'sql' SECURITY DEFINER;

SQL

Java Sproc Wrapper

slide-23
SLIDE 23

@SProcService public interface CustomerSProcService { @SProcCall int registerCustomer(@SProcParam String email, @SProcParam Gender gender); }

JAVA

CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS int AS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id $$ LANGUAGE 'sql' SECURITY DEFINER;

SQL

Java Sproc Wrapper

slide-24
SLIDE 24

@SProcService public interface CustomerSProcService { @SProcCall int registerCustomer(@SProcParam String email, @SProcParam Gender gender); } CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS int AS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id $$ LANGUAGE 'sql' SECURITY DEFINER;

JAVA SQL

Java Sproc Wrapper

slide-25
SLIDE 25

@SProcCall List<Order> findOrders(@SProcParam String email);

JAVA

CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_created timestamptz, OUT shipping_address order_address) RETURNS SETOF record AS $$ SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data."order" JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email $$ LANGUAGE 'sql' SECURITY DEFINER;

SQL

Java Sproc Wrapper

slide-26
SLIDE 26

CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_created timestamptz, OUT shipping_address order_address) RETURNS SETOF record AS $$ SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data."order" JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email $$ LANGUAGE 'sql' SECURITY DEFINER; @SProcCall List<Order> findOrders(@SProcParam String email);

JAVA SQL

Java Sproc Wrapper

slide-27
SLIDE 27

Stored Procedures for developers ▷ CRUD operations need too much code ▷ Developers have to learn SQL ▷ Developers can write bad SQL ▷ Code reviews are needed

slide-28
SLIDE 28

Stored Procedures for developers ▷ CRUD operations need too much code ▷ Developers have to learn SQL ▷ Developers can write bad SQL ▷ Code reviews are needed ▶ Use-case driven ▶ Developers have to learn SQL ▶ Developers learn how to write good SQL

slide-29
SLIDE 29

Horror story

▷ Never map your data manually ▷ Educate yourself

slide-30
SLIDE 30

Database Tables

Stored Procedure API versioning

api_v13_01 search_path = api_v13_01, public;

slide-31
SLIDE 31

Database Tables

Stored Procedure API versioning

api_v13_02 api_v13_01 search_path = api_v13_01, public;

slide-32
SLIDE 32

Database Tables

Stored Procedure API versioning

api_v13_02 api_v13_01 search_path = api_v13_01, public; search_path = api_v13_02, public;

slide-33
SLIDE 33

Database Tables

Stored Procedure API versioning

api_v13_02 search_path = api_v13_02, public; api_v13_01 search_path = api_v13_01, public;

slide-34
SLIDE 34

Database Tables

Stored Procedure API versioning

api_v13_02 search_path = api_v13_02, public; api_v13_01

slide-35
SLIDE 35

▶ Tests are done to the whole API version ▶ No API migrations needed ▶ Deployments are fully automated Stored Procedure API versioning

slide-36
SLIDE 36

Agenda

  • access data
  • change data models without downtimes
  • shard without limits
  • monitor

How we

slide-37
SLIDE 37

Easy schema changes

  • PostgreSQL

▶ Schema changes with minimal locks with: ADD/RENAME/DROP COLUMN ADD/DROP DEFAULT VALUE ▶ CREATE/DROP INDEX CONCURRENTLY ▷ Constraints are still difficult to ALTER (becoming much better in 9.4)

slide-38
SLIDE 38

Easy schema changes

  • Stored Procedure API layer

▶ Can fill missing data on the fly ▶ Helps to change data structure without application noticing it

slide-39
SLIDE 39

Easy schema changes

  • Read and write to old structure
  • Write to both structures, old and new.

Try to read from new, fallback to old

  • Migrate data
  • Read from new, write to old and new
slide-40
SLIDE 40

Easy schema changes

  • Schema changes using SQL script files

○ SQL scripts written by developers (DBDIFFs) ○ registering DBDIFFs with Versioning ○ should be reviewed by DB guys ○ DB guys are rolling DB changes on the live system

slide-41
SLIDE 41

Easy schema changes

BEGIN; SELECT _v.register_patch('ZEOS-5430.order'); CREATE TABLE z_data.order_address (

  • a_id int SERIAL,
  • a_country z_data.country,
  • a_city varchar(64),
  • a_street varchar(128), ...

); ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id); COMMIT;

DBDIFF SQL

slide-42
SLIDE 42

Easy schema changes

BEGIN; SELECT _v.register_patch('ZEOS-5430.order'); \i order/database/order/10_tables/10_order_address.sql ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id); COMMIT;

DBDIFF SQL

slide-43
SLIDE 43

Easy schema changes

BEGIN; SELECT _v.register_patch('ZEOS-5430.order'); \i order/database/order/10_tables/10_order_address.sql SET statement_timeout TO ‘3s’; ALTER TABLE z_data."order" ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id); COMMIT;

DBDIFF SQL

slide-44
SLIDE 44

Easy schema changes

slide-45
SLIDE 45

Easy schema changes

slide-46
SLIDE 46

Easy schema changes

No downtime due to migrations or deployment since we use PostgreSQL

slide-47
SLIDE 47

Easy schema changes

One downtime due to migrations or deployment since we use PostgreSQL

slide-48
SLIDE 48

Horror story

▷ Invest in staging environments ▷ Do not create artificial process bottlenecks ▷ Educate yourself

slide-49
SLIDE 49

Agenda

  • access data
  • change data models without downtimes
  • shard without limits
  • monitor

How we

slide-50
SLIDE 50

One big database

▶ Joins between any entities ▶ Perfect for BI ▶ Simple access strategy ▶ Less machines to manage

slide-51
SLIDE 51

One big database

▷ Data does not fit into memory ▷ OLTP becomes slower ▷ Longer data migration times ▷ Database maintenance tasks take longer

slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54

Sharded database

▶ Data fits into memory ▶ IO bottleneck wider ▶ OLTP is fast again ▶ Data migrations are faster ▶ Database maintenance tasks are faster

slide-55
SLIDE 55

Sharded database

▷ Joins only between entities aggregates ▷ BI need more tooling ▷ Accessing data needs more tooling ▷ Managing more servers needs more tooling

slide-56
SLIDE 56

Sharded database

▷ Need more tooling

slide-57
SLIDE 57

Sharding without limits

Java Application Sproc Wrapper Database Tables Stored Procedure API Database Tables

slide-58
SLIDE 58

Sharding without limits

Java Application Sproc Wrapper

...

Shard 1 Shard 2 Shard 3 Shard N

slide-59
SLIDE 59

@SProcCall int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber, @SProcParam String email, @SProcParam Gender gender);

JAVA

Sharding with Java Sproc Wrapper

@SProcCall Article getArticle(@SProcParam @ShardKey Sku sku);

JAVA

@SProcCall(runOnAllShards = true, parallel = true) List<Order> findOrders(@SProcParam String email);

JAVA

slide-60
SLIDE 60

Entity lookup strategies

  • search on all shards (in parallel)
  • hash lookups
  • unique shard aware ID

○ Virtual Shard IDs (pre-sharding)

Sharding with Java Sproc Wrapper

slide-61
SLIDE 61

Agenda

  • access data
  • change data models without downtimes
  • shard without limits
  • monitor

How we

slide-62
SLIDE 62

Monitoring

slide-63
SLIDE 63

pg_view

slide-64
SLIDE 64

Monitoring

  • Tools

○ psql wrapper on DBA client machines ■ psql_<instance>_<ENV> ○ aliases on the host machines ■ pg_ctl_<instance> ■ psql_<instance> ■ pg_taillog_<instance> ○ helper scripts ■ assign or remove service/elastic IPs ■ backup all instances on the host

slide-65
SLIDE 65

Monitoring

  • Nagios/Icinga (being replaced by ZMON2)
  • Dedicated 24x7 monitoring team
  • Custom monitoring infrastructure ZMON2
slide-66
SLIDE 66

PGObserver

slide-67
SLIDE 67

PGObserver

slide-68
SLIDE 68

What we are working at

  • DaaS
  • Continuous deployment (including the DBs)
  • PGObserver 2.0 (join the effort!)
slide-69
SLIDE 69

Links

SProcWrapper – Java library for stored procedure access github.com/zalando/java-sproc-wrapper PGObserver – monitoring web tool for PostgreSQL github.com/zalando/PGObserver pg_view – top-like command line activity monitor github.com/zalando/pg_view

slide-70
SLIDE 70

Thank you!

slide-71
SLIDE 71

Easy schema changes

Database Tables api_v13_45

search_path = api_v13_45, public;

Order with addresses

  • rder_number
  • shipping address

Read and write to old structure

slide-72
SLIDE 72

Easy schema changes

Database Tables api_v13_46

search_path = api_v13_46, public;

Order with addresses

  • rder_number
  • shipping address
  • shipping address ID

Address

  • ID
  • City
  • Street
  • ...

Coalesce

Migrate all addresses Write to both structures, old and new Try to read from new, fallback to old

slide-73
SLIDE 73

Easy schema changes

Database Tables api_v13_47

search_path = api_v13_47, public;

Order with addresses

  • rder_number
  • shipping address
  • shipping address ID

Address

  • ID
  • City
  • Street
  • ...

Read from new Write to both structures, old and new

slide-74
SLIDE 74

Easy schema changes

Database Tables api_v13_48

search_path = api_v13_48, public;

Order with addresses

  • rder_number
  • shipping address
  • shipping address ID

Address

  • ID
  • City
  • Street
  • ...

Read and write to new Drop old structures