About me: A data engineering challenge (PowerPoint PPT Presentation)



SLIDE 1
SLIDE 2
  • About me
SLIDE 3
SLIDE 4
  • A data engineering challenge
SLIDE 5
SLIDE 6
SLIDE 7
  • Transaction Data store responsible for:
    ○ Billing
    ○ Internal debugging
    ○ Downstream services
      ■ Reporting
      ■ Analytics Warehouse

SLIDE 8
  • OLTP (Online Transactional Processing)
    ■ Every write to the DB = money ($$) changing hands
    ■ No downtime, low-latency writes
    ■ Accuracy is crucial
  • OLAP (Online Analytical Processing)
    ■ Monthly financial CSV exports & list endpoints
    ■ Easy aggregation
    ■ Slice and dice over an arbitrary set of columns

SLIDE 9
  • Mistakes we made
SLIDE 10

CX sees one number; 2 days later, he sees a different one.

SLIDE 11

CSV Exports

Downloaded CSV file on Jan 1 vs. re-pulled export on Jan 5

SLIDE 12
  • Our solution
SLIDE 13
SLIDE 14

1. Immutable - Records are never changed, only inserted

SLIDE 15

Why Immutable?

  • Biggest pain point
  • Able to track changes over time (data lineage)
  • Financial data should never be mutable
    ○ useful for auditing
    ○ state is reproducible at any point in time
    ○ allows for correction in the next accounting period
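A minimal sketch of the append-only idea, assuming a hypothetical `Event` record that stores money in integer cents: a correction is inserted as a new record instead of updating the old one, so the state on any past date can be replayed from the log. This illustrates the principle only; it is not the system described in the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    order_id: str
    recorded_on: date
    amount_cents: int  # hypothetical field: money kept in integer cents


class ImmutableEventLog:
    """Append-only log: records are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    @property
    def events(self) -> tuple[Event, ...]:
        # Read-only view; callers cannot mutate history through it.
        return tuple(self._events)

    def state_as_of(self, as_of: date) -> dict[str, int]:
        """Replay events recorded on or before `as_of`; the last record per order wins."""
        state: dict[str, int] = {}
        # sorted() is stable, so same-day records keep their insertion order.
        for event in sorted(self._events, key=lambda e: e.recorded_on):
            if event.recorded_on <= as_of:
                state[event.order_id] = event.amount_cents
        return state


log = ImmutableEventLog()
log.append(Event("order-1", date(2017, 7, 1), 1000))
log.append(Event("order-1", date(2017, 7, 3), 800))  # correction: a new row, not an update
print(log.state_as_of(date(2017, 7, 1)))  # {'order-1': 1000}
print(log.state_as_of(date(2017, 7, 3)))  # {'order-1': 800}
```

Both the original value and the correction remain in the log, which is what makes the July 1st view reproducible after July 3rd.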

SLIDE 16

Immutable event log

What CX observed was no fluke! (Snapshots: July 1st vs. July 3rd)

SLIDE 17

Digiday, 2017

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”

SLIDE 18
SLIDE 19

Before vs. After: see total commissions by day
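The before/after images on this slide are not recoverable. As an assumption, "before" is read as rows that store an order's current commission (overwritten on every change) and "after" as rows that store each change as a delta. With deltas, total commissions by day becomes a plain group-and-sum; the row layout below is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical delta rows: (order_id, event_date, commission_delta_cents).
# A commission is never overwritten; each change is recorded as a new delta row.
deltas = [
    ("order-1", date(2017, 7, 1), 1000),  # commission booked
    ("order-2", date(2017, 7, 1), 500),
    ("order-1", date(2017, 7, 3), -200),  # partial refund recorded as a negative delta
]


def commissions_by_day(rows):
    """Sum the deltas that landed on each day."""
    totals = defaultdict(int)
    for _order_id, day, delta_cents in rows:
        totals[day] += delta_cents
    return dict(totals)


print(commissions_by_day(deltas))
# {datetime.date(2017, 7, 1): 1500, datetime.date(2017, 7, 3): -200}
```

The same rows also answer per-order questions: summing order-1's deltas gives its current commission (800) without any row having been updated.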

SLIDE 20
SLIDE 21

Benefit of Delta

  • Easy aggregation
  • A single service responsible for computing deltas
  • “Atomic” - self-contained description of the change
  • Events can arrive out of order, and the end state will be eventually consistent

With Latest State

  • Greater tolerance for missing events: later states will overwrite incorrect earlier states
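A small sketch contrasting the two columns of this slide, using hypothetical events for a single order that carry both a delta and the resulting state. Summing deltas is order-independent, so out-of-order arrival is harmless; keeping only the highest-`seqn` state recovers even when an event in the middle is lost.

```python
# Hypothetical events for one order. seqn gives the true order; each event
# carries both its delta and the state after applying it.
events = [
    {"seqn": 1, "delta": 1000, "state": 1000},
    {"seqn": 2, "delta": -200, "state": 800},
    {"seqn": 3, "delta": 50, "state": 850},
]


def total_from_deltas(evts):
    # Addition is commutative, so arrival order does not change the result.
    return sum(e["delta"] for e in evts)


def total_from_latest_state(evts):
    # Keep only the state carried by the highest seqn received.
    return max(evts, key=lambda e: e["seqn"])["state"]


out_of_order = list(reversed(events))
print(total_from_deltas(out_of_order))        # 850
print(total_from_latest_state(out_of_order))  # 850

missing_middle = [events[0], events[2]]        # the seqn=2 event never arrived
print(total_from_deltas(missing_middle))       # 1050: the lost delta stays lost
print(total_from_latest_state(missing_middle)) # 850: the later state covers the gap
```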

SLIDE 22

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”
3. Denormalized - few tables, lots of dimensions

SLIDE 23

Why Denormalized?

More OLAP use cases than OLTP.

OLAP use cases - large # of records

  • Marketing - Campaign analysis
  • Finance - Billing Exports & Invoices
  • Data team - Analytics
  • Partners - API for historical data

OLTP use cases - single record

  • Customer Support - Debugging individual orders
  • Inserting events

SLIDE 24

Hybrid Performance Approach

  • Use Postgres DB
  • Denormalized Data

Hybrid in the sense that the data format is optimized for querying over historical time ranges, yet the DB is a traditional OLTP database.

SLIDE 25

For faster performance with CSV exports and aggregations: Previous Financial Data Store vs. New Data Store (denormalized)
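A sketch of the denormalized shape: one wide table where each row repeats its dimensions, so slicing by any column is a single `GROUP BY` with no joins. Python's built-in `sqlite3` stands in for Postgres only so the example runs without a server; the `order_events` table and its columns are hypothetical.

```python
import sqlite3

# sqlite3 stands in for Postgres here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_events (
        event_id           TEXT PRIMARY KEY,
        seqn               INTEGER NOT NULL,
        event_date         TEXT NOT NULL,
        merchant           TEXT NOT NULL,  -- dimensions copied onto every row
        publisher          TEXT NOT NULL,
        country            TEXT NOT NULL,
        amount_delta_cents INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("e1", 1, "2017-07-01", "shoes-inc", "pub-a", "US", 1000),
        ("e2", 2, "2017-07-01", "shoes-inc", "pub-b", "US", 500),
        ("e3", 3, "2017-07-03", "hats-co", "pub-a", "GB", 300),
        ("e4", 4, "2017-07-03", "shoes-inc", "pub-a", "US", -200),
    ],
)

DIMENSIONS = {"event_date", "merchant", "publisher", "country"}


def totals_by(dimension: str):
    """Sum deltas grouped by any one dimension column."""
    # Column names cannot be bound as parameters, so whitelist them instead.
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (f"SELECT {dimension}, SUM(amount_delta_cents) FROM order_events "
           f"GROUP BY {dimension} ORDER BY {dimension}")
    return conn.execute(sql).fetchall()


print(totals_by("publisher"))  # [('pub-a', 1100), ('pub-b', 500)]
print(totals_by("country"))    # [('GB', 300), ('US', 1300)]
```

The trade-off is storage and write-time duplication of dimension values in exchange for join-free reads over large historical ranges.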

SLIDE 26
SLIDE 27

1. Immutable
2. Deltas for Easy Aggregation - represent amounts in “deltas”
3. Denormalized - few tables, lots of dimensions
4. Separate record keeping for billing

SLIDE 28

Why keep separate records for billing?

  • Need stable tracking of which events fit into each invoice
  • Enable later adjustments
  • Allow changes in billing logic
    ○ may bill on events vs. orders
    ○ may bill per customer vs. per order
    ○ may bill weekly vs. monthly
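A minimal sketch of a billing store kept apart from the event log, assuming hypothetical `InvoiceLine` and `BillingLedger` types. Each invoice line points at a stable event ID, so "which events were on this invoice" never drifts, and a later adjustment is a new line on a later invoice rather than an edit to a closed one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InvoiceLine:
    invoice_id: str
    event_id: str      # stable reference back to the immutable event log
    amount_cents: int


@dataclass
class BillingLedger:
    """Hypothetical billing store, kept separate from the order event log."""
    lines: list[InvoiceLine] = field(default_factory=list)

    def bill(self, invoice_id: str, events: list[tuple[str, int]]) -> None:
        # Record exactly which events this invoice covers.
        for event_id, amount_cents in events:
            self.lines.append(InvoiceLine(invoice_id, event_id, amount_cents))

    def adjust(self, invoice_id: str, event_id: str, delta_cents: int) -> None:
        # A correction is a new line on a later invoice; closed invoices stay untouched.
        self.lines.append(InvoiceLine(invoice_id, event_id, delta_cents))

    def invoice_total(self, invoice_id: str) -> int:
        return sum(line.amount_cents for line in self.lines if line.invoice_id == invoice_id)

    def billed_events(self) -> set[str]:
        return {line.event_id for line in self.lines}


ledger = BillingLedger()
ledger.bill("2017-06", [("e1", 1000), ("e2", 500)])
ledger.adjust("2017-07", "e1", -200)  # e1 partially refunded after June was closed
print(ledger.invoice_total("2017-06"))  # 1500
print(ledger.invoice_total("2017-07"))  # -200
```

Because billing lines only reference events, switching from monthly to weekly invoices (or from per-order to per-customer) changes how lines are grouped, not the event data.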

SLIDE 29

Diagram labels: Product/Service rendered; Invoicing; Immutable Event Log

SLIDE 30

1. Immutable
2. Deltas for easy aggregation
3. Denormalized
4. Separate record keeping for billing
5. Self Heal - programmatic detection & adjustment

SLIDE 31

Self-Heal - programmatic detection & adjustment

  • Immutable data helps with this
  • So does having separate records for billing
  • Limiting points of failure

Example:

  • Orders that were processed “late” and didn’t make it into the last billing cycle should be automatically added to the next cycle
  • Automatic checks of billing records (immutable) against order event records (also immutable)
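A sketch of the late-order example, assuming hypothetical in-memory inputs: detection compares the (immutable) billing lines against the (immutable) order events, and adjustment appends any missed events to the next cycle's invoice instead of rewriting the closed one.

```python
from datetime import date

# Hypothetical immutable inputs. Events: event_id -> (event date, amount in cents).
order_events = {
    "e1": (date(2017, 6, 28), 1000),
    "e2": (date(2017, 6, 30), 500),  # processed "late": missed the June billing run
    "e3": (date(2017, 7, 2), 300),   # belongs to the July cycle anyway
}
# Billing lines: (invoice_id, event_id, amount_cents).
billing_lines = [("2017-06", "e1", 1000)]


def late_event_ids(events, lines, cycle_start):
    """Detection: events dated before the new cycle that no invoice line references."""
    billed = {event_id for _invoice_id, event_id, _amount in lines}
    return sorted(
        event_id
        for event_id, (event_date, _amount) in events.items()
        if event_date < cycle_start and event_id not in billed
    )


def self_heal(events, lines, next_invoice_id, cycle_start):
    """Adjustment: append missed events to the next invoice; never rewrite past lines."""
    missed = late_event_ids(events, lines, cycle_start)
    healed = lines + [(next_invoice_id, eid, events[eid][1]) for eid in missed]
    return missed, healed


missed, healed = self_heal(order_events, billing_lines, "2017-07", date(2017, 7, 1))
print(missed)      # ['e2']
print(healed[-1])  # ('2017-07', 'e2', 500)
```

Run on a schedule, a check like this limits the failure mode to "billed one cycle late" rather than "never billed".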

SLIDE 32
SLIDE 33

Use stable IDs & ordering throughout your processing pipeline

  • Ordering (seqn) and Event ID should be set as far upstream as possible in the order pipeline, and carried all the way downstream.
  • Good for debugging
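A sketch of stamping identity and order once at the entry point and reusing it downstream. `ingest` and `downstream` are hypothetical names, and the in-process counter stands in for whatever durable sequence a real pipeline would use. Downstream stages drop redelivered copies by `event_id` and restore order by `seqn`, regardless of arrival order.

```python
import itertools
import uuid

# Hypothetical upstream counter; a real system would use a durable sequence.
_next_seqn = itertools.count(1)


def ingest(payload: dict) -> dict:
    """Upstream entry point: stamp event_id and seqn once, as early as possible."""
    return {**payload, "event_id": str(uuid.uuid4()), "seqn": next(_next_seqn)}


def downstream(events: list[dict]) -> list[dict]:
    """A later stage that reuses the stamped fields instead of inventing its own:
    drop redelivered copies by event_id, then restore order by seqn."""
    unique: dict[str, dict] = {}
    for event in events:
        unique.setdefault(event["event_id"], event)
    return sorted(unique.values(), key=lambda event: event["seqn"])


placed = ingest({"order_id": "o1", "status": "placed"})
refunded = ingest({"order_id": "o1", "status": "refunded"})
# Arrives out of order, with a retried duplicate of the refund.
result = downstream([refunded, placed, refunded])
print([event["status"] for event in result])  # ['placed', 'refunded']
```

Since every stage sees the same `event_id`, a single order can be traced end to end when debugging.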
SLIDE 34
  • Dates really matter.
  • Avoid floats
  • Double-Entry doesn’t matter
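The slide does not say what to use instead of floats. Two common options in Python are integer minor units (cents) and `decimal.Decimal` built from strings; on the database side, Postgres's exact `numeric` type serves the same purpose. The sketch shows why binary floats drift and how explicit rounding looks with `Decimal`.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2)               # 0.30000000000000004
print(sum([0.1] * 10) == 1.0)  # False

# Option 1: store money as integer minor units (cents); sums are exact.
print(sum([10] * 10) == 100)   # True

# Option 2: Decimal constructed from strings (never from floats).
total = sum([Decimal("0.10")] * 10, Decimal("0"))
print(total)                   # 1.00

# Round a computed commission to cents with an explicit rounding rule.
commission = Decimal("19.99") * Decimal("0.075")
print(commission.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 1.50
```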

SLIDE 35
SLIDE 36
SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41
SLIDE 42
SLIDE 43