Extreme Programming meets Real Time Data, by Gel Goldsby & Tom Johnson

SLIDE 1

Extreme Programming meets Real Time Data

Gel Goldsby & Tom Johnson, Unruly

SLIDE 2

When Santa Got Stuck Up The Chimney

SLIDE 3

When Data Got Stuck Up The Chimney

SLIDE 4

Hello. My name is...

  • Gel Goldsby, Reporting and Data Team Lead
  • Tom Johnson, Senior Developer

SLIDE 5

We Believe In XP

SLIDE 6

Extreme Programming Values

  • Communication
  • Simplicity
  • Feedback
  • Courage

SLIDE 7

Simplicity

SLIDE 8

Simplicity

SLIDE 9

Simplicity

SLIDE 10

Simplicity

SLIDE 11

Simplicity

SLIDE 12

Simplicity

SLIDE 13

Our Reporting Pipeline

[Diagram: pipeline, events]

SLIDE 14

Our Reporting Pipelines

[Diagram: super duper wizzy pipeline, events, old pipeline]

SLIDE 15

Shut It Off!

[Diagram: super duper wizzy pipeline, events, old pipeline]

SLIDE 16

A Closer Look At Our Pipeline

[Diagram: pipeline, events, events consumer]

SLIDE 17

It’s Not A Truck, It’s A Series of Tubes

[Diagram: events, sequencer, parser, consumer, nginx]

SLIDE 18

Queueing with S3

[Diagram: events, consumer, S3 (x3), parser, sequencer, nginx]

SLIDE 19

Queueing with S3

[Diagram: events, consumer, S3 (x6), parser, sequencer, nginx]

SLIDE 20

We Need More Power, Cap’n

[Diagram: events, sequencer, parser, consumer, nginx]

SLIDE 21

We Need More Power, Cap’n

[Diagram: events, sequencer, parser, consumer, nginx; additional labels: nginx, parser]

SLIDE 22

We Need More Power, Cap’n

[Diagram: events, sequencer, parser, consumer, nginx; additional labels: nginx (x2), parser (x2)]

SLIDE 23

We Need More Power, Cap’n

[Diagram: events, sequencer, parser, consumer, nginx; additional labels: nginx (x3), parser (x3)]

SLIDE 24

Two Writes Can Make A Wrong

[Diagram: events, sequencer, parser, consumer, nginx]

SLIDE 25

Two Writes Can Make A Wrong

[Diagram: events, sequencer, parser, consumer, nginx]

SLIDE 26

Christmas was saved!

SLIDE 27

Simplicity

  • Each component does one thing and does it well

SLIDE 28

Just Another Report, Right?

  • Improving targeting
  • Correlate events for same ad call
  • Need to join on session id
  • Needs disaggregated data
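
As a rough illustration of why correlating events for the same ad call needs disaggregated data, the sketch below groups individual events by session id. The field names (`session_id`, `type`, `site`, `action`) and the sample values are assumptions made for illustration, not the team's actual schema.

```python
from collections import defaultdict

# Hypothetical disaggregated events: one record per event, each keeping its session id.
events = [
    {"session_id": "s1", "type": "ad_call", "site": "Zombo.com"},
    {"session_id": "s2", "type": "ad_call", "site": "Nyan.cat"},
    {"session_id": "s1", "type": "interaction", "action": "click"},
]


def correlate_by_session(events):
    """Group events by session id so events for the same ad call sit together."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)
    return dict(by_session)


correlated = correlate_by_session(events)

# Sessions whose ad call was followed by a user interaction.
with_interaction = sorted(
    sid
    for sid, evs in correlated.items()
    if {"ad_call", "interaction"} <= {e["type"] for e in evs}
)
print(with_interaction)
```

Once rows have been collapsed into counts, the session id is gone and this kind of grouping is no longer possible.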

SLIDE 29

Aggregation

Campaign  Site
Acme      Zombo.com
Acme      Zombo.com
Acme      Zombo.com
Acme      Nyan.cat
Brawndo   Zombo.com
Brawndo   Nyan.cat
Brawndo   Nyan.cat

SLIDE 30

Aggregation

Campaign  Site
Acme      Zombo.com
Acme      Zombo.com
Acme      Zombo.com
Acme      Nyan.cat
Brawndo   Zombo.com
Brawndo   Nyan.cat
Brawndo   Nyan.cat

SLIDE 31

Aggregation

Campaign  Site
Acme      Zombo.com
Acme      Zombo.com
Acme      Zombo.com
Acme      Nyan.cat
Brawndo   Zombo.com
Brawndo   Nyan.cat
Brawndo   Nyan.cat

SLIDE 32

Aggregation

Count  Campaign  Site
1      Acme      Zombo.com
1      Acme      Zombo.com
1      Acme      Zombo.com
1      Acme      Nyan.cat
1      Brawndo   Zombo.com
1      Brawndo   Nyan.cat
1      Brawndo   Nyan.cat

SLIDE 33

Aggregation

Count  Campaign  Site
3      Acme      Zombo.com
1      Acme      Nyan.cat
1      Brawndo   Zombo.com
2      Brawndo   Nyan.cat
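
The step from seven single-count rows to four counted rows can be sketched with a counter keyed on (campaign, site). This is a minimal illustration using the values from the slides, not the team's actual implementation.

```python
from collections import Counter

# The seven (campaign, site) facts from the slides, one per event.
events = [
    ("Acme", "Zombo.com"),
    ("Acme", "Zombo.com"),
    ("Acme", "Zombo.com"),
    ("Acme", "Nyan.cat"),
    ("Brawndo", "Zombo.com"),
    ("Brawndo", "Nyan.cat"),
    ("Brawndo", "Nyan.cat"),
]


def aggregate(events):
    """Collapse identical (campaign, site) pairs into one row with a count."""
    return Counter(events)


counts = aggregate(events)
for (campaign, site), count in counts.items():
    print(count, campaign, site)
```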

SLIDE 34

Aggregation

Count  Campaign  Site       Lots More
3      Acme      Zombo.com  ... ...
1      Acme      Nyan.cat   ... ...
1      Brawndo   Zombo.com  ... ...
2      Brawndo   Nyan.cat   ... ...

SLIDE 35

Lots of buckets

SLIDE 36

Micro-Aggregations

  • Roughly 20k events per second
  • Batched: window size 20s
  • x7 reduction factor
  • Reduces writes to db
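
As a back-of-envelope reading of these figures, 20,000 events per second over a 20 s window is about 400,000 events per window; if the 7x factor applies per window, that is roughly 57,000 rows written instead. The sketch below shows one way such windowed batching could work. The (timestamp, campaign, site) event shape and the sample data are assumptions for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 20  # window size from the slide


def micro_aggregate(events, window_seconds=WINDOW_SECONDS):
    """Batch (timestamp, campaign, site) events into fixed windows and count duplicates.

    Returns {window_start: Counter({(campaign, site): count})}.
    """
    windows = {}
    for timestamp, campaign, site in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[(campaign, site)] += 1
    return windows


# Hypothetical sample; timestamps are seconds since some start point.
sample = [
    (0.5, "Acme", "Zombo.com"),
    (3.2, "Acme", "Zombo.com"),
    (19.9, "Brawndo", "Nyan.cat"),
    (20.0, "Acme", "Zombo.com"),
]
windows = micro_aggregate(sample)

# One database row per distinct key per window, rather than one per event.
rows_written = sum(len(counter) for counter in windows.values())
print(rows_written, "rows for", len(sample), "events")
```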

SLIDE 37

Make America Aggregate Again

  • Daily
  • From ~800 million events
  • Compacts to ~2 million rows
  • 400x reduction
  • Reduces disk usage
  • Speeds up queries
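
The figures are consistent: about 800 million events compacted to about 2 million rows is a 400x reduction. A daily compaction of already-counted rows could look something like the sketch below, which sums counts for rows sharing a key. The row shape and the date are hypothetical.

```python
from collections import Counter


def compact_daily(rows):
    """Merge (day, campaign, site, count) rows that share a key by summing their counts."""
    totals = Counter()
    for day, campaign, site, count in rows:
        totals[(day, campaign, site)] += count
    return [key + (count,) for key, count in totals.items()]


# Hypothetical micro-aggregated rows from different windows of the same day.
rows = [
    ("2016-01-01", "Acme", "Zombo.com", 2),
    ("2016-01-01", "Acme", "Zombo.com", 1),
    ("2016-01-01", "Brawndo", "Nyan.cat", 2),
]
compacted = compact_daily(rows)
print(compacted)

# Reduction factor implied by the slide's figures.
reduction = 800_000_000 / 2_000_000
print(reduction)
```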

SLIDE 38

Querying data

[Diagram: view, historic data, today’s data, user query]
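
One plausible reading of this diagram is a database view that unions the compacted historic table with today's not-yet-compacted table, so a user query hits a single relation. The sketch below uses SQLite purely as a stand-in; the table, view, and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE historic_data (campaign TEXT, site TEXT, event_count INTEGER);
    CREATE TABLE todays_data   (campaign TEXT, site TEXT, event_count INTEGER);
    -- One view, so a user query does not need to know which table holds a row.
    CREATE VIEW report AS
        SELECT campaign, site, event_count FROM historic_data
        UNION ALL
        SELECT campaign, site, event_count FROM todays_data;
    """
)
conn.executemany(
    "INSERT INTO historic_data VALUES (?, ?, ?)",
    [("Acme", "Zombo.com", 3), ("Brawndo", "Nyan.cat", 2)],
)
conn.executemany(
    "INSERT INTO todays_data VALUES (?, ?, ?)",
    [("Acme", "Zombo.com", 1)],
)

# The user query runs against the view and sees both sources combined.
totals = conn.execute(
    "SELECT campaign, site, SUM(event_count) FROM report "
    "GROUP BY campaign, site ORDER BY campaign, site"
).fetchall()
print(totals)
```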

SLIDE 39

Aggregatable facts

Campaign  Site
Acme      Zombo.com
Acme      Zombo.com
Acme      Zombo.com
Acme      Nyan.cat
Brawndo   Zombo.com
Brawndo   Nyan.cat
Brawndo   Nyan.cat

SLIDE 40

Add in session ids

Campaign  Site       Session Id
Acme      Zombo.com  Wo5Meiri
Acme      Zombo.com  Xotaipu6
Acme      Zombo.com  Xu1goor7
Acme      Nyan.cat   eVai6OhS
Brawndo   Zombo.com  oiMoo7Du
Brawndo   Nyan.cat   aiSh1eej
Brawndo   Nyan.cat   rae8ieY5

SLIDE 41

Does not aggregate well

Campaign  Site       Session Id
Acme      Zombo.com  Wo5Meiri
Acme      Zombo.com  Xotaipu6
Acme      Zombo.com  Xu1goor7
Acme      Nyan.cat   eVai6OhS
Brawndo   Zombo.com  oiMoo7Du
Brawndo   Nyan.cat   aiSh1eej
Brawndo   Nyan.cat   rae8ieY5
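
A quick way to see why this does not aggregate well: every session id is unique, so grouping on it leaves as many rows as there were events. The sketch below uses the rows from the slide; the fifth session id is read here as `oiMoo7Du`, which is an assumption about the slide's original text.

```python
from collections import Counter

# Rows from the slide: (campaign, site, session id).
rows = [
    ("Acme", "Zombo.com", "Wo5Meiri"),
    ("Acme", "Zombo.com", "Xotaipu6"),
    ("Acme", "Zombo.com", "Xu1goor7"),
    ("Acme", "Nyan.cat", "eVai6OhS"),
    ("Brawndo", "Zombo.com", "oiMoo7Du"),  # assumed reading of the slide text
    ("Brawndo", "Nyan.cat", "aiSh1eej"),
    ("Brawndo", "Nyan.cat", "rae8ieY5"),
]

# Grouping without the session id collapses duplicates.
without_session = Counter((campaign, site) for campaign, site, _ in rows)
# Grouping with the session id collapses nothing, because every key is unique.
with_session = Counter(rows)

print(len(rows), len(without_session), len(with_session))
```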

SLIDE 42

What next?

SLIDE 43

What next? Spikes!

SLIDE 44

Big Data!

SLIDE 45

Big data: big choices

  • Many options
  • Available documentation was:
      ○ Academic
      ○ Evangelical
      ○ Naive/Trivial

SLIDE 46

Spark!

SLIDE 47

Big data: big costs

  • Infrastructure
  • Language (Scala)
  • Incompatible with current approach
  • Performance tradeoffs

SLIDE 48

Why we could step away

  • Understood our data better
  • Underestimated costs
  • We know our code
  • We can change our code

SLIDE 49

Feedback

  • Regular retrospectives
  • Shared understanding of “research”
  • Shared understanding of value

SLIDE 50

Courage

  • Not afraid to try new things
  • Not afraid to change direction
  • Not lured by what we “ought” to do

SLIDE 51

The Shape of our Data

SLIDE 52

The Shape of our Data

  • Disaggregated

SLIDE 53

The Shape of our Data

  • Disaggregated
  • Unsampled

SLIDE 54

The Shape of our Data

  • Disaggregated
  • Unsampled
  • Real Time

SLIDE 55

Programmatic Pacing

  • Disaggregated
  • Unsampled
  • Real Time

SLIDE 56

Operational Debugging

  • Disaggregated
  • Unsampled
  • Real Time

SLIDE 57

Auction Data

  • Disaggregated
  • Unsampled
  • Real Time

SLIDE 58

Advertising 101

[Diagram: user loads page, payments, auction, ad call, user interaction]

SLIDE 59

Funnel of data

[Diagram: user loads page, payments, auction, ad call, user interaction]

SLIDE 60

Pipelines to match data shape

[Diagram: user loads page, payments, auction, ad call, user interaction]

SLIDE 61

Our Actual Reporting Pipelines

[Diagram: payments pipeline, events, ad call pipeline, user interaction pipeline, auction pipeline]
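
Splitting events across pipelines by kind might be sketched as below. The pipeline names come from the slide; the event `type` field, its values, and the in-memory queues are assumptions made for illustration and say nothing about how the real pipelines are wired.

```python
from collections import defaultdict

# Pipeline names follow the slide; the event type values are assumed.
PIPELINE_FOR_TYPE = {
    "payment": "payments pipeline",
    "ad_call": "ad call pipeline",
    "user_interaction": "user interaction pipeline",
    "auction": "auction pipeline",
}


def route(events):
    """Send each event to the pipeline that matches its type."""
    queues = defaultdict(list)
    for event in events:
        queues[PIPELINE_FOR_TYPE[event["type"]]].append(event)
    return dict(queues)


queues = route(
    [
        {"type": "auction", "id": 1},
        {"type": "auction", "id": 2},
        {"type": "payment", "id": 3},
    ]
)
sizes = {name: len(queue) for name, queue in queues.items()}
print(sizes)
```

With separate queues, a high-volume stream such as auction data can back up without delaying a low-volume one such as payments.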

SLIDE 62

When We Get Overloaded...

[Diagram: payments pipeline, events, ad call pipeline, user interaction pipeline, auction pipeline]

SLIDE 63

When We Get Overloaded...

[Diagram: payments pipeline, events, ad call pipeline, user interaction pipeline, auction pipeline]

SLIDE 64

When We Get Overloaded...

[Diagram: payments pipeline, events, ad call pipeline, user interaction pipeline, auction pipeline]

SLIDE 65

Ensuring real time performance

SLIDE 66

Ensuring real time performance

SLIDE 67

Communication

  • How data was used
  • Performance requirements
      ○ What was needed
      ○ What wasn’t needed
      ○ Hard vs soft requirements

SLIDE 68

Simplicity

  • Green cards
  • 10 pair-days total
  • Incremental
  • Separable

SLIDE 69

Let's talk about our databases

SLIDE 70

Row-based database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]

SLIDE 71

Row-based database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]

SLIDE 72

Columnar database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]

SLIDE 73

Row-based database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]

SLIDE 74

Columnar database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]
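
The difference between the two layouts can be sketched in a few lines: a row store keeps each record together, while a columnar store keeps each column together. The column names follow the slide; the values are made up for illustration.

```python
# The same three-row table stored two ways.
rows = [
    {"A": 1, "B": "x", "C": 10.0, "D": True, "E": "p"},
    {"A": 2, "B": "y", "C": 20.0, "D": False, "E": "q"},
    {"A": 3, "B": "z", "C": 30.0, "D": True, "E": "r"},
]


def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


columns = to_columnar(rows)

# Row layout: fetching one whole record is a single lookup.
record = rows[1]
# Columnar layout: aggregating one column touches only that column's values.
total_c = sum(columns["C"])
print(record["B"], total_c)
```

In general, row layouts suit reading or writing whole records, and columnar layouts suit aggregates over a few columns of many rows.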

SLIDE 75

Vectorwise or Postgres?

SLIDE 76

Query-based routing

[Diagram: api, user query, Vectorwise, Postgres]

SLIDE 77

Query-based routing

[Diagram: api, user query, Vectorwise, Postgres]

SLIDE 78

Query-based routing

[Diagram: api, user query, Vectorwise, Postgres]
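
The slides do not state the rule the API uses to pick a database, so the sketch below assumes one: queries that look up an individual session go to Postgres (a row store), and aggregate reports go to Vectorwise (a columnar store). The `Query` class and `choose_backend` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Query:
    """Hypothetical description of a user query reaching the API."""

    group_by: Tuple[str, ...] = ()
    session_id: Optional[str] = None


def choose_backend(query: Query) -> str:
    """Assumed rule: per-session lookups use the row store, reports use the columnar store."""
    if query.session_id is not None:
        return "Postgres"
    return "Vectorwise"


report_backend = choose_backend(Query(group_by=("campaign", "site")))
lookup_backend = choose_backend(Query(session_id="Wo5Meiri"))
print(report_backend, lookup_backend)
```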

SLIDE 79

Conclusion

SLIDE 80

Conclusion

  • Simplicity
  • Communication
  • Feedback
  • Courage

SLIDE 81

Thank you!

SLIDE 82

Questions?

(this space intentionally left blank)