From batch to streaming to both Herman Schaaf, Senior Software - - PowerPoint PPT Presentation



SLIDE 1

From batch to streaming to both

Herman Schaaf, Senior Software Engineer

SLIDE 2

A Story

SLIDE 3

About me

Herman Schaaf, Senior Software Engineer
Data Platform Tribe

SLIDE 4

“The Cube”

SLIDE 5

From batch to streaming

SLIDE 6

The Single Unified Log

SLIDE 7

The Single Unified Log

SLIDE 8

Lesson 1: Conway’s Law is true for data platforms

“Organizations which design data platforms are constrained to produce designs which are copies of their communication structures”

SLIDE 9

Being self-serve is good…

…but then metadata is critical

SLIDE 10

So let’s talk about metadata

SLIDE 11

A simple convention

prod.identity-service.AuditLog.identity.AuditMessage
prod.flyingcircus.applog.applog.Message
prod.raccoon_bandit.experiment.bandit.Metric
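The slide gives only the three example names, not the meaning of each segment. As a rough sketch, assuming each name is five dot-separated parts (environment, owning service, stream, schema package, message type), the convention could be parsed as shown here. The field names and the `parse_topic` helper are hypothetical and do not come from the talk.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    # Field names are assumptions inferred from the three examples,
    # not labels given in the talk.
    environment: str
    service: str
    stream: str
    schema_package: str
    message_type: str


def parse_topic(name: str) -> TopicName:
    """Split a dotted topic name into its five assumed components."""
    parts = name.split(".")
    if len(parts) != 5:
        raise ValueError(
            f"expected 5 dot-separated parts, got {len(parts)}: {name!r}"
        )
    return TopicName(*parts)


topics = [
    "prod.identity-service.AuditLog.identity.AuditMessage",
    "prod.flyingcircus.applog.applog.Message",
    "prod.raccoon_bandit.experiment.bandit.Metric",
]
parsed = [parse_topic(t) for t in topics]
for p in parsed:
    # Show which service owns the stream and which schema describes it.
    print(p.service, "->", f"{p.schema_package}.{p.message_type}")
```

If this reading is right, the name alone carries a little descriptive and structural metadata: who produces the stream and which schema its messages follow.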

SLIDE 12

Descriptive: we had some of this
Structural: some, from using protobuf schemas
Administrative: nope.

SLIDE 13

Lesson 2: Metadata is Critical

  • Especially relationships
  • Ideally automated
  • Ideally from the start
  • Tools like Schema Registry are a start, but not the full solution
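To make the point about relationships concrete, here is a minimal sketch of a catalog entry that carries descriptive, structural, and administrative metadata plus links to its upstream datasets. Everything in it (the `DatasetMetadata` class, its fields, the `upstream` helper, and the dataset names) is a hypothetical illustration, not the platform described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    description: str = ""   # descriptive metadata
    schema: str = ""        # structural metadata, e.g. a protobuf message name
    owner: str = ""         # administrative metadata
    derived_from: list = field(default_factory=list)  # relationships


def upstream(catalog, name):
    """Return every dataset that `name` depends on, directly or transitively."""
    seen = []
    stack = list(catalog[name].derived_from)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(catalog[current].derived_from)
    return seen


# Hypothetical three-step lineage: raw events -> sessions -> daily report.
catalog = {
    "raw_events": DatasetMetadata("raw_events", owner="team-a"),
    "sessions": DatasetMetadata("sessions", derived_from=["raw_events"]),
    "daily_report": DatasetMetadata("daily_report", derived_from=["sessions"]),
}
print(upstream(catalog, "daily_report"))
```

With relationships recorded like this, the question "what breaks if this stream changes?" becomes a lookup rather than a search through code, which is easier when the links are captured automatically and from the start.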

SLIDE 14
SLIDE 15
SLIDE 16

Lesson 3: Data Engineers Control the Plot Line

SLIDE 17

business events

SLIDE 18

From streaming to both

SLIDE 19
SLIDE 20

Lesson 4: Repeatability is important

  • Streams have to choose between replays and accepting errors as permanent
  • Batch processing can be done again any time
  • Going straight to the archive in small batches gets the benefits of both
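One way to read the last bullet is that each small batch is written to the archive under a deterministic path, so re-running a batch overwrites its earlier output instead of duplicating it. The sketch below illustrates that idea only; the file layout, the `archive_batch` and `count_archived` helpers, the batch id, and the sample events are all assumptions, not details from the talk.

```python
import json
import os
import tempfile
from collections import defaultdict


def archive_batch(events, archive_dir, batch_id):
    """Write one small batch to the archive so that re-runs are safe.

    The output path depends only on the event date and the batch id, so
    replaying the same batch overwrites its file rather than appending.
    """
    by_date = defaultdict(list)
    for event in events:
        # The first ten characters of an ISO timestamp are "YYYY-MM-DD".
        by_date[event["ts"][:10]].append(event)

    written = []
    for date, rows in sorted(by_date.items()):
        partition = os.path.join(archive_dir, f"date={date}")
        os.makedirs(partition, exist_ok=True)
        final_path = os.path.join(partition, f"batch-{batch_id}.jsonl")
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "w") as f:
            for row in rows:
                f.write(json.dumps(row, sort_keys=True) + "\n")
        # Swap the finished file into place, replacing any earlier run.
        os.replace(tmp_path, final_path)
        written.append(final_path)
    return written


def count_archived(archive_dir):
    """Count archived records across all partitions."""
    total = 0
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            with open(os.path.join(root, name)) as f:
                total += sum(1 for _ in f)
    return total


# Hypothetical sample events spanning two dates.
events = [
    {"ts": "2018-01-01T23:59:58Z", "id": 1},
    {"ts": "2018-01-02T00:00:01Z", "id": 2},
    {"ts": "2018-01-02T00:00:05Z", "id": 3},
]
archive_dir = tempfile.mkdtemp()
first = archive_batch(events, archive_dir, batch_id="000042")
second = archive_batch(events, archive_dir, batch_id="000042")  # replay
print(count_archived(archive_dir))
```

Because the replay lands on the same paths, fixing a bug and reprocessing a batch leaves the archive with one copy of each record, which is the repeatability a plain append-only stream consumer lacks.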

SLIDE 21

Key Takeaways

  • Conway’s Law is true for data platforms

  • Metadata is Critical
  • Data Engineers Control the Plot Line
  • Repeatability is important
SLIDE 22

Contact

If you have any questions regarding Skyscanner, please contact:
Herman Schaaf
herman.schaaf@skyscanner.net

Thanks

Herman Schaaf @ironzeb