LESSONS & PITFALLS
DATA AT SWEDEN'S TELEVISION
Ismail Elouafiq
A wide spectrum of Apps
Running on different platforms
A wide spectrum of Users
STRATEGY
ANALYSTS
PRODUCT OWNERS
AUTHORS/EDITORS
DEVELOPERS
tl;dr:
Defining what to prioritise
Experimenting and iterating in small increments
Spoilers: how and why we now use protobuf, functional data engineering and ETL practices
AI Deep reinforcement learning
BLOCKCHAIN
ismail.land/velocity
What events should you collect?
what we want to know
How many people read the article per day
what we can
click, scroll, share
explicit model events
If you could do anything with data... what would you actually use for decision making?
First we need to collect data
1 COLLECT
2 INGEST
SDK -> events -> Event API -> publish -> pub/sub
2 INGEST
3 STORE
pub/sub -> subscribe -> judge-judi -> write -> Events table
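The collect, ingest and store flow above can be sketched in a few lines. This is only an in-memory illustration: the `PubSub` class, the `event_api` and `writer` functions, the `"events"` topic name and the event fields are all hypothetical stand-ins, not the actual SDK, Event API, pub/sub service or judge-judi.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class PubSub:
    """Tiny in-memory stand-in for a pub/sub service (assumption)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(event)


events_table: List[Event] = []  # stands in for the Events table


def writer(event: Event) -> None:
    """Plays the role of the subscriber that writes to the Events table."""
    events_table.append(event)


def event_api(bus: PubSub, event: Event) -> None:
    """Plays the role of the Event API: receives an event and publishes it."""
    bus.publish("events", event)


bus = PubSub()
bus.subscribe("events", writer)
# An SDK on some client would send something like this (hypothetical fields).
event_api(bus, {"eventType": "click", "articleId": "a1"})
```

The point of the indirection is that the Event API only publishes; it does not need to know who subscribes or where the events end up.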
1 COLLECT
2 INGEST
3 STORE
{event_type: click} {eventType: click} {eventType: klick}
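A small sketch of why the three events above are a problem once they reach the store. The `count_clicks` function is hypothetical; it only shows how a query written against one spelling silently misses the other two.

```python
from collections import Counter

# The three variants from the slide, as they might arrive from different clients.
events = [
    {"event_type": "click"},  # snake_case key
    {"eventType": "click"},   # camelCase key
    {"eventType": "klick"},   # misspelled value
]


def count_clicks(rows):
    """Count clicks the way a query written for one client would."""
    return sum(1 for row in rows if row.get("eventType") == "click")


clicks = count_clicks(events)
# How many different key spellings are in the data?
key_variants = Counter(key for row in events for key in row)
```

Three click events were sent, but only one is counted, and nothing fails loudly along the way.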
More Issues
Multiple teams/platforms => takes time to update the clients
The schema is sent with every event
Unclear types (arbitrary memory allocation)
We know the schema on all levels; we have a common model for the data... how can we make use of that?
ENTER PROTOBUF
Keeping a centralized Event Schema
person.proto -> compiler -> person.go, person.js, person.f
Person (client) -> person.js serialize -> binary -> person.js deserialize -> Person (server)
1 - Define the Schema
As a .proto file (event.proto)
2 - Publish libraries
Publish using CI pipeline (go, js, java, swift)
3 - Fetch
Fetch in SDKs (serialization)
Fetch in Judy (deserialization)
Use to generate table
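The slides do not show event.proto, so the following is not protobuf. It is a rough standard-library stand-in for the same idea: client and server share one schema, only values travel on the wire, and a value outside the schema is rejected before it is sent. The `EventType` enum, the byte layout and the article id field are assumptions made for the sketch.

```python
import json
import struct
from enum import IntEnum
from typing import Tuple


class EventType(IntEnum):
    # Hypothetical event types; the real schema is not shown in the slides.
    CLICK = 1
    SCROLL = 2
    SHARE = 3


# Layout shared by both sides: one unsigned byte for the event type and one
# unsigned 32-bit article id, big-endian, no padding.
_LAYOUT = struct.Struct(">BI")


def serialize(event_type: EventType, article_id: int) -> bytes:
    """Client side: encode values only; field names live in the shared schema."""
    # EventType(...) raises ValueError for anything not defined in the schema.
    return _LAYOUT.pack(int(EventType(event_type)), article_id)


def deserialize(payload: bytes) -> Tuple[EventType, int]:
    """Server side: decode using the same shared layout."""
    raw_type, article_id = _LAYOUT.unpack(payload)
    return EventType(raw_type), article_id


binary = serialize(EventType.CLICK, 42)
as_json = json.dumps({"eventType": "click", "articleId": 42}).encode()
```

Here `binary` is 5 bytes, while the JSON version repeats the field names in every event. A typo such as `"klick"` fails in `serialize` on the client instead of ending up in the table.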
DEFINE
1 COLLECT
2 INGEST
3 STORE
My work here is done!
Not really...
Backward and forward compatibility
Table changes
Language agnostic, but not really
Lack of support
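To make the compatibility point concrete, here is a minimal sketch of a tolerant reader using plain dataclasses rather than protobuf. `EventV1`, `EventV2`, the `platform` field and `decode` are hypothetical names: an old reader ignores fields it does not know, and a new reader falls back to a default when a field is missing.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class EventV1:
    event_type: str = ""


@dataclass
class EventV2:
    event_type: str = ""
    platform: str = "unknown"  # hypothetical field added later, with a default


def decode(cls: Type[T], payload: Dict[str, Any]) -> T:
    """Build cls from payload, dropping unknown keys and defaulting missing ones."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# Forward compatibility: old reader, new payload.
old_reader = decode(EventV1, {"event_type": "click", "platform": "ios"})
# Backward compatibility: new reader, old payload.
new_reader = decode(EventV2, {"event_type": "click"})
```

This only works as long as new fields carry defaults and existing fields keep their meaning; renaming or retyping a field breaks both directions.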
The Data Pyramid
(top to base)
Nirvana
AI, machine learning
Learn, Optimise, Experiment
Metrics, aggregations, KPIs
Storage, transformation, monitoring
Collection and ingestion
"The pyramids of Egypt could be explained as symbolic stairways to the stars, according to a British scientist" - The Guardian
"The data pyramid could be explained as a symbolic stairway to the A.I., according to myself" - Me
Endorse me on Linkedin
We have the data
Now what?
DEFINE
1 COLLECT
2 INGEST
3 STORE
4 ANALYZE: Batch jobs, ETL, Streaming
5 PRESENT: Service/API, Dashboard, Reports
Some data to be aggregated
Inputs: click events, article titles
Our mysterious job pipeline: magic job
Output: Aggregated Table (article reads) per day
today-partition -> magic job -> Append
today-partition -> magic job -> Failed
today-partition -> magic job -> Append
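What the "magic job" might compute can be sketched as follows. The function name, the event fields (`day`, `article_id`), the sample titles and the simplification that one click equals one read are all assumptions; the slides only name the inputs and the output.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def article_reads_per_day(
    click_events: Iterable[Dict[str, str]],
    article_titles: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Aggregate click events into (day, title, reads) rows."""
    counts = Counter((e["day"], e["article_id"]) for e in click_events)
    return sorted(
        # Fall back to the raw id when a title is missing.
        (day, article_titles.get(article_id, article_id), reads)
        for (day, article_id), reads in counts.items()
    )


# Hypothetical inputs for the sketch.
clicks = [
    {"day": "day-1", "article_id": "a1"},
    {"day": "day-1", "article_id": "a1"},
    {"day": "day-1", "article_id": "a2"},
    {"day": "day-2", "article_id": "a1"},
]
titles = {"a1": "Nyheter", "a2": "Sport"}
rows = article_reads_per_day(clicks, titles)
```

The function is pure: the same inputs always give the same rows, which matters once the job has to be rerun after a failure.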
Principle: Ensuring reproducibility
Immutable data partitions
Versioned logic
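One way to read this principle, sketched under assumptions: a job that appends produces duplicates when it is rerun, while a job that writes a whole partition keyed by logic version and day can be rerun safely. `magic_job`, `LOGIC_VERSION`, the row shape and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (article title, reads); hypothetical row shape
LOGIC_VERSION = "v1"   # hypothetical version tag for the job logic


def magic_job(day: str) -> List[Row]:
    """Stand-in for the aggregation job; returns fixed rows for the sketch."""
    return [("Nyheter", 2), ("Sport", 1)]


def run_with_append(table: List[Row], day: str) -> None:
    # Appending: every rerun adds the same rows again.
    table.extend(magic_job(day))


def run_with_overwrite(
    partitions: Dict[Tuple[str, str], List[Row]], day: str
) -> None:
    # Writing the partition as a whole, keyed by logic version and day:
    # a rerun replaces the previous output instead of adding to it.
    partitions[(LOGIC_VERSION, day)] = magic_job(day)


appended: List[Row] = []
partitioned: Dict[Tuple[str, str], List[Row]] = {}
for _ in range(2):  # first run, then a rerun of the same day
    run_with_append(appended, "today")
    run_with_overwrite(partitioned, "today")
```

After the rerun, `appended` holds every row twice, while `partitioned` still holds one copy per partition. Changing the logic means bumping the version and writing new partitions next to the old ones rather than editing rows in place.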
On ETL design
Ensure reproducibility
Practice failure in small increments
Define conventions in one place
keeping a tidy pipeline
Br3Ak 'em rULeS
summary... (what worked for us)
DATA DATA DATA
ismail.land/velocity