Democratizing Metric Definition & Discovery at Airbnb




SLIDE 1

Lauren Chircus / April 18, 2018

Democratizing Metric Definition & Discovery at Airbnb

SLIDE 2

Lauren Chircus / April 18, 2018

Changing the paradigm on metric management

SLIDE 3

Does this metrics workflow look familiar?

  • Create Table
  • Schedule Airflow Job
  • Backfill Table
  • A/B Testing Config
  • Anomaly Detection Config
  • Tell consumers to update queries
  • Add table to Superset
  • Create Dashboard
  • Monitor Pipelines

SLIDE 4

Lauren Chircus

Company: Airbnb
Role: Product Manager
Previous Role: Data Scientist
Twitter: @lchircus
Fun Fact: This Airbnb near Salinas was my favorite

SLIDE 5

You can change the paradigm!

  • Global Metric Config
  • Create Dashboard

SLIDE 6
  • 1. Airbnb’s journey
  • 2. Why you should make dimensions first-class citizens
  • 3. Why prioritize bonus features early

Changing the metric management paradigm

SLIDE 7

Airbnb’s Journey

SLIDE 8

Airflow

Anomaly Detection A/B testing

Plethora of tools for building & accessing data

SLIDE 9

Airflow

Anomaly Detection A/B testing

Strong, open source-based compute environment

SLIDE 10

Consuming metrics was painful, too

Metrics weren’t reusable across tools -> discrepancies

SLIDE 11

Consuming metrics was painful, too

  • Metrics weren’t reusable across tools -> discrepancies
  • Metrics were hard to find

SLIDE 12

Consuming metrics was painful, too

  • Metrics weren’t reusable across tools -> discrepancies
  • Metrics were hard to find
  • Required SQL knowledge or prepared dashboards

SLIDE 13

Airflow

Anomaly Detection A/B testing

Global Metrics Framework

SLIDE 14

What is Global Metrics?

“Global Metrics” is the concept that metrics should be defined in one place, have strong metadata, and be available wherever you need them.
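The "defined in one place" idea can be pictured as a single registry that every consuming tool reads from. The following is only a minimal sketch under that assumption; `MetricDefinition`, `register`, `lookup`, and the field names are hypothetical and not taken from Airbnb's framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One metric plus its metadata (all field names are illustrative)."""
    name: str
    owner: str
    description: str
    expression: str  # hypothetical: aggregation logic as a SQL fragment


REGISTRY = {}


def register(metric):
    """Add a metric once; a second definition under the same name is rejected."""
    if metric.name in REGISTRY:
        raise ValueError(f"{metric.name} is already defined")
    REGISTRY[metric.name] = metric
    return metric


def lookup(name):
    """Every consuming tool would resolve the metric through this one call."""
    return REGISTRY[name]
```

Because an A/B testing tool and an anomaly detection tool would both call `lookup("bookings")`, they cannot drift apart on what the metric means.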

SLIDE 15

Can we reuse existing infra?

Global Metrics Framework ML Feature Framework

?

SLIDE 16

  • Logic & metadata store
  • Compute data
  • Consuming Apps

The basic frameworks look similar

SLIDE 17

  • Logic & metadata store
  • Compute data
  • Consuming Apps: Search, Fraud, ...., Pricing

ML: serve data to models

SLIDE 18

  • Logic & metadata store
  • Compute data
  • Consuming Apps: A/B testing, Anomaly Detection

Metrics: serve data to apps

SLIDE 19
Metrics are different than ML features

Metrics

  • Leverage as much information as possible
  • Entirely offline
  • Diverse metric types

ML Features

SLIDE 20
Metrics are different than ML features

Metrics

  • Leverage as much information as possible
  • Entirely offline
  • Diverse metric types

ML Features

  • Prevent data leakage to keep models clean
  • Available online and offline
  • Windowing functions

SLIDE 21

Similar basics, different details

Global Metrics Framework ML Feature Framework

SLIDE 22

Why dimensions are 1st class citizens

SLIDE 23

Denormalization makes analytics speedy

SLIDE 24

doesn’t allow joins

timestamp  shape   color   count
12:00      square  yellow  23
12:00      circle  yellow  2
12:00      square  red     57
12:00      circle  red     188
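The table above can be queried by filtering and summing alone, with no join, because every dimension value already sits on the row. A minimal Python sketch using the four rows from the slide; the `total` helper is hypothetical and only illustrates the idea.

```python
# Rows copied from the slide: each row carries its own dimension values,
# so no lookup into another table is needed.
ROWS = [
    {"timestamp": "12:00", "shape": "square", "color": "yellow", "count": 23},
    {"timestamp": "12:00", "shape": "circle", "color": "yellow", "count": 2},
    {"timestamp": "12:00", "shape": "square", "color": "red", "count": 57},
    {"timestamp": "12:00", "shape": "circle", "color": "red", "count": 188},
]


def total(rows, **filters):
    """Sum `count` over the rows that match every dimension filter."""
    return sum(
        row["count"]
        for row in rows
        if all(row[dim] == value for dim, value in filters.items())
    )
```

For example, `total(ROWS, color="red")` adds the two red rows, and `total(ROWS)` adds all four.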

SLIDE 25

Many metrics are dimensional cuts

  • Bookings (Company)

SLIDE 26

Many metrics are dimensional cuts

  • Bookings (Company)
  • First Time Bookings (Growth)

SLIDE 27

Many metrics are dimensional cuts

  • Bookings (Company)
  • First Time Bookings (Growth)
  • Bookings in China (China)

SLIDE 28

Many metrics are dimensional cuts

  • Bookings (Company)
  • First Time Bookings (Growth)
  • Bookings in China (China)
  • Business Trip Bookings (Airbnb for Work)
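One way to read these slides: each team's metric is the same base metric, bookings, restricted to one dimension value. The sketch below illustrates that reading with made-up booking records; the field names (`destination`, `first_time`, `business_trip`) and the `CUTS` table are assumptions for illustration, not the framework's schema.

```python
# Made-up booking records; the field names are assumptions for illustration.
BOOKINGS = [
    {"destination": "China", "first_time": True, "business_trip": False},
    {"destination": "France", "first_time": False, "business_trip": True},
    {"destination": "China", "first_time": False, "business_trip": True},
]

# Each derived metric is the base metric plus one dimension filter.
CUTS = {
    "bookings": lambda b: True,
    "first_time_bookings": lambda b: b["first_time"],
    "bookings_in_china": lambda b: b["destination"] == "China",
    "business_trip_bookings": lambda b: b["business_trip"],
}


def compute(metric, bookings=BOOKINGS):
    """Count the bookings that fall inside the named dimensional cut."""
    return sum(1 for b in bookings if CUTS[metric](b))
```

Defining the base metric once and expressing the rest as cuts means the four numbers cannot disagree about what counts as a booking.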

SLIDE 29

Exploratory analysis across many dimensional cuts

Bookings in China from North America by host status

SLIDE 30

Standard Star Schema

Foreign Key Foreign Key Foreign Key

SLIDE 31

Global Metrics Framework Naming

Subject Subject Subject Dimension Source Dimension Source Dimension Source Metric Source

SLIDE 32

YAML configs instead of tables

  • Bookings.yaml: Metric Source
  • Destination_geo.yaml: Dimension Source
  • Origin_geo.yaml: Dimension Source
  • Host_status.yaml: Dimension Source

SLIDE 33

Data scientists list which dimensions to include

metric_source: bookings
metrics:
  - bookings
  - nights
subjects:
  - listing
  - guest
  - host
dimensions:
  - dim_destination_china
  - dim_origin_region
  - dim_new_host
SLIDE 34

Automatically joins to the relevant dimension sources

dim_source: destination_geo
dimensions:
  - dim_destination_region
  - dim_destination_china
subject:
  - listing

metric_source: bookings
metrics:
  - bookings
  - nights
subjects:
  - listing
  - guest
  - host
dimensions:
  - dim_destination_china
  - dim_origin_region
  - dim_new_host

dim_source: origin_geo
dimensions:
  - dim_origin_region
subject:
  - guest

dim_source: host_status
dimensions:
  - dim_new_host
subject:
  - host
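The automatic join can be thought of as a lookup: for each dimension the metric source lists, find the dimension source that provides it and the subject the two have in common. The sketch below mirrors the configs on this slide as Python dicts; `resolve_joins` and the rule "join on the first shared subject" are assumptions, since the slides do not show how the framework resolves joins.

```python
METRIC_SOURCE = {
    "metric_source": "bookings",
    "metrics": ["bookings", "nights"],
    "subjects": ["listing", "guest", "host"],
    "dimensions": ["dim_destination_china", "dim_origin_region", "dim_new_host"],
}

DIM_SOURCES = [
    {"dim_source": "destination_geo",
     "dimensions": ["dim_destination_region", "dim_destination_china"],
     "subject": ["listing"]},
    {"dim_source": "origin_geo",
     "dimensions": ["dim_origin_region"],
     "subject": ["guest"]},
    {"dim_source": "host_status",
     "dimensions": ["dim_new_host"],
     "subject": ["host"]},
]


def resolve_joins(metric_source, dim_sources):
    """Map each requested dimension to (dimension source, join subject)."""
    providers = {
        dim: source
        for source in dim_sources
        for dim in source["dimensions"]
    }
    joins = {}
    for dim in metric_source["dimensions"]:
        if dim not in providers:
            raise KeyError(f"no dimension source provides {dim}")
        source = providers[dim]
        # Assumed rule: join on the first subject both configs declare.
        shared = [s for s in source["subject"] if s in metric_source["subjects"]]
        if not shared:
            raise ValueError(
                f"{source['dim_source']} shares no subject with the metric source"
            )
        joins[dim] = (source["dim_source"], shared[0])
    return joins
```

With the slide's configs, `dim_destination_china` resolves to `destination_geo` joined on `listing`, which is the role a foreign key plays in the star schema.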
SLIDE 35

Bookings has hundreds of dimensions

Bookings in China from North America by host status by platform for work by returning users

SLIDE 36

Expensive dimensions

Bookings in China from North America by host status by platform by returning users by Listing Lifetime Value

SLIDE 37

Dimension sets give DS control over SLAs

metric_source: bookings
metrics:
  - bookings
  - nights
dimension_sets:
  china_dims:
    - dim_destination_china
    - dim_origin_region
  host_dims:
    - dim_new_host
    - dim_origin_region
SLIDE 38

Dimension sets give DS control over SLAs

metric_source: bookings
metrics:
  - bookings
  - nights
dimension_sets:
  china_dims:
    - dim_destination_china
    - dim_origin_region
  host_dims:
    - dim_new_host
    - dim_origin_region

table: bookings__china_dims
columns:
  - bookings
  - nights
  - dim_destination_china
  - dim_origin_region

table: bookings__host_dims
columns:
  - bookings
  - nights
  - dim_new_host
  - dim_origin_region
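The mapping from dimension sets to output tables shown on this slide can be sketched in a few lines. The config below mirrors the slide as a Python dict; `expand_tables` is a hypothetical helper, and the `<metric_source>__<set_name>` naming is inferred from the two table names on the slide.

```python
CONFIG = {
    "metric_source": "bookings",
    "metrics": ["bookings", "nights"],
    "dimension_sets": {
        "china_dims": ["dim_destination_china", "dim_origin_region"],
        "host_dims": ["dim_new_host", "dim_origin_region"],
    },
}


def expand_tables(config):
    """Return one table spec per dimension set: metrics first, then dimensions."""
    return [
        {
            "table": f"{config['metric_source']}__{set_name}",
            "columns": config["metrics"] + dims,
        }
        for set_name, dims in config["dimension_sets"].items()
    ]
```

Because each set becomes its own table, an expensive dimension can be isolated in one set without delaying the tables built from the others.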

SLIDE 39

Global Metrics Framework = Denormalization Machine

Super powerful for ad hoc analysis

  • Bookings from North America in China by host status
  • Bookings in North America by new guests by lifetime value

SLIDE 40

Config-driven pipeline generation eliminates 3 steps

  • Create Table
  • Schedule Airflow Job
  • Backfill Table
  • A/B Testing Config
  • Anomaly Detection Config
  • Tell consumers to update queries
  • Add table to Superset
  • Create Dashboard
  • Monitor Pipelines

Global Metric Config

SLIDE 41

  • Logic & metadata store
  • Compute data
  • Consuming Apps: A/B testing, Anomaly Detection

SLIDE 42

Serving data to apps eliminates 3 more steps

  • Create Table
  • Schedule Airflow Job
  • Backfill Table
  • A/B Testing Config
  • Anomaly Detection Config
  • Tell consumers to update queries
  • Add table to Superset
  • Create Dashboard
  • Monitor Pipelines

Global Metric Config

SLIDE 43

Bonus features for data scientists drive love

SLIDE 44

Free stuff

Automatic backfills when metrics or dimensions change

SLIDE 45

Free stuff

  • Automatic backfills when metrics or dimensions change
  • Self-healing when days are missed

SLIDE 46

Free stuff

  • Automatic backfills when metrics or dimensions change
  • Self-healing when days are missed
  • Dashboard generation script
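The slides do not say how the framework detects a changed definition or a missed day. One plausible approach, sketched here purely as an assumption, is to fingerprint each config and to compare the expected date range against the partitions that actually landed. All function names are hypothetical.

```python
import hashlib
import json
from datetime import date, timedelta


def config_fingerprint(config):
    """Stable hash of a config dict; a changed hash suggests a backfill is due."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_backfill(config, last_fingerprint):
    """True when the config differs from the one used for the last run."""
    return config_fingerprint(config) != last_fingerprint


def missing_days(start, end, landed):
    """Days in [start, end] with no landed partition, i.e. the days to re-run."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [day for day in days if day not in landed]
```

A scheduler could call `needs_backfill` whenever a config is merged and `missing_days` on every run, so neither case needs a person to notice the gap.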

SLIDE 47

Bonus features eliminate 2 more steps

  • Create Table
  • Schedule Airflow Job
  • Backfill Table
  • A/B Testing Config
  • Anomaly Detection Config
  • Tell consumers to update queries
  • Add table to Superset
  • Create Dashboard (semi-automated)
  • Monitor Pipelines

Global Metric Config

SLIDE 48

Old Data Science metric workflow took >2 weeks for simple changes

  • Create Table
  • Schedule Airflow Job
  • Backfill Table
  • A/B Testing Config
  • Anomaly Detection Config
  • Tell consumers to update queries
  • Add table to Superset
  • Create Dashboard
  • Monitor Pipelines

SLIDE 49

New Data Science metric workflow takes <2 days

  • Global Metric Config
  • Create Dashboard (semi-automated)

SLIDE 50

“It has dramatically reduced time to insight.”

Focusing on producers drives love

SLIDE 51

“It has dramatically reduced time to insight.”

“In our current world, even simple changes are painful. With Global Metrics, most of it becomes trivial.”

Focusing on producers drives love

SLIDE 52

“It has dramatically reduced time to insight.”

“You can put me in the satisfied customer quotes!”

“In our current world, even simple changes are painful. With Global Metrics, most of it becomes trivial.”

Focusing on producers drives love

SLIDE 53

At the time of official launch (last week)

  • >20 teams contributing
  • >350 metrics added
  • Less-technical contributors (Finance)

Word-of-mouth adoption

SLIDE 54
  • 1. Airbnb’s journey
  • 2. Why you should make dimensions first-class citizens
  • 3. Why prioritize bonus features early

Changing the metric management paradigm

SLIDE 55

Where to go from here?

SLIDE 56

More features for metric consumers

Leverage metadata in Superset integration

SLIDE 57

More features for metric consumers

  • Leverage metadata in Superset integration
  • Make metrics more discoverable

SLIDE 58

More features for metric consumers

  • Leverage metadata in Superset integration
  • Make metrics more discoverable
  • Metric certification process

SLIDE 59

Airflow

Anomaly Detection A/B testing

Global Metrics Framework

Open Source?

SLIDE 60

Questions?

Twitter: @lchircus
LinkedIn: linkedin.com/in/lchircus
Email: lauren.chircus@airbnb.com

SLIDE 61