Machine Learning with Spark: The Road to Production

SLIDE 1
SLIDE 2

Machine Learning with Spark: The Road to Production

Andrea Baita, André Bois-Crettez

Tech Week 2019 - Grenoble

SLIDE 3

Plan

  • Business case
  • Machine learning
  • Testing
  • Production
  • Lessons learned

SLIDE 4

Business Case

SLIDE 5

Merchants (online stores) + Publishers (ad networks)

Users

SLIDE 6

Targets

Merchant

  • Attract more buyers
  • Sell more with less budget

KelkooGroup

  • Automation
  • Increase margin

End Users

  • See interesting products
  • Find the best offers

SLIDE 7

Decisions to make

  • Where to show the offer (which site, which publisher)

  • How much to pay for it

SLIDE 8

Problem

How many clicks will the offer get?

SLIDE 9

Solution

Machine Learning

SLIDE 10

Machine Learning

SLIDE 11

How?

SLIDE 12

Data Scientist + Data = ML MODEL (prototype)

SLIDE 13

Lots of data

[Chart: actual number of clicks vs. click price; color = type of device]

More features are used:

  • time
  • category
  • merchant
  • … secret ones ...

SLIDE 14

Learn first ...

Example: past data with past results = Model

date        categoryId  merchantId  category          device   price   clicks
08/04/2019  10163969    1           Accessoires Moto  desktop  0.08    2
08/04/2019  10163969    1           Accessoires Moto  mobile   0.0704  21
08/04/2019  10163969    1           Accessoires Moto  tablet   0.18    22
08/04/2019  10543669    2           Lingerie Femme    desktop  0.23    10
08/04/2019  10543669    2           Lingerie Femme    mobile   0.0989  2
08/04/2019  12676471    3           Lunettes de vue   mobile   0.1204  1

SLIDE 15

… then predict!

Current data with Model = Predicted result

date        categoryId  merchantId  category          device   price  predicted clicks
11/04/2019  10163969    1           Accessoires Moto  desktop  0.09   3
11/04/2019  10163969    1           Accessoires Moto  mobile   0.08   20
11/04/2019  10163969    1           Accessoires Moto  tablet   0.19   23
11/04/2019  10543669    2           Lingerie Femme    desktop  0.24   11
11/04/2019  10543669    2           Lingerie Femme    mobile   0.10   1
11/04/2019  12676471    3           Lunettes de vue   mobile   0.13   2
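The learn-then-predict cycle on these two slides can be sketched in a few lines. This is only an illustration in plain Python, not the speakers' model (the talk describes a Scala/Spark implementation): the hypothetical `fit` and `predict` functions implement a naive baseline that averages past clicks per (categoryId, device) pair and ignores price. The rows are loosely based on the example table and should be treated as illustrative.

```python
from collections import defaultdict

# Past observations: (categoryId, device, price, clicks). Illustrative values.
past = [
    (10163969, "desktop", 0.08, 2),
    (10163969, "mobile", 0.0704, 21),
    (10163969, "tablet", 0.18, 22),
    (10543669, "desktop", 0.23, 10),
    (10543669, "mobile", 0.0989, 2),
    (12676471, "mobile", 0.1204, 1),
]


def fit(rows):
    """Learn: average past clicks per (categoryId, device) key."""
    sums = defaultdict(lambda: [0.0, 0])
    for category_id, device, _price, clicks in rows:
        acc = sums[(category_id, device)]
        acc[0] += clicks
        acc[1] += 1
    per_key = {key: total / count for key, (total, count) in sums.items()}
    global_mean = sum(row[3] for row in rows) / len(rows)
    return per_key, global_mean


def predict(model, category_id, device):
    """Predict: look up the learned average, fall back to the global mean."""
    per_key, global_mean = model
    return per_key.get((category_id, device), global_mean)


model = fit(past)
known = predict(model, 10163969, "mobile")   # key seen during training
unknown = predict(model, 99999999, "tablet")  # unseen key, uses the fallback
```

A real model would use price and the other features as inputs; the point here is only the separation between a training step on past data and a prediction step on current data.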

SLIDE 16

How do we implement it?

SLIDE 17

ML MODEL (prototype) + Scala Developer = ML MODEL (production ready)

SLIDE 18

Spark?

SLIDE 19

Unified analytics engine for large-scale data processing

  • interactive exploration
  • batch processing
  • SQL
  • machine learning at scale
  • ...

SLIDE 20

How do we use it?

SLIDE 21

Architecture

[Diagram: Ad Networks, Raw Data, Preprocessing, Training, Prediction, Decision, Users]

SLIDE 22

[Diagram: Data, Learn, Model, Predict and Decide]

SLIDE 23

The model changes over time ...

SLIDE 24

… how can we deploy it?

SLIDE 25

Model deployment approaches

Train first and then deploy the model

  • Real-time predictions
  • Model training is expensive
  • Training data is stable

Deploy the code, train as needed

  • Batch predictions
  • Quick model training
  • Training data evolves fast
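The difference between the two approaches can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the deployment described in the talk: the "model" is a toy (the mean click count), and every function name here (`train`, `export_model`, `serve`, `batch_run`) is hypothetical.

```python
import json


def train(click_counts):
    """Toy trainer: the 'model' is just the mean of past click counts."""
    return {"mean_clicks": sum(click_counts) / len(click_counts)}


def predict(model, n_offers):
    """Predict the same mean click count for each of n_offers offers."""
    return [model["mean_clicks"]] * n_offers


# Approach 1: train first, then deploy the serialized model artifact.
def export_model(click_counts):
    return json.dumps(train(click_counts))


def serve(artifact, n_offers):
    # The serving side only loads the artifact; it never trains.
    return predict(json.loads(artifact), n_offers)


# Approach 2: deploy the code, train as needed inside each batch run.
def batch_run(latest_click_counts, n_offers):
    return predict(train(latest_click_counts), n_offers)


artifact = export_model([1, 2, 3])
served = serve(artifact, 2)
batched = batch_run([4, 6], 1)
```

In the first approach the artifact is what gets versioned and shipped; in the second, each run retrains on the freshest data, which fits cheap training and fast-changing data.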

SLIDE 26

How can we test it?

Testing

SLIDE 27

ML testing problems

  • Behavior depends on data
  • Difficult to define exact test results
  • Code is hard to structure
  • Unit tests are challenging

SLIDE 28

Solutions

  • Compare metrics, not values
  • Use functional testing
  • Live monitoring
  • Tracking over time
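"Compare metrics, not values" can be illustrated with a small quality gate: instead of asserting that each prediction equals an exact number, the test asserts that an aggregate error stays within a tolerance. The sketch below is plain Python with hypothetical names and numbers; the choice of metric (relative error on total clicks) and the 10% tolerance are assumptions, not taken from the talk.

```python
def relative_total_error(predicted, actual):
    """Relative gap between total predicted clicks and total actual clicks."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("actual clicks sum to zero")
    return abs(sum(predicted) - total_actual) / total_actual


def passes_quality_gate(predicted, actual, tolerance=0.10):
    """Functional-style check: pass when the metric is within the tolerance."""
    return relative_total_error(predicted, actual) <= tolerance


# Hypothetical model output and observed clicks for the same offers.
predicted = [3.2, 19.5, 23.1, 10.7]
actual = [2, 21, 22, 10]

ok = passes_quality_gate(predicted, actual)
```

A test written this way keeps passing when retraining shifts individual predictions slightly, and fails only when overall quality degrades.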

SLIDE 29

How to define the metrics?

Production

SLIDE 30

Define relevant metrics

Goal: evaluate quality

  • Prototyping: statistical metrics
      • Mean Absolute Error, Root Mean Square Error
  • Testing: business metrics
      • Total margin
  • Monitoring: real-time metrics
      • Predicted clicks vs. real clicks
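The two statistical metrics named above, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), are simple to compute. The sketch below uses plain Python and hypothetical click counts; in a Spark pipeline an evaluator from the ML library would normally be used instead.

```python
import math


def mean_absolute_error(predicted, actual):
    """MAE: average of the absolute differences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_square_error(predicted, actual):
    """RMSE: square root of the average squared difference."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mse)


# Hypothetical predicted vs. real clicks for four offers.
predicted = [3, 20, 23, 11]
actual = [2, 21, 22, 13]

mae = mean_absolute_error(predicted, actual)
rmse = root_mean_square_error(predicted, actual)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both shows whether the error is spread evenly or concentrated in a few offers.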

SLIDE 31

Tests and measurements: where?

[Diagram: the same pipeline (Ad Networks, Raw Data, Preprocessing, Training, Prediction, Decision, Users), with a "Here" marker]

SLIDE 32

How can we schedule the jobs?

SLIDE 33

Azkaban

  • Workflow job scheduler
  • Hadoop and Spark jobs
  • Graph of job dependencies
  • Alerting on failures with Nagios
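The idea of a graph of job dependencies can be illustrated without Azkaban itself. The sketch below is plain Python using the standard library's `graphlib` (Python 3.9+); it does not use Azkaban's own job definition format, and the job names are hypothetical, borrowed from the pipeline stages on the architecture slide.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
jobs = {
    "preprocessing": set(),
    "training": {"preprocessing"},
    "prediction": {"training"},
    "decision": {"prediction"},
}


def execution_order(dependencies):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(jobs)
```

A scheduler applies the same principle: a job is started only once all of its upstream jobs have succeeded, and a cycle in the graph is rejected as an error.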

SLIDE 34

How to track the model behavior?

SLIDE 35

Tracking

  • Business metrics graphs
  • Predictions vs. actual results
  • Study long-term trends
  • Adapt model when market changes
  • Easy to fix: abrupt drop in quality metric
  • Harder: slow erosion of quality
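The contrast between an abrupt drop and a slow erosion can be sketched with two checks on a time series of a quality metric. This is an illustration in plain Python, not the monitoring used in the talk: the metric is assumed to be a score where higher is better, and the function names, window size, and thresholds are all hypothetical.

```python
from statistics import mean


def abrupt_drop(series, threshold=0.2):
    """True if the latest value fell by more than `threshold` (relative)
    compared with the previous value."""
    if len(series) < 2:
        return False
    previous, latest = series[-2], series[-1]
    return previous > 0 and (previous - latest) / previous > threshold


def slow_erosion(series, window=3, threshold=0.1):
    """True if the mean of the last `window` values is more than `threshold`
    (relative) below the mean of the first `window` values."""
    if len(series) < 2 * window:
        return False
    baseline = mean(series[:window])
    recent = mean(series[-window:])
    return baseline > 0 and (baseline - recent) / baseline > threshold


sudden = [1.0, 1.0, 1.0, 1.0, 1.0, 0.6]
eroding = [1.00, 0.98, 0.96, 0.94, 0.92, 0.90, 0.88, 0.86]

sudden_flagged = abrupt_drop(sudden)
eroding_flagged_as_drop = abrupt_drop(eroding)
eroding_flagged_as_erosion = slow_erosion(eroding)
```

The eroding series never triggers the step-to-step check, which is why slow degradation needs a comparison against a longer-term baseline.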

SLIDE 36

Tracking with ELK

SLIDE 37

So, what did we learn?

Lessons learned

SLIDE 38

1

SLIDE 39

2

SLIDE 40

3

SLIDE 41

4

SLIDE 42

Questions?