It Takes a Village to Raise a Machine Learning Model, by Lucian Lita



slide-1
SLIDE 1

It Takes a Village to Raise a Machine Learning Model

Lucian Lita

@datariver

slide-3
SLIDE 3

@datariver

Algorithms

slide-4
SLIDE 4

@datariver

Data

Big Data Sheep @bigdatasheep (5yr): more data is better than complex algorithms #BigData

Big Data Sheep @bigdatasheep (4yr): more clean data is better than more data #BigData

Big Data Sheep @bigdatasheep (3yr): more labeled data is better than more data #BigData

Big Data Sheep @bigdatasheep (2yr): more smart data is better than purple data #BigData

**inflated historical depiction

slide-5
SLIDE 5

@datariver

Data

slide-6
SLIDE 6

@datariver

Next Frontier: well designed software architectures

Personalization, experimentation, anomaly detection, fraud detection …

slide-7
SLIDE 7

@datariver

Battle Plan

Personalization: deep dive, sw architecture flavor

Anomaly detection: quick peek

Music streaming, advertising, medical informatics: brief stories

slide-8
SLIDE 8

@datariver

slide-9
SLIDE 9

@datariver

Product as is. No customization. (x all)
Segmentation. Reasonable coverage.
Personalization. Reasonable coverage. (x 1 … x 1)

slide-10
SLIDE 10

@datariver

  • Childhood. Approaches.
slide-11
SLIDE 11

@datariver

Deep Broad

slide-12
SLIDE 12

@datariver

Push-scientist Push-button

[Diagram: App, API, delivery, storage]

Optimization

  • ML algorithms
  • data: more, better, smarter
  • features, selection
slide-13
SLIDE 13

@datariver

Push-scientist Push-button

[Diagram: two stacks, each with App, API, delivery, storage]

Scale & Automation

  • model build
  • model deploy
  • single instrumentation

Optimization

  • ML algorithms
  • data: more, better, smarter
  • features, selection
slide-14
SLIDE 14

@datariver

Push-scientist

Invest in ML; start with a thin system

How much effort put into Platform & Automation?

(A) best you can do in x weeks
(B) one step above prototype
(C) enough baling wire & duct tape to support a first use case

slide-15
SLIDE 15

@datariver

Push-button

Invest in scale & automation; basic ML

How much effort put into ML?

(A) best generic model setup in y weeks?
(B) noticeably better than random?
(C) pack enough punch to be visible, but not more

slide-16
SLIDE 16

@datariver

Push-scientist Push-button

slide-17
SLIDE 17

@datariver

  • Adolescence. Platform Patterns.
slide-18
SLIDE 18

@datariver

(A) Stored

  • periodically batch train model
  • periodically run models, store pre-computed content
  • App calls API (retrieve) for personalized content
  • App sends feedback to API (capture)
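
A minimal sketch of the stored pattern, under stated assumptions: the class name, the method names, and the toy "most-liked item" model are all hypothetical and not from the talk. It only illustrates the split between a periodic batch job and an API that does pure lookups.

```python
from collections import defaultdict


class StoredPersonalizer:
    """Pattern (A): models run offline; the API only looks up stored results."""

    def __init__(self, default_content):
        self.default_content = default_content
        self.feedback_log = []   # filled by API (capture)
        self.precomputed = {}    # user_id -> content, filled by the batch job

    def capture(self, user_id, item, liked):
        """API (capture): record feedback for the next batch run."""
        self.feedback_log.append((user_id, item, liked))

    def batch_train_and_run(self):
        """Periodic job: 'train' on all feedback, then pre-compute content per user."""
        scores = defaultdict(lambda: defaultdict(int))
        for user_id, item, liked in self.feedback_log:
            scores[user_id][item] += 1 if liked else -1
        # Toy stand-in for a model: each user's highest-scoring item.
        self.precomputed = {
            user_id: max(item_scores, key=item_scores.get)
            for user_id, item_scores in scores.items()
        }

    def retrieve(self, user_id):
        """API (retrieve): no model runs at request time, just a lookup."""
        return self.precomputed.get(user_id, self.default_content)
```

Feedback captured after the last batch run has no effect on what `retrieve` returns until the job runs again, which is the main trade-off of this pattern.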

slide-19
SLIDE 19

@datariver

(B) On-the-Fly

  • periodically batch train model
  • App calls API (compute), which computes personalized content on-the-fly
  • App sends feedback to API (capture)
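
A minimal sketch of the on-the-fly pattern, assuming a hypothetical tag-weight model and an in-memory catalog (neither is from the talk). Training still happens in a periodic batch job, but scoring happens inside the request.

```python
class OnTheFlyPersonalizer:
    """Pattern (B): batch-trained model, content computed at request time."""

    def __init__(self, catalog):
        self.catalog = catalog     # item -> set of tags (hypothetical content store)
        self.tag_weights = {}      # (user_id, tag) -> weight; the "model"
        self.feedback_log = []

    def capture(self, user_id, item, liked):
        """API (capture): record feedback for the next training run."""
        self.feedback_log.append((user_id, item, liked))

    def batch_train(self):
        """Periodic job: learn one weight per (user, tag) from all feedback."""
        weights = {}
        for user_id, item, liked in self.feedback_log:
            for tag in self.catalog[item]:
                key = (user_id, tag)
                weights[key] = weights.get(key, 0) + (1 if liked else -1)
        self.tag_weights = weights

    def compute(self, user_id, candidates):
        """API (compute): score the candidates with the current model, per request."""
        def score(item):
            return sum(
                self.tag_weights.get((user_id, tag), 0)
                for tag in self.catalog[item]
            )
        return max(candidates, key=score)
```

Unlike the stored pattern, the candidate list can differ on every request, so in-app context can influence the result without waiting for a batch run.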

slide-20
SLIDE 20

@datariver

(C) Aggressive

  • App calls API (deliver) for personalized content
  • App sends feedback to API (capture)

Challenge accepted: asymptotically real time!
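
The slide does not say how the aggressive pattern gets close to real time. One possible reading, sketched here purely as an assumption, is an online update: every captured event adjusts the model immediately, so there is no batch job at all. The class name, the running-average update, and the `learning_rate` parameter are hypothetical.

```python
class AggressivePersonalizer:
    """Pattern (C), read as an online update: each event changes the model at once."""

    def __init__(self, items, learning_rate=0.5):
        self.items = list(items)
        self.learning_rate = learning_rate
        self.scores = {}   # (user_id, item) -> preference estimate in [0, 1]

    def capture(self, user_id, item, liked):
        """API (capture): move the estimate toward the observed outcome."""
        key = (user_id, item)
        old = self.scores.get(key, 0.5)
        target = 1.0 if liked else 0.0
        self.scores[key] = old + self.learning_rate * (target - old)

    def deliver(self, user_id):
        """API (deliver): pick the item with the highest current estimate."""
        return max(
            self.items,
            key=lambda item: self.scores.get((user_id, item), 0.5),
        )
```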

slide-22
SLIDE 22

@datariver

  • Maturity. Patterns and Assumptions.
slide-23
SLIDE 23

@datariver

Content Delivery Data Capture Model Deployment Model Building Analytics Data Store

What do you really need? Do you need it now?

slide-24
SLIDE 24

@datariver

Model Building. What do you really need?

algos space data eval compute scalability HA security metrics

101010

operators
slide-26
SLIDE 26

@datariver

Model Deployment. What do you really need?

API

envt ditto versioning deploy sharing scalability HA security performance

Mi Mi+1
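
Reading "Mi Mi+1" as successive model versions (an assumption), the versioning and deploy items might look like the sketch below. `ModelRegistry` and its methods are hypothetical names; a real system would also persist models and address the HA and security items listed on the slide.

```python
class ModelRegistry:
    """Keeps every deployed model version and tracks which one serves traffic."""

    def __init__(self):
        self._versions = []   # all deployed models, oldest first
        self._active = None   # index of the version currently serving

    def deploy(self, model):
        """Register a new version and make it active; returns its version index."""
        self._versions.append(model)
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self):
        """Switch back to the previous version without deleting the newer one."""
        if self._active is None or self._active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    def predict(self, x):
        """Serve a prediction from the active version."""
        if self._active is None:
            raise RuntimeError("no model deployed")
        return self._versions[self._active](x)
```

Keeping old versions around makes rollback a pointer change rather than a redeploy.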

slide-27
SLIDE 27

@datariver

Personalization Delivery. What do you really need?

slide-28
SLIDE 28

@datariver

Personalization Delivery. What do you really need?

API

instrument ditto exploit explore sharing scalability HA security performance
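
The slide lists "exploit" and "explore" without naming a technique. Epsilon-greedy is one common way to balance the two, shown here as an illustration only; the class and parameter names are hypothetical.

```python
import random


class EpsilonGreedy:
    """With probability epsilon show a random option, otherwise the best so far."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {arm: 0 for arm in self.arms}
        self.rewards = {arm: 0.0 for arm in self.arms}

    def mean_reward(self, arm):
        """Average observed reward; 0.0 for options never shown."""
        if self.shows[arm] == 0:
            return 0.0
        return self.rewards[arm] / self.shows[arm]

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)           # explore
        return max(self.arms, key=self.mean_reward)     # exploit

    def record(self, arm, reward):
        """Instrumentation hook: log what was shown and how it performed."""
        self.shows[arm] += 1
        self.rewards[arm] += reward
```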

slide-29
SLIDE 29

@datariver

Data Store. What do you really need?

API

t

content ditto performance HA history scalability triggers consumers governance sharing

slide-30
SLIDE 30

@datariver

Data Store. To HA or not to HA.

in-app revenue driver infrastructure cost build & operate

now later (blasphemy)

critical user benefit known use cases

slide-31
SLIDE 31

@datariver

Data Store. APIs

slide-32
SLIDE 32

@datariver

Data Capture. What do you really need?

API

t

content ditto history triggers consumers sharing scalability HA security performance
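
A minimal in-memory sketch of how the history, triggers, and consumers items could fit together in a capture component. `EventLog` and its methods are hypothetical names, and the sketch ignores the scalability, HA, security, and performance items on the slide.

```python
class EventLog:
    """Append-only capture log that fans events out to consumers and triggers."""

    def __init__(self):
        self.history = []     # every captured event, in arrival order
        self.consumers = []   # callables invoked for every event
        self.triggers = []    # (predicate, action) pairs, fired selectively

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def add_trigger(self, predicate, action):
        self.triggers.append((predicate, action))

    def capture(self, event):
        """API (capture): store the event, then notify consumers and triggers."""
        self.history.append(event)
        for consumer in self.consumers:
            consumer(event)
        for predicate, action in self.triggers:
            if predicate(event):
                action(event)
```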

slide-33
SLIDE 33

@datariver

  • Analytics. What do you really need?

API

t

content ditto performance history scalability consumers flexibility

slide-34
SLIDE 34

@datariver

  • Analytics. Experimentation & Personalization
slide-35
SLIDE 35

@datariver

Data Lake. What do you really need?

say ‘big data lake’ one more time!
slide-36
SLIDE 36

@datariver

Evolving Architecture. Before you know it…

slide-37
SLIDE 37

Apps API (delivery)

personalized content

API (capture)

feedback

API (compute)

in-app data personalized content

API (push)

direct content

Event Log

raw data or features

run models

train models periodically

re-run new models periodically

RT Analytics

Model Deployment

Model Building

API (analytics)

**terribly incomplete, mildly inaccurate

slide-38
SLIDE 38

Not an Exact Blueprint

slide-39
SLIDE 39

As you embark …

Know this

  • non-trivial
  • no one-size fits all

Upfront

  • what do you really need?
  • know thy target architecture

Do it!

  • working system in weeks
  • fast iterations – ship & test
  • interfaaaaaaaces!

slide-40
SLIDE 40

village model

**not drawn to effort scale

slide-41
SLIDE 41

@datariver

Software architecture is the next frontier!

Fail fast still applies!

Personalize your personalization platform!

slide-42
SLIDE 42

@datariver

better algorithms

more, better, smarter data

well designed software architectures: next frontier

slide-43
SLIDE 43

@datariver

A Brief Look at Anomaly Detection

slide-44
SLIDE 44

@datariver

Applications

  • System health – servers, network
  • Cyber-intrusion detection
  • Enterprise anomaly detection
  • Image processing
  • Textual anomaly detection
  • Sensor networks
  • Fraud detection
  • Medical anomaly detection
  • Industrial damage detection
  • …

slide-45
SLIDE 45

@datariver

Algorithms

  • Supervised
  • Unsupervised
  • Generic statistical
  • Information theory
  • …

“What algorithms are you going to use?”
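
As one concrete instance of the "generic statistical" family, a z-score check flags values that sit far from the mean. This is an illustrative sketch, not the method used in the talk; the function name and the default threshold of 3.0 are assumptions.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []   # all values identical, nothing stands out
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]
```

A single extreme value inflates both the mean and the standard deviation, so this check works best when anomalies are rare relative to the window size.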

slide-46
SLIDE 46

@datariver

Data

Low data volume

  • Invest in data acquisition
  • Invest in high coverage

High data volume

  • Invest in defining signal
  • Invest in labeling, tools, and crowdsourcing

slide-47
SLIDE 47

@datariver

Architectures Again

Capture

Data Collectors

Clickstream, User Input … Real time, DBs …

Compute

run models

Labeling

Crowdsourcing, active learning

Processors (M&A)

broad: time bounded
deep: open ended

**check assumptions

slide-48
SLIDE 48

@datariver

Advertising

slide-49
SLIDE 49

@datariver

Music Streaming

slide-50
SLIDE 50

@datariver

Medical Informatics

slide-51
SLIDE 51

@datariver

better algorithms

more, better, smarter data

well designed software architectures: next frontier

slide-52
SLIDE 52

@datariver

Thank you!

Lucian Lita

@datariver

[always hiring] data@intuit.com

slide-54
SLIDE 54

@datariver

slide-55
SLIDE 55

@datariver

Extra Content

slide-56
SLIDE 56

@datariver

  • Security. What do you really need?
slide-57
SLIDE 57

@datariver

slide-58
SLIDE 58

@datariver

  • App. Who does the App talk to?

(a) App API (compute): dynamic data, personalized content

  • retrieve static data
  • apply op logic
  • compute features
  • run model
  • log actions

(b) App API (retrieve): personalized content

  • apply op logic
  • retrieve pre-computed content