A field guide to the machine learning zoo - Theodore Vasiloudis



SLIDE 1

A field guide to the machine learning zoo

Theodore Vasiloudis SICS/KTH

SLIDE 2

From idea to objective function

SLIDE 3

Formulating an ML problem

SLIDE 4

Formulating an ML problem

  • Common aspects

Source: Xing (2015)

SLIDE 5

Formulating an ML problem

  • Common aspects

○ Model (θ)

Source: Xing (2015)

SLIDE 6

Formulating an ML problem

  • Common aspects

○ Model (θ)
○ Data (D)

Source: Xing (2015)

SLIDE 7

Formulating an ML problem

  • Common aspects

○ Model (θ)
○ Data (D)

  • Objective function: L(θ, D)

Source: Xing (2015)

SLIDE 8

Formulating an ML problem

  • Common aspects

○ Model (θ)
○ Data (D)

  • Objective function: L(θ, D)
  • Prior knowledge: r(θ)

Source: Xing (2015)

SLIDE 9

Formulating an ML problem

  • Common aspects

○ Model (θ)
○ Data (D)

  • Objective function: L(θ, D)
  • Prior knowledge: r(θ)
  • ML program: f(θ, D) = L(θ, D) + r(θ)

Source: Xing (2015)

SLIDE 10

Formulating an ML problem

  • Common aspects

○ Model (θ)
○ Data (D)

  • Objective function: L(θ, D)
  • Prior knowledge: r(θ)
  • ML program: f(θ, D) = L(θ, D) + r(θ)
  • ML Algorithm: How to optimize f(θ, D)

Source: Xing (2015)
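The decomposition above can be sketched in code. A minimal NumPy illustration of the ML program f(θ, D) = L(θ, D) + r(θ), where L is squared error and r an L2 penalty; the data, parameter values, and λ = 0.1 are made up for illustration:

```python
import numpy as np

# Sketch of the "ML program" f(theta, D) = L(theta, D) + r(theta).
# L is mean squared error, r an L2 penalty; the data D is synthetic.

def loss(theta, X, y):
    """Objective function L(theta, D) on data D = (X, y)."""
    return np.mean((X @ theta - y) ** 2)

def prior(theta, lam=0.1):
    """Prior knowledge r(theta): an L2 regularizer."""
    return lam * (theta @ theta)

def ml_program(theta, X, y):
    """The ML program: f(theta, D) = L(theta, D) + r(theta)."""
    return loss(theta, X, y) + prior(theta)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

# The parameters that generated the data score far better than all zeros.
print(ml_program(np.zeros(3), X, y), ml_program(true_theta, X, y))
```

The ML algorithm is then whatever procedure minimizes `ml_program` over θ.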

SLIDE 11

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

SLIDE 12

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

  • Data (D):
  • Model (θ):
  • Objective function - L(D, θ):
  • Prior knowledge (Regularization):
  • Algorithm:
SLIDE 13

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

  • Data (D): Features and labels, xi, yi
  • Model (θ):
  • Objective function - L(D, θ):
  • Prior knowledge (Regularization):
  • Algorithm:
SLIDE 14

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

  • Data (D): Features and labels, xi, yi
  • Model (θ): Logistic regression, parameters w

○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))

  • Objective function - L(D, θ):
  • Prior knowledge (Regularization):
  • Algorithm:
SLIDE 15

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

  • Data (D): Features and labels, xi, yi
  • Model (θ): Logistic regression, parameters w

○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))

  • Objective function - L(D, θ): NLL(w) = Σᵢ log(1 + exp(-yᵢ wᵀxᵢ))
  • Prior knowledge (Regularization): r(w) = λ wᵀw
  • Algorithm:

Warning: Notation abuse

SLIDE 16

Example: Improve retention at Twitter

  • Goal: Reduce the churn of users on Twitter
  • Assumption: Users churn because they don’t engage with the platform
  • Idea: Increase retweets by promoting tweets more likely to be retweeted

  • Data (D): Features and labels, xi, yi
  • Model (θ): Logistic regression, parameters w

○ p(y | x, w) = Bernoulli(y | sigm(wᵀx))

  • Objective function - L(D, θ): NLL(w) = Σᵢ log(1 + exp(-yᵢ wᵀxᵢ))
  • Prior knowledge (Regularization): r(w) = λ wᵀw
  • Algorithm: Gradient Descent
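The recipe on this slide can be sketched end to end. A minimal NumPy illustration (synthetic data, hypothetical step size and λ, not Twitter's actual system) of minimizing NLL(w) + λwᵀw with gradient descent, with labels yᵢ in {-1, +1}:

```python
import numpy as np

# Sketch of the slide's ML program: logistic regression with L2 penalty,
# trained by gradient descent. Data and hyperparameters are illustrative.

def sigm(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y, lam=0.1):
    """f(w, D) = sum_i log(1 + exp(-y_i w^T x_i)) + lam * w^T w."""
    return np.sum(np.log1p(np.exp(-y * (X @ w)))) + lam * (w @ w)

def grad(w, X, y, lam=0.1):
    """Gradient: -sum_i y_i * sigm(-y_i w^T x_i) * x_i + 2 * lam * w."""
    s = sigm(-y * (X @ w))
    return -(X.T @ (y * s)) + 2 * lam * w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200))

w = np.zeros(2)
for _ in range(500):        # gradient descent with a fixed step size
    w -= 0.01 * grad(w, X, y)

print(w, nll(w, X, y))
```

With labels in {-1, +1}, sigm(wᵀx) is the probability of the positive class, matching the Bernoulli model on the slide.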
SLIDE 17

Data problems

SLIDE 18

Data problems

  • GIGO: Garbage In - Garbage Out
SLIDE 19

Data readiness

Source: Lawrence (2017)

SLIDE 20

Data readiness

  • Problem: “Data” as a concept is hard to reason about.
  • Goal: Make the stakeholders aware of the state of the data at all stages

Source: Lawrence (2017)

SLIDE 21

Data readiness

Source: Lawrence (2017)

SLIDE 22

Data readiness

  • Band C

○ Accessibility

Source: Lawrence (2017)

SLIDE 23

Data readiness

  • Band C

○ Accessibility

  • Band B

○ Representation and faithfulness

Source: Lawrence (2017)

SLIDE 24

Data readiness

  • Band C

○ Accessibility

  • Band B

○ Representation and faithfulness

  • Band A

○ Data in context

Source: Lawrence (2017)

SLIDE 25

Data readiness

  • Band C

○ “How long will it take to bring our user data to C1 level?”

  • Band B

○ “Until we know the collection process we can’t move the data to B1.”

  • Band A

○ “We realized that we would need location data in order to have an A1 dataset.”

Source: Lawrence (2017)

SLIDE 26

Data readiness

  • Band C

○ “How long will it take to bring our user data to C1 level?”

  • Band B

○ “Until we know the collection process we can’t move the data to B1.”

  • Band A

○ “We realized that we would need location data in order to have an A1 dataset.”

SLIDE 27

Selecting algorithm & software: “Easy” choices

SLIDE 28

Selecting algorithms

SLIDE 29

An ML algorithm “farm”

Source: scikit-learn.org

SLIDE 30

The neural network zoo

Source: Asimov Institute (2016)

SLIDE 31

Selecting algorithms

  • Always go for the simplest model you can afford
SLIDE 32

Selecting algorithms

  • Always go for the simplest model you can afford

○ Your first model is more about getting the infrastructure right

Source: Zinkevich (2017)

SLIDE 33

Selecting algorithms

  • Always go for the simplest model you can afford

○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.

Source: Zinkevich (2017)

SLIDE 34

Selecting algorithms

  • Always go for the simplest model you can afford

○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex models erode boundaries

Source: Sculley et al. (2015)

SLIDE 35

Selecting algorithms

  • Always go for the simplest model you can afford

○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex models erode boundaries
  ■ CACE principle: Changing Anything Changes Everything

Source: Sculley et al. (2015)
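The "simplest model you can afford" advice can be made concrete. A sketch of a first baseline using scikit-learn (the library behind the algorithm cheat-sheet slide); the dataset and settings here are illustrative, not from the talk:

```python
# A simple, interpretable linear baseline: quick to train, easy to score,
# and its weights can be inspected directly when debugging.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scaled features + logistic regression: the "simplest model" baseline.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_tr, y_tr)
print(f"baseline accuracy: {baseline.score(X_te, y_te):.3f}")

# Interpretability in practice: one readable weight per feature.
weights = baseline[-1].coef_[0]
```

Only once this pipeline works end to end, and its accuracy is logged, is it worth trying something more complex.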

SLIDE 36

Selecting software

SLIDE 37

The ML software zoo

(Collage of ML framework logos, including Leaf)

SLIDE 38

Your model vs. the world

SLIDE 39

What are the problems with ML systems?

(Diagram: boxes for Data, ML Code, Model)

Expectation

SLIDE 40

What are the problems with ML systems?

(Diagram: boxes for Data, ML Code, Model)

Reality

Sculley et al. (2015)

SLIDE 41

Things to watch out for

SLIDE 42
Things to watch out for

  • Data dependencies

Sculley et al. (2015) & Zinkevich (2017)

SLIDE 43
Things to watch out for

  • Data dependencies

○ Unstable dependencies

Sculley et al. (2015) & Zinkevich (2017)

SLIDE 44
Things to watch out for

  • Data dependencies

○ Unstable dependencies

  • Feedback loops

Sculley et al. (2015) & Zinkevich (2017)

SLIDE 45
Things to watch out for

  • Data dependencies

○ Unstable dependencies

  • Feedback loops

○ Direct

Sculley et al. (2015) & Zinkevich (2017)

SLIDE 46
Things to watch out for

  • Data dependencies

○ Unstable dependencies

  • Feedback loops

○ Direct
○ Indirect

Sculley et al. (2015) & Zinkevich (2017)

SLIDE 47

Bringing it all together

SLIDE 48

Bringing it all together

  • Define your problem as optimizing your objective function using data
  • Determine (and monitor) the readiness of your data
  • Don't spend too much time at first choosing an ML framework/algorithm
  • Worry much more about what happens when your model meets the world.
SLIDE 49

Thank you.

@thvasilo tvas@sics.se

SLIDE 50

Sources

  • Google auto-replies: shared photos and text
  • Silver et al. (2016): Mastering the game of Go
  • Xing (2015): A new look at the system, algorithm and theory foundations of Distributed ML
  • Lawrence (2017): Data readiness levels
  • Asimov Institute (2016): The Neural Network Zoo
  • Zinkevich (2017): Rules of Machine Learning - Best Practices for ML Engineering
  • Sculley et al. (2015): Hidden Technical Debt in Machine Learning Systems