Building an A.I. Cloud: What We Learned from PredictionIO (Simon Chan)



SLIDE 1

Building an A.I. Cloud

What We Learned from PredictionIO

SLIDE 2

Simon Chan

  • Sr. Director, Product Management, Salesforce

Co-founder, PredictionIO PhD, University College London

simon@salesforce.com

SLIDE 3

The A.I. Developer Platform Dilemma

Simple ↔ Flexible

SLIDE 4

Every prediction problem is unique.

SLIDE 5

3 Approaches to Customize Prediction

  • GUI
  • Automation
  • Custom Code & Script

Simple ↔ Flexible

SLIDE 6

10 KEY STEPS to build your own A.I.

P.S. Choices = Complexity

SLIDE 7

One platform, build multiple apps. Here are 3 examples.

  1. E-Commerce: recommend products
  2. Subscription: predict churn
  3. Social Network: detect spam

SLIDE 8

Let’s Go Beyond Textbook Tutorials

```scala
// Example from Spark ML website
import org.apache.spark.ml.classification.LogisticRegression

// Load training data
val training = sqlCtx.read.format("libsvm")
  .load("data/mllib/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)
```

SLIDE 9

Define the Prediction Problem

Be clear about the goal

Step 1

SLIDE 10

Define the Prediction Problem

Basic Ideas:

  • What is the Business Goal?

▫ Better user experience
▫ Maximize revenue
▫ Automate a manual task
▫ Forecast future events

  • What’s the input query?
  • What’s the prediction output?
  • What is a good prediction?
  • What is a bad prediction?
SLIDE 11

Decide on the Presentation

It’s (still) all about human perception

Step 2

SLIDE 12

Decide on the Presentation

                     Actual NOT SPAM    Actual SPAM
Predicted NOT SPAM   True Negative      False Negative
Predicted SPAM       False Positive     True Positive

A mailing list and a social network, for example, may tolerate false predictions differently.
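That tolerance trade-off is usually quantified as precision versus recall over the confusion-matrix counts. A minimal sketch in plain Python (illustrative, not code from the talk):

```python
def precision(tp, fp):
    """Of everything predicted SPAM, what fraction really was spam?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all actual spam, what fraction did we catch?"""
    return tp / (tp + fn)

# A mailing list may demand high precision (never flag a real mail),
# while a social network may prefer high recall (catch most spam).
tp, fp, fn = 90, 10, 30
print(precision(tp, fp))  # 0.9
print(recall(tp, fn))     # 0.75
```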

SLIDE 13

Decide on the Presentation

Some UX/UI Choices:

  • Tolerance for Bad Predictions?
  • Suggestive or Decisive?
  • Prediction Explainability?
  • Intelligence Visualization?
  • Human Interactive?
  • Score; Ranking; Comparison; Charts; Groups
  • Feedback Interface

▫ Explicit or Implicit

SLIDE 14

Import Free-form Data Sources

Life is more complicated than MNIST and MovieLens datasets

Step 3

SLIDE 15

Import Free-form Data Sources

Some Types of Data:

  • User Attributes
  • Item (product/content) Attributes
  • Activities / Events

Estimate (guess) what you need.

SLIDE 16

Import Free-form Data Sources

Some Ways to Transfer Data:

  • Transactional versus Batch
  • Batch Frequency
  • Full data or Changed Delta Only

Don’t forget continuous data sanity checking
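A continuous sanity check can be as simple as validating each incoming batch before it reaches training. A hedged sketch in plain Python; the event fields (`user_id`, `timestamp_ms`) are illustrative, not PredictionIO's schema:

```python
def sanity_check(batch, now_ms):
    """Return (ok, reason) for a batch of event dicts.

    Rejects empty batches, events without a user ID, and events
    stamped in the future (a common sign of clock or ETL bugs).
    """
    if not batch:
        return False, "empty batch"
    for event in batch:
        if not event.get("user_id"):
            return False, "event with missing user_id"
        if event.get("timestamp_ms", 0) > now_ms:
            return False, "timestamp in the future"
    return True, "ok"
```

Running a check like this on every transactional write or batch import catches broken pipelines before they silently poison the model.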

SLIDE 17

Construct Features & Labels from Data

Make it algorithm-friendly!

Step 4

SLIDE 18

Construct Features & Labels from Data

Some Ways of Transformation:

  • Qualitative to Numerical
  • Normalize and Weight
  • Aggregate - Sum, Average?
  • Time Range
  • Missing Records

Different algorithms may need different things

Label-specific:

  • Delayed Feedback
  • Implicit Assumptions
  • Reliability of Explicit Opinion
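Two of the transformations above, normalize-and-weight and qualitative-to-numerical, can be sketched in a few lines of plain Python (illustrative only; the next slide shows the Spark ML equivalent for text):

```python
def min_max_normalize(xs):
    """Rescale a numeric feature to [0, 1] so features with large raw
    ranges don't dominate distance-based algorithms."""
    lo, hi = min(xs), max(xs)
    if hi == lo:  # constant feature: no information, map everything to 0
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def one_hot(value, vocabulary):
    """Qualitative to numerical: one-hot encode a categorical value
    against a fixed vocabulary of categories."""
    return [1.0 if v == value else 0.0 for v in vocabulary]
```

For example, `one_hot("tablet", ["phone", "tablet", "desktop"])` turns a device category into a vector an algorithm can consume.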
SLIDE 19

Construct Features & Labels from Data: Qualitative to Numerical

```scala
// Example from Spark ML website - TF-IDF
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

val sentenceData = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (0, "I wish Java could use case classes"),
  (1, "Logistic regression models are neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)

val hashingTF = new HashingTF()
  .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)

val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
```

SLIDE 20

Set Evaluation Metrics

Measure things that matter

Step 5

SLIDE 21

Set Evaluation Metrics

Some Challenges:

  • How to Define an Offline Evaluation that Reflects the Real Business Goal?

  • Delayed Feedback (again)
  • How to Present The Results to Everyone?
  • How to Do Live A/B Test?
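For the live A/B test, one common building block is deterministic bucketing: hash the user ID so the same user always sees the same variant, with no assignment table to store. A sketch in plain Python (an assumption about one reasonable design, not how PredictionIO does it):

```python
import hashlib

def ab_bucket(user_id, num_buckets=2):
    """Deterministically map a user to a bucket via a stable hash."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def variant(user_id):
    """Assign a user to the control or treatment model."""
    return "control" if ab_bucket(user_id) == 0 else "treatment"
```

Because the assignment is a pure function of the user ID, serving stays stateless and the split is reproducible when analyzing results later.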
SLIDE 22

Clarify “Real-time”

The same word can mean different things

Step 6

SLIDE 23

Clarify “Real-time”

Different Needs:

  • Batch Model Update, Batch Queries
  • Batch Model Update, Real-time Queries
  • Real-time Model Update, Real-time Queries

When to Train/Re-train for Batch?

SLIDE 24

Find the Right Model

The “cool” modeling part - algorithms and hyperparameters

Step 7

SLIDE 25

Find the Right Model

Example of model hyperparameter selection

```scala
// Example from Spark ML website
// We use a ParamGridBuilder to construct a grid of parameters to search over.
val paramGrid = new ParamGridBuilder()
  .addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .build()

// Note that the evaluator here is a BinaryClassificationEvaluator and its
// default metric is areaUnderROC.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(2) // Use 3+ in practice

// Run cross-validation, and choose the best set of parameters.
val cvModel = cv.fit(training)
```

SLIDE 26

Find the Right Model

Some Typical Challenges:

  • Classification, Regression, Recommendation, or Something Else?
  • Overfitting / Underfitting
  • Cold-Start (New Users/Items)
  • Data Size
  • Noise
SLIDE 27

Serve Predictions

Time to Use the Result

Step 8

SLIDE 28

Serve Predictions

Some Approaches:

  • Real-time Scoring
  • Batch Scoring

Real-time business logic/filters are often added on top.
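Those business filters typically run between the model's raw scores and the user. A minimal sketch in plain Python; the item fields (`item_id`, `score`, `in_stock`) are illustrative:

```python
def serve(scored_items, n):
    """Apply a business filter (here: only in-stock items) on top of raw
    model scores, then return the top-n item IDs by score."""
    eligible = [s for s in scored_items if s["in_stock"]]
    eligible.sort(key=lambda s: s["score"], reverse=True)
    return [s["item_id"] for s in eligible[:n]]
```

The same shape works for batch scoring: precompute `scored_items` offline, then apply the (cheap, fast-changing) business rules at request time.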
SLIDE 29

Collect Feedback for Improvement

Machine Learning is all about “Learning”

Step 9

SLIDE 30

Collect Feedback for Improvement

Some Mechanisms:

  • Explicit User Feedback

▫ Allow users to correct, or express opinions on, predictions manually
  • Implicit Feedback

▫ Learn from the subsequent effects of previous predictions
▫ Compare predicted results with actual reality

SLIDE 31

Keep Monitoring & Be Crisis-Ready

Make sure things are still working

Step 10

SLIDE 32

Keep Monitoring & Be Crisis-Ready

Some Ideas:

  • Real-time Alerts (email, Slack, PagerDuty)
  • Daily Reports
  • Possibly integrate with existing monitoring tools
  • Ready for production rollback
  • Version control

For both prediction accuracy and production issues.
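An accuracy-side alert can be as small as comparing a rolling metric against its baseline. A hedged sketch in plain Python; the 0.05 tolerance is an illustrative choice, not a recommendation from the talk:

```python
def should_alert(baseline_accuracy, current_accuracy, max_drop=0.05):
    """Flag when a quality metric falls more than max_drop below its
    baseline, e.g. after a bad model deploy or a broken data feed."""
    return (baseline_accuracy - current_accuracy) > max_drop
```

Wiring a check like this into the daily report (or a real-time alert channel) covers the "prediction accuracy" half of monitoring; ordinary infrastructure monitoring covers the production half.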

SLIDE 33

Summary: Processes We Need to Simplify

  1. Define the Prediction Problem
  2. Decide on the Presentation
  3. Import Free-form Data Sources
  4. Construct Features & Labels from Data
  5. Set Evaluation Metrics
  6. Clarify “Real-Time”
  7. Find the Right Model
  8. Serve Predictions
  9. Collect Feedback for Improvement
  10. Keep Monitoring & Be Crisis-Ready

SLIDE 34

The Future of A.I.

is the automation of A.I.

SLIDE 35

Thanks! Any Questions?

WE ARE HIRING.

simon@salesforce.com @simonchannet