Simplifying ML Workflows with Apache Beam & TensorFlow Extended

SLIDE 1

Simplifying ML Workflows with Apache Beam & TensorFlow Extended

Tyler Akidau (@takidau)
Software Engineer at Google
Apache Beam PMC

SLIDE 2

Apache Beam

Portable data-processing pipelines

SLIDE 3

Example pipelines

Java, Python

SLIDE 4

Cross-language Portability Framework

[Diagram: pipelines written in Language A, B, or C use the matching Language A, B, or C SDK. The SDKs and Runners 1, 2, and 3 are connected through the Beam Model.]

SLIDE 5

Python compatible runners

  • Direct runner (local machine): Now
  • Google Cloud Dataflow: Now
  • Apache Flink: Q2-Q3
  • Apache Spark: Q3-Q4

SLIDE 6

TensorFlow Extended

End-to-end machine learning in production

SLIDE 7

“Doing ML in production is hard.”
  - Everyone who has ever tried

SLIDE 8

Because, in addition to the actual ML...

ML Code

SLIDE 9

...you have to worry about so much more.

  • Configuration
  • Data Collection
  • Data Verification
  • Feature Extraction
  • Process Management Tools
  • Analysis Tools
  • Machine Resource Management
  • Serving Infrastructure
  • Monitoring
  • ML Code

Source: Sculley et al., “Hidden Technical Debt in Machine Learning Systems”

SLIDE 10

In this talk, I will...

SLIDE 11

In this talk, I will...

Show you how to apply transformations...

TensorFlow Transform

SLIDE 12

In this talk, we will...

Show you how to apply transformations...
... consistently between Training and Serving

TensorFlow Transform, TensorFlow Estimators, TensorFlow Serving

SLIDE 13

In this talk, we will...

Introduce something new...

TensorFlow Transform, TensorFlow Estimators, TensorFlow Serving, TensorFlow Model Analysis

SLIDE 14

TensorFlow Transform

Consistent In-Graph Transformations in Training and Serving

SLIDE 15

Typical ML Pipeline

During training: data -> batch processing
During serving: request -> “live” processing


SLIDE 17

TensorFlow Transform

During training: data -> batch processing with tf.Transform
During serving: request -> transform as tf.Graph

SLIDE 18

Defining a preprocessing function in TF Transform

    def preprocessing_fn(inputs):
        x = inputs['X']
        ...
        return {
            "A": tft.bucketize(tft.normalize(x) * y),
            "B": tensorflow_fn(y, z),
            "C": tft.ngrams(z)
        }

Many operations are available for dealing with text and numeric features; users can also define their own.

Graph inputs: X, Y, Z

SLIDE 19

Defining a preprocessing function in TF Transform

Graph so far: inputs X, Y, Z; nodes mean, stddev, normalize

SLIDE 20

Defining a preprocessing function in TF Transform

Graph so far: inputs X, Y, Z; nodes mean, stddev, normalize, multiply

SLIDE 21

Defining a preprocessing function in TF Transform

Graph so far: inputs X, Y, Z; nodes mean, stddev, normalize, multiply, quantiles, bucketize; output A

SLIDE 22

Defining a preprocessing function in TF Transform

Graph so far: inputs X, Y, Z; nodes mean, stddev, normalize, multiply, quantiles, bucketize; outputs A, B

SLIDE 23

Defining a preprocessing function in TF Transform

Complete graph: inputs X, Y, Z; nodes mean, stddev, normalize, multiply, quantiles, bucketize; outputs A, B, C

SLIDE 24

Graph nodes: mean, stddev, normalize, multiply, quantiles, bucketize

Analyzers
  • Reduce (full pass)
  • Implemented as a distributed data pipeline

Transforms
  • Instance-to-instance (don’t change batch dimension)
  • Pure TensorFlow
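
The analyzer/transform split can be illustrated without TensorFlow. The sketch below is plain Python, not the tf.Transform API: the function names (`analyze_mean_stddev`, `analyze_quantiles`, `scale_and_multiply`, `make_transform`) and the toy data are assumptions made for illustration. Analyzers take a full pass over a column and reduce it to constants; the transform then works one instance at a time using only those constants, so the same function can be reused at serving time.

```python
import bisect
import math


def analyze_mean_stddev(xs):
    """Analyzer (hypothetical helper): full pass, reduces a column to two constants."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, math.sqrt(variance)


def analyze_quantiles(values, num_buckets):
    """Analyzer (hypothetical helper): full pass, reduces a column to bucket boundaries."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // num_buckets] for i in range(1, num_buckets)]


def scale_and_multiply(x, y, mean, stddev):
    """Instance-to-instance step: normalize x with fixed constants, then multiply by y."""
    return (x - mean) / stddev * y


def make_transform(mean, stddev, boundaries):
    """Build a per-instance transform that only depends on precomputed constants."""
    def transform(x, y):
        value = scale_and_multiply(x, y, mean, stddev)
        return bisect.bisect_right(boundaries, value)  # bucket index
    return transform


# Toy training data (assumed values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 2.0, 2.0]

# Analyze phase: full passes over the training data.
mean, stddev = analyze_mean_stddev(xs)
scaled = [scale_and_multiply(x, y, mean, stddev) for x, y in zip(xs, ys)]
boundaries = analyze_quantiles(scaled, num_buckets=2)

# Transform phase: the same function is applied in training and in serving.
transform = make_transform(mean, stddev, boundaries)
training_features = [transform(x, y) for x, y in zip(xs, ys)]
serving_feature = transform(3.0, 2.0)
```

Because the serving request goes through the same `transform` closure that produced the training features, the two cannot drift apart, which is the consistency property this part of the talk is about.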

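SLIDE 25

Analyze

[Diagram: the Analyze step runs the full graph (mean, stddev, normalize, multiply, quantiles, bucketize) over the data and replaces the analyzers with constant tensors, leaving only normalize, multiply, and bucketize.]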

SLIDE 26

What can be done with TF Transform?

Pretty much anything.

[Diagram: tf.Transform batch processing]

SLIDE 27

What can be done with TF Transform?

Pretty much anything.

Anything that can be expressed as a TensorFlow Graph

[Diagram: tf.Transform batch processing; Serving Graph]

SLIDE 28

Some common use-cases...

  • Scale to ...
  • Bag of Words / N-Grams
  • Bucketization
  • Feature Crosses
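
For readers unfamiliar with these feature types, here is a minimal plain-Python sketch of n-grams, bag of words, and a feature cross. It does not use tf.Transform; the helper names (`ngrams`, `bag_of_words`, `feature_cross`), the `_x_` separator, and the sample values are assumptions chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n_min, n_max, separator=" "):
    """Return all contiguous n-grams with n_min <= n <= n_max."""
    result = []
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            result.append(separator.join(tokens[start:start + n]))
    return result


def bag_of_words(tokens):
    """Return token counts, ignoring word order."""
    return Counter(tokens)


def feature_cross(a, b, separator="_x_"):
    """Combine two categorical values into a single crossed category."""
    return f"{a}{separator}{b}"


tokens = "the quick brown fox".split()
unigrams_and_bigrams = ngrams(tokens, 1, 2)
counts = bag_of_words(["spam", "ham", "spam"])
crossed = feature_cross("US", "mobile")
```

In a real pipeline the vocabulary for the crossed or n-gram features would typically be computed in a full pass over the data, while the per-example string manipulation shown here is the instance-level part.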

SLIDE 29

Some common use-cases...

  • Scale to ...
  • Bag of Words / N-Grams
  • Bucketization
  • Feature Crosses
  • Apply another TensorFlow Model

SLIDE 30

github.com/tensorflow/transform

SLIDE 31

Introducing…

TensorFlow Model Analysis

Scalable, sliced, and full-pass metrics

SLIDE 32

Let’s Talk about Metrics...

  • How accurate?
  • Converged model?
  • What about my TB sized eval set?
  • Slices / subsets?
  • Across model versions?
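
To make "slices / subsets" concrete, here is a small plain-Python sketch of a sliced metric. It is not the TensorFlow Model Analysis API; the `sliced_accuracy` function, the `country` slice key, the `<overall>` label, and the toy examples are all assumptions for illustration. A single pass over the evaluation set accumulates counts for the whole set and for each slice.

```python
from collections import defaultdict

OVERALL = "<overall>"  # assumed label for the unsliced metric


def sliced_accuracy(examples, slice_key):
    """Compute accuracy over all examples and per value of slice_key in one pass."""
    totals = defaultdict(lambda: [0, 0])  # slice name -> [correct, count]
    for example in examples:
        is_correct = int(example["label"] == example["prediction"])
        for name in (OVERALL, example[slice_key]):
            totals[name][0] += is_correct
            totals[name][1] += 1
    return {name: correct / count for name, (correct, count) in totals.items()}


# Toy evaluation set (assumed values).
examples = [
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "US", "label": 0, "prediction": 0},
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "CA", "label": 1, "prediction": 0},
    {"country": "CA", "label": 0, "prediction": 0},
]

metrics = sliced_accuracy(examples, slice_key="country")
```

The overall number can look healthy while one slice performs much worse, which is why the per-slice view matters. Because the accumulator is just a pair of counts per slice, this kind of computation also partitions naturally across workers for evaluation sets that do not fit on one machine.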

SLIDE 33

ML Fairness: analyzing model mistakes by subgroup

[Figure: ROC curve. Axes: 1 - Specificity (False Positive Rate) and Sensitivity (True Positive Rate). Curve shown: All groups.]

Learn more at ml-fairness.com

SLIDE 34

ML Fairness: analyzing model mistakes by subgroup

[Figure: ROC curve. Axes: 1 - Specificity (False Positive Rate) and Sensitivity (True Positive Rate). Curves shown: All groups, Group A, Group B.]

Learn more at ml-fairness.com
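
Each point on an ROC curve is a (false positive rate, true positive rate) pair at one decision threshold. The plain-Python sketch below computes that pair per subgroup at a single threshold. The `rates_at_threshold` and `rates_by_group` helpers, the group names, the scores, and the 0.5 threshold are assumptions for illustration, not part of any library or of the talk's data.

```python
def rates_at_threshold(rows, threshold):
    """Return (true positive rate, false positive rate) for (label, score) rows."""
    tp = sum(1 for label, score in rows if label == 1 and score >= threshold)
    fn = sum(1 for label, score in rows if label == 1 and score < threshold)
    fp = sum(1 for label, score in rows if label == 0 and score >= threshold)
    tn = sum(1 for label, score in rows if label == 0 and score < threshold)
    tpr = tp / (tp + fn) if (tp + fn) else None  # sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else None  # 1 - specificity
    return tpr, fpr


def rates_by_group(data, threshold):
    """Compute rates for all rows together and for each group separately."""
    groups = {"all": [(label, score) for _, label, score in data]}
    for group, label, score in data:
        groups.setdefault(group, []).append((label, score))
    return {name: rates_at_threshold(rows, threshold) for name, rows in groups.items()}


# Toy scored examples as (group, label, score); values are assumed.
data = [
    ("A", 1, 0.9), ("A", 1, 0.7), ("A", 0, 0.2), ("A", 0, 0.6),
    ("B", 1, 0.8), ("B", 1, 0.4), ("B", 0, 0.3), ("B", 0, 0.1),
]

result = rates_by_group(data, threshold=0.5)
```

In this toy data the same threshold gives Group A a higher true positive rate and a higher false positive rate than Group B, a difference that the combined "all" figure hides.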

SLIDE 35

ML Fairness: understand the failure modes of your models

SLIDE 36

ML Fairness: Learn More ml-fairness.com

SLIDE 37

How does it work?

    ...
    estimator = DNNLinearCombinedClassifier(...)
    estimator.train(...)
    estimator.export_savedmodel(
        serving_input_receiver_fn=serving_input_fn)
    tfma.export.export_eval_savedmodel(
        estimator=estimator,
        eval_input_receiver_fn=eval_input_fn)
    ...

Output: Inference Graph (SavedModel) with SignatureDef

SLIDE 38

How does it work?

Outputs:
  • Inference Graph (SavedModel): SignatureDef
  • Eval Graph (SavedModel): SignatureDef, Eval Metadata

SLIDE 39

github.com/tensorflow/model-analysis

SLIDE 40

Summary

Apache Beam: Data-processing framework that runs locally and scales to massive data, in the Cloud (now) and soon on-premise via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below.

tf.Transform: Consistent in-graph transformations in training and serving.

tf.ModelAnalysis: Scalable, sliced, and full-pass metrics.