TensorFlow Extended (TFX): An End-to-End ML Platform, Clemens Mewald



slide-1
SLIDE 1

TensorFlow Extended (TFX)

An End-to-End ML Platform Clemens Mewald

slide-2
SLIDE 2

Figure 1: High-level component overview of a machine learning platform.

Data Ingestion Data Analysis + Validation Data Transformation Trainer Model Evaluation and Validation Serving Logging Shared Utilities for Garbage Collection, Data Access Controls Pipeline Storage Tuner Shared Configuration Framework and Job Orchestration Integrated Frontend for Job Management, Monitoring, Debugging, Data/Model/Evaluation Visualization

TensorFlow Extended (TFX): An End-to-End ML Platform

slide-3
SLIDE 3

TFX powers our most important bets and products...

(incl. )

Major Products AlphaBets

slide-4
SLIDE 4

Figure 1: High-level component overview of a machine learning platform.

Data Ingestion TensorFlow Data Validation TensorFlow Transform Estimator or Keras Model TensorFlow Model Analysis TensorFlow Serving Logging Shared Utilities for Garbage Collection, Data Access Controls Pipeline Storage Tuner Shared Configuration Framework and Job Orchestration Integrated Frontend for Job Management, Monitoring, Debugging, Data/Model/Evaluation Visualization

So far, we’ve made some of our libraries available.

slide-5
SLIDE 5

… and some of our most important partners.

slide-6
SLIDE 6

Figure 1: High-level component overview of a machine learning platform.

Data Ingestion TensorFlow Data Validation TensorFlow Transform Estimator or Keras Model TensorFlow Model Analysis TensorFlow Serving Logging Shared Utilities for Garbage Collection, Data Access Controls Pipeline Storage Tuner Shared Configuration Framework and Job Orchestration Integrated Frontend for Job Management, Monitoring, Debugging, Data/Model/Evaluation Visualization

So far, we’ve made some of our libraries available.

slide-7
SLIDE 7

Figure 1: High-level component overview of a machine learning platform.

Data Ingestion TensorFlow Data Validation TensorFlow Transform Estimator or Keras Model TensorFlow Model Analysis TensorFlow Serving Logging Shared Utilities for Garbage Collection, Data Access Controls Pipeline Storage Tuner Shared Configuration Framework and Job Orchestration Integrated Frontend for Job Management, Monitoring, Debugging, Data/Model/Evaluation Visualization

Today, we share the horizontal layers that integrate libraries in one product.

slide-8
SLIDE 8

Building Components out of Libraries

Data Ingestion TensorFlow Transform Estimator or Keras Model TensorFlow Model Analysis Honoring Validation Outcomes TensorFlow Data Validation TensorFlow Serving ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher Model Server

slide-9
SLIDE 9

What makes a Component

Model Validator

Packaged binary or container

slide-10
SLIDE 10

What makes a Component

Last Validated Model New (Candidate) Model

Model Validator

Validation Outcome

Well defined inputs and outputs

slide-11
SLIDE 11

What makes a Component

Config Last Validated Model New (Candidate) Model

Model Validator

Validation Outcome

Well defined configuration

slide-12
SLIDE 12

Metadata Store

What makes a Component

Config Last Validated Model New (Candidate) Model

Model Validator

Validation Outcome

Context

slide-13
SLIDE 13

Metadata Store

What makes a Component

Trainer

Config Last Validated Model New (Candidate) Model New Model

Model Validator

Validation Outcome

Pusher

New (Candidate) Model Validation Outcome TensorFlow Serving

slide-14
SLIDE 14

Metadata Store? That’s new

slide-15
SLIDE 15

Trainer

Metadata Store? That’s new

Task-Aware Pipelines

Transform

slide-16
SLIDE 16

Trainer

Metadata Store? That’s new

Task-Aware Pipelines

Input Data Transformed Data Trained Models Serving System

Task- and Data-Aware Pipelines

Pipeline + Metadata Storage Training Data

Transform Trainer Transform

slide-17
SLIDE 17

What’s in the Metadata Store?

Trained Models

Type definitions of Artifacts and their Properties

E.g., Models, Data, Evaluation Metrics

slide-18
SLIDE 18

What’s in the Metadata Store?

Trained Models

Type definitions of Artifacts and their Properties

E.g., Models, Data, Evaluation Metrics

Trainer

Execution Records (Runs) of Components

E.g., Runtime Configuration, Inputs + Outputs

slide-19
SLIDE 19

What’s in the Metadata Store?

Trained Models

Type definitions of Artifacts and their Properties

E.g., Models, Data, Evaluation Metrics

Trainer

Execution Records (Runs) of Components

E.g., Runtime Configuration, Inputs + Outputs

Lineage Tracking Across All Executions

E.g., to recurse back to all inputs of a specific artifact
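
The lineage idea can be illustrated with a small in-memory sketch. This is not the TFX Metadata Store API: the `MetadataStore` class, its method names, and the artifact names are hypothetical, chosen only to show how recording each execution's inputs and outputs makes it possible to recurse from an artifact back to everything it was derived from.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """One recorded run of a component (hypothetical record layout)."""
    component: str
    inputs: list
    outputs: list
    config: dict = field(default_factory=dict)


class MetadataStore:
    """Toy stand-in for a metadata store; not the real TFX API."""

    def __init__(self):
        self.executions = []

    def publish(self, component, inputs, outputs, config=None):
        """Record one execution with its input and output artifacts."""
        self.executions.append(
            Execution(component, list(inputs), list(outputs), config or {}))

    def producer_of(self, artifact):
        """Return the execution that produced `artifact`, or None."""
        for execution in self.executions:
            if artifact in execution.outputs:
                return execution
        return None

    def lineage(self, artifact):
        """Recurse back to all upstream artifacts of `artifact`."""
        upstream = set()
        producer = self.producer_of(artifact)
        if producer is None:
            return upstream
        for parent in producer.inputs:
            upstream.add(parent)
            upstream |= self.lineage(parent)
        return upstream


store = MetadataStore()
store.publish("ExampleGen", ["raw_csv"], ["examples"])
store.publish("Transform", ["examples", "schema"], ["transformed_examples"])
store.publish("Trainer", ["transformed_examples", "schema"], ["model_v1"],
              config={"train_steps": 10000})

print(sorted(store.lineage("model_v1")))
```

Because every execution is recorded with its inputs, the question "which data was this model trained on?" becomes a graph walk rather than a search through logs.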

slide-20
SLIDE 20

List all training runs and attributes

slide-21
SLIDE 21

Visualize lineage of a specific model

Model artifact that was created

slide-22
SLIDE 22

Visualize data a model was trained on

slide-23
SLIDE 23

Visualize sliced eval metrics associated with a model

slide-24
SLIDE 24

Launch TensorBoard for a specific model run

slide-25
SLIDE 25

Launch TensorBoard to compare multiple model runs

slide-26
SLIDE 26

Compare data statistics for multiple models

slide-27
SLIDE 27

Examples of Metadata-Powered Functionality

Use-cases enabled by lineage tracking

slide-28
SLIDE 28

Examples of Metadata-Powered Functionality

Use-cases enabled by lineage tracking Compare previous model runs

slide-29
SLIDE 29

Examples of Metadata-Powered Functionality

Use-cases enabled by lineage tracking Compare previous model runs Carry-over state from previous models

slide-30
SLIDE 30

Examples of Metadata-Powered Functionality

Use-cases enabled by lineage tracking Compare previous model runs Carry-over state from previous models Re-use previously computed outputs
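
Re-using previously computed outputs can be sketched as a lookup keyed on what went into an execution. The sketch below is an assumption about how such a cache could work, not TFX's implementation: `execution_key` and `CachingRunner` are hypothetical names, and the "executor" is a plain function.

```python
import hashlib
import json


def execution_key(component, input_uris, config):
    """Fingerprint an execution by component name, inputs, and config."""
    payload = json.dumps(
        {"component": component, "inputs": sorted(input_uris), "config": config},
        sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachingRunner:
    """Runs an executor only if no identical execution was recorded before."""

    def __init__(self):
        self._cache = {}
        self.executed = []  # names of components that actually ran

    def run(self, component, input_uris, config, executor):
        key = execution_key(component, input_uris, config)
        if key in self._cache:
            return self._cache[key]  # cache hit: skip the computation
        output = executor(input_uris, config)
        self.executed.append(component)
        self._cache[key] = output
        return output


def fake_transform(input_uris, config):
    """Placeholder executor; a real one would do expensive work."""
    return "transformed/" + config["version"]


runner = CachingRunner()
first = runner.run("Transform", ["examples/1"], {"version": "a"}, fake_transform)
second = runner.run("Transform", ["examples/1"], {"version": "a"}, fake_transform)
third = runner.run("Transform", ["examples/2"], {"version": "a"}, fake_transform)
print(runner.executed)
```

The second call is served from the cache because inputs and configuration are unchanged; the third runs again because its input differs.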

slide-31
SLIDE 31

How do we orchestrate TFX?

Component Legend

slide-32
SLIDE 32

How do we orchestrate TFX?

Component Legend Metadata Store Driver and Publisher

slide-33
SLIDE 33

How do we orchestrate TFX?

Component Executor Legend Metadata Store Driver and Publisher

slide-34
SLIDE 34

How do we orchestrate TFX?

TFX Config Component Executor Legend Metadata Store Driver and Publisher
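
One way to read the legend: a driver resolves a component's inputs from the metadata store, an executor does the work, and a publisher writes the outputs back. The sketch below illustrates that split under stated assumptions. The `Component` class is hypothetical and a plain dictionary stands in for the metadata store.

```python
class Component:
    """Hypothetical component wrapper: driver -> executor -> publisher."""

    def __init__(self, name, input_keys, output_key, executor):
        self.name = name
        self.input_keys = input_keys
        self.output_key = output_key
        self.executor = executor

    def run(self, store):
        # Driver: look up the artifacts this component depends on.
        inputs = {key: store[key] for key in self.input_keys}
        # Executor: perform the actual computation.
        result = self.executor(**inputs)
        # Publisher: record the output so downstream components can find it.
        store[self.output_key] = result
        return result


store = {"raw": [3, 1, 2]}  # a dict standing in for the metadata store
steps = [
    Component("Sorter", ["raw"], "ordered", lambda raw: sorted(raw)),
    Component("Max", ["ordered"], "largest", lambda ordered: ordered[-1]),
]
for step in steps:
    step.run(store)

print(store["largest"])
```

Because components only communicate through the store, an orchestrator is free to decide when and where each one runs.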

slide-35
SLIDE 35

def create_pipeline():
    """Implements the chicago taxi pipeline with TFX."""
    examples = csv_input(os.path.join(data_root, 'simple'))
    example_gen = CsvExampleGen(input_base=examples)
    statistics_gen = StatisticsGen(input_data=...)
    infer_schema = SchemaGen(stats=...)
    validate_stats = ExampleValidator(stats=..., schema=...)
    transform = Transform(input_data=..., schema=..., module_file=...)
    trainer = Trainer(
        module_file=taxi_module_file,
        transformed_examples=transform.outputs.transformed_examples,
        transform_output=transform.outputs.transform_output,
        schema=infer_schema.outputs.output,
        train_steps=10000,
        eval_steps=5000,
        warm_starting=True)
    model_analyzer = Evaluator(examples=..., model_exports=...)
    model_validator = ModelValidator(examples=..., model=...)
    pusher = Pusher(model_export=..., model_blessing=..., serving_model_dir=...)
    return [example_gen, statistics_gen, infer_schema, validate_stats,
            transform, trainer, model_analyzer, model_validator, pusher]

pipeline = TfxRunner(airflow_config).run(create_pipeline())

slide-36
SLIDE 36

def train_and_maybe_evaluate(hparams):
    schema = taxi.read_schema(hparams.schema_file)
    train_input = lambda: model.input_fn(...)
    eval_input = lambda: model.input_fn(...)
    serving_receiver_fn = lambda: model.example_serving_receiver_fn(...)
    train_spec = tf.estimator.TrainSpec(...)
    eval_spec = tf.estimator.EvalSpec(...)
    exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
    run_config = tf.estimator.RunConfig(
        save_checkpoints_steps=999, keep_checkpoint_max=1)
    estimator = model.build_estimator(...)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
    return estimator

slide-37
SLIDE 37

def build_estimator(tf_transform_dir, config, hidden_units=None):
    # receive Schema and Transform metadata
    metadata_dir = os.path.join(tf_transform_dir,
                                transform_fn_io.TRANSFORMED_METADATA_DIR)
    transformed_metadata = metadata_io.read_metadata(metadata_dir)
    transformed_feature_spec = transformed_metadata.schema.as_feature_spec()
    transformed_feature_spec.pop(taxi.transformed_name(taxi.LABEL_KEY))
    real_valued_columns = [...]
    categorical_columns = [...]
    return tf.estimator.DNNLinearCombinedClassifier(
        config=config,
        linear_feature_columns=categorical_columns,
        dnn_feature_columns=real_valued_columns,
        dnn_hidden_units=hidden_units or [100, 70, 50, 25])

slide-38
SLIDE 38

How do we orchestrate TFX?

TFX Config Component Executor Legend Metadata Store Driver and Publisher

slide-39
SLIDE 39

Your own runtime ... Kubeflow Runtime

Bring your very own favorite orchestrator

Metadata Store Component Driver and Publisher Executor Legend Airflow Runtime TFX Config

slide-40
SLIDE 40

Examples of orchestrated TFX pipelines

Airflow Kubeflow Pipelines

slide-41
SLIDE 41

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

TFX: Putting it all together.

Airflow Runtime

slide-42
SLIDE 42

Get started with TensorFlow Extended (TFX)

An End-to-End ML Platform

github.com/tensorflow/tfx tensorflow.org/tfx

slide-43
SLIDE 43

TFX End-to-End Example

Chicago Taxi Cab Dataset

slide-44
SLIDE 44

TFX End-to-End Example

Categorical Features trip_start_hour trip_start_day trip_start_month pickup/dropoff_census_tract pickup/dropoff_community_area Dense Float Features trip_miles fare trip_seconds Bucket Features pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude Vocab Features payment_type company

Chicago Taxi Cab Dataset

Features Label = tips > (fare * 20%)
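
The label rule on this slide is simple enough to write out directly. The function below is a plain-Python sketch of it; the name `big_tip_label` is made up for illustration, and treating a missing (NaN) fare as label 0 follows the `tf.where(tf.is_nan(taxi_fare), ...)` branch in the Transform code shown later in the deck.

```python
import math


def big_tip_label(tips, fare):
    """Return 1 if the tip exceeds 20% of the fare, else 0.

    A missing fare (NaN) yields 0, mirroring the deck's Transform code.
    """
    if math.isnan(fare):
        return 0
    return int(tips > fare * 0.2)


print(big_tip_label(3.0, 10.0), big_tip_label(2.0, 10.0))
```

Note that the comparison is strict, so a tip of exactly 20% is labeled 0.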

slide-45
SLIDE 45

TFX End-to-End Example

Chicago Taxi Cab Dataset

string_to_int bucketize scale_to_z_score

Features Transforms Label = tips > (fare * 20%)

Categorical Features trip_start_hour trip_start_day trip_start_month pickup/dropoff_census_tract pickup/dropoff_community_area Dense Float Features trip_miles fare trip_seconds Bucket Features pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude Vocab Features payment_type company

slide-46
SLIDE 46

TFX End-to-End Example

Chicago Taxi Cab Dataset

string_to_int bucketize scale_to_z_score

Features Transforms Model (Wide+Deep) Label = tips > (fare * 20%)

Categorical Features trip_start_hour trip_start_day trip_start_month pickup/dropoff_census_tract pickup/dropoff_community_area Dense Float Features trip_miles fare trip_seconds Bucket Features pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude Vocab Features payment_type company

slide-47
SLIDE 47

TFX End-to-End Example

Chicago Taxi Cab Dataset

string_to_int bucketize scale_to_z_score

Features Transforms Model (Wide+Deep)

Label = tips > (fare * 20%)

Categorical Features trip_start_hour trip_start_day trip_start_month pickup/dropoff_census_tract pickup/dropoff_community_area Dense Float Features trip_miles fare trip_seconds Bucket Features pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude Vocab Features payment_type company

slide-48
SLIDE 48

Data Validation and Transformation

Clemens Mewald

slide-49
SLIDE 49

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Overview

Airflow Runtime

Data Ingestion Data Transformation Data Analysis & Validation

slide-50
SLIDE 50

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

ExampleGen

Airflow Runtime

slide-51
SLIDE 51

Component: ExampleGen

Example Gen Raw Data

Inputs and Outputs

CSV TF Record Split TF Record Data Training Eval
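
The slide does not say how the training/eval split is made. One common approach, sketched here purely as an assumption, is to hash each record into buckets so that the assignment is deterministic across runs. The function name `assign_split`, the use of CRC32, and the 2:1 ratio are illustrative choices, not taken from TFX.

```python
import zlib
from collections import Counter


def assign_split(record, eval_buckets=1, total_buckets=3):
    """Deterministically map a raw record (a string) to 'train' or 'eval'."""
    bucket = zlib.crc32(record.encode("utf-8")) % total_buckets
    return "eval" if bucket < eval_buckets else "train"


# Synthetic CSV-like rows, only for demonstration.
rows = [f"trip-{i},cash,{i * 1.5}" for i in range(300)]
print(Counter(assign_split(row) for row in rows))
```

Hashing the record content, rather than drawing random numbers, means the same record always lands in the same split when the pipeline is re-run.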

slide-52
SLIDE 52

Component: ExampleGen

examples = csv_input(os.path.join(data_root, 'simple'))
example_gen = CsvExampleGen(input_base=examples)

Configuration

Example Gen Raw Data

Inputs and Outputs

CSV TF Record Split TF Record Data Training Eval

slide-53
SLIDE 53

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Data Analysis & Validation

Airflow Runtime

Data Analysis & Validation

slide-54
SLIDE 54

Why Data Validation is important

ML

slide-55
SLIDE 55

Why Data Validation is important

garbage in garbage out ML

slide-56
SLIDE 56

Why Data Validation is important

Data understanding is important for model understanding

“Why are my tip predictions bad in the morning hours?”

slide-57
SLIDE 57

Why Data Validation is important

Data understanding is important for model understanding

“What are expected values for payment types?”

Treat data as you treat code

slide-58
SLIDE 58

Why Data Validation is important

Data understanding is important for model understanding

“Is this new taxi company name a typo or a new company?”

Treat data as you treat code Catching errors early is critical

slide-59
SLIDE 59

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

StatisticsGen

Airflow Runtime

slide-60
SLIDE 60

Component: StatisticsGen

StatisticsGen Data ExampleGen

Inputs and Outputs

Statistics

  • Training
  • Eval
  • Serving logs (for skew detection)
slide-61
SLIDE 61

Component: StatisticsGen

StatisticsGen Data ExampleGen

Inputs and Outputs

Statistics

  • Captures shape of data
  • Visualization highlights unusual stats
  • Overlay helps with comparison
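
To make "captures shape of data" concrete, here is a plain-Python sketch of per-feature summary statistics. It is not the TensorFlow Data Validation API; `compute_statistics` and the layout of its result are assumptions made for illustration.

```python
def compute_statistics(rows):
    """Summarize each feature: presence counts plus numeric or categorical stats."""
    features = sorted({key for row in rows for key in row})
    stats = {}
    for name in features:
        present = [row[name] for row in rows if row.get(name) is not None]
        entry = {"num_present": len(present),
                 "num_missing": len(rows) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            entry.update(type="numeric", min=min(present), max=max(present),
                         mean=sum(present) / len(present))
        else:
            entry.update(type="categorical",
                         unique=sorted(set(map(str, present))))
        stats[name] = entry
    return stats


# Tiny made-up sample with one missing company value.
rows = [
    {"trip_miles": 1.0, "company": "A"},
    {"trip_miles": 3.0, "company": None},
    {"trip_miles": 2.0, "company": "B"},
]
stats = compute_statistics(rows)
print(stats["trip_miles"]["mean"], stats["company"]["num_missing"])
```

Computing the same summary for the training, eval, and serving data is what makes the overlay comparison on the slide possible.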
slide-62
SLIDE 62

Component: StatisticsGen

statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)

Configuration Visualization

StatisticsGen Data ExampleGen

Inputs and Outputs

Statistics

slide-63
SLIDE 63

Why are my tip predictions bad in the morning hours?

slide-64
SLIDE 64

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

SchemaGen

Airflow Runtime

slide-65
SLIDE 65

Component: SchemaGen

SchemaGen Statistics StatisticsGen

Inputs and Outputs

Schema

  • High-level description of the data
      ○ Expected features
      ○ Expected value domains
      ○ Expected constraints
      ○ and much more!

  • Codifies expectations of “good” data
  • Initially inferred, then user-curated
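
"Initially inferred" can be pictured as deriving expectations from observed statistics. The sketch below assumes a simple dictionary of per-feature statistics as input. `infer_schema`, the `max_domain_size` cutoff, and the output layout are hypothetical and much simpler than what SchemaGen produces.

```python
def infer_schema(statistics, max_domain_size=10):
    """Derive a first-draft schema from per-feature statistics."""
    schema = {}
    for name, stat in statistics.items():
        feature = {"type": stat["type"],
                   "required": stat["num_missing"] == 0}
        # Only small categorical value sets are turned into a fixed domain.
        if stat["type"] == "categorical" and len(stat["unique"]) <= max_domain_size:
            feature["domain"] = list(stat["unique"])
        schema[name] = feature
    return schema


# Hand-written statistics for two features (illustrative values only).
statistics = {
    "payment_type": {"type": "categorical", "num_missing": 0,
                     "unique": ["Cash", "Credit Card"]},
    "trip_miles": {"type": "numeric", "num_missing": 2},
}
schema = infer_schema(statistics)
print(schema["payment_type"]["domain"])
```

The inferred result is only a starting point; the "user-curated" step is where someone decides, for example, that `trip_miles` really should be required.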
slide-66
SLIDE 66

Component: SchemaGen

SchemaGen Statistics StatisticsGen

Inputs and Outputs

Schema

infer_schema = SchemaGen(stats=statistics_gen.outputs.output)

Configuration Visualization

slide-67
SLIDE 67

What are expected values for payment types?

slide-68
SLIDE 68

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

ExampleValidator

Airflow Runtime

slide-69
SLIDE 69

Component: ExampleValidator

Example Validator Statistics Schema StatisticsGen SchemaGen

Inputs and Outputs

Anomalies Report

  • Missing features
  • Wrong feature valency
  • Training/serving skew
  • Data distribution drift
  • ...
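
Two of the anomaly types listed above, missing features and unexpected values, can be sketched as a comparison of new statistics against a schema. This is an illustration under assumed data layouts, not the ExampleValidator implementation. The `validate` function and the company names are made up.

```python
def validate(statistics, schema):
    """Return a list of (feature, anomaly description) pairs."""
    anomalies = []
    for name, expected in schema.items():
        observed = statistics.get(name)
        if observed is None:
            anomalies.append((name, "missing feature"))
            continue
        if expected.get("required") and observed["num_missing"] > 0:
            anomalies.append((name, "missing values in required feature"))
        domain = expected.get("domain")
        if domain is not None:
            unexpected = sorted(set(observed.get("unique", [])) - set(domain))
            if unexpected:
                anomalies.append(
                    (name, "unexpected values: " + ", ".join(unexpected)))
    return anomalies


schema = {
    "company": {"required": True, "domain": ["Blue Cab", "Taxi Affiliation"]},
    "fare": {"required": True},
}
# Statistics of a new batch: a misspelled company and no fare column at all.
new_statistics = {
    "company": {"num_missing": 0, "unique": ["Blue Cab", "Taxi Afiliation"]},
}
anomalies = validate(new_statistics, schema)
print(anomalies)
```

The report only flags the unexpected company name. Whether it is a typo or a genuinely new company is a decision for whoever curates the schema.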
slide-70
SLIDE 70

Component: ExampleValidator

Example Validator Statistics Schema StatisticsGen SchemaGen

Inputs and Outputs

Anomalies Report

validate_stats = ExampleValidator(
    stats=statistics_gen.outputs.output,
    schema=infer_schema.outputs.output)

Configuration Visualization

slide-71
SLIDE 71

Is this new taxi company name a typo or a new company?

slide-72
SLIDE 72

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Transform

Airflow Runtime

slide-73
SLIDE 73

Recap: End-to-End Example

Chicago Taxi Cab Dataset

string_to_int bucketize scale_to_z_score

Features Transforms Model (Wide+Deep) Label = tips > (fare * 20%)

Categorical Features trip_start_hour trip_start_day trip_start_month pickup/dropoff_census_tract pickup/dropoff_community_area Dense Float Features trip_miles fare trip_seconds Bucket Features pickup_latitude pickup_longitude dropoff_latitude dropoff_longitude Vocab Features payment_type company

slide-74
SLIDE 74

Using tf.Transform for feature transformations.

slide-75
SLIDE 75

Using tf.Transform for feature transformations.

slide-76
SLIDE 76

Using tf.Transform for feature transformations.

Training Serving

slide-77
SLIDE 77

Component: Transform

Transform Data Schema Transform Graph Transformed Data ExampleGen SchemaGen Trainer

Inputs and Outputs

  • User-provided transform code (TF Transform)
  • Schema for parsing

Code

slide-78
SLIDE 78

Component: Transform

Transform Data Schema Transform Graph Transformed Data ExampleGen SchemaGen Trainer

Inputs and Outputs Transform Graph

  • Applied at training time
  • Embedded in serving graph

(Optional) Transformed Data

  • For performance optimization

Code

slide-79
SLIDE 79

Component: Transform

transform = Transform(
    input_data=example_gen.outputs.examples,
    schema=infer_schema.outputs.output,
    module_file=taxi_module_file)

Configuration

for key in _DENSE_FLOAT_FEATURE_KEYS:
    outputs[_transformed_name(key)] = transform.scale_to_z_score(
        _fill_in_missing(inputs[key]))
# ...
outputs[_transformed_name(_LABEL_KEY)] = tf.where(
    tf.is_nan(taxi_fare),
    tf.cast(tf.zeros_like(taxi_fare), tf.int64),
    # Test if the tip was > 20% of the fare.
    tf.cast(
        tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))),
        tf.int64))
# ...

Code

Transform Data Schema Transform Graph Transformed Data ExampleGen SchemaGen Trainer

Inputs and Outputs

Code

slide-80
SLIDE 80

def preprocessing_fn(inputs):
    ...
    for key in taxi.DENSE_FLOAT_FEATURE_KEYS:
        outputs[key] = transform.scale_to_z_score(inputs[key])
    for key in taxi.VOCAB_FEATURE_KEYS:
        outputs[key] = transform.string_to_int(
            inputs[key], top_k=taxi.VOCAB_SIZE, num_oov_buckets=taxi.OOV_SIZE)
    for key in taxi.BUCKET_FEATURE_KEYS:
        outputs[key] = transform.bucketize(inputs[key],
                                           taxi.FEATURE_BUCKET_COUNT)
    ...
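
Transforms such as `scale_to_z_score` and `string_to_int` need statistics from a full pass over the training data (a mean, a standard deviation, a vocabulary) before they can be applied per example. The plain-Python sketch below illustrates that two-phase idea without using TF Transform. The function names `analyze` and `transform_row` are hypothetical, and mapping all out-of-vocabulary tokens to a single extra index is a simplification.

```python
import math
from collections import Counter


def analyze(rows, dense_key, vocab_key, top_k):
    """Full pass over training data: compute what the transforms need."""
    values = [row[dense_key] for row in rows]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    counts = Counter(row[vocab_key] for row in rows)
    vocab = [token for token, _ in counts.most_common(top_k)]
    return {"mean": mean, "std": math.sqrt(variance), "vocab": vocab}


def transform_row(row, stats, dense_key, vocab_key):
    """Per-example transform; the same code serves training and serving."""
    z_score = (row[dense_key] - stats["mean"]) / stats["std"]
    vocab = stats["vocab"]
    token = row[vocab_key]
    # Simplification: every out-of-vocabulary token shares one extra index.
    index = vocab.index(token) if token in vocab else len(vocab)
    return {dense_key: z_score, vocab_key: index}


training_rows = [
    {"fare": 10.0, "payment_type": "Cash"},
    {"fare": 20.0, "payment_type": "Credit Card"},
    {"fare": 30.0, "payment_type": "Cash"},
]
stats = analyze(training_rows, "fare", "payment_type", top_k=2)

# At serving time only the frozen statistics are needed, not the training data.
serving_row = {"fare": 20.0, "payment_type": "Unknown"}
print(transform_row(serving_row, stats, "fare", "payment_type"))
```

Freezing the analysis results and shipping them with the model is what keeps training and serving transformations identical.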

slide-81
SLIDE 81

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Overview

Airflow Runtime

Data Ingestion Data Transformation Data Analysis & Validation

slide-82
SLIDE 82

Training

Clemens Mewald

slide-83
SLIDE 83

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Trainer

Airflow Runtime

slide-84
SLIDE 84

Component: Trainer

Trainer Data Schema Transform SchemaGen Evaluator

Inputs and Outputs

Code Transform Graph Model Validator Pusher Model(s)

  • User-provided training code (TensorFlow)
  • Optionally, transformed data
slide-85
SLIDE 85

Component: Trainer

Trainer Data Schema Transform SchemaGen Evaluator

Inputs and Outputs

Code Transform Graph Model Validator Pusher

Highlight: SavedModel Format

TensorFlow Serving TensorFlow Model Analysis

Train, Eval, and Inference Graphs

SignatureDef

Eval Metadata SignatureDef

Model(s)

slide-86
SLIDE 86

Component: Trainer

trainer = Trainer(
    module_file=taxi_module_file,
    transformed_examples=transform.outputs.transformed_examples,
    schema=infer_schema.outputs.output,
    transform_output=transform.outputs.transform_output,
    train_steps=10000,
    eval_steps=5000,
    warm_starting=True)

Configuration Code: Just TensorFlow :)

Trainer Data Schema Transform SchemaGen Evaluator

Inputs and Outputs

Code Transform Graph Model Validator Pusher Model(s)

slide-87
SLIDE 87

def train_and_maybe_evaluate(hparams):
    schema = taxi.read_schema(hparams.schema_file)
    train_input = lambda: model.input_fn(...)
    eval_input = lambda: model.input_fn(...)
    serving_receiver_fn = lambda: model.example_serving_receiver_fn(...)
    train_spec = tf.estimator.TrainSpec(...)
    eval_spec = tf.estimator.EvalSpec(...)
    exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
    run_config = tf.estimator.RunConfig(
        save_checkpoints_steps=999, keep_checkpoint_max=1)
    estimator = model.build_estimator(...)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
    return estimator

slide-88
SLIDE 88

def build_estimator(tf_transform_dir, config, hidden_units=None):
    # receive Schema and Transform metadata
    metadata_dir = os.path.join(tf_transform_dir,
                                transform_fn_io.TRANSFORMED_METADATA_DIR)
    transformed_metadata = metadata_io.read_metadata(metadata_dir)
    transformed_feature_spec = transformed_metadata.schema.as_feature_spec()
    transformed_feature_spec.pop(taxi.transformed_name(taxi.LABEL_KEY))
    real_valued_columns = [...]
    categorical_columns = [...]
    return tf.estimator.DNNLinearCombinedClassifier(
        config=config,
        linear_feature_columns=categorical_columns,
        dnn_feature_columns=real_valued_columns,
        dnn_hidden_units=hidden_units or [100, 70, 50, 25])

slide-89
SLIDE 89

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Keras: TF 2.0

slide-90
SLIDE 90

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, input_shape=[10]),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Going big: tf.distribute.Strategy

slide-91
SLIDE 91

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(64, input_shape=[10]),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

Going big: Multi-GPU

slide-92
SLIDE 92

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(64, input_shape=[10]),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

Coming soon: Multi-node synchronous

slide-93
SLIDE 93

saved_model_path = tf.keras.experimental.export_saved_model(
    model, '/path/to/model')
new_model = tf.keras.experimental.load_from_saved_model(
    saved_model_path)
new_model.summary()

To SavedModel and beyond

slide-94
SLIDE 94

Model Evaluation and Analysis

Clemens Mewald

slide-95
SLIDE 95

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Model Analysis & Validation

Airflow Runtime

slide-96
SLIDE 96

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Evaluator

Airflow Runtime

slide-97
SLIDE 97

Why Model Evaluation is important

Assess overall model quality

“How well can I predict trips that result in tips > 20%?”

slide-98
SLIDE 98

Why Model Evaluation is important

Assess overall model quality

“Why are my tip predictions sometimes wrong?”

Assess model quality on specific segments / slices

slide-99
SLIDE 99

Why Model Evaluation is important

Assess overall model quality

“Am I getting better at predicting trips with tips > 20%?”

Assess model quality on specific segments / slices Track performance over time

slide-100
SLIDE 100

Component: Evaluator

Evaluator Data Model ExampleGen Trainer

Inputs and Outputs

Evaluation Metrics

  • Evaluation split of data
  • Eval spec for slicing of metrics
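
Slicing a metric means computing it once overall and once per value of some feature. The sketch below does this for accuracy in plain Python. It is not the TensorFlow Model Analysis API; `sliced_accuracy` and the example records are made up for illustration.

```python
from collections import defaultdict


def sliced_accuracy(examples, slice_key):
    """Accuracy overall and per value of `slice_key`."""
    totals = defaultdict(lambda: [0, 0])  # slice name -> [correct, count]
    for example in examples:
        correct = int(example["prediction"] == example["label"])
        for slice_name in ("overall", f"{slice_key}={example[slice_key]}"):
            totals[slice_name][0] += correct
            totals[slice_name][1] += 1
    return {name: correct / count for name, (correct, count) in totals.items()}


# Synthetic eval examples: predictions are wrong in the morning, right in the evening.
examples = [
    {"trip_start_hour": 8, "prediction": 1, "label": 0},
    {"trip_start_hour": 8, "prediction": 0, "label": 1},
    {"trip_start_hour": 18, "prediction": 1, "label": 1},
    {"trip_start_hour": 18, "prediction": 0, "label": 0},
]
metrics = sliced_accuracy(examples, "trip_start_hour")
print(metrics)
```

An overall accuracy of 0.5 hides the fact that one slice is always wrong, which is the kind of problem sliced metrics are meant to surface.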
slide-101
SLIDE 101

Component: Evaluator

Evaluator Data Model ExampleGen Trainer

Inputs and Outputs

Evaluation Metrics

model_analyzer = Evaluator(
    examples=example_gen.outputs.examples,
    eval_spec=taxi_eval_spec,
    model_exports=trainer.outputs.output)

Configuration Visualization

slide-102
SLIDE 102

How well can I predict trips that result in tips > 20%?

slide-103
SLIDE 103

Why are my tip predictions sometimes wrong?

slide-104
SLIDE 104

Am I getting better at predicting trips with tips > 20%?

slide-105
SLIDE 105

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

ModelValidator

Airflow Runtime

slide-106
SLIDE 106

Why Model Validation is important

Avoid pushing models with degraded quality

“Why am I suddenly getting tipped less?”

slide-107
SLIDE 107

Why Model Validation is important

Avoid pushing models with degraded quality.

“Why am I no longer getting recommendations for trips?”

Avoid breaking downstream components (e.g. serving)

slide-108
SLIDE 108

Component: ModelValidator

Model Validator Data ExampleGen Trainer

Inputs and Outputs

Validation Outcome Model (x2)

  • Evaluation split of data
  • Last validated model
  • New candidate model
slide-109
SLIDE 109

Component: ModelValidator

Model Validator Data ExampleGen Trainer

Inputs and Outputs

Validation Outcome Model (x2)

model_validator = ModelValidator(
    examples=example_gen.outputs.examples,
    model=trainer.outputs.output,
    eval_spec=taxi_mv_spec)

Configuration

  • Configuration options
      ○ Validate using current eval data
      ○ “Next-day eval”, validate using unseen data
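
The validation decision itself can be pictured as a comparison between the candidate model and the last validated one on the same evaluation data. The sketch below is an assumption about the shape of that check, not the ModelValidator implementation. `validate_model`, the `tolerance` parameter, and blessing the very first model by default are all illustrative choices.

```python
def validate_model(candidate_metrics, baseline_metrics,
                   metric="accuracy", tolerance=0.0):
    """Bless the candidate only if it is not worse than the last validated model."""
    if baseline_metrics is None:
        # Assumption: with no previously validated model, accept the candidate.
        return True
    return candidate_metrics[metric] >= baseline_metrics[metric] - tolerance


last_validated = {"accuracy": 0.80}
print(validate_model({"accuracy": 0.81}, last_validated))
print(validate_model({"accuracy": 0.75}, last_validated))
```

The boolean outcome is what a downstream pusher would consult before deploying.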

slide-110
SLIDE 110

Kubeflow Runtime

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Pusher

Airflow Runtime

slide-111
SLIDE 111

Component: Pusher

Pusher Validation Outcome Model Validator

Inputs and Outputs

Pusher Pusher Deployment Options

slide-112
SLIDE 112

Component: Pusher

Pusher Validation Outcome Model Validator

Inputs and Outputs

Pusher Pusher Deployment Options

pusher = Pusher(
    model_export=trainer.outputs.output,
    model_blessing=model_validator.outputs.blessing,
    serving_model_dir=serving_model_dir)

Configuration

  • Block push on validation outcome
  • Push destinations supported today
      ○ Filesystem
      ○ TF Serving model server
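
"Block push on validation outcome" with a filesystem destination can be sketched with the standard library alone. The function `push_model` and its parameters are hypothetical. The only outside fact relied on is that TensorFlow Serving looks for numbered version subdirectories under a model's base path.

```python
import shutil
import tempfile
from pathlib import Path


def push_model(model_dir, serving_model_dir, version, blessed):
    """Copy `model_dir` to `serving_model_dir/<version>` only if it was blessed."""
    if not blessed:
        return None  # validation failed: leave the serving directory untouched
    destination = Path(serving_model_dir) / str(version)
    shutil.copytree(model_dir, destination)
    return destination


with tempfile.TemporaryDirectory() as workdir:
    candidate = Path(workdir) / "candidate"
    candidate.mkdir()
    # Placeholder file standing in for a real exported model.
    (candidate / "saved_model.pb").write_bytes(b"placeholder")
    serving = Path(workdir) / "serving" / "chicago_taxi"

    pushed = push_model(candidate, serving, version=1, blessed=True)
    skipped = push_model(candidate, serving, version=2, blessed=False)
    pushed_files = sorted(p.name for p in pushed.iterdir())

print(pushed_files, skipped)
```

Writing each push to a fresh version directory leaves earlier versions in place, so a model server that watches the base path can pick up the new one without a restart.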

slide-113
SLIDE 113

Model Deployment

Clemens Mewald

slide-114
SLIDE 114

This is where you are ...

A Trained SavedModel

assets/
variables/
    variables.data-*****-of-*****
    variables.index
saved_model.pb

This is where you want to be ...

$ curl -d '{"instances": [60, 27.05, ...]}' \
    -X POST http://localhost:8501/v1/models/chicago_taxi:predict
{ "results": [1.0] }
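
The `curl` call on this slide can also be expressed in Python with the standard library. The sketch below only builds the request and leaves sending it as a commented step, since that requires a running model server. `build_predict_request` is a hypothetical helper, and the instance values are placeholders rather than a real feature vector for the taxi model.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances):
    """Build a POST request for the model server's REST predict endpoint."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")


# Placeholder instance values; a real request needs the model's actual inputs.
request = build_predict_request("localhost:8501", "chicago_taxi", [[60, 27.05]])
print(request.full_url)

# To send it against a running model server:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```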

slide-115
SLIDE 115

Deploy anywhere

JavaScript Edge devices Servers

Serving Lite .JS

slide-116
SLIDE 116

TensorFlow Serving

Flexible

Multi-tenancy Optimize with GPU and TensorRT gRPC or REST API

slide-117
SLIDE 117

TensorFlow Serving

High-Performance

Low-latency Request Batching Traffic Isolation

slide-118
SLIDE 118

TensorFlow Serving

Production-Ready

Used for years at Google, millions of QPS Scale in minutes Dynamic version refresh

slide-119
SLIDE 119

Deploy a REST API for your model in minutes ..

... using Docker ...

$ docker run -p 8501:8501 \
    -v '/path/to/savedmodel':/models/chicago_taxi \
    -e MODEL_NAME=chicago_taxi -t tensorflow/serving

... or locally on your host ...

$ apt-get install tensorflow-model-server
$ tensorflow_model_server \
    --rest_api_port=8501 \
    --model_name=chicago_taxi \
    --model_base_path='/path/to/savedmodel'

slide-120
SLIDE 120

Easily enable* hardware acceleration

$ docker run -p 8501:8501 \
    -v '/path/to/savedmodel':/models/chicago_taxi \
    -e MODEL_NAME=chicago_taxi -t tensorflow/serving

$ docker run --runtime=nvidia -p 8501:8501 \
    -v /path/to/savedmodel:/models/chicago_taxi \
    -e MODEL_NAME=chicago_taxi -t tensorflow/serving:latest-gpu

* Only possible if running on a host equipped with a GPU and nvidia-docker installed

slide-121
SLIDE 121

Optimize your model for serving using TensorRT

$ saved_model_cli convert --dir '/path/to/savedmodel' \
    --output_dir '/path/to/trt-savedmodel' --tag_set serve tensorrt

$ docker run --runtime=nvidia -p 8501:8501 \
    -v /path/to/trt-savedmodel:/models/chicago_taxi \
    -e MODEL_NAME=chicago_taxi -t tensorflow/serving:latest-gpu

slide-122
SLIDE 122

ExampleGen StatisticsGen SchemaGen Example Validator Transform

Putting it all together again

Training + Eval Data

slide-123
SLIDE 123

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer

Putting it all together again

Training + Eval Data

slide-124
SLIDE 124

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator

Putting it all together again

Training + Eval Data

slide-125
SLIDE 125

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Putting it all together again

Training + Eval Data

slide-126
SLIDE 126

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Putting it all together again

slide-127
SLIDE 127

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Putting it all together again

slide-128
SLIDE 128

ExampleGen StatisticsGen SchemaGen Example Validator Transform Trainer Evaluator Model Validator Pusher

TFX Config Metadata Store

Training + Eval Data TensorFlow Serving TensorFlow Hub TensorFlow Lite TensorFlow JS

Putting it all together again

Kubeflow Runtime Airflow Runtime

slide-129
SLIDE 129

Get started with TensorFlow Extended (TFX)

An End-to-End ML Platform

github.com/tensorflow/tfx tensorflow.org/tfx