Simplifying ML Workflows with Apache Beam & TensorFlow Extended

Tyler Akidau (@takidau), Software Engineer at Google, Apache Beam PMC


  1. Simplifying ML Workflows with Apache Beam & TensorFlow Extended. Tyler Akidau (@takidau), Software Engineer at Google, Apache Beam PMC.

  2. Apache Beam: Portable data-processing pipelines

  3. Example pipelines: Python, Java

  4. Cross-language Portability Framework. Diagram layers, top to bottom: SDKs for Language A, Language B, and Language C; the Beam Model; Runner 1, Runner 2, and Runner 3; the Beam Model; Language A, Language B, and Language C.

  5. Python-compatible runners. Direct runner (local machine): now. Google Cloud Dataflow: now. Apache Flink: Q2-Q3. Apache Spark: Q3-Q4.

  6. TensorFlow Extended: End-to-end machine learning in production

  7. “Doing ML in production is hard.” -Everyone who has ever tried

  8. Because, in addition to the actual ML... (diagram: ML Code)

  9. ...you have to worry about so much more. Diagram: Configuration, Data Collection, Data Verification, Feature Extraction, ML Code, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring. Source: Sculley et al.: Hidden Technical Debt in Machine Learning Systems

  10. In this talk, I will...

  11. In this talk, I will... Show you how to apply transformations... (TensorFlow Transform)

  12. In this talk, we will... Show you how to apply transformations... consistently between training and serving. (TensorFlow Transform, TensorFlow Estimators, TensorFlow Serving)

  13. In this talk, we will... Introduce something new... (TensorFlow Transform, TensorFlow Estimators, TensorFlow Serving, TensorFlow Model Analysis)

  14. TensorFlow Transform: Consistent In-Graph Transformations in Training and Serving

  15. Typical ML Pipeline. During training: data goes through batch processing. During serving: each request goes through "live" processing.

  17. TensorFlow Transform. During training: data goes through tf.Transform batch processing. During serving: each request goes through the same transform, as a tf.Graph.
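
The idea on slide 17 can be sketched without TensorFlow: constants are computed once over the training data, exported, and the same function is then applied to a single serving request. This is a plain-Python illustration, not tf.Transform's API. The names `fit_scaler` and `apply_scaler` are hypothetical, and the JSON round trip is only a stand-in for the exported tf.Graph.

```python
import json
import statistics

def fit_scaler(values):
    """Full pass over the training data: compute the constants the transform needs."""
    return {"mean": statistics.mean(values), "stddev": statistics.pstdev(values)}

def apply_scaler(value, constants):
    """Instance-level transform; the same code path is used in training and serving."""
    return (value - constants["mean"]) / constants["stddev"]

# Training time: batch processing over the whole dataset.
training_values = [2.0, 4.0, 6.0, 8.0]
constants = fit_scaler(training_values)
training_features = [apply_scaler(v, constants) for v in training_values]

# Export: the constants travel with the model (stand-in for the exported graph).
exported = json.dumps(constants)

# Serving time: one request, transformed with the constants learned in training.
serving_constants = json.loads(exported)
serving_feature = apply_scaler(6.0, serving_constants)
```

Because serving reuses the exported constants rather than recomputing or reimplementing them, a value seen at serving time maps to exactly the feature it would have produced during training.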

  18. Defining a preprocessing function in TF Transform (input columns: X, Y, Z)

          def preprocessing_fn(inputs):
              x = inputs['X']
              ...
              return {
                  "A": tft.bucketize(
                      tft.normalize(x) * y),
                  "B": tensorflow_fn(y, z),
                  "C": tft.ngrams(z)
              }

      Many operations are available for dealing with text and numeric features; users can define their own.

  19. (Build of slide 18, same code.) Diagram adds: mean, stddev, normalize.

  20. (Build of slide 18, same code.) Diagram adds: multiply.

  21. (Build of slide 18, same code.) Diagram adds: quantiles, bucketize, output A.

  22. (Build of slide 18, same code.) Diagram adds: output B.

  23. (Build of slide 18, same code.) Diagram adds: output C.

  24. Analyzers vs. Transforms.
      Analyzers (mean, stddev, quantiles): reduce (full pass); implemented as a distributed data pipeline.
      Transforms (normalize, multiply, bucketize): instance-to-instance (don't change batch dimension); pure TensorFlow.

  25. Analyze. Diagram: the analyzers (mean, stddev, quantiles) are run over the data and become constant tensors; the transforms (normalize, multiply, bucketize) remain in the graph.
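
The split between analyzers and transforms can be sketched in plain Python for the "A" output of the preprocessing function (normalize, multiply, bucketize). This is an illustration under stated assumptions, not tf.Transform code: `analyze` and `transform` are hypothetical names, population standard deviation and quartile boundaries are arbitrary choices, and the input rows are made up.

```python
import statistics

def analyze(rows):
    """Analyzer phase: full passes over the dataset that reduce it to constants."""
    xs = [row["X"] for row in rows]
    mean = statistics.mean(xs)
    stddev = statistics.pstdev(xs)
    # In this sketch the quantile analyzer depends on the output of normalize
    # and multiply, so it needs a second pass over partially transformed data.
    products = [(row["X"] - mean) / stddev * row["Y"] for row in rows]
    boundaries = statistics.quantiles(products, n=4)  # three quartile cut points
    return {"mean": mean, "stddev": stddev, "boundaries": boundaries}

def transform(row, constants):
    """Transform phase: instance-to-instance, using only the row and the constants."""
    normalized = (row["X"] - constants["mean"]) / constants["stddev"]
    product = normalized * row["Y"]
    # bucketize: count how many boundaries are at or below the value
    bucket = sum(product >= b for b in constants["boundaries"])
    return {"A": bucket}

# Hypothetical dataset with the X and Y columns from the slides.
rows = [{"X": float(i), "Y": 2.0} for i in range(1, 9)]
constants = analyze(rows)
outputs = [transform(row, constants) for row in rows]
```

Once `analyze` has run, `transform` never looks at the dataset again, which is what lets the same per-instance logic be applied to a single request at serving time.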

  26. What can be done with TF Transform? tf.Transform batch processing: pretty much anything.

  27. What can be done with TF Transform? tf.Transform batch processing: pretty much anything. Serving graph: anything that can be expressed as a TensorFlow graph.

  28. Some common use-cases: Scale to ...; Bag of Words / N-Grams; Bucketization; Feature Crosses.

  29. Some common use-cases: Scale to ...; Bag of Words / N-Grams; Bucketization; Feature Crosses; Apply another TensorFlow Model.
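
Three of the use-cases listed above are small enough to sketch in plain Python. These are illustrative helpers with hypothetical names, not tf.Transform operations. The vocabulary is passed in by hand here, whereas in a real pipeline it would typically come from a full pass over the data, and the CRC32-based hashing in `feature_cross` is just one deterministic choice.

```python
import zlib

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens, vocabulary):
    """Count occurrences of each vocabulary term; unknown tokens are dropped."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for token in tokens:
        if token in index:
            counts[index[token]] += 1
    return counts

def feature_cross(a, b, num_buckets):
    """Cross two categorical values into one bucket id via a stable hash."""
    key = f"{a}_x_{b}".encode("utf-8")
    return zlib.crc32(key) % num_buckets

tokens = "the quick brown fox".split()
bigrams = ngrams(tokens, 2)
counts = bag_of_words(["a", "b", "a", "c"], vocabulary=["a", "b"])
crossed = feature_cross("US", "mobile", num_buckets=100)
```

`zlib.crc32` is used instead of Python's built-in `hash` because the latter is randomized per process for strings, which would give different bucket ids in training and serving.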

  30. github.com/tensorflow/transform

  31. Introducing… TensorFlow Model Analysis: Scalable, sliced, and full-pass metrics

  32. Let's talk about metrics...
      ● How accurate?
      ● Converged model?
      ● What about my TB-sized eval set?
      ● Slices / subsets?
      ● Across model versions?
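
The "slices / subsets" question can be made concrete with a small sketch that computes accuracy overall and per slice in a single pass over the evaluation set. This is plain Python, not the TensorFlow Model Analysis API. The function name, the `country` slice column, and the example records are all hypothetical.

```python
from collections import defaultdict

def sliced_accuracy(examples, slice_key):
    """One full pass over the eval set, accumulating counts overall and per slice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        for key in ("overall", f"{slice_key}={ex[slice_key]}"):
            total[key] += 1
            correct[key] += int(ex["label"] == ex["prediction"])
    return {key: correct[key] / total[key] for key in total}

# Hypothetical evaluation records.
eval_set = [
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "US", "label": 0, "prediction": 1},
    {"country": "DE", "label": 1, "prediction": 1},
    {"country": "DE", "label": 0, "prediction": 0},
]
metrics = sliced_accuracy(eval_set, "country")
```

The per-slice state is just a pair of counters, and counters from different shards of the data can be added together, which is why this kind of metric suits a distributed pass over a very large evaluation set.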

  33. ML Fairness: analyzing model mistakes by subgroup. ROC curve for all groups. Axes: Sensitivity (True Positive Rate) vs. False Positive Rate (1 - Specificity). Learn more at ml-fairness.com

  34. ML Fairness: analyzing model mistakes by subgroup. ROC curves for all groups, Group A, and Group B. Axes: Sensitivity (True Positive Rate) vs. False Positive Rate (1 - Specificity). Learn more at ml-fairness.com
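
A per-subgroup ROC curve is a set of (false positive rate, true positive rate) points computed separately for each group. A minimal plain-Python sketch follows; the function names, the `group` and `score` fields, and the example records are hypothetical, and it assumes every group contains at least one positive and one negative example.

```python
def roc_points(examples, thresholds):
    """Return (false_positive_rate, true_positive_rate) for each threshold."""
    positives = sum(1 for ex in examples if ex["label"] == 1)
    negatives = len(examples) - positives  # assumed non-zero, as is positives
    points = []
    for t in thresholds:
        tp = sum(1 for ex in examples if ex["label"] == 1 and ex["score"] >= t)
        fp = sum(1 for ex in examples if ex["label"] == 0 and ex["score"] >= t)
        points.append((fp / negatives, tp / positives))
    return points

def roc_by_group(examples, group_key, thresholds):
    """Split the examples by subgroup and compute ROC points for each one."""
    groups = {}
    for ex in examples:
        groups.setdefault(ex[group_key], []).append(ex)
    return {g: roc_points(members, thresholds) for g, members in groups.items()}

# Hypothetical scored examples for two subgroups.
scored = [
    {"group": "A", "label": 1, "score": 0.9},
    {"group": "A", "label": 1, "score": 0.4},
    {"group": "A", "label": 0, "score": 0.6},
    {"group": "A", "label": 0, "score": 0.2},
    {"group": "B", "label": 1, "score": 0.8},
    {"group": "B", "label": 1, "score": 0.7},
    {"group": "B", "label": 0, "score": 0.3},
    {"group": "B", "label": 0, "score": 0.1},
]
curves = roc_by_group(scored, "group", thresholds=[0.5])
```

In this made-up data the same threshold behaves very differently for the two groups, which is the kind of gap an aggregate curve over all groups can hide.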

  35. ML Fairness: understand the failure modes of your models

  36. ML Fairness: Learn More ml-fairness.com

  37. How does it work?

          ...
          estimator = DNNLinearCombinedClassifier(...)
          estimator.train(...)
          estimator.export_savedmodel(
              serving_input_receiver_fn=serving_input_fn)
          tfma.export.export_eval_savedmodel(
              estimator=estimator,
              eval_input_receiver_fn=eval_input_fn)
          ...

      Diagram: Inference Graph (SavedModel) with SignatureDef.

  38. (Build of slide 37, same code.) Diagram adds: Eval Graph (SavedModel) with SignatureDef and Eval Metadata.

  39. github.com/tensorflow/model-analysis

  40. Summary
      Apache Beam: Data-processing framework that runs locally and scales to massive data, in the cloud (now) and soon on-premises via Flink (Q2-Q3) and Spark (Q3-Q4). Powers large-scale data processing in the TF libraries below.
      tf.Transform: Consistent in-graph transformations in training and serving.
      tf.ModelAnalysis: Scalable, sliced, and full-pass metrics.
