Architecture of an NLP Deployment
Michelle Casbon QCon São Paulo May 9, 2018
whoami
@texasmichelle
Agenda
1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future
Is this a clearly defined problem? No -> Move along. Yes -> next question.
Can it be solved in a deterministic way? Yes -> Do that. No -> Dive in.
Source: David Andrzejewski (@davidandrzej)
Counting things is still really hard.
1. Evolution of NLP architectures
○ In the beginning: Hand-crafted, artisan systems
○ Yesteryear: Purpose-built
○ Today: Cobbled-together with tools from
○ Tomorrow: Future
Web application
[Diagram: Frontend, Orchestration Layer, NLP Microservice, Microservice (x3), Model Store, OLAP, OLTP]
Data Warehouse
[Diagram: Analytics Layer, NLP Microservice, ETL, Model Store, OLAP, OLTP (x3)]
Data Pipeline
[Diagram: Source, Formatter, NLP Microservice, Lookup, Model Store, OLTP]
NLP microservice(s)
○ Serving: REST server, Prediction, Featurization, Model retrieval
○ Training: Model building, Featurization, Data segmentation
○ Training data preparation: Data analysis, Data validation, Data transformation, Data ingestion
○ Cross-validation: Evaluation, Prediction, Featurization, Data segmentation
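A minimal sketch of how the training and serving stacks listed for the NLP microservice might fit together: training featurizes labeled text and builds a model, and serving retrieves the model, featurizes the input, and predicts. Everything named here (MODEL_STORE, featurize, train, serve, the toy word-count model, and the example data) is a hypothetical illustration, not code from the talk. The REST server layer is omitted.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


def featurize(text: str) -> Counter:
    """Featurization: lowercase bag-of-words counts (shared by training and serving)."""
    return Counter(text.lower().split())


@dataclass
class Model:
    """Toy model (assumption): per-label word counts seen during training."""
    word_counts: dict[str, Counter]

    def predict(self, features: Counter) -> str:
        # Score each label by how often it has seen the input's words.
        def score(label: str) -> int:
            seen = self.word_counts[label]
            return sum(seen[word] * n for word, n in features.items())

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.word_counts), key=score)


def train(examples: list[tuple[str, str]]) -> Model:
    """Training: featurize each (text, label) pair and accumulate counts per label."""
    counts: dict[str, Counter] = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(featurize(text))
    return Model(counts)


# Hypothetical in-memory stand-in for the "Model Store" box in the diagrams.
MODEL_STORE: dict[str, Model] = {}


def serve(model_name: str, text: str) -> str:
    """Serving: model retrieval -> featurization -> prediction."""
    model = MODEL_STORE[model_name]
    return model.predict(featurize(text))


MODEL_STORE["sentiment-v1"] = train([
    ("great talk loved it", "positive"),
    ("boring slides bad audio", "negative"),
])
print(serve("sentiment-v1", "loved the talk"))  # positive
```

Sharing one featurize function between train and serve is the design point: the serving path sees features built exactly the way the training path built them.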
2. Components
Perception
Configuration, Data Collection, Data Verification, Analysis Tools, Feature Extraction, Monitoring, Serving Infrastructure, Process Management, Resource Management, UI, Application Logic
Reality
Configuration, Data Collection, Data Verification, Analysis Tools, Feature Extraction, ML Code, Monitoring, Serving Infrastructure, Process Management, Resource Management, UI, Application Logic
Components: Feature Extraction, Data Ingestion, Data Exploration, Data Transformation, Data Validation, Data Analysis, Training Data Segmentation, Model Building, Model Validation, Model Versioning, Model Auditing, Distributed Training, Continuous Training, Process Management, Configuration, Resource Management, Monitoring, Logging, Continuous Delivery, Authentication/Authorization, Serving Infrastructure, UI, Business Logic, Load Balancing
Categories: Data, Featurization, Training, Application, Platform
3. Guiding principles
Guiding principles
○ Robustness: Resiliency, Fault-tolerant, High availability
○ Autoscaling: Constrained resource consumption, Per microservice
○ Versioning: Models, Data, Hyperparameters, System config
○ Continuous delivery
○ Optimize for person-hours
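One way to picture versioning models, data, hyperparameters, and system config together is a single manifest with a fingerprint for each, plus one combined release id. This is only a sketch under assumptions: fingerprint, build_manifest, the 12-character truncation, and the sample inputs are all hypothetical and not from the talk.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Deterministic short hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(model_bytes: bytes, data_rows: list,
                   hyperparameters: dict, system_config: dict) -> dict:
    """Record one version id per versioned artifact, plus a combined release id."""
    manifest = {
        "model": hashlib.sha256(model_bytes).hexdigest()[:12],
        "data": fingerprint(data_rows),
        "hyperparameters": fingerprint(hyperparameters),
        "system_config": fingerprint(system_config),
    }
    # The release id changes whenever any of the four parts changes.
    manifest["release"] = fingerprint(manifest)
    return manifest


baseline = build_manifest(b"weights", [["some text", "label"]],
                          {"learning_rate": 0.1}, {"replicas": 2})
retuned = build_manifest(b"weights", [["some text", "label"]],
                         {"learning_rate": 0.2}, {"replicas": 2})

# Only the hyperparameter fingerprint (and therefore the release id) differs.
print(baseline["data"] == retuned["data"])
print(baseline["release"] == retuned["release"])
```

Committing such a manifest alongside the code would let a deployed prediction be traced back to the exact combination that produced it.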
Guiding principles
○ Everything in one place
○ Everything in source control
○ Automation: Tests, Deployment
○ Empowerment
○ If you don't need to manage it yourself, don't
○ Take it with you
○ Store everything: Positive & negative training data, Add feedback to the UI, Logging, Monitoring
○ Communicate progress: Goals, Measure, Traction, Transparency
4. Implementations
Duolingo legacy
Source: Rewriting Duolingo's engine in Scala
Duolingo today
Source: Rewriting Duolingo's engine in Scala
architecture
[Diagram: Frontend, Orchestration Layer, Microservice (x3), NLP Microservice (x2)]
Qordoba on GCP
Qordoba
GitOps
Create a new feature -> PR review -> Deployment -> Verify feature (cycle repeats)
Kubeflow
○ Who: Data scientists, ML researchers, Software engineers, Product managers
○ Why: Because building a platform is too big of a problem to tackle alone
○ What: Portable ML products on k8s, 0.1 release
https://github.com/kubeflow/kubeflow
Make it easy for everyone to develop, deploy, & manage portable, scalable ML everywhere
○ Composability: Single, unified tool for common processes
○ Portability: Entire stack
○ Scalability: Native to k8s
○ Reduce variability between services & environments
○ Full product lifecycle
○ Support specialized hardware, like GPUs
○ Reduce costs
○ Improve model performance
Kubeflow
○ Kubernetes-native platform for ML
○ Run wherever k8s runs
○ Use k8s to manage ML tasks: CRDs for distributed training
○ Adopt k8s patterns: Microservices, Manage infra declaratively
○ Package infrastructure components together: Ksonnet
○ Move between local -> dev -> test -> prod ->
○ Support multiple ML frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost, et al.
E2E Example
○ How to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow https://github.com/kubeflow/examples/tree/master/github_issue_summarization
Source: Hamel Husain
Exploration/experimentation
○ Security
○ Reproducibility
○ Resource allocation
○ Scale beyond a laptop
○ Centralized storage
E2E Example
○ TFJob
○ tensor2tensor
Try it yourself
Special thanks: Jeremy Lewi, Ankush Agarwal
Just the beginning
libraries, and example models
○ github.com/kubeflow
○ kubeflow.slack.com
○ @kubeflow
○ kubeflow-discuss@googlegroups.com
5. Future
Future Michelle
July 24-27, 2018 San Francisco g.co/next18