Architecture of an NLP Deployment

Michelle Casbon, QCon São Paulo, May 9, 2018


SLIDE 1

Architecture of an NLP Deployment

Michelle Casbon
QCon São Paulo, May 9, 2018

SLIDE 2

@texasmichelle

whoami

SLIDE 3

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 4

ML decision tree

  • Is this a clearly defined problem? No: Move along. Yes: next question.
  • Can it be solved in a deterministic way? Yes: Do that. No: Dive in.

Source: David Andrzejewski @davidandrzej
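
The two questions on this slide can be sketched as a small function. This is only an illustration: the name `ml_decision` and its boolean parameters are hypothetical, and the pairing of each Yes/No branch with an outcome is an assumed reading of the flowchart labels.

```python
def ml_decision(clearly_defined: bool, deterministic_solution: bool) -> str:
    """Return the outcome of the two-question flow (assumed branch pairing)."""
    if not clearly_defined:
        # Assumed: an ill-defined problem is not a candidate for ML yet.
        return "Move along"
    if deterministic_solution:
        # Assumed: prefer the deterministic solution when one exists.
        return "Do that"
    # Clearly defined, but no deterministic solution: consider ML.
    return "Dive in"


if __name__ == "__main__":
    for defined in (False, True):
        for deterministic in (False, True):
            print(defined, deterministic, "->", ml_decision(defined, deterministic))
```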

SLIDE 5

Counting things is still really hard.

MACHINE LEARNING

SLIDE 6

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 7

Evolution of NLP architectures

  • In the beginning: Hand-crafted, artisan systems
  • Yesteryear: Purpose-built
  • Today: Cobbled-together with tools from other domains
  • Tomorrow: Future

SLIDE 8

Application examples

SLIDE 9

Web application

[Diagram: Frontend, Orchestration Layer, NLP Microservice, Microservice ×3, Model Store, OLAP, OLTP]

SLIDE 10

Data Warehouse

[Diagram: Analytics Layer, NLP Microservice, ETL, Model Store, OLAP, OLTP ×3]

SLIDE 11

Data Pipeline

[Diagram: NLP Microservice, Model Store, OLTP, Formatter, Source, Lookup]

SLIDE 12

NLP microservice(s)

  • Serving: REST server, Prediction, Featurization, Model retrieval
  • Training: Model building, Featurization, Data segmentation
  • Training data preparation: Data analysis, Data validation, Data transformation, Data ingestion
  • Cross-validation, Evaluation, Prediction, Featurization, Data segmentation
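
A minimal sketch of the serving path listed on this slide (model retrieval, featurization, prediction behind a request handler). Everything in it is an assumption for illustration: the in-memory `MODEL_STORE`, the toy token-weight "model", and all function names are hypothetical and do not come from the talk. A real service would sit behind a REST server and load models from a proper model store.

```python
import json
from collections import Counter

# Hypothetical in-memory model store: name -> version -> per-label token weights.
MODEL_STORE = {
    "sentiment": {
        "v1": {
            "positive": {"great": 1.2, "love": 1.0},
            "negative": {"terrible": 1.3, "hate": 1.1},
        }
    }
}


def retrieve_model(name, version):
    """Model retrieval: look up one versioned model."""
    return MODEL_STORE[name][version]


def featurize(text):
    """Featurization: lowercase bag-of-words counts (shared by training and serving)."""
    return Counter(text.lower().split())


def predict(model, features):
    """Prediction: score each label by summing weight * token count."""
    scores = {
        label: sum(weight * features[token] for token, weight in weights.items())
        for label, weights in model.items()
    }
    return max(scores, key=scores.get)


def handle_request(body):
    """What a REST handler would do with a JSON request body."""
    request = json.loads(body)
    model = retrieve_model(request["model"], request["version"])
    label = predict(model, featurize(request["text"]))
    return json.dumps({"label": label, "model": request["model"], "version": request["version"]})


if __name__ == "__main__":
    print(handle_request(json.dumps(
        {"model": "sentiment", "version": "v1", "text": "I love this great talk"}
    )))
```

Keeping `featurize` as a single shared function mirrors the slide, where Featurization appears under both Serving and Training.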

SLIDE 13

Web application

[Diagram: Frontend, Orchestration Layer, NLP Microservice, Microservice ×3, Model Store, OLAP, OLTP]

SLIDE 14

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 15

Perception

[Diagram: Configuration, Data Collection, Data Verification, Analysis Tools, Feature Extraction, ML Code, Monitoring, Serving Infrastructure, Process Management, Resource Management, UI, Application Logic]

SLIDE 16

Reality

[Diagram: Configuration, Data Collection, Data Verification, Analysis Tools, Feature Extraction, ML Code, Monitoring, Serving Infrastructure, Process Management, Resource Management, UI, Application Logic]

SLIDE 17

[Diagram components: Feature Extraction, Data Ingestion, Data Exploration, Data Transformation, Data Validation, Data Analysis, Training Data Segmentation, Model Building, Model Validation, Model Versioning, Model Auditing, Distributed Training, Continuous Training, Process Management, Configuration, Resource Management, Monitoring, Logging, Continuous Delivery, Authentication/Authorization, Serving Infrastructure, UI, Business Logic, Load Balancing]

[Diagram groups: Data, Featurization, Training, Application, Platform]

SLIDE 18

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 19

Guiding principles

  • Robustness
  • Resiliency
  • Fault-tolerant
  • High availability
  • Autoscaling
  • Constrained resource consumption
  • Per microservice
  • Versioning
  • Models
  • Data
  • Hyperparameters
  • System config
  • Continuous delivery
  • Optimize for person-hours
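
One way to read the versioning principle (models, data, hyperparameters, system config) is to record a fingerprint of each alongside every trained model. The sketch below is an assumption, not the approach used in the talk: `fingerprint` and `build_manifest` are hypothetical names, and truncated SHA-256 digests stand in for whatever version identifiers a real system would use.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable short digest of any JSON-serializable object (key order ignored)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(model_bytes, data_rows, hyperparameters, system_config):
    """Record one version identifier per artifact named on the slide."""
    return {
        "model": hashlib.sha256(model_bytes).hexdigest()[:12],
        "data": fingerprint(data_rows),
        "hyperparameters": fingerprint(hyperparameters),
        "system_config": fingerprint(system_config),
    }


if __name__ == "__main__":
    manifest = build_manifest(
        model_bytes=b"serialized-model-weights",       # placeholder bytes
        data_rows=[["great talk", "positive"]],        # placeholder training rows
        hyperparameters={"learning_rate": 0.1},
        system_config={"replicas": 2},
    )
    print(json.dumps(manifest, indent=2))
```

Because each artifact gets its own digest, comparing two manifests shows which of the four changed between runs.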

SLIDE 20

Guiding principles

  • Everything in one place
  • Everything in source control
  • Automation
  • Tests
  • Deployment
  • Empowerment
  • If you don't need to manage it yourself, don't
  • Take it with you

  • Store everything
  • Positive & negative training data
  • Add feedback to the UI
  • Logging
  • Monitoring
  • Communicate progress
  • Goals
  • Measure
  • Traction
  • Transparency
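
A small sketch of "store everything" combined with "add feedback to the UI": each time a user accepts or rejects a prediction, the event is appended as one JSON line so it can later serve as a positive or negative training example. The record fields and the function names `record_feedback` and `load_training_examples` are hypothetical, and an in-memory buffer stands in for a real log or database.

```python
import io
import json


def record_feedback(sink, text, predicted, accepted):
    """Append one UI feedback event as a JSON line."""
    record = {
        "text": text,
        "predicted": predicted,
        # Accepted predictions become positive examples, rejected ones negative.
        "example_type": "positive" if accepted else "negative",
    }
    sink.write(json.dumps(record) + "\n")


def load_training_examples(source):
    """Read stored feedback back, one record per non-empty line."""
    return [json.loads(line) for line in source if line.strip()]


# Usage with an in-memory buffer standing in for persistent storage.
sink = io.StringIO()
record_feedback(sink, "great talk", "positive", accepted=True)
record_feedback(sink, "not bad at all", "negative", accepted=False)
sink.seek(0)
examples = load_training_examples(sink)
print(len(examples), [e["example_type"] for e in examples])
```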

SLIDE 21

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 22

Duolingo legacy

Source: Rewriting Duolingo's engine in Scala

SLIDE 23

Duolingo today

Source: Rewriting Duolingo's engine in Scala

  • Redesigned architecture
  • Refactored code from Python to Scala
  • Latency dropped from 750ms to 14ms
  • Engine uptime increased from 99.9% to 100%

SLIDE 24

architecture

[Diagram: Frontend, Orchestration Layer, Microservice ×3, NLP Microservice ×2]

SLIDE 25

Qordoba on GCP

SLIDE 26

Qordoba

SLIDE 27

SLIDE 28

GitOps

  • Optimizes for person-hours
  • Empowers engineers & data scientists
  • Cluster state is always recoverable, with a historical record

[Diagram: repeating cycle of Create a new feature, PR review, Deployment, Verify feature]
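
The idea that cluster state is always recoverable can be sketched as a comparison between the desired state kept in source control and the state actually running. This is an illustrative assumption rather than any specific GitOps tool: `plan_changes` and the dictionary layout are hypothetical.

```python
def plan_changes(desired, actual):
    """List the actions needed to make the running state match the repository."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


if __name__ == "__main__":
    # Hypothetical desired state, as it might be committed after a PR review.
    desired = {
        "frontend": {"image": "web:1"},
        "nlp-service": {"image": "nlp:2"},
    }
    # Hypothetical state currently running in the cluster.
    actual = {
        "nlp-service": {"image": "nlp:1"},
        "old-job": {"image": "job:1"},
    }
    print(plan_changes(desired, actual))
```

Since every desired state is a commit, recovering an earlier cluster state amounts to running the same comparison against an older commit's contents.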

SLIDE 29

Kubeflow

  • Who: Data scientists, ML researchers, Software engineers, Product managers
  • Why: Because building a platform is too big of a problem to tackle alone
  • What: Portable ML products on k8s, 0.1 release

https://github.com/kubeflow/kubeflow

SLIDE 30

Make it easy for everyone to develop, deploy, & manage portable, scalable ML everywhere

  • Composability: Single, unified tool for common processes
  • Portability: Entire stack
  • Scalability: Native to k8s
  • Reduce variability between services & environments
  • Full product lifecycle
  • Support specialized hardware, like GPUs
  • Reduce costs
  • Improve model performance

SLIDE 31

Kubeflow

  • Kubernetes-native platform for ML
  • Run wherever k8s runs
  • Use k8s to manage ML tasks
  • CRDs for distributed training
  • Adopt k8s patterns
  • Microservices
  • Manage infra declaratively
  • Package infrastructure components together
  • Ksonnet
  • Move between local -> dev -> test -> prod -> onprem
  • Support multiple ML frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost, et al.
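
Moving one declarative definition between local, dev, test, and prod can be sketched as a base configuration plus per-environment overrides. The example below is plain Python for illustration only and is not ksonnet syntax: `BASE`, `OVERLAYS`, `deep_merge`, and `render` are hypothetical names, and the field values are made up.

```python
import copy

# Hypothetical base definition shared by every environment.
BASE = {
    "image": "nlp-service:1.0",
    "replicas": 1,
    "resources": {"cpu": "500m", "gpu": 0},
}

# Hypothetical per-environment overrides; only the differences are listed.
OVERLAYS = {
    "local": {},
    "dev": {"replicas": 2},
    "prod": {"replicas": 5, "resources": {"gpu": 1}},
}


def deep_merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render(environment):
    """Produce the full configuration for one environment."""
    return deep_merge(BASE, OVERLAYS[environment])


if __name__ == "__main__":
    for env in ("local", "dev", "prod"):
        print(env, render(env))
```

Listing only the differences per environment keeps the shared definition in one place, which is the point of the "reduce variability between services & environments" goal.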

SLIDE 32

E2E Example

  • GitHub Issue Summarization

○ How to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow ○ https://github.com/kubeflow/examples/tree/master/github_issue_summarization

  • Kubeflow installation with ksonnet
  • Persistent disk usage
  • Jupyterhub

Source: Hamel Husain

SLIDE 33

Exploration/experimentation

  • Choose a dataset
  • Slice and dice
  • Try out various means of featurization
  • Train a number of models & compare
  • Plot various statistics along the way
  • Jupyterhub on k8s

○ Security ○ Reproducibility ○ Resource allocation ○ Scale beyond a laptop ○ Centralized storage

SLIDE 34

E2E Example

  • Scaling featurization and training

○ TFJob ○ tensor2tensor

  • Model deployment with SeldonIO
  • Accessing via a simple web app
  • Teardown

SLIDE 35

Try it yourself

  • GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
  • Katacoda: https://www.katacoda.com/kubeflow
  • http://gh-demo.kubeflow.org/

Special thanks: Jeremy Lewi, Ankush Agarwal

SLIDE 36

Just the beginning

  • Easier setup
  • Utilize more k8s features
  • Add support for packages, frameworks, libraries, and example models
  • You tell us!
  • Get involved

○ github.com/kubeflow ○ kubeflow.slack.com ○ @kubeflow ○ kubeflow-discuss@googlegroups.com

SLIDE 37

Agenda

1. Evolution of NLP architectures
2. Components
3. Guiding principles
4. Implementations
5. Future

SLIDE 38

OK Google, build me a classifier.

Future Michelle

SLIDE 39

July 24-27, 2018 San Francisco g.co/next18