Architecture of an NLP Deployment

Michelle Casbon, QCon São Paulo, May 9, 2018

whoami: @texasmichelle

Agenda
1. Evolution of NLP
2. Components
3. Guiding principles
4. Implementations
5. Future architectures

ML decision tree
Is this a clearly defined problem?
  No: Move along
  Yes: Can it be solved in a deterministic way?
    Yes: Do that
    No: Dive in
Source: David Andrzejewski @davidandrzej
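The decision tree above can be written as a small function. This is only an illustrative sketch: the function name `ml_decision` and its two boolean parameters are hypothetical, and the returned strings are the leaf labels from the slide.

```python
def ml_decision(clearly_defined: bool, deterministic_solution: bool) -> str:
    """Walk the two questions from the slide and return the leaf reached."""
    if not clearly_defined:
        # No clearly defined problem: ML is not the next step.
        return "Move along"
    if deterministic_solution:
        # A deterministic solution exists, so prefer it over ML.
        return "Do that"
    # Clearly defined, but not solvable deterministically.
    return "Dive in"


if __name__ == "__main__":
    for defined in (False, True):
        for deterministic in (False, True):
            print(defined, deterministic, "->", ml_decision(defined, deterministic))
```

Note that the second question is only reached when the first answer is yes, so an undefined problem is turned away regardless of how it might be solved.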

MACHINE LEARNING: Counting things is still really hard.

Evolution of NLP architectures
Timeline: In the beginning -> Yesteryear -> Today -> Tomorrow
Stage labels on the slide:
● Hand-crafted, artisan systems
● Cobbled-together with tools from other domains
● Purpose-built
● Future

Application examples

Frontend Web application
Components: Frontend; Orchestration Layer; Microservices, one of them the NLP Microservice; Model Store; OLTP; OLAP

Data Warehouse
Components: Analytics Layer; OLTP sources (three shown); ETL; NLP Microservice; OLAP; Model Store

Data Pipeline
Components: Source; Formatter; NLP Microservice; Lookup; OLTP; Model Store

NLP microservice(s)
● Training data preparation: Data ingestion, Data validation, Data analysis, Data transformation
● Training: Data segmentation, Featurization, Model building
● Cross-validation: Data segmentation, Featurization, Prediction, Evaluation
● Serving: Model retrieval, Featurization, Prediction, REST server
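The stages listed on this slide can be sketched in a few lines of standard-library Python. Everything here is an assumption made for illustration: the function names, the toy word-count "model", and the sample rows are hypothetical and are not the talk's implementation. The point the sketch tries to show is that training, cross-validation, and serving all reuse the same featurization step.

```python
import random
from collections import Counter


def validate(rows):
    """Data validation: keep rows that have non-empty text and a label."""
    return [(t, y) for t, y in rows if t and t.strip() and y is not None]


def featurize(text):
    """Featurization: lowercase bag-of-words counts, shared by all stages."""
    return Counter(text.lower().split())


def segment(rows, holdout=0.25, seed=0):
    """Data segmentation: deterministic shuffle, then a train/held-out split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = max(1, int(len(rows) * (1 - holdout)))
    return rows[:cut], rows[cut:]


def build_model(train_rows):
    """Model building: accumulate word counts per label (a toy stand-in)."""
    model = {}
    for text, label in train_rows:
        model.setdefault(label, Counter()).update(featurize(text))
    return model


def predict(model, text):
    """Prediction: choose the label whose word counts overlap the input most."""
    feats = featurize(text)
    scores = {
        label: sum(counts[word] * n for word, n in feats.items())
        for label, counts in model.items()
    }
    # Sorting the labels first makes ties resolve deterministically.
    return max(sorted(scores), key=scores.get)


def evaluate(model, held_out):
    """Evaluation: fraction of held-out rows predicted correctly."""
    if not held_out:
        return 0.0
    return sum(predict(model, t) == y for t, y in held_out) / len(held_out)


if __name__ == "__main__":
    rows = validate([
        ("great product love it", "pos"),
        ("terrible awful experience", "neg"),
        ("love this great service", "pos"),
        ("awful terrible support", "neg"),
        ("", "pos"),  # dropped by validation
    ])
    train, held_out = segment(rows)
    model = build_model(train)
    print("held-out accuracy:", evaluate(model, held_out))
    print("serving prediction:", predict(model, "love it"))
```

In a real deployment the serving path would load a stored model (the "Model retrieval" step) and expose `predict` behind a REST server; both are left out here to keep the sketch self-contained.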

Perception
Components shown: Data Collection; Resource Management; Serving Infrastructure; UI; Process Management; ML Code; Data Verification; Configuration; Feature Extraction; Application Logic; Monitoring; Analysis Tools

Reality
Components shown: Data Collection; Resource Management; Serving Infrastructure; UI; Process Management; Configuration; Monitoring; ML Code; Data Verification; Feature Extraction; Analysis Tools; Application Logic

Column headings: Data, Featurization, Training, Application, Platform
● Data: Data Ingestion, Data Exploration, Data Transformation, Data Validation, Data Analysis, Training Data Segmentation
● Featurization: Feature Extraction
● Training: Model Building, Model Validation, Model Versioning, Model Auditing, Distributed Training, Continuous Training
● Application and Platform: Configuration, Serving Infrastructure, Business Logic, Process Management, UI, Resource Management, Load Balancing, Monitoring, Logging, Continuous Delivery, Authentication/Authorization

Guiding principles
● Robustness: High availability, Fault-tolerant
● Resiliency: Autoscaling, Constrained resource consumption
● Versioning: Models, Data, Hyperparameters, System config
● Continuous delivery: Optimize for person-hours, Per microservice
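One way to act on the versioning principle is to derive a single identifier from the four things the slide says to version: models, data, hyperparameters, and system config. The sketch below is an assumption for illustration only; the function name `release_fingerprint` and the choice of SHA-256 over canonical JSON are hypothetical, not something the talk prescribes.

```python
import hashlib
import json


def release_fingerprint(model_bytes: bytes, data_bytes: bytes,
                        hyperparameters: dict, system_config: dict) -> str:
    """Combine the four versioned inputs into one hex digest."""
    digest = hashlib.sha256()
    # Hash each binary artifact separately so their boundaries stay distinct.
    for part in (model_bytes, data_bytes):
        digest.update(hashlib.sha256(part).digest())
    # Serialize mappings with sorted keys so key order does not change the id.
    for mapping in (hyperparameters, system_config):
        canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
        digest.update(hashlib.sha256(canonical.encode("utf-8")).digest())
    return digest.hexdigest()


if __name__ == "__main__":
    fingerprint = release_fingerprint(
        b"model-weights", b"training-data",
        {"learning_rate": 0.01, "epochs": 5},
        {"replicas": 2},
    )
    print(fingerprint)
```

If any one of the four inputs changes, the fingerprint changes, which makes it easy to tell whether two deployments were built from the same model, data, hyperparameters, and configuration.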

Guiding principles
● Everything in one place
● Everything in source control
● Automation
● Empowerment
● Tests
● Deployment
● If you don't need to manage it yourself, don't
● Take it with you
● Store everything
● Communicate progress
● Positive & negative training data
● Goals
● Add feedback to the UI
● Measure
● Logging
● Traction
● Monitoring
● Transparency

Duolingo legacy
Source: Rewriting Duolingo's engine in Scala

Duolingo today
Source: Rewriting Duolingo's engine in Scala
• Redesigned architecture
• Refactored code from Python to Scala
• Latency dropped from 750ms to 14ms
• Engine uptime increased from 99.9% to 100%
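To put the two figures on this slide in perspective, the arithmetic can be worked out directly. The helper names below are hypothetical, and converting uptime to hours assumes a 365-day year, which is an assumption of this sketch rather than something the slide states.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the new latency is than the old one."""
    return before_ms / after_ms


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"latency improvement: {speedup(750, 14):.1f}x")
    print(f"downtime at 99.9%: {downtime_hours_per_year(99.9):.2f} hours/year")
```

Under these assumptions, the latency change is a factor of roughly 54, and 99.9% uptime corresponds to a little under nine hours of downtime per year.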

architecture
Components: Frontend; Orchestration Layer; Microservices (five shown, two of them NLP Microservices)

Qordoba on GCP

Qordoba

GitOps
● Optimizes for person-hours
● Empowers engineers & data scientists
● Cluster state is always recoverable, with a historical record
Cycle, repeated for each change: Create a new feature -> PR review -> Deployment -> Verify feature
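The claim that cluster state is always recoverable follows from keeping the desired state in version control: any past revision can be rebuilt from history and compared with what is running. The sketch below illustrates that idea with plain dictionaries. The names `recover` and `plan_changes`, the commit format, and the service specs are all hypothetical; this is not the API of Git, Kubernetes, or any GitOps tool.

```python
def recover(history: list, revision: int) -> dict:
    """Rebuild the desired state at `revision` by replaying commits in order.

    Each commit maps a resource name to its spec; a spec of None deletes it.
    """
    state = {}
    for commit in history[: revision + 1]:
        for name, spec in commit.items():
            if spec is None:
                state.pop(name, None)
            else:
                state[name] = spec
    return state


def plan_changes(desired: dict, live: dict) -> list:
    """List the (action, name) steps that would move `live` to `desired`."""
    steps = []
    for name in sorted(desired):
        if name not in live:
            steps.append(("create", name))
        elif live[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            steps.append(("delete", name))
    return steps


if __name__ == "__main__":
    history = [
        {"nlp": {"image": "nlp:1"}},
        {"nlp": {"image": "nlp:2"}, "web": {"image": "web:1"}},
        {"web": None},
    ]
    live = recover(history, 0)
    desired = recover(history, 1)
    print(plan_changes(desired, live))
```

Rolling back is then the same operation as rolling forward: pick an earlier revision as the desired state and apply the resulting plan.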

Kubeflow
● Who: Data scientists, ML researchers, Software engineers, Product managers
● What: Portable ML products on k8s; 0.1 release
● Why: Because building a platform is too big of a problem to tackle alone
https://github.com/kubeflow/kubeflow

Make it easy for everyone to develop, deploy, & manage portable, scalable ML everywhere
Composability, Portability, Scalability
● Full product lifecycle
● Support specialized hardware, like GPUs
● Single, unified tool for common processes
● Entire stack
● Native to k8s
● Reduce variability between services & environments
● Reduce costs
● Improve model performance

Kubeflow
● Kubernetes-native platform for ML: Run wherever k8s runs; Use k8s to manage ML tasks; CRDs for distributed training
● Adopt k8s patterns: Microservices; Manage infra declaratively
● Package infrastructure components together: Ksonnet; Move between local -> dev -> test -> prod -> onprem
● Support multiple ML frameworks: TensorFlow, PyTorch, Scikit, XGBoost, et al.

E2E Example: GitHub Issue Summarization
● How to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow
  ○ https://github.com/kubeflow/examples/tree/master/github_issue_summarization
● Kubeflow installation with ksonnet
● Persistent disk usage
● Jupyterhub
Source: Hamel Husain

Exploration/experimentation
● Choose a dataset
● Slice and dice
● Try out various means of featurization
● Train a number of models & compare
● Plot various statistics along the way
● Jupyterhub on k8s
  ○ Security
  ○ Reproducibility
  ○ Resource allocation
  ○ Scale beyond a laptop
  ○ Centralized storage

E2E Example
● Scaling featurization and training
  ○ TFJob
  ○ tensor2tensor
● Model deployment with SeldonIO
● Accessing via a simple web app
● Teardown

Try it yourself
● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
● Katacoda: https://www.katacoda.com/kubeflow
● http://gh-demo.kubeflow.org/
Special thanks: Jeremy Lewi, Ankush Agarwal

Just the beginning
● Easier setup
● Utilize more k8s features
● Add support for packages, frameworks, libraries, and example models
You tell us!
● Get involved
  ○ github.com/kubeflow
  ○ kubeflow.slack.com
  ○ @kubeflow
  ○ kubeflow-discuss@googlegroups.com

"OK Google, build me a classifier." (Future Michelle)

g.co/next18, July 24-27, 2018, San Francisco
