Secure and efficient deep learning everywhere

Octomizer

Outline
● Who we are (recap)
● Deployment pain
● The vision
● The Octomizer: TVM for everyone

Simple, secure, and efficient deployment of ML models in the edge and the cloud
● Drive TVM adoption: core infrastructure and improvements
● Expand the set of users who can deploy ML models: services, automation, and integrations
Diagram labels: Apache TVM ecosystem, OctoML

Founding Team - The Octonauts
● Luis Ceze: Co-founder, CEO
  ○ PhD in Computer Architecture and Compilers
  ○ Professor at UW-CSE
  ○ Venture Partner, Madrona Ventures
  ○ Previously: IBM Research, consulting for Microsoft, Apple, Qualcomm
● Jason Knight: Co-founder, CPO
  ○ PhD in Computational Biology and Machine Learning
  ○ Previously: HLI, Nervana, Intel
● Tianqi Chen: Co-founder, CTO
  ○ PhD in Machine Learning
  ○ Professor at CMU-CS
● Thierry Moreau: Co-founder, Architect
  ○ PhD in Computer Architecture
● Jared Roesch: Co-founder, Architect
  ○ (soon) PhD in Programming Languages
40+ years of combined experience in computer systems design and machine learning

Deployment Pain/Complexity
● Model ingestion
● Performance estimation and comparison
  ○ Cartesian product of models, frameworks, and hardware
● Optimization
  ○ O0, O1, O2
  ○ Target settings: march, mtune, mcpu
  ○ Size reductions
  ○ Quantization, pruning, distillation
● Custom operators (scheduling, cross hardware support)
● Lack of portability / varying coverage across frameworks
● Model integration
  ○ Output portability
  ○ Packaging (Android APK, iOS ipa, Python wheel, Maven artifact, etc)
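To make the "Cartesian product of models, frameworks, and hardware" point concrete, here is a minimal sketch that enumerates every combination a team would have to benchmark. The model and hardware names are illustrative assumptions, not a list taken from the slides.

```python
from itertools import product

# Illustrative names only (assumptions, not taken from the slides).
models = ["resnet-18", "mobilenet-v2"]
frameworks = ["TensorFlow", "PyTorch", "ONNX"]
hardware = ["cloud-cpu", "arm-a-class", "arm-m-class"]

# Every (model, framework, hardware) triple is a separate configuration
# that needs its own performance estimate.
configurations = list(product(models, frameworks, hardware))

for model, framework, target in configurations:
    print(f"{model:14s} {framework:11s} {target}")

print(len(configurations), "configurations to compare")
```

Adding one more model, framework, or hardware target multiplies the total rather than adding to it, which is why manual comparison scales poorly.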

Deep learning deployment should be easy. For everyone.
TVM is core to making that happen.
… but it’s only the first (important!) step

The Machine Learning Lifecycle
Diagram stages: Data collection, curation, annotation; Model development; Model training; Model optimization; Deployment; Cloud inference; Edge/embedded inference
Model optimization
● Quantization
● Custom kernels
● Framework modifications
● Hardware vendor partnerships
Deployment
● Packaging
● Binary size
● Integration
● Build chain setup

Octomizer: deep learning optimization as a service
● TensorFlow, PyTorch, ONNX serialized models
● API and web UI
● Support for efficient and secure execution
● Optimize over multiple clouds for training and inference at scale. Better latency, lower OpEx.
● Optimize for edge deployment. Longer battery life, smaller form factor, lower part cost, etc.

Demo (frontend and optimization)
● Simple, easy to use Python API
  ○ pip install octomizer
  ○ export OCTOML_ACCESS_TOKEN=...

    from time import sleep

    import octomizer

    # Upload the model and its parameters under a name.
    model = octomizer.upload(model, params, 'resnet-18')

    # Start an optimization job.
    job = model.start_job('autotvm', {  # also 'onnxrt' etc...
        'hardware': 'gcp/<instance_type>',
        'TVM_NUM_THREADS': 1,
        'tvm_hash': '...'
    })

    # Poll until the job has finished.
    while job.get_status().status != 'COMPLETE':
        sleep(1)

    model.download_pkg("base_model", 'python')  # Package with default schedules
    model.download_pkg("optimized_model", 'python', job)
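The polling loop in the demo waits indefinitely. Below is a hedged sketch of the same pattern with a deadline. It runs against a stand-in `FakeJob` class, not the real Octomizer client: only `get_status().status` and the `'COMPLETE'` value come from the slide, while `wait_for_job`, `FakeJob`, `JobStatus`, and the `'RUNNING'` status are hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class JobStatus:
    # Hypothetical status record; the slide only shows a `.status` attribute.
    status: str


class FakeJob:
    """Stand-in for the job object on the slide (not the real Octomizer client)."""

    def __init__(self, polls_until_complete: int) -> None:
        self._remaining = polls_until_complete

    def get_status(self) -> JobStatus:
        # Report 'RUNNING' (an assumed status name) for the configured number
        # of polls, then 'COMPLETE'.
        if self._remaining > 0:
            self._remaining -= 1
            return JobStatus("RUNNING")
        return JobStatus("COMPLETE")


def wait_for_job(job, timeout_s: float = 60.0, poll_interval_s: float = 1.0) -> int:
    """Poll until the job reports 'COMPLETE' and return the number of polls made.

    Raises TimeoutError if the deadline passes before the job completes.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if job.get_status().status == "COMPLETE":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not complete after {timeout_s} s")
        time.sleep(poll_interval_s)


polls = wait_for_job(FakeJob(polls_until_complete=3), timeout_s=5.0, poll_interval_s=0.01)
print("polls needed:", polls)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments. A real client would probably also need to handle failed or cancelled jobs, which the slide does not describe.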

Octomizer optimization
● Code generation of operator library
  ○ Auto-tuning per hardware target, operator, and operator parameters
● Hardware targets supported:
  ○ GCP cloud instances
  ○ ARM A class CPU/GPU
  ○ ARM M class microcontrollers
● On the roadmap:
  ○ AWS and Azure cloud instances
  ○ Quantization
  ○ Hardware-aware architecture search
  ○ Compression/distillation
Diagram labels: TensorFlow, Pytorch, ONNX serialized models; API and web UI; Octomizer; Auto-tuning using OctoML clusters; Optimized deployment artifacts
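As a rough illustration of "auto-tuning per hardware target, operator, and operator parameters", the toy sketch below searches a small space of configurations separately for each (target, operator, parameters) key and records the cheapest one. It is not TVM's or the Octomizer's tuner: the knobs (tile size, unroll factor), the target names, and `toy_cost` are all assumptions, and `toy_cost` is a synthetic stand-in for a real on-device measurement.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Key: (hardware target, operator name, operator parameters such as shapes).
TuningKey = Tuple[str, str, Tuple[int, ...]]
# Hypothetical tuning knobs: (tile size, unroll factor).
Config = Tuple[int, int]


def tune(
    key: TuningKey,
    candidates: Iterable[Config],
    measure: Callable[[TuningKey, Config], float],
) -> Config:
    """Exhaustively evaluate the candidates and return the cheapest one for this key."""
    return min(candidates, key=lambda cfg: measure(key, cfg))


def toy_cost(key: TuningKey, cfg: Config) -> float:
    """Synthetic cost model standing in for a measurement on real hardware."""
    target, _operator, shape = key
    tile, unroll = cfg
    preferred_tile = 8 if target == "arm-a-class" else 32
    # Wasted work when the tile size does not divide the first dimension.
    padding = (-shape[0]) % tile
    return abs(tile - preferred_tile) + padding + 1.0 / unroll


candidates = list(product([4, 8, 16, 32], [1, 2, 4]))
keys = [
    ("arm-a-class", "conv2d", (56, 64, 3)),
    ("gcp-cpu", "conv2d", (56, 64, 3)),
    ("gcp-cpu", "dense", (512, 1000)),
]

# One entry per (target, operator, parameters): the best configuration found.
tuning_log: Dict[TuningKey, Config] = {key: tune(key, candidates, toy_cost) for key in keys}

for key, cfg in tuning_log.items():
    print(key, "->", cfg)
```

The same operator with the same shape ends up with a different configuration on each target here, which is the reason tuning results are keyed by all three dimensions.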

Demo (visualization)

Octomizer under the hood
● Entire stack designed for easy, cross-cloud and private cloud/on-prem deployment
● Consists of:
  ○ Kubernetes
  ○ Kustomize for declarative deployments
  ○ Rust + Actix-web for robust, safe and simple deployments
  ○ Only external service dependency is an object store
  ○ Support for TVM RPC Trackers for external device management/execution
● OctoML hosted Octomizer today supports
  ○ GCP cloud instances
  ○ ARM A class CPU/GPU
  ○ ARM M class microcontrollers
  ○ More to come...

ML Workloads and Requirements
Existing HW (focus today)
● CPU
● GPU
● FPGA
● uControllers
Upcoming Hardware (accelerator, SOC, HW IP blocks, …)
● Stay tuned...
Efficient and secure execution (and perf/power estimation)

Next steps
● Stay tuned through Twitter (@octoml) or email.
● Reach out if you have use cases to share: jknight@octoml.ai
● Looking for private beta partners.
● We are hiring: see octoml.ai for more details!
