PYTORCH AND THE NEW CHALLENGES OF ML
LeCun's Law and the Rise of Deep Learning
[Chart: annual citation count of "Gradient-Based Learning Applied to Document Recognition" (LeCun et al., 1998), rising from roughly 20 in 2001 to 5,371 in 2018]
TRANSLATION SPARK AR OCULUS VR BLOOD DONATIONS
400T+ PREDICTIONS PER DAY
1B+ PHONES RUNNING NEURAL NETS GLOBALLY
WHAT IS PYTORCH?
Dynamic neural networks · Hardware-accelerated inference · Eager & graph-based execution · Distributed training · Simplicity over complexity
BUILT BY THE COMMUNITY · DESIGNED FOR RESEARCHERS · BUILT FOR PRODUCTION
~1,200 CONTRIBUTORS · 50%+ YOY GROWTH · 22K PYTORCH FORUM USERS
DESIGNED FOR RESEARCHERS · BUILT BY THE COMMUNITY · BUILT FOR PRODUCTION
GROWTH IN ARXIV MENTIONS IN RESEARCH PAPERS
UDACITY: 16K+ students enrolled in courses · 21M minutes of watch time in the last 12 months
FAST.AI: Practical Deep Learning for Coders, V3 · Part 2: Deep Learning from the Foundations · A Code-First Introduction to Natural Language Processing · Introduction to Machine Learning for Coders
BUILT FOR PRODUCTION · BUILT BY THE COMMUNITY · DESIGNED FOR RESEARCHERS
PRODUCTION RESEARCH
PYTORCH
CORE PRINCIPLES: DEVELOPER EFFICIENCY · BUILDING FOR SCALE
DEVELOPER EFFICIENCY ENABLING A HIGH VELOCITY OF MODEL ITERATION AND INNOVATION
CLEAN APIS
NAMED TENSORS (EXPERIMENTAL)

Today, we name and access dimensions by comment:

    # Tensor[N, C, H, W]
    images = torch.randn(32, 3, 56, 56)
    images.sum(dim=1)
    images.select(dim=1, index=0)

But naming explicitly leads to more readable and maintainable code:

    NCHW = ['N', 'C', 'H', 'W']
    images = torch.randn(32, 3, 56, 56, names=NCHW)
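The contrast above can be run end to end; a minimal sketch (named tensors are experimental, and the shapes follow the slide):

```python
import torch

# Unnamed: the layout lives in a comment only.
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)                      # reduce over channels, by position

# Named: the layout rides along with the tensor itself.
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
per_pixel = images.sum('C')            # reduce over channels, by name
red_channel = images.select('C', 0)    # index the channel dim by name

print(per_pixel.names)                 # ('N', 'H', 'W')
print(red_channel.shape)               # torch.Size([32, 56, 56])
```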
TORCHSCRIPT

Models are Python programs; TorchScript is an optimizable subset of Python.

+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization

    class RNN(nn.Module):
        def __init__(self, W_h, U_h, W_y, b_h, b_y):
            super(RNN, self).__init__()
            self.W_h = nn.Parameter(W_h)
            self.U_h = nn.Parameter(U_h)
            self.W_y = nn.Parameter(W_y)
            self.b_h = nn.Parameter(b_h)
            self.b_y = nn.Parameter(b_y)

        def forward(self, x, h):
            y = []
            for t in range(x.size(0)):
                h = torch.tanh(x[t] @ self.W_h + h @ self.U_h + self.b_h)
                y += [torch.tanh(h @ self.W_y + self.b_y)]
                if t % 10 == 0:
                    print("stats: ", h.mean(), h.var())
            return torch.stack(y), h

    # one annotation!
    script_rnn = torch.jit.script(RNN(W_h, U_h, W_y, b_h, b_y))
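The "one annotation" idea also applies to much smaller functions; the sketch below is hypothetical (not from the talk) and shows that data-dependent control flow survives `torch.jit.script`:

```python
import torch

# Hypothetical example: sum the row-sums of a tensor, clipping each at `limit`.
# The loop and the branch are preserved in the compiled TorchScript program.
@torch.jit.script
def clipped_sum(x: torch.Tensor, limit: float) -> torch.Tensor:
    total = torch.zeros(1)
    for t in range(x.size(0)):          # loop bound depends on the input
        v = x[t].sum()
        if bool(v > limit):             # data-dependent branch
            v = torch.tensor(limit)
        total = total + v
    return total

x = torch.ones(3, 4)                    # each row sums to 4.0
print(clipped_sum(x, 10.0))             # tensor([12.])
print(clipped_sum(x, 3.0))              # tensor([9.])
```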
CORE PRINCIPLES: DEVELOPER EFFICIENCY · BUILDING FOR SCALE
BUILDING FOR SCALE HIGH PERFORMANCE EXECUTION FOR MODEL TRAINING AND INFERENCE
GROWTH OF DATA IN ML PIPELINES
30% of FB data was used in an ML pipeline in 2018 · 50% of FB data is used in an ML pipeline today · 3X ML data growth in one year
SCALE OF ML TRAINING AT FACEBOOK
Ranking engineers: 2X increase · Workflows trained: 3X increase · Compute consumed: 3X increase
OPTIMIZING FOR HARDWARE BACKENDS
PyTorch development env → PyTorch JIT → MKL-DNN · CUDA/cuDNN · (Q)NNPACK · FBGEMM · XLA · Glow · TVM
1. Feature Engineering: Bryce Canyon (70X HDDs + integrated compute) · Lightning (30X flash-drive JBOF) · Tioga Pass (dual CPU, high mem)
2. Training: Big Basin (8X SXM2 GPUs + 2X CPU) · Tioga Pass (dual CPU, high mem)
3. Inference: Twin Lakes (single-socket CPU card, low mem)
QUANTIZATION
Efficient inference on server and mobile devices using reduced-precision math.
Post-training quantization (simplicity of use) · Quantization-aware training (accuracy & perf control) · Dynamic quantization
4x less memory · 2-4x compute speedup
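As a minimal sketch of the dynamic flavor (the two-layer model and its sizes are illustrative, not from the talk), `torch.quantization.quantize_dynamic` swaps Linear layers for int8-weight versions:

```python
import torch
import torch.nn as nn

# Illustrative float model; layer sizes are made up for the example.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Post-training dynamic quantization: weights stored as int8, activations
# quantized on the fly at inference time. No retraining required.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
print(out.shape)            # torch.Size([1, 10])
```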
PYTORCH: RESEARCH + PROTOTYPING → PRODUCTION DEPLOYMENT
NAMED TENSORS
PyTorch set the bar for ML Developer UX by focusing on expressivity and productivity "I want to write a program, not to (manually) build a graph" Where are similar areas for improvement today?
Data has semantic meaning! But we force users to drop that context and use an abstract "Tensor" mathematical object.
Key Insight: Named Dimensions
Inspired by and done in collaboration with Prof. Alexander Rush, now at Cornell Tech.
Key Insight: Named Dimensions
Today we name and access dimensions by comment, but naming explicitly leads to more readable and maintainable code.
By retaining semantic meaning, we also avoid common "Tensor Pitfalls"
- Accidental Broadcasting
- Accidental Alignment
Accidental Broadcasting
We didn't expect broadcasting to happen, but it did. We can catch this automatically: broadcast by position, but check that dimension names are aligned.
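A small hypothetical sketch of this check (the `N`/`C`/`T` names and sizes are made up): multiplying tensors whose dimension names disagree raises instead of silently broadcasting.

```python
import torch

scores = torch.randn(4, 7, names=['N', 'C'])   # batch x classes
weights = torch.randn(7, names=['C'])          # per-class weights

ok = scores * weights          # names unify from the right: allowed
print(ok.names)                # ('N', 'C')

times = torch.randn(7, names=['T'])            # same size, different meaning
try:
    scores * times             # unnamed tensors would silently broadcast here
except RuntimeError as err:
    print('caught:', err)
```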
Accidental Alignment
No 1->N broadcast occurs; semantically distinct dimensions simply happen to have the same size. But there are so many formats! There is a "time bomb" if I ever normalize the wrong format and the "unaligned" dimensions have the same size. If we broadcast by name (align_as), we only need a single normalize function for all formats.
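A sketch of that single normalize function, assuming per-channel mean/stdv named along 'C' (the shapes here are illustrative): `align_as` permutes and inserts size-1 dims to match the image's named layout, whatever the format.

```python
import torch

def normalize(images, mean, stdv):
    # align_as broadcasts by name: 'C' lines up with the image's 'C',
    # and size-1 dims are inserted for the remaining named dimensions.
    return (images - mean.align_as(images)) / stdv.align_as(images)

mean = torch.zeros(3, names=['C'])
stdv = torch.ones(3, names=['C'])

nchw = torch.randn(2, 3, 8, 8, names=['N', 'C', 'H', 'W'])
nhwc = torch.randn(2, 8, 8, 3, names=['N', 'H', 'W', 'C'])

# One function covers both layouts; no per-format variants needed.
print(normalize(nchw, mean, stdv).names)   # ('N', 'C', 'H', 'W')
print(normalize(nhwc, mean, stdv).names)   # ('N', 'H', 'W', 'C')
```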
What about mixing named and unnamed Tensors? I don't want to convert my entire program at once...
Coexistence with Unnamed
Named tensors can coexist with unnamed tensors. Let's remove the requirement that mean, stdv are named: refine_names lifts unnamed tensors to named tensors.
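A sketch of incremental adoption (shapes illustrative): `refine_names` returns a view whose anonymous dimensions are lifted to named ones, and `...` leaves the rest unnamed so a program can be converted piecemeal.

```python
import torch

x = torch.randn(2, 3, 5, 5)                 # plain, unnamed tensor
print(x.names)                              # (None, None, None, None)

named = x.refine_names('N', 'C', 'H', 'W')  # lift all four dims to names
print(named.names)                          # ('N', 'C', 'H', 'W')

# '...' keeps the leading dims unnamed: convert only what you need.
partial = torch.randn(2, 3, 5, 5).refine_names(..., 'H', 'W')
print(partial.names)                        # (None, None, 'H', 'W')
```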
NAMED TENSORS: EXPERIMENTAL IN 1.3
Core functionality: common torch operators are supported in eager mode; (unnamed) autograd is supported.
Tutorial: see our in-depth MultiheadedAttention tutorial.
Future work (expanded coverage): expanded NN package coverage · named autograd support · serialization, multiprocessing, distributed, JIT, mypy.
PyTorch JIT / TorchScript
What is the PyTorch JIT? A compiler and language infrastructure for machine learning
Production Requirements P O R T A B I L I T Y P E R F O R M A N C E Models should run anywhere Whole-program optimization
Problem Statement
We need a system that can:
1. Capture the structure of PyTorch programs → TorchScript
2. Use that structure to optimize → JIT Compiler
TorchScript A static, high-performance subset of Python. 1. Prototype your model with PyTorch 2. Control flow is preserved 3. First-class support for lists, dicts, etc.
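The container support can be sketched with a hypothetical scripted function (`count_positive` is made up for illustration): lists and dicts keep their Python semantics after compilation.

```python
import torch
from typing import Dict, List

@torch.jit.script
def count_positive(xs: List[int]) -> Dict[str, int]:
    # A dict literal and a loop with a branch, all preserved by the compiler.
    counts = {'pos': 0, 'nonpos': 0}
    for x in xs:
        if x > 0:
            counts['pos'] = counts['pos'] + 1
        else:
            counts['nonpos'] = counts['nonpos'] + 1
    return counts

print(count_positive([3, -1, 0, 7]))   # {'pos': 2, 'nonpos': 2}
```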
PyTorch JIT An optimizing just-in-time compiler for PyTorch programs. 1. Lightweight, thread-safe interpreter 2. Easy to write custom transformations 3. Not just for inference! Autodiff support.
C A S E S T U D Y Recursive Neural Network Grammars — Complex dynamic behavior based on the inputs — Typically written in pure C++
Complex Control Flow
Use common data structures
Define your own classes
WHAT'S NEXT? JIT AS A PLATFORM
Quantization: model quantization done safely and automatically using JIT transformations.
Mobile: a lightweight interpreter that can run on-device.
Backends: support for lowering models to static graph compilers, like TVM, Glow, XLA.
QUANTIZATION