INTRODUCTION TO PYTORCH
Caio Corro
Components
➤ torch: tensors (with gradient computation ability)
➤ torch.nn.functional: functions that manipulate tensors
➤ torch.nn: neural network (sub-)components (e.g. affine transformation)
➤ torch.optim: optimizers
Interfaces
➤ Python
➤ C++ (somewhat experimental)
Computation Graph
➤ Dynamic: you re-build the computation graph for each input
➤ Eager: each operation is immediately computed (no lazy computation), see the sketch below
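A minimal sketch (not in the original slides) illustrating dynamic, eager execution: every operation returns a concrete tensor that can be inspected immediately, and a new graph is built for each input.

import torch

x = torch.rand((3,), requires_grad=True)
y = x * 2      # computed immediately (eager): y already holds concrete values
z = y.sum()
print(y, z)

z.backward()   # back-propagate through the graph built for this particular input
print(x.grad)  # a new input would re-build a new graph (dynamic)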
TENSORS
torch.Tensor
➤ dtype: type of elements
➤ shape: shape/size of the tensor
➤ device: device where the tensor is stored (i.e. cpu, gpu)
➤ requires_grad: do we want to backpropagate gradient to this tensor? (see the sketch below)
dtype
➤ torch.float / torch.float32 (default type)
➤ torch.double / torch.float64
➤ …
➤ torch.long (signed integer)
➤ torch.bool
https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
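A minimal sketch (not in the original slides) showing how these attributes can be inspected:

import torch

t = torch.zeros((2, 3), dtype=torch.float, requires_grad=True)
print(t.dtype)          # torch.float32
print(t.shape)          # torch.Size([2, 3])
print(t.device)         # cpu (by default)
print(t.requires_grad)  # True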
CREATING TENSORS

Creating an uninitialized tensor:

import torch
t = torch.empty(
    (2, 4, 4),  # shape
    dtype=torch.float,
    device="cpu",
    requires_grad=True
)

Creating an initialized tensor:

torch.zeros((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.ones((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.rand((2, 4, 4), dtype=torch.float, requires_grad=True)

Default arguments
➤ Float
➤ CPU
➤ No grad

https://pytorch.org/docs/stable/torch.html#creation-ops
CREATING TENSORS FROM OTHER TENSORS

*_like() functions

Create a new tensor with the same attributes as the argument:
➤ Specific attributes can be overridden
➤ Shape cannot be changed

t2 = torch.zeros_like(t)
t_bool = torch.zeros_like(t, dtype=torch.bool)

clone()

Create a copy of a tensor:

t1 = torch.ones((1,))
t2 = t1.clone()
t2[0] = 3
print(t1, t2)
# tensor([1.]) tensor([3.])
CREATING TENSORS FROM DATA

From Python data

t1 = torch.tensor([0, 1, 2, 3], dtype=torch.long)
➤ Creates a vector with integers 0, 1, 2, 3
➤ Elements are signed integers (longs)

Using iterables

t2 = torch.tensor(range(10))
➤ Vector with values from 0 to 9 (the dtype is inferred from the data: signed integers here)

Creating matrices

t3 = torch.tensor([[0, 1], [2, 3]])
➤ First row: 0, 1
➤ Second row: 2, 3
OPERATIONS
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
Out-of-place operations
➤ Create a new tensor, i.e. memory is allocated to store the results
➤ Set back-propagation information if required (i.e. if at least one of the inputs has requires_grad=True)

In-place operations
➤ Modify the data of the tensor (no memory allocation)
➤ Easy to identify: name ending with an underscore
➤ Can be problematic for gradient computation: be careful when requires_grad=True
OUT-OF-PLACE OPERATIONS

t1 = torch.rand((4, 4))
t2 = torch.rand((4, 4))

t3 = torch.add(t1, t2)
t3 = t1.add(t2)
t3 = t1 + t2

t3 = torch.sub(t1, t2)
t3 = t1.sub(t2)
t3 = t1 - t2

# ! element-wise product: this is not matrix multiplication
t3 = torch.mul(t1, t2)
t3 = t1.mul(t2)
t3 = t1 * t2

t3 = torch.div(t1, t2)
t3 = t1.div(t2)
t3 = t1 / t2

t3 = torch.matmul(t1, t2)
t3 = t1.matmul(t2)
t3 = t1 @ t2
IN-PLACE OPERATIONS
t1.add_(t2)
t1.sub_(t2)
t1.mul_(t2)
t1.div_(t2)

Note: no in-place matrix multiplication
ELEMENT-WISE OPERATIONS: IN-PLACE

[Computation graph: a, b → (×) → tmp; tmp, c → (×) → d]

tmp = a × b
d = tmp × c

∂d/∂c = tmp
∂d/∂tmp = c

Back-propagation to tmp
➤ We need the value of « c »
➤ We don't need the value of « tmp »

a = torch.rand((1,))
b = torch.rand((1,))
c = torch.rand((1,), requires_grad=True)

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

! Backprop will FAIL! (the gradient with respect to c requires the value of « tmp », which was erased)
a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,), requires_grad=True)
c = torch.rand((1,))

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

This is OK! (here the value of « tmp » is not needed for back-propagation)
ACTIVATION FUNCTIONS
import torch
import torch.nn
import torch.nn.functional as F

t1 = torch.rand((2, 10))

# « Standard » activations
t2 = torch.relu(t1)
t2 = torch.tanh(t1)
t2 = torch.sigmoid(t1)
torch.relu_(t1)
torch.tanh_(t1)
torch.sigmoid_(t1)

# Other activations
t2 = F.leaky_relu(t1)
t2 = F.leaky_relu_(t1)
t2 = F.elu(t1)
t2 = F.elu_(t1)
BROADCASTING

[Figure: a (3×3 matrix) + b (1×3 row vector) = c]

! Invalid dimensions: the shapes of a and b do not match
BROADCASTING

[Figure: a (3×3 matrix) + b (1×3 row vector) = c]

Copy rows so that dimensions are correct

Explicit broadcasting

a = torch.rand((3, 3))
b = torch.rand((1, 3))

# explicitly copy the data
b.repeat((3, 1))

# implicit construction
# (no duplicated memory)
b.expand((3, -1))

Implicit broadcasting

a = torch.rand((3, 3))
b = torch.rand((1, 3))
c = a + b

Many operations will automatically broadcast dimensions ⇒ RTFM!

https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
https://pytorch.org/docs/stable/torch.html#torch.add
GRADIENT COMPUTATION

➤ backward() launches the back-prop algorithm if and only if a gradient is required

a = torch.rand((1,))
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

! No gradient is required so the call to backward will fail!
GRADIENT COMPUTATION

➤ backward() launches the back-prop algorithm if and only if a gradient is required

a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

# let's do something else...
b2 = torch.rand((1, ))
c2 = a * b2

# by default gradient is accumulated,
# so if we want to recompute a gradient,
# we have to erase the previous one manually!
a.grad.zero_()

# explicitly set the incoming sensitivity
c2.backward(torch.tensor([2.]))
MODULES AND PARAMETERS
torch.nn.Module
To build a neural network, we store in a module:
➤ Parameters of the network
➤ Other modules
Benefits
➤ Execution mode: we can set the network in training or in test mode (e.g. to automatically apply or discard dropout)
➤ Move the whole network to a device
➤ Retrieve all learnable parameters of the network
➤ … (see the sketch below)
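A minimal sketch (not in the original slides) of these benefits, using the built-in torch.nn.Linear module as an example:

import torch

net = torch.nn.Linear(10, 2)   # any torch.nn.Module behaves the same way

net.train()                    # training mode (dropout applied, etc.)
net.eval()                     # evaluation mode (dropout discarded, etc.)

net.to("cpu")                  # move all parameters to a device ("cuda" for a GPU)

for p in net.parameters():     # retrieve all learnable parameters
    print(p.shape, p.requires_grad)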
SINGLE HIDDEN LAYER 1/2

class HiddenLayer(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.W = torch.nn.Parameter(
            torch.empty(output_dim, input_dim)
        )
        self.bias = torch.nn.Parameter(
            torch.empty(output_dim, 1)
        )

    def forward(self, inputs):
        # transpose everything because of the PyTorch data format
        z = inputs @ self.W.transpose(0, 1) \
            + self.bias.transpose(0, 1)
        # non-linearity!
        return torch.relu(z)

nn = HiddenLayer(10, 2)

# batch of 64 inputs
x = torch.rand((64, 10))

# shape of y is (64, 2)
y = nn(x)

! Do not call forward directly!
SINGLE HIDDEN LAYER 2/2
class HiddenLayer(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, inputs):
        z = self.linear(inputs)
        return torch.relu(z)

nn = HiddenLayer(10, 2)
x = torch.rand((64, 10))
y = nn(x)
https://pytorch.org/docs/stable/nn.html#torch.nn.Linear
Module that implements an affine transformation
SIMPLE NEURAL NETWORK: 1 HIDDEN LAYER
class SimpleNetwork(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, n_classes):
        super().__init__()
        self.z_proj = torch.nn.Linear(input_dim, hidden_dim)
        self.output_proj = torch.nn.Linear(hidden_dim, n_classes)

    def forward(self, inputs):
        z = torch.relu(self.z_proj(inputs))
        o = self.output_proj(z)
        return o
SIMPLE NEURAL NETWORK: 2 HIDDEN LAYERS
class SimpleNetwork2(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, n_classes):
        super().__init__()
        self.z_proj1 = torch.nn.Linear(input_dim, hidden_dim1)
        self.z_proj2 = torch.nn.Linear(hidden_dim1, hidden_dim2)
        self.output_proj = torch.nn.Linear(hidden_dim2, n_classes)

    def forward(self, inputs):
        z1 = torch.relu(self.z_proj1(inputs))
        z2 = torch.relu(self.z_proj2(z1))
        o = self.output_proj(z2)
        return o
LIST OF MODULES
class SimpleNetwork2(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, n_classes):
        super().__init__()
        self.z_projs = []
        self.z_projs.append(torch.nn.Linear(input_dim, hidden_dim1))
        self.z_projs.append(torch.nn.Linear(hidden_dim1, hidden_dim2))

! Do not do this!
Module inspection
PyTorch will automatically inspect attributes of modules, e.g. to extract all parameters of a network. However, only appropriate containers will be recursively inspected:
For modules:
➤ torch.nn.Module
➤ torch.nn.ModuleList
➤ torch.nn.ModuleDict
➤ torch.nn.Sequential

For parameters:
➤ torch.nn.ParameterList
➤ torch.nn.ParameterDict
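A minimal sketch (not in the original slides) illustrating the consequence: parameters stored in a plain Python list are invisible to the module, while a torch.nn.ModuleList exposes them.

import torch

class BadNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [torch.nn.Linear(10, 10)]                        # plain list: not inspected

class GoodNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Linear(10, 10)])  # inspected

print(len(list(BadNet().parameters())))   # 0: the Linear parameters are "lost"
print(len(list(GoodNet().parameters())))  # 2: weight and bias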
LIST OF MODULES
class SimpleNetwork2(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, n_classes):
        super().__init__()
        self.z_projs = torch.nn.ModuleList()
        self.z_projs.append(torch.nn.Linear(input_dim, hidden_dim1))
        self.z_projs.append(torch.nn.Linear(hidden_dim1, hidden_dim2))
        self.output_proj = torch.nn.Linear(hidden_dim2, n_classes)

    def forward(self, inputs):
        z = inputs
        for nn in self.z_projs:
            z = torch.relu(nn(z))
        o = self.output_proj(z)
        return o
SEQUENTIAL CONTAINER
class SimpleNetwork2(torch.nn.Module):
    def __init__(self, input_dim, hdim1, hdim2, n_classes):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hdim1),
            torch.nn.ReLU(),
            torch.nn.Linear(hdim1, hdim2),
            torch.nn.ReLU(),
            torch.nn.Linear(hdim2, n_classes)
        )

    def forward(self, inputs):
        return self.seq(inputs)
https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential
Note the different ReLU: torch.nn.ReLU() is a module, while torch.relu is a function
torch.nn.Sequential
➤ Built with a list of modules
➤ Executed in order: the output of a module is given as input to the next one (all modules are applied in the given order)
LOSS FUNCTIONS
loss_builder = torch.nn.NLLLoss(reduction='mean')

loss = loss_builder(y, gold)
epoch_loss += loss.item()
https://pytorch.org/docs/stable/nn.html#loss-functions
.item() converts the loss to a Python number: important for the garbage collector! (see the sketch below)
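A minimal usage sketch (the raw scores and gold labels below are hypothetical placeholders, not from the original slides). torch.nn.NLLLoss expects log-probabilities, so log_softmax is applied first; torch.nn.CrossEntropyLoss combines both steps.

import torch
import torch.nn.functional as F

scores = torch.rand((64, 2), requires_grad=True)   # raw network outputs (batch of 64, 2 classes)
gold = torch.randint(0, 2, (64,))                   # hypothetical gold class indices

loss_builder = torch.nn.NLLLoss(reduction='mean')
loss = loss_builder(F.log_softmax(scores, dim=1), gold)

epoch_loss = 0.
epoch_loss += loss.item()   # .item() returns a Python float, so epoch_loss does not keep the graph alive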
SINGLE PARAMETER UPDATE

nn = SimpleNetwork2(10, 250, 100, 2)

# get the parameters of the network + options
optimizer = torch.optim.SGD(
    nn.parameters(),
    lr=0.01,
    weight_decay=0.0001,
    momentum=0.9
)

# forward
x = torch.rand((64, 10))
y = nn(x)

# compute loss
loss = loss_builder(y, gold)

# backward
nn.zero_grad()   # reset gradient!
loss.backward()

# update parameters
optimizer.step()
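A minimal sketch (not in the original slides) of how these steps are usually repeated over epochs and mini-batches, assuming the SimpleNetwork2 module defined above; the random x and gold tensors are placeholders for real data, and CrossEntropyLoss is used here because SimpleNetwork2 returns raw scores.

import torch

nn = SimpleNetwork2(10, 250, 100, 2)
loss_builder = torch.nn.CrossEntropyLoss(reduction='mean')
optimizer = torch.optim.SGD(nn.parameters(), lr=0.01, momentum=0.9)

for epoch in range(5):
    epoch_loss = 0.
    for _ in range(100):                      # loop over mini-batches
        x = torch.rand((64, 10))              # placeholder inputs
        gold = torch.randint(0, 2, (64,))     # placeholder gold labels

        y = nn(x)                             # forward
        loss = loss_builder(y, gold)

        nn.zero_grad()                        # reset gradients
        loss.backward()                       # backward
        optimizer.step()                      # update parameters

        epoch_loss += loss.item()
    print(epoch, epoch_loss)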
OPTIMIZERS
➤ Many optimizers are available out of the box
➤ More recent techniques are often available on GitHub (RAdam, …)

torch.optim.Adam(nn.parameters())
torch.optim.Adadelta(nn.parameters())
torch.optim.Adagrad(nn.parameters())
https://pytorch.org/docs/stable/optim.html
DROPOUT

class SimpleNetwork2(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, n_classes):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hidden_dim1),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.5),  # dropout!
            torch.nn.Linear(hidden_dim1, hidden_dim2),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim2, n_classes)
        )

    def forward(self, inputs):
        return self.seq(inputs)

nn = SimpleNetwork2(10, 250, 100, 2)
x = torch.rand((64, 10))

# train mode: the dropout will be applied
nn.train()
y = nn(x)

# eval mode: the dropout will be ignored
nn.eval()
y = nn(x)
PART 1: MNIST CLASSIFICATION WITH PYTORCH
Todo
➤ Parametrizable number of layers (and hidden layer dimensions)
➤ Parametrizable dropout ratio
➤ Parametrizable activation function
➤ Different network architectures, e.g. an over-parametrized network that overfits (i.e. bad regularization)
➤ Different regularization methods (dropout, …)
Bonus
➤ Use a CNN instead of an MLP
https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d
PART 2: VARIATIONAL AUTO-ENCODER
[Graphical model: latent variable z and observed variable x, with decoder pθ(x|z) and encoder qϕ(z|x)]
Recall
➤ Encoder: map image space to latent space
➤ Decoder: map latent space to image space
➤ Prior distribution on latent variable
Goal
➤ Generate new images!

Training loss: Evidence Lower Bound (ELBO)

max_{θ,ϕ}  𝔼_{qϕ(z|x)} [ log pθ(x|z) ] − KL[ qϕ(z|x), p(z) ]
Gaussian Random Variable
➤ μ: mean
➤ σ²: variance
Reparameterization trick

Differentiable sampling process ⇒ differentiable Monte-Carlo estimation of the ELBO!

z ∼ 𝒩(μ, σ²)   ⟺   e ∼ 𝒩(0, 1),  z = μ + e × σ

μ and σ² are given by the encoder
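A minimal PyTorch sketch (not in the original slides) of the reparameterization trick, with hypothetical mu and sigma tensors standing in for the encoder output (sigma is the standard deviation, i.e. the square root of σ²):

import torch

# hypothetical encoder outputs: mean and standard deviation for a batch of 64 latent vectors
mu = torch.rand((64, 20), requires_grad=True)
sigma = torch.rand((64, 20), requires_grad=True)

e = torch.randn_like(sigma)   # e ~ N(0, 1): the sampling itself is outside the gradient path
z = mu + e * sigma            # z ~ N(mu, sigma^2), differentiable w.r.t. mu and sigma

z.sum().backward()            # gradients flow back to mu and sigma (i.e. to the encoder)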