Introduction to PyTorch

  1. INTRODUCTION TO PYTORCH - Caio Corro

  2. Computation Graph
     ➤ Dynamic: you re-build the computation graph for each input
     ➤ Eager: each operation is immediately computed (no lazy computation)
     Interfaces
     ➤ Python
     ➤ C++ (somewhat experimental)
     Components
     ➤ torch: tensors (with gradient computation ability)
     ➤ torch.nn.functional: functions that manipulate tensors
     ➤ torch.nn: neural network (sub-)components (e.g. affine transformation)
     ➤ torch.optim: optimizers
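     A minimal sketch (not part of the original slides) of what "dynamic" and "eager" mean in practice: results are available as soon as an operation runs, and ordinary Python control flow can change the graph from one input to the next.

     import torch

     x = torch.tensor([2.0], requires_grad=True)

     # Eager: y is computed immediately, no separate compile/session step
     y = x * 3
     print(y)  # tensor([6.], grad_fn=<MulBackward0>)

     # Dynamic: the graph is rebuilt at every forward pass, so plain Python
     # control flow can give a different graph for different inputs
     def forward(x):
         if x.sum() > 0:
             return x * 2
         return x * -1

     print(forward(x))  # tensor([4.], grad_fn=<MulBackward0>)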

  3. 1. TENSORS

  4. TENSORS
     torch.Tensor
     ➤ dtype: type of the elements
     ➤ shape: shape/size of the tensor
     ➤ device: device where the tensor is stored (i.e. CPU, GPU)
     ➤ requires_grad: do we want to backpropagate gradients to this tensor?
     dtype
     ➤ torch.long (signed integer)
     ➤ torch.float/torch.float32 (default type)
     ➤ torch.bool
     ➤ torch.double/torch.float64
     ➤ …
     https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
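     A short sketch (my addition, not from the slides) showing how these attributes can be inspected on any tensor:

     import torch

     t = torch.zeros((2, 3), dtype=torch.float, requires_grad=True)
     print(t.dtype)          # torch.float32
     print(t.shape)          # torch.Size([2, 3])
     print(t.device)         # cpu
     print(t.requires_grad)  # True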

  6. CREATING TENSORS
     Creating an uninitialized tensor:
     import torch
     t = torch.empty(
         (2, 4, 4),  # shape
         dtype=torch.float,
         device="cpu",
         requires_grad=True
     )
     Default arguments:
     ➤ float
     ➤ CPU
     ➤ no grad
     Creating an initialized tensor:
     torch.zeros((2, 4, 4), dtype=torch.float, requires_grad=True)
     torch.ones((2, 4, 4), dtype=torch.float, requires_grad=True)
     torch.rand((2, 4, 4), dtype=torch.float, requires_grad=True)
     https://pytorch.org/docs/stable/torch.html#creation-ops

  8. CREATING TENSORS FROM OTHER TENSORS
     *_like() functions: create a new tensor with the same attributes as the argument.
     ➤ Specific attributes can be overridden
     ➤ Shape cannot be changed
     t2 = torch.zeros_like(t)
     t_bool = torch.zeros_like(t, dtype=torch.bool)
     clone(): create a copy of a tensor.
     t1 = torch.ones((1,))
     t2 = t1.clone()
     t2[0] = 3
     print(t1, t2)
     # output: tensor([1.]) tensor([3.])

  11. CREATING TENSORS FROM DATA
      From Python data:
      t1 = torch.tensor([0, 1, 2, 3], dtype=torch.long)
      ➤ Creates a vector with integers 0, 1, 2, 3
      ➤ Elements are signed integers (longs)
      Using iterables:
      t2 = torch.tensor(range(10))
      ➤ Vector with values from 0 to 9 (the dtype is inferred from the data, here signed integers)
      Creating matrices:
      t3 = torch.tensor([[0, 1], [2, 3]])
      ➤ First row: 0, 1
      ➤ Second row: 2, 3
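      A quick check of the inferred dtype (my addition, not on the slide), and how to ask for floats explicitly:

      import torch

      t2 = torch.tensor(range(10))
      print(t2.dtype)  # torch.int64: the dtype is inferred from the Python ints

      # to get a float vector, request the dtype explicitly
      t2_float = torch.tensor(range(10), dtype=torch.float)
      # or use a dedicated constructor
      t2_float = torch.arange(10, dtype=torch.float)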

  12. OPERATIONS
      Out-of-place operations
      ➤ Create a new tensor, i.e. memory is allocated to store the result
      ➤ Set back-propagation information if required (i.e. if at least one of the inputs has requires_grad=True)
      In-place operations
      ➤ Modify the data of the tensor (no memory allocation)
      ➤ Easy to identify: name ending with an underscore
      ➤ Can be problematic for gradient computation (be careful when requires_grad=True):
        • Can break the backpropagation algorithm
        • Forget the forward value
      https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd

  14. OUT-OF-PLACE OPERATIONS
      t1 = torch.rand((4, 4))
      t2 = torch.rand((4, 4))
      # addition and subtraction
      t3 = torch.add(t1, t2);  t3 = t1.add(t2);  t3 = t1 + t2
      t3 = torch.sub(t1, t2);  t3 = t1.sub(t2);  t3 = t1 - t2
      # element-wise multiplication and division (this is NOT matrix multiplication!)
      t3 = torch.mul(t1, t2);  t3 = t1.mul(t2);  t3 = t1 * t2
      t3 = torch.div(t1, t2);  t3 = t1.div(t2);  t3 = t1 / t2
      # matrix multiplication
      t3 = torch.matmul(t1, t2);  t3 = t1.matmul(t2);  t3 = t1 @ t2

  15. IN-PLACE OPERATIONS
      t1.add_(t2)
      t1.sub_(t2)
      t1.mul_(t2)
      t1.div_(t2)
      Note: there is no in-place matrix multiplication.
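      A tiny check (my addition) showing that an in-place operation modifies the tensor itself and returns the same object, with no new allocation:

      import torch

      t1 = torch.ones((2, 2))
      t2 = torch.ones((2, 2))

      out = t1.add_(t2)   # modifies t1 directly
      print(t1)           # every element is now 2
      print(out is t1)    # True: the returned tensor is t1 itself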

  18. ELEMENT-WISE OPERATIONS: IN-PLACE
      [Computation graph: a and b are multiplied to give tmp, then tmp and c are multiplied to give d]
      tmp = a × b
      d = tmp × c
      Back-propagation to tmp: ∂d/∂tmp = c and ∂d/∂c = tmp
      ➤ We need the value of « c »
      ➤ We don't need the value of « tmp »
      a = torch.rand((1,))
      b = torch.rand((1,))
      c = torch.rand((1,), requires_grad=True)
      tmp = a * b
      d = tmp * c
      # erase the data of tmp
      torch.zero_(tmp)
      ! Backprop will FAIL: here c requires a gradient, so back-propagation needs the value of tmp (∂d/∂c = tmp), which has been erased in place.

  19. ELEMENT-WISE OPERATIONS: IN-PLACE
      [Same computation graph: tmp = a × b, d = tmp × c]
      Back-propagation to tmp: ∂d/∂tmp = c and ∂d/∂c = tmp
      ➤ We need the value of « c »
      ➤ We don't need the value of « tmp »
      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,), requires_grad=True)
      c = torch.rand((1,))
      tmp = a * b
      d = tmp * c
      # erase the data of tmp
      torch.zero_(tmp)
      This is OK: only ∂d/∂tmp = c is needed here, so the erased value of tmp is never used during back-propagation.
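      A sketch (my addition, not on the slides) that makes the difference observable by actually calling backward(); the exact error message may vary across PyTorch versions:

      import torch

      # failing case: c requires a gradient, so backward needs the (erased) value of tmp
      a, b = torch.rand((1,)), torch.rand((1,))
      c = torch.rand((1,), requires_grad=True)
      tmp = a * b
      d = tmp * c
      torch.zero_(tmp)
      try:
          d.backward()
      except RuntimeError as err:
          print("backward failed:", err)  # complains about an in-place modification

      # OK case: only gradients w.r.t. a and b are required
      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,), requires_grad=True)
      c = torch.rand((1,))
      tmp = a * b
      d = tmp * c
      torch.zero_(tmp)
      d.backward()
      print(a.grad, b.grad)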

  20. ACTIVATION FUNCTIONS
      import torch
      import torch.nn
      import torch.nn.functional as F
      t1 = torch.rand((2, 10))
      « Standard » activations:
      t2 = torch.relu(t1)
      t2 = torch.tanh(t1)
      t2 = torch.sigmoid(t1)
      torch.relu_(t1)
      torch.tanh_(t1)
      torch.sigmoid_(t1)
      Other activations:
      t2 = F.leaky_relu(t1)
      t2 = F.leaky_relu_(t1)
      t2 = F.elu(t1)
      t2 = F.elu_(t1)

  24. BROADCASTING
      [Figure: c = a + b with a of shape (3, 3) and b of shape (1, 3); the dimensions are invalid as is, so the row of b is copied so that the dimensions are correct]
      Explicit broadcasting:
      a = torch.rand((3, 3))
      b = torch.rand((1, 3))
      # explicitly copy the data
      b.repeat((3, 1))
      # implicit construction (no duplicated memory)
      b.expand((3, -1))
      Implicit broadcasting:
      Many operations will automatically broadcast dimensions ⇒ RTFM!
      a = torch.rand((3, 3))
      b = torch.rand((1, 3))
      c = a + b
      https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
      https://pytorch.org/docs/stable/torch.html#torch.add
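      A small sketch of the general broadcasting rule (my addition): shapes are compared from the trailing dimension backwards, and a dimension of size 1 is stretched to match the other tensor:

      import torch

      a = torch.rand((3, 3))
      b = torch.rand((1, 3))
      print((a + b).shape)   # torch.Size([3, 3]): the single row of b is stretched to 3 rows

      x = torch.rand((2, 1, 4))
      y = torch.rand((3, 1))
      print((x + y).shape)   # torch.Size([2, 3, 4]): both operands are broadcast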

  25. GRADIENT COMPUTATION
      ➤ backward() launches the back-prop algorithm if and only if a gradient is required
      a = torch.rand((1,))
      b = torch.rand((1,))
      c = a * b
      # after the call a.grad contains the gradient
      c.backward()
      ! No gradient is required here, so the call to backward will fail!

  28. GRADIENT COMPUTATION
      ➤ backward() launches the back-prop algorithm if and only if a gradient is required
      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,))
      c = a * b
      # after the call a.grad contains the gradient
      c.backward()
      # let's do something else...
      b2 = torch.rand((1, ))
      c2 = a * b2
      # by default the gradient is accumulated, so if we want to recompute a
      # gradient, we have to erase the previous one manually!
      a.grad.zero_()
      # explicitly set the incoming sensitivity
      c2.backward(torch.tensor([2.]))
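      A quick check (my addition) of what these calls compute; the tensor passed to backward() acts as an incoming sensitivity that simply scales the gradient:

      import torch

      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,))
      c = a * b
      c.backward()
      print(torch.allclose(a.grad, b))       # True: d(a*b)/da = b

      b2 = torch.rand((1,))
      c2 = a * b2
      a.grad.zero_()                         # without this, the new gradient would be added to the old one
      c2.backward(torch.tensor([2.]))
      print(torch.allclose(a.grad, 2 * b2))  # True: the gradient b2 is scaled by the sensitivity 2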

  29. 2. NEURAL NETWORKS

  30. MODULES AND PARAMETERS
      torch.nn.Module
      To build a neural network, we store in a module:
      ➤ Parameters of the network
      ➤ Other modules
      Benefits
      ➤ Execution mode: we can set the network to training or test mode (e.g. to automatically apply or discard dropout)
      ➤ Move the whole network to a device
      ➤ Retrieve all learnable parameters of the network
      ➤ …
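      A minimal sketch (my addition, using standard torch.nn.Module methods) of the benefits listed above:

      import torch

      model = torch.nn.Sequential(
          torch.nn.Linear(10, 20),
          torch.nn.ReLU(),
          torch.nn.Dropout(0.5),
          torch.nn.Linear(20, 2),
      )

      model.train()    # training mode: dropout is applied
      model.eval()     # evaluation mode: dropout is disabled

      model.to("cpu")  # move all parameters of the network to a device

      # retrieve all learnable parameters, e.g. to hand them to an optimizer
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)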

  32. SINGLE HIDDEN LAYER 1/2
      class HiddenLayer(torch.nn.Module):
          def __init__(self, input_dim, output_dim):
              super().__init__()
              self.W = torch.nn.Parameter(
                  torch.empty(output_dim, input_dim)
              )
              self.bias = torch.nn.Parameter(
                  torch.empty(output_dim, 1)
              )

          def forward(self, inputs):
              # transpose everything because of the PyTorch data format
              # (one sample per row in the input batch)
              z = inputs @ self.W.data.transpose(0, 1) \
                  + self.bias.data.transpose(0, 1)
              # non-linearity!
              return torch.relu(z)
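      One caveat worth flagging (my addition, not from this slide): accessing .data inside forward() bypasses autograd, so gradients would not reach W and bias during training. A sketch of a version that keeps gradient tracking, using the standard F.linear affine transformation and storing the bias as a vector:

      import torch
      import torch.nn.functional as F

      class HiddenLayer(torch.nn.Module):
          def __init__(self, input_dim, output_dim):
              super().__init__()
              self.W = torch.nn.Parameter(torch.empty(output_dim, input_dim))
              self.bias = torch.nn.Parameter(torch.empty(output_dim))
              # parameters created with torch.empty should also be initialized
              torch.nn.init.xavier_uniform_(self.W)
              torch.nn.init.zeros_(self.bias)

          def forward(self, inputs):
              # use the parameters directly (no .data) so autograd tracks them
              return torch.relu(F.linear(inputs, self.W, self.bias))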
