

slide-1
SLIDE 1

AMMI – Introduction to Deep Learning 9.1. Transposed convolutions

François Fleuret
https://fleuret.org/ammi-2018/
Fri Nov 9 22:39:08 UTC 2018

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

slide-2
SLIDE 2

Constructing deep generative architectures requires layers to increase the signal dimension, the contrary of what we have done so far with feed-forward networks.

Generative processes that consist of optimizing the input rely on back-propagation to expand the signal from a low-dimension representation to the high-dimension signal space.

The same can be done in the forward pass with transposed convolution layers, whose forward operation corresponds to a convolution layer's backward pass.

slide-3
SLIDE 3

Consider a 1d convolution with a kernel $\kappa$:

$$y_i = (x \circledast \kappa)_i = \sum_a x_{i+a-1}\,\kappa_a = \sum_u x_u\,\kappa_{u-i+1}.$$

We get

$$\left[\frac{\partial \ell}{\partial x}\right]_u = \frac{\partial \ell}{\partial x_u} = \sum_i \frac{\partial \ell}{\partial y_i}\,\frac{\partial y_i}{\partial x_u} = \sum_i \frac{\partial \ell}{\partial y_i}\,\kappa_{u-i+1},$$

which looks a lot like a standard convolution layer, except that the kernel coefficients are visited in reverse order.
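This relation can be checked with autograd (a quick sketch, not from the original deck): the gradient of F.conv1d with respect to its input is exactly F.conv_transpose1d applied to the output gradient.

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 10, requires_grad=True)
k = torch.randn(1, 1, 3)

y = F.conv1d(x, k)
grad_y = torch.randn_like(y)                   # an arbitrary dl/dy
grad_x, = torch.autograd.grad(y, x, grad_y)    # dl/dx via back-propagation

# Same quantity, computed as the forward pass of a transposed convolution
print(torch.allclose(grad_x, F.conv_transpose1d(grad_y, k)))   # True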

slide-4
SLIDE 4

This is actually the standard convolution operator from signal processing. If $\ast$ denotes this operation, we have

$$(x \ast \kappa)_i = \sum_a x_a\,\kappa_{i-a+1}.$$

Coming back to the backward pass of the convolution layer, if $y = x \circledast \kappa$, then

$$\left[\frac{\partial \ell}{\partial x}\right] = \left[\frac{\partial \ell}{\partial y}\right] \ast \kappa.$$

slide-5
SLIDE 5

In the deep-learning field, since it corresponds to transposing the weight matrix of the equivalent fully-connected layer, it is called a transposed convolution.

$$\begin{pmatrix}
\kappa_1 & \kappa_2 & \kappa_3 &          &          &          &          \\
         & \kappa_1 & \kappa_2 & \kappa_3 &          &          &          \\
         &          & \kappa_1 & \kappa_2 & \kappa_3 &          &          \\
         &          &          & \kappa_1 & \kappa_2 & \kappa_3 &          \\
         &          &          &          & \kappa_1 & \kappa_2 & \kappa_3
\end{pmatrix}^{T}
=
\begin{pmatrix}
\kappa_1 &          &          &          &          \\
\kappa_2 & \kappa_1 &          &          &          \\
\kappa_3 & \kappa_2 & \kappa_1 &          &          \\
         & \kappa_3 & \kappa_2 & \kappa_1 &          \\
         &          & \kappa_3 & \kappa_2 & \kappa_1 \\
         &          &          & \kappa_3 & \kappa_2 \\
         &          &          &          & \kappa_3
\end{pmatrix}$$

While a convolution can be seen as a series of inner products, a transposed convolution can be seen as a weighted sum of translated kernels.
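As a sketch of this correspondence (not from the original deck), one can materialize the weight matrix of a 1d convolution with w = 3 and W = 7, and check that multiplying by its transpose matches F.conv_transpose1d:

import torch
import torch.nn.functional as F

kappa = torch.tensor([1., 2., 3.])
W, w = 7, 3

M = torch.zeros(W - w + 1, W)          # the (5, 7) matrix of the convolution
for i in range(W - w + 1):
    M[i, i:i+w] = kappa                # one translated copy of the kernel per row

u = torch.randn(W - w + 1)             # a signal in the output space
print(torch.allclose(M.t() @ u,
                     F.conv_transpose1d(u[None, None], kappa[None, None])[0, 0]))
# True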

slide-6
SLIDE 6

Convolution layer

[Step-by-step animation: a kernel w = (1, 2, −1) slides across an input of size W; each output value is the inner product of the kernel with the input window it faces, and the output has size W − w + 1.]
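The sliding inner products of the figure can be written as a loop (a sketch with arbitrary input values; only the kernel (1, 2, −1) is taken from the figure):

import torch
import torch.nn.functional as F

x = torch.randn(8)                     # an input of size W = 8
k = torch.tensor([1., 2., -1.])        # kernel of size w = 3

# One inner product per window, giving W - w + 1 output values
y = torch.stack([(x[i:i+3] * k).sum() for i in range(len(x) - 3 + 1)])
print(torch.allclose(y, F.conv1d(x[None, None], k[None, None])[0, 0]))   # True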

slide-7
SLIDE 7

Transposed convolution layer

[Step-by-step animation: each input value stamps a scaled copy of the kernel into the output, and overlapping contributions are summed. With kernel w = (1, 2, −1) and input (2, 3, 0, −1) of size W, the contributions 2 · (1, 2, −1), 3 · (1, 2, −1), 0 · (1, 2, −1) and −1 · (1, 2, −1), each shifted by one position, sum to the output (2, 7, 4, −4, −2, 1) of size W + w − 1.]
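Recomputing the figure's example as a weighted sum of translated kernels (a sketch, with the input and kernel of the figure):

import torch

x = torch.tensor([2., 3., 0., -1.])    # input of size W = 4
k = torch.tensor([1., 2., -1.])        # kernel of size w = 3

y = torch.zeros(len(x) + len(k) - 1)   # output of size W + w - 1
for i, v in enumerate(x):
    y[i:i+len(k)] += v * k             # each input value stamps a scaled kernel

print(y)   # tensor([ 2.,  7.,  4., -4., -2.,  1.])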

slide-8
SLIDE 8

torch.nn.functional.conv_transpose1d implements the operation we just described. It takes as input a batch of multi-channel samples, and produces a batch of multi-channel samples.

>>> x = torch.tensor([[[0., 0., 1., 0., 0., 0., 0.]]])
>>> k = torch.tensor([[[1., 2., 3.]]])
>>> F.conv1d(x, k)
tensor([[[ 3.,  2.,  1.,  0.,  0.]]])
>>> F.conv_transpose1d(x, k)
tensor([[[ 0.,  0.,  1.,  2.,  3.,  0.,  0.,  0.,  0.]]])

slide-9
SLIDE 9

The class torch.nn.ConvTranspose1d embeds that operation into a torch.nn.Module.

>>> x = torch.tensor([[[ 2., 3., 0., -1. ]]])
>>> m = nn.ConvTranspose1d(1, 1, kernel_size=3)
>>> m.bias.data.zero_()
tensor([ 0.])
>>> m.weight.data.copy_(Tensor([ 1, 2, -1 ]))
tensor([[[ 1.,  2., -1.]]])
>>> y = m(x)
>>> y
tensor([[[ 2.,  7.,  4., -4., -2.,  1.]]])

slide-10
SLIDE 10

Transposed convolutions also have a dilation parameter that behaves as for convolution and expands the kernel size without increasing the number of parameters, by making it sparse.

They also have stride and padding parameters; however, due to the relation between convolutions and transposed convolutions, these behave differently: while for convolutions stride and padding are defined in the input map, for transposed convolutions they are defined in the output map, and the padding modulates a cropping operation.
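The resulting output sizes can be checked directly (a sketch, not from the original deck; values are random, only sizes matter):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4)               # W = 4
k = torch.randn(1, 1, 3)               # w = 3

print(F.conv_transpose1d(x, k).shape[-1])                       # W + w - 1 = 6
print(F.conv_transpose1d(x, k, dilation=2).shape[-1])           # (W - 1) + d(w - 1) + 1 = 8
print(F.conv_transpose1d(x, k, stride=2).shape[-1])             # s(W - 1) + w = 9
print(F.conv_transpose1d(x, k, stride=2, padding=1).shape[-1])  # cropped by 2*padding: 9 - 2 = 7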

slide-11
SLIDE 11

Transposed convolution layer (stride = 2)

[Step-by-step animation: as before, each input value stamps a scaled copy of the kernel into the output, but successive copies are now shifted by s = 2 positions. With kernel w = (1, 2, −1) and input (2, 3, 0, −1) of size W, the output is (2, 4, 1, 6, −3, 0, −1, −2, 1), of size s(W − 1) + w.]
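The same stride-2 example can be recomputed with the functional interface (a sketch, not part of the original deck; input and kernel as in the figure):

>>> x = torch.tensor([[[ 2., 3., 0., -1. ]]])
>>> k = torch.tensor([[[ 1., 2., -1. ]]])
>>> F.conv_transpose1d(x, k, stride=2)
tensor([[[ 2.,  4.,  1.,  6., -3.,  0., -1., -2.,  1.]]])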

slide-12
SLIDE 12

The composition of a convolution and a transposed convolution of same parameters keeps the signal size [roughly] unchanged.

A convolution with a stride greater than one may ignore parts of the signal. Its composition with the corresponding transposed convolution generates a map of the size of the observed area.

For instance, a 1d convolution of kernel size w and stride s composed with the transposed convolution of same parameters maintains the signal size W only if

$$\exists q \in \mathbb{N},\ W = w + s\,q.$$

[Figure: an input of size W covered by kernel windows of size w, translated with step s.]
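A size check of this condition (a sketch, not from the original deck): with w = 3 and s = 2, the size is maintained for W = 9 = w + 2 · 3, but not for W = 10.

import torch
import torch.nn.functional as F

k = torch.randn(1, 1, 3)               # w = 3, and we take s = 2

for W in (9, 10):
    x = torch.randn(1, 1, W)
    y = F.conv1d(x, k, stride=2)       # the stride-2 convolution may ignore samples
    z = F.conv_transpose1d(y, k, stride=2)
    print(W, '->', z.shape[-1])        # 9 -> 9, but 10 -> 9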

slide-13
SLIDE 13

It has been observed that transposed convolutions may create some grid-structure artifacts, since generated pixels are not all covered similarly. For instance, with a 4 × 4 kernel and stride 3:

[Figure: the resulting coverage pattern.]
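The uneven coverage can be made explicit with a 1d analogue (a sketch, not from the original deck): with an input and a kernel full of ones, each output value counts how many translated kernels cover it. Here with kernel size 4 and stride 3:

import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 5)
k = torch.ones(1, 1, 4)
print(F.conv_transpose1d(x, k, stride=3))
# tensor([[[1., 1., 1., 2., 1., 1., 2., 1., 1., 2., 1., 1., 2., 1., 1., 1.]]])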

slide-14
SLIDE 14

An alternative is to use an analytic up-scaling, implemented in the PyTorch module nn.Upsample.

>>> x = torch.tensor([[[[ 1., 2. ], [ 3., 4. ]]]])
>>> b = nn.Upsample(scale_factor = 3, mode = 'bilinear')
>>> b(x)
tensor([[[[ 1.0000,  1.0000,  1.3333,  1.6667,  2.0000,  2.0000],
          [ 1.0000,  1.0000,  1.3333,  1.6667,  2.0000,  2.0000],
          [ 1.6667,  1.6667,  2.0000,  2.3333,  2.6667,  2.6667],
          [ 2.3333,  2.3333,  2.6667,  3.0000,  3.3333,  3.3333],
          [ 3.0000,  3.0000,  3.3333,  3.6667,  4.0000,  4.0000],
          [ 3.0000,  3.0000,  3.3333,  3.6667,  4.0000,  4.0000]]]])
>>> u = nn.Upsample(scale_factor = 3, mode = 'nearest')
>>> u(x)
tensor([[[[ 1.,  1.,  1.,  2.,  2.,  2.],
          [ 1.,  1.,  1.,  2.,  2.,  2.],
          [ 1.,  1.,  1.,  2.,  2.,  2.],
          [ 3.,  3.,  3.,  4.,  4.,  4.],
          [ 3.,  3.,  3.,  4.,  4.,  4.],
          [ 3.,  3.,  3.,  4.,  4.,  4.]]]])

slide-15
SLIDE 15

Such a module is usually combined with a convolution to learn local corrections to undesirable artifacts of the up-scaling. In practice, a transposed convolution such as

nn.ConvTranspose2d(nic, noc, kernel_size = 3, stride = 2,
                   padding = 1, output_padding = 1)

can be replaced by

nn.Upsample(scale_factor = 2, mode = 'bilinear'),
nn.Conv2d(nic, noc, kernel_size = 3, padding = 1)
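As a shape check (a sketch, not from the original deck; nic and noc are arbitrary channel counts here), both constructions double the spatial size:

import torch
from torch import nn

nic, noc = 8, 16
x = torch.randn(1, nic, 14, 14)

a = nn.ConvTranspose2d(nic, noc, kernel_size = 3, stride = 2,
                       padding = 1, output_padding = 1)
b = nn.Sequential(nn.Upsample(scale_factor = 2, mode = 'bilinear'),
                  nn.Conv2d(nic, noc, kernel_size = 3, padding = 1))

print(a(x).shape, b(x).shape)   # both torch.Size([1, 16, 28, 28])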

slide-16
SLIDE 16

The end