  1. Deep learning 10.2. Causal convolutions. François Fleuret, https://fleuret.org/dlc/, Dec 20, 2020

  2. If we use an autoregressive model with a masked input

$$f : \{0, 1\}^T \times \mathbb{R}^T \to \mathbb{R}^C,$$

the input differs from one position to another. During training, even though the full input is known, computation that could be shared across positions is lost.

  3. We can avoid having the mask itself as input if the model predicts a distribution for every position of the sequence, that is

$$f : \mathbb{R}^T \to \mathbb{R}^{T \times C}.$$

It can be used for synthesis with

$$\begin{aligned}
x_1 &\leftarrow \mathrm{sample}(f_1(0, \dots, 0)) \\
x_2 &\leftarrow \mathrm{sample}(f_2(x_1, 0, \dots, 0)) \\
x_3 &\leftarrow \mathrm{sample}(f_3(x_1, x_2, 0, \dots, 0)) \\
&\;\;\vdots \\
x_T &\leftarrow \mathrm{sample}(f_T(x_1, x_2, \dots, x_{T-1}, 0))
\end{aligned}$$

where the 0s simply fill in for unknown values.
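A minimal sketch of this zero-filled sampling loop, assuming a hypothetical causal model f that maps a length-T tensor to (T, C) logits; the concrete version for this lecture's toy model appears on the Synthesis slide later in the deck:

    import torch
    from torch.distributions import Categorical

    T = 32                   # sequence length (assumed)
    x = torch.zeros(T)       # zeros stand in for the not-yet-generated values
    for t in range(T):
        logits = f(x)        # f is the hypothetical causal model, output (T, C)
        x[t] = Categorical(logits = logits[t]).sample()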

  4. If, additionally, the model is such that "future values" do not influence the prediction at a certain time, that is

$$\forall t,\ x_1, \dots, x_t,\ \alpha_1, \dots, \alpha_{T-t},\ \beta_1, \dots, \beta_{T-t},$$
$$f_{t+1}(x_1, \dots, x_t, \alpha_1, \dots, \alpha_{T-t}) = f_{t+1}(x_1, \dots, x_t, \beta_1, \dots, \beta_{T-t}),$$

then we have in particular

$$\begin{aligned}
f_1(0, \dots, 0) &= f_1(x_1, \dots, x_T) \\
f_2(x_1, 0, \dots, 0) &= f_2(x_1, \dots, x_T) \\
f_3(x_1, x_2, 0, \dots, 0) &= f_3(x_1, \dots, x_T) \\
&\;\;\vdots \\
f_T(x_1, x_2, \dots, x_{T-1}, 0) &= f_T(x_1, \dots, x_T)
\end{aligned}$$
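This property can be checked empirically. The sketch below, with the same hypothetical f as above, feeds two inputs that agree on their first t entries and differ afterwards, and verifies that the predictions up to position t + 1 coincide:

    import torch

    T, t = 8, 4
    a = torch.randn(T)
    b = a.clone()
    b[t:] = torch.randn(T - t)   # a and b agree on the first t entries only
    with torch.no_grad():
        # f_1, ..., f_{t+1} depend only on x_1, ..., x_t
        assert torch.allclose(f(a)[:t + 1], f(b)[:t + 1])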

  5. This provides a tremendous computational advantage during training, since

$$\ell(f, x) = \sum_u \ell\big(f_u(x_1, \dots, x_{u-1}, 0, \dots, 0),\, x_u\big) = \sum_u \ell\big(\underbrace{f_u(x_1, \dots, x_T)}_{\text{computed once}},\, x_u\big).$$

Such models are referred to as causal, since the future cannot affect the past.

  6. We can illustrate this with convolutional models. Standard convolutions let information flow "to the past," and the masked input was a way to condition only on already generated values.

[Figure: a 1d convolutional model over a zero-padded input x_1, ..., x_6. Connections through which information would flow from future positions to the past are forbidden; masking the input hides them.]

  7. Such a model can be made causal with convolutions that let information flow only to the future, combined with a first convolution that hides the present.

[Figure: a causal 1d convolutional model over a left-padded input x_1, ..., x_6; every unit depends only on strictly earlier input positions.]

  8. Another option for the first layer is to shift the input by one entry to hide the present.

[Figure: the padded input x_1, ..., x_6 and the padded, right-shifted input 0, x_1, ..., x_5 feeding a causal convolutional model.]

  9. PyTorch's convolutional layers do not accept asymmetric padding, but we can do it with F.pad, which even accepts negative padding to remove entries. For an n-dim tensor, the padding specification is

$$(\mathit{start}_n,\ \mathit{end}_n,\ \mathit{start}_{n-1},\ \mathit{end}_{n-1},\ \dots,\ \mathit{start}_{n-k},\ \mathit{end}_{n-k}).$$

>>> x = torch.randint(10, (2, 1, 5))
>>> x
tensor([[[1, 6, 3, 9, 1]],
        [[4, 8, 2, 2, 9]]])
>>> F.pad(x, (-1, 1))
tensor([[[6, 3, 9, 1, 0]],
        [[8, 2, 2, 9, 0]]])
>>> F.pad(x, (0, 0, 2, 0))
tensor([[[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 6, 3, 9, 1]],
        [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [4, 8, 2, 2, 9]]])

Similar processing can be achieved with the modules nn.ConstantPad1d, nn.ConstantPad2d, or nn.ConstantPad3d.
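The right shift used to hide the present (slide 8) can be obtained the same way: F.pad(x, (1, -1)) prepends a zero and drops the last entry, which is exactly what the model below does in its first layer.

>>> x = torch.arange(1, 7).view(1, 1, -1)
>>> x
tensor([[[1, 2, 3, 4, 5, 6]]])
>>> F.pad(x, (1, -1))
tensor([[[0, 1, 2, 3, 4, 5]]])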

  10. Some train sequences.

[Figure: twelve training sequences plotted over 32 time steps, values roughly between 10 and 50.]

  11. Model

class NetToy1d(nn.Module):
    def __init__(self, nb_classes, ks = 2, nc = 32):
        super().__init__()
        self.pad = (ks - 1, 0)
        self.conv0 = nn.Conv1d(1, nc, kernel_size = 1)
        self.conv1 = nn.Conv1d(nc, nc, kernel_size = ks)
        self.conv2 = nn.Conv1d(nc, nc, kernel_size = ks)
        self.conv3 = nn.Conv1d(nc, nc, kernel_size = ks)
        self.conv4 = nn.Conv1d(nc, nc, kernel_size = ks)
        self.conv5 = nn.Conv1d(nc, nb_classes, kernel_size = 1)

    def forward(self, x):
        # shift right by one entry to hide the present
        x = F.relu(self.conv0(F.pad(x, (1, -1))))
        # left-only padding keeps each convolution causal
        x = F.relu(self.conv1(F.pad(x, self.pad)))
        x = F.relu(self.conv2(F.pad(x, self.pad)))
        x = F.relu(self.conv3(F.pad(x, self.pad)))
        x = F.relu(self.conv4(F.pad(x, self.pad)))
        x = self.conv5(x)
        return x.permute(0, 2, 1).contiguous()
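A quick shape check (the number of classes here is an assumption): the model maps a (batch, 1, T) input to (batch, T, nb_classes) logits, one distribution per position.

>>> model = NetToy1d(nb_classes = 64)
>>> model(torch.randn(2, 1, 32)).size()
torch.Size([2, 32, 64])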

  12. Training loop

for sequences in train_input.split(args.batch_size):
    input = (sequences - mean) / std

    output = model(input)
    loss = cross_entropy(
        output.view(-1, output.size(-1)),
        sequences.view(-1)
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
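The loop relies on a setup that is not on the slide; a plausible version, where every concrete choice is an assumption:

# Hypothetical setup for the loop above; the exact values are guesses.
# train_input is assumed to be an (N, 1, T) integer tensor of class indices.
mean, std = train_input.float().mean(), train_input.float().std()
model = NetToy1d(nb_classes = 64)
cross_entropy = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-3)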

  13. Synthesis

generated = train_input.new_zeros((48,) + train_input.size()[1:])
flat = generated.view(generated.size(0), -1)

for t in range(flat.size(1)):
    input = (generated.float() - mean) / std
    output = model(input)
    logits = output.view(flat.size() + (-1,))[:, t]
    dist = torch.distributions.categorical.Categorical(logits = logits)
    flat[:, t] = dist.sample()

  14. Some generated sequences.

[Figure: twelve generated sequences plotted over 32 time steps, values between 0 and 60.]

  15. The global structure may not be properly generated.

[Figure: eight generated sequences whose overall shape is inconsistent with the training data.]

This can be fixed with dilated convolutions to have a larger context.
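A minimal sketch of the idea, not the lecture's actual model: a causal convolution with dilation d needs a left padding of (ks - 1) * d, and doubling the dilation at each layer makes the receptive field grow exponentially with depth instead of linearly.

class DilatedToy1d(nn.Module):
    # Hypothetical variant of NetToy1d with dilations 1, 2, 4, 8
    def __init__(self, nb_classes, ks = 2, nc = 32):
        super().__init__()
        self.conv_in = nn.Conv1d(1, nc, kernel_size = 1)
        self.convs = nn.ModuleList(
            [nn.Conv1d(nc, nc, kernel_size = ks, dilation = 2**k) for k in range(4)]
        )
        self.conv_out = nn.Conv1d(nc, nb_classes, kernel_size = 1)

    def forward(self, x):
        # shift right by one entry to hide the present
        x = F.relu(self.conv_in(F.pad(x, (1, -1))))
        for conv in self.convs:
            # left padding of (ks - 1) * dilation keeps the convolution causal
            pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
            x = F.relu(conv(F.pad(x, (pad, 0))))
        x = self.conv_out(x)
        return x.permute(0, 2, 1).contiguous()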
