Assignment 5

Zahra Sheikhbahaee
Zeou Hu & Colin Vandenhof

April 2020

1 Convolutional Neural Networks Basics

Consider the following CNN architecture:

input: 256 × 256 × 3
→ conv: 100 filters, size 3 × 3, stride 1
→ conv: 50 filters, size 5 × 5, stride 3
→ pooling: size 2 × 2
→ conv: 30 filters, size 3 × 3, stride 1
→ pooling: size 2 × 4, flattened
→ fully connected: 10 outputs

In all layers we do not add the bias term (for simplicity), and the nonlinear activations are omitted throughout. Note that the third dimension of each conv filter is omitted as it can be inferred from the previous layer, while pooling (with each pixel appearing exactly once in all pooling windows) is performed on each channel/feature map separately.

Answer the following questions. [For questions 2, 3 and 4, you will receive partial credit if you correctly derive some of the intermediate results.]

1. (5 pts) What can we do to achieve the same output size (excluding the dimension of channels/feature maps) as the input size for each convolutional layer? Explain using a filter with size (2w + 1) × (2h + 1) and stride 1 on an input image with size W × H.

2. (5 pts) Without padding, what is the output size of each intermediate layer (except the first input layer and the last fully connected layer)?

3. (5 pts) What is the receptive field of a single (output) neuron on the last convolutional layer (ignoring any boundary issue)?

4. (5 pts) How many parameters does the above architecture have? For this exercise we assume the output image size of each convolutional layer is the same as its input image size (except the number of channels, of course).
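The size questions above all rest on the standard output-size formula for a convolution or pooling window along one axis. The following is a minimal sketch of that formula, not a worked solution; the function name `conv_output_size` and the illustrative values `W = 256`, `w = 2` are assumptions chosen for this example.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one axis.

    Assumes no dilation; `padding` is the number of zeros added on each side.
    """
    return (n + 2 * padding - kernel) // stride + 1


# Illustrative values (assumptions): input width 256, filter width 2*w + 1 = 5.
W, w = 256, 2
with_padding = conv_output_size(W, kernel=2 * w + 1, stride=1, padding=w)
without_padding = conv_output_size(W, kernel=2 * w + 1, stride=1, padding=0)
print(with_padding, without_padding)
```

The same helper can be applied layer by layer, once per spatial axis, to trace how the image size changes through a stack of layers.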

2 CNN Implementation

Note: Please state your Python version (and, ideally, the versions of all other packages you use) in the code.

In this exercise you are going to run some experiments involving CNNs. You need to know Python and install the following libraries: PyTorch, NumPy and all their dependencies. You can find detailed instructions and tutorials for each of these libraries on the respective websites.

For all experiments, running on a CPU is sufficient. You do not need to run the code on GPUs. Before you start, we suggest you review what we learned about each layer in a CNN, and read at least this tutorial.

1. Train a VGG11 net on the MNIST dataset. VGG11 was an earlier version of VGG16 and can be found as model A in Table 1 of this paper, whose Section 2.1 also gives you all the details about each layer. The goal is to get as close to 0 loss as possible. Note that our input dimension is different from the VGG paper. You need to resize each image in MNIST from its original size 28 × 28 to 32 × 32 [why?], and it might be necessary to change at least one other layer of VGG11. [This experiment will take up to 1 hour on a CPU, so please budget your time. If this running time is not bearable, you may cut the training set to 1/10 of its size, leaving only ∼600 images per class instead of the regular ∼6000.]

2. Once you've done the above, the next goal is to inspect the training process. Create the following plots:
(a) test accuracy vs the number of epochs (say 3 ∼ 5)
(b) training accuracy vs the number of epochs
(c) test loss vs the number of epochs
(d) training loss vs the number of epochs
[If running more than 1 epoch is computationally infeasible, simply run 1 epoch and try to record the accuracy / loss every few minibatches.]

3 Recurrent Neural Network Implementation

In this question, you will implement a recurrent neural network to classify Twitter messages as positive or negative sentiment.
You are provided with two files, tweets_train.csv and tweets_test.csv, that contain the training data and the test data. Each line of the files consists of a tweet and its sentiment, either negative (0) or positive (1). To classify the tweets, implement an LSTM using PyTorch. Each tweet should be treated as a sequence of ASCII characters. The input to the LSTM at each step should be a one-hot encoding of the ASCII character. Since there are 128 ASCII characters, this should be a vector of size 128.
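As an illustration of the input encoding described above, here is a minimal sketch in plain Python. The function name `one_hot_ascii` and the choice to drop non-ASCII characters are assumptions made for this example; the assignment does not say how such characters should be handled.

```python
def one_hot_ascii(tweet, num_classes=128):
    """Return one one-hot row (a list of 0.0/1.0 values) per character of `tweet`."""
    rows = []
    for ch in tweet:
        code = ord(ch)
        if code >= num_classes:
            continue  # assumption: characters outside ASCII are silently dropped
        row = [0.0] * num_classes
        row[code] = 1.0  # a single 1 at the index of the character's ASCII code
        rows.append(row)
    return rows


encoded = one_hot_ascii("ok!")
print(len(encoded), len(encoded[0]))
```

The resulting list of rows has one entry per time step and can be converted to a tensor (for example with `torch.tensor(encoded)`) before it is fed to the LSTM.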

The size of the hidden layer should also be 128. Use a linear layer to map the final hidden state to the output (size 2). Finally, apply the softmax function to obtain class probabilities. Use negative log likelihood as the loss function with stochastic gradient descent or any related optimization algorithm. Train on the training set using a batch size of 32 and a learning rate of 0.0001 for at least 100000 iterations. Plot curves for (a) the test set accuracy vs. number of iterations and (b) the training set accuracy vs. number of iterations. You may compute accuracy every 1000 iterations to speed up the process.

4 Variational Auto-Encoder

In a variational autoencoder, we optimize the evidence lower bound

$$\mathcal{L}(\theta, \phi; x) = -D_{KL}\left[q_\phi(z|x) \,\|\, p_\theta(z)\right] + \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right]$$

where $q_\phi(z|x)$ is the variational distribution with variational parameter $\phi$, which approximates the posterior $p(z|x)$.

- Given $z \in \mathbb{R}^1$, $p(z) \sim \mathcal{N}(0, 1)$ and $q(z|x) \sim \mathcal{N}(\mu_z, \sigma_z^2)$, write down $D_{KL}\left[q_\phi(z|x) \,\|\, p_\theta(z)\right]$ in terms of $\mu_z$ and $\sigma_z$.

- Assuming $q_\phi(z|x)$ is a Gaussian, the encoder network computes its mean $\mu_z$ and its variance $\sigma_z^2$. Why do we model $\sigma_z^2$ in log space using neural networks instead of modelling $\sigma_z^2$ directly? Why do we need the reparameterization trick, instead of directly sampling from the latent distribution $\mathcal{N}(\mu_z, \sigma_z^2)$?

- For the decoder, we sometimes use a multi-layer perceptron with either Bernoulli (in the case of binary data) or Gaussian (in the case of real-valued data) outputs. The expected reconstruction error (cross-entropy term) $\mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z)\right]$ can be estimated by sampling, that is

$$\mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z)\right] \approx \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)}|z^{(l)}), \qquad z^{(l)} \sim q_\phi(z|x^{(i)}).$$

If the data $x$ given $z$ follows a multivariate Bernoulli with dimension $D$, what should this reconstruction loss term look like?
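To make the sampling step in the estimator above concrete, here is a minimal sketch in plain Python for a scalar latent variable. The names `reparameterize` and `monte_carlo_mean` are hypothetical, and the sketch assumes the network outputs the log-variance rather than the variance. It is meant as an illustration of the mechanics, not as an answer to the questions.

```python
import math
import random


def reparameterize(mu, log_var, eps=None):
    """Return z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(log_var / 2)."""
    if eps is None:
        eps = random.gauss(0.0, 1.0)  # the only source of randomness
    sigma = math.exp(0.5 * log_var)   # positive for any real log_var
    return mu + sigma * eps


def monte_carlo_mean(f, mu, log_var, num_samples=1000, seed=0):
    """Estimate E_q[f(z)] by averaging f over samples z ~ N(mu, exp(log_var))."""
    random.seed(seed)
    total = 0.0
    for _ in range(num_samples):
        total += f(reparameterize(mu, log_var))
    return total / num_samples


# Sanity check: the sample mean of z itself should be close to mu.
print(monte_carlo_mean(lambda z: z, mu=1.5, log_var=0.0, num_samples=5000))
```

In a real model `f` would be the decoder log-likelihood, and `mu` and `log_var` would be tensors produced by the encoder, so that gradients can flow through the deterministic part of `reparameterize`.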
5 The Final Small-Project

This part of the assignment should be done by the students who decided to attend the final exam.

Collaboration policy: This assignment should be done individually.

5.1 Implementation of VAE

In the penultimate lecture, we learned about approximating posteriors with variational inference, using the reparameterization trick for VI, and deep generative models for images using variational autoencoders. In the final lecture we also learned more about convolutional neural networks and their applications in computer vision. In this project, you will bring both constructs together to train a model that can generate images of different classes of galaxies.

The data is available here. The training images are JPG images of 61578 galaxies. The probability distributions for the classifications for each of the training images are given in solutions training rev1. There are 37 different classes of galaxy types, and at the end of the training process you should be able to generate these categories and compare them with the original images.

You'll need to install PyTorch to use the starter code. The objective is to minimize the reconstruction error, measured with cross-entropy, together with the Kullback–Leibler divergence. Below is a code snippet that you can complete and adapt to this dataset.
import torch
import torch.utils.data
from torch import nn, optim
from torch.nn import functional as F
from torchvision import datasets, transforms
from torchvision.utils import save_image


class VAE(nn.Module):
    def __init__(self, input_chanel=3, zdim=512, image_size=424):
        super(VAE, self).__init__()
        self.z_dim = zdim
        # encoder part
        self.encoder_conv1 = nn.Conv2d(input_chanel, zdim//16, kernel_size=4, stride=2, padding=1)
        self.encoder_bn1 = nn.BatchNorm2d(zdim//16)
        # You must extend this part
        # decoder part
        self.decoder_conv1 = nn.ConvTranspose2d(zdim, zdim//2, kernel_size=4, stride=1, padding=0)
        self.decoder_bn1 = nn.BatchNorm2d(zdim//2)
        # You must extend this part

    def encode(self, x):
        x1 = F.leaky_relu(self.encoder_bn1(self.encoder_conv1(x)), negative_slope=0.2)
        # Fill this part with some CNN architecture

    def decode(self, z):
        # complete this part
        raise NotImplementedError
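When extending the encoder and decoder, it helps to track the spatial size after each layer. The sketch below applies the usual size formulas (assuming dilation 1 and no output padding) to the two layers given in the starter code. The helper names are hypothetical, and feeding the decoder a 1 × 1 latent map is an assumption about how the code might be completed.

```python
def conv2d_out(n, kernel_size, stride=1, padding=0):
    """Output size of a convolution along one axis (dilation 1 assumed)."""
    return (n + 2 * padding - kernel_size) // stride + 1


def conv_transpose2d_out(n, kernel_size, stride=1, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel_size


# encoder_conv1: kernel 4, stride 2, padding 1 on a 424 x 424 image
after_encoder_conv1 = conv2d_out(424, kernel_size=4, stride=2, padding=1)

# decoder_conv1: kernel 4, stride 1, padding 0 on an assumed 1 x 1 latent map
after_decoder_conv1 = conv_transpose2d_out(1, kernel_size=4, stride=1, padding=0)

print(after_encoder_conv1, after_decoder_conv1)
```

Each stride-2 layer with these settings halves (encoder) or doubles (decoder) the side length, which is one way to plan how many layers are needed between the image and the latent vector.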
