# CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS

1. CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS

2. Neural networks • Fully connected networks • Neuron • Non-linearity • Softmax layer • DNN training • Loss function and regularization • SGD and backprop • Learning rate • Overfitting: dropout, batchnorm • CNN, RNN, LSTM, GRU (this class)

3. Notes on non-linearity • Sigmoid: models get stuck if the input moves far away from 0, where the curve saturates and the gradient vanishes. Output is always positive. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

4. Notes on non-linearity • Tanh: output can be positive or negative. Models still get stuck if the input moves far away from 0. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

5. Notes on non-linearity • ReLU: large gradient in the positive region. Fast to compute. The gradient is zero in the negative region, so units there stop updating. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/

6. Notes on non-linearity • Leaky ReLU: the negative part now has some gradient. On real tasks it does not seem much better than ReLU. https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-learning-activation-functions-when-to-use-them/
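
The four notes above can be checked numerically. The following is a minimal sketch using only the standard library; the leaky slope `alpha=0.01` and the sample inputs are assumptions chosen for illustration, not values from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0, vanishes far from 0

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # also vanishes far from 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0    # no gradient in the negative region

def leaky_relu_grad(x, alpha=0.01):  # alpha is an assumed slope
    return 1.0 if x > 0 else alpha   # small but non-zero gradient when x <= 0

for x in (-10.0, -1.0, 0.5, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.5f}  tanh'={tanh_grad(x):.5f}  "
          f"relu'={relu_grad(x):.2f}  leaky'={leaky_relu_grad(x):.2f}")
```

At x = ±10 the sigmoid and tanh gradients are close to zero, which is the "stuck" behaviour the slides describe, while ReLU keeps a gradient of 1 for any positive input.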

7. So far we have learned about bases and projections

8. Projections and neural network weights • $w^T x$ [Figure: vectors w and x]

9. Projections and neural network weights • $W^T x$ [Figure: x with weight vectors $w_1$ and $w_2$]

10. Projections and neural network weights • $W^T x$ • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: x is projected onto $w_1, w_2$ to give $y = W^T x$; y is then projected onto the LDA projections $v_1, v_2$]

11. Projections and neural network weights • $W^T x$ • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: as on slide 10, with the combined projections $Wv_1, Wv_2$ drawn in the input space]

12. Projections and neural network weights • $W^T x$ • Neural network layers as feature transforms • Non-linearity prevents merging of layers • Fisher projection $= V^T W^T x = (WV)^T x$ [Figure: as on slide 11]
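
A small numerical check of the identity $V^T W^T x = (WV)^T x$, and of why a non-linearity between the layers breaks it. This is a sketch with NumPy; the matrices and the input vector are made-up values, and ReLU is used as the example non-linearity.

```python
import numpy as np

x = np.array([1.0, -2.0])
W = np.array([[1.0, 2.0],
              [3.0, -1.0]])      # first layer, columns w_1 and w_2
V = np.array([[1.0, 0.0],
              [1.0, 2.0]])      # second layer, columns v_1 and v_2

two_layers = V.T @ (W.T @ x)     # apply the layers one after the other
merged = (W @ V).T @ x           # a single layer with weights WV

def relu(z):
    return np.maximum(z, 0.0)

with_relu = V.T @ relu(W.T @ x)  # non-linearity between the layers

print(two_layers, merged)        # [-1.  8.] [-1.  8.]
print(with_relu)                 # [4. 8.]
```

Without the non-linearity the two layers collapse into one matrix; with it, no single matrix `WV` reproduces the output for every input.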

13. Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Need another filter that is shifted • Can we do better? [Figure: the projection diagram from slide 12, with weights $w_1, w_2$, Fisher projection $V^T W^T x = (WV)^T x$, and LDA projections $v_1, v_2$]

14. Convolution • Continuous convolution • Meaning of $(t - T)$: flip, then shift

15. Convolution visually • Demo: https://en.wikipedia.org/wiki/Convolution

16. Discrete convolution • Discrete convolution (a sum) • Continuous convolution (an integral) • Same concept as the continuous version
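
For reference, the standard definitions of the two operations can be written as follows. The integration variable is written as $T$ to match the $(t - T)$ notation on slide 14; the function names $f$ and $g$ and the summation index $m$ are assumptions.

```latex
% Continuous convolution: g is flipped (the -T) and then shifted by t
(f * g)(t) = \int_{-\infty}^{\infty} f(T)\, g(t - T)\, dT

% Discrete convolution: same idea, with a sum in place of the integral
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]
```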

17. Matched filters • We can use convolution to detect things that match our pattern • Convolution can be considered a filter (Why? Take ASR next semester :) ) • If the filter detects our pattern, it shows up as a clear peak even if there is noise • Demo

18. Matched filters [Figure: signal in blue, matched filter in red, with the matched peak marked]
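
The matched-filter idea can be sketched in a few lines of NumPy. Everything concrete here is an assumption made for the demonstration: the template values, the noise level, the signal length, and the position `start` where the pattern is hidden.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = np.array([1.0, 3.0, -2.0, 4.0, -1.0])   # hypothetical template

# Noisy signal with the pattern buried at a known position
signal = rng.normal(scale=0.1, size=60)
start = 23
signal[start:start + len(pattern)] += pattern

# Slide the template over the signal (no flip) and record the match score
response = np.correlate(signal, pattern, mode="valid")
peak = int(np.argmax(response))

print(peak)   # expected to equal start at this noise level
```

The score at the true position is roughly the energy of the template, which is much larger than the score at any partial overlap, so the peak survives the noise.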

19. Convolution and Cross-Correlation • Convolution • (Cross)-Correlation • Convolution and cross-correlation are the same if g(t) is symmetric (an even function) • For some unknown reason, people use "convolution" in CNNs to mean cross-correlation • From this point onwards, when we say convolution we mean cross-correlation
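
The relationship between the two operations can be verified with NumPy's `np.convolve` and `np.correlate`. This is an illustrative sketch; the signal `f` and the two kernels are made-up values.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
g = np.array([1.0, 0.0, -1.0])       # asymmetric kernel
g_sym = np.array([1.0, 2.0, 1.0])    # symmetric (even) kernel

conv = np.convolve(f, g, mode="valid")     # flips g, then slides it
corr = np.correlate(f, g, mode="valid")    # slides g without flipping

print(conv)   # [2. 2. 2.]
print(corr)   # [-2. -2. -2.]

# Convolution is cross-correlation with the flipped kernel
same_after_flip = np.array_equal(conv, np.correlate(f, g[::-1], mode="valid"))

# For a symmetric kernel the flip changes nothing
same_for_symmetric = np.array_equal(
    np.convolve(f, g_sym, mode="valid"),
    np.correlate(f, g_sym, mode="valid"),
)
print(same_after_flip, same_for_symmetric)   # True True
```

Since the filter weights in a CNN are learned, the flip makes no practical difference: the network can simply learn the flipped filter.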

20. 2D convolution • Flip and shift in 2D • But we no longer flip [Figure: our matched filter slid over an image; the response peaks where the pattern is]

21. Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Need another filter that is shifted [Figure: projection diagram as on slide 13]

22. Shift in feature space • $W^T x$ • What happens if I have a person that is off-frame? • Ans: convolution with W as the filter [Figure: projection diagram as on slide 13]

23. Convolutional neural networks • A neural network with convolutions! (cross-correlation to be precise) • But we have peaks at different locations • From the point of view of a network, these are two different things

24. Pooling layers/Subsampling layers • Combine different locations into one • One possible method is to use a max • Interpretation: yes, I found a cat somewhere [Figure: a max taken over the output of one filter]

25. Convolutional filters • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100x100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving convolution outputs 1 to 3, each 98x98]

26. Convolutional filters • Each output value is a sum of elementwise products, e.g. the values 4, 5, 6 against 1, 2, 3 give 4*1 + 5*2 + 6*3 = 32 • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100x100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving convolution outputs 1 to 3, each 98x98]
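
The sizes and the worked example on these slides can be reproduced with a direct, unoptimized implementation. This is a sketch under assumptions: `cross_correlate2d` is a hypothetical helper name, the image is random, and the 3x3 averaging kernel stands in for a learned filter.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel without flipping it."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the patch and the filter, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Worked example from the slide: 4*1 + 5*2 + 6*3
single = cross_correlate2d(np.array([[4.0, 5.0, 6.0]]),
                           np.array([[1.0, 2.0, 3.0]]))
print(single[0, 0])   # 32.0

# 100x100 image with a 3x3 filter gives a 98x98 output
image = np.random.default_rng(0).normal(size=(100, 100))
kernel = np.ones((3, 3)) / 9.0    # hypothetical filter
output = cross_correlate2d(image, kernel)
print(output.shape)   # (98, 98)
```

The output shrinks by `kernel_size - 1` in each dimension because the filter is only placed where it fits entirely inside the image.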

30. Pooling/subsampling • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: each 98x98 convolution output passes through a 3x3 max filter with no overlap, giving a 33x33 layer output (98/3 rounded up, i.e. the partial window at the border is kept)]

31. Pooling/subsampling • Example: max(4, 5, 6) = 6 • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: each 98x98 convolution output passes through a 3x3 max filter with no overlap, giving a 33x33 layer output]
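
A minimal sketch of non-overlapping max pooling, showing where the 33x33 size comes from. The helper name `max_pool2d` and the `keep_partial` flag are assumptions introduced for this example; libraries expose the same choice under different names.

```python
import numpy as np

def max_pool2d(x, size=3, keep_partial=True):
    """Non-overlapping max pooling over size x size windows."""
    H, W = x.shape
    if keep_partial:
        out_h, out_w = -(-H // size), -(-W // size)   # ceiling division
    else:
        out_h, out_w = H // size, W // size           # drop the partial window
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # slicing past the border simply yields a smaller window
            out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

print(max_pool2d(np.array([[4.0, 5.0, 6.0]])))   # [[6.]]

conv_output = np.random.default_rng(0).normal(size=(98, 98))
print(max_pool2d(conv_output).shape)                      # (33, 33)
print(max_pool2d(conv_output, keep_partial=False).shape)  # (32, 32)
```

Because 98 is not a multiple of 3, the result is 33x33 when the leftover two rows and columns form their own window, and 32x32 when they are discarded.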

34. CNN overview • Filter size, number of filters, filter shifts (strides), and pooling rate are all hyperparameters • Usually followed by a fully connected network at the end • The CNN is good at learning low-level features • The DNN combines the features into high-level features and classifies [Figure: https://en.wikipedia.org/wiki/Convolutional_neural_network#/media/File:Typical_cnn.png]

35. Parameter sharing in convolutional neural networks • $W^T x$ • Cats at different locations might need two neurons, one per location, in fully connected NNs • A CNN shares the parameters of one filter across locations • The network is no longer fully connected [Figure: layers n-1, n, and n+1, with a convolutional stage followed by pooling; weights $w_1, w_2$ applied to x]
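
The effect of parameter sharing is easiest to see by counting weights. The sketch below compares a fully connected layer with a single 3x3 filter for the 100x100 to 98x98 mapping used in the earlier slides; it assumes a single input channel and one bias per output unit or per filter, neither of which is stated on the slides.

```python
def fully_connected_params(in_h, in_w, out_h, out_w):
    """Every output unit has its own weight for every input pixel, plus a bias."""
    n_in, n_out = in_h * in_w, out_h * out_w
    return n_in * n_out + n_out

def conv_params(k_h, k_w, n_filters=1):
    """One filter is reused at every location: k_h * k_w weights plus a bias."""
    return n_filters * (k_h * k_w + 1)

fc = fully_connected_params(100, 100, 98, 98)
conv = conv_params(3, 3)

print(fc)     # 96049604
print(conv)   # 10
```

The same ten parameters detect the pattern at every position, which is why a shifted cat does not need a second set of weights.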

36. Pooling/subsampling • Max filter -> Maxout • Backward pass? • The gradient passes through the maximum location and is 0 elsewhere
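
The backward rule for a max filter can be sketched for a single pooling window. The function names are hypothetical, and the window reuses the 4, 5, 6 example from the pooling slides.

```python
import numpy as np

def max_pool_forward(window):
    """Return the maximum and remember where it came from."""
    idx = int(np.argmax(window))
    return window[idx], idx

def max_pool_backward(upstream_grad, idx, window_size):
    """Route the incoming gradient to the max location; all other inputs get 0."""
    grad = np.zeros(window_size)
    grad[idx] = upstream_grad
    return grad

window = np.array([4.0, 5.0, 6.0])
out, idx = max_pool_forward(window)
grad = max_pool_backward(1.0, idx, window.size)

print(out, idx)   # 6.0 2
print(grad)       # [0. 0. 1.]
```

Only the input that won the max influenced the output, so only that input receives a gradient.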

37. Norms (p-norm or Lp-norm) • For any real number p ≥ 1 • For p = ∞ • We’ll see more of p-norms when we get to neural networks https://en.wikipedia.org/wiki/Lp_space
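
For reference, the standard definitions for the two cases listed above can be written as follows, for a vector $x = (x_1, \dots, x_n)$; the symbol names are assumptions.

```latex
% p-norm for a real number p >= 1
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}

% Limit p -> infinity: the largest absolute entry
\|x\|_\infty = \max_{i} |x_i|
```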

38. Pooling/subsampling • Max filter -> Maxout • Backward pass? • The gradient passes through the maximum location and is 0 elsewhere • P-norm filter • Fully connected layer – (1x1 convolutions) • Recently, people care less about pooling as a way to introduce shift invariance and more about it as a dimension reduction (since conv layers usually have a higher dimension than the input)
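
A p-norm filter can be seen as a softer version of the max filter. The sketch below pools one window with increasing p; the helper name `p_norm_pool` and the chosen values of p are assumptions for illustration.

```python
import numpy as np

def p_norm_pool(window, p):
    """Pool a window of activations with the p-norm instead of the max."""
    window = np.abs(np.asarray(window, dtype=float))
    return float(np.sum(window ** p) ** (1.0 / p))

window = [4.0, 5.0, 6.0]
for p in (1, 2, 10, 50):
    print(p, round(p_norm_pool(window, p), 4))
```

With p = 1 the result is the plain sum (15.0); as p grows the value decreases toward max(4, 5, 6) = 6, so the max filter is the limiting case.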

39. 1x1 Convolutions • A convolutional layer consists of • Small filter patches • Pooling to remove variation [Figure: a 100x100 input image is convolved with three 3x3 filters (filter1, filter2, filter3), giving convolution outputs 1 to 3, each 98x98]

40. 1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK, spanning the K channels of the previous output [Figure: convolution outputs 1 to 3, each 98x98]

41. 1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK, spanning the K channels of the previous output • Sum over channels [Figure: convolution outputs 1 to 3, each 98x98]

43. 1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK, spanning the K channels of the previous output • Sum over channels [Figure: convolution outputs 1 to 3, each 98x98, combined into one new 98x98 output]

44. 1x1 Convolutions • 1x1 filters (in space) • Each filter is 1x1xK, spanning the K channels of the previous output • Sum over channels • If we have fewer 1x1 filters than the previous level has outputs, we are just performing dimensionality reduction [Figure: convolution outputs 1 to 3, each 98x98, combined into two new 98x98 outputs]
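
A 1x1 convolution is a weighted sum over channels applied independently at every pixel. The sketch below reduces K = 3 feature maps of size 98x98 to M = 2; the random values and the choice M = 2 are assumptions matching the figure description, and bias terms are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W = 3, 98, 98      # channels and size of the previous output
M = 2                    # number of 1x1 filters (fewer than K)

feature_maps = rng.normal(size=(K, H, W))
weights = rng.normal(size=(M, K))   # each row is one 1x1xK filter

# For every pixel (h, w): output[m] = sum over k of weights[m, k] * input[k]
reduced = np.einsum("mk,khw->mhw", weights, feature_maps)
print(reduced.shape)     # (2, 98, 98)

# Same thing as a small fully connected layer applied at a single pixel
pixel = feature_maps[:, 10, 20]
same_as_fc = np.allclose(reduced[:, 10, 20], weights @ pixel)
print(same_as_fc)        # True
```

The spatial size is unchanged; only the channel dimension goes from K to M, which is the dimensionality reduction the last slide mentions.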

45. Common schemes • INPUT -> [CONV -> RELU -> POOL]*N -> [FC -> RELU]*M -> FC • INPUT -> [CONV -> RELU -> CONV -> RELU -> POOL]*N -> [FC -> RELU]*M -> FC • If you are working with images, just use a winning architecture.
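
The bracket notation in these schemes expands mechanically. The following sketch spells that out; `build_scheme` and its parameter names are hypothetical, with `convs_per_block` set to 1 for the first scheme and 2 for the second.

```python
def build_scheme(n, m, convs_per_block=1):
    """Expand INPUT -> [(CONV -> RELU)*c -> POOL]*N -> [FC -> RELU]*M -> FC."""
    layers = ["INPUT"]
    for _ in range(n):
        layers += ["CONV", "RELU"] * convs_per_block + ["POOL"]
    for _ in range(m):
        layers += ["FC", "RELU"]
    layers.append("FC")   # final classification layer
    return layers

print(" -> ".join(build_scheme(n=2, m=1)))
# INPUT -> CONV -> RELU -> POOL -> CONV -> RELU -> POOL -> FC -> RELU -> FC

print(" -> ".join(build_scheme(n=1, m=0, convs_per_block=2)))
# INPUT -> CONV -> RELU -> CONV -> RELU -> POOL -> FC
```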
