  1. Deconstructing Data Science. David Bamman, UC Berkeley. Info 290. Lecture 16: Neural networks. Mar 16, 2017

  2. https://www.forbes.com/sites/kevinmurnane/2016/04/01/what-is-deep-learning-and-how-is-it-useful

  3. Neural network libraries

  4. The perceptron, again

     \hat{y} = \begin{cases} 1 & \text{if } \sum_{i=1}^{F} x_i \beta_i \geq 0 \\ -1 & \text{otherwise} \end{cases}

     Example features and weights:

     x        x value   β
     not      1         -0.5
     bad      1         -1.7
     movie    0          0.3

  5. The perceptron, again, drawn as a network: inputs x_1 (not), x_2 (bad), x_3 (movie) connect directly to the output y through weights \beta_1, \beta_2, \beta_3.

     \hat{y} = \begin{cases} 1 & \text{if } \sum_{i=1}^{F} x_i \beta_i \geq 0 \\ -1 & \text{otherwise} \end{cases}

     Example: not (x = 1, β = -0.5), bad (x = 1, β = -1.7), movie (x = 0, β = 0.3)
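
A minimal sketch of this decision rule, using the example feature values and weights from the slide (the code and variable names are mine, not from the lecture):

```python
import numpy as np

# Example from the slide: feature values x and weights beta for "not", "bad", "movie"
x = np.array([1.0, 1.0, 0.0])        # not = 1, bad = 1, movie = 0
beta = np.array([-0.5, -1.7, 0.3])

# Perceptron decision rule: y_hat = 1 if sum_i x_i * beta_i >= 0, else -1
score = x @ beta                     # 1*(-0.5) + 1*(-1.7) + 0*(0.3) = -2.2
y_hat = 1 if score >= 0 else -1
print(score, y_hat)                  # -2.2 -> predicts -1
```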

  6. Neural networks • Two core ideas: • Non-linear activation functions • Multiple layers

  7. [Diagram: inputs x_1, x_2, x_3 connect to hidden nodes h_1, h_2 through weights W_{1,1}, W_{1,2}, W_{2,1}, W_{2,2}, W_{3,1}, W_{3,2}; the hidden nodes connect to the output y through weights V_1, V_2. The layers are labeled Input, "Hidden" Layer, and Output.]

  8. [The same network, with example values:]

            W (to h_1, h_2)      x
     not    -0.5     1.3         1
     bad     0.4     0.08        1
     movie   1.7     3.1         0

     V = (4.1, -0.9),   y = -1

  9. [The same network.] The hidden nodes are completely determined by the input and weights:

     h_j = f\Big(\sum_{i=1}^{F} x_i W_{i,j}\Big)

  10. [The same network.]

      h_1 = f\Big(\sum_{i=1}^{F} x_i W_{i,1}\Big)

  11. Activation functions

      \sigma(z) = \frac{1}{1 + \exp(-z)}

      [Plot of \sigma(z) for z from -10 to 10, rising from 0 to 1.]

  12. Activation functions

      \tanh(z) = \frac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}

      [Plot of \tanh(z) for z from -10 to 10, rising from -1 to 1.]

  13. Activation functions

      \mathrm{rectifier}(z) = \max(0, z)

      [Plot of the rectifier for z from -10 to 10: zero for negative z, linear for positive z.]
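
The three activation functions from slides 11-13, written out in numpy (a small sketch of my own; the lecture itself only gives the formulas):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def rectifier(z):
    """Rectified linear unit (ReLU): zero for negative z, identity for positive z."""
    return np.maximum(0.0, z)

z = np.linspace(-10, 10, 5)
print(sigmoid(z), tanh(z), rectifier(z))
```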

  14. [The same network.]

      h_1 = \sigma\Big(\sum_{i=1}^{F} x_i W_{i,1}\Big),   h_2 = \sigma\Big(\sum_{i=1}^{F} x_i W_{i,2}\Big),   \hat{y} = V_1 h_1 + V_2 h_2

  15. [The same network.]

      \hat{y} = V_1 \underbrace{\sigma\Big(\sum_{i=1}^{F} x_i W_{i,1}\Big)}_{h_1} + V_2 \underbrace{\sigma\Big(\sum_{i=1}^{F} x_i W_{i,2}\Big)}_{h_2}

      We can express y as a function only of the input x and the weights W and V.
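
Putting slides 8, 14, and 15 together, here is a sketch of the forward pass with the example values of W, x, and V from slide 8 (my own code; the sigmoid activation follows slide 14):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Example values from slide 8 (rows of W: not, bad, movie; columns: h_1, h_2)
W = np.array([[-0.5, 1.3],
              [ 0.4, 0.08],
              [ 1.7, 3.1]])
V = np.array([4.1, -0.9])
x = np.array([1.0, 1.0, 0.0])    # not = 1, bad = 1, movie = 0

# h_j = sigma(sum_i x_i W_{i,j});  y_hat = V_1 h_1 + V_2 h_2
h = sigmoid(x @ W)
y_hat = V @ h
print(h, y_hat)                  # y_hat is a raw score; the training label on slide 8 is y = -1
```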

  16. \hat{y} = V_1 \underbrace{\sigma\Big(\sum_{i=1}^{F} x_i W_{i,1}\Big)}_{h_1} + V_2 \underbrace{\sigma\Big(\sum_{i=1}^{F} x_i W_{i,2}\Big)}_{h_2}

      This is hairy, but differentiable.

      Backpropagation: given training samples of <x, y> pairs, we can use stochastic gradient descent to find the values of W and V that minimize the loss.
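
A sketch of one such stochastic gradient descent step for this two-hidden-node network, with the chain-rule (backpropagation) gradients written out by hand. The squared loss, learning rate, and random initialization are my assumptions, not from the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(x, y, W, V, alpha=0.1):
    """One SGD step on a single <x, y> pair, minimizing the squared loss (y_hat - y)^2."""
    # Forward pass
    h = sigmoid(x @ W)                  # hidden layer
    y_hat = V @ h                       # output

    # Backward pass: chain rule, layer by layer
    d_yhat = 2.0 * (y_hat - y)          # dL/dy_hat
    grad_V = d_yhat * h                 # dL/dV_j = dL/dy_hat * h_j
    d_h = d_yhat * V                    # dL/dh_j
    d_pre = d_h * h * (1.0 - h)         # through the sigmoid: sigma' = h_j (1 - h_j)
    grad_W = np.outer(x, d_pre)         # dL/dW_{i,j} = x_i * d_pre_j

    # Gradient descent update
    return W - alpha * grad_W, V - alpha * grad_V

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 2))
V = 0.1 * rng.normal(size=2)
W, V = sgd_step(np.array([1.0, 1.0, 0.0]), -1.0, W, V)
```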

  17. We can get to the maximum value of the function -x^2 by following the gradient: \frac{d}{dx}(-x^2) = -2x, so update x \leftarrow x + \alpha(-2x) with \alpha = 0.1.

      x        0.1 · (-2x)
      8.00     -1.60
      6.40     -1.28
      5.12     -1.02
      4.10     -0.82
      3.28     -0.66
      2.62     -0.52
      2.10     -0.42
      1.68     -0.34
      1.34     -0.27
      1.07     -0.21
      0.86     -0.17
      0.69     -0.14

      [Plot of -x^2 for x from -10 to 10.]
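
The table can be reproduced in a few lines of code: gradient ascent on -x^2 with step size 0.1, starting from x = 8 (the loop itself is my sketch of the update rule shown above):

```python
# Gradient ascent on f(x) = -x^2: the gradient is -2x, so x <- x + alpha * (-2x)
alpha = 0.1
x = 8.0
for step in range(12):
    grad = -2.0 * x
    print(f"x = {x:5.2f}   alpha * grad = {alpha * grad:6.2f}")
    x = x + alpha * grad
# x shrinks toward 0, the maximum of -x^2: 8.00, 6.40, 5.12, 4.10, ...
```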

  18. Neural network structures

      [Diagram: x_1, x_2, x_3 → h_1, h_2 → a single output node y.]

      Output: one real value.

  19. Neural network structures

      [Diagram: x_1, x_2, x_3 → h_1, h_2 → three output nodes y, with example targets 0, 1, 0.]

      Multiclass: output 3 values; only one equals 1 in the training data.

  20. Neural network structures

      [Diagram: x_1, x_2, x_3 → h_1, h_2 → three output nodes y, with example targets 1, 1, 0.]

      Output 3 values; several may equal 1 in the training data.
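
A hedged sketch of how the three output structures on slides 18-20 might look in code. The slides only specify the number of outputs and the targets; pairing the multiclass case with a softmax and the multilabel case with independent sigmoids is a common choice that I am adding here, and the weight values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.array([0.2, 0.9])              # hidden layer values (illustrative)

# Slide 18: a single real-valued output
V_single = np.array([4.1, -0.9])
y_single = V_single @ h

# Slides 19-20: three outputs
V3 = np.array([[ 0.5, -0.2],
               [ 0.1,  0.8],
               [-0.3,  0.4]])
y_multiclass = softmax(V3 @ h)        # probabilities over 3 classes, summing to 1
y_multilabel = sigmoid(V3 @ h)        # 3 independent probabilities, several can be high
print(y_single, y_multiclass, y_multilabel)
```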

  21. Regularization • Increasing the number of parameters = increasing the possibility of overfitting to the training data

  22. Regularization • L2 regularization: penalize W and V for being too large • Dropout: when training on an <x,y> pair, randomly remove some nodes and weights • Early stopping: stop backpropagation before the training error gets too small
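
A short sketch of how the first two show up in code: an L2 penalty added to the loss, and a dropout mask applied to the hidden layer during training. (Early stopping lives in the training loop rather than the model, so it is not shown; the penalty strength and dropout rate below are illustrative.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_penalty(W, V, lam=0.01):
    """L2 regularization: penalize W and V for being too large (added to the loss)."""
    return lam * (np.sum(W ** 2) + np.sum(V ** 2))

def forward_with_dropout(x, W, V, p_drop=0.5, rng=None):
    """Dropout: while training on an <x, y> pair, randomly zero out hidden nodes."""
    rng = np.random.default_rng() if rng is None else rng
    h = sigmoid(x @ W)
    mask = rng.random(h.shape) >= p_drop     # keep each node with probability 1 - p_drop
    return V @ (h * mask)
```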

  23. Deeper networks

      [Diagram: inputs x_1, x_2, x_3 feed a first hidden layer h_1 through weights W_1; h_1 feeds a second hidden layer h_2 through weights W_2; h_2 feeds the output y through weights V.]
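
A sketch of the deeper forward pass this slide is drawing: two hidden layers, each completely determined by the layer below it. The layer sizes and random weights are my own choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))    # input (3 features) -> first hidden layer (4 nodes)
W2 = rng.normal(size=(4, 4))    # first hidden layer -> second hidden layer
V = rng.normal(size=4)          # second hidden layer -> output

x = np.array([1.0, 1.0, 0.0])
h1 = sigmoid(x @ W1)
h2 = sigmoid(h1 @ W2)
y_hat = V @ h2
```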

  24. http://neuralnetworksanddeeplearning.com/chap1.html

  25. Higher-order features learned for image recognition (Lee et al. 2009, ICML)

  26. Autoencoder • Unsupervised neural network, where y = x • Learns a low-dimensional representation of x

      [Diagram: inputs x_1, x_2, x_3 → hidden nodes h_1, h_2 → outputs x_1, x_2, x_3; the network reconstructs its own input.]
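
A sketch of the idea: because the target is the input itself, the smaller hidden layer is forced to become a low-dimensional representation of x. The weight shapes, initialization, and squared reconstruction loss are my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_enc = 0.1 * rng.normal(size=(3, 2))   # encode 3 inputs into 2 hidden nodes
W_dec = 0.1 * rng.normal(size=(2, 3))   # decode 2 hidden nodes back into 3 outputs

def autoencode(x):
    h = sigmoid(x @ W_enc)    # the learned low-dimensional representation of x
    x_hat = h @ W_dec         # reconstruction, trained so that x_hat is close to x
    return h, x_hat

x = np.array([1.0, 1.0, 0.0])
h, x_hat = autoencode(x)
loss = np.sum((x_hat - x) ** 2)         # y = x: reconstruct the input
```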

  27. Feedforward networks

      [Diagram: x_1, x_2, x_3 → h_1, h_2 → y.]

  28. Recurrent networks

      [Diagram: input x, hidden layer h, label y.]
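
A sketch of what makes this different from the feedforward case on the previous slide: in the standard recurrent update, the hidden layer at each position depends on the current input and on the previous hidden state. The matrix names (W, U, V), sizes, and tanh activation below are my own illustration, not taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # input -> hidden
U = rng.normal(size=(4, 4))    # previous hidden state -> hidden
V = rng.normal(size=4)         # hidden -> output

def rnn(xs):
    """Process a sequence of inputs, carrying the hidden state forward."""
    h = np.zeros(4)
    for x in xs:
        h = np.tanh(x @ W + h @ U)   # hidden layer sees the input AND the previous hidden state
    return V @ h                     # label predicted from the final hidden state

y_hat = rnn([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])
```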

  29. Interpretability

      P(y = 1 \mid x, \beta) = \frac{\exp(x^{\top}\beta)}{1 + \exp(x^{\top}\beta)}

      Example: not (x = 1, β = -0.5), bad (x = 1, β = -1.7), movie (x = 0, β = 0.3)

      With a single-layer linear model (logistic/linear regression, perceptron) there's an immediate relationship between x and y apparent in β.
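
The probability on this slide, computed with the example weights from the table (a quick sketch; the printing and comments are mine):

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])        # not = 1, bad = 1, movie = 0
beta = np.array([-0.5, -1.7, 0.3])

# P(y = 1 | x, beta) = exp(x'beta) / (1 + exp(x'beta))
score = x @ beta                     # -2.2: each beta_j directly pushes y up or down
p = np.exp(score) / (1.0 + np.exp(score))
print(p)                             # ~0.10: the positive class is unlikely for this x
```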

  30. Interpretability

      [Diagram: x_1, x_2, x_3 → h_1, h_2 → y.]

      Non-linear activation functions induce dependencies between the inputs.
