Deconstructing Data Science
David Bamman, UC Berkeley Info 290 Lecture 16: Neural networks Mar 16, 2017
https://www.forbes.com/sites/kevinmurnane/2016/04/01/what-is-deep-learning-and-how-is-it-useful
ŷ = +1 if Σᵢ xᵢβᵢ ≥ 0, and −1 otherwise (the perceptron decision rule).
[Table: binary features x (1 for each of "not", "bad", "movie") and their weights β, e.g. 0.3.]
[Diagram: inputs x₁, x₂, x₃ connect directly to the output y through weights β₁, β₂, β₃.]
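A minimal sketch of this decision rule in Python (numpy assumed; the weight values below are invented for illustration, apart from the 0.3 shown on the slide):

```python
import numpy as np

def predict(x, beta):
    # perceptron decision rule: +1 if the weighted sum is >= 0, else -1
    return 1 if np.dot(x, beta) >= 0 else -1

x = np.array([1, 1, 1])              # binary features: "not", "bad", "movie"
beta = np.array([-0.5, -1.7, 0.3])   # illustrative weights
print(predict(x, beta))              # -> -1
```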
[Diagram: a feedforward network — inputs x₁, x₂, x₃ (Input), hidden layer h₁, h₂ ("Hidden" Layer), and output y (Output). W (elements W₁,₁ … W₃,₂) connects input to hidden; V (V₁, V₂) connects hidden to output.]
[Example: x = 1 for each of "not", "bad", "movie", with concrete values filled in for the weights W (e.g., 1.3, 0.4, 0.08, 1.7, 3.1) and V (e.g., 4.1).]
hⱼ = f(Σᵢ xᵢ Wᵢ,ⱼ)

The hidden nodes are completely determined by the input and the weights.
For example, h₁ = f(Σᵢ xᵢ Wᵢ,₁), where f is a non-linear activation function such as:
sigmoid: σ(z) = 1 / (1 + exp(−z))

[Plot: the sigmoid, which maps z into (0, 1).]

tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z))
[Plot: tanh(z), which maps z into (−1, 1).]
rectifier(z) = max(0, z)
[Plot: rectifier(z), zero for negative inputs.]
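A sketch of these three activation functions in Python; numpy already provides np.tanh, but writing them out follows the formulas above:

```python
import numpy as np

def sigmoid(z):
    # sigma(z): maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # maps any real z into (-1, 1); equivalent to np.tanh(z)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def rectifier(z):
    # zero for negative z, identity for positive z
    return np.maximum(0, z)
```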
h₁ = σ(Σᵢ xᵢ Wᵢ,₁)
h₂ = σ(Σᵢ xᵢ Wᵢ,₂)
y = V₁h₁ + V₂h₂
Substituting: ŷ = V₁ σ(Σᵢ xᵢ Wᵢ,₁) + V₂ σ(Σᵢ xᵢ Wᵢ,₂)
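As a sketch, the whole forward pass for this 3-2-1 network takes a few lines (random weights stand in for learned ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, V):
    h = sigmoid(x @ W)   # h_j = sigma(sum_i x_i W[i, j])
    return h @ V         # y-hat = V_1 h_1 + V_2 h_2

x = np.array([1.0, 1.0, 1.0])   # "not bad movie"
W = np.random.randn(3, 2)       # input-to-hidden weights
V = np.random.randn(2)          # hidden-to-output weights
print(forward(x, W, V))
```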
We can express ŷ as a function only of the input x and the weights W and V.
This is hairy, but differentiable.

Backpropagation: given training samples of ⟨x, y⟩ pairs, we can use stochastic gradient descent to find the values of W and V that minimize the loss.
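A minimal sketch of that training loop with numpy, assuming a squared-error loss and toy data (not the lecture's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy training data: 4 examples, 3 binary features, real-valued labels
X = rng.integers(0, 2, size=(4, 3)).astype(float)
y = rng.normal(size=4)

W = rng.normal(scale=0.1, size=(3, 2))   # input-to-hidden weights
V = rng.normal(scale=0.1, size=2)        # hidden-to-output weights
alpha = 0.1                              # learning rate

for epoch in range(100):
    for x_i, y_i in zip(X, y):
        h = sigmoid(x_i @ W)    # forward pass
        y_hat = h @ V
        err = y_hat - y_i       # dLoss/dy_hat for loss 0.5*(y_hat - y)^2
        grad_V = err * h        # backpropagate the error to V...
        grad_W = np.outer(x_i, err * V * h * (1 - h))  # ...and through sigma to W
        V -= alpha * grad_V     # stochastic gradient descent updates
        W -= alpha * grad_W
```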
[Plot: f(x) = −x².]

We can get to the maximum of this function by following the gradient.
d/dx (−x²) = −2x

Repeatedly update x ← x + α(−2x), with α = 0.1:

8.00 → 6.40 → 5.12 → 4.10 → 3.28 → 2.62 → 2.10 → 1.68 → 1.34 → 1.07 → 0.86 → 0.69
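The sequence on the slide is just this update applied twelve times; a quick check in Python:

```python
x, alpha = 8.0, 0.1
for step in range(12):
    print(f"{x:.2f}")         # 8.00, 6.40, 5.12, ..., 0.69
    x = x + alpha * (-2 * x)  # step in the direction of the gradient of -x^2
```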
Output: one real value y.
Multiclass: output 3 values, only one of which is 1 (a one-hot encoding of the label).
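For instance, one-hot label vectors for three classes (a standard encoding, sketched here):

```python
import numpy as np

one_hot = np.eye(3)   # row i is the one-hot vector for class i
print(one_hot[1])     # -> [0. 1. 0.]
```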
With many hidden nodes and weights comes the possibility of overfitting to the training data; a warning sign is that training error is too small relative to error on held-out data. When the network is large, one remedy is dropout: during training, randomly remove some nodes and weights.
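A sketch of that idea (dropout) at training time; the 0.5 drop probability is an assumption, not a value from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5):
    # during training, zero out each hidden activation with probability p
    mask = rng.random(h.shape) >= p
    return h * mask

print(dropout(np.array([0.7, 0.2, 0.9, 0.1])))
```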
[Diagram: a deeper network with two hidden layers, weight matrices W₁ and W₂ between layers, and V at the output.]
http://neuralnetworksanddeeplearning.com/chap1.html
Higher-order features learned for image recognition (Lee et al. 2009, ICML).
[Diagram: an autoencoder: the network maps inputs x₁, x₂, x₃ through hidden nodes h₁, h₂ back to reconstructions of x₁, x₂, x₃.]
x: input; y: label; h: hidden layer.
[Diagram: the single-layer model again, with inputs x₁, x₂, x₃ connected to y by weights β₁, β₂, β₃.]
P(y = 1 | x, β) = exp(xβ) / (1 + exp(xβ))
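In code, this probability is one line (the same sigmoid shape as before; weights again illustrative):

```python
import numpy as np

def p_positive(x, beta):
    # P(y = 1 | x, beta) for logistic regression
    return np.exp(x @ beta) / (1 + np.exp(x @ beta))

x = np.array([1.0, 1.0, 1.0])        # "not bad movie"
beta = np.array([-0.5, -1.7, 0.3])   # illustrative weights
print(p_positive(x, beta))           # ~0.13
```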
With a single-layer linear model (logistic/linear regression, perceptron), there is an immediate relationship between x and y, apparent in the weights β.
In a neural network, by contrast, non-linear activation functions induce dependencies between the inputs, so no single weight captures the relationship between an input and y.