Training Neural Networks
Some considerations
Gaurav Kumar Center for Language and Speech Processing gkumar@cs.jhu.edu
Universal Approximators

Neural networks can approximate any function [1]. Capacity is determined by the number of layers, the hidden layer size, and hyper-parameters.
[1] K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5), July 1989. (Proved for a specific class of functions.)
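One concrete proxy for capacity is the parameter count of the network. The helper below is an illustrative sketch (not from the slides) showing how widening a hidden layer grows the number of trainable parameters:

```python
# Illustrative sketch: parameter count of a fully-connected network
# as a rough proxy for its capacity.
def mlp_param_count(layer_sizes):
    """Total weights + biases for a dense net with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Widening the hidden layer from 16 to 64 units grows capacity ~4x.
small = mlp_param_count([10, 16, 1])   # 10*16+16 + 16*1+1 = 193
large = mlp_param_count([10, 64, 1])   # 10*64+64 + 64*1+1 = 769
```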
Aspects of training considered here: the optimizer used during training (e.g., SGD, Adam), activation functions, weight initialization, the loss function, and regularization.
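As a hedged sketch of the Adam optimizer mentioned above (Kingma & Ba; hyper-parameter names and defaults follow the original paper), a single update step looks like:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-indexed step count."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
w, m, v = adam_step(w, np.array([1.0, -1.0, 0.5]), m, v, t=1)
# On the first step the update is approximately lr * sign(grad).
```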
Activation functions: sigmoid and ReLU.
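A sketch of the two activations and their derivatives, which shows why sigmoid units saturate while ReLU passes gradient through unchanged for positive inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)            # peaks at 0.25; vanishes for large |x|

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for any positive input

# At x = 10 the sigmoid gradient is ~4.5e-5: the unit is saturated.
```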
Weight initialization: Glorot & Bengio (2010), He et al. (2015), Saxe et al. (2014). A poor initialization can push units into saturation very early in training, before they learn useful features.
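The cited schemes can be sketched as follows; this is an assumption-laden summary (the scaling rules follow the respective papers, and `orthogonal` implements the orthogonal initialization of Saxe et al.):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Glorot & Bengio (2010): keeps activation variance stable for tanh/sigmoid.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out):
    # He et al. (2015): variance scaled for ReLU units.
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def orthogonal(n_in, n_out):
    # Saxe et al. (2014): orthonormal columns preserve norms through the layer.
    a = rng.normal(size=(n_in, n_out))
    q, _ = np.linalg.qr(a)
    return q
```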
L = −t log(p) − (1 − t) log(1 − p)

When p is very close to 0 or 1, the log terms can cause underflow; clip p to the range between 0.000001 and 0.999999.
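The binary cross-entropy with the clipping trick can be sketched as:

```python
import numpy as np

def bce(t, p, eps=1e-6):
    """Binary cross-entropy; p is clamped into [0.000001, 0.999999]
    so log() never produces -inf."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

# Without clipping, a confident wrong prediction (t = 1, p = 0) would
# give log(0) = -inf; with clipping the loss stays finite.
```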
For each layer, normalize with the L2 norm.
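One reading of per-layer L2 normalization (an illustrative sketch, not necessarily the exact scheme the slide intends) is to rescale each activation vector to unit L2 norm:

```python
import numpy as np

def l2_normalize(h, eps=1e-12):
    """Scale each row of h to unit L2 norm; eps guards against zero vectors."""
    return h / (np.linalg.norm(h, axis=-1, keepdims=True) + eps)

h = np.array([[3.0, 4.0]])
# l2_normalize(h) -> [[0.6, 0.8]], a unit-norm vector.
```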
Regularization: add penalties to the loss so they flow into the gradient updates.
Dropout [2] prevents overfitting in neural networks by randomly dropping units during training.

[2] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15.
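A hedged sketch of inverted dropout (the common modern variant of the technique in Srivastava et al., 2014): at training time each unit is zeroed with probability p and the survivors are rescaled so the expected activation is unchanged; at test time nothing happens.

```python
import numpy as np

def dropout(h, p=0.5, train=True, rng=None):
    """Inverted dropout: drop each unit with probability p during training."""
    if not train or p == 0.0:
        return h                          # identity at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p       # keep with probability 1 - p
    return h * mask / (1.0 - p)           # rescale to preserve E[h]
```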