
Basis of Neural Networks, School of Data Science, Fudan



  1. DATA130006 Text Management and Analysis: Basis of Neural Networks. Zhongyu Wei (魏忠钰), School of Data Science, Fudan University. Dec. 20th, 2017

  2. General Neural Architectures for NLP: 1. Represent the words/features with dense vectors (embeddings) via a lookup table; 2. Concatenate the vectors; 3. Feed them through a multi-layer neural network, used for § Classification § Matching § Ranking (see the sketch below). R. Collobert et al., “Natural language processing (almost) from scratch”
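
The pipeline on slide 2 can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not code from the course; the vocabulary size, embedding dimension, hidden size, and the three-word window are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: vocabulary V, embedding dim D, hidden dim H, K output classes.
V, D, H, K = 10_000, 50, 100, 5
E  = rng.normal(scale=0.1, size=(V, D))      # step 1: embedding lookup table
W1 = rng.normal(scale=0.1, size=(H, 3 * D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(K, H));     b2 = np.zeros(K)

def forward(window):
    """window: three word indices, e.g. a left/center/right context."""
    x = E[window].reshape(-1)                # steps 1-2: look up embeddings and concatenate
    h = np.tanh(W1 @ x + b1)                 # step 3: hidden layer of the multi-layer network
    return W2 @ h + b2                       # unnormalized scores used for classification/ranking

print(forward([12, 7, 341]).shape)           # (5,) -- one score per class
```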

  3. Machine Learning § Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)

  4. Formal Specification of Machine Learning § Input Data: $(x_j, y_j)$, $1 \le j \le n$ § Model: § Linear Model: $y = f(x) = w^\top x + b$ § Generalized Linear Model: $y = f(x) = w^\top \phi(x) + b$ § Non-linear Model: Neural Network § Criterion: § Loss Function: $L(y, f(x))$ → Optimization § $R(\theta) = \frac{1}{n} \sum_{j=1}^{n} L(y_j, f(x_j, \theta))$ → Minimization § Regularization: $\|\theta\|$ § Objective Function: $R(\theta) + \lambda \|\theta\|^2$
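
As a concrete reading of slide 4, the sketch below evaluates the regularized objective for a linear model. The squared loss and the L2 penalty are example choices made here; the slide only states the general form $R(\theta)$ plus a regularization term.

```python
import numpy as np

def objective(w, b, X, y, lam):
    """Regularized empirical risk for a linear model f(x) = w^T x + b.
    Squared loss stands in for L(y, f(x)); lam weights the L2 penalty."""
    preds = X @ w + b                       # f(x_j) for every example
    risk = np.mean((y - preds) ** 2)        # R(theta) = (1/n) * sum_j L(y_j, f(x_j, theta))
    return risk + lam * np.sum(w ** 2)      # objective = R(theta) + lambda * ||w||^2

rng = np.random.default_rng(1)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)
print(objective(np.zeros(3), 0.0, X, y, lam=0.1))
```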

  5. Linear Classifier: $f(x, W) = Wx + b$

  6. Generalized Linear Classification § Hypothesis is a logistic function of a linear combination of inputs: $z = w^\top x + b$, $F(x) = \frac{1}{1 + \exp(-z)}$ § We can interpret $F(x)$ as $P(y = 1 \mid x)$ § Then the log-odds ratio $\ln \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = w^\top x$ is linear
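
A minimal NumPy illustration of slide 6, with an arbitrary weight vector and bias chosen for demonstration; it checks numerically that the log-odds of $F(x)$ recover the linear score.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.5, -1.2, 2.0]), 0.3      # arbitrary parameters for illustration
x = np.array([1.0, 0.5, -0.7])

p = sigmoid(w @ x + b)                      # F(x), read as P(y=1|x)
log_odds = np.log(p / (1 - p))              # equals the linear score w^T x + b
print(p, log_odds, w @ x + b)
```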

  7. Softmax § Softmax regression is a generalization of logistic regression to multi-class classification problems § With softmax, the posterior probability of $y = c$ is: $P(y = c \mid x) = \mathrm{softmax}(w_c^\top x) = \frac{\exp(w_c^\top x)}{\sum_{i=1}^{C} \exp(w_i^\top x)}$ § Class $c$ is represented by the one-hot vector $y = [I(1 = c), I(2 = c), \dots, I(C = c)]^\top$, where $I(\cdot)$ is the indicator function
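
The posterior on slide 7, written as a small NumPy function; the number of classes and features below are toy values, and the scores are shifted by their maximum for numerical stability (a standard trick, not mentioned on the slide).

```python
import numpy as np

def softmax(scores):
    """Posterior P(y=c|x) from per-class scores w_c^T x."""
    e = np.exp(scores - scores.max())       # shift by max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 3)), rng.normal(size=3)   # 4 classes, 3 features (toy sizes)
p = softmax(W @ x)
print(p, p.sum())                                    # probabilities summing to 1

one_hot = (np.arange(4) == 2).astype(float)          # one-hot vector for class index 2 (0-based)
print(one_hot)
```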

  8. Examples of word classification § $x$ is a $D \times 1$ feature vector, $W$ is a $K \times D$ weight matrix, and $b$ is a $K \times 1$ bias vector, so the class scores $Wx + b$ are $K \times 1$
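
To make the shapes on slide 8 concrete, here is a toy score computation with assumed sizes $D = 4$ input features and $K = 3$ classes.

```python
import numpy as np

D, K = 4, 3                                  # assumed toy sizes
x = np.ones((D, 1))                          # D x 1 feature vector for one word
W = np.arange(K * D).reshape(K, D) * 0.1     # K x D weight matrix
b = np.zeros((K, 1))                         # K x 1 bias

scores = W @ x + b                           # K x 1 vector of class scores
print(scores.shape)                          # (3, 1)
```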

  9. How to learn W? $R(\theta) = \frac{1}{n} \sum_{j=1}^{n} L(y_j, f(x_j, \theta))$ § Hinge Loss (SVM) § Softmax loss: cross-entropy loss
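
Slide 9 names the two losses; the NumPy versions below follow the standard multiclass hinge loss (with margin 1) and the cross-entropy loss, which is one common way they are written, not necessarily the exact form used in the course.

```python
import numpy as np

def hinge_loss(scores, y):
    """Multiclass hinge (SVM) loss with margin 1 for one example; y is the correct class index."""
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0
    return margins.sum()

def cross_entropy_loss(scores, y):
    """Softmax (cross-entropy) loss for one example."""
    e = np.exp(scores - scores.max())
    probs = e / e.sum()
    return -np.log(probs[y])

scores = np.array([2.0, -1.0, 0.5])          # toy class scores f(x, W)
print(hinge_loss(scores, 0), cross_entropy_loss(scores, 0))
```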

  10. SVM vs Softmax (Quiz)

  11. Parameter Learning § In ML, our objective is to learn the parameters $\theta$ that minimize the loss function. § How to learn $\theta$?

  12. Gradient Descent § Gradient Descent: update the parameters in the direction of the negative gradient, $\theta \leftarrow \theta - \eta \nabla_\theta R(\theta)$ § The step size $\eta$ is also called the learning rate in ML.
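
The update rule on slide 12 can be illustrated on a one-dimensional quadratic; the function, starting point, learning rate, and iteration count below are arbitrary choices for demonstration.

```python
def grad(theta):
    return 2.0 * (theta - 3.0)        # gradient of R(theta) = (theta - 3)^2

theta, lr = 10.0, 0.1                 # arbitrary start and learning rate
for _ in range(50):
    theta = theta - lr * grad(theta)  # theta <- theta - eta * dR/dtheta
print(theta)                          # converges toward the minimizer 3.0
```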

  13. Gradient Descent

  14. Learning Rate

  15. Gradient Descent

  16. Stochastic Gradient Descent (SGD)
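
Slide 16 gives only the title; as a reminder of the idea, the sketch below runs SGD on a least-squares problem, estimating the gradient from a random mini-batch at each step. The data, batch size, and learning rate are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.01 * rng.normal(size=n)

w, lr, batch = np.zeros(d), 0.05, 32
for step in range(500):
    idx = rng.integers(0, n, size=batch)          # sample a mini-batch
    err = X[idx] @ w - y[idx]
    grad = X[idx].T @ err / batch                 # noisy gradient estimate from the batch
    w -= lr * grad                                # same update rule as full-batch gradient descent
print(np.linalg.norm(w - true_w))                 # should be small
```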

  17. Computational graphs

  18. Backpropagation: a simple example
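
Slide 18 works backpropagation through a small computational graph; the classic example $f(x, y, z) = (x + y) \cdot z$ is used here as a stand-in, with gradients propagated node by node using the chain rule.

```python
# Forward pass through the graph: q = x + y, f = q * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass: start from df/df = 1 and apply the chain rule at each node
df_dq = z          # multiply gate: df/dq = z
df_dz = q          # multiply gate: df/dz = q
df_dx = df_dq * 1  # add gate routes the incoming gradient to both inputs
df_dy = df_dq * 1
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```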

  19. Biological Neuron

  20. Artificial Neuron

  21. Activation Functions

  22. Activation Functions
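
Slides 21-22 list activation functions; the definitions below cover the most common ones (sigmoid, tanh, ReLU), which are standard choices even if the slides show a different set, and show how an artificial neuron (slide 20) applies one to its weighted input. The weights and inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

# An artificial neuron: activation applied to a weighted sum of inputs plus a bias
w, b = np.array([0.4, -0.6, 0.2]), 0.1       # arbitrary parameters
x = np.array([1.0, 2.0, -1.0])
for g in (sigmoid, tanh, relu):
    print(g.__name__, g(w @ x + b))
```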

  23. Feedforward Neural Network

  24. Neural Network

  25. Feedforward Computing
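
Slides 23-25 cover feedforward networks and their forward computation; a generic layer-by-layer pass, with made-up layer sizes and tanh as the example activation, might look like the following.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes: 10 inputs -> 8 hidden -> 6 hidden -> 3 outputs
sizes = [10, 8, 6, 3]
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def feedforward(x):
    """a^(l) = g(W^(l) a^(l-1) + b^(l)) for each layer, with tanh as the activation g."""
    a = x
    for W, b in layers[:-1]:
        a = np.tanh(W @ a + b)
    W, b = layers[-1]
    return W @ a + b                # final layer left linear (class scores)

print(feedforward(rng.normal(size=10)).shape)   # (3,)
```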
