  1. Neural Networks: Old and New. Ju Sun, Computer Science & Engineering, University of Minnesota, Twin Cities. January 29, 2020.

  2-4. Logistics
     – Another great reference: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Livebook online: https://d2l.ai/ (comprehensive coverage of recent developments and detailed implementations based on NumPy)
     – Homework 0 will be posted tonight
     – Waiting list

  5. Outline
     – Start from neurons
     – Shallow to deep neural networks
     – A brief history of AI
     – Suggested reading

  6-9. Model of biological neurons (Credit: Stanford CS231N)
     Biologically ...
     – Each neuron receives signals from its dendrites
     – Each neuron outputs signals via its single axon
     – The axon branches out and connects via synapses to dendrites of other neurons

  10-14. Model of biological neurons (Credit: Stanford CS231N)
     Mathematically ...
     – Each neuron receives x_i's from its dendrites
     – The x_i's are weighted by w_i's (synaptic strengths) and summed: ∑_i w_i x_i
     – The neuron fires only when the combined signal is above a certain threshold: ∑_i w_i x_i + b
     – The firing rate is modeled by an activation function f, i.e., the output is f(∑_i w_i x_i + b)
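
A minimal NumPy sketch of this single-neuron model (the activation f is left generic on the slide; a sigmoid is used here purely as a placeholder):

    import numpy as np

    def sigmoid(z):
        # placeholder choice for the activation f; the slide keeps f generic
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b, f=sigmoid):
        # weighted sum of inputs plus bias, then the activation: f(sum_i w_i x_i + b)
        return f(np.dot(w, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # incoming signals x_i
    w = np.array([0.1, 0.4, -0.2])   # synaptic strengths w_i
    b = 0.05                         # bias (threshold) term
    print(neuron(x, w, b))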

  15-19. Artificial neural networks (Credit: Max Pixel)
     Brain neural networks vs. artificial neural networks
     Why called artificial?
     – (Over-)simplification at the neuron level
     – (Over-)simplification at the connection level
     In this course, neural networks are always artificial.

  20. Outline
     – Start from neurons
     – Shallow to deep neural networks
     – A brief history of AI
     – Suggested reading

  21-25. Artificial neurons
     f(∑_i w_i x_i + b) = f(w⊺x + b)
     We shall use σ instead of f henceforth.
     Examples of activation function σ (Credit: [Hughes and Correll, 2016])
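
The figure of example activation functions does not survive this text extraction; as a hedged illustration, here are three standard choices of σ that such figures commonly include (not necessarily the exact set plotted on the slide):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes to (-1, 1)

    def relu(z):
        return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise

    z = np.linspace(-3, 3, 7)
    for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
        print(name, f(z))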

  26-29. Neural networks
     – One neuron: σ(w⊺x + b)
     – Neural networks (NN): structured organization of artificial neurons
     – The w's and b's are unknown and need to be learned
     – Many models in machine learning are neural networks
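
As a hedged sketch of "structured organization of artificial neurons": a fully connected layer stacks several neurons, one per row of a weight matrix W; this particular layout is an illustrative assumption, not taken from the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def layer(x, W, b, act=sigmoid):
        # each of the k rows of W (with its entry of b) defines one neuron;
        # the layer outputs all k activations sigma(W x + b) at once
        return act(W @ x + b)

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)        # input vector
    W = rng.normal(size=(3, 4))   # 3 neurons, each with 4 weights (to be learned)
    b = rng.normal(size=3)        # 3 biases (to be learned)
    print(layer(x, W, b))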

  30-34. A typical setup: supervised learning
     – Gather training data (x_1, y_1), ..., (x_n, y_n)
     – Choose a family of functions, say H, so that there is an f ∈ H to ensure y_i ≈ f(x_i) for all i
     – Set up a loss function ℓ to measure the approximation quality
     – Find an f ∈ H to minimize the average loss:
           min_{f ∈ H} (1/n) ∑_{i=1}^{n} ℓ(y_i, f(x_i))
     ... known as the empirical risk minimization (ERM) framework in learning theory
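
A minimal sketch of the ERM objective, assuming a toy 1-D linear family for H and the squared loss for ℓ (both are illustrative choices; the slide keeps H and ℓ abstract):

    import numpy as np

    def average_loss(params, xs, ys, loss):
        # empirical risk: (1/n) * sum_i loss(y_i, f(x_i)) for the f given by params
        w, b = params
        preds = w * xs + b
        return np.mean([loss(y, p) for y, p in zip(ys, preds)])

    squared = lambda y, p: (y - p) ** 2

    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = np.array([1.0, 2.9, 5.1, 7.2])      # roughly y = 2x + 1

    # crude ERM: search the family for the smallest average loss
    grid = [(w, b) for w in np.linspace(0, 4, 41) for b in np.linspace(0, 2, 21)]
    best = min(grid, key=lambda p: average_loss(p, xs, ys, squared))
    print(best, average_loss(best, xs, ys, squared))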

  35-38. A typical setup: supervised learning from the NN viewpoint
     – Gather training data (x_1, y_1), ..., (x_n, y_n)
     – Choose a NN with k neurons, so that there is a group of weights, e.g., (w_1, ..., w_k, b_1, ..., b_k), to ensure y_i ≈ {NN(w_1, ..., w_k, b_1, ..., b_k)}(x_i) ∀ i
     – Set up a loss function ℓ to measure the approximation quality
     – Find weights (w_1, ..., w_k, b_1, ..., b_k) to minimize the average loss:
           min_{w's, b's} (1/n) ∑_{i=1}^{n} ℓ[y_i, {NN(w_1, ..., w_k, b_1, ..., b_k)}(x_i)]
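
A hedged sketch of the NN viewpoint: k hidden neurons followed by a linear readout is one common way to arrange the k neurons (the slide does not pin down the architecture), with the same average-loss objective evaluated over the data:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nn_predict(x, W1, b1, w2, b2):
        # {NN(w_1,...,w_k, b_1,...,b_k)}(x): k hidden neurons, then a linear readout
        h = sigmoid(W1 @ x + b1)     # k hidden activations
        return w2 @ h + b2           # scalar output

    def empirical_risk(data, W1, b1, w2, b2):
        # (1/n) * sum_i loss(y_i, NN(x_i)), here with squared loss
        return np.mean([(y - nn_predict(x, W1, b1, w2, b2)) ** 2 for x, y in data])

    rng = np.random.default_rng(1)
    k, d = 5, 3
    W1, b1 = rng.normal(size=(k, d)), np.zeros(k)   # hidden weights/biases to learn
    w2, b2 = rng.normal(size=k), 0.0                # output weights/bias to learn
    data = [(rng.normal(size=d), rng.normal()) for _ in range(10)]
    print(empirical_risk(data, W1, b1, w2, b2))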

  39-44. Linear regression (Credit: D2L)
     – Data: (x_1, y_1), ..., (x_n, y_n), x_i ∈ R^d
     – Model: y_i ≈ w⊺x_i + b
     – Loss: ‖y − ŷ‖_2^2
     – Optimization:
           min_{w, b} (1/n) ∑_{i=1}^{n} ‖y_i − (w⊺x_i + b)‖_2^2
     As a neural network: a single neuron whose activation σ is the identity function (Credit: D2L)
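
A minimal sketch of linear regression as this identity-activation neuron, with the least-squares problem solved by NumPy's lstsq (one possible solver; the slides do not specify how the minimization is carried out):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 50, 3
    X = rng.normal(size=(n, d))
    true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.7
    y = X @ true_w + true_b + 0.01 * rng.normal(size=n)   # y_i ≈ w⊺x_i + b

    # append a column of ones so the bias b is learned along with w
    X1 = np.hstack([X, np.ones((n, 1))])
    sol, *_ = np.linalg.lstsq(X1, y, rcond=None)          # minimizes sum_i (y_i - (w⊺x_i + b))^2
    w_hat, b_hat = sol[:-1], sol[-1]
    print(w_hat, b_hat)

    # the same model viewed as a neuron with identity activation sigma
    predict = lambda x: np.dot(w_hat, x) + b_hat
    print(predict(X[0]), y[0])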

  45-47. Perceptron (Frank Rosenblatt, 1928–1971)
     – Data: (x_1, y_1), ..., (x_n, y_n), x_i ∈ R^d, y_i ∈ {+1, −1}
     – Model: y_i ≈ σ(w⊺x_i + b), with σ the sign function
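
A hedged sketch of the perceptron model with σ the sign function; the training loop below uses the classic Rosenblatt-style mistake-driven update, which is standard but not shown in this excerpt of the slides:

    import numpy as np

    def perceptron_predict(x, w, b):
        # sigma is the sign function: outputs +1 or -1
        return 1 if np.dot(w, x) + b >= 0 else -1

    def perceptron_train(data, d, epochs=10):
        # classic update: on a mistake, nudge (w, b) toward the misclassified example
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for x, y in data:
                if y * (np.dot(w, x) + b) <= 0:   # misclassified
                    w, b = w + y * x, b + y
        return w, b

    # toy linearly separable data: label = sign of the first coordinate
    rng = np.random.default_rng(3)
    data = [(x, 1 if x[0] > 0 else -1) for x in rng.normal(size=(20, 2))]
    w, b = perceptron_train(data, d=2)
    print([perceptron_predict(x, w, b) == y for x, y in data])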
