  1. Neural Networks: Old and New. Ju Sun, Computer Science & Engineering, University of Minnesota, Twin Cities. January 29, 2020.

  2-4. Logistics
     – Another great reference: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Livebook online: https://d2l.ai/ (comprehensive coverage of recent developments and detailed implementations based on NumPy)
     – Homework 0 will be posted tonight
     – Waiting list

  5. Outline
     – Start from neurons
     – Shallow to deep neural networks
     – A brief history of AI
     – Suggested reading

  6-9. Model of biological neurons (Credit: Stanford CS231N)
     Biologically ...
     – Each neuron receives signals from its dendrites
     – Each neuron outputs signals via its single axon
     – The axon branches out and connects via synapses to dendrites of other neurons

  10-14. Model of biological neurons (Credit: Stanford CS231N)
     Mathematically ...
     – Each neuron receives x_i's from its dendrites
     – The x_i's are weighted by w_i's (synaptic strengths) and summed: ∑_i w_i x_i
     – The neuron fires only when the combined signal is above a certain threshold: ∑_i w_i x_i + b
     – The firing rate is modeled by an activation function f, i.e., the output is f(∑_i w_i x_i + b)
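
A minimal NumPy sketch of this single-neuron model (the activation f is left generic on the slide; a sigmoid is used here purely as a placeholder):

    import numpy as np

    def sigmoid(z):
        # placeholder choice for the activation f; the slide keeps f generic
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b, f=sigmoid):
        # weighted sum of inputs plus bias, then the activation: f(sum_i w_i x_i + b)
        return f(np.dot(w, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # incoming signals x_i
    w = np.array([0.1, 0.4, -0.2])   # synaptic strengths w_i
    b = 0.05                         # bias (threshold) term
    print(neuron(x, w, b))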

  15-19. Artificial neural networks (Credit: Max Pixel)
     Brain neural networks vs. artificial neural networks
     Why called artificial?
     – (Over-)simplification at the neuron level
     – (Over-)simplification at the connection level
     In this course, neural networks are always artificial.

  20. Outline
     – Start from neurons
     – Shallow to deep neural networks
     – A brief history of AI
     – Suggested reading

  21-25. Artificial neurons
     f(∑_i w_i x_i + b) = f(w⊺x + b)
     We shall use σ instead of f henceforth.
     Examples of activation function σ (Credit: [Hughes and Correll, 2016])
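
The figure of example activation functions does not survive this text extraction; as a hedged illustration, here are three standard choices of σ that such figures commonly include (not necessarily the exact set plotted on the slide):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes to (-1, 1)

    def relu(z):
        return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise

    z = np.linspace(-3, 3, 7)
    for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
        print(name, f(z))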

  26-29. Neural networks
     – One neuron: σ(w⊺x + b)
     – Neural networks (NN): structured organization of artificial neurons
     – The w's and b's are unknown and need to be learned
     – Many models in machine learning are neural networks
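
As a hedged sketch of "structured organization of artificial neurons": a fully connected layer stacks several neurons, one per row of a weight matrix W; this particular layout is an illustrative assumption, not taken from the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def layer(x, W, b, act=sigmoid):
        # each of the k rows of W (with its entry of b) defines one neuron;
        # the layer outputs all k activations sigma(W x + b) at once
        return act(W @ x + b)

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)        # input vector
    W = rng.normal(size=(3, 4))   # 3 neurons, each with 4 weights (to be learned)
    b = rng.normal(size=3)        # 3 biases (to be learned)
    print(layer(x, W, b))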

  30-34. A typical setup: supervised learning
     – Gather training data (x_1, y_1), ..., (x_n, y_n)
     – Choose a family of functions, say H, so that there is an f ∈ H to ensure y_i ≈ f(x_i) for all i
     – Set up a loss function ℓ to measure the approximation quality
     – Find an f ∈ H to minimize the average loss:
           min_{f ∈ H} (1/n) ∑_{i=1}^{n} ℓ(y_i, f(x_i))
     ... known as the empirical risk minimization (ERM) framework in learning theory
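
A minimal sketch of the ERM objective, assuming a toy 1-D linear family for H and the squared loss for ℓ (both are illustrative choices; the slide keeps H and ℓ abstract):

    import numpy as np

    def average_loss(params, xs, ys, loss):
        # empirical risk: (1/n) * sum_i loss(y_i, f(x_i)) for the f given by params
        w, b = params
        preds = w * xs + b
        return np.mean([loss(y, p) for y, p in zip(ys, preds)])

    squared = lambda y, p: (y - p) ** 2

    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = np.array([1.0, 2.9, 5.1, 7.2])      # roughly y = 2x + 1

    # crude ERM: search the family for the smallest average loss
    grid = [(w, b) for w in np.linspace(0, 4, 41) for b in np.linspace(0, 2, 21)]
    best = min(grid, key=lambda p: average_loss(p, xs, ys, squared))
    print(best, average_loss(best, xs, ys, squared))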

  35-38. A typical setup: supervised learning from the NN viewpoint
     – Gather training data (x_1, y_1), ..., (x_n, y_n)
     – Choose a NN with k neurons, so that there is a group of weights, e.g., (w_1, ..., w_k, b_1, ..., b_k), to ensure y_i ≈ {NN(w_1, ..., w_k, b_1, ..., b_k)}(x_i) ∀ i
     – Set up a loss function ℓ to measure the approximation quality
     – Find weights (w_1, ..., w_k, b_1, ..., b_k) to minimize the average loss:
           min_{w's, b's} (1/n) ∑_{i=1}^{n} ℓ[y_i, {NN(w_1, ..., w_k, b_1, ..., b_k)}(x_i)]
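
A hedged sketch of the NN viewpoint: k hidden neurons followed by a linear readout is one common way to arrange the k neurons (the slide does not pin down the architecture), with the same average-loss objective evaluated over the data:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nn_predict(x, W1, b1, w2, b2):
        # {NN(w_1,...,w_k, b_1,...,b_k)}(x): k hidden neurons, then a linear readout
        h = sigmoid(W1 @ x + b1)     # k hidden activations
        return w2 @ h + b2           # scalar output

    def empirical_risk(data, W1, b1, w2, b2):
        # (1/n) * sum_i loss(y_i, NN(x_i)), here with squared loss
        return np.mean([(y - nn_predict(x, W1, b1, w2, b2)) ** 2 for x, y in data])

    rng = np.random.default_rng(1)
    k, d = 5, 3
    W1, b1 = rng.normal(size=(k, d)), np.zeros(k)   # hidden weights/biases to learn
    w2, b2 = rng.normal(size=k), 0.0                # output weights/bias to learn
    data = [(rng.normal(size=d), rng.normal()) for _ in range(10)]
    print(empirical_risk(data, W1, b1, w2, b2))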

  39-44. Linear regression (Credit: D2L)
     – Data: (x_1, y_1), ..., (x_n, y_n), x_i ∈ R^d
     – Model: y_i ≈ w⊺x_i + b
     – Loss: ‖y − ŷ‖_2^2
     – Optimization:
           min_{w, b} (1/n) ∑_{i=1}^{n} ‖y_i − (w⊺x_i + b)‖_2^2
     As a neural network: a single neuron whose activation σ is the identity function (Credit: D2L)
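
A minimal sketch of linear regression as this identity-activation neuron, with the least-squares problem solved by NumPy's lstsq (one possible solver; the slides do not specify how the minimization is carried out):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 50, 3
    X = rng.normal(size=(n, d))
    true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.7
    y = X @ true_w + true_b + 0.01 * rng.normal(size=n)   # y_i ≈ w⊺x_i + b

    # append a column of ones so the bias b is learned along with w
    X1 = np.hstack([X, np.ones((n, 1))])
    sol, *_ = np.linalg.lstsq(X1, y, rcond=None)          # minimizes sum_i (y_i - (w⊺x_i + b))^2
    w_hat, b_hat = sol[:-1], sol[-1]
    print(w_hat, b_hat)

    # the same model viewed as a neuron with identity activation sigma
    predict = lambda x: np.dot(w_hat, x) + b_hat
    print(predict(X[0]), y[0])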

  45-47. Perceptron (Frank Rosenblatt, 1928–1971)
     – Data: (x_1, y_1), ..., (x_n, y_n), x_i ∈ R^d, y_i ∈ {+1, −1}
     – Model: y_i ≈ σ(w⊺x_i + b), with σ the sign function
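
A hedged sketch of the perceptron model with σ the sign function; the training loop below uses the classic Rosenblatt-style mistake-driven update, which is standard but not shown in this excerpt of the slides:

    import numpy as np

    def perceptron_predict(x, w, b):
        # sigma is the sign function: outputs +1 or -1
        return 1 if np.dot(w, x) + b >= 0 else -1

    def perceptron_train(data, d, epochs=10):
        # classic update: on a mistake, nudge (w, b) toward the misclassified example
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for x, y in data:
                if y * (np.dot(w, x) + b) <= 0:   # misclassified
                    w, b = w + y * x, b + y
        return w, b

    # toy linearly separable data: label = sign of the first coordinate
    rng = np.random.default_rng(3)
    data = [(x, 1 if x[0] > 0 else -1) for x in rng.normal(size=(20, 2))]
    w, b = perceptron_train(data, d=2)
    print([perceptron_predict(x, w, b) == y for x, y in data])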
