Machine Learning & Neural Networks - CS16: Introduction to Data Structures & Algorithms - PowerPoint PPT Presentation


SLIDE 1

Machine Learning & Neural Networks

CS16: Introduction to Data Structures & Algorithms Spring 2020

SLIDE 2

Outline

  • Overview
  • Artificial Neurons
  • Single-Layer Perceptrons
  • Multi-Layer Perceptrons
  • Overfitting and Generalization
  • Applications
SLIDE 3

What do you think of when you hear “Machine Learning”?


Bobby

“Alexa, play Despacito.”

SLIDE 4

Artificial Intelligence vs. Machine Learning

SLIDE 5

What does it mean for machines to learn?

  • Can machines think?
  • Difficult question to answer because the definition of “think” is vague:
  • Ability to process information/perform calculations
  • Ability to arrive at ‘intelligent’ results
  • Replication of the ‘intelligent’ process
SLIDE 6

Let’s Think About This Differently

  • A machine learns when its performance at a particular task improves with experience
  • Alan Turing, in “Computing Machinery and Intelligence” (1950)
  • Turing’s test: the Imitation Game
  • Proposed that we instead consider the question, “Can machines do what we (as thinking entities) do?”

SLIDE 7

Machine Learning Algorithm Structure

  • Three key components:
  • Representation: define a space of possible programs
  • Loss function: decide how to score a program’s performance
  • Optimizer: how to search the space for the program with the highest score
  • Let’s revisit decision trees:
  • Representation: space of possible trees that can be built using attributes of the dataset as internal nodes and outcomes as leaf nodes
  • Loss function: percent of testing examples misclassified
  • Optimizer: choose attribute that maximizes information gain
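The loss function named above (percent of testing examples misclassified) is easy to state in code; a minimal sketch in Python, with a function name of our own choosing:

```python
def misclassification_loss(predictions, targets):
    # loss function from the slide: percent of testing examples misclassified
    wrong = sum(1 for y, t in zip(predictions, targets) if y != t)
    return 100.0 * wrong / len(targets)
```

For instance, `misclassification_loss([1, 0, 1, 1], [1, 1, 1, 0])` returns 50.0, since two of the four predictions are wrong.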

SLIDE 8

Neurons

  • The brain has about 100 billion neurons
  • Neurons are connected to thousands of other neurons by synapses
  • If the neuron’s electrical potential is high enough, the neuron is activated and fires

  • Each neuron is very simple
  • it either fires or not depending on its potential
  • but together they form a very complex “machine”
SLIDE 9

Neuron Anatomy (…very simplified)

[Diagram: neuron anatomy; dendrites, cell body, axon, axon terminals]

SLIDE 10

Artificial Neuron

SLIDE 11

Artificial Neuron

[Diagram: artificial neuron; each input is multiplied by a weight, the products are summed (inner product), and a bias term is added via the constant input -1]

Outputs 1 if input is larger than some threshold else it outputs 0


SLIDE 13

Artificial Neuron

  • The bias b allows us to control the threshold of the activation function 𝞆
  • we can change the threshold by changing the weight/bias b
  • this will simplify how we describe the learning process
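The neuron described above can be sketched in a few lines of Python (the function name is ours; the bias is folded in as weight w0 on a constant input x0 = -1, as in the slides):

```python
def neuron_output(weights, inputs):
    # inner product of the weight and input vectors; the bias is
    # weight w0, paired with the constant input x0 = -1
    total = sum(w * x for w, x in zip(weights, inputs))
    # step activation: fire (output 1) only if the sum exceeds 0
    return 1 if total > 0 else 0
```

With the bias folded in this way, “changing the threshold” is just changing w0 like any other weight.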
SLIDE 14

The Perceptron (Rosenblatt, 1957)

SLIDE 15

Perceptron Network

[Diagram: perceptron network; inputs x1–x4 and bias input x0 = -1 feed three neurons N, producing outputs y1, y2, y3]
SLIDE 16

Perceptron Network

[Diagram: perceptron network with inputs x1–x4, bias input x0 = -1, and one neuron’s incoming weights w0–w4 labeled]

SLIDE 17

Training a Perceptron

  • What does it mean for a perceptron to learn?
  • as we feed it more examples (i.e., input + classification pairs)
  • it should get better at classifying inputs
  • Examples have the form (x1,…,xn,t)
  • where t is the “target” classification (the right classification)
  • How can we use examples to improve an (artificial) neuron?
  • which aspects of a neuron can we change/improve?
  • how can we get the neuron to output something closer to the target value?
SLIDE 18

Perceptron Network

[Diagram: training loop; each output yi is compared with its target t, and the result of the comparison is used to update the neuron’s weights]
SLIDE 19

Perceptron Training

  • Set all weights to small random values (positive and negative)
  • For each training example (x1,…,xn,t)
  • feed (x1,…,xn) to a neuron and get a result y
  • if y=t then we don’t need to do anything!
  • if y<t then we need to increase the neuron’s weights
  • if y>t then we need to decrease the neuron’s weights
  • We do this with the following update rule: wi ← wi + Δi, where Δi = η(t−y)xi
SLIDE 20

Perceptron Network

[Diagram: perceptron network again; inputs x1–x4, bias input x0 = -1, and one neuron’s weights w0–w4]

SLIDE 21

Artificial Neuron Update Rule

  • If y=t then Δi=0 and wi is unchanged
  • if y<t and xi>0 then Δi>0 and wi increases
  • if y>t and xi>0 then Δi<0 and wi decreases
  • What happens when xi<0?
  • the last two cases are inverted! why?
  • recall that wi gets multiplied by xi, so when xi<0, if we want y to increase then wi needs to be decreased!
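The cases above follow directly from the rule Δi = η(t−y)xi; a one-line sketch (the function name is ours):

```python
def updated_weight(w_i, x_i, y, t, eta=0.5):
    # Delta_i = eta * (t - y) * x_i: zero when y == t, positive when
    # y < t and x_i > 0, negative when y > t and x_i > 0 (and inverted
    # in sign when x_i < 0)
    return w_i + eta * (t - y) * x_i
```

With the numbers from the upcoming worked example, `updated_weight(-0.5, -1, 1, 0)` gives 0.0: the neuron over-fired (y > t) on the negative bias input, so its bias weight moves up.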

SLIDE 22

Artificial Neuron Update Rule

  • What is η for?
  • to control by how much wi should increase or decrease
  • if η is large then errors will cause weights to change a lot
  • if η is small then errors will cause weights to change a little
  • a large η increases the speed at which a neuron learns but increases its sensitivity to errors in the data

SLIDE 23

Perceptron Training Pseudocode

Perceptron(data, neurons, k):
    for round from 1 to k:
        for each training example in data:
            for each neuron in neurons:
                y = output of feeding example to neuron
                for each weight of neuron:
                    update weight
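Specialized to a single neuron, the pseudocode might look like this in Python (a sketch; `data` is a list of (inputs, target) pairs whose input vectors begin with the bias input x0 = -1):

```python
def perceptron_train(data, weights, k, eta=0.5):
    # data: list of (inputs, target) pairs; each input starts with x0 = -1
    for _ in range(k):                    # k training rounds
        for inputs, t in data:
            # feed the example to the neuron: inner product + step
            total = sum(w * x for w, x in zip(weights, inputs))
            y = 1 if total > 0 else 0
            # update every weight: w_i <- w_i + eta * (t - y) * x_i
            weights = [w + eta * (t - y) * x
                       for w, x in zip(weights, inputs)]
    return weights
```

On the OR data from the activity on the next slide (all weights starting at -0.5, η = 0.5), ten rounds are enough for the weights to stop changing.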

SLIDE 24

Perceptron Training


3 min

Activity #1

x1 x2 | t
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1

x0 = -1,  w0 = -0.5,  w1 = -0.5,  w2 = -0.5,  η = 0.5

SLIDE 25

Perceptron Training

  • Example (-1,0,0,0), written as (x0,x1,x2,t) with bias input x0=-1 and target t
  • y=𝞆(-1×-0.5+0×-0.5+0×-0.5)=𝞆(0.5)=1
  • w0=-0.5+0.5(0-1)×-1=0
  • w1=-0.5+0.5(0-1)×0=-0.5
  • w2=-0.5+0.5(0-1)×0=-0.5
  • Example (-1,0,1,1)
  • y=𝞆(-1×0+0×-0.5+1×-0.5)=𝞆(-0.5)=0
  • w0=0+0.5(1-0)×-1=-0.5
  • w1=-0.5+0.5(1-0)×0=-0.5
  • w2=-0.5+0.5(1-0)×1=0

SLIDE 26

Perceptron Training

  • Example (-1,1,0,1)
  • y=𝞆(-1×-0.5+1×-0.5+0×0)=𝞆(0)=0
  • w0=-0.5+0.5(1-0)×-1=-1
  • w1=-0.5+0.5(1-0)×1=0
  • w2=0+0.5(1-0)×0=0
  • Example (-1,1,1,1)
  • y=𝞆(-1×-1+1×0+1×0)=𝞆(1)=1
  • w0=-1
  • w1=0
  • w2=0
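The hand computations on the last two slides can be checked mechanically; a short script (assuming the update rule wi ← wi + η(t−y)xi with η = 0.5, as used in the arithmetic above):

```python
def step(z):
    # step activation: 1 if z is larger than 0, else 0
    return 1 if z > 0 else 0

w = [-0.5, -0.5, -0.5]                       # w0, w1, w2 from the activity
examples = [((-1, 0, 0), 0), ((-1, 0, 1), 1),
            ((-1, 1, 0), 1), ((-1, 1, 1), 1)]
for x, t in examples:                        # one round over the OR data
    y = step(sum(wi * xi for wi, xi in zip(w, x)))
    w = [wi + 0.5 * (t - y) * xi for wi, xi in zip(w, x)]
print(w)  # → [-1.0, 0.0, 0.0], matching the slide
```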

SLIDE 27

Perceptron Training

  • Are we done?
  • No!
  • the perceptron was wrong on examples (0,0,0), (0,1,1), and (1,0,1)
  • so we keep going until the weights stop changing, or change only by very small amounts (convergence)
  • For sanity, check if our final weights correctly classify (0,0,0)
  • w0=-1, w1=0, w2=0
  • y=𝞆(-1×-1+0×0+0×0)=𝞆(1)=1, but t=0, so more training rounds are indeed needed
SLIDE 28

Perceptron Animation

SLIDE 29

Single-Layer Perceptron

[Diagram: single-layer perceptron; inputs x1–x4 and bias input x0 = -1 feed three neurons N, producing outputs y1, y2, y3]
SLIDE 30

Limits of Single-Layer Perceptrons

  • Perceptrons are limited
  • there are many functions they cannot learn
  • To better understand their power and limitations, it’s helpful to take a geometric view
  • If we plot classifications of all possible inputs in the plane (or hyperplane if high-dimensional)
  • perceptrons can learn the function if the classifications can be separated by a line (or hyperplane)
  • i.e., the data is linearly separable
SLIDE 31

Linearly-Separable Classifications

SLIDE 32

Single-Layer Perceptrons

  • In 1969, Minsky and Papert published Perceptrons: An Introduction to Computational Geometry
  • In it they proved that single-layer perceptrons could not learn some simple functions
  • This really hurt research in neural networks…
  • …many became pessimistic about their potential
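One such simple function is XOR, whose classifications are not linearly separable; a quick experiment (our own illustration, not from the book) shows a single neuron never gets it right no matter how long it trains:

```python
def step(z):
    return 1 if z > 0 else 0

# XOR truth table, with the bias input x0 = -1 prepended to each input
XOR = [((-1, 0, 0), 0), ((-1, 0, 1), 1), ((-1, 1, 0), 1), ((-1, 1, 1), 0)]

w = [0.1, -0.2, 0.05]          # small starting weights (arbitrary)
for _ in range(1000):          # far more rounds than OR ever needed
    for x, t in XOR:
        y = step(sum(wi * xi for wi, xi in zip(w, x)))
        w = [wi + 0.5 * (t - y) * xi for wi, xi in zip(w, x)]

errors = sum(1 for x, t in XOR
             if step(sum(wi * xi for wi, xi in zip(w, x))) != t)
# a single neuron draws one line through the plane, and no line
# separates XOR's classifications, so errors is always at least 1
```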
SLIDE 33

Multi-Layer Perceptron

[Diagram: multi-layer perceptron; inputs x1–x4 with bias input x0 = -1 feed a hidden layer of neurons N, which feeds an output layer of neurons N producing y1, y2, y3; the hidden layer has its own bias input -1 (inputs → hidden layer → output layer)]
SLIDE 34

Training Multi-Layer Perceptrons

  • Harder to train than a single-layer perceptron
  • if the output is wrong, do we update the weights of the hidden neurons, of the output neurons, or both?
  • the update rule for a neuron requires knowledge of the target, but there is no target for hidden neurons
  • MLPs are trained with stochastic gradient descent (SGD) using backpropagation
  • invented in 1986 by Rumelhart, Hinton and Williams
  • the technique was known before, but Rumelhart et al. showed precisely how it could be used to train MLPs

SLIDE 35

Training Multi-Layer Perceptrons

SLIDE 36

Training by Backpropagation

[Diagram: backpropagation; the network’s output is compared with the target t, and weight updates propagate backwards, first to the output layer and then to the hidden layer]

SLIDE 37

Training Multi-Layer Perceptrons

  • Specifics of the algorithm are beyond CS16
  • covered in CS142 and CS147
  • Architecture depends on your task and inputs
  • oftentimes, more layers don’t seem to add much more power
  • tradeoff between complexity and number of parameters needed to tune

  • Other kinds of neural nets
  • convolutional neural nets (image & video recognition)
  • recurrent neural nets (speech recognition)
  • many many more
SLIDE 38

Overfitting

  • A challenge in ML is deciding how much to train a model
  • if a model is overtrained then it can overfit the training data
  • which can lead it to make mistakes on new/unseen inputs
  • Why does this happen?
  • training data can contain errors and noise
  • if model overfits training data then it “learns” those errors and noise
  • and won’t do as well on new unseen inputs
  • for more on overfitting see
  • https://www.youtube.com/watch?v=DQWI1kvmwRg
SLIDE 40

Overfitting & Generalization

SLIDE 41

Overfitting & Generalization

  • So how do we know when to stop training?
  • one approach is to use the early stopping technique
  • Split the training examples into 3 sets
  • a training set (50%), a validation set (25%), a testing set (25%)
  • Train on the training set but
  • every 5 rounds, run the NN on the validation set
  • compute the NN’s error over the entire validation set
  • compare current error to previous error
  • if error is increasing, stop and use previous version of NN
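The steps above, wrapped around the perceptron from earlier slides, can be sketched as follows (the split sizes and 5-round check follow the slide; the fixed seed, helper names, and any data are our own assumptions):

```python
import random

def predict(weights, inputs):
    # step neuron: 1 if the inner product exceeds 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0

def error_rate(weights, examples):
    # fraction of examples the current weights misclassify
    wrong = sum(1 for x, t in examples if predict(weights, x) != t)
    return wrong / len(examples)

def train_early_stopping(examples, eta=0.5, check_every=5, max_rounds=100):
    rng = random.Random(0)                   # fixed seed for reproducibility
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    train = examples[: n // 2]               # training set (50%)
    valid = examples[n // 2 : (3 * n) // 4]  # validation set (25%)
    # the remaining 25% is the testing set, held out entirely
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(examples[0][0]))]
    prev, prev_err = list(weights), float("inf")
    for r in range(1, max_rounds + 1):
        for x, t in train:                   # one training round
            y = predict(weights, x)
            weights = [w + eta * (t - y) * xi for w, xi in zip(weights, x)]
        if r % check_every == 0:             # every 5 rounds: validate
            err = error_rate(weights, valid)
            if err > prev_err:               # error is increasing:
                return prev                  # stop, use previous version
            prev, prev_err = list(weights), err
    return weights
```

The key design point is that the validation set steers *when* to stop but never contributes a weight update, so the stopping decision is made on data the network has not fit.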
SLIDE 42

Early Stopping

SLIDE 43

Applications

  • Musical composition
  • Daniel Johnson – composing music using a recurrent neural network (RNN)

SLIDE 44

Applications (continued)

  • Style Transfer
SLIDE 45

Applications (continued)

  • Style Transfer
SLIDE 46

Applications

  • Advertising
  • Credit card fraud detection
  • Skin-cancer diagnosis
  • Predicting earthquakes
  • Lip-reading from video
  • Even…neural networks to help you write neural networks! (Neural Complete)

SLIDE 47

Questions?