Machine Learning & Neural Networks - CS16: Introduction to Data Structures & Algorithms - PowerPoint PPT Presentation


SLIDE 1

Machine Learning & Neural Networks

CS16: Introduction to Data Structures & Algorithms Spring 2020

SLIDE 2

Outline

  • Overview
  • Artificial Neurons
  • Single-Layer Perceptrons
  • Multi-Layer Perceptrons
  • Overfitting and Generalization
  • Applications
SLIDE 3

What do you think of when you hear “Machine Learning”?


Bobby

“Alexa, play Despacito.”

SLIDE 4

Artificial Intelligence vs. Machine Learning

SLIDE 5

What does it mean for machines to learn?

  • Can machines think?
  • Difficult question to answer because the definition of “think” is vague:
  • Ability to process information/perform calculations
  • Ability to arrive at ‘intelligent’ results
  • Replication of the ‘intelligent’ process
SLIDE 6

Let’s Think About This Differently

  • A machine learns when its performance at a particular task improves with experience
  • Alan Turing, in “Computing Machinery and Intelligence” (1950)
  • Turing’s test: the Imitation Game
  • Proposed that we instead consider the question, “Can machines do what we (as thinking entities) do?”

SLIDE 7

Machine Learning Algorithm Structure

  • Three key components:
  • Representation: define a space of possible programs
  • Loss function: decide how to score a program’s performance
  • Optimizer: how to search the space for the program with the highest score
  • Let’s revisit decision trees:
  • Representation: space of possible trees that can be built using attributes of the dataset as internal nodes and outcomes as leaf nodes
  • Loss function: percent of testing examples misclassified
  • Optimizer: choose attribute that maximizes information gain
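The loss function named above (percent of testing examples misclassified) is easy to state in code; a minimal sketch in Python, with a function name of our own choosing:

```python
def misclassification_loss(predictions, targets):
    # loss function from the slide: percent of testing examples misclassified
    wrong = sum(1 for y, t in zip(predictions, targets) if y != t)
    return 100.0 * wrong / len(targets)
```

For instance, `misclassification_loss([1, 0, 1, 1], [1, 1, 1, 0])` returns 50.0, since two of the four predictions are wrong.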

SLIDE 8

Neurons

  • The brain has about 100 billion neurons
  • Neurons are connected to thousands of other neurons by synapses
  • If the neuron’s electrical potential is high enough, the neuron is activated and fires

  • Each neuron is very simple
  • it either fires or not depending on its potential
  • but together they form a very complex “machine”
SLIDE 9

Neuron Anatomy (…very simplified)

[Diagram: neuron anatomy; dendrites, cell body, axon, axon terminals]

SLIDE 10

Artificial Neuron

SLIDE 11

Artificial Neuron

[Diagram: artificial neuron; each input is multiplied by a weight, the products are summed (inner product), and a bias term is added via the constant input -1]

Outputs 1 if input is larger than some threshold else it outputs 0


SLIDE 13

Artificial Neuron

  • The bias b allows us to control the threshold of the activation function 𝞆
  • we can change the threshold by changing the weight/bias b
  • this will simplify how we describe the learning process
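The neuron described above can be sketched in a few lines of Python (the function name is ours; the bias is folded in as weight w0 on a constant input x0 = -1, as in the slides):

```python
def neuron_output(weights, inputs):
    # inner product of the weight and input vectors; the bias is
    # weight w0, paired with the constant input x0 = -1
    total = sum(w * x for w, x in zip(weights, inputs))
    # step activation: fire (output 1) only if the sum exceeds 0
    return 1 if total > 0 else 0
```

With the bias folded in this way, “changing the threshold” is just changing w0 like any other weight.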
SLIDE 14

The Perceptron (Rosenblatt, 1957)

SLIDE 15

Perceptron Network

[Diagram: perceptron network; inputs x1–x4 and bias input x0 = -1 feed three neurons N, producing outputs y1, y2, y3]
SLIDE 16

Perceptron Network

[Diagram: perceptron network with inputs x1–x4, bias input x0 = -1, and one neuron’s incoming weights w0–w4 labeled]

SLIDE 17

Training a Perceptron

  • What does it mean for a perceptron to learn?
  • as we feed it more examples (i.e., input + classification pairs)
  • it should get better at classifying inputs
  • Examples have the form (x1,…,xn,t)
  • where t is the “target” classification (the right classification)
  • How can we use examples to improve an (artificial) neuron?
  • which aspects of a neuron can we change/improve?
  • how can we get the neuron to output something closer to the target value?
SLIDE 18

Perceptron Network

[Diagram: training loop; each output yi is compared with its target t, and the result of the comparison is used to update the neuron’s weights]
SLIDE 19

Perceptron Training

  • Set all weights to small random values (positive and negative)
  • For each training example (x1,…,xn,t)
  • feed (x1,…,xn) to a neuron and get a result y
  • if y=t then we don’t need to do anything!
  • if y<t then we need to increase the neuron’s weights
  • if y>t then we need to decrease the neuron’s weights
  • We do this with the following update rule: wi ← wi + Δi, where Δi = η(t−y)xi
SLIDE 20

Perceptron Network

[Diagram: perceptron network again; inputs x1–x4, bias input x0 = -1, and one neuron’s weights w0–w4]

SLIDE 21

Artificial Neuron Update Rule

  • If y=t then Δi=0 and wi is unchanged
  • if y<t and xi>0 then Δi>0 and wi increases
  • if y>t and xi>0 then Δi<0 and wi decreases
  • What happens when xi<0?
  • the last two cases are inverted! why?
  • recall that wi gets multiplied by xi, so when xi<0, if we want y to increase then wi needs to be decreased!
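The cases above follow directly from the rule Δi = η(t−y)xi; a one-line sketch (the function name is ours):

```python
def updated_weight(w_i, x_i, y, t, eta=0.5):
    # Delta_i = eta * (t - y) * x_i: zero when y == t, positive when
    # y < t and x_i > 0, negative when y > t and x_i > 0 (and inverted
    # in sign when x_i < 0)
    return w_i + eta * (t - y) * x_i
```

With the numbers from the upcoming worked example, `updated_weight(-0.5, -1, 1, 0)` gives 0.0: the neuron over-fired (y > t) on the negative bias input, so its bias weight moves up.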

SLIDE 22

Artificial Neuron Update Rule

  • What is η for?
  • to control by how much wi should increase or decrease
  • if η is large then errors will cause weights to change a lot
  • if η is small then errors will cause weights to change a little
  • a large η increases the speed at which a neuron learns but increases its sensitivity to errors in the data

SLIDE 23

Perceptron Training Pseudocode

Perceptron(data, neurons, k):
    for round from 1 to k:
        for each training example in data:
            for each neuron in neurons:
                y = output of feeding example to neuron
                for each weight of neuron:
                    update weight
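Specialized to a single neuron, the pseudocode might look like this in Python (a sketch; `data` is a list of (inputs, target) pairs whose input vectors begin with the bias input x0 = -1):

```python
def perceptron_train(data, weights, k, eta=0.5):
    # data: list of (inputs, target) pairs; each input starts with x0 = -1
    for _ in range(k):                    # k training rounds
        for inputs, t in data:
            # feed the example to the neuron: inner product + step
            total = sum(w * x for w, x in zip(weights, inputs))
            y = 1 if total > 0 else 0
            # update every weight: w_i <- w_i + eta * (t - y) * x_i
            weights = [w + eta * (t - y) * x
                       for w, x in zip(weights, inputs)]
    return weights
```

On the OR data from the activity on the next slide (all weights starting at -0.5, η = 0.5), ten rounds are enough for the weights to stop changing.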

SLIDE 24

Perceptron Training


3 min

Activity #1

x1 x2 | t
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1

x0 = -1,  w0 = -0.5,  w1 = -0.5,  w2 = -0.5,  η = 0.5

SLIDE 25

Perceptron Training

  • Example (-1,0,0,0), written as (x0,x1,x2,t) with bias input x0=-1 and target t
  • y=𝞆(-1×-0.5+0×-0.5+0×-0.5)=𝞆(0.5)=1
  • w0=-0.5+0.5(0-1)×-1=0
  • w1=-0.5+0.5(0-1)×0=-0.5
  • w2=-0.5+0.5(0-1)×0=-0.5
  • Example (-1,0,1,1)
  • y=𝞆(-1×0+0×-0.5+1×-0.5)=𝞆(-0.5)=0
  • w0=0+0.5(1-0)×-1=-0.5
  • w1=-0.5+0.5(1-0)×0=-0.5
  • w2=-0.5+0.5(1-0)×1=0

SLIDE 26

Perceptron Training

  • Example (-1,1,0,1)
  • y=𝞆(-1×-0.5+1×-0.5+0×0)=𝞆(0)=0
  • w0=-0.5+0.5(1-0)×-1=-1
  • w1=-0.5+0.5(1-0)×1=0
  • w2=0+0.5(1-0)×0=0
  • Example (-1,1,1,1)
  • y=𝞆(-1×-1+1×0+1×0)=𝞆(1)=1
  • w0=-1
  • w1=0
  • w2=0
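The hand computations on the last two slides can be checked mechanically; a short script (assuming the update rule wi ← wi + η(t−y)xi with η = 0.5, as used in the arithmetic above):

```python
def step(z):
    # step activation: 1 if z is larger than 0, else 0
    return 1 if z > 0 else 0

w = [-0.5, -0.5, -0.5]                       # w0, w1, w2 from the activity
examples = [((-1, 0, 0), 0), ((-1, 0, 1), 1),
            ((-1, 1, 0), 1), ((-1, 1, 1), 1)]
for x, t in examples:                        # one round over the OR data
    y = step(sum(wi * xi for wi, xi in zip(w, x)))
    w = [wi + 0.5 * (t - y) * xi for wi, xi in zip(w, x)]
print(w)  # → [-1.0, 0.0, 0.0], matching the slide
```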

SLIDE 27

Perceptron Training

  • Are we done?
  • No!
  • the perceptron was wrong on examples (0,0,0), (0,1,1), and (1,0,1)
  • so we keep going until the weights stop changing, or change only by very small amounts (convergence)
  • For sanity, check if our final weights correctly classify (0,0,0)
  • w0=-1, w1=0, w2=0
  • y=𝞆(-1×-1+0×0+0×0)=𝞆(1)=1, but t=0, so more training rounds are indeed needed
SLIDE 28

Perceptron Animation

SLIDE 29

Single-Layer Perceptron

[Diagram: single-layer perceptron; inputs x1–x4 and bias input x0 = -1 feed three neurons N, producing outputs y1, y2, y3]
SLIDE 30

Limits of Single-Layer Perceptrons

  • Perceptrons are limited
  • there are many functions they cannot learn
  • To better understand their power and limitations, it’s helpful to take a geometric view
  • If we plot classifications of all possible inputs in the plane (or hyperplane if high-dimensional)
  • perceptrons can learn the function if the classifications can be separated by a line (or hyperplane)
  • i.e., the data is linearly separable
SLIDE 31

Linearly-Separable Classifications

SLIDE 32

Single-Layer Perceptrons

  • In 1969, Minsky and Papert published Perceptrons: An Introduction to Computational Geometry
  • In it they proved that single-layer perceptrons could not learn some simple functions
  • This really hurt research in neural networks…
  • …many became pessimistic about their potential
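One such simple function is XOR, whose classifications are not linearly separable; a quick experiment (our own illustration, not from the book) shows a single neuron never gets it right no matter how long it trains:

```python
def step(z):
    return 1 if z > 0 else 0

# XOR truth table, with the bias input x0 = -1 prepended to each input
XOR = [((-1, 0, 0), 0), ((-1, 0, 1), 1), ((-1, 1, 0), 1), ((-1, 1, 1), 0)]

w = [0.1, -0.2, 0.05]          # small starting weights (arbitrary)
for _ in range(1000):          # far more rounds than OR ever needed
    for x, t in XOR:
        y = step(sum(wi * xi for wi, xi in zip(w, x)))
        w = [wi + 0.5 * (t - y) * xi for wi, xi in zip(w, x)]

errors = sum(1 for x, t in XOR
             if step(sum(wi * xi for wi, xi in zip(w, x))) != t)
# a single neuron draws one line through the plane, and no line
# separates XOR's classifications, so errors is always at least 1
```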
SLIDE 33

Multi-Layer Perceptron

[Diagram: multi-layer perceptron; inputs x1–x4 with bias input x0 = -1 feed a hidden layer of neurons N, which feeds an output layer of neurons N producing y1, y2, y3; the hidden layer has its own bias input -1 (inputs → hidden layer → output layer)]
SLIDE 34

Training Multi-Layer Perceptrons

  • Harder to train than a single-layer perceptron
  • if the output is wrong, do we update the weights of the hidden neurons, of the output neurons, or both?
  • the update rule for a neuron requires knowledge of the target, but there is no target for hidden neurons
  • MLPs are trained with stochastic gradient descent (SGD) using backpropagation
  • invented in 1986 by Rumelhart, Hinton and Williams
  • the technique was known before, but Rumelhart et al. showed precisely how it could be used to train MLPs

SLIDE 35

Training Multi-Layer Perceptrons

SLIDE 36

Training by Backpropagation

[Diagram: backpropagation; the network’s output is compared with the target t, and weight updates propagate backwards, first to the output layer and then to the hidden layer]

SLIDE 37

Training Multi-Layer Perceptrons

  • Specifics of the algorithm are beyond CS16
  • covered in CS142 and CS147
  • Architecture depends on your task and inputs
  • oftentimes, more layers don’t seem to add much more power
  • tradeoff between complexity and number of parameters needed to tune

  • Other kinds of neural nets
  • convolutional neural nets (image & video recognition)
  • recurrent neural nets (speech recognition)
  • many many more
SLIDE 38

Overfitting

  • A challenge in ML is deciding how much to train a model
  • if a model is overtrained then it can overfit the training data
  • which can lead it to make mistakes on new/unseen inputs
  • Why does this happen?
  • training data can contain errors and noise
  • if model overfits training data then it “learns” those errors and noise
  • and won’t do as well on new unseen inputs
  • for more on overfitting see
  • https://www.youtube.com/watch?v=DQWI1kvmwRg
SLIDE 40

Overfitting & Generalization

SLIDE 41

Overfitting & Generalization

  • So how do we know when to stop training?
  • one approach is to use the early stopping technique
  • Split the training examples into 3 sets
  • a training set (50%), a validation set (25%), a testing set (25%)
  • Train on the training set but
  • every 5 rounds, run the NN on the validation set
  • compute the NN’s error over the entire validation set
  • compare current error to previous error
  • if error is increasing, stop and use previous version of NN
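The steps above, wrapped around the perceptron from earlier slides, can be sketched as follows (the split sizes and 5-round check follow the slide; the fixed seed, helper names, and any data are our own assumptions):

```python
import random

def predict(weights, inputs):
    # step neuron: 1 if the inner product exceeds 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > 0 else 0

def error_rate(weights, examples):
    # fraction of examples the current weights misclassify
    wrong = sum(1 for x, t in examples if predict(weights, x) != t)
    return wrong / len(examples)

def train_early_stopping(examples, eta=0.5, check_every=5, max_rounds=100):
    rng = random.Random(0)                   # fixed seed for reproducibility
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    train = examples[: n // 2]               # training set (50%)
    valid = examples[n // 2 : (3 * n) // 4]  # validation set (25%)
    # the remaining 25% is the testing set, held out entirely
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(examples[0][0]))]
    prev, prev_err = list(weights), float("inf")
    for r in range(1, max_rounds + 1):
        for x, t in train:                   # one training round
            y = predict(weights, x)
            weights = [w + eta * (t - y) * xi for w, xi in zip(weights, x)]
        if r % check_every == 0:             # every 5 rounds: validate
            err = error_rate(weights, valid)
            if err > prev_err:               # error is increasing:
                return prev                  # stop, use previous version
            prev, prev_err = list(weights), err
    return weights
```

The key design point is that the validation set steers *when* to stop but never contributes a weight update, so the stopping decision is made on data the network has not fit.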
SLIDE 42

Early Stopping

SLIDE 43

Applications

  • Musical composition
  • Daniel Johnson – composing music using a recurrent neural network (RNN)

SLIDE 44

Applications (continued)

  • Style Transfer
SLIDE 45

Applications (continued)

  • Style Transfer
SLIDE 46

Applications

  • Advertising
  • Credit card fraud detection
  • Skin-cancer diagnosis
  • Predicting earthquakes
  • Lip-reading from video
  • Even…neural networks to help you write neural networks! (Neural Complete)

SLIDE 47

Questions?