SLIDE 1

Unsupervised Learning in Neural Networks

Keith L. Downing

The Norwegian University of Science and Technology (NTNU)
Trondheim, Norway
keithd@idi.ntnu.no

January 18, 2014

SLIDE 2

Unsupervised Learning

• No "instructor" feedback: the network tunes itself, often using local mechanisms (i.e., those involving a neuron and a few of its neighbors, but no global controller), to detect patterns that are prevalent in the inputs.
• Appears to be common in the neocortex and hippocampus of the brain.
• Hopfield networks (a.k.a. attractor networks) and Kohonen networks (a.k.a. Self-Organizing Maps, SOMs) are common implementations.
• Hebbian synaptic modification is commonly used.

SLIDE 3

Neural Coding Schemes

Local, Semi-Local (Sparsely Distributed), and Fully Distributed codes; a purely local code dedicates one neuron per concept (the "grandmother cell").

Given N neurons:

• Local: C(N, 1) = N patterns.
• Sparsely Distributed: C(N, k) = N! / ((N−k)! · k!) patterns. N = 20, k = 3 → 1140.
• Fully Distributed: C(N, N/2) = N! / ((N/2)!)² patterns. N = 20 → 184756.

*But storing 184756 patterns with 20 nodes in a useful manner isn't easy!
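These counts are just binomial coefficients; a quick check (in Python, used here purely for illustration):

```python
from math import comb

N = 20
print(comb(N, 1))       # local coding: 20 patterns
print(comb(N, 3))       # sparse coding, k = 3: 1140 patterns
print(comb(N, N // 2))  # fully distributed: 184756 patterns
```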

SLIDE 4

Hopfield Networks

Use Distributed/Population Codes

1. Storage efficiency: in theory, k neurons with m distinguishable states can store m^k patterns.
2. Each node represents a pattern component (e.g. a pixel).
3. Each (bi-directional) arc represents the correlation (observed over the full set of training patterns) between those components.
4. Pattern completion: given part of a pattern, the network can often fill in the rest.
5. Content-addressable memory: patterns are retrieved using portions of the pattern (not memory addresses) as keys.
6. Robustness: if a few neurons die, each pattern may be slightly corrupted, but none is lost completely.

SLIDE 5

Auto-Associative Training Phase: Pattern Storage

[Figure: nodes a, b, c, d with a weight w on each arc; correlated components are linked with weight +1, anti-correlated with −1.]

Learning Intra-Pattern Correlations

w_jk ← (1/P) · Σ_{p=1}^{P} c_pk · c_pj

where P = # of patterns to learn, c_pk = value of the kth component in the pth pattern, and w_jk = weight between the jth and kth neurons.
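A minimal NumPy sketch of this storage rule (the function name is mine; the zeroed diagonal, i.e. no self-connections, is the usual Hopfield convention, assumed here rather than stated on the slide):

```python
import numpy as np

def store_patterns(patterns):
    """Hebbian storage: w_jk = (1/P) * sum_p c_pk * c_pj."""
    P, C = patterns.shape          # P patterns of C components (+1/-1)
    W = patterns.T @ patterns / P  # average pairwise correlations
    np.fill_diagonal(W, 0.0)       # no self-connections (assumed convention)
    return W
```

Averaging c_pj · c_pk over the three patterns on the next slide reproduces the ±1/3 weights shown there.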

SLIDE 6

Simple Example of Auto-Associative Learning

• 1. Patterns to Learn: three 4-node patterns p1, p2, p3, with each node On (+1), Off (−1), or Neutral (0).
• 2. Hebbian Weight Initialization: average c_pj · c_pk over the three patterns:

Weight   p1   p2   p3   Avg
w12      +1   +1   −1   +1/3
w13      +1   −1   −1   −1/3
w14      −1   +1   +1   +1/3
w23      +1   −1   +1   +1/3
w24      −1   +1   −1   −1/3
w34      −1   −1   −1   −1

• 3. Build Network: connect nodes 1-4 with the averaged weights on each arc.

SLIDE 7

Auto-Associative Testing Phase: Pattern Recall

[Figure: a 4-node network (a, b, c, d) settling into an attractor.]

Oscillation + Convergence to an Attractor Pattern

c_k(t+1) ← sign( Σ_{j=1}^{C} w_kj · c_j(t) + I_k )

where c_k(t) = binary activation value of the kth neuron at time t, and I_k is the constant forcing input to neuron k from the initial pattern.
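A sketch of the recall dynamics under stated assumptions: the slide's formula fixes neither the update order nor sign(0), so this version updates neurons one at a time in random order and maps a zero sum to +1:

```python
import numpy as np

def recall(W, c0, I, steps=20):
    """Iterate c_k <- sign(sum_j w_kj * c_j + I_k) toward quiescence."""
    c = c0.copy()
    for _ in range(steps):
        for k in np.random.permutation(len(c)):  # asynchronous, random order
            h = W[k] @ c + I[k]
            c[k] = 1 if h >= 0 else -1           # sign, with sign(0) -> +1
    return c
```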

SLIDE 8

Simple Example of Auto-Associative Recall

Enter a partial pattern (nodes On (+1), Off (−1), or Neutral (0)) and run to quiescence.

[Figure: the 4-node network from the previous slide, shown before and after settling; the neutral nodes are driven to the values of the nearest stored pattern.]

SLIDE 9

Hetero-Associative Training Phase

[Figure: an input network and an output network; arcs with weight w link input nodes (a, b) to output nodes (a, b), with +1 for correlated and −1 for anti-correlated pairs.]

Learning Inter-Pattern Correlations

w_jk ← (1/P) · Σ_{p=1}^{P} i_pk · o_pj

where i_pk and o_pj are the values of the kth and jth components of the input and output patterns, respectively, for the pth pair of patterns.
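The same outer-product sketch as before, adapted to pattern pairs (names mine):

```python
import numpy as np

def store_pairs(inputs, outputs):
    """Inter-pattern correlations: w_jk = (1/P) * sum_p i_pk * o_pj."""
    P = inputs.shape[0]            # P (input, output) pattern pairs
    return outputs.T @ inputs / P  # W[j, k] links input unit k to output unit j
```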

SLIDE 10

Hetero-Associative Testing Phase

[Figure: paired input and output networks settling together.]

Oscillation + Convergence to a PAIR of Attractor Patterns

c_k(t+1) ← sign( Σ_{j=1}^{C} w_kj · c_j(t) + I_k )

*Same as in the auto-associative network, but alternate between updating neurons in the left and right networks.

SLIDE 11

Interference → Spurious Recall in Hopfield Networks

[Figure: a 4-node network (a, b, c, d) illustrating interference between stored patterns.]

SLIDE 12

Recall = Search in an Energy Landscape

[Figure: Hopfield energy plotted over the ANN activation-state space; stored patterns sit at local minima.]

Hopfield Energy Metric

E = −a · Σ_{k=1}^{C} Σ_{j=1}^{C} w_jk · c_j · c_k − b · Σ_{k=1}^{C} I_k · c_k
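As a sketch, the metric is two dot products; the constants a and b are left as parameters since the slide does not fix them:

```python
import numpy as np

def hopfield_energy(W, c, I, a=1.0, b=1.0):
    """E = -a * sum_{j,k} w_jk c_j c_k - b * sum_k I_k c_k."""
    return -a * (c @ W @ c) - b * (I @ c)
```

With symmetric weights, each asynchronous update from the recall rule can only lower (or preserve) E, which is why recall settles into an attractor.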

SLIDE 13

Hopfield Search in the Brain

[Figure: the duck-rabbit flip-flop figure; feature detectors (eyes, beak, neck, ears, mouth) are shared between the Duck and Rabbit interpretations.]

Flip-Flop Figures: sharing of features between the two figures + habituation of neurons ⇒ repeated switching between network attractors (= scene interpretations).

SLIDE 14

Competitive Neural Networks

Characteristics

• Single neurons in intermediate and/or output layers function as detectors for multi-neuron patterns of activity in upstream layers.
• Detector neurons tend to inhibit one another.
• The weights on afferent (i.e. incoming) arcs to each detector serve as a prototype of the class of patterns that it detects.
• These weights are modified by local Hebbian mechanisms.

SLIDE 15

Competitive ANNs

[Figure: a two-layer competitive network. An input case (.5, .3, .7, .1) activates input nodes 1-4, which feed output class nodes i and j through weight vectors (w_i1 ... w_i4) and (w_j1 ... w_j4); the class whose weight vector best matches the input wins.]

SLIDE 16

Prototype Learning

Weight Adjustment by the Winning Node

w_ij ← w_ij + η · (P_j − w_ij)    (1)

Through multiple training epochs, this simple update produces input weight vectors = prototype input patterns, but only if all the input weights (to each detector node i) are normalized:

Σ_{j=1}^{n} w_ij² = 1    (2)

*See compendium for mathematical proof.
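A minimal sketch of one competitive training step, assuming the winner is the detector with the largest dot-product response (the slide does not specify the competition rule):

```python
import numpy as np

def competitive_step(W, p, eta=0.1):
    """Move the winning detector's weight vector toward input pattern p."""
    i = np.argmax(W @ p)          # winner: strongest response (assumed rule)
    W[i] += eta * (p - W[i])      # w_ij <- w_ij + eta * (P_j - w_ij)
    W[i] /= np.linalg.norm(W[i])  # enforce sum_j w_ij^2 = 1
    return i
```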

SLIDE 17

Competitive Networks in the Brain

[Figure, two rounds: random activation flows through afferents f to principal cells P1, P2, P3. After lateral inhibition & learning, P2 becomes a detector for pattern 1001; in the second round, P3 becomes a detector for pattern 0101.]

SLIDE 18

Spatially Correlated Neural Populations

[Figure: inputs feed a layer of sensory neurons, which feeds a layer of pre-motor neurons; similar input patterns P_i and P_j activate neighboring neurons at each stage.]

If similar inputs map to neighboring neurons, and those in turn map to neighboring neurons, etc., then:

• Generalization occurs naturally.
• Small errors in perception still lead to the correct action.
• Neural wiring can be reduced.

SLIDE 19

Topological Maps in the Brain

[Figure: the tonotopic auditory pathway: Cochlea (inner ear) → Spiral Ganglion → Ventral Cochlear Nucleus → Superior Olive → Inferior Colliculus → MGN → Auditory Cortex, preserving the frequency ordering (1, 4, 10, 20 kHz) at each stage; source localization via delay lines.]

Isomorphism between 2 Spaces

Spaces: sound frequencies + a layer of neurons. If points p and q are close (or distant) in the sound-frequency space, then the neurons n_p and n_q that detect frequencies p and q are correspondingly close (or distant) in the neuron layer.

SLIDE 20

Self-Organizing Visual Topological Maps

[Figure: points in the Visual Field map topographically onto a layer of Visual Neurons.]

SLIDE 21

Artificial Self-Organizing Maps (SOMs)

[Figure: a SOM grid, contrasting Euclidean neighbors (closest weight vectors) with topological neighbors (closest grid locations); the two notions are correlated in a well-ordered map and uncorrelated in a disordered one.]

Competition + Cooperation: nodes compete for input patterns, but then share the win by allowing grid neighbors to also update their input weights to more closely match the input pattern.
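A sketch of one SOM step under these assumptions: a Gaussian neighborhood over grid distance, with the width sigma and learning rate eta as free parameters:

```python
import numpy as np

def som_step(W, grid, x, eta=0.5, sigma=1.0):
    """Winner and its grid neighbors all move toward the input x."""
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # Euclidean winner
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)    # squared grid distances
    h = np.exp(-d2 / (2 * sigma ** 2))              # cooperation: share the win
    W += eta * h[:, None] * (x - W)                 # neighbors update too
```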

SLIDE 22

There Goes the Neighborhood

[Figure: self-organizing learning in neuron space, with neighborhood radii R = 1, R = 2, R = 3 around the winning node.]

SLIDE 23

TSP Using SOM

[Figure: five cities at (.57, .11), (.25, .80), (.83, .66), (.37, .08), (.96, .34), with a ring of neurons laid over them; each neuron's neighborhood is its two ring neighbors.]

Spaces. Euclidean: city locations. Neural: a ring of neurons ⇒ each neuron has 2 neighbors.
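A toy sketch of the ring SOM on the five cities from the figure (all constants are illustrative choices); a circular grid distance makes each neuron cooperate with its ring neighbors:

```python
import numpy as np

M = 50                                     # ring of M neurons
ring = np.arange(M)
cities = np.array([[.57, .11], [.25, .80], [.83, .66], [.37, .08], [.96, .34]])
W = np.random.rand(M, 2)                   # neuron positions in city space

for t in range(2000):
    x = cities[np.random.randint(len(cities))]
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))
    d = np.minimum(np.abs(ring - bmu), M - np.abs(ring - bmu))  # circular distance
    h = np.exp(-d ** 2 / (2 * 2.0 ** 2))   # fixed neighborhood width (toy choice)
    W += 0.2 * h[:, None] * (x - W)

# Reading the cities off in ring order of their nearest neurons yields a tour.
```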

SLIDE 24

Emerging Tours

[Figure: X-Y plots of TSP cities and ring neurons, Before and After training; the neuron ring stretches into a tour through the cities.]

SLIDE 25

Hebbian Learning Rules

• General Hebbian: △w_i = λ · u_i · v
• Basic Homosynaptic: △w_i = λ · (v − θ_v) · u_i
• Basic Heterosynaptic: △w_i = λ · v · (u_i − θ_i)
• BCM: △w_i = λ · u_i · v · (v − θ_v)

Homosynaptic: all active synapses are modified the same way, depending only on the strength of the postsynaptic activity.

Heterosynaptic: active synapses can be modified differently, depending upon the strength of their presynaptic activity.
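The four rules collected in one hedged helper (the theta defaults are illustrative constants; in practice θ_v may itself be dynamic, as the next slide notes):

```python
def hebbian_dw(u, v, rule, lam=0.01, theta_v=0.5, theta_i=0.5):
    """Weight change dw_i for presynaptic activity u and postsynaptic activity v."""
    if rule == "general":        return lam * u * v
    if rule == "homosynaptic":   return lam * (v - theta_v) * u
    if rule == "heterosynaptic": return lam * v * (u - theta_i)
    if rule == "bcm":            return lam * u * v * (v - theta_v)
    raise ValueError(rule)
```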

SLIDE 26

Controlling Hebbian-Learning Weight-Vector Instability

Positive Feedback ⇒ Weight Explosions

• High post-synaptic firing ⇒ w_ij ⇑ ⇒ higher firing ⇒ w_ij ⇑ ...
• All activation-function outputs in [0, 1] ⇒ major trouble!
• Outputs in [−1, 1] ⇒ still trouble!
• Thresholding (e.g. in the homosynaptic, heterosynaptic and BCM rules) ⇒ still trouble!

Solutions

• BCM + dynamic θ_v: effective, but expensive.
• Weight normalization: effective, but expensive.
• Oja rule: △w_i = λ · v · (u_i − v · w_i) = λ · (u_i · v − v² · w_i)

The forgetting term (−v² · w_i) implicitly controls weight explosion, yields emergent weight normalization, and achieves Principal Component Analysis (PCA).
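A minimal sketch of the Oja update for one neuron, assuming v = w·u as the linear postsynaptic activity:

```python
import numpy as np

def oja_step(w, u, lam=0.01):
    """dw = lam * v * (u - v * w); the -v^2 * w part is the forgetting term."""
    v = w @ u                         # postsynaptic activity (assumed linear)
    return w + lam * v * (u - v * w)
```

Repeated application keeps |w| near 1 and rotates w toward the first principal component of the inputs.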

SLIDE 27

Hebbian Episodic Learning in the Hippocampus

[Figure: the hippocampal loop: Entorhinal Cortex (EC) → Dentate Gyrus (DG) → CA3 → CA1 → Subiculum → EC. The EC is 6-layered; DG, CA3, CA2, and CA1 are 3-layered.]

SLIDE 28

Auto-Associative Learning in CA3

[Figure: Dentate Gyrus neurons project onto CA3, whose neurons have 1-5% recurrent connectivity with one another, the substrate for auto-associative learning.]
