SLIDE 1

The Knowledge Content of Neural Networks

Keith L. Downing

The Norwegian University of Science and Technology (NTNU) Trondheim, Norway keithd@idi.ntnu.no

March 25, 2014

SLIDE 2

Overview

Linear Separability
Saliency
Principal Components Analysis
Hierarchical Clustering based on ANN Layer Behavior
Topographic Maps

SLIDE 3

Neurons as Detectors

[Diagram: a detector neuron z with inputs x and y, weights wx = 2 and wy = 5, and threshold tz = 1; netz is the weighted sum of the inputs, and activationz is 1 when netz reaches the threshold.]

2x + 5y ≥ 1 ⇔ y ≥ −(2/5)x + 1/5
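A minimal Python sketch of this detector; the function name and test points are illustrative, not from the slide:

```python
def threshold_detector(x, y, wx=2.0, wy=5.0, tz=1.0):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    net_z = wx * x + wy * y
    return 1 if net_z >= tz else 0

# Points above the line y = -(2/5)x + 1/5 make the neuron fire.
print(threshold_detector(0.0, 1.0))   # 1: above the line
print(threshold_detector(0.0, -1.0))  # 0: below the line
```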

SLIDE 4

Linear Separability

[Plots: two (X, Y) scatter plots of positive (+) and negative (−) instances, illustrating separability by a line.]

If each data case has n features, then, when the cases are plotted in n-dimensional space, the question is whether the positive and negative instances can be separated by a hyperplane of n − 1 dimensions (e.g., if n = 2, the hyperplane is a line). If so, a single-neuron detector can easily be reverse-engineered to detect the positive instances.

SLIDE 5

Linear Separability of SOME Booleans

[Diagrams: with inputs x, y ∈ {−1, +1}, an AND neuron has weights 0.5, 0.5 and threshold tz = 1, and an OR neuron has weights 0.5, 0.5 and threshold tz = 0; in each accompanying (X, Y) plot, a single line separates the + instances from the − instances.]

AND and OR are linearly separable for any input-vector size.
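A quick Python sketch of these two detectors, assuming the ±1 inputs shown in the diagrams:

```python
def detector(inputs, weights, threshold):
    """Single-neuron detector: fire when the weighted sum reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

def AND(x, y):  # weights 0.5, 0.5, threshold 1; inputs in {-1, +1}
    return detector((x, y), (0.5, 0.5), 1.0)

def OR(x, y):   # weights 0.5, 0.5, threshold 0
    return detector((x, y), (0.5, 0.5), 0.0)

for x in (-1, 1):
    for y in (-1, 1):
        print(x, y, AND(x, y), OR(x, y))
```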

SLIDE 6

...but not ALL Booleans

[Diagram: XOR on inputs x, y ∈ {−1, +1}. No single line separates the two + points from the two − points, so no weights and threshold (?? ?? tz = ??) can make a single detector neuron z compute XOR.]

This simple, non-linearly-separable example nearly killed neural network research: Perceptrons, Minsky & Papert (1969). Detecting non-linearly-separable classes requires more than 2 layers of neurons, but weights in multi-layer nets could not be learned prior to the popularization of backprop in the mid-1980s.

SLIDE 7

XOR requires 3 Layers

[Diagram: a three-layer XOR network on inputs x, y ∈ {−1, +1}. Hidden neuron u (weights 0.5, −0.5; threshold tu = 1) fires only for x = 1, y = −1, and hidden neuron v (weights −0.5, 0.5; threshold tv = 1) fires only for x = −1, y = 1; both act as AND-like detectors. Output z ORs u and v (weights 0.5, 0.5; threshold tz = 0). The hidden units' decision boundaries are the lines y = x − 2 and y = x + 2.]
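The full network as a Python sketch, using the weights from the diagram and ±1 activations throughout (this reading of the figure is mine, not verbatim from the slides):

```python
def unit(inputs, weights, threshold):
    """Threshold unit with bipolar output: +1 when the weighted sum reaches the threshold, else -1."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else -1

def XOR(x, y):
    u = unit((x, y), (0.5, -0.5), 1.0)    # fires only for x = 1, y = -1
    v = unit((x, y), (-0.5, 0.5), 1.0)    # fires only for x = -1, y = 1
    return unit((u, v), (0.5, 0.5), 0.0)  # OR of u and v

for x in (-1, 1):
    for y in (-1, 1):
        print(x, y, XOR(x, y))  # +1 exactly when x != y
```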

SLIDE 8

ANNs Realize Complex Mappings

ANNs can perform mappings of any complexity, whether linearly separable or not, although this may require many hidden layers and neurons. However, for any k-layered ANN (with k > 3), an equivalent ANN with k = 3 can be designed.

[Plot: positive (+) and negative (−) training instances in the (X, Y) plane, with the positive instances grouped into regions bounded by the lines L1, L2, and L3.]
SLIDE 9

Level 1: Region Borders

Each of the 3 borderlines is expressed by a simple line, which translates into the weights of three detector neurons:

A1: weights (−1, 1), threshold 0 ⇒ fires when y − x > 0
A2: weights (1, 1), threshold 5 ⇒ fires when y + x > 5
A3: weights (4, 1), threshold 30 ⇒ fires when y + 4x > 30

These fire on all input vectors (x, y) that lie above the corresponding line.

SLIDE 10

Level 2: Regions

Each region of positive training instances is expressed as a conjunction of above and below relationships w.r.t. the borderlines.

[Diagram: the border detectors A1, A2, and A3 from the previous slide feed a region detector R3, with a weight of +1 from A2 and weights of −1 from A1 and A3, so R3 fires only when A2 fires and A1 and A3 do not.]

Region 3 is above border 2 and below borders 1 and 3.

SLIDE 11

Level 3: Final Classification

A positive instance of the concept is an (x, y) case in any of the 3 regions, so the high-level detector, M, represents the disjunction of the 3 regions.

[Diagram: the complete three-level network. Level 1 ("Above Line"): border detectors A1 (weights −1, 1; threshold 0), A2 (weights 1, 1; threshold 5), and A3 (weights 4, 1; threshold 30). Level 2 ("And"): region detectors R1, R2, and R3, each a conjunction of above/below relations on the borders. Level 3 ("Or"): output M with weights 1, 1, 1 and threshold 1, which fires when any region detector fires.]
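The whole construction as a Python sketch. The border inequalities and the M-level OR come straight from the slides; the exact conjunctions for R1 and R2 are not spelled out, so those two definitions are illustrative assumptions:

```python
def above(x, y, wx, wy, t):
    """Border detector: fires when wx*x + wy*y exceeds threshold t."""
    return wx * x + wy * y > t

def classify(x, y):
    # Level 1: border detectors (inequalities from the slides).
    a1 = above(x, y, -1, 1, 0)   # above line 1: y - x > 0
    a2 = above(x, y, 1, 1, 5)    # above line 2: y + x > 5
    a3 = above(x, y, 4, 1, 30)   # above line 3: y + 4x > 30
    # Level 2: regions as conjunctions of above/below relations.
    # R3 is given on the slides; R1 and R2 are assumed for illustration.
    r3 = (not a1) and a2 and (not a3)  # above border 2, below borders 1 and 3
    r1 = a1 and a2 and (not a3)
    r2 = a1 and a2 and a3
    # Level 3: M is the disjunction (OR) of the region detectors.
    return r1 or r2 or r3

print(classify(1, 6))  # above lines 1 and 2, below line 3 -> positive
```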

SLIDE 12

Neurons Detect Salient Contexts

Three-spined stickleback experiments (Tinbergen, 1951). Males develop red bellies when establishing territory. Sight of the salient feature, a red belly, makes males aggressive, even on abstract mock-up figures.

[Figure: mock-ups with a red belly ("Something") versus without ("Nothing").]

SLIDE 13

Saliency for Baby Chickens (Tinbergen, 1951)

Mock-ups resembling hawks elicit fear. Those resembling a goose do not.

"Hawk" Something "Goose" Nothing

SLIDE 14

What Excites a Toad??

Worms or moving rectangles resembling worms (Ewert, 1980). Neurons in area T5(2) of the toad brain detect worm-ness.

[Figure: T5(2) responses: a strong response to the "Worm" stimulus, a weak response to a "Partial Worm", and no response to the "Anti-Worm".]

SLIDE 15

What Excites an Artificial Neuron??

[Diagram: an artificial neuron fed by units A, B, and C through weights such as +2, +6, +1, +7, −3, and +5, each annotated with the facial feature it detects: a sexy movie-star cheek mole, a bright left eye, a smile (preferable to a frown), and a dull nose.]

SLIDE 16

Two Keys to Intelligent Behavior

1. Knowing when to differentiate between two situations based on salient features (for which the situations have unequal values), and thus act differently in each.

2. Knowing when to generalize over two situations based on salient similarities, and thus treat each the same.

Salient features are very task-dependent. Easy task → the salient feature(s) have high variance among the cases. Hard task → the salient feature(s) have low variance among the cases (e.g., Where's Waldo?).

SLIDE 17

Principal Component Analysis (PCA) with ANNs

Principal components of a data set = the vectors that capture the greatest amounts of variance among the features.

Important ANN Property. If:
the values of a data set are scaled (to a common range for each feature, such as [0, 1]) and normalized by subtracting the mean vector from each case,
these values are fed into a single output neuron, z, and
the incoming weights to z are modified by correlation-based Hebbian means,
⇒ z's input-weight vector will reflect the principal components of the data set.
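A runnable sketch of this property. Plain Hebbian updates grow without bound, so this uses Oja's rule, a standard normalized variant of the Hebbian update; that substitution, plus the synthetic data and parameters, are my assumptions rather than the slides':

```python
import random

random.seed(0)
# Correlated 2-D data, already zero-mean: most variance along (1, 1)/sqrt(2).
data = []
for _ in range(500):
    s = random.gauss(0, 1.0)   # large variance along the diagonal
    n = random.gauss(0, 0.1)   # small variance across it
    data.append((s + n, s - n))

w = [0.3, -0.2]                # arbitrary initial weights
lam = 0.01                     # learning rate (lambda)
for x in data * 20:
    y = w[0] * x[0] + w[1] * x[1]        # neuron output
    w[0] += lam * y * (x[0] - y * w[0])  # Oja's rule: Hebb term minus decay
    w[1] += lam * y * (x[1] - y * w[1])

print(w)  # approx ±(0.71, 0.71): the first principal component
```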

SLIDE 18

Weight Vectors Define Region Borders

The border between regions carved out by a single output neuron is perpendicular to that neuron's weight vector:

x·wx + y·wy ≥ tz ⇔ y ≥ −(wx/wy)·x + tz/wy

The border is a line with slope −wx/wy, so any vector with slope +wy/wx is perpendicular to that border. Since neuron z's incoming-weight vector is (wx, wy), it has slope +wy/wx and is therefore perpendicular to the borderline.

SLIDE 19

Of Mice and Elephants

[Plot: raw data points for mice and elephants on Size vs. Gray-Scale Color axes, with the average vector marked.]

Animal   | Raw Data   | Scaled Data  | Normalized Data
Mouse    | (0.05, 60) | (0, 0.6)     | (−0.27, −0.04)
Mouse    | (0.04, 62) | (0, 0.62)    | (−0.27, −0.02)
Mouse    | (0.06, 68) | (0, 0.68)    | (−0.27, 0.04)
Elephant | (5400, 61) | (0.54, 0.61) | (0.27, −0.03)
Elephant | (5250, 66) | (0.53, 0.66) | (0.26, 0.03)
Elephant | (5300, 69) | (0.53, 0.69) | (0.26, 0.05)
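A sketch of the preprocessing; the scaling divisors (10000 for size, 100 for gray-scale) are inferred from the table rather than stated on the slide:

```python
# Raw (size, gray-scale color) pairs from the table.
raw = [(0.05, 60), (0.04, 62), (0.06, 68),
       (5400, 61), (5250, 66), (5300, 69)]

# Scale each feature to roughly [0, 1]; divisors inferred from the table.
scaled = [(s / 10000, c / 100) for s, c in raw]

# Normalize by subtracting the mean vector from each case.
n = len(scaled)
mean = (sum(s for s, _ in scaled) / n, sum(c for _, c in scaled) / n)
normalized = [(round(s - mean[0], 2), round(c - mean[1], 2)) for s, c in scaled]

print(normalized)  # close to the table's normalized column (up to rounding)
```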

SLIDE 20

Hebbian Learning ⇒ Principal Components

Δwi = λ·xi·y

[Plot: the normalized mouse/elephant data points on Size vs. Gray-Scale Color axes, with the average vector, the learned weight vector, and its perpendicular borderline.]

Input (Size, Color) | Output | Δw_size | Δw_color
(−0.27, −0.04)      | −0.031 | +0.0017 | +0.0002
(−0.27, −0.02)      | −0.029 | +0.0016 | +0.0001
(−0.27, 0.04)       | −0.023 | +0.0012 | −0.0002
(0.27, −0.03)       | +0.024 | +0.0013 | −0.0001
(0.26, 0.03)        | +0.029 | +0.0015 | +0.0002
(0.26, 0.05)        | +0.031 | +0.0016 | +0.0003
Sum weight change:           | +0.0089 | +0.0005
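These numbers can be reproduced in Python; the initial weight vector (0.1, 0.1) and learning rate λ = 0.2 are inferred by fitting the table, since the slide does not state them:

```python
# Normalized (size, color) data from the previous slide.
data = [(-0.27, -0.04), (-0.27, -0.02), (-0.27, 0.04),
        (0.27, -0.03), (0.26, 0.03), (0.26, 0.05)]

w = [0.1, 0.1]   # initial weights (inferred, not stated on the slide)
lam = 0.2        # learning rate lambda (inferred)

dw = [0.0, 0.0]  # accumulated (batch) Hebbian weight changes
for x in data:
    y = w[0] * x[0] + w[1] * x[1]   # neuron output for this case
    dw[0] += lam * x[0] * y         # delta-w_i = lambda * x_i * y
    dw[1] += lam * x[1] * y
    print(x, round(y, 3), round(lam * x[0] * y, 4), round(lam * x[1] * y, 4))

print("Sum weight change:", round(dw[0], 4), round(dw[1], 4))
# Size accumulates ~18x more change than color: the size axis is the
# principal component of this data set.
```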

SLIDE 21

PCA via ANN Summary

If the detectors of a network modify their input-weight vectors according to basic Hebbian principles, then, after training, the activation levels of those detectors can be used to differentiate the input patterns along the dimensions of highest variance. Hence, the detectors will differentiate between objects (or situations) that are most distinct relative to the space of feature values observed in the training data.

Train on animal pictures ⇒ differentiate birds from horses better than horses from donkeys.
Train on human faces ⇒ differentiate males from females better than Swedes from Norwegians.

The network figures out the most salient features on its own, via simple Hebbian means.

SLIDE 22

Assessing Generality of an ANN

Generalization: the ability to handle similar cases with similar actions. In ANNs, measuring the correlation between input patterns and the activity patterns of output- or hidden-layer neurons gives a coarse indicator of generalization. Hierarchical clustering (using dendrograms) gives a more detailed, case-by-case assessment: a quick look at the hierarchical tree usually indicates whether or not the ANN has learned useful similarities and distinctions between the inputs.

Animal | Name     | Hidden-Layer Activation Pattern
Cat    | Felix    | 11000011
Dog    | Max      | 00111100
Cat    | Samantha | 10001011
Dog    | Fido     | 00011101
Cat    | Tabby    | 11011001
Dog    | Bruno    | 10110101

SLIDE 23

Hierarchical Clustering

Begin with N items, each of which includes a tag; in this example, the tag is the hidden-layer activation pattern that the item evokes. Encapsulate each item in a singleton cluster and form the cluster set, C, consisting of all these clusters.

Repeat until size(C) = 1:
1. Find the two clusters, c1 and c2, in C that are closest, using distance metric D.
2. Form cluster c3 as the union of c1 and c2; it becomes their parent on the hierarchical tree.
3. Add c3 to C. Remove c1 and c2 from C.

SLIDE 24

Dendrograms

[Dendrogram over the six animals: Bruno 10110101, Fido 00011101, Max 00111100, Tabby 11011001, Felix 11000011, Samantha 10001011.]

Distance metric for clustering:

D(c1, c2) = (1 / (M1·M2)) · Σ_{x∈c1} Σ_{y∈c2} d_ham(tag(x), tag(y))

where M1 and M2 are the sizes of clusters c1 and c2, respectively. We do not need to understand the concepts represented by the hidden nodes, only the similarities of the hidden-layer activation patterns.
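The clustering procedure from the previous slide, applied to these six tags with the average-linkage Hamming metric above (a minimal sketch; the function names are mine):

```python
from itertools import combinations

tags = {"Felix": "11000011", "Max": "00111100", "Samantha": "10001011",
        "Fido": "00011101", "Tabby": "11011001", "Bruno": "10110101"}

def d_ham(a, b):
    """Hamming distance between two activation-pattern strings."""
    return sum(x != y for x, y in zip(a, b))

def D(c1, c2):
    """Average pairwise Hamming distance between two clusters of names."""
    return sum(d_ham(tags[x], tags[y]) for x in c1 for y in c2) / (len(c1) * len(c2))

# Agglomerative clustering: merge the two closest clusters until one remains.
C = [(name,) for name in tags]
while len(C) > 1:
    c1, c2 = min(combinations(C, 2), key=lambda pair: D(*pair))
    C.remove(c1); C.remove(c2)
    C.append(c1 + c2)       # the union becomes the parent in the tree
    print("merged:", c1, c2)
```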

SLIDE 25

Topographic Neural Maps

[Figure: a star-nosed mole next to a body scaled to match brain proportions, so each body region's size reflects the amount of brain devoted to it.]

SLIDE 26

Spatially Correlated Neural Populations

[Diagram: inputs feed neighboring sensory neurons (1, 2, 3, 4, ..., i, j), which in turn feed neighboring pre-motor neurons (a, b*, b, c); similar input patterns Pi and Pj activate neighboring units at every level.]

If similar inputs map to neighboring neurons, and those in turn map to neighboring neurons, etc., then:
generalization occurs naturally,
small errors in perception still lead to the correct action, and
neural wiring can be reduced.

SLIDE 27

Topographic Maps in the Brain

[Diagram: the tonotopic auditory pathway. The frequency ordering (1 kHz, 4 kHz, 10 kHz, 20 kHz) laid out along the cochlea (inner ear) is preserved through the spiral ganglion, ventral cochlear nucleus, superior olive, inferior colliculus, and MGN, all the way up to auditory cortex. Source localization via delay lines.]

Isomorphism between 2 spaces. The spaces: sound frequencies + a layer of neurons. If points p and q are close (distant) in the sound-frequency space, then the neurons that detect frequencies p and q, np and nq, are also close neighbors (distant) in the neuron layer.

SLIDE 28

Self-Organizing Visual Topographic Maps

[Diagram: points in the visual field mapping topographically onto a layer of visual neurons.]

SLIDE 29

Artificial Self-Organizing Maps (SOMs)

[Diagram: Euclidean neighbors (nodes with the closest weight vectors) versus topological neighbors (nodes at the closest grid locations); in a well-trained SOM the two are correlated, in an untrained one they are uncorrelated.]

Competition + Cooperation: nodes compete for input patterns, but then share the win by allowing grid neighbors to also update their input weights to more closely match the input pattern.
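A minimal SOM training sketch, using a 1-D grid for brevity; the Gaussian neighborhood function, learning rate, and shrink schedule are illustrative choices, not from the slides:

```python
import math, random

random.seed(1)
# A 1-D SOM: 10 nodes, each with a 2-D weight vector.
nodes = [[random.random(), random.random()] for _ in range(10)]

def train(inputs, epochs=50, lr=0.3, radius=3.0):
    for e in range(epochs):
        r = radius * (1 - e / epochs) + 1e-9   # neighborhood shrinks over time
        for x in inputs:
            # Competition: find the winner (closest weight vector).
            win = min(range(len(nodes)),
                      key=lambda i: (nodes[i][0]-x[0])**2 + (nodes[i][1]-x[1])**2)
            # Cooperation: grid neighbors share the win and update too.
            for i, w in enumerate(nodes):
                h = math.exp(-((i - win) ** 2) / (2 * r * r))  # neighborhood strength
                w[0] += lr * h * (x[0] - w[0])
                w[1] += lr * h * (x[1] - w[1])

train([(random.random(), random.random()) for _ in range(100)])
print(nodes)  # neighboring nodes now hold neighboring weight vectors
```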

SLIDE 30

There Goes the Neighborhood

[Diagram: the neighborhood radius R shrinking through R = 3, R = 2, R = 1 in neuron space as self-organizing learning proceeds.]

SLIDE 31

TSP Using SOM

[Diagram: five cities at (.57, .11), (.25, .80), (.83, .66), (.37, .08), and (.96, .34), with a neuron ring and its neighborhood laid over them.]

Spaces. Euclidean: city locations. Neural: a ring of neurons ⇒ each neuron has 2 neighbors.
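A sketch of the same competition-plus-cooperation update on the five cities above, with neighborhood distance measured around the ring; the ring size and all rates are illustrative assumptions:

```python
import math, random

cities = [(.57, .11), (.25, .80), (.83, .66), (.37, .08), (.96, .34)]

random.seed(2)
N = 20  # a ring of 20 neurons, each with an (x, y) weight vector
ring = [[0.5 + 0.3 * math.cos(2 * math.pi * i / N),
         0.5 + 0.3 * math.sin(2 * math.pi * i / N)] for i in range(N)]

for epoch in range(200):
    r = max(3.0 * (1 - epoch / 200), 0.1)  # shrinking neighborhood radius
    for cx, cy in random.sample(cities, len(cities)):
        win = min(range(N), key=lambda i: (ring[i][0]-cx)**2 + (ring[i][1]-cy)**2)
        for i, w in enumerate(ring):
            d = min(abs(i - win), N - abs(i - win))  # ring (wrap-around) distance
            h = math.exp(-d * d / (2 * r * r))
            w[0] += 0.2 * h * (cx - w[0])
            w[1] += 0.2 * h * (cy - w[1])

# Read off the tour: visit cities in the order their winning neurons occur on the ring.
order = sorted(range(len(cities)), key=lambda c: min(
    range(N), key=lambda i: (ring[i][0]-cities[c][0])**2 + (ring[i][1]-cities[c][1])**2))
print(order)
```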

SLIDE 32

Emerging Tours

[Plots: the neuron ring among the TSP cities before and after training; the ring unfolds into a tour through the cities.]

SLIDE 33

Development of Topographic Maps

[Diagram: source-layer neurons A, B, C, D projecting axons to target-layer neurons Z, Y, X, W.]

Axons read morphogen concentrations in their source layer and then search for similar chemical signatures in the target layer.

SLIDE 34

Hebbian Fine-Tuning of Topographic Maps

[Diagram 1: a visual stimulus activates a contiguous set of source neurons (A-D) and their targets (Z-W); synapses between co-active neurons are strengthened, while synapses between active and inactive neurons are weakened.]

Lower-layer neurons require 2 or more simultaneous inputs to fire. Hebbian Learning with STDP: fire together, wire together...fire apart, weaken.
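A toy version of this rule in Python; the update amounts are illustrative, and the rule captures only the qualitative sign convention (fire together ⇒ strengthen, fire apart ⇒ weaken):

```python
def hebbian_update(w, pre_fired, post_fired, rate=0.1):
    """Toy Hebbian rule: strengthen when pre and post fire together,
    weaken when one fires without the other."""
    if pre_fired and post_fired:
        return w + rate          # fire together, wire together
    if pre_fired != post_fired:
        return w - rate          # fire apart, weaken
    return w                     # neither fired: no change

w_DW = 0.5  # hypothetical weight of the D-W synapse
w_DW = hebbian_update(w_DW, pre_fired=True, post_fired=True)   # -> 0.6
w_DW = hebbian_update(w_DW, pre_fired=True, post_fired=False)  # -> 0.5
print(w_DW)
```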

SLIDE 35

More Fine Tuning

A B C D Z Y X W

3 D fires but W does not, so D-W synapse weakens. This is a noncontinuous stimulus (less common in the real world), but it does suffice to fire W and D, so the D-W synapse strengthens.

A B C D Z Y X W

2

SLIDE 36

And More Fine Tuning

[Diagrams: case 4, and the network after learning, in which only topographic connections remain.]

4: D fires after W, so the D-W synapse is further depressed. Non-topographic connections weaken so much that they wither away.

When learning begins with many topographic links, the contiguous nature of most real-world stimuli will strongly bias the training set, leading to depression and disappearance of the non-topographic links.
