Deep Belief Networks
Presented by Joseph Nunn. Psych 149/239: Computational Models of Cognition, University of California, Irvine, Winter 2015.
Talk Structure
Connectionist Background Material
To Recognize Shapes, First Learn to Generate Images [Hinton 2006]
Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Networks [Saxe et al 2013]
Letting Structure Emerge: Connectionist and Dynamical Systems Approaches to Cognition [McClelland et al 2010]
Connectionist Background Material
Connectionist models are only vaguely related to actual neurons and brains.
Patently unreal properties exist in Connectionist models and algorithms.
At Marr’s Algorithmic level of analysis, Connectionist model details are ‘inspired’ by neuroscience, not rooted in it.
Pandemonium: the mind modeled as many independent agents, or demons, running concurrently.
Demons become more or less vocal depending on the input they see in the previous layer.
The loudest demons get represented in the active conscious mind.
An early forerunner of Parallel Distributed Processing (PDP) [Selfridge 1958].
The Perceptron: a simple network consisting of an input layer and an output layer.
Minsky & Papert proved single-layer perceptrons incapable of learning functions that are not linearly separable in the Perceptrons book [Minsky & Papert 1969] (see the sketch below).
This contributed to the ‘death’ of connectionist research relative to the symbolist approach, and to the first AI Winter, 1974-1980.
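The limit is easy to demonstrate. Below is a minimal sketch, not from the talk, of the perceptron learning rule on OR, which is linearly separable; swap in XOR targets and the loop never converges. All names and constants are illustrative.

```python
# Minimal perceptron sketch (illustrative; not from the talk).
# A single layer of weights with a hard threshold learns OR,
# a linearly separable function. XOR has no separating line,
# so this same rule can never converge on it.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_or = np.array([0, 1, 1, 1], dtype=float)  # OR targets; XOR would be [0, 1, 1, 0]

w = np.zeros(2)
b = 0.0
for epoch in range(20):
    for x, target in zip(X, y_or):
        pred = float(x @ w + b > 0)   # hard threshold unit
        w += (target - pred) * x      # classic perceptron update
        b += (target - pred)

print([float(x @ w + b > 0) for x in X])  # -> [0.0, 1.0, 1.0, 1.0]
```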
Networks with additional hidden layers are capable of learning functions that are not linearly separable, but for years no algorithm was known that could train them.
Backpropagation, an algorithm to train multilayer networks, was ‘rediscovered’ and popularized in the mid 80s by several people, including Hinton.
Backprop works by taking the error between the expected output and the actual output and distributing that error backwards through the layers, adjusting the connection weights by a small amount (see the sketch below).
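As a concrete illustration, here is a minimal backprop sketch: a small sigmoid network trained by gradient descent on XOR, the function a single layer cannot learn. The hidden size, learning rate, and epoch count are illustrative assumptions, not details from the papers reviewed.

```python
# Minimal backpropagation sketch (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR

W1 = rng.normal(size=(2, 4))  # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5                      # the 'small amount' by which weights move

for epoch in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Error between expected and actual output...
    err = out - y
    # ...distributed backwards through the layers.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Each weight adjusted by a small amount against its gradient.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```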
Twice, funding and interest in AI research dried up; these periods are referred to as the ‘AI Winters’, 1974-1980 and 1987-1993.
There is a long rivalry between the Connectionist and Symbolist approaches to AI, each of which at times has been ascendant.
Currently the Connectionist approach is in vogue.
Pluralism should be encouraged in Cognitive Science in order to avoid a similar fate; both approaches have much to contribute.
The modern revival of Connectionism goes by the name Deep Learning.
Standard datasets are used in AI in order to compare the performance of various algorithms.
Competitions built around them are both academic and commercial (Kaggle).
MNIST: the Modified National Institute of Standards and Technology handwriting database, used in the papers reviewed.
Performance on MNIST by Deep Learning is within a few percent of what humans can do.
To Recognize Shapes, First Learn to Generate Images [Hinton 2006]
The simplest conceivable learning algorithm: evaluate the network in its current configuration, perturb a random weight, and evaluate again, keeping the change if performance improves.
This is extremely inefficient, because any single weight changed could affect the output for any input in the test data, so each evaluation requires a full pass over it.
We know of no model of any size that uses such an algorithm. (A toy version is sketched below.)
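The toy version below, with an assumed linear model and synthetic data, makes the cost visible: every trial perturbation triggers a full re-evaluation over the dataset, and only one weight moves per trial.

```python
# Perturbation 'learning' sketch (illustrative; no one trains real models this way).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])  # synthetic target map

def error(w):
    return np.mean((X @ w - y) ** 2)     # full pass over the data, every time

w = np.zeros(4)
for step in range(5000):
    i = rng.integers(len(w))             # pick one random weight
    delta = rng.normal(scale=0.1)
    before = error(w)                    # evaluate current configuration
    w[i] += delta                        # perturb
    if error(w) >= before:               # evaluate again; keep only improvements
        w[i] -= delta

print(np.round(w, 2))
```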
Instead, networks learn feature detectors in hidden layers.
Each detector learns to respond to ‘interesting correlations’ between the elements in the previous layer.
For example, an edge in a visual input could be learned as a feature. At the top level, the feature detectors learned are hoped to be useful for categorizing the input.
Learning heuristically searches the weight space for a random basis explaining the input.
A Deep Belief Network (DBN) is a many-layered Deep Learning Network.
The latest research shows the more the merrier; some networks have 9-10 hidden layers.
The top layer is a Restricted Boltzmann Machine (RBM), with symmetric connections (sketched below).
Trained this way, these networks perform better than with Backprop alone.
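Below is a minimal sketch of the building block: a single RBM trained with one step of contrastive divergence (CD-1). The layer sizes, learning rate, and toy data are assumptions for illustration; a DBN stacks such layers, training each on the activities of the one below.

```python
# Single RBM trained with CD-1 (illustrative sizes, rate, and data).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))  # symmetric weights
b_v = np.zeros(n_visible)  # visible biases
b_h = np.zeros(n_hidden)   # hidden biases

# Toy binary data: two repeated patterns the RBM should learn to generate.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

for epoch in range(1000):
    v0 = data
    # Positive phase: hidden activities driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the visibles, then re-infer hiddens.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # CD-1 update: data-driven minus reconstruction-driven statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
```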
The top layers act as an associative memory that settles into a stable state.
Training combines greedy layer-wise learning and Wake-Sleep, with Backprop for fine tuning; AKA a ‘Bag of Tricks’.
https://www.youtube.com/watch?v=AyzOUbkUf3M
Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Networks [Saxe et al 2013]
Saxe et al use deep linear networks to investigate the efficiency and learning dynamics of deep learning.
A deep linear network computes nothing more than a product of weight matrices, a linear map between matrix dimensions.
Even so, the learning dynamics are nonlinear, including rapid stage-like transitions.
They use a probabilistic model to generate arbitrary hierarchically structured data.
The learning dynamics reflect the hierarchically organized data and the degrees of separation within it.
Learning time varies with the magnitudes of singular values: stronger input/output correlations, described by larger singular values, take less time to learn (simulated below).
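A small simulation illustrates the claim. The sketch below, with assumed dimensions and learning rate, trains a one-hidden-layer linear network by gradient descent on a target map with prescribed singular values; printing the learned mode strengths shows the stronger modes rising first, each along its own sigmoidal, stage-like trajectory.

```python
# Deep linear network learning dynamics (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, lr = 8, 8, 8, 0.02

# Target map with prescribed singular values: the input/output 'modes'.
s_true = np.array([5.0, 3.0, 1.0] + [0.0] * 5)
U, _ = np.linalg.qr(rng.normal(size=(n_out, n_out)))
V, _ = np.linalg.qr(rng.normal(size=(n_in, n_in)))
target = U @ np.diag(s_true) @ V.T

W1 = rng.normal(scale=1e-3, size=(n_hidden, n_in))  # small random init
W2 = rng.normal(scale=1e-3, size=(n_out, n_hidden))

for t in range(4000):
    # Gradient descent on squared error, assuming whitened inputs.
    err = target - W2 @ W1
    W2 += lr * err @ W1.T
    W1 += lr * W2.T @ err
    if t % 500 == 0:
        # Strength of each learned mode: large singular values are
        # acquired quickly, small ones only after a long plateau.
        learned = np.diag(U.T @ (W2 @ W1) @ V)
        print(t, np.round(learned[:3], 2))
```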
[Figure: hierarchically structured input-output modes over items (panels a-c), and input-output mode strength vs. t (epochs), simulation vs. theory.]
Letting Structure Emerge: Connectionist and Dynamical Systems Approaches to Cognition [McClelland et al 2010]
Argues that Connectionist and dynamical systems approaches account for cognition better than structured probabilistic approaches.
Structure need not be built in; it can emerge from regularities that happen to be present in the real world, e.g. taxonomy hierarchies and prey/predator similarities.
Calls for ‘integrated accounts’ across multiple levels of analysis for cognitive modeling.
Representations are emergent, AKA patterns of neuronal activations.
The same problem may be solved by a different algorithm depending on constraints, while the underlying probabilistic problem remains the same.
No single level of analysis can be studied in isolation without missing critical aspects.
Deep networks are commonly described as learning hierarchical feature detectors. A classification decision is made based on the features identified in the input.
Ideally, each hidden unit comes to recognize one distinguishing feature in the input.
Together the units form a basis vector set for interpreting data. Researchers like to think these feature units have semantic interpretations.
However, a random basis over the same activations still has semantic interpretation.
Units may not be individual feature detectors, but may instead encode data regularities across the whole activation surface.
[Figure: units in the natural basis vs. directions in a random basis. Natural basis: (a) unit sensitive to a lower round stroke; (c) unit sensitive to a left, upper round stroke. Random basis: (a) direction sensitive to an upper straight stroke or lower round stroke; (c) direction sensitive to a round top stroke.]
Carefully constructed inputs can also cause misclassifications.
There is little principled guidance for choosing a particular Connectionist model or set of hyperparameters.
Backprop and related learning algorithms are not cognitively plausible.
Perhaps, then, analysis at Marr’s Computational level is best.
Deep Learning networks dominate AI competitions and performance on tasks such as MNIST.
Deep Learning rides advances in programmable GPUs, thanks to the parallel nature of Connectionist algorithms; progress is now ‘plugged in’ to Moore’s Law.
Trained networks now run on everyday hardware, including phones.
Expect Deep Learning to continue to dominate other forms of AI that cannot be amplified by parallelization on GPUs.