Introduction to Artificial Intelligence (V)
Intro. on Artificial Intelligence from the perspective of probability theory
Luo Zhiling (罗智凌), 2018
luozhiling@zju.edu.cn
College of Computer Science, Zhejiang University
http://www.bruceluo.net
Name:      Feed-Forward NN                        | (Stochastic) Recurrent NN
Input:     Feature                                | Observation
Output:    Ground truth                           | (Latent, Visible) variables
Learning:  Supervised Learning                    | Unsupervised Learning
Model:     Discriminative Model                   | Generative Model
Strategy:  Loss on ground truth (diff or entropy) | Loss on observation (energy)
Algorithm: Gradient Descent                       | (Variational) EM, Sampling
Examples:  Perceptron, MLP, CNN                   | LSTM, Markov Field, RBM
Hybrid:    DBN, GAN, pre-trained/two-phase learning, AutoEncoder
[Figure: a map of models — discriminative side: Perceptron, MLP, CNN, FRCNN, LSTM, GAN; generative side: RBM, AutoEncoder, DBN, bi-LSTM, word2vec, LDA, Markov Net, Hopfield Net.]
Outline
– Long Short Term Memory
– Hopfield Nets
– Restricted Boltzmann Machine
– Sleep/wake Model
– Echo-State Model
– Deep Belief Network
– AutoEncoder
– Generative Adversarial Network
Long Short Term Memory
– An MLP consists of multiple layers and can map input data to output data via a set of nonlinear activation functions. An MLP is trained with a supervised learning technique called backpropagation.
– However, an MLP cannot learn mapping functions where there are dependencies among the input data (i.e., sequential data).
[Figure: an MLP mapping Input to Output.]
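As a concrete illustration of the two ingredients above (nonlinear activations and backpropagation), here is a minimal numpy sketch, not from the lecture, that trains a one-hidden-layer MLP on XOR; the architecture and learning rate are arbitrary choices:

```python
# A minimal sketch: one-hidden-layer MLP trained with backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR ground truth

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)      # input -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)      # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # forward pass through the nonlinear activations
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backward pass: gradients of the cross-entropy loss w.r.t. each weight
    dp = p - y                                       # dL/dlogit at the output
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)                  # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.1 * grad                          # gradient descent
print(p.round(3).ravel())                            # approaches [0, 1, 1, 0]
```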
Recurrent Neural Network: an RNN has recurrent connections (connections to previous time steps of the same layer).
– RNNs are powerful but can get extremely complicated. Computations derived from earlier input are fed back into the network, which gives an RNN a kind of memory.
– Standard RNNs suffer from both exploding and vanishing gradients due to their iterative nature.
[Figure: an RNN folds a sequence input (x_0 … x_t) into an embedding vector (h_t).]
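A minimal sketch of the recurrence (shapes and data are placeholders): the same weights are applied at every time step, and the hidden state carries computations from earlier inputs forward:

```python
# A minimal sketch: one vanilla RNN layer folding (x_0 ... x_t) into h_t.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W_xh = rng.normal(0, 0.1, (d_in, d_h))   # input -> hidden
W_hh = rng.normal(0, 0.1, (d_h, d_h))    # recurrent connection
b_h = np.zeros(d_h)

xs = rng.normal(size=(10, d_in))          # a length-10 input sequence
h = np.zeros(d_h)                         # h_{-1}: empty memory
for x in xs:
    # computation from earlier inputs is fed back in through h
    h = np.tanh(x @ W_xh + h @ W_hh + b_h)
print(h)                                  # the embedding vector for the sequence
```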
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. Recurrent Models of Visual Attention.
– LSTM is an RNN devised to deal with the exploding and vanishing gradient problems of RNNs.
– An LSTM hidden layer consists of a set of recurrently connected blocks, known as memory cells.
– Each memory cell is connected through three multiplicative units: the input, output and forget gates.
– The input to the cells is multiplied by the activation of the input gate, the output to the net is multiplied by the output gate, and the previous cell values are multiplied by the forget gate.
Sepp Hochreiter & Jürgen Schmidhuber, Long short-term memory, Neural Computation, Vol. 9(8), pp. 1735–1780, MIT Press, 1997.
[Figure: an LSTM cell — a cell state, a hidden state, and the input. The cell/hidden state is controlled by forget/write/read gates, built from 3 sigmoid and 1 tanh perceptrons.]
Forget gate: a sigmoid of the hidden state at $t-1$ and the input at $t$, $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The forget signal lies between 0 and 1: 1 represents "completely keep this", 0 represents "completely forget this".
Write gate: again from the hidden state at $t-1$ and the input at $t$, $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$. The write signal: 1 represents "completely write this", 0 represents "completely ignore this". Content to write: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$.
Cell state update: the cell state at $t-1$ is scaled by the forget signal, and the content to write is scaled by the write signal, giving the updated cell state $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$.
Read gate: from the hidden state at $t-1$ and the input at $t$, $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$. The read signal: 1 represents "completely read this", 0 represents "completely ignore this". Updated hidden state at $t$: $h_t = o_t \odot \tanh(C_t)$.
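Putting the four equations together, one LSTM step can be sketched as follows (random placeholder weights; in a real model they are learned by backpropagation through time):

```python
# A minimal sketch of one LSTM step, following the gate equations above.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# one weight matrix and bias per gate, applied to the concatenation [h_{t-1}, x_t]
W_f, W_i, W_C, W_o = (rng.normal(0, 0.1, (d_h + d_in, d_h)) for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(d_h) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(z @ W_f + b_f)               # forget gate
    i = sigmoid(z @ W_i + b_i)               # write (input) gate
    C_tilde = np.tanh(z @ W_C + b_C)         # content to write
    C = f * C_prev + i * C_tilde             # updated cell state
    o = sigmoid(z @ W_o + b_o)               # read (output) gate
    h = o * np.tanh(C)                       # updated hidden state
    return h, C

h, C = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(10, d_in)):        # run over a toy sequence
    h, C = lstm_step(x, h, C)
print(h)
```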
Hopfield Nets
Learning an energy-based model by maximum likelihood minimizes the negative log-likelihood:
$\mathcal{L}(\theta, \mathcal{D}) = -\frac{1}{N}\sum_i \log q(\mathbf{x}_i) = \frac{1}{N}\sum_i F(\mathbf{x}_i) + \log Z$
where $F$ is the free energy and $Z$ the partition function.
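To connect this loss to the two learning phases that appear later in the deck (positive phase on observations, negative phase on model samples), differentiating it for $q(\mathbf{x}) = e^{-F(\mathbf{x})}/Z$ gives a standard identity, sketched here rather than taken from the slides:

```latex
\frac{\partial \mathcal{L}}{\partial \theta}
  = \underbrace{\frac{1}{N}\sum_i \frac{\partial F(\mathbf{x}_i)}{\partial \theta}}_{\text{positive phase: lower } F \text{ on observations}}
  \;-\; \underbrace{\mathbb{E}_{\mathbf{x} \sim q}\!\left[\frac{\partial F(\mathbf{x})}{\partial \theta}\right]}_{\text{negative phase: raise } F \text{ on model samples}}
```

using $\partial \log Z / \partial \theta = -\mathbb{E}_{\mathbf{x} \sim q}[\partial F(\mathbf{x}) / \partial \theta]$.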
The global energy of a Hopfield net is a sum of per-unit and per-connection contributions:
$E = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$
To find an energy minimum in this net, start from a random state and then update units one at a time in random order:
– Update each unit to whichever of its two states gives the lowest global energy.
– i.e. use binary threshold units.
[Figure: a small example net, with weights 3, 2, 3, 3, …]
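A toy numpy sketch of this settling procedure (random symmetric weights standing in for the figure's network): each binary-threshold update can only keep or lower the global energy:

```python
# A minimal sketch: sequential binary-threshold updates lower the energy.
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.integers(-4, 5, (n, n)).astype(float)    # toy weights
W = (W + W.T) / 2                                # symmetric, like a Hopfield net
np.fill_diagonal(W, 0)
b = np.zeros(n)                                  # biases (zero here)

def energy(s):
    # E = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij  (the formula above)
    return -s @ b - 0.5 * s @ W @ s

s = rng.integers(0, 2, n).astype(float)          # random binary start state
for _ in range(20):
    i = rng.integers(n)                          # one unit at a time, random order
    # binary threshold: pick whichever state gives the lowest global energy
    s[i] = 1.0 if W[i] @ s + b[i] > 0 else 0.0
    print(energy(s))                             # never increases
```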
– The net has two triangles in which the three units mostly support each other.
– Each triangle mostly hates the other triangle.
– The triangles differ: one has a weight of 2 where the other one has a weight of 3.
– So turning on the units in the triangle with the stronger weights gives the deeper energy minimum.
[Figure: the two-triangle network, with weights 3, 2, 3, 3, …]
– Hopfield (1982) proposed that memories could be energy minima of a neural net.
– The binary threshold decision rule can then be used to "clean up" incomplete or corrupted memories.
– The idea of memories as energy minima was proposed by I. A. Richards in 1924 in "Principles of Literary Criticism".
– Using energy minima to represent memories gives a content-addressable memory: an item can be accessed by just knowing part of its content. It is robust against hardware damage. It's like reconstructing a dinosaur from a few bones (see the sketch below).
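A minimal content-addressable-memory sketch under the usual assumptions (±1 units and a Hebbian outer-product storage rule, neither of which is spelled out on the slide): store two patterns, corrupt a few "bones" of one, and let threshold updates clean it up:

```python
# A minimal sketch: Hebbian storage plus binary-threshold clean-up.
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]], dtype=float)
n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns)   # Hebbian rule: w_ij += s_i s_j
np.fill_diagonal(W, 0)

s = patterns[0].copy()
s[:2] *= -1                                 # corrupt part of the memory
for _ in range(5):                          # a few clean-up sweeps
    for i in range(n):
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
print(np.array_equal(s, patterns[0]))       # True: the memory is recovered
```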
Restricted Boltzmann Machine
RNN vs. Boltzmann Machine:
– 1. An RNN essentially learns a function, so it has the notions of an input layer and an output layer; a Boltzmann Machine is used to learn the "internal representation" of a set of data, so it has no notion of an output layer.
– 2. The nodes of an RNN are linked in directed cycles, while the nodes of a BM are connected as a complete undirected graph.
– Only one layer of hidden units; no connections between hidden units.
– In an RBM it only takes one step to reach thermal equilibrium when the visible units are clamped.
– So we can quickly get the exact value of:
$p(h_j = 1 \mid \mathbf{v}) = \sigma\big(b_j + \sum_{i \in \text{vis}} v_i w_{ij}\big)$
[Figure: an RBM — a layer of hidden units $j$ above a layer of visible units $i$.]
Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.
[Figure: the alternating Gibbs chain at t = 0, 1, 2, …, ∞; the chain's final sample is a "fantasy".]
Start with a training vector on the visible units. Update all the hidden units in parallel. Update all the visible units in parallel to get a "reconstruction". Update the hidden units again.
$\Delta w_{ij} = \varepsilon\left(\langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1\right)$
This is not following the gradient of the log likelihood. But it works well.
[Figure: one step of the chain — data at t = 0, reconstruction at t = 1.]
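A minimal numpy sketch of this one-step contrastive-divergence (CD-1) update on toy binary data (sizes, data, and learning rate are placeholders):

```python
# A minimal sketch of CD-1 training for an RBM.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, eps = 6, 4, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)          # visible / hidden biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
data = rng.integers(0, 2, (20, n_vis)).astype(float)

for epoch in range(100):
    for v0 in data:
        # t = 0: clamp the data, sample all hidden units in parallel
        ph0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(n_hid) < ph0).astype(float)
        # t = 1: "reconstruction", then hidden probabilities again
        pv1 = sigmoid(h0 @ W.T + a)
        v1 = (rng.random(n_vis) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b)
        # delta w_ij = eps * (<v_i h_j>^0 - <v_i h_j>^1)
        W += eps * (np.outer(v0, ph0) - np.outer(v1, ph1))
        a += eps * (v0 - v1)
        b += eps * (ph0 - ph1)
```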
Example: 50 binary neurons learn features from a 16 × 16 pixel image.
– On the data (reality): increment weights between an active pixel and an active feature.
– On the reconstruction (better than reality): decrement weights between an active pixel and an active feature.
[Figure: Data and the reconstruction from activated binary features, for (a) a new test image from the digit class that the model was trained on and (b) an image from an unfamiliar digit class. The network tries to see every image as a 2.]
– Wake: the discriminative procedure (positive phase) — decrease the free energy on observations, increasing their probability.
– Sleep ("Dream"): the generative procedure (negative phase) — increase the energy on the model's own samples, decreasing the partition function.
Deep Belief Network
[Figure: stacking RBMs — train this RBM (weights $W_1$) first; copy the binary state for each $v$; then train this RBM (weights $W_2$) on those states.]
Compose the two RBM models to make a single DBN model. It's not a Boltzmann machine!
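A sketch of the greedy layer-wise recipe just described, reusing CD-1 from the previous section (toy shapes and random data; a real DBN would train on images):

```python
# A minimal sketch of greedy layer-wise DBN pre-training with CD-1.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, eps=0.1, epochs=50):
    n_vis = data.shape[1]
    W = rng.normal(0, 0.01, (n_vis, n_hid))
    a, b = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(epochs):
        for v0 in data:                              # one CD-1 step per vector
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hid) < ph0).astype(float)
            v1 = (rng.random(n_vis) < sigmoid(h0 @ W.T + a)).astype(float)
            ph1 = sigmoid(v1 @ W + b)
            W += eps * (np.outer(v0, ph0) - np.outer(v1, ph1))
            a += eps * (v0 - v1); b += eps * (ph0 - ph1)
    return W, a, b

data = rng.integers(0, 2, (20, 8)).astype(float)
W1, a1, b1 = train_rbm(data, 6)                      # train this RBM first
h_data = (rng.random((20, 6)) <
          sigmoid(data @ W1 + b1)).astype(float)     # copy binary state for each v
W2, a2, b2 = train_rbm(h_data, 4)                    # then train this RBM
# Composing (W1, W2) gives a single DBN model.
```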
To generate data:
1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
2. Perform a top-down pass to get states for all the other layers.
The lower-level bottom-up connections are not part of the generative model; they are just used for inference.
[Figure: a DBN for digit recognition — a 28 × 28 pixel image, two hidden layers of 500 units each, a top layer of 2000 units, and 10 label units.]
AutoEncoder
[Figure (Ranzato): a network produces a prediction from an input; an Error term compares the two.]
[Figure (Ranzato): an autoencoder — an encoder maps the input to a code, a decoder maps the code back to a prediction of the input, and the Error measures the reconstruction.]
– input: $X$; code: $h = W^\top X$
– $L(X; W) = \lVert W h - X \rVert_2^2 + \sum_j \lvert h_j \rvert$ (reconstruction error plus a sparsity penalty)
Le et al., "ICA with reconstruction cost..", NIPS 2011.
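A tiny numpy check of this objective (random data; tied weights, i.e. the same $W$ encodes and decodes, as in the formula):

```python
# A minimal sketch evaluating the sparse reconstruction objective above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16,))            # one input vector
W = rng.normal(0, 0.1, (16, 32))      # 32 basis vectors / features

h = W.T @ X                           # code: h = W^T X
recon = W @ h                         # linear decoder reuses W (tied weights)
loss = np.sum((recon - X) ** 2) + np.sum(np.abs(h))
print(loss)                           # reconstruction error + sparsity penalty
```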
– Convert each document into a "bag of words": a vector of word counts ignoring order (see the sketch below).
– Ignore stop words (like "the" or "over").
– So we reduce each query vector to a much smaller vector that still contains most of the information about the content of the document.
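A minimal sketch of the bag-of-words step (the stop-word list and the sentence are placeholders):

```python
# A minimal sketch: word counts, ignoring order and stop words.
from collections import Counter

STOP_WORDS = {"the", "over", "a", "of"}

def bag_of_words(text):
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(words)

print(bag_of_words("The cow jumped over the moon and the cow smiled"))
# Counter({'cow': 2, 'jumped': 1, 'moon': 1, 'and': 1, 'smiled': 1})
```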
– We train the network to reproduce its input vector as its output.
– This forces it to compress as much information as possible into the 10 numbers in the central bottleneck.
– These 10 numbers are then a good way to compare documents.
[Figure: the deep autoencoder used as a hash function, mapping each document to a binary code (address).]
– Pixels are not like words: individual pixels do not tell us much about the content.
– Matching real-valued vectors in a big database is slow and requires a lot of storage.
– Represent each image by a short binary code: this only requires a few words of storage per image, and the serial search can be done using fast bit-operations.
– But do they find images that we think are similar?
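A sketch of why bit-operations make the serial search fast: with each code packed into one integer, Hamming distance is an XOR followed by a popcount (the database here is random placeholder codes):

```python
# A minimal sketch: serial search over 256-bit codes with XOR + popcount.
import random

random.seed(0)
db = [random.getrandbits(256) for _ in range(100_000)]   # stored 256-bit codes
query = db[42] ^ random.getrandbits(8)                   # a slightly corrupted code

def hamming(x, y):
    # XOR marks the differing bits; popcount counts them
    return bin(x ^ y).count("1")

best = min(db, key=lambda code: hamming(code, query))
print(best == db[42], hamming(best, query))              # True, a small distance
```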
[Figure: the encoder stack — 1024 / 1024 / 1024 input, then layers of 8192, 4096, 2048, 1024, 512 units, down to a 256-bit binary code.]
The encoder has about 67,000,000 parameters. There is no theory to justify this architecture. It takes a few days on a GTX 285 GPU to train on two million images.
[Figure: retrieval results — images retrieved using 256-bit codes vs. images retrieved using Euclidean distance in pixel intensity space.]
[Figure: the leftmost column is the search image; the other columns are the images that have the most similar feature activities in the last hidden layer.]
Generative Adversarial Network
Two ways to train a generative model:
– maximize the probability of generating the observations;
– minimize the difference between the observations and the generated observations.
The GAN minimax objective:
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
— the first term is the log-likelihood on real samples, the second is the log-likelihood on fake samples, with $z$ drawn from a noise prior.
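A minimal sketch of this minimax game on 1-D data, with a logistic discriminator $D(x) = \sigma(wx + c)$ and a linear generator $G(z) = az + b$ (all toy assumptions; trained by alternating manual gradient steps, with the common non-saturating variant of the generator loss):

```python
# A minimal GAN sketch: 1-D real data, logistic D, linear G, manual gradients.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, c = 0.1, 0.0            # discriminator parameters
a, b_g = 1.0, 0.0          # generator parameters; real data ~ N(3, 1)
lr, batch = 0.05, 128

for step in range(2000):
    x = rng.normal(3.0, 1.0, batch)          # real samples
    z = rng.normal(0.0, 1.0, batch)          # noise prior
    g = a * z + b_g                          # fake samples G(z)
    # discriminator: ascend V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
    dr, df = sigmoid(w * x + c), sigmoid(w * g + c)
    w += lr * np.mean((1 - dr) * x - df * g)
    c += lr * np.mean((1 - dr) - df)
    # generator: descend the non-saturating loss -E[log D(G(z))]
    df = sigmoid(w * g + c)
    a -= lr * np.mean((df - 1) * w * z)
    b_g -= lr * np.mean((df - 1) * w)
print(a, b_g)   # the generator drifts toward the real distribution N(3, 1)
```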