RESTRICTED BOLTZMANN MACHINES AND DEEP BELIEF NETWORKS ON MULTI-CORE PROCESSORS
Noel Lopes, Bernardete Ribeiro, João Gonçalves
University of Coimbra / Polytechnic Institute of Guarda
June 11, 2012, WCCI/IJCNN
DEEP BELIEF NETWORKS (DBNs)
“Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. [...] The lower layers receive top-down, directed connections from the layers above. The states of the units in the lowest layer represent a data vector.”
Geoffrey E. Hinton [Hinton et al., 2006]
OUTLINE
Motivation
Deep Belief Networks
Restricted Boltzmann Machines
GPU Implementation
Results on the MNIST Handwritten Digits Database
Conclusions and Future Work
MOTIVATION
The robustness and efficiency with which humans recognize objects has long been an intriguing challenge in computational intelligence.
Theoretical results suggest that deep architectures are fundamental for learning the complex functions that can represent high-level abstractions (e.g. vision, language) [Bengio, 2009].
Empirical results show their successful application to classification, regression, dimensionality reduction, object recognition, information retrieval, robotics, and collaborative filtering [Larochelle et al., 2007, Swersky et al., 2010].
DEEP VERSUS SHALLOW ARCHITECTURES
[Diagram: a deep architecture maps the model inputs (x) through level 1 (low-order features), level 2, ..., up to level d (high-order features) to the model outputs (y); a shallow architecture maps the inputs to the outputs through a single layer of non-linear operations.]
DEEP BELIEF NETWORKS
DBNs are composed of several Restricted Boltzmann Machines (RBMs) stacked on top of each other.
[Diagram: a DBN with the input layer x and stacked hidden layers h1, h2, h3.]
RESTRICTED BOLTZMANN MACHINES
An RBM is an energy-based generative model that consists of a layer of binary visible units, v, and a layer of binary hidden units, h.
[Diagram: an RBM with visible units v1, ..., vI plus a bias unit, fully connected to hidden units h1, ..., hJ plus a bias unit; the visible-to-hidden pass acts as an encoder and the hidden-to-visible pass as a decoder.]
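To make the later code sketches concrete, here is a minimal host-side container for the RBM parameters. The field names follow the slides' notation; the struct itself is our assumption, not the authors' data layout.

```cuda
// Minimal RBM parameter container (illustrative; names follow the slides).
typedef struct {
    int I, J;   // number of visible (I) and hidden (J) units
    float *w;   // connection weights, row-major J x I: w[j * I + i] = W_ji
    float *a;   // visible-unit biases, length I
    float *b;   // hidden-unit biases, length J
} RBM;
```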
RESTRICTED BOLTZMANN MACHINES
Given an observed state, the energy of the joint configuration of the visible and hidden units (v, h) is given by (1):
E(v, h) = -\sum_{i=1}^{I} a_i v_i - \sum_{j=1}^{J} b_j h_j - \sum_{j=1}^{J} \sum_{i=1}^{I} W_{ji} v_i h_j ,   (1)

where a_i and b_j are the bias terms of the visible and hidden units and W_{ji} is the weight of the connection between v_i and h_j.
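As a sanity check on eq. (1), the energy is a direct triple of sums. A minimal sequential sketch, using the illustrative RBM struct above (not the authors' code):

```cuda
// Energy of a joint configuration (v, h), eq. (1).
float rbm_energy(const RBM *m, const float *v, const float *h) {
    float e = 0.0f;
    for (int i = 0; i < m->I; i++) e -= m->a[i] * v[i];   // -sum_i a_i v_i
    for (int j = 0; j < m->J; j++) e -= m->b[j] * h[j];   // -sum_j b_j h_j
    for (int j = 0; j < m->J; j++)                        // -sum_j sum_i W_ji v_i h_j
        for (int i = 0; i < m->I; i++)
            e -= m->w[j * m->I + i] * v[i] * h[j];
    return e;
}
```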
RESTRICTED BOLTZMANN MACHINES
The RBM defines a joint probability over (v, h):
p(v, h) = \frac{e^{-E(v,h)}}{Z} ,   (2)

where Z is the partition function, obtained by summing e^{-E(v,h)} over all possible (v, h) configurations:

Z = \sum_{v,h} e^{-E(v,h)} .   (3)
RESTRICTED BOLTZMANN MACHINES
Given a random input configuration v, the state of the hidden unit j is set to 1 with probability:
p(h_j = 1 | v) = \sigma\left(b_j + \sum_{i=1}^{I} v_i W_{ji}\right) ,   (4)

where \sigma(x) = 1/(1 + e^{-x}) is the logistic sigmoid. Similarly, given a random hidden vector h, the state of the visible unit i is set to 1 with probability:

p(v_i = 1 | h) = \sigma\left(a_i + \sum_{j=1}^{J} h_j W_{ji}\right) .   (5)
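Eqs. (4) and (5) map directly onto two small loops. A sequential host-side reference sketch, built on the illustrative RBM struct above (the helper names and the use of rand() are our assumptions; the GPU kernels later in the talk parallelize exactly these loops):

```cuda
#include <math.h>
#include <stdlib.h>

static float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }
static float uniform01(void)  { return (float)rand() / (float)RAND_MAX; }

// Sample h ~ p(h | v): each hidden unit fires with the probability of eq. (4).
void sample_hidden(const RBM *m, const float *v, float *h) {
    for (int j = 0; j < m->J; j++) {
        float s = m->b[j];
        for (int i = 0; i < m->I; i++) s += v[i] * m->w[j * m->I + i];
        h[j] = (uniform01() < sigmoid(s)) ? 1.0f : 0.0f;
    }
}

// Sample v ~ p(v | h): symmetric, following eq. (5).
void sample_visible(const RBM *m, const float *h, float *v) {
    for (int i = 0; i < m->I; i++) {
        float s = m->a[i];
        for (int j = 0; j < m->J; j++) s += h[j] * m->w[j * m->I + i];
        v[i] = (uniform01() < sigmoid(s)) ? 1.0f : 0.0f;
    }
}
```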
TRAINING AN RBM
The following learning rule performs stochastic steepest ascent in the log probability of the training data:
\frac{\partial \log p(v)}{\partial W_{ji}} = \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_\infty ,   (6)

where \langle \cdot \rangle_0 denotes an expectation under the data distribution (p_0) and \langle \cdot \rangle_\infty an expectation under the model distribution (p_\infty).
ALTERNATING GIBBS SAMPLING
[Diagram: the chain starts at the data, v(0) = x. Sampling h(0) from p(h_j = 1 | v) = \sigma(b_j + \sum_i v_i W_{ji}) yields \langle v_i h_j \rangle_0. Alternating updates then produce v(1) from p(v_i = 1 | h) = \sigma(a_i + \sum_j h_j W_{ji}), followed by h(1), v(2), h(2), and so on; in the limit (v(\infty), h(\infty)) the chain samples from the model distribution and yields \langle v_i h_j \rangle_\infty.]
CONTRASTIVE DIVERGENCE (CD–k)
Hinton proposed the Contrastive Divergence (CD) algorithm: CD–k replaces \langle \cdot \rangle_\infty by \langle \cdot \rangle_k, the expectation after only a small number k of Gibbs steps.
CONTRASTIVE DIVERGENCE (CD–k)
v(0) ← x
Compute the binary states (features) of the hidden units, h(0), using v(0)
for n ← 1 to k
    Compute the "reconstruction" states of the visible units, v(n), using h(n-1)
    Compute the "reconstruction" states of the hidden units, h(n), using v(n)
end for
Update the weights and biases, according to:

\Delta W_{ji} = \gamma (\langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k)   (7)
\Delta b_j = \gamma (\langle h_j \rangle_0 - \langle h_j \rangle_k)   (8)
\Delta a_i = \gamma (\langle v_i \rangle_0 - \langle v_i \rangle_k)   (9)

where \gamma is the learning rate.
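Putting the pieces together, a sequential CD–k update for a single training vector might look as follows. This is a sketch built on the illustrative helpers above; using the sampled binary states directly in the updates is a simplification of the expectations in eqs. (7)-(9), and is our assumption rather than the authors' exact procedure.

```cuda
#include <stdlib.h>
#include <string.h>

// One CD-k update for a single training vector x (length m->I); gamma is the
// learning rate. Sketch: batch averaging and mean-field details are omitted.
void cd_k_update(RBM *m, const float *x, int k, float gamma) {
    float *v  = (float *)malloc(m->I * sizeof(float));
    float *h0 = (float *)malloc(m->J * sizeof(float));
    float *h  = (float *)malloc(m->J * sizeof(float));

    memcpy(v, x, m->I * sizeof(float));     // v(0) <- x
    sample_hidden(m, v, h0);                // h(0) from v(0)
    memcpy(h, h0, m->J * sizeof(float));
    for (int n = 1; n <= k; n++) {          // k alternating Gibbs steps
        sample_visible(m, h, v);            // v(n) from h(n-1)
        sample_hidden(m, v, h);             // h(n) from v(n)
    }
    for (int j = 0; j < m->J; j++) {
        for (int i = 0; i < m->I; i++)      // eq. (7)
            m->w[j * m->I + i] += gamma * (x[i] * h0[j] - v[i] * h[j]);
        m->b[j] += gamma * (h0[j] - h[j]);  // eq. (8)
    }
    for (int i = 0; i < m->I; i++)
        m->a[i] += gamma * (x[i] - v[i]);   // eq. (9)

    free(v); free(h0); free(h);
}
```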
DEEP BELIEF NETWORKS (DBN)
[Diagram: the stack x ↔ h1 ↔ h2 ↔ h3, with adjacent layers linked by p(h1 | x) and p(x | h1), p(h2 | h1) and p(h1 | h2), p(h3 | h2) and p(h2 | h3); the lower layers capture low-level features and the upper layers high-level features (concepts).]
GPU IMPLEMENTATION
Training a DBN is a computationally expensive task: it involves training several RBMs and may require a considerable amount of time.
Solution? A parallel GPU implementation.
CUDA – DEVICE ARCHITECTURE
[Diagram: a CUDA device contains N streaming multiprocessors (SM1, SM2, ..., SMN) that share the device memory; each SM holds M scalar processors, an instruction unit, and its own fast shared memory.]
CUDA – LAUNCHING A KERNEL GRID
[Diagram: a kernel launch creates a grid of blocks, Block(0,0) ... Block(3,1); each block, e.g. Block(3,0), contains a 2-D arrangement of threads, Thread(0,0) ... Thread(3,2).]
Threads within a block can share information. However, blocks are required to run independently. To address scalability, the tasks should be partitioned accordingly, as in the sketch below.
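A minimal sketch of these mechanics (the kernel and sizes are illustrative, not from the paper): each block works independently on its own slice of the data, while the threads inside a block cooperate through __shared__ memory.

```cuda
#include <cstdio>

// Each block sums its own 256-element slice of x in shared memory;
// blocks never communicate with one another.
__global__ void blockSum(const float *x, float *blockTotals) {
    __shared__ float partial[256];                   // visible to this block only
    partial[threadIdx.x] = x[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction within the block
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) blockTotals[blockIdx.x] = partial[0];
}

int main() {
    const int blocks = 8, threads = 256, n = blocks * threads;
    float *x, *totals;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&totals, blocks * sizeof(float));
    for (int i = 0; i < n; i++) x[i] = 1.0f;
    blockSum<<<blocks, threads>>>(x, totals);        // a grid of 8 independent blocks
    cudaDeviceSynchronize();
    printf("block 0 total = %.0f\n", totals[0]);     // prints 256
    cudaFree(x); cudaFree(totals);
    return 0;
}
```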
CUDA – SCALABILITY
[Diagram: the same 4×2 grid of blocks executed on two different devices. On a device with 2 SMs, each SM runs four of the blocks; on a device with 4 SMs, each SM runs two. The same grid thus scales automatically with the number of SMs.]
KERNELS
The data and parameters reside on the device: v_data ∈ R^{N×I} (RBM inputs, x), h_data ∈ R^{N×J} (RBM outputs for the data), v_recon ∈ R^{N×I} (reconstructed inputs), h_recon ∈ R^{N×J} (reconstructed outputs), w ∈ R^{J×I} (weights), a ∈ R^{I} (visible units bias), and b ∈ R^{J} (hidden units bias). Each epoch runs four steps:
Step 1. ComputeStatusHiddenUnits: compute h_data from v_data.
Step 2. ComputeStatusVisibleUnits: compute v_recon from h_data.
Step 3. ComputeStatusHiddenUnits: compute h_recon from v_recon.
Step 4. CorrectWeights: correct the weights and biases.
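A host-side sketch of how these four launches might be sequenced for CD–1. The kernel names follow the slides, but the launch geometry is our assumption, and the kernels themselves are only sketched on the next slides (ComputeStatusVisibleUnits is symmetric to the hidden-units kernel).

```cuda
// One CD-1 epoch as four kernel launches (sketch; geometry is an assumption).
void trainEpoch(const float *v_data, float *h_data, float *v_recon, float *h_recon,
                float *w, float *a, float *b, int N, int I, int J, float gamma) {
    const int threads = 128;                   // power of two, for the reductions
    const size_t shmem = threads * sizeof(float);
    dim3 hiddenGrid(J, N), visibleGrid(I, N);

    // Step 1: h_data from v_data (eq. 4)
    ComputeStatusHiddenUnits<<<hiddenGrid, threads, shmem>>>(v_data, w, b, h_data, I, J);
    // Step 2: v_recon from h_data (eq. 5)
    ComputeStatusVisibleUnits<<<visibleGrid, threads, shmem>>>(h_data, w, a, v_recon, I, J);
    // Step 3: h_recon from v_recon (eq. 4)
    ComputeStatusHiddenUnits<<<hiddenGrid, threads, shmem>>>(v_recon, w, b, h_recon, I, J);
    // Step 4: update w (and, in the full version, a and b) per eqs. (7)-(9)
    dim3 block(16, 16), grid(I / 16, J / 16);  // assumes I, J multiples of 16
    CorrectWeights<<<grid, block>>>(v_data, h_data, v_recon, h_recon, w, I, J, N, gamma);
}
```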
COMPUTESTATUSHIDDENUNITS AND COMPUTESTATUSVISIBLEUNITS KERNELS
Each thread represents a connection: it multiplies the clamped input by the corresponding weight and stores the product in shared memory.
Each block represents a neuron: it uses the fast shared memory to sum up the values computed by its threads.
[Diagram: a block (neuron) made up of one thread per connection, Connection 1, Connection 2, ..., Connection J.]
A sketch of the hidden-units kernel follows.
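A minimal sketch of the hidden-units kernel under this mapping, with one block per (hidden unit, sample) pair and threads striding over the connections. It computes the probabilities of eq. (4); the authors' kernel also samples binary states from them, which is omitted here.

```cuda
// Sketch of ComputeStatusHiddenUnits: block (j, n) computes hidden unit j for
// sample n. blockDim.x must be a power of two for the tree reduction.
// Launch: <<<dim3(J, N), threads, threads * sizeof(float)>>>.
__global__ void ComputeStatusHiddenUnits(const float *v, const float *w,
                                         const float *b, float *h,
                                         int I, int J) {
    extern __shared__ float partial[];      // one slot per thread
    const int j = blockIdx.x;               // hidden unit: the block's neuron
    const int n = blockIdx.y;               // training sample
    float sum = 0.0f;
    for (int i = threadIdx.x; i < I; i += blockDim.x)
        sum += v[n * I + i] * w[j * I + i]; // each thread handles connection(s) i
    partial[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // shared-memory reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)                   // eq. (4)
        h[n * J + j] = 1.0f / (1.0f + expf(-(b[j] + partial[0])));
}
```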
STORING THE CONNECTION WEIGHTS
ComputeStatusHiddenUnits - Coalesced access ComputeStatusVisibleUnits - Uncoalesced access
w11 w12 w13 w14 w15 · · · w1I w21 w22 w23 w24 w25 · · · w2I w31 w32 w33 w34 w35 · · · w3I w41 w42 w43 w44 w45 · · · w4I w51 w52 w53 w54 w55 · · · w5I · · · · · · · · · · · · · · · · · · · · · wJ1 wJ2 wJ3 wJ4 wJ5 · · · wJI w13 w23 w33 w43 w53 · · · wJ3 w31 w32 w33 w34 w35 · · · w3I
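A toy demonstration of the difference (our own microbenchmark, not the paper's kernels): rowRead mimics the hidden-units access pattern, colRead the visible-units one; on most GPUs the strided version is several times slower.

```cuda
#include <cstdio>

// rowRead: threads sweep i inside row j -> consecutive addresses (coalesced).
__global__ void rowRead(const float *w, float *out, int I) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = w[blockIdx.x * I + threadIdx.x];
}
// colRead: threads sweep j inside column i -> stride-I addresses (uncoalesced).
__global__ void colRead(const float *w, float *out, int I) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = w[threadIdx.x * I + blockIdx.x];
}

int main() {
    const int I = 1024, J = 1024;
    float *w, *out, msRow, msCol;
    cudaMalloc(&w, I * J * sizeof(float));
    cudaMalloc(&out, I * J * sizeof(float));
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    cudaEventRecord(t0);
    rowRead<<<J, I>>>(w, out, I);            // one block per row j
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msRow, t0, t1);

    cudaEventRecord(t0);
    colRead<<<I, J>>>(w, out, I);            // one block per column i
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msCol, t0, t1);

    printf("coalesced: %.3f ms, uncoalesced: %.3f ms\n", msRow, msCol);
    return 0;
}
```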
CORRECTWEIGHTS KERNEL – FIRST APPROACH
Each thread gathers and sums up the values for one or more samples; each block corrects the weight of one connection.
[Diagram: a block (connection) made up of threads for Sample 1, Sample 2, Sample 3, ..., Sample N.]
PROBLEMS?
[Diagram: the RBM, highlighting the bias units alongside the connections.]
\Delta W_{ji} = \gamma (\langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k)
\Delta b_j = \gamma (\langle h_j \rangle_0 - \langle h_j \rangle_k)
\Delta a_i = \gamma (\langle v_i \rangle_0 - \langle v_i \rangle_k)
CORRECTWEIGHTS KERNEL – IMPROVED APPROACH
Each block has 16 × 16 threads, and each thread within a block must now process all the samples. However, we can access the v_i and h_j variables in a coalesced way and store them in shared memory for faster access. Although this new approach has fewer blocks, it performs much better than our first approach (≈ 15× faster).
[Diagram: block 0 covers a 16 × 16 tile of connections, Connection (0,0) through Connection (15,15).]
A sketch of this kernel follows.
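A minimal sketch of the tiled kernel under these choices. It updates the weights only; the bias updates of eqs. (8)-(9), boundary handling, and the exact staging scheme are omitted or assumed.

```cuda
#define TILE 16

// Sketch of the improved CorrectWeights: each block owns a 16x16 tile of
// connections; each thread accumulates <vi hj>0 - <vi hj>k over all N samples,
// with the tile's v and h values staged in shared memory via coalesced reads.
// Assumes I and J are multiples of TILE.
__global__ void CorrectWeights(const float *v0, const float *h0,
                               const float *vk, const float *hk,
                               float *w, int I, int J, int N, float gamma) {
    __shared__ float sv0[TILE], svk[TILE], sh0[TILE], shk[TILE];
    const int i = blockIdx.x * TILE + threadIdx.x;  // visible index of this connection
    const int j = blockIdx.y * TILE + threadIdx.y;  // hidden index of this connection
    float delta = 0.0f;
    for (int n = 0; n < N; n++) {                   // every thread scans all samples
        if (threadIdx.y == 0) {                     // one row of threads stages the tile
            sv0[threadIdx.x] = v0[n * I + i];
            svk[threadIdx.x] = vk[n * I + i];
            sh0[threadIdx.x] = h0[n * J + blockIdx.y * TILE + threadIdx.x];
            shk[threadIdx.x] = hk[n * J + blockIdx.y * TILE + threadIdx.x];
        }
        __syncthreads();
        delta += sv0[threadIdx.x] * sh0[threadIdx.y]   // <vi hj>0 - <vi hj>k
               - svk[threadIdx.x] * shk[threadIdx.y];
        __syncthreads();
    }
    w[j * I + i] += gamma * delta / N;              // eq. (7), averaged over the batch
}
```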
TIME SPENT IN EACH TASK – FIRST AND IMPROVED APPROACHES

Task                                First approach   Improved approach
Generate random numbers (cuRAND)          5.53%           14.86%
ComputeStatusHiddenUnits kernel          10.24%           27.50%
ComputeStatusVisibleUnits kernel         17.09%           45.91%
CorrectWeights kernel                    67.14%           11.73%
EXPERIMENTAL SETUP
We tested our approach on the MNIST database. Each sample is a 28 × 28 pixel image of a handwritten digit (784 inputs).
Hardware:
CPU: Intel dual-core i5-2410M (8 GB memory)
GPU: NVIDIA GeForce GTX 460
NVIDIA GEFORCE GTX 460

Number of streaming multiprocessors       7
Number of cores                         336
Peak performance (GFLOPS)             940.8
Device memory (GB)                        1
Memory bandwidth (GB/s)               112.5
Shader clock speed (GHz)                1.4
RESULTS (1,000 SAMPLES)
[Plot: training time (s, log scale, 0.01 to 100) versus number of hidden units (100 to 900), for the GTX 460 (GPU) and the dual-core i5 (CPU). The GPU speedups over the CPU are 23.26×, 23.13×, 21.86×, 24.46×, and 29.79×.]
RESULTS (10,000 SAMPLES)
[Plot: training time (s, log scale, 0.1 to 1000) versus number of hidden units (100 to 900), for the GTX 460 (GPU) and the dual-core i5 (CPU). The GPU speedups over the CPU are 32.83×, 30.29×, 28.59×, 29.47×, and 38.16×.]
RESULTS (60,000 SAMPLES)
[Plot: training time (s, log scale, 1 to 10000) versus number of hidden units (100 to 900), for the GTX 460 (GPU) and the dual-core i5 (CPU). The GPU speedups over the CPU are 42.73×, 43.46×, 38.64×, 41.83×, and 46.07×.]
ADAPTIVE STEP SIZES
[Figure: a set of training images and their reconstructions after 10, 100, 250, 500, 750, and 1000 epochs.]