12. Unsupervised Deep Learning
CS 535 Deep Learning, Winter 2018
Fuxin Li
With materials from Wanli Ouyang, Zsolt Kira, Lawrence Neal, Raymond Yeh, Junting Lou, and Teck-Yian Lim

Unsupervised Learning in General


  1. Sliced Wasserstein Distance
  • For each image, build a Laplacian pyramid
  • Sample many patches from these pyramids
  • Normalize them by their mean/variance
  • Yields R/G/B histograms at each scale
  • Measures the difference between the two distributions
  • Used as SWD(real_images, generated_images)
  • A lower score (more similarity) is better
  • Measures realism
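
  A minimal NumPy sketch of the distance itself, applied to patch descriptors that have already been extracted and normalized as described above (the function name and the plain-NumPy setting are assumptions for illustration, not the original implementation). Each random projection reduces both patch sets to 1-D, where the Wasserstein distance is simply the mean gap between sorted projections:

      import numpy as np

      def sliced_wasserstein(patches_a, patches_b, n_projections=128, rng=None):
          """Approximate sliced Wasserstein distance between two sets of
          patch descriptors, each an [N, D] array (assumes the same number
          of patches was sampled from each set)."""
          rng = np.random.default_rng(rng)
          d = patches_a.shape[1]
          total = 0.0
          for _ in range(n_projections):
              # random unit direction in descriptor space
              direction = rng.standard_normal(d)
              direction /= np.linalg.norm(direction)
              # project both patch sets onto the direction and sort
              proj_a = np.sort(patches_a @ direction)
              proj_b = np.sort(patches_b @ direction)
              # 1-D Wasserstein distance = mean absolute difference
              # between the sorted projections
              total += np.abs(proj_a - proj_b).mean()
          return total / n_projections

  Calling sliced_wasserstein(real_patches, generated_patches) at each pyramid scale then gives one SWD value per scale, with lower values indicating a closer match to the real-image statistics.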

  2. Experiments

  3. Nearest Neighbors Comparison with training set images

  4. Results

  5. Boltzmann Machine (Fully-connected MRF/CRF)
  • Undirected graphical model
  • Binary values on each variable
  • Consider only pairwise (binary) interactions:
    $E(\mathbf{x}; \theta) = -\sum_{i<j} w_{ij} x_i x_j - \sum_i \theta_i x_i$
  • Equivalently, as an exponential-family model with features $f_m$:
    $P(\mathbf{x}; \theta) = \frac{e^{-E(\mathbf{x}; \theta)}}{Z(\theta)} = \frac{e^{\sum_m \theta_m f_m(\mathbf{x})}}{\sum_{\mathbf{x}'} e^{\sum_m \theta_m f_m(\mathbf{x}')}}$
  • Parameters: $\theta = \{w_{ij}, \theta_i\}$
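
  As a concrete reading of the energy and probability above, here is a tiny NumPy sketch (the function names are illustrative, and the brute-force partition function is only feasible for a handful of units):

      import numpy as np
      from itertools import product

      def energy(x, W, theta):
          """E(x) = -sum_{i<j} w_ij x_i x_j - sum_i theta_i x_i for a binary
          state x; W is symmetric with a zero diagonal, so the 0.5 factor
          recovers the sum over i < j."""
          return -0.5 * x @ W @ x - theta @ x

      def probability(x, W, theta):
          """Exact P(x) via the partition function Z(theta), enumerating all
          2^n binary states (illustration only)."""
          n = len(x)
          states = np.array(list(product([0, 1], repeat=n)), dtype=float)
          Z = np.sum([np.exp(-energy(s, W, theta)) for s in states])
          return np.exp(-energy(x, W, theta)) / Z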

  6. Restricted Boltzmann Machines
  • We restrict the connectivity to make inference and learning easier:
    only one layer of hidden units $h_j$, and no connections between hidden
    units (a bipartite graph between hidden and visible units $v_i$).
  • In an RBM it only takes one step to reach thermal equilibrium when the
    visible units are clamped:
    $p(h_j = 1 \mid \mathbf{v}) = \frac{1}{1 + e^{-\left(b_j + \sum_{i \in \mathrm{vis}} v_i w_{ij}\right)}}$
  • So we can quickly get the exact value of $\langle v_i h_j \rangle_{\mathbf{v}}$.
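
  A minimal NumPy sketch of these conditionals (names are illustrative), exploiting the fact that the hidden units are conditionally independent given the visible layer, so all of them can be computed with one matrix product:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def hidden_given_visible(v, W, b_hid):
          """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij) for every hidden
          unit at once; v has shape [n_vis], W has shape [n_vis, n_hid]."""
          return sigmoid(b_hid + v @ W)

      def visible_given_hidden(h, W, b_vis):
          """The symmetric expression for the visible units given h."""
          return sigmoid(b_vis + h @ W.T)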

  7. What you gain

  8. Example: ShapeBM (Eslami et al. 2012)
  • Generating shapes
  • 2-layer RBM with local connections
  • Learning from many horses

  9. Training: Contrastive Divergence
  • Start with a training vector on the visible units.
  • Update all the hidden units in parallel (t = 0, data).
  • Update all the visible units in parallel to get a "reconstruction" (t = 1).
  • Update the hidden units again.
  • $\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 \right)$
  • This is not following the gradient of the log likelihood, but it works well.
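
  A sketch of one CD-1 update for a small binary RBM in NumPy, matching the recipe above. As an assumption (a common practical variant, not necessarily the slide's exact choice), the reconstruction statistics use probabilities rather than sampled states:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
          """One contrastive-divergence (CD-1) update.
          v0: batch of training vectors on the visible units, shape [B, n_vis]."""
          rng = np.random.default_rng(rng)
          # positive phase: hidden probabilities and samples given the data (t = 0)
          p_h0 = sigmoid(b_hid + v0 @ W)
          h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
          # one Gibbs step: reconstruct the visibles, then the hiddens again (t = 1)
          p_v1 = sigmoid(b_vis + h0 @ W.T)
          p_h1 = sigmoid(b_hid + p_v1 @ W)
          # approximate gradient: <v_i h_j>_0 - <v_i h_j>_1
          dW = (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
          W += lr * dW
          b_vis += lr * (v0 - p_v1).mean(axis=0)
          b_hid += lr * (p_h0 - p_h1).mean(axis=0)
          return W, b_vis, b_hid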

  10. Layerwise Pretraining (Hinton & Salakhutdinov, 2006)
  • Deep autoencoders always looked like a really nice way to do non-linear
    dimensionality reduction, but they are very difficult to optimize with
    backpropagation alone.
  • We now have a much better way to optimize them:
    first train a stack of 4 RBMs, then "unroll" them, then fine-tune with backprop.
  • Architecture: 28x28 input -> 1000 -> 500 -> 250 -> 30 linear code units
    (encoder weights $W_1, \dots, W_4$), then the mirrored decoder
    30 -> 250 -> 500 -> 1000 -> 28x28 initialized with the transposes
    $W_4^T, \dots, W_1^T$.
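
  A sketch of the unrolled network in PyTorch (an assumption for illustration; the framework postdates the 2006 paper). In the original recipe each Linear layer and its mirrored counterpart would first be initialized from a pretrained RBM before the whole network is fine-tuned end-to-end with backprop on a reconstruction loss:

      import torch.nn as nn

      class DeepAutoencoder(nn.Module):
          """The 784-1000-500-250-30 encoder with a mirrored decoder,
          written as an ordinary feed-forward network."""
          def __init__(self):
              super().__init__()
              self.encoder = nn.Sequential(
                  nn.Linear(784, 1000), nn.Sigmoid(),
                  nn.Linear(1000, 500), nn.Sigmoid(),
                  nn.Linear(500, 250), nn.Sigmoid(),
                  nn.Linear(250, 30),              # linear 30-unit code layer
              )
              self.decoder = nn.Sequential(
                  nn.Linear(30, 250), nn.Sigmoid(),
                  nn.Linear(250, 500), nn.Sigmoid(),
                  nn.Linear(500, 1000), nn.Sigmoid(),
                  nn.Linear(1000, 784), nn.Sigmoid(),
              )

          def forward(self, x):
              # reconstruct the 28x28 input from the 30-dimensional code
              return self.decoder(self.encoder(x))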
