
CMP784 Deep Learning, Lecture #12: Self-Supervised Learning (Aykut Erdem)



  1. photo by unsplash user @tuvaloland CMP784 DEEP LEARNING Lecture #12 – Self-Supervised Learning Aykut Erdem // Hacettepe University // Spring 2020

  2. latent by Tom White Previously on CMP784 • Motivation for Variational Autoencoders (VAEs) • Mechanics of VAEs • Separability of VAEs • Training of VAEs • Evaluating representations • Vector Quantized Variational Autoencoders (VQ-VAEs)

  3. Lecture Overview • Predictive / Self-supervised learning • Self-supervised learning in NLP • Self-supervised learning in vision • Disclaimer: Much of the material and slides for this lecture were borrowed from — Andrej Risteski's CMU 10707 class — Jimmy Ba's UToronto CSC413/2516 class

  4. Unsupervised Learning • Learning from data without labels. • What can we hope to do: – Task A: Fit a parametrized structure (e.g. clustering, low-dimensional subspace, manifold) to data to reveal something meaningful about the data (Structure learning) – Task B: Learn a (parametrized) distribution close to the data generating distribution. (Distribution learning) – Task C: Learn a (parametrized) distribution that implicitly reveals an "embedding"/"representation" of data for downstream tasks. (Representation/feature learning) • Entangled! The "structure" and "distribution" often reveal an embedding.
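The three tasks are easiest to see side by side in code. Below is a minimal sketch on toy data, assuming scikit-learn is available; KMeans, GaussianMixture, and PCA are just illustrative stand-ins for Tasks A, B, and C, not methods prescribed by the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans              # Task A: structure learning
from sklearn.mixture import GaussianMixture     # Task B: distribution learning
from sklearn.decomposition import PCA           # Task C: representation learning

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # stand-in for unlabeled data

clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # fit a cluster structure
density = GaussianMixture(n_components=3).fit(X)           # fit a data distribution
Z = PCA(n_components=2).fit_transform(X)                   # low-dim embedding for downstream use
```

Note how the tasks entangle: the cluster assignments and the mixture's posterior responsibilities are themselves usable as embeddings, just as the slide observes.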

  5. Self-Supervised/Predictive Learning • Given unlabeled data, design supervised tasks that induce a good representation for downstream tasks. • No good mathematical formalization, but the intuition is to "force" the predictor used in the task to learn something "semantically meaningful" about the data.

  6. Self-Supervised/Predictive Learning ► Predict any part of the input from any other part. ► Predict the future from the past. ► Predict the future from the recent past. ► Predict the past from the present. ► Predict the top from the bottom. ► Predict the occluded from the visible. ► Pretend there is a part of the input you don't know and predict that. Slide by Yann LeCun

  7. How Much Information Does the Machine Need to Predict? (Y. LeCun) • "Pure" Reinforcement Learning (cherry): The machine predicts a scalar reward given once in a while. A few bits for some samples. • Supervised Learning (icing): The machine predicts a category or a few numbers for each input. Predicting human-supplied data: 10→10,000 bits per sample. • Unsupervised/Predictive Learning (cake): The machine predicts any part of its input for any observed part, e.g. predicts future frames in videos. Millions of bits per sample. (Yes, I know, this picture is slightly offensive to RL folks. But I'll make it up.) LeCun's original cake analogy slide, presented at his keynote speech at NIPS 2016.

  8. How Much Information is the Machine Given during Learning? (Y. LeCun) • "Pure" Reinforcement Learning (cherry): The machine predicts a scalar reward given once in a while. A few bits for some samples. • Supervised Learning (icing): The machine predicts a category or a few numbers for each input. Predicting human-supplied data: 10→10,000 bits per sample. • Self-Supervised Learning (cake génoise): The machine predicts any part of its input for any observed part, e.g. predicts future frames in videos. Millions of bits per sample. Updated version of the slide, presented at ISSCC 2019, where he replaced "unsupervised learning" with "self-supervised learning".

  9. Self-Supervised Learning in NLP

  10. Word Embeddings • Semantically meaningful vector representations of words. • Example: Inner product (possibly scaled, i.e. cosine similarity) correlates with word similarity. [Figure: an embedding space in which "Tiger" and "Lion" lie close together while "Table" lies far away.]
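A minimal sketch of the similarity measure from the slide. The 300-d vectors below are random placeholders; with real embeddings (e.g. from word2vec or GloVe) the tiger/lion similarity would come out high and the tiger/table similarity low.

```python
import numpy as np

def cosine_similarity(u, v):
    # Scaled inner product between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings; stand-ins for vectors from a trained model.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in ["tiger", "lion", "table"]}

print(cosine_similarity(emb["tiger"], emb["lion"]))   # high for real embeddings
print(cosine_similarity(emb["tiger"], emb["table"]))  # low for real embeddings
```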

  11. Word Embeddings • Semantically meaningful vector representations of words. • Example: Can use embeddings to do sentiment classification by training a simple (e.g. linear) classifier. [Figure: the sentence "The service is great, fast and friendly!" fed to a sentiment classifier.]
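One common simple recipe, sketched below under illustrative assumptions: represent a sentence as the mean of its word embeddings and train a linear classifier on top. The `embed` lookup and the two-sentence "dataset" are hypothetical stand-ins, not the lecture's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, vocab = 50, {}

def embed(token):
    # Stand-in for a pretrained embedding lookup (e.g. word2vec/GloVe).
    if token not in vocab:
        vocab[token] = rng.normal(size=dim)
    return vocab[token]

def sentence_vector(sentence):
    # Sentence representation: mean of its word embeddings.
    return np.mean([embed(t) for t in sentence.lower().split()], axis=0)

sentences = ["the service is great fast and friendly",
             "the food was awful and the staff rude"]
labels = [1, 0]  # 1 = positive, 0 = negative

X = np.stack([sentence_vector(s) for s in sentences])
clf = LogisticRegression().fit(X, labels)  # the "simple (linear) classifier"
```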

  12. Word Embeddings • Semantically meaningful vector representations of words. • Example: Can train a "simple" network that, if fed word embeddings for two languages, can effectively translate. [Figure: English: "It's raining outside." ↔ German: "Es regnet draussen."]
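The slide leaves the "simple" network unspecified. One classic simple instance (in the spirit of Mikolov et al.'s linear mapping between embedding spaces) is a single linear map W fitted on known translation pairs; the sketch below uses random stand-in embeddings and a hypothetical `translate` helper purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 300
E_en = rng.normal(size=(n, d))  # English word vectors (stand-ins), one pair per row
E_de = rng.normal(size=(n, d))  # German vectors for the same words

# Least-squares fit of W such that E_en @ W ≈ E_de.
W, *_ = np.linalg.lstsq(E_en, E_de, rcond=None)

def translate(vec_en, target_vectors):
    # Map an English vector into the German space, return the nearest neighbour index.
    v = vec_en @ W
    sims = target_vectors @ v / (np.linalg.norm(target_vectors, axis=1) * np.linalg.norm(v))
    return int(np.argmax(sims))
```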

  13. Word Embeddings via Predictive Learning • Basic task: predict the next word, given a few previous ones. Example: "I am running a little ????" → Late: 0.9, Early: 0.05, Tired: 0.04, Table: 0.01. • In other words, optimize $\max_\theta \sum_t \log p_\theta(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-L})$.
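A minimal sketch of this objective, assuming PyTorch; the fixed-window architecture, the sizes, and the random token data are illustrative, not the specific model from the lecture. Cross-entropy on the output logits is exactly the negative log-likelihood term $-\log p_\theta(x_t \mid x_{t-1}, \ldots, x_{t-L})$, so minimizing it maximizes the objective above.

```python
import torch
import torch.nn as nn

V, d, L = 10000, 64, 4  # vocab size, embedding dim, context length (illustrative)

class NextWordModel(nn.Module):
    # Embed the previous L tokens, concatenate, predict a distribution over the next token.
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, d)
        self.out = nn.Linear(L * d, V)

    def forward(self, ctx):           # ctx: (batch, L) token ids
        h = self.emb(ctx).flatten(1)  # (batch, L * d)
        return self.out(h)            # unnormalized log-probabilities over V words

model = NextWordModel()
ctx = torch.randint(0, V, (32, L))    # random token ids as stand-in training data
target = torch.randint(0, V, (32,))   # the "next word" for each context

loss = nn.functional.cross_entropy(model(ctx), target)  # = -log p_theta(x_t | context)
loss.backward()                       # gradients for one step of maximum likelihood
```

As a by-product of training on this task, the rows of `model.emb.weight` become the word embeddings used in the preceding slides.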
