CMP784 Deep Learning, Lecture #12: Self-Supervised Learning (Aykut Erdem, Hacettepe University, Spring 2020)


SLIDE 1

Lecture #12 – Self-Supervised Learning

Aykut Erdem // Hacettepe University // Spring 2020

CMP784

DEEP LEARNING

photo by unsplash user @tuvaloland

SLIDE 2

Previously on CMP784

  • Motivation for Variational Autoencoders (VAEs)
  • Mechanics of VAEs
  • Separability of VAEs
  • Training of VAEs
  • Evaluating representations
  • Vector Quantized Variational Autoencoders (VQ-VAEs)

latent by Tom White

SLIDE 3

Lecture Overview

  • Predictive / Self-supervised learning
  • Self-supervised learning in NLP
  • Self-supervised learning in vision

Disclaimer: Much of the material and slides for this lecture were borrowed from
—Andrej Risteski's CMU 10707 class
—Jimmy Ba's UToronto CSC413/2516 class

SLIDE 4

Unsupervised Learning

  • Learning from data without labels.
  • What can we hope to do:

– Task A: Fit a parametrized structure (e.g. clustering, low-dimensional subspace, manifold) to data to reveal something meaningful about the data. (Structure learning)
– Task B: Learn a (parametrized) distribution close to the data-generating distribution. (Distribution learning)
– Task C: Learn a (parametrized) distribution that implicitly reveals an “embedding”/“representation” of data for downstream tasks. (Representation/feature learning)

  • Entangled! The “structure” and “distribution” often reveal an embedding.

SLIDE 5

Self-Supervised/Predictive Learning

  • Given unlabeled data, design supervised tasks that induce a good representation for downstream tasks.
  • No good mathematical formalization, but the intuition is to “force” the predictor used in the task to learn something “semantically meaningful” about the data.

SLIDE 6

Self-Supervised/Predictive Learning

► Predict any part of the input from any other part.
► Predict the future from the past.
► Predict the future from the recent past.
► Predict the past from the present.
► Predict the top from the bottom.
► Predict the occluded from the visible.
► Pretend there is a part of the input you don’t know and predict that.

Slide by Yann LeCun

SLIDE 7

How Much Information Does the Machine Need to Predict?

► “Pure” Reinforcement Learning (cherry): The machine predicts a scalar reward given once in a while. A few bits for some samples.
► Supervised Learning (icing): The machine predicts a category or a few numbers for each input. Predicting human-supplied data. 10→10,000 bits per sample.
► Unsupervised/Predictive Learning (cake): The machine predicts any part of its input for any observed part. Predicts future frames in videos. Millions of bits per sample.

(Yes, I know, this picture is slightly offensive to RL folks. But I’ll make it up.)

  • LeCun’s original cake analogy slide, presented at his keynote speech at NIPS 2016.

SLIDE 8

How Much Information is the Machine Given during Learning?

► “Pure” Reinforcement Learning (cherry): The machine predicts a scalar reward given once in a while. A few bits for some samples.
► Supervised Learning (icing): The machine predicts a category or a few numbers for each input. Predicting human-supplied data. 10→10,000 bits per sample.
► Self-Supervised Learning (cake génoise): The machine predicts any part of its input for any observed part. Predicts future frames in videos. Millions of bits per sample.

  • Updated version presented at ISSCC 2019, where he replaced “unsupervised learning” with “self-supervised learning”.

SLIDE 9

Self-Supervised Learning in NLP


SLIDE 10

Word Embeddings

  • Semantically meaningful vector representations of words.

[Figure: example word vectors for “Tiger”, “Lion”, “Table”.]

Example: Inner product (possibly scaled, i.e. cosine similarity) correlates with word similarity.
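A toy numeric illustration of this claim (the three 3-d vectors below are made up for the example, not taken from a real embedding model):

```python
import numpy as np

emb = {
    "tiger": np.array([0.9, 0.8, 0.1]),
    "lion":  np.array([0.8, 0.9, 0.2]),
    "table": np.array([0.1, 0.0, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(emb["tiger"], emb["lion"]))   # ~0.99: similar words
print(cosine(emb["tiger"], emb["table"]))  # ~0.16: unrelated words
```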

SLIDE 11

Word Embeddings

  • Semantically meaningful vector representations of words.

Example: Can use embeddings to do sentiment classification by training a simple (e.g. linear) classifier


"The service is great, fast and friendly!"

SLIDE 12

Word Embeddings

  • Semantically meaningful vector representations of words.

Example: Can train a “simple” network that, if fed word embeddings for two languages, can effectively translate.

English: “It’s raining outside”.
German: “Es regnet draussen”.

SLIDE 13

Word Embeddings via Predictive Learning

  • Basic task: predict the next word, given a few previous ones. In other words, optimize for

$$\max_\theta \sum_t \log p_\theta(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-L})$$

Example: “I am running a little ????” → Late: 0.9, Early: 0.05, Tired: 0.04, Table: 0.01
SLIDE 14

Word Embeddings via Predictive Learning

  • Basic task: predict the next word, given a few previous ones.

$$\max_\theta \sum_t \log p_\theta(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-L})$$

Inspired by classical assumptions in NLP that the underlying distribution is Markov – that is, $x_t$ only depends on the previous few words. (Of course, this is violated if you wish to model long texts like paragraphs / books.)

The main issue: The trivial way of parametrizing $p_\theta$ is a “lookup table” with $V^L$ entries (with $V$ the vocabulary size).

SLIDE 15

Word Embeddings via Predictive Learning

[Bengio-Ducharme-Vincent-Janvin ‘2003]: A neural parametrization of the above probabilities. Main ingredients:

  • Embeddings: A word embedding $v(w)$ for all words $w$ in the dictionary.
  • Non-linear transforms: Potentially deep network taking as inputs $v(x_{t-1}), v(x_{t-2}), \ldots, v(x_{t-L})$, and outputting some vector $o$. Can be a recurrent net too.
  • Softmax: Softmax distribution for $x_t$ with parameters given by $o$.

  • Basic task: predict the next word, given a few previous ones.

$$\max_\theta \sum_t \log p_\theta(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-L})$$
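A minimal PyTorch sketch of these three ingredients (layer sizes and the context length are illustrative, not the values used by Bengio et al.):

```python
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    def __init__(self, vocab=10_000, dim=64, context=4, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)      # word embeddings v(w)
        self.mlp = nn.Sequential(                # non-linear transform
            nn.Linear(context * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab),            # outputs the vector o
        )

    def forward(self, prev_words):               # (batch, context) word ids
        e = self.emb(prev_words).flatten(1)      # concatenated embeddings
        return self.mlp(e).log_softmax(-1)       # softmax over the next word

lm = NeuralLM()
logp = lm(torch.randint(0, 10_000, (8, 4)))      # log p(x_t | previous 4)
```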
SLIDE 16

Word Embeddings via Predictive Learning

  • Related: predict middle word in a sentence, given surrounding ones.

CBOW (Continuous Bag of Words): proposed by Mikolov et al. ‘13

$$\max_\theta \sum_t \log p_\theta(x_t \mid x_{t-L}, \ldots, x_{t-1}, x_{t+1}, \ldots, x_{t+L})$$

Parametrization is chosen s.t.

$$p_\theta(x_t \mid x_{t-L}, \ldots, x_{t-1}, x_{t+1}, \ldots, x_{t+L}) \propto \exp\Big(\Big\langle v_{x_t}, \sum_{i=t-L,\, i \neq t}^{t+L} w_{x_i} \Big\rangle\Big)$$

(Two sets of parameters: vectors $v$ and vectors $w$.)

SLIDE 17

Word Embeddings via Predictive Learning

  • Related: predict middle word in a sentence, given surrounding ones.

Skip-Gram: also proposed by Mikolov et al. ‘13

$$\max_\theta \sum_t \sum_{i=t-L,\, i \neq t}^{t+L} \log p_\theta(x_i \mid x_t)$$

Parametrization is chosen s.t.

$$p_\theta(x_i \mid x_t) \propto \exp\big(\langle v_{x_i}, w_{x_t} \rangle\big)$$

(Two sets of parameters: vectors $v$ and vectors $w$.)

In practice, lots of other tricks are tacked on to deal with the slowest part of training: the softmax distribution (the partition function sums over the entire vocabulary). Common ones are negative sampling, hierarchical softmax, etc. (see the sketch below).
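A minimal sketch of one skip-gram training step with negative sampling, the first of the tricks mentioned above (vocabulary size, dimension, and the number of negatives are illustrative):

```python
import torch
import torch.nn.functional as F

V, D, K = 10_000, 100, 5                     # vocab, dim, negatives per pair
v = torch.randn(V, D, requires_grad=True)    # "vectors v" (center words)
w = torch.randn(V, D, requires_grad=True)    # "vectors w" (context words)

def sgns_loss(center, context):
    neg = torch.randint(0, V, (K,))          # randomly sampled negatives
    pos_score = w[context] @ v[center]       # scalar
    neg_scores = w[neg] @ v[center]          # (K,)
    # push the true pair together, the sampled pairs apart
    return -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_scores).sum())

loss = sgns_loss(center=42, context=1337)
loss.backward()          # gradients are nonzero only for the rows touched
```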
SLIDE 18

Word Embeddings via Predictive Learning

  • Related: predict middle word in a sentence, given surrounding ones.

Skip-Gram: also proposed by Mikolov et al. ‘13

$$\max_\theta \sum_t \sum_{i=t-L,\, i \neq t}^{t+L} \log p_\theta(x_i \mid x_t)$$

(Two sets of parameters: vectors $v$ and vectors $w$.)
SLIDE 19

Evaluating Word Embeddings

  • First variant (predict next word, given previous ones) can be used as a generative model for text. (Also called a language model.) The other ones cannot.
  • In the former case, a natural measure is the cross-entropy:

$$-\mathbb{E}_{x_1, x_2, \ldots, x_T} \log p_\theta(x_{\le T}) = -\mathbb{E}_{x_1, x_2, \ldots, x_T} \sum_t \log p_\theta(x_t \mid x_{<t})$$

  • For convenience, we often take the exponential of this (called perplexity).
  • If we do not have a generative model, we have to use indirect means.
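A small sketch of the cross-entropy/perplexity relationship (the per-token log-probabilities are made up for illustration):

```python
import math

log_probs = [-2.1, -0.3, -1.7, -0.9]     # log p(x_t | x_<t) for each token
cross_entropy = -sum(log_probs) / len(log_probs)
perplexity = math.exp(cross_entropy)
print(cross_entropy, perplexity)         # 1.25  ~3.49
```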
SLIDE 20

Evaluating Word Embeddings

  • Intrinsic tasks: Test performance of word embeddings on tasks measuring their “semantic” properties. Examples include solving “which is the most similar word” queries, analogy queries (i.e. “man is to woman as king is to ??”).
  • Extrinsic tasks: How well can we “finetune” the word embeddings to solve some (supervised) downstream task. “Finetune” usually means train a (relatively small) feedforward network. Examples of such tasks include:

– Part-of-Speech Tagging (determine whether a word is noun/verb/...),
– Named Entity Recognition (recognizing named entities like persons, places), e.g. label a sentence as Picasso[person] died in France[country],
– many others.

SLIDE 21

Semantic Similarity

  • Observation: similar words tend to have larger (renormalized) inner products (also called cosine similarity).
  • Precisely, if we look at the word embeddings for words i, j,

$$\Big\langle \frac{w_i}{\|w_i\|}, \frac{w_j}{\|w_j\|} \Big\rangle = \cos(w_i, w_j)$$

tends to be larger for similar words i, j.

  • To solve a semantic similarity query like “which is the most similar word to”, output the word with the highest cosine similarity.

Example: the nearest neighbors to “Frog” look like:

  0. frog
  1. frogs
  2. toad
  3. litoria
  4. leptodactylidae
  5. rana
  6. lizard
  7. eleutherodactylus
SLIDE 22

Semantic Clustering

  • Consequence: clustering word embeddings should give “semantically” relevant clusters.

t-SNE projection of word embeddings for artists (clustered by genre). Image from https://medium.com/free-code-camp/learn-tensorflow-the-word2vec-model-and-the-tsne-algorithm-using-rock-bands-97c99b5dcb3a

SLIDE 23

Analogies

  • Observation: You can solve analogy queries by linear algebra. Precisely, $w$ = queen will be the solution to:

$$\arg\min_w \|v_w - v_{\text{king}} - (v_{\text{woman}} - v_{\text{man}})\|^2$$

[Figure: Man/Woman and King/Queen form a parallelogram in embedding space.]
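A minimal sketch of solving the argmin above by brute force over the vocabulary (the embeddings are random stand-ins; with trained vectors the query returns “queen”):

```python
import numpy as np

def analogy(a, b, c, emb, words):
    # solve a : b :: c : ?, e.g. analogy("man", "woman", "king", ...)
    target = emb[words.index(c)] + emb[words.index(b)] - emb[words.index(a)]
    dists = np.linalg.norm(emb - target, axis=1)
    for i in np.argsort(dists):               # closest candidates first
        if words[i] not in (a, b, c):         # exclude the query words
            return words[i]

words = ["man", "woman", "king", "queen", "table"]
emb = np.random.rand(len(words), 20)          # stand-in embeddings
print(analogy("man", "woman", "king", emb, words))
```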

SLIDE 24

Language Models (LMs)

  • A statistical model that assigns probabilities to the words in a sentence.
  • Most commonly: Given previous words, what should the next one be?
  • Neural language model: Model the probability of words given others using neural networks.

SLIDE 25

Recurrent Architectures for LM

  • We can use recurrent architectures.
  • LSTM, GRU ...
  • Great for variable length inputs, like sentences.

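A minimal PyTorch sketch of such a recurrent language model (sizes are illustrative):

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab=10_000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):                 # (batch, seq_len) word ids
        h, _ = self.lstm(self.emb(tokens))     # one hidden state per step
        return self.out(h).log_softmax(-1)     # log p(next word) per step

lm = RNNLM()
logp = lm(torch.randint(0, 10_000, (4, 12)))   # (4, 12, vocab)
```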

SLIDE 26

Recurrent Architectures for LM

  • What are some of the problems with recurrent architectures?

– Not parallelizable across time steps.
– Cannot model long-range dependencies.
– Optimization difficulties (vanishing gradients).

  • Attention to the rescue!

SLIDE 27

Transformers

Properties of the transformer architecture:

  • Fully feed-forward.
  • Equivariance properties of scaled dot-product attention (important):

– How does the output change if we permute the order of queries? (equivariance)
– How does the output change if we permute the key-value pairs in unison? (invariance)
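A minimal numpy check of both properties (toy sizes, single-head attention without projections):

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # scaled dot products
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                      # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K, V = rng.normal(size=(7, 8)), rng.normal(size=(7, 8))
pq, pkv = rng.permutation(5), rng.permutation(7)

out = attention(Q, K, V)
assert np.allclose(attention(Q[pq], K, V), out[pq])    # query equivariance
assert np.allclose(attention(Q, K[pkv], V[pkv]), out)  # key-value invariance
```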

SLIDE 28

Performance Comparison


SLIDE 29

Pretraining Language Models

  • Can we use large amounts of text data to pretrain language models?
  • Considerations:

► How can we fuse both left-right and right-left context?
► How can we facilitate non-trivial interactions between input tokens?

  • Previous approaches:

► ELMo (Peters et al., 2017): Bidirectional, but shallow.
► GPT (Radford et al., 2018): Deep, but unidirectional.
► BERT (Devlin et al., 2018): Deep and bidirectional!

SLIDE 30

BERT Workflow

  • The BERT workflow includes:

► Pretrain on generic, self-supervised tasks, using large amounts of data (like all of Wikipedia).
► Fine-tune on specific tasks with limited, labelled data.

  • The pretraining tasks (will talk about these in more detail later):

► Masked Language Modelling (to learn contextualized token representations)
► Next Sentence Prediction (summary vector for the whole input)

SLIDE 31

BERT Architecture


SLIDE 32

BERT Architecture

Properties:

  • Two input sequences.

► Many NLP tasks have two inputs (question answering, paraphrase detection, entailment detection, etc.)

  • Computes embeddings

► Token, position, and segment embeddings.
► Special start and separation tokens.

  • Architecture

► Basically the same as the transformer encoder.

  • Outputs:

► Contextualized token representations.
► Special tokens for context.

SLIDE 33

BERT Embeddings

  • How we tokenize the inputs is very important!
  • BERT uses the WordPiece tokenizer (Wu et al., 2016)
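A toy greedy longest-match-first subword tokenizer in the spirit of WordPiece (the vocabulary below is made up for illustration; real WordPiece vocabularies are learned from data):

```python
VOCAB = {"play", "##ing", "##ed", "un", "##able", "token", "##izer", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:                    # try the longest piece first
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:                      # nothing matched: unknown word
            return ["[UNK]"]
        start = end
    return pieces

print(wordpiece("playing"))    # ['play', '##ing']
print(wordpiece("tokenizer"))  # ['token', '##izer']
```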

SLIDE 34

(Aside) Tokenizers

  • Tokenizers have to balance the following:

– Being comprehensive (rare words? translation to different languages)
– Total number of tokens
– How semantically meaningful each token is.

  • This is an active area of research.

SLIDE 35

Pretraining tasks

  • Masked Language Modelling, i.e. Cloze Task (Taylor, 1953)
  • Next sentence prediction


SLIDE 36

Masked Language Modelling

  • Mask 15% of the input tokens (i.e. replace them with a dummy masking token).
  • Run the model, obtain the embeddings for the masked tokens.
  • Using these embeddings, try to predict the missing token.
  • ”I love to eat peanut ___ and jam.” Can you guess what’s missing?
  • This procedure forces the model to encode context information in the features of all of the tokens.
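A minimal sketch of the masking step (simplified: real BERT also sometimes keeps the token or swaps in a random one; the [MASK] id and ignore label below follow common conventions and are assumptions here):

```python
import torch

MASK_ID, IGNORE = 103, -100        # dummy mask token id, loss-ignore label

def mask_tokens(input_ids, mask_prob=0.15):
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = IGNORE                  # score only masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = MASK_ID               # replace with the dummy token
    return corrupted, labels

ids = torch.randint(1000, 2000, (2, 10))      # fake token ids
corrupted, targets = mask_tokens(ids)
# loss: F.cross_entropy(logits.view(-1, V), targets.view(-1), ignore_index=-100)
```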

SLIDE 37

Next Sentence Prediction

  • Goal is to summarize the complete context (i.e. the two segments) in a single feature vector.
  • Procedure for generating data (see the sketch below):

► Pick a sentence from the training corpus and feed it as ”segment A”.
► With 50% probability, pick the following sentence and feed that as ”segment B”.
► With 50% probability, pick a random sentence and feed it as ”segment B”.

  • Using the features for the context token, predict whether segment B is the following sentence of segment A.
  • Turns out to be a very effective pretraining technique!
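A minimal sketch of that data-generation procedure (the function name and corpus layout are illustrative):

```python
import random

def make_nsp_example(sentences):
    i = random.randrange(len(sentences) - 1)
    seg_a = sentences[i]
    if random.random() < 0.5:
        seg_b, is_next = sentences[i + 1], 1          # the true next sentence
    else:
        seg_b, is_next = random.choice(sentences), 0  # a random sentence
    return seg_a, seg_b, is_next
```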

SLIDE 38

Fine Tuning

Procedure:

  • Add a final layer on top of BERT representations.
  • Train the whole network on the fine-tuning dataset.
  • Pre-training time: on the order of days on TPUs.
  • Fine-tuning: takes only a few hours at most.
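A minimal PyTorch sketch of the fine-tuning setup: a new final layer on top of a pretrained encoder's summary vector (`encoder` stands in for pretrained BERT and is an assumption here; it is taken to return one hidden vector per token, context token first):

```python
import torch.nn as nn

class FineTuneClassifier(nn.Module):
    def __init__(self, encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = encoder                          # pretrained weights
        self.head = nn.Linear(hidden_dim, num_classes)  # the new final layer

    def forward(self, input_ids):
        hidden = self.encoder(input_ids)   # (batch, seq_len, hidden_dim)
        summary = hidden[:, 0]             # context ([CLS]-style) token
        return self.head(summary)          # whole network trains end-to-end
```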

SLIDE 39

Fine Tuning


SLIDE 40


Self-Supervised Learning in Vision

SLIDE 41

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting

SLIDE 42

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting
  • Much trickier than in NLP: As we have seen, meaningful losses for vision are much more difficult to design. The choice of region to mask out is much more impactful.

Architecture: An encoder E takes a part of the image and constructs a representation. A decoder D takes the representation and tries to reconstruct the missing part.
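A minimal sketch of that encoder-decoder setup (layer sizes are illustrative, not Pathak et al.'s exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                       # E: masked image -> code
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                       # D: code -> reconstruction
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
)

img = torch.rand(8, 3, 64, 64)
mask = torch.zeros(8, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0                 # central region to inpaint

recon = decoder(encoder(img * (1 - mask)))     # predict from the context
loss = F.mse_loss(recon * mask, img * mask)    # L2 only on the missing part
```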

SLIDE 43

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting
  • If the reconstruction loss is L2: tendency to produce blurry images. Remember: one of the useful things about GANs is that they provide a better loss for images.

SLIDE 44

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting
  • If the reconstruction loss is L2: tendency to produce blurry images.
  • Remember: one of the useful things about GANs is that they provide a better loss for images.

[Figure: the full objective combines the mask, a DC-GAN objective, and the composition of encoder+decoder.]

SLIDE 45

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting

SLIDE 46

Inpainting

  • The most obvious analogy to word embeddings: predict parts of an image from the remainder of the image. Pathak et al. ’16: Context Encoders: Feature Learning by Inpainting
  • How to choose the region? The task should be “solvable”, but not “too easy”.

  • Fixed (central region): tends to produce less generalizable representations.
  • Random blocks: slightly better, but square borders still hurt.
  • Random silhouettes (fully random doesn’t make sense – the prediction task is too ill-defined): even better!

SLIDE 47

Jigsaw puzzles

  • In principle, what we want is a task “hard enough” that any model that does well on it should learn something “meaningful” about the task. Doersch et al. ’15: Unsupervised Visual Representation Learning by Context Prediction
  • Task: Predict the ordering of two randomly chosen pieces from the image.

Representation: penultimate layer of a neural net used to solve the task. Intuition: understanding the relative positioning of pieces of an image requires some understanding of how images are composed.


SLIDE 48

Jigsaw puzzles

  • In principle, what we want is a task “hard enough” that any model that does well on it should learn something “meaningful” about the task. Doersch et al. ’15: Unsupervised Visual Representation Learning by Context Prediction
  • Quite finicky: one needs to make sure the predictor cannot take any obvious “shortcuts”.

– Boundary texture continuity is a big clue: include gaps between tiles.
– Long lines spanning tiles are a clue: jitter the location of tiles.
– Chromatic aberration (some cameras tend to focus different wavelengths at different positions – e.g. green shifts towards the center of the image): randomly drop 2 of the 3 channels.
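A minimal sketch of building one training pair for this relative-position task, with the gap trick from the list above (patch and gap sizes are illustrative):

```python
import random
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]    # the 8 neighbor positions

def relative_position_pair(img, patch=32, gap=8):
    h, w = img.shape[:2]
    step = patch + gap              # the gap defeats texture-continuity cues
    cy = random.randint(step, h - 2 * step)
    cx = random.randint(step, w - 2 * step)
    label = random.randrange(8)     # which neighbor did we pick?
    dy, dx = OFFSETS[label]
    center = img[cy:cy + patch, cx:cx + patch]
    neighbor = img[cy + dy * step:cy + dy * step + patch,
                   cx + dx * step:cx + dx * step + patch]
    return center, neighbor, label

img = np.random.rand(256, 256, 3)
a, b, y = relative_position_pair(img)
```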
SLIDE 49

Predicting rotations

  • In principle, what we want is a task “hard enough” that any model that does well on it should learn something “meaningful” about the task. Gidaris et al. ’18: Unsupervised representation learning via predicting image rotations
  • Task: predict one of 4 possible rotations of an image.
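A minimal sketch of generating the four-way rotation task from a batch of images (the rotation index is the label):

```python
import torch

def rotation_batch(images):                  # images: (B, C, H, W)
    rotated, labels = [], []
    for k in range(4):                       # k quarter-turns: 0/90/180/270
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.shape[0],), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

x = torch.rand(8, 3, 32, 32)
inputs, targets = rotation_batch(x)          # 32 images, labels in {0,1,2,3}
```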

SLIDE 50

Predicting rotations

  • In principle, what we want is a task “hard enough” that any model that does well on it should learn something “meaningful” about the task. Gidaris et al. ’18: Unsupervised representation learning via predicting image rotations
  • Task: predict one of 4 possible rotations of an image.

– Representation: penultimate layer of a neural net used to solve the task.
– Intuition: a rotation is a global transformation. ConvNets are much better at capturing local transformations (as convolutions are local), so there is no obvious way to “cheat”.

SLIDE 51

Predicting rotations

  • In principle, what we want is a task “hard enough” that any model that does well on it should learn something “meaningful” about the task. Gidaris et al. ’18: Unsupervised representation learning via predicting image rotations
  • Task: predict one of 4 possible rotations of an image.

– Less finicky to get right: no obvious artifacts the model can make use of to cheat.
– The 90 deg. rotations also don’t introduce any additional artifacts due to discretization.

SLIDE 52

Contrastive divergence

  • Another natural idea: if features are “semantically” relevant, a “distortion” of an image should produce similar features. Some instances of distortions:

SLIDE 53

Contrastive divergence

  • Another natural idea: if features are “semantically” relevant, a “distortion” of an image should produce similar features. Some instances of distortions:

  • Contrastive divergence framework: For every training sample, produce multiple augmented samples by applying various transformations. Train an encoder E (i.e. a map that produces features) to predict whether two samples are augmentations of the same base sample. A common way is to train E to make ⟨E(x), E(x′)⟩ big if x, x′ are two augmentations of the same sample, small otherwise, e.g.

$$\ell_{x,x'} = -\log\left(\frac{\exp\big(\tau \langle E(x), E(x') \rangle\big)}{\sum_{x,x'} \exp\big(\tau \langle E(x), E(x') \rangle\big)}\right), \qquad \min \sum_{x,x' \text{ augments of each other}} \ell_{x,x'}$$
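A minimal PyTorch sketch of such a loss (an NT-Xent/InfoNCE-style variant: one positive per anchor, row i of z1 and row i of z2 are assumed to be augmentations of the same image; it divides by a temperature rather than multiplying by τ, which is equivalent up to reparametrization):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                 # (B, B) scaled similarities
    targets = torch.arange(z1.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, targets)  # -log softmax of the positives

z1 = torch.randn(16, 128)                    # E(x) for 16 augmentations
z2 = torch.randn(16, 128)                    # E(x') for their partners
loss = contrastive_loss(z1, z2)
```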
SLIDE 54

Contrastive divergence

  • Another natural idea: if features are “semantically” relevant, a “distortion” of an image should produce similar features. Some instances of distortions:

  • Many works follow this framework, starting with Oord ‘18: Representation Learning with Contrastive Predictive Coding.
  • Current state of the art for self-supervised learning is in fact using this framework: Chen, Kornblith, Norouzi, Hinton ‘20: A Simple Framework for Contrastive Learning of Visual Representations

Several tricks are needed to gain this improvement. The most important one seems to be that the augmentations that work best are compositions of a geometric one (e.g. crop/rotation/...) and an appearance one (color distortion/blur/...).

SLIDE 55

Troubling fact: Architecture Matters

  • Kolesnikov et al. ’19: Revisiting Self-Supervised Visual Representation Learning


SLIDE 56

Troubling fact: Architecture Matters

  • Kolesnikov et al. ’19: Revisiting Self-Supervised Visual Representation Learning
