Current State of Unsupervised Deep Learning - William Falcon, PhD - PowerPoint PPT Presentation



SLIDE 1

Current State of Unsupervised Deep Learning

William Falcon, PhD Student

SLIDE 2

AGENDA

SLIDE 3

AGENDA
Unsupervised vs self-supervised vs supervised learning
Why we don't like supervised learning
Cost of supervised learning
Theoretical approaches to unsupervised learning
Current state-of-the-art
Closing thoughts

SLIDE 4

Unsupervised vs Supervised vs Self-supervised learning

SLIDE 5

Label this datapoint:
Cutest thing ever
Dog
Dancing dog
Pet in living room
Pet on floor
Dog evolving

SLIDE 6

Humans are biased

SLIDE 7

Transfer Learning

SLIDE 8

Transfer Learning Medical Imaging Self-driving cars Neuroscience

SLIDE 9

Cost

SLIDE 10

[Chart: accuracy vs. labeling cost for supervised, weakly supervised, and unsupervised learning.]

SLIDE 11

[Chart from slide 10, repeated: accuracy vs. labeling cost for supervised, weakly supervised, and unsupervised learning.]

SLIDE 12

Unsupervised learning vs self-supervised learning

SLIDE 13

self-supervised learning

SLIDE 14

Zhang, R., Isola, P. and Efros, A.A., 2016, October. Colorful image colorization. In European Conference on Computer Vision (pp. 649-666). Springer, Cham.

Colorful Image Colorization (Zhang et al 2016)

SLIDE 15

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles (Noroozi and Favaro, 2016)

Noroozi, M. and Favaro, P., 2016, October. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision (pp. 69-84). Springer, Cham.
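The jigsaw pretext task can be sketched in a few lines: shuffle an image's patches with a known permutation and train a network to predict which permutation was used. A minimal illustration, simplified to a 2x2 grid and four permutations (the paper uses 3x3 patches and a larger hand-picked permutation set; all names here are illustrative):

```python
import random

# Four illustrative permutations of a 2x2 patch grid (assumption: the
# paper uses a 3x3 grid and a much larger set of permutations).
PERMUTATIONS = [
    (0, 1, 2, 3),
    (1, 0, 3, 2),
    (2, 3, 0, 1),
    (3, 2, 1, 0),
]

def make_jigsaw_example(patches, rng):
    # Shuffle the patches; the permutation index is a free training label.
    label = rng.randrange(len(PERMUTATIONS))
    perm = PERMUTATIONS[label]
    shuffled = [patches[i] for i in perm]
    return shuffled, label

rng = random.Random(0)
patches = ["top-left", "top-right", "bottom-left", "bottom-right"]
shuffled, label = make_jigsaw_example(patches, rng)

# Undoing the permutation recovers the original patch order.
restored = [None] * len(patches)
for pos, src in enumerate(PERMUTATIONS[label]):
    restored[src] = shuffled[pos]
```

Because the label is derived from the data itself, no human annotation is needed, which is the whole point of the pretext task.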

SLIDE 16

Unsupervised Visual Representation Learning by Context Prediction (Doersch et al, 2015)

Doersch, C., Gupta, A. and Efros, A.A., 2015. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1422-1430).

SLIDE 17

Unsupervised Representation Learning by Predicting Image Rotations (Gidaris et al, 2018)

Gidaris, S., Singh, P. and Komodakis, N., 2018. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728.
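The rotation pretext task is similarly cheap to set up: rotate each input by one of four angles and use the quarter-turn count as a free classification label. A minimal sketch on a toy 2x2 grid (a real pipeline would rotate image tensors; `rot90` and `make_rotation_example` are illustrative names, not from the paper):

```python
def rot90(grid):
    # Rotate a square grid 90 degrees clockwise.
    return [list(row) for row in zip(*grid[::-1])]

def make_rotation_example(grid, k):
    # Apply k quarter-turns; k in {0, 1, 2, 3} doubles as the class label.
    out = grid
    for _ in range(k):
        out = rot90(out)
    return out, k

image = [[1, 2],
         [3, 4]]
rotated, label = make_rotation_example(image, 1)  # one quarter-turn
```

Predicting the rotation forces the network to recognize object orientation, which in turn requires learning useful object features.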

SLIDE 18

BERT: Pre-training of deep bidirectional transformers for language understanding (Devlin et al, 2018)
Masked word prediction: This is a [MASK] long sentence with missing [MASK]
Next sentence prediction: i love AI / because it's crazy that it works
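A toy version of the masked word prediction objective: hide a fraction of the tokens and keep the originals as prediction targets. This is deliberately simplified (BERT also sometimes substitutes a random token or keeps the original in place of the mask; `mask_tokens` is an illustrative name):

```python
import random

def mask_tokens(tokens, rng, mask_rate=0.15, mask_token="[MASK]"):
    # Replace roughly mask_rate of the tokens with [MASK]; remember the
    # original words so the model can be trained to predict them.
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

rng = random.Random(42)
tokens = "this is a very long sentence with missing words".split()
masked, targets = mask_tokens(tokens, rng)
```

As with the vision pretext tasks, the labels come from the text itself, so pre-training can use arbitrarily large unlabeled corpora.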

SLIDE 19

Why is this bad?

SLIDE 20

Humans likely don't learn like this

SLIDE 21

(Credit: Yann LeCun)

SLIDE 22

unsupervised learning

SLIDE 23

Autoencoder
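An autoencoder compresses inputs through a bottleneck and is trained to reconstruct them, so the bottleneck must capture the data's structure. A minimal NumPy sketch, assuming a purely linear encoder/decoder with hand-derived gradients (real autoencoders use nonlinearities and a framework such as PyTorch; all names and sizes here are illustrative):

```python
import numpy as np

# Linear autoencoder: encode 8-d inputs into a 3-d bottleneck, decode
# back, and minimize mean squared reconstruction error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # toy dataset
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder weights

def loss(X, W_enc, W_dec):
    E = X @ W_enc @ W_dec - X                 # reconstruction error
    return float((E ** 2).mean())

lr = 0.05
initial = loss(X, W_enc, W_dec)
for _ in range(1000):
    Z = X @ W_enc                             # latent codes
    E = Z @ W_dec - X
    n = X.size
    g_dec = (2.0 / n) * Z.T @ E               # dL/dW_dec
    g_enc = (2.0 / n) * X.T @ E @ W_dec.T     # dL/dW_enc
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
final = loss(X, W_enc, W_dec)
```

Training needs only the inputs themselves, which is why the autoencoder is the classic example of unsupervised representation learning.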

SLIDE 24

Generative Adversarial Networks (Goodfellow et al, 2014)

SLIDE 25-35

Learning Representations By Maximizing Mutual Information Across Views (Bachman et al, 2019)

[Figure, built up over slides 25-35: a data augmentation pipeline produces multiple views of each image; a CNN encodes each view, and the resulting features f1, f2, f3 are compared across views.]
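The objective behind this family of methods can be sketched as an InfoNCE-style contrastive loss: each image's features should match the features of its own augmented view rather than those of other images in the batch. A framework-free illustration (AMDIM's actual loss compares features at multiple scales through learned encoders; `info_nce` is an illustrative name, and the temperature value is an assumption):

```python
import math

def info_nce(anchors, positives, temperature=0.1):
    # Average cross-entropy of picking the matching view: anchor i should
    # score positives[i] higher than every other positive in the batch.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)
    return total / len(anchors)

# Features that agree across views give a low loss; mismatched ones don't.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(views_a, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(views_a, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss is what "maximizing mutual information across views" amounts to in practice: the encoder is pushed to keep the content that survives augmentation and discard the rest.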

SLIDE 36

Data-efficient Image Recognition with Contrastive Predictive Coding (Hénaff et al, 2019)

SLIDE 37

A General Framework For Self-Supervised Image Representation Learning and PatchedDIM (Falcon, Cho, 2019)

SLIDE 38

Scaling

SLIDE 39


SLIDE 40
SLIDE 41


Addressing Reproducibility Crisis

SLIDE 42


LightningModule

import os

import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST


class CoolSystem(pl.LightningModule):
    def __init__(self):
        super(CoolSystem, self).__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def validation_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'avg_val_loss': avg_loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

    @pl.data_loader
    def train_dataloader(self):
        return DataLoader(MNIST(os.getcwd(), train=True, download=True,
                                transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def val_dataloader(self):
        # use the held-out split for validation
        return DataLoader(MNIST(os.getcwd(), train=False, download=True,
                                transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def test_dataloader(self):
        # use the held-out split for testing
        return DataLoader(MNIST(os.getcwd(), train=False, download=True,
                                transform=transforms.ToTensor()), batch_size=32)

SLIDE 43


LightningModule

from pytorch_lightning import Trainer

model = CoolSystem()
trainer = Trainer()
trainer.fit(model)

Automatic Tensorboard
Automatic checkpointing
Automatic early-stopping
Automatic training loop
Automatic validation loop

SLIDE 44

In summary

SLIDE 45


Unsupervised is state-of-the-art in NLP (BERT, GPT-2)
Computer vision is lagging behind (transfer learning is ok but not great)
Unsupervised learning will unlock new ways of using data
We need to move away from images and clever tasks
Self-supervised gains come from data processing, NOT learning

SLIDE 46


Thank you @_willfalcon