Learning from Unlabeled Video, Carl Vondrick, Columbia University



slide-1
SLIDE 1

Learning from Unlabeled Video

Carl Vondrick Columbia University

slide-2
SLIDE 2

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

Survivor Bias of Video Data

slide-3
SLIDE 3

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

Survivor Bias of Video Data

slide-4
SLIDE 4

Survivor Bias of Video Data

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

slide-5
SLIDE 5

Felix Warneken, Max Planck Institute

slide-6
SLIDE 6

The Oops! dataset

slide-7
SLIDE 7

Oops! Predicting Unintentional Action CVPR 2020

oops.cs.columbia.edu

Epstein, Chen, Vondrick. CVPR 2020.

slide-8
SLIDE 8

Oops! Predicting Unintentional Action CVPR 2020

oops.cs.columbia.edu

Epstein, Chen, Vondrick. CVPR 2020.

slide-9
SLIDE 9

Learning from unlabeled video

slide-10
SLIDE 10

Example Videos

slide-11
SLIDE 11

Perceptual Clues

1) Predictability

Ranzato 2014, Han 2019, …

2) Temporal Order

Misra 2016, Wei 2018, …

slide-12
SLIDE 12

3) Video speed as self-supervised clue

Epstein, Chen, Vondrick. CVPR 2020.
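
One way to read "video speed as self-supervised clue" is that a clip's playback speed can serve as a free label. The sketch below assumes speed is simulated by subsampling frames at a fixed stride and that the label is the index of that stride. The name `make_speed_example`, the stride set, and the clip length are illustrative assumptions, not details taken from the talk.

```python
import random

# Hypothetical playback speeds, expressed as frame strides (an assumption).
STRIDES = [1, 2, 4, 8]

def make_speed_example(frames, clip_len=4, rng=random):
    """Return (clip, label), where label indexes the stride used for sampling.

    `frames` is any sequence of frames. No human annotation is needed,
    because the label comes from how the clip was sampled.
    """
    # Keep only strides for which a clip of clip_len frames fits in the video.
    usable = [i for i, s in enumerate(STRIDES) if (clip_len - 1) * s < len(frames)]
    if not usable:
        raise ValueError("video too short for any stride")
    label = rng.choice(usable)
    stride = STRIDES[label]
    span = (clip_len - 1) * stride
    # Pick a start so that the last sampled frame is still inside the video.
    start = rng.randrange(len(frames) - span)
    clip = [frames[start + k * stride] for k in range(clip_len)]
    return clip, label
```

A model trained to predict `label` from `clip` would have to pick up on motion cues, which is the kind of signal the slide points to.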

slide-13
SLIDE 13

Speed of Action Alters Perceptual Judgement

slide-14
SLIDE 14

3) Video speed as self-supervised clue

Epstein, Chen, Vondrick. CVPR 2020.

slide-15
SLIDE 15

Visualizing Features

Epstein, Chen, Vondrick. CVPR 2020.

slide-16
SLIDE 16

Fit linear model to classify intentionality
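
Fitting a linear model on top of frozen features can be sketched as plain logistic regression. The version below is a simplification under stated assumptions: it treats intentionality as a binary label (1 = intentional, 0 = unintentional), uses batch gradient descent, and takes toy feature vectors as input. The function names and hyperparameters are hypothetical and are not the authors' setup.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed feature vectors with gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 if the linear score is non-negative, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
```

Only `w` and `b` are learned here; the features stay fixed, so the probe's accuracy reflects how linearly separable the representation already is.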


slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20

What’s missing?

[Bar chart: error (lower is better) by failure category: Environmental, Unexpected, Multi-agent, Limited Skill, Planning Error, Single-agent, Execution Error, Limited Visibility, Limited Knowledge. Series: Ours (self-supervised), Kinetics (supervised), Human.]

slide-21
SLIDE 21
oops.cs.columbia.edu

Epstein, Chen, Vondrick. CVPR 2020.

Poster 93 Tuesday 10am PST

slide-22
SLIDE 22

Natural Synchronization

Vision Speech

slide-23
SLIDE 23

Ackee seems to be:

  • edible
  • white/yellow
  • washable
  • sticky
  • larger than cherry

tomato

“I’m going to go in with the actual ackee I rinsed off earlier”

slide-24
SLIDE 24
slide-25
SLIDE 25

Word Learning from Vision

“I turn on the fire and then I [???] the pasta”

Transformer stack

“stir”

Learn what “stir” means
Learn how to learn what “stir” means

  • VisualBERT, ViLBERT, VideoBERT, LXMERT, …
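
The fill-in-the-blank sentence on this slide is an instance of masked word prediction. A minimal sketch of how one such training example might be constructed is shown here, assuming whitespace tokenization and a single masked position. `mask_one_word` and the `MASK` constant are illustrative names, and the visual input that accompanies the sentence is left out.

```python
import random

MASK = "[???]"  # placeholder token, written as on the slide

def mask_one_word(sentence, rng=random):
    """Hide one word and return (masked_tokens, position, target_word)."""
    tokens = sentence.split()
    if not tokens:
        raise ValueError("sentence has no words to mask")
    pos = rng.randrange(len(tokens))
    target = tokens[pos]
    masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
    return masked, pos, target
```

A model would receive `masked` (plus the paired video frames, under the slide's setup) and be trained to output `target` at index `pos`.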
slide-26
SLIDE 26

Learning to Learn Words

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-27
SLIDE 27

Transformers as Meta-Learners

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-28
SLIDE 28

Transformers as Meta-Learners

Implement with cross entropy loss

Suris, Epstein, Ji, Chang, Vondrick. arXiv.
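
The cross entropy loss named on this slide can be illustrated with a small worked example. The sketch assumes the model produces one score per candidate word and that exactly one candidate is correct. How the candidates are chosen and scored in the actual system is not stated on the slide, so the inputs here are placeholders.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target_index):
    """Negative log-probability assigned to the correct candidate."""
    probs = softmax(scores)
    return -math.log(probs[target_index])
```

With four equally scored candidates the loss equals log 4, and it shrinks toward zero as the score of the correct candidate rises above the others.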

slide-29
SLIDE 29

Meta-Learning Episodes

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

New Words Episode
Composition Episode

slide-30
SLIDE 30

Mode 1: Language Modeling
Mode 2: Word Acquisition

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-31
SLIDE 31

Language Modeling

[Bar chart: accuracy on Seen Composition vs. New Composition for BERT pretrained, BERT + vision, and Meta-Learned. Annotations: 19% drop, 18% drop.]

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-32
SLIDE 32

Language Modeling

[Bar chart: accuracy on Seen Composition vs. New Composition for BERT pretrained, BERT + vision, and Meta-Learned. Annotations: 19% drop, 18% drop, 11% drop.]

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-33
SLIDE 33

Word Acquisition

Training Set / Test Example

still taking skin off fish with a knife
stir rice into pan
get avocado
avocado (new word)

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-34
SLIDE 34

Word Acquisition

Training Set / Test Example

switch off oven on the bottom right
wash plates with rag
open the cupboard
close oven
oven (new word)

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-35
SLIDE 35

Novel word acquisition

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-36
SLIDE 36

Visualizing Learned Process

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-37
SLIDE 37

Visualizing Attention

Green boxes impact green prediction the most

[Figure: attention between Training Set and Test examples. Text recovered from the figure: put spoon, rinse container, chop sun-dried tomatoes, put spoon container spoon tomatoes …, close food container …, cut cherry tomatoes]

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-38
SLIDE 38

expert.cs.columbia.edu

Suris, Epstein, Ji, Chang, Vondrick. arXiv.

slide-39
SLIDE 39

Learning from Unlabeled Video

Carl Vondrick Columbia University