Playing hard exploration games by watching YouTube


slide-1
SLIDE 1

Playing hard exploration games by watching YouTube

Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

slide-2
SLIDE 2

Playing hard exploration games by watching Youtube — Yusuf Aytar & Tobias Pfaff

Learning by watching YouTube

People learn many tasks by watching online videos, despite huge gaps in visual appearance, sensing modalities, body differences, etc.

Examples: construction, knitting, playing games

slide-4
SLIDE 4

Challenges

  • Domain Gap
  • No Actions
  • No Rewards

slide-5
SLIDE 5

Challenges

  • Domain Gap: Self-Supervised Domain Alignment
  • No Actions: Learn to Play with Imitation (RL)
  • No Rewards: Rewards Learned from Expert Sequence

slide-6
SLIDE 6

Temporal distance classification (TDC)

[Figure: frames from a demonstration sequence]

slide-7
SLIDE 7

Temporal distance classification (TDC)

[Figure: frames from a video]

slide-8
SLIDE 8

Temporal distance classification (TDC)

[Figure labels: video, Visual Embedding Network, Temporal Classifier]

slide-9
SLIDE 9

Temporal distance classification (TDC)

[Figure labels: video, visual embedding, Temporal Classifier]

slide-10
SLIDE 10

Temporal distance classification (TDC)

[Figure labels: video, visual embedding, temporal classifier]
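
The slides name the two parts of TDC (a visual embedding network and a temporal classifier) but not how training pairs are built. Below is a minimal sketch of one plausible pair sampler, assuming the classifier predicts which distance bucket separates two frames of the same video. The bucket boundaries and the names BUCKETS, bucket_of and sample_tdc_pair are illustrative assumptions, not taken from the slides.

```python
import random

# Assumed distance buckets in frames (inclusive); the slides do not list them.
BUCKETS = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 20), (21, 200)]


def bucket_of(distance):
    """Map a frame distance to the index of the bucket that contains it."""
    for label, (lo, hi) in enumerate(BUCKETS):
        if lo <= distance <= hi:
            return label
    raise ValueError(f"distance {distance} is outside all buckets")


def sample_tdc_pair(num_frames, rng):
    """Sample (i, j, label) for a single video.

    i and j are frame indices with i <= j, and label is the bucket of j - i.
    The label is drawn first so that every class is equally likely.
    """
    label = rng.randrange(len(BUCKETS))
    lo, hi = BUCKETS[label]
    hi = min(hi, num_frames - 1)  # cannot exceed the length of the video
    if lo > hi:
        raise ValueError("video too short for the sampled bucket")
    dist = rng.randint(lo, hi)
    i = rng.randrange(num_frames - dist)
    return i, i + dist, label
```

In a full pipeline, the two frames would be passed through the embedding network and the classifier would be trained to predict label from the pair of embeddings. Because the label comes from the video's own timeline, no actions or rewards are needed.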

slide-11
SLIDE 11

Cross-modal distance classification (CMC)

slide-12
SLIDE 12

Model successfully aligns different videos

slide-13
SLIDE 13

What does the embedding focus on?

[Figure panels: Cross-modal; Visual only]

slide-14
SLIDE 14

Imitation through RL

[Figure labels: embedding space, demonstration]

slide-15
SLIDE 15

Imitation through RL

[Figure labels: embedding space, observation]
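
The two "Imitation through RL" slides place the demonstration and the agent's observation in the same embedding space. One way to turn that into a reward signal is sketched below, under stated assumptions: checkpoints are taken at a fixed spacing along the embedded demonstration, and the agent earns a fixed bonus the first time its embedded observation is similar enough to the next checkpoint. The use of cosine similarity, the default spacing, threshold and bonus values, and the name CheckpointReward are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class CheckpointReward:
    """Reward the agent for reaching demonstration checkpoints in order."""

    def __init__(self, demo_embeddings, spacing=16, threshold=0.5, bonus=0.5):
        # Every `spacing`-th embedded demonstration frame becomes a checkpoint.
        self.checkpoints = demo_embeddings[spacing::spacing]
        self.threshold = threshold
        self.bonus = bonus
        self.next_idx = 0  # index of the checkpoint the agent must reach next

    def __call__(self, obs_embedding):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # demonstration fully followed
        target = self.checkpoints[self.next_idx]
        if cosine(obs_embedding, target) > self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0
```

Because only the next checkpoint can pay out, the agent has to follow the demonstration in order, but it is free to choose its own actions between checkpoints.
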
slide-16
SLIDE 16

RL makes imitation more robust

slide-17
SLIDE 17

Results

               Montezuma    Pitfall!    Private Eye
Pure RL           ~2,500          ~0            ~50
Avg. Human         4,743       6,464         69,571
DQfD (2018)       29,384       3,997        100,747
Ours              58,175      74,323         98,763

Averaged score of best policy
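
To put these scores in perspective, they can be expressed as multiples of the average human score. The sketch below only reuses the numbers from the results table; the names SCORES and ratio_to_human are made up for illustration.

```python
# Scores copied from the results table ("Averaged score of best policy").
SCORES = {
    "Montezuma": {"Avg. Human": 4743, "DQfD (2018)": 29384, "Ours": 58175},
    "Pitfall!": {"Avg. Human": 6464, "DQfD (2018)": 3997, "Ours": 74323},
    "Private Eye": {"Avg. Human": 69571, "DQfD (2018)": 100747, "Ours": 98763},
}


def ratio_to_human(game, method):
    """Return a method's score divided by the average human score for a game."""
    return SCORES[game][method] / SCORES[game]["Avg. Human"]


if __name__ == "__main__":
    for game in SCORES:
        for method in ("DQfD (2018)", "Ours"):
            print(f"{game:12s} {method:12s} {ratio_to_human(game, method):6.2f}x human")
```

Read this way, "Ours" is above the average human score on all three games, while DQfD (2018) is below it on Pitfall! and slightly ahead of "Ours" on Private Eye.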

slide-21
SLIDE 21

Results

               Montezuma    Pitfall!    Private Eye
Pure RL           ~2,500          ~0            ~50
Avg. Human         4,743       6,464         69,571
DQfD (2018)       29,384       3,997        100,747
Ours              58,175      74,323         98,763

Slide annotations: max score, level 3, max score

Averaged score of best policy

slide-22
SLIDE 22

Visit our poster!

Playing hard exploration games by watching YouTube

#142