Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty



SLIDE 1

Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty

Youngjin Kim (1, 4), Wontae Nam (3), Hyunwoo Kim (1), Jihoon Kim (2), and Gunhee Kim (1)

Code available at: http://vision.snu.ac.kr/projects/cb


SLIDE 2

Motivation: Exploration under Distraction

  • 1. Distractive Environments are Widespread

§ Real-world observations often contain novel but task-irrelevant information.

[Figure: navigating a city. (a) Known place; (b) known place and strangers.]

SLIDE 3

Motivation: Exploration under Distraction

  • 2. Degeneration of Prior Novelty-Based Exploration Strategies

§ Cause: their intrinsic reward is task-agnostic, so task-irrelevant novelty (e.g., strangers in a known place) is rewarded as well
§ Need mechanisms that prioritize task-related novelty

[Figure: navigating a city. (a) Known place: not novel; (b) known place and strangers: novel.]

SLIDE 4

Approach: Curiosity-Bottleneck

Quantify the ‘Degree of Compression’ using a compressive value network

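[Figure: Curiosity-Bottleneck architecture. The Compressor encodes the observation x_t from the Environment, the Value Predictor works on the compressed code, and the Policy acts on the intrinsic reward together with the external reward.]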

SLIDE 5

Approach: Curiosity-Bottleneck


Compressor

§ Encode a rare observation x with a long code and a common x with a short code
§ Discard information about x during compression
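The Compressor's rule mirrors Shannon's coding principle: an event of probability p needs about -log2 p bits, so rare observations receive long codes and common ones short codes. Below is a minimal sketch with hypothetical observation frequencies; it illustrates the principle only and is not CB's learned compressor, which works with a continuous code.

```python
import math

def code_length_bits(prob):
    """Shannon-optimal code length, in bits, for an event of probability `prob`."""
    return -math.log2(prob)

# Hypothetical frequencies of two observations seen by an agent.
observation_freq = {"familiar_street": 0.5, "rare_landmark": 0.01}

# Rare observations get longer codes: about 1 bit versus about 6.6 bits here.
lengths = {obs: code_length_bits(p) for obs, p in observation_freq.items()}
```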


SLIDE 6

Approach: Curiosity-Bottleneck


Value Predictor

§ Prevents the Compressor from discarding task-related information
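To see why the value term matters, the sketch below scores four hypothetical deterministic encoders on a toy observation x = (task bit, distractor bit), with the target y (standing in for the value estimate) equal to the task bit. Under compression alone, beta * I(X;Z), the best encoder discards everything; under the full objective L = -I(Z;Y) + beta * I(X;Z), it keeps only the task bit. The encoders, beta = 0.5, and the uniform toy data are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(pairs):
    """I(A;B) in bits, treating each (a, b) pair as an equally likely sample."""
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((marg_a[a] / n) * (marg_b[b] / n)))
        for (a, b), c in joint.items()
    )

# Hypothetical observations x = (task_bit, distractor_bit), uniformly distributed.
observations = list(product([0, 1], repeat=2))

# Hypothetical deterministic encoders z = f(x).
encoders = {
    "keep_task": lambda x: x[0],
    "keep_distractor": lambda x: x[1],
    "keep_everything": lambda x: x,
    "discard_everything": lambda x: 0,
}

def objective(encode, beta, use_value_term=True):
    """L = -I(Z;Y) + beta * I(X;Z); dropping the first term leaves compression alone."""
    i_xz = mutual_information([(x, encode(x)) for x in observations])
    i_zy = mutual_information([(encode(x), x[0]) for x in observations])
    return (-i_zy if use_value_term else 0.0) + beta * i_xz

beta = 0.5
best_full = min(encoders, key=lambda name: objective(encoders[name], beta))
best_compression_only = min(
    encoders, key=lambda name: objective(encoders[name], beta, use_value_term=False)
)
```

Here best_full is "keep_task" (L = -0.5), while best_compression_only is "discard_everything" (0.0): without the value term, nothing stops the code from collapsing.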


SLIDE 7

Approach: Curiosity-Bottleneck

  • 1. Objective Function

§ Minimize the average code length of representation Z
§ Discard information about observation X: $\min I(X;Z) = H(Z) - H(Z \mid X)$
§ Preserve information related to the value estimate Y: $\max I(Z;Y)$

$$\mathcal{L} = -I(Z;Y) + \beta\, I(X;Z)$$

  • 2. Intrinsic Reward: Per-instance Mutual Information

$$r_i(x) = \int p(z \mid x)\, \log \frac{p(x, z)}{p(x)\, p(z)}\, dz$$
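A discrete sketch of the per-instance mutual information: with the integral replaced by a sum over codes z, r_i(x) equals KL[p(z|x) || p(z)], and averaging it over p(x) recovers I(X;Z). The joint distribution below is hypothetical, chosen so that the rare observation has a distinctive code distribution.

```python
import math

# Hypothetical joint distribution p(x, z): one common and one rare observation, two codes.
joint = {
    ("x_common", "z0"): 0.45, ("x_common", "z1"): 0.45,
    ("x_rare", "z0"): 0.09, ("x_rare", "z1"): 0.01,
}

# Marginals p(x) and p(z).
p_x, p_z = {}, {}
for (x, z), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_z[z] = p_z.get(z, 0.0) + p

def per_instance_mi(x):
    """r_i(x) = sum_z p(z|x) * log[p(x, z) / (p(x) p(z))], in nats."""
    return sum(
        (p / p_x[x]) * math.log(p / (p_x[x] * p_z[z]))
        for (xx, z), p in joint.items() if xx == x
    )

def kl_posterior_to_marginal(x):
    """KL[p(z|x) || p(z)], which equals per_instance_mi(x)."""
    return sum(
        (p / p_x[x]) * math.log((p / p_x[x]) / p_z[z])
        for (xx, z), p in joint.items() if xx == x
    )

rewards = {x: per_instance_mi(x) for x in p_x}
# Averaging the per-instance values over p(x) gives I(X;Z).
mutual_info = sum(p_x[x] * r for x, r in rewards.items())
```

In this toy case the rare observation scores about 0.31 nats against about 0.003 for the common one.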
SLIDE 8

Approach: Curiosity-Bottleneck

  • 3. Approximation

Variational Information Bottleneck with Gaussian assumptions

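$$\mathcal{L}_{\theta,\phi} = \mathbb{E}_{x,y}\Big[-\log q_\phi(y \mid z) + \beta\, \mathrm{KL}\big[p_\theta(Z \mid x)\,\|\,q(Z)\big]\Big], \qquad z \sim p_\theta(Z \mid x)$$

$$r_i(x) = \mathrm{KL}\big[p_\theta(Z \mid x)\,\|\,q(Z)\big]$$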

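[Figure: the Compressor outputs (μ_θ, σ_θ) of p_θ(Z|x), and its term KL[p_θ(Z|x)||q(Z)] serves as the intrinsic reward r_i(x); the Value Predictor outputs (μ_φ, σ_φ) of q_φ(y|z); the terms −log q_φ(y|z) and the KL term combine into L_{θ,φ}.]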
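A one-sample sketch of this Gaussian variational loss in plain Python. Assumptions not stated on the slides: the prior q(Z) is N(0, I), the Compressor outputs a diagonal Gaussian, and value_predictor is a hypothetical fixed function standing in for the learned network that outputs (μ_φ, σ_φ).

```python
import math
import random

def kl_to_standard_normal(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)] in nats, summed over dimensions."""
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s) for m, s in zip(mu, sigma))

def gaussian_nll(y, mean, std):
    """-log N(y; mean, std^2) for a scalar target, in nats."""
    return 0.5 * math.log(2.0 * math.pi * std * std) + (y - mean) ** 2 / (2.0 * std * std)

def value_predictor(z):
    """Hypothetical stand-in for q_phi(y|z): returns (mu_phi, sigma_phi)."""
    return sum(z), 1.0

def cb_step(mu_theta, sigma_theta, y, beta, rng):
    """One-sample estimate of L_{theta,phi} plus the intrinsic reward r_i(x)."""
    kl = kl_to_standard_normal(mu_theta, sigma_theta)
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_theta, sigma_theta)]
    mu_phi, sigma_phi = value_predictor(z)
    loss = gaussian_nll(y, mu_phi, sigma_phi) + beta * kl
    return loss, kl  # the KL term doubles as r_i(x)

rng = random.Random(0)
loss, reward = cb_step([0.8, -0.3], [0.5, 0.9], y=1.0, beta=0.01, rng=rng)
```

The same KL term appears twice: weighted by β it regularizes the code during training, and on its own it is handed to the policy as the intrinsic reward.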

SLIDE 9

Experiments: Static Environment

Detects novelty while remaining robust to distraction

[Figure: panels (a) Input, (b) Ideal, (c) CB, (d) CB-noKL, (e) RND, (f) SimHash, shown for two distraction types, a random box object and pixel noise; axis values range from 0.1 to 0.9.]

SLIDE 11

Experiments: Treasure-Hunt

[Figure: (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash.]

The compression loss term $\mathrm{KL}[p_\theta(Z \mid x)\,\|\,q(Z)]$ induces task-agnostic exploration in the early stages

Grad-CAM Visualization

The adaptive exploration strategy

SLIDE 12

Experiments: Treasure-Hunt

Grad-CAM Visualization

The adaptive exploration strategy


The value prediction loss term $-\log q_\phi(y \mid z)$ induces task-specific exploration once external rewards have been collected

SLIDE 13

Experiments: Treasure-Hunt

CB consistently outperforms the baselines across different distraction settings

[Figure: mean episodic reward for CB, CB-noKL, RND, Dynamics, and SimHash under (a) the Movement condition and (b) the Location condition; x-axes scaled by 1e6.]

SLIDE 14

Experiments: Atari Hard-Exploration Games

[Figure: results on Gravitar, Solaris, and Montezuma's Revenge, with and without distraction, for CB, CB-noKL, RND, Dynamics, and SimHash.]

SLIDE 15

Thank You!

Poster @ Pacific Ballroom #48
