Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty

  1. Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty. Youngjin Kim, Wontae Nam, Hyunwoo Kim, Jihoon Kim, and Gunhee Kim. Code available at: http://vision.snu.ac.kr/projects/cb

  2. Motivation: Exploration under Distraction. (1) Distractive environments are widespread: real-world observations often contain novel but task-irrelevant information. [Figure: Navigating City. (a) Known Place; (b) Known Place and Strangers.]

  3. Motivation: Exploration under Distraction. (2) Degeneration of prior novelty-based exploration strategies: they degrade because their intrinsic reward is task-agnostic, so mechanisms are needed to prioritize task-related novelty. [Figure: Navigating City. (a) Known Place, labeled Not Novel; (b) Known Place and Strangers, labeled Novel.]

  4. Approach: Curiosity-Bottleneck. Quantify the "degree of compression" using a compressive value network. [Diagram components: Environment, Compressor, Value Predictor, Intrinsic Reward, External Reward, Policy.]

  5. Approach: Curiosity-Bottleneck. Compressor: encode rare observations to a lengthy code and common observations to a shorter code; discard information about the observation during compression. [Diagram as on slide 4.]

  6. Approach: Curiosity-Bottleneck. Value Predictor: prevent the Compressor from discarding task-related information. [Diagram as on slide 4.]

  7. Approach: Curiosity-Bottleneck.
     1. Objective function. Minimize the average code length of the representation $Z$ and discard information about the observation $X$: $\min\; H(Z) - H(Z \mid X)$. Preserve information related to the value estimate $Y$: $\max\; I(Z; Y)$. Combined: $\mathcal{L} = -I(Z; Y) + \beta\, I(X; Z)$.
     2. Intrinsic reward, the per-instance mutual information: $r^{i}(x) = \int_{z} p(z \mid x) \log \frac{p(x, z)}{p(x)\, p(z)}\, dz$.

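  8. Approach: Curiosity-Bottleneck.
     3. Approximation: Variational Information Bottleneck with Gaussian assumptions.
     Loss: $\mathcal{L}_{\theta,\phi} = \mathbb{E}_{x,y}\left[ -\log q_{\phi}(y \mid z) + \beta\, \mathrm{KL}\left[ p_{\theta}(Z \mid x) \,\|\, q(Z) \right] \right]$.
     Intrinsic reward: $r^{i}(x) = \mathrm{KL}\left[ p_{\theta}(Z \mid x) \,\|\, q(Z) \right]$.
     [Diagram: the Compressor samples a code $z \sim p_{\theta}(Z \mid x)$ for each observation $x$, and the Value Predictor scores it with $-\log q_{\phi}(y \mid z)$; each network outputs a pair of Gaussian parameters.]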

  9. Experiments: Static Environment. Detects novelty while being robust to distraction. [Figure: one row per distraction type (Random Box, Object, Pixel Noise); panels (a) Input, (b) Ideal, (c) CB, (d) CB-noKL, (e) RND, (f) SimHash; axes are the novelty and distraction variables, each ticked from 0.1 to 0.9.]

  11. Experiments: Treasure-Hunt. Grad-CAM visualization of the adaptive exploration strategy. The compression loss term $\mathrm{KL}\left[ p_{\theta}(Z \mid x) \,\|\, q(Z) \right]$ induces task-agnostic exploration in early stages. [Figure panels: (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash.]

  12. Experiments: Treasure-Hunt. Grad-CAM visualization of the adaptive exploration strategy. The value prediction loss term $-\log q_{\phi}(y \mid z)$ induces task-specific exploration after collecting external rewards. [Figure panels: (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash.]

  13. Experiments: Treasure-Hunt. Consistently outperforms baselines on different distraction settings. [Figure: mean episodic reward for CB, CB-noKL, RND, Dynamics, and SimHash; panels (a) Movement Condition, (b) Location Condition; axis scale 1e6.]

  14. Experiments: Atari Hard-Exploration Games. [Figure: CB, CB-noKL, RND, Dynamics, and SimHash on Gravitar, Montezuma, and Solaris, with and without distraction.]

  15. Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty. Thank You! Poster @ Pacific Ballroom #48. Code available at http://vision.snu.ac.kr/projects/cb
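
Slide 5 says the Compressor should give rare observations a lengthy code and common ones a shorter code. The sketch below illustrates that idea with Shannon's ideal code length, `-log2 p`, computed from empirical counts. This is a stand-in chosen for illustration, not the learned compressor from the slides; the function name `code_lengths` and the toy observations `"hall"` and `"vault"` are hypothetical.

```python
import math
from collections import Counter


def code_lengths(observations):
    """Ideal code length in bits, -log2(p), for each distinct observation.

    Illustrative only: p is the empirical frequency, not a learned model.
    """
    counts = Counter(observations)
    total = len(observations)
    return {obs: -math.log2(count / total) for obs, count in counts.items()}


# Hypothetical toy data: "hall" is seen 7 times out of 8, "vault" once.
lengths = code_lengths(["hall"] * 7 + ["vault"])
```

Here the rarely seen `"vault"` gets a 3-bit code while the frequently seen `"hall"` gets well under one bit, which is the ordering a novelty signal based on code length relies on.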
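
The objective and the intrinsic reward on slide 7 are linked by a short identity, worked out below. The notation is an assumption made for this sketch: $X$ is the observation, $Z$ its compressed representation, and $r^{i}(x)$ the per-instance reward. The derivation shows that the compression term is a mutual information and that the per-instance reward averages to it.

```latex
\begin{align*}
I(X;Z) &= H(Z) - H(Z \mid X)
        = \int p(x) \int p(z \mid x) \log \frac{p(x,z)}{p(x)\,p(z)} \, dz \, dx \\
       &= \mathbb{E}_{x \sim p(x)}\!\left[ r^{i}(x) \right],
\qquad
r^{i}(x) = \int p(z \mid x) \log \frac{p(x,z)}{p(x)\,p(z)} \, dz
         = \mathrm{KL}\!\left[ p(Z \mid x) \,\|\, p(Z) \right].
\end{align*}
```

The last equality uses $p(x,z) / (p(x)\,p(z)) = p(z \mid x) / p(z)$, so an observation earns a large reward exactly when its code distribution differs strongly from the average one.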
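
The variational approximation on slide 8 can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the compressor outputs a diagonal Gaussian over the code, that the prior $q(Z)$ is a standard normal, and that the value predictor is a scalar Gaussian. The slides only say "Gaussian assumptions", so these specifics, the function names, and the trade-off weight `beta` are all hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over latent dimensions.

    Assumption: the prior q(Z) is a standard normal.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)


def gaussian_nll(y, mu_y, sigma_y):
    """-log N(y; mu_y, sigma_y^2) for a scalar value target."""
    return 0.5 * np.log(2.0 * np.pi * sigma_y**2) + (y - mu_y) ** 2 / (2.0 * sigma_y**2)


def cb_loss(y, mu_y, sigma_y, mu_z, sigma_z, beta):
    """Per-sample loss: value-prediction NLL plus beta times the KL term."""
    return gaussian_nll(y, mu_y, sigma_y) + beta * kl_to_standard_normal(mu_z, sigma_z)


# A code distribution equal to the prior carries no information: reward 0.
common = kl_to_standard_normal(np.zeros(4), np.ones(4))
# A code distribution pushed away from the prior earns a larger reward.
rare = kl_to_standard_normal(np.full(4, 2.0), np.full(4, 0.5))
```

In this sketch the intrinsic reward is just the KL term evaluated per observation, so the same quantity that is penalized during training doubles as the novelty signal.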
