Learning Novel Policies For Tasks: Yunbo Zhang, Wenhao Yu, Greg Turk

  1. Learning Novel Policies For Tasks Yunbo Zhang, Wenhao Yu, Greg Turk

  2. Motivation • We want more than one solution (i.e., novel solutions) to a problem. • E.g., different locomotion styles for legged robots. [Figure: Style 1, Style 2, Style 3]

  3. Key Aspects • Novelty measurement function • Measures the novelty of a trajectory compared with trajectories from other policies • Policy gradient update • Ensures the final gradient is a compromise between the task and novelty objectives • Task-Novelty Bisector (TNB)

  4. Method Overview • Define a novelty reward function separate from the task reward. • Train a policy using the Task-Novelty Bisector (TNB) to balance optimization of the task and novelty objectives. • Update the novelty measurement function. • Repeat.
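The train, update, and repeat cycle above can be sketched as an outer loop. This is a minimal sketch, not the authors' code: `train_novel_policies`, `train_policy`, and `fit_autoencoder` are hypothetical names, and the actual TNB training and autoencoder fitting are passed in as callables. It assumes that the first policy, trained before any autoencoder exists, is handled by `train_policy` (for example by optimizing the task reward alone).

```python
from typing import Any, Callable, List, Sequence


def train_novel_policies(
    num_policies: int,
    train_policy: Callable[[Sequence[Any]], Any],
    fit_autoencoder: Callable[[Any], Any],
) -> List[Any]:
    """Hypothetical outer loop for training a sequence of mutually novel policies.

    train_policy(autoencoders) stands in for TNB training against the novelty
    reward defined by the given autoencoders (with an empty tuple it is assumed
    to optimize the task reward alone). fit_autoencoder(policy) stands in for
    fitting an autoencoder on state sequences rolled out by that policy.
    """
    policies: List[Any] = []
    autoencoders: List[Any] = []
    for _ in range(num_policies):
        # Train a new policy, balancing task and novelty via the current autoencoders.
        policy = train_policy(tuple(autoencoders))
        policies.append(policy)
        # Update the novelty measurement: add one autoencoder for this policy.
        autoencoders.append(fit_autoencoder(policy))
    return policies
```

Each new policy only sees the autoencoders of policies trained before it, so its novelty is measured against all earlier solutions.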

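  5. Novelty Measurement • Use the autoencoder reconstruction error of state sequences to compute novelty. • One autoencoder for each policy. • For the set of autoencoders $\mathbf{D} = \{D_1, \ldots, D_n\}$, the novelty reward for a state sequence $\mathbf{s}$ is: $r_{\text{novel}} = -\exp\left(-w_{\text{novel}} \min_{D \in \mathbf{D}} \lVert D(\mathbf{s}) - \mathbf{s} \rVert^2\right)$, where $w_{\text{novel}}$ is a scaling weight.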
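A minimal sketch of the novelty reward formula, assuming each autoencoder is a plain callable that maps a flattened state sequence to its reconstruction. The helper names `reconstruction_error` and `novelty_reward`, the flattened-list representation, and the default `w_novel = 1.0` are illustrative assumptions, not details from the slides.

```python
import math
from typing import Callable, Sequence

# Assumption: a state sequence s is flattened into one list of floats.
StateSeq = Sequence[float]
Autoencoder = Callable[[StateSeq], Sequence[float]]


def reconstruction_error(ae: Autoencoder, s: StateSeq) -> float:
    """Squared L2 reconstruction error ||D(s) - s||^2."""
    recon = ae(s)
    if len(recon) != len(s):
        raise ValueError("autoencoder output must match input length")
    return sum((r - x) ** 2 for r, x in zip(recon, s))


def novelty_reward(
    autoencoders: Sequence[Autoencoder], s: StateSeq, w_novel: float = 1.0
) -> float:
    """r_novel = -exp(-w_novel * min over D of ||D(s) - s||^2)."""
    if not autoencoders:
        raise ValueError("need at least one autoencoder")
    # The best-reconstructing autoencoder marks the most similar earlier policy.
    min_err = min(reconstruction_error(ae, s) for ae in autoencoders)
    return -math.exp(-w_novel * min_err)
```

Because of the negative exponential, a sequence that some earlier policy's autoencoder reconstructs well earns a reward near $-1$, while a sequence that no autoencoder reconstructs well earns a reward approaching $0$.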

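  6. Task-Novelty Bisector (TNB) • Compute the policy gradients for the task reward and the novelty reward: $g_{\text{task}} = \frac{\partial J_{\text{task}}}{\partial \theta}$, $g_{\text{novel}} = \frac{\partial J_{\text{novel}}}{\partial \theta}$ • Compute the final policy gradient by combining $g_{\text{task}}$ and $g_{\text{novel}}$ according to one of two rules.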
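As a hedged illustration of the bisector idea only (an assumed variant, not necessarily either of TNB's two rules), the sketch below points the update along the angle bisector of the two unit gradients and scales it by the average projection of both gradients onto that direction. The function name `bisector_gradient`, the scaling choice, and the fallbacks for zero or opposite gradients are all assumptions.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Sequence[float]) -> float:
    return math.sqrt(_dot(a, a))


def bisector_gradient(
    g_task: Sequence[float], g_novel: Sequence[float], eps: float = 1e-12
) -> List[float]:
    """Hypothetical bisector-style combination of task and novelty gradients."""
    n_task, n_novel = _norm(g_task), _norm(g_novel)
    # Assumption: if one gradient vanishes, follow the other one.
    if n_task < eps:
        return list(g_novel)
    if n_novel < eps:
        return list(g_task)
    # Angle bisector direction: the sum of the two unit vectors.
    bis = [a / n_task + b / n_novel for a, b in zip(g_task, g_novel)]
    n_bis = _norm(bis)
    if n_bis < eps:
        # Opposite gradients: the bisector is undefined; assume task wins.
        return list(g_task)
    u = [x / n_bis for x in bis]
    # Step length: average projection of both gradients onto the bisector.
    scale = 0.5 * (_dot(g_task, u) + _dot(g_novel, u))
    return [scale * x for x in u]
```

For orthogonal gradients $g_{\text{task}} = (1, 0)$ and $g_{\text{novel}} = (0, 1)$, this variant returns $(0.5, 0.5)$, which has a positive dot product with both, so a small step along it does not decrease either objective to first order.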

  7. Multiple Solutions [Figure: PPO policy; labels: End-Effector, Target]

  8. Multiple Solutions [Figure: TNB policies]

  9. Deceptive Reward Problems • Our method could be further extended to solve tasks with deceptive reward signals. • E.g., Deceptive Reacher. [Figure labels: Target, End-Effector]

  10. Deceptive Reward Problems [Figure: TNB policies]

  11. Thank You! Poster: Pacific Ballroom #37
