Learning to Control Complex Human Motions Using Reinforcement Learning
Libin Liu http://libliu.info
DeepMotion Inc http://deepmotion.com
Physics-based Character Animation: Motion Controller → Control Signal → Physics Engine → Character Animation
Examples: [Gang Beasts], [Totally Accurate Battle Simulator]
Designing Controllers for Locomotion
• Hand-crafted control policies [Hodgins et al. 1995]
• Simulating abstract models (SIMBICON, IPM, ZMP, …) [Yin et al. 2007]
• Optimization / policy search [Coros et al. 2010] [Tan et al. 2014]
• Reinforcement learning, actor-critic [Mordatch et al. 2010] [Peng et al. 2017]
Designing Controllers for Complex Motions
Designing Controllers for Complex Motions: Motion Clip → Tracking Controller
Tracking Control for Complex Human Motion: Motion Clip → Open-loop Tracking Control → Feedback Policy → Control Scheduler
Reinforcement Learning
• Feedback Policy: learned with Guided Policy Learning
• Control Scheduler: learned with Deep Q-Learning
Outline
• Constructing open-loop control: SAMCON (Sampling-based Motion Control)
• Guided learning of linear feedback policies
• Learning to schedule control fragments using deep Q-learning
Tracking Control
• PD servo: $\tau = k_p(\hat{q} - q) - k_d\,\dot{q}$, where $\hat{q}$ is the target pose
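A minimal sketch of this PD servo (an illustration only; the gain values and the per-joint vector layout are assumptions, not numbers from the talk):

```python
import numpy as np

def pd_torque(q, q_dot, q_target, kp=300.0, kd=30.0):
    """PD servo: tau = kp * (q_target - q) - kd * q_dot.

    q, q_dot, q_target are per-joint angle vectors; kp and kd are
    illustrative gains.
    """
    return kp * (np.asarray(q_target) - np.asarray(q)) - kd * np.asarray(q_dot)
```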
Mocap Clips as Tracking Target
Correction with Sampling
SAMCON
• SAmpling-based Motion CONtrol [Liu et al. 2010, 2015]
• Motion Clip → open-loop control trajectory
• Particle filtering / Sequential Monte Carlo: repeated sampling stages from the start to the end of the clip
SAMCON: the reference trajectory is divided into short segments of length $\delta t$ along the time axis (figure: state vs. time)
Sampling & Simulation: within each $\delta t$ segment, sample actions (PD-control targets) and simulate them forward from the current states
Resampling: keep the simulated samples that stay close to the reference trajectory and discard the rest
SAMCON Iterations: repeat sampling, simulation, and resampling segment by segment along the reference trajectory
Constructed Open-loop Control Trajectory: the surviving sequence of actions (PD-control targets) forms the open-loop control trajectory
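The following is a schematic sketch of the SAMCON sampling/resampling pass described on the preceding slides. It assumes a black-box simulator `simulate`, a tracking cost `cost`, and segments exposing a `ref_pose`; all of these names are placeholders, not the paper's implementation:

```python
import numpy as np

def samcon_pass(init_state, ref_segments, simulate, cost,
                n_samples=200, n_keep=20, noise_std=0.05, seed=0):
    """One SAMCON pass over the reference trajectory.

    For each delta-t segment: sample perturbed PD targets around the
    segment's reference pose, simulate them forward, and keep (resample)
    the trajectories that track the reference best.

    simulate(state, action, segment) -> next_state
    cost(state, segment)             -> distance to the reference (lower is better)
    segment.ref_pose                 -> reference pose vector for that segment
    """
    rng = np.random.default_rng(seed)
    particles = [(init_state, [])] * n_keep          # (state, actions-so-far)
    for segment in ref_segments:
        candidates = []
        for state, actions in particles:
            for _ in range(max(1, n_samples // n_keep)):
                # Sample a PD-control target near the reference pose.
                action = segment.ref_pose + noise_std * rng.standard_normal(len(segment.ref_pose))
                next_state = simulate(state, action, segment)
                candidates.append((cost(next_state, segment), next_state, actions + [action]))
        # Resampling: keep only the best-tracking samples.
        candidates.sort(key=lambda c: c[0])
        particles = [(s, a) for _, s, a in candidates[:n_keep]]
    return particles[0][1]                           # open-loop control trajectory
```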
Control Reconstruction
Linear Policy
• $\pi:\ \delta a = M\,\delta s + \hat{a}$, where $\delta s = s - \bar{s}$ is the deviation of the simulated state from the reference control trajectory
For Complex Motions
• Uniform segmentation of the control trajectory into control fragments
• A linear feedback policy per control fragment
Control Fragment
• A short control unit $\mathcal{F} : (\delta t, \hat{m}, \pi)$
• $\delta t$: duration, $\approx 0.1$ seconds
• $\hat{m}$: open-loop control segment
• $\pi$: linear feedback policy
Controller
• A chain of control fragments $\mathcal{F}_1, \mathcal{F}_2, \cdots, \mathcal{F}_L$
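A sketch of how a control fragment and its linear feedback policy could be represented; the field names (`delta_t`, `m_hat`, `M`, `s_bar`, `a_hat`) follow the notation above, but the structure itself is illustrative:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ControlFragment:
    """One control fragment F = (delta_t, m_hat, pi)."""
    delta_t: float        # duration, roughly 0.1 s
    m_hat: np.ndarray     # open-loop control segment (PD targets over the fragment)
    M: np.ndarray         # linear feedback gain matrix
    s_bar: np.ndarray     # reference state feature at the fragment boundary
    a_hat: np.ndarray     # open-loop action offset

    def feedback(self, s: np.ndarray) -> np.ndarray:
        """Linear policy: delta_a = M (s - s_bar) + a_hat."""
        return self.M @ (s - self.s_bar) + self.a_hat

# A controller is a chain of fragments executed one after another:
# controller = [fragment_1, fragment_2, ..., fragment_L]
```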
Guided Learning of Control Policies
• Regression: fit a feedback policy to multiple open-loop solutions
• Guided learning: regenerate the open-loop solutions under the current policy, then refit the policy
Example: Cyclical Motion
• The cycle is segmented into control fragments $\mathcal{F}_1, \dots, \mathcal{F}_4$
• SAMCON provides state-action samples $(s, a)$ for each fragment
Policy Update
• Regress each fragment's linear feedback policy from its sampled state-action pairs
Guided Learning Iterations
• Alternate guided SAMCON (generate new open-loop samples with the current feedback policy) and regression (refit the policy on those samples)
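A high-level sketch of the guided-learning iteration, assuming a `run_guided_samcon` helper that returns per-fragment state/action samples and the `ControlFragment` fields from the earlier sketch; ordinary least squares stands in here for the regression step named on the slide:

```python
import numpy as np

def guided_learning(fragments, run_guided_samcon, n_iterations=10):
    """Alternate guided SAMCON and per-fragment regression.

    run_guided_samcon(fragments) -> list of (fragment_index, s, a) samples
    taken from successful open-loop solutions generated with the current
    feedback policies.
    """
    for _ in range(n_iterations):
        samples = run_guided_samcon(fragments)                 # guided SAMCON
        for i, frag in enumerate(fragments):                   # regression
            S = np.array([s for j, s, _ in samples if j == i])
            A = np.array([a for j, _, a in samples if j == i])
            if len(S) == 0:
                continue
            # Least-squares fit of a = M (s - s_bar) + a_hat.
            X = np.hstack([S - frag.s_bar, np.ones((len(S), 1))])
            W, *_ = np.linalg.lstsq(X, A, rcond=None)
            frag.M, frag.a_hat = W[:-1].T, W[-1]
    return fragments
```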
Control Graph
• A graph whose nodes are control fragments
• Converted from a motion graph
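A minimal sketch of a control graph as an adjacency structure over control fragments (an illustrative data layout, not the talk's code):

```python
class ControlGraph:
    """A graph whose nodes are control fragments.

    `fragments` holds the nodes; `edges[i]` lists the fragments that
    may be executed after fragment i.
    """
    def __init__(self):
        self.fragments = []
        self.edges = {}

    def add_fragment(self, fragment):
        idx = len(self.fragments)
        self.fragments.append(fragment)
        self.edges[idx] = []
        return idx

    def add_transition(self, src, dst):
        self.edges[src].append(dst)
```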
Problem of Fixed Time-Indexed Tracking: with a fixed time index, the simulated state can drift outside the basin of attraction of the scheduled reference
Scheduling: instead, choose a reference whose basin of attraction covers the current simulated state
Scheduling: which control fragment should be executed next?
Deep Q-Learning
• Learns to perform good actions from raw image input using a deep convolutional network [Mnih et al. 2015, DQN]
A Q-Network For Scheduling
• Input state $s$: motion state, environmental state, user command (18~25 DoFs)
• Two fully connected hidden layers of 300 ReLUs each ($\max(0, z)$)
• Output: one Q-value per action; the action set is the control fragments (39~146 actions)
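A sketch of the described Q-network with two hidden layers of 300 ReLUs, written in PyTorch (the framework is my choice for illustration; the dimensions use the upper ends of the ranges quoted on the slide):

```python
import torch
import torch.nn as nn

class SchedulingQNet(nn.Module):
    """Q-network: state features -> one Q-value per control fragment."""

    def __init__(self, state_dim=25, n_actions=146):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 300), nn.ReLU(),   # 300 ReLUs
            nn.Linear(300, 300), nn.ReLU(),         # 300 ReLUs
            nn.Linear(300, n_actions),              # Q-values
        )

    def forward(self, state):
        return self.layers(state)

# q_values = SchedulingQNet()(torch.randn(1, 25))   # shape (1, 146)
```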
Training Pipeline: exploration/exploitation in simulation → rewards → replay buffer → batch SGD updates of the Q-network
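A compact sketch of the replay-buffer / batch-SGD update in the standard DQN style of [Mnih et al. 2015]; the target network, buffer size, and other hyper-parameters follow that generic recipe rather than the talk's exact settings:

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

replay_buffer = deque(maxlen=100_000)   # stores (s, a, r, s_next) tensors

def dqn_update(q_net, target_net, optimizer, batch_size=64, gamma=0.95):
    """One batch-SGD step on transitions sampled from the replay buffer."""
    if len(replay_buffer) < batch_size:
        return
    s, a, r, s_next = (torch.stack(x) for x in zip(*random.sample(replay_buffer, batch_size)))
    q = q_net(s).gather(1, a.long().view(-1, 1)).squeeze(1)            # Q(s, a)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values      # bootstrapped target
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```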
Reward Function: $R = E_{\mathrm{tracking}} + E_{\mathrm{preference}} + E_{\mathrm{feedback}} + E_{\mathrm{task}} + R_0$
Importance of the Reference Sequence: comparison between enforcing the original action sequence and not enforcing it
Tracking Penalty Term: out-of-sequence actions incur a penalty; in-sequence actions do not
Tracking Exploration Strategy
• With probability $\epsilon_1$: select a random action
• With probability $\epsilon_2$: select the in-sequence action
• Otherwise: select the greedy (highest Q-value) action
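A sketch of this exploration strategy; the names `eps_random` and `eps_track` and their values are placeholders for the two probabilities on the slide, with the remaining probability mass going to the greedy action:

```python
import random

def select_action(q_values, in_sequence_action, eps_random=0.1, eps_track=0.2):
    """Exploration biased toward the reference (in-sequence) action.

    q_values: list/array of Q-values, one per control fragment.
    """
    u = random.random()
    if u < eps_random:
        return random.randrange(len(q_values))                        # random action
    if u < eps_random + eps_track:
        return in_sequence_action                                     # in-sequence action
    return max(range(len(q_values)), key=lambda i: q_values[i])       # greedy action
```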
Bongo Board Balancing: action sequence
Effect of Feedback Policy: open-loop control fragments vs. feedback-augmented fragments
Discover New Transitions
Running
Tripping
Skateboarding
Skateboarding
Walking On A Ball
Push-Recovery
Conclusion: Motion Clip → Open-loop Tracking Control → Feedback Policy → Control Scheduler

Libin Liu, Michiel van de Panne, and KangKang Yin. 2016. Guided Learning of Control Graphs for Physics-Based Characters. ACM Trans. Graph. 35, 3, Article 29 (May 2016), 14 pages.
Libin Liu and Jessica Hodgins. 2017. Learning to Schedule Control Fragments for Physics-Based Characters Using Deep Q-Learning. ACM Trans. Graph. 36, 3, Article 29 (June 2017), 14 pages.
Future Work
• Statistical/generative models [Holden et al. 2017]
• Control with raw simulation state and terrain information [Peng et al. 2017, DeepLoco] [Heess et al. 2017]
• Active human-object interaction: basketball, soccer, dancing, boxing, martial arts
Questions? Libin Liu http://libliu.info DeepMotion Inc http://deepmotion.com