Harnessing Wake Vortices for Efficient Collective Swimming via Deep Reinforcement Learning
Siddhartha Verma, with Guido Novati and Petros Koumoutsakos
CSE Lab, ETH Zurich: http://www.cse-lab.ethz.ch

Collective Swimming: the hydrodynamic benefit of schooling
Credit: Artbeats
Breder (1965), Weihs (1973,1975), Shaw (1978)
Svendsen (2003), Killen et al. (2011)
Hemelrijk et al. (2015), Daghooghi & Borazjani (2015), Maertens et al. (2017)
Prior Work @CSE Lab: "Vanilla" Reinforcement Learning. Goal: Follow the Leader (Novati et al., Bioinspir. Biomim. 2017)
Reinforcement learning: the agent learns through trial-and-error interaction with the environment.
$Q^\pi(s_t, a_t) = \mathbb{E}\left[\, r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots \mid a_k = \pi(s_k)\ \forall k > t \,\right] = \mathbb{E}\left[\, r_{t+1} + \gamma\, Q^\pi(s_{t+1}, \pi(s_{t+1})) \,\right]$ (Bellman, 1957)
Q(s, a): the expected return from taking an ACTION in a given STATE.
Q-values are updated in previously visited states.
Credit: https://www.cs.utexas.edu/~eladlieb/RLRG.html
At each iteration:
Acting: interact with the environment and collect transitions (s, a, r, s').
Learning: descend the gradient of the squared temporal-difference error
$\frac{\partial}{\partial w} \left( r + \gamma \max_{a'} Q(s', a', w^-) - Q(s, a, w) \right)^2$, refreshing the target weights periodically: $w^- \leftarrow w$.
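A minimal tabular sketch of this update (assumptions: integer states and actions, a Q-table `w` standing in for the network weights, and `w_minus` as the frozen target copy; all names are illustrative):

import numpy as np

n_states, n_actions = 16, 5
w = np.zeros((n_states, n_actions))   # online Q-values: Q(s, a, w) = w[s, a]
w_minus = w.copy()                    # frozen target weights: w- <- w
gamma, lr = 0.9, 0.1

def td_update(s, a, r, s_next):
    # TD target built from the frozen copy w-
    target = r + gamma * w_minus[s_next].max()
    # d/dw (target - w[s, a])^2 = -2 (target - w[s, a]);
    # stepping against the gradient gives the familiar update:
    w[s, a] += lr * (target - w[s, a])

td_update(s=0, a=2, r=1.0, s_next=3)
# every few iterations: w_minus = w.copy()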
State: the relative displacement (Δx, Δy) and orientation θ.
Action: turn and modulate velocity by controlling body deformation.
          η      Speed   CoT    PDef
Smart     1.32   1.11    0.64   0.71
Solo      1      1       1      1
(values normalized by the solo swimmer)
Figure: behavior during the first 10,000 transitions vs. the last 10,000 transitions of training.
Figure: swimming efficiency η over time t for the leader and the follower.
Swimming via Reinforcement Learning: an effective and robust method for harnessing energy from unsteady flow.
NEXT: energy-efficient swarms of drones?
Two fish swimming together in Greece.
Two fish swimming together in the Swiss supercomputer.
Note: Reward allotted here has no connection to relative displacement
Rossinelli et al., J. Comput. Phys. (2015); Angot et al., Numerische Mathematik (1999)
$\frac{\partial \omega}{\partial t} + \underbrace{u \cdot \nabla \omega}_{\text{advection}} = \underbrace{\omega \cdot \nabla u}_{=\,0 \text{ in 2D}} + \underbrace{\nu \nabla^2 \omega}_{\text{diffusion}} + \underbrace{\lambda\, \nabla \times \left( \chi\, (u_s - u) \right)}_{\text{penalization}}$
(Chorin 1968)
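To illustrate the penalization term, a minimal finite-difference sketch (assumptions: a uniform periodic grid of spacing h, velocity fields stored as (2, ny, nx) arrays, central differences for the curl; not the solver of Rossinelli et al.):

import numpy as np

def penalization_term(chi, u_s, u, lam, h):
    # f = chi * (u_s - u): drives the fluid velocity u toward the body
    # velocity u_s where chi = 1 (inside the body); zero where chi = 0
    fx = chi * (u_s[0] - u[0])
    fy = chi * (u_s[1] - u[1])
    # 2D curl of f: d(fy)/dx - d(fx)/dy, with periodic wrap via np.roll
    dfy_dx = (np.roll(fy, -1, axis=1) - np.roll(fy, 1, axis=1)) / (2 * h)
    dfx_dy = (np.roll(fx, -1, axis=0) - np.roll(fx, 1, axis=0)) / (2 * h)
    return lam * (dfy_dx - dfx_dy)   # contribution to d(omega)/dt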
Rossinelli et al., SC'13 Proc. Int. Conf. High Perf. Comput., Denver, Colorado
Reward: vertical displacement
$R_{\Delta y} = 1 - \frac{|\Delta y|}{0.5\,L}$ (R > 0 for |Δy| < 0.5 L, R < 0 beyond), with a terminal reward $R_{\text{end}} = -1$.
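In code, a direct transcription (assumptions: L is the body length and dy the vertical offset; the names are illustrative):

def reward_displacement(dy, L):
    # positive for |dy| < 0.5 L, negative beyond
    return 1.0 - abs(dy) / (0.5 * L)

R_END = -1.0   # terminal reward from the slide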
Reward: efficiency
$R_\eta = \frac{P_{\text{thrust}}}{P_{\text{thrust}} + \max(P_{\text{def}},\, 0)} = \frac{T\,|u_{CM}|}{T\,|u_{CM}| + \max\left( \int_{\partial\Omega} F(x) \cdot u_{\text{def}}(x)\, dx,\; 0 \right)}$
where $P_{\text{thrust}} = T\,|u_{CM}|$ is the thrust power and $P_{\text{def}} = \int_{\partial\Omega} F \cdot u_{\text{def}}\, dx$ is the deformation power.
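A corresponding sketch (assumptions: the thrust T, the center-of-mass speed |u_CM|, and the surface integral of F · u_def are scalars already evaluated by the flow solver):

def reward_efficiency(thrust, u_cm_speed, def_power_integral):
    p_thrust = thrust * u_cm_speed         # thrust power T |u_CM|
    p_def = max(def_power_integral, 0.0)   # deformation power, clipped at 0
    return p_thrust / (p_thrust + p_def)   # R_eta in (0, 1]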
Example: maze solving
State: the agent's position (A)
Actions: go U, D, L, R
Reward: 0 at the terminal state
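A runnable toy version (assumptions: a 4x4 open grid rather than a true maze, a step reward of -1 with 0 at the terminal state, and epsilon-greedy acting):

import numpy as np

rng = np.random.default_rng(0)
N = 4
goal = (N - 1, N - 1)
moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # U, D, L, R
Q = np.zeros((N, N, 4))
gamma, lr, eps = 0.95, 0.5, 0.1

def step(s, a):
    r, c = s
    dr, dc = moves[a]
    nxt = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
    return nxt, (0.0 if nxt == goal else -1.0), nxt == goal

for _ in range(500):                        # episodes
    s, done = (0, 0), False
    while not done:
        a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * Q[s2].max()
        Q[s][a] += lr * (target - Q[s][a])  # update at the visited state
        s = s2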
Figure: network architecture with three stacked LSTM layers; the output layer produces one Q-value q for each of the five actions (a1) through (a5).
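A sketch of such a network (assumptions: PyTorch, a 4-dimensional state, hidden width 64; the sizes in the actual study may differ):

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, state_dim=4, hidden=64, n_actions=5):
        super().__init__()
        # three stacked LSTM layers, as in the figure
        self.lstm = nn.LSTM(state_dim, hidden, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)  # q(a1) ... q(a5)

    def forward(self, states, hc=None):
        out, hc = self.lstm(states, hc)  # states: (batch, time, state_dim)
        return self.head(out), hc        # Q-values: (batch, time, n_actions)

q_net = RecurrentQNet()
q, _ = q_net(torch.zeros(1, 10, 4))      # one trajectory of 10 observations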
Figure: the body deformation follows a traveling curvature wave (traveling spline), shown at phases c, c+¼, c+½, c+¾, c+1 for both decreased and increased local curvature.
Actions form a chain: the effect of reducing or increasing the local curvature depends on when the action is made, as the sketch below illustrates.
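A minimal sketch of the timing dependence (assumptions: a sinusoidal traveling wave of period Tp whose amplitude the discrete actions nudge up or down; the actual parameterization uses splines):

import numpy as np

def curvature(s, t, amp, Tp=1.0, L=1.0):
    # traveling wave running from head (s = 0) to tail (s = L)
    return amp * np.sin(2.0 * np.pi * (t / Tp - s / L))

s = np.linspace(0.0, 1.0, 11)            # points along the midline
amp = 1.0
for k, action in enumerate([+0.1, -0.1, +0.1, -0.1]):  # chain of actions
    t = 0.25 * k                         # decisions at phases c, c+1/4, ...
    amp += action                        # increase / decrease curvature
    midline = curvature(s, t, amp)       # the same action taken at another
                                         # phase acts on a differently bent
                                         # body, so its effect differs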
Rossinelli et al., J. Comput. Phys. (2015)