SLIDE 1

ANR ExTra-Learn

Extraction and Transfer of Knowledge in Reinforcement Learning

A. LAZARIC

ANR project kick-off meeting, Paris

SequeL

INRIA Lille – Nord Europe

November 4th, 2014

SLIDE 2

Consortium

INRIA Lille – Nord Europe, SequeL Team

A. Lazaric (CR1)

J. Mary (MdC)

M. Valko (CR1)

R. Munos (DR1)

PhD Student

Post-doc (2 yrs)

ANR “Jeunes Chercheurs Jeunes Chercheuses” Programme

SLIDE 3

Reinforcement Learning

[Diagram: an Agent interacts with an Environment and a Critic; the arrows are labeled action, observation, and reward.]

SLIDE 4

Reinforcement Learning

[Diagram: a Learning Agent interacts with an Environment and a Critic; the arrows are labeled action, observation, and reward.]

Learning of a behavior strategy (a policy) which maximizes the long term sum of rewards (delayed reward) by a direct interaction (trial-and-error) with an unknown and uncertain environment.
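The interaction loop and the "long term sum of rewards" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5-state corridor `ChainEnvironment`, the helper names `run_episode` and `discounted_return`, and the discount factor 0.9 are hypothetical and do not come from the slides. For brevity the environment also plays the role of the critic and returns the reward itself.

```python
def discounted_return(rewards, gamma=0.9):
    """Long-term sum of rewards: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class ChainEnvironment:
    """Hypothetical corridor with states 0..n_states-1.

    The only reward (1.0) is delayed until the right end is reached.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left (blocked at state 0), action 1 moves right
        if action == 1:
            self.state += 1
        else:
            self.state = max(0, self.state - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """One trial: the agent acts, then receives an observation and a reward."""
    observation = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


# A fixed "always move right" policy; a learning agent would adapt it from experience.
rewards = run_episode(ChainEnvironment(), lambda observation: 1)
print(rewards)                      # [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards))   # about 0.729, i.e. 0.9 ** 3
```

The reward arrives only at the last step, which is what makes it delayed: every earlier action has to be judged by its effect on the discounted sum rather than by its immediate reward.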

SLIDE 5

Reinforcement Learning

[Diagram: an Agent interacts with a Task and a Critic; the arrows are labeled action, observation, and reward. Additional labels: prior knowledge, designer.]

SLIDE 6

Transfer in Reinforcement Learning

[Diagram: past knowledge from Task 1 … Task n goes through a Transfer step and reaches the Agent as transferred knowledge; the Agent interacts with Task n+1 and a Critic through action, observation, and reward.]

Transfer of knowledge across tasks to improve the performance of the learning process
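One simple form of transfer, shown here only as a sketch, is to initialize the value estimates of a new task from those learned on a past task. The code below assumes a hypothetical 5-state corridor, tabular Q-learning with a uniformly random behavior policy, and two tasks that differ only in the size of the goal reward. None of these choices comes from the slides, and this is not presented as the project's method.

```python
import random

N_STATES = 5      # corridor states 0..4; state 4 is the goal (hypothetical toy task)
GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate


def step(state, action, goal_reward):
    """Deterministic corridor: action 0 moves left, action 1 moves right."""
    next_state = state + 1 if action == 1 else max(0, state - 1)
    done = next_state == N_STATES - 1
    return next_state, (goal_reward if done else 0.0), done


def q_learning(goal_reward, q_init, episodes, seed=0, max_steps=200):
    """Off-policy Q-learning driven by a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [row[:] for row in q_init]          # copy, so the initial table is untouched
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.randrange(2)
            next_state, reward, done = step(state, action, goal_reward)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_steps_to_goal(q, goal_reward, max_steps=20):
    """Follow the greedy policy; return the step count, or None if the goal is missed."""
    state = 0
    for t in range(1, max_steps + 1):
        action = 1 if q[state][1] > q[state][0] else 0   # ties go left
        state, _, done = step(state, action, goal_reward)
        if done:
            return t
    return None


zeros = [[0.0, 0.0] for _ in range(N_STATES)]
source_q = q_learning(goal_reward=1.0, q_init=zeros, episodes=300)   # past task
transferred_q = [row[:] for row in source_q]                         # knowledge handed to the new task

# New task: same corridor, larger goal reward, and no interaction with it yet.
print(greedy_steps_to_goal(zeros, goal_reward=5.0))          # None: an untrained agent never reaches the goal
print(greedy_steps_to_goal(transferred_q, goal_reward=5.0))  # 4: the transferred table already acts optimally
```

The benefit here depends entirely on the two tasks sharing the same optimal behavior. When tasks differ more, a transferred initialization can slow learning down, and deciding what to transfer becomes the hard part.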

SLIDE 7

Objectives

ExTra-Learn (2014-2017)

Objective 1: Reduce sample complexity

Objective 2: Improve accuracy

Objective 3: Solve problems with complex structure

SLIDE 8

Tasks

ExTra-Learn

Objective 1: Reduce sample complexity
Task 1: Transfer of Exploration-Exploitation Strategies

Objective 2: Improve accuracy
Task 2: Transfer Solutions for Approximated RL

Objective 3: Solve problems with complex structure
Task 3: Hierarchical Transfer RL

SLIDE 9

Expected Results

ExTra-Learn

Objective 1: Reduce sample complexity
Task 1: Transfer of Exploration-Exploitation Strategies
Expected result: Algorithms with provably smaller regret

Objective 2: Improve accuracy
Task 2: Transfer Solutions for Approximated RL
Expected result: Algorithms with provably smaller prediction error

Objective 3: Solve problems with complex structure
Task 3: Hierarchical Transfer RL
Expected result: Models and algorithms for automatic hierarchical decomposition
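The slide does not define regret. A common definition in the multi-armed bandit setting is the cumulative gap between the mean reward of the best arm and the mean reward of the arms actually chosen. The sketch below assumes that definition; the function name `cumulative_regret` and the arm means are hypothetical.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Running sum of (best mean - mean of the chosen arm) after each round."""
    best = max(arm_means)
    total = 0.0
    curve = []
    for arm in chosen_arms:
        total += best - arm_means[arm]
        curve.append(total)
    return curve


arm_means = [0.2, 0.5, 0.8]                       # hypothetical expected rewards
explore_first = cumulative_regret(arm_means, [0, 1, 2, 2])
informed = cumulative_regret(arm_means, [2, 2, 2, 2])

print(explore_first[-1])   # about 0.9: the two exploratory pulls cost 0.6 and 0.3
print(informed[-1])        # 0.0: the best arm was chosen from the first round
```

Under this definition, "smaller regret" means fewer or cheaper suboptimal choices. That is the quantity a learner could hope to reduce if knowledge from earlier tasks lets it skip part of the exploration.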

SLIDE 10

Expected Impact

Novel learning algorithms with potential application to

recommendation systems, games, education

online trading, autonomous robotics, online advertising, energy management…

SLIDE 11

ExTra-Learn

https://project.inria.fr/ExTra-Learn/ (under construction)

Agence Nationale de Recherche (ANR) Paris

www.inria.fr
