Option Discovery in the Absence of Rewards with Manifold Analysis
Amitay Bar, Ronen Talmon and Ron Meir
Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology
[Sutton et al. '99]
[Figure: example graph with 7 states, State = Node]
M: adjacency matrix
D: degree matrix (diagonal entries 2, 2, 3, 4, 2, 2, 1)
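$W = \frac{1}{2}\left(I + M D^{-1}\right)$
$f_t : S \to \mathbb{R}$, $f_t(s) = \left\| \sum_{i \ge 2} \omega_i^t \, \phi_i(s) \, \tilde{\phi}_i \right\|^2$ (to be motivated later)
Find the local maxima of $f_t(s)$, denoted as $s_o^{(i)}$
For each $s_o^{(i)}$, build an option leading to it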
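The first step, building the random walk matrix from M and D, can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: it assumes W = (1/2)(I + M D^{-1}) for an undirected, connected graph, and the function name `lazy_random_walk` and the 4-state path graph are hypothetical.

```python
import numpy as np

def lazy_random_walk(M):
    """Return W = 0.5 * (I + M D^{-1}) for a symmetric adjacency matrix M."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=0)                            # diagonal of the degree matrix D
    return 0.5 * (np.eye(M.shape[0]) + M / deg)    # M / deg scales column j by 1/deg[j]

# Hypothetical example: path graph 0 - 1 - 2 - 3 (not the graph from the slides).
M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = lazy_random_walk(M)
pi0 = M.sum(axis=0) / M.sum()                      # degree-proportional distribution

print(W.sum(axis=0))                               # column sums of W
print(np.allclose(W @ pi0, pi0))                   # checks that pi0 is stationary under W
```

With this convention W is column-stochastic, so `W @ p` advances a distribution over states by one step of the walk.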
$f_t$ allows the identification of goal states
The local maxima of $f_t(s)$ are at states that are "far away" from all other states
$f_t(s) = \left\| \sum_{i \ge 2} \omega_i^t \, \phi_i(s) \, \tilde{\phi}_i \right\|^2$
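One way to evaluate $f_t$ and pick out its local maxima is sketched below. The sketch rests on several assumptions that the slides do not spell out: W = (1/2)(I + M D^{-1}), $\phi_i$ and $\tilde{\phi}_i$ are taken to be the left and right eigenvectors of W (scaled so that $W = \sum_i \omega_i \tilde{\phi}_i \phi_i^T$), the norm is Euclidean, and a local maximum is a state whose value exceeds that of all its graph neighbors. The names `f_t` and `local_maxima` and the path graph are hypothetical.

```python
import numpy as np

def f_t(M, t):
    """f_t(s) = || sum_{i>=2} w_i^t phi_i(s) phi~_i ||^2, returned for every state s."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=0)
    # W = D^{1/2} S D^{-1/2} with S symmetric, so W and S share eigenvalues.
    S = 0.5 * (np.eye(len(M)) + M / np.sqrt(np.outer(deg, deg)))
    w, V = np.linalg.eigh(S)                  # eigenvalues in ascending order
    w, V = w[::-1], V[:, ::-1]                # reorder so w[0] = 1 is the trivial mode
    right = np.sqrt(deg)[:, None] * V         # columns: right eigenvectors of W (phi~_i)
    left = V / np.sqrt(deg)[:, None]          # columns: left eigenvectors of W (phi_i)
    # Column s of `vecs` is sum_{i>=2} w_i^t * phi_i(s) * phi~_i.
    vecs = (right[:, 1:] * w[1:] ** t) @ left[:, 1:].T
    return (vecs ** 2).sum(axis=0)

def local_maxima(f, M):
    """States whose f value is strictly larger than that of every neighbor."""
    M = np.asarray(M)
    return [s for s in range(len(f))
            if all(f[s] > f[n] for n in np.flatnonzero(M[s]))]

# Hypothetical example: path graph with 7 states (not the graph from the slides).
M = np.diag(np.ones(6), 1)
M = M + M.T
f = f_t(M, t=4)
print(np.round(f, 4))
print(local_maxima(f, M))                     # candidate goal states
```

Working with the symmetric matrix S keeps the decomposition numerically stable; the eigenvectors of W are recovered by rescaling with the square roots of the degrees.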
[Figure: $f_{13}(s)$ and $f_4(s)$]
Low pass filter effect
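The low-pass effect can be illustrated numerically. Assuming W = (1/2)(I + M D^{-1}), each eigenvalue of W is $\omega = 1 - \lambda/2$, where $\lambda$ is the corresponding eigenvalue of $I - M D^{-1}$, so modes with high graph frequency (large $\lambda$) get small $\omega$ and are suppressed by the factor $\omega^t$. The eigenvalues below are made up for illustration.

```python
import numpy as np

# Hypothetical graph frequencies (eigenvalues of I - M D^{-1}), from low to high.
lam = np.array([0.0, 0.2, 1.0, 1.8])
omega = 1.0 - lam / 2.0                 # corresponding eigenvalues of W

for t in (4, 13):
    # Per-mode weight omega_i^t that enters f_t.
    print(t, np.round(omega ** t, 4))
```

Larger t shrinks the weight of every non-trivial mode, and the high-frequency modes vanish first, which is why $f_{13}$ is smoother than $f_4$.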
[Figure: normalized visitation during learning for diffusion options (t=4), eigenoptions, and random walk]
* Further results in the paper
[Watkins and Dayan, '92]
[Machado et al. '17]
[Machado et al. '17] [Jinnai et al. '19]
together
[Figure: Euclidean distance vs. diffusion distance]
$D_t(s, s') = \left\| p_t^{(s)} - p_t^{(s')} \right\|$
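A diffusion distance between two states can be computed by comparing the distributions reached after t steps of the walk from each of them. The sketch below assumes $p_t^{(s)} = W^t e_s$ with W = (1/2)(I + M D^{-1}) and an unweighted Euclidean norm; the slides may use a different weighting. The function name and the two-triangle graph are hypothetical.

```python
import numpy as np

def diffusion_distances(M, t):
    """Matrix of D_t(s, s') = ||p_t^(s) - p_t^(s')|| (Euclidean norm assumed)."""
    M = np.asarray(M, dtype=float)
    W = 0.5 * (np.eye(len(M)) + M / M.sum(axis=0))
    P = np.linalg.matrix_power(W, t)          # column s is p_t^(s) = W^t e_s
    diff = P[:, :, None] - P[:, None, :]      # diff[:, s, s'] = p_t^(s) - p_t^(s')
    return np.sqrt((diff ** 2).sum(axis=0))

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
M = np.zeros((6, 6))
for a, b in edges:
    M[a, b] = M[b, a] = 1

D4 = diffusion_distances(M, t=4)
print(np.round(D4[0], 3))                     # distances from state 0 to every state
```

Unlike a distance between state coordinates, this quantity depends only on how the states are connected in the graph.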
$W = \frac{1}{2}\left(I + M D^{-1}\right)$
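Proposition 1
The function $f_t : S \to \mathbb{R}$ can be expressed as
$f_t(s) = \left\langle D_t^2(s, s') \right\rangle_{s' \in S} + \mathrm{const}$
where $\left\langle D_t^2(s, s') \right\rangle_{s' \in S}$ is the average diffusion distance between state $s$ and all other states
*See ICML paper for the proof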
$\arg\max_{s} f_t(s) = \arg\max_{s} \left\langle D_t^2(s, s') \right\rangle_{s' \in S}$
Exploration benefits
Proposition 2
Relates $f_t(s)$ to $\pi_0$, the stationary distribution of the graph
$f_t(s) = \left\| p_t^{(s)} - \pi_0 \right\|^2$
$f_t(s) \le \omega_2^{2t} \left( \frac{1}{\pi_0(s)} - 1 \right)$
*See ICML paper for the proof
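The bound can be checked numerically on a small graph. The sketch below is an illustration, not the proof: it assumes W = (1/2)(I + M D^{-1}), $p_t^{(s)} = W^t e_s$, a Euclidean norm, and the degree-proportional stationary distribution of a connected undirected graph. The function name `f_and_bound` and the path graph are hypothetical.

```python
import numpy as np

def f_and_bound(M, t):
    """Return f_t(s) = ||p_t^(s) - pi_0||^2 and omega_2^(2t) * (1/pi_0(s) - 1)."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=0)
    W = 0.5 * (np.eye(len(M)) + M / deg)
    pi0 = deg / deg.sum()                          # stationary distribution of W
    P = np.linalg.matrix_power(W, t)               # column s is p_t^(s)
    f = ((P - pi0[:, None]) ** 2).sum(axis=0)
    omega = np.sort(np.linalg.eigvals(W).real)[::-1]   # eigenvalues, largest first
    bound = omega[1] ** (2 * t) * (1.0 / pi0 - 1.0)
    return f, bound

# Hypothetical example: path graph with 7 states (not the graph from the slides).
M = np.diag(np.ones(6), 1)
M = M + M.T
f, bound = f_and_bound(M, t=4)
print(np.all(f <= bound))                          # does the bound hold at every state?
```

States with a small stationary probability $\pi_0(s)$ get the loosest bound, so they are the ones where $f_t$ is allowed to be large.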
"Option Discovery in the Absence of Rewards with Manifold Analysis",