  1. Learning Light Transport the Reinforced Way. Ken Dahm and Alexander Keller

  2. Light Transport Simulation: How to do importance sampling
     compute functionals of a Fredholm integral equation of the 2nd kind
     $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} L(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \, d\omega_i $$

  3. Light Transport Simulation: How to do importance sampling
     example: direct illumination
     $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \, d\omega_i $$

  4. Light Transport Simulation: How to do importance sampling
     example: direct illumination
     $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \, d\omega_i $$
     $$ \approx L_e(x, \omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)} $$

  5. Light Transport Simulation: How to do importance sampling
     example: direct illumination
     $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \, d\omega_i $$
     $$ \approx L_e(x, \omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)} $$
     $$ p \sim f_r \cos\theta $$

  6. Light Transport Simulation: How to do importance sampling
     example: direct illumination
     $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i \, d\omega_i $$
     $$ \approx L_e(x, \omega) + \frac{1}{N} \sum_{i=0}^{N-1} \frac{L_e(h(x, \omega_i), -\omega_i)\, f_r(\omega_i, x, \omega) \cos\theta_i}{p(\omega_i)} $$
     $$ p \sim f_r \cos\theta \qquad\qquad p \sim L_e\, f_r \cos\theta $$
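
A minimal sketch of the estimator on slides 4 to 6, assuming a hypothetical scene interface (`sample_direction`, `trace`, `emitted_radiance`, `brdf`, `cos_theta`) and scalar, single-channel radiance; none of these names come from the slides, and directions are assumed to be numpy-like vectors so that `-omega_i` is well defined.

```python
def direct_illumination(x, omega_out, scene, n_samples=16):
    """Monte Carlo estimate of L(x, omega_out) for direct light only:
    L_e + (1/N) * sum of f_r * cos(theta) * L_e(hit) / p(omega_i)."""
    # Emitted radiance at x itself (first summand of the estimator).
    result = scene.emitted_radiance(x, omega_out)
    acc = 0.0
    for _ in range(n_samples):
        # Draw a direction omega_i with density p(omega_i); ideally p is
        # proportional to f_r * cos(theta) (slide 5) or even to
        # L_e * f_r * cos(theta) (slide 6) to reduce variance.
        omega_i, pdf = scene.sample_direction(x, omega_out)
        if pdf <= 0.0:
            continue
        y = scene.trace(x, omega_i)                 # y = h(x, omega_i)
        le = scene.emitted_radiance(y, -omega_i)
        fr = scene.brdf(omega_i, x, omega_out)
        cos_theta = max(0.0, scene.cos_theta(x, omega_i))
        acc += le * fr * cos_theta / pdf
    return result + acc / n_samples
```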

  7. Machine Learning

  8. Machine Learning: Taxonomy
     supervised learning: learning from labeled data
     – goal: extrapolate/generalize response to unseen data
     – example: artificial neural networks

  9. Machine Learning: Taxonomy
     supervised learning: learning from labeled data
     – goal: extrapolate/generalize response to unseen data
     – example: artificial neural networks
     unsupervised learning: learning from unlabeled data
     – goal: identify structure in data
     – example: autoencoder networks

  10. Machine Learning: Taxonomy
      supervised learning: learning from labeled data
      – goal: extrapolate/generalize response to unseen data
      – example: artificial neural networks
      unsupervised learning: learning from unlabeled data
      – goal: identify structure in data
      – example: autoencoder networks
      semi-supervised learning: reward-based supervision
      – goal: maximize reward
      – example: reinforcement learning

  11. The Reinforcement Learning Problem: Maximize reward
      policy $\pi_t : \mathcal{S} \to \mathcal{A}(S_t)$
      – to select an action $A_t \in \mathcal{A}(S_t)$
      – given the current state $S_t \in \mathcal{S}$
      [diagram: agent/environment loop, the environment returns the next state $S_{t+1}$ and the reward $R_{t+1}(A_t \mid S_t)$ for the chosen action $A_t$]

  12. The Reinforcement Learning Problem: Maximize reward
      policy $\pi_t : \mathcal{S} \to \mathcal{A}(S_t)$
      – to select an action $A_t \in \mathcal{A}(S_t)$
      – given the current state $S_t \in \mathcal{S}$
      state transition yields the reward $R_{t+1}(A_t \mid S_t) \in \mathbb{R}$
      [diagram: agent/environment loop, the environment returns the next state $S_{t+1}$ and the reward $R_{t+1}(A_t \mid S_t)$ for the chosen action $A_t$]

  13. The Reinforcement Learning Problem: Maximize reward
      policy $\pi_t : \mathcal{S} \to \mathcal{A}(S_t)$
      – to select an action $A_t \in \mathcal{A}(S_t)$
      – given the current state $S_t \in \mathcal{S}$
      state transition yields the reward $R_{t+1}(A_t \mid S_t) \in \mathbb{R}$
      classic goal: find a policy that maximizes the discounted cumulative reward
      $$ V^{\pi}(S_t) \equiv \sum_{k=0}^{\infty} \gamma^k R_{t+1+k}(A_{t+k} \mid S_{t+k}), \quad \text{where } 0 < \gamma < 1 $$
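
For concreteness, the discounted return above can be accumulated backwards over a finite episode; this tiny helper only illustrates the formula and is not part of the talk.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute sum_k gamma^k * R_{t+1+k} for a finite episode,
    i.e. the cumulative reward V^pi(S_t) truncated at the episode end."""
    g = 0.0
    # Iterate backwards so each step is a single multiply-add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: three rewards observed after state S_t.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```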

  14. The Reinforcement Learning Problem: Q-Learning [Watkins 1989]
      finds the optimal action-selection policy for any given Markov decision process
      $$ Q'(s, a) = (1 - \alpha) \cdot Q(s, a) + \alpha \cdot \big( r(s, a) + \gamma V(s') \big) $$
      for a learning rate $\alpha \in [0, 1]$

  15. The Reinforcement Learning Problem: Q-Learning [Watkins 1989]
      finds the optimal action-selection policy for any given Markov decision process
      $$ Q'(s, a) = (1 - \alpha) \cdot Q(s, a) + \alpha \cdot \big( r(s, a) + \gamma V(s') \big) $$
      for a learning rate $\alpha \in [0, 1]$, with the following options for the discounted cumulative reward
      $$ V(s') \equiv \max_{a' \in \mathcal{A}} Q(s', a') \qquad \text{(consider best action)} $$

  16. The Reinforcement Learning Problem: Q-Learning [Watkins 1989]
      finds the optimal action-selection policy for any given Markov decision process
      $$ Q'(s, a) = (1 - \alpha) \cdot Q(s, a) + \alpha \cdot \big( r(s, a) + \gamma V(s') \big) $$
      for a learning rate $\alpha \in [0, 1]$, with the following options for the discounted cumulative reward
      $$ V(s') \equiv \begin{cases} \max_{a' \in \mathcal{A}} Q(s', a') & \text{consider best action} \\ \sum_{a' \in \mathcal{A}} \pi(s', a')\, Q(s', a') & \text{policy-weighted average over a discrete action space} \end{cases} $$

  17. The Reinforcement Learning Problem: Q-Learning [Watkins 1989]
      finds the optimal action-selection policy for any given Markov decision process
      $$ Q'(s, a) = (1 - \alpha) \cdot Q(s, a) + \alpha \cdot \big( r(s, a) + \gamma V(s') \big) $$
      for a learning rate $\alpha \in [0, 1]$, with the following options for the discounted cumulative reward
      $$ V(s') \equiv \begin{cases} \max_{a' \in \mathcal{A}} Q(s', a') & \text{consider best action} \\ \sum_{a' \in \mathcal{A}} \pi(s', a')\, Q(s', a') & \text{policy-weighted average over a discrete action space} \\ \int_{\mathcal{A}} \pi(s', a')\, Q(s', a')\, da' & \text{policy-weighted average over a continuous action space} \end{cases} $$
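
A minimal tabular Q-learning sketch of the update above, using the "consider best action" choice for V(s'); the environment interface (`reset`, `step`) and the finite action list are placeholder assumptions, not anything defined in the slides.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning with V(s') = max over the next actions."""
    Q = defaultdict(float)  # maps (state, action) -> estimated value
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy policy derived from the current Q table
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(s, a)
            # V(s') = max_{a'} Q(s', a'); replace this line with a
            # policy-weighted sum/integral for the other two options.
            v_next = 0.0 if done else max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * v_next)
            s = s_next
    return Q
```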

  18. Light Transport Simulation and Reinforcement Learning

  19.–23. Light Transport Simulation and Reinforcement Learning: Structural equivalence of the integral equations
      matching terms
      $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} f_r(\omega_i, x, \omega) \cos\theta_i \, L(h(x, \omega_i), -\omega_i) \, d\omega_i $$
      $$ Q'(s, a) = (1 - \alpha)\, Q(s, a) + \alpha \Big( r(s, a) + \gamma \int_{\mathcal{A}} \pi(s', a')\, Q(s', a')\, da' \Big) $$

  24. Light Transport Simulation and Reinforcement Learning: Structural equivalence of the integral equations
      matching terms
      $$ L(x, \omega) = L_e(x, \omega) + \int_{\mathcal{S}^2_+(x)} f_r(\omega_i, x, \omega) \cos\theta_i \, L(h(x, \omega_i), -\omega_i) \, d\omega_i $$
      $$ Q'(s, a) = (1 - \alpha)\, Q(s, a) + \alpha \Big( r(s, a) + \gamma \int_{\mathcal{A}} \pi(s', a')\, Q(s', a')\, da' \Big) $$
      hints at learning the incident radiance
      $$ Q'(x, \omega) = (1 - \alpha)\, Q(x, \omega) + \alpha \Big( L_e(y, -\omega) + \int_{\mathcal{S}^2_+(y)} f_r(\omega_i, y, -\omega) \cos\theta_i \, Q(y, \omega_i) \, d\omega_i \Big) $$
      as a policy for selecting an action $\omega$ in state $x$ to reach the next state $y := h(x, \omega)$
      – the learning rate $\alpha$ is the only parameter left
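
A sketch of how such an update might look inside a path tracer, assuming a tabular Q[state][patch] that stores the learned incident radiance per surface region and hemispherical patch; `state_of`, `patch_of`, `patch_center_direction`, `patch_solid_angle` and the scene calls are hypothetical helpers, and the hemispherical integral is approximated by one term per stored patch, so this illustrates the update rule rather than the authors' implementation.

```python
def update_q(Q, scene, x, omega, y, alpha=0.2):
    """One reinforcement-learning update of the learned incident radiance:
    Q'(x, omega) = (1 - alpha) Q(x, omega)
                 + alpha * (L_e(y, -omega)
                            + sum over patches of f_r * cos(theta) * Q(y, omega_i) * dOmega)."""
    # state_of / patch_of are hypothetical discretization helpers (see slides 25-27).
    s, a = state_of(x), patch_of(omega)
    # Approximate the integral over the hemisphere above y by summing the
    # stored Q values over its equally sized patches.
    incident = 0.0
    for patch, q_val in enumerate(Q[state_of(y)]):
        omega_i = patch_center_direction(y, patch)
        fr = scene.brdf(omega_i, y, -omega)
        cos_theta = max(0.0, scene.cos_theta(y, omega_i))
        incident += fr * cos_theta * q_val * patch_solid_angle(patch)
    target = scene.emitted_radiance(y, -omega) + incident
    Q[s][a] = (1.0 - alpha) * Q[s][a] + alpha * target
```

Normalizing Q(x, ·) over the patches then yields a discrete density for picking the next scattering direction, which is one way the learned table can serve as the policy mentioned on the slide.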

  25. Light Transport Simulation and Reinforcement Learning: Discretization of Q in analogy to irradiance representations
      action space: $a \in \mathcal{S}^2_+(y)$
      – equally sized patches
      $$ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1 - u^2} \cos(2\pi v) \\ \sqrt{1 - u^2} \sin(2\pi v) \\ u \end{pmatrix} $$

  26. Light Transport Simulation and Reinforcement Learning: Discretization of Q in analogy to irradiance representations
      action space: $a \in \mathcal{S}^2_+(y)$
      – equally sized patches
      $$ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1 - u^2} \cos(2\pi v) \\ \sqrt{1 - u^2} \sin(2\pi v) \\ u \end{pmatrix} $$
      state space: $s \in \partial V$
      – Voronoi diagram of a low-discrepancy sequence
      $$ x_i = \begin{pmatrix} \Phi_2(i) \\ i/N \end{pmatrix} \quad \text{for } i = 0, \ldots, N-1 $$
      – nearest neighbor search

  27. Light Transport Simulation and Reinforcement Learning: Discretization of Q in analogy to irradiance representations
      action space: $a \in \mathcal{S}^2_+(y)$
      – equally sized patches
      $$ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{1 - u^2} \cos(2\pi v) \\ \sqrt{1 - u^2} \sin(2\pi v) \\ u \end{pmatrix} $$
      state space: $s \in \partial V$
      – Voronoi diagram of a low-discrepancy sequence
      $$ x_i = \begin{pmatrix} \Phi_2(i) \\ i/N \end{pmatrix} \quad \text{for } i = 0, \ldots, N-1 $$
      – nearest neighbor search including the surface normal
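
A small sketch of this discretization, assuming the cylindrical hemisphere mapping above and a plain nearest-neighbor lookup as a stand-in for the Voronoi-cell assignment; the local frame handling and the surface-normal comparison are omitted, and all function names are illustrative only.

```python
import math

def patch_to_direction(u, v):
    """Map (u, v) in [0,1)^2 onto the unit hemisphere (local frame, z up),
    following x = sqrt(1-u^2) cos(2 pi v), y = sqrt(1-u^2) sin(2 pi v), z = u.
    The mapping is area preserving, so a uniform (u, v) grid gives equally
    sized solid-angle patches."""
    r = math.sqrt(max(0.0, 1.0 - u * u))
    return (r * math.cos(2.0 * math.pi * v),
            r * math.sin(2.0 * math.pi * v),
            u)

def radical_inverse_base2(i):
    """Van der Corput radical inverse Phi_2(i), used for the point set below."""
    result, f = 0.0, 0.5
    while i:
        result += f * (i & 1)
        i >>= 1
        f *= 0.5
    return result

def low_discrepancy_points(n):
    """2D low-discrepancy points x_i = (Phi_2(i), i/N), as on the slide."""
    return [(radical_inverse_base2(i), i / n) for i in range(n)]

def nearest_state(p, points):
    """Nearest-neighbor lookup standing in for the Voronoi-cell assignment;
    a real renderer would also compare surface normals and use a spatial index."""
    return min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, points[i])))
```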
