CS 287 Lecture 18 (Fall 2019) RL I: Policy Gradients
Pieter Abbeel UC Berkeley EECS
Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics
Outline for Today's Lecture
- Super-quick Refresher: Markov Decision Processes (MDPs)
- Reinforcement Learning
- Policy Optimization
- Model-free Policy Optimization: Finite Differences
- Model-free Policy Optimization: Cross-Entropy Method
- Model-free Policy Optimization: Policy Gradients
  - Policy Gradient basic derivation
  - Temporal decomposition
  - Baseline subtraction
  - Value function estimation
  - Advantage Estimation (A2C/A3C/GAE)
  - Trust Region Policy Optimization (TRPO)
  - Proximal Policy Optimization (PPO)
Super-quick Refresher: Markov Decision Processes (MDPs)
n
Reinforcement Learning
n
Policy Optimization
n
Model-free Policy Optimization: Finite Differences
n
Model-free Policy Optimization: Cross- Entropy Method
n
Model-free Policy Optimization: Policy Gradients
n
Policy Gradient basic derivation
n
Temporal decomposition
n
Baseline subtraction
n
Value function estimation
n
Advantage Estimation (A2C/A3C/GAE)
n
Trust Region Policy Optimization (TRPO)
n
Proximal Policy Optimization (PPO)
[Drawing from Sutton and Barto, Reinforcement Learning: An Introduction, 1998]
Assumption: the agent gets to observe the state.
Given:
- S: set of states
- A: set of actions
- T: S × A × S × {0, 1, …, H} → [0, 1], T_t(s, a, s') = P(s_{t+1} = s' | s_t = s, a_t = a)
- R: S × A × S × {0, 1, …, H} → ℝ, R_t(s, a, s') = reward for (s_{t+1} = s', s_t = s, a_t = a)
- γ ∈ (0, 1]: discount factor
- H: horizon over which the agent will act
Goal:
- Find π*: S × {0, 1, …, H} → A that maximizes the expected sum of rewards, i.e., π* = argmax_π E[ Σ_{t=0}^{H} γ^t R_t(s_t, a_t, s_{t+1}) | π ]
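To make the objective concrete, here is a minimal sketch. The 2-state, 2-action MDP and all its numbers are hypothetical, chosen purely for illustration, and the reward is simplified to R(s, a); it computes U(π) = E[ Σ_t γ^t R ] for a fixed stochastic policy by finite-horizon policy evaluation.

```python
# Minimal sketch: expected discounted return of a fixed policy in a tiny MDP.
# The 2-state / 2-action MDP below is made up, purely for illustration.
import numpy as np

n_S, n_A, H, gamma = 2, 2, 10, 0.9
T = np.zeros((n_S, n_A, n_S))              # T[s, a, s'] = P(s' | s, a)
T[0, 0] = [0.9, 0.1]; T[0, 1] = [0.2, 0.8]
T[1, 0] = [0.7, 0.3]; T[1, 1] = [0.1, 0.9]
R = np.zeros((n_S, n_A)); R[1, 1] = 1.0    # reward for action 1 in state 1
pi = np.array([[0.5, 0.5], [0.1, 0.9]])    # pi[s, a] = prob of action a in state s

# Finite-horizon policy evaluation, backwards over the H+1 decision steps:
# V_t(s) = E[ sum_{k=t}^{H} gamma^(k-t) R | s_t = s, pi ]
V = np.zeros(n_S)
for t in range(H + 1):
    Q = R + gamma * (T @ V)                # Q[s, a] = R(s, a) + gamma * sum_s' T V(s')
    V = (pi * Q).sum(axis=1)               # average over the policy's action choice
print("U(pi) starting from s0 = 0:", V[0])
```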
Section: Reinforcement Learning
[Figure source: Sutton & Barto, 1998]
Still an MDP, BUT the MDP is not given to us: the agent needs to learn to act from its own experience interacting with the environment.
Section: Policy Optimization
- Consider a control policy parameterized by a parameter vector θ: max_θ U(θ) = max_θ E[ Σ_{t=0}^{H} R(s_t, u_t); π_θ ]
- Stochastic policy class (smooths out the optimization problem): π_θ(u|s) = probability of taking action u_t = u in state s_t = s
[Figure source: Sutton & Barto, 1998]
Why policy optimization?
- π can often be simpler than Q or V
  - E.g., robotic grasp
- V: doesn't prescribe actions
  - Would need a dynamics model (+ compute 1 Bellman back-up)
- Q: need to be able to efficiently solve max_u Q(s, u)
  - Challenge for continuous / high-dimensional action spaces*
*some recent work (partially) addressing this: NAF: Gu, Lillicrap, Sutskever, Levine, ICML 2016; Input Convex NNs: Amos, Xu, Kolter, arXiv 2016; Deep Energy Q: Haarnoja, Tang, Abbeel, Levine, ICML 2017
Policy optimization success stories: Kohl and Stone, 2004; Tedrake et al, 2005; Kober and Peters, 2009; Ng et al, 2004; Silver et al, 2014 (DPG); Lillicrap et al, 2015 (DDPG); Schulman et al, 2016 (TRPO + GAE); Levine*, Finn*, et al, 2016 (GPS); Mnih et al, 2015 (A3C); Silver*, Huang*, et al, 2016 (AlphaGo)
Policy optimization vs. dynamic programming / value-based methods:
- Policy optimization: optimize what you care about; more compatible with rich architectures (including recurrence); more versatile; more compatible with auxiliary objectives
- Dynamic programming: indirect, exploits the problem structure and self-consistency; more compatible with exploration and off-policy learning; more sample-efficient when it works
Recall from earlier lectures: iLQR; optimization-based control (collocation, shooting, MPC).
→ But these assumed access to the dynamics model, which we often don't have; today we optimize the policy from roll-outs alone.
Section: Model-free Policy Optimization: Finite Differences
Evaluating U(θ) = E_{π_θ}[R(τ)] from sampled roll-outs: comparing two parameter settings is far more reliable when both are evaluated under a fixed random seed.
- Randomness enters through both the policy and the dynamics
- But we can often only control the randomness in the policy…
- Example: wind influence on a helicopter is stochastic, but if we evaluate different policies under the same wind sequence, the comparison is much less noisy
- Note: equally applicable to evolutionary methods
[Ng & Jordan, 2000] provide a theoretical analysis of the gains from fixing randomness ("PEGASUS")
[Andrew Ng] [Video: SNAKE, climbStep + sidewinding]
[Videos: AIBO WALK, showing the initial gait, a learning trial, and the final gait after learning (1K trials); Kohl and Stone, ICRA 2004]
- Can work well!
- Most success in low-dimensional spaces…
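A minimal sketch of the finite-differences estimator behind these results, assuming a black-box roll-out evaluator eval_return(theta, seed) (a hypothetical name, not the lecture's code) and, per the PEGASUS point above, a shared random seed on both sides of each difference:

```python
# Minimal sketch: estimate the policy gradient by central finite differences.
# eval_return(theta, seed) is an assumed black box that runs a roll-out and
# returns its total reward; fixing the seed fixes the "wind" across evaluations.
import numpy as np

def finite_diff_gradient(eval_return, theta, eps=1e-2, seed=0):
    """theta: flat parameter vector; returns an estimate of grad U(theta)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        # Same seed on both sides: only the parameter change affects the outcome.
        grad[i] = (eval_return(theta + e, seed)
                   - eval_return(theta - e, seed)) / (2 * eps)
    return grad

# Each gradient estimate costs 2 * dim(theta) roll-outs -- fine in low
# dimensions (e.g., a ~dozen-parameter gait), hopeless for large networks.
```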
Section: Model-free Policy Optimization: Cross-Entropy Method
Simplest approach (hill climbing):
- Make some random change to the parameters
- If the result improves, keep the change
- Repeat
Cross-Entropy Method: sample parameter vectors θ_i from a distribution (e.g., a Gaussian with mean µ and standard deviation σ), evaluate R(τ_i) for each, refit µ and σ to the top-performing samples, and repeat.
Properties:
- Very simple and can work surprisingly well
- Very scalable
- Does not take advantage of any temporal structure
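A minimal sketch of the cross-entropy method under these assumptions (eval_return is again a hypothetical black-box roll-out return; Gaussian sampling distribution refit to the elite fraction):

```python
# Minimal sketch of the cross-entropy method over policy parameters.
# eval_return(theta) is an assumed black-box roll-out return; no gradients,
# no temporal structure -- just sample, rank, refit.
import numpy as np

def cem(eval_return, dim, iters=50, pop=100, elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        thetas = mu + sigma * rng.standard_normal((pop, dim))  # sample population
        returns = np.array([eval_return(th) for th in thetas])
        elite = thetas[np.argsort(returns)[-n_elite:]]         # top-performing samples
        mu = elite.mean(axis=0)                                # refit the Gaussian
        sigma = elite.std(axis=0) + 1e-3                       # noise floor avoids collapse
    return mu
```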
Section: Model-free Policy Optimization: Policy Gradients
Likelihood Ratio Policy Gradient
[Aleksandrov, Sysoyev, & Shemeneva, 1968] [Rubinstein, 1969] [Glynn, 1986] [REINFORCE, Williams 1992] [GPOMDP, Baxter & Bartlett, 2001]
U(θ) = E[ R(τ); π_θ ] = Σ_τ P(τ; θ) R(τ)
∇_θ U(θ) = Σ_τ ∇_θ P(τ; θ) R(τ) = Σ_τ P(τ; θ) ∇_θ log P(τ; θ) R(τ)
≈ ĝ = (1/m) Σ_{i=1}^{m} ∇_θ log P(τ^(i); θ) R(τ^(i)), for roll-outs τ^(i) sampled from π_θ
Moreover, ∇_θ log P(τ; θ) = Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t | s_t): the dynamics terms drop out of the gradient, so no dynamics model is needed.
This estimator is valid even when:
- R is discontinuous and/or unknown
- the sample space (of paths) is a discrete set
The gradient tries to:
- Increase the probability of paths with positive R
- Decrease the probability of paths with negative R
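A minimal sketch of the resulting REINFORCE estimator on a hypothetical 2-state toy chain (environment and constants invented for illustration). Note that the per-trajectory gradient is the sum of per-step ∇_θ log π_θ terms, weighted by the whole-trajectory return:

```python
# Minimal sketch of the likelihood-ratio (REINFORCE) estimator:
#   g_hat = (1/m) sum_i grad log P(tau_i; theta) * R(tau_i),
# with grad log P(tau; theta) = sum_t grad log pi_theta(u_t | s_t).
# The 2-state chain environment below is made up, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_S, n_A, H = 2, 2, 5

def step(s, u):                        # toy dynamics + reward
    s_next = u if rng.random() < 0.9 else 1 - u
    return s_next, float(s_next == 1)  # reward for reaching state 1

def policy_probs(theta, s):            # tabular softmax policy, theta: (n_S, n_A)
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

theta, alpha, m = np.zeros((n_S, n_A)), 0.1, 64
for it in range(200):
    g_hat = np.zeros_like(theta)
    for _ in range(m):
        s, grad_logp, R_tau = 0, np.zeros_like(theta), 0.0
        for t in range(H):
            p = policy_probs(theta, s)
            u = rng.choice(n_A, p=p)
            grad_logp[s] -= p          # d log pi(u|s) / d theta[s,:] = onehot(u) - p
            grad_logp[s, u] += 1.0
            s, r = step(s, u)
            R_tau += r
        g_hat += grad_logp * R_tau     # whole-trajectory return weights the whole logprob
    theta += alpha * g_hat / m         # gradient ascent on U(theta)
```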
Section: Policy Gradient Importance Sampling Derivation
Policy gradient via importance sampling:
U(θ) = E_{τ ∼ θ_old} [ ( P(τ|θ) / P(τ|θ_old) ) R(τ) ]
∇_θ U(θ) = E_{τ ∼ θ_old} [ ( ∇_θ P(τ|θ) / P(τ|θ_old) ) R(τ) ]
∇_θ U(θ)|_{θ = θ_old} = E_{τ ∼ θ_old} [ ( ∇_θ P(τ|θ)|_{θ_old} / P(τ|θ_old) ) R(τ) ] = E_{τ ∼ θ_old} [ ∇_θ log P(τ|θ)|_{θ_old} R(τ) ]
→ recovers the likelihood ratio gradient.
Note: this suggests we can also look at more than just the gradient! E.g., can use the importance-sampled objective as a "surrogate loss" (locally) [→ later: PPO]
[Tang & Abbeel, NeurIPS 2011]
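A minimal numerical check of the first identity above on a one-step ("bandit") trajectory, where P(τ|θ) reduces to the softmax action probability; rewards and parameters are made up for illustration:

```python
# Minimal sketch: the importance-sampled objective
#   U(theta) = E_{tau ~ theta_old}[ P(tau|theta)/P(tau|theta_old) * R(tau) ]
# agrees with the exact on-policy value, here on a one-step "trajectory"
# so that P(tau|theta) is just the softmax action probability.
import numpy as np

rng = np.random.default_rng(0)
R = np.array([1.0, 3.0, 0.5])                        # toy reward per action

def probs(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta_old = np.array([0.0, 0.0, 0.0])
theta     = np.array([1.0, 0.0, -1.0])
u = rng.choice(3, p=probs(theta_old), size=100_000)  # sample under theta_old

ratio = probs(theta)[u] / probs(theta_old)[u]        # P(tau|theta) / P(tau|theta_old)
print("IS estimate of U(theta):", np.mean(ratio * R[u]))
print("exact U(theta):         ", probs(theta) @ R)
```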
Section: Baseline Subtraction and Temporal Structure
- As formulated thus far: unbiased but very noisy
- Fixes that lead to real-world practicality:
  - Baseline
  - Temporal structure
  - [later] Trust region / natural gradient
- The gradient tries to increase the probability of paths with positive R and decrease the probability of paths with negative R
→ Consider a baseline b:
∇U(θ) ≈ ĝ = (1/m) Σ_{i=1}^{m} ∇_θ log P(τ^(i); θ) R(τ^(i))
∇U(θ) ≈ ĝ = (1/m) Σ_{i=1}^{m} ∇_θ log P(τ^(i); θ) ( R(τ^(i)) - b )   ← still unbiased! [Williams 1992]
Why subtracting b leaves the estimator unbiased:
E[ ∇_θ log P(τ; θ) b ] = Σ_τ P(τ; θ) ∇_θ log P(τ; θ) b = Σ_τ P(τ; θ) ( ∇_θ P(τ; θ) / P(τ; θ) ) b = Σ_τ ∇_θ P(τ; θ) b = b ∇_θ ( Σ_τ P(τ; θ) ) = b ∇_θ 1 = b × 0 = 0
OK as long as the baseline doesn't depend on the action inside log π(action | state).
Exploiting temporal structure:
ĝ = (1/m) Σ_{i=1}^{m} ∇_θ log P(τ^(i); θ) ( R(τ^(i)) - b )
  = (1/m) Σ_{i=1}^{m} ( Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t^(i) | s_t^(i)) ) ( Σ_{t=0}^{H-1} R(s_t^(i), u_t^(i)) - b )
  = (1/m) Σ_{i=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t^(i) | s_t^(i)) ( [ Σ_{k=0}^{t-1} R(s_k^(i), u_k^(i)) ] + [ Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) ] - b )
Rewards accrued before time t don't depend on u_t^(i), so in expectation their contribution vanishes (same argument as for the baseline); likewise the baseline may depend on s_t^(i) but must not depend on u_t^(i). This leaves the lower-variance estimator:
ĝ = (1/m) Σ_{i=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t^(i) | s_t^(i)) ( Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) - b(s_t^(i)) )
[Policy Gradient Theorem: Sutton et al, NIPS 1999; GPOMDP: Bartlett & Baxter, JAIR 2001; Survey: Peters & Schaal, IROS 2006]
Good choices for b?
- Constant baseline: b = E[R(τ)] ≈ (1/m) Σ_{i=1}^{m} R(τ^(i))
- Optimal constant baseline: b = ( Σ_i ‖∇_θ log P(τ^(i); θ)‖² R(τ^(i)) ) / ( Σ_i ‖∇_θ log P(τ^(i); θ)‖² )
- Time-dependent baseline: b_t = (1/m) Σ_{i=1}^{m} Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i))
- State-dependent expected return: b(s_t) = E[ r_t + r_{t+1} + … + r_{H-1} | s_t ] = V^π(s_t)
[See: Greensmith, Bartlett, Baxter, JMLR 2004 for variance reduction techniques.]
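A minimal sketch combining the temporal decomposition with the time-dependent baseline b_t from this list; the array shapes and input names are assumptions for illustration:

```python
# Minimal sketch: per-timestep gradient terms with reward-to-go and a
# time-dependent baseline b_t (average reward-to-go across roll-outs).
# rewards: (m, H) array; logp_grads: per-step grad log pi terms, shape
# (m, H, dim) for an assumed dim-dimensional flat theta.
import numpy as np

def pg_with_baseline(rewards, logp_grads):
    m, H = rewards.shape
    # reward-to-go: r2g[i, t] = sum_{k >= t} rewards[i, k]
    r2g = np.flip(np.cumsum(np.flip(rewards, axis=1), axis=1), axis=1)
    b_t = r2g.mean(axis=0)        # time-dependent baseline (keeps the estimator unbiased)
    adv = r2g - b_t               # centered reward-to-go
    # g_hat = (1/m) sum_i sum_t grad log pi(u_t|s_t) * adv[i, t]
    return np.einsum('itd,it->d', logp_grads, adv) / m
```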
Section: Value Function Estimation
Now with V^π as the baseline:
ĝ = (1/m) Σ_{i=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t^(i) | s_t^(i)) ( Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) - V^π(s_t^(i)) )
How to estimate V^π?
Monte Carlo estimation of V^π:
- Init φ_0
- Collect trajectories τ_1, …, τ_m
- Regress against the empirical return:
  φ_{i+1} ← argmin_φ (1/m) Σ_{i=1}^{m} Σ_{t=0}^{H-1} ( V^π_φ(s_t^(i)) - Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) )²
Bootstrap (TD) estimation of V^π:
- Collect data {s, u, s', r}
- Fitted V iteration:
  φ_{i+1} ← min_φ Σ_{(s,u,s',r)} ‖ r + V^π_{φ_i}(s') - V_φ(s) ‖₂² + λ ‖φ - φ_i‖₂²
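A minimal sketch of the Monte Carlo variant with a linear value function; the feature map feat is an assumed input (a neural network fitted by SGD would be the drop-in replacement):

```python
# Minimal sketch: Monte Carlo value-function fitting with linear features.
# We regress V_phi(s_t) = w^T feat(s_t) onto empirical returns-to-go.
import numpy as np

def fit_value_mc(states, rewards, feat):
    """states: (m, H, ...) roll-out states; rewards: (m, H); feat: s -> feature vector."""
    m, H = rewards.shape
    # empirical return-to-go for every (i, t)
    r2g = np.flip(np.cumsum(np.flip(rewards, axis=1), axis=1), axis=1)
    X = np.array([feat(states[i, t]) for i in range(m) for t in range(H)])
    y = r2g.reshape(-1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # argmin_w ||X w - y||^2
    return w                                   # V(s) ~= feat(s) @ w
```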
Section: Advantage Estimation (A2C/A3C/GAE)
Estimation of Q^π from a single roll-out:
ĝ = (1/m) Σ_{i=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_θ(u_t^(i) | s_t^(i)) ( Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) - V^π(s_t^(i)) )
- The return-to-go Σ_{k=t}^{H-1} R(s_k^(i), u_k^(i)) is a single-roll-out estimate of Q^π(s_t^(i), u_t^(i)): high variance per sample, and no generalization across states is used
- Reduce variance by discounting
- Reduce variance by function approximation (= critic): bootstrap after k steps with the critic, Q̂_t^(k) = r_t + γ r_{t+1} + … + γ^{k-1} r_{t+k-1} + γ^k V(s_{t+k})
- The k-step advantage estimates Â_t^(k) = r_t + γ r_{t+1} + … + γ^{k-1} r_{t+k-1} + γ^k V(s_{t+k}) - V(s_t) trade off bias (small k leans on the critic) against variance (large k leans on the roll-out)
- Generalized Advantage Estimation uses an exponentially weighted average of all the above:
  Â_t^GAE = (1 - λ) ( Â_t^(1) + λ Â_t^(2) + λ² Â_t^(3) + … )
- ~ TD(λ)
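A minimal sketch of GAE, using the standard identity that this λ-weighted average of k-step advantages equals a (γλ)-discounted sum of TD residuals:

```python
# Minimal sketch of Generalized Advantage Estimation: the exponentially
# weighted (lambda) average of k-step advantages collapses to a discounted
# sum of TD residuals delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """rewards: (H,); values: (H+1,), with values[H] the bootstrap (0 if terminal)."""
    H = len(rewards)
    deltas = rewards + gamma * values[1:] - values[:-1]  # TD residuals
    adv = np.zeros(H)
    running = 0.0
    for t in reversed(range(H)):                         # A_t = delta_t + gamma*lam*A_{t+1}
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

# lam = 0 gives the one-step TD advantage (low variance, biased by V);
# lam = 1 gives the full-roll-out advantage (unbiased given V, high variance).
```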
Actor-critic (A2C/A3C-style):
- Init φ_0, θ_0
- Collect roll-outs {s, u, s', r} and estimates Q̂_i(s, u)
- Update critic:
  φ_{i+1} ← min_φ Σ_{(s,u,s',r)} ‖ Q̂_i(s, u) - V^π_φ(s) ‖₂² + κ ‖φ - φ_i‖₂²
- Update actor:
  θ_{i+1} ← θ_i + α (1/m) Σ_{k=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_{θ_i}(u_t^(k) | s_t^(k)) ( Q̂_i(s_t^(k), u_t^(k)) - V^π_{φ_i}(s_t^(k)) )
Note: many variations, e.g. could instead use 1-step for V, full roll-out for π:
  φ_{i+1} ← min_φ Σ_{(s,u,s',r)} ‖ r + V^π_{φ_i}(s') - V_φ(s) ‖₂² + λ ‖φ - φ_i‖₂²
  θ_{i+1} ← θ_i + α (1/m) Σ_{k=1}^{m} Σ_{t=0}^{H-1} ∇_θ log π_{θ_i}(u_t^(k) | s_t^(k)) ( Σ_{t'=t}^{H-1} r_{t'}^(k) - V^π_{φ_i}(s_t^(k)) )
[A3C: Mnih et al, ICML 2016]
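A minimal sketch of one such iteration, in the full-roll-out variant with a linear critic; the helper names feat and grad_logpi are assumptions for illustration, not the lecture's code:

```python
# Minimal sketch of one actor-critic iteration: fit V_phi to empirical
# returns-to-go, take a policy gradient step with advantages = return-to-go
# minus the critic's prediction.
import numpy as np

def actor_critic_step(theta, w, rollouts, feat, grad_logpi, alpha=1e-2):
    """rollouts: list of (states, actions, rewards) for m trajectories.
    feat(s): critic feature vector; grad_logpi(theta, s, u): grad log pi term."""
    X, y = [], []
    g = np.zeros_like(theta)
    for states, actions, rewards in rollouts:
        H = len(rewards)
        r2g = np.flip(np.cumsum(np.flip(rewards)))             # returns-to-go
        V = np.array([feat(s) @ w for s in states])            # critic predictions
        adv = r2g - V                                          # advantage estimates
        for t in range(H):                                     # accumulate actor gradient
            g += grad_logpi(theta, states[t], actions[t]) * adv[t]
        X.extend(feat(s) for s in states)                      # critic regression data
        y.extend(r2g)
    w_new, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    theta_new = theta + alpha * g / len(rollouts)
    return theta_new, w_new
```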
Examples:
- Likelihood Ratio Policy Gradient: [Tedrake, Zhang and Seung, 2005] [Video: TODDLER, 40s]
- n-step Advantage Estimation: [Schulman et al, 2016: GAE]
Section: Trust Region Policy Optimization (TRPO)
Step-sizing:
- Step-sizing is necessary because the gradient is only a first-order approximation
- Bad step sizes are always an issue, but they are far more costly in RL than in supervised learning:
  - Supervised learning: step too far → the next update will correct for it
  - Reinforcement learning: step too far → terrible policy; the next mini-batch is collected under this terrible policy! Not clear how to recover short of going back and shrinking the step size
- Simple step-sizing: line search in the direction of the gradient
  - Simple, but expensive (evaluations along the line)
  - Naive: ignores where the first-order approximation is good/poor
- Advanced step-sizing: trust regions, which restrict the update to the region where the first-order approximation from the gradient is good
Trust regions for policy gradients:
- Our problem: max_{δθ} ĝ⊤ δθ s.t. KL( P(τ; θ) || P(τ; θ+δθ) ) ≤ ε
- Recall: P(τ; θ) = P(s_0) Π_{t=0}^{H-1} π_θ(u_t|s_t) P(s_{t+1}|s_t, u_t)
- Hence:
  KL( P(τ; θ) || P(τ; θ+δθ) )
  = Σ_τ P(τ; θ) log [ P(τ; θ) / P(τ; θ+δθ) ]
  = Σ_τ P(τ; θ) log [ P(s_0) Π_{t=0}^{H-1} π_θ(u_t|s_t) P(s_{t+1}|s_t, u_t) / ( P(s_0) Π_{t=0}^{H-1} π_{θ+δθ}(u_t|s_t) P(s_{t+1}|s_t, u_t) ) ]
  = Σ_τ P(τ; θ) log [ Π_{t=0}^{H-1} π_θ(u_t|s_t) / Π_{t=0}^{H-1} π_{θ+δθ}(u_t|s_t) ]   ← the dynamics cancels out!
  ≈ (1/M) Σ_{s in roll-outs under θ} KL( π_θ(·|s) || π_{θ+δθ}(·|s) )
  ≈ (1/M) Σ_{(s,u) in roll-outs under θ} log [ π_θ(u|s) / π_{θ+δθ}(u|s) ]   (also sampling the actions)
- Our problem: max_{δθ} ĝ⊤ δθ s.t. KL( P(τ; θ) || P(τ; θ+δθ) ) ≤ ε
- Has become: max_{δθ} ĝ⊤ δθ s.t. (1/M) Σ_{(s,u) ∼ θ} log [ π_θ(u|s) / π_{θ+δθ}(u|s) ] ≤ ε
- How to enforce this constraint given complex policies like neural nets?
→ Take a 2nd-order approximation of the KL divergence:
- (1) the first-order term vanishes (the KL is zero and minimized at δθ = 0)
- (2) the Hessian is the Fisher information matrix F_θ:
  KL( π_θ(u|s) || π_{θ+δθ}(u|s) ) ≈ δθ⊤ ( Σ_{(s,u) ∼ θ} ∇_θ log π_θ(u|s) ∇_θ log π_θ(u|s)⊤ ) δθ = δθ⊤ F_θ δθ
- So the constrained problem becomes: max_{δθ} ĝ⊤ δθ s.t. δθ⊤ F_θ δθ ≤ ε
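A minimal sketch of the TRPO-style solution of this subproblem. Conjugate gradient needs only Fisher-vector products Avp(v) = F_θ v, never the matrix F_θ itself:

```python
# Minimal sketch: solve  max_{dtheta} g^T dtheta  s.t.  dtheta^T F dtheta <= eps
# via conjugate gradient on F x = g, as in TRPO. Avp(v) is an assumed callable
# computing F v (in TRPO, a Fisher-vector product from autodiff).
import numpy as np

def conjugate_gradient(Avp, b, iters=10, tol=1e-10):
    """Solve A x = b given only matrix-vector products Avp(v) = A v."""
    x, r = np.zeros_like(b), b.copy()   # start at x = 0, so residual r = b
    p, rs = r.copy(), r @ r
    for _ in range(iters):
        Ap = Avp(p)
        a = rs / (p @ Ap)
        x, r = x + a * p, r - a * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p, rs = r + (rs_new / rs) * p, rs_new
    return x

def trust_region_step(g, Avp, eps=0.01):
    d = conjugate_gradient(Avp, g)      # d ~= F^{-1} g, the natural gradient direction
    scale = np.sqrt(eps / (d @ Avp(d))) # rescale so dtheta^T F dtheta = eps
    return scale * d
```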
- Our problem: max_{δθ} ĝ⊤ δθ s.t. δθ⊤ F_θ δθ ≤ ε
- Done? In principle yes: the solution is the natural gradient direction δθ ∝ F_θ^{-1} ĝ. BUT:
  - Deep RL → θ has very many parameters, so explicitly forming and inverting F_θ is impractical
  - → Efficient scheme through conjugate gradient [Schulman et al, 2015, TRPO]
- Can we do even better?
  - Replace the objective by a surrogate loss that is a higher-order approximation yet equally efficient to evaluate [Schulman et al, 2015, TRPO]
  - Note: the surrogate loss idea is generally applicable when likelihood ratio gradients are used
TRPO surrogate objective:
max_π L(π) = E_{π_old} [ ( π(a|s) / π_old(a|s) ) A^{π_old}(s, a) ]
[Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
Atari results:
- Deep Q-Network (DQN) [Mnih et al, 2013/2015]
- Dagger with Monte Carlo Tree Search [Xiao-Xiao et al, 2014]
- Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
- …
[Learning curves: Pong, Enduro, Beamrider, Q*bert]
[Schulman, Moritz, Levine, Jordan, Abbeel, 2016]
Section: Proximal Policy Optimization (PPO)
Why PPO? TRPO's trust-region machinery is not easy to work with:
- Not easy to enforce the trust region constraint for complex policy architectures:
  - Networks that have stochasticity like dropout
  - Parameter sharing between policy and value function
- The conjugate gradient implementation is complex
- Would be good to harness good first-order optimizers like ADAM
PPO (adaptive KL-penalty version):
- Let r_t(θ) = π_θ(u_t|s_t) / π_{θ_old}(u_t|s_t)
- Optimize: L(θ) = Ê_t [ r_t(θ) Â_t ] - β Ê_t [ KL( π_{θ_old}(·|s_t) || π_θ(·|s_t) ) ]
- Do a dual descent update for β: increase β when the measured KL exceeds its target, decrease it when the KL falls below target
[Bansal et al, 2017]
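A minimal sketch of PPO surrogate losses built on the ratio r_t(θ): the adaptive-KL-penalty version sketched above and, for comparison, the clipped version from Schulman et al, 2017 (the 1.5×/2× adaptation constants follow that paper):

```python
# Minimal sketch of two PPO surrogate objectives on per-sample arrays:
#   ratio = pi_theta(u|s) / pi_theta_old(u|s),  adv = advantage estimates,
#   kl    = per-state KL(pi_old || pi_theta).   Both losses are maximized.
import numpy as np

def ppo_kl_penalty_loss(ratio, adv, kl, beta):
    # L = E[ r * A ] - beta * E[ KL(pi_old || pi_theta) ]
    return np.mean(ratio * adv) - beta * np.mean(kl)

def update_beta(beta, kl_mean, kl_target):
    # Dual-descent-style update: tighten the penalty if KL overshot the target,
    # loosen it if KL undershot (constants from Schulman et al, 2017).
    if kl_mean > 1.5 * kl_target:
        return beta * 2.0
    if kl_mean < kl_target / 1.5:
        return beta / 2.0
    return beta

def ppo_clip_loss(ratio, adv, clip_eps=0.2):
    # L = E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ]
    return np.mean(np.minimum(ratio * adv,
                              np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv))
```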