DS595/CS525: Reinforcement Learning
Introduction & Logistics
Prof. Yanhua Li
This lecture will be recorded!
Welcome to DS595/CS525: Reinforcement Learning, Introduction & Logistics. Prof. Yanhua Li.
Time: 6:00pm-8:50pm, Thursday. Zoom lecture, Fall 2020.
Who am I? Yanhua Li, PhD, Assistant Professor, Computer Science
v An advanced DS/CS course (primarily) for graduate students
v CS/DS Ph.D. students in AI, DM, ML, and related areas;
v then other Ph.D. students or MS students with
v experience in Machine Learning, or equivalent knowledge.
v Sufficient programming experience in Python is expected, so that you are comfortable undertaking the course projects.
v What is reinforcement learning?
v Differences from supervised and unsupervised machine learning?
v Application stories.

Break

v Topics to be covered in this course.
v Course logistics
[Diagram: Machine Learning comprises Supervised Learning, Unsupervised Learning, and Reinforcement Learning]
v What is reinforcement learning?
v Differences from other machine learning paradigms?
v Application stories.
v Topics to be covered in this course.
v Course logistics
v Programming all possibilities is not possible.
v The goal is to find an optimal way to make decisions, maximizing the total cumulative reward.
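The "total cumulative reward" objective is usually formalized as a discounted return. A minimal sketch in Python; the discount factor 0.9 and the function name are illustrative assumptions, not from the slides:

```python
# Discounted return: G = R_1 + gamma*R_2 + gamma^2*R_3 + ...
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, discounting the reward at step t by gamma**t."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```

With gamma < 1, rewards received sooner count more, which is one way the "optimal way to make decisions" trades off immediate versus future reward.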
– Optimization
– Generalization
– No Exploration
– Delayed consequences
– Optimization
– Objective: Reward (e.g., likelihood of winning the game)
– Generalization
– Applies to all possible scenarios
– No Exploration
– Delayed consequences
– A good move may lead to winning the game after multiple steps.
– Optimization
– Generalization
– No Exploration
– No Delayed consequences
– Optimization
– Objective: Minimize the classification loss
– Generalization
– From training data to testing data
– No Exploration
– No Delayed consequences
– Optimization
– Generalization
– No Exploration
– No Delayed consequences
– Optimization
– e.g., k-means: minimize the within-cluster distances to the centroids
– Generalization
– e.g., k-means: new data share the same clusters (centroids)
– No Exploration
– No Delayed consequences
– Optimization
– Generalization
– No Exploration
– Delayed consequences
Paths   Weather   Day of week   Traffic along path
#1      Rainy     Weekday       Moderate
#2      Clear     Weekday       Light
#3      Rainy     Weekend       Light
Given experts' demonstrations, inversely infer the experts' reward function.
– Optimization
– Objective: maximize the likelihood of the observed data
– Generalization
– New data from the expert match the learned reward function
– No Exploration
– Delayed consequences
– The same as RL
– Optimization
– Generalization
– Exploration
– Delayed consequences
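Exploration is what sets RL apart from the paradigms above: the agent must sometimes try non-greedy actions to discover better ones. A common recipe is epsilon-greedy action selection; a minimal sketch, with illustrative names and values (not from the slides):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon=0 this is purely greedy: action 1 has the highest value.
print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0))  # 1
```

Setting epsilon to 0 recovers the "No Exploration" behavior of supervised/unsupervised learning above; a positive epsilon trades off exploration against exploitation.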
[Diagram: Machine Learning comprises Supervised Learning, Unsupervised Learning, and Reinforcement Learning]
v What is reinforcement learning?
v Differences from supervised and unsupervised machine learning?
v Application stories.
v Topics to be covered in this course.
v Course logistics
[Diagram: Reinforcement Learning sits at the intersection of many fields: Computer Science (Machine Learning), Engineering (Optimal Control), Mathematics (Operations Research), Neuroscience (Reward System), Psychology (Classical/Operant Conditioning), and Economics (Bounded Rationality)]
– Multiple agents
– Cooperative game
– Multi-agent RL
Yexin Li (The Hong Kong University of Science and Technology); Yu Zheng (Urban Computing Business Unit, JD Finance); Qiang Yang (The Hong Kong University of Science and Technology). KDD 2019.
– Learn the reward function of normal drivers
– Detect malicious drivers whose driving deviates from the learned reward function
(Columbia University); Garud Iyengar (Columbia University). KDD 2019.
– A user submits a query to a search engine
– Advertisers "bid" on slots in an automated auction process
– Advertisers pay when their ads are clicked on by users.
Generating Better Search Engine Text Advertisements with Deep Reinforcement Learning. John Hughes, Keng-Hao Chang, and Ruofei Zhang. KDD 2019.
– Recommend products, news, and photo feeds to keep long-term user engagement
Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems. Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. KDD 2019.
v What is reinforcement learning?
v Differences from supervised and unsupervised machine learning?
v Application stories.
v Topics to be covered in this course.
v Course logistics
[Diagram: agent-environment interaction loop with action A_t, observation O_t, and reward R_t]

At each step t, the agent:
– Executes action A_t
– Receives observation O_t
– Receives scalar reward R_t
The environment:
– Receives action A_t
– Emits observation O_{t+1}
– Emits scalar reward R_{t+1}
t increments at the environment step.
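The interaction loop above can be sketched in plain Python with a toy environment. The class and method names here are illustrative assumptions, not any specific library's API:

```python
class ToyEnv:
    """Toy environment: reward +1 per step, episode ends after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t                      # initial observation O_0

    def step(self, action):
        self.t += 1                        # t increments at the environment step
        obs, reward = self.t, 1.0          # environment emits O_{t+1} and R_{t+1}
        done = self.t >= 3
        return obs, reward, done

env = ToyEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = 0                             # a fixed policy, just for illustration
    obs, reward, done = env.step(action)   # agent executes A_t, receives O and R
    total_reward += reward
print(total_reward)  # 3.0
```

A real RL agent would replace the fixed `action = 0` with a policy that maps observations to actions.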
[Diagram: taxonomy of RL agents: Value-Based (value function), Policy-Based (policy), and Actor-Critic (both), each either Model-Free or Model-Based (model)]
v Reward in tabular representation
v Model-based planning, policy evaluation, and control
v Model-free policy evaluation and control
v Monte Carlo, Temporal Difference, SARSA, Q-Learning
v Reward as a function representation
v Linear function: approximation and control
v Non-linear function (deep reinforcement learning) (Review DL)
v DQN (Deep Q-Learning), Policy Gradient, PPO, TRPO
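Of the tabular methods listed above, the Q-learning update can be sketched in a few lines. This is a minimal illustration with made-up states and values, not course project code:

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next])             # max over actions in the next state
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q

# Two states, two actions, all Q-values start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0][1])  # 0.5 * (1.0 + 0.9*0.0 - 0.0) = 0.5
```

SARSA differs only in the target: it uses the Q-value of the action actually taken next rather than the max over actions.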
v Imitation Learning (Inverse RL)
v Linear reward function
v Non-linear reward function
v Solution with Generative Adversarial Networks (GAN) (Review GAN)
v Applications/Extensions:
v Sequence generation (e.g., sentence generation)
v Relation to Auto-Encoders
v Meta-RL, Multi-Agent RL, Adversarial Attacks on RL/IRL, etc.
Logistics
v Willing to learn and work hard
v Love to ask questions and solve problems
v A lecture- and project-oriented course
v A series of lectures combining both theory and practice
– Track 1: Lectures
– Track 2: Projects
Logistics
v No textbook.
v Recommendation: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for free online; references will refer to the final PDF version.
v Reading materials on the class website (tentative; updated as we go along)
v Optional papers for background, supplementary, and further readings
v Will be posted on the class website after each class
Logistics
v Do assigned readings
– Be prepared; read and review required readings on your own in advance!
v Complete course projects
– Both individual and group projects
v Attend and participate in class activities
– Please ask and answer questions in (and out of) class!
– Let's try to make the class interactive and fun!
Logistics
v Class Website:
– https://users.wpi.edu/~yli15/courses/DS595CS525Fall20/index.html
v Announcement Page
– Check Canvas/Email periodically
v Email addresses for Q&As, discussions, etc.
– Professor: yli15@wpi.edu
– TA: yzhang31@wpi.edu
Logistics
v Professor Li's Office Hours:
– Zoom (link is available on Canvas)
– Email: yli15@wpi.edu
– Thu 10:30-11:30AM; Mon & Tue 10:30-11AM
– Others by appointment
v TA Yingxue Zhang's Office Hours:
– Zoom (link is available on Canvas)
– Email: yzhang31@wpi.edu
– Mon & Fri, 1-2PM
– Others by appointment
[Weekly schedule table: Prof. Li's office hours (lecture-related questions and general project questions) Mon & Tue 10:30-11am and Thu 10:30-11:30am, on Zoom; TA Yingxue's office hours (all questions, e.g., project, course materials, grading) Mon & Fri 1-2pm, on Zoom; lecture Thu 6-8:50pm, on Zoom]
Logistics
v Workload
– Oral work (10%)
– Quizzes, exams (30%): 5 quizzes
– Projects (60%):
– Project 1: 5%
– Project 2: 10%
– Project 3: 15%
– Project 4: 30%
v Focus more on critical thinking and problem solving
– Understand, formulate, and solve problems
– 4 course projects
(Frozen Lake)
○ The agent moves through a 4*4 gridworld
○ The agent has 4 potential actions:
■ LEFT = 0
■ DOWN = 1
■ RIGHT = 2
■ UP = 3
○ The action is stochastic:
■ Stochastic: the action may move the agent to one of several states, based on the transition probabilities.
■ Deterministic: the action always moves the agent to the intended state.
(Blackjack)
○ Obtain cards whose numerical values sum as close to 21 as possible without exceeding 21
○ Each state is a 3-tuple of:
■ The player's current sum
■ The dealer's face-up card
■ Whether or not the player has a usable ace
○ The agent has two potential actions:
■ STICK = 0
■ HIT = 1
(Cliff Walking)
○ The agent moves through a 4*12 gridworld
○ The agent has 4 potential actions:
■ LEFT = 0
■ DOWN = 1
■ RIGHT = 2
■ UP = 3
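The deterministic gridworld movement described above can be sketched in pure Python. This is a simplified stand-in for the course's actual project environments; the function name and the wall-clipping behavior are illustrative assumptions:

```python
# Action encoding from the slides: LEFT = 0, DOWN = 1, RIGHT = 2, UP = 3
# (row 0 is the top of the grid, so DOWN increases the row index).
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def move(state, action, n_rows=4, n_cols=12):
    """Deterministic step on an n_rows x n_cols grid; edges clip movement."""
    row, col = state
    dr, dc = MOVES[action]
    row = min(max(row + dr, 0), n_rows - 1)
    col = min(max(col + dc, 0), n_cols - 1)
    return (row, col)

print(move((0, 0), 2))  # RIGHT from (0, 0) -> (0, 1)
print(move((0, 0), 3))  # UP at the top edge stays at (0, 0)
```

A stochastic variant, as in the Frozen Lake description above, would sample the executed action from a transition distribution instead of always applying the chosen one.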
Logistics
v Projects will be in groups!
v 4 students per group, depending on enrollment
v "Research-oriented" project timeline (tentative!):
v Team project
v Starting date: Week 8 (R) 10/22
v Project proposal due: Week 10 (R) 11/5
v Project progress report: Week 12 (R) 11/19
v Project due: Week 15 (T) 12/8
v Final presentation: Week 15 (R) 12/10
Logistics
v Presentation
– https://users.wpi.edu/~yli15/courses/DS595CS525Fall20/Presentation.html
v More resources
– http://users.wpi.edu/~yli15/courses/DS595CS525Fall20/Resources.html