Reinforcement Learning:
Not Just for Robots and Games
Jibin Liu
Joint work with Giles Brown, Priya Venkateshan, Yabei Wu, and Heidi Lyons
Fig 1. AlphaGo. Source: deepmind.com
Fig 2. A visualisation of the AlphaStar agent during game two of the match against MaNa. Source: deepmind.com
Fig 3. Training a robotic arm to reach target locations in the real world. Source: Kindred AI
Fig 4. A visualisation of how the OpenAI Five agent is making value ... Source: openai.com
The problem: since we only want the targeted pages, a web spider wastes time and bandwidth crawling unnecessary pages (e.g., those in the red box) on its way from the Home page.
Where RL sits in Machine Learning
○ Reinforcement Learning
○ Supervised Learning
○ Unsupervised Learning
Example of dog training
Fig 5. Dog sit. Source: giphy.com
Dog training vs. Reinforcement Learning: the same Agent-Environment loop. The environment (the trainer) provides the state (command, voice, gesture, emotion) and the reward; the agent (the dog) responds with an action.
Elements of RL
○ Policy: maps the current state to an action; can be deterministic or stochastic
○ Reward: defines the immediate desirability of an event
○ Value function: how good a state (state value) or action (action value) is in the long term
Fig 6. The agent–environment interaction in a Markov decision process. Source: Reinforcement Learning: An Introduction, second edition
Experience: the trajectory S_t, A_t, R_t, S_{t+1}, A_{t+1}, ...
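To make the loop concrete, here is a minimal sketch (not from the talk) of an agent generating such experience; the ToyEnv environment and the random policy are hypothetical stand-ins:

import random

class ToyEnv:
    """Hypothetical 1-D walk: start at 0, step left/right, reach +3 to win."""
    def reset(self):
        self.pos = 0
        return self.pos                       # initial state S_0

    def step(self, action):                   # action: -1 (left) or +1 (right)
        self.pos += action
        done = abs(self.pos) == 3             # episode ends at either edge
        reward = 1 if self.pos == 3 else 0    # reward for this transition
        return self.pos, reward, done

env = ToyEnv()
state, done = env.reset(), False
experience = []                               # collects (S, A, R, S') tuples
while not done:
    action = random.choice([-1, 1])           # a placeholder random policy
    next_state, reward, done = env.step(action)
    experience.append((state, action, reward, next_state))
    state = next_state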
One way to solve the RL problem: Q-learning
○ learn how much return to expect when you are in a state and take an action - Q(s, a)
○ trade-off between exploration and exploitation
Q(s, a) ← Q(s, a) + α · [r + γ · max_a' Q(s', a') − Q(s, a)]
Fig 7. Equation showing how to update the q value. Source: wikipedia.com
Q-learning code example: update q value

def update_Q_using_q_learning(Q, state, action, reward, new_state,
                              alpha, gamma):
    # Best value attainable from the next state; 0 if the episode ended.
    max_q = max(Q[new_state]) if new_state is not None else 0
    # One-step estimate of the return: immediate reward + discounted future.
    future_return = reward + gamma * max_q
    # Nudge Q(s, a) toward the estimate by the learning rate alpha.
    Q[state][action] += alpha * (future_return - Q[state][action])
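As a usage sketch: assuming Q is stored as a dict mapping each state to a list of action values (the talk does not show the data structure), one update looks like:

from collections import defaultdict

Q = defaultdict(lambda: [0.0, 0.0])      # assumed: two actions per state
update_Q_using_q_learning(Q, state=0, action=1, reward=1.0,
                          new_state=2, alpha=0.1, gamma=0.9)
print(Q[0][1])                           # 0.1 = alpha * (1.0 + 0.9*0 - 0.0)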
Exploration vs. Exploitation Dilemma
Fig 8. A real-life example of the exploration vs exploitation dilemma: where to eat? Source: UC Berkeley AI course, lecture 11.
Q-learning code example: pick an action using ε-greedy

import numpy as np

def pick_action_using_epsilon_greedy(Q, state, epsilon):
    action_values = Q[state]
    # Exploit: with probability 1 - ε, take the best-known action.
    if np.random.random_sample() > epsilon:
        return np.argmax(action_values)
    # Explore: otherwise pick uniformly at random among all actions.
    return np.random.choice(len(action_values))
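Putting the two functions together, a minimal training loop might look like this; the env object with a reset()/step() interface and integer-indexed actions is an assumption, not shown in the talk:

from collections import defaultdict

def train(env, n_actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = pick_action_using_epsilon_greedy(Q, state, epsilon)
            new_state, reward, done = env.step(action)
            # Treat terminal states as having no future value.
            update_Q_using_q_learning(Q, state, action, reward,
                                      None if done else new_state,
                                      alpha, gamma)
            state = new_state
    return Q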
Applied to crawling: the Website is the Environment, and the smart spider is the Agent; they exchange State, Reward, and Action.
○ State: a given web page
○ Action: which link to visit next? (each link on the page is a candidate action)
○ Reward: defined by a human (a hypothetical sketch follows below)
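The talk leaves the reward definition to the human. As one hypothetical example, the reward could score each downloaded page by its URL; the TARGET pattern and the penalty value are made up for illustration:

import re

TARGET = re.compile(r"/product/\d+")     # hypothetical pattern for target pages

def human_defined_reward(url):
    # +1 for reaching a targeted page, a small penalty per wasted download.
    return 1.0 if TARGET.search(url) else -0.05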
Value Functions:
○ QL: value of a given link at a specific web page
○ QA: value of a given attribute at a specific level

Update step (after downloading page p at level l): compute the reward for page p at level l, get the attributes of page p, and update QA.

Action step: get the current level, retrieve QA, calculate QL based on QA, then choose a link ε-greedy based on QL. (A hedged sketch follows below.)
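Below is a minimal sketch of how QA and QL could interact. The dict layout, the attribute-averaging rule for QL, the learning rate, and the get_attributes feature extractor are all assumptions; the talk only names the steps:

from collections import defaultdict
import random

QA = defaultdict(float)                   # (level, attribute) -> learned value

def get_attributes(link):
    # Hypothetical feature extractor: tokenize the URL path.
    return [tok for tok in link.split("/") if tok]

def update_QA(page_url, level, reward, lr=0.1):
    # Credit every attribute of the downloaded page with the page's reward.
    for attr in get_attributes(page_url):
        key = (level, attr)
        QA[key] += lr * (reward - QA[key])

def QL(link, level):
    # Score a candidate link by the mean value of its attributes (assumed).
    attrs = get_attributes(link)
    return sum(QA[(level, a)] for a in attrs) / max(len(attrs), 1)

def pick_link(links, level, epsilon=0.1):
    # ε-greedy over computed link values: explore sometimes, else exploit.
    if random.random() < epsilon:
        return random.choice(links)
    return max(links, key=lambda link: QL(link, level))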
Cumulative rewards:
○ in some runs, increased and maintained the rate of increase after 500-1000 episodes
○ in others, did not increase but fluctuated around certain values (e.g., 0 or 10)
Ratio of # of target pages to # of total downloads: 0.2-0.4, compared with < 0.1.
Deployment: a Dockerized app (microservice) connecting the User with the SmartSpider, Download, and Extraction components.
References:
○ Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto
○ [UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
○ OpenAI Gym
○ Unity ML-Agents Toolkit