
Learning how to Active Learn: A Deep Reinforcement Learning Approach - PowerPoint PPT Presentation



  1. Learning how to Active Learn: A Deep Reinforcement Learning Approach
     Meng Fang, Yuan Li, Trevor Cohn, The University of Melbourne
     Presenter: Jialin Song, April 05, 2018
     CS 546 Machine Learning in NLP

  2. Overview
     1 Introduction
     2 Model
     3 Algorithms
     4 Numerical Experiments

  3. Introduction: Active Learning
     1 Annotation:
        ⋄ select a subset of data to annotate from a large unlabelled dataset (adding labels)
        ⋄ then train a supervised learning model φ (a classifier) on the labelled data
        ⋄ we hope to maximize the accuracy of the classification model
     2 Active learning:
        ⋄ annotating every sentence is costly
        ⋄ how should we select raw data to label in order to maximize the accuracy of the classification model?
        ⋄ active learning then becomes a sequential decision problem: as each sentence arrives, annotate it or not (our action); a baseline version of this loop is sketched below
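For reference, here is a minimal sketch of that annotate-or-skip loop in Python. It uses a classic uncertainty-sampling heuristic as the decision rule (the paper's point is to replace exactly this hand-crafted rule with a learned policy); the stream, oracle and budget names, and the use of scikit-learn's LogisticRegression, are illustrative assumptions, not details from the paper.

    # Stream-based active learning with a hand-crafted uncertainty heuristic (illustrative sketch).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def uncertainty(model, x):
        # Margin between the two most probable classes; a small margin means an uncertain prediction.
        p = np.sort(model.predict_proba([x])[0])
        return 1.0 - (p[-1] - p[-2])

    def active_learn(stream, oracle, budget, threshold=0.5):
        X_lab, y_lab = [], []
        model, fitted = LogisticRegression(), False
        for x in stream:                            # each arriving sentence, as a feature vector
            if budget == 0:
                break
            # Action: annotate if the current classifier is still uncertain (or not trained yet).
            if not fitted or uncertainty(model, x) > threshold:
                X_lab.append(x)
                y_lab.append(oracle(x))             # query the human annotator for a label
                budget -= 1
                if len(set(y_lab)) > 1:             # need at least two classes before fitting
                    model.fit(X_lab, y_lab)
                    fitted = True
        return model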

  4. Introduction: MDP
     1 Markov Decision Process (MDP):
        ⋄ a framework for modelling a sequential decision process
        ⋄ at each decision stage the agent observes the state variables (s) and takes an action (a)
        ⋄ after the action is taken, a reward associated with the state and action, r(s, a), is received and the current state transitions to the next state
        ⋄ the agent aims to maximize the expected sum of rewards over all stages; a toy example is written out below
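To make the tuple (S, A, P, r̄, α) concrete, the snippet below writes out a toy two-state MDP in Python; the states, actions, probabilities and rewards are invented purely for illustration.

    # A toy two-state MDP, spelled out explicitly (all numbers are made up for illustration).
    # P[s][a] maps each next state s' to its probability P_ss'(a); R[s][a] is the expected reward r_bar(s, a).
    STATES = ["low", "high"]
    ACTIONS = ["wait", "annotate"]
    P = {
        "low":  {"wait": {"low": 0.9, "high": 0.1}, "annotate": {"low": 0.2, "high": 0.8}},
        "high": {"wait": {"low": 0.3, "high": 0.7}, "annotate": {"low": 0.1, "high": 0.9}},
    }
    R = {
        "low":  {"wait": 0.0, "annotate": -1.0},
        "high": {"wait": 1.0, "annotate": 2.0},
    }
    ALPHA = 0.9  # discount factor on future rewards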

  5. Introduction: Bellman Equation
     1 The dynamics of an MDP can be modelled with the Bellman equations
        ⋄ Bellman equation 1: value function
            J(s) = \max_a \left[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) J(s') \right]
            a^*_s = \arg\max_a \left[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) J(s') \right]
        ⋄ Bellman equation 2 (more common!): Q-function
            Q(s, a) = \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \max_u Q(s', u)
            a^*_s = \arg\max_a Q(s, a)
        ⋄ where \bar{r}(s, a) is the expected reward, P_{ss'}(a) is the transition probability from state s to s', and \alpha is the discount factor on rewards (a value-iteration sketch that solves these equations on the toy MDP follows below)
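A minimal value-iteration sketch, assuming the toy P, R and ALPHA defined above: it iterates the first Bellman equation until J(s) converges and then reads off the greedy action a*_s.

    # Value iteration: repeatedly apply the Bellman backup for J(s), then extract the greedy policy.
    def value_iteration(P, R, alpha, tol=1e-8):
        J = {s: 0.0 for s in P}
        while True:
            J_new = {s: max(R[s][a] + alpha * sum(p * J[s2] for s2, p in P[s][a].items())
                            for a in P[s])
                     for s in P}
            if max(abs(J_new[s] - J[s]) for s in P) < tol:
                return J_new
            J = J_new

    def greedy_policy(P, R, alpha, J):
        return {s: max(P[s], key=lambda a: R[s][a] + alpha * sum(p * J[s2] for s2, p in P[s][a].items()))
                for s in P}

    J = value_iteration(P, R, ALPHA)
    policy = greedy_policy(P, R, ALPHA, J)   # -> {'low': 'annotate', 'high': 'annotate'} for the toy numbers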

  6. Q-Learning
     1 If P_{ss'}(a) is known, simply solve the Bellman equations (value iteration / policy iteration) to get the optimal policy; there is no need to 'learn'!
     2 If P_{ss'}(a) is not known, computing the Q-function becomes a learning problem
     3 Q-learning:
        ⋄ Q_{t+1}(s_t, a_t) = (1 - \epsilon_t) Q_t(s_t, a_t) + \epsilon_t \left[ r(s_t, a_t) + \alpha \max_u Q_t(s_{t+1}, u) \right]
        ⋄ where t is the iteration and \epsilon_t is the learning rate
        ⋄ in practice the tabular form above breaks down: |S| × |A| is huge (see the sketch after this slide)
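A minimal tabular sketch of that update, assuming a hypothetical env object with reset(), step(a) and an actions list (none of these interfaces come from the paper); for brevity a constant learning rate stands in for \epsilon_t.

    # Tabular Q-learning: apply the update to sampled transitions instead of the unknown P_ss'(a).
    import random
    from collections import defaultdict

    def q_learning(env, episodes, alpha=0.9, lr=0.1, explore=0.1):
        Q = defaultdict(float)                       # Q[(s, a)], implicitly initialised to zero
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy behaviour policy so every action keeps being tried
                if random.random() < explore:
                    a = random.choice(env.actions)
                else:
                    a = max(env.actions, key=lambda u: Q[(s, u)])
                s_next, r, done = env.step(a)        # observe one sampled transition
                target = r + alpha * max(Q[(s_next, u)] for u in env.actions)
                Q[(s, a)] = (1 - lr) * Q[(s, a)] + lr * target
                s = s_next
        return Q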

  7. Deep Q-Learning
     1 Deep Q-learning:
        ⋄ use the output of a DNN parametrized by θ, i.e. f_θ(s, a), to approximate Q(s, a) (a minimal sketch follows below)
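As a rough sketch of what a DNN approximating Q(s, a) can look like, the snippet below defines a small network (PyTorch assumed) that outputs one Q-value per action and regresses it toward the bootstrapped target r + α max_u Q(s', u). The architecture and names are placeholders, not the paper's model.

    # A minimal Q-network and its TD regression loss (illustrative sketch).
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        def __init__(self, state_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, s):
            return self.net(s)                  # one Q-value per action: shape (batch, n_actions)

    def td_loss(q_net, batch, alpha=0.9):
        s, a, r, s_next = batch                 # tensors of states, action indices, rewards, next states
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():                   # the bootstrapped target is treated as a constant
            target = r + alpha * q_net(s_next).max(dim=1).values
        return nn.functional.mse_loss(q_sa, target)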
