Journal of Artificial Intelligence Research 4 (1996) 237-285. Submitted 9/95; published 5/96.

Reinforcement Learning: A Survey

Leslie Pack Kaelbling (lpk@cs.brown.edu)
Michael L. Littman (mlittman@cs.brown.edu)
Computer Science Department, Box 1910, Brown University, Providence, RI 02912 USA

Andrew W. Moore (awm@cs.cmu.edu)
Smith Hall, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA

© 1996 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Abstract

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement". The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

1. Introduction

Reinforcement learning dates back to the early days of cybernetics and work in statistics, psychology, neuroscience, and computer science. In the last five to ten years, it has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment without needing to specify how the task is to be achieved. But there are formidable computational obstacles to fulfilling the promise.

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer science perspective. We give a high-level overview of the field and a taste of some specific approaches. It is, of course, impossible to mention all of the important work in the field; this should not be taken to be an exhaustive account.

Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The work described here has a strong family resemblance to eponymous work in psychology, but differs considerably in the details and in the use of the word "reinforcement". It is appropriately thought of as a class of problems, rather than as a set of techniques.

There are two main strategies for solving reinforcement-learning problems. The first is to search in the space of behaviors in order to find one that performs well in the environment. This approach has been taken by work in genetic algorithms and genetic programming, as well as some more novel search techniques (Schmidhuber, 1996). The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in states of the world. This paper is devoted almost entirely to the second set of techniques because they take advantage of the special structure of reinforcement-learning problems that is not available in optimization problems in general. It is not yet clear which set of approaches is best in which circumstances.

The rest of this section is devoted to establishing notation and describing the basic reinforcement-learning model. Section 2 explains the trade-off between exploration and exploitation and presents some solutions to the most basic case of reinforcement-learning problems, in which we want to maximize the immediate reward. Section 3 considers the more general problem in which rewards can be delayed in time from the actions that were crucial to gaining them. Section 4 considers some classic model-free algorithms for reinforcement learning from delayed reward: adaptive heuristic critic, TD(λ), and Q-learning. Section 5 demonstrates a continuum of algorithms that are sensitive to the amount of computation an agent can perform between actual steps of action in the environment. Generalization, the cornerstone of mainstream machine learning research, has the potential of considerably aiding reinforcement learning, as described in Section 6. Section 7 considers the problems that arise when the agent does not have complete perceptual access to the state of the environment. Section 8 catalogs some of reinforcement learning's successful applications. Finally, Section 9 concludes with some speculations about important open problems and the future of reinforcement learning.

1.1 Reinforcement-Learning Model

In the standard reinforcement-learning model, an agent is connected to its environment via perception and action, as depicted in Figure 1. On each step of interaction the agent receives as input, i, some indication of the current state, s, of the environment; the agent then chooses an action, a, to generate as output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement signal, r. The agent's behavior, B, should choose actions that tend to increase the long-run sum of values of the reinforcement signal. It can learn to do this over time by systematic trial and error, guided by a wide variety of algorithms that are the subject of later sections of this paper.

Figure 1: The standard reinforcement-learning model.
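To make the interaction loop concrete, the following is a minimal Python sketch of the standard model just described: on each step the agent observes a state s, emits an action a, and receives a scalar reinforcement r, while the long-run sum of reward values accumulates. The Environment and RandomAgent classes, their method names, and the toy two-state dynamics are illustrative assumptions and are not taken from the survey.

# Minimal sketch (not from the paper) of the standard interaction loop.
# The input function I is collapsed into step()/reset() returning the state
# directly; the behavior B is agent.act(); learning is left abstract in update().
import random


class Environment:
    """A toy two-state environment; its dynamics and rewards are illustrative."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 toggles the state; a reward of 1.0 is given only in state 1.
        if action == 1:
            self.state = 1 - self.state
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


class RandomAgent:
    """A behavior B that ignores the state; a learning agent would adapt in update()."""

    def act(self, state):
        return random.choice([0, 1])

    def update(self, state, action, reward, next_state):
        pass  # a real learner would adjust its behavior from (s, a, r, s') here


env, agent = Environment(), RandomAgent()
state = env.reset()
total_reward = 0.0
for _ in range(100):                       # steps of interaction with the environment
    action = agent.act(state)              # agent emits action a as output
    next_state, reward = env.step(action)  # environment returns new state s and scalar r
    agent.update(state, action, reward, next_state)
    total_reward += reward                 # long-run sum of reinforcement values
    state = next_state
print("return over 100 steps:", total_reward)

Any of the learning algorithms discussed in later sections could be substituted by giving the agent a non-trivial update() rule that adjusts its behavior from the observed (s, a, r, s') transitions.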
