Journal of Artificial Intelligence Research

Reinforcement Learning: A Survey

Leslie Pack Kaelbling                    lpk@cs.brown.edu
Michael L. Littman                       mlittman@cs.brown.edu
Computer Science Department, Brown University, Providence, RI, USA

Andrew W. Moore                          awm@cs.cmu.edu
Smith Hall, Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA, USA
Abstract

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
1. Introduction

Reinforcement learning dates back to the early days of cybernetics and work in statistics, psychology, neuroscience, and computer science. In the last five to ten years, it has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment without needing to specify how the task is to be achieved. But there are formidable computational obstacles to fulfilling the promise. This paper surveys the historical basis of reinforcement learning and some of the current work from a computer science perspective. We give a high-level overview of the field and a taste of some specific approaches. It is, of course, impossible to mention all of the important work in the field; this should not be taken to be an exhaustive account.

Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The work described here has a strong family resemblance to eponymous work in psychology, but differs considerably in the details and in the use of the word "reinforcement." It is appropriately thought of as a class of problems, rather than as a set of techniques.

There are two main strategies for solving reinforcement-learning problems. The first is to search in the space of behaviors in order to find one that performs well in the environment. This approach has been taken by work in genetic algorithms and genetic programming, as well as some more novel search techniques (Schmidhuber). The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in states of the world.
This paper is devoted almost entirely to the second set of techniques, because they take advantage of the special structure of reinforcement-learning problems that is not available in optimization problems in general. It is not yet clear which set of approaches is best in which circumstances.

The rest of this section is devoted to establishing notation and describing the basic reinforcement-learning model. Section 2 explains the tradeoff between exploration and exploitation and presents some solutions to the most basic case of reinforcement-learning problems, in which we want to maximize the immediate reward. Section 3 considers the more general problem in which rewards can be delayed in time from the actions that were crucial to gaining them. Section 4 considers some classic model-free algorithms for reinforcement learning from delayed reward: adaptive heuristic critic, TD(λ), and Q-learning. Section 5 demonstrates a continuum of algorithms that are sensitive to the amount of computation an agent can perform between actual steps of action in the environment. Generalization, the cornerstone of mainstream machine learning research, has the potential of considerably aiding reinforcement learning, as described in Section 6. Section 7 considers the problems that arise when the agent does not have complete perceptual access to the state of the environment. Section 8 catalogs some of reinforcement learning's successful applications. Finally, Section 9 concludes with some speculations about important open problems and the future of reinforcement learning.
1.1 Reinforcement-Learning Model

In the standard reinforcement-learning model, an agent is connected to its environment via perception and action, as depicted in Figure 1. On each step of interaction the agent receives as input, i, some indication of the current state, s, of the environment; the agent then chooses an action, a, to generate as output. The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement signal, r. The agent's behavior, B, should choose actions that tend to increase the long-run sum of values of the reinforcement signal. It can learn to do this over time by systematic trial and error, guided by a wide variety of algorithms that are the subject of later sections of this paper.

Figure 1: The standard reinforcement-learning model.
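To make the model concrete, here is a minimal Python sketch of the perception-action loop just described. The two-state environment and the random choice rule are illustrative inventions, not constructs from the survey.

    import random

    class TwoStateEnv:
        """A made-up environment with two states and two actions;
        action 1 in state 0 tends to pay off."""
        def reset(self):
            self.s = 0
            return self.s

        def actions(self, state):
            return [0, 1]

        def step(self, a):
            r = 1.0 if (self.s == 0 and a == 1) else 0.0  # scalar reinforcement r
            self.s = random.choice([0, 1])                # state transition
            return r, self.s

    def run_agent(env, num_steps):
        """The standard interaction loop: receive state s, emit action a,
        receive reinforcement r. This agent picks actions at random; the
        algorithms surveyed later replace this choice rule with learning."""
        s = env.reset()
        total = 0.0
        for _ in range(num_steps):
            a = random.choice(env.actions(s))  # behavior B chooses an action
            r, s = env.step(a)
            total += r
        return total

    print(run_agent(TwoStateEnv(), 100))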
Formally, the model consists of

- a discrete set of environment states, S;
- a discrete set of agent actions, A; and
- a set of scalar reinforcement signals, typically {0, 1}, or the real numbers.

The figure also includes an input function I, which determines how the agent views the environment state; we will assume that it is the identity function (that is, the agent perceives the exact state of the environment) until we consider partial observability in Section 7.
An intuitive way to understand the relation between the agent and its environment is with the following example dialogue:

    Environment: You are in state 65. You have 4 possible actions.
    Agent: I'll take action 2.
    Environment: You received a reinforcement of 7 units. You are now in state 15. You have 2 possible actions.
    Agent: I'll take action 1.
    Environment: You received a reinforcement of -4 units. You are now in state 65. You have 4 possible actions.
    Agent: I'll take action 2.
    Environment: You received a reinforcement of 5 units. You are now in state 44. You have 5 possible actions.

The agent's job is to find a policy π, mapping states to actions, that maximizes some long-run measure of reinforcement. We expect, in general, that the environment will be non-deterministic; that is, that taking the same action in the same state on two different occasions may result in different next states and/or different reinforcement values. This happens in our example above: from state 65, applying action 2 produces differing reinforcements and differing states on two occasions. However, we assume the environment is stationary; that is, that the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.
Reinforcement learning differs from the more widely studied problem of supervised learning in several ways. The most important difference is that there is no presentation of input/output pairs. Instead, after choosing an action the agent is told the immediate reward and the subsequent state, but is not told which action would have been in its best long-term interests. It is necessary for the agent to gather useful experience about the possible system states, actions, transitions and rewards actively to act optimally. Another difference from supervised learning is that on-line performance is important: the evaluation of the system is often concurrent with learning.

The stationarity assumption may be disappointing; after all, operation in non-stationary environments is one of the motivations for building learning systems. In fact, many of the algorithms described in later sections are effective in slowly-varying non-stationary environments, but there is very little theoretical analysis in this area.
Some aspects of reinforcement learning are closely related to search and planning issues in artificial intelligence. AI search algorithms generate a satisfactory trajectory through a graph of states. Planning operates in a similar manner, but typically within a construct with more complexity than a graph, in which states are represented by compositions of logical expressions instead of atomic symbols. These AI algorithms are less general than the reinforcement-learning methods, in that they require a predefined model of state transitions, and with a few exceptions assume determinism. On the other hand, reinforcement learning, at least in the kind of discrete cases for which theory has been developed, assumes that the entire state space can be enumerated and stored in memory, an assumption to which conventional search algorithms are not tied.
1.2 Models of Optimal Behavior

Before we can start thinking about algorithms for learning to behave optimally, we have to decide what our model of optimality will be. In particular, we have to specify how the agent should take the future into account in the decisions it makes about how to behave now. There are three models that have been the subject of the majority of work in this area.

The finite-horizon model is the easiest to think about: at a given moment in time, the agent should optimize its expected reward for the next h steps,

    E\left(\sum_{t=0}^{h} r_t\right) ;

it need not worry about what will happen after that. In this and subsequent expressions, r_t represents the scalar reward received t steps into the future. This model can be used in two ways. In the first, the agent will have a non-stationary policy; that is, one that changes over time. On its first step it will take what is termed a h-step optimal action. This is defined to be the best action available given that it has h steps remaining in which to act and gain reinforcement. On the next step it will take a (h-1)-step optimal action, and so on, until it finally takes a 1-step optimal action and terminates. In the second, the agent does receding-horizon control, in which it always takes the h-step optimal action. The agent always acts according to the same policy, but the value of h limits how far ahead it looks in choosing its actions. The finite-horizon model is not always appropriate. In many cases we may not know the precise length of the agent's life in advance.

The infinite-horizon discounted model takes the long-run reward of the agent into account, but rewards that are received in the future are geometrically discounted according to discount factor γ (where 0 ≤ γ < 1):

    E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right) .

We can interpret γ in several ways. It can be seen as an interest rate, a probability of living another step, or as a mathematical trick to bound the infinite sum. The model is conceptually similar to receding-horizon control, but the discounted model is more mathematically tractable than the finite-horizon model. This is a dominant reason for the wide attention this model has received.
Another optimality criterion is the average-reward model, in which the agent is supposed to take actions that optimize its long-run average reward:

    \lim_{h \to \infty} E\left(\frac{1}{h} \sum_{t=0}^{h} r_t\right) .

Such a policy is referred to as a gain optimal policy; it can be seen as the limiting case of the infinite-horizon discounted model as the discount factor approaches 1 (Bertsekas). One problem with this criterion is that there is no way to distinguish between two policies, one of which gains a large amount of reward in the initial phases and the other of which does not. Reward gained on any initial prefix of the agent's life is overshadowed by the long-run average performance. It is possible to generalize this model so that it takes into account both the long-run average and the amount of initial reward that can be gained. In the generalized, bias optimal model, a policy is preferred if it maximizes the long-run average and ties are broken by the initial extra reward.

Figure 2 contrasts these models of optimality by providing an environment in which changing the model of optimality changes the optimal policy. In this example, circles represent the states of the environment and arrows are state transitions. There is only a single action choice from every state except the start state, which is in the upper left and marked with an incoming arrow. All rewards are zero except where marked. Under a finite-horizon model with h = 4, the first action should be chosen; under an infinite-horizon discounted model with γ = 0.9, the second action should be chosen; and under the average-reward model, the third action should be chosen, since it leads to an average reward of +11. If we change h and γ, then the second action can become optimal for the finite-horizon model and the first for the infinite-horizon discounted model; however, the average-reward model will always prefer the best long-term average.

Figure 2: Comparing models of optimality. All unlabeled arrows produce a reward of zero. (The three choices lead to rewards of +2, +10, and +11; the panels are labeled "Finite horizon, h = 4," "Infinite horizon, γ = 0.9," and "Average reward.")

Since the choice of optimality model and parameters matters so much, it is important to choose it carefully in any application. The finite-horizon model is appropriate when the agent's lifetime is known; one important aspect of this model is that, as the length of the remaining lifetime decreases, the agent's policy may change. A system with a hard deadline would be appropriately modeled this way.
The relative usefulness of infinite-horizon discounted and bias-optimal models is still under debate. Bias-optimality has the advantage of not requiring a discount parameter; however, algorithms for finding bias-optimal policies are not yet as well-understood as those for finding optimal infinite-horizon discounted policies.

1.3 Measuring Learning Performance

The criteria given in the previous section can be used to assess the policies learned by a given algorithm. We would also like to be able to evaluate the quality of learning itself. There are several incompatible measures in use.

- Eventual convergence to optimal. Many algorithms come with a provable guarantee of asymptotic convergence to optimal behavior (Watkins & Dayan). This is reassuring, but useless in practical terms. An agent that quickly reaches a plateau
  at some level of optimality may, in many applications, be preferable to an agent that has a guarantee of eventual optimality but a sluggish early learning rate.
- Speed of convergence to optimality. Optimality is usually an asymptotic result, and so convergence speed is an ill-defined measure. More practical is the speed of convergence to near-optimality. This measure begs the definition of how near to optimality is sufficient. A related measure is level of performance after a given time, which similarly requires that someone define the given time.

  It should be noted that here we have another difference between reinforcement learning and conventional supervised learning. In the latter, expected future predictive accuracy or statistical efficiency are the prime concerns. For example, in the well-known PAC framework (Valiant), there is a learning period during which mistakes do not count, then a performance period during which they do. The framework provides bounds on the necessary length of the learning period in order to have a probabilistic guarantee on the subsequent performance. That is usually an inappropriate view for an agent with a long existence in a complex environment. In spite of the mismatch between embedded reinforcement learning and the train/test perspective, Fiechter provides a PAC analysis for Q-learning (described in Section 4.2) that sheds some light on the connection between the two views.

  Measures related to speed of learning have an additional weakness. An algorithm that merely tries to achieve optimality as fast as possible may incur unnecessarily large penalties during the learning period. A less aggressive strategy, taking longer to achieve optimality but gaining greater total reinforcement during its learning, might be preferable.

- Regret. A more appropriate measure, then, is the expected decrease in reward gained due to executing the learning algorithm instead of behaving optimally from the very beginning. This measure is known as regret (Berry & Fristedt). It penalizes mistakes wherever they occur during the run. Unfortunately, results concerning the regret of algorithms are quite hard to obtain.
1.4 Reinforcement Learning and Adaptive Control

Adaptive control (Burghes & Graham; Stengel) is also concerned with algorithms for improving a sequence of decisions from experience. Adaptive control is a much more mature discipline that concerns itself with dynamic systems in which states and actions are vectors and system dynamics are smooth: linear or locally linearizable around a desired trajectory. A very common formulation of cost functions in adaptive control are quadratic penalties on deviation from desired state and action vectors. Most importantly, although the dynamic model of the system is not known in advance, and must be estimated from data, the structure of the dynamic model is fixed, leaving model estimation as a parameter estimation problem. These assumptions permit deep, elegant and powerful mathematical analysis, which in turn leads to robust, practical, and widely deployed adaptive control algorithms.
2. Exploitation versus Exploration: The Single-State Case

One major difference between reinforcement learning and supervised learning is that a reinforcement-learner must explicitly explore its environment. In order to highlight the problems of exploration, we treat a very simple case in this section. The fundamental issues and approaches described here will, in many cases, transfer to the more complex instances of reinforcement learning discussed later in the paper.

The simplest possible reinforcement-learning problem is known as the k-armed bandit problem, which has been the subject of a great deal of study in the statistics and applied mathematics literature (Berry & Fristedt). The agent is in a room with a collection of k gambling machines (each called a "one-armed bandit" in colloquial English). The agent is permitted a fixed number of pulls, h. Any arm may be pulled on each turn. The machines do not require a deposit to play; the only cost is in wasting a pull playing a suboptimal machine. When arm i is pulled, machine i pays off 1 or 0, according to some underlying probability parameter p_i, where payoffs are independent events and the p_i's are unknown. What should the agent's strategy be?

This problem illustrates the fundamental tradeoff between exploitation and exploration. The agent might believe that a particular arm has a fairly high payoff probability; should it choose that arm all the time, or should it choose another one that it has less information about, but seems to be worse? Answers to these questions depend on how long the agent is expected to play the game; the longer the game lasts, the worse the consequences of prematurely converging on a suboptimal arm, and the more the agent should explore.

There is a wide variety of solutions to this problem. We will consider a representative selection of them, but for a deeper discussion and a number of important theoretical results, see the book by Berry and Fristedt. We use the term "action" to indicate the agent's choice of arm to pull. This eases the transition into delayed reinforcement models in Section 3. It is very important to note that bandit problems fit our definition of a reinforcement-learning environment with a single state with only self-transitions.

Section 2.1 discusses three solutions to the basic one-state bandit problem that have formal correctness results. Although they can be extended to problems with real-valued rewards, they do not apply directly to the general multi-state delayed-reinforcement case.
Section 2.2 presents three techniques that are not formally justified, but that have had wide use in practice, and can be applied (with similar lack of guarantee) to the general case.

2.1 Formally Justified Techniques

There is a fairly well-developed formal theory of exploration for very simple problems. Although it is instructive, the methods it provides do not scale well to more complex problems.
2.1.1 Dynamic-Programming Approach

If the agent is going to be acting for a total of h steps, it can use basic Bayesian reasoning to solve for an optimal strategy (Berry & Fristedt). This requires an assumed prior joint distribution for the parameters {p_i}, the most natural of which is that each p_i is independently uniformly distributed between 0 and 1. We compute a mapping from belief states (summaries of the agent's experiences during this run) to actions. Here, a belief state can be represented as a tabulation of action choices and payoffs: {n_1, w_1, n_2, w_2, ..., n_k, w_k} denotes a state of play in which each arm i has been pulled n_i times with w_i payoffs. We write V^*(n_1, w_1, ..., n_k, w_k) as the expected payoff remaining, given that a total of h pulls are available, and we use the remaining pulls optimally.

If \sum_i n_i = h, then there are no remaining pulls, and V^*(n_1, w_1, ..., n_k, w_k) = 0. This is the basis of a recursive definition. If we know the V^* value for all belief states with t pulls remaining, we can compute the V^* value of any belief state with t + 1 pulls remaining:

    V^*(n_1, w_1, \ldots, n_k, w_k)
        = \max_i E(\text{future payoff if agent takes action } i \text{, then acts optimally for remaining pulls})
        = \max_i \Big( \rho_i \big( 1 + V^*(n_1, w_1, \ldots, n_i + 1, w_i + 1, \ldots, n_k, w_k) \big)
                     + (1 - \rho_i)\, V^*(n_1, w_1, \ldots, n_i + 1, w_i, \ldots, n_k, w_k) \Big),

where \rho_i is the posterior subjective probability of action i paying off given n_i, w_i and our prior probability. For the uniform priors, which result in a beta distribution, \rho_i = (w_i + 1)/(n_i + 2).

The expense of filling in the table of V^* values in this way for all attainable belief states is linear in the number of belief states times actions, and thus exponential in the horizon.
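The recursion above can be written down directly with memoization. The sketch below is a minimal rendering for small h and k (the table is exponential in the horizon, exactly as just noted); `rho` implements the uniform-prior posterior (w_i + 1)/(n_i + 2).

    from functools import lru_cache

    def optimal_bandit_value(k, h):
        """Expected payoff of the Bayes-optimal strategy for a k-armed
        Bernoulli bandit with h pulls and independent uniform priors."""

        @lru_cache(maxsize=None)
        def V(belief):
            # belief is a tuple ((n_1, w_1), ..., (n_k, w_k)).
            if sum(n for n, _ in belief) == h:
                return 0.0                  # no pulls remaining
            best = 0.0
            for i, (n, w) in enumerate(belief):
                rho = (w + 1) / (n + 2)     # posterior P(arm i pays off)
                win = list(belief);  win[i] = (n + 1, w + 1)
                lose = list(belief); lose[i] = (n + 1, w)
                value = rho * (1 + V(tuple(win))) + (1 - rho) * V(tuple(lose))
                best = max(best, value)
            return best

        return V(tuple((0, 0) for _ in range(k)))

    # Example: with 2 arms and 10 pulls, the optimal expected payoff
    # exceeds the h/2 = 5.0 obtained by pulling arms blindly.
    print(optimal_bandit_value(2, 10))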
2.1.2 Gittins Allocation Indices

Gittins gives an "allocation index" method for finding the optimal choice of action at each step in k-armed bandit problems (Gittins). The technique only applies under the discounted expected reward criterion. For each action, consider the number of times it has been chosen, n, versus the number of times it has paid off, w. For certain discount factors, there are published tables of "index values," I(n, w), for each pair of n and w. Look up the index value for each action i, I(n_i, w_i). It represents a comparative measure of the combined value of the expected payoff of action i (given its history of payoffs) and the value of the information that we would get by choosing it. Gittins has shown that choosing the action with the largest index value guarantees the optimal balance between exploration and exploitation.
Because of the guarantee of optimal exploration, and the simplicity of the technique (given the table of index values), this approach holds a great deal of promise for use in more complex applications. This method proved useful in an application to robotic manipulation with immediate reward (Salganicoff & Ungar). Unfortunately, no one has yet been able to find an analog of index values for delayed reinforcement problems.

Figure 3: A Tsetlin automaton with 2N states (states 1 through N on the left; states N+1 through 2N on the right). The top row shows the state transitions that are made when the previous action resulted in a reward of 1; the bottom row shows transitions after a reward of 0. In states in the left half of the figure, action 0 is taken; in those on the right, action 1 is taken.
2.1.3 Learning Automata

A branch of the theory of adaptive control is devoted to learning automata, surveyed by Narendra and Thathachar, which were originally described explicitly as finite state automata. The Tsetlin automaton shown in Figure 3 provides an example that solves a 2-armed bandit arbitrarily near optimally as N approaches infinity.

It is inconvenient to describe algorithms as finite-state automata, so a move was made to describe the internal state of the agent as a probability distribution according to which actions would be chosen. The probabilities of taking different actions would be adjusted according to their previous successes and failures. An example, which stands among a set of algorithms independently developed in the mathematical psychology literature (Hilgard & Bower), is the linear reward-inaction algorithm. Let p_i be the agent's probability of taking action i. When action a_i succeeds,

    p_i := p_i + \alpha (1 - p_i)
    p_j := p_j - \alpha p_j \quad \text{for } j \neq i .

When action a_i fails, p_j remains unchanged (for all j). This algorithm converges with probability 1 to a vector containing a single 1 and the rest 0's (choosing a particular action with probability 1). Unfortunately, it does not always converge to the correct action; but the probability that it converges to the wrong one can be made arbitrarily small by making α small (Narendra & Thathachar). There is no literature on the regret of this algorithm.
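Here is a compact sketch of the linear reward-inaction update on a Bernoulli bandit, with the step size written as `alpha`; the payoff probabilities are made-up illustration values.

    import random

    def linear_reward_inaction(payoff_probs, alpha=0.05, steps=20000, seed=0):
        """Linear reward-inaction: on a success, move probability mass
        toward the chosen action; on a failure, change nothing."""
        rng = random.Random(seed)
        k = len(payoff_probs)
        p = [1.0 / k] * k                   # uniform initial action probabilities
        for _ in range(steps):
            i = rng.choices(range(k), weights=p)[0]
            if rng.random() < payoff_probs[i]:           # action a_i succeeded
                p = [p_j - alpha * p_j for p_j in p]     # p_j := p_j - alpha * p_j
                p[i] += alpha                            # net: p_i += alpha * (1 - p_i)
            # on failure, the probabilities remain unchanged
        return p

    # Illustrative two-armed bandit: arm 1 is better and usually wins out,
    # but with nonzero probability the vector converges on arm 0 instead.
    print(linear_reward_inaction([0.4, 0.8]))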
2.2 Ad-Hoc Techniques

In reinforcement-learning practice, some simple, ad hoc strategies have been popular. They are rarely, if ever, the best choice for the models of optimality we have used, but they may be viewed as reasonable, computationally tractable heuristics. Thrun has surveyed a variety of these techniques.

2.2.1 Greedy Strategies

The first strategy that comes to mind is to always choose the action with the highest estimated payoff. The flaw is that early unlucky sampling might indicate that the best action's reward is less than the reward obtained from a suboptimal action. The suboptimal action will always be picked, leaving the true optimal action starved of data and its superiority never discovered. An agent must explore to ameliorate this outcome.

A useful heuristic is optimism in the face of uncertainty, in which actions are selected greedily, but strongly optimistic prior beliefs are put on their payoffs so that strong negative evidence is needed to eliminate an action from consideration. This still has a measurable danger of starving an optimal but unlucky action, but the risk of this can be made arbitrarily small. Techniques like this have been used in several reinforcement-learning algorithms, including the interval exploration method (Kaelbling) described shortly, the exploration bonus in Dyna (Sutton), curiosity-driven exploration (Schmidhuber), and the exploration mechanism in prioritized sweeping (Moore & Atkeson).
2.2.2 Randomized Strategies

Another simple exploration strategy is to take the action with the best estimated expected reward by default, but with probability p, choose an action at random. Some versions of this strategy start with a large value of p to encourage initial exploration, which is slowly decreased.

An objection to the simple strategy is that, when it experiments with a non-greedy action, it is no more likely to try a promising alternative than a clearly hopeless alternative. A slightly more sophisticated strategy is Boltzmann exploration. In this case, the expected reward for taking action a, ER(a), is used to choose an action probabilistically according to the distribution

    P(a) = \frac{e^{ER(a)/T}}{\sum_{a' \in A} e^{ER(a')/T}} .

The temperature parameter T can be decreased over time to decrease exploration. This method works well if the best action is well separated from the others, but suffers somewhat when the values of the actions are close. It may also converge unnecessarily slowly unless the temperature schedule is manually tuned with great care.
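A minimal sketch of both strategies over a table of estimated action values; the value table and parameter settings are illustrative.

    import math
    import random

    def epsilon_greedy(values, p):
        """With probability p pick a uniformly random action,
        otherwise pick the action with the best estimated reward."""
        if random.random() < p:
            return random.randrange(len(values))
        return max(range(len(values)), key=lambda a: values[a])

    def boltzmann(values, T):
        """Sample an action with probability proportional to e^{ER(a)/T}.
        High T explores broadly; as T -> 0 this approaches greedy choice."""
        weights = [math.exp(v / T) for v in values]
        return random.choices(range(len(values)), weights=weights)[0]

    estimated_rewards = [0.1, 0.5, 0.45]        # illustrative estimates ER(a)
    print(epsilon_greedy(estimated_rewards, p=0.1))
    print(boltzmann(estimated_rewards, T=0.1))  # mostly picks action 1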
2.2.3 Interval-Based Techniques

Exploration is often more efficient when it is based on second-order information about the certainty or variance of the estimated values of actions. Kaelbling's interval estimation algorithm stores statistics for each action a_i: w_i is the number of successes and n_i the number of trials. An action is chosen by computing the upper bound of a 100(1 - α)% confidence interval on the success probability of each action, and choosing the action with the highest upper bound. Smaller values of the α parameter encourage greater exploration. When payoffs are boolean, the normal approximation to the binomial distribution can be used to construct the confidence interval (though the binomial should be used for small n). Other payoff distributions can be handled using their associated statistics or with non-parametric methods. The method works very well in empirical trials. It is also related to a certain class of statistical techniques known as experiment design methods (Box & Draper), which are used for comparing multiple treatments (for example, fertilizers or drugs) to determine which treatment, if any, is best in as small a set of experiments as possible.
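A sketch of the interval estimation rule for boolean payoffs using the normal approximation; `z` is the critical value corresponding to the chosen α (1.96 below corresponds to α = 0.05 and is an illustrative choice).

    import math

    def interval_estimation_choice(successes, trials, z=1.96):
        """Pick the action with the highest upper confidence bound on its
        success probability, using the normal approximation to the binomial.
        Untried actions get an unbounded upper bound, so they go first."""
        best_action, best_bound = None, -math.inf
        for a, (w, n) in enumerate(zip(successes, trials)):
            if n == 0:
                return a                    # never starve an untried action
            p_hat = w / n
            bound = p_hat + z * math.sqrt(p_hat * (1 - p_hat) / n)
            if bound > best_bound:
                best_action, best_bound = a, bound
        return best_action

    # Illustrative statistics: action 0 (3/4) has a higher estimate and a
    # wide interval, so it is chosen over action 1 (20/30).
    print(interval_estimation_choice([3, 20], [4, 30]))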
2.3 More General Problems

When there are multiple states, but reinforcement is still immediate, then any of the above solutions can be replicated, once for each state. However, when generalization is required, these solutions must be integrated with generalization methods (see Section 6); this is straightforward for the simple ad-hoc methods, but it is not understood how to maintain theoretical guarantees.

Many of these techniques focus on converging to some regime in which exploratory actions are taken rarely or never; this is appropriate when the environment is stationary. However, when the environment is non-stationary, exploration must continue to take place, in order to notice changes in the world. Again, the more ad-hoc techniques can be modified to deal with this in a plausible manner (keep temperature parameters from going to 0; decay the statistics in interval estimation), but none of the theoretically guaranteed methods can be applied.
3. Delayed Reward

In the general case of the reinforcement learning problem, the agent's actions determine not only its immediate reward, but also (at least probabilistically) the next state of the environment. Such environments can be thought of as networks of bandit problems, but the agent must take into account the next state as well as the immediate reward when it decides which action to take. The model of long-run optimality the agent is using determines exactly how it should take the value of the future into account. The agent will have to be able to learn from delayed reinforcement: it may take a long sequence of actions, receiving insignificant reinforcement, then finally arrive at a state with high reinforcement. The agent must be able to learn which of its actions are desirable based on reward that can take place arbitrarily far in the future.

3.1 Markov Decision Processes

Problems with delayed reinforcement are well modeled as Markov decision processes (MDPs). An MDP consists of

- a set of states S,
- a set of actions A,
- a reward function R : S × A → ℝ, and
- a state transition function T : S × A → Π(S), where a member of Π(S) is a probability distribution over the set S (i.e., it maps states to probabilities). We write T(s, a, s') for the probability of making a transition from state s to state s' using action a.

The state transition function probabilistically specifies the next state of the environment as a function of its current state and the agent's action. The reward function specifies expected instantaneous reward as a function of the current state and action. The model is Markov if the state transitions are independent of any previous environment states or agent actions. There are many good references to MDP models (Bellman; Bertsekas; Howard; Puterman).

Although general MDPs may have infinite (even uncountable) state and action spaces, we will only discuss methods for solving finite-state and finite-action problems. In Section 6, we discuss methods for solving problems with continuous input and output spaces.
3.2 Finding a Policy Given a Model

Before we consider algorithms for learning to behave in MDP environments, we will explore techniques for determining the optimal policy given a correct model. These dynamic programming techniques will serve as the foundation and inspiration for the learning algorithms to follow. We restrict our attention mainly to finding optimal policies for the infinite-horizon discounted model, but most of these algorithms have analogs for the finite-horizon and average-case models as well. We rely on the result that, for the infinite-horizon discounted model, there exists an optimal deterministic stationary policy (Bellman).

We will speak of the optimal value of a state; it is the expected infinite discounted sum of reward that the agent will gain if it starts in that state and executes the optimal policy. Using π as a complete decision policy, it is written

    V^*(s) = \max_{\pi} E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right) .

This optimal value function is unique and can be defined as the solution to the simultaneous equations

    V^*(s) = \max_{a} \Big( R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V^*(s') \Big), \qquad \forall s \in S,

which assert that the value of a state s is the expected instantaneous reward plus the expected discounted value of the next state, using the best available action. Given the optimal value function, we can specify the optimal policy as

    \pi^*(s) = \arg\max_{a} \Big( R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V^*(s') \Big) .

3.2.1 Value Iteration

One way, then, to find an optimal policy is to find the optimal value function. It can be determined by a simple iterative algorithm called value iteration that can be shown to converge to the correct V^* values (Bellman; Bertsekas):
    initialize V(s) arbitrarily
    loop until policy good enough
        loop for s ∈ S
            loop for a ∈ A
                Q(s, a) := R(s, a) + γ Σ_{s' ∈ S} T(s, a, s') V(s')
            V(s) := max_a Q(s, a)
        end loop
    end loop
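For concreteness, here is a straightforward Python rendering of the loop above; the MDP is passed in as dictionaries R[s][a] and T[s][a], a representation chosen for this sketch rather than anything prescribed by the survey.

    def value_iteration(S, A, R, T, gamma, tol=1e-6):
        """Compute V by repeated full backups until the value function
        changes by less than `tol` (a simple stand-in for the Bellman
        residual stopping criterion discussed below).
        R[s][a] is the expected reward; T[s][a] is a dict {s2: prob}."""
        V = {s: 0.0 for s in S}
        while True:
            delta = 0.0
            for s in S:
                q = {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                     for a in A}
                new_v = max(q.values())
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
            if delta < tol:
                # greedy policy with respect to the (near-)optimal values
                pi = {s: max(A, key=lambda a: R[s][a] + gamma *
                             sum(p * V[s2] for s2, p in T[s][a].items()))
                      for s in S}
                return V, pi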
It is not obvious when to stop the value iteration algorithm. One important result bounds the performance of the current greedy policy as a function of the Bellman residual of the current value function (Williams & Baird). It says that if the maximum difference between two successive value functions is less than ε, then the value of the greedy policy (the policy obtained by choosing, in every state, the action that maximizes the estimated discounted reward, using the current estimate of the value function) differs from the value function of the optimal policy by no more than 2εγ/(1 − γ) at any state. This provides an effective stopping criterion for the algorithm. Puterman discusses another stopping criterion, based on the span semi-norm, which may result in earlier termination. Another important result is that the greedy policy is guaranteed to be optimal in some finite number of steps, even though the value function may not have converged (Bertsekas). And, in practice, the greedy policy is often optimal long before the value function has converged.

Value iteration is very flexible. The assignments to V need not be done in strict order as shown above, but instead can occur asynchronously in parallel, provided that the value of every state gets updated infinitely often on an infinite run. These issues are treated extensively by Bertsekas, who also proves convergence results.
Updates of the form above are known as full backups, since they make use of information from all possible successor states. It can be shown that updates of the form

    Q(s, a) := Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big)

can also be used, as long as each pairing of a and s is updated infinitely often, s' is sampled from the distribution T(s, a, ·), r is sampled with mean R(s, a) and bounded variance, and the learning rate α is decreased slowly. This type of sample backup (Singh) is critical to the operation of the model-free methods discussed in the next section.

The computational complexity of the value-iteration algorithm with full backups, per iteration, is quadratic in the number of states and linear in the number of actions. Commonly, the transition probabilities T(s, a, s') are sparse. If there are on average a constant number of next states with non-zero probability, then the cost per iteration is linear in the number of states and linear in the number of actions. The number of iterations required to reach the optimal value function is polynomial in the number of states and the magnitude of the largest reward if the discount factor is held constant. However, in the worst case the number of iterations grows polynomially in 1/(1 − γ), so the convergence rate slows considerably as the discount factor approaches 1 (Littman, Dean, & Kaelbling).
3.2.2 Policy Iteration

The policy iteration algorithm manipulates the policy directly, rather than finding it indirectly via the optimal value function. It operates as follows:

    choose an arbitrary policy π'
    loop
        π := π'
        compute the value function of policy π:
            solve the linear equations
                V_π(s) = R(s, π(s)) + γ Σ_{s' ∈ S} T(s, π(s), s') V_π(s')
        improve the policy at each state:
            π'(s) := arg max_a ( R(s, a) + γ Σ_{s' ∈ S} T(s, a, s') V_π(s') )
    until π = π'
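A direct rendering of this loop, reusing the dictionary MDP representation from the value-iteration sketch above and solving the policy-evaluation equations with numpy; the representation is again this sketch's assumption.

    import numpy as np

    def policy_iteration(S, A, R, T, gamma):
        """Alternate exact policy evaluation (a linear solve) with greedy
        policy improvement until the policy stops changing."""
        states = list(S)
        idx = {s: i for i, s in enumerate(states)}
        pi = {s: A[0] for s in states}          # arbitrary initial policy
        while True:
            # Policy evaluation: solve (I - gamma * T_pi) V = R_pi.
            n = len(states)
            M = np.eye(n)
            b = np.zeros(n)
            for s in states:
                b[idx[s]] = R[s][pi[s]]
                for s2, p in T[s][pi[s]].items():
                    M[idx[s], idx[s2]] -= gamma * p
            V = np.linalg.solve(M, b)
            # Policy improvement: act greedily with respect to V_pi.
            new_pi = {s: max(A, key=lambda a: R[s][a] + gamma *
                             sum(p * V[idx[s2]] for s2, p in T[s][a].items()))
                      for s in states}
            if new_pi == pi:
                return pi, {s: V[idx[s]] for s in states}
            pi = new_pi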
The value function of a policy is just the expected infinite discounted reward that will be gained, at each state, by executing that policy. It can be determined by solving a set of linear equations. Once we know the value of each state under the current policy, we consider whether the value could be improved by changing the first action taken. If it can, we change the policy to take the new action whenever it is in that situation. This step is guaranteed to strictly improve the performance of the policy. When no improvements are possible, then the policy is guaranteed to be optimal.

Since there are at most |A|^{|S|} distinct policies, and the sequence of policies improves at each step, this algorithm terminates in at most an exponential number of iterations (Puterman). However, it is an important open question how many iterations policy iteration takes in the worst case. It is known that the running time is pseudo-polynomial and that, for any fixed discount factor, there is a polynomial bound in the total size of the MDP (Littman et al.).
3.2.3 Enhancement to Value Iteration and Policy Iteration

In practice, value iteration is much faster per iteration, but policy iteration takes fewer iterations. Arguments have been put forth to the effect that each approach is better for large problems. Puterman's modified policy iteration algorithm (Puterman & Shin) provides a method for trading iteration time for iteration improvement in a smoother way. The basic idea is that the expensive part of policy iteration is solving for the exact value of V_π. Instead of finding an exact value for V_π, we can perform a few steps of a modified value-iteration step where the policy is held fixed over successive iterations. This can be shown to produce an approximation to V_π that converges linearly in γ. In practice, this can result in substantial speedups.

Several standard numerical-analysis techniques that speed the convergence of dynamic programming can be used to accelerate value and policy iteration. Multigrid methods can be used to quickly seed a good initial approximation to a high-resolution value function by initially performing value iteration at a coarser resolution (Rüde). State aggregation works by collapsing groups of states to a single meta-state, solving the abstracted problem (Bertsekas & Castañon).
3.2.4 Computational Complexity

Value iteration works by producing successive approximations of the optimal value function. Each iteration can be performed in O(|A||S|^2) steps, or faster if there is sparsity in the transition function. However, the number of iterations required can grow exponentially in the discount factor (Condon); as the discount factor approaches 1, the decisions must be based on results that happen farther and farther into the future.

In practice, policy iteration converges in fewer iterations than value iteration, although the per-iteration costs of O(|A||S|^2 + |S|^3) can be prohibitive. There is no known tight worst-case bound available for policy iteration (Littman et al.). Modified policy iteration (Puterman & Shin) seeks a trade-off between cheap and effective iterations, and is preferred by some practitioners (Rust).

Linear programming (Schrijver) is an extremely general problem, and MDPs can be solved by general-purpose linear-programming packages (Derman; D'Epenoux; Hoffman & Karp). An advantage of this approach is that commercial-quality linear-programming packages are available, although the time and space requirements can still be quite high. From a theoretic perspective, linear programming is the only known algorithm that can solve MDPs in polynomial time, although the theoretically efficient algorithms have not been shown to be efficient in practice.
4. Learning an Optimal Policy: Model-free Methods

In the previous section we reviewed methods for obtaining an optimal policy for an MDP, assuming that we already had a model. The model consists of knowledge of the state transition probability function T(s, a, s') and the reinforcement function R(s, a). Reinforcement learning is primarily concerned with how to obtain the optimal policy when such a model is not known in advance. The agent must interact with its environment directly to obtain information which, by means of an appropriate algorithm, can be processed to produce an optimal policy.

At this point, there are two ways to proceed.

- Model-free: Learn a controller without learning a model.
- Model-based: Learn a model, and use it to derive a controller.

Which approach is better? This is a matter of some debate in the reinforcement-learning community. A number of algorithms have been proposed on both sides. This question also appears in other fields, such as adaptive control, where the dichotomy is between direct and indirect adaptive control.

This section examines model-free learning, and Section 5 examines model-based methods.

The biggest problem facing a reinforcement-learning agent is temporal credit assignment. How do we know whether the action just taken is a good one, when it might have far-reaching effects? One strategy is to wait until the "end" and reward the actions taken if the result was good and punish them if the result was bad. In ongoing tasks, it is difficult to know what the "end" is, and this might require a great deal of memory. Instead, we will use insights from value iteration to adjust the estimated value of a state based on
SLIDE 16 Kaelbling Littman
  • Moore

AHC RL v s r a

Figure
  • Arc
hitecture for the adaptiv e heuristic critic the immediate rew ard and the estimated v alue
  • f
the next state This class
  • f
algorithms is kno wn as temp
  • r
al dier enc e metho ds Sutton
  • W
e will consider t w
  • dieren
t temp
  • raldierence
learning strategies for the discoun ted innitehorizon mo del
4.1 Adaptive Heuristic Critic and TD(λ)

The adaptive heuristic critic algorithm is an adaptive version of policy iteration (Barto, Sutton, & Anderson), in which the value-function computation is no longer implemented by solving a set of linear equations, but is instead computed by an algorithm called TD(0). A block diagram for this approach is given in Figure 4. It consists of two components: a critic (labeled AHC) and a reinforcement-learning component (labeled RL).

Figure 4: Architecture for the adaptive heuristic critic.

The reinforcement-learning component can be an instance of any of the k-armed bandit algorithms, modified to deal with multiple states and non-stationary rewards. But instead of acting to maximize instantaneous reward, it will be acting to maximize the heuristic value, v, that is computed by the critic. The critic uses the real external reinforcement signal to learn to map states to their expected discounted values, given that the policy being executed is the one currently instantiated in the RL component.

We can see the analogy with modified policy iteration if we imagine these components working in alternation. The policy π implemented by RL is fixed and the critic learns the value function V_π for that policy. Now we fix the critic and let the RL component learn a new policy π' that maximizes the new value function, and so on. In most implementations, however, both components operate simultaneously. Only the alternating implementation can be guaranteed to converge to the optimal policy, under appropriate conditions. Williams and Baird explored the convergence properties of a class of AHC-related algorithms they call "incremental variants of policy iteration" (Williams & Baird).
It remains to explain how the critic can learn the value of a policy. We define ⟨s, a, r, s'⟩ to be an experience tuple summarizing a single transition in the environment. Here s is the agent's state before the transition, a is its choice of action, r the instantaneous reward it receives, and s' its resulting state. The value of a policy is learned using Sutton's TD(0) algorithm, which uses the update rule

    V(s) := V(s) + \alpha \big( r + \gamma V(s') - V(s) \big) .

Whenever a state s is visited, its estimated value is updated to be closer to r + γV(s'), since r is the instantaneous reward received and V(s') is the estimated value of the actually occurring next state. This is analogous to the sample-backup rule from value iteration; the only difference is that the sample is drawn from the real world rather than by simulating a known model. The key idea is that r + γV(s') is a sample of the value of V(s), and it is more likely to be correct because it incorporates the real r. If the learning rate α is adjusted properly (it must be slowly decreased) and the policy is held fixed, TD(0) is guaranteed to converge to the optimal value function.
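A sketch of the critic's TD(0) update as a standalone function; the experience tuples would come from whatever policy the RL component is currently executing (the ones below are made up).

    def td0_update(V, s, r, s_next, alpha, gamma):
        """TD(0): move V(s) toward the sample r + gamma * V(s').
        V is a dict from states to estimated values."""
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

    V = {"s0": 0.0, "s1": 0.0}
    for s, a, r, s_next in [("s0", 0, 1.0, "s1"), ("s1", 0, 0.0, "s0")]:
        td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9)
    print(V)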
The TD(0) rule as presented above is really an instance of a more general class of algorithms called TD(λ), with λ = 0. TD(0) looks only one step ahead when adjusting value estimates; although it will eventually arrive at the correct answer, it can take quite a while to do so. The general TD(λ) rule is similar to the TD(0) rule given above,

    V(u) := V(u) + \alpha \big( r + \gamma V(s') - V(s) \big) e(u) ,

but it is applied to every state according to its eligibility e(u), rather than just to the immediately previous state, s. One version of the eligibility trace is defined to be

    e(s) = \sum_{k=1}^{t} (\lambda \gamma)^{t-k} \delta_{s, s_k}, \quad \text{where } \delta_{s, s_k} = 1 \text{ if } s = s_k \text{ and } 0 \text{ otherwise.}

The eligibility of a state s is the degree to which it has been visited in the recent past; when a reinforcement is received, it is used to update all the states that have been recently visited, according to their eligibility. When λ = 0, this is equivalent to TD(0). When λ = 1, it is roughly equivalent to updating all the states according to the number of times they were visited by the end of a run. Note that we can update the eligibility online as follows:

    e(s) := \gamma \lambda e(s) + 1   if s = the current state,
    e(s) := \gamma \lambda e(s)       otherwise.

It is computationally more expensive to execute the general TD(λ), though it often converges considerably faster for large λ (Dayan; Dayan & Sejnowski). There has been some recent work on making the updates more efficient (Cichosz & Mulawka), and on changing the definition to make TD(λ) more consistent with the certainty-equivalent method (Singh & Sutton), which is discussed in Section 5.1.
4.2 Q-learning

The work of the two components of AHC can be accomplished in a unified manner by Watkins' Q-learning algorithm (Watkins; Watkins & Dayan). Q-learning is typically easier to implement. In order to understand Q-learning, we have to develop some additional notation. Let Q^*(s, a) be the expected discounted reinforcement of taking action a in state s, then continuing by choosing actions optimally. Note that V^*(s) is the value of s assuming the best action is taken initially, and so V^*(s) = \max_a Q^*(s, a). Q^*(s, a) can hence be written recursively as

    Q^*(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a'} Q^*(s', a') .

Note also that, since V^*(s) = \max_a Q^*(s, a), we have \pi^*(s) = \arg\max_a Q^*(s, a) as an optimal policy.

Because the Q function makes the action explicit, we can estimate the Q values on-line using a method essentially the same as TD(0), but also use them to define the policy,
because an action can be chosen just by taking the one with the maximum Q value for the current state. The Q-learning rule is

    Q(s, a) := Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big) ,

where ⟨s, a, r, s'⟩ is an experience tuple as described earlier. If each action is executed in each state an infinite number of times on an infinite run and α is decayed appropriately, the Q values will converge with probability 1 to Q^* (Watkins; Tsitsiklis; Jaakkola, Jordan, & Singh). Q-learning can also be extended to update states that occurred more than one step previously, as in TD(λ) (Peng & Williams).
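A minimal tabular Q-learning sketch, combining the update rule with the ε-greedy selection of Section 2.2.2; the environment interface and parameter values are illustrative assumptions.

    import random
    from collections import defaultdict

    def q_learning(env, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-learning with epsilon-greedy exploration.
        `env` is assumed to expose reset(), actions(s), and step(a),
        the latter returning (reward, next_state, done)."""
        Q = defaultdict(float)      # Q[(s, a)], initialized to 0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                acts = env.actions(s)
                if random.random() < epsilon:
                    a = random.choice(acts)                  # explore
                else:
                    a = max(acts, key=lambda x: Q[(s, x)])   # exploit
                r, s2, done = env.step(a)
                # backup toward a sample of r + gamma * max_a' Q(s', a')
                if done:
                    target = r
                else:
                    target = r + gamma * max(Q[(s2, x)] for x in env.actions(s2))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q

Note that the backup uses the max over next-state actions regardless of which action the exploration rule will actually take next; this is the exploration insensitivity discussed below.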
When the Q values are nearly converged to their optimal values, it is appropriate for the agent to act greedily, taking, in each situation, the action with the highest Q value. During learning, however, there is a difficult exploitation versus exploration tradeoff to be made. There are no good, formally justified approaches to this problem in the general case; standard practice is to adopt one of the ad hoc methods discussed in Section 2.2.

AHC architectures seem to be more difficult to work with than Q-learning on a practical level. It can be hard to get the relative learning rates right in AHC so that the two components converge together. In addition, Q-learning is exploration insensitive: that is, the Q values will converge to the optimal values, independent of how the agent behaves while the data is being collected (as long as all state-action pairs are tried often enough). This means that, although the exploration-exploitation issue must be addressed in Q-learning, the details of the exploration strategy will not affect the convergence of the learning algorithm. For these reasons, Q-learning is the most popular and seems to be the most effective model-free algorithm for learning from delayed reinforcement. It does not, however, address any of the issues involved in generalizing over large state and/or action spaces. In addition, it may converge quite slowly to a good policy.
4.3 Model-free Learning With Average Reward

As described, Q-learning can be applied to discounted infinite-horizon MDPs. It can also be applied to undiscounted problems as long as the optimal policy is guaranteed to reach a reward-free absorbing state and the state is periodically reset.

Schwartz examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion, and therefore prefer R-learning to Q-learning (Mahadevan).

With that in mind, researchers have studied the problem of learning optimal average-reward policies. Mahadevan surveyed model-based average-reward algorithms from a reinforcement-learning perspective and found several difficulties with existing algorithms. In particular, he showed that existing reinforcement-learning algorithms for average reward (and some dynamic programming algorithms) do not always produce bias-optimal policies. Jaakkola, Jordan and Singh described an average-reward learning algorithm with guaranteed convergence properties. It uses a Monte-Carlo component to estimate the expected future reward for each state as the agent moves through the environment.
slide-19
SLIDE 19 Reinf
  • r
cement Learning A Sur vey addition Bertsek as presen ts a Qlearninglik e algorithm for a v eragecase rew ard in his new textb
  • k
  • Although
this recen t w
  • rk
pro vides a m uc h needed theoretical foundation to this area
  • f
reinforcemen t learning man y imp
  • rtan
t problems remain unsolv ed
Computing Optimal Policies by Learning Models

The previous section showed how it is possible to learn an optimal policy without knowing the models T(s,a,s') or R(s,a), and without even learning those models en route. Although many of these methods are guaranteed to find optimal policies eventually, and use very little computation time per experience, they make extremely inefficient use of the data they gather, and therefore often require a great deal of experience to achieve good performance. In this section we still begin by assuming that we don't know the models in advance, but we examine algorithms that do operate by learning these models. These algorithms are especially important in applications in which computation is considered to be cheap and real-world experience costly.
Certainty Equivalent Methods

We begin with the most conceptually straightforward method: first, learn the T and R functions by exploring the environment and keeping statistics about the results of each action; next, compute an optimal policy using one of the methods described earlier for solving MDPs given a model. This method is known as certainty equivalence (Kumar & Varaiya, 1986).

There are some serious objections to this method. It makes an arbitrary division between the learning phase and the acting phase. How should it gather data about the environment initially? Random exploration might be dangerous, and in some environments is an immensely inefficient method of gathering data, requiring exponentially more data (Whitehead, 1991) than a system that interleaves experience gathering with policy-building more tightly (Koenig & Simmons, 1993); the figure below gives an example. The possibility of changes in the environment is also problematic: breaking up an agent's life into a pure learning phase and a pure acting phase runs a considerable risk that the optimal controller based on early life becomes, without detection, a suboptimal controller if the environment changes.

A variation on this idea is to apply certainty equivalence continually: the model is learned throughout the agent's lifetime and, at each step, the current model is used to compute an optimal policy and value function. This method makes very effective use of available data, but still ignores the question of exploration and is extremely computationally demanding, even for fairly small state spaces. Fortunately, there are a number of other model-based algorithms that are more practical.
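A minimal sketch of certainty equivalence for a finite MDP, assuming experience is available as a list of ⟨s, a, r, s'⟩ tuples: maximum-likelihood estimates of T and R are computed from counts, and value iteration is run on the estimated model. All names and data formats here are illustrative.

```python
def estimate_model(experience, states, actions):
    """Maximum-likelihood model from a list of (s, a, r, s2) tuples."""
    counts = {(s, a): {} for s in states for a in actions}
    reward_sum = {(s, a): 0.0 for s in states for a in actions}
    n = {(s, a): 0 for s in states for a in actions}
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] = counts[(s, a)].get(s2, 0) + 1
        reward_sum[(s, a)] += r
        n[(s, a)] += 1
    T = {(s, a): {s2: c / n[(s, a)] for s2, c in counts[(s, a)].items()}
         for (s, a) in counts if n[(s, a)] > 0}
    R = {(s, a): reward_sum[(s, a)] / n[(s, a)] for (s, a) in n if n[(s, a)] > 0}
    return T, R

def value_iteration(T, R, states, actions, gamma=0.9, eps=1e-6):
    """Solve the estimated model for a value function, as in the planning methods."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R.get((s, a), 0.0)
                        + gamma * sum(p * V[s2]
                                      for s2, p in T.get((s, a), {}).items())
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new
```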
Figure: In this environment, due to Whitehead (1991), random exploration would take O(2^n) steps to reach the goal even once, whereas a more intelligent exploration strategy (e.g., "assume any untried action leads directly to the goal") would require only O(n^2) steps. (The figure shows a chain of states 1, 2, 3, ..., n leading to a goal.)

Dyna

Sutton's Dyna architecture (Sutton, 1990) exploits a middle ground, yielding strategies that are both more effective than model-free learning and more computationally efficient than the certainty-equivalence approach. It simultaneously uses experience to build a model (T̂ and R̂), uses experience to adjust the policy, and uses the model to adjust the policy.

Dyna operates in a loop of interaction with the environment. Given an experience tuple ⟨s, a, s', r⟩, it behaves as follows (a code sketch follows the list):

- Update the model, incrementing statistics for the transition from s to s' on action a and for receiving reward r for taking action a in state s. The updated models are T̂ and R̂.

- Update the policy at state s based on the newly updated model, using the rule

$$Q(s,a) := \hat{R}(s,a) + \gamma \sum_{s'} \hat{T}(s,a,s') \max_{a'} Q(s',a'),$$

which is a version of the value-iteration update for Q values.

- Perform k additional updates: choose k state-action pairs at random and update them according to the same rule as before:

$$Q(s_k,a_k) := \hat{R}(s_k,a_k) + \gamma \sum_{s'} \hat{T}(s_k,a_k,s') \max_{a'} Q(s',a').$$

- Choose an action a' to perform in state s', based on the Q values, but perhaps modified by an exploration strategy.
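The following sketch compresses one pass of this loop into Python, using a deterministic single-outcome model in place of the full statistics T̂ and R̂ for brevity; the environment interface and the ε-greedy choice are assumptions of the sketch.

```python
import random

def dyna_step(env, s, Q, model, actions, k, gamma=0.9, epsilon=0.1):
    """One pass of the Dyna loop, with a deterministic learned model for brevity
    (the full algorithm keeps transition counts T-hat and average rewards R-hat)."""
    # choose an action: epsilon-greedy stands in for "an exploration strategy"
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q.get((s, act), 0.0))
    s2, r = env.step(a)
    model[(s, a)] = (s2, r)                    # update the model
    # value-iteration-style backup at the real experience, plus k random backups
    pairs = [(s, a)] + [random.choice(list(model)) for _ in range(k)]
    for (sk, ak) in pairs:
        s_pred, r_pred = model[(sk, ak)]
        Q[(sk, ak)] = r_pred + gamma * max(Q.get((s_pred, act), 0.0)
                                           for act in actions)
    return s2
```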
The Dyna algorithm requires about k times the computation of Q-learning per instance, but this is typically vastly less than for the naive model-based method. A reasonable value of k can be determined based on the relative speeds of computation and of taking action.

The figure below shows a grid world in which, in each cell, the agent has four actions (N, S, E, W), and transitions are made deterministically to an adjacent cell, unless there is a block, in which case no movement occurs. As the accompanying table shows, Dyna requires an order of magnitude fewer steps of experience than does Q-learning to arrive at an optimal policy. Dyna requires about six times more computational effort, however.
Figure: A 3277-state grid world. This was formulated as a shortest-path reinforcement-learning problem, which yields the same result as if a reward of 1 were given at the goal, a reward of zero elsewhere, and a discount factor were used.

                          Steps before convergence   Backups before convergence
    Q-learning                       …                          …
    Dyna                             …                          …
    prioritized sweeping             …                          …

Table: The performance of three algorithms described in the text. All methods used the exploration heuristic of "optimism in the face of uncertainty": any state not previously visited was assumed by default to be a goal state. Q-learning used its optimal learning rate parameter for a deterministic maze. Dyna and prioritized sweeping were permitted to take k backups per transition. For prioritized sweeping, the priority queue often emptied before all backups were used.
Prioritized Sweeping / Queue-Dyna

Although Dyna is a great improvement on previous methods, it suffers from being relatively undirected. It is particularly unhelpful when the goal has just been reached or when the agent is stuck in a dead end: it continues to update random state-action pairs, rather than concentrating on the interesting parts of the state space. These problems are addressed by prioritized sweeping (Moore & Atkeson, 1993) and Queue-Dyna (Peng & Williams, 1993), which are two independently-developed but very similar techniques. We will describe prioritized sweeping in some detail.

The algorithm is similar to Dyna, except that updates are no longer chosen at random, and values are now associated with states (as in value iteration) instead of state-action pairs (as in Q-learning). To make appropriate choices, we must store additional information in the model. Each state remembers its predecessors: the states that have a non-zero transition probability to it under some action. In addition, each state has a priority, initially set to zero.

Instead of updating k random state-action pairs, prioritized sweeping updates the k states with the highest priority. For each high-priority state s, it works as follows (a code sketch follows the list):

- Remember the current value of the state: V_old = V(s).

- Update the state's value:

$$V(s) := \max_a \left( \hat{R}(s,a) + \gamma \sum_{s'} \hat{T}(s,a,s') V(s') \right).$$

- Set the state's priority back to 0.

- Compute the value change Δ = |V_old − V(s)|.

- Use Δ to modify the priorities of the predecessors of s.

If we have updated the V value for state s' and it has changed by amount Δ, then the immediate predecessors of s' are informed of this event: any state s for which there exists an action a such that T̂(s,a,s') ≠ 0 has its priority promoted to Δ · T̂(s,a,s'), unless its priority already exceeded that value.

The global behavior of this algorithm is that, when a real-world transition is surprising (the agent happens upon a goal state, for instance), lots of computation is directed to propagate this new information back to relevant predecessor states. When the real-world transition is boring (the actual result is very similar to the predicted result), computation continues in the most deserving part of the space.

Running prioritized sweeping on the grid-world problem above, we see a large improvement over Dyna. The optimal policy is reached in about half the number of steps of experience and one-third the computation as Dyna required, and therefore about twenty times fewer steps and twice the computational effort of Q-learning.
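A sketch of the backup loop in Python, assuming the model estimates T̂ and R̂ and the predecessor lists described above are available; `heapq` supplies the priority queue (priorities are negated because Python's heap is a min-heap, and re-pushing duplicate entries stands in for promoting a priority in place).

```python
import heapq

def prioritized_sweeping(V, T, R, predecessors, queue, k, gamma=0.9, theta=1e-4):
    """Perform up to k highest-priority state backups.

    T[(s, a)] maps successor -> probability, R[(s, a)] is the estimated reward,
    predecessors[s2] is a list of (s, a, T(s, a, s2)) triples, and queue holds
    (-priority, state) pairs; states are assumed orderable for tie-breaking."""
    for _ in range(k):
        if not queue:
            break                      # the queue often empties before k backups
        _, s = heapq.heappop(queue)
        v_old = V.get(s, 0.0)
        # value-iteration backup: V(s) = max_a [ R(s,a) + gamma * sum_s' T V(s') ]
        V[s] = max(R[(s, a)]
                   + gamma * sum(p * V.get(s2, 0.0)
                                 for s2, p in T[(s, a)].items())
                   for (ss, a) in T if ss == s)
        delta = abs(v_old - V[s])
        # promote each predecessor in proportion to delta * T(s_pre, a, s)
        for (s_pre, a, p) in predecessors.get(s, []):
            if delta * p > theta:
                heapq.heappush(queue, (-delta * p, s_pre))
```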
Other Model-Based Methods

Methods proposed for solving MDPs given a model can be used in the context of model-based reinforcement-learning methods as well.

RTDP (real-time dynamic programming) (Barto, Bradtke, & Singh, 1995) is another model-based method that uses Q-learning to concentrate computational effort on the areas of the state space that the agent is most likely to occupy. It is specific to problems in which the agent is trying to achieve a particular goal state and the reward everywhere else is zero. By taking into account the start state, it can find a short path from the start to the goal, without necessarily visiting the rest of the state space.

The Plexus planning system (Dean, Kaelbling, Kirman, & Nicholson, 1993; Kirman, 1994) exploits a similar intuition. It starts by making an approximate version of the MDP that is much smaller than the original one. The approximate MDP contains a set of states, called the envelope, that includes the agent's current state and the goal state, if there is one. States that are not in the envelope are summarized by a single "out" state. The planning process alternates between finding an optimal policy on the approximate MDP and adding useful states to the envelope. Action may take place in parallel with planning, in which case irrelevant states are also pruned out of the envelope.
Generalization

All of the previous discussion has tacitly assumed that it is possible to enumerate the state and action spaces and store tables of values over them. Except in very small environments, this means impractical memory requirements. It also makes inefficient use of experience. In a large, smooth state space we generally expect similar states to have similar values and similar optimal actions. Surely, therefore, there should be some more compact representation than a table. Most problems will have continuous or large discrete state spaces; some will have large or continuous action spaces. The problem of learning in large spaces is addressed through generalization techniques, which allow compact storage of learned information and transfer of knowledge between similar states and actions.

The large literature on generalization techniques from inductive concept learning can be applied to reinforcement learning. However, techniques often need to be tailored to the specific details of the problem. In the following sections, we explore the application of standard function-approximation techniques, adaptive resolution models, and hierarchical methods to the problem of reinforcement learning.

The reinforcement-learning architectures and algorithms discussed above have included the storage of a variety of mappings, including S → A (policies), S → ℝ (value functions), S × A → ℝ (Q functions and rewards), S × A → S (deterministic transitions), and S × A × S → [0, 1] (transition probabilities). Some of these mappings, such as transitions and immediate rewards, can be learned using straightforward supervised learning, and can be handled by any of the wide variety of function-approximation techniques for supervised learning that support noisy training examples. Popular techniques include various neural-network methods (Rumelhart & McClelland, 1986), fuzzy logic (Berenji, 1991; Lee, 1991), CMAC (Albus, 1981), and local memory-based methods (Moore, Atkeson, & Schaal), such as generalizations of nearest-neighbor methods. Other mappings, especially the policy mapping, typically need specialized algorithms, because training sets of input-output pairs are not available.
Generalization over Input

A reinforcement-learning agent's current state plays a central role in its selection of reward-maximizing actions. Viewing the agent as a state-free black box, a description of the current state is its input. Depending on the agent architecture, its output is either an action selection or an evaluation of the current state that can be used to select an action. The problem of deciding how the different aspects of an input affect the value of the output is sometimes called the structural credit-assignment problem. This section examines approaches to generating actions or evaluations as a function of a description of the agent's current state.

The first group of techniques covered here is specialized to the case when reward is not delayed; the second group is more generally applicable.

Immediate Reward

When the agent's actions do not influence state transitions, the resulting problem becomes one of choosing actions to maximize immediate reward as a function of the agent's current state. These problems bear a resemblance to the bandit problems discussed earlier, except that the agent should condition its action selection on the current state. For this reason, this class of problems has been described as associative reinforcement learning.

The algorithms in this section address the problem of learning from immediate boolean reinforcement, where the state is vector-valued and the action is a boolean vector. Such algorithms can be, and have been, used in the context of delayed reinforcement, for instance as the RL component in the AHC architecture described earlier. They can also be generalized to real-valued reward through reward comparison methods (Sutton, 1984).
CRBP. The complementary reinforcement backpropagation algorithm (CRBP) (Ackley & Littman, 1990) consists of a feed-forward network mapping an encoding of the state to an encoding of the action. The action is determined probabilistically from the activation of the output units: if output unit i has activation y_i, then bit i of the action vector has value 1 with probability y_i, and 0 otherwise. Any neural-network supervised training procedure can be used to adapt the network, as follows. If the result of generating action a is r = 1, then the network is trained with input-output pair ⟨s, a⟩. If the result is r = 0, then the network is trained with input-output pair ⟨s, ā⟩, where ā = (1 − a_1, ..., 1 − a_n).

The idea behind this training rule is that, whenever an action fails to generate reward, CRBP will try to generate an action that is different from the current choice. Although it seems like the algorithm might oscillate between an action and its complement, that does not happen: one step of training the network will only change the action slightly, and since the output probabilities will tend to move toward 0.5, this makes action selection more random and increases search. The hope is that the random distribution will generate an action that works better, and then that action will be reinforced.
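The heart of CRBP is how it constructs training targets; a sketch, where `net.predict`, `net.train`, and `execute` are assumed stand-ins for the network and the environment:

```python
import random

def crbp_trial(net, s, execute):
    """One CRBP trial. `net` is any supervised network with predict/train,
    and `execute` runs the action and returns boolean reinforcement r."""
    y = net.predict(s)                            # output activations in [0, 1]
    a = [1 if random.random() < yi else 0 for yi in y]
    r = execute(s, a)
    if r == 1:
        target = a                                # reinforce the chosen action
    else:
        target = [1 - bit for bit in a]           # train toward the complement
    net.train(s, target)                          # any supervised procedure
    return a, r
```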
ARC. The associative reinforcement comparison (ARC) algorithm (Sutton, 1984) is an instance of the AHC architecture for the case of boolean actions, consisting of two feed-forward networks. One learns the value of situations; the other learns a policy. These can be simple linear networks or can have hidden units.

In the simplest case, the entire system learns only to optimize immediate reward. First, let us consider the behavior of the network that learns the policy, a mapping from a vector describing s to a 0 or 1. If the output unit has activation y, then a, the action generated, will be 1 if y + ν > 0, where ν is normal noise, and 0 otherwise.

The adjustment for the output unit is, in the simplest case,

$$e = r \left( a - \tfrac{1}{2} \right),$$

where the first factor is the reward received for taking the most recent action and the second encodes which action was taken. The actions are encoded as 0 and 1, so a − 1/2 always has the same magnitude; if the reward and the action have the same sign, then action 1 will be made more likely, otherwise action 0 will be.

As described, the network will tend to seek actions that give positive reward. To extend this approach to maximize reward, we can compare the reward to some baseline, b. This changes the adjustment to

$$e = (r - b) \left( a - \tfrac{1}{2} \right),$$

where b is the output of the second network. The second network is trained in a standard supervised mode to estimate r as a function of the input state s.

Variations of this approach have been used in a variety of applications (Anderson, 1986; Barto et al., 1983; Lin, 1993b; Sutton, 1984).
REINFORCE Algorithms. Williams (1992) studied the problem of choosing actions to maximize immediate reward. He identified a broad class of update rules that perform gradient descent on the expected reward, and showed how to integrate these rules with backpropagation. This class, called REINFORCE algorithms, includes the linear reward-inaction algorithm described earlier as a special case. The generic REINFORCE update for a parameter w_ij can be written

$$\Delta w_{ij} = \alpha_{ij} (r - b_{ij}) \frac{\partial \ln g_i}{\partial w_{ij}},$$

where α_ij is a non-negative factor, r the current reinforcement, b_ij a reinforcement baseline, and g_i the probability density function used to randomly generate actions based on unit activations. Both α_ij and b_ij can take on different values for each w_ij; however, when α_ij is constant throughout the system, the expected update is exactly in the direction of the expected reward gradient. Otherwise, the update is in the same half-space as the gradient, but not necessarily in the direction of steepest increase. Williams points out that the choice of baseline b_ij can have a profound effect on the convergence speed of the algorithm.
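For a single Bernoulli output unit with sigmoid activation, ∂ ln g / ∂ w_i reduces to (a − y) x_i, giving the concrete update sketched below; the function names and the constant baseline are assumptions of the sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reinforce_step(w, x, reward_fn, alpha=0.1, baseline=0.0):
    """One REINFORCE update for a single Bernoulli unit with y = sigmoid(w . x):
    d ln g / d w_i = (a - y) * x_i, so
    delta w_i = alpha * (r - b) * (a - y) * x_i."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # P(action = 1)
    a = 1 if random.random() < y else 0                # stochastic action
    r = reward_fn(a)                                   # immediate reinforcement
    return [wi + alpha * (r - baseline) * (a - y) * xi for wi, xi in zip(w, x)]
```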
Logic-Based Methods. Another strategy for generalization in reinforcement learning is to reduce the learning problem to an associative problem of learning boolean functions. A boolean function has a vector of boolean inputs and a single boolean output. Taking inspiration from mainstream machine-learning work, Kaelbling developed two algorithms for learning boolean functions from reinforcement: one uses the bias of k-DNF to drive the generalization process (Kaelbling, 1994b); the other searches the space of syntactic descriptions of functions using a simple generate-and-test method (Kaelbling, 1994a).

The restriction to a single boolean output makes these techniques difficult to apply. In very benign learning situations, it is possible to extend this approach to use a collection of learners to independently learn the individual bits that make up a complex output. In general, however, that approach suffers from the problem of very unreliable reinforcement: if a single learner generates an inappropriate output bit, all of the learners receive a low reinforcement value. The cascade method (Kaelbling, 1994b) allows a collection of learners to be trained collectively to generate appropriate joint outputs; it is considerably more reliable, but can require additional computational effort.
Delayed Reward

Another method for allowing reinforcement-learning techniques to be applied in large state spaces is modeled on value iteration and Q-learning: a function approximator is used to represent the value function, by mapping a state description to a value. Many researchers have experimented with this approach: Boyan and Moore (1995) used local memory-based methods in conjunction with value iteration; Lin used backpropagation networks for Q-learning; Watkins (1989) used CMAC for Q-learning; Tesauro (1992, 1995) used backpropagation for learning the value function in backgammon (described in the applications section below); Zhang and Dietterich (1995) used backpropagation and TD(λ) to learn good strategies for job-shop scheduling.

Although there have been some positive examples, in general there are unfortunate interactions between function approximation and the learning rules. In discrete environments there is a guarantee that any operation that updates the value function (according to the Bellman equations) can only reduce the error between the current value function and the optimal value function. This guarantee no longer holds when generalization is used. These issues are discussed by Boyan and Moore (1995), who give some simple examples of value-function errors growing arbitrarily large when generalization is used with value iteration. Their solution to this, applicable only to certain classes of problems, discourages such divergence by only permitting updates whose estimated values can be shown to be near-optimal via a battery of Monte-Carlo experiments.
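As a concrete instance of the scheme, the sketch below applies the Q-learning error term to a linear approximator over state-action features; the feature map φ is an assumption of the sketch, and, as the counterexamples just discussed illustrate, nothing in this rule by itself guarantees convergence.

```python
def linear_q_update(w, phi, s, a, r, s2, actions, alpha=0.01, gamma=0.9):
    """One Q-learning step on the approximation Q(s, a) = w . phi(s, a).
    phi is a user-supplied feature map returning a list of floats."""
    f = phi(s, a)
    q_sa = sum(wi * fi for wi, fi in zip(w, f))
    q_next = max(sum(wi * fi for wi, fi in zip(w, phi(s2, a2)))
                 for a2 in actions)
    td_error = r + gamma * q_next - q_sa        # same error term as tabular rule
    return [wi + alpha * td_error * fi for wi, fi in zip(w, f)]
```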
Thrun and Schwartz (1993) theorize that function approximation of value functions is also dangerous because the errors in value functions due to generalization can become compounded by the max operator in the definition of the value function.

Several recent results (Gordon, 1995; Tsitsiklis & Van Roy, 1996) show how the appropriate choice of function approximator can guarantee convergence, though not necessarily to the optimal values. Baird's residual gradient technique (Baird, 1995) provides guaranteed convergence to locally optimal solutions.

Perhaps the gloominess of these counterexamples is misplaced. Boyan and Moore (1995) report that their counterexamples can be made to work with problem-specific hand-tuning, despite the unreliability of untuned algorithms that provably converge in discrete domains. Sutton (1996) shows how modified versions of Boyan and Moore's examples can converge successfully. An open question is whether general principles, ideally supported by theory, can help us understand when value-function approximation will succeed. In Sutton's comparative experiments with Boyan and Moore's counterexamples, he changes four aspects of the experiments:

- small changes to the task specifications;
- a very different kind of function approximator, CMAC (Albus, 1981), which has weak generalization;
- a different learning algorithm, SARSA (Rummery & Niranjan, 1994), instead of value iteration; and
- a different training regime: Boyan and Moore sampled states uniformly in state space, whereas Sutton's method sampled along empirical trajectories.

There are intuitive reasons to believe that the fourth factor is particularly important, but more careful research is needed.
Adaptive Resolution Models

In many cases, what we would like to do is partition the environment into regions of states that can be considered the same for the purposes of learning and generating actions. Without detailed prior knowledge of the environment, it is very difficult to know what granularity or placement of partitions is appropriate. This problem is overcome in methods that use adaptive resolution: during the course of learning, a partition is constructed that is appropriate to the environment.

Decision Trees. In environments that are characterized by a set of boolean or discrete-valued variables, it is possible to learn compact decision trees for representing Q values. The G-learning algorithm (Chapman & Kaelbling, 1991) works as follows. It starts by assuming that no partitioning is necessary, and tries to learn Q values for the entire environment as if it were one state. In parallel with this process, it gathers statistics based on individual input bits: it asks whether there is some bit b in the state description such that the Q values for states in which b = 1 are significantly different from the Q values for states in which b = 0. If such a bit is found, it is used to split the decision tree. Then the process is repeated in each of the leaves. This method was able to learn very small representations of the Q function in the presence of an overwhelming number of irrelevant, noisy state attributes. It outperformed Q-learning with backpropagation in a simple video-game environment, and was used by McCallum (1995), in conjunction with other techniques for dealing with partial observability, to learn behaviors in a complex driving simulator. It cannot, however, acquire partitions in which attributes are only significant in combination (such as those needed to solve parity problems).
Variable Resolution Dynamic Programming. The VRDP algorithm (Moore, 1991) enables conventional dynamic programming to be performed in real-valued multivariate state spaces where straightforward discretization would fall prey to the curse of dimensionality. A kd-tree (similar to a decision tree) is used to partition state space into coarse regions. The coarse regions are refined into detailed regions, but only in parts of the state space which are predicted to be important. This notion of importance is obtained by running mental trajectories through state space. The algorithm proved effective on a number of problems for which full high-resolution arrays would have been impractical. It has the disadvantage of requiring a guess at an initially valid trajectory through state space.
Figure: (a) A two-dimensional maze problem. The point robot must find a path from start to goal without crossing any of the barrier lines. (b) The path taken by Parti-game during the entire first trial. It begins with intense exploration to find a route out of the almost entirely enclosed start region. Having eventually reached a sufficiently high resolution, it discovers the gap and proceeds greedily towards the goal, only to be temporarily blocked by the goal's barrier region. (c) The second trial.
Parti-game Algorithm. Moore's Parti-game algorithm (Moore, 1994) is another solution to the problem of learning to achieve goal configurations in deterministic high-dimensional continuous spaces by learning an adaptive-resolution model. It also divides the environment into cells; but in each cell, the actions available consist of aiming at the neighboring cells (this aiming is accomplished by a local controller, which must be provided as part of the problem statement). The graph of cell transitions is solved for shortest paths in an online, incremental manner, but a minimax criterion is used to detect when a group of cells is too coarse to prevent movement between obstacles or to avoid limit cycles. The offending cells are split to higher resolution. Eventually, the environment is divided up just enough to choose appropriate actions for achieving the goal, but no unnecessary distinctions are made. An important feature is that, as well as reducing memory and computational requirements, it also structures exploration of state space in a multi-resolution manner. Given a failure, the agent will initially try something very different to rectify the failure, and only resort to small local changes when all the qualitatively different strategies have been exhausted.

Panel (a) of the figure above shows a two-dimensional continuous maze; panel (b) shows the performance of a robot using the Parti-game algorithm during the very first trial; panel (c) shows the second trial, started from a slightly different position. This is a very fast algorithm, learning policies in spaces of up to nine dimensions in less than a minute. The restriction of the current implementation to deterministic environments limits its applicability, however. McCallum (1995) suggests some related tree-structured methods.
Generalization over Actions

The networks described earlier in this section generalize over state descriptions presented as inputs. They also produce outputs in a discrete, factored representation, and thus could be seen as generalizing over actions as well. In cases such as this, when actions are described combinatorially, it is important to generalize over actions to avoid keeping separate statistics for the huge number of actions that can be chosen. In continuous action spaces, the need for generalization is even more pronounced.

When estimating Q values using a neural network, it is possible to use either a distinct network for each action, or a network with a distinct output for each action. When the action space is continuous, neither approach is possible. An alternative strategy is to use a single network with both the state and action as input and the Q value as output. Training such a network is not conceptually difficult, but using the network to find the optimal action can be a challenge. One method is to do a local gradient-ascent search on the action, in order to find one with high value (Baird & Klopf, 1993).

Gullapalli (1990) has developed a "neural" reinforcement-learning unit for use in continuous action spaces. The unit generates actions with a normal distribution, and adjusts the mean and variance based on previous experience. When the chosen actions are not performing well, the variance is high, resulting in exploration of the range of choices. When an action performs well, the mean is moved in that direction and the variance decreased, resulting in a tendency to generate more action values near the successful one. This method was successfully employed to learn to control a robot arm with many continuous degrees of freedom.
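A caricature of such a unit in Python; the specific adaptation rules below are illustrative assumptions (Gullapalli's exact rules differ), with reward assumed scaled to [0, 1]:

```python
import random

class GaussianActionUnit:
    """Stochastic real-valued unit: action ~ Normal(mu, sigma)."""
    def __init__(self, mu=0.0, sigma=1.0, alpha=0.1):
        self.mu, self.sigma, self.alpha = mu, sigma, alpha
        self.r_baseline = 0.0                # running estimate of expected reward

    def act(self):
        return random.gauss(self.mu, self.sigma)

    def update(self, action, reward):
        # move the mean toward actions that beat the current reward estimate
        self.mu += self.alpha * (reward - self.r_baseline) * (action - self.mu)
        # good performance -> narrow the distribution; poor -> keep exploring
        self.sigma = max(0.01, 1.0 - reward)
        self.r_baseline += self.alpha * (reward - self.r_baseline)
```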
Hierarchical Methods

Another strategy for dealing with large state spaces is to treat them as a hierarchy of learning problems. In many cases, hierarchical solutions introduce slight sub-optimality in performance, but potentially gain a good deal of efficiency in execution time, learning time, and space.

Hierarchical learners are commonly structured as gated behaviors, as shown in the figure below. There is a collection of behaviors that map environment states into low-level actions, and a gating function that decides, based on the state of the environment, which behavior's actions should be switched through and actually executed. Maes and Brooks (1990) used a version of this architecture in which the individual behaviors were fixed a priori and the gating function was learned from reinforcement. Mahadevan and Connell (1991b) used the dual approach: they fixed the gating function, and supplied reinforcement functions for the individual behaviors, which were learned. Lin (1993a) and Dorigo and Colombetti (1994) both used this approach, first training the behaviors and then training the gating function. Many of the other hierarchical learning methods can be cast in this framework.
Feudal Q-learning

Feudal Q-learning (Dayan & Hinton, 1993; Watkins, 1989) involves a hierarchy of learning modules. In the simplest case, there is a high-level master and a low-level slave. The master receives reinforcement from the external environment. Its actions consist of commands that it can give to the low-level learner. When the master generates a particular command to the slave, it must reward the slave for taking actions that satisfy the command, even if they do not result in external reinforcement. The master then learns a mapping from states to commands. The slave learns a mapping from commands and states to external actions. The set of commands and their associated reinforcement functions are established in advance of the learning.

Figure: A structure of gated behaviors. (The figure shows the state s feeding behaviors b1, b2, b3 and a gating function g, which selects the action a.)

This is really an instance of the general gated-behaviors approach, in which the slave can execute any of the behaviors depending on its command. The reinforcement functions for the individual behaviors (commands) are given, but learning takes place simultaneously at both the high and low levels.
Compositional Q-learning

Singh's compositional Q-learning (Singh, 1992a, 1992b) (C-QL) consists of a hierarchy based on the temporal sequencing of subgoals. The elemental tasks are behaviors that achieve some recognizable condition. The high-level goal of the system is to achieve some set of conditions in sequential order. The achievement of the conditions provides reinforcement for the elemental tasks, which are trained first to achieve individual subgoals. Then the gating function learns to switch the elemental tasks in order to achieve the appropriate high-level sequential goal. This method was used by Tham and Prager (1994) to learn to control a simulated multi-link robot arm.
Hierarchical Distance to Goal

Especially if we consider reinforcement-learning modules to be part of larger agent architectures, it is important to consider problems in which goals are dynamically input to the learner. Kaelbling's HDG algorithm (Kaelbling, 1993a) uses a hierarchical approach to solving problems when goals of achievement (the agent should get to a particular state as quickly as possible) are given to an agent dynamically.

The HDG algorithm works by analogy with navigation in a harbor. The environment is partitioned (a priori, though more recent work (Ashar, 1994) addresses the case of learning the partition) into a set of regions whose centers are known as landmarks. If the agent is currently in the same region as the goal, then it uses low-level actions to move to the goal. If not, then high-level information is used to determine the next landmark on the shortest path from the agent's closest landmark to the goal's closest landmark. Then the agent uses low-level information to aim toward that next landmark. If errors in action cause deviations in the path, there is no problem; the best aiming point is recomputed on every step.

Figure: An example of a partially observable environment. (The figure shows an office with two indistinguishable "hall" states on the way to a printer worth +100; the arcs are labeled with probabilities 2/5, 1/5, and 2/5.)
Partially Observable Environments

In many real-world environments, it will not be possible for the agent to have perfect and complete perception of the state of the environment. Unfortunately, complete observability is necessary for learning methods based on MDPs. In this section, we consider the case in which the agent makes observations of the state of the environment, but these observations may be noisy and provide incomplete information. In the case of a robot, for instance, it might observe whether it is in a corridor, an open room, a T-junction, etc., and those observations might be error-prone. This problem is also referred to as the problem of incomplete perception, perceptual aliasing, or hidden state.

In this section, we will consider extensions to the basic MDP framework for solving partially observable problems. The resulting formal model is called a partially observable Markov decision process, or POMDP.
State-Free Deterministic Policies

The most naive strategy for dealing with partial observability is to ignore it: that is, to treat the observations as if they were the states of the environment, and try to learn to behave. The figure above shows a simple environment in which the agent is attempting to get to the printer from an office. If it moves from the office, there is a good chance that the agent will end up in one of two places that look like "hall", but that require different actions for getting to the printer. If we consider these states to be the same, then the agent cannot possibly behave optimally. But how well can it do?

The resulting problem is not Markovian, and Q-learning cannot be guaranteed to converge. Small breaches of the Markov requirement are well handled by Q-learning, but it is possible to construct simple environments that cause Q-learning to oscillate (Chrisman & Littman, 1993). It is possible to use a model-based approach, however: act according to some policy and gather statistics about the transitions between observations, then solve for the optimal policy based on those observations. Unfortunately, when the environment is not Markovian, the transition probabilities depend on the policy being executed, so this new policy will induce a new set of transition probabilities. This approach may yield plausible results in some cases, but again there are no guarantees.

It is reasonable, though, to ask what the optimal policy (a mapping from observations to actions, in this case) is. It is NP-hard (Littman, 1994b) to find this mapping, and even the best mapping can have very poor performance. In the case of our agent trying to get to the printer, for instance, any deterministic state-free policy takes an infinite number of steps to reach the goal on average.
State-Free Stochastic Policies

Some improvement can be gained by considering stochastic policies: mappings from observations to probability distributions over actions. If there is randomness in the agent's actions, it will not get stuck in the hall forever. Jaakkola, Singh, and Jordan (1995) have developed an algorithm for finding locally-optimal stochastic policies, but finding a globally optimal policy is still NP-hard.

In our example, it turns out that the optimal stochastic policy is for the agent, when in a state that looks like a hall, to go east with probability 2 − √2 and west with probability √2 − 1. This policy can be found by solving a simple (in this case, quadratic) program. The fact that such a simple example can produce irrational numbers gives some indication that it is a difficult problem to solve exactly.
Policies with Internal State

The only way to behave truly effectively in a wide range of environments is to use memory of previous actions and observations to disambiguate the current state. There are a variety of approaches to learning policies with internal state.

Recurrent Q-learning. One intuitively simple approach is to use a recurrent neural network to learn Q values. The network can be trained using backpropagation through time (or some other suitable technique), and learns to retain "history features" to predict value. This approach has been used by a number of researchers (Meeden, McGraw, & Blank, 1993; Lin & Mitchell, 1992; Schmidhuber, 1991b). It seems to work effectively on simple problems, but can suffer from convergence to local optima on more complex problems.

Classifier Systems. Classifier systems (Holland, 1975; Goldberg, 1989) were explicitly developed to solve problems with delayed reward, including those requiring short-term memory. The internal mechanism typically used to pass reward back through chains of decisions, called the bucket brigade algorithm, bears a close resemblance to Q-learning. In spite of some early successes, the original design does not appear to handle partially observed environments robustly.

Recently, this approach has been re-examined using insights from the reinforcement-learning literature, with some success. Dorigo did a comparative study of Q-learning and classifier systems (Dorigo & Bersini, 1994).
Cliff and Ross (1994) start with Wilson's zeroth-level classifier system (Wilson, 1994) and add one- and two-bit memory registers. They find that, although their system can learn to use short-term memory registers effectively, the approach is unlikely to scale to more complex environments. Dorigo and Colombetti applied classifier systems to a moderately complex problem of learning robot behavior from immediate reinforcement (Dorigo, 1995; Dorigo & Colombetti, 1994).

Finite-History-Window Approach. One way to restore the Markov property is to allow decisions to be based on the history of recent observations, and perhaps actions. Lin and Mitchell (1992) used a fixed-width finite history window to learn a pole-balancing task. McCallum (1995) describes the "utile suffix memory", which learns a variable-width window that serves simultaneously as a model of the environment and a finite-memory policy. This system has had excellent results in a very complex driving-simulation domain (McCallum, 1995). Ring (1994) has a neural-network approach that uses a variable history window, adding history when necessary to disambiguate situations.
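The fixed-width variant amounts to running any of the earlier algorithms on an augmented state; a small sketch, where the window length k is the design parameter:

```python
from collections import deque

class HistoryWindow:
    """Present the last k observation/action pairs as a single hashable 'state',
    so that the earlier tabular algorithms can be applied unchanged."""
    def __init__(self, k):
        self.window = deque(maxlen=k)

    def observe(self, observation, last_action=None):
        self.window.append((observation, last_action))
        return tuple(self.window)     # usable as a Q-table key
```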
POMDP Approach. Another strategy consists of using hidden Markov model (HMM) techniques to learn a model of the environment, including the hidden state, then to use that model to construct a perfect-memory controller (Cassandra, Kaelbling, & Littman, 1994; Lovejoy, 1991; Monahan, 1982).

Chrisman (1992) showed how the forward-backward algorithm for learning HMMs could be adapted to learning POMDPs. He, and later McCallum (1995), also gave heuristic state-splitting rules to attempt to learn the smallest possible model for a given environment. The resulting model can then be used to integrate information from the agent's observations in order to make decisions.

Figure: Structure of a POMDP agent. (The figure shows the observation i and action a entering the state estimator SE, whose belief state b feeds the policy π.)

The figure illustrates the basic structure for a perfect-memory controller. The component on the left is the state estimator, which computes the agent's belief state b as a function of the old belief state, the last action a, and the current observation i. In this context, a belief state is a probability distribution over states of the environment, indicating the likelihood, given the agent's past experience, that the environment is actually in each of those states. The state estimator can be constructed straightforwardly using the estimated world model and Bayes' rule.
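The state estimator is just Bayes' rule applied to the learned model; a sketch for finite state and observation sets, where O(s', o), the probability of observing o in state s', is notation assumed for the sketch:

```python
def update_belief(b, a, o, states, T, O):
    """State estimator SE via Bayes' rule:
    b'(s') is proportional to O(s', o) * sum_s T(s, a, s') * b(s)."""
    b_new = {s2: O[(s2, o)] * sum(T[(s, a)].get(s2, 0.0) * b[s] for s in states)
             for s2 in states}
    norm = sum(b_new.values())
    if norm == 0.0:
        raise ValueError("observation impossible under the current model")
    return {s2: p / norm for s2, p in b_new.items()}
```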
Now we are left with the problem of finding a policy mapping belief states into action. This problem can be formulated as an MDP, but it is difficult to solve using the techniques described earlier, because the input space is continuous. Chrisman's approach (1992) does not take into account future uncertainty, but yields a policy after a small amount of computation. A standard approach from the operations-research literature is to solve for the optimal policy (or a close approximation thereof) based on its representation as a piecewise-linear and convex function over the belief space. This method is computationally intractable, but may serve as inspiration for methods that make further approximations (Cassandra et al., 1994; Littman, Cassandra, & Kaelbling, 1995a).
Reinforcement Learning Applications

One reason that reinforcement learning is popular is that it serves as a theoretical tool for studying the principles of agents learning to act. But it is unsurprising that it has also been used by a number of researchers as a practical computational tool for constructing autonomous systems that improve themselves with experience. These applications have ranged from robotics, to industrial manufacturing, to combinatorial search problems such as computer game playing.

Practical applications provide a test of the efficacy and usefulness of learning algorithms. They are also an inspiration for deciding which components of the reinforcement-learning framework are of practical importance. For example, a researcher with a real robotic task can provide a data point to questions such as:

- How important is optimal exploration? Can we break the learning period into exploration phases and exploitation phases?
- What is the most useful model of long-term reward: finite horizon? discounted? infinite horizon?
- How much computation is available between agent decisions, and how should it be used?
- What prior knowledge can we build into the system, and which algorithms are capable of using that knowledge?

Let us examine a set of practical applications of reinforcement learning, while bearing these questions in mind.
Game Playing

Game playing has dominated the Artificial Intelligence world as a problem domain ever since the field was born. Two-player games do not fit into the established reinforcement-learning framework, since the optimality criterion for games is not one of maximizing reward in the face of a fixed environment, but one of maximizing reward against an optimal adversary (minimax). Nonetheless, reinforcement-learning algorithms can be adapted to work for a very general class of games (Littman, 1994a), and many researchers have used reinforcement learning in these environments. One application, spectacularly far ahead of its time, was Samuel's checkers playing system (Samuel, 1959). This learned a value function represented by a linear function approximator, and employed a training scheme similar to the updates used in value iteration, temporal differences, and Q-learning.

More recently, Tesauro (1992, 1994, 1995) applied the temporal difference algorithm to backgammon. Backgammon has approximately 10^20 states, making table-based reinforcement learning impossible.
Instead, Tesauro used a backpropagation-based three-layer neural network as a function approximator for the value function

    Board Position → Probability of victory for current player.

                  Training Games   Hidden Units   Results
    Basic               …               …         Poor
    TD 1.0              …               …         Lost by … points in … games
    TD 2.0              …               …         Lost by … points in … games
    TD 2.1              …               …         Lost by … point in … games

Table: TD-Gammon's performance in games against the top human professional players. A backgammon tournament involves playing a series of games for points, until one player reaches a set target. TD-Gammon won none of these tournaments, but came sufficiently close that it is now considered one of the best few players in the world.
ersions
  • f
the learning algorithm w ere used The rst whic h w e will call Basic TD Gammon used v ery little predened kno wledge
  • f
the game and the represen tation
  • f
a b
  • ard
p
  • sition
w as virtually a ra w enco ding sucien tly p
  • w
erful
  • nly
to p ermit the neural net w
  • rk
to distinguish b et w een conceptually dieren t p
  • sitions
The second TDGammon w as pro vided with the same ra w state information supplemen ted b y a n um b er
  • f
hand crafted features
  • f
bac kgammon b
  • ard
p
  • sitions
Pro viding handcrafted features in this manner is a go
  • d
example
  • f
ho w inductiv e biases from h uman kno wledge
  • f
the task can b e supplied to a learning algorithm The training
  • f
b
  • th
learning algorithms required sev eral mon ths
  • f
computer time and w as ac hiev ed b y constan t selfpla y
  • No
exploration strategy w as usedthe system alw a ys greedily c hose the mo v e with the largest exp ected probabilit y
  • f
victory
  • This
naiv e explo ration strategy pro v ed en tirely adequate for this en vironmen t whic h is p erhaps surprising giv en the considerable w
  • rk
in the reinforcemen tlearning literature whic h has pro duced n umerous coun terexamples to sho w that greedy exploration can lead to p
  • r
learning p er formance Bac kgammon ho w ev er has t w
  • imp
  • rtan
t prop erties Firstly
  • whatev
er p
  • licy
is follo w ed ev ery game is guaran teed to end in nite time meaning that useful rew ard information is
  • btained
fairly frequen tly
  • Secondly
  • the
state transitions are sucien tly sto c hastic that indep enden t
  • f
the p
  • licy
  • all
states will
  • ccasionally
b e visiteda wrong initial v alue function has little danger
  • f
starving us from visiting a critical part
  • f
state space from whic h imp
  • rtan
t information could b e
  • btained
The results T able
  • f
TDGammon are impressiv e It has comp eted at the v ery top lev el
  • f
in ternational h uman pla y
  • Basic
TDGammon pla y ed resp ectably
  • but
not at a professional standard
Although experiments with other games have in some cases produced interesting learning behavior, no success close to that of TD-Gammon has been repeated. Other games that have been studied include Go (Schraudolph, Dayan, & Sejnowski, 1994) and Chess (Thrun, 1995). It is still an open question as to if and how the success of TD-Gammon can be repeated in other domains.
Robotics and Control

In recent years there have been many robotics and control applications that have used reinforcement learning. Here we will concentrate on the following four examples, although many other interesting ongoing robotics investigations are underway.

1. Schaal and Atkeson (1994) constructed a two-armed robot, shown in the figure below, that learns to juggle a device known as a devil-stick. This is a complex non-linear control task involving a six-dimensional state space and less than 200 msec per control decision. After about 40 initial attempts, the robot learns to keep juggling for hundreds of hits. A typical human learning the task requires an order of magnitude more practice to achieve proficiency at mere tens of hits. The juggling robot learned a world model from experience, which was generalized to unvisited states by a function-approximation scheme known as locally weighted regression (Cleveland & Delvin, 1988; Moore & Atkeson). Between each trial, a form of dynamic programming specific to linear control policies and locally linear transitions was used to improve the policy. The form of dynamic programming is known as linear-quadratic-regulator design (Sage & White, 1977).

Figure: Schaal and Atkeson's devil-sticking robot. The tapered stick is hit alternately by each of the two hand sticks. The task is to keep the devil stick from falling for as many hits as possible. The robot has three motors, indicated by torque vectors.
2. Mahadevan and Connell (1991a) discuss a task in which a mobile robot pushes large boxes for extended periods of time. Box-pushing is a well-known difficult robotics problem, characterized by immense uncertainty in the results of actions. Q-learning was used in conjunction with some novel clustering techniques designed to enable a higher-dimensional input than a tabular approach would have permitted. The robot learned to perform competitively with the performance of a human-programmed solution. Another aspect of this work, mentioned earlier, was a pre-programmed breakdown of the monolithic task description into a set of lower-level tasks to be learned.

3. Mataric (1994) describes a robotics experiment with, from the viewpoint of theoretical reinforcement learning, an unthinkably high-dimensional state space, containing many dozens of degrees of freedom. Four mobile robots traveled within an enclosure, collecting small disks and transporting them to a destination region. There were three enhancements to the basic Q-learning algorithm. Firstly, pre-programmed signals called progress estimators were used to break the monolithic task into subtasks. This was achieved in a robust manner in which the robots were not forced to use the estimators, but had the freedom to profit from the inductive bias they provided. Secondly, control was decentralized: each robot learned its own policy independently, without explicit communication with the others. Thirdly, state space was brutally quantized into a small number of discrete states according to the values of a small number of pre-programmed boolean features of the underlying sensors. The performance of the Q-learned policies was almost as good as that of a simple hand-crafted controller for the job.

4. Q-learning has been used in an elevator dispatching task (Crites & Barto, 1996). The problem, which has been implemented in simulation only at this stage, involved four elevators servicing ten floors. The objective was to minimize the average squared wait time for passengers, discounted into future time. The problem can be posed as a discrete Markov system, but there are about 10^22 states even in the most simplified version of the problem. Crites and Barto used neural networks for function approximation and provided an excellent comparison study of their Q-learning approach against the most popular and the most sophisticated elevator dispatching algorithms. The squared wait time of their controller was approximately 7% less than that of the best alternative algorithm (the "Empty the System" heuristic with a receding-horizon controller), and less than half the squared wait time of the controller most frequently used in real elevator systems.
The final example concerns an application of reinforcement learning by one of the authors of this survey to a packaging task from a food processing industry. The problem involves filling containers with variable numbers of non-identical products. The product characteristics also vary with time, but can be sensed. Depending on the task, various constraints are placed on the container-filling procedure. Here are three examples:

• The mean weight of all containers produced by a shift must not be below the manufacturer's declared weight W.

• The number of containers below the declared weight must be less than P.

• No containers may be produced below weight W'.

Such tasks are controlled by machinery which operates according to various setpoints. Conventional practice is that setpoints are chosen by human operators, but this choice is not easy, as it is dependent on the current product characteristics and the current task constraints. The dependency is often difficult to model and highly nonlinear. The task was posed as a finite-horizon Markov decision task in which the state of the system is a function of the product characteristics, the amount of time remaining in the production shift, and the mean wastage and percent below declared in the shift so far. The system was discretized into 200,000 discrete states, and locally weighted regression was used to learn and generalize a transition model. Prioritized sweeping was used to maintain an optimal value function as each new piece of transition information was obtained. In simulated experiments the savings were considerable, typically with wastage reduced by a factor of ten. Since then the system has been deployed successfully in several factories within the United States.
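A much-simplified sketch of the prioritized-sweeping loop follows: each new piece of transition data updates the model, the affected state is backed up, and large value changes are propagated to predecessor states in order of priority. The tabular model, budget, and tolerance below are illustrative assumptions; in particular, the deployed system generalized its model with locally weighted regression, which is omitted here.

    import heapq
    from collections import defaultdict

    gamma, theta, budget = 0.95, 1e-4, 20   # illustrative constants
    V = defaultdict(float)        # state -> value estimate
    model = defaultdict(dict)     # model[s][a] = (reward, {s2: probability})
    predecessors = defaultdict(set)

    def backup(s):
        # One Bellman backup; returns how much V[s] changed.
        if not model[s]:
            return 0.0
        best = max(r + gamma * sum(p * V[s2] for s2, p in ns.items())
                   for r, ns in model[s].values())
        change = abs(best - V[s])
        V[s] = best
        return change

    def observe(s, a, reward, next_state_probs):
        # Record one observed transition, then sweep: states whose values
        # changed most are processed first, and their predecessors follow.
        model[s][a] = (reward, next_state_probs)
        for s2 in next_state_probs:
            predecessors[s2].add(s)
        queue = [(-backup(s), s)]   # states assumed comparable (e.g. ints)
        for _ in range(budget):
            if not queue:
                break
            neg_change, s1 = heapq.heappop(queue)
            if -neg_change < theta:
                break
            for sp in predecessors[s1]:
                heapq.heappush(queue, (-backup(sp), sp))

The priority queue is what lets a small budget of backups per observation keep the value function close to optimal without sweeping the entire state space.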
Some interesting aspects of practical reinforcement learning come to light from these examples. The most striking is that in all cases, to make a real system work it proved necessary to supplement the fundamental algorithm with extra preprogrammed knowledge. Supplying extra knowledge comes at a price: more human effort and insight are required, and the system is subsequently less autonomous. But it is also clear that for tasks such as these, a knowledge-free approach would not have achieved worthwhile performance within the finite lifetime of the robots.
What forms did this preprogrammed knowledge take? It included an assumption of linearity for the juggling robot's policy and a manual breaking up of the task into subtasks for the two mobile-robot examples, while the box-pusher also used a clustering technique for the Q values which assumed locally consistent Q values. The four disk-collecting robots additionally used a manually discretized state space. The packaging example had far fewer dimensions and so required correspondingly weaker assumptions, but there too, the assumption of local piecewise continuity in the transition model enabled massive reductions in the amount of learning data required.
The exploration strategies are interesting, too. The juggler used careful statistical analysis to judge where to profitably experiment. However, both mobile robot applications were able to learn well with greedy exploration (always exploiting, without deliberate exploration). The packaging task used optimism in the face of uncertainty. None of these strategies mirrors theoretically optimal (but computationally intractable) exploration, and yet all proved adequate.
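Of these strategies, optimism in the face of uncertainty is the simplest to sketch: if value estimates are initialized optimistically, even a purely greedy agent explores, because untried actions still look best. The bound and constants below are illustrative assumptions.

    from collections import defaultdict

    R_MAX = 10.0                        # assumed upper bound on return
    Q = defaultdict(lambda: R_MAX)      # unseen (state, action) pairs look best
    alpha, gamma = 0.1, 0.9
    actions = range(4)

    def greedy(state):
        # Pure exploitation: untried actions still carry the optimistic value
        # R_MAX, so the agent is drawn toward unexplored parts of the space.
        return max(actions, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        target = reward + gamma * max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (target - Q[(state, action)])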
Finally, it is also worth considering the computational regimes of these experiments. They were all very different, which indicates that the differing computational demands of various reinforcement learning algorithms do indeed have an array of differing applications. The juggler needed to make very fast decisions with low latency between each hit, but had long periods (seconds and more) between each trial to consolidate the experiences collected on the previous trial and to perform the more aggressive computation necessary to produce a new reactive controller on the next trial. The box-pushing robot was meant to operate autonomously for hours and so had to make decisions with a uniform-length control cycle. The cycle was sufficiently long for quite substantial computations beyond simple Q-learning backups. The four disk-collecting robots were particularly interesting. Each robot had a short life of less than 20 minutes (due to battery constraints), meaning that substantial number crunching was impractical, and any significant combinatorial search would have used a significant fraction of the robot's learning lifetime. The packaging task had easy constraints: one decision was needed every few minutes. This provided opportunities for fully computing the optimal value function for the 200,000-state system between every control cycle, in addition to performing massive cross-validation-based optimization of the transition model being learned.

A great deal of further work is currently in progress on practical implementations of reinforcement learning. The insights and task constraints that they produce will have an important effect on shaping the kind of algorithms that are developed in the future.
Conclusions

There are a variety of reinforcement-learning techniques that work effectively on a variety of small problems. But very few of these techniques scale well to larger problems. This is not because researchers have done a bad job of inventing learning techniques, but because it is very difficult to solve arbitrary problems in the general case. In order to solve highly complex problems, we must give up tabula rasa learning techniques and begin to incorporate bias that will give leverage to the learning process. The necessary bias can come in a variety of forms, including the following (a small code sketch of the local-reinforcement-signal and reflex biases follows the list):

shaping: The technique of shaping is used in training animals (Hilgard & Bower): a teacher presents very simple problems to solve first, then gradually exposes the learner to more complex problems. Shaping has been used in supervised-learning systems, and can be used to train hierarchical reinforcement-learning systems from the bottom up (Lin), and to alleviate problems of delayed reinforcement by decreasing the delay until the problem is well understood (Dorigo & Colombetti; Dorigo).

local reinforcement signals: Whenever possible, agents should be given reinforcement signals that are local. In applications in which it is possible to compute a gradient, rewarding the agent for taking steps up the gradient, rather than just for achieving the final goal, can speed learning significantly (Mataric).

imitation: An agent can learn by "watching" another agent perform the task (Lin). For real robots, this requires perceptual abilities that are not yet available. But another strategy is to have a human supply appropriate motor commands to a robot through a joystick or steering wheel (Pomerleau).

problem decomposition: Decomposing a huge learning problem into a collection of smaller ones, and providing useful reinforcement signals for the subproblems, is a very powerful technique for biasing learning. Most interesting examples of robotic reinforcement learning employ this technique to some extent (Connell & Mahadevan).

reflexes: One thing that keeps agents that know nothing from learning anything is that they have a hard time even finding the interesting parts of the space; they wander around at random, never getting near the goal, or they are always "killed" immediately. These problems can be ameliorated by programming a set of "reflexes" that cause the agent to act initially in some way that is reasonable (Mataric; Singh, Barto, Grupen, & Connolly). These reflexes can eventually be overridden by more detailed and accurate learned knowledge, but they at least keep the agent alive and pointed in the right direction while it is trying to learn. Recent work by Millan explores the use of reflexes to make robot learning safer and more efficient.
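As promised above, here is a small sketch of the reflex and local-reinforcement-signal biases: a preprogrammed reflex policy acts until the learner has enough experience to be trusted, and the reward function pays for progress up a gradient rather than only for reaching the goal. The threshold, helper names, and constants are illustrative assumptions.

    CONFIDENCE_THRESHOLD = 5    # illustrative: visits before trusting learning

    def act(state, Q, visits, actions, reflex_policy):
        # Reflex bias: fall back on a preprogrammed, reasonable behavior in
        # states the learner has barely seen; otherwise act on learned values.
        if visits[state] < CONFIDENCE_THRESHOLD:
            return reflex_policy(state)
        return max(actions, key=lambda a: Q[(state, a)])

    def local_reward(old_distance, new_distance, reached_goal):
        # Local signal bias: pay for each step of progress up the gradient
        # (here, reduced distance to goal) instead of only for arrival.
        return 100.0 if reached_goal else old_distance - new_distance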
With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. There is still much work to be done and many interesting questions remaining, for learning techniques and especially regarding methods for approximating, decomposing, and incorporating bias into problems.

Acknowledgements

Thanks to Marco Dorigo and three anonymous reviewers for comments that have helped to improve this paper. Also thanks to our many colleagues in the reinforcement-learning community who have done this work and explained it to us. Leslie Pack Kaelbling was supported in part by NSF grants. Michael Littman was supported in part by Bellcore. Andrew Moore was supported in part by an NSF Research Initiation Award and by 3M Corporation.
References

Ackley, D. H., & Littman, M. L. Generalization and scaling in reinforcement learning. In Touretzky, D. S. (Ed.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Albus, J. S. A new approach to manipulator control: Cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control.

Albus, J. S. Brains, Behavior, and Robotics. BYTE Books (subsidiary of McGraw-Hill), Peterborough, New Hampshire.

Anderson, C. W. Learning and Problem Solving with Multilayer Connectionist Systems. PhD thesis, University of Massachusetts, Amherst, MA.

Ashar, R. R. Hierarchical learning in stochastic domains. Master's thesis, Brown University, Providence, Rhode Island.

Baird, L. Residual algorithms: Reinforcement learning with function approximation. In Prieditis, A., & Russell, S. (Eds.), Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Baird, L. C., & Klopf, A. H. Reinforcement learning with high-dimensional continuous actions. Tech. rep., Wright Laboratory, Wright-Patterson Air Force Base, Ohio.
Barto, A. G., Bradtke, S. J., & Singh, S. P. Learning to act using real-time dynamic programming. Artificial Intelligence.

Barto, A. G., Sutton, R. S., & Anderson, C. W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics.

Bellman, R. Dynamic Programming. Princeton University Press, Princeton, NJ.

Berenji, H. R. Artificial neural networks and approximate reasoning for intelligent control in space. In American Control Conference.

Berry, D. A., & Fristedt, B. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London, UK.

Bertsekas, D. P. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, NJ.

Bertsekas, D. P. Dynamic Programming and Optimal Control (Volumes 1 and 2). Athena Scientific, Belmont, Massachusetts.

Bertsekas, D. P., & Castañon, D. A. Adaptive aggregation for infinite horizon dynamic programming. IEEE Transactions on Automatic Control.

Bertsekas, D. P., & Tsitsiklis, J. N. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs, NJ.

Box, G. E. P., & Draper, N. R. Empirical Model-Building and Response Surfaces. Wiley.

Boyan, J. A., & Moore, A. W. Generalization in reinforcement learning: Safely approximating the value function. In Tesauro, G., Touretzky, D. S., & Leen, T. K. (Eds.), Advances in Neural Information Processing Systems, Cambridge, MA. The MIT Press.

Burghes, D., & Graham, A. Introduction to Control Theory, including Optimal Control. Ellis Horwood.

Cassandra, A. R., Kaelbling, L. P., & Littman, M. L. Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA.

Chapman, D., & Kaelbling, L. P. Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In Proceedings of the International Joint Conference on Artificial Intelligence, Sydney, Australia.

Chrisman, L. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA. AAAI Press.
Chrisman, L., & Littman, M. Hidden state and short-term memory. Presentation at the Reinforcement Learning Workshop, Machine Learning Conference.

Cichosz, P., & Mulawka, J. J. Fast and efficient reinforcement learning with truncated temporal differences. In Prieditis, A., & Russell, S. (Eds.), Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Cleveland, W. S., & Devlin, S. J. Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association.

Cliff, D., & Ross, S. Adding temporary memory to ZCS. Adaptive Behavior.

Condon, A. The complexity of stochastic games. Information and Computation.

Connell, J., & Mahadevan, S. Rapid task learning for real robots. In Robot Learning. Kluwer Academic Publishers.

Crites, R. H., & Barto, A. G. Improving elevator performance using reinforcement learning. In Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural Information Processing Systems.

Dayan, P. The convergence of TD(λ) for general λ. Machine Learning.

Dayan, P., & Hinton, G. E. Feudal reinforcement learning. In Hanson, S. J., Cowan, J. D., & Giles, C. L. (Eds.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Dayan, P., & Sejnowski, T. J. TD(λ) converges with probability 1. Machine Learning.

Dean, T., Kaelbling, L. P., Kirman, J., & Nicholson, A. Planning with deadlines in stochastic domains. In Proceedings of the Eleventh National Conference on Artificial Intelligence, Washington, DC.

D'Epenoux, F. A probabilistic production and inventory problem. Management Science.

Derman, C. Finite State Markovian Decision Processes. Academic Press, New York.

Dorigo, M., & Bersini, H. A comparison of Q-learning and classifier systems. In From Animals to Animats: Proceedings of the Third International Conference on the Simulation of Adaptive Behavior, Brighton, UK.

Dorigo, M., & Colombetti, M. Robot shaping: Developing autonomous agents through learning. Artificial Intelligence.
Dorigo, M. Alecsys and the AutonoMouse: Learning to control a real robot by distributed classifier systems. Machine Learning.

Fiechter, C.-N. Efficient reinforcement learning. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory. Association for Computing Machinery.

Gittins, J. C. Multi-armed Bandit Allocation Indices. Wiley-Interscience Series in Systems and Optimization. Wiley, Chichester, NY.

Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, MA.

Gordon, G. J. Stable function approximation in dynamic programming. In Prieditis, A., & Russell, S. (Eds.), Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Gullapalli, V. A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks.

Gullapalli, V. Reinforcement Learning and Its Application to Control. PhD thesis, University of Massachusetts, Amherst, MA.

Hilgard, E. R., & Bower, G. H. Theories of Learning (fourth edition). Prentice-Hall, Englewood Cliffs, NJ.

Hoffman, A. J., & Karp, R. M. On nonterminating stochastic games. Management Science.

Holland, J. H. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.

Howard, R. A. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, MA.

Jaakkola, T., Jordan, M. I., & Singh, S. P. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation.

Jaakkola, T., Singh, S. P., & Jordan, M. I. Monte-Carlo reinforcement learning in non-Markovian decision problems. In Tesauro, G., Touretzky, D. S., & Leen, T. K. (Eds.), Advances in Neural Information Processing Systems, Cambridge, MA. The MIT Press.

Kaelbling, L. P. (a). Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA. Morgan Kaufmann.

Kaelbling, L. P. (b). Learning in Embedded Systems. The MIT Press, Cambridge, MA.

Kaelbling, L. P. (a). Associative reinforcement learning: A generate and test algorithm. Machine Learning.
Kaelbling, L. P. (b). Associative reinforcement learning: Functions in k-DNF. Machine Learning.

Kirman, J. Predicting Real-Time Planner Performance by Domain Characterization. PhD thesis, Department of Computer Science, Brown University.

Koenig, S., & Simmons, R. G. Complexity analysis of real-time reinforcement learning. In Proceedings of the Eleventh National Conference on Artificial Intelligence, Menlo Park, California. AAAI Press/MIT Press.

Kumar, P. R., & Varaiya, P. P. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, New Jersey.

Lee, C. C. A self-learning rule-based controller employing approximate reasoning and neural net concepts. International Journal of Intelligent Systems.

Lin, L.-J. Programming robots using reinforcement learning and teaching. In Proceedings of the Ninth National Conference on Artificial Intelligence.

Lin, L.-J. (a). Hierarchical learning of robot skills by reinforcement. In Proceedings of the International Conference on Neural Networks.

Lin, L.-J. (b). Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, PA.

Lin, L.-J., & Mitchell, T. M. Memory approaches to reinforcement learning in non-Markovian domains. Tech. rep. CMU-CS, Carnegie Mellon University, School of Computer Science.

Littman, M. L. (a). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Littman, M. L. (b). Memoryless policies: Theoretical limitations and practical results. In Cliff, D., Husbands, P., Meyer, J.-A., & Wilson, S. W. (Eds.), From Animals to Animats: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, Cambridge, MA. The MIT Press.

Littman, M. L., Cassandra, A., & Kaelbling, L. P. (a). Learning policies for partially observable environments: Scaling up. In Prieditis, A., & Russell, S. (Eds.), Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Littman, M. L., Dean, T. L., & Kaelbling, L. P. (b). On the complexity of solving Markov decision problems. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI), Montréal, Québec, Canada.

Lovejoy, W. S. A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research.
Maes, P., & Brooks, R. A. Learning to coordinate behaviors. In Proceedings of the Eighth National Conference on Artificial Intelligence. Morgan Kaufmann.

Mahadevan, S. To discount or not to discount in reinforcement learning: A case study comparing R-learning and Q-learning. In Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Mahadevan, S. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning.

Mahadevan, S., & Connell, J. (a). Automatic programming of behavior-based robots using reinforcement learning. In Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA.

Mahadevan, S., & Connell, J. (b). Scaling reinforcement learning to robotics by exploiting the subsumption architecture. In Proceedings of the Eighth International Workshop on Machine Learning.

Mataric, M. J. Reward functions for accelerated learning. In Cohen, W. W., & Hirsh, H. (Eds.), Proceedings of the Eleventh International Conference on Machine Learning. Morgan Kaufmann.

McCallum, A. K. Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, Department of Computer Science, University of Rochester.

McCallum, R. A. Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann.

McCallum, R. A. Instance-based utile distinctions for reinforcement learning with hidden state. In Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Meeden, L., McGraw, G., & Blank, D. Emergent control and planning in an autonomous vehicle. In Touretzky, D. (Ed.), Proceedings of the Fifteenth Annual Meeting of the Cognitive Science Society. Lawrence Erlbaum Associates, Hillsdale, NJ.

Millan, J. del R. Rapid, safe, and incremental learning of navigation strategies. IEEE Transactions on Systems, Man, and Cybernetics.

Monahan, G. E. A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science.

Moore, A. W. Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued spaces. In Proceedings of the Eighth International Machine Learning Workshop.
Moore, A. W. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Moore, A. W., & Atkeson, C. G. An investigation of memory-based function approximators for learning control. Tech. rep., MIT Artificial Intelligence Laboratory, Cambridge, MA.

Moore, A. W., & Atkeson, C. G. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning.

Moore, A. W., Atkeson, C. G., & Schaal, S. Memory-based learning for control. Tech. rep. CMU-RI-TR, CMU Robotics Institute.

Narendra, K., & Thathachar, M. A. L. Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs, NJ.

Narendra, K. S., & Thathachar, M. A. L. Learning automata: A survey. IEEE Transactions on Systems, Man, and Cybernetics.

Peng, J., & Williams, R. J. Efficient learning and planning within the Dyna framework. Adaptive Behavior.

Peng, J., & Williams, R. J. Incremental multi-step Q-learning. In Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Pomerleau, D. A. Neural Network Perception for Mobile Robot Guidance. Kluwer Academic Publishing.

Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY.

Puterman, M. L., & Shin, M. C. Modified policy iteration algorithms for discounted Markov decision processes. Management Science.

Ring, M. B. Continual Learning in Reinforcement Environments. PhD thesis, University of Texas at Austin, Austin, Texas.

Rüde, U. Mathematical and Computational Techniques for Multilevel Adaptive Methods. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.

Rumelhart, D. E., & McClelland, J. L. (Eds.). Parallel Distributed Processing: Explorations in the Microstructures of Cognition. Volume 1: Foundations. The MIT Press, Cambridge, MA.

Rummery, G. A., & Niranjan, M. On-line Q-learning using connectionist systems. Tech. rep. CUED/F-INFENG/TR, Cambridge University.
Rust, J. Numerical dynamic programming in economics. In Handbook of Computational Economics. Elsevier, North Holland.

Sage, A. P., & White, C. C. Optimum Systems Control. Prentice Hall.

Salganicoff, M., & Ungar, L. H. Active exploration and learning in real-valued spaces using multi-armed bandit allocation indices. In Prieditis, A., & Russell, S. (Eds.), Proceedings of the Twelfth International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Samuel, A. L. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development. Reprinted in E. A. Feigenbaum and J. Feldman (Eds.), Computers and Thought, McGraw-Hill, New York.

Schaal, S., & Atkeson, C. Robot juggling: An implementation of memory-based learning. Control Systems Magazine.

Schmidhuber, J. A general method for multi-agent learning and incremental self-improvement in unrestricted environments. In Yao, X. (Ed.), Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore.

Schmidhuber, J. H. (a). Curious model-building control systems. In Proceedings of the International Joint Conference on Neural Networks, Singapore. IEEE.

Schmidhuber, J. H. (b). Reinforcement learning in Markovian and non-Markovian environments. In Lippman, D. S., Moody, J. E., & Touretzky, D. S. (Eds.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Schraudolph, N. N., Dayan, P., & Sejnowski, T. J. Temporal difference learning of position evaluation in the game of Go. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Schrijver, A. Theory of Linear and Integer Programming. Wiley-Interscience, New York, NY.

Schwartz, A. A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann.

Singh, S. P., Barto, A. G., Grupen, R., & Connolly, C. Robust reinforcement learning in motion planning. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in Neural Information Processing Systems, San Mateo, CA. Morgan Kaufmann.

Singh, S. P., & Sutton, R. S. Reinforcement learning with replacing eligibility traces. Machine Learning.
Singh, S. P. (a). Reinforcement learning with a hierarchy of abstract models. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA. AAAI Press.

Singh, S. P. (b). Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning.

Singh, S. P. Learning to Solve Markovian Decision Processes. PhD thesis, Department of Computer Science, University of Massachusetts. Also CMPSCI Technical Report.

Stengel, R. F. Stochastic Optimal Control. John Wiley and Sons.

Sutton, R. S. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural Information Processing Systems.

Sutton, R. S. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, MA.

Sutton, R. S. Learning to predict by the method of temporal differences. Machine Learning.

Sutton, R. S. Integrated architectures for learning, planning and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, Austin, TX. Morgan Kaufmann.

Sutton, R. S. Planning by incremental dynamic programming. In Proceedings of the Eighth International Workshop on Machine Learning. Morgan Kaufmann.

Tesauro, G. Practical issues in temporal difference learning. Machine Learning.

Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation.

Tesauro, G. Temporal difference learning and TD-Gammon. Communications of the ACM.

Tham, C.-K., & Prager, R. W. A modular Q-learning architecture for manipulator task decomposition. In Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA. Morgan Kaufmann.

Thrun, S. Learning to play the game of chess. In Tesauro, G., Touretzky, D. S., & Leen, T. K. (Eds.), Advances in Neural Information Processing Systems, Cambridge, MA. The MIT Press.
Thrun, S., & Schwartz, A. Issues in using function approximation for reinforcement learning. In Mozer, M., Smolensky, P., Touretzky, D., Elman, J., & Weigend, A. (Eds.), Proceedings of the Connectionist Models Summer School. Lawrence Erlbaum, Hillsdale, NJ.

Thrun, S. B. The role of exploration in learning control. In White, D. A., & Sofge, D. A. (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York, NY.

Tsitsiklis, J. N. Asynchronous stochastic approximation and Q-learning. Machine Learning.

Tsitsiklis, J. N., & Van Roy, B. Feature-based methods for large scale dynamic programming. Machine Learning.

Valiant, L. G. A theory of the learnable. Communications of the ACM.

Watkins, C. J. C. H. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK.

Watkins, C. J. C. H., & Dayan, P. Q-learning. Machine Learning.

Whitehead, S. D. Complexity and cooperation in Q-learning. In Proceedings of the Eighth International Workshop on Machine Learning, Evanston, IL. Morgan Kaufmann.

Williams, R. J. A class of gradient-estimating algorithms for reinforcement learning in neural networks. In Proceedings of the IEEE First International Conference on Neural Networks, San Diego, CA.

Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning.

Williams, R. J., & Baird III, L. C. (a). Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems. Tech. rep. NU-CCS, Northeastern University, College of Computer Science, Boston, MA.

Williams, R. J., & Baird III, L. C. (b). Tight performance bounds on greedy policies based on imperfect value functions. Tech. rep. NU-CCS, Northeastern University, College of Computer Science, Boston, MA.

Wilson, S. Classifier fitness based on accuracy. Evolutionary Computation.

Zhang, W., & Dietterich, T. G. A reinforcement learning approach to job-shop scheduling. In Proceedings of the International Joint Conference on Artificial Intelligence.