Capturing and Adapting Traces for Character Control in Computer Role Playing Games

Jonathan Rubin and Ashwin Ram

Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
Jonathan.Rubin@parc.com, Ashwin.Ram@parc.com

Abstract. We describe an architecture, in its early stages of development, that processes user traces in the domain of computer role playing games and utilises the resulting traces in order to control the behaviour of characters within the environment. Behaviour execution is handled via an online case-based planner, which dynamically adapts plans given dissimilarities between the learning and testing environments. The overall architecture is presented and we provide an example of applying the architecture to a 2D role playing game environment. We conclude with the future objectives of this work in progress. Our work builds heavily on previous research in the area of learning from demonstration and online case-based planning in real-time strategy games [1].

1 Introduction

In this paper, we detail the efforts of a work in progress in the area of learning from demonstration and case-based planning. We describe an architecture for performing online case-based planning within the domain of modern computer role-playing games. The overall purpose of the architecture we describe is to control game characters based on captured user traces. During demonstration a human expert controls a character within a virtual environment and a trace is recorded to capture their sequence of actions. The user traces gathered are combined with a real-time case-based planner, which results in the generation of similar strategies which can be used to influence the behaviour of autonomous agents within the environment.

Our work builds on previous research efforts that have produced the Darmok [1, 2] and Darmok 2 [3, 4] systems. Darmok describes an architecture for performing online case-based planning based on capturing expert user traces. Darmok has been shown to be successful in producing coherent strategies, especially in the domain of real-time strategy games (RTS). The work we present here differs from Darmok in that it describes an architecture for controlling computer characters in role playing games (RPG). While Darmok 2 was able to be used as a general game player, it was particularly suited for playing RTS type games. The eventual goal of our work is to construct a system that controls one (or more) helpful, non-player characters (NPCs), which are able to aid human players with their goals and objectives in the domain of RPG games.
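To make the notion of a trace concrete, a demonstration episode can be thought of as an ordered list of (state, action) records. The sketch below is our own minimal illustration, with an invented action vocabulary and record layout; it is not Komrad's actual trace format.

```python
# Minimal, hypothetical sketch of trace capture during demonstration.
# The action vocabulary and record fields are illustrative, not Komrad's.

trace = []  # one demonstration episode

def record(action, world_state):
    """Append the chosen action together with the state it was chosen in."""
    trace.append({"action": action, "state": dict(world_state)})

state = {"player": (0, 0)}
record(("MOVE", "RIGHT"), state)
state["player"] = (1, 0)
record(("OPEN", "UP"), state)

print([t["action"] for t in trace])  # [('MOVE', 'RIGHT'), ('OPEN', 'UP')]
```

Copying the state at recording time (rather than storing a reference) keeps each record a faithful snapshot, which later allows cases to pair a goal with the world state in which it was achieved.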


While our work is heavily influenced by research conducted within the domain of RTS games, there are several important differences that arise, given the modified objectives and the distinctions between the RTS and RPG domains.

1. To begin with, RTS environments are adversarial, whereas RPGs may not necessarily be so. While RPG games may contain adversarial scenarios (which are required to be handled by the system), the overall objective typically has more to do with space exploration and the appropriate selection of sequences of actions.

2. RTS games require the coordination of a team of agents, typically with the objective to destroy an enemy team. On the other hand, RPG games place a larger focus on the actions and goals of more well-defined individual characters that exist within the environment.

3. Actions within RPG games are typically instantaneous as opposed to durative, as in RTS games. As such, there is less of a focus on the parallel management of durative actions and more of a focus on the appropriate selection of sequences of actions and goals to pursue.

We refer to our architecture (and the system it produces) as Komrad¹ and the next section provides a high-level overview of its design.

2 Komrad Overview

Figure 1 displays a high-level overview of the current system architecture, which consists of a training phase, a real-time planner, and adaptation & repair strategies applied within a particular environment. Each of these components is described in more detail below.

¹ Darmok backwards.

Fig. 1. High-level overview of the Komrad architecture.

2.1 Training

During the initial training phase, a user is able to demonstrate behaviour to the system by navigating and controlling a character within the current environment. An environment is composed of a collection of entities, together with a set of actions, which are able to be performed by the user in order to modify the current world state. During this initial training episode traces are captured, which record each action that was chosen by the user.

In addition to the set of possible actions that a user can take within the environment, a collection of goals is also specified that reflect more sophisticated milestones achieved by the user during their interaction with the environment. At present, all goals are required to be pre-specified and known beforehand. However, one of the eventual objectives of our research is to remove this assumption. The series of actions that result from recording a trace episode are processed into cases. Cases have the following representation:

C = (W, G(E), S)

where W captures the current world state at the time goal G(E) was achieved by the user, and S is the sequence of actions or sub-goals that led to the achievement of the goal. The G(E) notation further highlights that each goal also specifies a single entity within the environment that it acts upon. Entities are described by a collection of attribute-value pairs. The cases produced become the foundation for controlling the behaviour of an autonomous character that reflects the style of play of the original expert who was used to capture the trace.

2.2 Real-Time Planner

In order to control the behaviour of an autonomous agent, the system architecture depicted in Figure 1 includes a real-time planner (top right). The planner within the current architecture functions by maintaining a goal stack and an action stack, where the actions currently present on the action stack are required to be performed in order to achieve the goal at the top of the goal stack.

At the start of a planning episode a single goal is placed onto the goal stack. During the episode the planner is continually queried for the next action it recommends. If there are currently no actions on the action stack, the goal at the top of the goal stack is decomposed into the sequence of actions and/or sub-goals that are required to achieve that goal (recall that this information was captured within a case in the case-base). In order to decompose the goal at the top of the stack, the case-base is searched for stored cases whose goals (G(E)) and world states (W) are similar to the current environment. Once an appropriate case has been found, the sequence of actions or sub-goals recorded by the retrieved case (S) is placed onto the appropriate stack within the planner. Goal decomposition continues until at least one action is present on the action stack, at which point the action at the top of the stack is returned by the planner. A goal is removed from the goal stack once all the actions required to achieve it have been popped off the action stack. One limitation of the current architecture is that goals can only be decomposed into a sequence of sub-goals or a sequence of actions, but not a mixture of the two, as this could result in obfuscating the order in which actions should be performed, according to the user traces.

2.3 Adaptation & Repair

Once an action is retrieved from the action stack it is ready to be performed in the current environment. However, as the current environment is likely to be different from the environment initially encountered when gathering traces,


adaptation needs to take place to ensure that the action that is performed is suitable for the current world state. Recall that each goal acts upon a particular entity within the environment and that each action on the action stack is associated with the achievement of the goal at the top of the goal stack. Adapting an action therefore requires first adapting the entity of its associated goal. This occurs by sensing the entities in the current environment and determining the similarity between the goal's entity (recorded from the trace) and the entities that currently exist. One of the advantages of this approach is that it allows behaviour adaptation to occur within dissimilar environments by utilising simple similarity metrics associated with the features of particular entities. Once a goal's entity has been adapted to better reflect entities within the current environment, it remains to also adapt the sequence of actions required to achieve the updated goal. Each action defined within the environment specifies its own adaptation procedure that takes into account the entity of its corresponding goal.

Action adaptation either succeeds or fails in the given environment. If the adaptation succeeds, the action can be executed within the environment. On the other hand, if the adaptation fails for any reason a repair is required. Actions can supply optional repair strategies in the event that an adaptation failure occurs. Repair strategies work by specifying goals that should be placed onto the goal stack before the goal at the top of the stack can be achieved. Repair strategies typically require some domain knowledge to be encoded into the system. Figure 2 illustrates the outcome of applying a repair strategy.

The basic idea behind repair strategies is that they allow a dynamic restructuring of the goal stack. The event of an adaptation failure highlights the fact that something observed within the original environment (i.e. observed when gathering traces) is not reflected within the current environment. A repair strategy will attempt to modify the current environment to better align it with the original observed environment. It does so by specifying intermediate goals which need to be achieved before the current active goal (i.e. the goal at the top of the goal stack) can be attempted.

Fig. 2. In the event of an adaptation failure, a repair strategy can directly modify the goal stack to ensure prerequisite goals are achieved before attempting the goal that initially failed.


Like adaptation, repairs can either succeed or fail. In the event of a successful repair, all actions on the action stack will be removed and any intermediate goals required by the repair will be pushed onto the goal stack in order to be decomposed. In the event of a repair failure, the current active goal is considered unachievable and is popped off the goal stack.
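Putting the case representation, the two stacks, and goal decomposition together, the planner's query loop can be sketched as below. This is our own minimal reconstruction, not the Komrad implementation: case retrieval is reduced to an exact goal-name match (a stand-in for similarity over W and G(E)), and parent-goal bookkeeping, adaptation and repair are elided.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    world_state: dict      # W: world state when the goal was achieved
    goal: str              # G(E): goal name acting on a single entity E
    steps: list            # S: ("action", name) or ("goal", name) tuples,
                           #    all of one kind (never a mixture)

@dataclass
class Planner:
    case_base: list
    goal_stack: list = field(default_factory=list)
    action_stack: list = field(default_factory=list)

    def retrieve(self, goal, world_state):
        # Stand-in for similarity-based retrieval over (W, G(E)).
        return next((c for c in self.case_base if c.goal == goal), None)

    def next_action(self, world_state):
        # Decompose goals until an action is available, then return it.
        while not self.action_stack and self.goal_stack:
            case = self.retrieve(self.goal_stack[-1], world_state)
            if case is None:                    # no usable case: goal unachievable
                self.goal_stack.pop()
                continue
            kind = case.steps[0][0]
            names = [name for _, name in reversed(case.steps)]
            if kind == "goal":
                self.goal_stack.extend(names)   # sub-goals; last pushed = first tried
            else:
                self.action_stack.extend(names) # actions for the active goal
        return self.action_stack.pop() if self.action_stack else None
```

Tagging each step of S as either a goal or an action mirrors the constraint described in Section 2.2 that a decomposition contains sub-goals or actions, but never a mixture of the two.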

3 Application to a 2D RPG

The previous section introduced a high-level overview of the Komrad architecture. In this section, we further describe the details of applying the architecture within the domain of a 2D role playing game. For our initial development and experimentation with the Komrad architecture we have used a 2D tile-based RPG known as Mystik RPG (http://mystikrpg.com/). One of the reasons for choosing Mystik RPG as the initial experimental domain was that it is open source and offers a simplified, but extensible, domain which was useful for early experimentation.

In the game, players control a character within a two-dimensional world. Within the world, actions can be performed such as moving a character up, down, left and right; opening entrances and teleports; picking up items such as weapons, armour and keys; equipping and dropping items; and fighting monsters. Mystik RPG also provides a graphical tile map editor for creating new maps to train and test on.

Figure 3 depicts a snapshot of a currently executing plan within the game of Mystik RPG. At the top of the goal stack is the OPEN goal. Recall that each goal is associated with an entity within the environment. The OPEN goal acts upon an entrance entity within the environment. Given spatial restrictions, the attributes associated with the entrance entity have not been depicted in Figure 3; however, this type of entity would be described by attributes such as an (x, y) position within the world and other state information, such as whether the entrance is locked or unlocked. As the OPEN goal is currently on top of the goal stack, it has a sequence of actions, on the action stack, that are required to be performed in order to achieve the OPEN goal. In this case only two actions are required to achieve the goal:

1. Move to a particular position within the world, and
2. Perform a directional open action depending on where the entrance exists in relation to the character.

As the MOVE action is at the top of the action stack, this would be the next action attempted by Komrad. However, the planner as depicted in Figure 3 currently represents a plan that was witnessed within the original training environment, which may be dissimilar to the current world state. As such, the current goal and its actions need to be adapted to better reflect the entities that exist within the present environment.

Fig. 3. A snapshot of a real-time case-based planner within the game of Mystik RPG.

First, the entities within the current environment are sensed. Next, the entity that the current goal acts upon (i.e. the entrance) is adapted to an entity that exists within the current environment. Within our architecture, entities dictate their own adaptation procedures via similarity assessment. For example, in Figure 3, the entrance entity associated with the OPEN goal would be updated to the most similar entrance entity which exists in the present world state. This could be the exact same entrance (if the training and testing environments were identical) or it could be the entrance within the current environment that is closest to the location of the original entrance and exhibits the same state (i.e. locked or unlocked).

Once the goal's entity has been updated, it remains to update the details of the atomic actions required to achieve the goal. Once again in Figure 3, the (x, y) coordinates of the MOVE action need to be updated to reflect the new goal entity (i.e. the adapted entrance). Also, the OPEN UP action would need to be adapted, in case the new entrance entity is no longer located above the player, but rather to their left or right or below them. In the event that either of the above actions could not be successfully adapted for the current environment (e.g. the path to the entrance was blocked, or the entrance required a key to open it), a repair would need to take place. Repairs require domain knowledge; in this example a possible repair strategy could involve pushing a new PICK UP or OPEN goal onto the goal stack in an attempt to satisfy the prerequisites of the current OPEN goal.
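The entrance adaptation just described can be illustrated with a short sketch. The entity attributes, the distance-based similarity metric, and the action rewriting below are our own hypothetical stand-ins for whatever Komrad actually uses: the goal's entrance is remapped to the nearest same-state entrance in the current world, and the MOVE coordinates and open direction are rewritten to match.

```python
# Hypothetical sketch of entity and action adaptation for the OPEN goal.
# Attribute names and the Manhattan-distance similarity are illustrative only.

def adapt_entrance(original, current_entities):
    """Pick the current entrance most similar to the one seen in the trace:
    same locked/unlocked state, minimal Manhattan distance."""
    candidates = [e for e in current_entities
                  if e["type"] == "entrance" and e["locked"] == original["locked"]]
    if not candidates:
        return None  # adaptation failure -> would trigger a repair strategy
    return min(candidates, key=lambda e: abs(e["x"] - original["x"])
                                         + abs(e["y"] - original["y"]))

def adapt_actions(entrance):
    """Rewrite the recorded actions against the adapted entrance: stand one
    tile below it (y grows downward here), then open upward toward it."""
    return [("MOVE", entrance["x"], entrance["y"] + 1), ("OPEN", "UP")]

trace_entrance = {"type": "entrance", "x": 2, "y": 5, "locked": False}
world = [{"type": "entrance", "x": 9, "y": 9, "locked": True},
         {"type": "entrance", "x": 3, "y": 5, "locked": False}]

adapted = adapt_entrance(trace_entrance, world)
print(adapted)                 # {'type': 'entrance', 'x': 3, 'y': 5, 'locked': False}
print(adapt_actions(adapted))  # [('MOVE', 3, 6), ('OPEN', 'UP')]
```

When no candidate survives the state filter (for example, every entrance in the current world is locked), the adaptation fails, which is exactly the situation the PICK UP or OPEN repair strategy above is meant to handle.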


4 Related Work

The Komrad architecture introduced above is a work in progress and requires further development and testing. As well as building on the work of Ontañón et al. [1–4], the work presented here is related to other research efforts that focus on case-based planning [5], case-based plan adaptation [6] and learning from user traces in the domain of computer games [7–9].

One such work is that of [10], who presents a learning by observation framework (jLOAF) that can be applied within a range of environments. Actions of an expert are observed within several domains, including robotics, simulated soccer and the game of Tetris. The results reported by [10] indicate that the jLOAF framework can successfully re-use the actions of the original expert. jLOAF's focus is on the construction of a reasoning by observation framework that is domain independent. It does not use a case-based planning architecture, as our work does. Instead, the generality of the framework makes it more suited for reactive domains.

Learning from demonstration has recently been investigated in relation to goal-driven autonomy (GDA) by [9]. Weber et al. [9] describe a system based on the conceptual model of goal-driven autonomy that utilises expert traces in order to reduce the amount of domain knowledge that is typically required by the GDA model. The real-time strategy game StarCraft was used as the experimental domain. While relevant to our own work due to the utilisation of learning from demonstration, the focus of [9] is on the GDA conceptual model, which is not the case with the work we have described.

5 Conclusion and Future Work

We have presented an early-stage architecture for gathering user traces within an interactive environment and utilising those traces within an online case-based planner in order to reproduce observed behaviour. At present, we have applied the architecture to a simple 2D tile-based RPG. While the architecture described is still in the early stages of development, we have successfully been able to re-use and adapt traces recorded within the environment. The application of the Komrad architecture to this domain was described in Section 3.

One of the future objectives of this work is to construct a system that controls one, or more, helpful non-player characters, which are able to aid human players with their goals and objectives in role playing games. While the application of the Komrad architecture to the environment described in this paper has been useful for initial experimentation and evaluation purposes, it is nonetheless too simplified a domain to fully evaluate our approach. Future work will involve the application of the architecture to a more sophisticated role-playing game domain. We have chosen the game Hands of War 2² as our future test-bed, given its larger scope. Within this domain we intend to use traces in order to identify when a player requires help, determine the type of behaviour required to assist the player and, finally, to execute the appropriate helpful behaviour via auxiliary characters in the environment.

² http://www.axis-games.com/

Acknowledgments. The authors wish to acknowledge the efforts of those responsible for the creation of Mystik RPG. We also wish to thank Michael Youngblood, Danny Jugan and Axis Games for kindly providing the source code for the game Hands of War 2.

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1216253. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

1. Ontañón, S., Mishra, K., Sugandh, N., Ram, A.: On-line case-based planning. Computational Intelligence 26(1) (2010) 84–119

2. Ontañón, S., Mishra, K., Sugandh, N., Ram, A.: Case-based planning and execution for real-time strategy games. In: Case-Based Reasoning Research and Development, 7th International Conference on Case-Based Reasoning, ICCBR 2007. (2007) 164–178

3. Ontañón, S., Bonnette, K., Mahindrakar, P., Gómez-Martín, M.A., Long, K., Radhakrishnan, J., Shah, R., Ram, A.: Learning from human demonstrations for real-time case-based planning. In: IJCAI-09 Workshop on Learning Structural Knowledge From Observations. (2009)

4. Ontañón, S., Ram, A.: Case-based reasoning and user-generated AI for real-time strategy games. In González-Calero, P.A., Gómez-Martín, M.A., eds.: Artificial Intelligence for Computer Games. Springer-Verlag (2011) 103–124

5. Hammond, K.J.: Case-based planning: A framework for planning from experience. Cognitive Science 14(3) (1990) 385–443

6. Muñoz-Avila, H., Cox, M.T.: Case-based plan adaptation: An analysis and review. IEEE Intelligent Systems 23(4) (2008) 75–81

7. Floyd, M.W., Esfandiari, B., Lam, K.: A case-based reasoning approach to imitating RoboCup players. In: Proceedings of the Twenty-First International Florida Artificial Intelligence Research Society Conference. (2008) 251–256

8. Rubin, J., Watson, I.: On combining decisions from multiple expert imitators for performance. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence. (2011) 344–349

9. Weber, B.G., Mateas, M., Jhala, A.: Learning from demonstration for goal-driven autonomy. In: Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12). To Appear. (2012)

10. Floyd, M.W., Esfandiari, B.: A case-based reasoning framework for developing agents using learning by observation. In: IEEE 23rd International Conference on Tools with Artificial Intelligence, ICTAI 2011. (2011) 531–538