What is Colored Trails

Colored Trails is a research test-bed for investigating decision-making in groups comrising people, computer agents, and a mix of these two.

Motivation

Widespread use of the Internet has fundamentally changed the ways in which we use computers, not only as individuals, but also within organizations. Settings in which many people and many computer systems work together, despite being distributed both geographically and in time, are increasingly the norm. Examples of group activities in which computer systems participate, whether as autonomous agents or as proxies for individual people, include online auctions, hospital care-delivery systems, emergency response systems, military systems, and systems administration groups. These increasingly prevalent, heterogeneous group activities of computer systems and people-- whether competitive, cooperative, or collaborative--frequently require decision-making on the part of autonomous-agent systems or the support of decision-making by people.

A wide variety of issues arise in the decision making that accompanies task-oriented group activities. How do groups decide who should perform which tasks in service of which goals? How are coalitions formed? How do agents coordinate their behavior? How do agents select an appropriate recipe for achieving a goal from among the set that might be applicable? How are tasks allocated among agents capable of performing them? How do agents reconcile their current intentions to perform actions in service of group goals with new opportunities that arise? How do agents avoid conflicts in their plans for achieving goals? How does altruistic or cooperative behavior arise in individual behaviors? What incentives can be provided to encourage cooperation, helpfulness, or other group-benefiting behaviors when agents are designed independently or serve different organizations?

Grosz and Kraus developed the game Colored Trails (CT) as a testbed for investigating the decision-making that arises in task settings, where the key interactions are among goals (individual and group), tasks required to accomplish those goals, and resources needed to perform the tasks. CT allows the modeling of all of these phenomena and exploration of their causes and ramifications. It provides the basis for development of a testbed that supports investigations of human decision-making and for computational strategies to be studied both in homogeneous groups comprising only people and in heterogeneous groups consisting of people and computer systems.

The proposed project would convert the existing code-base to make it robust and available for distribution to other researchers, extend the range of decision-making scenarios which could be modeled and tested using CT, and perform machine-learning research and testing of computerbased decision-making strategies in increasingly complex situations. The structure of CT provides the right basis for development of a decision-making testbed. The rules of CT are simple, abstracting from particular task domains, and thus enabling investigators to focus on understanding, modeling, and testing decision-making strategies rather than specifying and reasoning with domain knowledge. It is simpler to implement computer agents to play CT than games like Diplomacy. Nonetheless, CT has proven interesting for people to play. In its abstracting from "realworld" domains, CT is similar to the games developed in behavioral economics. However, CT abstracts less than typical economic games do. In particular, unlike behavioral economics games, it provides a clear analogue to task settings, and it provides situational contexts and interaction histories in which to make decisions. Thus, it takes into account the problems of framing effects (for example, the difference between describing a prisoner's dilemma game as "the Wall Street Game" and "the Community Game"), but does not reduce all decisions to choices between explicitlygiven utility values. As a result, some of the social factors that are known to influence human decision-making (e.g., inequality aversion, reciprocity) may be directly studied in CT, providing a better basis for the design of computational agents.

CT is parameterized in ways that allow for increasing complexity along a number of different dimensions that influence the performance of different approaches to decision making. It allows for specification of different reward structures, enabling examination of such trade-offs as the importance of the performance of others or the group as a whole to the outcome of an individual and the cost-benefit tradeoffs of collaboration-supporting actions. The game parameters may be set to vary environmental features such as task complexity, availability of and access to task-related information, and dependencies between agents. Although several testbeds and competitions have been developed to test strategies for automated agents operating in multi-agent settings, CT is the first testbed to be designed to investigate decision-making in heterogeneous groups of people and computer systems. It is thus novel in addressing the need to understand how computer agents should behave when they are participants in group activities that include people.

Each instance of CT includes a numeric scoring function, which can serve as a clear metric of evaluation of the performance of an agent playing the game, and therefore of the model that the agent uses and indirectly of the learning techniques that were used to derive the model. Our objective is, in any given experimental environment, to match the performance of humans participating in the identical environment. Our preliminary results for simple games show that it is possible not only to achieve such human-level performance but even to surpass it.

Structure of the game

Colored Trails (CT) is played by two or more players on a rectangular board of colored squares. The rules are simple: Each player is given a starting position, a goal position on the board, and a set of chips in colors taken from the same palette as the squares. Players may advance toward their goals by moving to an adjacent board square. Such a move is allowed only if the player has a chip of the same color as the square, and the player must turn in the chip to carry out the move. Players may negotiate with their peers to exchange chips. Communication is controlled; players used a fixed but expressive messaging protocol.

The scoring function, which determines the payoff to the individual players, is a parameter of CT game instances and may depend on a variety of factors. At its simplest, it may consist of a weighted sum of such components as: whether the individual reached the goal, the final distance of the agent from the goal, the final number of chips the agent held. The scoring function may be varied to reflect different possible social policies and utility trade-offs, establishing a context in which to investigate the effects of different decision-making mechanisms.

For example, by varying the relative weights of individual and group good in the scoring function we can make collaborative behavior may become more or less beneficial. Despite the simplicity of the rules, play of CT is able to model a broad range of aspects of task situations in which a group of agents perform actions; it allows for scenarios in which agents act as individuals or in teams or both. Traversing a path through the board corresponds to performing a complex task the constituents of which are the individual tasks represented by each square. Different colors represent different tasks. The existence of multiple paths to a goal corresponds to the availability of different "recipes" or methods for achieving goals. The possession of a chip of a particular color corresponds to having the skills and resources needed for a task, and being able to deploy them at the appropriate time. Not all players get chips of all colors much as agents have different capabilities, availability, and resources. The exchange of chips corresponds to agents producing the intended effects of actions for each other; in different domain settings an exchange could be taken to correspond to agents providing resources to each other, doing tasks for them, or enabling them in some other way to do tasks they otherwise could not do. The game environment may be set to model different knowledge conditions as well. For example, varying the amount of the board an agent can "see" corresponds to varying information about task constituents or resource requirements, whereas varying the information players have about each other's chips corresponds to varying information agents have about the capabilities of others. Various requirements on player objectives, goal squares, and paths correspond to different types of group activities and collaborative tasks. To distinguish cooperative settings from those in which agents act independently, the scoring function may have a significant factor in which an agent's reward depends on others' performance.

Mapping from task-environment decision problems to CT setting CT allows the modeling of all the decision-making questions listed above. We provide some simple illustrative examples here, but all of these questions and many others can be explored within instances of the CT framework. Consider the question of joint goals among agents. This kind of problem can be modeled by having the scoring function include a component that is maximized when all players complete their task, that is, only if the joint goal is satisfied. Each individual may still get a "base score" that can include the usual components (final number of chips, distance to the goal). Or, consider the question of conflict avoidance. When players have a significant overlap in their chip needs on one set of paths, but not on others, with the requisite chips being a scarce resource, the game inherently provides incentive for avoiding the conflicted paths. The initial game states can be selected so as to exhibit such situations with greater or lesser frequency in order to explore behaviors under such conditions. Finally, consider the question of altruistic behavior. Evidence for such behavior is possible just when some player has a chip or chips that can aid another player. Altruistic behavior in such situations may be pure altruism if no benefit accrues to the player, or based on reciprocity if in a multi-shot game context, or supported by a scoring function that rewards such behavior. Mapping standard economic game concepts to CT situations Many standard concepts from economic games can be modeled. Game concepts such as the prisoners' dilemma, public goods, ultimatum games, dictator games, trust, gift exchange, third party punishment, all may be exemplified in the CT framework. Moreover, they can be combined in new and affecting ways, while still being carefully controlled. Again, we provide a few representative examples. Most simply, the prisoners' dilemma corresponds to a game in which each player needs a chip that the other has, agreements are not enforceable, no communication other than sending chips is available, and there is a large reward for getting to goal. Ultimatum games (where a single agent makes a take-it-or-leave-it offer to which another agent responds) is another standard economic game with a natural CT instantiation. Here, we use a twoplayer CT instance in which both players have full information about the board and each other's chips. Players' scores are solely determined by their own individual outcomes and depend on the players' distance from their respective goals, as well as the chips that the players possess at the end of the game. Negotiation is restricted to a single round of offer and response. In fact, we have explored this game with human players, and describe our game-theoretically surprising results in a later section.