Plan, Activity, and Intent Recognition: Theory and Practice, FIRST EDITION (2014)
Part IV. Multiagent Systems
Chapter 9. Multiagent Plan Recognition from Partially Observed Team Traces
Hankz Hankui Zhuo, Sun Yat-sen University, Guangzhou, China
Abstract
Multiagent plan recognition (MAPR) aims to recognize team structures and team behaviors from the observed team traces (action sequences) of a set of intelligent agents. This chapter introduces the problem formulation of MAPR based on partially observed team traces and presents a weighted MAX-SAT-based framework to recognize multiagent plans from partially observed team traces. This framework spans two approaches, MARS (MultiAgent plan Recognition System) and DARE (Domain model-based multiAgent REcognition), with respect to different input knowledge. MARS requires as input a plan library, while DARE requires as input a set of previously created action models. Both approaches highlight our novel computational framework for multiagent plan recognition.
Keywords
Multiagent plan recognition (MAPR)
Team traceTeam plan
MAX-SAT
Action model
Acknowledgment
Hankz Hankui Zhuo thanks Professor Subbarao Kambhampati for suggesting combining the MARS and DARE algorithms initially. Zhuo’s research is supported in part by the National Natural Science Foundation of China (No. 61309011) and Research Fund for the Doctoral Program of Higher Education of China (No. 20110171120054), and Guangzhou Science and Technology Project (No. 2011J4300039).
9.1 Introduction
Plan recognition [14], as an inverse problem of plan synthesis, can be used for a variety of applications including natural language pragmatics [18], story understanding [30], psychological modeling [24], intelligent computer system interfaces [11], and report-based monitoring [13]. The plan-recognition problem has been addressed by many algorithms. For example, Kautz proposed an approach to recognize plans based on parsing viewed actions as sequences of subactions and essentially model this knowledge as a context-free rule in an “action grammar” [15]. Bui et al. presented approaches to probabilistic plan-recognition problems [7,10]. Instead of using a library of plans, Ramirez and Geffner proposed an approach to solving the plan-recognition problem using slightly modified planning algorithms [20]. These systems all focus on single-agent plan recognition.
Multiagent plan recognition (MAPR) seeks an explanation of observed team-action traces. From the set of team traces of a set of agents, MAPR aims to identify the team structures and behaviors of agents that are probably changing over time [29]. Many approaches have been proposed in the past to automatically recognize team plans given an observed team trace as input. For example, Avrahami-Zilberbrand and Kaminka presented a Dynamic Hierarchical Group Model (DHGM), which indicated the connection between agents and tracked the dynamically changed structures of groups of agents [2]. Banerjee et al. [4] proposed to formalize MAPR with a new model. This algorithm solved MAPR problems using a first-cut approach, provided that a fully observed team trace and a library of full team plans were given as input. Furthermore, to consider the relationship between activities, Banerjee and Kraemer [3] introduced a branch and price approach to solve multiagent plan recognition problems that accommodated activity interleaving and plan abandonment given a library of plan graphs.
These systems function well by assuming that team traces and team plans are complete (or fully observed); that is, missing values (denoted by null, indicating activities that are not observed) are not allowed. In many real-world applications, however, it is often difficult to observe full team traces or to collect full team plans because of environmental or resource limitations. For example, in military operations, it may be difficult to observe every activity of teammates, since they sometimes need to hide when their enemies are attacking. As another example, in teamwork at a company, there may not be sufficient sensors to set in every possible place to observe all activities of teammates. Thus, it is important to design a novel approach to solving the problem so that team traces (and team plans) can be fully observed.
We are aware of systems that are capable of recognizing multiagent plans from noisy observations (i.e., observations may be incorrect or missing), such as STABR [26,28] and the system proposed by Sadilek and Kautz [22]. These systems exploit data mining-based techniques (e.g., classification) provided a set of “numerical” data are available; however, in this chapter we focus on a symbolic relation (e.g., STRIPS [8]) technique that assumes numerical data are not available. To the best of our knowledge, there is no previous work using the symbolic relation scenario—that is, where missing values are allowed in the observed team traces when numerical data are unavailable.
This chapter addresses the new MAPR problem that allows team traces to be partially observed (i.e., there are missing values in them). We introduce two algorithm frameworks to solve this problem with different types of auxiliary input knowledge (i.e., a library of partial team plans and a set of action models, respectively). The first algorithm, called MARS [33], which stands for MultiAgent plan Recognition System, takes a library of partial team plans as the auxiliary input knowledge. The second algorithm, called DARE [35], which stands for Domain model-based multiAgent REcognition, takes a set of Stanford Research Institute Problem Solver (STRIPS) [8] action models as the auxiliary input knowledge. Note that action models can be created by experts or learned by previous systems such as ARMS [31] and LAMP [34]. Our approaches can be applied to domains (e.g., military operations, virtual laboratories [1], and other similar applications), where we assume activities are modeled with preconditions and effects (i.e., action models described by a formal language such as STRIPS [8], by domain experts, or by state-of-the-art learners).
To solve the MAPR problem, both algorithms take advantage of the maximum satisfiability (MAX-SAT) framework [5,16,32]. Specifically, MARS first builds a set of candidate occurrences (i.e., a possible case that a team plan occurs in the team trace). After that, it generates sets of soft constraints and hard constraints based on candidate occurrences. Finally, it solves all these constraints using a state-of-the-art weighted MAX-SAT solver, such as MaxSatz [16], and converts the output of the solver to the solution of our MAPR problem.
Likewise for DARE, it first builds a set of hard constraints that encode the correctness property of the team plans and a set of soft constraints that encode the optimal utility property of team plans based on the input team trace and action models. After that, it solves all these constraints using a state-of-the-art weighted MAX-SAT solver, such as MaxSatz [16], and converts the solution to a set of team plans as output.
The rest of the chapter is organized as follows. In Section 9.2, we first introduce preliminaries related to weighted MAX-SAT problems and multiagent plan STRIPS-based planning. Then in Section 9.3 we discuss the recognition problem and details of the MARSalgorithm. After that we introduce the details of DARE in Section 9.4. We show the experimental results and present related work in Sections 9.5 and 9.6, respectively. Finally, we conclude the chapter with discussion on future work.
9.2 Preliminaries
This section introduces preliminaries related to our problems—that is, the MAX-SAT problem and multiagent STRIPS-based planning.
9.2.1 MAX-SAT Problem
In computer science, satisfiability (often abbreviated SAT) is the problem of determining whether there exists an interpretation that satisfies the formula. In other words, it establishes whether the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to true. Equally important is to determine whether no such assignments exist, which would imply that the function expressed by the formula is identically false for all possible variable assignments. In this latter case, we would say that the function is unsatisfiable; otherwise, it is satisfiable. To emphasize the binary nature of this problem, it is often referred to as Boolean or propositional satisfiability.
SAT is known to be NP-complete [9], but the flip side is that it is very powerful in its representational ability: any propositional logic formula can be rewritten as a conjunctive normal form (CNF) formula. A CNF formula is a conjunction of clauses. A clause is a disjunction of literals. A literal is a variable or its negation . A variable may take values 0 (for false) or 1 (for true). The length of a clause is the number of its literals. The size of , denoted by , is the sum of the length of all its clauses. An assignment of truth values to the variables satisfies a literal if takes the value 1, satisfies a literal if takes the value 0, satisfies a clause if it satisfies at least one literal of the clause, and satisfies a CNF formula if it satisfies all the clauses of the formula. An empty clause contains no literals and cannot be satisfied. An assignment for a CNF formula is complete if all the variables occurring in have been assigned; otherwise, it is partial.
The MAX-SAT problem for a CNF formula is the problem of finding an assignment of values to variables that minimizes the number of unsatisfied clauses (or equivalently, that maximizes the number of satisfied clauses). Two CNF formulas, and , are equivalent if and have the same number of unsatisfied clauses for every complete assignment of and . There are many solvers for MAX-SAT problems (e.g., MaxSatz [17]); we used the weighted version of MaxSatz1 in our implementation.
9.2.2 Multiagent STRIPS-Based Planning
In artificial intelligence (AI), STRIPS was developed by Richard Fikes and Nils Nilsson [8]. This language is the basis for most of the languages for expressing automated planning problem instances in use today; such languages are commonly known as action languages. A STRIPS action model is a quadruple , where is an action name with 0 or more parameters, is a list of preconditions specifying the conditions that should be satisfied when applying action is a list of adding effects specifying the propositions added after applying action , and is a list of deleting effects specifying the propositions deleted after applying action . An example STRIPS action model pickup is as follows:
Name: (pickup ?x - block)
Pre: (ontable ?x) (handempty) (clear ?x)
Add: (holding ?x)
Del: (ontable ?x) (handempty) (clear ?x)
A STRIPS problem is composed of a triple , where a set of STRIPS action models, is an initial state that is composed of a set of propositions, is a goal state that is also composed of a set of propositions. A solution (or a plan) to the problem is a sequence of actions.
An expressible problem in an MA-extension of the STRIPS language is called MA-STRIPS. Formally, an MA-STRIPS planning problem for a system of agents is given by a quadruple [6], where
• is a finite set of atoms (also called propositions), encodes the initial situation, and encodes the goal conditions.
• For is the set of action models that the agent is capable of performing. Each action model in , where , has the standard STRIPS syntax and semantics; that is, each action model is composed of .
A solution to an MA-STRIPS problem is a plan that is composed of a sequence of ordered actions . These actions are executed by different agents to project the initial state to the goal .
9.3 Multiagent Plan Recognition with Plan Library
This section introduces the first algorithm, MARS, which takes as input a library of team plans. We will first describe the formulation of the recognition problem and then present the details of the MARS algorithm.
9.3.1 Problem Formulation
Considering a set of agents , a library of team plans of agents is defined as a set of matrices. Specifically, each team plan2 is an matrix, where , and is the number of total timesteps. is an activity that is expected to be executed at time by agent , where . The value of each team plan is associated with a utility function .
An observed team trace executed by agents is a matrix . is the observed activity executed by agent at timestep , where . A team trace is partial if some elements in are empty (denoted by null) (i.e., there are missing values in ). We define occurrence (we slightly revised the original definition given by Benerje et al. [4]) as follows:
Definition 9.1 Occurrence A team plan (submatrix) is said to occur in a matrix if contiguous rows and columns (say , a -selection in any order from agent indices) can be found in such that
where is the observed activity of row and column in .
We denote an occurrence by , where is the start position of the occurrence in is a team plan id, and is the agent sequence as is described in Definition 9.1.
Our MAPR problem can be defined as follows: Given as input a team trace and a library of team plans, both of which may have missing values, our algorithm outputs a set of occurrences that satisfies the following conditions C1 through C3:
C1: All occurrences in occur in the team trace.
C2: For each activity in occurs in exactly one occurrence in .
C3: The total utility of is optimal; that is, is the maximal utility that can be obtained.
We show an example of our recognition problem in Figure 9.1. In the team trace, are agents. is a timestep (()). are activities. indicates the activity that is missing (note that we do not allow noisy activities in the team trace). are four team plans that compose a library. In the output, there are five occurrences that exactly cover the team trace that is input.
FIGURE 9.1 An example of our recognition problem.
9.3.2 The MARS Algorithm
An overview of the MARS algorithm is shown in Algorithm 9.1. We will present each step of the algorithm in detail in Sections 9.3.3 through 9.3.6.
Algorithm 9.1 An overview of the algorithm
9.3.3 Creating Candidate Occurrences
In Step 1 of Algorithm 9.1, we first create a set of candidate occurrences, denoted by , by scanning all the team plans in . Each candidate occurrence is probably an occurrence that composes the final solution to the MAPR problem (i.e., probably holds). We call a candidate occurrence a solution occurrence if . We describe the creation process in Algorithm 9.2.
Algorithm 9.2 )
In Step 4 of Algorithm 9.2, is a -selection in any order from agent indices ( is the number of columns in ), as is described in Definition 9.1. Note that since there may be different -selections, such that occurs in (because there may be different columns with the same values), we need to search all the possible -selections to create all possible candidate occurrences. For instance, in Figure 9.1, there are two possible -selections (or equivalently, ) for starting at the position in . As a result, we can build all the candidate occurrences , as is shown in Table 9.1, with inputs given by Figure 9.1.
Table 9.1
An Example of Candidate Occurrences That Can Be Created by Algorithm 9.2
,
,
,
}
9.3.4 Generate Soft Constraints
For each candidate occurrence , we conjecture that it could be possibly one of the final set of solution occurrences . In other words, for each candidate occurrence , the following constraint could possibly hold: . We associate this constraint with weight to specify that it is not to be 100% true. We call these kind of constraints soft constraints (denoted by ).
We calculate weight of candidate occurrence with the following equation:
(9.1)
where the first term is the observing rate (defined later), describing the degree of confidence we have on the candidate occurrence . Generally, we assume that the more activities being observed, the more confidence we have in the happening of . The second term is the utility of that needs to be considered as an impact factor in order to maximize the total utility. Note that is the team plan in . The observing rate is defined by
where is the total number of actions of team plan is the number of in the scope of restricted by is the number of in .
For example, consider the occurrence that is shown earlier in Figure 9.1. The number of is 6 (i.e., ) . The number of is 1, corresponding to the occurrence in the observed team trace (i.e., ) . The number of in is 2 (i.e., ). Thus, we have
It is easy to find that is equivalent to 0 when , resulting in . This suggests that the occurrence cannot be a solution occurrence if none of its actions are observed. This is not true according to our MAPR problem definition. Thus, we relax this constraint by revising as follows:
(9.2)
where “1” can be interpreted as if there is a virtual action (that makes the occurrence happen) that can always be observed.
9.3.5 Generating Hard Constraints
According to condition C2, each element of should be covered by exactly one solution occurrence. In this step, we seek to build hard constraints to satisfy this condition. To do this, we first collect an occurrence subset of for each element , such that is covered by all the occurrences in the subset. We use to denote this subset and to denote the collection of all the subsets with respect to different elements of ; thst is, . The detailed description can be found from Algorithm 9.3. Note that collection has different elements, which is guaranteed by the union operator in Step 10 of Algorithm 9.3.
Algorithm 9.3 Build a collection of subsets of candidate occurrences in
With , we generate hard constraints to guarantee condition C2 as follows. For each subset , there is only one occurrence that belongs to ; that is, the proposition variable “” is assigned to be true. Formally, we have the following constraints:
where the term indicates all occurrences, which are different from , do not belong to . Furthermore, we have the following constraints with respect to :
(9.3)
We set the weights for this kind of constraint, denoted by , with “high” enough values to guarantee the constraints are hard. We empirically choose the sum of the weights of soft constraints as this “high” value.
9.3.6 Solving Constraints
With Steps 2 and 3 of Algorithm 9.1, we have two kinds of weighted constraints (i.e., soft constraints, , and hard constraints, ). In this step, we put and together and solve them using a weighted MAX-SAT solver. In the experiment, we would like to test two different cases using, or not using, the observing rate function .
We introduce a new parameter, , and revise Eq. 9.1 to a new equation as follows:
(9.4)
If is 1, Eq. 9.4 is reduced to Eq. 9.1; otherwise, Eq. 9.4 is reduced to . Our evaluation shows that is helpful in improving the recognizing accuracy in the experiment section.
The solving result of the weighted MAX-SAT solver is a set of assignments (true or false) to proposition variables, such that
If a proposition variable “” is assigned to be true, is one of the solution occurrences output by MARS; otherwise, is not output by MARS.
9.3.7 Discussion
In this section, we discuss the properties of our MARS algorithm related to completeness and soundness.
Property 9.1 Completeness
The completeness of MARS depends only on the completeness of the weighted MAX-SAT solver (i.e., given an MAPR problem that is solvable). MARS can output a solution to this problem if the weighted MAX-SAT solver is complete.
The sketch of the proof can be presented as follows. For each solvable MAPR problem, we can encode the problem with constraints and in polynomial time using Steps 1 though 3 of Algorithm 9.1. Furthermore, if the weighted MAX-SAT solver is complete, it can successfully solve these constraints (by Step 4) and output a solving result, which can be converted to the solution to the MAPR problem in polynomial time (by Step 5).
Property 9.2 Soundness Given an MAPR problem, if the is set to be 0 in Eq. 9.4 , the output of MARS is the solution to the MAPR problem.
The sketch of the proof can be described as follows. From Step 1 of Algorithm 9.1, we can see that the candidate occurrences , covered by the observed team trace, are all from the library of team plans, which satisfies the first condition, C1. From Step 2, if is set to be 0, the weights of soft constraints are determined by the utility function . Furthermore, the solution output by the weighted MAX-SAT solver maximizes the total weights (i.e., done by Step 4), which suggests the second condition, C3, is also satisfied. Finally, the third condition, C3, is satisfied by the hard constraints established by Step 3. Thus, the output of MARS satisfies C1 through C3, which means it is the solution to the MAPR problem.
9.4 Multiagent Plan Recognition with Action Models
The MARS algorithm assumes that a library of team plans has been collected beforehand and provided as input. However, there are many applications where collecting and maintaining a library of team plans is difficult and costly. For example, in military operations, it is difficult and expensive to collect team plans, since activities of teammates may consume lots of resources (e.g., ammunition and human labor). Collecting a smaller library is not an option because it is not feasible to recognize team plans if they are not covered by the library. Thus it is useful to design approaches for solving the MAPR problem where we do not require libraries of team plans to be known. This section introduces the DARE algorithm that takes as input a set of action models instead of a library of team plans. We first define the input and output of the recognition problem and then present the details of DARE.
9.4.1 Problem Formulation
We first define an action model in the MA-STRIPS language [6], as is described in Section 9.2.2. An action model is a tuple , where is an action name with 0 or more parameters, is a list of preconditions of is a list of add effects, and is a list of deleting effects. A set of action models is denoted by . An action name with 0 or more parameters is called an activity. An observed activity in a partial team trace is either an instantiated action of or noop or null, where noop is an empty activity that does nothing.
An initial state is a set of propositions that describes a closed world state from which the team trace starts to be observed. In other words, activities at timestep can be applied in the initial state . When we say an activity can be applied in a state, we mean the activity’s preconditions are satisfied by the state. A set of goals , each of which is a set of propositions, describes the probable targets of any team trace. We assume and can both be observed by sensing devices.
A team is composed of a subset of agents . A team plan is defined as
where is an activity or noop. A set of correct team plans is required to have properties P1 through P5.
P1: is a partition of the team trace ; that is, each element of should be in exactly one of and each activity of should be an element of .
P2: should cover all the observed activities; that is, for each , if , then , where .
P3: is executable starting from and achieves some goal ; that is, is executable in state for all and achieves after Step , where .
P4: Each team plan is associated with a likelihood . specifies the likelihood of recognizing team plan and can be affected by many factors, including the number of agents in the team, the cost of executing , and so on. The value of is composed of two parts, ; that is,
where depends on , the number of activities of depends on , the number of agents (i.e., teammates) of . Generally, (or ) becomes larger when (or ) increases. Note that more agents would have a smaller likelihood (or larger cost) of coordinating these agents in order to successfully execute . Thus, we require that should satisfy this condition:
For each goal , the output plan should have the largest likelihood; that is,
where is a team-plan set that achieves . Note that we presume that teams are (usually) organized with the largest likelihood.
P5: Any pair of interacting agents must belong to the same team plan. In other words, if agent interacts with another agent (i.e., provides or deletes some conditions of ), then should be in the same team, and activities of agents in the same team compose a team plan. Agents exist in exactly one team plan; that is, team plans do not share any common agents.
Our multiagent plan recognition problem can be stated as: Given a partially observed team trace , a set of action models , an initial state , and a set of goals , the recognition algorithm must output a set of team plans with the maximal likelihood of achieving some goal , where satisfies the properties P1 through P5.
Figures 9.2 and 9.3 show example input and output of our MAPR problem from blocks.3 In part (a) of Figure 9.2, the first column indicates the timesteps from 1 to 5. are five hoist agents. The value null suggests the missing observation, and noop suggests the empty activity. We assume are defined by . Based on , the corresponding output is shown in Figure 9.3, which is the set of two team plans .
FIGURE 9.2 An example of the input for MAPR problem from the blocks domain. Note: (a) is an observed team trace, (b) is a set of propositions: {(ontable A)(ontable B)(ontable F)(ontable G)(on C B)(on D F)(on E D)(clear A)(clear C)(clear E)(clear G)(handempty)}, (c) is a goal set composed of one goal , and is composed of propositions: {(ontable A)(ontable D)(ontable E)(on B A)(on C B)(on F D)(on G F)(clear C)(clear G)(clear E)}, and (d) is a set of action models.
FIGURE 9.3 An example of the output of the MAPR problem from the blocks domain. Note that this is a set of team plans (a) team plan and (b) team plan .
9.4.2 The DARE Algorithm
Algorithm 9.4 below describes the plan recognition process in DARE. In the subsequent subsections, we describe each step of this algorithm in detail.
Algorithm 9.4 An overview of our algorithm framework
9.4.3 Candidate Activities
In Step 3 of Algorithm 9.4, we build a set of candidate activities by instantiating each parameter of action models in with all objects in the initial state , team trace , and goal . We perform the following phases. First, we scan each parameter of propositions (or activities) in and collect sets of different objects. Note that each set of objects corresponds to a type (e.g., there is a type “block” in the blocks domain). Second, we substitute each parameter of each action model in with its corresponding objects—the correspondence relationship is reflected by type (i.e., the parameters of action models and objects should belong to the same type); this results in a set of different activities, called candidate activities . Note that we also add a noop activity in .
For example, there are seven objects {A, B, C, D, E, F, G} corresponding to type “block” in Figure 9.2 . The set of candidate activities is: {noop, pickup(A), pickup(B), pickup(C), pickup(D), pickup(E), pickup(F), pickup(G), …}, where the “dots” suggest other activities that are generated by instantiating parameters of actions “putdown, stack, unstack.”
9.4.4 Hard Constraints
With the set of candidate activities , we build a set of hard constraints to ensure the properties P1 to P3 in Step 4 of Algorithm 9.4. We associate each element with a variable ; that is, we have a set of variables, such that
which is also called a variable matrix. Each variable in the matrix will be assigned a specific activity in candidate activities , and we partition these variables to attain a set of team plans that have the properties P1 through P5 based on the assignments. According to properties P2 and P3, we build two kinds of hard constraints: observation constraints and causal-link constraints. Note that P1 is guaranteed since the set of team plans output is a partition of the team trace.
Observation constraints: For P2— given a team plan composed of agents , if , then . This suggests should have the same activity of if , since the team plan is a partition of and is an element of of . Thus, we build hard constraints as follows. For each and , we have
We call this kind of hard constraints the observation constraints, since they are built based on the partially observed activities of .
Causal-link constraints: For P3—each team plan should be executable starting from the initial state , suggesting that each row of variables should be executable, where . Note that “executable” suggests that the preconditions of should be satisfied. This means, for each , the following constraints should be satisfied:
• Each precondition of either exists in the initial state or is added by and is not deleted by any activity between and , where and .
• Likewise, each proposition in goal either exists in the initial state or is added by and is not deleted by any activity between and , where and .
We call this kind of hard constraints causal-link constraints, since they are created according to the causal-link requirement of executable plans.
9.4.5 Soft Constraints
In Step 5 of Algorithm 9.4, we build a set of soft constraints based on the likelihood function . Each variable in can be assigned any element of the candidate activities . We require that all variables in be assigned exactly one activity from . For each , we have
We calculate the weights of these constraints by the following phases. First, we partition the variable matrix based on property P5 into a set of team plans ; that is, agent provides or deletes some conditions of , then and should be in the same team, and activities of agents in the same team compose a team plan. Second, for all team plans, we calculate the total likelihood ; that is,
and let be the weight of the soft constraints. Note that we aim to maximize the total likelihood when solving these constraints (together with hard constraints) with a weighted MAX-SAT solver.
9.4.6 Solving Constraints
In Step 6 of Algorithm 9.4, we put both hard and soft constraints together and solve these constraints using MaxSatz [16], a MAX-SAT solver. The solution is an assignment for all variables in , and is the total weight of the satisfied constraints corresponding to the solution . In Step 8 of Algorithm 9.4, we partition into a set of team plans based on P5.
As an example, earlier in part (a) of Figure 9.2, the team trace’s corresponding variable in is assigned activities, which means the null values in Figure 9.2(a) are replaced with the corresponding assigned activities in V. According to property P5, we can simply partition the team trace into two team plans, as was shown before in Figure 9.3, by checking preconditions and effects of activities in the team trace.
In MAX-SAT, a proposition variable may take values false or true. A literal is either a variable or its negation . A clause is a disjunction of literals, and a CNF formula is a conjunction of clauses. An assignment of truth values to the propositional variables satisfies a literal if takes the value true and satisfies a literal if takes the value false. An assignment satisfies a clause if it satisfies at least one literal of the clause, and it satisfies a CNF formula if it satisfies all the clauses of the formula. An assignment for a CNF formula is complete if all the variables occurring in have been assigned; otherwise, it is partial.
The MAX-SAT problem for a CNF formula is the problem of finding an assignment of values to propositional variables that minimizes the number of unsatisfied clauses (or equivalently, that maximizes the number of satisfied clauses). The MAX-SAT problem is NP-hard, since the Boolean satisfiability problem, which is NP-complete, can be easily reduced into MAX-SAT.
One extension to MAX-SAT is weighted MAX-SAT, which asks for the maximum weight that can be satisfied by any assignment given a set of weighted clauses. There have been many approaches that solve weighted MAX-SAT problems, such as MaxSatz [16,17], which implements a lower-bound computation method that consists of incrementing the lower bound by one for every disjoint inconsistent subset that can be detected by unit propagation.
9.4.7 Properties of DARE
DARE can be shown to have the following properties:
Theorem 9.1 Conditional Soundness
If the weighted MAX-SAT solver is powerful enough to optimally solve all solvable SAT problems, DARE is sound.
Theorem 9.2 Conditional Completeness If the weighted MAX-SAT solver we exploit in DARE is complete, DARE is also complete.
For Theorem 9.1, we only need to check that the solutions output by DARE satisfy P1 through P5. P2 and P3 are guaranteed by observation constraints and causal-link constraints; P4 is guaranteed by the soft constraints built in Section 9.4.5 and the MAX-SAT solver; P1 and P5 are both guaranteed by the partition step in Section 9.4.6 (i.e., partitioning the variable matrix into a set of team plans); that is to say, the conditional soundness property holds.
For Theorem 9.2, since all steps in Algorithm 9.4, except Step 6 that calls for a weighted MAX-SAT solver, can be executed in finite time, the completeness property only depends on the weighted MAX-SAT solver; this means that the conditional completeness property holds.
9.5 Experiment
In this section we first evaluate MARS in different settings. After that, we compare MARS and DARE with respect to a different number of agents and a different percentage of null values.
9.5.1 Evaluation of MARS
We follow the experimental method prescribed by Banerjee et al. [4] to generate a set of multiagent plan recognition problems. For generating an MAPR problem, we first generate a random team trace with dimensions (i.e., 100 timesteps and 50 agents). Each element of the team trace belongs to a set of activities with . We randomly partition the team trace into a set of team plans that initiate the members of the library of team plans . This guarantees there is a solution to the MAPR problem. We generate a set of such problems , where . After that we add random team plans to library of each MAPR problem to enlarge the library. We test different values of from to vary the size of the library. We also test different percentages of random missing values from for each team trace and team plan. Note that “” indicates that there are randomly 10% of values that are missing in each team trace and team plan, likewise for other . To define the utility function of team plans, we associate each team plan with a random utility value.
To evaluate MARS, we define a recognizing accuracy as follows. For each MAPR problem with a specific value and (without any missing values), we solve the problem using MARS and denote the solution by . After that, we revise the problem by setting to another percentage to get a new problem . We solve and get a solution . If is the same as , the function with respect to is set to be 1; otherwise, is set to be 0. The recognizing accuracy with respect to and can be defined as follows:
(9.5)
As a special case, when is 0%.
9.5.1.1 Experimental results
We would like to test the following four aspects of MARS:
1. The recognizing accuracy with respect to different percentages of missing values.
2. The recognizing accuracy with respect to different numbers of randomly added team plans (referred to as “team plans” for simplicity).
3. The number of generated clauses with respect to each percentage.
4. The running time with respect to different percentages of missing values.
We present the experiment results in these aspects next.
Varying the percentage of missing values
To evaluate that MARS functions well in missing-value problems, we set the number of team plans to be 80 and vary the percentage of missing values from to see the recognizing accuracy defined earlier by Eq. 9.5. For each , we run five random selections to calculate an average accuracy. The results are shown in Figure 9.4, where the curve denoted by indicates the result obtained by setting in Eq. 9.4, likewise for the curve denoted by .
FIGURE 9.4 The recognizing accuracy with respect to different percentages of missing values ( is set to be 80).
From Figure 9.4, we can see that the accuracy generally decreases when the percentage increases, no matter whether is 1 or not. This is expected because missing values can provide information that may be exploited to find accurate occurrences. The more values that are missing, the more information is lost. Considering the difference between two curves, we can find that the accuracy is generally larger when than that when . The results are statistically significant; we performed the Student’s t-test and the result is 0.0482, which indicates that the two curves are significantly different at the 5% significance level. This suggests that the observing rate function is helpful in improving the accuracy. We can also find that the difference becomes larger when more values are missing, which indicates that is more significant then. By observation, MARS generally functions well in handling missing values when less than 30% of values are missing, where there is no less than an accuracy of 0.9 by setting .
Varying the number of team plans
We now analyze the relationship between recognizing accuracy and the size of team plans. We set the percentage to be 30% and vary the size of team plans to see the change of accuracies. We show the results in Figure 9.5. From the curves, we can see that the accuracy is generally reduced when the number of team plans increases. This is consistent with our intuition, since more “uncertain” information is introduced as more team plans are added (each has 30% of missing values that introduce uncertain information).
FIGURE 9.5 The recognizing accuracy with respect to different numbers of team plans ( is set to be 30%).
We also observe that the curve denoted by is generally better than the one denoted by , and the difference between two curves becomes sharp as the size of team plans increases. This indicates that MARS exploits the observing rate function (corresponding to ) and performs better than the one that does not exploit it, especially when the size of team plans is large.
From Figures 9.4 and 9.5, we can conclude that in order to improve the recognizing accuracy, we should better exploit the observing rate function to control weights of constraints as is given in Eq. 9.1, especially when the percentage of missing values and the size of team plans are large.
The generated clauses
We would like to verify that the generated constraints and would not increase too fast when the percentage of missing values increases. We record the total number of clauses, which correspond to and , with respect to each percentage of missing values. Note that the clauses obtained from and are disjunctions of propositions that can be solved directly by a weighted MAX-SAT solver. The results are shown in Table 9.2, where the first column is the number of team plans and the other columns are numbers of clauses corresponding to each percentage of missing values. For instance, in the second row and the last column, “392” is the number of clauses generated from 20 team plans with 50% of missing values. Note that the numbers of clauses in the table are average results over 3000 MAPR problems. We observe that we can fit the performance curve with a polynomial of order 2.
Table 9.2
Average Numbers of Clauses with Respect to Different Percentages of Missing Values
As an example, we provide the polynomial for fitting the numbers of clauses of the last row in Table 9.2, which is , where is the number of percentage points (e.g., in the last column of the table) and is the number of clauses. This suggests that MARS can handle MAPR problems with missing values well since the number of clauses would not increase too fast when missing values increase. Note that clauses increasing fast may make the weighted MAX-SAT solver have difficulty, or even fail to solve. Likewise, we can also verify that the number of clauses would not increase fast when the size of team plans increases.
9.5.2 Comparison between MARS and DARE
To compare MARS and DARE, we test them in three planning domains: blocks, driverlog4 and rovers. We modify the three domains for multiagent settings. In blocks, there are multiple hoists, which are viewed as agents that perform actions of pickup, putdown, stack, andunstack. In driverlog, there are multiple trucks, drivers, and hoists, which are agents that can be grouped together to form different teams (trucks and drivers can be on the same team, likewise for hoists). In rovers, there are multiple rovers that can be grouped together to form different teams.
For each domain, we set and generate 50 team traces with the size of for each . For each team trace, we have a set of optimal team plans—viewed as the ground truth—denoted by , and its corresponding goal , which best explains the team trace according to the likelihood function . We define the likelihood function by , where and , as was presented at the end of the problem definition section.
We randomly delete a subset of activities from each team trace with respect to a specific percentage . We test different values with 0%, 10%, 20%, 30%, 40%, and 50%. As an example, suggests there are 10 activities deleted from a team trace with 100 activities. We also randomly add 10 more goals, together with , to form the goal set , as was presented in the problem definition section. We define the accuracy as follows:
where “correctly recognized team plan sets” suggests the recognized team plan sets and goals are the same as the expected team plan sets and goals .
We generate 100 team plans as the library as described by MARS [33] and compare the recognition results with MARS as a baseline.
9.5.2.1 Experimental results
We evaluate DARE in the following aspects:
1. Accuracy with respect to different number of agents.
2. Accuracy with respect to different percentages of null values.
3. Running time
Varying the number of agents
We would like to evaluate the change in accuracies when the number of agents increases. We set the percentage of null values to be 30% and also ran DARE five times to calculate an average of accuracies. The result is shown in Figure 9.6. From the figure, we found that the accuracies of both DARE and MARS generally decreased when the number of agents increased. This is because the problem space is enlarged when the number of agents increases, which decreases the available information compared to the large problem space and is not enough to attain high accuracies.
FIGURE 9.6 Accuracies with respect to different number of agents.
We also found that the accuracy of DARE was lower than MARS at the beginning and then became better than MARS as the number of agents became larger. This indicates that DARE performs better when handling a large number of agents based on action models. This is because DARE builds the MAX-SAT problem space (described as proposition variables and constraints) based on model inferences (i.e., action models), while MARS is based on instances (i.e., plan library). When the number of agents is small, the problem space built by MARS is smaller than that built by DARE. In other words, when the number of agents becomes larger, the problem space built by MARS becomes larger than that built by DARE; the larger the problem space is, the more difficult it is for MAX-SAT to solve the problem; thus, DARE performs worse than MARS with less agents but better with more agents.
Varying the percentage of null values
We set the number of agents to be 60 and run DARE five times to calculate an average of accuracies with a percentage of null values. We found both accuracies of DARE and MARS decreased when the percentage increased as a result of less information being provided when the percentage increased. When the percentage is 0%, both DARE and MARS can recognize all the team traces successfully.
By observing all three domains in Figure 9.7, we note that DARE does not function as well as MARS when the percentage of incompleteness is large. This relative advantage for the library-based approach is due in large part to the fact that all team plans to be recognized are covered by the small library in the experiment, and the library of team plans helps reduce the recognition problem space compared to DARE. We conjecture that if the team plans to be recognized are not covered by the library (because of the library’s size restrictions), DARE will perform better than MARS. In this case, MARS cannot successfully recognize some team plans.
FIGURE 9.7 Accuracies with respect to different percentages of null values.
9.6 Related Work
The plan-recognition problem has been addressed by many researchers. Except for the previous works [15,7,10,20] on the plan-recognition problem presented in the introduction section, Singla and Mooney proposed an approach to abductive reasoning using a first-order probabilistic logic to recognize plans [25]. Amir and Gal addressed a plan-recognition approach to recognizing student behaviors using virtual science laboratories [1]. Ramirez and Geffner exploited off-the-shelf classic planners to recognize probabilistic plans[21]. Despite the success of these systems in solving the plan-recognition problem, a limitation is that they all focus only on single-agent plans; this is different from our problem setting that focuses on multiagent environments.
Different from the aforementioned single-agent work, there have been approaches related to the MAPR problem. For example, Saria and Mahadevan presented a hierarchical multiagent Markov process as a framework for hierarchical probabilistic plan recognition in cooperative multiagent systems [23]. Sukthankar and Sycara presented an approach that leveraged several types of agent resource dependencies and temporal ordering constraints to prune the size of the plan library considered for each observation trace [27]. Avrahami-Zilberbrand and Kaminka preferred a library of single-agent plans to team plans; however, they identified dynamic teams based on the assumption that all agents in a team are executing the same plan under the temporal constraints of that plan[2]. The constraint on activities of the agents that can form a team can be severely limiting when teammates can execute coordinated but different behaviors.
Instead of using the assumption that agents in the same team should execute a common activity, besides the approaches introduced in Section 9.1 [4,3,33], Sadilek and Kautz provided a unified framework to model and recognize activities that involved multiple related individuals playing a variety of roles with the help of GPS data [22].In addition, Masato et al. proposed a probabilistic model based on conditional random fields to automatically recognize the composition of teams and team activities in relation to a plan[19]. In these systems, although coordinated activities can be recognized, they are applicable in scenarios where a set of real-world GPS data is available or team traces are fully observed. In contrast, we do not require GPS data and full team traces in both MARS andDARE.
9.7 Conclusion
This chapter presented two algorithms, MARS and DARE, to recognize multiagent plans from partially observed team traces. Given a partial team trace and a library of team plans, MARS builds a set of soft/hard constraints and solves them using a weighted MAX-SAT solver. The solution obtained is a set of occurrences that covers each element in the team trace exactly once. DARE takes a partial team trace and a set of action models as input. It first builds a set of candidate activities and then builds sets of hard and soft constraints to finally recognize team plans.
From MARS, we observed the following from our empirical evaluation:
1. Using the observing rate function can help improve the recognizing accuracy, especially when the percentage of missing values and the size of team plans are large.
2. The recognizing accuracy decreases with an increase in missing values or the size of the library.
3. The running time of our algorithm increases polynomially with the percentage of missing values increasing. From DARE, we show that it is effective in three benchmark domains compared to the state-of-the-art multiagent plan recognition system of MARS, which relies on a library of team plans. Thus, our approach is well suited for scenarios where collecting a library of team plans is not feasible before performing team plan-recognition tasks.
In the current work, we assume that the action models are complete. A more realistic assumption is to allow the models to be incomplete [12]. In future work, one promising direction is to extend our recognizing algorithms to work with incomplete action models. Another assumption in the current model is that it expects, as input, the alternative sets of goals, one of which the observed plan is expected to be targeting. Another direction therefore is to relax this so that these algorithms can take as input a set of potential goals, with the understanding that the observed plan is achieving a bounded subset of these goals. We believe that both these extensions can be easily incorporated into the MAX-SAT framework of the current algorithms.
References
1. Amir O, Gal YK. Plan recognition in virtual laboratories. Proceedings of IJCAI. 2011:2392-2397.
2. Avrahami-Zilberbrand D, Kaminka GA. Towards dynamic tracking of multi-agents teams: an initial report. Proceedings of the AAAI Workshop on Plan, Activity, and Intent Recognition. 2007.
3. Banerjee B, Kraemer L. Branch and price for multi-agent plan recognition. Proceedings of AAAI. 2011:601-607.
4. Banerjee B, Kraemer L, Lyle J. Multi-agent plan recognition: formalization and algorithms. Proceedings of AAAI. 2010:1059-1064.
5. Borchers B, Furman J. A two-phase exact algorithm for MAX-SAT and weighted MAX-SAT problems. J Comb Optim. 1998;2(4):299-306.
6. Brafman RI, Domshlak C. From one to many: planning for loosely coupled multi-agent systems. Proceedings of ICAPS. 2008:28-35.
7. Bui HH. A general model for online probabilistic plan recognition. Proceedings of IJCAI. 2003:1309-1318.
8. Fikes R, Nilsson NJ. STRIPS: a new approach to the application of theorem proving to problem solving. Artif Intell J. 1971:189-208.
9. Garey MR, Johnson DS. Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman; 1979.
10. Geib CW, Goldman RP. A probabilistic plan recognition algorithm based on plan tree grammars. Artif Intell. 2009;173(11):1101-1132.
11. Goodman BA, Litman DJ. Plan recognition for intelligent interfaces. Proceedings of the Sixth Conference on Artificial Intelligence Applications. 1990:297-303.
12. Kambhampati S. Model-lite planning for the web age masses: the challenges of planning with incomplete and evolving domain theories. Proceedings of AAAI. 2007:1601-1605.
13. Kaminka G, Pynadath DV, Tambe M. Monitoring teams by overhearing: a multiagent plan-recognition approach. J Artif Intell Res. 2002;17:83-135.
14. Kautz HA. A formal theory of plan recognition and its implementation. Morgan Kaufmann; 1991.
15. Kautz HA, Allen JF. Generalized plan recognition. Proceedings of AAAI. 1986:32-37.
16. Li CM, Manya F, Mohamedou N, Planes J. Exploiting cycle structures in Max-SAT. Proceedings of Twelth International Conference on the Theory and Applications of Satisfiability Testing. 2009:467-480.
17. Li CM, Manya F, Planes J. New inference rules for MAX-SAT. J Artif Intell Res. October 2007;30:321-359.
18. Litman DJ, Allen JF. A plan recognition model for subdialogues in conversation. Cognitive Sci. 1987;11:163-200.
19. Masato D, Norman TJ, Vasconcelos WW, Sycara K. Agent-oriented incremental team and activity recognition. Proceedings of IJCAI. 2011:1402-1407.
20. Ramirez M, Geffner H. Plan recognition as planning. Proceedings of IJCAI. 2009:1778-1783.
21. Ramirez M, Geffner H. Probabilistic plan recognition using off-the-shelf classical planners. Proceedings of AAAI. 2010:1121-1126.
22. Sadilek A, Kautz H. Recognizing multi-agent activities from GPS data. Proceedings of AAAI. 2010:1134-1139.
23. Saria S, Mahadevan S. Probabilistic plan recognition in multiagent systems. Proceedings of ICAPS. 2004:287-296.
24. Schmidt CF, Sridharan NS, Goodson JL. The plan recognition problem: an intersection of psychology and artificial intelligence. Artif Intell. 1978;11(1–2):45-83.
25. Singla P, Mooney R. Abductive Markov logic for plan recognition. Proceedings of AAAI. 2011:1069-1075.
26. Sukthankar G, Sycara K. Simultaneous team assignment and behavior recognition from spatio-temporal agent traces. Proceedings of Twenty-First National Conference on Artificial Intelligence. 2006:716-721.
27. Sukthankar G, Sycara K. Hypothesis pruning and ranking for large plan recognition problems. Proceedings of AAAI. 2008:998-1003.
28. Sukthankar G, Sycara K. Activity recognition for dynamic multi-agent teams. ACM transactions on intelligent systems and technology. 2011;3(1):18:1-18:24.
29. Tambe M. Towards flexible teamwork. J Artif Intell Res. 1997;7:83-124.
30. Wilensky R. Planning and understanding. Addison-Wesley; 1983.
31. Yang Q, Wu K, Jiang Y. Learning action models from plan examples using weighted MAX-SAT. Artif Intel J. 2007;171:107-143.
32. Zhang L, Bacchus F. Maxsat heuristics for cost optimal planning. Proceedings of AAAI. 2012:1846-1852.
33. Zhuo HH, Li L. Multi-agent plan recognition with partial team traces and plan libraries. Proceedings of IJCAI. 2011:484-489.
34. Zhuo HH, Yang Q, Hu DH, Li L. Learning complex action models with quantifiers and implications. Artif Intell. 2010;174(18):1540-1569.
35. Zhuo HH, Yang Q, Kambhampati S. Action-model based multi-agent plan recognition. Proceedings of NIPS. 2012:377-385.
1 www.laria.u-picardie.fr/∼cli/EnglishPage.html
2 Note that we do not allow AND/OR cases in the team plan description (i.e., there are no AND/OR branches in the team plan). We assume that plans with AND/OR branches have been compiled into multiple different team plans.
3 www.cs.toronto.edu/aips2000/
4 http://planning.cis.strath.ac.uk/competition/