Plan, Activity, and Intent Recognition: Theory and Practice, FIRST EDITION (2014)
Part V. Applications
Chapter 11. Probabilistic Plan Recognition for Proactive Assistant Agents
Jean Oh (Carnegie Mellon University, Pittsburgh, PA, USA), Felipe Meneguzzi (Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil), and Katia Sycara (Carnegie Mellon University, Pittsburgh, PA, USA)
Human users dealing with multiple objectives in a complex environment (e.g., military planners or emergency response operators) are subject to a high level of cognitive workload. When this load becomes excessive, it can severely impair the quality of the plans created. To address these issues, intelligent assistant systems have been rigorously studied in both the artificial intelligence (AI) and the intelligent systems research communities. This chapter discusses proactive assistant systems, which predict future user activities that can be facilitated by the assistant. We focus on problems in which a user is solving a complex problem under uncertainty, and thus on plan-recognition algorithms suitable for the target problem domain. Specifically, we discuss a generative model of plan recognition that represents user activities as an integrated planning and execution problem. We describe a proactive assistant agent architecture and its applications to practical problems including emergency response and military peacekeeping operations.
Prognostic normative assistance
The research for this chapter was sponsored by the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. government. The U.S. government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation hereon.
Plan recognition, which refers to the task of identifying the user’s high-level goals (or intentions) by observing the user’s current activities, is a crucial capability for intelligent assistant systems that are intended to be incorporated into the user’s computing environment.
This chapter discusses how we use plan recognition to develop a software agent that can proactively assist human users in time-stressed environments. Human users dealing with multiple objectives in a complex environment (e.g., military planners or emergency response operators) are subject to a high level of cognitive load. When this load is excessive, it can severely impair the quality of the plans that are created. To help users focus on high-priority objectives, we develop a software assistant agent that can recognize the user's goals and plans in order to proactively assist with tedious and time-consuming tasks (e.g., anticipating information needs or reserving resources ahead of user need).
Plan-recognition algorithms generally center on a model that describes how a user behaves. Such a model can be built by collecting frequently observed sequences of user actions (e.g., as a plan library) [13,11,3]. By contrast, a generative approach can be taken to develop a user model to represent how a user makes decisions to solve a problem (e.g., as a planning process) [4,24,20,21]. Choosing the right approach requires an understanding of where the target problem’s uncertainty originates. Whereas a plan library is suitable for representing a user’s activities that may constitute multiple unrelated problems (i.e., uncertainty lies in a user’s objectives), a planning process can succinctly represent a complex decision-making process that may result in a large number of various plans (i.e., uncertainty lies in an environment).
In this chapter, we focus on a case where a user is solving a domain-specific problem that involves a high level of complexity and uncertainty (e.g., an emergency response system), where a flexible plan is made in advance but the actual course of action is dynamically determined during execution. Thus, our discussion here focuses on the generative approach, using a planner to represent a user model, and on how this model can be used in intelligent assistant systems.
The rest of this chapter is organized as follows. After discussing proactive assistant systems generally in Section 11.2, a generative plan-recognition algorithm is described in detail in Section 11.3, followed by a description of how the results of plan recognition are used within a proactive assistant architecture in Section 11.4. Section 11.5 presents two examples of fully implemented proactive assistant systems. Finally, the chapter is summarized in Section 11.6.
11.2 Proactive Assistant Agent
A software assistant system, like a human assistant, is expected to perform various tasks on behalf of a user. An assistant's role carries a set of desired qualifications, including the ability to learn a user's preferences [17,22], the ability to assess the current state and make rational decisions in various situations, and the ability to speculate on a user's future activities so that time-consuming actions can be taken proactively. Here, we focus on the assistant's ability to make proactive decisions, in which plan recognition plays a crucial part.
The core component of an intelligent assistant system is its decision-making module. For instance, an agent can make decisions according to a set of prescribed rules if complete information about its tasks is available a priori. An assistant's decision making can also be data-driven; that is, an action is executed whenever its preconditions are satisfied as new information is propagated (e.g., as with constraint-based planners). Alternatively, a decision-theoretic planner can be adopted; for example, the Electric Elves system uses a Markov decision process (MDP) to implement a personal assistant, known as Friday, that determines the optimal action in various states. For instance, given an invitation (to which a user is supposed to respond), Friday may wait a while for its user to respond or take an action on the user's behalf, according to the expected reward of each action.
To add the notion of plan recognition to an assistant's decision-making module, a partially observable MDP (POMDP) is generally used, where a user's goals (or intentions) are modeled as unobservable variables. In this approach, plan recognition is tightly coupled with the assistant's action selection. That is, the assistant learns an optimal action to take in response to each user state without having any notion of its own (agent) goals or planning. In other words, traditional (PO)MDP approaches model immediate assistant actions in response to individual user actions, even if they implicitly consider the reward of future user actions during action selection. This approach neither explicitly "looks ahead" within a user's plan nor considers time constraints. For these reasons, the types of support it can provide may be limited to atomic (single) actions (e.g., opening a door for a user) and may not be suitable for time-consuming actions, such as information prefetching, or for more complex jobs that require planning multiple actions.
By contrast, the proactive assistant agent architecture known here as Anytime Cognition (ANTICO) separates plan recognition from the assistant's decision-making module. Figure 11.1 illustrates an abstract view of the architecture. Here, the user plan represents the assistant's estimate of how the user makes decisions. Based on this user plan, plan recognition is performed to generate sequences of expected actions. The proactive manager evaluates the predicted user plan to identify potential assistance needs. In general, the purpose of this evaluation is to identify a set of unmet preconditions (or prerequisites) of predicted user actions, but the criteria for evaluating user plans are specific to each problem domain, for instance, identifying information needed to execute certain actions [19,15] or detecting potential norm violations. The set of identified assistance needs is labeled as new tasks for the assistant and is passed to the assistant's planning module.
FIGURE 11.1 An abstract view of a proactive assistant agent architecture.
The ANTICO architecture also supports the notion of an assistant's own goals and planning, similar to Friday's planner in Electric Elves. Whereas Friday's actions are triggered on receipt of a new request, ANTICO determines a set of assistive tasks according to its prediction of user needs.
Figure 11.1 includes a simple example, shown within the dotted lines. By evaluating the user plan, it is predicted with probability 0.9 that the user heading toward a particular area will need information about the red zone. The prediction also suggests that information about the blue zone may be needed, but with low probability, so that requirement has been pruned. The proactive manager assigns the assistant a new goal of acquiring the red zone information. Note that a deadline constraint is imposed on this assignment because the user will need the information by a certain timestep.
The assistant plans and schedules necessary resources to acquire the needed information. For instance, the assistant first selects an information source from which the needed information can be retrieved before the deadline. After the assistant retrieves the information from the selected source, a data postprocessing action can be taken to excerpt the information for a user to parse quickly. The information that has been prepared is passed back to the proactive manager to be presented to the user when needed.
Disengaging an assistant's planning from its user's planning has several advantages over approaches based on tight coupling. First, the size of the state space is exponentially reduced as follows. Let us define a user's planning space in terms of a set of variables, where a subset of those variables can be delegated to an assistant. The size of the state space grows exponentially in the number of variables. Let u and v denote the number of user variables and assistant variables, respectively. Without loss of generality, we add two simplifying assumptions: that user and agent variables are exclusive and that the domain size for all variables is a constant d. Then, the size of the state space where these variables are tightly coupled is d^(u+v), whereas that of the detached approach is d^u + d^v.
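To make the reduction concrete, a small numeric sketch (the function names are ours, and binary variable domains are assumed for the example):

```python
# Illustrative sketch of the state-space argument above: with u user
# variables, v assistant variables, and a constant domain size d, a
# tightly coupled model plans over d**(u + v) states, whereas the
# detached (ANTICO-style) approach plans over d**u + d**v states.
def coupled_size(d, u, v):
    return d ** (u + v)

def detached_size(d, u, v):
    return d ** u + d ** v

# e.g., 10 binary user variables and 5 binary assistant variables:
print(coupled_size(2, 10, 5))   # 32768
print(detached_size(2, 10, 5))  # 1056
```

Even in this tiny example the coupled space is more than thirty times larger, and the gap widens exponentially as variables are added.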
The ANTICO architecture has been flexibly applied to two types of information assistants [19,15] and to an assistant that supports humans in complying with organizational norms , which will be described further in Section 11.5.
11.3 Probabilistic Plan Recognition
This section describes a generative approach to plan recognition and discusses related work.
11.3.1 Plan Recognition as Planning
The idea of using artificial intelligence (AI) planning for plan recognition has been gaining attention in various fields including cognitive science, machine learning, and AI planning.
In cognitive science, Baker et al. used a set of Markov decision processes to model how a human observer makes predictions when observing other agents' activities. Their results show that the MDP framework resembles how humans make predictions in experiments in which human subjects were asked to recognize the goal of an animated agent.
The idea of plan recognition as planning is also closely related to the notion of inverse optimal control in MDP-based planners. Inverse optimal control, which refers to the task of recovering a cost function from observed optimal behavior, has been studied under various names including inverse reinforcement learning, apprenticeship learning, and imitation learning. These algorithms focus on learning hidden cost functions (as opposed to using predetermined cost functions) and have been specifically designed for the MDP framework.
A series of studies by Ramírez and Geffner contributes to bridging AI planning and goal recognition, establishing the notion of plan recognition as planning. Because their main objective is to identify a user's goals, it is more appropriate to refer to their work as "goal recognition." Their initial work used classical planners for goal recognition; there, goal prediction worked only when the observed actions precisely matched an expected sequence of actions. To overcome this drawback, they adopted a probabilistic model to address uncertainty. This framework has also been applied to the partially observable MDP framework.
In the following subsections, we describe a probabilistic plan-recognition algorithm presented in Oh et al. [20,21].
11.3.2 Representing a User Plan as an MDP
We use an MDP to represent a user's decision-making model. An MDP is a rich framework that can represent various real-life problems involving uncertainty.
The use of an MDP to represent a user plan is justified for the problem domain of our interest, in which users are strongly motivated to accomplish a set of clearly defined goals. Thus, we can assume that a user is executing a sequence of planned actions; that is, the user has planned the observed actions. For instance, in emergency-response situations, every major governmental organization has a set of emergency operations plans (EOP) that has been created in advance. The EOP provides a foundation for the creation of specific plans to respond to the actual details of a particular event.
To model the user’s planning process, we consider an AI planner so that we can generate a set of alternative plans by solving a user’s planning problem. At the same time, we need a model that can capture the nondeterministic nature of real-life applications. Since an MDP is a stochastic planner, it suits both of our purposes.
Throughout the chapter we use Definition 11.1 to refer to an MDP. We note that the discount factor, γ, in the definition is an optional component used to ensure that the Bellman equations converge in the infinite-horizon case. When the discount factor is not specified, it is assumed to be 1. Moreover, given the multiple equivalent ways to render the equations that solve MDPs, in this chapter we use the presentation style of Russell and Norvig (see Chapter 17) for clarity.
Definition 11.1 MDP A Markov decision process is represented as a tuple ⟨S, A, r, T, γ⟩, where S denotes a set of states; A, a set of actions; r(s, a), a function specifying the reward of taking action a in state s; T(s, a, s′), a state-transition function specifying the probability of reaching state s′ by taking action a in state s; and γ, a discount factor indicating that a reward received in the future is worth less than an immediate reward. Solving an MDP generally refers to searching for a policy π that maps each state to an optimal action with respect to the discounted long-term expected reward.
Without loss of generality, we assume that the reward function can be given such that each individual state s yields a reward r(s) when the agent reaches it. Although the MDP literature sometimes refers to a goal state as an absorbing or terminal state, that is, a state s with T(s, a, s) = 1 and T(s, a, s′) = 0 for all a in A and all s′ ≠ s in S (i.e., a state with no possibility of leaving), we mean by a goal state a state with a positive reward (i.e., any state s with r(s) > 0). Note that satisfying time constraints is imperative in the target problem domain; that is, actions must be taken in a timely manner (e.g., in an emergency-response case). Here, the discount factor γ is used to manage time constraints in a planner, specifying that a reward decays as a function of time.
Definition 11.2 Value Iteration Given an MDP, denoted by a tuple ⟨S, A, r, T, γ⟩, the value of a state s, denoted by V(s), can be defined as the discounted long-term expected reward when starting from state s and acting optimally thereafter, which is given by the Bellman equation:

V(s) = r(s) + γ max_{a∈A} Σ_{s′∈S} T(s, a, s′) V(s′).
The value iteration algorithm initializes the values of states with some value (e.g., an arbitrary constant) and iteratively updates the values of all states until they converge. The algorithm is guaranteed to converge when γ < 1. Value iteration computes a deterministic policy by selecting an optimal action in each state as follows:

π(s) = argmax_{a∈A} Σ_{s′∈S} T(s, a, s′) V(s′).
In addition to optimal actions, there can be "good" actions whose expected values come close to the optimum. It would be too naive for an assistant to assume that a human user will always choose the optimal action. In Definition 11.3, instead of computing a deterministic policy as in Definition 11.2, we compute a stochastic policy that ascribes to each action a a probability of being selected in state s according to the expected value of taking a. This policy expresses the probability with which an imperfect decision maker selects an action relative to the perfectly rational choice. The stochastic policy allows the assistant to prepare for the wider range of user actions that are likely to be chosen in reality. A similar idea of computing a stochastic policy from value iteration can be found in Ziebart et al.
Definition 11.3 Value Iteration for a Stochastic Policy Let a be an action and s, s′ be states; we define a stochastic policy π(s, a) denoting the probability of selecting action a in state s. This probability is computed in proportion to the expected value of selecting action a in state s, such that:

π(s, a) ∝ r(s) + γ Σ_{s′∈S} T(s, a, s′) V(s′).
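Definitions 11.2 and 11.3 can be sketched together in a few lines of NumPy. This is our illustration rather than the chapter's implementation; in particular, the shift-and-normalize step that turns expected values into probabilities is one simple choice among several.

```python
import numpy as np

def stochastic_policy(T, r, gamma=0.95, iters=200):
    """Value iteration (Definition 11.2) followed by the stochastic policy
    of Definition 11.3. T[s, a, s2] is the transition model and r[s] the
    state reward; names and normalization are illustrative."""
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Bellman backup: Q(s, a) = r(s) + gamma * sum_s' T(s, a, s') V(s')
        Q = r[:, None] + gamma * np.einsum('sat,t->sa', T, V)
        V = Q.max(axis=1)
    # Stochastic policy: each action's probability is proportional to its
    # (shifted) expected value; fall back to uniform when all are equal.
    Q = r[:, None] + gamma * np.einsum('sat,t->sa', T, V)
    w = Q - Q.min(axis=1, keepdims=True)
    sums = w.sum(axis=1, keepdims=True)
    pi = np.where(sums > 0, w / np.where(sums > 0, sums, 1.0), 1.0 / n_actions)
    return V, pi
```

On a toy two-state problem with one rewarded state, the policy rows sum to 1 and place more mass on the action leading toward the reward, as Definition 11.3 intends.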
Algorithm 11.1 predictFuturePlan
Let M = ⟨S, A, r, T, γ⟩ denote an MDP representing the user's planning problem. The plan-recognition algorithm shown in Algorithm 11.1 is a two-step process. The agent first estimates which goals the user is trying to accomplish and then predicts a sequence of possible plan steps that the user is most likely to take to achieve those goals.
11.3.3 Goal Recognition
In the first step, the algorithm estimates a probability distribution over a set of possible goals. We use a Bayesian approach that assigns a probability mass to each goal according to how well a series of observed user actions is matched with the optimal plan toward the goal.
We define the set of possible goal states G as all states with positive rewards, such that r(g) > 0 for every g in G. The algorithm initializes the probability distribution over the set of possible goals, denoted by p(g) for each goal g in G, proportionally to the reward, such that p(g) ∝ r(g) and Σ_{g∈G} p(g) = 1. The algorithm then computes an optimal policy, π_g, for every goal g in G, considering the positive reward only from the specified goal state g and a reward of 0 from all other states. We use the variation of the value iteration algorithm described in Definition 11.3 for computing an optimal policy.
For each potential goal g, the algorithm computes a goal-specific policy, π_g, to achieve g. Following the assumption that the user acts more or less rationally, this policy can be computed by solving the MDP to maximize the long-term expected reward. Instead of a deterministic policy that specifies only the best action resulting in the maximum reward, we compute a stochastic policy such that the probability of taking action a in state s when pursuing goal g is proportional to its long-term expected value:

π_g(s, a) = β (r(s) + γ Σ_{s′∈S} T(s, a, s′) V_g(s′)),

where β is a normalizing constant. Note that this step of computing optimal policies is performed only once and can be done offline, and the resulting policies are also used in the second step, as will be described in Section 11.3.4.
Let O_t = (s_1, a_1, …, s_t, a_t) denote a sequence of observed states and actions from timesteps 1 through t, where s_k ∈ S and a_k ∈ A. Here, the assistant agent must estimate the user's targeted goals.
After observing a sequence of user states and actions, the assistant agent updates the conditional probability, p(g | O_t), that the user is pursuing goal g given the sequence of observations O_t. The conditional probability can be rewritten using Bayes' rule as:

p(g | O_t) ∝ p(O_t | g) p(g). (11.1)
By applying the chain rule, we can write the conditional probability of observing the sequence of states and actions given goal g as:

p(O_t | g) = Π_{k=1}^{t} p(s_k | s_1, a_1, …, s_{k−1}, a_{k−1}, g) p(a_k | s_1, a_1, …, s_k, g).
By the MDP problem definition, the state-transition probability is independent of the goals. With the Markov assumption, the state-transition probability is also independent of any past states except the current state, and the user's action selection depends only on the current state and the specific goal. Using these conditional independence relationships, we get:

p(O_t | g) = Π_{k=1}^{t} T(s_{k−1}, a_{k−1}, s_k) π_g(s_k, a_k), (11.2)

where the probability π_g(s_k, a_k) represents the user's stochastic policy of selecting action a_k in state s_k given goal g, which was computed during the initialization step.
By combining Eqs. 11.1 and 11.2, the conditional probability of a goal given a series of observations can be obtained. We use this conditional probability to assign weights when constructing a predicted plan-tree in the next step.
The algorithmic complexity of solving an MDP using value iteration is quadratic in the number of states and linear in the number of actions. Here, the optimal policies for candidate goals can be precomputed offline. Thus, the actual runtime complexity of our goal-recognition algorithm is linear in the number of candidate goals and the number of observations.
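The Bayesian update of Eqs. 11.1 and 11.2 can be sketched as follows. All names are illustrative, and log-space arithmetic is used for numerical stability; since the transition term is goal-independent, it cancels in the normalization, but it is kept here for clarity.

```python
import numpy as np

def goal_posterior(prior, policies, T, observations):
    """Goal recognition sketch (Section 11.3.3). prior[g] is p(g);
    policies[g][s, a] is the precomputed stochastic policy pi_g;
    T[s, a, s2] is the shared transition model; observations is a
    list of (state, action) pairs in temporal order."""
    log_post = np.log(prior)
    prev = None
    for (s, a) in observations:
        for g in range(len(prior)):
            if prev is not None:
                ps, pa = prev
                # Goal-independent transition term T(s_{k-1}, a_{k-1}, s_k).
                log_post[g] += np.log(T[ps, pa, s] + 1e-12)
            # Goal-specific policy term pi_g(s_k, a_k).
            log_post[g] += np.log(policies[g][s, a] + 1e-12)
        prev = (s, a)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```

With two candidate goals whose policies favor different actions, observing the actions preferred under one goal shifts the posterior mass toward that goal.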
11.3.4 Plan Prediction
Based on the predicted goals from the first step, we now generate a set of possible scenarios that the user will follow. Recall that we solved the user’s MDP to get stochastic policies for each potential goal. The intuition for using a stochastic policy is to allow the agent to explore multiple likely plan paths in parallel, relaxing the assumption that the user always acts to maximize her or his expected reward.
Algorithm 11.2 buildPlanTree
Using the MDP model and the set of stochastic policies, we sample a tree of the most likely sequences of user actions and resulting states from the user’s current state, known here as a plan-tree. In a predicted plan-tree, a node contains the resulting state from taking a predicted user action, associated with the following two features: priority and deadline. We compute the priority of a node from the probability representing the agent’s belief that the user will select the action in the future; that is, the agent assigns higher priorities to assist those actions that are more likely to be taken by the user. On the other hand, the deadline indicates the predicted timestep when the user will execute the action; that is, the agent must prepare assistance before a certain point in time by which the user will need help.
The algorithm builds a plan-tree by traversing the actions that, according to the policy generated from the MDP user model, the user is most likely to select from the current state. First, the algorithm creates a root node with probability 1 and no action attached. Then, according to the MDP policy, likely actions are sampled, such that the algorithm assigns higher priorities to those actions that lead to a better state with respect to the user's planning objective. Note that the algorithm adds a new node for an action only if the agent's belief that the user will select the action exceeds some threshold θ; otherwise, the action is pruned. Note also that the assistant could prepare for all possible outcomes if the problem space were manageably small; however, resources (e.g., time, CPU, and network bandwidth) are generally constrained, and it is thus necessary to prioritize assistive tasks according to predicted needs.
The recursive process of predicting and constructing a plan-tree from a state is described in Algorithm 11.2. The algorithmic complexity of plan generation is linear in the number of actions. The resulting plan-tree represents a horizon of sampled actions and their resulting states for which the agent can prepare appropriate assistance.
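A minimal dict-based sketch of the plan-tree construction in the spirit of Algorithm 11.2, assuming the policy has already been weighted by the goal posterior; the names and data layout are ours:

```python
def build_plan_tree(state, policy, T, depth, threshold, prob=1.0):
    """Recursively expand the most likely user actions from `state`,
    pruning branches whose cumulative probability falls below
    `threshold`. policy[s][a] is the (goal-weighted) stochastic policy
    and T[s][a][s2] the transition model, both as nested dicts."""
    node = {"state": state, "prob": prob, "children": []}
    if depth == 0:
        return node
    for a, p_a in policy[state].items():
        for s2, p_s2 in T[state][a].items():
            p_child = prob * p_a * p_s2
            if p_child >= threshold:  # prune unlikely branches
                child = build_plan_tree(s2, policy, T,
                                        depth - 1, threshold, p_child)
                child["action"] = a
                node["children"].append(child)
    return node
```

Each node's `prob` plays the role of the priority feature described above (the agent's belief that the user will reach that step), while the node's depth stands in for the deadline.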
11.4 Plan Recognition within a Proactive Assistant System
This section describes how the predicted plan-tree from Section 11.3 fits inside the ANTICO architecture shown earlier in Figure 11.1. ANTICO is a scalable model where the assistant agent dynamically plans and executes a series of actions to manage a set of current tasks as they arise.
11.4.1 Evaluating Predicted User Plan
After a user plan is predicted through a process of plan recognition, the proactive manager evaluates each node in a predicted plan-tree according to domain-specific criteria. For example, if a user is solving a maze game that requires a password to enter a certain room, the passwords along the predicted user path are identified as unmet requirements. A user plan can also be evaluated against a set of regulatory rules such as social norms. In this case, any potential norm violation in the predicted user plan gives rise to a need for assistance.
The evaluation of a user plan results in a set of new tasks for the assistant (e.g., acquiring necessary information or resolving norm violations) to restore normative states. Since the evaluation of the user plan is not the focus of this chapter, we refer readers to the related work of Oh et al. for further details.
11.4.2 Assistive Planning
In ANTICO, the assistant is essentially a planning agent that can plan its actions to accomplish a specified goal. The proactive manager formulates an assistive task in terms of the assistant’s initial state and its goal state.
The architecture is not bound to any specific type of planner (e.g., a classical planner may be used). Recall that a predicted user plan from the plan recognizer imposes deadline constraints (specified as the node depth) on the agent's planning. An MDP is a preferred choice not only because it is consistent with the user-plan model but also because the discount factor can be used to implement ad hoc deadline constraints. A deadline constraint is used to determine the horizon for the MDP plan solver, such that the agent planner can complete the task by the deadline. For more principled time-constraint management, integrated planning and resource scheduling can be considered.
The planning problem formulated by the proactive manager may not always be solvable; for instance, the goal state may only be accomplished by modifying variables that the assistant cannot access, or none of the assistant's actions may have effects that can lead to the specified goal state. In such cases, the assistant notifies the user immediately so that she can take appropriate action on her own. Otherwise, the assistant starts executing its actions according to the optimal policy until it reaches a goal state.
11.4.3 Cognitively Aligned Plan Execution
Execution of an agent action may change one or more variables. For each newly generated plan (or policy) from the planner module, an executor is created as a new thread. An executor waits for a signal from the variable observer that monitors changes in the environment variables to determine the agent’s current state. When a new state is observed, the variable observer notifies the plan executor to wake up. The plan executor then selects an optimal action in the current state according to the policy and executes the action. After taking an action, the plan executor is required to wait for a new signal from the variable observer. If the observed state is an absorbing state, then plan execution is terminated; otherwise, an optimal action is executed from the new state.
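The executor loop described above might be sketched as follows; a thread-safe queue stands in for the signaling between the variable observer and the plan executor, and all class and member names are illustrative:

```python
import queue
import threading

class PlanExecutor(threading.Thread):
    """Sketch of the plan executor in Section 11.4.3: block until the
    variable observer reports a new state, execute the policy's optimal
    action for that state, and repeat until an absorbing state is seen."""
    def __init__(self, policy, absorbing, execute):
        super().__init__(daemon=True)
        self.policy = policy          # maps state -> optimal action
        self.absorbing = absorbing    # set of absorbing (terminal) states
        self.execute = execute        # callback that carries out an action
        self.signals = queue.Queue()  # states pushed by the variable observer
        self.done = False

    def notify(self, state):
        # Called by the variable observer when the environment changes.
        self.signals.put(state)

    def run(self):
        while True:
            state = self.signals.get()  # wait for the observer's signal
            if state in self.absorbing:
                self.done = True        # plan execution terminates
                return
            self.execute(self.policy[state])
```

Replanning and timeouts (described next) would abort such a thread and spawn a fresh executor for the new policy.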
The agent’s plan can be updated during execution as more recent assessments of rewards arrive from the proactive manager, forcing the assistant to replan. Any plans that are inconsistent with the current assessment are aborted.
To handle unexpected exceptions during execution, an executable action has a timeout, such that when the execution of an action reaches its timeout the plan is aborted. When a plan is aborted, the specific goals of the plan are typically unmet. If the goals are still relevant to the user’s current plan (according to a newly predicted user plan), then the proactive manager will generate them as new goals for the agent.
11.5 Applications
In this section, we study two examples of ANTICO implemented in practical problem domains, summarizing work presented in Oh et al. and Meneguzzi et al. [21,19,15].
11.5.1 Norm Assistance
In certain scenarios, human decision making is affected by policies, often represented by deontic concepts such as permissions, obligations, and prohibitions. Individual rules within a policy have been actively studied in the area of normative reasoning. Norms generally define constraints that should be followed by the members of a society at particular points in time to ensure certain systemwide properties. These constraints are generally specified by guarded logic rules of the form χ → ν, which indicate that when the condition χ occurs, a norm ν becomes activated, imposing a restriction on the set of desirable states in the domain. If ν is an obligation, the norm defines a set of states through which an agent must pass; otherwise, if ν is a prohibition, the norm defines a set of states that must be avoided.
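As a toy illustration of checking a predicted state sequence against such guarded norms (the simplified obligation/prohibition semantics here is our assumption, not the chapter's formalism):

```python
def violated_norms(trajectory, norms):
    """Check predicted states against guarded norms. Each norm is a
    (condition, kind, target) triple: once `condition` holds in some
    state, an obligation requires `target` to hold in at least one
    subsequent state, and a prohibition requires `target` to hold in
    none of them. Returns the indices of violated norms."""
    violations = []
    for i, (cond, kind, target) in enumerate(norms):
        # Find the first state where the guard activates the norm.
        active_from = next((k for k, s in enumerate(trajectory) if cond(s)), None)
        if active_from is None:
            continue  # norm never activated
        rest = trajectory[active_from:]
        if kind == "obligation" and not any(target(s) for s in rest):
            violations.append(i)
        elif kind == "prohibition" and any(target(s) for s in rest):
            violations.append(i)
    return violations
```

In ANTICO, the indices returned by such a check would become goals for the agent planner, which then searches for nearby norm-compliant states to recommend.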
For example, in international peacekeeping operations, military planners must achieve their own unit’s objectives while following standing policies that regulate how interaction and collaboration with non-governmental organizations (NGOs) ought to take place. Because the planners are cognitively overloaded with mission-specific objectives, such normative stipulations increase the complexity of planning to both accomplish goals and abide by the norms.
Although much of the research on normative reasoning focuses on deterministic environments populated by predictable agent decision making, such a model is not suitable for reasoning about human agents acting in the real world. By leveraging recent work on normative reasoning over MDPs, it is possible to reason about norm compliance in nondeterministic environments; however, the issue of nondeterminism in the decision maker has remained problematic. To address this problem, an instantiation of the proactive assistance architecture was created to provide prognostic reasoning support, with the proactive manager designed to analyze user plans for normative violations in the context of military escort requests for relief operations. An overview of this architecture is provided in Figure 11.2a, and a screenshot of the assistance application is shown in Figure 11.2b.
FIGURE 11.2 Norm assistance agent overview. (a) Agent architecture; (b) Application screenshot.
The normative assistant relies on a probabilistic plan recognizer to generate a tree of possible plan steps. The proactive manager evaluates a user plan through a norm reasoner, which analyzes the sequence of states induced by the predicted plan for norm violations. These predicted violations become the targets of the agent planner, which tries to find the nearest norm-compliant states in order to recommend user actions that will ensure norm-compliant behavior. If compliant states are not achievable, for example, because some violations are unavoidable in the user's possible future state, or if the user has already violated certain norms, the agent can also suggest remedial actions to either comply with penalties or mitigate the effects of a violation (i.e., contrary-to-duty obligations).
11.5.2 Emergency Response
ANTICO was applied in an emergency-response assistant aimed at proactively supporting a manager responsible for responding to emergencies in civilian areas, including natural disasters or attacks on infrastructure. The architecture customized for information assistance is shown in Figure 11.3a, and a screenshot of this application in emergency response is shown in Figure 11.3b. Given this application scenario, the adaptations deployed in ANTICO have two purposes. First, given the fluidity of an emergency situation, plan recognition should focus on providing assistance only for events in the near future with a high degree of confidence about the user's current activity. Second, given the structure of the emergency-response plans, assistance should follow the steps of these plans as closely as possible.
FIGURE 11.3 ANTICO overview. (a) Agent architecture; (b) Application screenshot.
The performance of proactive assistants depends on the accuracy of the plan-recognition algorithm, combined with seamless integration of the assistance provided into the user's regular workflow. At one extreme of the performance spectrum, a perfect assistance agent always recognizes user behavior correctly and provides assistance exactly when the user expects it; at the other extreme, the agent's predictions are never correct and the assistance provided is intrusive, causing the user to waste time dealing with agent interventions. To assess the impact of prediction correctness on user performance, we carried out a study analyzing the effect of various levels of plan-recognition accuracy on a simulated user's response time in an emergency scenario. This study showed that, for the selected emergency domain, with unobtrusive assistance, even moderate plan-recognition accuracy can yield gains in user performance. More specifically, the agent significantly reduces completion time as long as its prediction accuracy is above 0.2.
Figure 11.4 shows that the time saved by agent assistance grows as the user's task becomes more challenging, that is, as the task requires more time to finish unaided. For the most challenging category of the problem set, agent assistance reduced job completion time by more than 50%.
FIGURE 11.4 The amount of time saved with proactive agent assistance for problems of varying difficulty. Note: Problem difficulty is measured in terms of unaided completion time (x-axis).
11.6 Conclusion
This chapter discussed probabilistic plan recognition suitable for proactive assistant agents. We focused on a class of problems in which a user faces the highly complex task of planning, executing, and replanning in a time-stressed environment. For example, an emergency-response operator must react to high-risk, rare events by instantiating a specific plan from a set of predefined emergency operations plans (EOPs) to deal with the uncertainty of the particular event. Because the user's main activity is itself a planning process, this setting motivates a generative approach to probabilistic plan recognition, an approach that has been gaining attention in both the cognitive science and AI communities. We described a generative plan-recognition algorithm that predicts a user's future plan steps with probabilities, so that the assistant's own plan can be optimized with respect to expected user benefit. We developed an assistant agent framework around this plan-recognition algorithm and demonstrated two applications implemented in practical problem domains.
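The core of the generative approach can be sketched in a few lines: candidate goals are modeled by the plans a rational agent would generate to achieve them, and observed actions update a Bayesian posterior over goals, from which future plan steps are predicted. This is an illustrative sketch under simplifying assumptions (deterministic plans, a fixed execution-noise rate, invented goals and step names), not the chapter's full algorithm:

```python
# Hedged sketch of generative plan recognition: each candidate goal is
# represented by the plan the user would follow to reach it; observations
# update a posterior over goals via Bayes' rule. Goals, plans, and the
# noise rate are hypothetical.

PLANS = {  # goal -> plan steps the user would execute for that goal
    "evacuate": ["assess", "alert", "route", "transport"],
    "contain":  ["assess", "alert", "cordon", "monitor"],
}
NOISE = 0.1  # probability an observed action deviates from the plan

def goal_posterior(observations, plans=PLANS, prior=None):
    prior = prior or {g: 1.0 / len(plans) for g in plans}
    post = {}
    for goal, plan in plans.items():
        likelihood = prior[goal]
        for t, obs in enumerate(observations):
            match = t < len(plan) and plan[t] == obs
            likelihood *= (1 - NOISE) if match else NOISE
        post[goal] = likelihood
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

def predict_next_step(observations, plans=PLANS):
    """Probability distribution over the user's next plan step."""
    post = goal_posterior(observations, plans)
    t = len(observations)
    pred = {}
    for goal, plan in plans.items():
        if t < len(plan):
            pred[plan[t]] = pred.get(plan[t], 0.0) + post[goal]
    return pred

# After observing the shared prefix plus "route", the posterior strongly
# favors "evacuate", so "transport" dominates the next-step prediction.
print(predict_next_step(["assess", "alert", "route"]))
```

The assistant can then plan its own actions against this next-step distribution, weighting each possible form of assistance by the probability that the user will actually need it.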
1. Abbeel P, Ng AY. Apprenticeship learning via inverse reinforcement learning. Proceedings of the 21st International Conference on Machine Learning. 2004.
2. Ambite JL, Barish G, Knoblock CA, Muslea M, Oh J, Minton S. Getting from here to there: interactive planning and agent execution for optimizing travel. Proceedings of the National Conference on Artificial Intelligence. 2002:862-869.
3. Armentano MG, Amandi A. Plan recognition for interface agents. Artif Intell Rev. 2007;28(2):131-162.
4. Baker CL, Saxe R, Tenenbaum JB. Action understanding as inverse planning. Cognition. 2009;31:329-349.
5. Barish G, Knoblock CA, et al. Speculative execution for information gathering plans. Proceedings of the 6th International Conference on AI Planning and Scheduling. 2002:184-193.
6. Bellman R. A Markovian decision process. Indiana Univ Math J. 1957;6:679-684.
7. Card AJ, Harrison H, Ward J, Clarkson PJ. Using prospective hazard analysis to assess an active shooter emergency operations plan. J Healthcare Risk Management. 2012;31(3):34-40.
8. Chalupsky H, Gil Y, Knoblock CA, Lerman K, Oh J, Pynadath DV, et al. Electric elves: applying agent technology to support human organizations. Proceedings of the 13th Conference on Innovative Applications of Artificial Intelligence. 2001:51-58.
9. Fagundes MS, Billhardt H, Ossowski S. Reasoning about norm compliance with rational agents. In: Helder Coelho, Rudi Studer, Michael Wooldridge, eds. Proceedings of the 19th European Conference on Artificial Intelligence. Frontiers in artificial intelligence and applications. vol. 215. 2010:1027-1028.
10. Fern A, Natarajan S, Judah K, Tadepalli P. A decision-theoretic model of assistance. In: Veloso Manuela M, ed. Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007:1879-1884.
11. Geib CW, Steedman M. On natural language processing and plan recognition. Proceedings of the 20th International Joint Conference on Artificial Intelligence. 2007:1612-1617.
12. Jones AJI, Sergot M. On the characterisation of law and computer systems: the normative systems perspective. In: Deontic logic in computer science: normative system specification. Wiley; 1993:275-307.
13. Kautz H, Allen JF. Generalized plan recognition. Proceedings of the 5th National Conference on Artificial Intelligence. 1986:32-37.
14. Meneguzzi F, Luck M. Norm-based behaviour modification in BDI agents. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems. 2009:177-184.
15. Meneguzzi F, Mehrotra S, Tittle J, Oh J, Chakraborty N, Sycara K. A cognitive architecture for emergency response. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. 2012:1161-1162.
16. Meneguzzi F, Oh J, Chakraborty N, Sycara K, Mehrotra S, Lewis M. Anytime cognition: an information agent for emergency response. Proceedings of the 5th Annual Conference of the International Technology Alliance. 2011.
17. Mitchell TM, Caruana R, Freitag D, McDermott J, Zabowski D, et al. Experience with a learning personal assistant. Comm ACM. 1994;37(7):80-91.
18. Ng AY, Russell S. Algorithms for inverse reinforcement learning. Proceedings of the 17th International Conference on Machine Learning. 2000:663-670.
19. Oh J, Meneguzzi F, Sycara K. Probabilistic plan recognition for intelligent information agents: towards proactive software assistant agents. Proceedings of the 3rd International Conference on Agents and Artificial Intelligence. 2011:281-287.
20. Oh J, Meneguzzi F, Sycara KP. ANTIPA: an architecture for intelligent information assistance. Proceedings of the 19th European Conference on Artificial Intelligence. 2010:1055-1056.
21. Oh J, Meneguzzi F, Sycara KP, Norman TJ. An agent architecture for prognostic reasoning assistance. Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011:2513-2518.
22. Oh J, Smith S. Learning user preferences in distributed calendar scheduling. Prac Theory Automated Timetabling V. 2005:3-16.
23. Prakken H, Sergot MJ. Contrary-to-duty obligations. Studia Logica. 1996;57(1):91-115.
24. Ramírez M, Geffner H. Probabilistic plan recognition using off-the-shelf classical planners. Proceedings of the 24th AAAI Conference on Artificial Intelligence. 2010.
25. Ramírez M, Geffner H. Goal recognition over POMDPs: inferring the intention of a POMDP agent. Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011.
26. Ratliff N, Ziebart B, Peterson K, et al. Inverse optimal heuristic control for imitation learning. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. 2009.
27. Russell SJ, Norvig P. Artificial intelligence: a modern approach. 3rd ed. Pearson Education; 2010.
28. Sycara K, Norman TJ, Giampapa JA, Kollingbaum MJ, Burnett C, Masato D, et al. Agent support for policy-driven collaborative mission planning. Computer J. 2010;53(5):528-540.
29. Ziebart BD, Maas A, Bagnell JA, Dey AK. Maximum entropy inverse reinforcement learning. Proceedings of the 23rd AAAI Conference on Artificial Intelligence. 2008:1433-1438.
1 In earlier work, an instance of ANTICO is referred to as ANTicipatory Information and Planning Agent (ANTIPA).
2 It is trivial to see that .