Metrics for Evaluating Generalized Plans
The computation and analysis of representations that aid generalizability and transfer in planning is a rich area of research with multiple approaches. For instance, generalized plans are program-like structures that solve a set of related problems:
while (∃b: clear(b) ∧ ¬onTable(b)) {
mvToTable(b)
}
This plan for unstacking blocks solves all blocks-world problems where the goal
is to have all blocks on the table. During instantiation, this program-like data
structure needs to be converted into a sequence of concrete
mvToTable(_)
actions with object names as arguments.
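For concreteness, here is a minimal Python sketch of this instantiation step, under an illustrative state encoding (sets of clear and on-table blocks, plus a map from each stacked block to the block beneath it); the function and variable names are assumptions, not part of any standard planner.

# A minimal sketch of instantiating the unstacking plan above for one
# concrete blocks-world state. The state encoding is an illustrative
# assumption: clear and on_table are sets of block names, and below maps
# each stacked block to the block directly under it.
def instantiate_unstack_plan(clear, on_table, below):
    plan = []
    # while (∃b: clear(b) ∧ ¬onTable(b)) { mvToTable(b) }
    while True:
        candidates = [b for b in clear if b not in on_table]
        if not candidates:
            break
        b = candidates[0]
        plan.append(("mvToTable", b))
        # Applying the action: b is now on the table, and the block that
        # was under b becomes clear.
        on_table.add(b)
        clear.add(below.pop(b))
    return plan

# Example: tower C on B on A, with A on the table.
print(instantiate_unstack_plan(clear={"C"}, on_table={"A"},
                               below={"C": "B", "B": "A"}))
# [('mvToTable', 'C'), ('mvToTable', 'B')]

Each iteration commits to one block and never backtracks, so instantiation runs in time linear in the number of blocks.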
However, not all such structures are useful. E.g., the following generalized plan “solves” all object arrangement problems (including arranging your desk!):
while (∃a,b: clear(a) ∧ clear(b)) {
pickup(a)
placeOn(a,b)
}
However, it is practically useless because it doesn’t specify which objects a and b to use in each iteration, and figuring that out amounts to solving the original problem from scratch.
In order to learn or compute more useful representations that aid generalization and transfer in planning, we need to consider a more nuanced set of metrics than just the number of problems that could possibly be solved. To aid readability in discussing these metrics we will use the term “generalizable representation” for planning to refer to a representation that is learned or computed for the purpose of aiding planning across a range of problem instances, e.g., generalized plans, hierarchical models, generalized heuristics or non-deterministic finite-state machines. These metrics were first presented by Srivastava, Immerman, and Zilberstein in an AIJ article (2011).
Domain Coverage
This metric captures the most obvious desired property: the fraction of problem instances of interest where the generalizable representation can be used. Solving broad classes of problems is the most commonly addressed metric among approaches for generalization in planning. However, an exclusive focus on domain coverage can result in “solutions” like the second example above, which applies to a wide range of problems but doesn’t help much.
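As a rough illustration, domain coverage can be estimated empirically by sampling problem instances. The sketch below assumes hypothetical helpers sample_instance() and applies(rep, instance); it only measures how often the representation can be used, and says nothing about how useful it is once applied.

# Sketch: estimate domain coverage as the fraction of sampled instances
# on which the representation can be used. sample_instance and applies
# are hypothetical, problem-specific callables.
def estimated_coverage(rep, sample_instance, applies, n=1000):
    usable = sum(1 for _ in range(n) if applies(rep, sample_instance()))
    return usable / n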
Complexity of Plan Instantiation
This metric represents the computational cost of using a generalizable representation to compute a solution for a specific problem instance. This distinguishes the first example above, which has a linear-time cost of instantiation, from the second example, whose cost of instantiation is the computational complexity of the original planning problem. We often see a trade-off between domain coverage and the complexity of plan instantiation: generalizable structures that have high domain coverage (e.g., the second example above) suffer from a high complexity of plan instantiation.
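To see why instantiating the second plan is as hard as planning itself, consider the following sketch: choosing bindings for a and b at each iteration without further guidance degenerates into backtracking search over action sequences. The helpers goal_reached, legal_pairs, and apply_place are hypothetical stand-ins for a problem instance's goal test and transition model.

# Sketch: instantiating the arrangement plan amounts to searching over
# object bindings. goal_reached, legal_pairs, and apply_place are
# hypothetical callables describing the problem instance.
def instantiate_arrangement_plan(state, goal_reached, legal_pairs, apply_place,
                                 depth=0, limit=20):
    if goal_reached(state):
        return []
    if depth == limit:
        return None
    for a, b in legal_pairs(state):  # every clear pair is a candidate binding
        successor = apply_place(state, a, b)  # effect of pickup(a); placeOn(a,b)
        rest = instantiate_arrangement_plan(successor, goal_reached, legal_pairs,
                                            apply_place, depth + 1, limit)
        if rest is not None:
            return [("pickup", a), ("placeOn", a, b)] + rest
    return None  # exhaustive backtracking: exponential in the search depth

In the worst case this explores every binding sequence up to the depth limit, which is exactly the search the generalized plan was supposed to avoid.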
Complexity of Checking Applicability
This metric represents the computational cost of determining whether a generalizable representation can be used to aid planning for a given problem instance. Generalizable representations can be designed to be used in one of two ways when given an input problem instance: (1) conduct a pre-designed applicability test to determine if an instantiation will be possible, and if so, proceed to find it; or (2) directly attempt an instantiation. The problem with the second approach is that instantiation can be an expensive and wasteful operation if the generalized plan cannot actually solve the given problem instance. While the first approach is desirable, it is often very difficult to construct an applicability test; the ideal situation would be to have a linear-time applicability test.
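For the unstacking plan above, such a test is easy to state: given a blocks-world instance, the plan applies whenever every goal atom only asks for some block to be on the table, which can be checked in a single pass over the goal. The sketch below assumes goals are encoded as (predicate, arguments) tuples, an illustrative rather than standard encoding.

# Sketch of a linear-time applicability test for the unstacking plan,
# assuming goal atoms are encoded as (predicate, *arguments) tuples.
def unstack_plan_applies(goal_atoms):
    return all(pred == "onTable" for pred, *_ in goal_atoms)

print(unstack_plan_applies([("onTable", "A"), ("onTable", "B")]))  # True
print(unstack_plan_applies([("on", "A", "B")]))                    # False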
Complexity of Computing a Generalizable Representation
This metric captures the computational cost of computing the generalizable representation itself. It can be parameterized in terms of the sample complexity of learning or the computational complexity of the algorithm that computes the generalized plan. Representations play a significant role in reducing this cost. For instance, if an explicit policy representation is used, just writing down the policy takes \(O(S)\) time, where \(S\) is the size of the state space. On the other hand, a generalized plan for a set of related tasks can have a bounded representation independent of the number of objects in the state space.
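A back-of-the-envelope comparison makes the gap concrete: with \(n\) blocks there are at least \(n!\) distinct states (counting only the single-tower configurations), so an explicit policy needs at least that many entries, while the text of the generalized plan above does not grow with \(n\). The sketch below just prints this lower bound next to the constant size of the plan text.

# Sketch: lower bound on explicit-policy size vs. the constant size of the
# generalized plan's text. Counting only single-tower configurations already
# gives n! states for n blocks.
from math import factorial

generalized_plan = "while (∃b: clear(b) ∧ ¬onTable(b)) { mvToTable(b) }"

for n in (5, 10, 15):
    print(n, factorial(n), len(generalized_plan))
# n! grows super-exponentially with n; len(generalized_plan) stays constant.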
Quality of the Instantiated Plan
This metric evaluates the quality of solutions computed using the generalizable representation. Ideally, instantiated plans and policies should be optimal according to a user-desired measure such as expected cost or probability of success. However, there are typically trade-offs between the complexity of plan instantiation, domain coverage, and the quality of instantiated plans. Structures that enable the computation of optimal plans are often specific to narrow classes of problems and thus have low domain coverage. On the other hand, algorithmic processes that use generalizable representations to compute optimal solutions can result in a high complexity of plan instantiation.