Dynamic programming approaches for the traveling salesman problem with drone

A promising new delivery model involves the use of a delivery truck that collaborates with a drone to make deliveries. Effectively combining a truck and a drone gives rise to a new planning problem that is known as the traveling salesman problem with drone (TSP‐D). This paper presents exact solution approaches for the TSP‐D based on dynamic programming and provides an experimental comparison of these approaches. Our numerical experiments show that our approach can solve larger problems than the mathematical programming approaches that have been presented in the literature thus far. Moreover, we show that restrictions on the number of locations the truck can visit while the drone is away can help significantly reduce the solution times while having relatively little impact on the overall solution quality.


INTRODUCTION
Several Internet retailers and logistics service providers including Amazon, Singapore post and DHL are experimenting with the use of drones to support the delivery of parcels and mail. One promising new operating model is the use of a regular delivery truck that collaborates with a drone to make deliveries. This way, the high capacity and long range of the truck can be combined with the speed and flexibility of a drone.
Together with the University of Cincinnati, Amp Electric Vehicles supports this new operating model by developing a drone that deploys from a compartment in the roof of an electric delivery truck. A new Mercedes Benz concept vehicle called "Vision Van" also includes roof-top drones [5] and UPS recently tested launching a drone from the roof of a delivery van [21].
Effectively combining a drone and a truck gives rise to a new planning problem that we call the traveling salesman problem with drone (TSP-D). The problem aims to find a route for both the delivery vehicle and the drone that minimizes the total joint time to serve all delivery tasks. Due to its limited capacity, the drone has to return to the vehicle to pick up the parcel before each new delivery. For this reason, the route of the drone needs to be synchronized with that of the truck. Figure 1 shows an example solution of the TSP-D.
The problem has only recently started to receive some attention from the transportation optimization community. Wang et al. [22] and Poikonen et al. [18] analyze the theoretical benefits of using one or more drones together with one or more delivery vehicles to make deliveries. Agatz et al. [1] experimentally analyze the time savings of using truck and drone instead of using truck only. Carlsson and Song [4] use continuous approximation to derive analytical formulas to estimate the expected delivery costs in this setting. Table 1 provides an overview of the different solution approaches that have been proposed in the literature for several variants of the problem. We see that up to now, most research has primarily focused on heuristic approaches. Exact solution approaches based on mathematical programming so far are only capable of solving small instances of the problem, Exact methods Integer linear programming [1] Dynamic programming (this paper) Approximation algorithms Improvements starting from TSP-tour [1] Heuristic methods Simple heuristics [7,16] Genetic algorithms [15] Simulated annealing [19] that is, up to 10 locations [1]. In this contribution, we propose an exact approach based on dynamic programming that is able to solve larger instances. For the classic traveling salesman problem (TSP), dynamic programming approaches were first proposed in Held and Karp [10] and Bellman [3]. For the general TSP without additional assumptions, this is the exact algorithm with the best known worst-case running time to this day [2]. Dynamic programming approaches have been proposed for various variants of the TSP, including the single vehicle dial-a-ride problem [20], the time-dependent TSP [12] and the TSP with time window and precedence constraints [14]. Dynamic programming approaches also have been used as the basis for heuristics for various TSP and VRP problems [11,13].
In this paper, we introduce a 3-pass dynamic programming approach to solve the TSP-D problem and extend the last pass of this approach to an A* algorithm. Furthermore, we apply these exact algorithms to the problem where we restrict the number of locations the truck may visit while separated from the drone. These restrictions result in shorter computation times, but come at the cost of potentially removing the optimal solution from the solution space. We evaluate the different approaches in an extensive computational study comparing run times and solution quality.
The remainder of this paper is organized as follows. In Section 2, we formally define the TSP-D. In Section 3, we develop a dynamic programming algorithm for the TSP-D and also present two ways to speed up the algorithm. Section 4 presents the results of an extensive numerical study. Finally, in Section 5, we offer some final remarks and directions for future research.

PROBLEM DESCRIPTION
The TSP-D can be modeled as a set V of n locations, which include a depot v 0 and n − 1 customer locations. Furthermore, two cost functions c, c d : V 2 → R model distances or travel times between the locations. We generally assume that c(v, w) stands for the driving time of the truck driving from v to w, and c d (v, w) stands for the time it takes the drone to fly from v to w, but other types of nonnegative cost functions can be considered as well.
The objective of the TSP-D is to find the minimum cost tour which serves all customer locations by either the truck or the drone. We assume that the drone has unit-capacity and has to pick up a new parcel at the truck after each delivery. Moreover, the pickup of parcels from the truck can only take place at the customer locations, that is, the drone can only land on and depart from the truck while it is located at a node. This assumption is common in the literature, and can be justified by the fact that (1) launching and receiving drones on a moving truck is technologically still very challenging [17] and (2) parking at intermediate points on the route may not always be possible.
We define a constant , that relates the speed of the drone and the speed of the truck to each other such that the drone can be at most times faster than the truck for a pair of locations. Formally, we say that is the minimum value for which c d (v,

FIGURE 2
An operation with a combined node serving as start node (1), a combined node serving as end node (4), a drone node (3), and a truck node (2) w) ≥ c(v, w) for every pair of locations v, w ∈ V. The special case where c d (v, w) = c(v, w) for any pair of locations can be interpreted as a situation where the drone and truck travel via the same network, but the drone is a factor faster than the truck. For a given tour, we distinguish between the following types of nodes: Drone node: a node that is visited by the drone separated from the truck Truck node: a node that is visited by the truck separated from the drone Combined node: a node that is visited by both truck and drone To compute the time needed for a tour, we have to consider that the two vehicles need to be synchronized, that is, they have to wait for each other at the combined nodes. Hence, we decompose the tour into a sequence of operations (o 1 , o 2 , …, o l ).
An operation o k consists of two combined nodes, called start node and end node, at most one drone node, and potentially several truck nodes. If the operation contains a drone node, the drone departs from the truck at the start node, then serves the drone node and meets up with the truck again at the end node. The truck can travel directly from the start node to the end node or can visit any number of truck nodes in between. When the start node is identical to the end node and the operation does not contain any truck nodes, the truck waits at the start/end node for the drone to deliver the package and return. If the operation does not contain a drone node, the drone travels on the truck directly from start node to end node. Figure 2 provides an example of an operation [1]. We obtain the time duration t(o) of an operation o as the maximum over the truck driving time needed to drive between the nodes as described above, and the drone flying time from start node to drone node to end node.
To compute the time duration of the full tour, we simply sum up the time durations of all operations it contains.

SOLUTION APPROACHES
In this section we present dynamic programming approaches for the TSP-D based on the well-known Bellman-Held-Karp dynamic programming algorithm for the TSP [3,10]. Dynamic Programming can be used to find optimal solutions to problems, if the optimal solutions have an optimal substructure, that is, if they can be split up in an optimal solution to a smaller subproblem and some contribution that is independent from the solution to the smaller subproblem. Our approaches consist of the following three passes: 1. Enumerate the shortest paths for the truck for every start node, end node, and set of truck nodes covered by the path.

2.
Combine these truck paths with drone nodes to obtain efficient operations, that is, operations that represent the least costly way to cover a set of nodes with an operation. 3. Compute the optimal sequence of these operations such that all locations are covered and the sequence start and ends at the depot.
We now first summarize the well-known Bellman-Held-Karp dynamic programming algorithm for the TSP [3,10] in Section 3.1.
Afterwards, we describe how we modify this procedure to execute the three passes. The first and second passes are described in Sections 3.2 and 3.3. For the third pass, we describe the general idea in Section 3.4 and two different variants of how to implement it in Sections 3.4.1 and 3.4.2. Furthermore, we discuss to restrict the number of nodes in an operation in Section 3.5.

Dynamic programming approach for the standard TSP
The optimal solution for the regular TSP can be considered as a path that starts at an arbitrary origin point v, visits all locations in V and ends at v. The costs of the final solution, which can be distance, time, emissions, or anything else that can be modeled as the sum of individual steps, is denoted as D TSP (V, v).
For a set S ⊂ V that contains v and a node w ∈ S⧵{v} let D TSP (S, w) be the costs of the shortest path that starts at the origin point, visits all locations in S and ends at location w. The shortest path of this subproblem consists of two parts: the last arc on the path that goes from some location u ∈ S to w, and a shortest path that starts at the origin point v, visits all locations in S⧵{w}, and ends in u. If the path for the full set S does not contain the shortest subpath for S⧵{w}, we can obtain a better solution by replacing this subpath with the shortest one, contradicting the fact that the path for S was shortest. As a consequence, the subproblem D TSP (S, w) can be solved by considering all arcs (u, w): u ∈ S⧵{w}, and adding each of those arcs to the solution for the smaller subproblem D TSP (S⧵{w}, u).
This structure, in which the optimal solution of a subproblem can be expressed in terms of smaller subproblems can be formalized by means of a recursive formula as follows: As there 2 n possible subsets of V and n locations in v, we can see that there are at most n ⋅ 2 n subproblems to solve here. For each subproblem in the recurrence relation we only consider at most n smaller subproblems, and as a result this algorithm can be run in O(n 2 ⋅ 2 n ) time. A possible implementation of an algorithm that exploits this structure in a bottom up fashion is presented in Algorithm 1.

First pass
As the first pass of our three-pass approach, we adapt the dynamic programming approach for the regular TSP to find the shortest path for the truck for every start node v, end node w, and set of truck nodes S covered by the path. Each of these subproblems can be solved based on smaller subproblems using the same idea as for the regular TSP: The number of subproblems we need to solve to compute table D T is 2 n ⋅ n 2 . By extending the approach presented in Algorithm 1, the computation of the table can be performed in O(2 n ⋅ n 3 ) time.

Second pass
In the second pass, we combine the truck paths with drone nodes to obtain efficient operations, that is operations that represent the least costly way to cover a set of nodes S with an operation for given start node v and end node w. An optimal TSP-D tour can be constructed using efficient operations only as a TSP-D tour will not become longer when an inefficient operation is replaced by an efficient operation.
To find efficient operations associated with every triplet (S, v, w), where the operation starts at location v, ends at location w and visits all locations in S, we expand the table of truck paths to create a table of operations, by adding the drone movement on top of the truck paths. The problem of finding an efficient operation that starts in v, ends in w and visits all locations in S can be broken down into the problem of selecting a single drone node d from S⧵{v, w} and combining the flight of the drone via this location with the shortest truck path that starts in v, ends in w and visits all locations in S⧵{d}. As the subproblem of finding a shortest truck path was already solved when we constructed table D T , the table D OP containing the value of an efficient operation for every triplet (S, v, w) can be computed as follows: When we assume that all truck tours D T have been computed prior to the construction of D OP , we can compute the D OP table in O(2 n ⋅ n 3 ) time, as there are at most 2 n ⋅ n 2 subproblems, that all can be solved in O(n) time.

Third pass
In the third pass, we compute the optimal sequence of efficient operations that covers all locations and starts and ends at the depot. To do this, we iteratively solve the subproblem of finding the best sequence of operations starting at the depot, covering set of nodes S, and ending in node w. That is: w is the end node of the last operation in the sequence, we call it sequence end node in the following.
In Sections 3.4.1 and 3.4.2 we present two variants of this approach which we compare later in our numerical experiments. These variants can be considered as different methods to find a shortest path from source to sink in a so-called state-graph. In a state-graph, vertices represent subproblems (called states) and the arcs correspond to dependencies between states in the computation.
Thus, in the third pass of our algorithm, each vertex corresponds to a pair (S, w) of locations already covered in our sequence of operations and sequence end location w. We aim to find a shortest path from the source state ({v 0 }, v 0 ) where we have not covered any locations yet and start from the depot v 0 to the sink state (V, v 0 ) which covers all locations and ends in the depot v 0 .
The arcs of the graph represent going from one state to another by executing one more operation. That is, state (S 1 , w 1 ) is connected to state (S 2 , w 2 ) by an arc if S 1 ⊊ S 2 . The cost of the arc is the cost of an efficient operation with start node w 1 , end node w 2 covering nodes S 2 ⧵S 1 . This cost is given by D OP (S 2 ⧵S 1 , w 1 , w 2 ) which we compute in the second pass. The structure of this state-graph is visualized in greater detail in Section 3.5, where we consider how restrictions of the number of truck-only nodes in an operation affect the structure of the state-graph.
Finding an optimal TSP-D tour now corresponds to finding a shortest path from source ({v 0 }, v 0 ) to sink (V, v 0 ) in this state-graph with respect to the described arc costs. We present two different approaches to find such a shortest path. The first implementation corresponds to a "standard" dynamic programming approach, while the second one corresponds to a A* algorithm Hart et al. [8,9]. We describe these two approaches in more detail below.

Standard dynamic programming implementation of third pass
We compute the costs of all states using a table D. To compute the costs of a state (subproblem) D(S, w) we look for a minimum cost sequence of efficient operations that starts at v 0 , ends at w and visits all locations in S. This is specified in (4), where we compute the cost of a state as the minimum over the cost of possible preceding states plus the arc costs of the connecting arc connecting the two states (which is the cost of the corresponding efficient operation).
A straightforward analysis of the runtime of this algorithm tells us that there are at most 2 n ⋅ n subproblems and that we have to do at most 2 n ⋅ n work to solve a subproblem (assuming the smaller subproblems have been solved), resulting in a runtime of O(4 n ⋅ n 2 ). However, with a more careful analysis we can see the algorithm actually performs better. To prove this, we apply the binomial theorem, which reads: time. In order to solve all subproblems, we need to solve the subproblems for every i from 1 to n. Since we overestimate the number of subproblems when we let i range from 0 to n rather than 1 to n, we can safely apply the binomial theorem: Here we first include a factor 1 n−i in the analysis, which allows us to apply the binomial theorem to the result. We have three passes needed to compute the final TSP-D tour, where the first pass and second pass both take O(2 n ⋅ n 3 ) time. However, we can . As a factor n grows asymptotically slower than ) n , the O(3 n n 2 ) running time is the dominant one among the three passes. Pseudo code that implements this approach in a bottom-up fashion is presented in Algorithm 2.

A* implementation of third pass
One possible disadvantage of the approach described in Section 3.4.1 is that even if an instance has many subproblems that are clearly not relevant to the optimal solution, they are still solved. For this reason, we also consider an approach that attempts to skip irrelevant subproblems, based on ideas from informed search. By only generating states that seem relevant to finding a good solution, it may not be necessary to solve all 2 n ⋅ n subproblems, but significantly less. The idea is to estimate how good a partial solution is, and guide the search towards the optimal path. This way, we potentially limit the number of subproblems that needs to be considered in the third pass of the algorithm. We do this by executing an A* algorithm [8,9] to find a shortest path from source ({v 0 }, v 0 ) to sink (V, v 0 ) on the state-graph.
The key idea of A* is to introduce a function that estimates the distance from a given state to the sink state. By only expanding states for which the sum of the path to that state and the value of the function is minimal, the algorithm does not generate states that are far from the optimal path to the sink state. It was proven by Dechter and Pearl [6] that if the function never overestimates the distance to the goal state, this approach finds an optimal solution. This is the case with the function that we use in our approach.
To obtain a lower bound on the path from a given state to the goal state, we compute 1 2+ times the length of a minimum spanning tree connecting the last location visited in our current state, the locations that are not covered yet, and the depot. In Agatz et al. [1], we show that in instances where distances fulfill the triangle inequality, a solution (, ) to the TSP-D, consisting of a TSP tour  constructed with the minimum spanning tree heuristic, is a (2 + )-approximation for the TSP-D.
Here, denotes the minimum value for which c d (v, w) ≥ c(v, w) for every pair of locations v, w ∈ V, as introduced earlier.

Restricting the number of operations
Instead of creating all possible drone operations, we may limit the set of operations to reduce the number of subproblems we need to consider in the first and second passes. This directly influences memory requirements and the running times of the algorithm. In particular, we consider restricting the maximum number of truck-only nodes per operation to k. This helps reduce the amount of required work in all passes of the algorithm.
In the small example given in Figure 3, the number of additional arcs that are introduced when we increase k from 0 to 1 and 2 is quite small. Solid arcs are included when k ≥ 0, dashed arcs are included when k ≥ 1 and the dotted arc is included when k ≥ 2. For larger instances, however, the number of introduced arcs increases rapidly. This can be observed from Table 2, where we can see that the number of arcs for an instance with 9 locations and k = 2 is already larger than the number of arcs for an instance with 10 locations and k = 0.
It is relatively straightforward to adopt our approaches to take into account these restrictions, as we need to consider only truck paths of at most 2 + k, as the start and end nodes of an operation do not count as truck nodes. Thus, we need to only compute the solution to the subproblems D T (S, v, w) where |S| ≤ k + 1, as drone nodes are added in the second phase and we have by convention v ∉ S and w ∈ S. When we compute the table D OP (S, v, w), we must take care to consider only the subproblems where |S| ≤ k + 2 if v ≠ w and |S| ≤ k + 1 otherwise.

This implies that for the first two passes we get
subproblems rather than O(n 2 ⋅ 2 n ) for the unrestricted case. For the third pass, we unfortunately still need to consider all O(2 n ⋅ n) subproblems, but the amount of work required to solve each subproblem can be reduced: instead of considering all subsets of S we only need to consider subsets T of size k + 2, rather than all subsets of S. This implies that every subproblem can be solved in O(n k+1 ) time and thus all solutions can be computed in O(2 n ⋅ n k+2 ) time, rather than in O(3 n n 2 ) time. If we for example forbid truck-only nodes and thus have k = 0, we get an O(2 n ⋅ n 3 ) algorithm.
Since A* often performs best when the branching factor is small, we expect this approach to be most advantageous when we restrict the number of nodes in an operation (small k), given the growth behavior of the number of arcs observed in Table 2. That is, while both the DP-algorithm and A* benefit from the fact that fewer arcs are evaluated when k is small, the A* algorithm additionally can take advantage of the fact that fewer states are expanded in each iteration when k is small.
It seems reasonable to assume that the number of truck nodes per operation in most good solutions for most instances will be relatively small. This is especially the case if the drone travels faster than the truck. That is, fewer truck nodes per operation implies that more work is performed by the drone and thus the advantage of parallelization is greater.
Unfortunately, restrictions on the number of truck nodes may prevent finding an optimal solution. That is, it is not difficult to construct a problem instance in which the optimal solution consists of just one operation with many truck nodes. This is for example the case if all but one node (the designated drone node) are close to the depot, while the designated drone node is located sufficiently far from all other nodes.

Theorem 3.3. For any instance of the TSP-D with symmetric drone costs, that is, c
, there is a TSP-D tour without truck nodes (i.e., every operation consists of a start node, an end node, and possibly a drone node), whose time duration is at most twice the time duration of the optimal TSP-D tour for that instance.
Proof . LetÔ = (ô 1 ,ô 2 , … ,ô l ) be an optimal TSP-D tour for the considered instance. We construct a TSP-D tour without truck nodes fromÔ by replacing each operation o =ô i for i = 1, …, l by a sequence of operations without truck nodes with at most double completion time.
We can see that this bound is tight from the example in Figure 4, where we have a depot and two customers v 1 and v 2 . In this example, we assume that the speed of the drone is equal to the speed of the truck and both customers have the same travel time from the depot. In the optimal solution, one node is served by the drone and one by the truck as both vehicles start and end at the depot. As the drone and the truck can serve the nodes simultaneously, this gives a solution value of 2. In case truck nodes are not allowed in an operation, it is not possible to parallelize the deliveries. This means that in the optimal restricted solution, we either serve the nodes sequentially with either one of the vehicles or the truck has to wait for the drone at one of the locations. Both cases result in a solution value of 4. In the first case the round trip time from the depot to either one of the locations is 2 and both locations are visited sequentially. In the second case, both vehicles travel to one location together in one time unit. At that location, the drone travels from the location to the other location in two time units, after which both vehicles travel to the depot in parallel during the fourth and final time unit. In the numerical experiments in Section 4, we evaluate the impact of these restrictions in more practical instances than the presented example.

NUMERICAL EXPERIMENTS
Using the instances presented in Agatz et al. [1] and several new ones generated using the same approach, we conduct various experiments. The first set of instances consist of points uniformly distributed in a 100 × 100 two-dimensional square with the depot at one of the corners, while the 1-center and 2-center instances have locations that are clustered around one and two centers, respectively, and we take the Euclidian distance between pairs of locations. As a consequence the triangle inequality holds, so we can use the minimum spanning tree lower bound for the A* algorithm.
With these instances, we assess our two implementations of the exact dynamic programming approach: the standard dynamic programming approach (DP) and the A*. Moreover, we also evaluate the impact of restricting the number of truck nodes per operation k, that is, 0, 1, 2, 4. We will refer to the unrestricted case as k = ∞. Unless stated otherwise, we assume that the drone is twice as fast as the truck ( = 2).
Experiments were executed on the Lisa computer cluster of SURFsara. During each run, 10 instances were solved in parallel by a cluster node equipped with an Intel Xeon E5-2650 v2 CPU (Intel, Santa Clara, California). Depending on the size of the instances solved, nodes with either 32 GB or 64 GB of RAM were used, with either 2 GB or 6 GB of RAM assigned per instance. The computer cluster runs on Debian Linux. The algorithms were implemented in Java, and executed using the Oracle JDK version 1.8.0_40 (Oracle Corporation, Armonk, New York) on the cluster. We compare the solution times of our approaches to the time needed to solve the integer linear program (IP) from [1]. The IP was solved using CPLEX 12.6.3 (IBM, Armonk, New York). Minimum spanning trees were computed using Kruskal's algorithm on the Delaunay triangulation of the geometric instances. The Delaunay triangulations were computed using the Java Topology Suite version 1.13 (Eclipse Foundation, Ottawa, Ontario, Canada).
In the first set of experiments, we compare the running times of the two different variants of the dynamic programming approach DP and A* for different restrictions on the number of truck nodes k. As an additional benchmark, we also compare the results from the IP as presented in Agatz et al. [1]. This IP uses binary decision variables for each feasible drone operation and constraints that ensure that the associated TSP-D tour covers all nodes. These constraints include subtour elimination constraints for every subset of locations. In the unrestricted case k = ∞, a variable for all efficient operations must be added, all of which can be generated by the first two passes of our dynamic programming case. For restricted cases, many operations are considered and thus fewer variables need to be added to the model. Unfortunately, the number of subtour elimination constraints does not decrease when the number of truck nodes is restricted as they are generated for every subset V, independent of the number of operations. Figure 5 provides the running times of the different algorithms for the uniform instances with 10 nodes from Agatz et al. [1]. We see that the dynamic programming approaches consistently outperform the IP. The main reason for this bad performance is that the IP contains not only an exponential number of variables, obtained by passes one and two of the dynamic programming approach, but also an exponential number of constraints (n ⋅ 2 n to be precise). This results in a large IP, even if the number of operations is reduced.

Comparing running times
Comparing the running times of the dynamic programming approaches, we see that the DP is faster than the A*. This suggests that savings of potentially finding the optimal solution faster do not offset the costs for the numerous additional lower bound calculations in the A* for these instances.  Table 3 presents the running times for larger instances of up to 20 nodes. As expected, the results show that restricting the number of allowed truck nodes per operation k significantly reduces the running times. In all but one set of instances, we see that the running times strictly increase with k. As an example, we see that it takes less than 30 minutes to solve the n = 20 instances with k = 0 while it takes more than 10 hours to solve the same instances with k = 2.
For the instance with 10 nodes, however, the unrestricted cases (k = ∞) have slightly smaller running times, on average, than the approaches with k = 4. The reason for this is that our implementation uses different data structures to build the table of operations created in passes one and two depending on whether or not restrictions are in place. For the unrestricted case it is relatively straightforward to work with an arrays indexed by bitwise representations of the sets. In case of restrictions this approach breaks down and hash-based sparse data structures were used. These sparse data structures have more overhead and as the restrictions do not eliminate many operations in the smaller instances, the savings obtained from building fewer operations are less than the additional costs due to the overhead of the hash-based data structure.
Comparing between the different exact approaches, we see that the DP is faster for smaller instances and A* is faster for larger instances in all cases with restrictions. However, in the unrestricted k = ∞ cases the DP always outperforms the A*. A possible explanation for this, is that without restrictions there is already an exponential number of operations (O(2 n n)) to consider in the first step. As a consequence, all relevant subproblems are immediately generated. Furthermore, the lower bound for each subproblem is immediately computed, resulting in a lot of overhead compared to the DP. For cases with restrictions, fewer search states are reachable from the initial search state and as a result, we observe that A* performs better than DP starting at instances of size 14 with k = 0, while it is already faster for instances of size 11 with k = 4. Table 4 shows the impact of the restrictions on the solution quality for the uniform instances. Therefore, we compare the optimal solutions of the approaches that allow at most k truck nodes per operation Z k with the optimal solution for the problem without restrictions Z ∞ , i.e., Z k −Z ∞ Z ∞ × 100. We provide the following three measures for the solution quality: • Δ (%): the average relative deviation from the optimal solution Z ∞ • Max (%): the maximum relative deviation from the optimal solution Z ∞ • # opt: the number of times that the algorithm with restrictions finds the optimal solution Z ∞ As expected, we see that the solution quality improves with k. The worse performance is associated with the approach that does not allow any truck nodes (k = 0). In this case, the average optimality gap ranges from 2.4% to 5.9% with a maximum gap of 11%. While this is a substantial gap, it is much better than the theoretical worse-case gap of 50% from Theorem 3.5. We see that the maximum optimality gap quickly goes down to 5.6% for k = 1 and only 2.8% for k = 2. For k = 4, we find the optimal solution in 69 of the 70 instances and have a maximum gap of less than 1%.

Impact of restrictions
Moreover, we see that the performance is not that sensitive to the size of the instance n. This suggests that the costs of not allowing large operations with many truck nodes is relatively small, even for larger instances. One potential explanation for this is that while the larger instances may potentially involve larger operations, there are also many more good solutions possible with less truck nodes.  Next, we consider the impact of the restrictions on the solution quality for different relative drones speeds ( ) and different types of instances. Figure 6 shows that limiting k has a larger negative impact at lower drone speeds. Especially the difference between k = 0 and k = 1 is substantial. This is intuitive as it is more beneficial for the truck to visit additional nodes separated from the drone when the drone is relatively slow. This suggests that the restrictions are most suitable in situations where the drone is faster than the truck. Figure 7 shows a similar trend as the performance at lower k-values is worse for the clustered instances than the uniform instances. In clustered instances, it is beneficial to have the truck serve the clustered nodes in the centers while the drone visits the outliers. However, this is not possible when we restrict the number of truck nodes in an operation.
To provide more insight into the characteristics of the solutions for different truck node restrictions, we compare the total number of drone and truck nodes in the solutions as a percentage of the total number of nodes n in the instance in Figure 8. Looking at the impact of k, we see that the number of drone nodes increases when allowing less truck nodes. This is intuitive as larger operations with one drone and several truck nodes are replaced by several smaller operations in the case with restrictions. As noted before, the figure shows that this effect is more pronounced in the clustered instances, that is, 1-center and 2-center. In line with this observation, we see relative more drone nodes in the uniform instances than the clustered instances in the unrestricted cases. Figure 9 provides a different way of looking at this behavior. Here, we see the total waiting times (summed over all operations) of the truck and the drone relative to the total time required to serve all nodes. The results show that, without restrictions, the drone generally incurs more waiting time than the truck. This is related to the fact that our drone is faster than the truck. As expected, we see that the waiting time of the truck decreases with the maximum number of allowed truck nodes per operation. When comparing the different instance types, we see that the waiting times for the drone are longer in the uniform instances than the clustered instances. A potential reason for this is that the drone makes shorter flights in the uniform case which together with its higher speed leads to more waiting.

CONCLUSIONS AND FUTURE RESEARCH
We show that by using dynamic programming, we can solve larger problems than with the mathematical programming approaches that have been presented in the literature so far. Moreover, we show that restrictions on the number of operations can help significantly reduce the solution times while having relatively little impact on the overall solution quality. While we have considered restricting the number of truck nodes per operation to reduce the running times of our algorithm, future research could investigate and develop extensions of this approach. One promising direction for future research is to study different ways to identify "bad" operations upfront, for example, in which either the drone or the truck incurs a lot of waiting time. Also, we have considered simple operations with at most one drone. If we assume that all drones in an operation must start and end at the same node, it is straightforward to include multi-drone operations in the second pass of our dynamic programming approach. However, it is not that clear how to incorporate multiple drones if each drone can start and end at different nodes. This is an interesting area for future research.
Furthermore, it would be interesting to investigate to what extent the techniques developed for the TSP-D could be extended for a problem version in which drones can depart an any part of the route, or at specific locations (which do not need to be restricted to customer locations).