BEST ROUTE FOR TREEHUGGER DELIVERIES

Martin, Jesus Lemuel Jr.

INTRODUCTION

A certain company is a distributor of pencils, ballpens and other similar products to various bookstores located around Metro Manila. A single delivery truck will load its cargo from a single warehouse near Robinson’s Pioneer in Mandaluyong, and from there will proceed to deliver the products to each bookstore as needed.

The problem is finding a route from the warehouse to all the store locations and back that is optimal; that is, a route that is covered in the least amount of time. The system is bounded naturally by the fact that all delivery locations are restricted to the Metro Manila region only, and thus all considerations will flow from that particular area.

The model will take into consideration the different possible routes and output the optimal route the truck should take while considering several factors such as traffic, color coding, etc. The problem is restricted to one truck available from 8:00 to 18:00 six days a week, whereas the individual stores accept deliveries from 9:00 to 16:30 daily, with a total of 55 locations.

CONSIDERATIONS

The problem is a particular instance of the Traveling Salesman family of problems, which are known to be NP-hard. The problem consists of constructing a Hamiltonian cycle of minimum weight within a complete weighted directed graph. A brute-force solution runs in O(n!) time, which is of course increasingly computationally expensive for large values of n. Thus several heuristic approaches have been considered that would hopefully converge to a near-optimal solution.
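To make the factorial blow-up concrete, here is a small brute-force enumeration sketched in Python (the report's own code is in SciLab; this function and its distance matrix are illustrative only):

```python
# Brute-force TSP on a tiny weighted digraph: enumerate all (n-1)!
# orderings of the non-start vertices and keep the cheapest cycle.
# Feasible for n = 4 as here, hopeless for the 55 locations in the report.
from itertools import permutations

def brute_force_tsp(dist):
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):  # fix vertex 0 as the start
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
cost, tour = brute_force_tsp(dist)  # cheapest cycle and its total distance
```

For n = 55 this loop would have to examine 54! orderings, which is why the report turns to heuristics instead.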

In this particular case, we have several considerations. Assuming that the locations are the vertices, each directed edge would represent a route leading from one vertex to the other, where the direction indicates the starting and ending point of the route. We consider now a route’s non-viability, denoted by NV, as being equal to the scaled normalized value of that route’s distance multiplied by its density ranking. That is, a high NV would indicate that the road has a higher density ranking and/or greater distance, hence not making it desirable to pass there. We denote the route’s viability by V and it is equal to 1 – NV. These two values highlight two things simultaneously: V indicates which roads are more viable (less distance or traffic density) whereas NV indicates which roads are not desirable.

Thus, solving for best routes using V and NV (meaning, paths with maximum weights within V and NV) will give roads that are more likely to be part of the optimal Hamiltonian cycle and roads that are definitely not part of the optimal Hamiltonian cycle.

We determine these routes by the process of reinforcement learning. The paths generated by reinforcement learning will yield several paths that converge to specific routes. From there, the paths to be considered can be narrowed down to generate a “better” Hamiltonian cycle for the graph.

Three heuristic methods will be used to generate possible optimal cycles.

The first method generates optimal paths from each vertex converging toward the edge of maximum viability (or non-viability). Applied to V, this yields a set of candidate edges O that are more likely to be part of the optimal cycle (by virtue of their maximum weight). Applied to NV, it yields the opposite result: a set of candidate edges P containing the edges most likely never to appear in the optimal cycle, by the same argument. We then count the frequency of each edge in O or P and construct a weight aggregate matrix W, whose entries lie between 0 and 1. Edges that appear in neither O nor P are given a value of 0.5, implying that we know nothing about the effect of these edges on the solution. Edges that appear frequently in O have values closer to 1, indicating that they are more likely to be chosen when generating a cycle; the opposite holds for edges in P. From there, we construct cycles using W and compute the net viability of each generated cycle. Continuous iterations of this path-generating procedure will give us a (surely) suboptimal solution.
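The construction of the weight aggregate matrix W can be sketched as follows in Python (an illustration only, not the report's SciLab implementation; the edge lists O and P and the scaling rule are hypothetical):

```python
# Build a weight aggregate matrix W from candidate edge sets:
# O holds edges seen in high-viability paths, P edges seen in
# high-non-viability paths. Entries lie in [0, 1]: 0.5 = no information,
# closer to 1 = frequent in O, closer to 0 = frequent in P.
from collections import Counter

def edge_aggregate(n, O, P):
    o_count, p_count = Counter(O), Counter(P)
    scale = max(max(o_count.values(), default=0),
                max(p_count.values(), default=0), 1)
    W = [[0.5] * n for _ in range(n)]
    for (i, j), c in o_count.items():
        W[i][j] = 0.5 + 0.5 * c / scale   # pulled toward 1
    for (i, j), c in p_count.items():
        W[i][j] = 0.5 - 0.5 * c / scale   # pulled toward 0
    return W

O = [(0, 1), (0, 1), (1, 2)]   # hypothetical "good" edges, with repeats
P = [(2, 0)]                   # hypothetical "bad" edge
W = edge_aggregate(3, O, P)
```

Cycles are then sampled preferring high-W edges, so the frequently-good edge (0, 1) is almost always taken while the bad edge (2, 0) is usually avoided.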

A second heuristic method involves an iterated reinforcement learning approach. From the viability matrix V we generate an optimal path P1 of length k1 starting at an initial vertex. From there, we erase all columns of V corresponding to the vertices in the current path and re-compute a new optimal path P2 starting from the end vertex of P1 and traversing the other vertices. This will have length k2. This process will be continued until all vertices are exhausted, and a (surely) suboptimal solution will be generated.

The last method is similar to the second method; however, the path iteration will consider both end vertices of the current path; from there, the better path will be chosen until a Hamiltonian cycle is generated. For example, P2 would be the better path (by viability) between the path generated from the starting vertex of P1 (excluding vertices already in P1) and the path generated from the terminal vertex of P1.
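A single-edge version of this double-ended rule can be sketched in Python (illustrative only: the function name, viability matrix and path are hypothetical, and the report extends the path with whole reinforcement-learned sub-paths rather than the one greedy edge shown here):

```python
# Grow the current path from whichever endpoint offers the
# higher-viability edge: append after the tail or prepend before the head.
def extend_double_ended(V, path, remaining):
    head, tail = path[0], path[-1]
    best_fwd = max(remaining, key=lambda v: V[tail][v])  # append after tail
    best_bwd = max(remaining, key=lambda v: V[v][head])  # prepend before head
    if V[tail][best_fwd] >= V[best_bwd][head]:
        remaining.remove(best_fwd)
        return path + [best_fwd], remaining
    remaining.remove(best_bwd)
    return [best_bwd] + path, remaining

V = [[0, 5, 1, 2],
     [5, 0, 3, 1],
     [1, 3, 0, 4],
     [2, 1, 4, 0]]
path, remaining = extend_double_ended(V, [0, 1], {2, 3})
```

Here the forward edge 1→2 (viability 3) beats the backward edge 3→0 (viability 2), so the path is extended at its tail.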

Note that certain extensions of this problem would be required to solve the bigger problem of scheduling truck deliveries. We would have to construct k cycles of net minimum weight where each vertex appears in exactly one cycle; furthermore, the density ranking matrix should be updated after certain periods to indicate the change in traffic flow from one period to another. Implementing these additional constraints greatly increases the complexity of the problem, so they will not be considered for now.

CONJECTURES

All three methods will yield Hamiltonian cycles (by construction of the algorithms). However, whether they will converge to the optimal solution or not is debatable. Each method is likely to capture at least one portion of the optimal Hamiltonian cycle (the initial generated sub-path of maximum viability), but the resulting cycle as a whole will generally remain suboptimal.

We conjecture that the first method converges better for small values of n, because the computation effectively removes a large portion of the edges from consideration while increasing the likelihood that edges belonging to the optimal cycle are selected. Iterations of this method will yield such suboptimal solutions with greater likelihood.

We conjecture that the convergence of the second and third methods to “better” solutions will depend on the length of the initial path generated. The initial path has the greatest likelihood of lying on the optimal Hamiltonian cycle; thus, subsequent iterations of reinforcement learning on the remaining vertices will be more accurate if fewer iterations remain to be performed.

ALGORITHM

Given distance matrix D, density ranking matrix T and scaling factor SF (which magnifies or reduces our values for easier viewing), we have

NVij = SF * Dij * Tij / max(Dij * Tij)

Vij = SF - NVij, if i ≠ j; Vij = 0, if i = j

(see function VNV() or viability())
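A direct Python transcription of these formulas might look like the sketch below (the 3x3 matrices here are hypothetical; the report's viability() is implemented in SciLab):

```python
# Compute the non-viability matrix NV and viability matrix V from a
# distance matrix D and density ranking matrix T, scaled by SF.
def viability(D, T, SF=100.0):
    n = len(D)
    prod = [[D[i][j] * T[i][j] for j in range(n)] for i in range(n)]
    m = max(max(row) for row in prod)          # max over all Dij * Tij
    NV = [[SF * prod[i][j] / m for j in range(n)] for i in range(n)]
    V = [[0.0 if i == j else SF - NV[i][j] for j in range(n)]
         for i in range(n)]                    # diagonal forced to 0
    return V, NV

D = [[0, 1, 2], [1, 0, 4], [2, 4, 0]]   # hypothetical distances
T = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]   # hypothetical density rankings
V, NV = viability(D, T, SF=100.0)
```

The route with the worst distance-density product (here 1→2) gets NV = SF and V = 0, while cheaper routes keep most of their viability.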

METHOD 1

We first compute our learning matrices R and Q from V and NV, respectively, using reinforcement learning (see function learnRQ()). We then generate our optimal paths O and P for V and NV, respectively (see function smalloptimal()), and construct the weight aggregate matrix W based on the frequency of each edge in O or P (see function edgeaggregate()). Next, we construct a path from the first vertex through each vertex and back (see function generateHCycle()) and compute the net viability of the constructed path. Finally, iterations of this method (see function generateBCycle()) will hopefully converge to a good solution.

METHOD 2

We compute our learning matrix R from V using reinforcement learning (see function learnR()). We then generate our optimal paths Pi for V (see function koptimal()) by starting from the first vertex, finding the optimal path, zeroing out the columns of V corresponding to the vertices in that path, and repeating the process until all vertices are exhausted (see function generateHCycleShift()). The computed cycle will be the final solution.

METHOD 3

We generate our initial optimal path P1 as in Method 2. Subsequent iterations then test both the initial and final vertex of the current path; the better extension (with higher viability) is added to the path until all vertices are exhausted. The computed cycle will be the final solution.

RESULTS

The algorithm was implemented in SciLab 5.3.0, which was chosen for easy adaptability to differently-sized matrices and the efficiency of its computational capabilities. Method 1 was implemented in the file ‘tsp.sci,’ Method 2 was implemented in the file ‘tsp2.sci’ and Method 3 was implemented in the file ‘tsp3.sci.’

Two matrices, one of dimension 8 and another of dimension 20, were used as test cases for each method. Eight rounds of testing were run for each case, subject to the following parameters:

METHOD 1 – SF 100, LF 0.8, CF 0.5, 200 learning iterations, 100 path iterations

METHOD 2 – SF 100, LF 0.8, 200 learning iterations

(See Results folder for trial results)

After eight rounds, the first method converged to a cycle with a net viability of approximately 592 for the 8x8 matrix. The second and third methods yielded a cycle with a net viability of approximately 604. However, for the 20x20 matrix, the first method did not converge to a best solution as quickly, and generated many variations differing by more than 50 points. The best solution offered after eight rounds of testing had a net viability of 1803.2795, whereas the second method yielded a result of 1809.0792 and the third method yielded a result of 1814.9435. Thus it seems that the second and third methods yield better solutions at a faster rate than the first method.

CONCLUSIONS AND RECOMMENDATIONS

All three methods, while providing sub-optimal solutions, are not likely to converge to the optimal solution (or would probably converge only after a long time). Their primary trade-off could be summarized as tractability versus optimality. The first method, which is likely to converge to a near-optimal solution, is not as tractable as the other methods; the second method, meanwhile, may be too conservative in its solution with regard to the optimal solution itself. The third method remedies this conservativeness with a simultaneous R-learning approach, yet its final solution may still be more conservative than the optimal one. A further recommendation regarding extensions of this program would be a combination of these methods: a possible method would make use of iterated reinforcement learning while taking into consideration the effect of the weight aggregate matrix on subsequent iterations, as opposed to eliminating certain vertices altogether.

APPENDIX: PROGRAM MANUAL

All methods require SciLab to be installed to run. After installing SciLab, open scilab-5.3.0, go to “File -> Open” and choose to open ‘tsp.sci,’ ‘tsp2.sci’ or ‘tsp3.sci.’ Go to the opened code file and press “Ctrl+Shift+E” to execute the program in the console. Further instructions below are to be typed into the console itself.

METHOD 1

Given the file ‘tsp.sci’, this will be a step-by-step demo of what the program implementation will do (as given by the function testTSP(example)).

We use the case for when example = 1.

We are given the distance matrix D and density ranking matrix T as follows:

T=[0,3,5,4,4,4,4,2;3,0,3,3,4,4,3,3;5,3,0,2,5,5,3,2;4,3,2,0,4,4,3,3;4,4,5,4,0,1,4,4;4,4,5,4,1,0,4,4;4,3,3,3,4,4,0,3;2,3,2,3,4,4,3,0];

D=[0,4.53,2.7,1.54,0.99,1.35,0.77,3.43;4.53,0,2.06,2.87,4.07,4.62,5.3,4.21;2.7,2.06,0,1.69,3.21,3.76,3.47,2.15;1.54,2.87,1.69,0,2.07,2.62,2.31,2.52;0.99,4.97,3.21,2.07,0,0.55,1.01,4.8;1.35,5.52,3.76,2.62,0.55,0,1.34,4.59;0.77,5.3,3.47,2.31,1.01,1.34,0,4.2;3.43,4.21,2.15,2.52,4.8,4.59,4.2,0];

The only initial input for the algorithm to run would be a distance matrix D and a density ranking matrix T.

From there, the function viability(D,T,SF) will compute the viability and non-viability matrices V and NV and scale them to the scaling factor SF. (In this case, SF=100)

Afterwards, the function learnRQ(V,NV,LF,iter) will compute the learning matrices R and Q for V and NV respectively, with a learning factor of LF and within iter repetitions. (In this case, LF=0.8, iter=200)
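The kind of update learnRQ() performs can be sketched as a Q-learning-style value iteration over the viability matrix (a Python illustration under the assumption of random state-action sampling; the report's SciLab implementation may differ in detail):

```python
# Q-learning-style sweep over a reward (viability) matrix V: repeatedly
# sample a transition (i -> j) and back up the best value reachable from j,
# discounted by the learning factor LF. Diagonal (i == j) moves are skipped.
import random

def learn_R(V, LF=0.8, iters=200, seed=0):
    n = len(V)
    R = [[0.0] * n for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(iters):
        i = rng.randrange(n)            # random current vertex (state)
        j = rng.randrange(n)            # random next vertex (action)
        if i == j:
            continue
        R[i][j] = V[i][j] + LF * max(R[j])   # Bellman-style backup
    return R

V = [[0, 80, 40], [60, 0, 20], [30, 50, 0]]   # hypothetical viabilities
R = learn_R(V, LF=0.8, iters=500, seed=1)
```

After enough sweeps, entries of R rank outgoing edges by immediate viability plus discounted future viability, which is what the path-generation steps then exploit.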

From there, the optimal paths for V and NV are obtained from R and Q and stored in the matrices O and P by the function smalloptimal(R,Q). This O and P will then be used to calculate the weight aggregate matrix W which is given by edgeaggregate(O,P). The entries of W are determined by the frequency of each edge within the optimal paths O and P. The more an edge appears in O, the greater its entry in W, and vice versa for an edge in P. Edges that appear in neither O nor P are assigned a default value of 0.5.

To account for instances when edges have equal weight aggregate value, the choice factor CF will be used to randomize which edge to pick, so that repeated computations will not necessarily yield the same solution. In effect, the function generateHCycle(W,V,CF) will generate a Hamiltonian cycle and its net viability based on the weight aggregate matrix W. (In this case, CF=0.5)
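One possible reading of this tie-breaking rule, sketched in Python (the exact randomization inside generateHCycle() may differ; the helper name and weight row are hypothetical):

```python
# Pick the next vertex from the current row of the weight aggregate
# matrix W: take the unique maximum if there is one, otherwise break the
# tie at random, using CF as the probability of keeping the first tied
# candidate (an assumed interpretation of the choice factor).
import random

def choose_next(W_row, remaining, CF=0.5, rng=random):
    best = max(W_row[v] for v in remaining)
    tied = [v for v in remaining if W_row[v] == best]
    if len(tied) == 1:
        return tied[0]                 # unique best edge: deterministic
    return tied[0] if rng.random() < CF else rng.choice(tied[1:])

nxt = choose_next([0.5, 0.9, 0.5], [1, 2])   # unique max at vertex 1
```

Because ties are resolved randomly, repeated calls to the cycle generator can explore different Hamiltonian cycles over the same W.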

When example=1, we generate a single path using the above process. However, when example=2, testTSP(example) will conduct trials to compute a best possible Hamiltonian cycle using the function generateBCycle. In this case, the same matrices D and T are used but 100 trials are considered.

The final testTSP(example), example=3, is the same case as example=2 but for a 20x20 matrix.

METHOD 2:

Given the file ‘tsp2.sci’, this will be a step-by-step demo of what the program implementation will do (as given by the function testTSP2(example)).

We assume the same matrices used from Method 1. In this case, example=1 uses the 8x8 matrix while example=2 uses the 20x20 matrix.

Once again, the only input would be D and T.

Compute the viability V via the function viability(D,T,SF) and use the function learnR(V,LF,iter) to output the learning matrix R after iter repetitions.

From there, the function koptimal(R,k) will compute an optimal path from k using the matrix R.

To obtain a Hamiltonian cycle, we use the function generateHCycleShift(V), which takes the viability matrix V as input and builds the path by an iterative process: generate the learning matrix R from V, compute the optimal path Pi from R and append it to the main path P, shift the starting point to the last point of Pi, and re-compute R with the vertices of Pi removed from consideration (zeroing out the columns of V corresponding to the vertices of Pi). This continues until all vertices are exhausted and the generated path P is a Hamiltonian cycle. The examples use SF=100, LF=0.8 and 200 learning iterations as the set values.
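A much-simplified Python sketch of this column-zeroing loop, replacing each re-learned optimal sub-path with a single greedy step over V (the function name and test matrix are hypothetical; the report re-learns R each round instead):

```python
# Greedy stand-in for generateHCycleShift(): from the current vertex,
# repeatedly move to the unvisited vertex of highest viability. Removing
# a vertex from `remaining` plays the role of zeroing its column in V.
def generate_cycle_shift(V):
    n = len(V)
    path, remaining, current = [0], set(range(1, n)), 0
    while remaining:
        nxt = max(remaining, key=lambda v: V[current][v])
        path.append(nxt)
        remaining.remove(nxt)          # this vertex can never be chosen again
        current = nxt
    return path + [0]                  # close the Hamiltonian cycle

V = [[0, 9, 1, 2],
     [1, 0, 8, 3],
     [2, 1, 0, 7],
     [9, 2, 1, 0]]
cycle = generate_cycle_shift(V)
```

Each step commits to the locally best edge, which is exactly why the report describes the resulting cycle as (surely) suboptimal.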

METHOD 3:

Given the file ‘tsp3.sci’, this will be a step-by-step demo of what the program implementation will do (as given by the function testTSP3(example)).

We assume the same matrices used from Method 1. In this case, example=1 uses the 8x8 matrix while example=2 uses the 20x20 matrix.

Once again, the only input would be D and T.

Compute the viability V via the function viability(D,T,SF) and use the function learnR(V,LF,iter) to output the learning matrix R after iter repetitions.

From there, the function koptimal(R,k) will compute an optimal path from k using the matrix R.

To obtain a Hamiltonian cycle, we use the function generateDblEndCycle(V), which takes the viability matrix V as input and builds the path iteratively: generate the learning matrix R from V, compute the initial optimal path P1 from R and store it in the main path P, then re-compute R with the vertices of P1 removed from consideration (zeroing out the columns of V corresponding to the vertices of P1). The process is then repeated from both the initial and terminal vertices of P, and the better path Pi generated is the one appended to P. This continues until all vertices are exhausted and P is a Hamiltonian cycle. The examples use SF=100, LF=0.8 and 100 learning iterations as the set values.

RESOURCES

SciLab is available at