
MAY 2014

ROSKILDE UNIVERSITY

COMPUTER SCIENCE RESEARCH REPORT #141

Copyright © 2014 Keld Helsgaun

Computer Science
Roskilde University
P. O. Box 260
DK–4000 Roskilde
Denmark

Telephone: +45 4674 3839
Telefax: +45 4674 3072
Internet: http://www.ruc.dk/dat en/
E-mail: [email protected]

All rights reserved. Permission to copy, print, or redistribute all or part of this work is granted for educational or research use on condition that this copyright notice is included in any copy.

ISSN 0109–9779

Research reports are available electronically from: http://www.ruc.dk/dat en/research/reports/

Solving the Equality Generalized Traveling Salesman Problem Using the Lin-Kernighan-Helsgaun Algorithm

Keld Helsgaun
E-mail: [email protected]
Computer Science, Roskilde University
DK-4000 Roskilde, Denmark¹

Abstract

The Equality Generalized Traveling Salesman Problem (E-GTSP) is an extension of the Traveling Salesman Problem (TSP) where the set of cities is partitioned into clusters, and the salesman has to visit every cluster exactly once. It is well known that any instance of E-GTSP can be transformed into a standard asymmetric instance of the TSP, and therefore solved with a TSP solver. This paper evaluates the performance of the state-of-the-art TSP solver Lin-Kernighan-Helsgaun (LKH) on transformed E-GTSP instances. Although LKH is used as a black box, without any modifications, the computational evaluation shows that all instances in a well-known library of benchmark instances, GTSPLIB, could be solved to optimality in a reasonable time. In addition, it was possible to solve a series of new very-large-scale instances with up to 17,180 clusters and 85,900 vertices. Optima for these instances are not known, but it is conjectured that LKH has been able to find solutions of very high quality. The program is free of charge for academic and non-commercial use and can be downloaded in source code.

Keywords: Equality generalized traveling salesman problem, E-GTSP, Traveling salesman problem, TSP, Lin-Kernighan

Mathematics Subject Classification: 90C27, 90C35, 90C59

1. Introduction

The Equality Generalized Traveling Salesman Problem (E-GTSP) is an extension of the Traveling Salesman Problem (TSP) where the set of cities is partitioned into clusters, and the salesman has to visit every cluster exactly once. The E-GTSP coincides with the TSP whenever all clusters are singletons. The problem has numerous applications, including airplane routing, computer file sequencing, and postal delivery [1].

The E-GTSP is defined on a complete graph G = (V, E), where V = {v1, ..., vn} is the vertex set and E = {(vi, vj) : vi, vj ∈ V, i ≠ j} is the edge set. A non-negative cost cij is associated with each edge (vi, vj), and the vertex set V is partitioned into m mutually exclusive and exhaustive clusters V1, ..., Vm, i.e., V = V1 ∪ V2 ∪ ... ∪ Vm with Vi ∩ Vj = ∅ for all i, j, i ≠ j. The E-GTSP can be stated as the problem of finding a minimum cost cycle that includes exactly one vertex from each cluster. If the cost matrix C = (cij) is symmetric, i.e., cij = cji for all i, j, i ≠ j, the problem is called symmetric. Otherwise it is called asymmetric.

¹ December 2013. Updated March 19, 2014


Figure 1 is an illustration of the problem. The lines depict a feasible cycle, called a g-tour.

Figure 1 Illustration of the E-GTSP for an instance with 6 clusters (n = 23, m = 6).

It is well known that any E-GTSP instance can be transformed into an asymmetric TSP instance containing the same number of vertices [2, 3, 4]. The transformation can be described as follows, where V’ and c’ denote the vertex set and cost matrix of the transformed instance:

a) V’ is equal to V.
b) Create an arbitrary directed cycle of the vertices within each cluster and define c’ij = 0 when vi and vj belong to the same cluster and vj succeeds vi in the cycle.
c) When vi and vj belong to different clusters, define c’ij = ckj + M, where vk is the vertex that succeeds vi in the cycle of its cluster, and M is a sufficiently large constant. It suffices that M is larger than the sum of the n largest costs.
d) Otherwise, define c’ij = 2M.

This transformation works because, having entered a cluster at a vertex vi, an optimal TSP tour always visits all other vertices of the cluster before it moves to the next cluster. The tour then leaves the cluster from the vertex that precedes vi in the cycle, so by rule c) the inter-cluster edge is charged as if it had left from vi itself. The optimal TSP tour must have zero cost inside each cluster and must have exactly m inter-cluster edges. Thus, the cost of the g-tour for the E-GTSP is the cost of the TSP tour minus mM. The g-tour can be extracted by picking the first vertex from each cluster in the TSP tour.

The transformation allows one to solve E-GTSP instances using an asymmetric TSP solver. However, in the past this approach has had very little application, because the produced TSP instances have an unusual structure, which is hard to handle for many existing TSP solvers. Since a near-optimal TSP solution may correspond to an infeasible E-GTSP solution, heuristic TSP solvers are often considered inappropriate [5, 6]. In this paper, it is shown that this need not be the case if the state-of-the-art heuristic TSP solver LKH is used.

LKH [7, 8] is a powerful local search heuristic for the TSP based on the variable depth local search of Lin and Kernighan [9]. Its characteristics include the use of 1-tree approximation for determining a candidate edge set, an extension of the basic search step, and effective rules for directing and pruning the search. LKH is available free of charge for scientific and educational purposes from http://www.ruc.dk/~keld/research/LKH.

The following section describes how LKH can be used as a black box to solve the E-GTSP.
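As an illustration of rules a) to d), the following Python sketch builds the transformed cost matrix for a tiny instance and checks, by brute force, that the optimal TSP cost minus mM equals the optimal g-tour cost. It is a minimal sketch, not the code used in this report: the function names, the six sample points, and the choice M = (sum of the n largest costs) + 1 are assumptions made for the example, and vertices are numbered from 0.

```python
from itertools import permutations, product


def transform_egtsp_to_atsp(cost, clusters):
    """Build c' from rules a) to d); returns (c', M). Names are illustrative."""
    n = len(cost)
    # Assumption for this sketch: M = 1 + sum of the n largest costs.
    off_diag = sorted((cost[i][j] for i in range(n) for j in range(n) if i != j),
                      reverse=True)
    M = sum(off_diag[:n]) + 1
    cluster_of, succ = {}, {}
    for c, members in enumerate(clusters):
        for pos, v in enumerate(members):
            cluster_of[v] = c
            # Rule b): arbitrary directed cycle = the order given in the list.
            succ[v] = members[(pos + 1) % len(members)]
    new_cost = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if cluster_of[i] != cluster_of[j]:
                new_cost[i][j] = cost[succ[i]][j] + M  # rule c)
            elif succ[i] == j:
                new_cost[i][j] = 0                      # rule b)
            else:
                new_cost[i][j] = 2 * M                  # rule d)
    return new_cost, M


def tour_cost(cost, tour):
    """Cost of the directed cycle visiting `tour` in order."""
    return sum(cost[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def brute_force_tsp(cost):
    """Optimal directed tour cost by enumeration (tiny instances only)."""
    n = len(cost)
    return min(tour_cost(cost, (0,) + p) for p in permutations(range(1, n)))


def brute_force_egtsp(cost, clusters):
    """Optimal g-tour cost by enumeration (tiny instances only)."""
    best = None
    for picks in product(*clusters):
        for p in permutations(picks[1:]):
            c = tour_cost(cost, (picks[0],) + p)
            if best is None or c < best:
                best = c
    return best


# Hypothetical instance: 6 points, Manhattan distances, 3 clusters.
points = [(0, 0), (1, 5), (4, 1), (6, 6), (9, 2), (3, 8)]
cost = [[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in points] for a in points]
clusters = [[0, 1, 2], [3, 4], [5]]

new_cost, M = transform_egtsp_to_atsp(cost, clusters)
tsp_opt = brute_force_tsp(new_cost)
gtsp_opt = brute_force_egtsp(cost, clusters)
print("TSP optimum minus m*M:", tsp_opt - len(clusters) * M)
print("E-GTSP optimum:       ", gtsp_opt)
```

The brute-force helpers are only there to verify the cost relation on a toy instance; a real solver would hand the transformed matrix to a TSP solver instead.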

2. Implementing an E-GTSP Solver Based on LKH

The input to LKH is given in two files:

(1) A problem file in TSPLIB format [10], which contains a specification of the TSP instance to be solved. A problem may be symmetric or asymmetric. In the latter case, the problem is transformed by LKH into a symmetric one with 2n vertices.
(2) A parameter file, which contains the name of the problem file, together with some parameter values that control the solution process. Parameters that are not specified in this file are given suitable default values.

An E-GTSP solver based on LKH should therefore be able to read an E-GTSP instance, transform it into an asymmetric TSP instance, produce the two input files required by LKH, let LKH solve the TSP instance, and extract the g-tour from the obtained TSP tour. A more precise algorithmic description is given below:

1. Read the E-GTSP instance.
2. Transform it into an asymmetric TSP instance.
3. Write the TSP instance to a problem file.
4. Write suitable parameter values to a parameter file.
5. Execute LKH given these two files.
6. Extract the g-tour from the TSP solution tour.
7. Perform post-optimization of the g-tour.

Comments:

1. The instance must be given in the GTSPLIB format, an extension of the TSPLIB format, which allows for specification of the clusters. A description of the GTSPLIB format can be found at http://www.cs.rhul.ac.uk/home/zvero/GTSPLIB/.
2. The constant M is chosen as INT_MAX/4, where INT_MAX is the maximal value that can be stored in an int variable. The transformation results in an asymmetric n x n cost matrix.
3. The problem file is in TSPLIB format with EDGE_WEIGHT_TYPE set to EXPLICIT, and EDGE_WEIGHT_FORMAT set to FULL_MATRIX.
4. The transformation induces a fair amount of degeneracy, which makes the default parameter settings of LKH inappropriate. For example, tests have shown that it is necessary to work with a candidate edge set that is larger than the default. For more information, see the next section.
5. The E-GTSP solver has been implemented in C to run under Linux. This has made it possible to execute LKH as a child process (using the Standard C Library function popen()).
6. The g-tour is easily found by picking the first vertex from each cluster during a sequential traversal of the TSP tour. The g-tour is checked for feasibility.
7. In this step, attempts are made to optimize the g-tour by two means:
(1) Using LKH for local optimization as described above, but now on an instance with m vertices (the vertices of the g-tour).
(2) Performing so-called cluster optimization, a well-known post-optimization heuristic for the E-GTSP [11]. This heuristic attempts to find a g-tour that visits the clusters in the same order as the current g-tour but is cheaper. It is implemented as a shortest path algorithm and runs in O(nm²) time. If the smallest cluster has a size of O(1), the algorithm may be implemented to run in O(nm) time. A detailed description of the heuristic and its implementation can be found in [6, 12].
Local optimization and cluster optimization are performed as long as it is possible to improve the current best g-tour.
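Comments 6 and 7(2) can be illustrated with a short Python sketch. It is not the C implementation used in GLKH: the function names, the sample data, and the rotation of the tour to a cluster boundary before extraction are assumptions made here so that the example is self-contained. The cluster optimization is written as a layered shortest path, started once from each vertex of the smallest cluster.

```python
def tour_cost(cost, tour):
    """Cost of the directed cycle visiting `tour` in order."""
    return sum(cost[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def extract_g_tour(tsp_tour, cluster_of):
    """Pick the first vertex of each cluster during a sequential traversal."""
    n = len(tsp_tour)
    # Assumption of this sketch: rotate the cyclic tour so that the traversal
    # starts where a cluster is entered, not in the middle of a cluster.
    start = next((k for k in range(n)
                  if cluster_of[tsp_tour[k - 1]] != cluster_of[tsp_tour[k]]), 0)
    rotated = tsp_tour[start:] + tsp_tour[:start]
    seen, g_tour = set(), []
    for v in rotated:
        if cluster_of[v] not in seen:
            seen.add(cluster_of[v])
            g_tour.append(v)
    return g_tour


def cluster_optimization(cost, clusters, g_tour, cluster_of):
    """Cheapest g-tour visiting the clusters in the same order as g_tour."""
    order = [cluster_of[v] for v in g_tour]
    # Start from the smallest cluster to keep the number of restarts low.
    k = min(range(len(order)), key=lambda i: len(clusters[order[i]]))
    order = order[k:] + order[:k]
    layers = [clusters[c] for c in order]
    best_cost, best_tour = float("inf"), None
    for s in layers[0]:
        dist, preds = {s: 0}, []
        for layer in layers[1:]:
            new_dist, back = {}, {}
            for v in layer:
                u = min(dist, key=lambda w: dist[w] + cost[w][v])
                new_dist[v], back[v] = dist[u] + cost[u][v], u
            preds.append(back)
            dist = new_dist
        last = min(dist, key=lambda w: dist[w] + cost[w][s])
        total = dist[last] + cost[last][s]  # close the cycle back to s
        if total < best_cost:
            tour = [last]
            for back in reversed(preds):
                tour.append(back[tour[-1]])
            best_cost, best_tour = total, tour[::-1]
    return best_tour, best_cost


# Hypothetical data: 6 points, Manhattan distances, 3 clusters of 2 vertices.
points = [(0, 0), (1, 5), (4, 1), (6, 6), (9, 2), (3, 8)]
cost = [[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in points] for a in points]
clusters = [[0, 1], [2, 3], [4, 5]]
cluster_of = {v: c for c, members in enumerate(clusters) for v in members}

tsp_tour = [1, 2, 3, 5, 4, 0]  # a tour of the transformed instance (assumed)
g_tour = extract_g_tour(tsp_tour, cluster_of)
best_tour, best_cost = cluster_optimization(cost, clusters, g_tour, cluster_of)
print("g-tour:", g_tour, "cost:", tour_cost(cost, g_tour))
print("after cluster optimization:", best_tour, "cost:", best_cost)
```

The sketch covers only the cluster optimization half of step 7; the local optimization with LKH on the m selected vertices, and the loop that alternates the two until no improvement is found, are left out.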


3. Computational Evaluation

The program, which is named GLKH, was coded in C and run under Linux on an iMac 3.4 GHz Intel Core i7 with 32 GB RAM. Version 2.0.7 of LKH was used.

The program was tested using E-GTSP instances generated from instances in TSPLIB [10] by applying the clustering method of Fischetti, Salazar, and Toth [12]. This method, known as K-center clustering, clusters the vertices based on proximity to each other. For a given instance, the number of clusters is fixed to m = ⌈n/5⌉. In addition, the program has been tested on a series of large-scale instances generated from clustered instances taken from the 8th DIMACS Implementation Challenge [13] and from the national instances on the TSP web page of William Cook et al. [14]. The number of clusters in the test instances varies between 4 and 17,180, and the number of vertices varies between 14 and 85,900.

For instances with at most 1084 vertices, the following non-default parameter settings for LKH were chosen and written to a parameter file:

PROBLEM_FILE = GTSPLIB/


OPTIMUM: This parameter may be used to supply a best known solution cost. The algorithm will stop if this value is reached during the search process.

PI_FILE: The penalties (π values) generated by the Held-Karp ascent are saved in a file such that subsequent test runs can reuse the values and skip the ascent.

POPULATION_SIZE: A genetic algorithm is used, in which 10 runs are performed (RUNS = 10 is default in LKH) with a population size of 5 individuals (TSP tours). That is, when 5 different tours have been obtained, the remaining runs will be given initial tours produced by combining individuals from the population.

LKH’s default basic move type, MOVE_TYPE = 5, is used. LKH offers the possibility of using higher-order and/or non-sequential move types in order to improve the solution quality [8]. However, the relatively large size of the candidate set makes the local search too time-consuming for such move types.

Tables 1 and 2 show the test results for instances with at most 1084 vertices. This set of benchmark instances is commonly used in the literature. Each test was repeated ten times. The tables follow the format used in [16]. The column headers are as follows:

Name: the instance name. The prefix number is the number of clusters of the instance; the suffix number is the number of vertices.
Opt.: the best known solution cost. The exact solution cost (optimum) is known for all instances with at most 89 clusters and 443 vertices.
Value: the average cost value returned in the ten tests.
Error (%): the error, in percent, of the average cost above the best known solution cost.
Opt. (%): the number of tests, in percent, in which the best known solution cost was reached.
Time (s): the average CPU time, in seconds, used for one test.

As can be seen in Table 1, the small benchmark instances are quickly solved to optimality. Table 2 shows that all large benchmark instances are solved to optimality too. In comparison with the results obtained for the same instances by the state-of-the-art solver GK [16, p. 58], the optimality percentage for GLKH is higher (98% versus 81%). This higher success rate is obtained at the expense of longer running times (a factor of about 40 for the largest instances). However, the running times for GLKH are reasonable for practical purposes. Considering that GLKH uses LKH as a black box, without any modifications, its performance is surprisingly good.
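The Error (%) and Opt. (%) columns defined above can be recomputed from the run results. The short Python sketch below is only an illustration: the function names are made up here, the two error checks use the Opt. and Value entries reported in Table 2 for 212u1060 and 157rat783, and the list passed to opt_percent is a toy list, not data from this report.

```python
def error_percent(average_cost, best_known):
    """Error (%): excess of the average cost over the best known cost."""
    return 100.0 * (average_cost - best_known) / best_known


def opt_percent(run_costs, best_known):
    """Opt. (%): share of runs that reached the best known cost."""
    hits = sum(1 for c in run_costs if c == best_known)
    return 100.0 * hits / len(run_costs)


# Opt. and Value entries taken from Table 2.
print(round(error_percent(106029.5, 106007), 2))  # 212u1060
print(round(error_percent(3262.9, 3262), 2))      # 157rat783

# Toy run costs (hypothetical, for illustration only).
print(opt_percent([10, 10, 12, 10], 10))
```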


Name                 Opt.       Value  Error (%)  Opt. (%)  Time (s)
3burma14             1805      1805.0       0.00       100       0.0
4br17 (asym.)          31        31.0       0.00       100       0.0
4gr17                1389      1389.0       0.00       100       0.0
5gr21                4539      4539.0       0.00       100       0.0
5gr24                 334       334.0       0.00       100       0.0
5ulysses22           5307      5307.0       0.00       100       0.0
6bayg29               707       707.0       0.00       100       0.0
6bays29               822       822.0       0.00       100       0.0
6fri26                481       481.0       0.00       100       0.0
7ftv33 (asym.)        476       476.0       0.00       100       0.0
8ftv35 (asym.)        525       525.0       0.00       100       0.0
8ftv38 (asym.)        511       511.0       0.00       100       0.0
9dantzig42            417       417.0       0.00       100       0.0
10att48              5394      5394.0       0.00       100       0.0
10gr48               1834      1834.0       0.00       100       0.0
10hk48               6386      6386.0       0.00       100       0.0
11berlin52           4040      4040.0       0.00       100       0.0
11eil51               174       174.0       0.00       100       0.0
12brazil58          15332     15332.0       0.00       100       0.0
14st70                316       316.0       0.00       100       0.0
16eil76               209       209.0       0.00       100       0.0
16pr76              64925     64925.0       0.00       100       0.0
20gr96              29440     29440.0       0.00       100       0.0
20rat99               497       497.0       0.00       100       0.0
20kroA100            9711      9711.0       0.00       100       0.0
20kroB100           10328     10328.0       0.00       100       0.0
20kroC100            9554      9554.0       0.00       100       0.0
20kroD100            9450      9450.0       0.00       100       0.0
20kroE100            9523      9523.0       0.00       100       0.0
20rd100              3650      3650.0       0.00       100       0.0
21eil101              249       249.0       0.00       100       0.1
21lin105             8213      8213.0       0.00       100       0.0
22pr107             27898     27898.0       0.00       100       0.0
24gr120              2769      2769.0       0.00       100       0.1
25pr124             36605     36605.0       0.00       100       0.1
26bier127           72418     72418.0       0.00       100       0.1
26ch130              2828      2828.0       0.00       100       0.1
28gr137             36417     36417.0       0.00       100       0.1
28pr136             42570     42570.0       0.00       100       0.2
29pr144             45886     45886.0       0.00       100       0.1
30ch150              2750      2750.0       0.00       100       0.5
30kroA150           11018     11018.0       0.00       100       0.1
30kroB150           12196     12196.0       0.00       100       0.1
31pr152             51576     51576.0       0.00       100       0.2
32u159              22664     22664.0       0.00       100       0.1
35si175              5564      5564.0       0.00       100       0.8
36brg180             4420      4420.0       0.00       100       0.2
39rat195              854       854.0       0.00       100       0.3
Average                                     0.00       100

Table 1 Results for small benchmark instances.


Name                 Opt.       Value  Error (%)  Opt. (%)  Time (s)
40d198              10557     10557.0       0.00       100       1.9
40kroa200           13406     13406.0       0.00       100       0.4
40krob200           13111     13111.0       0.00       100       0.6
41gr202             23301     23301.0       0.00       100       0.4
45ts225             68340     68340.0       0.00       100       1.9
45tsp225             1612      1612.0       0.00       100       2.3
46pr226             64007     64007.0       0.00       100       0.1
46gr229             71972     71972.0       0.00       100       0.3
53gil262             1013      1013.0       0.00       100       1.4
53pr264             29549     29549.0       0.00       100       0.6
56a280               1079      1079.0       0.00       100       0.9
60pr299             22615     22615.0       0.00       100       1.5
64lin318            20765     20765.0       0.00       100       1.7
65rbg323 (asym.)      471       471.0       0.00       100       0.3
72rbg358 (asym.)      693       693.0       0.00       100       0.8
80rd400              6361      6361.0       0.00       100       8.1
81rbg403 (asym.)     1170      1170.0       0.00       100       3.9
84fl417              9651      9651.0       0.00       100       2.0
87gr431            101946    101946.0       0.00       100       7.6
88pr439             60099     60099.0       0.00       100       2.6
89pcb442            21657     21657.0       0.00       100       8.1
89rbg443 (asym.)      632       632.0       0.00       100      25.6
99d493              20023     20023.4       0.00       100     170.5
107ali535          128639    128639.0       0.00       100      18.4
107att532           13464     13464.0       0.00       100      12.0
107si535            13502     13502.0       0.00       100      34.5
113pa561             1038      1038.0       0.00       100       7.7
115u574             16689     16689.0       0.00       100      26.5
115rat575            2388      2388.0       0.00       100      45.5
131p654             27428     27428.0       0.00       100      14.0
132d657             22498     22498.0       0.00       100     490.8
134gr666           163028    163028.0       0.00       100     162.3
145u724             17272     17272.0       0.00       100     145.4
157rat783            3262      3262.9       0.03        70     764.4
200dsj1000        9187884   9187884.0       0.00       100     794.4
201pr1002          114311    114311.0       0.00       100     164.8
207si1032           22306     22306.0       0.00       100    1202.5
212u1060           106007    106029.5       0.02        50    2054.9
217vm1084          130704    130704.0       0.00       100     209.0
Average                                     0.00        98

Table 2 Results for large benchmark instances.

To provide some very-large-scale instances for research use, GTSPLIB has been extended with 44 instances ranging in size from 1000 to 85,900 vertices (see Table 3). The instances are generated from TSPLIB instances with the following exceptions:

• The instances 200E1k.0, 633E3k.0, 2000E10k.0, 6325E31k.0, 200C1k.0, 633C3k, and 6325C31k.0 are generated from instances used in the 8th DIMACS Implementation Challenge [13]. The E-instances consist of 1000, 3162, 10000, and 31623 uniformly distributed points in a square. The C-instances consist of 1000, 3162, 10000, and 31623 clustered points. For a given size n of a C-instance, its points are clustered around ⌊n/10⌋ randomly chosen centers in a square.

• The instances 4996sw24978 and 14202ch71009 are generated from the National TSP benchmark library [14]. They consist, respectively, of 24978 locations in Sweden and 71009 locations in China.

All instances mentioned above were generated using Fischetti et al.’s clustering algorithm. The following five instances, in which clusters correspond to natural clusters, have been added: 49usa1097, 10C1k.0, 31C3k.0, 100C10k.0, and 316C31k.0. The instance 49usa1097 consists of 1097 cities in the adjoining 48 U.S. states, plus the District of Columbia. Figure 2 shows the current best g-tour for this instance. Figures 3 and 4 show the current best g-tours for 10C1k.0 and 200C1k.0, respectively.

Figure 2 Current best g-tour for 49usa1097 (length: 10,465,466 meters ≈ 6,503 miles).


Figure 3 Current best g-tour for 10C1k.0 (10 natural clusters).

Figure 4 Current best g-tour for 200C1k.0 (200 K-center clusters).


The column Best of Table 3 shows the current best solution costs found by GLKH. These costs were found through several runs of GLKH, where each run used the current best g-tour as its input tour, with the following non-default parameter settings:

PROBLEM_FILE = GTSPLIB/