Two important factors in parallel design
1) Determine if the parallelsection worths it, e.g. will it give speed up proportionally to when num processor increases? The must thing to avoid is it will slower than sequential. I used Amdahl's Law (standard law to analysis parallel algorithm) to determine that this worths doing. If necessary, I can provide details on this law.
2) Applying new parallelism part means break the sequential code, redesign it. Therefore it's necessary that the parallel algorithm must give results (quality sols) equivalent to sequential, in both theoritical and practical
–
Summary: All my parallelism part (in this paper) will concentrate on the Ant loop for several reasons: the for loop algorithm is the most effective (for speeding up) and also all general ant systems have this part.
–Rizzo's cycle loop
for each cycle
for each ant
ant(i).chooseNextMove(){
use heuristic calculation to decide next move
for each adj edges to ant(i).currentVertex
evaporate a small amount of pheremone
end
}
end
for each ant
ant(i).moves(){
deposite a small amount of pheromone to the edge from current to new vertex
set ant(i).currentVertex = newVertex;
}
end
//local optimization
end cycle loop
–Analysis:
ant(i).moves() is NOT expensive, it takes O(n) where n is the number of ants.
ant(i).chooseNextMove() is an VERY expensive operation. The expensive heuristic score function calculates the number of ants, pheromone on each adj vertices, edges... The pheromone evaporation loop is also very expensive since each vertex can connect to many other vertices. In short it's around O(v^2) where v is the number of vertices , n is the number of ants (which = v * a multipler)
–Also using gnu profiler verifies that indeed this chooseNextMove() function takes the most calculation time of the whole program.
My parallel part
Rizzo's code is totally depending on each other, i.e ant(i+1) decision based on ant (i). So I redesigned so ant works can be done in parallel by having all ants make their decisiosn not based on each other but based on the current situation, (imagine they all make their decision at the same time, based on the current pheromone etc info).
Will it work and give answers equivalent to Rizzo's ? Yes, My reason is that it's just another method to move the ants, Rizzo's method is to have each ant's decision based on the previous ant. Mine is to have all ant's decision not based on each other.
Is either one of these 2 methods ideal or better than the other? Not likely since in nature ants neither move sequentially completely nor parallel completely. (Note in my code there are some probabilities that ants don't move or move randomly, same as Rizzo's).
Experiments show that this part have no effect on the quality solution.
–This part was done before you went to Gecco, it gives speed up around 1.4 on 2 processors , etc.
for each cycle
for each ant
ant(i).chooseNextMove(){
use heuristic calculation to decide next move
}
end
for each ant
for each adj edges to ant(i).currentVertex
evaporate a small amount of pheremone
end
ant(i).moves(){
deposite a small amount of pheromone to the edge from current to new vertex
set ant(i).currentVertex = new vertex; #determined in the chooseNextMove()
}
end
//local optimization
end cycle loop
The main changes is that the ant choose its nextMove (which contains only the heuristic funtion) independently. The heuristic calculation will based on the pheremone, population etc at the current cycle (time), not based on previous ants.
Using this method I can now parallel the cyan region. i.e if the number of ants is n and p processors, I will fork p threads and each threads handle n/p ants. Recall that the heuristic calculation is an expensive calculation thus this will give a noticeable speed up (1.4) !
While you were at Gecco I was able to get an account on a 16 proc's processor and when run the program on it, the speed up max out around 2 (i.e exec half of sequential time with 10 processors or so). Usually if this is the best can do, then it doesn't seem worth doing it. So I continue breaking the sequential code to parallel.
–This part was done while you were at Gecco, it gives a speed up around 9-10 on large graph (Man a45, phat1500, hamming 10-2 , and other medium sized graphs can achieve speed up around 7)
for each cycle
for each ant
ant(i).chooseNextMove(){
use heuristic calculation to decide next move
}
end
for each ant
for each adj edges to ant(i).currentVertex
evaporate a small amount of pheremone
end
ant(i).moves(){
deposite a small amount of pheromone to the edge from current to new vertex
set ant(i).currentVertex = new vertex; #determined in the chooseNextMove()
}
end
//local optimization
end cycle loop
The above is the current the parallel part done before and it will not be altered. It's easy to see the orange region is expensive especially when the graph has large #of vertices and they are not sparse (i.e each vertex connects to many others).
However this part cannot be done parallely since 2 ants can update (evaporate) pheromone the same edge(s) simultaneous if these 2 ant are on the same vertex. Note I also have implemented only locking the edges that are being read or write. But this shared lock-critical-region way slows down the process (easy to see why since it is used so frequently).
Further investigation into the code shows Rizzo’s evaporation() section is as follow
for each adj edges to ant(i).currentVertex
if (currentTime > adj_edge(j).lastUpdate){
evaporate a small amount of pheromone on adj_edge(j)
adj_edge(j).lastUpdate=currentTime;
}
end
The above code simply means *each* edge adj to each ant’s current vertex should only update once each cycle. By this information, I redesign the algorithm so that it allows parallelism.
bool edgeToUpdate[numEdges]; //global info
for each cycle
//init everything edgeToUpdate array to false
//ant decision using heuristic, unchanged, exactly same as above
//PARALLEL part, each proc handles num_ants/p ants.
for each ant
for each adj edges to ant(i).currentVertex
if (currentTime > adj_edge(j).lastUpdate){
edgeToUpdate[adj_edge(j)] = true;
}
end
end
//this loop goes sequentially
for each edge
if (edge(j)==true) {
evaporatePhermone(edge(j));
edge(j).lastUpdate=currentTime;
}
end
for each ant
ant(i).moves(){
deposite a small amount of pheromone to the edge from current to new vertex
set ant(i).currentVertex = new vertex; #determined in the chooseNextMove()
}
end
end cycle loop
It’s easy to see code is equivalent to Rizzo’s sequential code. As a matter of fact, it requiresan extra (for each edge) loop. What is the whole point of having the array edgeToUpdate and the extra for each edge loop ? It’s for parallelism !
Imagine 2 ants (each on a processor) attempt to evaporate the same edge simultaneously, since the edge pheromone info is a shared info, they can (and will) have problem with data synchronization.
Now the new code will have the edgeToUpdate as a shared part, the ants can (and will) write info to it simultaneously, for example if 2 ants (each on a proc) wants to update edge(j), then both will write “true” to edge(j) . This is a dirty write (i.e blinding write to a shared info variable simultaneously). But it is fine in this case since each adj edges to each ant’s current location should be update only exactly once, so by writing multiple “true” to edge(j) does not mean anything.
Finally the sequential for all edge loop executes and updates all the edges that are set to be updated. Note this part can be done parallelism too however not worth doing it since it’s only a small for all edge loop.
This speed up helps (a lot) since the sequential (Rizzo) code costs O(num_ants*num_vertices) = O(v^2) , assuming the graph is not parse, most vertices connect to another and num_ants = num_vertices * a multipler (>=1). By applying parallism, we reduce this complexity to O(v^2 / p) (still O(V) technically) where p is the number of processors. And of course, this time reduction is multipled by the number of stages and cycles since this code happens within each code cycle.
---- More info if you’re interested
The moveAnts() are not updated for 2 reasons: 1) it’s not worth it, it’s just a “for all ants, addPheromone(edge(currentVertex,newVertex , ants(i).currentVertex=newVertex )” loop, there’s no expensive operations inside 2)synchronize problem , unlike the evaporate pheromone which reduces pheromone on edge only once each time, the addPheromone in moveAnt will add multiple pheromone on an edge multiple times, based on how many ants going by that edge. Again effort has been made to use critical region lock to block the pheromone region that’s been read, but it shows that it has no benefit and the run time is even worse.