PROJECT 2

Submitted

By

Mandeep Singh Sawhney

Project Description:

Problem
In 3D, N particles are arranged, initially, in a 10 × 10 × 10 cube, i.e., any triplet of random numbers in [0, 10] can serve as the coordinates of a particle. These particles interact under the so-called Lennard-Jones potential

V_ij = 1 / r_ij^12 - 2 / r_ij^6

where V_ij is the pair-wise potential between particles i and j at distance r_ij. Write a parallel program to minimize the total (Lennard-Jones) potential energy of the particle system by adjusting particle positions. Please do the following three parts:

(1) Compute the minimal energy for N = 16 particles, by any method of your choice, on 1 node. Invest the best practical effort for this part of the calculation (e.g., run your code for a couple of hours on one node).

(2) Now use P = 2, 4, 8 nodes to minimize the system energy, individually. Terminate each minimization when you reach a minimal energy similar to the minimal energy obtained with P = 1 (as in (1) above). Results within 5 percent relative error are considered similar.

(3) Plot the speedup curve.
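For reference, the pair potential above can be evaluated directly from the squared distance, as the program later in this report does; a minimal sketch in C (the helper name lj_pair is illustrative, not part of the project code):

/* Reduced Lennard-Jones pair potential evaluated from the squared
   distance r2 = r_ij^2: V = 1/r^12 - 2/r^6, with minimum V = -1 at r = 1. */
double lj_pair(double r2)
{
    double r6 = r2 * r2 * r2;          /* r_ij^6 */
    return 1.0 / (r6 * r6) - 2.0 / r6; /* r^-12 - 2 r^-6 */
}

For example, lj_pair(1.0) returns -1.0, the depth of the potential well.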

Algorithm Description and Analysis:

The algorithm is implemented using the simulated annealing approach. The pseudocode is as follows:

TEMPERATURE = INIT_TEMP;
X = INIT_POSITION;
OldCost = COST(X);
while (TEMPERATURE > 0.1)
{
    while (INNER_LOOP_CRITERION == TRUE)
    {
        New_position = GENERATE(X);
        NewCost = COST(New_position);
        DeltaCost = NewCost - OldCost;
        if (DeltaCost < 0)
        {
            X = New_position;
            OldCost = NewCost;
        }
        else
        {
            r = RANDOM(0,1);
            y = exp(-DeltaCost / TEMPERATURE);
            if (r < y)
            {
                X = New_position;
                OldCost = NewCost;
            }
            else
                REJECT(New_position);
        }
    }
    TEMPERATURE = ALPHA * TEMPERATURE; // (0 < ALPHA < 1)
}
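As a worked illustration of the acceptance test (numbers chosen for illustration only): with DeltaCost = 2 and TEMPERATURE = 10, an uphill move is accepted with probability exp(-2/10) ≈ 0.82, whereas at TEMPERATURE = 0.5 the same move survives only with probability exp(-4) ≈ 0.02. Uphill moves thus become increasingly rare as the system cools, which lets the search escape local minima early and settle into a deep minimum late.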

The computation of the cost function is done as follows:

The pair-wise potential contributions between the n = N/p local particles on each processor are computed first. This computation proceeds fully in parallel. The local particles are copied to the rotating-particle buffer on each processor:

for (i = 0; i < particlesPerProcess; i++)
    for (k = 0; k < 3; k++)
        Rotating[i][k] = r[i][k];

The following steps are then repeated p - 1 times. The rotating particles are shifted to the right neighbor processor in the ring topology; the movement of the p sets of traveling particles proceeds in parallel.

for (process = 0; process < numberOfProcesses - 1; process++)
{
    // Copy traveling particles to the send buffer
    for (i = 0; i < particlesPerProcess; i++)
        for (k = 0; k < 3; k++)
            sendBuffer[i][k] = Rotating[i][k];

    // Interleave send/receive to prevent deadlock
    if (my_rank % 2 == 0)
    {
        // Even processes send first, then receive
        MPI_Send(sendBuffer[0], 3 * particlesPerProcess, MPI_DOUBLE,
                 nextProcess, 0, MPI_COMM_WORLD);
        MPI_Recv(Rotating[0], 3 * particlesPerProcess, MPI_DOUBLE,
                 previousProcess, 0, MPI_COMM_WORLD, &status);
    }
    else
    {
        // Odd processes receive first, then send
        MPI_Recv(Rotating[0], 3 * particlesPerProcess, MPI_DOUBLE,
                 previousProcess, 0, MPI_COMM_WORLD, &status);
        MPI_Send(sendBuffer[0], 3 * particlesPerProcess, MPI_DOUBLE,
                 nextProcess, 0, MPI_COMM_WORLD);
    }

    // After each shift, the contributions between the n local particles and
    // the n traveling particles are computed; this also proceeds fully in
    // parallel.
}
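To see why p - 1 shifts suffice, the following standalone sketch (an illustration only, not part of the project code) traces which block each process holds after each shift around the ring:

#include <stdio.h>

/* After s right-shifts around a ring of p processes, process q holds the
   particle block originally owned by process (q - s + p) % p, so p - 1
   shifts let every process see every remote block exactly once. */
int main(void)
{
    int p = 4; /* example process count */
    for (int s = 1; s < p; s++)
        for (int q = 0; q < p; q++)
            printf("after shift %d, process %d holds block %d\n",
                   s, q, (q - s + p) % p);
    return 0;
}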

- The GENERATE(X) function :- This function generates a new configuration by moving a particle, i.e., assigning it new random coordinates.

- The Simulated Annealing algorithm parameters :-

Initial temperature = 10000

Freezing temperature = 0.1

Cooling Schedule parameter “alpha” = 0.8

Inner loop count = 100
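For reference, these parameters fix the amount of work per run: the temperature falls from 10000 to the freezing point 0.1 by repeated multiplication with alpha = 0.8, so the outer loop executes ceil(ln(10000/0.1) / ln(1/0.8)) = ceil(11.51 / 0.223) = 52 times; with 100 iterations of the inner loop, each run performs 5200 annealing steps.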

PROGRAM CODE :-

#include<stdio.h>

#include<math.h>

#include<stdlib.h>

#include<mpi.h>

double compute_potentials();// To compute the potential on each processor

#define Num_particles 16// Specify the number of particles here.

#define alpha 0.8// Specify value of cooling schedule parameter alpha.

// Global Declarations

int my_rank,p;

int part_per_proc, proc, min,max, Rotate, nextprocess, previousprocess, target_processor, target_particle;

int z,i,j,k,x,change,move_partnum, INNER_LOOP,offset, Recv_processor,change_particle;

double potential, communicationtime, temp,r;// (starttime is declared with the other timing variables below)

double Rotating[Num_particles][3],sendbuffer[Num_particles][3];

double particle[Num_particles][3],temp_particle[Num_particles][3], newcod_particle[Num_particles][3], temp_potential[Num_particles];

double Temperature,oldcost,newcost,delta_cost,m,n,y,lowestcost=0.0;

double starttime=0.0,endtime=0.0,totaltime=0.0,timeelapsed=0.0,time_taken=0.0;

MPI_Status status;

int main(int argc, char *argv[])

{

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Comm_size(MPI_COMM_WORLD, &p);

Temperature = 10000;// Initial temperature value for SA.

printf("\n I am %d", my_rank);

part_per_proc = Num_particles/p;// Particles per processor.

printf("\n PArticles per Process = %d", part_per_proc);

// RING TOPOLOGY: NEIGHBOUR RANKS COMPUTED WITH MODULAR ARITHMETIC
nextprocess = (my_rank + 1) % p;

previousprocess = (my_rank + p - 1) % p;

printf("\n Nextprocess = %d and Previous Process =%d", nextprocess, previousprocess);

if(my_rank==0)

{

offset = 0;

//CALCULATING THE INITIAL POSITIONS OF PARTICLES

potential = 0.0;

//srand();

for(i=0;i<Num_particles;i++)

{

for(x=0;x<3;x++)

particle[i][x]=rand()%11;

}

// Processor 0 sending the co-ordinates of n = N/p particles to every other processor

for(proc=1;proc<p;proc++)

{

offset = offset+part_per_proc;

printf("\nSending %d particles to processor %d offset=%d ",part_per_proc,proc,offset);

MPI_Send(&offset, 1, MPI_INT, proc, 1, MPI_COMM_WORLD);

MPI_Send(&particle[offset][0], 3*part_per_proc, MPI_DOUBLE, proc, 1, MPI_COMM_WORLD);

}

for(i=0;i<part_per_proc;i++)

{

//printf("\n******************");

for(k=0;k<3;k++)

{

temp_particle[i][k] = particle[i][k];

//printf(" %f\t",temp_particle[i][k]);

}

}

}

else

{

MPI_Recv(&offset, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

MPI_Recv(&particle, 3*part_per_proc, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

temp_particle[i][k] = particle[i][k];

}

//COMPUTE THE POTENTIAL FOR ALL THE PARTICLES

oldcost = compute_potentials();

lowestcost = oldcost;// INITIALISE THE LOWEST ENERGY FOUND SO FAR

// START THE TIMER

starttime = MPI_Wtime(); //starting timer

// OUTER LOOP FOR SA

while(Temperature>0.1)// FREEZING TEMPERATURE = 0.1

{

INNER_LOOP = 100;// Number of iterations of the inner loop

while(INNER_LOOP!=0)

{

for(z=0;z<Num_particles;z++)

{

Recv_processor = z/part_per_proc;

change_particle = z % part_per_proc;

if(my_rank==0)

{

srand(rand());// RE-SEED THE GENERATOR (A SINGLE SEED AT STARTUP WOULD SUFFICE)

for(k=0;k<3;k++)

{

temp_particle[z][k] = rand() % 11;// Move one particle at a time randomly

// printf(" %f\t",temp_particle[z][k]);

}

if(p>1 & Recv_processor!=0)

MPI_Send(&temp_particle[z][0],3, MPI_DOUBLE, Recv_processor, 1, MPI_COMM_WORLD);

}

if(p>1 & Recv_processor!=0 & my_rank==Recv_processor)

{

MPI_Recv(&temp_particle[change_particle][0],3, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);

}

// COMPUTE THE POTENTIAL AS THE NEW COST FUNCTION.

temp_potential[z] = compute_potentials();

if(my_rank==0)

{

for(k=0;k<3;k++)

{

newcod_particle[z][k] = temp_particle[z][k];

temp_particle[z][k] = particle[z][k];

}

}

if(p>1 & Recv_processor!=0 & my_rank==Recv_processor)

{

{

for(k=0;k<3;k++)

{

newcod_particle[change_particle][k] = temp_particle[change_particle][k];

temp_particle[change_particle][k] = particle[change_particle][k];

}

}

}

// COMPARISON OF THE COST FUNCTION AND ACCEPTANCE OF THE NEW STATE

if(my_rank==0)

{

change = 0;

move_partnum = 0;

newcost = temp_potential[0];

for(i=0;i<Num_particles;i++)

{

if(temp_potential[i] < newcost)

{

newcost = temp_potential[i];

move_partnum = i;

}

}

//CALCULATE THE DIFFERENCE BETWEEN THE COST FUNCTIONS

delta_cost = newcost - oldcost;

if(delta_cost < 0)

{

oldcost = newcost;

change = 1;

}

else

{

m = (double)(rand())/RAND_MAX;// UNIFORM RANDOM NUMBER IN [0, 1]

// printf("\nvalue of m is %f",m);

n = delta_cost/Temperature;

y = 1/exp(n);// ACCEPTANCE PROBABILITY exp(-delta_cost/Temperature) FOR UPHILL MOVES

if(m<y)

{

oldcost = newcost;// MAKE THE MOVE PERMANENT

change = 1;

}

else change = 0;

}

if(oldcost < lowestcost)

if(change == 1)

{

for(k=0;k<3;k++)

particle[move_partnum][k] = newcod_particle[move_partnum][k];

//printf("\n The move_partnum = %d",move_partnum);

}

target_processor = move_partnum/part_per_proc;

target_particle = move_partnum % part_per_proc;

}

// ALL RANKS PARTICIPATE IN THE BROADCASTS OF THE ACCEPTED MOVE
MPI_Bcast(&change,1, MPI_INT,0, MPI_COMM_WORLD);
MPI_Bcast(&target_processor,1, MPI_INT,0, MPI_COMM_WORLD);
MPI_Bcast(&target_particle,1, MPI_INT,0, MPI_COMM_WORLD);

// THE RANK THAT OWNS THE MOVED PARTICLE APPLIES THE ACCEPTED MOVE
if(change==1 && my_rank==target_processor)
{
for(k=0;k<3;k++)
{
particle[target_particle][k] = newcod_particle[target_particle][k];
temp_particle[target_particle][k] = particle[target_particle][k];
}
}

}// END OF THE LOOP OVER PARTICLES z

INNER_LOOP--;

}// END OF INNER_LOOP

Temperature = alpha*Temperature;// COOLING SCHEDULE

}// END OF OUTER LOOP

printf("\n Lowest Cost = %f ", lowestcost);

endtime = MPI_Wtime();//stopping the time

time_taken=endtime-starttime;//computing time taken

// TIMING CALCULATIONS

for(j=0;j<p;j++)

{

if(my_rank==j)

MPI_Send(&time_taken,1,MPI_DOUBLE,0,j*3,MPI_COMM_WORLD);

}

if(my_rank==0)

{

totaltime=time_taken;

for(j=1;j<p;j++)

{

MPI_Recv(&timeelapsed,1,MPI_DOUBLE,j,j*3,MPI_COMM_WORLD,&status);

totaltime+=timeelapsed;

}

printf("\n\n Time taken for execution of the program is %f seconds\n",totaltime/p);

}

MPI_Finalize();

}//End of Main

/*************************************************************************************************************/

// COMPUTING THE POTENTIALS ON EACH PROCESSOR IN PARALLEL

double compute_potentials()

{

potential = 0.0;

// CALCULATING THE LOCAL POTENTIALS

for(i=0;i<part_per_proc;i++)

{

for(j=0;j<part_per_proc;j++)

{

// r HOLDS THE SQUARED DISTANCE, SO 1/r^6 - 2/r^3 EQUALS 1/d^12 - 2/d^6 IN THE DISTANCE d (EACH PAIR IS COUNTED TWICE, WHICH ONLY SCALES THE ENERGY)
r = pow(temp_particle[i][0] - temp_particle[j][0],2)+ pow(temp_particle[i][1] - temp_particle[j][1],2) + pow(temp_particle[i][2] - temp_particle[j][2],2);

if(r!=0)// SKIP THE SELF-PAIR i == j

potential+= (1.0/pow(r,6)) - (2.0/pow(r,3));

}

}

if(p>1)

{

//COPY THE POSITIONS OF THE LOCAL PARTICLES TO ROTATING PARTICLES

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

Rotating[i][k] = temp_particle[i][k];

//The following steps are repeated p - 1 times:

for(Rotate=0;Rotate<p-1;Rotate++)

{

// Copy traveling particles to send buffer

for(i=0;i<part_per_proc;i++)

for(k=0;k<3;k++)

sendbuffer[i][k] = Rotating[i][k];

// Interleave send/receive to prevent deadlock

if(my_rank%2 == 0)

{

// Even processes send/receive

// Send traveling particles to next process

MPI_Send(&sendbuffer[0], 3*part_per_proc, MPI_DOUBLE, nextprocess,0, MPI_COMM_WORLD);

MPI_Recv(&Rotating[0], 3*part_per_proc, MPI_DOUBLE, previousprocess, 0, MPI_COMM_WORLD, &status);

}

else

{

MPI_Recv(&Rotating[0], 3*part_per_proc, MPI_DOUBLE, previousprocess, 0, MPI_COMM_WORLD, &status);

MPI_Send(&sendbuffer[0], 3*part_per_proc, MPI_DOUBLE, nextprocess, 0, MPI_COMM_WORLD);

}

//CALCULATING THE CONTRIBUTIONS TO LOCAL FORCES FROM ROTATING PARTICLES

for(i=0;i<part_per_proc;i++)

{

for(j=0;j<part_per_proc;j++)

{

r = pow(temp_particle[i][0] - Rotating[j][0],2)+ pow(temp_particle[i][1] - Rotating[j][1],2) + pow(temp_particle[i][2] - Rotating[j][2],2);

if(r!=0)

potential+= (1.0/pow(r,6)) - (2.0/pow(r,3));

}

}

}

// SENDING THE POTENTIAL VALUES FOR EACH PROCESSOR TO NODE 0

if(my_rank!=0)

MPI_Send(&potential, 1, MPI_DOUBLE, 0, my_rank*10, MPI_COMM_WORLD);

if(my_rank==0)

{

for(j=1;j<p;j++)

{

MPI_Recv(&temp, 1, MPI_DOUBLE, j, j*10, MPI_COMM_WORLD, &status);

potential+=temp;

}

}

}

return potential;

}

/*************************************************************************************************************/
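For reference, the program can be built and launched with the standard MPI tool chain; the source-file and executable names below are placeholders:

mpicc -O2 -o lj_sa project2.c -lm
mpiexec -n 8 ./lj_sa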

RESULTS AND ANALYSIS:

Timings Table:

Time required by P processors (in seconds):

       | P = 1   | P = 2  | P = 4  | P = 8
N = 16 | 39.169  | 27.598 | 70.751 | 348.064
N = 64 | 2320.18 | 971.05 | 775.24 | 672.29

Speedup curve table:

       | P = 1 | P = 2 | P = 4 | P = 8
N = 16 | 1     | 1.419 | 0.554 | 0.1125
N = 64 | 1     | 2.389 | 2.993 | 3.451
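These values follow directly from S(P) = T(1) / T(P); for example, for N = 64 and P = 2, S = 2320.18 / 971.05 ≈ 2.389.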

[Figure: speedup curve for N = 16 particles]

[Figure: speedup curve for N = 64 particles]

Analysis of results:

The speedup curves for 16 and 64 particles, plotted above for 1, 2, 4, and 8 processors, show two distinct regimes:

- For 16 particles, a speedup is obtained with 2 processors, but the speedup deteriorates as more processors are added. For more than 2 processors the message-passing time dominates the computation time, so additional processors hurt rather than help.

- For 64 particles, significant speedup is obtained throughout. With more particles the computation dominates the communication, and the larger workload is spread effectively over more processors.

The main overhead incurred by the parallel algorithm relative to the serial one is therefore the communication of O(N) data values (the coordinates of the traveling particles) p - 1 times over the interconnection network. The wall time of a parallel run is thus expected to behave as

(N^2 / p) * Tcomp + ((p - 1) * (N / p)) * Tcomm

where Tcomp is the time required to process one pair of particles, and Tcomm is the time required to move a rotating particle from one process to another.

Usually, Tcomm>Tcomp, so for small N a serial program will run much faster than a parallel program.

However, for a given number p of processes and large enough N, the computational time will dominate and the parallel program will run almost p times faster than the serial program.
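Plugging the measured cases into this model makes the contrast concrete: for N = 16 and p = 8, each evaluation costs N^2/p = 32 pair computations against (p - 1)(N/p) = 14 particle transfers, while for N = 64 and p = 8 the balance is 512 pair computations against 56 transfers. The computation-to-communication ratio thus grows from about 2.3 to about 9.1 pair evaluations per transfer, consistent with the observed speedups.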
