Final “Project”
Grid Computing, Spring 2007
This “project” provides experience in distributed systems using Condor.
- Implement the 128-bit code cracker on a single machine in the lab, and time how long it takes to complete this run.
- Submit this same job to Condor, but only request one machine. Time this run. How much difference is there between these runs? What do you attribute this difference to (assuming there is one)?
- Rewrite your program to function in a master-worker paradigm, and submit this program to the Condor pool in the lab, using all four worker machines (and all 8 processors). Time this run. How well did the program scale (that is, what percentage speedup did you obtain)? Was this more or less than you expected? (For more information on this see below.)
- If the runtimes are quite excessive, let me know.
- We will add more nodes to the pool in the next couple of days. Once they are added, first request ¾ of the available nodes and time the run. Then request all available nodes and time the run. How well does the code scale?
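The timing questions above all reduce to comparing wall-clock times. The following is a minimal sketch of one common way to report scaling, not a required method. The names `time_run` and `scaling_report` are hypothetical, and the definitions used (speedup as serial time divided by parallel time, efficiency as speedup divided by processor count) are one convention among several, so state whichever convention you use in your report.

```python
import time


def time_run(fn, *args):
    """Call fn(*args) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def scaling_report(t_serial, t_parallel, n_procs):
    """Summarize scaling from two measured wall-clock times (in seconds).

    Assumed convention: speedup = t_serial / t_parallel and
    efficiency = speedup / n_procs, both reported alongside the
    percentage of run time saved.
    """
    speedup = t_serial / t_parallel
    return {
        "speedup": speedup,
        "efficiency_pct": 100.0 * speedup / n_procs,
        "time_saved_pct": 100.0 * (t_serial - t_parallel) / t_serial,
    }
```

For a job submitted to Condor, the elapsed time would more likely come from the job log or from timestamps written by the program itself; `time_run` only covers the case of timing a function call on a single machine.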
Project Details
These will be individual rather than group projects, although you are welcome to talk among yourselves and help one another. However, each individual is responsible for his own project submission. This project is experimental and analytical in nature. Thus a final project report is due that addresses each of the five items listed above. This report should contain your experimental results and analysis, as well as your application code. The report should be worthy of graduate-level work. You may either write the master-worker code yourself (from scratch), or use the MW middleware. This “project” is due Friday, May 4th, and, unless otherwise notified, the final exam will consist of making a presentation of your work. Even though the report is due on Friday, you may take up until the time of the final exam to complete the work without penalty.
Homework
Develop a program that executes on Condor and uses all eight CPUs.
Master-Worker: There are three ways to implement the master-worker paradigm. First, you can use sockets to communicate between the master and the workers. Second, you can use MW. Third, your master node can be outside of the Condor pool (but executing on the submit node), and control the execution of the application by using information contained in the workers' output files. For example, once one worker has found a factor, all workers should be terminated and provided with a different number.
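The third approach can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not a prescribed design: the function names, the `worker_*.out` file naming, and the `FACTOR <n>` line format are all hypothetical choices that your own workers would have to follow. The master splits the search range into one chunk per worker, then polls the workers' output files until one of them reports a factor.

```python
import glob
import os


def split_range(start, stop, n_workers):
    """Split [start, stop) into n_workers contiguous, nearly equal chunks."""
    base, extra = divmod(stop - start, n_workers)
    chunks = []
    lo = start
    for i in range(n_workers):
        # The first `extra` chunks take one additional candidate each.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def find_reported_factor(output_dir):
    """Scan worker output files for a factor reported by any worker.

    Assumes (hypothetically) that each worker writes to a file named
    worker_<id>.out and prints a line 'FACTOR <n>' when it succeeds.
    Returns the factor as an int, or None if no worker has reported yet.
    """
    for path in sorted(glob.glob(os.path.join(output_dir, "worker_*.out"))):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[0] == "FACTOR":
                    return int(parts[1])
    return None
```

A master loop built on this would call `find_reported_factor` periodically; once it returns a value, the master would remove the remaining jobs (for example with `condor_rm`) and submit a new batch for the next number. Python integers are arbitrary precision, so 128-bit bounds pass through `split_range` unchanged.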