
Beowulf Clusters

By Matthew Doney

Computer Science

University of Wisconsin – Platteville

Abstract

Technological advances over the years have drastically changed the computer industry. High-performance computing (HPC) was once restricted to large institutions (universities, governments, large companies) that could afford expensive supercomputers. With these technological advancements, a new style of supercomputer began to take shape: the cluster computer. With the advent of cluster computing, and of Beowulf clusters in particular, more users can now afford higher-performance computing than was previously available. By using COTS (Commercial Off The Shelf) components, such as desktop PCs and network switches, a Beowulf cluster with performance comparable to that of a traditional supercomputer can be constructed for a fraction of the price.

Introduction

Since the early 1990s, one of the most important developments in high-performance computing has been the cluster computer. A cluster computer is formed by connecting several computers together to accomplish a desired parallel computing task. While there are several methods of connecting computers to construct a cluster, this paper concentrates on only one: the Beowulf cluster. A Beowulf cluster is built from COTS components and commercially available software to create a high-performance machine. To appreciate why the Beowulf cluster is such an important development, it helps to understand the technological and software developments that led to the advent of cluster computing.

History of Cluster Computing

Since the introduction of the computer, there has always been a desire to accomplish more in a shorter amount of time. This led to the creation of supercomputers in the 1960s, the first of which is generally said to be the CDC 6600, created by Seymour Cray [1]. Cray is considered by many to be the father of supercomputing. He eventually left CDC in the early 1970s to form Cray Research Inc., a name that has become synonymous with supercomputers over the years.

Many of these early supercomputers relied on vector processors to increase performance. A vector processor implements an instruction set containing operations that apply the same instruction to an entire array of data, a model known today as SIMD (Single Instruction, Multiple Data). These supercomputers usually had very few such processors, typically only one. One of the most successful of these early vector supercomputers was the Cray-1, shown in Fig. 1.

Figure 1: Cray-1 Supercomputer [2]
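To make the SIMD idea concrete, the following sketch uses C with the SSE intrinsics from xmmintrin.h, a modern descendant of this style of processing rather than the Cray-1's actual instruction set; it is an illustrative assumption, not a reconstruction of early vector code. A single vector instruction adds four pairs of floats at once:

    /* One SIMD instruction operates on four floats at once, where a
     * scalar loop would need four separate additions. */
    #include <stdio.h>
    #include <xmmintrin.h>  /* SSE intrinsics */

    int main(void)
    {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);      /* load four floats */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);   /* one instruction, four adds */
        _mm_storeu_ps(c, vc);

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
        return 0;
    }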

The trend of supercomputers built around vector processors continued until the 1990s, when the commercial PC market matured. This brought about the availability of commodity hardware that could be easily acquired at a much higher performance/cost ratio, which in turn led to greatly increased interest in computer clusters as a means of constructing the next generation of supercomputers and further pushing the high-performance computing envelope.

Early Cluster Computing Development

To better understand Beowulf clusters, it is necessary to review the history of computer clusters. One of the first clusters was SAGE, built by IBM in the 1950s and based on MIT's Whirlwind architecture. It was built under an Air Force contract for NORAD to provide early-warning detection for North American airspace [4].

The 1970s brought many technological advances that would lead to cluster computing. The first was VLSI (Very Large Scale Integration), which led to the first integrated microprocessors and eventually to commercially available PCs. Another was the development of Ethernet, the first widely available local area network technology, enabling high-speed data transfer between local computers. A third crucial development from this decade was the creation of the UNIX operating system at AT&T's Bell Labs. The combination of these developments set the stage for increased interest in cluster computing [4].

The 1980s brought further development in the area of cluster computing. With the technological advancements of the 1970s came several experimental computer clusters. One example came from the NSA, which connected 160 Apollo workstations to complete various computational tasks. Another hardware advancement came from DEC with the introduction of VAXcluster, which initially consisted of interconnected VAX 11/750 computers and became the first commercially available cluster computing product. In addition to the hardware achievements of the 1980s, several advancements were also made on the software front. One of the most notable was the Condor software package for job scheduling on computer clusters [5]. Another software development from this era was the creation of the PVM (Parallel Virtual Machine) library of linkable functions. The PVM library allowed processes running on separate computers to exchange data, in effect creating a single distributed parallel processor [4].
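To give a flavor of the programming style PVM enabled, the following minimal C sketch assumes the standard PVM 3 interface (pvm3.h) and a hypothetical executable name, pvm_demo. A parent task spawns one child; the child packs an array of integers and sends it back:

    /* Hedged sketch of PVM-style message passing, assuming the
     * standard PVM 3 API. The same binary acts as parent or child
     * depending on whether it was spawned. */
    #include <stdio.h>
    #include <pvm3.h>

    int main(void)
    {
        int data[4] = {1, 2, 3, 4};
        pvm_mytid();                       /* enroll in the virtual machine */
        int parent = pvm_parent();         /* tid of spawning task, if any */

        if (parent != PvmNoParent) {
            /* Child: pack the array into a send buffer and ship it home. */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(data, 4, 1);
            pvm_send(parent, 99);          /* message tag 99 */
        } else {
            /* Parent: spawn one child task and wait for its message. */
            int child;
            pvm_spawn("pvm_demo", NULL, PvmTaskDefault, "", 1, &child);
            pvm_recv(child, 99);
            pvm_upkint(data, 4, 1);
            printf("received %d %d %d %d\n",
                   data[0], data[1], data[2], data[3]);
        }
        pvm_exit();                        /* leave the virtual machine */
        return 0;
    }

The explicit pack/send and receive/unpack pairing shown here is the essence of the "single distributed parallel processor" that PVM created.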

Cluster Computing in the 1990s

The 1990s saw a major increase in the number of projects experimenting with computing clusters. In 1993, UC Berkeley started the NOW (Network of Workstations) project to research the development of computing clusters, which led to the first cluster to appear on the TOP500 list of the world's fastest supercomputers. Another project, at NASA's Lewis Research Center, used a cluster of IBM workstations to simulate the behavior of jet aircraft engines [4].

In 1994, the Beowulf project was started at NASA's Goddard Space Flight Center by Thomas Sterling and Donald Becker, with the goal of delivering a complete working solution, not just a research test-bed as many projects up to that time had been. The system was developed using the Linux operating system and the PVM library for message passing [4]. It used a cluster of 16 PCs with the following specifications [4][6]:

-  100 MHz 80486-class processors

-  16 MB of DRAM

-  500 MB hard drive

-  Two 10 Mbps Ethernet controllers

An important note about this system is that it was the first to use no custom components; everything was fully COTS hardware.

The year 1994 also saw the introduction of the MPI (Message Passing Interface) standard. The MPI standard was developed by the MPI Forum, consisting of representatives from industry, government labs, and universities, with the goal of creating a powerful and flexible specification to handle the complex nature of message passing between processes. Over time it became a widely accepted standard and is now used by many of the TOP500 supercomputers [3][4].
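To give a sense of what the standard specifies, the following minimal sketch uses the standard MPI C bindings; the compiler wrapper (commonly mpicc) and launcher invocation vary by implementation. Rank 0 sends a single integer to rank 1:

    /* Minimal MPI sketch: rank 0 sends an integer to rank 1,
     * which prints it. Run with at least two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Send one int to rank 1 with message tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receive one int from rank 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }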

By the end of the 1990s, several significant milestones in cluster computing had been achieved. In 1996, Beowulf systems constructed by Los Alamos National Lab, the California Institute of Technology, and NASA's Jet Propulsion Lab were each able to reach 1 GFLOPS (giga floating-point operations per second) of performance for under $50,000 [4]. In 1997, several Beowulf systems were implemented with performance of 10 GFLOPS or greater, and by the end of the decade 28 clusters were on the TOP500 list. This trend of increasing popularity has continued; as of June 2013, 417 of the top 500 supercomputers are based on cluster configurations [7]. Fig. 2 below shows the increase in the popularity of cluster supercomputers over the years, as tabulated from the TOP500 list.

Figure 2: Percentage of cluster-based computers (light blue) on the TOP500 list.

The Beowulf Cluster

As described above, a Beowulf cluster is a form of cluster computing that uses COTS (commercial off-the-shelf) components to construct a high-performance computer. Since COTS components are used, the end user can configure the system to their liking, making it well suited to the task at hand. While Beowulf clusters have gained popularity due to their high performance/cost ratio, several other types of computer clusters exist. To understand what makes a cluster a Beowulf cluster, it is important to understand these other types.

Distributed Computing

One type of computer cluster is the distributed computing cluster. This form connects widely separated computers over the Internet, typically linking them to a central host that schedules tasks for the cluster. Because the distances between the computers are great, the communication rate between them is severely limited. As a result, this type of cluster is typically used by volunteers who install a program on their machines that uses spare CPU cycles to perform computational tasks for a host project such as SETI@home or GIMPS [4].

Workstation Clusters

In contrast to distributed computing, the individual machines of a workstation cluster are typically located in the same building and connected to a LAN (local area network). Because the computers are geographically closer, faster communication technology can be used and the user has greater control over the cluster. Workstation clusters share with distributed computing the advantage that the end user does not have to purchase and construct a new cluster computer, but can instead use available computing resources to perform the desired calculations. Both cluster types, however, suffer from the same downfall: because of their limited communication rates, the problem being solved must be highly parallel and require little communication between individual nodes in order to be solved effectively [4].

Beowulf Clusters

Beowulf clusters avoid most of the downfalls of their more distributed cousins by placing all of the computers in the same location, typically the same room. By arranging the computers in this fashion, the end user gains numerous advantages over other forms of cluster computing and over traditional supercomputers.

Advantages

Beowulf clusters offer several advantages over supercomputers of the past that have made them highly desirable for many users. The main advantage of a Beowulf cluster can be seen in its definition: it uses commercial off-the-shelf components. Using COTS components greatly reduces cost to the end user through the economies of scale that come from purchasing parts used for more than supercomputing alone. COTS components also make the cluster's capability scalable. Since commercial components are available in a wide variety of sizes and packages, the user can configure the system with the amount of memory, storage capacity, and processing power needed to complete the desired task. As new components and technologies become available, the user can also upgrade the system as they see fit, without needing to replace it entirely [4].

Another advantage of Beowulf clusters over older generations of supercomputers is the standardization they have brought about. With the supercomputers of the 1960s and 1970s, each generation brought significant changes to the operating system, instruction sets, and communication protocols used. This often left users needing to rewrite large portions of their code every time they upgraded their computers, incurring additional costs. With the advent of cluster computing, manufacturers and users realized that the high-performance computing world needed a set of standards. One of the most notable of these standards was MPI, for message passing between processes.

Disadvantages

While Beowulf clusters have many advantages, they are not a cure-all for the world of high-performance computing. To use a Beowulf cluster efficiently, programs must be written in a highly parallel manner. Other disadvantages relate to the memory architecture of a Beowulf cluster. Since each node contains its own memory, any shared program data must be exchanged over the network interconnect. If the task requires a large amount of data to be passed back and forth, this can quickly saturate the communication channels. Another disadvantage of distributed program memory is that once all of the nodes complete their assigned processing, the results must be sent back to a master node, which then combines them. If combining the results requires a large amount of work, it can negate any benefit gained from distributing the computation in the first place. These disadvantages are not unique to Beowulf clusters but apply to computing clusters in general, and they must be considered when choosing computational resources for a specific project.

Beowulf Architecture

In general, a Beowulf cluster consists of individual computer nodes and the interconnect between them. Typically there are two types of nodes: the master and the slaves. The master node is responsible for scheduling jobs and for monitoring and controlling the resources of the cluster. The slave, or compute, nodes receive tasks from the master node and process their assigned work in conjunction with the other compute nodes to accomplish the overall task [4]. Fig. 3 shows the organization of a simple Beowulf cluster.

Figure 3: Diagram of a simple Beowulf cluster [9]
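As a concrete, if simplified, illustration of this division of labor, the sketch below again uses standard MPI in C; summing a range of integers stands in for a real compute task. Every rank computes a partial result over its own slice of the problem, and MPI_Reduce combines the partial results at the master (rank 0):

    /* Master/worker sketch: each rank sums its own slice of the
     * integers 1..1000000, and MPI_Reduce combines the partial
     * sums at the master (rank 0). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        long local = 0, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank takes an interleaved slice of the range; this
         * stands in for the "assigned task" from the master. */
        for (long i = 1 + rank; i <= 1000000; i += size)
            local += i;

        /* Combine the partial sums at the master node. */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %ld\n", total);

        MPI_Finalize();
        return 0;
    }

Note that the only inter-node traffic here is one long integer per node; as discussed under disadvantages, a task requiring bulkier exchanges would stress the interconnect far more.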

Node Hardware and Software

Each node of a Beowulf cluster consists of a typical commercially available PC with its own operating system, memory, storage, and processor. By using commercially available components, the user can choose from a large number of specifications to tailor the system to their individual needs. Individual nodes can range from traditional desktop PCs and laptops, to highly available blade servers, to more exotic systems built from PS3s, case-less motherboards, and even Raspberry Pi single-board computers.

Since each node of a Beowulf cluster is its own computing entity, it contains its own operating system, which can be either Windows- or Linux-based; examples of both are widely available. In addition to the operating system, a resource management software package must be installed on all of the nodes so that each node can receive jobs from the master node. Several packages are available that can perform these tasks, including Condor, PBS (Portable Batch System), and Maui.

Condor

Condor was one of the first widely available job schedulers for computing clusters. It was developed at the University of Wisconsin – Madison and has been continuously improved over the years to include a wide variety of features. One of the most notable features of the Condor package is distributed job submission, which allows remote users to submit a job to the cluster from their own PCs. Other features include job and user priority settings, which schedule compute jobs based on priority levels set by users and cluster administrators. Condor also includes features to suspend and restart a job, so that it can be installed on any computer and run only when that computer is not otherwise in use, making it an effective tool for grid computing projects [4].
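To give a sense of how work is handed to Condor, the sketch below shows a minimal submit description file of the classic form; the executable and file names are hypothetical. The file would be handed to the scheduler with the condor_submit command:

    # Minimal Condor submit description file (hypothetical job).
    universe   = vanilla
    executable = my_simulation
    arguments  = input.dat
    output     = job.out
    error      = job.err
    log        = job.log
    queue

The queue statement at the end tells the scheduler how many instances of the job to enqueue; a bare queue submits one.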