Grid Performance Evaluation and Modelling
The Grid is a highly heterogeneous environment that can potentially provide seamless, fast and efficient access to a range of resources that are distributed over a wide area. At the moment there are no commonly accepted ways to systematically measure and understand the types of metric that can make Grid performance evaluation and modelling an engineering discipline, rather than the ad hoc exercise it currently is. After reviewing some ongoing efforts, we first break down the Grid into manageable components. Then, for each constituent component, we describe its characteristics and define metrics that can be used for understanding its performance.
Performance evaluation and modelling of computer-based systems has always been, to say the least, a contentious and problematic exercise. A tension often arises from the varying viewpoints of the stakeholders. For example, application scientists are normally only interested in the fast and reliable execution of their application. The systems operators, by contrast, typically desire a system that is easy to configure, manage, and maintain. The vendor, in turn, often wants to highlight the good aspects of their machine and minimise any bad facets. Other stakeholders, such as a funding body, may have other criteria, such as cost or the reliability of the vendor. Consequently, attempts to create “standard” performance measurements and methodologies to date have only been partially successful. It should be noted here that these attempts have only been made on sequential or homogeneous parallel systems. The evaluation and modelling of the Grid will introduce a raft of new concerns and issues, as well as providing an effective way to address the difficult issues around the functioning of grid applications and middleware.
1.1 The Necessity of Performance Evaluation and Modelling: Motivation
In the past, performance evaluation and modelling of computer systems was performed, in the main, for one of three reasons:
- To help purchase the best system to execute a suite of applications,
- To understand architectural concerns and look at ways of enhancing future systems,
- To help optimise an application's performance, based on knowledge of how the application should execute on that architecture.
The evaluation and modelling of the Grid changes, to some extent, our fundamental reasons for undertaking this task, as we are no longer looking at a single system but rather at potentially large collections of resources, both hardware and software, working together to provide services to an application. In addition, we are no longer considering a quiet and controlled system in which to do our performance evaluation; now we are forced to cope with a wide-area distributed system, where not only may we lack exclusive control over the components, but the components may also fail or become a bottleneck while the evaluation tests are being carried out.
2.1 The GGF Grid Benchmark Research Group
The GB-RG plans to advance the efficient use of grids by defining metrics to measure the performance of grid applications and architectures and to rate the functionality and efficiency of grid architectures. These metrics should assist good engineering practices by allowing alternative implementations to be compared quantitatively. The defined tasks will be specified as paper-and-pencil benchmarks that can be implemented, in principle, using any of the existing and future grid environments. The GB-RG also aims to provide some reference implementations of the tasks that can be used by grid users and developers as starting points for assessing grid implementations. It appears that this research group has made little real progress since its inception.
2.2 IETF Benchmarking Methodology Working Group (BMWG)
The goal of the BMWG is to make a series of recommendations concerning the measurement of the performance characteristics of various internetworking technologies; further, these recommendations may focus on the systems or services that are built from these technologies. The BMWG is focussed primarily on Internet networking technologies and is not, at the moment, addressing Grid technologies.
2.3 Grid Assessment Probes (GRASP)
As a means of attempting to provide an insight into the stability, robustness, and performance of the Grid, the GRASP project has developed a set of probes that exercise basic grid operations by simulating simple grid applications. The probes can be run on a grid testbed, collecting performance data such as compute times, network transfer times, and middleware overheads. The GRASP system is currently designed to use only Globus infrastructure and tests the following activities:
- Check for a valid grid proxy,
- Perform a basic authentication to all nodes involved in the probe,
- Validate the configuration file, if necessary,
- Check directory sizes to ensure that target directories can accommodate files to be transferred,
- Query the Globus MDS to find available information on all nodes involved in the probe.
The GRASP project is at an early stage of development; the developers aim to produce a suite of representative Grid application benchmarks to test and evaluate Grid environments.
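The kind of data a probe collects (compute time, transfer time, and residual overhead) can be illustrated with a minimal sketch. This is not the GRASP implementation and uses no Globus calls; the function `run_probe` and the two stand-in phases are hypothetical names introduced here purely for illustration, and the "overhead" figure is simply the end-to-end time not accounted for by the named phases.

```python
import time
from typing import Callable, Dict


def run_probe(phases: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each named phase once and return wall-clock seconds per phase.

    An extra "overhead" entry holds the end-to-end time that is not
    attributed to any named phase (hypothetical convention for this sketch).
    """
    timings: Dict[str, float] = {}
    probe_start = time.perf_counter()
    for name, action in phases.items():
        phase_start = time.perf_counter()
        action()
        timings[name] = time.perf_counter() - phase_start
    end_to_end = time.perf_counter() - probe_start
    # Clamp at zero to guard against floating-point rounding.
    timings["overhead"] = max(0.0, end_to_end - sum(timings.values()))
    return timings


def fake_compute() -> None:
    """Stand-in for a compute phase (assumption: not a real grid job)."""
    sum(i * i for i in range(100_000))


def fake_transfer() -> None:
    """Stand-in for a transfer phase: just allocates a 1 MB buffer."""
    bytes(1_000_000)


if __name__ == "__main__":
    results = run_probe({"compute": fake_compute, "transfer": fake_transfer})
    for name, seconds in results.items():
        print(f"{name:>10}: {seconds * 1e3:.3f} ms")
```

In a real probe the stand-in phases would be replaced by actual job submission and file transfer steps, and the residual would then reflect middleware and scheduling costs rather than loop overhead.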
2.4 GridBench
GridBench is a work package of the CrossGrid project. The goal of GridBench is to propose a set of performance metrics to describe the performance capacity of the Grid and its applications. The work package aims to develop and implement GridBench, a suite of benchmarks that are representative of typical Grid workloads. The benchmarks will be used to estimate the performance of different Grid configurations, to identify factors that affect end-to-end application performance, and to provide application developers with initial estimates of expected application performance.
2.5 Integrated Performance Analysis of Computer Systems (IPACS)
The IPACS Project is funded by a German 'High-Performance-Computing' programme. Within this project, grid benchmarks and the technologies for grid-adaptive applications will be developed. The grid benchmarks will be used to analyse and to parameterise the grid environment in the first instance. These benchmarks will be the basis for the further development and optimisation of grid products and applications. IPACS started in the summer of 2002; no results or downloads are available yet.
Most of the efforts described above are adapting existing techniques for benchmarking from various disciplines to the Grid, without taking into account the unique features of the Grid. While this is understandable, since it makes sense to leverage existing knowledge, it is unclear whether such benchmarks can provide answers to the questions that are posed by grid applications and grid systems. In the following sections, we will briefly review the state of the art in computer system benchmarking (Section 3), the architecture of the Grid, particular features that are unique to the Grid (Section 4), and some recommendations for grid benchmarking and performance modelling (Section 5).
3 A Historical View of Performance Evaluation and Modelling
A variety of benchmarks have been used as the means of assessing the performance of computer architectures. Typically, the benchmarks can be classified into three categories:
a) Low-level: these determine the rates at which a machine can perform fundamental operations, reported as figures such as MIPS, flop/s, and memory I/O rates.
b) Application kernels: these are typical core application algorithms, such as matrix operations or an FFT.
c) Full applications: these have all the major components of full applications, such as the NAS Parallel Benchmarks, which are based on computational fluid dynamics applications, or the GAMES code.
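As an illustration of category (a), a low-level benchmark can be as small as a timed loop of multiply-add operations. The sketch below is our own minimal example, not one of the established benchmark suites; the function name `measure_flops` is hypothetical. Because it runs in an interpreter, the figure it reports is dominated by interpreter overhead and says little about the hardware peak, which is itself a reminder of how sensitive low-level figures are to the measurement setup.

```python
import time


def measure_flops(n: int = 1_000_000) -> float:
    """Estimate floating-point operations per second from a multiply-add loop.

    Each iteration performs one multiplication and one addition, so the
    loop is counted as 2 * n floating-point operations.
    """
    x = 0.0
    a, b = 1.000001, 0.5
    start = time.perf_counter()
    for _ in range(n):
        x = x * a + b  # one multiply and one add
    elapsed = time.perf_counter() - start
    return 2 * n / elapsed


if __name__ == "__main__":
    print(f"approx. {measure_flops() / 1e6:.1f} Mflop/s (interpreter-bound)")
```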
4 The Architecture of the Grid
4.1 Characteristics and Components
One simple way to look at the Grid is as a collection of computing resources connected by network links. In this model, the computing resources are endpoints in a graph representing the connections between the resources; the network connecting them makes these separate resources into a grid.
There are two major types of grid:
a) An intra-organizational grid; this is a grid within a single administrative domain. An example is a tool to make use of the desktops within a single organization.
b) An inter-organizational grid; this is a grid that crosses administrative domains. In addition to the issues faced by software for an intra-organizational grid, software for an inter-organizational grid must deal with the complex issues of authentication, trust, and security.
The Grid differs from classical computer systems in several respects:
a) Grids are typically physically large, often spanning thousands of kilometres. This implies a speed-of-light delay of milliseconds (10^-3 seconds), compared with the clock cycle time of a typical personal computer, which is now measured in fractions of nanoseconds (10^-9 seconds). This leads to a different application mix than is appropriate for a single parallel computer or computing resource.
b) Resources of a grid are shared. Even if the computational services are dedicated, the network connecting them is typically shared with many users (if it is the Internet, with millions of users). Because resources are shared, the sort of experimental reproducibility based on the use of dedicated resources, so common in computer system measurement, is rarely feasible or realistic in the Grid.
c) Grid resources are heterogeneous. While it is possible to build a grid out of homogeneous components (e.g., the same computing platform at all endpoints), this is uncommon.
d) Faults are a fact of life on the Grid. Grid software and algorithms must deal with faults; this differs greatly from the single-system case (at least in most areas of technical computing), where software and algorithms are designed under the assumption that faults are exceedingly rare. Benchmarks, particularly for grid usability and productivity, must be sensitive to how the system responds to faults.
e) Grid applications need to operate securely on distributed resources using various security measures (such as firewalls or technologies such as Kerberos). The extra security needed in a grid potentially has an impact on an application's performance that would not affect an application on a traditional single-site platform. It is important that we understand the impact that different levels of security have on grid applications.
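The orders of magnitude quoted in item (a) can be checked with a short calculation. The sketch below is illustrative only: the 3000 km distance and 3 GHz clock rate are example values chosen by us, and the velocity factor of roughly two thirds for optical fibre is an approximate assumption. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum, exact by SI definition


def one_way_delay_s(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Lower bound on one-way propagation delay over distance_km.

    velocity_factor scales the speed of light for the medium
    (1.0 for vacuum; about 0.67 is an approximate figure for fibre).
    """
    return distance_km * 1000.0 / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)


def cycles_during(delay_s: float, clock_hz: float) -> float:
    """Number of processor clock cycles that elapse during delay_s."""
    return delay_s * clock_hz


if __name__ == "__main__":
    distance_km = 3000.0  # example value, not from a measured grid
    clock_hz = 3.0e9      # example 3 GHz clock (assumption)
    for label, factor in (("vacuum", 1.0), ("fibre (approx.)", 0.67)):
        delay = one_way_delay_s(distance_km, factor)
        print(f"{label}: {delay * 1e3:.1f} ms one way, "
              f"{cycles_during(delay, clock_hz):.2e} clock cycles")
```

Even under the most favourable assumption (vacuum, no routing or queuing), a single one-way message across such a grid costs tens of millions of processor cycles, which is why tightly coupled applications are a poor fit.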
A grid consists of heterogeneous hardware and software components linked together over, potentially, a wide area via the Internet. We can categorise a grid as consisting of hardware and software services… something generic here as to why we want this!
Can’t say much about this side. Basically stuck with what’s out there. May be worth thinking about the means of characterising these statistically at a later stage…
Software layers and their effect… e.g. GT + WS == GS, SOAP engines,…
- Client – client side processing used to initiate some action!
- Information Services – registration, lookup, update, and remove services and associated information!
- Security – authorisation, authentication, assertion…
- Communications – put/get information + data
- Batch Systems -
- Spawning – starting, stopping and removing job.
- Servers – apache, tomcat….
- Runtime libraries and language…
5 Measurements and Metrics
Thoughts: may be advocate…
- SI units -
- Hockney’s ideas -
- Is there an IETF effort?
- May be look at IETF Benchmarking Methodology (bmwg) ,
Test set up
Existing definitions - use IETF where possible.
The Meaning of
Common Definitions and discussion
6 Goals and Requirements for Grid Benchmarks
As described in Section 3, there are many well-understood techniques for understanding the performance of a single computer system. Thus, a grid benchmark should measure the features of the Grid that are (roughly) independent of the performance of a computational endpoint. That is, the characteristics measured by a grid benchmark should be (nearly) orthogonal to those measured by existing benchmarks for single systems.
Grid benchmarks should be reproducible. Reproducibility is a hallmark of good experimental science. Since the grid is a shared resource and in most cases cannot be completely controlled by the benchmarker, good grid benchmarks will need to use statistical techniques to provide valid results.
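One possible reading of "statistical techniques" here is to repeat a measurement many times and report a robust summary with an interval, rather than a single number. The sketch below is our own illustration under that assumption; it uses the median and a percentile bootstrap interval, which is one technique among several and is not prescribed by any of the efforts reviewed above. The function names are hypothetical.

```python
import random
import statistics
import time
from typing import Callable, List, Tuple


def sample_runtimes(task: Callable[[], None], repetitions: int = 30) -> List[float]:
    """Run task repeatedly and return the wall-clock time of each run."""
    runtimes = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()
        runtimes.append(time.perf_counter() - start)
    return runtimes


def bootstrap_median_ci(
    samples: List[float],
    resamples: int = 2000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the median of samples."""
    rng = random.Random(seed)  # fixed seed so the analysis itself is repeatable
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(resamples)
    )
    lower_index = int((1 - confidence) / 2 * resamples)
    upper_index = int((1 + confidence) / 2 * resamples) - 1
    return medians[lower_index], medians[upper_index]


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would submit a grid task here.
    runtimes = sample_runtimes(lambda: sum(range(50_000)))
    low, high = bootstrap_median_ci(runtimes)
    print(f"median {statistics.median(runtimes) * 1e3:.3f} ms, "
          f"95% interval [{low * 1e3:.3f}, {high * 1e3:.3f}] ms")
```

The median is used rather than the mean because measurements on shared resources tend to contain occasional large outliers, which would otherwise dominate the reported figure.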
Usability benchmarks are much harder to quantify (and to reproduce). However, such benchmarks are needed to help improve the robustness and reliability of grid software.
7 Summary and Conclusions
The SPEC Benchmarks,
Grid Benchmarking Research Group,
IETF Benchmarking Methodology (BMWG),
Grid Assessment Probes (GRASP),
Integrated Performance Analysis of Computer Systems (IPACS),
M.A. Frumkin and L. Shabanov, Arithmetic Data Cube as a Data Intensive Benchmark NAS Tech Report NAS-03-005,
R.F. Van der Wijngaart, R. Biswas, M. Frumkin, and H. Feng, Beyond the NAS Parallel Benchmarks: Measuring Performance of Dynamic and Grid-oriented Applications, Workshop on the Performance Characterization of Algorithms, July 2001.
B. Plale, C. Jacobs, Y. Liu, C. Moad, R. Parab, and P. Vaidya, Benchmark Details of Synthetic Database Benchmark/Workload for Grid Resource Information, Indiana University Technical Report 583, August 2003, 27 pp.
C.A. Lee, C. De Matteis, J. Stepanek, and J. Wang, Cluster Performance and the Implications for Distributed, Heterogeneous Grid Performance, Heterogeneous Computing Workshop 2000, pp. 253-261.