Task 3.3 Grid Monitoring

Santa-G Design Document

WP3 New Grid Services and Tools

Document Filename: CG3.3.2-D3.2-v1.2-TCD020-SantaGDesign.doc
Work package: WP3 New Grid Services and Tools
Partner(s): TCD, CYFRONET, ICM
Lead Partner: TCD
Config ID: CG3.3.2-D3.2-v1.2-TCD020-SantaGDesign
Document classification: PUBLIC
Abstract: This document specifies the detailed software design for the SANTA-G monitoring tool, part of CrossGrid Task 3.3 ‘Grid Monitoring’.
Delivery Slip
Name / Partner / Date / Signature
From / WP3, Subtask 3.3.2 / TCD / Aug 8th, 2002 / Brian Coghlan
Verified by
Approved by
Document Log
Version / Date / Summary of changes / Author
1-0-DRAFT-E / 02/08/2002 / Draft version / Brian Coghlan, Stuart Kenny
1-0 / 08/08/2002 / Brian Coghlan, Stuart Kenny
1-1 / 20/08/2002 / Added query engine description, added security issues, altered class listings. / Brian Coghlan, Stuart Kenny
1-2 / 30/08/2002 / Changed with regard to reviewers comments. / Brian Coghlan, Stuart Kenny

Contents

1. EXECUTIVE SUMMARY

2. INTRODUCTION

2.1. Purpose

2.2. Definitions, Abbreviations, Acronyms

3. REFERENCES

4. SYSTEM DECOMPOSITION DESCRIPTION

4.1. The grid monitoring system

4.2. NON-INVASIVE MONITORING (SANTA-G)

4.2.1. The Publishing Module

4.2.2. The Viewer Module

4.2.3. The Query Engine

5. DEPENDENCY DESCRIPTION

6. INTERFACE DESCRIPTION

7. Security issues

8. DETAILED DESIGN

8.1. Class Definitions

8.2. Query Engine Classes

8.3. Sensor Information Table Descriptions

1. EXECUTIVE SUMMARY

The following document provides the design description for the SANTA-G Grid monitoring tool, which forms part of Task 3.3 ‘Grid Monitoring’. The monitoring system to be designed within Task 3.3 will provide information from the three major sources of performance data: applications, instruments, and infrastructure. An invasive monitoring tool, OCM-G, will obtain application information; SANTA-G, a non-invasive monitoring tool, will obtain monitoring information from instruments; and a further tool, based on the new Jiro technology, will obtain information from the Grid infrastructure. Please refer to Section 2.1 of the Task 3.3 SRS for a description of the use and interaction of the three monitoring tools. This document provides a detailed description of the design of the SANTA-G tool. For clarity the design document for Task 3.3 has been divided into three separate documents, each dealing with one of the three components, i.e. OCM-G, SANTA-G and Jiro.

Section 2 provides a brief description of the task and each of the components that make up the Grid Monitoring system. It also gives the definitions of the abbreviations used within the document. Section 3 provides a list of references.

Section 4 provides a decomposition of the SANTA-G tool into modules. UML component and sequence diagrams are given to show the construction and interaction of the components within these modules.

Section 5 is a description of the dependencies of SANTA-G on modules, or components, developed in other tasks. It describes the required input or communication from these.

Section 6 is the interface description. This is a detailed description of the external interface provided by the SANTA-G system, including APIs and GUIs.

Section 7 discusses security issues. Section 8 provides the detailed design description: UML class diagrams show the internal structure of the components identified in Section 4, together with a detailed description of each class's methods and attributes.

This design description is a work in progress, and will certainly be subject to change as the implementation phase progresses. The class structures and descriptions of methods should therefore be taken as a guide to the current implementation intention, rather than as a description of the completed system.

2. INTRODUCTION

2.1. Purpose

This task will extend the Grid information system content to include three of the major sources of performance data: applications, instruments and infrastructure.

The products of Task 3.3 are:

(a) an OMIS-based application monitoring system, OCM-G,

(b) additional services, SANTA-G, for ad-hoc non-invasive monitoring, and

(c) Jiro-based services for Grid-infrastructure monitoring.

OCM-G is a distributed monitoring system for obtaining information on and manipulating parallel distributed applications. The purpose of this system is to provide a basis for building tools supporting parallel application development.

SANTA-G services are a specialized non-invasive complement to other, more intrusive monitoring services. The application of these services will be in the validation and calibration of both intrusive monitoring systems and systemic models, and also in performance analysis. The objectives are to allow information captured by external monitoring instruments to be introduced into the Grid information system, and to support analysis of performance using this information.

The Jiro-based services for Grid-infrastructure monitoring are intelligent components for obtaining information from and manipulating Grid hardware devices. The application of the software is to gather information from hardware devices, make autonomous decisions based on this information, and take necessary actions. The objectives are to allow the user to specify desirable logic for managing hardware. The addition of these three components will greatly expand the quality and quantity of the Grid information system content.

Please refer to Section 2.1 of the Task 3.3 SRS for a more detailed description of the use and interaction of these three components.

2.2. Definitions, Abbreviations, Acronyms

CA - Certificate Authority

CoG - Commodity Grid Kits

CrossGrid - The EU CrossGrid Project IST-2001-32243

DataGrid - The EU DataGrid Project IST-2000-25182

GSI - Grid Security Infrastructure

GUI - Graphical User Interface

HTTP - Hypertext Transfer Protocol

HTTPS - Secure Hypertext Transfer Protocol

ICMP - Internet Control Message Protocol

IP - Internet Protocol

JDBC - Java Database Connectivity

Jiro - SUN Jiro, implementation of the FMA specification

OCM-G - Grid-enabled OMIS-Compliant Monitor

OGSA - Open Grid Services Architecture

OMIS - On-line Monitoring Interface Specification

RDBMS - Relational Database Management System

R-GMA - DataGrid Relational Grid Monitoring Architecture

SANTA - System Area Network Trace Analysis

SANTA-G - Grid-enabled System Area Network Trace Analysis

SCI - Scalable Coherent Interface

SNMP - Simple Network Management Protocol

SQL - Structured Query Language

SRS - Software Requirements Specification

TCP - Transmission Control Protocol

UDP - User Datagram Protocol

XML - Extensible Markup Language

3. REFERENCES

CrossGrid - CrossGrid Project Technical Annex, CROSSGRIDANNEX1_V0.1.DOC

DataGrid - DataGrid Project Technical Annex, DataGridPart_B_V2_51.doc

Ethernet - IEEE 802

GSI

HTTP - IETF RFC 2616

ICMP - IETF RFC 792

IP - IETF RFC 791

Java CoG

JDBC

Jiro

OGSA - The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke, January 2002.

OMIS - OMIS: On-line Monitoring Interface Specification, Version 2.0. Lehrstuhl für Rechnertechnik und Rechnerorganisation, Institut für Informatik (LRR-TUM), Technische Universität München.

R-GMA - DataGrid Project Deliverable 3.2, DataGrid-03-D3.2-0101-1-0

SANTA - Prototype Trace Probe and Probe Adaptor and Prototype Trace Software, ESPRIT Project 25257 SCI Europe, Deliverables, Trinity College Dublin, 1999

SCI - IEEE Standard for Scalable Coherent Interface (SCI), IEEE Std 1596-1992, IEEE Computer Society, Aug. 1993

Spitfire

SQL - ANSI SQL-99 Standard

Task 3.3 SRS - Task 3.3 Grid Monitoring Software Requirements Specification, CG-3.3-SRS-0013

Task 2.4 SRS - Task 2.4 Interactive and Semiautomatic Performance Evaluation Tools, CG-2.4-DOC-0001-1-0-DRAFT-A

TCP - IETF RFC 793

TCPDump

UDP - IETF RFC 768

XML - Fallside, D.C. XML Schema Part 0: Primer. W3C Recommendation, 2001

4. SYSTEM DECOMPOSITION DESCRIPTION

4.1. The grid monitoring system

As stated, the Grid Monitoring system will provide information from the three major sources of performance data: applications, instruments and infrastructure.

Application information will be obtained by OCM-G. Specialised application monitors embedded in the application address space will provide dynamic application data, such as lists of running processes, CPU loads, etc. It will also allow for the manipulation of applications, such as starting and stopping processes. Monitors embedded in the application address space in this way will consume some of the host system resources, which can affect the monitoring data obtained. The SANTA-G system avoids this, as it is a non-invasive monitoring system. SANTA-G can be used for ad-hoc experiments that monitor the behaviour of Grid components from the outside. The data can be used both to validate and to calibrate the OCM-G system. The Jiro tool will monitor infrastructure components. Jiro is a new technology that aims to simplify the management of extremely large networks by providing intelligent management services for networked devices. Jiro components will be used to provide information on Grid infrastructure components, such as routers and switches (via SNMP) and computers (via kernel interfaces). In this way Jiro will allow the monitoring of manufacturers' equipment from the inside, i.e. providing access to ‘built-in’ information such as that supplied by SNMP.


Figure 4.1.1 shows the modules that make up the Grid Monitoring system. The following section describes the SANTA-G module in detail, breaking it into further modules, and each module into its individual components.

Figure 4.1.1 The Grid Monitoring System

4.2. NON-INVASIVE MONITORING (SANTA-G)

SANTA-G will non-invasively monitor Grid components using software, and create a relational trace database of this information. As stated in the SRS, there are three main functions associated with this activity:

  1. Allow a user to initiate non-invasive tracing of grid resources. Collect the trace data, and provide access to the data through the Grid information system.
  2. Allow a user to select the required subset of trace data, by way of the Grid information system, persistently store this subset of data in a relational trace database, and provide access to this stored data for further analyses.
  3. Provide information required by dependent subsystems within the Grid Services and Tools system (i.e. Tasks 3.2 and 3.4), and by external subsystems (i.e. Task 2.4).


The SANTA-G system can be broken down into two main modules (see Figure 4.2.1) that provide these functions: a publishing module, and a viewer module.

Figure 4.2.1 SANTA-G System Modules

4.2.1. The Publishing Module

The purpose of the publishing module is to take the trace data created by the trace instrument and to import the data into the Grid Information System. It essentially provides the three functions listed above by making the trace data available through the Grid Information System. Once the data is entered in the system, users (including dependent tasks) can access it for further analyses by using other R-GMA components such as Consumers and Archivers. The publishing module can itself be broken down into several components, as can be seen in Figure 4.2.2:


Figure 4.2.2 Components of the Publishing Module

In the case of network monitoring, a Sensor will be run on a separate network tracing instrument. These instruments will be ordinary PCs with network monitoring cards. Each node that is to be monitored will require a separate network monitoring instrument; alternatively, a switch can be monitored if it has the required monitoring facilities. In the general case the placement of the Sensors will be determined by the ad-hoc experiment being conducted.

A Sensor is configured by changing entries in a sensor configuration file. This file contains entries such as the traces archive directory, and the JDBC username and password (for accessing the File and Trace Information tables maintained by the DBProducer). The Sensor must be run with the privileges necessary to start TCPDump; for example, under Linux it must be run as root, or alternatively TCPDump can be installed setuid root. The Sensor is started at boot up and runs continuously, collecting network traffic. If a failure occurs it is restarted.

On initialisation the Sensor reads in the configuration values from the configuration file and instantiates the DBProducer; it also instantiates the necessary Canonical Producers (one for each table) to represent the database structure. The Canonical Producers are instantiated by specifying an SQL CREATE TABLE statement that defines the table the producer provides. Each Canonical Producer shall register itself with the Canonical Producer Servlet. The R-GMA will then enter each CanonicalProducer in an R-GMA registry, which shall allow consumers to locate it. This action is hidden within the R-GMA.
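A minimal sketch of the configuration step is given below, assuming the sensor configuration file is a simple Java properties file. The property names and defaults shown are illustrative assumptions only, not the actual SANTA-G configuration keys, and the instantiation of the DBProducer and Canonical Producers is deliberately left as a comment, since the R-GMA API is not reproduced here.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Sketch of how a Sensor might load its configuration at start-up.
// Property names (traces.archive.dir, queue.length, jdbc.user, jdbc.password)
// are illustrative assumptions, not the real SANTA-G key names.
public class SensorConfig {
    private final Properties props = new Properties();

    public SensorConfig(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
    }

    public String tracesArchiveDir() { return props.getProperty("traces.archive.dir", "/var/santag/traces"); }
    public int queueLength()         { return Integer.parseInt(props.getProperty("queue.length", "10")); }
    public String jdbcUser()         { return props.getProperty("jdbc.user"); }
    public String jdbcPassword()     { return props.getProperty("jdbc.password"); }

    public static void main(String[] args) throws IOException {
        SensorConfig cfg = new SensorConfig("sensor.conf");
        System.out.println("Archive: " + cfg.tracesArchiveDir() + ", slots: " + cfg.queueLength());
        // At this point the Sensor would instantiate the DBProducer and one
        // CanonicalProducer per table (R-GMA API calls not shown here).
    }
}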

A main directory, the Traces_Archive, contains a number of subdirectories (see Figure 4.2.3). Each of these subdirectories is a Trace Directory, containing a circular queue of files. It is these files that actually store the raw trace data. The number of files in a Trace Directory is the queue length, controlled by a variable stored in the sensor’s configuration file.

Figure 4.2.3 Traces Archive Directory Structure
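The layout in Figure 4.2.3 can be expressed as a simple path construction, with the slot index wrapping modulo the configured queue length. The directory and file names used here (trace_<n>, file_<slot>) are assumptions made for illustration; only the structure (archive, trace directory, circular queue of files) follows the text above.

import java.io.File;

// Sketch of the Traces_Archive layout: archive root -> Trace Directory -> slot files.
public class TracesArchive {
    private final File archiveRoot;   // the Traces_Archive directory, taken from the sensor configuration

    public TracesArchive(String archiveRootPath) {
        this.archiveRoot = new File(archiveRootPath);
    }

    // Directory holding the circular queue of files for one trace.
    public File traceDirectory(int traceId) {
        return new File(archiveRoot, "trace_" + traceId);
    }

    // Slot file within a trace's circular queue; slots wrap modulo the queue length.
    public File slotFile(int traceId, int slot, int queueLength) {
        return new File(traceDirectory(traceId), "file_" + (slot % queueLength));
    }
}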

The Sensor maintains two tables of information relating to the traces available. The first is the Trace Information table. It contains the URI of the directory used to store the trace circular queue, the start time of the trace, the number of slots used in the circular queue, and a description of the trace. The second, the File Information table, contains information relating to the files used to store the raw trace data within the trace directory. For each file in the queue it stores the URI of the directory used to store the trace circular queue, a file ID that identifies an individual file within the trace circular queue, the start time of that file, the number of packets within the file, and the status of the file, either open or closed. The combination of these two tables forms an index into a trace circular queue. Both tables are stored in the DBProducer.
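As an illustration, the two index tables could be declared by CREATE TABLE statements along the following lines, held as strings in the way the Canonical Producers are instantiated. The table and column names are assumptions made for this sketch; the definitive schemas are those listed in Section 8.3.

// Sketch of the SQL used to define the Sensor's two index tables.
// Table and column names are illustrative; see Section 8.3 for the real schemas.
public final class SensorTables {

    public static final String CREATE_TRACE_INFO =
        "CREATE TABLE TraceInfo ("
      + " directoryURI VARCHAR(255),"   // URI of the trace circular-queue directory
      + " startTime    VARCHAR(64),"    // start time of the trace
      + " numSlots     INTEGER,"        // number of slots used in the circular queue
      + " description  VARCHAR(255)"    // free-text description of the trace
      + ")";

    public static final String CREATE_FILE_INFO =
        "CREATE TABLE FileInfo ("
      + " directoryURI VARCHAR(255),"   // URI of the trace circular-queue directory
      + " fileId       INTEGER,"        // identifies an individual file (slot) in the queue
      + " startTime    VARCHAR(64),"    // time at which writing to this file started
      + " numPackets   INTEGER,"        // number of packets stored in the file
      + " status       VARCHAR(8)"      // 'open' (being written to) or 'closed'
      + ")";

    private SensorTables() {}
}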

The Sensor initiates a trace. The Trace Directory to use within the Traces Archive and the number of slots (or files) to use for the circular queue are specified, and a new thread is started to run the trace. Once the trace is started, an entry is made in the Trace Information table for the new trace. Entries are also made in the File Information table, one for each file within the circular queue for this trace; for example, a trace with a queue of ten slots will have ten files within the circular queue, and a corresponding ten entries in the File Information table.

In the CrossGrid demonstrator, where the external instrumentation example is a network packet tracer, the Sensor starts TCPDump, which begins writing to the indicated file, the first file in the queue. The row within the File Information table corresponding to this file is updated by the Sensor to reflect the current state, i.e. the start time of the file is updated to the time at which TCPDump started writing to the file, and the status of the file is set to open, indicating that it is being written to. The status field thus also provides a pointer into the queue: the file currently being written to is the head of the queue.

When the file is full (a predetermined limit is reached, either a number of packets or a file size) the Sensor stops TCPDump and restarts it writing to the next file in the queue. The entry for the previous file in the File Information table is then updated with the number of packets, and its status is changed to closed; the new file's start time is updated and its status set to open. A separate thread carries out the table updates. If the previous file was at the end of the queue then the queue is wrapped around and TCPDump is set to write to the start of the queue once more. The sequence diagram, Figure 4.2.4, shows the interaction between the components in the Publishing Module during initialisation of the Sensor and during the creation of a new trace.
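A minimal sketch of this rotation step is given below. It assumes TCPDump is launched per slot file with its standard -w <file> option and that the File Information updates are issued over JDBC; the class, table and column names are illustrative assumptions, not the actual SANTA-G implementation, and the table updates are done inline here rather than in a separate thread as described above.

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative sketch of the circular-queue rotation performed by the Sensor
// when the current slot file is full. Only the overall sequence follows the text.
public class TraceRotator {
    private final Connection db;    // JDBC connection used for the index tables
    private final String traceDir;  // local path of this trace's circular-queue directory
    private final int queueLength;  // number of slot files in the queue
    private Process tcpdump;        // the currently running tcpdump process
    private int currentSlot = 0;

    public TraceRotator(Connection db, String traceDir, int queueLength) {
        this.db = db;
        this.traceDir = traceDir;
        this.queueLength = queueLength;
    }

    // Start tracing into the first slot of the queue.
    public void start() throws IOException, SQLException {
        openSlot(currentSlot);
        tcpdump = launch(currentSlot);
    }

    // Stop tcpdump on the full slot, update its index row, and move to the next slot.
    public void rotate(long packetsInFullFile) throws IOException, SQLException {
        tcpdump.destroy();                              // stop writing to the full file
        closeSlot(currentSlot, packetsInFullFile);      // record packet count, mark closed
        currentSlot = (currentSlot + 1) % queueLength;  // wrap around at the end of the queue
        openSlot(currentSlot);                          // new head of the queue, marked open
        tcpdump = launch(currentSlot);
    }

    private Process launch(int slot) throws IOException {
        // tcpdump's standard -w option writes raw packets to the given file.
        return new ProcessBuilder("tcpdump", "-w", traceDir + "/file_" + slot).start();
    }

    private void closeSlot(int slot, long packets) throws SQLException {
        try (PreparedStatement st = db.prepareStatement(
                "UPDATE FileInfo SET numPackets = ?, status = 'closed' "
              + "WHERE directoryURI = ? AND fileId = ?")) {
            st.setLong(1, packets);
            st.setString(2, traceDir);   // the index tables key files by trace directory and file ID
            st.setInt(3, slot);
            st.executeUpdate();
        }
    }

    private void openSlot(int slot) throws SQLException {
        try (PreparedStatement st = db.prepareStatement(
                "UPDATE FileInfo SET startTime = ?, status = 'open' "
              + "WHERE directoryURI = ? AND fileId = ?")) {
            st.setString(1, java.time.Instant.now().toString());
            st.setString(2, traceDir);
            st.setInt(3, slot);
            st.executeUpdate();
        }
    }
}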

Note that SANTA-G is specifically designed to NOT depend on modifying the instrument software. Instead it provides a generic template for introducing information from third-party instruments into the grid information system. While here the focus is on instruments that monitor grid components, the instruments could of course be of any type (e.g. fish sonars or PCR analysers). SANTA-G is particularly oriented to those instruments that generate large volumes of data.


Figure 4.2.4 Publishing trace data process

4.2.2. The Viewer Module


The viewer module allows a user to view the traces stored in the Grid information system. Interaction with the viewer module will be through the Viewer GUI. For the net tracer demonstrator, the Viewer will have a number of controls for direct or query-based navigation through a trace and for selection of a particular trace or packet (see Section 6.2). The Viewer module is composed of several components (see Figure 4.2.5):

Figure 4.2.5 Components of the Viewer Module

A user can choose a trace to view. This information is gathered from the Trace Information table maintained by a SANTA-G Sensor, as described above. The trace can be viewed on a packet-by-packet basis. In the spirit of R-GMA, an SQL query statement shall be used to define what data is to be acquired. The Viewer, by using its own Consumer interface, will contact a Consumer Servlet, which in turn contacts an R-GMA registry in order to locate the required producers of the information. The information is returned through the same mechanisms to the Viewer in the form of a ResultSet. The individual fields of the packet are then extracted from this ResultSet and displayed graphically in the Viewer. The sequence diagram below, Figure 4.2.6, shows this process. Refer to Section 6.2 for a description of the Viewer GUI.


Figure 4.2.6 Viewing data, Interaction between the Viewer and ConsumerServlet
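To make the query step concrete, the sketch below shows the kind of SQL SELECT the Viewer might issue for a single packet and how the fields are read from the returned result set. Plain JDBC is used here purely as a stand-in for the R-GMA Consumer / Consumer Servlet path described above, and the connection URL, table name and column names are all illustrative assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative sketch of the query the Viewer issues to fetch one packet of a trace.
public class PacketQuery {

    public static void main(String[] args) throws SQLException {
        String jdbcUrl = "jdbc:mysql://localhost/santag";   // placeholder connection URL
        try (Connection db = DriverManager.getConnection(jdbcUrl, "viewer", "secret");
             PreparedStatement st = db.prepareStatement(
                 "SELECT * FROM Packet WHERE directoryURI = ? AND fileId = ? AND packetId = ?")) {
            st.setString(1, "file:///var/santag/traces/trace_0");
            st.setInt(2, 3);      // slot file within the circular queue
            st.setInt(3, 42);     // packet within that file
            try (ResultSet rs = st.executeQuery()) {
                if (rs.next()) {
                    // The Viewer extracts each protocol field from the result set
                    // and renders it in the GUI; here we simply print two of them.
                    System.out.println("source IP:      " + rs.getString("sourceIP"));
                    System.out.println("destination IP: " + rs.getString("destIP"));
                }
            }
        }
    }
}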