Performance Measurement Architecture

26 March, 2002

Editor: Russ Hobby, Internet2

Introduction

A primary component of the Internet2 End-to-End Initiative is Performance Measurement. Measurement is necessary to determine the health and capabilities of the network connection as well as to locate performance problems when they occur. Currently each computer system administrator or network operator has some measurement capability within their own domain. There are also simple tools to probe the performance along specific paths. However, there are no consistent measurement capabilities across domains that can provide detailed information on the performance along a path.

This document lays out the various aspects of an End-to-End Performance Measurement Architecture. In the context of this document, end-to-end performance includes not only the traditional network portions of the path but also the host computer, operating system and applications that make up the complete path, since a performance bottleneck in any of these portions will affect the overall performance as seen by the end user.

This paper divides the measurement architecture into three main sections: the Measurement Infrastructure, the Measurement Analysis and the Presentation. The paper will also consider some of the common tools and support needed by all three sections.

The Measurement Infrastructure is the instrumentation of the end-to-end path that provides the basic data needed to determine performance capabilities and locate problems. This includes measurement devices, data from existing devices in the path, and the data and data formats generated by these devices. This section will also look at the storage of the data to provide a performance history of the various portions of the path, and at security issues such as access control.

The Measurement Analysis section addresses how the basic data can be analyzed to give meaning to the collected data.

The Presentation section looks at the needs of various types of users of the measurements and the presentation modes that might be made available to them. Finally the Tools and Support section will examine some basic tools that might be made available to more easily create and support all parts of the Performance Measurement Architecture.

Measurement Infrastructure

The measurement infrastructure collects and provides all the basic measurement data used to determine performance. The infrastructure consists of all the measurement devices that collect the basic measurements and store them for use in analysis.

2.1.Measurement Attributes

Measurement attributes are the data that is associated with a particular measurement. This includes not only the particular measurement value but also information that indicates where and when the measurement was taken.

2.1.1.Measurement Type

This attribute is the main aspect of the measurement collected. For this document the top ten being discussed by the GGF DAMED Working Group are included. This list will change as discussion progresses, but the end result should have a detailed description of each measurement such that everyone implementing a measurement device for that attribute will get the same results from the same input data.

2.1.1.1. Network Measurements

2.1.1.1.1. Packet Loss

2.1.1.1.2. Bandwidth

2.1.1.1.3. Round Trip Time

2.1.1.1.4.Trace Route

2.1.1.1.5. Queue Length

2.1.1.2. Host/OS/Application Measurements

2.1.1.2.1.CPU Load

2.1.1.2.2.Available memory

2.1.1.2.3. Disk space

2.1.1.2.4.Process Load

2.1.1.2.5.Server status

2.1.1.2.6.Application status

[I waffle on if the measurements should be divided between network and host. Some things would go under both, like packet loss, but others are unique to the area - Russ]

2.1.2.Measurement Scope

This attribute indicates the scope over which the measurement applies. For example, a packet loss measurement in a router could be for a particular flow, for an interface, or for the whole router. Measurement Scope may have different definitions for different parts of the path. For the Host/OS/Application the scope could range from the resources of the entire grid to a subsystem on a particular computer. Each scope attribute needs to be defined so that an implementer is clear on the boundaries of each scope domain.

2.1.2.1. Network Measurement Scope

2.1.2.1.1.Entire Device

2.1.2.1.2.Interface

2.1.2.1.3.Flow

2.1.2.2.Host/OS/Application Measurement Scope

2.1.2.2.1.Entire Grid

2.1.2.2.2.Single System

2.1.2.2.3.Individual Subsystem

2.1.3.Measurement Time

Measurement Time indicates the time or range of time over which the measurement was taken. All times should be reported in universal time (UTC). There should also be an indication of the accuracy of the clock source.

2.1.3.1. Begin Time

2.1.3.2. End Time (-1 if a point sample)

2.1.3.3. Time Source Accuracy

2.1.4.Measurement Location

The Measurement Location attribute indicates where the measurement was taken. In some cases it may be desirable to know the geographic location, for example to determine if the latency is due to a network problem or just the speed of light. It is also desirable to know where the measurement was taken relative to a logical network map or along a particular network path. This would assist those trying to find information for particular paths or flows.

2.1.4.1. Geographic Location

Geographic location could be a latitude and longitude, or it could be a named location such as a city. The lat/long could be more accurate but perhaps not easily determined by the device operator who would need to enter it into the device configuration. Named locations do not provide precise location but are easily known by the equipment operator.
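
As a rough illustration of the speed-of-light check mentioned under Measurement Location, the sketch below estimates the minimum round trip time between two latitude/longitude points. It assumes a great-circle path and a propagation speed of about 200,000 km/s in optical fiber (roughly two thirds of the speed of light in a vacuum). Real paths are longer than the great circle, so the result is only a lower bound. The function names are illustrative and are not part of any proposed standard.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
FIBER_SPEED_KM_PER_S = 200_000.0  # assumed propagation speed in fiber (~2/3 c)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Lower bound on round trip time (ms) from propagation delay alone."""
    one_way_s = great_circle_km(lat1, lon1, lat2, lon2) / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000.0
```

A measured round trip time close to this bound suggests that distance, rather than a network problem, accounts for the latency. A value far above it points to queuing, routing or host delays worth investigating.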

2.1.4.2. Path location

When investigating end-to-end performance it is important to understand where along the path the measurements are taken. It might be sufficient to provide an IP address that can be compared to a network map or traceroute result. On the other hand, it would be nice to have some standard nomenclature to specify the location of a segment or device on the path. The name of the operator may also be included in the name. Perhaps the standard nomenclature could be used as the DNS name to make traceroute more informative. The same could be applied to circuit names as specified by the device interface. Below is an example of such a nomenclature.

2.1.4.2.1.Host Computer

2.1.4.2.2.Edge Device

2.1.4.2.3.Campus Core Router

2.1.4.2.4.Campus Border Router

2.1.4.2.5.Gigapop Connector Router

2.1.4.2.6.Gigapop Core Router

2.1.4.2.7.Backbone Router

[Actually these names need to align with the general Internet so more generic names should be used, but you get the idea – Russ]
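
The attributes described in section 2.1 could be gathered into a single record, as in the minimal sketch below. The class name, field names and string values are hypothetical, and representing times as seconds since the epoch in UTC is an assumption; only the list of attributes and the use of -1 as the End Time of a point sample come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

POINT_SAMPLE = -1.0  # End Time value for a point sample (2.1.3.2)

@dataclass(frozen=True)
class Measurement:
    mtype: str               # Measurement Type, e.g. "packet_loss" (2.1.1)
    scope: str               # Measurement Scope, e.g. "interface" (2.1.2)
    value: float             # measured value; units depend on mtype
    begin_time: float        # Begin Time, assumed seconds since epoch, UTC
    end_time: float          # End Time, or POINT_SAMPLE
    time_accuracy_s: float   # Time Source Accuracy, assumed in seconds
    geo_location: Optional[str] = None   # Geographic Location (2.1.4.1)
    path_location: Optional[str] = None  # Path location (2.1.4.2)

    def is_point_sample(self) -> bool:
        return self.end_time == POINT_SAMPLE

    def duration_s(self) -> float:
        """Length of the measurement interval; zero for a point sample."""
        if self.is_point_sample():
            return 0.0
        return self.end_time - self.begin_time
```

Keeping the location fields optional reflects the observation that an operator may know a named location or an IP address but not both.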

2.2.Measurement Method

There are different methods of obtaining measurements. This section will look at some of those methods and present some pros and cons of each.

2.2.1.Active Measurement

Active Measurement sends sampling data along a path to determine the performance capabilities. Active measurement can be an accurate measure of the performance since it can send data flows that are equivalent to a real application. However, active measurement also consumes the resources that it measures. To prevent all network bandwidth from being consumed by active measurements, their use needs to be limited. If results from active measurements can be made available to all, then others can use those results rather than repeating tests that would produce the same result.
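
One way to limit repeated active tests is to reuse a recent result for the same path. The sketch below illustrates the idea under stated assumptions: the class name, the `run_test` callable and the `max_age_s` freshness limit are all hypothetical, and how long a result stays valid is a policy question this document does not settle.

```python
import time

class ResultCache:
    """Reuse recent active-measurement results instead of repeating a test."""

    def __init__(self, max_age_s, run_test, clock=time.time):
        self.max_age_s = max_age_s  # how long a stored result is considered fresh
        self.run_test = run_test    # callable(path) -> result; runs the active test
        self.clock = clock          # injectable time source, useful for testing
        self._results = {}          # path -> (timestamp, result)

    def get(self, path):
        now = self.clock()
        cached = self._results.get(path)
        if cached is not None and now - cached[0] <= self.max_age_s:
            return cached[1]        # fresh enough: no new test traffic is sent
        result = self.run_test(path)
        self._results[path] = (now, result)
        return result
```

A shared cache of this kind would sit naturally in front of the measurement devices, so that many requestors asking about the same path trigger at most one test per freshness period.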

2.2.2.Passive Measurement

Passive measurement is performed by observing existing traffic. Passive measurements can easily be used to determine gross performance characteristics, for example measuring packets delivered through an interface. Passive measurements become more difficult as detail is increased. It is also more difficult to select the appropriate data for passive measurement as the speed increases. However passive measurements have the advantage that they do not affect the network performance.

Ideally one could use passive measurements to observe a single flow as it passes through each device along the path. By comparing attributes such as the time that each packet passed through each device, one could tell exactly where and when there were delays or packet loss. However, detail at this level is well beyond the capabilities of any current measurement system. Current systems typically look at measurements from only one device, and some form of aggregation is necessary to make the amount of data manageable. Comparison of flow data from consecutive devices along a path is seldom done but is an area of study that should be considered.

Another issue with passive measurement is the preservation of privacy of the information being transmitted. Most people agree that the content of the data should not be recorded, but some also have concerns that recording the envelope with the source and destination addresses may be used to invade a person’s privacy by disclosing patterns of network use. Aggregation of information helps to mask an individual’s specific information, but it also limits the assistance that can be provided in solving an individual’s performance problem.
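
The comparison of a flow at consecutive devices might look like the following sketch. It assumes each device can report a mapping from a packet identifier to the time it observed that packet, and that the two device clocks are synchronized closely enough for the difference to be meaningful (see Time Source Accuracy). The function name and the data layout are illustrative only.

```python
def compare_hops(upstream, downstream):
    """Compare per-packet observation times at two consecutive devices.

    Each argument maps a packet identifier to the time (in seconds) at which
    that device observed the packet. Returns (delays, lost): delays maps each
    packet seen at both devices to its transit time, and lost is the sorted
    list of packets seen upstream but never downstream.
    """
    delays = {}
    lost = []
    for packet_id, t_up in upstream.items():
        t_down = downstream.get(packet_id)
        if t_down is None:
            lost.append(packet_id)
        else:
            delays[packet_id] = t_down - t_up
    return delays, sorted(lost)
```

Applying this to each pair of adjacent devices along a path would show which segment adds the delay or drops the packets, which is the kind of detail the paragraph above describes as beyond current systems.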

2.2.3.Beacon

A Beacon is a form of active measurement, but the measurement is viewed from the perspective of a particular application. A standard application, for example an H.323 client, might connect to a beacon to determine the performance to the location of the beacon. The beacon will be specific to a particular application. The beacon will collect performance information and report it to the user of the application. A beacon can be used as a divide-and-conquer point along a path of desired use. A beacon can also focus on the particular attributes that are important to the application. For example, if an application is particularly sensitive to jitter but not to packet loss, the beacon will provide more information on the jitter measurements.
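
As an illustration of a beacon favoring the attributes an application cares about, the sketch below computes jitter and packet loss from one test and reports the sensitive attributes first. Jitter is taken here as the mean absolute difference between consecutive packet delays, which is only one of several definitions in use. The function names and the report layout are hypothetical.

```python
def mean_jitter(delays):
    """Mean absolute difference between consecutive packet delays."""
    if len(delays) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return sum(diffs) / len(diffs)

def beacon_report(delays, sent, received, sensitive_to):
    """Summarize a beacon test, listing the application's key attributes first."""
    metrics = {
        "jitter": mean_jitter(delays),
        "packet_loss": (sent - received) / sent if sent else 0.0,
    }
    # Attributes named in sensitive_to sort ahead of the others.
    ordered = sorted(metrics, key=lambda name: name not in sensitive_to)
    return [(name, metrics[name]) for name in ordered]
```

A beacon for a jitter-sensitive application such as a video conferencing client would pass `{"jitter"}` as `sensitive_to`; a fuller implementation could also collect finer-grained samples for those attributes.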

2.3.Information Schema and Exchange Format

The Information Schema and Exchange Format is the specification of how the Measurement Attributes will be presented when transferred to a device requesting the information. The Schema and Format could also influence the way the data is stored in Data Repositories. Note that definition of the Schema is independent of the definition of the Measurement Attributes. Although the Measurement Attributes will use the Schema for representation, the Schema should be designed to be flexible enough to accommodate all future Measurement Attributes.

One option for the schema would be to adopt the format used by standard Internet MIBs since it is already established in the industry. However ASN.1 used in the MIBs is somewhat cumbersome and other methods may be considered.

2.4.Measurement Information Exchange Protocol

It may not be desirable to follow the MIB model in using SNMP for the Information Exchange Protocol since SNMP would have problems scaling for use by many requestors. However, SNMP might be used to collect MIB data from various devices that already provide such information. Information gathered via SNMP by a collection station can then be presented to the rest of the Measurement Infrastructure via the standard exchange protocol. One method of exchange that has been proposed is XML.
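
If XML were used to carry measurement data, a single measurement might be serialized and parsed as in the sketch below. The element and attribute names are invented for illustration, since this document does not define a schema, and the time strings are passed through unchanged rather than validated.

```python
import xml.etree.ElementTree as ET

def measurement_to_xml(mtype, scope, value, begin_time, end_time, location):
    """Serialize one measurement as an XML string (hypothetical element names)."""
    root = ET.Element("measurement", {"type": mtype, "scope": scope})
    ET.SubElement(root, "value").text = str(value)
    time_el = ET.SubElement(root, "time")
    ET.SubElement(time_el, "begin").text = begin_time
    ET.SubElement(time_el, "end").text = end_time
    ET.SubElement(root, "location").text = location
    return ET.tostring(root, encoding="unicode")

def xml_to_measurement(text):
    """Parse the XML produced above back into a dictionary."""
    root = ET.fromstring(text)
    return {
        "type": root.get("type"),
        "scope": root.get("scope"),
        "value": float(root.findtext("value")),
        "begin": root.findtext("time/begin"),
        "end": root.findtext("time/end"),
        "location": root.findtext("location"),
    }
```

A collection station that gathers MIB data via SNMP could translate each value into a record of this kind before presenting it to the rest of the Measurement Infrastructure.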

2.5.Data Repository

A data repository is a collection of measurement data from an area of the measurement infrastructure. It may contain information from all methods of measurement: active, passive, beacons, etc. A data repository may also collect information via non-standard or proprietary methods. However, all data presented from the data repository to clients will be represented in the standard information architecture format.

A data repository may also do some aggregation or analysis on the data and store the results to later be presented to clients.

2.6.Access Control and Security

The measurement architecture is a resource that needs to be protected from abuse and overuse. It is a resource where some portions will be used by many people from a wide variety of organizations. For example any end-user may want to determine the performance potential of a particular path. For some highly intrusive tests it is desired to limit access to a set of specialists. However even this group could consist of people from many organizations. Therefore there is a need for different groups with different access capabilities and a means to authenticate them. Since these groups will be large and from different organizations, maintenance of a single authentication database is not feasible. Rather each organization will maintain its list of people and their authorization levels. Inter-realm authentication will be used to verify that a person has the appropriate access level when he requests measurement architecture information.

2.6.1.Access Levels

Each access level group specified below defines the community of people that belong in that group and the access capabilities of each group.

[I am not entirely sure that this method of grouping will work. If each organization is responsible for assigning the people that are at each level, will other organizations honor that? How does this scale? Would a one man business be able to assign himself access to all levels and have the world honor it? This area needs some discussion – Russ]

2.6.1.1. General End-user

This group encompasses almost all network users. As such this level may not require any authentication. However information available to this level will primarily be pre-collected data and will generally not generate a test based on the request. If the information does not already reside in the data repository, then a test may be run to fill the gap.

2.6.1.2.Computer System Administrator

This group includes personnel who operate and configure computer systems. This group may need access to information on other computers with which their computer will interact. This group will need to be authenticated before they can have access to information on other computers. One approach would be to give all certified Computer System Administrators access to information from all computers. Another approach would be to have communities of Computer System Administrators that would have access to information from the computers in the community but not to other computers.

2.6.1.3. Network Operator

This group includes personnel who operate the network within a particular domain. They need to have access to information in other domains to locate performance problems and bring them to resolution.

2.6.1.4. Network Engineer

The Network Engineer solves problems that others cannot, designs new networks and makes changes to existing networks. Therefore the Network Engineer needs access to almost all information and needs to be able to run tests.
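
The access levels above could be expressed as sets of permitted actions, as in the sketch below. The permission names and their assignment to levels are assumptions made for illustration; this document does not define a permission model, and the open questions in the note under 2.6.1 still apply.

```python
from enum import Enum

class AccessLevel(Enum):
    GENERAL_END_USER = "general_end_user"
    SYSTEM_ADMINISTRATOR = "system_administrator"
    NETWORK_OPERATOR = "network_operator"
    NETWORK_ENGINEER = "network_engineer"

# Hypothetical permission names, chosen only to illustrate the four levels.
PERMISSIONS = {
    AccessLevel.GENERAL_END_USER: {"read_stored_data"},
    AccessLevel.SYSTEM_ADMINISTRATOR: {"read_stored_data", "read_host_data"},
    AccessLevel.NETWORK_OPERATOR: {"read_stored_data", "read_network_data"},
    AccessLevel.NETWORK_ENGINEER: {
        "read_stored_data", "read_host_data",
        "read_network_data", "run_intrusive_test",
    },
}

def is_allowed(level, action):
    """Return True if the given access level permits the action."""
    return action in PERMISSIONS.get(level, set())
```

Sets are used rather than a strict ordering because the levels are not obviously hierarchical: a Computer System Administrator and a Network Operator each need information the other may not.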

2.6.2.Authentication Methods

2.6.2.1. Local User/Password

This is a means of accommodating authentication for sites that have not implemented Inter-realm Authentication. This can consist of a simple user/password list on the individual device or tie into a local authentication system such as a local Kerberos, RADIUS or TACACS.

2.6.2.2. Inter-realm Authentication

Inter-realm Authentication allows an organization to assign individuals to various permission groups such as those in 2.6.1. Inter-realm authentication also allows other organizations to verify that an individual has the proper credentials to perform activities allowed by membership in a group. Central to the Measurement Architecture is the Internet2 Shibboleth Project, which aims to create an inter-realm authentication infrastructure.

2.7.Toolkits for implementing the Measurement Infrastructure

This section outlines toolkits available for creating the Measurement Infrastructure.

Measurement Analysis

While the Measurement Infrastructure will collect data from all modes of measurement, analysis is required to make this data meaningful. Sometimes data items themselves will be the results of analysis of other data. For example, it may be desirable to create some aggregated totals that can be made available rather than having everyone pull up all the data and aggregate it themselves.
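
Aggregated totals of the kind mentioned above might be produced as in the following sketch, which groups timestamped samples into fixed intervals and summarizes each one. The function name, the choice of summary statistics and the interval length are assumptions for illustration.

```python
from collections import defaultdict

def aggregate(samples, interval_s):
    """Group (timestamp, value) samples into fixed intervals and summarize each.

    Returns a dict mapping each interval's start time to a summary holding
    the sample count, minimum, maximum and mean value.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        start = (timestamp // interval_s) * interval_s
        buckets[start].append(value)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

Storing summaries like these alongside the raw data lets most requestors avoid transferring every sample, at the cost of losing per-sample detail.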

There can be many things that one would want to determine from the analysis of the measurement data. Below is a start on a list of analyses that may be done on the data.

3.1.Current Network Status Capabilities

Network operators, engineers planning network upgrades, administrators paying for the network, and others want to know how the network as a whole is performing. Analysis in this area will help network planning, network budgeting, and the selection of network services.

If detailed information is available from the Measurement Infrastructure, this area of analysis can also help end-users determine the capabilities of a communications path. This would allow an application to adapt to the performance levels available at the time.

3.2.Performance Problem Location

When performance is not what is expected, people want to know the location of the problem. Through analysis of data from the Measurement Infrastructure, performance problems can be isolated to a specific spot along the path. The appropriate people can then be notified and the problem resolved.

3.3.Attack Detection and Location

The Measurement Infrastructure can also provide data to track down security attacks. A historical record of various network attributes should help to locate the source of an attack.

Presentation

While it would be good to have the measurement data and analysis available to the various groups that need it, it would also be nice to have it presented in a consistent way so that the person reviewing the information would have some familiarity with how it looks. This does not mean that all presentations have to look exactly alike, but like a car’s dashboard, one would immediately know how to interpret the information. As with the Access Levels there are different audiences for different sets of data. Likewise there will have to be different presentations of that data. There are also different avenues of presenting the data. Some possibilities are listed below.

4.1.User Groups

4.1.1.1.Researchers of Networks

4.1.1.2.Network Engineers

4.1.1.3.Network Operations

4.1.1.4.Application Developers

4.1.1.5.End Users

4.2.Presentation Modes

4.2.1.1.Web/Java

4.2.1.2.Client Software

4.2.1.3.Application Interface

Common Tools and Support

This section provides information on tools and support available that can be used to help build the Measurement Architecture.

5.1.Authentication

5.1.1.1.Shibboleth

5.2.Web Interface Tools

[I need help with this list – Russ]

Diagrams
Current Map of Measurement Tools in Location vs User

Areas of Work

This section outlines the work items needed to fulfill the Measurement Architecture, in order of priority.

[TBD – What do you think? – Russ]