
Developing a Distributed e-Monitoring System for Enterprise Website and Web Services: An Experience Report with Free Libraries and Tools

Frank K.W. Cheong1, Dickson K.W. Chiu2,4, S.C. Cheung1,4 and Patrick C. K. Hung3

1Department of Computer Science and Engineering, Hong Kong University of Science and Technology

2Dickson Computer Systems, 7A Victory Avenue, Kowloon, Hong Kong

3Faculty of Business and Information Technology, University of Ontario Institute of Technology, Canada

4Senior Member, IEEE

Abstract

Enterprises value monitoring because it helps them provide dependable e-services, whether interactive websites or programmatic Web services. However, this task becomes non-trivial when enterprises begin to require support from thousands of servers across geographical areas. How can the communication between a monitoring system and remote servers be minimized? Could the task be achieved easily with a readily available technology such as SNMP? How can we monitor thousands of servers powered by, say, Tomcat, which lacks SNMP support? How may a poorly responsive site be identified before SNMP reports it as a failure? In this paper, we propose a unified e-monitoring system that enables system administrators to remotely monitor the health of distributed e-services in the form of both websites and Web services. We further discuss our implementation experience based on a pragmatic prototype.

Introduction

Driven by e-services over the Web, annual transaction volume is expected to grow enormously within the next few years [1]. Major information technology vendors such as BEA, IBM, Microsoft, and Sun have been playing an active role in pushing new standards on Web computing (such as Java, JSP, Servlet, and EJB) and creating timely products to keep pace with those standards. Due to the nature of Web applications and their relatively short product lifecycles, the level of testing prior to production is often inadequate. This is particularly the case for the handling of exceptions and irregular scenarios, and may result in non-deterministic behavior and poor performance of the corresponding e-services.

In an e-monitoring system, a number of remote monitoring agents are deployed to closely monitor a list of pre-defined e-services (in the form of both websites and programmatic Web services) from various points of presence in different networks, subnets, or even geographical regions. Although some products (like LoadRunner from Mercury Interactive [2]) belong to this category, they are expensive and therefore limited to the few organizations that can afford them.

In this paper, we propose to implement such an e-monitoring system based on standard and extensible technologies. The rest of this paper is organized as follows: Section 2 reviews some background and related work. Section 3 presents the logical design of the e-monitoring system. Section 4 details our design and implementation choices. Section 5 discusses some of our experiences before we conclude the paper with future research and development directions.

Background and Related Work

Figure 1. Simplified network diagram

Figure 1 illustrates a simplified view of the possible components that exist between an end-user and a target e-service server. With the help of this diagram, we can imagine the possible list of partners and even equipment that exist in various locations.

The root cause of irregularities (incomplete, corrupted, or delayed responses) may lie in different components: network problems with local network exchanges, upstream providers, or co-location service providers; abnormal behavior of reverse proxy servers, firewalls, or network switches in between; the server hardware or software; and so on. In order to report accurately on the health of an e-service suffering from any of the problem areas mentioned above, more than one monitoring agent must be deployed in various network segments and even various geographical locations.

To reduce the cost of ownership in deploying new remote agents, the remote agent should work in a stand-alone mode (say, in a diskless station or even a small set-top-like box) and be able to communicate with central administration servers in an effective way. This allows an updated list of targets to be sent to any remote monitoring agent easily. In addition, the performance statistics of the e-services collected during the health check provide useful hints for troubleshooting or performance tuning. To achieve this, the remote agent should be able to send statistics back to the central server in a timely manner so that the administrators can acquire such information instantly and make quick decisions as appropriate. For instance, the remote agent should allow the administrator to identify performance bottlenecks and faults early. With this information, the administrator can take corresponding actions or propagate the findings to the right party.

Apart from the free trace-route community [6] that provides network latency information, there are commercial services like Internet Seer [5], WebsitePulse [3], and Siteseer [4], which facilitate remote website monitoring from different points of presence in different geographical locations, with added capabilities such as simulating end-users' requests and performance monitoring. These services are priced at the upper end of the enterprise level, which means they are inaccessible to a large percentage of organizations. On the other hand, service providers offering moderately priced services only provide simple checks, such as issuing a single HTTP request on a particular URL without checking the content or simulating user requests through a list of URLs. Therefore, these services cannot accurately detect when a website goes down.

To accurately obtain the health status of a website, we need to simulate user interaction (e.g., putting an item into a shopping cart, proceeding to checkout, and then cancelling). Verifying the content returned for each (or at least some) HTTP request is equally important in order to ensure that the website is replying to the user in full. Similarly, the sequence of Web services defined by a BPEL specification should be traced accordingly for programmatic interactions. With the above checkpoints, we can also ensure that there is no man-in-the-middle attack between our e-service and users located in different geographical locations. Furthermore, checks can be performed to determine whether the target e-service has been tampered with or even hijacked, so that content replaced by hackers does not go unnoticed.

Conceptual Modeling

In this section, we highlight the conceptual model of the information sent from the administration server as well as that sent from the monitoring agents. We also highlight the processes that facilitate the overall e-monitoring system design and implementation.

3.1. Information sent from the administration server

Figure 2. ER diagram of server-end information

Firstly, we have to clearly define what information the remote monitoring agents need to know in order to execute their tasks. At the top level, we define the entity "site", which consists of at least one "step". Then, we specify what information needs to be communicated to the server and the ideal response. We also define, on a global scale, the interval at which the remote monitoring agent should perform the health check. Moreover, we define the email address of the personnel to whom the remote monitoring agent should report in case of problems or irregularities.

Further, each site is assigned a unique ID and a unique name for identification. Step is an entity that records the activities and results of each physical step that the monitor takes. Each step is identified by a unique ID and a name. StepPosition records the position of the step in the e-monitoring process. For instance, step position 1 denotes login. A Step entity also contains the URL of the server that the remote monitoring agent targets. The method (e.g., GET, POST, or HEAD) that the remote monitoring agent uses is specified in RequestMethod. For websites, FormValue contains the form variables and values that the remote monitoring agent should post to the server (e.g., user id and password for a login request). ExpectedResponseCode states the HTTP response code expected by the remote monitoring agent. ExpectedMD5 holds the expected MD5 checksum of the response message so that the remote agent can verify the content integrity. For Web services, the service, port, operation, input/output parameters, and their values are used instead.
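
The site and step entities described above could be represented roughly as follows. This is only a minimal sketch for website steps: the field names follow the attributes named in the text, while the class names (MonitoringTargetSketch, Site, Step), the constructor layout, and the matches helper are assumptions made for illustration, not the data model of the prototype.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative model only; names other than the attribute names are assumptions. */
final class MonitoringTargetSketch {

    /** One request in the monitored interaction with a website. */
    static final class Step {
        final int stepId;
        final String stepName;
        final int stepPosition;         // 1 = first request, e.g. login
        final String url;
        final String requestMethod;     // GET, POST, or HEAD
        final String formValue;         // e.g. "userid=demo&password=demo"
        final int expectedResponseCode;
        final String expectedMd5;       // hex MD5 of the expected response

        Step(int stepId, String stepName, int stepPosition, String url,
             String requestMethod, String formValue,
             int expectedResponseCode, String expectedMd5) {
            this.stepId = stepId;
            this.stepName = stepName;
            this.stepPosition = stepPosition;
            this.url = url;
            this.requestMethod = requestMethod;
            this.formValue = formValue;
            this.expectedResponseCode = expectedResponseCode;
            this.expectedMd5 = expectedMd5;
        }

        /** True when the observed response code and checksum match the expectations. */
        boolean matches(int responseCode, String responseMd5) {
            return responseCode == expectedResponseCode
                    && expectedMd5.equalsIgnoreCase(responseMd5);
        }
    }

    /** A monitored site, consisting of at least one step. */
    static final class Site {
        final int siteId;
        final String siteName;
        private final List<Step> steps;

        Site(int siteId, String siteName, List<Step> steps) {
            if (steps.isEmpty()) {
                throw new IllegalArgumentException("a site needs at least one step");
            }
            this.siteId = siteId;
            this.siteName = siteName;
            List<Step> ordered = new ArrayList<>(steps);
            // Steps are executed in StepPosition order, whatever order they arrive in.
            ordered.sort(Comparator.comparingInt((Step s) -> s.stepPosition));
            this.steps = Collections.unmodifiableList(ordered);
        }

        List<Step> stepsInOrder() {
            return steps;
        }
    }
}
```

An agent would walk stepsInOrder() for each site and call matches on every response to decide whether the step succeeded.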

3.2. Information sent from the remote agent

Figure 3. ER Diagram of remote-agent information

Apart from what is mentioned above, the remote monitoring agents collect valuable information (cf. Figure 3) to be transmitted back to the central administration server. To help performance tuning, the round-trip time for each step (i.e., each HTTP request and response) is the most important information. In addition, the HTTP header of each response contains hints on whether there is a reverse proxy between the website and the client. The HTTP header can also help identify whether the user is actually getting the reply from the target server or from third parties who might have intentionally hijacked it. Note that each server adds an HTTP header identifying the product (e.g., WebLogic, WebSphere, or Tomcat) and its version in use. In addition, we can tell whether the content has been poisoned by cross-checking the MD5 calculated from the response against the expected MD5 sent from the central administration server. Moreover, we can send the response code, messages, and calculated MD5 back to the central server for recording purposes. Finally, we add the IP address of the remote monitoring agent so that we know which agent has sent back the information and, in turn, where the possible performance bottlenecks are in different network segments and even geographical locations.
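
The MD5 cross-check mentioned above can be sketched with the standard java.security.MessageDigest class. The class and method names below (ContentIntegritySketch, md5Hex, isIntact) are hypothetical and only illustrate the comparison; they are not taken from the prototype.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper; class and method names are assumptions. */
final class ContentIntegritySketch {

    /** Returns the MD5 checksum of a response body as lowercase hex. */
    static String md5Hex(byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(body)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True when the body still matches the checksum supplied by the central server. */
    static boolean isIntact(byte[] body, String expectedMd5) {
        return md5Hex(body).equalsIgnoreCase(expectedMd5);
    }
}
```

A mismatch does not say what changed, only that the response differs from the expected one, so pages with dynamic content would need a stable portion to be checked instead.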

The remote monitoring agent needs to make some of the information verbose (de-normalized) before sending it back to the central server, in order to avoid confusion caused by discrepancies between the information stored in the central administration server and that held by the remote monitoring agents. The central administration server then only needs to store the information retrieved from the remote monitoring agents in one single entity. Such de-normalization also speeds up performance monitoring at the central administration server, because the need to join large tables is eliminated.
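
To illustrate the de-normalization, the sketch below uses a flat result row that repeats the site name, step name, and URL instead of referring to them by ID, so a statistic such as the average round-trip time per site can be computed from the rows alone. The class names, the exact set of fields, and the averaging helper are assumptions for illustration.

```java
import java.util.List;

/** Illustrative flat (de-normalized) result row; names are assumptions. */
final class FlatResultSketch {

    static final class ResultRow {
        final String agentIp;        // which agent reported the row
        final String siteName;       // repeated in every row, not looked up by ID
        final String stepName;
        final String url;
        final int responseCode;
        final String responseMd5;
        final long roundTripMillis;

        ResultRow(String agentIp, String siteName, String stepName, String url,
                  int responseCode, String responseMd5, long roundTripMillis) {
            this.agentIp = agentIp;
            this.siteName = siteName;
            this.stepName = stepName;
            this.url = url;
            this.responseCode = responseCode;
            this.responseMd5 = responseMd5;
            this.roundTripMillis = roundTripMillis;
        }
    }

    /** Average round-trip time for one site, computed without joining other tables. */
    static double averageRoundTripMillis(List<ResultRow> rows, String siteName) {
        long total = 0;
        int count = 0;
        for (ResultRow row : rows) {
            if (row.siteName.equals(siteName)) {
                total += row.roundTripMillis;
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) total / count;
    }
}
```

The cost of this layout is redundancy: renaming a site on the server does not rewrite rows already reported, which is the discrepancy the verbose rows are meant to make harmless.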

3.3. Processes

The e-monitoring system is subject to three main processes: monitoring, statistics updating, and site list refreshing, which are depicted by Figures 4 to 6, respectively.

System Design and Implementation

In this section, we first illustrate the high-level architecture of the overall design. Then we discuss each of the components and layers of the system. Throughout this section, we evaluate alternative technologies and explain why we selected the ones used in the implementation.

Figure 7. Deployment Diagram

4.1. Overall Design Criteria and System Architecture

In this project, we aim to select as many free components as possible in order to reduce the cost. We avoided coding as much as possible in order to reduce development time, relying instead on publicly available libraries. However, we also avoided libraries without open source code, in order to reduce the overall complexity and hence interoperability problems, or even dependency on a specific Java Development Kit (JDK) version. Figure 7 is a deployment diagram depicting the distribution requirements of this system, which is helpful for the following sub-sections.

4.2. Front End Monitoring Agent

The core function of a front end monitoring agent is first to obtain a list of targets to be monitored from the back-end central administration server. It then checks through the list of URLs obtained, using the supplied information. Finally, the monitoring agent sends any statistical information collected back to the back-end central administration server. We have chosen to delay the alert back to the central administration server in order to avoid false alarms. Normally, this will not cause information loss unless the central administration server also goes down simultaneously.

Language chosen - One of the core requirements of the monitoring agents is the ability to be deployed to as many platforms as possible without rewriting any portion of the code. Therefore, we have chosen Java, which is a platform-independent language. Moreover, Java is growing in popularity and more and more free components are becoming available, so development time and effort can be reduced.

Multi-threaded design - Monitoring agents must monitor multiple targets at the same time. If the HTTP requests were issued one by one in a single-threaded serial mode, it would be difficult to check each target promptly within a pre-defined time interval. This is because the response time of each HTTP request is affected by many factors, and a slow response would in turn push back the start time of subsequent monitoring tasks. In addition, the statistics collected from different monitoring agents could not be used to compare and visualize the network latency in a particular time slot, as the requests would not have happened at, or even around, the same time. In order to collect statistics for an apples-to-apples comparison, we need to issue the various HTTP requests at the same time, or at least reduce the gap between the start times of the first HTTP requests for different sites.

Since the time required for creating and forking a new thread is quite significant, we still need to carefully design a proper thread pool or select a freely available thread pool implementation in order to minimize the time delay between HTTP requests. A simple Google search for the keyword “free Java thread pool” returns more than 200,000 hits, with many examples and even free Java thread pooling libraries. However, many of them are either under maintenance [10], already outdated [9], or commercial products [8], and so do not satisfy our requirements. Fortunately, a free thread pool implementation is available in JDK 1.5 [11]. In addition, it has all the features required for a multi-threaded system, including thread pooling, atomic variables, locking, and semaphores.
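
A minimal sketch of such a pooled check, using the java.util.concurrent classes introduced in JDK 1.5, might look as follows. The Probe interface, the CheckResult class, and the pool size are hypothetical stand-ins for the agent's real HTTP check, and the lambda syntax assumes Java 8 or later rather than the JDK 1.5 of the prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative pooled health check; Probe and CheckResult are assumptions. */
final class ParallelCheckSketch {

    /** Stand-in for the real HTTP check of one target. */
    interface Probe {
        boolean check(String url);
    }

    static final class CheckResult {
        final String url;
        final boolean healthy;
        final long roundTripMillis;

        CheckResult(String url, boolean healthy, long roundTripMillis) {
            this.url = url;
            this.healthy = healthy;
            this.roundTripMillis = roundTripMillis;
        }
    }

    /** Checks all targets on a fixed pool so that a slow site does not delay the others. */
    static List<CheckResult> runChecks(List<String> urls, Probe probe, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<CheckResult>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    boolean healthy = probe.check(url);
                    long millis = (System.nanoTime() - start) / 1_000_000;
                    return new CheckResult(url, healthy, millis);
                });
            }
            List<CheckResult> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<CheckResult> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking targets", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a check failed unexpectedly", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A long-running agent would keep one pool alive across monitoring rounds rather than creating it per call, since reusing threads is the point of pooling; the pool is created inside the method here only to keep the sketch self-contained.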

Checking the targets - In order to accurately check the health status of as many targets as possible, the remote monitoring agent should have the following capabilities.

Issue HTTP/HTTPS GET and POST requests
Send HTTP variables and values together with HTTP/HTTPS POST requests
Simulate user requests by issuing a list of predefined HTTP requests together with HTTP variables and values
Memorize and handle HTTP sessions of various languages (e.g., with sites written in Java, session information is identified by a cookie named JSESSIONID)

As Web services invocations are just SOAP messages sent over HTTP, these communications are also adequately supported. However, to increase performance, the SOAP message translations and templates can be prepared beforehand and stored for repeated transmission.
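
The session-handling capability in the list above can be illustrated with the JDK's own java.net.CookieManager. This is a swapped-in technique chosen so that the sketch stays self-contained; it is not the HTTP library used by the prototype. The class name SessionMemorySketch, its methods, and the example host are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative session memory built on java.net.CookieManager; names are assumptions. */
final class SessionMemorySketch {

    private final CookieManager cookies = new CookieManager();

    /** Stores a Set-Cookie value received in the response to a request for requestUri. */
    void remember(URI requestUri, String setCookieValue) {
        Map<String, List<String>> headers = Collections.singletonMap(
                "Set-Cookie", Collections.singletonList(setCookieValue));
        try {
            cookies.put(requestUri, headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the Cookie header value to send with the next request, or "" if none applies. */
    String cookieHeaderFor(URI requestUri) {
        try {
            Map<String, List<String>> out =
                    cookies.get(requestUri, new HashMap<String, List<String>>());
            List<String> values = out.get("Cookie");
            return (values == null || values.isEmpty()) ? "" : String.join("; ", values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a login step returns a session cookie such as JSESSIONID, each later step in the same site would send cookieHeaderFor(stepUri) as its Cookie header, so the server sees the sequence of requests as a single user session.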

HTTPRequest layer - We developed a prototype using the HttpURLConnection class from Java and found that the default HttpURLConnection does not work very well in a multi-threaded environment. In addition, it lacks some sophisticated features such as HTTP session management and cookie support. Further, in comparison to Oakland Software [12], we found that another free HTTP client, the HTTPClient from Innovation [14], completed most of the tasks. Unfortunately, that library is a bit outdated (last modified in 2001). There are commercially available libraries, like the one offered by Oakland Software [13], with rich features and active support. But again, as we are trying to develop the system at minimal cost, we avoid any commercial library unless we have no other option. Luckily, the HttpClient from the Apache Jakarta Commons project [15] came out in February 2005 with all the features that we needed, and it has reached release candidate RC2. Most importantly, it is free and backed by the Apache Jakarta Commons project, where support can be sought easily and the project quality is quite good. Thus, we have chosen to implement the HTTPRequest layer using this library.

User interface (UI) layer - In order to reduce the possible rate of change, we implemented the front end monitoring agent with a Java GUI. There are two mainstream Java GUI toolkits available, namely the Standard Widget Toolkit (SWT) [16] and Swing [21]. SWT is a sub-project of the Eclipse project initiated by IBM. Its main objective is to create a feature-rich GUI. In addition, its core design objective is a close binding with the native API of the underlying operating system [17]. With this, it attempts to boost the performance of the Java GUI and to look as native on each platform as possible. Swing is the second generation of the Java GUI from Sun Microsystems (the successor of AWT). The overall design has a clean separation between the GUI components and model objects according to the Model-View-Controller (MVC) design pattern. In addition, it emulates different platform-specific components with lines, rectangles, and text. Thus, it runs a little slower than native APIs. In its early history, before JDK 1.4, it was quite slow, ugly, and buggy. However, both the look and the speed have been greatly improved in JDK 1.5. Swing is chosen for the UI layer because we found that SWT and Swing perform roughly the same. In addition, Swing seems to be the only alternative when we need to reduce the cost of ownership, as it is bundled in the JDK 1.5 distribution.

Database Persistence Layer - Support for “container managed persistence” is common in Enterprise JavaBeans (EJB) applications, whereby the EJB container helps manage database persistence and dramatically reduces the amount of work required. Unfortunately, one of the core requirements of the monitoring agent is the ability to run in a stand-alone mode, and thus no container (neither J2EE nor web container) will be installed. There are some Java libraries that can provide similar functionality. The ideal case would be to reuse the same persistence-related code in both the server and client applications to reduce the learning, development, and maintenance effort. We identified the following requirements for the data persistence layer and studied some free products: support transaction management; facilitate mappings between Java objects and relational data models (ORM); provide connection management; remove the need to write low-level Java Database Connectivity (JDBC) code; and allow code written for one application to be ported to another application under a different environment.