Web Page Caching in Java Web Applications

David A. Turner

Department of Computer Science

California State University, San Bernardino

San Bernardino, CA 92407

Abstract

Many Web applications serve pages that change infrequently, such as catalog pages within a shopping application or a calendar of events on a university web site. Such pages change only when a staff member logs in and updates the data on which the pages are based. Without caching, each request for such a page results in redundant read operations on the database and regeneration of the HTML from a collection of HTML-generating components. In this paper, we explain how we decrease this processing overhead in Java Web applications by caching infrequently changing Web pages.

Keywords: Web page caching, patterns for software design, software architecture

1. Introduction

The Computer Science Department at California State University, San Bernardino has been increasing the number of paid internship projects in the area of Web application development for its undergraduate and master’s degree students. We developed an online application system for the University’s College of Extended Learning, and a grant management system (similar to NSF fasttrack) for the Office of Technology Transfer and Commercialization. Currently, we are developing a new project for a quasi-governmental organization called Active Capital, whose mandate is to promote angel investing in small businesses. Through these project experiences, we have honed our abilities to solve Web development problems. The purpose of this paper is to describe a technique of Web page caching that we developed to improve the performance of our applications. To do this, we provide an overview of our architecture, and then explain specifically how we implemented page caching in this context.

There are three principal benefits to caching Web pages. First, response time is greatly reduced, so that site users experience a user interface with minimal latency. Second, site capacity is increased without adding new hardware, because returning cached Web pages reduces consumption of processor resources. Third, caching infrequently changing Web pages supplants the need to cache persistent objects to eliminate the bulk of redundant read operations on the database.

In our systems, we use the data access object pattern [1], which allows us to classify some objects as persistent, and provides us with a consistent interface to persistent services needed to manage them. We also follow a Model/View/Controller architectural design pattern [2].

The remainder of this paper is organized as follows. Section 2 describes how we use the data access object pattern for implementing object persistence. Section 3 describes how we implement the view logic (of the model-view-controller architectural pattern). Section 4 describes how we implement the controller logic. Section 5 describes how we implement the business logic. Section 6 describes how we extended the persistence system and the view logic to accomplish Web page caching. Section 7 describes a concurrency issue, and Section 8 provides a conclusion.

2. Persistence Logic

We use the data access object design pattern to isolate interaction with the database to singleton instances of subclasses of a DataAccessObject parent class [1]. Each DataAccessObject provides persistence services for managing instances of a single persistent class, which is a subclass of PersistentObject. The parent PersistentObject class contains a protected member variable called id of type Long, which is used to identify a persistent instance of the class. Subclasses inherit this member and its getter and setter methods. For example, UserDAO extends DataAccessObject and manages instances of User, which extends PersistentObject. The methods provided by UserDAO are typical of all DataAccessObjects, and are as follows:

static UserDAO getInstance()

User find(Long id)

User findByUsername(String username)

Collection findAll()

Collection findAllByRole(String role)

void update(User user)

void create(User user)

void delete(User user)

The benefit of the data access object pattern is that all interaction with the database is isolated to the DAOs, so that other parts of the code deal only with persistent objects. A persistent object is an object instance that can be retrieved from the store by providing its id, or by asking for it in some other way, such as the UserDAO’s findByUsername method. The state of a persistent object is persisted by passing a reference to the object into the update method of its DAO.
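The following is a minimal sketch of how these classes might fit together. It is not the code from our applications: the in-memory map stands in for the database, and the username field, the nextId counter, and all method bodies are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Base class for persistent objects; the id identifies a stored instance.
abstract class PersistentObject {
    protected Long id;
    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
}

// A persistent class; the username field is an assumption for this sketch.
class User extends PersistentObject {
    private String username;
    User(String username) { this.username = username; }
    String getUsername() { return username; }
    void setUsername(String username) { this.username = username; }
}

// Parent of all DAOs; shared persistence services would live here.
abstract class DataAccessObject { }

// Hypothetical in-memory UserDAO; a real one would issue SQL instead.
class UserDAO extends DataAccessObject {
    private static final UserDAO INSTANCE = new UserDAO();
    private final Map<Long, String> table = new HashMap<>(); // id -> username "row"
    private long nextId = 1;

    private UserDAO() { }
    static UserDAO getInstance() { return INSTANCE; }

    synchronized void create(User user) {
        user.setId(nextId++);
        table.put(user.getId(), user.getUsername());
    }

    synchronized User find(Long id) {
        String username = table.get(id);
        if (username == null) return null;
        User user = new User(username);
        user.setId(id);
        return user;
    }

    synchronized User findByUsername(String username) {
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (row.getValue().equals(username)) return find(row.getKey());
        }
        return null;
    }

    synchronized Collection<User> findAll() {
        Collection<User> users = new ArrayList<>();
        for (Long id : table.keySet()) users.add(find(id));
        return users;
    }

    synchronized void update(User user) { table.put(user.getId(), user.getUsername()); }
    synchronized void delete(User user) { table.remove(user.getId()); }
}
```

Code outside the DAO only ever sees User instances, for example UserDAO.getInstance().findByUsername("alice"), and never touches the storage behind them.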

To reduce redundant read operations on the database, we cache recently used persistent objects in their DAOs, and return references to these objects in response to the find operations. However, page caching also reduces redundant reads on the database, and so page caching supplants much of the benefit of persistent object caching.

3. View Logic

We generally follow the model-view-controller pattern [2], which separates program logic into a model (which maintains the state of the system), a view (which renders various views of the state of the system to the user), and a controller (which applies the appropriate business logic in response to user input). The previous section on object persistence comprises the model logic of the system; in this section, we describe the view logic.

The view logic is spread across the JSP files and subclass instances of a PageHandler class. Each PageHandler object is responsible for delivering a single Web page to the user. When processing of the user request has completed, the system passes execution to one of the PageHandlers. The PageHandler in turn sets up an environment for a particular layout JSP, and then forwards execution to it. Among its duties, a PageHandler does the following: (1) obtains references to persistent objects needed by the JSP to generate an HTML document, (2) constructs a vector of MenuItem objects that will be used by the JSP to construct a menu, and (3) specifies which content components are to appear in the laid out page.

As an example, VisitorHomePage subclasses PageHandler, and is responsible for returning the home page of a non-logged-in user. The VisitorHomePage contains a process method, which is called either by AdminController or by an ActionHandler. The visitor home page is a page that is frequently requested and infrequently modified, so it is a candidate for page caching.

Figure 1: PageHandler classes

The last action performed by a PageHandler is to forward to a RequestDispatcher that passes control to a JSP responsible for layout of the page. The purpose of the layout JSP is to build the menus and lay out the page. The layout JSP passes control temporarily to one or more content components as it completes its task of page generation. This process allows us to isolate the layout code in a few layout JSPs, so that global changes to the site’s look and feel can be accomplished easily.

4. Controller Logic

Typically, users of Web applications are classified into roles, which define the types of operations that they can perform at a site. We use a separate ControllerServlet for each user role. For example, if a Web application has the two roles of admin and visitor, then we have two controllers: AdminController and VisitorController. The Web container first determines a context path from the request URI, which it uses to route the request to the appropriate Web application. Following the context path is the role name of the user, which maps to a corresponding ControllerServlet.

Figure 2: Controller Classes

When the ControllerServlet gets the HTTP request, it looks for a parameter embedded in the URL that identifies a Handler. If the Handler is a PageHandler, there is no business logic to execute: we simply need to return a Web page. If the Handler is an ActionHandler, we need to execute business logic, which changes the state of the system, before we return a page. Both PageHandlers and ActionHandlers extend Handler, which has a method called process that the ControllerServlet invokes to pass control to it.

5. Business Logic

We use the term business logic to represent code that changes the persistent state of the system, which is the aggregate of all persistent objects. We use the term persistent object to represent an object that exists outside of a particular instance of the JVM in which the application runs. Although the data comprising the state of persistent objects are typically stored in a database, this is not a concern of the business logic. However, the business logic is concerned with managing the life cycles of persistent objects (retrieving, creating, modifying, deleting and saving) through their corresponding DataAccessObjects.

The application’s business logic is contained in the subclass instances of ActionHandler. For example, we have a class called EditStoryAction, which extends ActionHandler and is responsible for modifying persistent instances of the Story class. When the process method of an EditStoryAction completes, it passes control to the ViewStoryPage, which renders a view of the newly modified news story.
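The controller dispatch and the action-then-page flow can be sketched as follows. This is an illustration under stated assumptions rather than our production code: the Map of parameters and the StringBuilder stand in for the servlet request and response objects, the parameter name "handler", the StoryStore class, and the ControllerSketch class are hypothetical, and no HTML escaping is performed.

```java
import java.util.HashMap;
import java.util.Map;

// Common parent: the controller calls process() on whichever handler is named.
abstract class Handler {
    // Stand-in for a process method taking the servlet request and response.
    abstract void process(Map<String, String> params, StringBuilder out);
}

abstract class PageHandler extends Handler { }
abstract class ActionHandler extends Handler { }

// Hypothetical stand-in for the persistent Story objects and their DAO.
class StoryStore {
    static final Map<String, String> STORIES = new HashMap<>();
}

// A PageHandler only renders a view of the current state.
class ViewStoryPage extends PageHandler {
    @Override
    void process(Map<String, String> params, StringBuilder out) {
        String title = StoryStore.STORIES.get(params.get("id"));
        out.append("<h1>").append(title).append("</h1>");
    }
}

// An ActionHandler changes persistent state, then hands off to a page.
class EditStoryAction extends ActionHandler {
    private final PageHandler next = new ViewStoryPage();

    @Override
    void process(Map<String, String> params, StringBuilder out) {
        StoryStore.STORIES.put(params.get("id"), params.get("title")); // business logic
        next.process(params, out);                                     // render the result
    }
}

// Plays the role of a ControllerServlet for one user role.
class ControllerSketch {
    private final Map<String, Handler> handlers = new HashMap<>();

    ControllerSketch() {
        handlers.put("ViewStoryPage", new ViewStoryPage());
        handlers.put("EditStoryAction", new EditStoryAction());
    }

    void service(Map<String, String> params, StringBuilder out) {
        Handler handler = handlers.get(params.get("handler"));
        if (handler == null) {
            out.append("unknown handler");
            return;
        }
        handler.process(params, out);
    }
}
```

Because both kinds of handler share the process method, the controller does not need to know whether a request triggers business logic or only returns a page.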

6. Web Page Caching

When a non-logged-in user (a visitor) visits the main page of the site, she sees a summary of news items. The class responsible for returning the HTML comprising this Web page is VisitorHomePage. Because this page is infrequently changed and frequently requested, and because it contains no user-specific information, we maintain a cache of the HTML document that it sends to the user. This avoids the processing overhead of invoking the findAllStory method of the StoryDAO and executing the JSP.

All pages that we wish to cache implement UpdateEventListener. (See Figure 3.) We describe how caching works by explaining what happens when the process method is called on the VisitorHomePage.

Figure 3: CachablePage extends UpdateEventListener

VisitorHomePage contains a variable called stale, which is initialized to true. When the process method of VisitorHomePage runs, it checks whether stale is true. If it is, the method wraps the HttpServletResponse object passed into it with an instance of the ResponseInterceptor class, and then forwards this wrapper to the JSP. The ResponseInterceptor extends the HttpServletResponseWrapper class, and overrides its getWriter method to return its own instance of PrintWriter. This PrintWriter wraps a StringWriter, which simply accumulates in a StringBuffer all output written into it. In this manner, we intercept the HTML document that the JSP would normally write into the OutputStream comprising the HTTP response message. When the JSP returns, the process method extracts the intercepted HTML from the ResponseInterceptor, and stores a reference to it in a variable called page. It also sets the value of stale to false.

Regardless of whether the value of page has been refreshed or not, the process method of VisitorHomePage ends by writing the String page into the PrintWriter of the HttpServletResponse object that was passed to it.
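The capture-and-replay logic can be sketched with the standard library alone. The sketch below makes several assumptions: Response is a one-method stand-in for HttpServletResponse, View is a stand-in for the layout JSP, and CachedPage, getContent, markStale, and renderCount are hypothetical names. In a servlet container, ResponseInterceptor would instead extend HttpServletResponseWrapper and wrap the original response. Concurrency is ignored here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Stand-in for HttpServletResponse: only the method this sketch needs.
interface Response {
    PrintWriter getWriter();
}

// Stand-in for the layout JSP, which writes HTML into the response.
interface View {
    void render(Response response);
}

// Hands the view a PrintWriter backed by a StringWriter, so the
// generated HTML accumulates in memory instead of going to the client.
class ResponseInterceptor implements Response {
    private final StringWriter buffer = new StringWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    @Override
    public PrintWriter getWriter() { return writer; }

    String getContent() {
        writer.flush();
        return buffer.toString();
    }
}

// Plays the role of a cacheable PageHandler such as VisitorHomePage.
class CachedPage {
    private final View view;
    private boolean stale = true; // no cached copy exists yet
    private String page;
    int renderCount = 0;          // counts how often the view actually ran

    CachedPage(View view) { this.view = view; }

    void markStale() { stale = true; }

    void process(Response response) {
        if (stale) {
            // Regenerate: run the view against the interceptor and keep the HTML.
            ResponseInterceptor interceptor = new ResponseInterceptor();
            view.render(interceptor);
            renderCount++;
            page = interceptor.getContent();
            stale = false;
        }
        // Fresh or not, the client always receives the cached string.
        response.getWriter().write(page);
    }
}
```

With this structure, a second call to process on an unchanged page writes the stored string and never invokes the view.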

The value of stale is set to true by the DAOs that the VisitorHomePage depends on, which call the updateEvent method that VisitorHomePage implements. The VisitorHomePage ensures this will happen by calling, from its constructor, the registerUpdateEventListener method of the StoryDAO and passing in a reference to itself. The StoryDAO maintains a vector of UpdateEventListeners, which it iterates through, calling their updateEvent methods, whenever the update or create methods of the DAO complete. StoryDAO inherits the registerUpdateEventListener method from the UpdateEventGenerator class. (See Figure 4.)
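A minimal sketch of this listener registration follows. The names UpdateEventListener, updateEvent, registerUpdateEventListener, UpdateEventGenerator, StoryDAO, and VisitorHomePage come from the text above; fireUpdateEvent, isStale, and markFresh are hypothetical, stories are reduced to strings held in a list, and an ArrayList guarded by synchronized methods is used in place of a vector.

```java
import java.util.ArrayList;
import java.util.List;

// Implemented by every page that wants to hear about data changes.
interface UpdateEventListener {
    void updateEvent();
}

// Holds the listener list; DAOs inherit registration and notification.
class UpdateEventGenerator {
    private final List<UpdateEventListener> listeners = new ArrayList<>();

    synchronized void registerUpdateEventListener(UpdateEventListener listener) {
        listeners.add(listener);
    }

    // Hypothetical helper: notify every registered listener.
    protected synchronized void fireUpdateEvent() {
        for (UpdateEventListener listener : listeners) {
            listener.updateEvent();
        }
    }
}

// Simplified StoryDAO: the list stands in for the database table.
class StoryDAO extends UpdateEventGenerator {
    private static final StoryDAO INSTANCE = new StoryDAO();
    private final List<String> stories = new ArrayList<>();

    static StoryDAO getInstance() { return INSTANCE; }

    void create(String story) {
        stories.add(story);
        fireUpdateEvent(); // runs after the write completes
    }

    void update(int index, String story) {
        stories.set(index, story);
        fireUpdateEvent();
    }
}

// The page registers itself with the DAO it depends on.
class VisitorHomePage implements UpdateEventListener {
    private boolean stale = true;

    VisitorHomePage() {
        StoryDAO.getInstance().registerUpdateEventListener(this);
    }

    @Override
    public void updateEvent() { stale = true; }

    boolean isStale() { return stale; }
    void markFresh() { stale = false; } // called after the page is regenerated
}
```

The DAO needs no knowledge of which pages depend on it; a page that reads from several DAOs would register with each of them in the same way.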

Figure 4: StoryDAO implements UpdateEventGenerator

Some PageHandlers generate several different versions of a page based on a parameter that is passed to them. If the parameter takes on a small number of values, then the PageHandler uses a HashMap to map all possible values of the parameter to cached versions of the page. If the parameter takes on a large number of values (such as a user id), then the PageHandler limits the number of versions of the page that it caches by purging the least recently used pages after reaching full capacity. This can be easily implemented by subclassing the LinkedHashMap data structure provided in the core Java API.
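One way to build such a bounded cache is sketched below. LinkedHashMap's access-order constructor and its removeEldestEntry hook are part of the core API; the class name LruPageCache, the capacity field, and the choice of String keys and values are assumptions for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps a parameter value (for example a user id) to the cached HTML
// for that value, evicting the least recently used entry when full.
class LruPageCache extends LinkedHashMap<String, String> {
    private final int capacity;

    LruPageCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }
}
```

LinkedHashMap is not thread-safe, and in access-order mode even get modifies the map, so a PageHandler that shares one instance across request threads would need to guard every access with a lock.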

7. Concurrency Issues

Because Web applications are multi-threaded, concurrency issues need to be considered in general. For page caching, it is possible under the right conditions for a stale version of a page to be added to the cache. To see this, consider the following scenario. Thread A checks the stale variable and sees that it is true, so it branches into the logic that updates the page cache. Thread A generates the new page, and then sleeps before setting stale to false. Thread B updates the data on which the page depends, and sets stale to true. Thread A wakes up and sets stale to false. At this point, the PageHandler contains a stale version of the page, but the stale flag is false. To avoid this outcome, we synchronize on an object so that a thread that sets the stale flag to true cannot run while another thread is generating the page (and setting the stale flag to false).
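The locking discipline can be sketched as follows, under stated assumptions: SynchronizedPageCache, lock, getPage, and the Supplier that stands in for page generation are hypothetical names, and the sketch returns the page as a string rather than writing it to a response.

```java
import java.util.function.Supplier;

class SynchronizedPageCache {
    private final Object lock = new Object();
    private final Supplier<String> generator; // stand-in for running the JSP
    private boolean stale = true;
    private String page;

    SynchronizedPageCache(Supplier<String> generator) {
        this.generator = generator;
    }

    // Called after the underlying data has changed. It blocks while
    // another thread is inside the regeneration block below, so the
    // flag can never be reset to false after this call returns
    // without a fresh regeneration.
    void updateEvent() {
        synchronized (lock) {
            stale = true;
        }
    }

    String getPage() {
        synchronized (lock) {
            if (stale) {
                // Generation and the flag reset happen as one atomic step
                // with respect to updateEvent.
                page = generator.get();
                stale = false;
            }
            return page;
        }
    }
}
```

If thread B changes the data while thread A is generating, B's call to updateEvent waits until A has stored the page and cleared the flag, and then sets the flag again, so the next request regenerates the page. The cost of this simple scheme is that requests for the page are serialized while it is being regenerated.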

8. Conclusion

We added the page caching mechanism described in this paper to improve the response time and capacity of our Web applications. Informally, the applications appear to run better with page caching in place, but we have not yet measured the improvement. In the near future, we will try to quantify the performance gain by measuring response times and throughput under simulated traffic.

References

[1] Deepak Alur, John Crupi, Dan Malks. Core J2EE Patterns. Prentice Hall PTR, June 2001.

[2] Trygve Reenskaug. The Model-View-Controller (MVC): Its Past and Present. JavaZone, Oslo, September 2003.

[3] Jason Falkner. Two Servlet Filters Every Web Application Should Have, OnJava.com, Nov 19, 2003.

[4] Mehmet Altinel, et al. DBCache: Database Caching for Web Application Servers. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002.