Collaborating with Petaobjects:
The TJNAF Virtual Experiment Environment
Executive Summary
This proposal seeks to develop a key collaboratory environment for a central DOE application: modern, sophisticated high-energy and nuclear physics experiments. The primary goal of this project is to develop a collaborative, problem-solving environment so that everything associated with the Hall D experiment being planned for the Thomas Jefferson National Accelerator Facility (the entire complex, evolving network including the detector, experimental measurements, subsequent analyses, computer systems, technicians, and experimenters) can be integrated into a simple, collaborative fabric of information resources and tools. This Virtual Experiment Environment (VEE) will make it possible for groups of distributed collaborators to conduct, analyze, and publish experiments based on the composition and analysis of these resource objects. The scientific focus of this project is an experimental search for gluonic excitations among hadrons produced in photoproduction, with the ultimate goal of understanding the nature of confinement in quantum chromodynamics, i.e., why quarks are forever bound in the hadrons of which they are constituents.
Our approach has several key and novel features and is designed to address issues coming from both previous research and a detailed analysis of major commercial tools in the collaboration and object management area. The foundation of this system is the Garnet Collaborative Portal (GCP), which uses an integrated distributed object framework to specify all the needed object properties, including their rendering and their collaborative features. We support both desktop and hand-held interfaces. We build on our existing Gateway system, which is being integrated into GCP to provide a computing portal supporting collaborative job preparation and visualization. GCP is implemented in a Grid Service framework that incorporates several key ideas, including the systematic use of small XML-based objects containing just the metadata needed to allow scalable management and sharing of the quadrillion objects. This is implemented as a Web environment, MyXoS, controlled by XML scripts initially built using RDF. We address high performance at several levels, from the design of the object system to the use of a reconfigurable server network to support the Grid message service and peer-to-peer network on which MyXoS is built. A single publish/subscribe message service extending the industry standard JMS supports synchronous and asynchronous collaboration. A hierarchical XML schema covers events, portalML (user view) and resourceML (basic resources). The proposal combines innovative research into these issues with a staged deployment that allows for careful evaluation and user feedback on the VEE functionality, performance and implementation.
This project is critical to the Hall D scientific and computing efforts and needs to begin as soon as possible. It will help establish an overall structure for the organization of the Hall D computing efforts, and it makes it practical for the Hall D collaboration to create an efficient grid-computing environment that reduces computing costs and attracts collaborators. The current computing and data management effort in Hall B at TJNAF faces challenges similar to those of Hall D, except that it is currently in operation and generates experimental and simulation results at the rate of approximately 300 Tbytes per year, roughly a factor of 10 below the expected rate for Hall D. The similarity of the computing and collaboration needs in Halls B and D provides an opportunity to support both efforts simultaneously. The early application of the VEE concept to simulations and first-pass analysis within Hall B will allow us to make an important step in improving Hall B’s computing environment.
The FSU and Indiana University principal investigators include the physics leadership (Alex Dzierba) and computing leadership (Larry Dennis) of the Hall D experiment, together with the expertise of Geoffrey Fox, who leads the design and implementation of the collaborative environment and has participated in several major high-energy physics experiments. This team is working in partnership with Jefferson Lab personnel to define and create the Hall D computing environment at Jefferson Lab. Florida State University has provided significant matching support ($240,000) for this project.
The effort required to successfully complete this three-year project is significant. We have adopted a very aggressive schedule in order to provide this problem-solving environment as early as possible. There are several reasons for this: the system needs to be in place early enough that physicists can develop additional software tools that work with it; there is a significant computing effort that needs to take place before the experiments begin; and the effectiveness of this effort relies on feedback from scientists using the VEE.
1. Introduction
This proposal seeks to develop a novel collaboratory environment for a central DOE application – a modern sophisticated high-energy nuclear physics experiment. The work will contribute directly to computer science research – in particular the nature of collaboration services required for Grid based applications. Further there will be direct benefit to nuclear physics as it will develop new approaches to both experimental control and analysis software and indeed to the operational model for the large worldwide teams that are needed today. Finally there will be contributions from the integration of the computer science and physics research; we believe that the application requirements are critical input to research in collaborative systems – we need to know what objects to share and in what fashion. Here we note that although this is a research proposal, we will develop a collaboratory that is robust and functional so that the physicists can and will use it. Lessons from this use will be a major driving force for the computer scientists.
The FSU, Indiana, Jefferson Lab team brings together the physics and computing leadership of the Hall D [1] experiment. Dzierba (Indiana) is the scientific leader of the Hall D project and Dennis (FSU) is a member of the Hall D collaboration board and leader of the Hall D computing group. Further Fox, who leads the design and implementation of the collaborative portal, participated in several major high energy experiments including two at Fermilab (E110 and E260) where as a physicist he led the analysis and simulation activities, writing most of the software and collaborating with Dzierba while they were both at Caltech. Riccardi, a computer scientist who has been a member of the Hall B collaboration for approximately 10 years, was instrumental in creating databases for recording online, analysis, simulation, and calibration information. Erlebacher from FSU’s new school of Computational Science and Information Technology will lead the work on hand-held interfaces and collaborative visualization; he will be a major participant in developing XML infrastructure.
1.1 Experiments at Jefferson Lab
The physics experimental program is a search for gluonic excitations among hadrons produced in photoproduction, with the ultimate goal of understanding the nature of confinement in quantum chromodynamics, i.e., why quarks are forever bound in the hadrons of which they are the constituents. This search is being planned for Hall D at the Thomas Jefferson National Accelerator Facility (JLab), which, like many modern scientific endeavors, will produce large volumes of complex, high-quality data from many sources. These data include experimental measurements, information on the state of the experimental conditions, information describing the status and results of data analysis, and simulations of the detector required to understand its response during the experiment. The exploration of physical phenomena with Hall D depends critically upon our ability to efficiently extract information from the large volume of distributed data using a complex set of interrelated computing activities. This experiment is brand-new but relatively near-term; it can be designed to use new and different methodologies, and we expect to see immediate benefits and feedback from doing so. The experiment will not take data for about 5 years, but simulations are already being run and hardware and software decisions are being made. The lessons from this work will be broadly applicable to physics experiments in the nuclear and high-energy areas, and we will demonstrate this by applying some of the technology to the existing Hall B.
Hall B at TJNAF faces similar challenges, except that it is currently in operation and generates experimental and simulation results at the rate of approximately 300 Tbytes per year, roughly a factor of 10 below the expected rate for Hall D. The similarity of the computing and collaboration needs in Halls B and D provides an opportunity to support both efforts simultaneously and gives us the opportunity to refine our middleware services in a production environment.
This project will develop a set of middleware services that create an environment in which everything associated with the Hall D experiment—the entire complex, evolving network including the detector, experimental measurements, subsequent analyses, computer systems, technicians, and experimenters—will be integrated into a simple, collaborative fabric of information resources and tools. The resultant Virtual Experiment Environment (VEE) will make it possible for groups of distributed collaborators to conduct, analyze, and publish results from experiments based on the composition and analysis of these resource objects.
One advantage of a new project like Hall D is that one can decide to build the entire infrastructure around new ideas without worrying too much about legacy concepts. Applications within the VEE will accept only properly described distributed objects as input and produce corresponding objects as output. While it is possible to achieve this from the start for Hall D, the conversion to distributed objects in Hall B will focus on particular applications, such as simulations and first-pass analysis. We estimate that the Hall D activities will create about 10^15 distinct objects for each year of its operation, with micro-objects like detector signals and processed information like track segments and particle identifications being grouped into larger event objects. In addition, there will be objects describing reports and presentations, and the information needed to specify the input and output of simulations initiated by any of the geographically distributed researchers involved in the experiment.
1.2 Overview of the Technical Approach
Our approach has several key and novel features that have been designed to address issues coming from both previous research [2] and a detailed analysis [3] of major commercial tools in the collaboration and object management area. We are building a system that provides a web interface to access and manipulate the Hall D objects and further allows this to be done collaboratively. This capability is formulated as the Garnet Collaborative Portal (GCP), which uses an integrated distributed object framework to specify all the needed object properties, including both their rendering and their collaborative features. This builds on our existing Gateway system [4] for a computing portal, which is being integrated into GCP with collaborative job preparation and visualization. Sharing a complex object is difficult, and systems not designed from scratch to integrate all object features will not be as effective. Including rendering information in an object’s description allows one to customize it for different clients and so build collaborative environments where one shares the same object between hand-held and desktop devices.
We describe the technical approach in section 3 but give highlights here. We assume that we are building on a computational grid infrastructure and so can layer our high-level services on top of the capabilities under development by projects such as the Particle Physics Data Grid [5] and GriPhyN [6-7]. Users, computers, software applications, sessions, and all forms of information (from physics DSTs to recordings of audio/video conferences) are all objects, which can be accessed from GCP. We estimate that after aggregation of the logged events into runs, we will need to handle several tens of millions of explicit objects. These will all be self-defining; namely, they make explicit all the metadata needed to enable GCP to perform functions such as searching, accessing, unpacking, rendering, sharing, specifying parameters, and streaming data in and out of them. This metadata is defined using a carefully designed XML schema, GXOS, and exploits the new RDF framework. Typically GCP manipulates only the meta-objects formed from this metadata, so that we build a high-performance middleware that performs only control functions. This idea has been successfully used in our Gateway computing portal. The XML meta-objects that define the GCP point to the location of the object they define and can initiate computations and data transfers on them. Objects can be identified by a URI and referenced with this in either RDF resource links (such as <rdf:description about="URI"..) or fields in the GXOS specification. Three important URIs are the GXOS name, such as gndi://gxosroot/HallD/users/…, and the web locations of the meta-object and of the object itself. All objects in GXOS must have a unique name specified in a hierarchical syntax familiar from file systems.
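As an illustration of the meta-object idea, the following minimal sketch parses a small XML description into a lightweight record and keeps only that record in a registry, never touching the underlying data. The element names, the example name gndi://gxosroot/HallD/runs/run0001, the example.org location, and the MetaObject and Registry classes are all hypothetical; they are not taken from the GXOS schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical meta-object; tags and attributes are illustrative only
# and do not reproduce the actual GXOS schema.
META_XML = """
<metaobject name="gndi://gxosroot/HallD/runs/run0001">
  <type>run</type>
  <location>http://example.org/data/run0001.dst</location>
  <render client="desktop">full-event-display</render>
  <render client="handheld">summary-table</render>
</metaobject>
"""


@dataclass
class MetaObject:
    name: str        # unique hierarchical name, expressed as a URI
    kind: str        # what sort of object this describes
    location: str    # where the underlying (possibly large) object lives
    renderers: dict = field(default_factory=dict)  # client type -> rendering hint

    def path(self):
        """Return the components of the hierarchical name after the scheme."""
        _, _, rest = self.name.partition("://")
        return rest.split("/")


def parse_meta_object(xml_text):
    """Build a MetaObject from its XML description."""
    root = ET.fromstring(xml_text.strip())
    return MetaObject(
        name=root.attrib["name"],
        kind=root.findtext("type"),
        location=root.findtext("location"),
        renderers={r.attrib["client"]: r.text for r in root.findall("render")},
    )


class Registry:
    """Stores meta-objects only; the data they point to is never loaded."""

    def __init__(self):
        self._by_name = {}

    def register(self, meta):
        # Names must be unique, as in a file system.
        if meta.name in self._by_name:
            raise ValueError(f"duplicate name: {meta.name}")
        self._by_name[meta.name] = meta

    def lookup(self, name):
        return self._by_name[name]

    def children(self, prefix):
        """List registered names that sit below the given name prefix."""
        prefix = prefix.rstrip("/") + "/"
        return sorted(n for n in self._by_name if n.startswith(prefix))


registry = Registry()
registry.register(parse_meta_object(META_XML))
```

Because the registry holds only names, locations, and rendering hints, searching or listing objects costs the same regardless of how large the referenced data are. The per-client rendering entries show one way the same object could be presented differently on desktop and hand-held devices.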
Our software is largely written in Java (using Enterprise JavaBeans in the middle tier), but Java/XML is only the execution object model of the meta-objects; one can load persistently stored meta-objects or control target base objects formed by flat files, CORBA, .NET (SOAP) or any distributed object system to which we can build a Java gateway. Our successful Gateway computational portal has used this strategy already; there, all object interfaces are defined in XML but CORBA access is generated dynamically. Further, this system also uses only meta-objects and invokes programs and files using classic HPCC technology such as MPI. This strategy ensures that we combine the advantages of highly functional commodity technologies and high-performance HPCC technologies.
1.3 Collaboration Technologies
GCP uses the shared event model of collaboration, where the events use the same base XML schema as the meta-objects describing the entities in the system. The uniform treatment of events and meta-objects enables us to use a simple universal persistency model, obtained by having a database client (shown in figure 1) subscribe as a client to all collaborative applications. Integration of synchronous and asynchronous collaboration is achieved by using the same publish/subscribe mechanism to support both modes. Hierarchical XML-based topic objects, matched to XML-based subscribing profiles specified in RDF (Resource Description Framework from W3C), control this. Topics and profiles are also specified in GXOS and managed in the same way as meta-objects. These ideas imply new message and event services for the Grid, which must integrate events between applications and between clients and servers. This GMS (Grid Message Service) will be one major focus of our research. One important extension, GMSME (GMS Micro Edition), handles messages and events on hand-held and other small devices. This assumes an auxiliary (personal) server or adaptor that handles the interface between GMS and GMSME and offloads computationally intense chores from the hand-held device. Currently we use JMS (Java Message Service) to provide publish/subscribe services for events in our prototype GCP, but we have already found serious limitations that we will address in GMS. The event-based synchronous collaboration model handles not only the classic microscopic state changes (such as a change in the specification of the viewpoint of a visualization) but also the transmitted frame-buffer updates for shared display, which our experience has shown to be the most generally useful sharing mode for objects. We also support shared export, where objects are converted to a common intermediate form for which a powerful general shared viewer is built; shared PDF, SVG, Java3D, HTML and image formats are important export formats.
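The way a single publish/subscribe mechanism can serve both synchronous and asynchronous collaboration can be sketched in a few lines. The in-memory Broker and Archive classes and the topic names below are hypothetical illustrations; they do not represent JMS, GMS, or the GXOS topic schema. Live participants subscribe to part of a topic hierarchy, while an archiving client subscribes to every topic and can later replay events to a late joiner.

```python
def split_topic(topic):
    """Break a slash-separated topic into a tuple of its components."""
    return tuple(part for part in topic.split("/") if part)


class Broker:
    """Toy publish/subscribe hub with hierarchical, prefix-matched topics."""

    def __init__(self):
        self._subscriptions = []  # list of (prefix tuple, callback)

    def subscribe(self, topic_prefix, callback):
        # An empty prefix matches every topic.
        self._subscriptions.append((split_topic(topic_prefix), callback))

    def publish(self, topic, event):
        """Deliver the event to all matching subscribers; return the count."""
        parts = split_topic(topic)
        delivered = 0
        for prefix, callback in self._subscriptions:
            # Component-wise comparison, so "a/b1" does not match "a/b10".
            if parts[:len(prefix)] == prefix:
                callback(topic, event)
                delivered += 1
        return delivered


class Archive:
    """Persistence client: records every event it receives for later replay."""

    def __init__(self):
        self.log = []

    def record(self, topic, event):
        self.log.append((topic, event))

    def replay(self, topic_prefix):
        """Return the stored events under a topic prefix, in arrival order."""
        prefix = split_topic(topic_prefix)
        return [(t, e) for t, e in self.log
                if split_topic(t)[:len(prefix)] == prefix]


broker = Broker()
archive = Archive()
broker.subscribe("", archive.record)  # the archive listens to all topics

live_events = []
broker.subscribe("HallD/session1", lambda t, e: live_events.append(e))

broker.publish("HallD/session1/viewpoint", {"azimuth": 30})
broker.publish("HallD/session2/chat", {"text": "calibration done"})

# A participant who joins session1 later catches up from the archive.
missed = archive.replay("HallD/session1")
```

In this sketch the synchronous participant and the archive receive the same events through the same call, which is the property that lets one mechanism cover both modes.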