DOE/ER/40712

Relativistic Heavy Ion Experimental Physics

Supplemental proposal

August 2007

Department of Physics and Astronomy

Vanderbilt University

Nashville, Tennessee 37235

Prepared for the United States Department of Energy Under

Grant No. DE-FG05-92ER40712

  1. Introduction
  2. Current Roles in the PHENIX Detector Collaboration
    The Vanderbilt University group in relativistic heavy ion physics is a charter member of the PHENIX experiment collaboration operating at the Relativistic Heavy Ion Collider (RHIC) located at Brookhaven National Laboratory (BNL). Our members continue to make important contributions to the PHENIX program, with leadership roles on the Executive and Detector Councils and as Conveners in the Physics Working Groups. For the baseline detector operation we brought online the Pad Chamber subdetector and the simulation software system. Most recently, one of us led the team that commissioned the highly successful TOF-West subdetector for Run7, making use of an innovative and cost-effective technology to extend dramatically the momentum range of particle identification in PHENIX.
    Also in Run7, members of the Vanderbilt group demonstrated for the first time in PHENIX the capability to perform, at a remote site, near real-time event reconstruction of large volumes of minimum bias raw data transferred over the Internet with GridFTP tools. This project has enabled the offline reconstruction for Run7 to proceed at a more rapid pace in advance of next year's Quark Matter meeting.
    Taken together, these efforts confirm that our group is committed to ensuring that PHENIX maintains its outstanding record of achievement in relativistic heavy ion physics. Indeed, this supplemental proposal is largely motivated by the goal of preserving the research productivity of the PHENIX physics program.
  3. Outline of Supplemental Request
    This supplemental proposal arises out of the special circumstances in place during December 2006 at the time of our three-year proposal renewal. In fact, the idea of writing a supplemental proposal in 2007 to address these special concerns was first mentioned in those discussions. At that time the Department of Energy was operating under a continuing resolution that mandated cuts in all university programs. These cuts were applied just as our responsibilities in the PHENIX program were increasing in two significant areas, as mentioned above. We took on the vital tasks of building and commissioning the TOF-West sub-detector in time for Run7, and of reconstructing large amounts of PHENIX Run7 raw data as early as possible. The TOF-West detector was urgently needed in PHENIX so that the experiment would remain competitive with, and even exceed, the new TOF capabilities of the STAR detector.
    Likewise, the recent great success of the PHENIX DAQ system in taking advantage of the increased RHIC luminosity has led to an enormous growth in the volume of raw data acquired: close to 650 TBytes in Run7, a factor of two more than in Run6. These volumes of data have placed a serious strain on the offline reconstruction system, such that it is in danger of falling behind in processing the acquired data. Hence, the PHENIX management has recognized that it should take advantage as efficiently as possible of the offers of help from remote site computing centers such as the one at Vanderbilt University.
    While we are pleased to say that we fulfilled these new roles in PHENIX with admirable results during Run7, we also realize that we cannot continue to operate in this manner without rectifying the consequences of the budget situation in December 2006. The travel budget that we were given for 2007 was cut to an amount that was inconsistent with our maintaining a presence at RHIC during Run7 for the commissioning of the TOF-West detector. Similarly, no extra capital funds were provided at all for the new task of reconstructing substantial amounts of minimum bias raw data from Run7. That work was made possible only by the generous contributions of disk space (45 TBytes), CPUs, and extensive technical help from the computer center at Vanderbilt. This free support was donated with a view to proving that Vanderbilt University was completely capable of providing this new service to PHENIX, and that future efforts along these lines could be funded by the Department of Energy in a cost-effective manner.
    In brief then, this supplemental proposal is intended 1) to restore our grant's travel budget to a level more consistent with the role we have taken on for the TOF-West detector, and 2) to provide new capital and operating funds that will enable us to carry out an expanded role in the near real-time reconstruction of PHENIX data in the next three years.
    There are other developments worth noting with respect to this supplemental proposal. Two of our graduate students, who worked on the Run7 near real-time data reconstruction project, have signed on as deputy production managers in the PHENIX offline reconstruction group. As such, one of these students will be required to be at BNL one week per month for the next year. This additional travel was not anticipated in our three-year proposal.
    Next, we have formally committed to join the CMS-HI-US collaboration, and we envision that Vanderbilt University will bid to host the collaboration's HI Computing Center. While the specifications for that HI Computing Center have not yet been finalized, we do believe that Vanderbilt will be able to make an attractive bid to CMS-HI-US, and eventually to the DOE, to be the host site. In part, that bid will rely on the experience we developed in PHENIX in 2007, and we will propose to apply that expertise to CMS-HI-US starting in late 2008. In this context, the proposed expansion of our group's role in data reconstruction for PHENIX during the first half of 2008 and afterwards would lead straightforwardly to our being able to fulfill an equivalent mission for the CMS-HI program in the years to come.
  4. Synopsis of TOF-West commissioning and Results from Run7
  5. Extra expenses incurred and amount requested to maintain group’s effort in FY2008 and FY2009
  6. Proposal for expanded use of off-site computing resources in PHENIX
  7. Description of current PHENIX computing model for analyzing the data
    The first stage in the PHENIX computing model for data analysis begins with the raw data being written by the data acquisition system to buffer disks in the 1008 counting house at RHIC. There are four of these buffer disk systems, each of which can hold about one day's worth of acquired data. During the time when these data files are disk-resident in 1008, calibration modules process them in order to extract run-dependent calibration constants for the subdetectors; these constants are used in the eventual reconstruction of the data. After four days, in order to make room for newly acquired data, the data on a buffer box are drained into the HPSS archive at the RHIC Computer Facility (RCF). The data are then not read for event reconstruction by the offline software system until after the run is over, and perhaps not until many months thereafter. The constraint is that during the run the tape bandwidth for writing new data files cannot be compromised by a significant number of read requests. Moreover, the CPUs for the offline reconstruction are typically fully occupied during the run with processing raw data or analyzing reconstructed output from prior years.
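    As a purely schematic illustration of this buffer-box policy (this is not the actual PHENIX online software; the directory names, file naming, retention window, and hsi archiving command are assumptions made for the sketch), a single drain pass over one buffer box could be written along the following lines:

        #!/usr/bin/perl
        # Schematic sketch of a buffer-box drain pass; all names here are hypothetical.
        # Files older than the retention window are archived to HPSS and then removed
        # locally to free space for newly acquired data.
        use strict;
        use warnings;

        my $buffer_dir = "/bufferbox3/run7";            # hypothetical buffer-box partition
        my $hpss_dir   = "/hpss/phenix/run7/rawdata";   # hypothetical HPSS destination
        my $retention  = 4 * 24 * 3600;                 # four-day retention, in seconds

        opendir(my $dh, $buffer_dir) or die "Cannot open $buffer_dir: $!";
        for my $file (sort grep { /\.prdf$/i } readdir($dh)) {   # assumed raw-data naming
            my $path = "$buffer_dir/$file";
            my $age  = time() - (stat($path))[9];       # seconds since last modification
            next if $age < $retention;                  # keep recent files for calibration passes

            # Archive to HPSS before freeing the buffer space (the hsi invocation is illustrative).
            my $rc = system("hsi", "put $path : $hpss_dir/$file");
            if ($rc == 0) {
                unlink $path or warn "Archived but could not remove $path: $!";
            } else {
                warn "HPSS archiving failed for $file; it will be retried on the next pass\n";
            }
        }
        closedir($dh);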
    It is now well recognized within the PHENIX collaboration that the disk and CPU resources at RCF are insufficient for the reconstruction phase to keep up with the ever-increasing volume of raw data that will be available from RHIC in the future. Therefore, more effective use must be made of the computing resources offered by off-site PHENIX institutions. As the above description of the raw data flow makes clear, the only way to reconstruct the raw data at remote facilities is to transfer the data files to the off-site computer farms while the data are still on the buffer boxes. Once the data are in the HPSS, it is too late to transfer the files to remote sites while the run is in progress.
  8. History of off-site event reconstruction in PHENIX
    Transfers of PHENIX raw data to off-site facilities began in a major way in the Spring of 2005 with transfers of the p+p data to the PHENIX Computer Center at RIKEN in Japan (CCJ). The raw data files were put into an HPSS facility at CCJ, and then re-read over the course of the next several months for event reconstruction and analysis. By October 2005 the analysis was mature enough to be shown in preliminary form at an important spin physics conference.
    There was a second, more limited set of raw data transfers in 2005, this time to a small (60 CPUs) computer farm located at the Oak Ridge National Laboratory (ORNL). This set of data corresponded to Level-2 filtered files that were rich in events containing the decay of the J/ψ particle. Moreover, the analysis of these files could be completed in near real-time, even on a small computer farm, because there were relatively few events to process. So for the first time, the managers of the PHENIX experiment and the managers of the RHIC accelerator were able to see the growth of the J/ψ peaks in the spectra reconstructed by the PHENIX offline software. In turn, this capability enabled these managers to make near real-time decisions affecting the direction of the experiment while the run was still in progress.
    The Level2 J/ reconstruction project was continued in 2006 at the much larger Vanderbilt computer[1] farm since there were insufficient resources with which to continue the work at ORNL. For the 2006 work the Vanderbilt group introduced the use of automated GridFTP software tools for the purpose of returning the reconstructed output back to BNL for inspection by the experiment managers.[2]
    For the Au+Au data from Run7 in 2007, the Level-2 project was taken up by the Ecole Polytechnique institution in PHENIX, which makes use of the PHENIX Computer Center in France (CCF). There is an HPSS in the CCF facility, and the raw data are being processed over an extended period of time, much in the same manner as was done at CCJ or at RCF itself.
  9. Run7 event reconstruction preparations at Vanderbilt
    For Run7 the Vanderbilt group requested the opportunity to reconstruct a 10% fraction of the minimum bias data in near real-time, something that had never been attempted before at RHIC. This project would involve the transfer of much larger amounts of data, tens of TBytes, and the return to BNL soon afterwards of almost comparable amounts of reconstructed output. Although there are tape archiving facilities at Vanderbilt, we did not want to rely on those for a slow, months-long reconstruction of the raw data.
    In order to get this proposal approved in PHENIX, we had to secure from the Vanderbilt computer center a loan of up to 70 TBytes of disk space located on a high-speed internal network. This disk space would be used partially as a buffer for incoming raw data files and partially as a buffer for the outgoing reconstructed files, somewhat analogous to the role of the buffer disks in the PHENIX counting house at RHIC. Furthermore, we needed to set up a dedicated GridFTP server within the computer farm's firewall to allow for secure transfers from BNL. The computer center at Vanderbilt, which also serves research in the University's Medical School, operates under somewhat stricter security protocols than do the computer systems at BNL. Lastly, we had to receive a commitment from the computer center that we would have sufficient CPU resources available in the Spring of 2007 with which to carry out the near real-time reconstruction.
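    As a simple illustration of such a transfer (the host names, paths, file name, and stream count below are placeholders, not the actual Run7 configuration), a single raw data file could be pulled from the BNL buffer boxes to the Vanderbilt input buffer with a short Perl wrapper around globus-url-copy:

        #!/usr/bin/perl
        # Illustrative pull of one raw data file from BNL to the Vanderbilt input buffer
        # over GridFTP; host names, paths, and the file name are placeholders.
        use strict;
        use warnings;

        my $source = "gsiftp://bnl-bufferbox.example.org/bufferbox/run7/rawdata_segment_0001.prdf";
        my $dest   = "file:///vandy/phenix/run7/incoming/rawdata_segment_0001.prdf";

        # -p 8 requests eight parallel TCP streams; -vb reports the instantaneous bandwidth.
        my @cmd = ("globus-url-copy", "-p", "8", "-vb", $source, $dest);
        my $rc  = system(@cmd);
        die "Transfer failed with exit code " . ($rc >> 8) . "\n" if $rc != 0;
        print "Transfer completed: $dest\n";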
    All of these preparations were in place at the beginning of April 2007 when Au+Au data from Run7 began to be acquired by PHENIX. After a few initial difficulties arising from faulty vendor-supplied disk management software on the GridFTP server, all the hardware and software systems located at Vanderbilt worked without any significant flaws to carry out this project. We received approximately 35 TBytes of raw data from RHIC, and were able to reconstruct most of these files in near real-time as predicted. In fact, our records show that we could have easily processed three times as much data with our Vanderbilt resources had we received that much data.
  10. Expertise gained in the Run7 reconstruction project at Vanderbilt
    The Run7 minimum bias event reconstruction project evolved into a wonderful proving ground for this kind of work. Since the near real-time condition placed a premium on high processing efficiency, the entire project was completely automated at Vanderbilt. This automation was achieved by writing approximately 100 separate Perl scripts operating in tandem at four separate computer facilities: the PHENIX counting house computers, the computer systems in the RCF, the large computer farm at Vanderbilt, and the smaller departmental farm owned by the local PHENIX group. These latter systems performed the monitoring of all the inter-related tasks, culminating in a real-time WWW display of the performance of all the hardware and software components involved in the project.
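    A minimal sketch of the monitoring idea is given below (the file names and page location are hypothetical, and the production scripts were considerably more elaborate): each facility deposits a one-line status report, and a summary page is rewritten on every monitoring pass:

        #!/usr/bin/perl
        # Minimal sketch of the monitoring concept: gather per-facility status files
        # and rewrite a static HTML summary page. All names and paths are hypothetical.
        use strict;
        use warnings;

        # One-line status reports deposited (e.g. via scp) by the scripts at each site.
        my %status_file = (
            "PHENIX counting house" => "/monitor/status/countinghouse.txt",
            "RCF"                   => "/monitor/status/rcf.txt",
            "Vanderbilt farm"       => "/monitor/status/vanderbilt_farm.txt",
            "Departmental farm"     => "/monitor/status/department_farm.txt",
        );

        open(my $html, ">", "/var/www/html/run7_status.html") or die "Cannot write status page: $!";
        print $html "<html><body><h1>Run7 near real-time reconstruction status</h1>\n";
        print $html "<p>Updated: " . localtime() . "</p>\n<table border=\"1\">\n";

        for my $site (sort keys %status_file) {
            my $report = "NO REPORT RECEIVED";
            if (open(my $fh, "<", $status_file{$site})) {
                my $line = <$fh>;
                close $fh;
                if (defined $line) { chomp $line; $report = $line; }
            }
            print $html "<tr><td>$site</td><td>$report</td></tr>\n";
        }
        print $html "</table></body></html>\n";
        close $html;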
    Despite the advanced level of automation that we developed for this project, several inefficiencies did occur. Principal among these was the delay in receiving the offline reconstruction software libraries with which to process the raw data files. Although the raw data started to arrive before mid-April, the first working set of reconstruction libraries did not arrive from Brookhaven until after mid-May. Fortunately, there was sufficient input buffer disk space at Vanderbilt to accommodate all the raw data files that arrived in that intervening month. The final set of libraries did not arrive until the first week of June, at which time all the files that had arrived up to that point had to be re-processed.
    This delay in obtaining working reconstruction libraries was something of a special circumstance for Run7, in which four new detector subsystems (including TOF-West) were introduced into the PHENIX detector. The software to reconstruct events using data from these subsystems had not been fully tested before the Run7 data started to arrive. So the months of April and May were largely devoted to debugging the offline reconstruction software with the assistance of the output received from the Vanderbilt site. Nonetheless, there is a clear lesson from this experience: if the PHENIX collaboration wants to take best advantage of off-site computer facilities for rapid event reconstruction, then enough time must be devoted in advance of the run to ensure that the software libraries are not defective.
    A second serious flaw in the process was the lack of a dedicated GridFTP file server at BNL with which to receive the reconstructed output from Vanderbilt. Related to this, the PHENIX file space in RCF was not well organized to store the output files once they were received. Both of these problems led to a degradation of the transfer rate back to RCF, which in turn caused the automated processing system at Vanderbilt to decrease the throughput of the reconstruction jobs. Before this project began we specified that we would need a 30 Mbytes/second transfer rate of raw data files into Vanderbilt and likewise a 20 Mbytes/second transfer rate of reconstructed output into RCF[3]. While there was no problem in attaining the input benchmark at Vanderbilt, there were repeated difficulties in getting good transfer rates for the output files into RCF. There was simply too much I/O competition on the disks at RCF when we attempted to write output files. These difficulties became so severe that they sometimes caused the GridFTP transfer software to fail despite its internal recovery mechanisms against transient outages in the file writing speed. We at Vanderbilt overcame this RCF-associated problem to some degree by writing fault-tolerant transfer scripts that keyed on the failure messages written by the GridFTP software. Avoiding this problem in future projects will require some modest repositioning of resources at RCF: setting up a PHENIX-only dedicated input GridFTP server, and reserving an appropriate allocation of PHENIX disk space at RCF to receive the output files, in the same manner that we reserved large blocks of space at Vanderbilt to receive the input files. There are other, technically more advanced solutions to this problem, which we will discuss later in this proposal.
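    The fault-tolerant wrapping mentioned above can be sketched as follows; the retry count, back-off times, error strings matched, and file names are illustrative assumptions rather than the exact logic of the Run7 scripts:

        #!/usr/bin/perl
        # Sketch of a fault-tolerant GridFTP transfer wrapper of the kind described above.
        # The retry policy, matched error strings, and names are illustrative only.
        use strict;
        use warnings;

        sub transfer_with_retry {
            my ($source, $dest, $max_attempts) = @_;
            for my $attempt (1 .. $max_attempts) {
                # Capture stdout and stderr together so failure messages can be inspected.
                my $output = `globus-url-copy -p 4 -vb '$source' '$dest' 2>&1`;
                return 1 if $? == 0;

                # Key on failure text suggesting a transient condition at the receiving end.
                if ($output =~ /connection (refused|timed out)|data_write|no space/i) {
                    warn "Attempt $attempt failed (transient); retrying after a back-off\n";
                    sleep 60 * $attempt;                 # simple linear back-off
                } else {
                    warn "Attempt $attempt failed with a non-transient error:\n$output";
                    last;
                }
            }
            return 0;
        }

        # Hypothetical reconstructed output file being returned from Vanderbilt to RCF.
        my $src = "file:///vandy/phenix/run7/outgoing/reco_segment_0001.root";
        my $dst = "gsiftp://rcf-gridftp.example.org/phenix/run7/from_vanderbilt/reco_segment_0001.root";
        transfer_with_retry($src, $dst, 5)
            or die "Giving up on $src after repeated transfer failures\n";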
  11. Plans for expanded use of ACCRE farm for PHENIX in 2008-2010
    The previous discussions demonstrate that the PHENIX collaboration is in urgent need of expanded computing resources for the reconstruction of its raw data, and that such expanded capability is being offered by a capable PHENIX institution. In fact, as of this writing (late August 2007), the full-scale offline reconstruction at RCF has processed less than 10% of the Run7 data. It should be pointed out that, because of the Run7 reconstruction project at Vanderbilt in April-June, the full-scale effort at RCF has effectively had at least a two-month head start, including the benefit of better calibrations derived from inspecting the Vanderbilt output files. Even so, hopes had been diminishing at BNL that there would be the usual number of PHENIX analyses making major impacts at the February 2008 Quark Matter international conference.[4]