Monte Carlo dose calculation algorithm on a distributed system

Stéphane Chauviea,b^, Matteo Dominonic, Piergiorgio Marinia, Michele Stasia, Maria Grazia Piab, Giuseppe Scielzoa

a Medical Physics Division, Ordine Mauriziano-IRCC, I-10128 L.go Turati, 62 Torino, Italy

b INFN, Genoa and Turin, Italy

c DISCO, University of Milan-Bicocca, Milan, Italy

The main goal of modern radiotherapy, such as 3D conformal radiotherapy and intensity-modulated radiotherapy is to deliver a high dose to the target volume sparing the surrounding healthy tissue. The accuracy of dose calculation in a treatment planning system is therefore a critical issue. Among many algorithms developed over the last years, those based on Monte Carlo proven to be very promising in terms of accuracy. The most severe obstacle in application to clinical practice is the high time necessary for calculations. We have studied a high performance network of Personal Computer as a realistic alternative to a high-costs dedicated parallel hardware to be used routinely as instruments of evaluation of treatment plans.

We set-up a Beowulf Cluster, configured with 4 nodes connected with low-cost network and installed MC code Geant4 to describe our irradiation facility. The MC, once parallelised, was run on the Beowulf Cluster. The first run of the full simulation showed that the time required for calculation decreased linearly increasing the number of distributed processes. The good scalability trend allows both statistically significant accuracy and good time performances.

The scalability of the Beowulf Cluster system offers a new instrument for dose calculation that could be applied in clinical practice. These would be a good support particularly in high challenging prescription that needs good calculation accuracy in zones of high dose gradient and great dishomogeneities.

1

1. INTRODUCTION

The new radiological strategy in diagnosing and treating tumours requires a novel approach based on an advanced and integrated technological environment. Besides, the opportunity of using analytical and numerical methods to merge information from different radiological examinations is everyday more useful to define and localise targets [1]. This framework offers the possibility of treating these volumes with high-energy X-ray beams from linear accelerators using advanced radiotherapy techniques such as 3-Dimensional Conformal Radiotherapy (3D-CRT) and Intensity Modulated Radiotherapy (IMRT) [2-7]. Such an improvement in the target volume and organ at risk delineation ought to be supported by a major precision in describing the physical interactions in the human body. The dose distribution inside patient is calculated, in clinical practice, by Treatment Planning System (TPS). They use, as input, the dosimetric characteristic, obtained experimentally, of the radiation machine such as dose profiles and percent depth dose (PDD) curves. The patient anatomy is described in terms of electronic density through the Computed Tomography (CT) data set.

Through different algorithms [8-11] it is possible to calculate analytically the dose inside the irradiated regions. The oversimplification introduced in these methods permits to have a fast dose engine but locally inaccurate [12-14]. The accuracy of dose calculation algorithms is now the main feature of a TPS. MC algorithms [15-17] describe with the same accuracy the beam transport in regions with and without electronic equilibrium: while conventional algorithms work really well only when an electronic equilibrium exists. MC is so relevant in calculating the dose at the interface between different media such as between bone and tissue, tissue and air.

Geant4 [18,19] is a simulation toolkit designed for a variety of applications, including HEP, astrophysics and nuclear physics experiments, space science and medical physics. The Geant4 kernel handles the management of runs, events and tracks. Some characteristic of Geant4 is relevant to be used in a medical domain. A fast parameterisation framework is integrated with the full simulation, allowing independent and simplified geometrical description of the set-up and direct production of hits. Thanks to the power and flexibility of Geant4 geometry, beam sources and patients can be described in the same framework.

Low-energy Electromagnetic [20] is a Geant4 package specialised in low energy electromagnetic interaction of lepton, photon and muon, as well as hadrons and ions. Low energy processes assure good precision down to 250 eV for photons and electrons [21] and 1 keV for hadrons [22] and ions [23]. Geant4 offers a user support team, a quality assurance program based on a rigorous software process and use of standards; in addition a worldwide community of users validates it independently. A point of force is the transparency of the physics implementation that comes both from public domain manuals and evaluated data libraries without nasty hidden formulas. Moreover thanks to his OOT design is extensible and of easy interface to other tools for visualisation and analysis.

To be applied in clinical practice the time needed for calculations with MC methods is the critical point. Until recently the MC calculations on a good workstation required hours or even a day to provide statistically significant results. In the last 15 years the computer technology enhanced the speed of calculation by nearly a factor of 100. This computational power escalation has offered a solution to problems otherwise unaffordable. It is now possible to solve complex calculation on a desktop Personal Computer (PC) whereas in the past powerful mainframe or vector supercomputers were needed.

Moreover since MCs are intrinsically parallel their natural implementation is on parallel machines. The first idea of using an array of normal PCs, as an alternative to supercomputers, was born in the 90s at Goddard Space Flight Centre [24]. The idea was simple, and, with modern tools affordable. High performance networks of Personal Computer (namely Beowulf clusters) are now a realistic alternative with respect to high-costs dedicated parallel hardware.

In next sections we will describe the experience on MC calculations on a prototype of Beowulf cluster that offers parallel processing at a lower cost showing competitive performances. In particular we’ll focus our attention on time performances and cost of the system describing with some detail the overall process of construction so as to make this tool a routine system in a Medical Physics Division of a Hospital.

2. MATERIALS AND METHODS

In March 2002 we set-up a prototype of Beowulf cluster constituted of few nodes. We will go shortly trough the HW installation, SW configuration and the benchmarking of cluster parallelisation. Then we’ll describe MC simulation set-up. Finally we’ll go through the MC parallelisation and run onto the cluster offering, at the end, some preliminary points for discussion.

Work-station / Master / Slaves
CPU / Pentium III 450 / AMD Athlon XP 1500+ / AMD
Athlon XP 1500+
Memory / 256 Mb DIMM / 512 Mb DIMM / 512 Mb DIMM
Hard Disk / Present / Present / Present
Video Card / Present / Present
Ethernet Card / Present / Present / Present
CD-ROM / Present / Present
Floppy Disk / Present / Present / Present
Monitor, key boards, mouse / Present / Present

Table 1: Hardware configuration for the benchmarking machine. The cluster is made of 1 master and 3 slaves.

2.1 Hardware set-up

Each off the shelf PC is a good compromise between computer power and cost (about 1 k€). The requisite to be a slave is: a CPU, memory and fast Ethernet card. The master is more expensive since is equipped to be the interface of the cluster with the outside world and hence is the only one with monitor, double Ethernet card, key-board, floppy, CD-ROM and so on. In Tab. 1 it is possible to see the HW configuration of the 3 kinds of PC’s we used in this work. The PCs have been networked on a private LAN trough a switching hub and the master is also connected to the hospital LAN in order to exchange data and images with TPSs, CTs, Magnetic Resonances (MR) etc…

We installed Linux Red Hat 7.2 [25] with the xfs journaled file system [26]. Linux owns a collection of libraries for parallelisation and multiprocessor environment. We use the MPI/LAM parallel libraries [27] that enable to run a daemon of parallelisation on the cluster using secure shell ssh connections [28]. Message Passing Interface (MPI) is a complete interface for message-based interprocess communication. Local Area Multicomputer (LAM) is an MPI programming environment and development system for heterogeneous computers on a network. With LAM, a dedicated cluster or an existing network computing infrastructure can act as one parallel computer solving one problem. The cluster configuration was refined reorganising authorisation protocols on the nodes and sharing partition on networked file system (NFS).

After the HW set-up we were ready to start to run simple C++ programs on the cluster to both verify its correct behaviour and prove the scalability. In the Fig. 1 we can see the execution time versus the number of nodes. If we recall Amdhal's law we could calculate the speed-up Sup:

Figure 1 Execution time for a benchmarking C++ program with an increasing number of nodes of the cluster.

where T1 is the task execution time for a single processor, Tm is the execution time for the multiprocessing portion and T0 the execution time of the sequential and communication portions (overheads). This means that we have a gain of about a factor 4 in calculation time using the cluster. Normalising to the number of processors we obtain the efficiency E=Sup/N @ 0.997 proving that the architecture offers a good scalability trend.

We should now endeavour whether the same scalability will be achievable in case of a more complex MC calculation

2.2 Simulation

As regards geometry we carried out the simulation of a linear accelerator of electrons 600CD (Varian, Palo Alto, California) with a 6 MV potential applied. This machine is routinely used in our centre in particular for complex radio-treatments such as IMRT and Total Body Irradiation (TBI). The accelerator is equipped with a Millennium Multi-Leaf Collimator (MLC) (Varian, Palo Alto, California). The MLC is composed of 120 thin round-ended leaves of different size. The 40 central leaves are small, 0.5 cm at the isocentre, and they are able to produce a 20x20 cm2 field; the remaining leaves have a width of 1 cm, except for the 4 external ones, which are 1.4 cm wide. The scheme of the accelerators as it is described in the MC is given in Fig.2. The electrons at about 6 MeV impinge on a metallic target were bremmstrahlung photons arise, interacting with the flattening filter surrounded by the primary collimator. Below the monitor chambers, which are used for on-line monitoring of the beam, are mounted the jaws that could move to define the fields’ size during the run. The last device is the MLC collimator in which every single leaf could be moved and positioned accordingly to the MLC files defined for the machine.

The dose calculation could be done through software interfaces on different media such as homogeneous and anthropomorphic phantoms or real patients. The CT data sets are used as input of geometrical description of phantom and patient using Hounsfield number to recover electron density and hence reconstruct their materials composition.

Figure 2. Accelerator’s head. Components are described in the text.

As regards physics, in the Geant4 simulation we model electrons, positron and photons with a flavour of processes including both elastic and inelastic: multiple scattering, bremmstrahlung, ionisation, photoelectric, Compton and Rayleigh effects etc... We would highlight that Geant4 has no tracking cut but only production threshold and particle are followed down to zero range, unless otherwise specified by user: so it is not necessary to tune any parameter.

Special care has been devoted to software engineering allowing a fast and easy addition of different tools for data analysis, graphical user interface (GUI) and input data management. Moreover a general GUI has been created to manage the overall simulation: the compilation, make and link of the C++ code, the run in the parallel environment and the analysis of the data.

3. RESULTS AND DISCUSSION

Once set-up the simulation, to prove the correctness of the implementation, we reproduced PDD and dose profiles for different field at different SSD in water phantom. In Fig. 3 we could see the PDD curve for a 40X40 cm2 field at Source Surface Distance (SSD) of 100 cm. We obtained a general agreement of the order of 1% between exp and MC in the exponential tail while a 2-4% error was found in the build-up region, where the experimental measurements become less accurate.

Figure 3 Simulation versus experimental PDD curve for a 40X40 cm2 field size in water at SSD=100cm

In fig. 4 is shown the dose profile for a 10X10 cm2 field at SSD=100 cm. We have examined different field sizes and set-ups to compare MC and experimental measurements.

Therefore we parallelised our simulation using LAM/MPI parallel libraries. In Fig. 5 we show the number of events per seconds for the full simulation in several condition for the HW described in Tab 1. The comparison of the first 2 points gives indication on how computation power has developed in about 1 year. Then we could see how the number of events per second increases if we use 2, 3 or 4 parallel nodes. The efficiency E=0.987 proves the scalability of the system.

Figure 4 Simulation versus experimental normalised dose profile curve for a 10X10 cm2 field size in water at SSD=100cm

Figure 5 Number of events processed per second for the full simulation.

With this work we were therefore aiming at evaluating different advanced dose calculations engines against both experimental measurements and numerical simulation with a Monte Carlo algorithm. While experimental measurements provided an estimate of the accuracy of dose calculation in homogeneous and inhomogeneous phantoms, Monte Carlo proved to be a valid alternative in dose calculations where the ordinary algorithms fail. Monte Carlo calculations performed on patient CT data, would allow an evaluation of the error associated with the commercial algorithms for any patient of interest.

Due to the high statistical precision necessary, to the complexity of the geometry and to the physics involved, a lot of time is required for the simulation. This time is unacceptably high if compared to traditional algorithms. To lower the calculation time, the Monte Carlo has been parallelised and the simulation carried out on high performance computers, presently obtained at reasonable cost with a cluster of workstation. We hence proved the scalability of the parallel implementation. More intensive run on complex treatments and patient anatomy are in progress to understand whether it would be possible to create a tool to be used in clinical practice instead of less precise and more expensive analytical algorithms.