Microsoft HPC Server
Customer Solution Case Study
Windows Supercomputer Speeds Quest to Identify Cancer Proteins
Overview
Country or Region: Melbourne, Australia
Industry: Medical research
Customer Profile
The Melbourne-Parkville branch of the Ludwig Institute for Cancer Research Ltd conducts clinical-related laboratory research. It works closely with the Royal Melbourne Hospital.
Business Situation
The branch needs to analyse patient protein samples very rapidly. However, genetic analysis demands vast amounts of computing power, and the Institute’s limited budget rules out expensive IT purchases.
Solution
Having experimented with Linux, the Institute’s research unit switched back to an all-Windows platform, deploying Microsoft HPC Server 2008 R2.
Benefits
·  Sharp reduction in total IT spend
·  Better, faster analysis
·  Expanded scope of medical research

“Switching from Linux back to a Windows parallel processing platform saved us some major expenditure – and as a funded body, our budget is always, always, under pressure.”
Eugene Kapp, Joint Proteomics Informatics Manager, Ludwig Institute for Cancer Research
The Ludwig Institute for Cancer Research Ltd is a global not-for-profit organisation that aims to improve cancer treatment by tightly integrating laboratory and clinical research. In Australia, the Institute works closely with the founding members of the Victorian Comprehensive Cancer Centre at the Royal Melbourne Hospital. One of the Institute’s analysis units – Proteomics – uses very high-powered computers to identify mutant proteins present in samples taken from cancer patients. In the face of expanding demand and limited resources, Proteomics chose to run part of its analysis system on an open source platform. However, it found that the cost of the additional technical support Linux required dramatically outweighed the licensing savings. In 2010, the unit switched back to an all-Windows environment, deploying Microsoft HPC Server 2008 R2. Proteomics now services more complex requests within a lower overall budget, expanding the scope of the research it conducts for the Institute.

Situation

The Ludwig Institute for Cancer Research Ltd is an international not-for-profit organisation that seeks to improve cancer treatment by closely integrating advanced laboratory research with clinical case management. It currently has ten branches in North America, South America, Europe and Australia.

The Institute’s two Melbourne branches employ approximately 140 research staff. The Melbourne-Parkville branch is co-located with the Royal Melbourne Hospital and specialises in laboratory research into colorectal cancer and how that research can be applied in a clinical environment (translational research). The Melbourne-Austin Branch primarily undertakes clinical research and associated translational research, but also conducts laboratory research that supports the Institute's clinical trials.

Proteomics is a small department within the Melbourne-Parkville branch and is jointly funded by the Ludwig Institute and the Walter and Eliza Hall Institute of Medical Research. Employing only five staff, it provides a critical service to the Institutes by analysing samples, including tissue samples from patients.

Staff use mass spectrometers to isolate individual proteins and peptides within the sample. Then, to help identify abnormalities in these proteins and peptides, they try to match the individual spectrometer outputs – called mass spectra – with DNA databases. Once they have identified the predicted genetic sequence of a protein or peptide in a cancer sample, they pass the information back to research or clinical staff at the Institute.

“The capability to analyse samples quickly is central to our ‘bench-to-bedside’ research philosophy,” says Eugene Kapp, Joint Proteomics Informatics Manager, Ludwig Institute for Cancer Research Ltd. “Our core objective is practical – to improve patient outcomes.”

However, Proteomics faces a never-ending challenge. Running the proteomics algorithms that match DNA sequences can consume vast amounts of computing power, straining the unit’s lean budget. All of its technology costs must be funded from a total budget of less than A$1 million, which also covers staff salaries.

“Every year, the spectrometers generate more data, faster and faster, which we have to analyse,” says Kapp. “A single mass spectrum interrogation, or query, takes 3 to 4 seconds. Four years ago, we were being asked to analyse 1,000 mass spectra in one hour; now mass spectrometers are producing 30,000 to 40,000 spectra per hour. We are struggling to keep up with the avalanche of data.”
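
Those figures alone illustrate the scale of the problem. The short sketch below is a back-of-envelope calculation using only the numbers Kapp quotes – illustrative values, not measurements from the Institute’s systems – and shows why a single node can no longer keep pace:

    # Back-of-envelope throughput gap, using only the figures quoted above.
    # Illustrative values, not measurements from the Institute's systems.
    QUERY_SECONDS = 3.5          # one spectrum interrogation: 3 to 4 seconds
    SPECTRA_PER_HOUR = 35_000    # instrument output: 30,000 to 40,000 spectra per hour

    sequential_hours = SPECTRA_PER_HOUR * QUERY_SECONDS / 3600
    print(f"One hour of instrument output needs about {sequential_hours:.0f} hours "
          f"of single-node compute time.")
    # Roughly 34 hours of sequential processing per hour of data - work that
    # has to be spread across many server nodes to stay anywhere near real time.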

Proteomics’ challenge was also becoming more acute. The algorithms used to identify the proteins were becoming more complex because the Institute wanted greater detail – for example, the difference between a protein’s characteristics before and after treatment. At the same time, the DNA databases against which the spectra were matched were expanding daily, as more genomes were mapped and published.

“The upshot of all this is that the job of identifying – accurately – the proteins in any sample fast enough to make the technique applicable to patient care was actually becoming harder,” says Kapp.

Solution

To meet accelerating processing demand, Proteomics tried several network solutions. In 2005, the unit partnered with Dell, and DNA interrogations were executed on PowerEdge Blade servers running Windows Server, with a commercial algorithm called Mascot that directed individual calculations to particular server nodes.

“Our typical costs were about A$1,000 to license the operational software for each server node,” says Kapp. “But at the same time as processing demand increased, our budgets were being squeezed, so we knew we couldn’t keep on paying for more server nodes.”

At the beginning of 2009, Proteomics decided on a more radical solution.

“We moved half our operations – comprising about eight server nodes – onto the Linux network, and started running database interrogations using a separate algorithm,” says Kapp.

While this reduced server licensing costs, the Linux network required a significant increase in technical support.

“I observed that the Linux network required constant tweaking, and there was a lot of fiddling around with command lines,” says Kapp. “In the end we needed one full-time IT person just to keep that network operational, and even then we intermittently suffered downtime, which held up getting results.”

At the beginning of 2010, Proteomics investigated a supercomputing option. Microsoft HPC Server 2008 R2 provides parallel cluster computing across multiple Windows Server 2008 R2 nodes, which meant that Proteomics could get far more computing power from its servers.

“It’s the same principle as a Formula One wheel change,” explains Boris Manitius, HPC Solution Specialist, Microsoft ANZ. “If you had one person lifting the car, unscrewing the bolts, replacing the wheel, refuelling and wiping the visor, the job would take around 10 minutes. If you have a whole team, arranged to maximise efficiency, it takes 10 seconds.

“With HPC Server 2008 R2, you can run Windows-based programs in parallel, similar to the Formula One example. Depending on the amount of code running in parallel, you can run programs up to a hundred times faster, bringing calculation times down from hours to minutes or even seconds.”
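
Manitius’s qualification – “depending on the amount of code running in parallel” – reflects a general rule of parallel computing known as Amdahl’s law: the fraction of a program that must run serially caps the achievable speedup, no matter how many nodes are added. The snippet below is a generic illustration of that rule, not code from the Institute’s cluster:

    # Amdahl's law: overall speedup when a fraction p of a program can run
    # in parallel across n workers. Generic illustration only.
    def speedup(p: float, n: int) -> float:
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.90, 0.99):
        print(f"parallel fraction {p:.0%}: "
              f"16 nodes -> {speedup(p, 16):.1f}x, "
              f"256 nodes -> {speedup(p, 256):.1f}x")
    # Only a workload that is almost entirely parallel (p close to 1)
    # approaches the "up to a hundred times faster" end of the range.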

At the same time, Microsoft adjusted its licensing policy. Previously, research institutions in Australia had to teach or issue doctorates to qualify for an academic licence. Now, Microsoft allows members of the Association of Australian Medical Research Institutes (AAMRI) to purchase licences under charity licensing conditions. In June 2010, Proteomics took advantage of this and switched back to an all-Windows environment, using HPC Server 2008 R2 to manage database interrogation.

Benefits

By reverting to Windows, Proteomics incurred nominal extra licensing fees but slashed its support, scripting and porting costs. By eliminating the need for full-time technical support, Proteomics freed up more of its budget to invest in additional computational power. This will help it keep pace with the growing complexity of the Institute’s analytics requirements and, ultimately, contribute to better cancer research and clinical outcomes.

Reduced overall costs

The Microsoft HPC Server 2008 R2 supercomputing environment dramatically cut Proteomics’ technology costs.

“The all-Microsoft cluster is operational 24 hours a day, and we don’t have to touch it,” says Kapp. “So far it has never held up research requests.

“As a direct result, we avoided having to pay an additional A$90,000–100,000 per year for extra technical support. Switching from Linux back to a Windows parallel processing platform saved us some major expenditure – and as a funded body, our budget is always, always, under pressure.”

Better, faster analysis

The core benefit to Proteomics is an increase in raw processing power, and this translates directly into better research. Because the Institute knows Proteomics can power through more interrogations, it ramps up the complexity of its requests.

“Our objective is simply to get more grunt, so that we can do more research at the same time,” says Kapp. “But now we have more grunt, researchers want more questions answered.

“For example, we are now being asked to find mutations. This is extremely complex, and the only answer is to add more server nodes. The algorithms on the HPC Server can take advantage of those extra nodes, and now we are able to answer these questions.”

Expanded scope for medical research

The ability to expand the Institute’s analytics capabilities has long-term implications.

“Cancer research is one of the areas in which Australia can excel,” says Kapp. “Even with limited resources, we can continue to turn out first-class research. We are now getting answers back that are as good as or better than those from other analytics departments out there.

“In the longer term, the all-Windows environment would theoretically make it easier for us to start using the cloud. Genetic science is developing so fast, and ultimately, we will always be asked to do more. Cloud computing is a highly efficient way of accessing the pure computing power we need.”

Microsoft Server Product Portfolio

For more information about the Microsoft server product portfolio, go to:

www.microsoft.com/servers/default.mspx