Customer Solution Case Study
/ / Life Sciences Company Uses SQL Server 2005 to Enhance Full-Text Searches
Overview
Country or Region: Germany
Industry: Pharmaceuticals
Customer Profile
Based in Winden, Germany, Metalife provides tools and services to assist pharmaceutical companies and scientists explore genetic databases as they search for ways to cure and prevent disease.
Business Situation
Metalife needed to provide increased capacity for concurrent, full-text searches, and provide easier data access, data snapshots for versioning, and easy-to-use reporting.
Solution
Metalife upgraded to Microsoft® SQL Server™ 2005 (64-bit) database running on the Microsoft Windows Server™ 2003, Enterprise x64 Edition, operating system.
Benefits
n Easier data navigation
n Better support for concurrent full-text searches
n Versioning with Database Snapshot
n Easy-to-use reporting / “SQL Server 2005 has given us the performance we needed to give scientists a single doorway into the world of genetics.”
Michael Schönemann, Founder and Chief Executive Officer, Metalife
Metalife is helping researchers who are seeking new treatments and cures for a range of diseases. The company’s PhenomicDB database provides a central repository for genetic information from the Human Genome Project and similar efforts. As the success of Metalife’s database has grown, so has demand for support for concurrent, full-text searches against its database. The company needed the ability to support more concurrent users. By upgrading to Microsoft® SQL Server™ 2005 (64-bit), the company now supports 3,000 users as they perform concurrent, full-text searches. And Metalife’s lab tests show SQL Server 2005 can handle more than 10,000. Metalife is also using the Database Snapshot feature of SQL Server 2005 to enable scientists to run queries against earlier versions of the database.
Situation
Ever since James Watson and Francis Crick solved the mystery of how genetic material was carried within the double helix structure of DNA, a discovery for which they won the 1962 Nobel Prize in Physiology, there has been a worldwide race to use the findings to benefit humankind. That race has intensified in recent years through efforts like the Human Genome Project, a multinational project that, after 13 years, succeeded in providing a complete mapping of the 20,000 genes that make up the human chromosomes that make our life possible.
But mapping the genome is only the beginning of a huge scientific inquiry. Each gene is essentially a cookbook, holding instructions for how to create the proteins that are the building blocks of life. Researchers around the world are searching for ways in which proteins can be used to treat and prevent a spectrum of diseases.
Metalife AG, a German life sciences company and a Microsoft Certified Gold Partner, is dedicated to solving a major problem scientists face—how to deal with the wealth of genetic data that has been generated in the decades since Watson and Crick’s discovery. Genetic researchers are interested in genotype (what the gene is made of) and phenotype (what the gene does) databases.
As a service to the scientific community, Metalife created its Web-based PhenomicDB multispecies relational database (www.phenomicdb.de ). PhenomicDB provides a single point of access for a number of genotype and phenotype databases. PhenomicDB compiles this wealth of data into a single, integrated resource by mapping of the phenotypic data fields and other methodologies. To allow for direct comparison of phenotypes of orthologous genes, from yeast to humans, these mapped data are uploaded together with a gene index-like database. Examples include the NCBI Gene database (Pruitt et al. 2001) and the associated orthology data HomoloGene database (Wheeler et al. 2004). The result is that PhenomicDB helps scientists to browse and compare known phenotypes for a given gene or a set of genes from different organisms.
Metalife provides a much larger database for its commercial customers. Called the Metalife Knowledgebase, this 1.4 terabyte database includes the 20 gigabytes (GB) PhenomicDB, in addition to a vast collection of genetic information (including articles from scientific journals from around the world), and data from the European Bioinformatics Institute (EBI), the UniProt protein sequence database, and a wealth of other sources.
Metalife has created a set of applications, the Metalife 2005 Suite, designed to help pharmaceutical companies and other organizations reduce the cost of research by giving them the ability to better predict outcomes through use of in silico (computer) rather than in vitro (live) experimentation. The Metalife 2005 Suite works in conjunction with the Metalife Knowledgebase.
As Metalife’s PhenomicDB database and the Metalife Knowledgebase grew in popularity, Metalife saw the need to enhance performance and features, including:
n Providing easier access to the data.
n Increasing capacity for concurrent full-text searches, especially on the public PhenomicDB.
n Providing data snapshots for versioning.
n Giving users an easy-to-use but powerful reporting tool.
Solution
Metalife upgraded its PhenomicDB database and its Metalife Knowledgebase to Microsoft® SQL Server™ 2005 (64-bit) from SQL Server 2000 Enterprise Edition (64-bit). The databases, deployed as a single instance of SQL Server, continue to run on the Microsoft Windows Server™ 2003, Enterprise x64 Edition, hosted on an 8-way server with Itanium 2 processors and 32 GB of RAM, and direct-attached storage. Windows Server 2003 and SQL Server 2005 are part of Microsoft Windows Server System™ integrated server software.
The company is using the full-text search feature of SQL Server 2005 to replace a custom indexing application Metalife had created after finding it needed even better performance than it had seen from the Full-Text Search feature of SQL Server 2000.
The Metalife Knowledgebase is supplemented by a suite of client applications that Metalife created by using the Microsoft Visual Studio® .NET 2003 development system and the Microsoft .NET Framework. The applications include:
n Metalife Predictor. Predictor is an application for in silico prediction of functional characteristics of biological objects and automatic annotation of unannotated sequences. Predictor is a combination of approved, state-of-the-art algorithms optimized for highest accuracy of produced knowledge. Some of the algorithms have been already implemented by Metalife by using Microsoft SQL Server 2005 common language runtime (CLR) stored procedures. Enhancements were made for distributing the algorithms on multiple machines.
n Metalife Analyzer. Analyzer technology dramatically enhances interpretation and understanding of complex structures underlying large sets of complex data. The application combines powerful algorithms to improve knowledge extraction by automatic classification of data and abstraction of common concepts, document clustering, and content similarity search. The Metalife analyzer module includes also a complete thesaurus management system with worldwide-and user-defined scientific ontologies.
n Metalife Explorer. Explorer applications automatically compile complete dossiers on defined objects—including proteins, genes, and low molecular weight compounds. Compiled dossiers of defined objects in Metalife Explorer contain information collected from all integrated, publicly available data sources. Dossiers include information extracted from scientific publications by Metalife’s text extraction procedure or generated by highly sophisticated machine-learning algorithms.
n Metalife Validator. Validator helps eliminate false positives by combining the strength of Metalife Predictor algorithms with background information from life science literature in the Metalife Knowledgebase. This validation results in predictions that are more valuable and verifiable.
n Metalife Array Analyzer. Array Analyzer provides a computational and graphical environment for storing, preprocessing, analyzing, clustering, and saving results from DNA microarray experiments. The program can organize and analyze the data in a number of different ways dedicated to helping researchers design and analyze expression profiling experiments to generate high-quality, statistically sound, and biologically interesting results.
Metalife deploys the Metalife Knowledgebase as a dedicated hosted server for each customer (typically global pharmaceutical companies). A copy of the Metalife Knowledgebase mirrors to a Dell PowerEdge 6600 4-way server with AMD 64-bit processors. Each customer, by hosting its own dedicated server, has the security of knowing that it is the only company that uses the database, and that all of its queries are confidential. The hosted solution also means that a pharmaceutical doesn’t have to open a firewall port to receive daily updates. Metalife uses the merge replication feature of SQL Server 2005 to push updates to its dedicated servers. Metalife also supports some of its customers by providing on-site databases that can be updated across a virtual private network.
Extract, transform, and load (ETL) procedures are handled by a custom application Metalife created by using Visual Studio .NET. The company uses the Database Snapshot feature of SQL Server 2005 for versioning, and SQL Server 2005 Reporting Services for generating reports.
Benefits
Since Metalife upgraded to the 64-bit version of SQL Server 2005, the company has realized a number of benefits, including easier data navigation, better support for concurrent, full-text searches, versioning with Database Snapshot, and easy-to-use reporting.
Easier Data Navigation
The 64-bit version of SQL Server 2005 helps Dr. Michael Schönemann, founder and Chief Executive Officer of Metalife, realize his goal of providing ease of access to a world of genetic data so that scientists can pursue their passions without having to become computer programmers and database administrators. “SQL Server 2005 has given us the performance we needed to give scientists a single doorway into the world of genetics,” Schönemann says. “We’ve brought together the essential, top 100 databases that, together, provide nearly a complete record of what is known about genotypes and phenotypes across several species. And using Visual Studio .NET and the .NET Framework, we’ve given our users powerful tools for exploring this genetic realm.”
Unifying databases and providing easy-to-use searching tools makes life easier for researchers. “If someone is looking for a protein interaction to help treat a disease, they should not have to be an expert on what databases they need to search, what algorithm they need to use, nor should they have to spend time away from science to learn how to write searching scripts,” says Schönemann. “The easier we can make it for researchers to explore, the greater are their chances for discovering applicable values from the wealth of genetic information that is being generated around the globe. Our client applications make it as easy to explore protein interactions as working with any Microsoft Office application. You can just ‘drag-and-drop’ to create your queries.”
The Metalife applications also support new ways of looking at data, including rich visual models, which are especially helpful in dealing with proteins where the folding shapes of the proteins can play key roles in their interactions with proteins of other biological entities. “We want to display data in ways that can speed the scientific process,” says Schönemann. “We want to help researchers more quickly identify the sequences and interactions they need.”
The images generated by the Metalife applications can be deeply explored because they are drawn from the SQL Server 2005 database. Click—selecting a protein or the line joining two proteins or a range of other points—and you can drill down into the journal articles and other resources that describe the research behind the relationships, and follow any supporting citations.
The Metalife 2005 Suite of tools, running against the 1.4 terabyte SQL Server 2005 database, could also save time and money for pharmaceutical companies as they research new treatments. “A pharmaceutical company can spend U.S.$800 million, and more than a decade attempting to bring a new drug to market,” says Schönemann. “We believe we can help researchers substantially reduce the time and money involved by giving them the chance to explore potential protein interactions and other factors within the complete biological pathway in silico—within our SQL Server 2005 database—rather than in vitro in the laboratory. If we did nothing more than help researchers identify dead ends more quickly, it would be a substantial contribution to science.”
Better Support for Concurrent, Full-Text Searches
With the growth in popularity of Metalife’s PhenomicDB came increasing performance demands. Much of the information within the database is in text documents, placing more pressure on search performance. “With more concurrent users performing full-text searches against the database, we found performance slowed noticeably,” says Nadia Pavlova, Director of Business and Software Development, at Metalife. “SQL Server 2005 has increased our ability to support concurrent, full-text searches by orders of magnitude. We are now supporting between 2,000 and 3,000 concurrent users performing full-text searches. And in our laboratory stress testing, we find we can support more than 10,000 concurrent full-text searches.”
In addition to being robust in its support of concurrent users, Metalife has found SQL Server 2005 full-text searches to be fast. “SQL Server 2005 is at least as fast as our custom code was,” says Pavlova. “We created a custom algorithm to return all full-text searches against our 1.4 terabyte database in less than a second, and that is what we’ve seen with SQL Server 2005.”
Before the company deployed SQL Server 2005, Metalife had to create its own custom indexing application and external indexes. “We stopped using our custom solution as soon as we saw the enhanced, full-text search performance of SQL Server 2005,” says Schönemann. “We were also happy to see that SQL Server 2005 backs up the full-text indexes at the same time it backs up the database. It is a relief to no longer have to support the external indexes and custom indexing application that we once had to use.”
Versioning with Database Snapshot
The company uses the merge replication feature of SQL Server 2005 to automatically update its mirrored servers that support Metalife Knowledgebase customers. The company also uses merge replication to synchronize the logs and the history between the different mirrors of the database.
Before doing a merge replication, Metalife uses the Database Snapshot feature, new to SQL Server 2005, to preserve an image of the database. The Database Snapshot feature gives Metalife the ability to generate and use a read-only, stable view of a database without encountering the overhead or storage requirements of creating a complete copy of the database.