Proteomics to identify novel biomarkers and therapeutic targets in cardiovascular disease
Markus Kubicek, Silvia M. Sanz-González, Francisco Verdeguer and Vicente Andrés*
Laboratory of Vascular Biology, Department of Molecular and Cellular Pathology and Therapy, Instituto de Biomedicina de Valencia-CSIC, 46010 Valencia, Spain
* Correspondence:
Vicente Andrés
Instituto de Biomedicina de Valencia
Consejo Superior de Investigaciones Científicas
C/Jaime Roig 11, 46010 Valencia (Spain)
Tel: +34-96-3391752
FAX: +34-96-3690800
e-mail:
KEY WORDS: Proteomics, Protein modification, Two-dimensional electrophoresis, Mass spectrometry, Cardiovascular disease.
Abstract
The proteome is defined as the entirety of all proteins expressed within a cell at a given moment. In contrast to the stability of the genome, the proteome is highly dynamic and reflects the cell’s current status. Since proteins carry out almost all biological functions, the proteome stands in direct relation to cellular functions. Proteomic analysis (i.e., two-dimensional electrophoresis, mass spectrometry and bioinformatics) aims at identifying changes in the composition of the proteome associated with pathophysiologic events that affect basic cellular functions. Functional proteomics extends to understanding the connection between proteomic changes and the state of a cell, taking into account that the observed modifications can be either cause or consequence of the pathological state. Proteomics will not only improve our basic understanding of the factors and molecular mechanisms underlying cardiovascular disease, but will also help identify novel diagnostic markers and fuel the rational design and discovery of new drugs for medical intervention. This review discusses basic proteomic approaches relevant to cardiovascular disease, as well as their applications for the identification of biomarkers and drug design.
Outline
1. Potential of proteomic studies in disease research
2. Methodological aspects of proteomic analysis
2.1. Two-dimensional separation of proteins
2.2. Protein visualization and image analysis
2.3. Protein identification by mass spectrometry (MS)
2.4. Recent developments in proteomic technology
3. Applications of proteomics to the pathobiology of the cardiovascular system
3.1. Cardiovascular proteomic databases
3.2. Proteomic studies in cultured cells
3.3. Proteomic studies in animal models of cardiovascular disease
3.3.1. Non transgenic models
3.3.2. Transgenic mouse models
3.4. Proteomic studies relevant to human cardiovascular disease
3.4.1. Proteomic analysis of human arterial tissue
3.4.2. Proteomic analysis of human cardiac tissue
4. Concluding remarks
5. Acknowledgements
6. References
1. Potential of proteomic studies in disease research
The genetic information stored in a cell’s nucleus (the genome) is undoubtedly the underlying factor determining cellular phenotype. Nevertheless, despite the impact of genomics on molecular biology and medicine, the genome should not be seen as more than a rough outline of a cell’s body plan. Although a certain percentage of diseases have been linked to genetic changes, the majority of pathological disorders are associated with protein alterations. Since proteins not only carry out almost all bio-enzymatic functions, but also respond to and integrate extra- and intracellular changes, it comes as no surprise that proteins serve as the main drug targets and biological disease markers. Thus, understanding proteomic changes in pathological situations will help to decipher the molecular basis of disease and to elucidate the relevant protein components, pathways and regulatory mechanisms.
In terms of genome size, humans are not much superior to nematodes. Thus, to understand what makes up our complexity, we have to look beyond the genome alone. DNA is a rather rigid molecule that serves as a template for the expression of mRNA and proteins. Ultimately, it is mainly the entirety of expressed proteins (the proteome) that accounts for the functional flexibility and complexity of the human body. Different mechanisms contribute to phenotypic diversity (Fig. 1). First, transcription of a single gene can give rise to different messenger RNAs (mRNAs) through utilization of alternative transcription initiation sites and alternative splicing [1]. Specific mRNAs display highly different half-lives depending in part on their sequence determinants, but also on the state of the cell. Still, the correlation between mRNA abundance and the abundance of the corresponding protein has been found to be quite low [2]. This discrepancy may arise mainly from protein degradation, which is highly specific and subject to cellular regulation. Additional complexity arises because protein function depends on correct folding (secondary and tertiary structure), on interaction with other biomolecules (quaternary structure), and on reversible post-translational modifications [3]. To date, more than 200 different protein modifications have been described [4], amongst them phosphorylation, methylation, glycosylation, prenylation, and sulfation. In addition, subcellular localization and compartmentalization greatly affect protein function. For example, a number of proteins exert distinct functions in different subcellular compartments. Finally, most proteins operate as parts of large protein complexes or networks. These networks are themselves highly interconnected and are capable of sensing and reacting to intra- and extracellular changes.
Although genomic analysis provides important information on susceptibility to a given disease, it is equally important to elucidate the proteomic changes induced by environmental factors, which determine disease severity and prognosis (for example in cardiovascular disease or cancer). Disease proteomics will therefore allow us to define the pathological state of a cell or a tissue. From this information we will be able to derive the molecular mechanisms of the disorder, putative targets for medical intervention and diagnostic markers.
Proteomics encompasses a growing set of techniques aimed at functionally describing the whole proteome of a cell or a tissue at a given moment under specific conditions. Protein components of pathological relevance may be identified by comparing the proteomes of patients and unaffected individuals. The relevant questions we have to answer in order to ascertain the pathological relevance of a candidate protein include the following: i) when, where and to what level is the protein expressed during disease development, and what types of modifications is it subjected to; ii) what is the function of the protein and how is that function modulated during disease progression; iii) is the candidate protein involved in protein-protein interactions, and what can we learn from these interactions about the molecular mechanism of the pathologic process; iv) can functional information be derived from the structure of the protein and how is this linked to the pathology; and v) can the acquired information be used for diagnostic purposes and/or for the design of therapeutic agents.
2. Methodological aspects of proteomic analysis
2.1. Two-dimensional separation of proteins
Proteomic technology is still in its infancy, but methodological advances have made it possible to tackle the task of analyzing the whole proteome of a clinical sample, an ambitious project given the enormous number of expressed proteins and the diversity of post-translational modifications they can undergo. Most proteomic studies aimed at elucidating disease-dependent proteome alterations have so far taken advantage of two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) [5] (Fig. 2). The first dimension (isoelectric focusing) is carried out using gels with an immobilized pH gradient (IPG) to separate proteins according to their isoelectric point (pI). Subsequently, the proteins are resolved according to their relative masses in a second dimension by SDS-PAGE. After completion, protein spots are visualized and subjected to mass spectrometry (MS) in order to determine their identity [6]. Despite certain limitations, proteomics has provided useful information on disease-related changes in protein expression, as we discuss below.
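To illustrate how the first dimension separates proteins, the following minimal Python sketch estimates the pI of a sequence as the pH at which its computed net charge is zero. It is an illustration only: the pKa table is an assumed approximate set (published tables differ, and so do the resulting estimates), the function names are hypothetical, and post-translational modifications, which shift the pI of real proteins, are ignored.

```python
# Approximate pKa values, assumed for illustration only.
# Different published pKa sets give somewhat different pI estimates.
POSITIVE_PKA = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of an unmodified peptide at a given pH (Henderson-Hasselbalch)."""
    # Free amino terminus (positive) and free carboxyl terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_TERM"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_TERM"] - ph))
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def estimate_pi(sequence, tol=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and pH 14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge decreases monotonically with pH, so a positive charge
        # means the isoelectric point lies at a higher pH.
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, calling `estimate_pi` on a lysine-rich sequence returns a basic value and on an aspartate-rich sequence an acidic one, which is why such proteins focus at opposite ends of the pH gradient.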
In order to obtain the profile of protein expression in disease, proteins have to be extracted from complex biological samples such as cell populations, tissues, or biological fluids. Sample acquisition and preparation is the first and very often the most critical step in 2D-PAGE. Biological fluids such as whole serum or plasma are relatively easily accessible; however, protocols for sample collection are usually not ideally suited for proteomics due to possible protein degradation or modification during handling. The use of serum is complicated by the high abundance of albumin, which interferes with 2D-PAGE. This problem can be overcome by prefractionating the sample. For example, Zuo et al. described a system that allowed sample prefractionation into a few well-defined pools using microscale solution isoelectric focusing (μsol-IEF) prior to 2D-PAGE [7]. At least 6- to 30-fold higher protein loads were possible for nonalbumin fractions on narrow pH range IPG gels. This method substantially increases the dynamic range of protein detection, since higher protein loads can be applied to narrow pH range 2D-PAGE gels.
Whole tissue analysis from human patients is often required in proteomics. Tissue heterogeneity may further complicate the analysis of clinical samples obtained by biopsy. Tissue microdissection techniques, especially laser-capture microdissection, allow the isolation of defined cell types from whole tissue. These approaches greatly reduce tissue heterogeneity; however, the amount of protein retrieved may be insufficient for 2D-PAGE analysis [8]. Lymphocytes and mononuclear cells can be easily isolated from whole peripheral blood by differential centrifugation. The need to recover cells from tissue can be circumvented by analyzing cultures of primary cells or immortalized cell lines, an approach that has been widely used in proteomic research because it yields high amounts of protein and allows experimental variables to be tightly controlled. Although treatment of cultured cells with agents that mimic, at least in part, the pathogenic process of interest may yield significant insight into disease-related intracellular changes, it is noteworthy that the results obtained from cultured cells may not accurately reflect the situation in the living tissue/organism.
Regardless of their source, proteins need to be extracted using a solubilization protocol that is suitable for isoelectric focusing. Extraction is accomplished using a chemical cocktail that has to be optimized for each sample type. Still, a single buffer will not solubilize all the proteins contained in the sample due to their chemical and physical heterogeneity (e.g., differences in hydrophobicity and a wide range of molecular sizes). This problem can be addressed by analyzing differentially extracted portions of the sample individually. An intrinsic limitation of gel-based proteomic techniques is that highly abundant proteins are preferentially displayed. One approach to overcome this challenge is to reduce sample complexity by analyzing protein subsets rather than whole cell extracts. The removal of highly abundant constitutive proteins may help detect low-abundance proteins that may be pathologically relevant. In addition, subcellular fractionation allows the identification of interesting proteins that only exist in certain organelles and may not be visualized in whole cell lysates. Moreover, “organelle proteomics” allows searching for protein changes in compartments that are of special pathological interest. For example, surface membrane proteins currently comprise a large fraction of the therapeutic targets and diagnostic markers due to their key role in disease development (e.g., cancer invasiveness and inflammatory infiltration in atherosclerotic plaques). Of note in this regard, proteins with extracellular domains can be biotinylated and easily extracted by affinity purification [9].
2.2. Protein visualization and image analysis
In order to compare proteomic profiles between pathological and normal control samples, proteins separated by 2D-PAGE must be visualized by in-gel protein staining protocols. Despite its relatively low sensitivity, Coomassie Brilliant Blue (CBB) R-250 is the most conventional approach for protein staining. However, quantification of protein amount is problematic since proteins may be visualized to a different extent during destaining. Colloidal CBB G-250 has circumvented this problem by allowing background-free detection of proteins at a sub-microgram level [10]. Silver staining protocols are at least a hundred times more sensitive than CBB-based techniques [11], but the inherent oxidative modification of the stained proteins makes them incompatible with MS [12]. Recently, commercially available MS-compatible silver staining kits have successfully been applied in proteomics [13]. Like silver staining, fluorescent methods allow for the visualization of a larger number of proteins, which is especially crucial for samples available in limited amounts [12]. Fluorescent staining of proteins combines high sensitivity with compatibility with downstream analysis, and additionally allows staining of proteins that are poorly stained by alternative methods (such as glycoproteins, lipoproteins, low molecular mass proteins and metalloproteases). In addition, alterations between the disease and the control profiles can be easily detected using differential in-gel electrophoresis (DIGE), in which the two samples are labeled with different fluorescent dyes and analyzed in the same two-dimensional gel [13]. Conventionally, pattern comparison between pathological and control samples requires gel scanning and “software-supported” analysis. Nevertheless, improvement in software for automated spot comparisons is desirable.
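The kind of comparison such software performs can be sketched in a few lines of Python. The example below is a simplified illustration and not the algorithm of any particular image-analysis package: it assumes that spots have already been detected and matched between gels, that each spot is summarized by an integrated volume, and that normalization to total spot volume is acceptable. The spot identifiers, the function names and the two-fold threshold are all hypothetical.

```python
import math


def normalize(volumes):
    """Express each spot volume as a fraction of the total volume on the gel."""
    total = sum(volumes.values())
    return {spot: volume / total for spot, volume in volumes.items()}


def differential_spots(control, disease, min_fold=2.0):
    """Return log2(disease/control) for matched spots changing by >= min_fold.

    Only spots present with a positive volume in both samples are compared;
    spots missing from one gel would need separate handling.
    """
    ctrl = normalize(control)
    dis = normalize(disease)
    threshold = math.log2(min_fold)
    changed = {}
    for spot in ctrl.keys() & dis.keys():
        if ctrl[spot] <= 0 or dis[spot] <= 0:
            continue
        log_ratio = math.log2(dis[spot] / ctrl[spot])
        if abs(log_ratio) >= threshold:
            changed[spot] = log_ratio
    return changed


# Hypothetical spot volumes (arbitrary units) for four matched spots.
control_gel = {"s1": 100.0, "s2": 200.0, "s3": 300.0, "s4": 400.0}
disease_gel = {"s1": 100.0, "s2": 200.0, "s3": 650.0, "s4": 50.0}
changed_spots = differential_spots(control_gel, disease_gel)
```

With these made-up values, only the two spots whose relative volume changes more than two-fold are retained, one with a positive and one with a negative log ratio. Real analyses additionally require replicate gels and statistical testing before a spot is selected for MS.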
2.3. Protein identification by mass spectrometry (MS)
The identity of spots exhibiting different patterns between control and pathological specimens is usually assessed by MS “fingerprinting” [15]. This technique analyzes the mass-to-charge ratio (m/z) of peptide fragments produced by proteolytic cleavage with specific proteases (e.g., trypsin, chymotrypsin). Every protein digested with a specific enzyme gives rise to a characteristic mass spectrum that can be compared against a database of theoretical peptide masses derived from genome sequences. Because MS is a highly sensitive technique, special care must be taken to prevent sample contamination. In order to be subjected to MS, the spot of interest has to be excised from the gel and treated with the desired protease. The resulting peptide mixture is volatilized and ionized by matrix-assisted laser desorption/ionization (MALDI) or electrospray ionization (ESI) [6]. The mass analyzer itself is usually time of flight (TOF) if coupled to MALDI, or ion trap or quadrupole if ESI is used. In either case, peptides are separated according to their specific m/z, and the resulting spectrum plots m/z against peak intensity. Computer comparison to databases produces hit lists, which are the typical informational outputs of proteomic experiments. In some cases, however, protein identity may be uncertain due to high background levels or impurities (spots separated by two-dimensional electrophoresis often contain several proteins). In such cases, the peptides can be further fragmented by collision-induced dissociation (CID), producing smaller charged fragments that are re-analyzed in the mass spectrometer [16]. In principle, CID results in a peak pattern that also contains information about the peptide sequence. In this way proteins are identified accurately, and additional information on the localization and identity of post-translational modifications can be gathered. Additional information on the equipment and techniques used to analyze peptide sequence by fragmentation is provided by Aebersold and Mann [6].
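The logic of peptide mass fingerprinting can be sketched as follows. This Python example is a deliberately simplified illustration, not the scoring scheme of any real search engine: it assumes complete tryptic cleavage (after K or R, but not before P), unmodified peptides, singly protonated ions and a fixed mass tolerance. The protein names and sequences in the toy database are invented.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the sum of residue masses
PROTON = 1.007276   # mass of the charge-carrying proton


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """Theoretical m/z of an unmodified peptide carrying `charge` protons."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def rank_candidates(observed_mz, database, tol=0.2):
    """Rank database proteins by the number of observed peaks they explain."""
    hits = []
    for name, sequence in database.items():
        theoretical = [peptide_mz(p) for p in tryptic_peptides(sequence)]
        matched = sum(
            1 for peak in observed_mz
            if any(abs(peak - mz) <= tol for mz in theoretical)
        )
        hits.append((name, matched))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database with invented sequences, for illustration only.
toy_database = {"protA": "AGKSLVRTPEK", "protB": "GGGKAAARWWK"}
# Simulated peak list: theoretical masses of protA with a small offset.
observed = [peptide_mz(p) + 0.05 for p in tryptic_peptides("AGKSLVRTPEK")]
hit_list = rank_candidates(observed, toy_database)
```

The sorted `hit_list` corresponds to the hit list described above. Counting matched peaks is the crudest possible score; real search tools also weigh mass accuracy, missed cleavages, modifications and database size.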
2.4. Recent developments in proteomic technology
Despite recent improvements, conventional two-dimensional MS-based analysis still has some limitations, especially low throughput and the need for a relatively large amount of sample. Both of these problems are quite inconvenient for disease proteomics, where large-scale assays are desired. Several strategies have been developed to increase the automation of sample analysis. For example, direct scanning of 2D-PAGE gels or trypsin-containing membranes by MALDI-MS has been reported [17]. Moreover, gel-free approaches using liquid chromatography (LC) [18] or capillary electrophoresis [19] to achieve protein separation prior to MS have been developed, thus circumventing the tedious task of 2D-PAGE. Protein samples can be separated by coupling two different HPLC columns in tandem and connecting them directly to a mass spectrometer. Usually, proteins in the first chromatography step are retained according to their charge using an ion exchange column. Then, in a coupled reverse phase column, proteins are further separated according to their hydrophobicity [20]. Although protein mixtures can be analyzed this way, enzymatic digestion of the sample prior to chromatography is more convenient. In this approach, thousands of different peptides are separated by HPLC and can readily be analyzed by MS. Because subpicomole amounts of sample are sufficient, peptides derived from low-abundance proteins that usually are not detectable by gel-based methods can be identified using this approach. Moreover, the serial setup of tandem HPLC connected directly to the MS analyzer allows for high-throughput automation. Isotope-coded affinity tagging (ICAT) of proteins further increases detection sensitivity for low-abundance proteins analyzed by gel-free proteomics [14]. By using different tags, one of natural abundance and the other isotopically labeled, quantitative differences in protein abundance between two different samples can be readily detected. The proteome is first digested and the resulting peptides are purified and subjected to LC-ESI.
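The quantitative readout of such isotope tagging can be illustrated with a short Python sketch. It assumes a centroided peak list, one tag per peptide, a known charge state, and a light/heavy mass difference of about 8.05 Da, as for a d0/d8 reagent pair; this value, the tolerance and the function name are assumptions for illustration and should be replaced by the parameters of the reagent actually used.

```python
def find_tag_pairs(peaks, charge=1, delta_mass=8.05, tol=0.02):
    """Pair light and heavy peaks and return (light m/z, heavy m/z, heavy/light).

    `peaks` is a list of (m/z, intensity) tuples. `delta_mass` is the assumed
    mass difference between heavy and light tags for a peptide carrying one tag.
    """
    expected_shift = delta_mass / charge  # spacing on the m/z axis
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - expected_shift) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical peak list: one light/heavy pair and one unpaired peak.
example_peaks = [(500.00, 1000.0), (508.05, 2000.0), (620.30, 500.0)]
tag_pairs = find_tag_pairs(example_peaks)
```

For the made-up peak list, a single pair is found and its intensity ratio suggests a two-fold higher abundance in the sample carrying the heavy tag. Peptides with several tagged residues or higher charge states would need additional shifts to be tested.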
Imaging MS has recently been developed to perform in situ proteomic analysis of whole tissues without the need for prior protein separation [21]. In this technique, frozen tissue sections are subjected directly to MALDI-MS analysis in a regular spatial pattern to obtain mass profiles across the tissue section.
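Conceptually, the data produced by imaging MS form a grid of mass spectra, one per sampled position, from which an intensity map can be drawn for any m/z of interest. The Python sketch below shows this step under simplifying assumptions: spectra are already peak-picked lists of (m/z, intensity) tuples arranged by row and column, and a fixed m/z window is summed. The function name and the data are hypothetical.

```python
def ion_image(spectra_grid, target_mz, tol=0.5):
    """Build a 2D intensity map for one m/z value.

    `spectra_grid[row][col]` is the peak list, as (m/z, intensity) tuples,
    acquired at that position on the tissue section.
    """
    image = []
    for row in spectra_grid:
        image_row = []
        for spectrum in row:
            # Sum all peaks that fall inside the m/z window at this position.
            intensity = sum(
                peak_int for peak_mz, peak_int in spectrum
                if abs(peak_mz - target_mz) <= tol
            )
            image_row.append(intensity)
        image.append(image_row)
    return image


# Hypothetical 2 x 2 acquisition grid.
grid = [
    [[(1000.2, 5.0), (1500.0, 1.0)], [(1000.1, 20.0)]],
    [[(1500.3, 7.0)], [(999.9, 2.0), (1000.3, 3.0)]],
]
image_1000 = ion_image(grid, target_mz=1000.0)
```

Each entry of `image_1000` is the summed signal near m/z 1000 at one position, so plotting it as a heat map shows where on the section that species is detected.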
Following the success of DNA microarray chips, protein microarrays are being developed for protein profiling and medical diagnostics [22-24]. Protein chips contain defined sets of proteins arrayed at high density onto glass slides. Some approaches use peptides [25] or whole proteins encoded by libraries, which are coated onto the chip in order to screen for interacting molecules or to carry out functional screens. Recently, reverse phase protein arrays have been described that immobilize the whole protein content of a tissue on an array [26] to screen for protein modifications or to search for autoimmune disorders [27]. Ziauddin et al. have reported the generation of protein microarrays whose features are clusters of live cells that express a defined cDNA at each location [28].
Protein array techniques are especially appropriate for understanding drug intervention and for fine-tuning drug design. For instance, screening a protein array against low molecular weight drug inhibitors may yield significant insight into both the drug targets and the drug’s mode of action. Protein arrays are also suitable for testing the affinity of new drug derivatives. One example of such use is a recent study on FKBP12 binding to different small molecules in yeast [24]. Another useful application of protein chips is the screening for specific enzymatic activities. Protein kinases are of high interest in drug intervention due to their key regulatory role in biological signaling pathways. Zhu et al. have shown that many kinases remained active when immobilized on polydimethylsiloxane (PDMS) chips and showed genuine substrate specificity [22]. As an alternative to protein arrays, antibodies can be immobilized onto the microchip surface, where they specifically capture their respective antigens in proportion to antigen abundance [23]. Such antibody arrays are suited for protein quantification (by using a second antibody against the bound protein) and for the detection of specific post-translational modifications such as phosphorylation. The main advantage of protein arrays is their capacity to simultaneously analyze a high number of proteins, even among different cell and tissue types. Chips can be combined with direct MALDI analysis for read-out, thereby generating a highly sensitive and high-throughput automated system.