Simplified two-dimensional capillary electrophoresis-mass spectrometry mapping: Analysis of proteolytic digests

Guillaume L. Erny and Alejandro Cifuentes*

Institute of Industrial Fermentations (CSIC), Juan de la Cierva 3, 28006 Madrid, Spain

*Corresponding author:

Dr. Alejandro Cifuentes, Fax#: 34-91-5644853, e-mail:

Abbreviations: EIE, extracted ion electropherogram; TIE, total ion electropherogram; CYC-B, bovine cytochrome c; CYC-R, rabbit cytochrome c; CYC-H, horse cytochrome c.

Keywords: 2D, mapping, CE-MS, proteomics, peptides, proteins, chemometric.

Abstract

Capillary electrophoresis-mass spectrometry (CE-MS) has proven to be a very useful hyphenated technique for proteomic studies. However, the huge amount of data stored in a single CE-MS run calls for procedures able to extract all the relevant information it contains. In this work, we present a new and simple approach that generates a simplified two-dimensional (2D) map from CE-MS raw data. The approach automatically detects and characterizes the most abundant ions in the CE-MS data, including their m/z values, ion intensities and analysis times. It is demonstrated that visualizing CE-MS data in this simplified 2D format allows (i) easy and simultaneous visual inspection of large datasets, (ii) immediate perception of relevant differences between closely related samples, (iii) rapid monitoring of data quality in different samples and (iv) fast discrimination between comigrating polypeptides and ESI-MS fragmentation ions. The proposed strategy does not rely on high mass accuracy for peak detection and filtering, since the MS data used here were obtained with an ion-trap analyzer. Moreover, the methodology works directly on the CE-MS raw data, without user intervention, simultaneously providing a simplified 2D map and a much easier and more complete data evaluation. Furthermore, the procedure can easily be implemented in any CE-MS laboratory. The usefulness of this approach is validated by studying the very similar tryptic digests of bovine, rabbit and horse cytochrome c. It is demonstrated that this simplified 2D approach yields specific markers for each species in a fast and simple way.

1. Introduction

1.1 General aspects

Proteins are fundamental components of all living beings and include many substances, such as enzymes, hormones and antibodies, necessary for the proper functioning of any organism. In recent years, the separation and identification of proteins have become of great importance, driven by the development of the proteomics field and the search for a better understanding of biological functions. However, it is well known that obtaining complete knowledge of the protein content of an organism at a given moment is an extremely complex task, since organisms usually contain thousands of proteins of very different concentration, size, hydrophobicity and charge. Moreover, enzymatic digestion of these proteins is usually required, which enormously increases the number of compounds to be analyzed and further illustrates the difficulty of this task [1].

Some recent developments have focused on separating proteins without aiming to identify all of them, but rather to provide protein profiles or fingerprints under specific conditions. Such fingerprints (usually displayed as a 2D map with color coding for intensity) can be used to establish proteomic patterns for diagnostic purposes or to obtain biomarkers specific to a particular disease. Moreover, these profiling techniques can be useful not only for clinical applications but also for food analysis, e.g., detection of food adulteration or of genetically modified organisms [2]. Alternatively, proteolysis patterns are sometimes favored because peptide fragments are more soluble and stable, and usually easier to separate [3]. However, since each protein gives rise to numerous fragments, the pattern complexity increases significantly in this case.

Protein or peptide maps are usually obtained with 2D separation techniques, 2D-PAGE being the most common procedure. Other 2D techniques have also been used, such as HPLC/HPLC, HPLC/CZE and CZE/MEKC [4-6], as well as hyphenated techniques with MS as the second dimension (e.g., CE-MS or HPLC-MS) [7,8]. The great advantage of MS is that it allows a given compound to be identified from its relative molecular mass (Mr). In this latter case, visualizing the MS data as a map (fraction number or retention time on the y-axis, m/z on the x-axis, and color coding for signal intensity) seems suitable [9]. However, one of the main problems of using MS as the second dimension is the increased complexity due to fragmentation of parent ions. Although soft ionization procedures such as electrospray for HPLC-MS [10] and CE-MS [11] significantly reduce the internal energy imparted during ionization, and thus limit fragmentation, ionization will never be “soft” enough to ensure that a particular detected ion does not result from the fragmentation of a parent ion.

Apart from the limitations mentioned above, the huge amount of data in different formats produced by proteomic techniques has created an urgent need for procedures able to extract the relevant information from MS spectra [12]. Specific tools have already been developed to display m/z ratios together with data from a separation step [13-15]. With adequate normalization, these procedures can be used not only for MS mapping but also for visual analysis and comparison of different datasets. The increasing activity in this field underlines the need for flexible data visualization tools that can easily be applied to a wide variety of experimental setups. Evidently, all these technologies are still burdened with certain limitations. The most severe limitation, however, may not be the technical aspect of MS and/or separation (data accumulation) but rather the subsequent data evaluation.

The aim of this paper is to demonstrate the possibilities of a new and simple approach, developed in our laboratory, that provides a simplified 2D map of CE-MS data. The approach is based on the automatic detection and characterization of the main peaks and m/z values in the raw data. The simplified 2D map is obtained by performing a classical peak analysis on every m/z value containing important information. To our knowledge, this is the first time such an approach has been proposed. The proposed strategy does not rely on high mass accuracy (such as that provided by more expensive analyzers, e.g., TOF-MS or FT-ICR-MS) for peak detection and filtering, since data obtained from an ion-trap analyzer are used. Moreover, the methodology works directly on the CE-MS raw data, without user intervention, simultaneously providing a simplified 2D map and a much easier and more complete data evaluation. The usefulness of this approach is validated by analyzing the tryptic digests of cytochrome c from three species, namely bovine (CYC-B), rabbit (CYC-R) and horse (CYC-H). It is shown that this simplified 2D procedure permits the detection of specific markers for each species, even from very similar proteolytic digests.

1.2 Theoretical section

The chemometric tool developed in this work carries out the following three steps automatically (see Figure 1). First, the raw data from a given CE-MS run are converted into a two-dimensional matrix of intensities indexed by time and m/z (step 1 in Figure 1). Second, the m/z values containing useful information are detected (vide infra), and a series of extracted ion electropherograms (EIEs) is reconstructed from those principal m/z values (step 2 in Figure 1). Finally, each individual EIE is analyzed automatically to obtain the main m/z values together with their mass uncertainty, peak area and analysis time (see the table in Figure 1, step 3). These three steps provide an automatic and drastic reduction in data size, which simplifies the subsequent 2D representation and allows a better study and visualization of the CE-MS results.
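Step 1 can be illustrated with a short Python/NumPy sketch. It is not the authors' Visual Basic code: it assumes that each scan is available as a time plus arrays of m/z values and intensities (for instance after ASCII export), and it assumes a fixed bin width of 0.1 m/z, which would give 9000 columns for the 200 to 1100 m/z scan range. The function name `build_matrix` is hypothetical.

```python
import numpy as np

def build_matrix(scans, mz_min=200.0, mz_max=1100.0, step=0.1):
    """Bin a list of scans into a (time x m/z) intensity matrix.

    scans: iterable of (time, mz_array, intensity_array) tuples.
    The 0.1 m/z bin width is an assumption, not a documented setting.
    """
    scans = list(scans)
    n_bins = int(round((mz_max - mz_min) / step))
    mz_axis = mz_min + step * np.arange(n_bins)        # lower edge of each bin
    times = np.array([s[0] for s in scans], dtype=float)
    matrix = np.zeros((len(scans), n_bins))
    for row, (_, mz, inten) in enumerate(scans):
        mz = np.asarray(mz, dtype=float)
        inten = np.asarray(inten, dtype=float)
        idx = np.floor((mz - mz_min) / step).astype(int)
        ok = (idx >= 0) & (idx < n_bins)                # drop ions outside the scan range
        np.add.at(matrix[row], idx[ok], inten[ok])      # sum ions falling in the same bin
    return times, mz_axis, matrix
```

`np.add.at` is used instead of plain fancy-index assignment so that several ions landing in the same bin are summed rather than overwritten.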

To detect the m/z values of interest automatically, our approach calculates, for each m/z value, the standard deviation (SD) of the ion intensity along the time axis. The most interesting m/z values will logically have the highest SD, as a result of the large variation in ion intensity observed over time, whereas an m/z value with an approximately constant signal (e.g., a background ion) gives a low SD. Therefore, the m/z values of interest are selected as those with an SD higher than a given threshold. EIEs are then reconstructed by summing, at each time point, the ion intensities within an m/z interval centered on the detected m/z value and defined by twice the mass uncertainty (0.5 m/z in our case). When two detected m/z values are close to each other (less than 0.4 m/z apart), they are processed in a single EIE (see Figure 1, step 2). The resulting data are a series of arrays, each representing one EIE and indexed by its average m/z value and mass uncertainty.
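Step 2 can be sketched as three small functions operating on a time × m/z matrix. Several choices here are assumptions rather than the documented method: the threshold is expressed as a fraction of the maximum SD (the Results use 1%), close m/z values are merged by simple single-linkage (a value joins the current group if it lies less than 0.4 m/z from the previous one), and the 0.5 m/z figure is read as a total window width, i.e. a half-width of 0.25 m/z. All function names are hypothetical.

```python
import numpy as np

def select_mz(matrix, mz_axis, rel_threshold=0.01):
    """Keep m/z columns whose intensity SD over time exceeds
    rel_threshold * max(SD)."""
    sd = matrix.std(axis=0)                 # one SD per m/z column
    keep = sd > rel_threshold * sd.max()
    return mz_axis[keep], sd

def group_close(mz_values, max_gap=0.4):
    """Merge sorted m/z values that lie closer than max_gap (single linkage)."""
    groups = []
    for mz in np.sort(mz_values):
        if groups and mz - groups[-1][-1] < max_gap:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups

def build_eies(matrix, mz_axis, groups, half_width=0.25):
    """Sum, at each time point, all columns inside the window around each
    group. Returns a list of (average m/z, EIE array) pairs."""
    eies = []
    for g in groups:
        lo, hi = min(g) - half_width, max(g) + half_width
        cols = (mz_axis >= lo) & (mz_axis <= hi)
        eies.append((float(np.mean(g)), matrix[:, cols].sum(axis=1)))
    return eies
```

Because the SD of a column with a constant signal is zero, a steady background ion never passes the threshold, which is the behavior described above for noise-dominated m/z values.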

Peaks are then detected in each EIE as a succession of data points whose signal is 10 times higher than the average noise, calculated from the first 20 points of each EIE. For each detected peak, two electrophoretic parameters are measured, the peak area, A, and the peak migration time, tm, as

$$A = \sum_i I_i \qquad (1)$$

and

$$t_m = \frac{\sum_i I_i \, t_i}{\sum_i I_i} \qquad (2)$$

where the summation runs over every point i defining a given peak for a given m/z, and Ii and ti are the intensity and time of that point, respectively. The migration time is thus calculated as the first statistical moment [16] and will differ slightly from the time of the peak maximum for asymmetrical peaks. However, this parameter offers higher precision than the peak maximum, whose precision is limited by the sampling rate [17]. For each detected peak, the area and migration time, as well as the m/z value and its uncertainty, are recorded in a table as indicated in step 3 of Figure 1. These data are then used to build the simplified 2D CE-MS representation.
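The peak detection and the two parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' routine: the "average noise" is taken as the mean intensity of the first 20 points, a peak is any run of consecutive points above 10 times that value, the area is the plain sum of intensities over the run, and the migration time is the intensity-weighted mean time (first statistical moment). The function name `detect_peaks` is hypothetical.

```python
import numpy as np

def detect_peaks(t, eie, n_noise=20, factor=10.0):
    """Return (start, stop, area, tm) for each run of points above
    factor * noise, with stop exclusive."""
    t = np.asarray(t, dtype=float)
    eie = np.asarray(eie, dtype=float)
    noise = eie[:n_noise].mean()            # assumption: noise = mean of first points
    above = eie > factor * noise
    peaks = []
    i, n = 0, len(eie)
    while i < n:
        if not above[i]:
            i += 1
            continue
        j = i
        while j < n and above[j]:           # extend the run of consecutive points
            j += 1
        I, ti = eie[i:j], t[i:j]
        area = I.sum()                      # area as the sum of intensities
        tm = (I * ti).sum() / area          # first statistical moment
        peaks.append((i, j, area, tm))
        i = j
    return peaks
```

Since tm averages over all points of the peak, it is not tied to the sampling grid, which is why it can be more precise than the time of the single highest point.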

2. Experimental section

2.1 Chemicals

Ammonia (30%) was from Panreac (Barcelona, Spain), methanol (HPLC grade) from Scharlau (Barcelona, Spain) and formic acid from Merck (Darmstadt, Germany). Trypsin and cytochrome c from bovine heart (CYC-B), horse heart (CYC-H) and rabbit heart (CYC-R) were from Sigma (St. Louis, MO, USA). Water was deionized with a Milli-Q system (Millipore, Bedford, MA, USA).

2.2 Protein hydrolysis

Cytochrome c from each species was dissolved at a concentration of 2 mg/ml in a buffer solution containing 200 mM sodium acetate, 20 mM Tris and 0.2 mM calcium chloride. Trypsin was dissolved in water at a concentration of 2 mg/ml. CYC and trypsin were mixed at a 10:1 ratio, and digestion was allowed to proceed for 16 h at 37°C. The enzymatic digestion was stopped by heating to 80°C for 10 min. Protein digests were stored at -4°C.

2.3 Capillary Electrophoresis-Electrospray-Mass Spectrometry (CE-ESI-MS)

CE-ESI-MS analyses were carried out on a PACE/5500 CE instrument (Beckman, Fullerton, CA, USA) coupled to an Esquire 2000 ion-trap mass spectrometer (Bruker Daltonik, Bremen, Germany) through a commercial coaxial sheath-flow interface. The separation method was adapted from Simo et al. [18,19]. Briefly, the MS was operated in positive ion mode and scanned from m/z 200 to 1100 at 13000 u/s. ESI parameters were: nebulizer pressure, 27579 Pa; dry gas flow, 8 L/h; dry gas temperature, 120°C; and sheath liquid, methanol-water (50:50) at a flow rate of 4 μL min-1. Separation was performed in a 90 cm long capillary (50 μm i.d., from Composite Metal Services, Worcester, England) using a buffer of 0.9 M formic acid adjusted to pH 2 with ammonium hydroxide. Between runs, the capillary was rinsed for 3 min with water and 1 min with buffer. CYC hydrolysates were injected at 3447 Pa for 20 s without any dilution or purification step.

2.4 Data analysis and programming

Several computer tools were used in this work. The instrument software (DataAnalysis version 3.0, Bruker Daltonik, Bremen, Germany) was used to obtain the extracted ion electropherograms (EIEs) and to convert the raw data into ASCII format. Visual Basic (Visual Basic 6.0, Microsoft) was used to program the filtering routines and the computation of the electrophoretic figures of merit. Results were recorded in an Excel spreadsheet (Excel 2000, Microsoft) for further analysis.

3. Results and discussion

As mentioned above, the usefulness of this new approach was validated by comparing the 2D maps obtained from tryptic digests of cytochrome c from three species, namely bovine (CYC-B), horse (CYC-H) and rabbit (CYC-R). An additional aim was to find a CE-MS marker for each species that could be used in quality control to detect, e.g., adulteration of minced meat [20-22]. This 2D CE-MS mapping approach can logically be useful in many other applications, including biomarker discovery, the identification of therapeutic polypeptide targets and the establishment of patterns for diagnostic purposes [9].

The total ion electropherograms (TIEs) obtained by CE-MS of the three tryptic digests are shown in Figure 2. As can be seen, few differences can be detected directly from these electropherograms. Although peak 1 could be used as a marker for CYC-R, no unique feature can be observed for CYC-B or CYC-H. For example, peak 2 is absent in CYC-R but present in CYC-B and CYC-H; similarly, peak 3 is absent in CYC-H but present in CYC-B and CYC-R, and peak 4 is absent in CYC-R but present in CYC-H and CYC-B. The same applies to the group of peaks labeled 5 in Figure 2.

To obtain more information (including specific markers for each species), the classical procedure would be to analyze the full MS spectrum of every peak and compare the results among the different species. However, this procedure is labor-intensive and time-consuming. Alternatively, straightforward 2D maps of the samples could be compared. An example of such a representation for the CYC-B digest is shown in Figure 3. In our case, this 2D map was obtained from the original two-dimensional matrix (step 1 in Figure 1), which was pasted into an Excel spreadsheet. For reasons of file size and speed, the m/z axis was compressed by a factor of 40. As can be seen, much more information is obtained in this case. However, given the wealth of data, it was impossible to evaluate the raw data using commercially available software. For example, with our MS setup (mass scan from m/z 200 to 1100), a 2D matrix such as the one shown in Figure 3 easily exceeds 10 Mbytes. More importantly, such a representation can produce an information overload that hides important differences [9].
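The text does not say how the m/z axis was compressed by a factor of 40. One plausible scheme, shown here purely as an assumption, is to sum each block of 40 adjacent m/z columns; with 9000 columns this leaves 225. The function name `compress_mz` is hypothetical.

```python
import numpy as np

def compress_mz(matrix, factor=40):
    """Sum blocks of `factor` adjacent m/z columns (assumed binning scheme)."""
    n_t, n_mz = matrix.shape
    n_keep = (n_mz // factor) * factor              # drop an incomplete last block
    blocks = matrix[:, :n_keep].reshape(n_t, n_keep // factor, factor)
    return blocks.sum(axis=2)
```

Summing (rather than taking every 40th column) preserves the total ion signal of each scan, but it also blurs neighboring ions together, which is one reason such a map is coarse along the m/z axis.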

Therefore, the new approach described in the Theoretical section was tested for obtaining a simplified 2D CE-MS map. The original TIE of a trypsin digest analyzed by CE-MS is shown in Figure 4A, and the corresponding standard deviation (SD) measured for each m/z value is shown in Figure 4B. As can be observed, the m/z values with high SD agree with the most intense spots in Figure 3. For example, Figure 4B shows that the EIEs corresponding to m/z 584.9 and 589.2 carry important information (highest SD in Figure 4B); these two m/z values correspond to two of the most intense spots in Figure 3. Moreover, some ions that contribute to a large extent to the noise (e.g., m/z 282.2 in Figure 3) do not give a high SD. The highest SD values are found between m/z 500 and 700, in good agreement with Figure 3. The inset in Figure 4B shows a zoom of m/z values (x-axis) between 500 and 510 and SD values (y-axis) between 0 and 5000. As can be seen, each peak is extremely sharp, with a width at half height well below 0.5 m/z. This allows relevant EIEs to be obtained automatically with suitable mass accuracy. Indeed, using a threshold of 2000 in Figure 4B, corresponding to 1% of the maximum SD, 735 different EIEs were automatically generated out of the 9000 possible m/z values. Figure 4C shows the TIE obtained by summing the 735 selected EIEs. As can be seen, no difference can be visually observed between the electropherograms of Figures 4A and 4C, as further corroborated by Figure 4D, which shows the difference between them. Namely, Figure 4D shows that the residual never represents more than 10% of the full peak, and is usually below 5%. Figure 4E shows the total ion electropherogram after step 3 of our approach (see Figure 1).
To obtain this figure, all points not detected as part of a peak were set to zero in the original matrix, and points belonging to a peak were baseline-corrected. As can be seen, eliminating the data that add no information provides a significant increase in sensitivity, as also demonstrated by the inset of Figure 4E. The differences between Figures 4E and 4A are plotted in Figure 4F. As can be seen, although some information is lost, this loss is basically due to a multitude of small peaks, resulting from the fragmentation of the main ions, that are not included.

As an example, the software detected and measured 1628 peaks in the bovine cytochrome c digest, spanning about four orders of magnitude (areas between 3000 and 15000000). After applying the procedure proposed in this work, the resulting file is smaller than 150 Kbytes, a reduction of more than 50 times with respect to the original data set recorded by the MS instrument (7.5 Mbytes). The 2D map obtained from this new data set is shown in Figure 5. Comparing this map with the original one in Figure 3 shows that all the important information has indeed been preserved. Moreover, a much higher resolution is observed along the time axis in Figure 5 than in Figure 3. This is particularly striking when comparing, in Figure 5, the alignment of the spots from peaks 7, 9, 11 and 12 with that of the spots from peaks 1, 2 and 8. This improvement results from the use of electrophoretic parameters. Indeed, the EIE peaks of ions resulting from ESI-MS fragmentation of the same parent compound have the same peak shape, whereas those of ions from different parent compounds have different peak shapes. The accurate measurement of the electrophoretic parameters (migration time, but also peak variance, peak asymmetry, etc.) makes it possible to highlight small differences in peak shape. This is illustrated in Figure 6, where the full MS spectrum of peak 10 (Figure 6A) and the EIEs obtained from the five most abundant ions of Figure 6A (Figure 6B) are compared with the full MS spectrum of peak 8 (Figure 6C) and the EIEs obtained from the five most abundant ions of Figure 6C (Figure 6D), considering only m/z values higher than 300 (m/z values lower than 300 typically make a high contribution to the noise signal).
Using the ten most intense spots of peak 10, an average migration time of 26.648 min was obtained with a standard deviation of 0.008 min (i.e., a relative standard deviation of about 0.03%, well below 0.05%), showing the very high precision of the proposed procedure for determining the peak center. Moreover, this result shows that our simplified 2D map can be of great help in differentiating comigrating CE polypeptides from ions produced by ESI-MS fragmentation.
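The idea that ions from the same parent share a peak shape can be sketched by comparing peak moments. The sketch below is an assumption-laden illustration, not the authors' procedure: it computes the migration time (first moment) and peak variance (second central moment), groups spots whose migration times and peak widths agree within hypothetical tolerances (`tol_t` in minutes, `tol_width` relative), and includes a helper to check the reported precision (0.008 / 26.648 gives about 0.03%). All function names are hypothetical.

```python
import numpy as np

def peak_moments(t, intensity):
    """Return (tm, variance): first moment and second central moment."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(intensity, dtype=float)
    area = w.sum()
    tm = (w * t).sum() / area
    var = (w * (t - tm) ** 2).sum() / area
    return tm, var

def rsd_percent(values):
    """Relative standard deviation (%) using the sample SD (ddof=1)."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

def group_by_shape(spots, tol_t=0.03, tol_width=0.2):
    """Group (mz, tm, var) spots whose tm and peak width (sqrt of var)
    agree with the first spot of the current group.
    tol_t (min) and tol_width (relative) are hypothetical tolerances."""
    groups = []
    for mz, tm, var in sorted(spots, key=lambda s: s[1]):
        width = np.sqrt(var)
        if groups:
            _, tm0, var0 = groups[-1][0]
            w0 = np.sqrt(var0)
            if abs(tm - tm0) <= tol_t and abs(width - w0) <= tol_width * w0:
                groups[-1].append((mz, tm, var))
                continue
        groups.append([(mz, tm, var)])
    return groups
```

Spots that fall in the same group are candidate fragments of one parent, while two groups with close but distinguishable migration times or widths point to comigrating compounds; in practice the tolerances would have to be set from the measured migration-time precision.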