Optimization of the Extraction Solvent

Supplemental material 1

Optimization of the extraction solvent

The following extraction solvents/solvent mixtures were tested on 10 randomly chosen leaves: methanol, dichloromethane, heptane, 1:1 methanol:dichloromethane, 3:1:1 methanol:chloroform:water, 3:3:2 acetonitrile:2-propanol:water, 5:2:2 methanol:chloroform:water, and 2:2:0.5 methanol:chloroform:water. Based on GC-TOF (method parameters as stated in the manuscript) profiles of the test samples, 3:3:2 acetonitrile:2-propanol:water was used for sample extraction. This solvent mixture extracted a high number of metabolites and provided very reproducible results.

Supplemental material 2

Quality control of the GC-TOF data

Before acquiring the experimental data, six calibration curve samples spanning one order of dynamic range and consisting of 29 pure reference compounds, comprising a variety of different metabolites, e.g., amino acids, sugars and organic acids, were recorded as quality control (QC) samples to ensure instrument performance. Together with the QC samples, one blank sample prepared in the same manner was recorded to control for laboratory contamination. Furthermore, daily QC samples were used. These samples comprised one instrument blank and one method blank in addition to one freshly prepared QC sample (highest calibration level). A set of these QC samples was run for every 10 experimental samples and evaluated daily. The evaluation and basic principles of the QC scheme employed followed the principles outlined by Fiehn et al. (2008).

To evaluate the quality of the GC-TOF data, principal component analysis (PCA) was applied to all samples injected to ensure that the blank and QC samples were well separated from the biological samples (supplemental figure 2.1). Furthermore, the data were investigated for batch effects using a PCA of all the biological samples by coloring the scores plot according to the injection number and/or the batch number of the samples. No groupings of the samples could be observed, indicating that there were no batch effects (supplemental figure 2.2). Additionally, variations in the peak heights of the fatty acid methyl ester (FAME) retention index markers across the entire series of injections were evaluated and found to have an acceptable relative standard deviation (RSD%) of approximately 15%, with the exceptions of the C16 and C10 FAMEs. These had RSD% values of 18.5% and 17.2%, respectively, which were also considered acceptable because these markers were located inside clusters of several larger, closely eluting peaks.
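The RSD% check on the FAME markers can be illustrated with a minimal sketch. It assumes that RSD% is computed as the sample standard deviation divided by the mean, times 100; the function name and the peak heights below are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev


def rsd_percent(peak_heights):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    m = mean(peak_heights)
    if m == 0:
        raise ValueError("mean peak height is zero; RSD% is undefined")
    return 100.0 * stdev(peak_heights) / m


# Hypothetical peak heights of one FAME marker across a series of injections.
example_heights = [1000.0, 1100.0, 900.0, 1050.0, 950.0]
example_rsd = rsd_percent(example_heights)
print(f"RSD% = {example_rsd:.1f}")
```

Applied per marker, a value of this kind would be compared against the acceptance level of approximately 15% mentioned above.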

Supplemental Figure 2.1 - PCA scores plot showing the clear separation of biological samples (leaves from ant-hosting plants (red), leaves from control plants (green)) from method blanks (blue) and quality control samples (pink).

Supplemental Figure 2.2 - PCA scores plot colored according to the batch number of the samples.

Supplemental material 3

Initial analysis of the GC-TOF data

A total of 624 metabolite peaks were found in the C. arabica leaf extracts. Of these, 96 were annotated, whereas the remaining 528 were kept in the dataset as unknowns and identified only by their unique Binbase number. Inspection of the Binbase results for replaced values indicative of missing peaks, together with an overlay of the chromatograms, indicated that the differences between ant-hosting and control plants were primarily quantitative because the same peaks were generally present in all the sample extracts. When the sampling times (1-14) were visualized individually by PCA, clear separations between ant-hosting and control plants were observed at all sampling times. However, when all sampling times were analyzed together, no clear separation of the plants was found. When this PCA scores plot was colored according to the number of ant-manure (AM) spots on the individual leaves (data not shown), it was evident that leaves with a high number of spots were grouped more distantly from the other samples. According to the analysis of pure AM (manuscript in preparation), AM contains a range of small common primary metabolites, such as amino acids, sugars, and organic acids, that are also found in C. arabica leaves. The fact that these compounds are deposited on the leaf surface, and are thus extracted along with the leaf, complicates the distinction between an actual metabolic response and compounds simply present on the leaf surface. Removing all compounds present in both AM and C. arabica leaves from the data analysis was not a viable solution because this approach would have removed a large proportion of potentially important metabolites, e.g., several amino acids. Therefore, the separation of leaves with a high number of AM spots observed in the initial PCA warranted further scrutiny. To identify compounds that were positively correlated with the number of AM spots on the leaf, a PLS-DA model with the number of AM spots as the Y-variable was prepared. Details regarding the validation of this model are described in supplemental material 4.1. In the PLS-DA model scores plot colored according to the number of AM spots (supplemental figure 3.1), it was observed that leaves with more spots on their surface lay at a greater distance from the leaves of control plants and ant-hosting plants with no spots. To identify the metabolites responsible for this separation, the metabolites with the 50 highest loadings on the first latent variable (LV) were selected because the primary separation was along this direction in the scores plot. From the list of these 50 variables, non-annotated metabolites were removed, and eight annotated metabolites remained: β-alanine, glycine, urea, valine, citrulline, isoleucine, 1-kestose, and leucine.
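The selection step described above (rank by LV1 loading, keep the top 50, drop the unknowns) can be sketched as follows. This is an illustration only: it assumes that "highest" refers to the signed loading rather than its absolute value, and the function name, the bin identifiers, and the loadings in the example are hypothetical.

```python
def top_loading_metabolites(loadings, annotations, n_top=50):
    """Return names of annotated metabolites among the n_top highest LV1 loadings.

    loadings:    dict mapping a bin identifier to its LV1 loading
    annotations: dict mapping a bin identifier to a metabolite name
                 (unknowns are simply absent from this dict)
    """
    # Rank all bins by signed loading, highest first (assumption, see lead-in).
    ranked = sorted(loadings, key=loadings.get, reverse=True)[:n_top]
    # Keep only the bins that carry an annotation.
    return [annotations[bin_id] for bin_id in ranked if bin_id in annotations]


# Hypothetical toy input: four bins, three of them annotated.
toy_loadings = {"b1": 0.09, "b2": 0.08, "b3": -0.05, "b4": 0.07}
toy_annotations = {"b1": "valine", "b3": "ribitol", "b4": "glycine"}
print(top_loading_metabolites(toy_loadings, toy_annotations, n_top=3))
```

With the real data, `n_top=50` applied to all 624 peaks would correspond to the selection that left the eight annotated metabolites listed above.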

Supplemental Figure 3.1 - PLS-DA model of leaves from ant-hosting plants with AM spots using the number of AM spots as the Y-variable. The scores plot, latent variable (LV) 1 versus 2, has been colored according to the number of AM spots on the leaf surface.

From the analysis of pure AM (data not shown), it was evident that all of these metabolites were also abundant in pure AM, and when plotting the levels of these compounds in C. arabica leaves against the number of AM spots, clear positive correlations were observed (data not shown). Other compounds were present in similar levels in pure AM but showed very little or no positive correlation between the number of AM spots on the leaf surface and the levels in the C. arabica leaves. At the time of writing, no explanations for these observations existed; thus, all leaves with AM spots were excluded from the remaining data analysis. This step was performed to prevent compounds deposited in the AM spots from confounding the metabolic response of ant-hosting C. arabica plants. Thus, the following univariate and multivariate data analyses were performed on a reduced data set consisting of all leaves from control plants and only the leaves from ant-hosting plants without AM spots (23 samples removed). One exception was the Pearson correlation matrices, in which correlations for leaves from ant-hosting plants with AM spots were also investigated.
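The correlation between metabolite level and the number of AM spots can be illustrated with a plain Pearson coefficient. The sketch below is not the software used in the study; the function name and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: number of AM spots per leaf and the level of one metabolite.
am_spots = [0, 1, 2, 3, 4]
metabolite_level = [1.0, 2.1, 2.9, 4.2, 5.0]
r_example = pearson_r(am_spots, metabolite_level)
print(f"r = {r_example:.3f}")
```

A coefficient close to 1 would correspond to the clear positive correlations described above, whereas values near 0 would correspond to the compounds that showed little or no correlation.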

Supplemental material 4

Validation of the PLS-DA models

PLS-DA models have a tendency to be over-fitted and provide overly optimistic results, so we needed to rigorously validate these models (Westerhuis et al. 2008). The present study included a relatively large number of samples, which reduced the risk of over-fitting that is more likely to occur when modeling few samples and hundreds of variables (Westerhuis et al. 2008). Close attention was still paid to over-fitting by the careful validation and evaluation of the models. Validation was performed by random repeated cross-validation using 1,000 repetitions and either 28 segments for the model using the number of AM spots on the leaf surface as the Y-variable or 20 segments for the model using the treatment of the plants (ants/control) as the Y-variable. With this type of validation, six samples were randomly chosen as test set samples in each repetition. A calibration model was then built using the remaining samples and tested using the test set. This process was repeated 1,000 times, and the resulting model was an average of all of the repetitions. When aiming for a rigorous validation, it is important that the test set samples held out for validation are independent of the samples used for calibration. Samples such as technical or biological replicates cannot be considered independent. When repeating the random selection of test set samples 1,000 times, in some cases, independent samples were selected, and in others, they were not. Thus, the validation was sometimes weak and sometimes rigorous. However, due to the high number of repetitions, we believe that the validation as a whole could be considered rigorous. The evaluation of the validated models was performed by inspecting the root mean square error (RMSE) and standard error of prediction (SEP) plotted against the number of latent variables (LVs) for both the calibrated and the validated models as well as the scores plots, the variance explained in the X matrix and r2 values.
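The validation loop described above can be sketched in a few lines. This is a simplified illustration under several assumptions: a one-variable least-squares line stands in for the PLS-DA model, the test-set predictions of all repetitions are pooled before the error measures are computed (the source instead describes the resulting model as an average over the repetitions), and SEP is taken as the bias-corrected standard deviation of the prediction errors. All function names and data are hypothetical.

```python
import random
from math import sqrt


def rmse_and_sep(y_true, y_pred):
    """RMSE and bias-corrected SEP of a set of predictions."""
    errors = [p - t for p, t in zip(y_pred, y_true)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    sep = sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return rmse, sep


def repeated_random_cv(x, y, fit, predict, n_test=6, n_repeats=1000, seed=0):
    """Hold out n_test random samples per repetition and pool the test predictions."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    y_true, y_pred = [], []
    for _ in range(n_repeats):
        test_idx = set(rng.sample(indices, n_test))
        train_idx = [i for i in indices if i not in test_idx]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in sorted(test_idx):
            y_true.append(y[i])
            y_pred.append(predict(model, x[i]))
    return rmse_and_sep(y_true, y_pred)


def fit_line(xs, ys):
    """Stand-in model (not PLS-DA): ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


# Hypothetical noise-free data, so both error measures should be close to zero.
x_demo = [float(i) for i in range(30)]
y_demo = [2.0 * v + 1.0 for v in x_demo]
rmse_demo, sep_demo = repeated_random_cv(x_demo, y_demo, fit_line, predict_line, n_repeats=50)
print(rmse_demo, sep_demo)
```

Note that `rng.sample` draws test samples without regard to replicate structure, which mirrors the point made above: some repetitions hold out samples that are independent of the calibration set and others do not.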

4.1 Validation of PLS-DA model with number of AM spots as Y-variable

The validated model (random repeated cross-validation using 1,000 repetitions and 28 segments) was evaluated by the inspection of RMSE and SEP with an increasing number of LVs (supplemental figure 4.1). Curves for both the calibrated and the validated model were similar and decreased smoothly with the number of LVs, which indicated a robust model. As expected, the validated RMSE and SEP were slightly higher, but they followed the same trend as the calibration model curves. The r2 values for the calibrated and the validated model were 0.9996 and 0.9683, respectively, which indicated good separation. The explained variance in the X-matrix increased smoothly with the number of LVs used in the model (supplemental figure 4.2), which also indicated a robust model. The optimal number of LVs was found to be 19, which explained 65% of the total variation. Increasing the number of LVs by one only explained approximately 1% additional variation. Fifty LVs explained 86% of the variation.

Supplemental Figure 4.1 - Validation of the PLS-DA model of leaves from ant-hosting plants with AM spots using the number of ant-manure (AM) spots on the leaf surface as Y-variable. Root mean square error of prediction (RMSE) and standard error of prediction (SEP) plotted against the number of latent variables for both the validated model (val) and the calibration model (cal).

Supplemental Figure 4.2 - Validation of the PLS-DA model of leaves from ant-hosting plants with AM spots using the number of ant-manure (AM) spots on the leaf surface as Y-variable. Explained variance in the X-matrix plotted against the number of latent variables.

4.2 Validation of PLS-DA model using the treatment of the plants (with/without ants) as dummy Y-variable

The validated model was evaluated by inspecting plots of RMSE and SEP versus the number of LVs (supplemental figure 4.3). Curves for both the calibrated and the validated model were similar and decreased smoothly with the number of LVs, which indicated a robust model. As expected, the validated RMSE and SEP were slightly higher, but they followed the same trend as the calibration model curves. The r2 values for the calibrated and the validated models were 0.9974 and 0.9058, respectively, which indicated a good separation of the groups. The explained variance in the X-matrix increased smoothly with the number of LVs used in the model (supplemental figure 4.4), which also indicated a robust model. The optimal number of LVs was 13, which explained 59% of the total variation. By including an additional LV, the explained variance increased by less than 2%. Ninety percent of the variation was explained with 50 LVs.

Supplemental Figure 4.3 - Validation of the PLS-DA model of leaves from control plants and ant-hosting plants without AM spots using the treatment of the plants (with/without ants) as dummy Y-variable. Root mean square error of prediction (RMSE) and standard error of prediction (SEP) plotted against the number of latent variables for both the validated model (val) and the calibration model (cal).

Supplemental Figure 4.4 - Validation of the PLS-DA model of leaves from control plants and ant-hosting plants without AM spots using the treatment of the plants (with/without ants) as dummy Y-variable. Explained variance in the X-matrix plotted against the number of latent variables.

Supplemental material 5

Supplemental Table 5.1 - Metabolite identifiers for annotated metabolites with significantly different levels in the leaves of Coffea arabica as a result of the treatment of the plants (with/without ants). Only metabolites that have an LV1 loading among the 100 highest (up-regulated in ant-hosting plants) or the 100 lowest (up-regulated in control plants) in the PLS-DA model and that also have p<0.05 and FC>1.4 (fold-change, calculated as ant-hosting plants versus control plants) are shown.

Metabolite ID: / p: / FC: / LV1 loading: / FAME retention index: / Characteristic ion m/z:
Valine / 0.0191 / 3.1225 / 0.0839 / 314553 / 144
Isoleucine / 0.0323 / 2.8360 / 0.0765 / 360071 / 158
Phenylalanine / 0.0215 / 2.3929 / 0.0567 / 537507;502040 / 218;120
Threonine / 0.0007 / 2.3670 / 0.0956 / 410607;361026 / 218;130
Serine / 0.0074 / 2.2944 / 0.0808 / 396155;339438 / 204;116
Citrulline / 0.0362 / 2.2622 / 0.0700 / 622683 / 157
Glycine / 0.0048 / 2.0909 / 0.0687 / 368800 / 174
Alanine / 0.0252 / 1.9317 / 0.0549 / 245337 / 116
Aspartate / 0.0105 / 1.6963 / 0.0622 / 433407 / 160
Glutamate / 0.0029 / 1.6208 / 0.0721 / 530204 / 246
Oxoproline / 0.0205 / 1.5422 / 0.0564 / 485692 / 156
β-alanine / 0.0151 / 1.4920 / 0.0582 / 435969 / 174
Linoleic acid / 0.0130 / 1.7259 / 0.0825 / 776982 / 337
Oleic acid / 0.0066 / 1.6704 / 0.0739 / 778854 / 117
Palmitic acid / 0.0006 / 1.4233 / 0.0823 / 714075 / 117
Catechin / 0.0316 / 4.6644 / 0.0785 / 987442 / 368
2,3-Dimethylquinoxaline / 0.0076 / 2.7319 / 0.0748 / 828765 / 158
Epicatechin / 0.0455 / 1.6127 / 0.0599 / 981994 / 368
Caffeine / 0.0017 / 1.6018 / 0.0760 / 644775 / 109
1-Kestose / 0.0160 / 3.7472 / 0.0878 / 1123718 / 230
Ribose / 0.0029 / 1.6926 / 0.0853 / 554970 / 103
β-gentiobiose / 0.0070 / 1.6735 / 0.0657 / 969142;975821 / 204;160
Glycerol / 0.0154 / 1.6398 / 0.0748 / 346242 / 117
Ribitol / 0.0022 / 0.5755 / -0.0573 / 577209 / 217
Myo-inositol / 0.0022 / 0.4970 / -0.0540 / 730336 / 305
N-acetyl-D-mannosamine / 0.0305 / 0.6681 / -0.0534 / 735610 / 319
Pyrrole-2-carboxylic acid / 0.0436 / 3.1467 / 0.0711 / 394622 / 240
Cytidine-5'-diphosphate degr. product / 0.0314 / 2.2642 / 0.0599 / 860212 / 217
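The combined selection rule stated in the table caption can be sketched as a simple filter. This is an illustration, not the procedure used in the study: the function name and record layout are hypothetical, and it assumes that the fold-change criterion is applied symmetrically (FC above 1.4 or below 1/1.4), which the caption does not state explicitly but which is consistent with the entries that have FC below 1.

```python
def select_significant(records, p_max=0.05, fc_min=1.4, n_extreme=100):
    """Return ids of records that meet the p, fold-change, and LV1 loading criteria.

    records: list of dicts with keys "id", "p", "fc" (must be > 0), and "lv1".
    """
    if n_extreme < 1:
        raise ValueError("n_extreme must be at least 1")
    by_loading = sorted(records, key=lambda r: r["lv1"])
    lowest = {r["id"] for r in by_loading[:n_extreme]}
    highest = {r["id"] for r in by_loading[-n_extreme:]}
    extreme = lowest | highest
    selected = []
    for r in records:
        # Symmetric fold-change (assumption): 0.5 counts the same as 2.0.
        fold = max(r["fc"], 1.0 / r["fc"])
        if r["p"] < p_max and fold > fc_min and r["id"] in extreme:
            selected.append(r["id"])
    return selected


# Four rows copied from supplemental tables 5.1 and 5.3. n_extreme=1 is a toy
# setting for this tiny example; the study ranked all peaks and used 100.
toy_records = [
    {"id": "Valine", "p": 0.0191, "fc": 3.1225, "lv1": 0.0839},
    {"id": "Ribitol", "p": 0.0022, "fc": 0.5755, "lv1": -0.0573},
    {"id": "Urea", "p": 0.7660, "fc": 0.7129, "lv1": -0.0073},
    {"id": "Benzoic acid", "p": 0.0052, "fc": 1.2296, "lv1": 0.0652},
]
print(select_significant(toy_records, n_extreme=1))
```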

Supplemental Table 5.2 - Non-annotated (unknown) metabolites with significantly different levels in the leaves of Coffea arabica as a result of the treatment of the plants (with/without ants). Only metabolites that have an LV1 loading among the 100 highest (up-regulated in ant-hosting plants) or the 100 lowest (up-regulated in control plants) in the PLS-DA model and that also have p<0.05 and FC>1.4 (fold-change, calculated as ant-hosting plants vs. control plants) are shown.

Binbase ID: / p: / FC: / LV1 loading: / FAME retention index: / Characteristic ion m/z:
642982 / 0.0083 / 10.2600 / 0.0895 / 934466 / 217
706781 / 0.0066 / 9.1527 / 0.0918 / 935035 / 217
702622 / 0.0090 / 8.3791 / 0.0884 / 929537 / 217
648015 / 0.0468 / 5.4337 / 0.0565 / 533744 / 188
644764 / 0.0385 / 3.9787 / 0.0684 / 1186583 / 169
650050 / 0.0329 / 3.6255 / 0.0696 / 1255158 / 361
650104 / 0.0033 / 3.3083 / 0.0935 / 812874 / 144
702626 / 0.0289 / 3.0860 / 0.0734 / 940158 / 361
652424 / 0.0024 / 3.0526 / 0.0911 / 757794 / 174
643592 / 0.0349 / 2.8180 / 0.0702 / 870730 / 172
642725 / 0.0003 / 2.7820 / 0.1075 / 806751 / 204
651429 / 0.0204 / 2.6935 / 0.0758 / 1216662 / 368
642800 / 0.0131 / 2.6465 / 0.0946 / 329656 / 211
643579 / 0.0401 / 2.5565 / 0.0646 / 897868 / 186
642929 / 0.0068 / 2.5078 / 0.0838 / 873587 / 156
644737 / 0.0044 / 2.4557 / 0.0906 / 1062648 / 204
644384 / 0.0011 / 2.3289 / 0.0973 / 1043439 / 307
649763 / 0.0087 / 2.3215 / 0.0829 / 865183 / 446
642854 / 0.0320 / 2.2756 / 0.0654 / 522559 / 159
643610 / 0.0156 / 2.1734 / 0.0788 / 886168 / 103
643556 / 0.0179 / 2.1726 / 0.0752 / 821701 / 144
643478 / 0.0144 / 2.0955 / 0.0729 / 716136 / 189
676244 / 0.0296 / 1.9952 / 0.0661 / 522900 / 159
644770 / 0.0185 / 1.9946 / 0.0691 / 606037 / 199
642993 / 0.0377 / 1.9607 / 0.0693 / 544001 / 204
643037 / 0.0087 / 1.9578 / 0.0984 / 243095 / 154
647770 / 0.0183 / 1.9194 / 0.0686 / 898931 / 217
654330 / 0.0141 / 1.8696 / 0.0811 / 361035 / 86
642853 / 0.0298 / 1.8067 / 0.0610 / 590685 / 231
643133 / 0.0068 / 1.7480 / 0.0732 / 1048666 / 204
647676 / 0.0131 / 1.7413 / 0.0650 / 257225 / 102
652613 / 0.0009 / 1.7345 / 0.0807 / 445402 / 128
648982 / 0.0289 / 1.7087 / 0.0711 / 345567 / 174
680901 / 0.0055 / 1.6679 / 0.0754 / 1049167 / 204
650963 / 0.0193 / 1.6513 / 0.0508 / 959971 / 169
642756 / 0.0004 / 1.6369 / 0.0779 / 1071213 / 307
643055 / 0.0349 / 1.5899 / 0.0671 / 318475 / 117
642772 / 0.0236 / 1.5699 / 0.0669 / 520411 / 210
643207 / 0.0199 / 1.5658 / 0.0503 / 722243 / 255
651461 / 0.0137 / 1.5479 / 0.0650 / 956180 / 319
642744 / 0.0285 / 1.5164 / 0.0647 / 481441 / 210
642946 / 0.0017 / 1.4985 / 0.0634 / 287799 / 126
646257 / 0.0223 / 1.4969 / 0.0673 / 464235 / 210
642836 / 0.0022 / 1.4766 / 0.0836 / 592619 / 299
643244 / 0.0157 / 1.4190 / 0.0706 / 302503 / 188
645601 / 0.0328 / 1.4165 / 0.0624 / 519392 / 103
644436 / 0.0219 / 1.4103 / 0.0511 / 523603 / 217
650426 / 0.0189 / 1.3126 / 0.0584 / 845126 / 225
642862 / 0.0192 / 0.6990 / -0.0551 / 867290 / 204
642992 / 0.0033 / 0.6702 / -0.0698 / 452959 / 172
644449 / 0.0174 / 0.6574 / -0.0502 / 844679 / 204
645035 / 0.0097 / 0.6552 / -0.0601 / 842293 / 204
644113 / 0.0227 / 0.6168 / -0.0463 / 725049 / 289
643074 / 0.0018 / 0.5910 / -0.0743 / 458177 / 172
645383 / 0.0000 / 0.5764 / -0.0644 / 1099460 / 309
644051 / 0.0078 / 0.5655 / -0.0630 / 1010048 / 297
643099 / 0.0031 / 0.5652 / -0.0525 / 512420 / 117
644383 / 0.0006 / 0.5597 / -0.0585 / 746733 / 319
642990 / 0.0076 / 0.5549 / -0.0467 / 1020541 / 249
644013 / 0.0063 / 0.5468 / -0.0463 / 704139 / 241
644109 / 0.0144 / 0.5463 / -0.0563 / 878402 / 204
642866 / 0.0435 / 0.5445 / -0.0479 / 922064 / 204
642877 / 0.0249 / 0.5386 / -0.0498 / 585149 / 217
646027 / 0.0001 / 0.5341 / -0.0646 / 667067 / 130
643630 / 0.0017 / 0.5317 / -0.0578 / 962017 / 183
642963 / 0.0282 / 0.5025 / -0.0495 / 549286 / 245
642828 / 0.0001 / 0.4928 / -0.0712 / 744276 / 319
644475 / 0.0136 / 0.4239 / -0.0445 / 974084 / 191
642814 / 0.0019 / 0.4193 / -0.0503 / 669235 / 289
644145 / 0.0234 / 0.3439 / -0.0535 / 1025629 / 223
644081 / 0.0041 / 0.2176 / -0.0476 / 1001042 / 105

Supplemental Table 5.3 - Annotated metabolites that were found significant in either the univariate or the multivariate data analyses, but not both, and hence were excluded from the final list of metabolites with significantly different levels in the leaves of Coffea arabica as a result of the treatment of the plants (with/without ants). Exclusion was based on metabolites meeting one or two of the following criteria: p>0.05, FC<1.4 or an LV1 loading not among the 100 highest (up-regulated in ant-hosting plants) or the 100 lowest (up-regulated in control plants) in the PLS-DA model (indicated by *). FC (fold-change) was calculated as ant-hosting plants vs. control plants.

Metabolite ID: / p: / FC: / LV1 loading: / FAME retention index: / Characteristic ion m/z:
3,4-dihydroxybenzoic acid / 0.0175 / 0.8197 / -0.0421 / 621690 / 193
4-hydroxybenzoic acid / 0.1517 / 1.4066 / 0.0372* / 537976 / 223
Alpha-tocopherol / 0.0762 / 0.5619 / -0.0420 / 1064327 / 237
Asparagine / 0.0930 / 6.6308 / 0.0690 / 475819;554034 / 100;116
Benzoic acid / 0.0052 / 1.2296 / 0.0652 / 337942 / 179
Citramalic acid / 0.0242 / 1.2869 / 0.0518* / 457703 / 247
Cyanoalanine / 0.0613 / 4.1556 / 0.0681 / 404175 / 141
Fructose / 0.0554 / 0.7401 / -0.0474 / 640502;644307 / 103;103
Galactinol / 0.4161 / 0.8769 / -0.0216 / 1018120;1020192 / 204;204
Glutamine / 0.1441 / 2.5316 / 0.0764 / 600653 / 156
Guanosine / 0.0518 / 3.6561 / 0.0715 / 956964 / 324
Leucine / 0.0760 / 6.0842 / 0.0622 / 347153 / 158
Linolenic acid / 0.0722 / 1.7817 / 0.0605 / 779400 / 108
Lyxosylamine / 0.1719 / 0.7616 / -0.0275 / 541792 / 103
Nicotinic acid / 0.0160 / 1.2672 / 0.0602 / 365117 / 180
Phosphate / 0.2978 / 1.4078 / 0.0409* / 345765 / 299
Pipecolinic acid / 0.1273 / 2.8485 / 0.0495* / 403451 / 156
Proline / 0.0586 / 3.8504 / 0.0717 / 364708 / 142
Putrescine / 0.0918 / 2.6831 / 0.0599 / 588551 / 174
Salicylaldehyde / 0.0899 / 1.9899 / 0.0388* / 405428 / 119
Stearic acid / 0.0045 / 1.1887 / 0.0624 / 787569 / 117
Sucrose / 0.2811 / 0.8051 / -0.0282 / 915714 / 361
Tyramine / 0.0598 / 1.7337 / 0.0497* / 664017 / 174
Urea / 0.7660 / 0.7129 / -0.0073* / 327621 / 171

Supplemental material 6

Supplemental Figure 6.1 - Pearson correlation matrix heat map of annotated compounds in leaves from ant-hosting plants with AM spots