Model / R2X(cum) / R2Y(cum) / Q2(cum) / CV-ANOVA (p-value) / Correct prediction, internal validation (%)
1 / 0.25 / 0.942 / 0.696 / 1.12×10⁻¹³ / 100
2 / 0.19 / 0.927 / 0.672 / 1.27×10⁻¹⁴ / 99.3
3 / 0.194 / 0.935 / 0.638 / 2.04×10⁻¹² / 100
4 / 0.27 / 0.898 / 0.657 / 4.83×10⁻⁸ / 98.61
5 / 0.218 / 0.93 / 0.656 / 1.19×10⁻¹¹ / 98.91
6 / 0.243 / 0.912 / 0.646 / 7.07×10⁻⁹ / 98.61
7 / 0.194 / 0.945 / 0.677 / 6.82×10⁻¹⁶ / 100
8 / 0.229 / 0.918 / 0.611 / 2.41×10⁻⁷ / 99.31
9 / 0.259 / 0.893 / 0.659 / 2.04×10⁻⁷ / 99.31
10 / 0.184 / 0.93 / 0.646 / 1.11×10⁻¹¹ / 99.31
Total (median) / 0.22 / 0.93 / 0.66 / 1.15×10⁻¹¹ / 99.31

Table e-1: Summary of OPLS-DA models (MND vs control patients) from 10 independent training sets. For each model, the table gives the cumulative fraction of variation modeled in the X (R2X(cum)) and Y (R2Y(cum)) matrices of the spectral datasets, the predictability of the model (Q2(cum)), the CV-ANOVA p-value, and the percentage of correct predictions in the training set (i.e. internal validation). Each model was built from 80% of the entire cohort (n=145 of 181): 76 MND patients and 69 control patients, randomly and independently selected for each training set.
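
The repeated random selection described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the sample identifiers (`MND_000`, `CTRL_000`, ...), the function name `draw_training_sets` and the random seed are hypothetical, and the software actually used to draw the training sets is not specified here. Only the group sizes (95 and 86) and the training-set sizes (76 and 69) come from the supplementary material.

```python
import random

# Cohort sizes (figure captions) and training-set sizes (Table e-1 caption).
N_MND, N_CONTROL = 95, 86
N_MND_TRAIN, N_CONTROL_TRAIN = 76, 69


def draw_training_sets(n_sets=10, seed=0):
    """Draw n_sets independent training sets, sampling each group separately.

    The identifiers and the seed are hypothetical placeholders.
    """
    rng = random.Random(seed)
    mnd_ids = [f"MND_{i:03d}" for i in range(N_MND)]
    control_ids = [f"CTRL_{i:03d}" for i in range(N_CONTROL)]
    training_sets = []
    for _ in range(n_sets):
        # Sampling without replacement inside each group keeps the
        # MND/control proportions of the full cohort.
        train = (rng.sample(mnd_ids, N_MND_TRAIN)
                 + rng.sample(control_ids, N_CONTROL_TRAIN))
        training_sets.append(sorted(train))
    return training_sets


training_sets = draw_training_sets()
# Fraction of the cohort used for training: 145 / 181.
fraction = len(training_sets[0]) / (N_MND + N_CONTROL)
```

Each draw is independent of the others, so a given patient may appear in several training sets.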

Figure e-1: Principal component analysis score plot for A: MND patients (n=95; turquoise dots: ALS, dark blue dots: other motor neuron disorders) and B: control patients (n=86; from light to dark green, respectively: MS: multiple sclerosis; APN: axonal peripheral neuropathy; CIDP: chronic inflammatory demyelinating polyradiculoneuropathy; other: other neurological diseases).

Figure e-2: Scatter plot of OPLS-DA scores from the NMR examination of CSF samples of the entire cohort (internal validation). MND patients (n=95): blue dots (turquoise dots: ALS, dark blue dots: other motor neuron disorders). Control patients (n=86): green dots (from light to dark green, respectively: MS: multiple sclerosis; APN: axonal peripheral neuropathy; CIDP: chronic inflammatory demyelinating polyradiculoneuropathy; other: other neurological diseases).

e1-Methods

Using Carr-Purcell-Meiboom-Gill (CPMG) sequences, spin-echo spectra were acquired with 32K data points and a spectral width of 7500 Hz. Prior to Fourier transformation (FT), the free induction decays (FIDs) were zero-filled to 64K data points, which provided sufficient data points for each resonance. Spectra were processed using WinNMR version 3.5 software (Bruker Daltonik, Karlsruhe, Germany). All spectra were corrected for phase distortion, and the baseline of each spectrum was corrected manually.
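
As an illustration of the zero-filling step, the sketch below pads a synthetic FID to twice its length before a radix-2 Fourier transform. It is not the WinNMR implementation: the function names, the synthetic single-resonance FID, its decay constant and the small array sizes (32 points padded to 64, standing in for 32K padded to 64K) are all assumptions chosen to keep the example fast and self-contained.

```python
import cmath
import math


def zero_fill(fid, target_len):
    """Append complex zeros to the FID until it has target_len points."""
    if target_len < len(fid):
        raise ValueError("target_len must be at least len(fid)")
    return list(fid) + [0j] * (target_len - len(fid))


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


SPECTRAL_WIDTH_HZ = 7500.0   # from the acquisition parameters
N_ACQUIRED = 32              # stand-in for 32K acquired points
N_ZERO_FILLED = 64           # stand-in for 64K points after zero-filling
OFFSET_HZ = 1875.0           # hypothetical resonance offset
DECAY_POINTS = 20.0          # hypothetical decay constant, in sampling points

# Synthetic FID: one exponentially decaying complex oscillation,
# sampled with a dwell time of 1 / spectral width.
fid = [
    cmath.exp(2j * math.pi * OFFSET_HZ * t / SPECTRAL_WIDTH_HZ)
    * math.exp(-t / DECAY_POINTS)
    for t in range(N_ACQUIRED)
]

spectrum = fft(zero_fill(fid, N_ZERO_FILLED))
magnitudes = [abs(v) for v in spectrum]
peak_bin = magnitudes.index(max(magnitudes))
# Digital resolution after zero-filling, in Hz per point.
hz_per_point = SPECTRAL_WIDTH_HZ / N_ZERO_FILLED
```

Zero-filling adds no new information to the signal; it interpolates the spectrum so that each resonance is described by more points.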

The 1H-NMR spectra were automatically reduced to ASCII files using the AMIX software package (Analysis of MIXture, version 3.1.5, Bruker Biospin, Karlsruhe, Germany). The region containing the water signal (4.70-5.51 ppm) was removed from each spectrum to eliminate baseline effects of imperfect water saturation. Spectral intensities were scaled to the total intensity and reduced to equidistant integrated regions (buckets) of 0.005 ppm over the chemical shift range of 0.7-9.5 ppm. Before multivariate analysis, the NMR spectral datasets were preprocessed with the peak alignment algorithm icoshift (http://www.models.life.ku.dk) to minimize spectral peak shifts caused by residual pH differences between samples. The corresponding realigned bucket tables were then exported.
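
The bucketing and normalization steps can be sketched as below. This is a simplified stand-in for the AMIX procedure, not a reproduction of it: the function name, the input layout (parallel lists of chemical shifts and intensities) and the choice to leave excluded water buckets at zero are assumptions, and the icoshift alignment is not included. The sketch integrates first and then divides by the total, which is equivalent to scaling each point by that same total before integration.

```python
def bucket_spectrum(ppm, intensity, lo=0.7, hi=9.5, width=0.005,
                    water=(4.70, 5.51)):
    """Integrate a spectrum into equidistant buckets and scale to total intensity.

    ppm and intensity are parallel lists of chemical shifts and signal values.
    Points inside the water region are dropped, so those buckets stay at zero.
    """
    n_buckets = round((hi - lo) / width)   # 1760 buckets for 0.7-9.5 ppm
    buckets = [0.0] * n_buckets
    for x, y in zip(ppm, intensity):
        if not (lo <= x < hi):
            continue                       # outside the analysed range
        if water[0] <= x <= water[1]:
            continue                       # residual water signal
        # The small offset guards against floating-point rounding at bucket edges.
        idx = min(int((x - lo) / width + 1e-9), n_buckets - 1)
        buckets[idx] += y
    total = sum(buckets)
    if total == 0:
        raise ValueError("spectrum has no intensity in the retained range")
    return [b / total for b in buckets]


# Tiny synthetic spectrum: two points in one bucket, one water point, one isolated point.
example_ppm = [1.001, 1.003, 5.0, 2.002]
example_intensity = [1.0, 1.0, 100.0, 2.0]
example_buckets = bucket_spectrum(example_ppm, example_intensity)
```

In this example the large value at 5.0 ppm falls in the water region and therefore does not contribute to the total used for scaling.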

e2-Methods

Principal component analysis (PCA) was performed first, as an unsupervised method, to identify similarities or differences between sample profiles. Grouping, trends and outliers were examined on the scatter plots generated by the program. Orthogonal partial least-squares discriminant analysis (OPLS-DA) was then used to evaluate variations in buckets between groups. To guard against overfitting, the OPLS-DA was cross-validated by withholding one-seventh of the samples in each of seven successive rounds, so that each sample was omitted exactly once. With this approach, the OPLS-DA models comprised one “predictive” component and two or more orthogonal components. Q2 and R2 were used to assess the robustness of the model. R2 is the fraction of the variance explained by a component. Q2, the cross-validated counterpart of R2, represents the proportion of total variation predicted by a component. The set of models resulting from the cross-validation was used to calculate jack-knife uncertainty measures.
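
The seven-round cross-validation and the resulting Q2 can be sketched as follows. Several parts are assumptions made for illustration: Q2 is computed with one common definition, 1 - PRESS/SS (PRESS: sum of squared prediction errors on withheld samples; SS: total sum of squares of the mean-centred response); samples are assigned to rounds by index; and a univariate least-squares line stands in for the OPLS-DA model, which the Python standard library does not provide. The helper names are hypothetical.

```python
def cv_groups(n_samples, n_groups=7):
    """Split sample indices into n_groups so each sample is withheld exactly once."""
    return [[i for i in range(n_samples) if i % n_groups == g]
            for g in range(n_groups)]


def q2(x, y, fit, predict, n_groups=7):
    """Cross-validated Q2 = 1 - PRESS / SS for any fit/predict pair."""
    y_mean = sum(y) / len(y)
    ss_total = sum((v - y_mean) ** 2 for v in y)
    press = 0.0
    for held_out in cv_groups(len(y), n_groups):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        # Accumulate squared errors on the withheld samples only.
        press += sum((y[i] - predict(model, x[i])) ** 2 for i in held_out)
    return 1.0 - press / ss_total


def fit_line(xs, ys):
    """Stand-in model (not OPLS-DA): ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic check: a perfectly linear response is predicted exactly.
demo_x = [float(v) for v in range(14)]
demo_y = [2.0 * v + 1.0 for v in demo_x]
demo_q2 = q2(demo_x, demo_y, fit_line, predict_line)
```

Because the withheld samples never contribute to the model that predicts them, Q2 cannot exceed 1 and drops when the model fits noise rather than structure.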