Bi-Partite Clustering Based on Individual Spelt 1 and Spelt 52 Tandem Repeats Features

Supplementary material

When population polymorphism for Spelt 1 tandem repeats is considered separately, bi-partite cluster analysis groups the central and intermediate populations against the marginal populations (Table 2; supplementary Fig. 7). For example, only 1 to 3 distal Spelt 1 blocks per diploid genome were detected in the marginal Technion-2 population, whereas the central populations had 26-28 blocks (Table 2). Moreover, in central and intermediate populations (Fig. 3a-c), the distal Spelt 1 blocks were distinctly larger, and their fluorescence accordingly more intense, than in almost all marginal populations (Fig. 3d; the exception, Tartus, is not shown). The impact of Spelt 52 tandem repeats on population subdivision differed from that of Spelt 1. Unsupervised bi-partite clustering based on cytogenetic data for Spelt 52 block abundance separated the marginal and intermediate populations from the central populations (supplementary Fig. 8): the number of blocks ranged from 22-33 in central populations to 6-18 in intermediate and marginal populations (Table 2).

Bi-partite clustering based on individual Spelt 1 and Spelt 52 tandem repeats features

The unsupervised PAM bi-partite clustering of genotypes based on (i) only the 28 Spelt 1 tandem repeat features (numbers of Spelt 1 blocks across the 28 chromosome arms of the diploid genome, 2n=14) and (ii) only the 28 Spelt 52 tandem repeat features demonstrated the following.

  1. The Spelt 1 block-based clustering (supplementary Fig. 7) separated central and intermediate populations from marginal populations. This clustering corresponded to the first discriminant function of the tri-partite clustering based on all Spelt 1 and Spelt 52 tandem repeat features.
  2. The Spelt 52 block-based clustering (supplementary Fig. 8) separated marginal and intermediate populations from central populations. This clustering corresponded to the second discriminant function of the tri-partite clustering based on all Spelt 1 and Spelt 52 tandem repeat features.
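The bi-partite PAM (partitioning around medoids) step described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Manhattan distance, the simple swap-only search, the function names, and the toy count vectors (four features instead of 28, invented values that are not the study's data). The software and distance measure actually used for the analysis are not stated in this supplement.

```python
def manhattan(u, v):
    """Manhattan distance between two block-count vectors (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def total_cost(points, medoids):
    """Sum of distances from every genotype to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)


def pam(points, k=2):
    """Simplified PAM: start from the first k points, then swap until no gain."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing medoid i with a non-medoid candidate.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each genotype to the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: manhattan(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical toy data: rows are genotypes, columns are block counts per arm.
toy_counts = [
    [2, 1, 2, 1], [2, 2, 1, 1], [1, 2, 2, 2],  # block-rich genotypes
    [0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0],  # block-poor genotypes
]

medoids, labels = pam(toy_counts, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

With k=2 the procedure yields exactly two groups, which is what makes the clustering "bi-partite"; running it once on the Spelt 1 features and once on the Spelt 52 features would give the two separate partitions compared in the list above.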

Fig. 7 a The bi-partite clustering of genotypes based on 28 Spelt 1 tandem repeat features (counts of blocks in the 28 chromosome arms of the diploid genome). 1 - central and intermediate populations; 2 - marginal populations. This clustering unites clusters I (intermediate populations) and C (central populations) of Fig. 2a and separates them from cluster M (marginal populations). Such a clustering corresponds to the first discriminant function of the LDA (supplementary Table 4). b The clustering silhouette widths for individual genotypes (dark blue bars) and predicted classes (yellow bars) for the bi-partite clustering according to the Spelt 1 tandem repeat features.

Fig. 8 a The bi-partite clustering of genotypes based on 28 Spelt 52 tandem repeat features. 1 - marginal and intermediate populations; 2 - central populations. This clustering unites clusters M and I of Fig. 2a and separates them from cluster C. Such a clustering corresponds to the second discriminant function of the LDA. b The clustering silhouette widths for individual genotypes (dark blue bars) and predicted classes (yellow bars) for the bi-partite clustering according to the Spelt 52 tandem repeat features.
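The silhouette widths shown in panels b of Figs. 7 and 8 follow the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from genotype i to the other members of its own cluster and b is the lowest mean distance to the members of any other cluster. The sketch below computes per-genotype widths and per-cluster means. The Manhattan distance, the function names, and the toy counts and labels are assumptions for illustration only and are not taken from the study.

```python
def manhattan(u, v):
    """Manhattan distance between two block-count vectors (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def silhouette_widths(points, labels):
    """Silhouette width of every point for a given cluster labelling."""
    widths = []
    for i, p in enumerate(points):
        own = [
            manhattan(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            widths.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(own) / len(own)
        # Mean distance to each other cluster; keep the nearest one.
        b = min(
            sum(manhattan(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


def cluster_means(widths, labels):
    """Average silhouette width per cluster."""
    return {
        c: sum(w for w, l in zip(widths, labels) if l == c) / labels.count(c)
        for c in sorted(set(labels))
    }


# Hypothetical toy data and labels (1 = block-rich cluster, 2 = block-poor).
toy_counts = [
    [2, 1, 2, 1], [2, 2, 1, 1], [1, 2, 2, 2],
    [0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0],
]
toy_labels = [1, 1, 1, 2, 2, 2]

widths = silhouette_widths(toy_counts, toy_labels)
means = cluster_means(widths, toy_labels)
print(widths)
print(means)
```

Widths close to 1 indicate a genotype that sits well inside its cluster, values near 0 indicate a genotype on the boundary between the two clusters, and negative values suggest a possible misassignment.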

Table 4 Selected “predictive” features and their loadings on the two discriminant functions of the initial LDA based on Spelt 1 and Spelt 52 tandem repeat features for all 56 genotypes.

tandem repeat / chromosome arm / LD1 / LD2
Spelt 1 / 1Sa / 63.89 / 37.15
Spelt 1 / 1La / -32.99 / 45.09
Spelt 1 / 1Sb / 30.16 / -6.78
Spelt 1 / 2La / 42.19 / -2.76
Spelt 1 / 3Sa / -83.00 / -18.79
Spelt 1 / 3Lb / 44.82 / -20.92
Spelt 1 / 4Lb / -83.00 / -18.79
Spelt 1 / 5Sa / 18.96 / -25.94
Spelt 1 / 5Sb / -34.63 / 11.27
Spelt 1 / 5Lb / 20.72 / 47.48
Spelt 1 / 6La / 4.69 / 35.71
Spelt 1 / 6Sb / -27.78 / -9.40
Spelt 1 / 7Sb / -34.63 / 11.27
Spelt 1 / 7Lb / 46.78 / 2.72
Spelt 52 / 2Sb / 6.75 / 21.82

Lowercase letters “a” and “b” denote the two homologous chromosomes of a pair, and L and S denote the long and short arms. Because the statistical analysis took into account the distributions of Spelt 1 and Spelt 52 blocks in each of the 28 arms, the homologous chromosomes in each individual genotype were identified nominally as 1a and 1b, 2a and 2b, etc.
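The arm labels used in Table 4 can be decoded mechanically, which also shows where the figure of 28 features per repeat family comes from: 7 chromosomes x 2 arms x 2 nominal homologs. The following is a minimal sketch; the regular expression, the function name parse_arm, and the variable names are hypothetical helpers introduced here, not part of the original analysis.

```python
import re

# Label layout assumed from Table 4: chromosome number, arm (S or L), homolog (a or b).
ARM_LABEL = re.compile(r"^([1-7])([SL])([ab])$")


def parse_arm(label):
    """Split a label such as '3Lb' into (chromosome, arm, homolog)."""
    match = ARM_LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised arm label: {label!r}")
    return int(match.group(1)), match.group(2), match.group(3)


# All arm features for one repeat family in a diploid genome with 2n=14.
all_arms = [
    f"{chrom}{arm}{homolog}"
    for chrom in range(1, 8)
    for arm in "SL"
    for homolog in "ab"
]

# Arm labels that appear in Table 4.
table4_arms = [
    "1Sa", "1La", "1Sb", "2La", "3Sa", "3Lb", "4Lb", "5Sa",
    "5Sb", "5Lb", "6La", "6Sb", "7Sb", "7Lb", "2Sb",
]

print(len(all_arms))
print(parse_arm("3Lb"))
```

Because the "a" and "b" assignment is nominal, a label such as 1Sa identifies a feature column consistently within the data set but does not correspond to a particular parental origin.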