Table 1. Polychlorinated biphenyls (PCBs): identification number, chlorine-filled positions, and measured octanol-water partition coefficient [41] expressed on a logarithmic scale (logKow)
PCB id. / Chlorine-filled / logKow / PCB id. / Chlorine-filled / logKow / PCB id. / Chlorine-filled / logKow
1 / 2 / 4.601 / 70 / 2,3',5,5' / 6.267 / 139 / 2,2',3,4,5,6 / 6.517
2 / 3 / 4.421 / 71 / 2,3',5',6 / 6.047 / 140 / 2,2',3,4,5,6' / 6.607
3 / 4 / 4.401 / 72 / 2,4,4',5 / 6.671 / 141 / 2,2',3,4,5',6 / 6.677
4 / 2,2' / 5.023 / 73 / 2,4,4',6 / 6.057 / 142 / 2,2',3,4,6,6' / 6.257
5 / 2,3' / 5.021 / 74 / 2',3,4,5 / 6.137 / 143 / 2,2',3,4',5,5' / 6.897
6 / 2,4 / 5.150 / 75 / 3,3',4,4' / 6.523 / 144 / 2,2',3,4',5,6 / 6.647
7 / 2,4' / 5.301 / 76 / 3,3',4,5 / 6.357 / 145 / 2,2',3,4',5,6' / 6.737
8 / 2,5 / 5.180 / 77 / 3,3',4,5' / 6.427 / 146 / 2,2',3,4',5',6 / 7.281
9 / 2,6 / 5.311 / 78 / 3,3',5,5' / 6.583 / 147 / 2,2',3,4',6,6' / 6.327
10 / 3,3' / 5.343 / 79 / 3,4,4',5 / 6.367 / 148 / 2,2',3,5,5',6 / 6.647
11 / 3,4 / 5.295 / 80 / 2,2',3,3',4 / 6.142 / 149 / 2,2',3,5,6,6' / 6.227
12 / 3,5 / 5.404 / 81 / 2,2',3,3',5 / 6.267 / 150 / 2,2',4,4',5,5' / 7.751
13 / 4,4' / 5.335 / 82 / 2,2',3,3',6 / 6.041 / 151 / 2,2',4,4',5,6' / 6.767
14 / 2,2',3 / 5.311 / 83 / 2,2',3,4,4' / 6.611 / 152 / 2,2',4,4',6,6' / 7.123
15 / 2,2',4 / 5.761 / 84 / 2,2',3,4,5 / 6.204 / 153 / 2,3,3',4,4',5 / 7.187
16 / 2,2',5 / 5.551 / 85 / 2,2',3,4,5' / 6.371 / 154 / 2,3,3',4,4',5' / 7.187
17 / 2,2',6 / 5.481 / 86 / 2,2',3,4,6 / 7.516 / 155 / 2,3,3',4,4',6 / 7.027
18 / 2,3,3' / 5.577 / 87 / 2,2',3,4,6' / 6.077 / 156 / 2,3,3',4,5,5' / 7.247
19 / 2,3,4 / 5.517 / 88 / 2,2',3,4',5 / 6.367 / 157 / 2,3,3',4,5,6 / 6.937
20 / 2,3,4' / 5.421 / 89 / 2,2',3,4',6 / 6.137 / 158 / 2,3,3',4,5',6 / 7.087
21 / 2,3,5 / 5.577 / 90 / 2,2',3,5,5' / 6.357 / 159 / 2,3,3',4',5,5' / 7.247
22 / 2,3,6 / 5.671 / 91 / 2,2',3,5,6 / 6.047 / 160 / 2,3,3',4',5,6 / 6.997
23 / 2,3',4 / 5.677 / 92 / 2,2',3,5,6' / 6.137 / 161 / 2,3,3',4',5',6 / 7.027
24 / 2,3',5 / 5.667 / 93 / 2,2',3,5',6 / 6.137 / 162 / 2,3,3',5,5',6 / 7.057
25 / 2,3',6 / 5.447 / 94 / 2,2',3,6,6' / 5.717 / 163 / 2,3,4,4',5,6 / 6.937
26 / 2,4,4' / 5.691 / 95 / 2,2',3',4,5 / 6.671 / 164 / 2,3',4,4',5,5' / 7.277
27 / 2,4,5 / 5.743 / 96 / 2,2',3',4,6 / 6.137 / 165 / 2,3',4,4',5',6 / 7.117
28 / 2,4,6 / 5.504 / 97 / 2,2',4,4',5 / 7.211 / 166 / 3,3',4,4',5,5' / 7.427
29 / 2,4',5 / 5.677 / 98 / 2,2',4,4',6 / 6.237 / 167 / 2,2',3,3',4,4',5 / 7.277
30 / 2,4',6 / 5.751 / 99 / 2,2',4,5,5' / 7.071 / 168 / 2,2',3,3',4,4',6 / 6.704
31 / 2',3,4 / 5.572 / 100 / 2,2',4,5,6' / 6.167 / 169 / 2,2',3,3',4,5,5' / 7.337
32 / 2',3,5 / 5.667 / 101 / 2,2',4,5',6 / 6.227 / 170 / 2,2',3,3',4,5,6 / 7.027
33 / 3,3',4 / 5.827 / 102 / 2,2',4,6,6' / 5.817 / 171 / 2,2',3,3',4,5,6' / 7.117
34 / 3,3',5 / 4.151 / 103 / 2,3,3',4,4' / 6.657 / 172 / 2,2',3,3',4,5',6 / 7.177
35 / 3,4,4' / 4.941 / 104 / 2,3,3',4,5 / 6.647 / 173 / 2,2',3,3',4,6,6' / 6.767
36 / 3,4,5 / 5.767 / 105 / 2,3,3',4',5 / 6.717 / 174 / 2,2',3,3',4',5,6 / 7.087
37 / 3,4',5 / 5.897 / 106 / 2,3,3',4,5' / 6.717 / 175 / 2,2',3,3',5,5',6 / 7.147
38 / 2,2',3,3' / 5.561 / 107 / 2,3,3',4,6 / 6.487 / 176 / 2,2',3,3',5,6,6' / 6.737
39 / 2,2',3,4 / 6.111 / 108 / 2,3,3',4',6 / 6.532 / 177 / 2,2',3,4,4',5,5' / 7.367
40 / 2,2',3,4' / 5.767 / 109 / 2,3,3',5,5' / 6.767 / 178 / 2,2',3,4,4',5,6 / 7.117
41 / 2,2',3,5 / 5.757 / 110 / 2,3,3',5,6 / 6.457 / 179 / 2,2',3,4,4',5,6' / 7.207
42 / 2,2',3,5' / 5.811 / 111 / 2,3,3',5',6 / 6.547 / 180 / 2,2',3,4,4',5',6 / 7.207
43 / 2,2',3,6 / 5.537 / 112 / 2,3,4,4',5 / 6.657 / 181 / 2,2',3,4,4',6,6' / 6.857
44 / 2,2',3,6' / 5.537 / 113 / 2,3,4,4',6 / 6.497 / 182 / 2,2',3,4,5,5',6 / 7.933
45 / 2,2',4,4' / 6.291 / 114 / 2,3,4,5,6 / 6.304 / 183 / 2,2',3,4,5,6,6' / 6.697
46 / 2,2',4,5 / 5.787 / 115 / 2,3,4',5,6 / 6.467 / 184 / 2,2',3,4',5,5',6 / 7.177
47 / 2,2',4,5' / 6.221 / 116 / 2,3',4,4',5 / 7.121 / 185 / 2,2',3,4',5,6,6' / 6.827
48 / 2,2',4,6 / 5.637 / 117 / 2,3',4,4',6 / 6.587 / 186 / 2,3,3',4,4',5,5' / 7.717
49 / 2,2',4,6' / 5.637 / 118 / 2,3',4,5,5' / 6.797 / 187 / 2,3,3',4,4',5,6 / 7.467
50 / 2,2',5,5' / 6.091 / 119 / 2,3',4,5',6 / 6.647 / 188 / 2,3,3',4,4',5',6 / 7.557
51 / 2,2',5,6' / 5.627 / 120 / 2',3,3',4,5 / 6.647 / 189 / 2,3,3',4,5,5',6 / 7.527
52 / 2,2',6,6' / 5.904 / 121 / 2',3,4,4',5 / 6.747 / 190 / 2,3,3',4',5,5',6 / 7.527
53 / 2,3,3',4 / 6.117 / 122 / 2',3,4,5,5' / 6.737 / 191 / 2,2',3,3',4,4',5,5' / 8.683
54 / 2,3,3',4' / 6.117 / 123 / 2',3,4,5,6' / 6.517 / 192 / 2,2',3,3',4,4',5,6 / 7.567
55 / 2,3,3',5 / 6.177 / 124 / 3,3',4,4',5 / 6.897 / 193 / 2,2',3,3',4,4',5',6 / 7.657
56 / 2,3,3',5' / 6.177 / 125 / 3,3',4,5,5' / 6.957 / 194 / 2,2',3,3',4,4',6,6' / 7.307
57 / 2,3,3',6 / 5.957 / 126 / 2,2',3,3',4,4' / 6.961 / 195 / 2,2',3,3',4,5,5',6 / 7.627
58 / 2,3,4,4' / 5.452 / 127 / 2,2',3,3',4,5 / 7.321 / 196 / 2,2',3,3',4,5,6,6' / 7.207
59 / 2,3,4,5 / 5.943 / 128 / 2,2',3,3',4,5' / 7.391 / 197 / 2,2',3,3',4,5',6,6' / 7.277
60 / 2,3,4,6 / 5.897 / 129 / 2,2',3,3',4,6 / 6.587 / 198 / 2,2',3,3',4',5,5',6 / 7.627
61 / 2,3,4',5 / 6.177 / 130 / 2,2',3,3',4,6' / 6.587 / 199 / 2,2',3,3',5,5',6,6' / 8.423
62 / 2,3,4',6 / 5.957 / 131 / 2,2',3,3',5,5' / 6.867 / 200 / 2,2',3,4,4',5,5',6 / 7.657
63 / 2,3,5,6 / 5.867 / 132 / 2,2',3,3',5,6 / 7.304 / 201 / 2,2',3,4,4',5,6,6' / 7.307
64 / 2,3',4,4' / 5.452 / 133 / 2,2',3,3',5,6' / 7.151 / 202 / 2,3,3',4,4',5,5',6 / 8.007
65 / 2,3',4,5 / 6.207 / 134 / 2,2',3,3',6,6' / 6.511 / 203 / 2,2',3,3',4,4',5,5',6 / 9.143
66 / 2,3',4,5' / 6.267 / 135 / 2,2',3,4,4',5' / 7.441 / 204 / 2,2',3,3',4,4',5,6,6' / 7.747
67 / 2,3',4,6 / 6.047 / 136 / 2,2',3,4,4',6 / 6.677 / 205 / 2,2',3,3',4,5,5',6,6' / 8.164
68 / 2,3',4',5 / 6.231 / 137 / 2,2',3,4,4',6' / 6.677 / 206 / 2,2',3,3',4,4',5,5',6,6' / 9.603
69 / 2,3',4',6 / 5.987 / 138 / 2,2',3,4,5,5' / 7.592
Table 2. Results of partial F test and Steiger’s Z test: GA-MLR descriptors
Partial F test
Model / Descriptor(s) / F* (p) / Fadj (p)
M1 / IhDDJCt / n.a. (n.a.) / n.a. (n.a.)
M2 / IhDDJCt, iIPRJCg / 15.05 (1.41∙10⁻⁴) / 10.30 (1.55∙10⁻³)
M3 / IhDDJCt, iIPRJCg, IiPDLCg / 34.09 (2.07∙10⁻⁸) / 22.19 (4.60∙10⁻⁶)
M4 / IhDDJCt, iIPRJCg, IiPDLCg, IMDRLHt / 37.17 (5.46∙10⁻⁹) / 24.01 (1.97∙10⁻⁶)
Steiger’s Z test
Comparison / Correlation coefficients† / Z‡ / p
M2 vs. M1 / 0.9324 vs. 0.9272 (using 0.9944 in between) / 1.9252 / 2.71∙10⁻²
M3 vs. M2 / 0.9425 vs. 0.9324 (using 0.9894 in between) / 2.8744 / 2.02∙10⁻³
M4 vs. M3 / 0.9516 vs. 0.9425 (using 0.9904 in between) / 2.9983 / 1.36∙10⁻³
* each larger model was compared with the preceding smaller one (M2 with M1, etc.);
Partial F test: ▪ $F = (SS_{Reg(i+1)} - SS_{Reg(i)}) / MS_{Res(i+1)}$; ▪ $F_{adj} = (SS_{Reg(i+1)} - SS_{Reg(i)}) / \sqrt{MS_{Res(i+1)}^{2} + MS_{Res(i)}^{2}}$ (adjusted F value),
where: SSReg = regression sum of squares; MSRes = residual mean square; i = number of descriptors in the smaller model
n.a. = not applicable
Hypotheses of the partial F test at a significance level of 5%: H0 = the two models estimate Y equally well;
Ha = the larger model estimates Y better than the smaller one.
† between Y and ŶM(i), where i = number of descriptors;
‡ Steiger’s Z parameter
¶ The adjusted F value is an approximate solution to the Behrens-Fisher problem; the variance estimate is approximated using the Welch-Satterthwaite equation (see below);
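The two tests in Table 2 can be sketched in a few lines of Python. The partial F and adjusted F functions follow the footnote formulas literally. The Steiger's Z function uses one common formulation for comparing two dependent correlations that share a variable (Fisher z-transform with a pooled covariance term); the table does not state which variant was used, so this choice is an assumption, as are all function and argument names. The sample size n = 206 in the usage line is the number of PCBs listed in Table 1.

```python
import math


def partial_f(ss_reg_small, ss_reg_large, ms_res_large):
    """Partial F as in the footnote: (SSReg(i+1) - SSReg(i)) / MSRes(i+1)."""
    return (ss_reg_large - ss_reg_small) / ms_res_large


def adjusted_f(ss_reg_small, ss_reg_large, ms_res_small, ms_res_large):
    """Adjusted F as in the footnote: the same numerator divided by
    sqrt(MSRes(i+1)^2 + MSRes(i)^2)."""
    return (ss_reg_large - ss_reg_small) / math.sqrt(
        ms_res_large ** 2 + ms_res_small ** 2
    )


def steiger_z(r_y_large, r_y_small, r_between, n):
    """Steiger-type Z for two dependent correlations sharing Y.

    r_y_large, r_y_small: correlations between Y and each model's estimate.
    r_between: correlation between the two models' estimates.
    Returns (Z, one-sided p). The exact variant is an assumption.
    """
    z_large = math.atanh(r_y_large)  # Fisher z-transform
    z_small = math.atanh(r_y_small)
    r_bar_sq = ((r_y_large + r_y_small) / 2.0) ** 2
    # Covariance term evaluated at the pooled correlation
    psi = r_between * (1.0 - 2.0 * r_bar_sq) - 0.5 * r_bar_sq * (
        1.0 - 2.0 * r_bar_sq - r_between ** 2
    )
    s = psi / (1.0 - r_bar_sq) ** 2
    z = (z_large - z_small) * math.sqrt((n - 3) / (2.0 - 2.0 * s))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail
    return z, p_one_sided


# Usage with the rounded correlations of the "M2 vs. M1" row
z, p = steiger_z(0.9324, 0.9272, 0.9944, 206)
print(f"Z = {z:.3f}, one-sided p = {p:.4f}")
```

Because the tabulated correlations are rounded to four decimals, the result of this sketch should be expected to approximate, not reproduce exactly, the Z and p values in the table.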
Table 3. GA-MLR analysis: training vs. test experiment
Training set / Test set
Intercept / iIPRJCg / IiPDLCg / IMDRLHt / IhDDJCt / ntr / r2tr / Ftr / nte / r2ts / Fts
11.74 / -2.10 / -9.65 / -0.03 / -0.06 / 120 / 0.9105 / 292* / 86 / 0.9007 / 164*
12.34 / -2.18 / -10.35 / -0.03 / -0.06 / 122 / 0.9003 / 264* / 84 / 0.9167 / 195*
11.25 / -1.99 / -9.18 / -0.02 / -0.05 / 124 / 0.9081 / 294* / 82 / 0.9030 / 170*
9.88 / -1.72 / -7.43 / -0.02 / -0.05 / 126 / 0.9006 / 274* / 80 / 0.9121 / 179*
10.17 / -1.66 / -8.11 / -0.02 / -0.05 / 128 / 0.9022 / 284* / 78 / 0.9071 / 170*
11.98 / -2.11 / -10.14 / -0.02 / -0.06 / 130 / 0.9069 / 304* / 76 / 0.9065 / 150*
11.29 / -2.04 / -9.46 / -0.02 / -0.05 / 132 / 0.8957 / 272* / 74 / 0.9254 / 209*
11.35 / -2.01 / -9.08 / -0.02 / -0.06 / 134 / 0.9018 / 265* / 72 / 0.9059 / 158*
11.64 / -2.02 / -9.41 / -0.03 / -0.06 / 136 / 0.9074 / 321* / 70 / 0.8959 / 133*
11.69 / -2.01 / -9.84 / -0.02 / -0.06 / 138 / 0.9070 / 324* / 68 / 0.9098 / 120*
11.56 / -2.09 / -9.66 / -0.02 / -0.05 / 140 / 0.8944 / 286* / 66 / 0.9247 / 181*
11.90 / -2.17 / -9.99 / -0.02 / -0.06 / 142 / 0.9085 / 340* / 64 / 0.8951 / 117*
10.79 / -1.86 / -8.89 / -0.02 / -0.05 / 144 / 0.9028 / 323* / 62 / 0.9156 / 141*
11.06 / -1.92 / -8.97 / -0.02 / -0.06 / 146 / 0.9004 / 319* / 60 / 0.9235 / 148*
10.74 / -1.86 / -8.49 / -0.02 / -0.05 / 148 / 0.9140 / 380* / 58 / 0.8924 / 103*
11.23 / -1.91 / -9.18 / -0.02 / -0.06 / 150 / 0.8967 / 314* / 56 / 0.9271 / 150*
11.72 / -2.06 / -9.49 / -0.02 / -0.06 / 152 / 0.9101 / 372* / 54 / 0.8945 / 98*
9.73 / -1.79 / -7.44 / -0.01 / -0.05 / 154 / 0.9051 / 355* / 52 / 0.9183 / 132*
9.45 / -1.65 / -7.32 / -0.01 / -0.04 / 156 / 0.9011 / 343* / 50 / 0.9154 / 95*
11.12 / -1.94 / -9.09 / -0.02 / -0.06 / 158 / 0.9012 / 349* / 48 / 0.9248 / 129*
11.46 / -2.08 / -9.51 / -0.02 / -0.06 / 160 / 0.9039 / 364* / 46 / 0.9201 / 104*
11.25 / -2.00 / -9.12 / -0.02 / -0.06 / 162 / 0.8922 / 325* / 44 / 0.9401 / 148*
11.02 / -1.94 / -9.11 / -0.02 / -0.05 / 164 / 0.8977 / 349* / 42 / 0.9300 / 115*
11.16 / -2.00 / -9.09 / -0.02 / -0.06 / 166 / 0.9006 / 365* / 40 / 0.9200 / 100*
11.36 / -2.00 / -9.31 / -0.02 / -0.06 / 168 / 0.9098 / 411* / 38 / 0.8810 / 60*
11.06 / -1.96 / -9.02 / -0.02 / -0.06 / 170 / 0.9038 / 387* / 36 / 0.9167 / 75*
11.53 / -2.06 / -9.56 / -0.02 / -0.06 / 172 / 0.9060 / 402* / 34 / 0.9084 / 65*
11.03 / -1.98 / -9.19 / -0.01 / -0.05 / 174 / 0.8941 / 356* / 32 / 0.9264 / 64*
11.62 / -2.03 / -9.51 / -0.02 / -0.06 / 176 / 0.9058 / 411* / 30 / 0.9149 / 57*
11.16 / -1.95 / -9.10 / -0.02 / -0.06 / 178 / 0.9032 / 403* / 28 / 0.9283 / 73*
11.35 / -2.03 / -9.37 / -0.02 / -0.05 / 180 / 0.9015 / 400* / 26 / 0.9271 / 55*
iIPRJCg, IiPDLCg, IMDRLHt, IhDDJCt = MDF members of GA-MLR model;
Intercept = intercept of the GA-MLR model;
ntr, nte = number of PCBs in the training and test sets; tr = training set; ts = test set;
r2tr, r2ts = determination coefficient in training and test set;
Ftr, Fts = Fisher value of the regression model;
* p < 0.001
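As a quick consistency check on the training columns of Table 3, the F value of a multiple linear regression can be recovered from its determination coefficient. The sketch below uses the standard relation F = (r²/k) / ((1 − r²)/(n − k − 1)) with k = 4 descriptors; the function name is hypothetical, and the check is applied only to the training set, since the table does not say how the test-set F values were computed.

```python
def f_from_r2(r2, n, k=4):
    """F statistic of a linear regression with k descriptors and n
    observations, derived from the determination coefficient r2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# First three training-set rows of Table 3: (ntr, r2tr, tabulated Ftr)
rows = [(120, 0.9105, 292), (122, 0.9003, 264), (124, 0.9081, 294)]
for n_tr, r2_tr, f_tab in rows:
    print(n_tr, round(f_from_r2(r2_tr, n_tr)), f_tab)
```

For these rows the rounded result agrees with the tabulated Ftr, which suggests the training F values follow this relation; r2tr is itself rounded, so small deviations on other rows would not be surprising.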
Figure 1. Contribution of descriptors to the first two factors in principal components and classification analysis
Appendix 1. Algorithmic complexity of the complete and GA searches for a model with 4 descriptors
- Complete search (62750 valid descriptors):
- GA algorithm: , where the first parenthesized factor is the probable number of regressions in the cultivar per generation; exp(15) is the estimated generation at which the observed maximum is reached under tournament selection (averaged and rounded over a separate experiment of 46 independent runs; complete results available upon request); 48 is the product of the number of possible linearizations (6) and the number of persistent genotypes in the cultivar (8), i.e. those not modified within a generation (12 genotypes in the cultivar, 4 of which change through cross-over and mutation); f is the number of alive descriptors (167089 in our case) divided by the product of the total number of MDF descriptors (131328) and the number of possible linearizations (6).
The difference in the number of MDF descriptors between the complete search (62750) and the GA search (167089) arises because all similar descriptors (correlation limit of 0.999) were deleted in the complete search.
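The quantities named in this appendix can be evaluated directly. The sketch below assumes that the complete search fits one regression per unordered subset of 4 descriptors out of the 62750 valid ones, so its size is a binomial coefficient; this assumption is not stated in the appendix, and the variable names are hypothetical. The factor f and exp(15) follow the definitions given above.

```python
import math

VALID_DESCRIPTORS = 62750    # descriptors kept for the complete search
ALIVE_DESCRIPTORS = 167089   # descriptors available to the GA search
TOTAL_MDF = 131328           # total number of MDF descriptors
LINEARIZATIONS = 6           # possible linearizations per descriptor
MODEL_SIZE = 4               # descriptors per model

# Assumption: one regression per unordered 4-descriptor subset
complete_search = math.comb(VALID_DESCRIPTORS, MODEL_SIZE)

# f = alive descriptors / (total MDF descriptors * linearizations)
f = ALIVE_DESCRIPTORS / (TOTAL_MDF * LINEARIZATIONS)

# Estimated generation at which the observed maximum is reached
generations = math.exp(15)

print(f"complete search: {complete_search:.3e} regressions")
print(f"f = {f:.4f}")
print(f"exp(15) = {generations:.0f}")
```

Even under this simple counting, the complete search is many orders of magnitude larger than exp(15), which is the practical motivation for the GA search.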