10/26/18 9:53 AM
Pecan Analysis data
Got excel file NAmPecanAll.xls
Made into ARFF format
Deleted columns that were categorical
SOIL
SOIL2
Made all cols numeric except a few
SEASON 1,2,3,4
CLASS D,W
Pecan 0,1
Made pecans.arff
Header from the “variable names” worksheet
Exported the data as data.csv
Combined (in word) as Pecans.arff
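The header and the data were combined by hand in Word; the same step could be scripted. This is only a sketch: the function name `build_arff` is made up for illustration, and it assumes the header text holds the `@relation` and `@attribute` lines and that data.csv has no header row of its own.

```python
def build_arff(header_text: str, csv_text: str) -> str:
    """Join an ARFF header and CSV data rows into one ARFF document."""
    header_lines = [line.rstrip() for line in header_text.strip().splitlines()]
    # Drop a trailing @data marker if the header already has one,
    # so the marker is written exactly once below.
    if header_lines and header_lines[-1].strip().lower() == "@data":
        header_lines.pop()
    rows = [line.strip() for line in csv_text.splitlines() if line.strip()]
    return "\n".join(header_lines).rstrip() + "\n\n@data\n" + "\n".join(rows) + "\n"


# Tiny made-up header and rows, only to show the layout.
example_header = "@relation PresenceOfPecans\n@attribute ELEV numeric\n@attribute Pecan {0,1}"
example_rows = "15,0\n413,1\n"
print(build_arff(example_header, example_rows))
```

Writing the result as plain text avoids the smart quotes and other formatting a word processor can introduce.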
Trouble at line 191
If we include thru case 4, it’s OK (pecans-small.arff)
If thru 100, not OK (pecans-small2.arff)
1 – 89 pecans-small3 no good
1-79 pecans-small4 – OK
therefore b/w 80 & 89
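The narrowing-down above is a bisection over the number of rows kept. A sketch of the same search, assuming a hypothetical check `loads_ok(k)` that stands in for "save rows 1..k and see whether Weka opens the file":

```python
def first_bad_row(n_rows, loads_ok):
    """Return the 1-based index of the first row whose inclusion breaks loading.

    loads_ok(k) must return True when a file holding rows 1..k loads cleanly.
    Assumes loads_ok(0) is True and loads_ok(n_rows) is False.
    """
    lo, hi = 0, n_rows  # invariant: rows 1..lo load, rows 1..hi do not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loads_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Made-up stand-in for the real check: pretend row 83 is the first bad one.
print(first_bad_row(100, lambda k: k < 83))  # prints 83
```

Each probe halves the remaining range, so 100 rows need about 7 trial files rather than one per row.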
looked in excel file
#83 starts to be real # not 1-4
fixed in data.csv only
blew up on line 2898
fixed RCORR on station 2790
was blank
made it 1.0000
fixed only in pecans.arff
blew up on 2921 = station 2813
it was formatted as 1,032.67
removed the commas in data.csv
also fixed 2790
re-made pecans.arff
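Both blow-ups (a blank cell, and a number formatted as 1,032.67) are cells that do not parse as plain numbers, so they could be found in one pass before loading. A sketch, assuming the CSV text has no header row; `find_bad_cells` and its `nominal_cols` parameter are made-up names, and the columns holding codes such as D/W must be listed so they are skipped:

```python
import csv
import io


def find_bad_cells(csv_text, nominal_cols=frozenset()):
    """Return (row, column, value) for every cell that is not a plain number.

    Rows and columns are 1-based. Columns listed in nominal_cols are skipped.
    """
    problems = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_no, cell in enumerate(row, start=1):
            if col_no in nominal_cols:
                continue
            value = cell.strip()
            try:
                float(value)  # fails on "" and on "1,032.67"
            except ValueError:
                problems.append((row_no, col_no, value))
    return problems


# Made-up rows reproducing the two kinds of trouble seen above.
sample = 'D,1.5,2\nW,,3\nD,"1,032.67",4\n'
for problem in find_bad_cells(sample, nominal_cols={1}):
    print(problem)
```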
ran with weka explorer J48 -C 0.25 -M 2
remember to give extra memory
java -Xmx300m -jar weka.jar
J48 classifier – ran 10-fold validation in about 5 mins
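For reference, the idea behind stratified 10-fold cross-validation: split the instances into 10 folds with roughly the same class mix, then train on 9 and test on the 10th, ten times. A sketch of the fold assignment only; this is not Weka's own procedure, and `stratified_folds` is a made-up name:

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign instance indices to k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the members of each class round-robin across the folds.
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


# Same class counts as the pecan data: 4149 absent, 488 present.
labels = [0] * 4149 + [1] * 488
for fold in stratified_folds(labels):
    print(len(fold), sum(labels[i] for i in fold))
```

With only 488 pecan stations, stratifying matters: each fold ends up with 48 or 49 of them instead of whatever a plain random split happens to give.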
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances: 4637
Attributes: 103
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------
MWM <= 24.28: 0 (2358.0/7.0)
MWM > 24.28
| RLOW <= 20.32: 0 (710.0/3.0)
| RLOW > 20.32
| | LPTOAE <= 0.068
| | | TRANGE <= 24.63
| | | | PERWRET <= 0.5714
| | | | | PERWLTG <= 0.375: 0 (14.0)
| | | | | PERWLTG > 0.375
| | | | | | COKLM <= 35.9: 0 (4.0/1.0)
| | | | | | COKLM > 35.9: 1 (12.0)
| | | | PERWRET > 0.5714
| | | | | RLOW <= 89.92
| | | | | | RRCORR3 <= 7.5
| | | | | | | LCRR <= 3.2087: 0 (560.0/1.0)
| | | | | | | LCRR > 3.2087
| | | | | | | | ELEV <= 15
| | | | | | | | | MWM <= 27.67: 0 (5.0)
| | | | | | | | | MWM > 27.67: 1 (6.0/1.0)
| | | | | | | | ELEV > 15: 0 (28.0)
| | | | | | RRCORR3 > 7.5
| | | | | | | SEASON = 1: 0 (0.0)
| | | | | | | SEASON = 2: 0 (0.0)
| | | | | | | SEASON = 3
| | | | | | | | ELEV <= 413: 1 (6.0)
| | | | | | | | ELEV > 413: 0 (5.0)
| | | | | | | SEASON = 4: 0 (24.0/1.0)
| | | | | RLOW > 89.92
| | | | | | RRCORR3 <= 2.5: 0 (28.0)
| | | | | | RRCORR3 > 2.5
| | | | | | | PGROW <= 29
| | | | | | | | LPET <= 3.028
| | | | | | | | | ELEV <= 290
| | | | | | | | | | WSTORAGE <= 120.4052: 1 (10.0)
| | | | | | | | | | WSTORAGE > 120.4052
| | | | | | | | | | | TEMP <= 55.6976: 1 (8.0)
| | | | | | | | | | | TEMP > 55.6976
| | | | | | | | | | | | AVWAT <= 7: 0 (5.0)
| | | | | | | | | | | | AVWAT > 7
| | | | | | | | | | | | | SUCSTAB <= 0.0111: 1 (4.0)
| | | | | | | | | | | | | SUCSTAB > 0.0111: 0 (4.0)
| | | | | | | | | ELEV > 290: 0 (4.0)
| | | | | | | | LPET > 3.028: 0 (6.0)
| | | | | | | PGROW > 29: 0 (6.0)
| | | TRANGE > 24.63
| | | | LPTOWATR <= 1.0465
| | | | | ELEV <= 1051
| | | | | | LEXPREY <= 2.7946
| | | | | | | LCOKLM <= 2.3347: 0 (16.0)
| | | | | | | LCOKLM > 2.3347
| | | | | | | | LCMAT <= 1.6691
| | | | | | | | | LBIO5 <= 4.3056: 1 (3.0)
| | | | | | | | | LBIO5 > 4.3056
| | | | | | | | | | SEASON = 1
| | | | | | | | | | | SUCSTAB <= 0.0118
| | | | | | | | | | | | COKLM <= 336.8: 0 (7.0)
| | | | | | | | | | | | COKLM > 336.8
| | | | | | | | | | | | | ELEV <= 725: 1 (3.0)
| | | | | | | | | | | | | ELEV > 725: 0 (7.0)
| | | | | | | | | | | SUCSTAB > 0.0118: 1 (3.0)
| | | | | | | | | | SEASON = 2: 0 (13.0)
| | | | | | | | | | SEASON = 3: 0 (2.0)
| | | | | | | | | | SEASON = 4: 0 (0.0)
| | | | | | | | LCMAT > 1.6691
| | | | | | | | | WATRGRC <= 4
| | | | | | | | | | WATDGRC <= 2: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | REVEN <= 1.3688
| | | | | | | | | | | | EXPREY <= 292.2337: 1 (2.0)
| | | | | | | | | | | | EXPREY > 292.2337: 0 (11.0/1.0)
| | | | | | | | | | | REVEN > 1.3688: 1 (4.0)
| | | | | | | | | WATRGRC > 4
| | | | | | | | | | MCM <= -1.17
| | | | | | | | | | | WATDGRC <= 1
| | | | | | | | | | | | EXPREY <= 464.7716
| | | | | | | | | | | | | TEMP <= 46.4227: 0 (3.0)
| | | | | | | | | | | | | TEMP > 46.4227: 1 (2.0)
| | | | | | | | | | | | EXPREY > 464.7716: 1 (16.0/2.0)
| | | | | | | | | | | WATDGRC > 1
| | | | | | | | | | | | TEMP <= 47.1188: 1 (16.0)
| | | | | | | | | | | | TEMP > 47.1188: 0 (2.0)
| | | | | | | | | | MCM > -1.17
| | | | | | | | | | | MEDSTAB <= 0.054
| | | | | | | | | | | | RUNGRC <= 6
| | | | | | | | | | | | | WATRGRC <= 5: 0 (4.0/1.0)
| | | | | | | | | | | | | WATRGRC > 5
| | | | | | | | | | | | | | RRCORR <= -1.5
| | | | | | | | | | | | | | | WSTORAGE <= 155.083: 1 (2.0)
| | | | | | | | | | | | | | | WSTORAGE > 155.083: 0 (5.0/1.0)
| | | | | | | | | | | | | | RRCORR > -1.5: 1 (3.0)
| | | | | | | | | | | | RUNGRC > 6: 1 (5.0)
| | | | | | | | | | | MEDSTAB > 0.054: 0 (13.0/2.0)
| | | | | | LEXPREY > 2.7946: 0 (18.0)
| | | | | ELEV > 1051: 0 (44.0/1.0)
| | | | LPTOWATR > 1.0465
| | | | | ELEV <= 710: 1 (14.0)
| | | | | ELEV > 710: 0 (3.0/1.0)
| | LPTOAE > 0.068
| | | LCOKLM <= 1.8716: 0 (46.0)
| | | LCOKLM > 1.8716
| | | | ELEV <= 1205
| | | | | CLASS = D
| | | | | | CVRAIN <= 42.1182: 1 (6.0/1.0)
| | | | | | CVRAIN > 42.1182: 0 (16.0/1.0)
| | | | | CLASS = W
| | | | | | AVWAT <= 6
| | | | | | | LCRR <= 2.9411
| | | | | | | | PERWLTG <= 0.5714
| | | | | | | | | PTORUN <= 2.633: 0 (2.0)
| | | | | | | | | PTORUN > 2.633: 1 (12.0)
| | | | | | | | PERWLTG > 0.5714
| | | | | | | | | WATD <= 304.7457: 0 (12.0)
| | | | | | | | | WATD > 304.7457
| | | | | | | | | | RHIGH <= 115.1: 0 (9.0/2.0)
| | | | | | | | | | RHIGH > 115.1: 1 (6.0)
| | | | | | | LCRR > 2.9411
| | | | | | | | LWATD <= 2.2506
| | | | | | | | | WLTGRC <= 0
| | | | | | | | | | MWM <= 25.11: 0 (2.0)
| | | | | | | | | | MWM > 25.11
| | | | | | | | | | | PGROW <= 26: 1 (16.0)
| | | | | | | | | | | PGROW > 26
| | | | | | | | | | | | RLOW <= 64.26
| | | | | | | | | | | | | REVEN <= 1.3928: 0 (4.0)
| | | | | | | | | | | | | REVEN > 1.3928: 1 (5.0/1.0)
| | | | | | | | | | | | RLOW > 64.26: 1 (9.0)
| | | | | | | | | WLTGRC > 0: 0 (2.0)
| | | | | | | | LWATD > 2.2506: 1 (163.0/6.0)
| | | | | | AVWAT > 6
| | | | | | | PERWLTG <= 0.375
| | | | | | | | RRCORR <= -3.5
| | | | | | | | | SNOWAC <= 37.9349
| | | | | | | | | | SUCSTAB <= 0.0099
| | | | | | | | | | | ELEV <= 307: 1 (7.0)
| | | | | | | | | | | ELEV > 307: 0 (8.0/1.0)
| | | | | | | | | | SUCSTAB > 0.0099: 0 (21.0)
| | | | | | | | | SNOWAC > 37.9349: 1 (2.0)
| | | | | | | | RRCORR > -3.5
| | | | | | | | | LATITUDE <= 38.73
| | | | | | | | | | WATDGRC <= 2
| | | | | | | | | | | RHIGH <= 154.43
| | | | | | | | | | | | RHIGH <= 121.41: 0 (2.0)
| | | | | | | | | | | | RHIGH > 121.41: 1 (26.0)
| | | | | | | | | | | RHIGH > 154.43: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | LREVEN <= 0.0918: 1 (24.0/1.0)
| | | | | | | | | | | LREVEN > 0.0918
| | | | | | | | | | | | WSTORAGE <= 136.7894: 1 (36.0/7.0)
| | | | | | | | | | | | WSTORAGE > 136.7894
| | | | | | | | | | | | | SEASON = 1
| | | | | | | | | | | | | | MWM <= 27.5: 0 (6.0)
| | | | | | | | | | | | | | MWM > 27.5
| | | | | | | | | | | | | | | LATITUDE <= 36.63: 1 (4.0)
| | | | | | | | | | | | | | | LATITUDE > 36.63: 0 (2.0)
| | | | | | | | | | | | | SEASON = 2: 0 (5.0)
| | | | | | | | | | | | | SEASON = 3
| | | | | | | | | | | | | | ELEV <= 513
| | | | | | | | | | | | | | | WRET <= 107.2064: 1 (7.0)
| | | | | | | | | | | | | | | WRET > 107.2064: 0 (2.0)
| | | | | | | | | | | | | | ELEV > 513: 0 (3.0)
| | | | | | | | | | | | | SEASON = 4
| | | | | | | | | | | | | | REVEN <= 1.307: 1 (9.0/2.0)
| | | | | | | | | | | | | | REVEN > 1.307: 0 (9.0)
| | | | | | | | | LATITUDE > 38.73
| | | | | | | | | | EXPREY <= 366.9686: 1 (3.0)
| | | | | | | | | | EXPREY > 366.9686
| | | | | | | | | | | WATRGRC <= 3
| | | | | | | | | | | | ELEV <= 556: 1 (3.0)
| | | | | | | | | | | | ELEV > 556: 0 (5.0)
| | | | | | | | | | | WATRGRC > 3: 0 (9.0)
| | | | | | | PERWLTG > 0.375: 1 (14.0)
| | | | ELEV > 1205
| | | | | PERWRET <= 0.6364
| | | | | | LET <= 1.1751: 0 (52.0)
| | | | | | LET > 1.1751
| | | | | | | MWM <= 27.83: 0 (9.0)
| | | | | | | MWM > 27.83
| | | | | | | | TRANGE <= 20.88: 1 (4.0)
| | | | | | | | TRANGE > 20.88: 0 (2.0)
| | | | | PERWRET > 0.6364
| | | | | | RHIGH <= 126.24
| | | | | | | WLTGRC <= 0: 0 (8.0/1.0)
| | | | | | | WLTGRC > 0: 1 (2.0)
| | | | | | RHIGH > 126.24: 1 (7.0)
Number of Leaves : 96
Size of the tree : 185
Time taken to build model: 13.38 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4368 94.1988 %
Incorrectly Classified Instances 269 5.8012 %
Kappa statistic 0.6889
Mean absolute error 0.0635
Root mean squared error 0.2306
Relative absolute error 33.6934 %
Root relative squared error 75.1416 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.969     0.287     0.966       0.969    0.968       0
0.713     0.031     0.73        0.713    0.721       1
=== Confusion Matrix ===
a b <-- classified as
4020 129 | a = 0
140 348 | b = 1
Conclusion:
94% accurate!!!
Kappa is low because the pecans are rare in the data set.
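To see why kappa sits so far below the accuracy, it can be recomputed from the confusion matrix above. A sketch using the standard Cohen's kappa formula (observed agreement corrected for the agreement expected by chance); `accuracy_and_kappa` is a made-up name:

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected if predictions were made at random with the same totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


observed, expected, kappa = accuracy_and_kappa([[4020, 129], [140, 348]])
print(f"observed={observed:.4f} expected={expected:.4f} kappa={kappa:.4f}")
```

Because about 89% of stations have no pecans, chance agreement is already about 0.81, so 94% accuracy leaves a kappa of about 0.69.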
Should be able to do this on the command line and get the classified instances
(looked in the Weka tutorial)
in the weka directory
java -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff
-d ../PecanData/J48-classifier.model
doesn’t work from command line
can’t find class weka/classifiers/trees/J48
hmmm...
try adding -cp weka.jar so java can find the classes
and also add in stuff -i -k to get more info
java -cp weka.jar -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-classifier.model
worked!
Time taken to build model: 12.72 seconds
Time taken to test model on training data: 0.1 seconds
=== Error on training data ===
Correctly Classified Instances 4587 98.9217 %
Incorrectly Classified Instances 50 1.0783 %
Kappa statistic 0.9427
K&B Relative Info Score 412419.0911 %
K&B Information Score 2004.0102 bits 0.4322 bits/instance
Class complexity | order 0 2250.7576 bits 0.4854 bits/instance
Class complexity | scheme 277.1448 bits 0.0598 bits/instance
Complexity improvement (Sf) 1973.6128 bits 0.4256 bits/instance
Mean absolute error 0.019
Root mean squared error 0.0974
Relative absolute error 10.0721 %
Root relative squared error 31.7479 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.994     0.051     0.994       0.994    0.994       0
0.949     0.006     0.949       0.949    0.949       1
=== Confusion Matrix ===
a b <-- classified as
4124 25 | a = 0
25 463 | b = 1
=== Stratified cross-validation ===
Correctly Classified Instances 4373 94.3067 %
Incorrectly Classified Instances 264 5.6933 %
Kappa statistic 0.6949
K&B Relative Info Score 268786.582 %
K&B Information Score 1305.6899 bits 0.2816 bits/instance
Class complexity | order 0 2250.7629 bits 0.4854 bits/instance
Class complexity | scheme 131711.8722 bits 28.4045 bits/instance
Complexity improvement (Sf) -129461.1092 bits -27.9192 bits/instance
Mean absolute error 0.0629
Root mean squared error 0.2301
Relative absolute error 33.3937 %
Root relative squared error 74.9854 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.969     0.281     0.967       0.969    0.968       0
0.719     0.031     0.734       0.719    0.727       1
=== Confusion Matrix ===
a b <-- classified as
4022 127 | a = 0
137 351 | b = 1
looks good!
now have classifier J48-classifier.model
try to get it to classify the data
labuser% java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-classifier.model -T ../PecanData/pecans.arff -p 1
works and gives data lines like
4633 0 0.9970313825275657 0 (4634)
the values are
- the instance number (0-indexed)
- the predicted value
- the confidence in the prediction
- the actual value
- (the first attribute) – in this case, the station ID
ran to put results into J48-output.txt
opened in excel and made J48output.xls
need to fix
since the station ID is written in parentheses, e.g. (1), Excel enters it as a negative #!
multiplied by -1 and copied values
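The parentheses problem could be avoided by parsing the prediction lines before they reach Excel. A sketch, assuming every line has the five whitespace-separated fields described above; `parse_prediction` is a made-up name:

```python
def parse_prediction(line):
    """Split one '-p 1' output line into its five fields."""
    index, predicted, confidence, actual, station = line.split()
    return {
        "instance": int(index),          # 0-indexed instance number
        "predicted": predicted,
        "confidence": float(confidence),
        "actual": actual,
        # Strip the parentheses so the ID stays a positive number.
        "station_id": int(float(station.strip("()"))),
    }


print(parse_prediction("4633 0 0.9970313825275657 0 (4634)"))
```

The resulting records could then be written out as CSV, which Excel reads without the sign flip.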
Tried IB1 – lazy single nearest neighbor – took about 20 mins
=== Run information ===
Scheme: weka.classifiers.lazy.IB1
Relation: PresenceOfPecans
Instances: 4637
Attributes: 104
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
IB1 classifier
Time taken to build model: 0.16 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4392 94.7164 %
Incorrectly Classified Instances 245 5.2836 %
Kappa statistic 0.7212
Mean absolute error 0.0528
Root mean squared error 0.2299
Relative absolute error 28.0327 %
Root relative squared error 74.9065 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.97      0.244     0.971       0.97     0.97        0
0.756     0.03      0.745       0.756    0.751       1
=== Confusion Matrix ===
a b <-- classified as
4023 126 | a = 0
119 369 | b = 1
looks a little better
try K-nearest neighbors – K = 3 (3 nearest neighbors)
=== Run information ===
Scheme: weka.classifiers.lazy.IBk -K 3 -W 0
Relation: PresenceOfPecans
Instances: 4637
Attributes: 104
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
IB1 instance-based classifier
using 3 nearest neighbour(s) for classification
Time taken to build model: 0.08 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4415 95.2124 %
Incorrectly Classified Instances 222 4.7876 %
Kappa statistic 0.7449
Mean absolute error 0.0602
Root mean squared error 0.1951
Relative absolute error 31.9603 %
Root relative squared error 63.5862 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.232     0.973       0.974    0.973       0
0.768     0.026     0.775       0.768    0.772       1
=== Confusion Matrix ===
a b <-- classified as
4040 109 | a = 0
113 375 | b = 1
slightly better still
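For reference, the nearest-neighbor idea behind the IB1 and IBk runs: find the k training instances closest to the query and take a majority vote. This is a bare sketch with plain Euclidean distance, not Weka's implementation; it omits attribute scaling, missing values and tie handling, and `knn_predict` is a made-up name:

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); return the majority label
    among the k training instances nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-attribute points, only to show the call.
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
print(knn_predict(train, (0.2, 0.1)), knn_predict(train, (5.5, 5.5)))
```

With k=1 this reduces to the single-nearest-neighbor case. Every prediction compares the query against all training instances, which fits the long run times noted above.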
It might be worth trying a “reduced error pruned tree”
it is supposed to make smaller trees
see if it is better.
runs in less than 10 mins!!
=== Run information ===
Scheme: weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2
Relation: PresenceOfPecans
Instances: 4637
Attributes: 104
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------
MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)
Number of Leaves : 27
Size of the tree : 53
Time taken to build model: 7.96 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4348 93.7675 %
Incorrectly Classified Instances 289 6.2325 %
Kappa statistic 0.6524
Mean absolute error 0.0729
Root mean squared error 0.2247
Relative absolute error 38.6837 %
Root relative squared error 73.2403 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.972     0.35      0.959       0.972    0.965       0
0.65      0.028     0.729       0.65     0.687       1
=== Confusion Matrix ===
a b <-- classified as
4031 118 | a = 0
171 317 | b = 1
not quite as good as the full tree but it is very fast
try other rule-generating things because they give interpretable output
try JRip
ran in 15 mins
=== Run information ===
Scheme: weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation: PresenceOfPecans
Instances: 4637
Attributes: 104
[list of attributes omitted]
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
JRIP rules:
======
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)
=> Pecan=0 (4064.0/52.0)
Number of Rules : 9
Time taken to build model: 69.58 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4394 94.7595 %
Incorrectly Classified Instances 243 5.2405 %
Kappa statistic 0.7153
Mean absolute error 0.0744
Root mean squared error 0.2155
Relative absolute error 39.4775 %
Root relative squared error 70.2129 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1
=== Confusion Matrix ===
a b <-- classified as
4040 109 | a = 0
134 354 | b = 1
about as good as the J48.
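The JRip output reads as an ordered rule list: a station gets Pecan=1 from the first rule whose conditions all hold, and Pecan=0 if none fire. A sketch of applying such a list, using only two of the nine rules printed above (the first and the eighth) to keep it short; the station values are made up:

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

# Two rules copied from the JRip output; the other seven are left out here.
RULES = [
    ([("MWM", ">=", 26.5), ("BAR5", ">=", 14.6915),
      ("PTOAE", ">=", 1.1925), ("ELEV", "<=", 300)], 1),
    ([("MWM", ">=", 27.44), ("CVRAIN", "<=", 34.6388),
      ("WSTORAGE", "<=", 161.2)], 1),
]
DEFAULT = 0  # the final "=> Pecan=0" rule


def classify(station):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in RULES:
        if all(OPS[op](station[name], value) for name, op, value in conditions):
            return label
    return DEFAULT


# Made-up station, not a row from the data set.
station = {"MWM": 27.0, "BAR5": 15.0, "PTOAE": 1.2, "ELEV": 100,
           "CVRAIN": 50.0, "WSTORAGE": 200.0}
print(classify(station))
```

Keeping the rules as data rather than nested ifs makes it easy to paste in a different rule set from another run.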
for comparison purposes, do the “null model” = zeroR (pick the majority type)
=== Classifier model (full training set) ===
ZeroR predicts class value: 0
Time taken to build model: 0.02 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4149 89.476 %
Incorrectly Classified Instances 488 10.524 %
Kappa statistic 0
Mean absolute error 0.1885
Root mean squared error 0.3069
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
1         1         0.895       1        0.944       0
0         0         0           0        0           1
=== Confusion Matrix ===
a b <-- classified as
4149 0 | a = 0
488 0 | b = 1
only 89% agreement.
so the others are an improvement
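The gap to the null model is easier to see side by side. A sketch that recomputes accuracy and pecan recall from the cross-validation confusion matrices recorded above; `summarize` is a made-up name:

```python
# Cross-validation confusion matrices copied from the runs above:
# [[true 0 -> 0, true 0 -> 1], [true 1 -> 0, true 1 -> 1]]
CONFUSION = {
    "ZeroR": [[4149, 0], [488, 0]],
    "J48": [[4020, 129], [140, 348]],
    "J48 reduced-error": [[4031, 118], [171, 317]],
    "IB1": [[4023, 126], [119, 369]],
    "IBk (K=3)": [[4040, 109], [113, 375]],
    "JRip": [[4040, 109], [134, 354]],
}


def summarize(matrix):
    """Return (overall accuracy, recall on the pecan class)."""
    (tn, fp), (fn, tp) = matrix
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    pecan_recall = tp / (tp + fn)
    return accuracy, pecan_recall


for name, matrix in CONFUSION.items():
    accuracy, pecan_recall = summarize(matrix)
    print(f"{name:<18} accuracy={accuracy:.4f} pecan recall={pecan_recall:.3f}")
```

Accuracy moves only about 4 to 6 points above ZeroR, but pecan recall goes from 0 to between 0.65 and 0.77, which is the part that matters for mapping.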
Get some scored data sets for mapping
1) “J48 reduced” = the one from page 11 – using “reduced error pruning”
java -cp weka.jar -mx300m weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-reduced-classifier.model
got this result
Options: -R -N 3 -Q 1 -M 2
J48 pruned tree
------
MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)
Number of Leaves : 27
Size of the tree : 53
Time taken to build model: 6.6 seconds
Time taken to test model on training data: 0.11 seconds
=== Error on training data ===
Correctly Classified Instances 4453 96.0319 %
Incorrectly Classified Instances 184 3.9681 %
Kappa statistic 0.781
K&B Relative Info Score 287357.2097 %
K&B Information Score 1396.3146 bits 0.3011 bits/instance
Class complexity | order 0 2250.7576 bits 0.4854 bits/instance
Class complexity | scheme 19037.9527 bits 4.1057 bits/instance
Complexity improvement (Sf) -16787.1951 bits -3.6203 bits/instance
Mean absolute error 0.0644
Root mean squared error 0.1852
Relative absolute error 34.1766 %
Root relative squared error 60.3369 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.983     0.232     0.973       0.983    0.978       0
0.768     0.017     0.841       0.768    0.803       1
=== Confusion Matrix ===
a b <-- classified as
4078 71 | a = 0
113 375 | b = 1
=== Stratified cross-validation ===
Correctly Classified Instances 4348 93.7675 %
Incorrectly Classified Instances 289 6.2325 %
Kappa statistic 0.6524
K&B Relative Info Score 233687.7747 %
K&B Information Score 1135.1897 bits 0.2448 bits/instance
Class complexity | order 0 2250.7629 bits 0.4854 bits/instance
Class complexity | scheme 83502.5671 bits 18.0079 bits/instance
Complexity improvement (Sf) -81251.8041 bits -17.5225 bits/instance
Mean absolute error 0.0729
Root mean squared error 0.2247
Relative absolute error 38.6837 %
Root relative squared error 73.2403 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.972     0.35      0.959       0.972    0.965       0
0.65      0.028     0.729       0.65     0.687       1
=== Confusion Matrix ===
a b <-- classified as
4031 118 | a = 0
171 317 | b = 1
looks the same as when run from explorer – good!
now, classify the pecan data
java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-reduced-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/J48-reduced-output.txt
open in excel & fix to make J48-reduced-output.xls
2) Do this for the JRip from page 13 as well
java -cp weka.jar -mx300m weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1 -t ../PecanData/pecans.arff -i -k -d ../PecanData/JRip-classifier.model
it gave this output:
Options: -F 3 -N 2.0 -O 2 -S 1
JRIP rules:
======
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)
=> Pecan=0 (4064.0/52.0)
Number of Rules : 9
Time taken to build model: 62.44 seconds
Time taken to test model on training data: 0.19 seconds
=== Error on training data ===
Correctly Classified Instances 4448 95.9241 %
Incorrectly Classified Instances 189 4.0759 %
Kappa statistic 0.799
K&B Relative Info Score 301337.6929 %
K&B Information Score 1464.248 bits 0.3158 bits/instance
Class complexity | order 0 2250.7576 bits 0.4854 bits/instance
Class complexity | scheme 791.9992 bits 0.1708 bits/instance
Complexity improvement (Sf) 1458.7584 bits 0.3146 bits/instance
Mean absolute error 0.0607
Root mean squared error 0.1742
Relative absolute error 32.1921 %
Root relative squared error 56.7583 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.967     0.107     0.987       0.967    0.977       0
0.893     0.033     0.761       0.893    0.822       1
=== Confusion Matrix ===
a b <-- classified as
4012 137 | a = 0
52 436 | b = 1
=== Stratified cross-validation ===
Correctly Classified Instances 4394 94.7595 %
Incorrectly Classified Instances 243 5.2405 %
Kappa statistic 0.7153
K&B Relative Info Score 261538.1361 %
K&B Information Score 1270.479 bits 0.274 bits/instance
Class complexity | order 0 2250.7629 bits 0.4854 bits/instance
Class complexity | scheme 5543.159 bits 1.1954 bits/instance
Complexity improvement (Sf) -3292.396 bits -0.71 bits/instance
Mean absolute error 0.0744
Root mean squared error 0.2155
Relative absolute error 39.4775 %
Root relative squared error 70.2129 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1
=== Confusion Matrix ===
a b <-- classified as
4040 109 | a = 0
134 354 | b = 1
now, classify the pecan data
java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/JRip-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/JRip-output.txt
set up in excel
Now, try doing the JRip with only the “raw variables” – not the derived ones.
This is since the tree and rule schemes seem to use derived variables mostly
it will be interesting to see if and how it works with the raw ones
The “raw” ones are:
CMAT
CRR
MWM
MCM
RHIGH
RLOW
ELEV
WSTORAGE
COKLM
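The raw-variable subset listed above could also be cut out of the CSV with a short script rather than by hand. A sketch, assuming the column names are supplied separately (data.csv is taken to have no header row, which is an assumption) and that the Pecan class column is kept as well; `keep_columns` is a made-up name:

```python
import csv
import io

RAW_COLUMNS = {"CMAT", "CRR", "MWM", "MCM", "RHIGH", "RLOW",
               "ELEV", "WSTORAGE", "COKLM", "Pecan"}


def keep_columns(csv_text, all_names, wanted):
    """Return CSV text holding only the wanted columns, in file order.

    all_names lists every column name in the order the columns appear.
    """
    keep = [i for i, name in enumerate(all_names) if name in wanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


# Made-up four-column example; the real file has about 100 columns.
names = ["Site", "ELEV", "LET", "Pecan"]
print(keep_columns("1,10,2.5,0\n2,20,3.5,1\n", names, RAW_COLUMNS))
```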
=== Run information ===
Scheme: weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation: PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1-2,9,12-36,38-103
Instances: 4637
Attributes: 10
ELEV
COKLM
MWM
MCM
RHIGH
RLOW
CMAT
CRR
WSTORAGE
Pecan
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
JRIP rules:
======
(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 220) and (MWM >= 26.94) and (CRR >= 909.56) and (MWM >= 27.6) => Pecan=1 (157.0/13.0)
(MWM >= 24.6) and (RLOW >= 25.91) and (COKLM >= 352) and (WSTORAGE <= 136.7894) and (ELEV <= 719) and (MCM <= 1.1) => Pecan=1 (73.0/6.0)
(MWM >= 25.9) and (RLOW >= 25.91) and (COKLM >= 340) and (MWM >= 26.94) and (ELEV <= 1030) => Pecan=1 (82.0/18.0)
(MWM >= 26.2) and (RLOW >= 26.42) and (WSTORAGE >= 188.644) and (RLOW >= 44.96) and (RHIGH >= 114.81) => Pecan=1 (27.0/2.0)
(MWM >= 24.6) and (RLOW >= 20.83) and (RLOW <= 45.47) and (MCM <= 3.56) and (RLOW >= 27.69) and (RLOW >= 41.66) => Pecan=1 (21.0/5.0)
(MWM >= 24.3) and (RLOW >= 23.62) and (CRR <= 1130.55) and (RHIGH >= 119.63) and (COKLM >= 113.75) and (ELEV <= 825) and (MCM >= -3) => Pecan=1 (24.0/5.0)
(MWM >= 26.5) and (RLOW >= 71.88) and (MWM >= 27.44) and (COKLM >= 15.9) and (ELEV <= 116) => Pecan=1 (51.0/17.0)
(MWM >= 24.2) and (RLOW >= 20.57) and (CRR <= 1097.05) and (COKLM >= 139.74) and (ELEV <= 549) and (RLOW <= 45.21) => Pecan=1 (15.0/3.0)
(MWM >= 24.1) and (WSTORAGE >= 188.644) and (RLOW >= 26.42) and (COKLM >= 563.3) and (COKLM <= 716.5) and (MCM <= 3.89) and (MWM >= 26.2) => Pecan=1 (14.0/1.0)
=> Pecan=0 (4173.0/94.0)
Number of Rules : 10
Time taken to build model: 15.6 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4366 94.1557 %
Incorrectly Classified Instances 271 5.8443 %
Kappa statistic 0.6883
Mean absolute error 0.0815
Root mean squared error 0.2216
Relative absolute error 43.2606 %
Root relative squared error 72.2221 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.968     0.283     0.967       0.968    0.967       0
0.717     0.032     0.725       0.717    0.721       1
=== Confusion Matrix ===
a b <-- classified as
4016 133 | a = 0
138 350 | b = 1
almost as good!!
so, set up a classified data set for this:
need to get data set with just raw attributes
edited data.csv to data-raw.csv
pasted into pecans.arff to make pecans-raw.arff
ran with explorer and JRip as before.
printed output to be sure it’s the same
=== Run information ===
Scheme: weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation: PresenceOfPecans
Instances: 4637
Attributes: 11
Site
ELEV
COKLM
MWM
MCM
RHIGH
RLOW
CMAT
CRR
WSTORAGE
Pecan
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
JRIP rules:
======
(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 272) and (MWM >= 27) and (ELEV <= 660) => Pecan=1 (175.0/23.0)
(MWM >= 25.1) and (RLOW >= 25.91) and (WSTORAGE >= 188.644) and (RLOW >= 40.89) => Pecan=1 (76.0/11.0)
(MWM >= 24.6) and (RLOW >= 28.45) and (COKLM >= 352) and (CRR <= 1263.13) and (ELEV <= 830) and (ELEV <= 605) => Pecan=1 (74.0/12.0)
(MWM >= 26.17) and (RLOW >= 25.91) and (COKLM >= 507) and (Site <= 3496) and (MWM >= 26.94) and (COKLM <= 693) => Pecan=1 (30.0/3.0)
(MWM >= 24.3) and (RLOW >= 20.57) and (MCM <= 0.94) and (RHIGH >= 128.52) and (WSTORAGE <= 136.7894) and (ELEV <= 690) => Pecan=1 (11.0/1.0)
(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.68) and (COKLM >= 117.12) and (ELEV <= 1025) and (ELEV <= 507) and (RLOW <= 54.1) => Pecan=1 (12.0/0.0)
(MWM >= 24.3) and (RLOW >= 20.57) and (MWM >= 27.44) and (RLOW >= 71.88) and (RLOW >= 91.95) and (RHIGH <= 177.8) => Pecan=1 (27.0/7.0)
(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.44) and (Site <= 3159) and (RHIGH >= 128.02) and (ELEV <= 1050) and (MCM >= -3.3) => Pecan=1 (39.0/13.0)
(MWM >= 24) and (RLOW >= 23.62) and (WSTORAGE <= 161.2) and (RLOW >= 72.9) and (MWM >= 27.44) and (RLOW <= 81.53) => Pecan=1 (20.0/5.0)
=> Pecan=0 (4173.0/99.0)