1) kNN, SVM, & Ensembles
a) The soft margin support vector machine solves the following optimization problem:
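For reference, the standard soft-margin formulation (assuming the usual notation: weight vector w, bias b, slack variables ξi, and penalty parameter C):
minimize (1/2)*||w||^2 + C*Σi ξi subject to yi*(w·xi + b) >= 1 - ξi and ξi >= 0 for all i.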
What does the first term minimize? Depict all non-zero ξi in the figure below! What is the advantage of the soft margin approach over the linear SVM approach? [5]
The (inverse) width of the margin with respect to the class1 and class2 hyperplanes[1]. Depict [2; 2 errors=0 points]. Can deal with classification problems in which the examples are not linearly separable[2].
b) Referring to the figure above, explain how examples are classified by SVMs! What is the relationship between ξi and example i being classified correctly? [4]
Examples which are above the straight-line hyperplane belong to the round class, and examples below the line belong to the square class [1.5]. An example will be classified correctly if ξi is less than or equal to half of the width of the margin; the width w is the distance between the class1 and class2 hyperplanes.[2.5]
c) How does kNN (k-nearest-neighbor) predict the class label of a new example? [2]
Find the k nearest neighbors of the example which needs to be classified; take a majority vote based on the class labels of the k nearest neighbors found.
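A minimal sketch of this prediction rule (assuming Euclidean distance and training data given as a plain Python list of (feature_vector, class_label) pairs; all names are illustrative):

```python
from collections import Counter
import math

def knn_predict(train, x_new, k):
    """Predict the class of x_new by a majority vote among its k nearest neighbors."""
    # Euclidean distance between two feature vectors
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Keep the k training examples closest to x_new
    neighbors = sorted(train, key=lambda ex: dist(ex[0], x_new))[:k]

    # Majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example: the closest neighbors of [1.4, 1.4] are mostly 'round', so it is classified as 'round'
train = [([0, 0], 'square'), ([0, 1], 'square'), ([1, 1], 'round'), ([2, 2], 'round')]
print(knn_predict(train, [1.4, 1.4], k=3))
```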
d) Assume you want to use a nearest neighbor classifier for a particular classification task. How would you approach the problem of choosing the parameter k of a nearest neighbor classifier? [3]
Use N-fold cross-validation to assess the testing accuracy of the kNN classifier for different k values; choose for your final classifier the k for which the testing accuracy is the highest!
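One way to implement this selection (a sketch assuming scikit-learn is available; load_iris is only a stand-in for the actual training data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset; replace with your own X, y

# Estimate the testing accuracy of kNN for several candidate k values via 10-fold cross-validation
candidate_ks = [1, 3, 5, 7, 9, 11, 15, 21]
cv_accuracy = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
               for k in candidate_ks}

# Choose the k with the highest cross-validated accuracy and train the final classifier with it
best_k = max(cv_accuracy, key=cv_accuracy.get)
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
print(best_k, cv_accuracy[best_k])
```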
e) What can be said about the number of decision boundaries a kNN classifier uses? [2]
Many[2]; as many as N-1 where N is the number of training examples[1 extra point].
f) Assume you use an ensemble approach. What properties should base classifiers have to obtain a highly accurate ensemble classifier? [3]
The classifiers should be diverse; that is, they make different errors.[2]
The classifiers should be somewhat accurate; at least above 50%![1]
g) What role do example weights play for the AdaBoost algorithm? Why does AdaBoost modify weights of training examples? [4]
The weight determines with what probability an example is picked as a training example.[1.5] The key idea of modifying weights is to give high weights to examples that are misclassified, ultimately forcing the classification algorithm to learn a different classifier that classifies those examples correctly, but likely misclassifies other examples.[2.5]
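A sketch of one boosting round's weight update (assuming binary labels in {-1, +1} and the usual exponential update rule; variable names are illustrative):

```python
import numpy as np

def adaboost_update_weights(weights, y_true, y_pred):
    """Increase the weights of misclassified examples and decrease the others.

    weights: current example weights (non-negative, summing to 1)
    y_true, y_pred: true and predicted labels in {-1, +1}
    Returns (new_weights, alpha), where alpha is the base classifier's vote weight.
    """
    # Weighted error of the current base classifier
    err = np.sum(weights[y_true != y_pred])
    err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0) / division by zero
    # alpha is large when the error is small; the classifier must be better than 50% to help
    alpha = 0.5 * np.log((1.0 - err) / err)
    # Misclassified examples (y_true * y_pred = -1) are multiplied by exp(+alpha),
    # correctly classified ones (y_true * y_pred = +1) by exp(-alpha); then renormalize
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return new_weights / new_weights.sum(), alpha
```

The renormalized weights then serve as the distribution with which examples are picked (or weighted) when the next base classifier is trained.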
4) Decision Trees/Classification
Compute the GINI-gain[1] for the following decision tree split (just giving the formula is fine!)[3]:
Parent node (12,4,6) is split into three children: (3,3,0), (9,1,0), and (0,0,6).
G(6/11,2/11,3/11) - (6/22*G(0.5,0.5,0) + 10/22*G(0.9,0.1,0) + 6/22*0)
G(6/11,2/11,3/11) = 1 - (6/11**2 + 2/11**2 + 3/11**2) = 1 - ((36+4+9)/121) = 72/121 = 0.595
The worst GINI for 3 classes is: G(1/3,1/3,1/3) = 0.667
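A short sketch that reproduces this computation (class counts given as tuples; function names are illustrative):

```python
def gini(counts):
    """GINI index of a node with the given class counts, e.g. (12, 4, 6)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent, children):
    """GINI before the split minus the weighted GINI after the split."""
    n = sum(parent)
    return gini(parent) - sum(sum(child) / n * gini(child) for child in children)

# The split from above: (12,4,6) is split into (3,3,0), (9,1,0), (0,0,6)
print(gini((12, 4, 6)))                                          # ~0.595
print(gini_gain((12, 4, 6), [(3, 3, 0), (9, 1, 0), (0, 0, 6)]))  # ~0.377
```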
a) Assume there are 3 classes and 50% of the examples belong to class1, and 25% of the examples belong to class2 and class3, respectively. Compute the entropy of this class distribution, giving the exact number, not only the formula! [2]
H(1/2,1/4,1/4) = 1/2*log2(2) + 2*(1/4)*log2(4) = 1/2 + 1 = 3/2 = 1.5
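The same number can be verified with a few lines (base-2 logarithm, probabilities as in the question):

```python
import math

def entropy(probs):
    """Entropy (in bits) of a class distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5
```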
b) The decision tree learning algorithm is a greedy algorithm; what does this mean? [2]
Seeks the shortest path from the current state to a goal state / makes local decisions[1]; does not backtrack[1]; frequently does not find the optimal solution[1].
c) Assume you learn a decision tree for a dataset that only contains numerical attributes (except the class attribute). What can be said about the decision boundaries that decision trees use to separate the classes? [1]
Axis-parallel lines/hyperplanes (of the form att=value, where att is one attribute of the dataset and value is a floating-point number), where each axis corresponds to one attribute of the dataset.
d) Why is pruning important when using decision trees? What is the difference between pre-pruning and post-pruning? [4]
To come up with a decision tree that uses the correct amount of model complexity to avoid under- and overfitting. [2]
Pre-pruning: directly prevents a tree from growing too much by using stricter termination conditions for the decision tree induction algorithm.
Post-pruning: grows a large tree and then reduces it in size by replacing subtrees with leaf nodes.
[1] (GINI before the split) minus (GINI after the split)