Dr. Eick

COSC 6342 “Machine Learning” Assignment 2, Spring 2009

First Draft

Due: Tuesday, April 21, 11p (electronic submission); problem 9 is due Saturday, April 25, 11p

  1. I---Topic 8

Construct a one-dimensional classification dataset for which the leave-one-out cross-validation error for 1NN is always 1; in other words, the 1NN algorithm never predicts the held-out example correctly.
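A quick way to check any candidate dataset is to run the leave-one-out 1NN evaluation directly. The sketch below (Python is just one possible choice; the values of xs and ys are placeholders, not a proposed solution) shows that check:

```python
# Minimal sketch: leave-one-out 1NN error on a 1-D dataset.
# xs and ys are placeholder values only; this placeholder does NOT satisfy
# the requirement -- your dataset should drive the error rate to 1.0.
xs = [0.0, 1.0, 2.0, 3.0]          # 1-D feature values
ys = ['A', 'A', 'B', 'B']          # class labels

errors = 0
for i in range(len(xs)):
    # hold out example i and find its nearest neighbor among the remaining examples
    rest = [(abs(xs[i] - xs[j]), ys[j]) for j in range(len(xs)) if j != i]
    _, predicted = min(rest)
    if predicted != ys[i]:
        errors += 1

print("LOO 1NN error rate:", errors / len(xs))
```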

  1. G---Topic 8 (don’t start solving this problem too late!!)

a) Download the arsenic dataset arsenic_ds1_D1.txt, ignoring the class label!

b) Compute the average 5-nearest-neighbor distance, called d5, using Euclidean distance (one way to compute d5 and the densities in c) and d) is sketched after part g))

c) Compute and visualize the Gaussian kernel density function for σ²=0.5*d5, σ²=d5, σ²=2*d5 (see Topic8d.ppt)

d) Compute and visualize the k-NN density function for k=3, k=5, and k=7 (see Topic8d.ppt)

e) Analyze the differences among the 6 density functions you created!

f) Explain the differences between the density functions (try your best!)!

g) Submit a report that contains your software, visualizations, and answers to questions e) and f)!
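One possible way to carry out parts b) through d) is sketched below. It assumes the data file contains one numeric value per line with the class label already removed, uses the σ² parameterization from c), and estimates the k-NN density as k/(2·n·r_k(x)); the conventions in Topic8d.ppt may differ, so adapt as needed.

```python
# Sketch under assumptions: average 5-NN distance, 1-D Gaussian kernel density,
# and 1-D k-NN density. Normalization/bandwidth conventions may differ from Topic8d.ppt.
import numpy as np
import matplotlib.pyplot as plt

data = np.atleast_1d(np.loadtxt("arsenic_ds1_D1.txt"))   # assumes one value per line, no class label

# b) average 5-nearest-neighbor distance d5 (Euclidean distance in 1-D is |x - y|)
dists = np.abs(data[:, None] - data[None, :])             # pairwise distance matrix
dists.sort(axis=1)                                        # column 0 is each point's distance to itself (0)
d5 = dists[:, 1:6].mean()                                 # average over each point's 5 nearest neighbors
print("d5 =", d5)

xs = np.linspace(data.min() - 3 * d5, data.max() + 3 * d5, 500)

# c) Gaussian kernel density estimate, parameterized here by sigma^2
def gaussian_density(x, sigma2):
    return np.mean(np.exp(-(x - data) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

# d) k-NN density estimate: f(x) ~ k / (2 * n * r_k(x)), r_k(x) = distance to the k-th nearest point
def knn_density(x, k):
    r_k = max(np.sort(np.abs(data - x))[k - 1], 1e-12)    # guard against division by zero
    return k / (2 * len(data) * r_k)

for sigma2 in (0.5 * d5, d5, 2 * d5):
    plt.plot(xs, [gaussian_density(x, sigma2) for x in xs], label=f"Gaussian, sigma^2={sigma2:.3f}")
for k in (3, 5, 7):
    plt.plot(xs, [knn_density(x, k) for x in xs], linestyle="--", label=f"{k}-NN")
plt.legend()
plt.show()
```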

  1. I---Topic 10

Assume the following dataset is given with two nominal attributes A and B, and 3 different classes C1, C2, and C3. Compute the information gain for A and B (a sketch for checking this computation follows the table). Based on your answers to the previous question, which test should be used as the root of a decision tree?

A / B / Class
1 / 2 / C3
1 / 1 / C3
1 / 2 / C1
1 / 2 / C1
2 / 2 / C1
2 / 1 / C2
3 / 1 / C2
3 / 1 / C2
3 / 1 / C2
3 / 2 / C2
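
If you want to double-check your hand computation of the information gains, the short sketch below recomputes entropy and information gain directly from the table above; it uses only the data shown, with no additional assumptions.

```python
# Sketch: information gain of attributes A and B for the 10-example table above.
from collections import Counter
from math import log2

rows = [(1, 2, "C3"), (1, 1, "C3"), (1, 2, "C1"), (1, 2, "C1"), (2, 2, "C1"),
        (2, 1, "C2"), (3, 1, "C2"), (3, 1, "C2"), (3, 1, "C2"), (3, 2, "C2")]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(attr_index):
    all_labels = [r[2] for r in rows]
    gain = entropy(all_labels)
    for value in set(r[attr_index] for r in rows):
        subset = [r[2] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print("Gain(A) =", information_gain(0))
print("Gain(B) =", information_gain(1))
```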
  1. I---Topic 13+14

a) Support vector machines maximize margins when creating hyperplanes. What is the motivation for doing that? Why are large margins desirable?

b) What role does C play in the Soft Margin Hyperplane Approach (section 10.9.3 of the textbook), and what do slack variables measure? Assume the hyperplane obtained for a dataset of 100 examples has the following values for the slack variables: ξ1=2, ξ2=3, ξ4=0.8, ξ17=0.2, and ξi=0 for all other examples in the dataset; what does this mean?

c) Why do most support vector machines map examples to a higher-dimensional space?

d) What is a support vector? If we know what the support vectors are, how can this knowledge be used to speed up support vector learning?

e) What are kernel functions? Why are kernel functions popular in conjunction with support vector machines, and what is their contribution to speeding up the learning process?
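
As a small illustration related to question e), the sketch below checks numerically that a degree-2 polynomial kernel (x·z)² equals an ordinary dot product in an explicitly constructed 3-dimensional feature space, without that space ever being built when the kernel is used; the specific kernel and feature map are just an example, not part of the assignment.

```python
# Sketch: the degree-2 polynomial kernel K(x, z) = (x . z)^2 equals a dot product
# in the explicit feature space phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2).
import math

def poly_kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

x, z = (1.0, 2.0), (3.0, 4.0)
lhs = poly_kernel(x, z)                                  # computed in 2 dimensions
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))         # computed in 3 dimensions
print(lhs, rhs)                                          # both are 121 (up to floating-point rounding)
```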
