CIS526: Homework 6

Assigned: November 06, 2007

Due: November 12, 2007, in class

Homework Policy

All assignments are INDIVIDUAL! You may discuss the problems with your colleagues, but you must solve the homework by yourself. Please acknowledge all sources you use in the homework (papers, code, or ideas from someone else). Assignments should be submitted in class on the day they are due. No credit is given for assignments submitted late, unless you have a medical problem.

Reading Assignment

Read the following two papers and write a ½ page report for each. You should report on:

  • the motivation for the paper,
  • the approach and methodology,
  • the main experimental results and the conclusions drawn from them,
  • your opinion on the strengths and weaknesses of the paper.

To get full credit, do not copy sentences from the paper; summarize it in your own words.

PAPER 1 (Presented by Siyuan Ren in class):

P. Mitra, C. Murthy, and S. Pal, “A Probabilistic Active Support Vector Learning Algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 413–418, 2004.

PAPER 2 (Presented by Michael Baranthan in class):

K. Q. Weinberger, F. Sha, and L. K. Saul. “Learning a kernel matrix for nonlinear dimensionality reduction.” In Proceedings of the Twenty First International Conference on Machine Learning (ICML-04), pages 839–846, Banff, Canada, 2004.

Both papers can be found through Google (if you are on a Temple-owned computer) or through the Temple University Library site.

Programming Assignment: SVM Classification

This problem is designed as a competition: you will be given a labeled data set to develop a classifier and an unlabeled data set on which you should use your classifier to predict the class of each example. You will submit your predictions and I will check their accuracy. Your score will depend on the accuracy you achieve. Students with the most accurate predictions will get extra credit.

a)

Download the SPIDER library for MATLAB. It is a very nice collection of machine learning algorithms that you can get from:

Read the tutorial and familiarize yourself with SPIDER's functionality.

b)

Download DataSet_1 from

It consists of two files: data1labeled.mat and data1test.mat. The first contains a 1000×4 matrix with 1000 examples; the first 3 columns are attributes and the last column is the class label. The second contains a 1000×3 matrix with 1000 examples and three attributes, but no class labels. Both data sets were obtained using the same data generator.

Your goal is to train a support vector machine (SVM) using SPIDER on data1labeled.mat and to use the SVM to predict class labels on data1test.mat. To build a good SVM you will need to select an appropriate value of the slack parameter C and an appropriate kernel function (hint: a Gaussian kernel should work well; in this case you also need to select the kernel width). The selection can be done by splitting the labeled data into training and validation sets, training on the former with each candidate parameter choice, and keeping the choice that gives the highest accuracy on the latter.
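
The selection procedure described above can be sketched as follows. This is only a minimal illustration written in plain Python rather than MATLAB/SPIDER, under stated assumptions: the names holdout_split, grid_search, and fit_and_score are hypothetical, the 70/30 split is an arbitrary choice, and fit_and_score stands for whatever routine you write to train an SVM with the given C and kernel width and return its accuracy on the validation set.

```python
import itertools
import random


def holdout_split(rows, val_fraction=0.3, seed=0):
    """Shuffle the labeled rows and split them into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def grid_search(train, val, c_values, width_values, fit_and_score):
    """Try every (C, width) pair and return (best_accuracy, best_C, best_width).

    fit_and_score(train, val, C, width) is a hypothetical callback: it should
    train an SVM on `train` and return its accuracy on `val`.
    """
    best = None
    for c, width in itertools.product(c_values, width_values):
        acc = fit_and_score(train, val, c, width)
        if best is None or acc > best[0]:
            best = (acc, c, width)
    return best
```

Candidate values are often spaced on a logarithmic scale, for example C in {0.01, 0.1, 1, 10, 100}; that grid is an assumption, not something the assignment prescribes. Once the best pair is found, a reasonable final step is to retrain on all labeled data with those parameters before predicting on the test file.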

Deliverables: a description of your parameter selection experiments, a listing of the MATLAB code you used, and a file data1prediction.mat containing a 1000×1 vector of your class predictions for the examples in data1test.mat.

c)

Download DataSet_2 from

It consists of two files: data2labeled.mat and data2test.mat. The first contains a 2300×58 matrix with 2300 examples; the first 57 columns are attributes and the last column is the class label. The second contains a 2301×57 matrix with 2301 examples and 57 attributes, but no class labels. Both data sets were obtained using the same data generator.

Your goal is the same as in part b). Observe that attribute preprocessing (normalization, or removal of uninformative attributes using the feature selection algorithms in SPIDER) might be helpful. If you do this, be very careful when applying the resulting SVM to the test data: the test examples must go through exactly the same preprocessing as the training examples. It is also possible that linear or polynomial kernels work better than the Gaussian kernel.
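
One common way to stay careful with normalization is to compute the scaling statistics on the training data only and then reuse those same numbers on the validation and test data. The sketch below illustrates this in plain Python rather than MATLAB/SPIDER; fit_scaler and apply_scaler are hypothetical names, and z-score scaling is just one possible choice of normalization.

```python
import statistics


def fit_scaler(train_rows):
    """Compute the per-attribute mean and standard deviation from training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; a constant column gets 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Scale rows with statistics that were fitted on the training data."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

The rows passed in should contain attribute columns only, with the class label column removed first. Calling fit_scaler on the test file instead would scale it differently from the data the SVM was trained on, which is the mistake to avoid.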

Deliverables: a description of your parameter selection experiments, a listing of the MATLAB code you used, and a file data2prediction.mat containing a 2301×1 vector of your class predictions for the examples in data2test.mat.

Good luck!!!