Data Analysis and Visualization

Practical work

1.  STAR (STate of the Art) in Data Analysis and Visualization

The student will write a state-of-the-art report (STAR) on data analysis and visualization in one of these areas:

·  Bioinformatics: Genomics.

·  Bioinformatics: Proteomics.

·  Neuroscience.

·  Advertisement.

·  Census.

·  Any other scientific area justified by the student.

For the selected area, the report must also propose possible future lines of research focused on data analysis and visualization.

The report will have at least the following sections:

·  Introduction.

·  Summary of the collected papers.

·  Methodologies and techniques used.

·  Application to datasets.

·  Results achieved.

·  Future work.

·  Other relevant information.

·  References.

·  Personal discussion.

Contact: Santiago González and Angel Rodríguez

2.  Data Analysis using a Stereo Viewer

The student must implement a data mining application with an interactive 3D viewer that runs on a laptop with a standard graphics card. No special hardware is required to implement the system.

Data Analysis

The Breast Cancer dataset contains data on 286 patients with this disease and records whether the cancer recurred after diagnosis and treatment. Each instance has 9 attributes plus the reference class. The required system will:

·  Represent the instances of the dataset in 3D. This requires transforming a 9-dimensional problem (one dimension per attribute) into a 3D space. The transformation may follow any of the simple techniques presented in class or a more complex technique such as the one described by Kandogan.

·  Implement and use the KNN (K Nearest Neighbour) method to estimate the class of an instance from its K nearest patients.

·  Interactively select X instances and compute the percentage that are correctly classified, comparing the estimated class of each instance with its correct class.
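The three requirements above can be sketched in a few lines of Python with NumPy. This is only an illustrative outline, not a prescribed solution. It assumes the attributes have already been encoded as numbers, it extends Kandogan's 2D star coordinates to 3D by spreading one axis per attribute over a sphere (an assumption made here, not part of the original technique), and all function names (`star_axes_3d`, `project_star`, `knn_predict`, `accuracy_on_selection`) are hypothetical.

```python
import numpy as np

def star_axes_3d(n_attrs):
    """One unit vector per attribute, spread over a sphere (assumed 3D layout)."""
    i = np.arange(n_attrs) + 0.5
    z = 1.0 - 2.0 * i / n_attrs               # heights evenly spaced in (-1, 1)
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i    # golden-angle steps around the z axis
    r = np.sqrt(1.0 - z * z)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta), z))

def project_star(X, axes):
    """Scale each attribute to [0, 1], then sum the scaled axis vectors."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # constant columns would divide by zero
    return ((X - lo) / span) @ axes

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    dist = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def accuracy_on_selection(X, y, selected, k=3):
    """Percentage of selected rows whose KNN estimate equals the true class.
    Each selected row is excluded from its own neighbour search."""
    hits = 0
    for idx in selected:
        mask = np.arange(len(X)) != idx
        hits += int(knn_predict(X[mask], y[mask], X[idx], k) == y[idx])
    return 100.0 * hits / len(selected)

# Toy stand-in data (NOT the real dataset): 8 instances, 9 numeric attributes.
offsets = 0.1 * np.arange(4)[:, None] * np.ones((1, 9))
X = np.vstack((offsets, 10.0 + offsets))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

points_3d = project_star(X, star_axes_3d(9))  # coordinates to hand to the viewer
print(points_3d.shape)
print(accuracy_on_selection(X, y, selected=[0, 5, 7], k=3))
```

The assignment does not say whether KNN should run on the original 9 attributes or on the projected 3D points; the sketch works with either matrix, so both options can be compared.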

Contact: Santiago González

Visualization

To develop the user interface and the visualization engine, the student may use any high-level graphics tool such as Unity3D, Qt, Coin3D, etc.

The student may choose the OS on which the application will run (Windows, Mac, Linux). Portable code will be viewed favourably in the evaluation, although portability is not a requirement. The evaluation will also take into account whether the 3D viewer parameters can be reconfigured interactively.
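As an illustration of which viewer parameters might be exposed for interactive reconfiguration, the sketch below computes the two cameras of a stereo pair using the off-axis (asymmetric frustum) approach, one common way to render stereo. It is a minimal outline under stated assumptions: the function name `stereo_frustums` and its parameters (field of view, aspect ratio, near plane, focal length, eye separation) are hypothetical, and how the resulting values are passed on depends on the chosen graphics tool.

```python
import math

def stereo_frustums(fov_deg, aspect, near, focal_len, eye_sep):
    """Return {eye: (camera_x_offset, left, right, bottom, top)}.

    Both cameras look straight ahead; each frustum is skewed so the two
    views coincide on the plane at distance focal_len (zero parallax).
    """
    half_h = near * math.tan(math.radians(fov_deg) / 2.0)  # near-plane half height
    half_w = aspect * half_h                               # near-plane half width
    shift = 0.5 * eye_sep * near / focal_len               # skew on the near plane
    return {
        "left":  (-0.5 * eye_sep, -half_w + shift, half_w + shift, -half_h, half_h),
        "right": (+0.5 * eye_sep, -half_w - shift, half_w - shift, -half_h, half_h),
    }

# Example values chosen only for illustration.
for eye, params in stereo_frustums(60.0, 16 / 9, 0.1, 5.0, 0.065).items():
    print(eye, params)
```

The left, right, bottom and top values are expressed on the near plane, in the form expected by a glFrustum-style projection call. Changing `eye_sep` or `focal_len` at run time and recomputing both frustums is one simple way to let the user adjust the stereo effect.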

Contact: Angel Rodríguez

Documentation

The documentation will include:

·  Introduction, including a general description of how the application works.

·  Data mining techniques or technologies used in the case study.

·  Mathematical models used.

·  Application description.

·  Libraries used.

·  Other relevant information.

·  References.

·  Personal discussion.

References

1.  Breast Cancer Data Set. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets/Breast+Cancer

2.  Eser Kandogan: “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions”

3.  Paul Bourke: Stereo rendering. http://local.wasp.uwa.edu.au/~pbourke/projection/stereorender/

4.  Stereo tutorial: http://www.captain3d.com/stereo/html/tutorial.html

Data analysis and visualization. Advanced computing for science and engineering.