Project Summary for 2011 SULI Program at SLAC
July 1, 2011
Ashley Marie Parker
Mentor: Deborah Bard
1) Is a “blended” object two or more superimposed objects that have different redshifts?
2) A blended object has “multiple spectral peaks”? Don’t ordinary stars have many peaks?
3) Is the learning done on blended data or only on unblended data?
4) How much spectral data does SDSS have?
The goal of this project is to use data from the Sloan Digital Sky Survey (SDSS), together with machine learning techniques, to increase the reliability of photometric redshift analysis for “blended” galaxies. This is, to our knowledge, an original research project. It is intended to yield a photometric redshift determination method for “blended” galaxies that can be implemented in the next generation of sky survey databases, namely that of the Large Synoptic Survey Telescope (LSST).
The SDSS's newest data release, DR8, covers approximately one third of the sky and includes all photometric measurements that will be taken with the SDSS imaging camera. A photometric redshift is measured using photometry, a method of observing the light from an object through several filters and using the overall magnitude in each filter to determine the redshift. Photometry is much less time consuming than the alternative method, spectroscopic redshift determination. Measuring a redshift spectroscopically requires collecting significantly more light from the object, so that the full spectrum can be seen rather than just the intensity in each filter. This makes the spectroscopic method far more accurate, but also more time consuming and therefore less practical for a large-scale data set. For this project, only galaxies whose redshift has been determined by both photometric and spectroscopic methods will be analyzed, so that an accurate redshift measurement exists against which to test results from new machine learning techniques.
A “blended” object is defined by the SDSS database as a light source (a galaxy, star, etc.) for which spectral analysis shows multiple spectral peaks in the single light source, meaning that multiple objects are present. Within the database, the “frames pipeline” analyzes the data to determine whether a light-emitting object is “blended”. If so, a de-blending algorithm separates the multiple objects into “child” objects whose spectra add up to the “parent” image. Objects flagged as “blended” are given a unique parentID number greater than zero; for objects that are not “blended”, parentID is set to zero.
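The parentID convention described above can be sketched in a few lines of Python. This is only an illustration of the selection rule (parentID greater than zero means “blended”, zero means not “blended”); the record layout, the helper name split_by_blend_flag and the sample values are assumptions made for the example and are not SDSS data.

```python
from typing import Dict, List, Tuple


def split_by_blend_flag(rows: List[Dict[str, int]]) -> Tuple[List[dict], List[dict]]:
    """Return (blended, unblended) using the parentID rule from the text."""
    blended = [r for r in rows if r["parentID"] > 0]
    unblended = [r for r in rows if r["parentID"] == 0]
    return blended, unblended


# Made-up records for illustration only (hypothetical IDs, not SDSS objects).
sample = [
    {"objID": 1, "parentID": 0},
    {"objID": 2, "parentID": 17},
    {"objID": 3, "parentID": 0},
]

blended, unblended = split_by_blend_flag(sample)
print("blended:", [r["objID"] for r in blended])
print("unblended:", [r["objID"] for r in unblended])
```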
The initial step of the project was to learn SQL (Structured Query Language), which is used to write the queries that acquire data from SDSS. The large volume of data available in the SDSS database makes it ideal for “training” a machine learning program, since the example data should show nearly every variation in galaxy type. This project made use of CasJobs DR8, a program that uses SQL queries to acquire large amounts of data from the most recent data release. The queries requested specific useful quantities: the spectroscopic redshift measurement; two separate photometric redshift measurements, made using the “random forest” and “robust fit” methods; magnitudes in the bands u, g, r, i, z; parentID; and uncertainty measurements for all relevant quantities.
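As a rough sketch of how the downloaded quantities might be held in memory, the Python below reads a small comma-separated table into one record per galaxy. It assumes the query results are exported as CSV text; the column names (objID, z_spec, z_phot_rf, z_phot_fit, mag_u and so on), the GalaxyRecord class and the two sample rows are hypothetical and would need to be matched to the actual query output. Uncertainty columns are left out to keep the example short.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GalaxyRecord:
    obj_id: int
    parent_id: int                 # 0 means not "blended"
    z_spec: float                  # spectroscopic redshift
    z_phot_rf: float               # photometric redshift, "random forest" method
    z_phot_fit: float              # photometric redshift, "robust fit" method
    mags: Tuple[float, ...]        # magnitudes in u, g, r, i, z


def load_records(text: str) -> List[GalaxyRecord]:
    """Parse CSV text with the (assumed) column names into GalaxyRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(GalaxyRecord(
            obj_id=int(row["objID"]),
            parent_id=int(row["parentID"]),
            z_spec=float(row["z_spec"]),
            z_phot_rf=float(row["z_phot_rf"]),
            z_phot_fit=float(row["z_phot_fit"]),
            mags=tuple(float(row["mag_" + band]) for band in "ugriz"),
        ))
    return records


# Made-up values for illustration only, not SDSS measurements.
SAMPLE_CSV = """objID,parentID,z_spec,z_phot_rf,z_phot_fit,mag_u,mag_g,mag_r,mag_i,mag_z
1001,0,0.102,0.110,0.098,19.8,18.4,17.6,17.2,16.9
1002,42,0.251,0.300,0.214,21.3,19.9,18.8,18.3,18.0
"""

records = load_records(SAMPLE_CSV)
for rec in records:
    print(rec.obj_id, rec.parent_id, rec.z_spec, rec.mags)
```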
Data was requested for all objects of type “galaxy” and downloaded for use in the ROOT data analysis framework or possibly WEKA. Data was requested for both “blended” and unblended objects so that it can be determined whether the photometric redshift measurements for blended galaxies are less accurate than those for unblended objects. It is hypothesized that the photometric redshift measurements will be less precise for parentID > 0. If this is the case, there will be an investigation into which of the predetermined photometric redshift determination methods is most accurate.
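One possible way to make the comparison between “blended” and unblended galaxies concrete is to compute, for each group, the scatter of the photometric redshift about the spectroscopic one. The Python sketch below uses the root-mean-square of (z_phot - z_spec) / (1 + z_spec); this particular statistic is an assumption chosen for illustration (the project summary does not name one), and the redshift pairs are made-up values rather than SDSS measurements.

```python
import math
from typing import List, Tuple


def normalized_residuals(pairs: List[Tuple[float, float]]) -> List[float]:
    """For (z_phot, z_spec) pairs, return (z_phot - z_spec) / (1 + z_spec)."""
    return [(z_phot - z_spec) / (1.0 + z_spec) for z_phot, z_spec in pairs]


def rms(values: List[float]) -> float:
    """Root-mean-square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Made-up (z_phot, z_spec) pairs for illustration only.
unblended_pairs = [(0.10, 0.10), (0.21, 0.20), (0.29, 0.30)]
blended_pairs = [(0.15, 0.10), (0.26, 0.20), (0.22, 0.30)]

rms_unblended = rms(normalized_residuals(unblended_pairs))
rms_blended = rms(normalized_residuals(blended_pairs))

print("RMS scatter, unblended:", rms_unblended)
print("RMS scatter, blended:  ", rms_blended)
```

Running the same calculation once per predetermined method (“random forest” and “robust fit”) would give one number per method and group, which is one simple basis for deciding which method is most accurate.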
After the best predetermined photometric redshift method has been identified, machine learning techniques will be applied to the acquired data in order to find a more accurate determination method for “blended” galaxies, assuming one exists. If this study proves useful, there will also be an investigation into “blending” between other object types, such as when stars and galaxies are blended. The hope is that this work will be useful for the future LSST database, which will not contain spectroscopic redshift measurements for all objects and will therefore need to rely on the photometric method.
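To illustrate what “training” on galaxies with known spectroscopic redshifts could look like, the sketch below predicts a redshift from the five magnitudes using k-nearest-neighbour regression. This technique is a stand-in chosen only because it fits in a few lines; the summary does not say which learning method will be used. The function name knn_predict, the value of k and the training values are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

# Each training example pairs the (u, g, r, i, z) magnitudes with a known z_spec.
TrainingSet = List[Tuple[Sequence[float], float]]


def knn_predict(train: TrainingSet, query_mags: Sequence[float], k: int = 3) -> float:
    """Average the z_spec of the k training galaxies closest in magnitude space."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query_mags))
    nearest = by_distance[:k]
    return sum(z_spec for _, z_spec in nearest) / len(nearest)


# Made-up training values for illustration only, not SDSS measurements.
train = [
    ((19.8, 18.4, 17.6, 17.2, 16.9), 0.10),
    ((21.3, 19.9, 18.8, 18.3, 18.0), 0.25),
    ((20.5, 19.0, 18.1, 17.7, 17.4), 0.17),
    ((22.0, 20.6, 19.4, 18.8, 18.5), 0.32),
]

query = (20.4, 19.1, 18.0, 17.6, 17.3)
print("predicted redshift:", knn_predict(train, query, k=2))
```

In practice the prediction would be checked against spectroscopic redshifts held out from training, separately for “blended” and unblended galaxies.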