Supernova Recognition using Support Vector Machines

Raquel Romano

Large digital sky surveys are a rapidly growing source of astronomical data and are creating novel opportunities for scientific research, such as measuring the changing expansion rate of the universe via observations of Type Ia supernovae. However, searching large sets of digital imagery for rare celestial objects poses enormous computational challenges. Statistical learning techniques offer powerful, adaptive methods for the automatic detection, recognition, and classification of rare objects in digital imagery. This talk presents a novel application of supervised learning, in the form of Support Vector Machines (SVMs), to the recognition of Type Ia supernovae from geometric and photometric measurements (features) of hundreds of thousands of subimages per night. We address several difficulties inherent to digital sky surveys that preclude straightforward application of standard learning methods: 1) noisy and corrupt imagery resulting in high levels of feature uncertainty, 2) feature measurements with heavy-tailed, peaked distributions, 3) extremely imbalanced and overlapping positive and negative data sets, i.e., as few as 1 out of every 10,000 sample subimages contains an actual supernova, and 4) the need to reach very high supernova recognition rates while reducing the burdensome human workload of manually eliminating false candidates. We present results on several data sets as well as the current impact of our analysis on a large-scale supernova survey, the Nearby Supernova Factory at LBNL, which uses an SVM decision value to screen up to 600,000 potential supernovae per night. This capability has great potential to improve the search efficiency of both this and other digital sky surveys that are slated to collect orders of magnitude more imagery in the near future.
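
The trade-off in difficulty 4, keeping supernova recognition very high while cutting the number of false candidates passed to human scanners, can be illustrated with a minimal Python sketch. It assumes each candidate already has a real-valued decision value from a trained classifier, and it picks a cutoff from labeled examples so that a target fraction of known supernovae is retained. The function names (threshold_for_recall, screen), the target rate, and the synthetic Gaussian scores are hypothetical choices for illustration only; the abstract does not state how the survey sets its cutoff.

```python
import math
import random


def threshold_for_recall(scores, labels, target_recall):
    """Return the largest cutoff t such that keeping candidates with
    score >= t retains at least `target_recall` of the labeled positives."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must survive the cut to reach the target.
    n_keep = max(1, math.ceil(target_recall * len(positives)))
    return positives[n_keep - 1]


def screen(scores, labels, threshold):
    """Return (recall on positives, fraction of negatives still passed on)."""
    kept_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    kept_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return kept_pos / n_pos, (kept_neg / n_neg if n_neg else 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for decision values (assumption, not survey data):
    # 10 positives among 100,000 negatives, i.e. roughly 1 in 10,000.
    labels = [1] * 10 + [0] * 100_000
    scores = ([rng.gauss(2.0, 1.0) for _ in range(10)]
              + [rng.gauss(-2.0, 1.0) for _ in range(100_000)])
    cutoff = threshold_for_recall(scores, labels, target_recall=0.9)
    recall, passed = screen(scores, labels, cutoff)
    print(f"cutoff={cutoff:.3f} recall={recall:.2f} negatives passed={passed:.4%}")
```

In practice the cutoff would be chosen on held-out labeled data rather than on the data being screened, since a cutoff tuned and evaluated on the same candidates gives an optimistic recall estimate.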