The alternating descent conditional gradient method

A ubiquitous prior in modern statistical signal processing asserts that an observed signal is the noisy measurement of a few weighted sources. In other words, compared to the entire dictionary of possible sources, the set of sources actually present is sparse. In many cases of practical interest the sources are parameterized and the measurement of multiple weighted sources is linear in their individual measurements.

As a concrete example, consider the idealized task of identifying the aircraft that gave rise to an observed radar signal. The sources are the aircraft themselves, and each is parameterized by, perhaps, its position and velocity relative to the radar detectors. The sparse inverse problem is to recover the number of aircraft present, along with the parameters of each.

sparse inverse problem
   \text{minimize}_{N,\,\theta,\,w} \quad \ell\Big(\sum_{i=1}^N w_i \psi(\theta_i) - y\Big) \quad \text{subject to} \quad \|w\|_1 \le \tau.

We want to estimate the parameters $\theta_i$, their number $N$, and the corresponding scalar weights $w_i$. Here $y$ is the observed signal, $\psi(\theta)$ is the measurement of a single source with parameter $\theta \in \Theta$, $\ell$ is a differentiable convex loss, and $\tau > 0$ bounds the total weight. The problem is nonconvex because of $N$ and the $\theta_i$.
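To make this concrete in code, here is a minimal sketch of the objective, assuming a squared loss $\ell(r) = \|r\|_2^2$ and a hypothetical one-dimensional Gaussian source model for $\psi$; both choices are illustrative stand-ins, not part of the method.

    import numpy as np

    def psi(theta, t):
        # hypothetical source model: a unit-height Gaussian bump centered at
        # theta, sampled at the measurement locations t
        return np.exp(-0.5 * (t - theta) ** 2)

    def objective(w, thetas, y, t):
        # ell(sum_i w_i psi(theta_i) - y) with the squared loss ell(r) = ||r||^2
        resid = sum(wi * psi(th, t) for wi, th in zip(w, thetas)) - y
        return resid @ resid

The constraint $\|w\|_1 \le \tau$ enters through the algorithm below rather than through this function.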

algorithm

ADCG interleaves steps from three algorithms: matching pursuit, $\ell_1$-constrained minimization (basis pursuit), and nonconvex local optimization.
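Before describing each step, here is a minimal sketch of the outer loop, written in terms of the hypothetical helpers solve_weights (step 1), best_new_parameter (step 2), and local_descent (step 3) sketched below; the squared loss is assumed throughout.

    import numpy as np

    def adcg(y, t, tau, lo, hi, max_iters=20):
        # outer ADCG loop: re-fit weights, add one source, locally refine
        thetas, w = [], np.zeros(0)
        for _ in range(max_iters):
            if thetas:
                w = solve_weights(thetas, y, t, tau)                # step 1
            resid = sum(wi * psi(th, t) for wi, th in zip(w, thetas)) - y
            theta_new = best_new_parameter(2.0 * resid, t, lo, hi)  # step 2
            thetas, w = thetas + [theta_new], np.append(w, 0.0)
            w, thetas = local_descent(w, thetas, y, t, tau)         # step 3
            thetas = list(thetas)
        return w, thetas

In practice one would also track the objective and stop once adding a source no longer decreases it.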

step 1: basis pursuit

The algorithm state is a list of estimated parameters, $\theta_1, \ldots, \theta_k$. At the beginning of each iteration the algorithm estimates the corresponding weights $w_1, \ldots, w_k$ by solving the following finite-dimensional convex problem:

   \text{minimize}_w \quad \ell\Big(\sum_i w_i \psi(\theta_i) - y\Big) \quad \text{subject to} \quad \|w\|_1 \le \tau.

Note that some of the resulting weights $w_i$ may be zero, in which case we can discard the corresponding parameters.
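For the squared loss this subproblem is $\ell_1$-constrained least squares, and any convex solver applies. Here is a minimal sketch using projected gradient descent with a standard sort-based projection onto the $\ell_1$ ball; the helper names and the Gaussian $\psi$ are the same illustrative assumptions as above.

    import numpy as np

    def psi(theta, t):
        # hypothetical Gaussian source model, as in the sketch above
        return np.exp(-0.5 * (t - theta) ** 2)

    def project_l1(w, tau):
        # Euclidean projection of w onto the l1 ball of radius tau
        if np.abs(w).sum() <= tau:
            return w
        u = np.sort(np.abs(w))[::-1]
        css = np.cumsum(u)
        k = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
        lam = (css[k] - tau) / (k + 1.0)
        return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

    def solve_weights(thetas, y, t, tau, iters=500):
        # projected gradient descent on ||Phi w - y||^2 over the l1 ball
        Phi = np.stack([psi(th, t) for th in thetas], axis=1)
        step = 0.5 / np.linalg.norm(Phi, 2) ** 2  # 1/L for this smooth loss
        w = np.zeros(len(thetas))
        for _ in range(iters):
            grad = 2.0 * Phi.T @ (Phi @ w - y)
            w = project_l1(w - step * grad, tau)
        return w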

step 2: matching pursuit

In this step we find a single new parameter to add to our set of candidate parameters. If we let $g = \nabla \ell\big(\sum_i w_i \psi(\theta_i) - y\big)$, the parameter we add is the solution to the following matching pursuit problem:

   \text{maximize}_\theta \quad |\psi(\theta)^T g|.

Note that this problem is nonconvex in $\theta$, but its dimension does not grow with each iteration. In many applications $\Theta$ is low-dimensional, and the matching pursuit problem is readily solved by gridding followed by local gradient-based refinement. In other applications, such as matrix completion, special structure in $\psi$ allows for efficient computation of the matching pursuit step.
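For a one-dimensional parameter space $\Theta = [lo, hi]$, the grid-then-refine strategy might look like the following sketch (again with the hypothetical Gaussian $\psi$).

    import numpy as np
    from scipy.optimize import minimize_scalar

    def psi(theta, t):
        # hypothetical Gaussian source model, as above
        return np.exp(-0.5 * (t - theta) ** 2)

    def best_new_parameter(g, t, lo, hi, grid_size=200):
        # maximize |psi(theta)^T g|: coarse grid search, then a local polish
        grid = np.linspace(lo, hi, grid_size)
        scores = np.array([abs(psi(th, t) @ g) for th in grid])
        th0 = grid[scores.argmax()]
        dx = (hi - lo) / (grid_size - 1)
        res = minimize_scalar(lambda th: -abs(psi(th, t) @ g),
                              bounds=(max(lo, th0 - dx), min(hi, th0 + dx)),
                              method="bounded")
        return float(res.x)

For a multidimensional $\Theta$ one would instead run a few gradient ascent steps from the best grid point.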

step 3: nonconvex optimization

Starting with the weights and parameters from step 2, we run a local descent method on the functional

   (w, \theta) \mapsto \ell\Big(\sum_i w_i \psi(\theta_i) - y\Big)

while maintaining the constraint $\|w\|_1 \le \tau$.

In our implementation we use block coordinate descent over $w$ and $\theta$ to take advantage of the convexity in $w$, but this is a design choice. The main idea is to exploit the differentiability of $\psi$ to continuously adjust the parameters.
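A minimal sketch of that block coordinate descent, reusing the hypothetical psi and solve_weights from the step-1 sketch and again assuming the squared loss:

    import numpy as np

    def local_descent(w, thetas, y, t, tau, iters=50, lr=1e-3):
        # block coordinate descent: a gradient step on the parameter block,
        # followed by a convex re-fit of the weight block (as in step 1)
        w, thetas = np.asarray(w, float), np.asarray(thetas, float)
        for _ in range(iters):
            resid = sum(wi * psi(th, t) for wi, th in zip(w, thetas)) - y
            # gradient of the squared loss w.r.t. each theta_i; for the
            # Gaussian model, d psi / d theta = psi(theta, t) * (t - theta)
            grads = np.array([2.0 * wi * (resid @ (psi(th, t) * (t - th)))
                              for wi, th in zip(w, thetas)])
            thetas = thetas - lr * grads
            w = solve_weights(thetas, y, t, tau, iters=50)
        return w, thetas

A fixed step size keeps the sketch short; a line search on the $\theta$ step would be more robust.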

applications

One can apply ADCG to a wide range of problems:

  • superresolution imaging (ADCG is now state-of-the-art for high-density 2D localization in the SMLMS 2016 challenge!)

  • neural spike identification

  • lidar

  • designing radiation therapy

  • matrix completion

  • linear system identification

  • Bayesian experimental design

  • design of numerical quadrature rules

  • fitting mixture models

We give brief descriptions of these problems in the applications section of this website.

paper

For a full explanation of ADCG, please read our preprint:

The Alternating Descent Conditional Gradient Method for Sparse Inverse Problems - N. Boyd, G. Schiebinger, B. Recht.

feedback

If you have any suggestions or have found ADCG useful in your work please let us know!

nick boyd

geoff schiebinger