# The alternating descent conditional gradient method

A ubiquitous prior in modern statistical signal processing asserts that an observed signal is the noisy measurement of a few weighted sources. In other words, compared to the entire dictionary of possible sources, the set of sources actually present is sparse. In many cases of practical interest the sources are parameterized and the measurement of multiple weighted sources is linear in their individual measurements.

As a concrete example, consider the idealized task of identifying the aircraft that lead to an observed radar signal. The sources are the aircraft themselves, and each is parameterized by, perhaps, its position and velocity relative to the radar detectors. The sparse inverse problem is to recover the number of aircraft present, along with each of their parameters.

sparse inverse problem

$\hbox{minimize}_{N, \theta, w} \quad \ell(\sum_{i=1}^N w_i\psi(\theta_i) - y) \quad\hbox{subject to}\quad |w|_1 \le \tau.$

We want to estimate the parameters $$\theta_i$$, their number, $$N$$, and the corresponding scalar weights, $$w_i$$. This problem is nonconvex, due to $$N$$ and $$\theta_i$$.

# algorithm

ADCG combines interleaves steps from three algorithms: matching pursuit, $$\ell_1$$ constrained minimization (basis pursuit), and nonconvex optimization.

## step 1: basis pursuit

The algorithm state is a list of estimated parameters, $$\theta_1, \ldots, \theta_k$$. At the begining of each iteration the algorithm estimates the corresponding weights $$w_1, \ldots, w_k$$ by solving the following finite-dimensional convex problem:

$\hbox{minimize}_w \quad \ell(\sum_i w_i \psi(\theta_i) - y) \quad\hbox{subject to}\quad \|w\|_1 \le \tau.$

Note that some of the resulting $$w$$ may be zero, in which case we can forget the corresponding parameters.

## step 2: matching pursuit

In this step we find a single new parameter to add to our set of candidate parameters. If we let $$g = \nabla \ell(\sum_i w_i \psi(\theta_i) - y)$$, the parameter we add is the solution to the following matching pursuit problem:

$\hbox{maximize}_\theta |\psi(\theta)^Tg|.$

Note that this problem is nonconvex in $$\theta$$, but does not grow in dimensionality with each iteration. In many applications, $$\Theta$$ is low dimensional, and the matching pursuit problem is readily solved by gridding followed by gradient descent. In other applications, such as matrix completion, special structure in $$\psi$$ allows for efficient computation of the matching pursuit step.

## step 3: nonconvex optimization

Starting with the weights and parameters from step 2, we do nonconvex optimization on the functional

$(w, \theta) \mapsto \ell(\sum_i w_i \psi(\theta_i) - y)$

keeping the $$\ell_1$$ constraints.

In our implementation we do block coordinate descent over $$w$$ and $$\theta$$ to take advantage of the convexity over $$w$$, but this is a design choice. The main idea is to use the fact that $$\psi$$ is differentiable to adjust the parameters.

## applications

One can apply ADCG to a wide range of problems:

• superresolution imaging (ADCG is now state-of-the-art in high-density 2D localization! SMLMS 2016 challenge.)

• neural spike identification

• lidar

• matrix completion

• linear system identification

• bayesian experimental design

• design of numerical quadrature rules

• fitting mixture models

We give brief descriptions of these problems in the applications section of this website.