Applications

Superresolution imaging


The diffraction of light imposes a physical limit on the resolution of optical images. However, with prior information about the nature of an image, it is possible to resolve features well below this limit. For an image of a collection of point sources of light, we can attempt to resolve the individual points using ADCG. The parameters $\theta_1,\ldots,\theta_n$ denote the locations of the $n$ point sources (in $\mathbb{R}^2$ or $\mathbb{R}^3$), and $w_i$ denotes the intensity, or brightness, of the $i$th source. The image of the $i$th source is given by $w_i\psi(\theta_i)$, where $\psi$ is the pixelated point spread function of the imaging apparatus.
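
As a concrete illustration, here is a minimal sketch of this forward model, assuming an isotropic Gaussian approximation to the point spread function; the grid size, width $\sigma$, and source locations below are illustrative choices, not part of the framework.

```python
import numpy as np

def psi(theta, grid=32, sigma=1.5):
    """Pixelated image in R^d (d = grid**2) of a unit-intensity point
    source at theta = (x, y). An isotropic Gaussian is an illustrative
    stand-in for the instrument's calibrated point spread function."""
    xs = np.arange(grid)
    X, Y = np.meshgrid(xs, xs)
    g = np.exp(-((X - theta[0]) ** 2 + (Y - theta[1]) ** 2) / (2 * sigma**2))
    return g.ravel() / (2 * np.pi * sigma**2)

# The observed image is the weighted superposition sum_i w_i psi(theta_i);
# these two sources are separated by less than one pixel.
thetas = [(10.2, 14.7), (10.9, 15.1)]
weights = [1.0, 0.8]
image = sum(w * psi(t) for w, t in zip(weights, thetas))
```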

By solving a sparse inverse problem it is possible to localize the point sources better than the diffraction limit — even with extreme pixelization. Astronomers use this framework to deconvolve images of stars to angular resolution below the Rayleigh limit. In biology this tool has revolutionized imaging of subcellular features, and a variant of this framework allows imaging through scattering media such as tissue.

Neural spike identification


Consider the voltage $v$ recorded by an extracellular electrode implanted in the vicinity of a population of neurons. Suppose that this population contains $K$ types of neurons, and that when a neuron of type $k$ fires at time $t \in \mathbb{R}$, an action potential of the form $\psi(t,k)$ is recorded. Here $\psi : \mathbb{R} \times \{1,\ldots,K\} \to \mathbb{R}^d$ maps a firing time and neuron type to a vector of voltage samples. If we denote the parameters of the $i$th neuron by $\theta_i = (t_i, k_i)$, then the total voltage $v \in \mathbb{R}^d$ can be modeled as a superposition of these action potentials:

$$ v = \sum_{i=1}^n w_i\psi(\theta_i). $$

Here the weights $w_i > 0$ can encode the distance between the $i$th neuron and the electrode. The sparse inverse problem in this application is to recover the parameters $\theta_1,\ldots,\theta_n$ and weights $w_1,\ldots,w_n$ from the voltage signal $v$.
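
A toy version of this forward model, with a made-up action-potential shape (in practice the templates $\psi(\cdot, k)$ would be measured for the recording setup):

```python
import numpy as np

d, K = 200, 3                     # d voltage samples, K neuron types
ts = np.arange(d, dtype=float)    # sample times

def psi(theta):
    """psi(t, k) in R^d: the recorded action potential of a type-k neuron
    firing at (possibly fractional) time t. The biphasic shape below is a
    hypothetical template, differentiable in t."""
    t, k = theta
    widths = [2.0, 4.0, 8.0]      # one characteristic width per type
    s = ts - t
    return -s * np.exp(-s**2 / (2 * widths[k] ** 2))

# v = sum_i w_i psi(theta_i): two neurons, with weights encoding distance
# from the electrode.
params = [(50.3, 0), (120.7, 2)]
weights = [1.5, 0.9]
v = sum(w * psi(th) for w, th in zip(weights, params))
```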

Lidar


Lidar is a remote sensing technology that works by emitting ultra-short laser pulses and analyzing the reflected light. The many applications of lidar include precise distance measurements for generating detailed maps or for obstacle detection in autonomous cars. Analysis of the reflected pulse can even measure properties of the atmosphere, such as wind.

We now show how to recover distance measurements from a lidar waveform by solving a sparse inverse problem. We work under the assumption that when the laser illuminates an object at distance $\theta$, we measure a reflected pulse that is well approximated by some known function $\psi(\theta)$. Here $\psi : \mathbb{R} \to \mathbb{R}^d$ records a discrete time series of samples of the reflected pulse. Hence if the laser pulse partially illuminates a collection of objects (say part of the street and part of the curb), then the measured reflected waveform is the linear combination $\sum_{i=1}^n w_i \psi(\theta_i)$. Recovering the parameters $\theta_i$ and weights $w_i$ is a sparse inverse problem.
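
A sketch of this waveform model, with a Gaussian pulse standing in for the known instrument response and $\theta$ measured directly as round-trip delay in sample-time units (both simplifying assumptions):

```python
import numpy as np

d = 512                              # samples of the return waveform
ts = np.linspace(0.0, 100.0, d)      # sample times (arbitrary units)

def psi(theta, width=1.2):
    """Reflected pulse in R^d from an object at round-trip delay theta.
    The Gaussian shape is an illustrative stand-in for the calibrated,
    instrument-specific pulse."""
    return np.exp(-(ts - theta) ** 2 / (2 * width**2))

# Partial returns from two surfaces (say the street and the curb) at
# slightly different ranges combine linearly.
delays = [40.0, 43.5]
weights = [0.7, 0.3]                 # fraction of the beam hitting each surface
waveform = sum(w * psi(t) for w, t in zip(weights, delays))
```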

Designing radiation therapy


External radiation therapy is a common treatment for cancer in which several beams of radiation are fired at the patient to irradiate tumors. The collection of beam parameters (their intensities, positions, and angles) is called the treatment plan, and it is chosen to minimize an objective function specified by an oncologist. The objective usually rewards delivering large doses of radiation to tumors and low doses to surrounding healthy tissue and vital organs. Plans with few beams are desired because repositioning the emitter takes time, increasing the cost of the procedure and the likelihood that the patient moves enough to invalidate the plan.

A beam fired with intensity $w > 0$ and parameters $\theta$ delivers a radiation dosage $w\psi(\theta) \in \mathbb{R}^d$, where the output is interpreted as the radiation delivered to each of $d$ voxels in the patient's body. The radiation dosages from beams with parameters $\theta_1,\ldots,\theta_n$ and intensities $w_1,\ldots,w_n$ add linearly, and the objective function is convex.
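
The sketch below illustrates the linearity of the dose model with a toy pencil-beam $\psi$ on a 2D voxel grid; the Gaussian lateral falloff and the (angle, offset) parameterization are illustrative assumptions, not a clinical dose model.

```python
import numpy as np

n = 32                                   # n x n grid of voxels, d = n * n
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs)

def psi(theta, width=0.08):
    """Dose in R^d delivered by a unit-intensity pencil beam with
    parameters theta = (angle, offset): Gaussian falloff with distance
    from the beam's central axis. A toy model that ignores attenuation
    and scatter."""
    angle, offset = theta
    dist = X * np.sin(angle) - Y * np.cos(angle) - offset
    return np.exp(-dist**2 / (2 * width**2)).ravel()

# The total dose of a treatment plan adds linearly in the intensities.
plan = [((0.0, 0.1), 2.0), ((np.pi / 3, -0.2), 1.5)]
dose = sum(w * psi(th) for th, w in plan)
```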

Matrix completion


The task of matrix completion is to estimate all entries of a large matrix given observations of a few entries. Clearly this task is impossible without prior information or assumptions about the matrix. If we believe that a low-rank matrix well-approximates the true matrix, a common heuristic is to minimize the squared error subject to a nuclear norm bound. We solve the following optimization problem:

$$ \min_{\|A\|_* \le \tau} \|M(A) - y\|^2. $$

Here $M$ is the masking operator, that is, the linear operator that maps a matrix $A \in \mathbb{R}^{n \times m}$ to the vector containing its observed entries, and $y$ is the vector of observed entries. We can rephrase this in our notation by letting $\Theta = \{(u,v) \in \mathbb{R}^n \times \mathbb{R}^m : \|u\|_2 = \|v\|_2 = 1\}$, $\psi((u,v)) = M(uv^T)$, and $\ell(\cdot) = \|\cdot\|^2$.
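
In code, assuming a random observation mask and a planted low-rank matrix for the data, each source is a unit-norm rank-one matrix:

```python
import numpy as np

n, m = 50, 40
rng = np.random.default_rng(0)
obs = rng.random((n, m)) < 0.2           # mask of observed entries

def mask(A):
    """The masking operator M: the vector of observed entries of A."""
    return A[obs]

def psi(u, v):
    """One source: psi((u, v)) = M(u v^T) with ||u|| = ||v|| = 1."""
    return mask(np.outer(u, v))

# Observations y from a planted rank-2 matrix.
U = rng.standard_normal((n, 2)); V = rng.standard_normal((m, 2))
y = mask(U @ V.T)

# The squared loss of a one-atom approximation w * psi((u, v)).
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(m); v /= np.linalg.norm(v)
loss = np.sum((2.0 * psi(u, v) - y) ** 2)
```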

Linear system identification

Linear time-invariant (LTI) dynamical systems describe the evolution of an output $y_t \in \mathbb{R}$ based on the input $u_t \in \mathbb{R}$, where $t \in \mathbb{Z}_+$ indexes time. The internal state of the system at time $t$ is parameterized by a vector $x_t \in \mathbb{R}^m$, and its relationship to the output is described by

$$ x_{t+1} = Ax_t + Bu_t, \qquad y_t = Cx_t. $$

Here $C$ is a fixed matrix, while $x_0$, $A$, and $B$ are unknown parameters.

The task of learning these parameters from input-output data can be posed as a sparse inverse problem as follows. Each source is a small LTI system parameterized by $(x_0, r, \alpha, B)$, where $x_0$ and $B$ both lie in the $\ell_\infty$ unit ball in $\mathbb{R}^2$, $r \in [0,1]$, and $\alpha \in [0,\pi]$. The LTI system that each source describes has

$$ A = r\begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}. $$

The mapping $\psi$ from the parameters $(x_0, r, \alpha, B)$ to the output of the corresponding LTI system on input $u_1, \ldots, u_T$ is differentiable. In terms of the overall LTI system, adding the outputs of two weighted sources corresponds to concatenating the corresponding parameters.
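
A sketch of $\psi$ under this parameterization (the input signal and the particular source parameters below are arbitrary):

```python
import numpy as np

def psi(theta, u):
    """Output (y_1, ..., y_T) of one source on input u, where
    theta = (x0, r, alpha, B), A = r * rotation(alpha), and C = [1, 0],
    as in the parameterization above."""
    x0, r, alpha, B = theta
    A = r * np.array([[np.cos(alpha), -np.sin(alpha)],
                      [np.sin(alpha),  np.cos(alpha)]])
    C = np.array([1.0, 0.0])
    x, ys = np.asarray(x0, dtype=float), []
    for ut in u:
        ys.append(C @ x)                # y_t = C x_t
        x = A @ x + np.asarray(B) * ut  # x_{t+1} = A x_t + B u_t
    return np.array(ys)

# The overall system's output is a weighted sum of such sources.
T = 100
u = np.random.default_rng(1).standard_normal(T)
y = 1.0 * psi(([1.0, 0.0], 0.95, 0.3, [0.5, -0.2]), u) \
  + 0.5 * psi(([0.0, 1.0], 0.80, 1.1, [0.1, 0.4]), u)
```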

Bayesian experimental design

In experimental design we seek to estimate a vector $x \in \mathbb{R}^d$ from measurements of the form

$$ y_i = f(\theta_i)^T x + \epsilon_i. $$

Here $f : \Theta \rightarrow \mathbb{R}^d$ is a known differentiable feature function and the $\epsilon_i$ are independent noise terms. We want to choose $\theta_1, \ldots, \theta_k$ to minimize our uncertainty about $x$. If each measurement requires a costly experiment, this corresponds to getting the most information from a fixed number of experiments.

In general, this task is intractable. However, if we assume the $\epsilon_i$ are independently distributed as standard normals and $x$ comes from a standard normal prior, we can analytically derive the posterior distribution of $x$ given $y_1, \ldots, y_k$, as the full joint distribution of $x, y_1, \ldots, y_k$ is normal.

One notion of how much information $y_1, \ldots, y_k$ carry about $x$ is the entropy of the posterior distribution of $x$ given the measurements. We can then choose $\theta_1, \ldots, \theta_k$ to minimize the entropy of the posterior, which is equivalent to minimizing the (log) volume of an uncertainty ellipsoid. With this setup, the posterior entropy is (up to additive constants and a positive multiplicative factor) simply

$$ \log\det \left( I + \sum_i f(\theta_i) f(\theta_i)^T \right)^{-1}. $$

To put this in our framework, we can take $\psi(\theta) = f(\theta)f(\theta)^T$, $y = 0$, and $\ell(M) = \log\det(I + M)^{-1}$. We relax the requirement to choose exactly $k$ measurement parameters and instead search for a sparse measure with bounded total mass, giving us a sparse inverse problem.
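
A small sketch of this objective, using a hypothetical Fourier-type feature function $f$; the relaxed weights play the role of the measure's mass.

```python
import numpy as np

def f(theta):
    """Hypothetical feature function Theta -> R^d with d = 5."""
    return np.array([1.0, np.sin(theta), np.cos(theta),
                     np.sin(2 * theta), np.cos(2 * theta)])

def posterior_entropy(thetas, weights):
    """log det (I + sum_i w_i f(theta_i) f(theta_i)^T)^{-1}: the posterior
    entropy up to additive constants and a positive factor."""
    d = f(0.0).size
    M = sum(w * np.outer(f(t), f(t)) for w, t in zip(weights, thetas))
    _, logdet = np.linalg.slogdet(np.eye(d) + M)
    return -logdet    # log det (I + M)^{-1} = -log det (I + M)

# Spreading the measurements out is more informative (lower entropy)
# than repeating the same experiment three times.
assert posterior_entropy([0.1, 1.3, 2.6], [1.0] * 3) \
     < posterior_entropy([0.1, 0.1, 0.1], [1.0] * 3)
```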

Design of numerical quadrature rules

In many numerical computing applications we require fast procedures to approximate integration against a fixed measure. One way to do this is to use a quadrature rule:

$$ \int f(\theta)\, dp(\theta) \simeq \sum_{i=1}^k w_i f(x_i). $$

The quadrature rule, given by weights $w_i \in \mathbb{R}$ and points $x_i \in \Theta$, is chosen so that the above approximation holds for functions $f$ in a certain function class. The pairs $(x_i, w_i)$ are known as quadrature nodes. In practice, we want quadrature rules with very few nodes to speed up evaluation of the rule.

Often we don't have an a priori description of the function class from which $f$ is chosen, but we might have a finite number of examples of functions in the class, $f_1, \ldots, f_d$, along with their integrals against $p$, $y_1, \ldots, y_d$. In other words, we know that

$$ \int f_i(\theta)\, dp(\theta) = y_i. $$

A reasonable quadrature rule should approximate the integrals of the known f_i well.

We can phrase this task as a sparse inverse problem where each source is a single quadrature node. In our notation, $\psi(\theta) = (f_1(\theta), \ldots, f_d(\theta))$. Assuming each function $f_i$ is differentiable, $\psi$ is differentiable. A common choice of $\ell$ for this application is simply the squared loss. Note that in this application there is no need to constrain the weights to be positive.
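
For instance, taking the example functions to be monomials on $[0,1]$ and $p$ the uniform measure (so the $y_i$ are known exactly), the squared loss of a candidate rule looks like this; the two-point Gauss-Legendre rule below is just a familiar test case.

```python
import numpy as np

# Example functions f_1, ..., f_d: monomials on [0, 1]; their integrals
# against the uniform measure p are y_j = 1 / (j + 1).
d = 6
fs = [lambda t, j=j: t**j for j in range(d)]
y = np.array([1.0 / (j + 1) for j in range(d)])

def psi(theta):
    """psi(theta) = (f_1(theta), ..., f_d(theta))."""
    return np.array([f(theta) for f in fs])

def loss(nodes, weights):
    """Squared error of the rule sum_i w_i f(x_i) on the known integrals."""
    approx = sum(w * psi(x) for x, w in zip(nodes, weights))
    return np.sum((approx - y) ** 2)

# Two-point Gauss-Legendre on [0, 1]: exact for monomials up to degree 3,
# so the loss is small but nonzero (it misses t^4 and t^5).
nodes = [0.5 - 0.5 / np.sqrt(3), 0.5 + 0.5 / np.sqrt(3)]
print(loss(nodes, [0.5, 0.5]))
```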

Fitting mixture models

Given a parametric distribution $P(x \mid \theta)$, we consider the task of recovering the components of a mixture model from i.i.d. samples. To be more precise, we are given data $\{x_1, \ldots, x_d\}$ sampled i.i.d. from a distribution of the form $P(x) = \int_{\theta \in \Theta} P(x \mid \theta)\, d\pi(\theta)$. The task is to recover the mixing distribution $\pi$. If we assume $\pi$ is sparse, we can phrase this as a sparse inverse problem. To do so, we choose $\psi(\theta) = (P(x_i \mid \theta))_{i=1}^d$. A common choice for $\ell$ is the (negative) log-likelihood of the data: i.e., $y = 0$, $\ell(p) = -\sum_i \log p_i$. The obvious constraint is $\int d\pi(\theta) \le 1$.
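
A small sketch with unit-variance Gaussian components; the component family, the planted means, and the weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Data drawn i.i.d. from a two-component Gaussian mixture.
x = np.concatenate([rng.normal(-2.0, 1.0, 60), rng.normal(1.5, 1.0, 40)])

def psi(theta):
    """psi(theta) = (P(x_i | theta))_i for a unit-variance Gaussian
    component with mean theta."""
    return np.exp(-(x - theta) ** 2 / 2) / np.sqrt(2 * np.pi)

def neg_log_likelihood(thetas, weights):
    """ell(p) = -sum_i log p_i at p = sum_j w_j psi(theta_j), with the
    weights satisfying sum_j w_j <= 1."""
    p = sum(w * psi(t) for w, t in zip(weights, thetas))
    return -np.sum(np.log(p))

print(neg_log_likelihood([-2.0, 1.5], [0.6, 0.4]))
```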