Applications

superresolution imaging

The diffraction of light imposes a physical limit on the resolution of optical images. However, with prior information about the nature of an image, it is possible to resolve features well below this limit. For images of a collection of point sources of light, we can attempt to resolve the points using ADCG. The parameters \(\theta_1,\ldots,\theta_n\) denote the locations of \(n\) point sources (in \(\mathbb{R}^2\) or \(\mathbb{R}^3\)), and \(w_i\) denotes the intensity, or brightness, of the \(i\)th source. The image of the \(i\)th source is given by \(w_i\psi(\theta_i)\), where \(\psi\) is the pixelated point spread function of the imaging apparatus.
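
As a concrete illustration, here is a minimal sketch of this forward model in NumPy. The isotropic Gaussian approximation to the pixelated point spread function, the grid size, and the pixel coordinates below are illustrative assumptions, not part of the original description; each source contributes \(w_i\psi(\theta_i)\) and the image is their sum.

```python
import numpy as np

def psf(theta, grid_size=32, sigma=1.5):
    """Pixelated image of a unit-intensity point source at location theta.

    theta: (x, y) position in pixel coordinates.
    sigma: width of an assumed Gaussian approximation to the PSF.
    Returns a grid_size x grid_size image, flattened into a vector.
    """
    xs, ys = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    img = np.exp(-((xs - theta[0]) ** 2 + (ys - theta[1]) ** 2) / (2 * sigma ** 2))
    return img.ravel()

def forward(thetas, weights, grid_size=32):
    """Image of a collection of point sources: sum_i w_i * psf(theta_i)."""
    return sum(w * psf(t, grid_size) for w, t in zip(weights, thetas))

# Example: two sources separated by less than the PSF width.
y = forward(thetas=[(15.2, 16.8), (16.1, 17.3)], weights=[1.0, 0.7])
```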

By solving a sparse inverse problem it is possible to localize the point sources better than the diffraction limit — even with extreme pixelization. Astronomers use this framework to deconvolve images of stars to angular resolution below the Rayleigh limit. In biology this tool has revolutionized imaging of subcellular features, and a variant of this framework allows imaging through scattering media such as tissue.

neural spike identification

Consider the voltage \(v\) recorded by an extracellular electrode implanted in the vicinity of a population of neurons. Suppose that this population of neurons contains \(K\) types of neurons, and that when a neuron of type \(k\) fires at time \(t \in \mathbb{R}\), an action potential of the form \(\psi(t,k)\) is recorded. Here \(\psi : \mathbb{R} \times \{1,\ldots,K\} \to \mathbb{R}^d\) is a vector of voltage samples. If we denote the parameters of the \(i\)th neuron by \(\theta_i = (t_i, k_i)\), then the total voltage \(v\in \mathbb{R}^d\) can be modeled as a superposition of these action potentials:

\[ v = \sum_{i=1}^n w_i\psi(\theta_i). \]

Here the weights \(w_i > 0\) can encode the distance between the \(i\)th neuron and the electrode. The sparse inverse problem in this application is to recover the parameters \(\theta_1,\ldots,\theta_n\) and weights \(w_1,\ldots,w_n\) from the voltage signal \(v\).
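
As a concrete sketch, the superposition can be built as follows. The template bank for \(K = 2\) neuron types and the use of linear interpolation to place a template at a continuous firing time (so that \(\psi\) varies smoothly in \(t\)) are illustrative assumptions.

```python
import numpy as np

def psi(t, k, templates, d=200):
    """Voltage trace of a type-k action potential fired at continuous time t.

    templates[k] is the sampled waveform for neuron type k; the trace is
    placed at time t using linear interpolation between sample points.
    """
    template = templates[k]
    times = np.arange(d)  # sample times of the recording
    return np.interp(times - t, np.arange(len(template)), template, left=0.0, right=0.0)

# Hypothetical templates for K = 2 neuron types.
templates = [np.hanning(20), -0.5 * np.hanning(30)]

# v = sum_i w_i * psi(theta_i) for spikes theta_i = (t_i, k_i) with weights w_i.
spikes = [(12.4, 0), (55.0, 1), (90.7, 0)]
weights = [1.0, 0.8, 0.6]
v = sum(w * psi(t, k, templates) for w, (t, k) in zip(weights, spikes))
```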

lidar

Lidar is a remote sensing technology that works by firing ultra-short laser pulses and analyzing the reflected light. Its many applications include precise distance measurements for generating detailed maps and for obstacle detection in autonomous cars. Reflected pulses can even be used to measure properties of the atmosphere, such as wind.

We now show how to recover distance measurements from a lidar waveform by solving a sparse inverse problem. We work under the assumption that when the laser illuminates an object at distance \(\theta\), we measure a reflected pulse that is well approximated by some known function \(\psi(\theta)\). Here \(\psi : \mathbb{R} \to \mathbb{R}^d\) maps the distance to a discrete time series of samples of the reflected pulse. Hence if the laser pulse partially illuminates a collection of objects (say part of the street and part of the curb), then the measured reflected waveform is the linear combination \(\sum_{i=1}^n w_i \psi(\theta_i)\). Recovering the parameters \(\theta_i\) and weights \(w_i\) is a sparse inverse problem.
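
For a fixed number of reflections and the usual squared-error loss, recovering the \(\theta_i\) and \(w_i\) reduces to a nonlinear least-squares fit (a simplification of the sparse inverse problem, which does not fix the number of sources in advance). A rough sketch, assuming a Gaussian-shaped return pulse of hypothetical width:

```python
import numpy as np
from scipy.optimize import least_squares

T = np.arange(256)  # sample times of the recorded waveform

def psi(theta, width=4.0):
    """Assumed Gaussian-shaped return pulse from an object at distance theta."""
    return np.exp(-((T - theta) ** 2) / (2 * width ** 2))

def residual(params, y, n):
    """Residual of the model sum_i w_i * psi(theta_i) against the waveform y."""
    thetas, weights = params[:n], params[n:]
    model = sum(w * psi(t) for t, w in zip(thetas, weights))
    return model - y

# Synthetic waveform from two partially illuminated surfaces.
y = 1.0 * psi(80.0) + 0.4 * psi(95.0)

# Fit two reflections starting from a rough initial guess.
n = 2
fit = least_squares(residual, x0=[75.0, 100.0, 0.5, 0.5], args=(y, n))
thetas_hat, weights_hat = fit.x[:n], fit.x[n:]
```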

designing radiation therapy

External radiation therapy is a common treatment for cancer in which several beams of radiation are fired at the patient to irradiate tumors. The collection of beam parameters (their intensities, positions, and angles) is called the treatment plan, and is chosen to minimize an objective function specified by an oncologist. The objective usually rewards delivering large doses of radiation to tumors and low doses to surrounding healthy tissue and vital organs. Plans with few beams are desirable because repositioning the emitter takes time, which increases the cost of the procedure and the likelihood that the patient moves enough to invalidate the plan.

A beam fired with intensity \(w>0\) and parameters \(\theta\) delivers a radiation dose \(w\psi(\theta)\in \mathbb{R}^d\). Here the output is interpreted as the radiation delivered to each of \(d\) voxels in the body of a patient. The doses from beams with parameters \(\theta_1,\ldots,\theta_n\) and intensities \(w_1,\ldots,w_n\) add linearly, and the objective function is convex.
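
The text does not fix a particular objective, but as an illustration, here is one plausible convex choice over the per-voxel dose vector (a sketch, not a clinical model): penalize the squared shortfall below a prescribed dose on tumor voxels, plus any dose delivered to healthy voxels.

```python
import numpy as np

def objective(dose, target, is_tumor, healthy_penalty=10.0):
    """A sketch of a convex per-voxel objective (not a clinical model).

    dose: total dose delivered to each voxel, sum_i w_i * psi(theta_i).
    target: prescribed dose for tumor voxels.
    is_tumor: 0/1 indicator of tumor voxels.
    """
    shortfall = np.maximum(target - dose, 0.0) * is_tumor
    healthy = dose * (1 - is_tumor)
    return np.sum(shortfall ** 2) + healthy_penalty * np.sum(healthy ** 2)

# Example with four voxels, two of them tumor.
dose = np.array([1.8, 2.0, 0.3, 0.1])
val = objective(dose, target=2.0, is_tumor=np.array([1, 1, 0, 0]))
```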

matrix completion

The task of matrix completion is to estimate all entries of a large matrix given observations of a few entries. Clearly this task is impossible without prior information or assumptions about the matrix. If we believe that a low-rank matrix well-approximates the true matrix, a common heuristic is to minimize the squared error subject to a nuclear norm bound. We solve the following optimization problem:

\[ \min_{\|A\|_* \le \tau} \|M(A) - y \|^2. \]

Here \(M\) is the masking operator, that is, the linear operator that maps a matrix \(A\in \mathbb{R}^{n\times m}\) to the vector containing its observed entries, and \(y\) is the vector of observed entries. We can rephrase this in our notation by letting \(\Theta = \{(u,v) \in \mathbb{R}^n \times \mathbb{R}^m : \|u\|_2 = \|v\|_2 = 1 \}\), \(\psi((u,v)) = M(uv^T)\), and \(\ell(\cdot) = \|\cdot \|^2.\)
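
As a sketch, assuming the observed entries are specified by row and column index arrays, \(\psi\) applied to a rank-one atom \((u, v)\) and the squared loss look like this:

```python
import numpy as np

def psi(u, v, rows, cols):
    """Observed entries of the rank-one matrix u v^T (the masking operator M)."""
    return u[rows] * v[cols]

def loss(weights, atoms, y, rows, cols):
    """Squared error of a weighted combination of rank-one atoms on the observed entries."""
    pred = sum(w * psi(u, v, rows, cols) for w, (u, v) in zip(weights, atoms))
    return np.sum((pred - y) ** 2)

# Example: a 5 x 4 matrix with three observed entries.
rows, cols = np.array([0, 2, 4]), np.array([1, 3, 0])
y = np.array([0.5, -1.2, 0.3])
u = np.random.randn(5); u /= np.linalg.norm(u)
v = np.random.randn(4); v /= np.linalg.norm(v)
val = loss([1.0], [(u, v)], y, rows, cols)
```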

linear system identification

Linear time-invariant (LTI) dynamical systems describe the evolution of an output \(y_t\in \mathbb{R}\) based on the input \(u_t\in\mathbb{R}\), where \(t \in \mathbb{Z}_+\) indexes time. The internal state at time \(t\) of the system is parameterized by a vector \(x_t\in \mathbb{R}^m\), and its relationship to the output is described by

\[ x_{t+1} = Ax_t + Bu_t \]

\[ y_t = Cx_t. \]

Here \(C\) is a fixed matrix, while \(x_0, A\), and \(B\) are unknown parameters.

The task of learning these parameters from input-output data can be posed as a sparse inverse problem as follows. Each source is a small LTI system parameterized by \((x_0, r, \alpha, B)\), where \(x_0\) and \(B\) both lie in the \(\ell_\infty\) unit ball in \(\mathbb{R}^2\), \(r\) is in \([0,1]\), and \(\alpha\) is in \([0,\pi]\). The LTI system that each source describes has

\[A = r \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}. \]

The mapping \(\psi\) from the parameters \((x_0, r, \alpha, B)\) to the output of the corresponding LTI system on input \(u_1, \ldots, u_T\) is differentiable. In terms of the overall LTI system, adding the weighted outputs of two sources corresponds to forming a larger LTI system whose state is the concatenation of the two sources' states.
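
A minimal sketch of \(\psi\) for a single source, simulating the two-dimensional system forward in time on a given input sequence (the input signal and parameter values below are only illustrative):

```python
import numpy as np

def psi(x0, r, alpha, B, u):
    """Output y_0, ..., y_{T-1} of a single 2-d LTI source on input u.

    x_{t+1} = A x_t + B u_t and y_t = C x_t, with A a scaled rotation and C = [1, 0].
    """
    A = r * np.array([[np.cos(alpha), -np.sin(alpha)],
                      [np.sin(alpha),  np.cos(alpha)]])
    x, ys = np.asarray(x0, dtype=float), []
    for u_t in u:
        ys.append(x[0])                    # y_t = C x_t with C = [1, 0]
        x = A @ x + np.asarray(B) * u_t    # x_{t+1} = A x_t + B u_t
    return np.array(ys)

# The output of a superposition of weighted sources is the weighted sum of outputs.
u = np.sin(0.3 * np.arange(100))
y = (1.0 * psi([0.1, 0.0], 0.95, 0.4, [1.0, 0.0], u)
     + 0.5 * psi([0.0, 0.2], 0.90, 1.1, [0.0, 1.0], u))
```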

bayesian experimental design

In experimental design we seek to estimate a vector \(x \in \mathbb{R}^d\) from measurements of the form

\[ y_i = f(\theta_i)^Tx + \epsilon_i. \]

Here \(f : \Theta \rightarrow \mathbb{R}^d\) is a known differentiable feature function and the \(\epsilon_i\) are independent noise terms. We want to choose \(\theta_1, \ldots, \theta_k\) to minimize our uncertainty about \(x\). If each measurement requires a costly experiment, this corresponds to getting the most information from a fixed number of experiments.

In general, this task is intractable. However, if we assume the \(\epsilon_i\) are independently distributed as standard normals and \(x\) comes from a standard normal prior, we can analytically derive the posterior distribution of \(x\) given \(y_1, \ldots, y_k\), as the full joint distribution of \(x, y_1, \ldots, y_k\) is normal.

One notion of how much information \(y_1, \ldots, y_k\) carry about \(x\) is the entropy of the posterior distribution of \(x\) given the measurements. We can then choose \(\theta_1, \ldots, \theta_k\) to minimize the entropy of the posterior, which is equivalent to minimizing the (log) volume of an uncertainty ellipsoid. With this setup, the posterior entropy is (up to additive constants and a positive multiplicative factor) simply

\[ \log\det \left( I + \sum_i f(\theta_i) f(\theta_i)^T \right)^{-1}. \]

To put this in our framework, we can take \(\psi(\theta) = f(\theta)f(\theta)^T\), \(y = 0\), and \(\ell(M) = \log\det ( I + M)^{-1}\). We relax the requirement to choose exactly \(k\) measurement parameters and instead search for a sparse measure with bounded total mass, giving us a sparse inverse problem.
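
As a sketch, the loss can be evaluated for a candidate set of measurement parameters as follows; the Fourier-style feature map standing in for \(f\) is a hypothetical choice.

```python
import numpy as np

def loss(thetas, f, d):
    """Posterior entropy up to constants: log det (I + sum_i f(theta_i) f(theta_i)^T)^{-1}."""
    M = sum(np.outer(f(t), f(t)) for t in thetas)
    sign, logdet = np.linalg.slogdet(np.eye(d) + M)
    return -logdet

# Hypothetical feature function: cosine features of a scalar parameter.
d = 5
f = lambda theta: np.cos(np.arange(1, d + 1) * theta)

val = loss([0.1, 0.7, 1.3], f, d)
```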

design of numerical quadrature rules

In many numerical computing applications we require fast procedures to approximate integration against a fixed measure. One way to do this is to use a quadrature rule:

\[ \int f(\theta) dp(\theta) \simeq \sum_{i=1}^k w_i f(x_i). \]

The quadrature rule, given by \(w_i \in \mathbb{R}\) and \(x_i \in \Theta\), is chosen so that the above approximation holds for functions \(f\) in a certain function class. The points \(x_i\) are known as quadrature nodes and the \(w_i\) as quadrature weights. In practice, we want quadrature rules with very few nodes to speed up evaluation of the rule.

Often we don't have an a priori description of the function class from which \(f\) is chosen, but we might have a finite number of examples of functions in the class, \(f_1, \ldots, f_d\), along with their integrals against \(p\), \(y_1, \ldots, y_d\). In other words, we know that

\[ \int f_i(\theta) dp(\theta) = y_i. \]

A reasonable quadrature rule should approximate the integrals of the known \(f_i\) well.

We can phrase this task as a sparse inverse problem where each source is a single quadrature node. In our notation, \(\psi(\theta) = (f_1(\theta), \ldots, f_d(\theta))\). Assuming each function \(f_i\) is differentiable, \(\psi\) is differentiable. A common choice of \(\ell\) for this application is simply the squared loss. Note that in this application there is no need to constrain the weights to be positive.
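
As an illustration, here is a minimal sketch of \(\psi\) and the squared loss. The example functions (monomials integrated against the uniform measure on \([0,1]\)) and the two-node Gauss-Legendre rule used as a sanity check are hypothetical choices.

```python
import numpy as np

def psi(theta, fs):
    """Vector of example-function values at a candidate node theta."""
    return np.array([f(theta) for f in fs])

def loss(nodes, weights, fs, y):
    """Squared error between the quadrature rule and the known integrals y."""
    approx = sum(w * psi(x, fs) for x, w in zip(weights, nodes))
    return np.sum((approx - y) ** 2)

# Hypothetical examples: monomials on [0, 1] with p the uniform measure,
# so the true integrals are 1 / (k + 1).
fs = [lambda t, k=k: t ** k for k in range(4)]
y = np.array([1.0 / (k + 1) for k in range(4)])

# The two-node Gauss-Legendre rule on [0, 1] integrates these exactly.
nodes = [0.5 - 0.5 / np.sqrt(3), 0.5 + 0.5 / np.sqrt(3)]
weights = [0.5, 0.5]
val = loss(nodes, weights, fs, y)   # ~0 up to floating-point error
```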

fitting mixture models

Given a parametric distribution \(P(x | \theta)\), we consider the task of recovering the components of a mixture model from i.i.d. samples. To be more precise, we are given data \(\{x_1, \ldots, x_d\}\) sampled i.i.d. from a distribution of the form \(P(x) = \int_{\theta \in \Theta} P(x | \theta) \, d\pi(\theta)\). The task is to recover the mixing distribution \(\pi\). If we assume \(\pi\) is sparse, we can phrase this as a sparse inverse problem. To do so, we choose \(\psi(\theta) = (P(x_i | \theta))_{i=1}^d\). A common choice for \(\ell\) is the negative log-likelihood of the data: that is, \(y = 0\) and \(\ell(p) = - \sum_i \log p_i\). The obvious constraint is \(\int d\pi(\theta) \le 1\).
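
For concreteness, here is a minimal sketch with a hypothetical choice of parametric family (unit-variance Gaussians with unknown means), showing \(\psi\) and the negative log-likelihood loss:

```python
import numpy as np

def psi(theta, x):
    """Component likelihoods P(x_i | theta) for a unit-variance Gaussian with mean theta."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi)

def loss(thetas, weights, x):
    """Negative log-likelihood of the data under the mixture sum_i w_i P(x | theta_i)."""
    p = sum(w * psi(t, x) for w, t in zip(weights, thetas))
    return -np.sum(np.log(p))

# Example: data drawn from two well-separated components.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 100)])
val = loss(thetas=[-2.0, 3.0], weights=[2.0 / 3.0, 1.0 / 3.0], x=x)
```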