1D gaussian deconvolution

To demonstrate how to use ADCG, we'll walk through the process of applying ADCG to 1D gaussian deconvolution.

In this problem, the task is to decompose a 1d function into a sum of a few shifted gaussian functions. In other words, we get noisy observations of a function \(o(x)\) at a prespecified collection of points (often a 1d grid) and our task is to approximate \(o(x)\) as a sum of a few weighted gaussians: \(\hat{o}(x) = \sum_{i=1}^k w_i e^{-(c_i - x)^2}\). In some applications, the function \(o(x)\) really is a sum of a few shifted gaussians, in which case we hope to recover their positions.
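Evaluating such a model is only a few lines of numpy. Here's a quick sketch (the weights and centers below are made up for illustration; they aren't part of ADCG):

```python
import numpy as np

def o_hat(x, weights, centers):
    # \hat{o}(x) = sum_i w_i * exp(-(c_i - x)^2)
    x = np.asarray(x, dtype=float)
    return sum(w * np.exp(-(c - x)**2) for w, c in zip(weights, centers))

x_grid = np.linspace(-5, 5, 40)                   # observation locations
signal = o_hat(x_grid, [1.5, 0.8], [-1.0, 2.0])   # k = 2 components
```

Because the gaussian bumps decay quickly, \(\hat{o}\) evaluated at a center is close to that center's weight when the bumps are well separated.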

Even this simple model has applications in multi-return lidar.

In pictures, we want to take observations like this:


Here the observed locations \(x\) are a simple grid of \(40\) points. We want to return estimates of \(c_i\) and \(w_i\) like this:


Above we've plotted the estimated locations and weights as vertical lines on top of the denoised signal: \(\sum_{i=1}^k w_i e^{-(c_i - x)^2}\).

Because this is simulated data, we can plot the true locations to see how we did:


1d gaussian deconvolution as a sparse inverse problem

Recall the definition of the sparse inverse problem:

sparse inverse problem

\[ \hbox{minimize} \quad \ell(\Phi \mu - y) \quad\hbox{subject to}\quad |\mu|(\Theta) \le \tau \]

We need to specify \(\ell\), \(y\), \(\psi\), \(d\), and \(\Theta\). We'll denote the \(x\)-locations at which we observe \(o(x)\) as \(x_1, \ldots, x_d\) (in the plots above this would be the 40 equally spaced points between -5 and 5) and the observed values as \(y_1, \ldots, y_d\). Then we can take \(d = 40\), \(\Theta = \mathbb{R}\), and \(\ell = \| \cdot \|_2^2\). The last object we have to specify is \(\psi\): \(\psi(\theta) = (\exp(-\frac{(x_1 - \theta)^2}{2}), \ldots, \exp(-\frac{(x_d - \theta)^2}{2}))\).
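As a minimal sketch of this \(\psi\) in code (using the 40-point grid between -5 and 5 from the plots; the variable names here are ours, not ADCG's):

```python
import numpy as np

x_locations = np.linspace(-5, 5, 40)  # x_1, ..., x_d, so d = 40

def psi(theta):
    # psi(theta) = (exp(-(x_1 - theta)^2 / 2), ..., exp(-(x_d - theta)^2 / 2))
    return np.exp(-(x_locations - theta)**2 / 2)

v = psi(0.0)  # a vector in R^40, peaked near theta = 0
```

Each parameter value \(\theta\) maps to a vector in \(\mathbb{R}^{40}\): a gaussian bump centered at \(\theta\), sampled at the observation locations.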

using ADCG

Here's an example using the excellent autograd package for Python.

# In this file we'll go over a simple example of 1D deconvolution using ADCG.
import autograd.numpy as np
import autograd
# We'll use autograd to compute gradients of the loss with respect to the parameters
import scipy.optimize
# And scipy.optimize for nonconvex optimization and non-negative least squares
import pylab
# And pylab for plotting.

### The optimization problem we'll be solving is
# \min_{μ ≥ 0} || ∫ ψ(θ) dμ(θ) - y ||_2^2.
# In this example, Θ will be [0,1] and  ψ(θ) is a gaussian centered at
# θ evaluated at a series of points.

evaluation_points = np.linspace(0,1,50)
sigma = 0.1

def ψ(θ):
    return np.exp(-((evaluation_points - θ)/sigma)**2)

pylab.plot(evaluation_points, ψ(0.3))

def Ψ(ws, θs):
    return np.sum(np.array([w*ψ(θ) for (w,θ) in zip(ws, θs)]),0)

def ℓ(ws, θs):
    return ((Ψ(ws, θs)-y)**2).sum()

# Now we'll generate the ground truth θs and weights
num_true_θs = 4
noise_level = 0.1
true_θs = np.random.rand(num_true_θs)
true_weights = np.random.rand(num_true_θs) + 1
y = Ψ(true_weights, true_θs) + np.random.randn(len(evaluation_points))*noise_level
pylab.plot(evaluation_points, y)

# In each iteration of ADCG the first step is the "linear minimization oracle",
# which returns \argmin_{θ ∈ Θ} <ψ(θ), v>.
# For this example we'll grid Θ to approximately solve this problem.
grid_points = np.linspace(0,1,30)
grid_psi = np.stack([ψ(θ) for θ in grid_points])

def lmo(v):
    scores = grid_psi @ v
    return grid_points[scores.argmin()]

### The second step of ADCG requires us to (attempt to) solve
# \min_{(w_i ≥ 0, θ_i)}  || ∑_i w_i ψ(θ_i) - y ||_2^2.
# We'll try two different methods: first, block coordinate descent over w and θ,
# then joint optimization over w and θ.

def coordinate_descent(θs, iters = 35, min_drop = 1E-5):
    def min_ws():
        return scipy.optimize.nnls(np.stack([ψ(θ) for θ in θs]).T, y)[0]
    def min_θs():
        res = scipy.optimize.minimize(autograd.value_and_grad(lambda θs: ℓ(ws, θs)), θs,
                                      jac=True, method='L-BFGS-B',
                                      bounds=[(0.0, 1.0)]*len(θs))
        return res['x'], res['fun']
    old_f_val = np.inf
    for iter in range(iters):
        ws = min_ws()
        θs, f_val = min_θs()
        # stop once the loss is no longer decreasing
        if old_f_val - f_val < min_drop:
            break
        old_f_val = f_val
    return ws, θs

### Because we don't have a constraint on the total variation, it's very easy to
# do joint optimization. Let's try it.
def local_search(θs):
    n = len(θs)
    # stack thetas and ws into a vector
    def f(x):
        return ℓ(x[:n], x[n:])
    x_init = np.concatenate((np.zeros(n), θs))
    res = scipy.optimize.minimize(autograd.value_and_grad(f), x_init, jac=True,
                                  method='L-BFGS-B',
                                  bounds=[(0.0, None)]*n + [(0.0, 1.0)]*n)
    x = res['x']
    ws = x[:n]
    θs = x[n:]
    return ws, θs

### Let's define ADCG
def ADCG(local_update, max_iters):
    θs = np.zeros(0)
    output = np.zeros(len(evaluation_points))
    history = []
    for iter in range(max_iters):
        residual = output - y
        loss = (residual**2).sum()
        history.append((loss, θs))
        θ = lmo(residual)
        ws, θs = local_update(np.append(θs,θ))
        output = Ψ(ws, θs)
    return history
### And a heuristic for selecting the number of true sources
def select_k(history):
    drop = np.array([history[i][0]-history[i+1][0] for i in range(len(history)-1)])
    k_hat = np.argmax(drop<0.1)
    return history[k_hat][1]
### Let's run it!
theta_cd = select_k(ADCG(coordinate_descent, 10))
theta_nl = select_k(ADCG(local_search, 10))
print("True θs:", true_θs)
print("Coordinate descent estimate:", theta_cd)
print("Joint optimization estimate:", theta_nl)

To play around a bit more, try making the following modifications to the script and checking how they affect the output:

  • Make the bumps closer together, or add more bumps. When does ADCG fail?

  • Add more noise

  • Change the number of evaluation points

  • Change the size of the grid
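As one rough intuition for the first experiment (a sketch of ours, not part of the script above): with the unit-width kernel \(e^{-(c - x)^2}\), two equal-weight bumps blur into a single peak once their centers get close, which is roughly where any deconvolution method starts to struggle. We can check the peak count numerically:

```python
import numpy as np

def num_peaks(separation, n_grid=2001):
    # Sum of two unit-weight kernels exp(-(c - x)^2) at centers ±separation/2,
    # evaluated on a fine grid; count strict interior local maxima.
    x = np.linspace(-4.0, 4.0, n_grid)
    c = separation / 2.0
    f = np.exp(-(x - c)**2) + np.exp(-(x + c)**2)
    interior = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(interior.sum())
```

Well-separated bumps are still visible as two peaks (`num_peaks(2.0)` returns 2), while at separation 0.2 they merge into one (`num_peaks(0.2)` returns 1).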