# Tutorial

## 1D gaussian deconvolution

To demonstrate how to use ADCG, we'll walk through the process of applying ADCG to 1D gaussian deconvolution.

In this problem, the task is to decompose a 1D function into a sum of a few shifted gaussian functions. In other words, we get noisy observations of a function $$o(x)$$ at a prespecified collection of points (often a 1D grid), and our task is to approximate $$o(x)$$ as a sum of a few weighted gaussians: $$\hat{o}(x) = \sum_{i=1}^k w_i e^{-(c_i - x)^2}$$. In some applications, the function $$o(x)$$ really is a sum of a few shifted gaussians, in which case we hope to recover their positions.
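
As a quick illustration of this model (a standalone sketch, not part of the tutorial script below; the weights and centers are made-up values), $$\hat{o}(x)$$ can be evaluated on a grid with a couple of lines of numpy:

```python
import numpy as np

# Hypothetical example: evaluate o_hat(x) = sum_i w_i * exp(-(c_i - x)^2)
x = np.linspace(-5, 5, 40)    # observation grid
w = np.array([1.0, 0.5])      # made-up weights w_i
c = np.array([-1.0, 2.0])     # made-up centers c_i

# rows index centers, columns index grid points; the product sums over centers
o_hat = np.exp(-(c[:, None] - x[None, :])**2).T @ w
```

Broadcasting builds a $$k \times d$$ matrix whose rows are the individual gaussians; the matrix-vector product then forms the weighted sum.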

Even this simple model has applications in multi-return lidar.

In pictures, we want to take noisy observations of $$o(x)$$ on a grid (here the observed locations $$x$$ are a simple grid of $$40$$ points) and return estimates of $$c_i$$ and $$w_i$$. We can visualize those estimates by plotting the estimated locations and weights as vertical lines on top of the denoised signal: $$\sum_{i=1}^k w_i e^{-(c_i - x)^2}$$.

Because this is simulated data, we can also plot the true locations to see how we did.

## 1D gaussian deconvolution as a sparse inverse problem

Recall the definition of the sparse inverse problem:

**Sparse inverse problem.**

$$\hbox{minimize} \quad \ell(\Phi \mu - y) \quad\hbox{subject to}\quad |\mu|(\Theta) \le \tau$$

We need to specify $$\ell$$, $$y$$, $$\psi$$, $$d$$, and $$\Theta$$. We'll denote the $$x$$-locations at which we observe $$o(x)$$ as $$x_1, \ldots, x_d$$ (in the plots above this would be the $$40$$ equally spaced points between $$-5$$ and $$5$$) and the observed values as $$y_1, \ldots, y_d$$. Then we can take $$d = 40$$, $$\Theta = \mathbb{R}$$, and $$\ell = \| \cdot \|_2^2$$. The last object we have to specify is $$\psi$$: $$\psi(\theta) = (\exp(-\frac{(x_1 - \theta)^2}{2}), \ldots, \exp(-\frac{(x_d - \theta)^2}{2}))$$. Note that for a discrete measure $$\mu = \sum_{i=1}^k w_i \delta_{\theta_i}$$, the measurement operator is simply $$\Phi \mu = \sum_{i=1}^k w_i \psi(\theta_i)$$.

Here's an example using the excellent Autograd package for Python, which we use to differentiate the loss in the nonconvex subproblems.

```python
# In this file we'll go over a simple example of 1D deconvolution using ADCG.
import autograd.numpy as np
from autograd import value_and_grad
# We use autograd's numpy wrapper so we can differentiate through the loss.
import scipy.optimize
# And scipy.optimize for nonconvex optimization and non-negative least squares.
import pylab
pylab.ion()
# And pylab for plotting.
```

```python
### The optimization problem we'll be solving is
#
# \min_{μ ≥ 0} || ∫ ψ(θ) dμ(θ) - y ||_2^2.
#
# In this example, Θ will be [0,1] and ψ(θ) is a gaussian centered at
# θ evaluated at a series of points.

evaluation_points = np.linspace(0, 1, 50)
sigma = 0.1

def ψ(θ):
    return np.exp(-((evaluation_points - θ)/sigma)**2)

pylab.plot(evaluation_points, ψ(0.3))

# Ψ applies the measurement operator to a weighted collection of point sources.
def Ψ(ws, θs):
    return np.sum(np.array([w*ψ(θ) for (w, θ) in zip(ws, θs)]), 0)

# Squared-error loss between the model output and the observations y.
def ℓ(ws, θs):
    return ((Ψ(ws, θs) - y)**2).sum()
```

```python
# Now we'll generate the ground-truth θs and weights.
num_true_θs = 4
noise_level = 0.1
true_θs = np.random.rand(num_true_θs)
true_weights = np.random.rand(num_true_θs) + 1
y = Ψ(true_weights, true_θs) + np.random.randn(len(evaluation_points))*noise_level
pylab.figure()
pylab.plot(evaluation_points, y)
```

```python
# In each iteration of ADCG the first step is the "linear minimization oracle",
# which returns \argmin_{θ ∈ Θ} <ψ(θ), v>.
# For this example we'll grid Θ to approximately solve this problem.
grid_points = np.linspace(0, 1, 30)
grid_psi = np.stack([ψ(θ) for θ in grid_points])

def lmo(v):
    # score every grid point by its inner product with v and take the minimizer
    scores = grid_psi @ v
    return grid_points[scores.argmin()]
```

```python
### The second step of ADCG requires us to (attempt to) solve
#
# \min_{(w_i ≥ 0, θ_i)}  || ∑_i w_i ψ(θ_i) - y ||_2^2.
#
# We'll try two different methods: first, block coordinate descent over w and θ,
# then joint optimization over w and θ.

def coordinate_descent(θs, iters=35, min_drop=1e-5):
    def min_ws():
        # non-negative least squares for the weights, holding the θs fixed
        return scipy.optimize.nnls(np.stack([ψ(θ) for θ in θs]).T, y)[0]
    def min_θs():
        # local descent over the θs, holding the weights fixed
        # (L-BFGS-B with box constraints keeps each θ in Θ = [0,1])
        res = scipy.optimize.minimize(
            value_and_grad(lambda θs_: ℓ(ws, θs_)), θs,
            jac=True, method='L-BFGS-B', bounds=[(0.0, 1.0)]*len(θs))
        return res['x'], res['fun']
    old_f_val = np.inf
    for iter in range(iters):
        ws = min_ws()
        θs, f_val = min_θs()
        # stop when the loss is (approximately) stationary
        if old_f_val - f_val < min_drop:
            break
        old_f_val = f_val
    return ws, θs
```

```python
### Because we don't have a constraint on the total variation, it's very easy to
# do joint optimization. Let's try it.
def local_search(θs):
    n = len(θs)
    # stack the weights and θs into a single vector for the solver
    def f(x):
        return ℓ(x[:n], x[n:])
    x_init = np.concatenate((np.zeros(n), θs))
    res = scipy.optimize.minimize(
        value_and_grad(f), x_init, jac=True, method='L-BFGS-B',
        bounds=[(0.0, None)]*n + [(0.0, 1.0)]*n)
    x = res['x']
    ws = x[:n]
    θs = x[n:]
    return ws, θs
```

```python
### The main ADCG loop: repeatedly add a new source with the lmo, then refine
# all weights and locations with the given local_update routine.
def adcg(local_update, max_iters=10):
    θs = np.zeros(0)
    output = np.zeros(len(evaluation_points))
    history = []
    for iter in range(max_iters):
        residual = output - y
        loss = (residual**2).sum()
        print(iter, loss)
        history.append((loss, θs))
        θ = lmo(residual)
        ws, θs = local_update(np.append(θs, θ))
        output = Ψ(ws, θs)
    return history

### And a heuristic for selecting the number of true sources: stop at the first
# iteration where adding a source no longer substantially decreases the loss.
def select_k(history):
    losses = np.array([h[0] for h in history])
    drop = losses[:-1] - losses[1:]
    k_hat = np.argmax(drop < 0.1)
    return history[k_hat]

### Let's run it!
loss_cd, theta_cd = select_k(adcg(coordinate_descent))
loss_nl, theta_nl = select_k(adcg(local_search))
# sort the locations for easy side-by-side comparison
print("True θs:", np.sort(true_θs))
print("Coordinate descent estimate:", np.sort(theta_cd))
print("Joint optimization estimate:", np.sort(theta_nl))
```


To play around a bit more, try making the following modifications to the script and checking how they affect the output:

- Make the bumps closer together, or add more bumps. When does ADCG fail?
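
On the first point, a standalone sketch (independent of the script above; `num_peaks` is a hypothetical helper, not part of the tutorial) hints at why closely spaced bumps are hard: once two centers are closer than roughly `sigma`, their sum has a single local maximum, so the signal shape alone no longer shows two distinct sources.

```python
import numpy as np

def num_peaks(c1, c2, sigma=0.1, n=2001):
    """Count strict local maxima of a two-gaussian mixture on a fine grid."""
    x = np.linspace(0, 1, n)
    f = np.exp(-((x - c1)/sigma)**2) + np.exp(-((x - c2)/sigma)**2)
    interior = f[1:-1]
    # a strict local max is larger than both of its grid neighbors
    return int(np.sum((interior > f[:-2]) & (interior > f[2:])))

print(num_peaks(0.3, 0.7))    # well separated: two distinct peaks
print(num_peaks(0.45, 0.55))  # separation ≈ sigma: the bumps merge into one
```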