Tutorial: 1D Gaussian deconvolution

To demonstrate how to use ADCG, we'll walk through the process of applying it to 1D Gaussian deconvolution. In this problem the task is to decompose a 1D function into a sum of a few shifted Gaussians. In other words, we get noisy observations of a function \(o(x)\) at a prespecified collection of points (often a 1D grid), and our task is to approximate \(o(x)\) as a sum of a few weighted Gaussians: \(\hat{o}(x) = \sum_{i=1}^k w_i e^{-(c_i - x)^2}\). In some applications the function \(o(x)\) really is a sum of a few shifted Gaussians, in which case we hope to recover their positions. Even this simple model has applications in multi-return lidar.

In pictures, we want to take observations like this:

Here the observed locations \(x\) are a simple grid of \(40\) points. We want to return estimates of \(c_i\) and \(w_i\), like this:

Above we've plotted the estimated locations and weights as vertical lines on top of the denoised signal \(\sum_{i=1}^k w_i e^{-(c_i - x)^2}\). Because this is simulated data, we can plot the true locations to see how we did:
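To make the model concrete, here is a minimal sketch of the observation process described above. It assumes the 40-point grid between \(-5\) and \(5\) from the plots; the centers, weights, and noise level are made up purely for illustration (the worked example later in this tutorial uses its own grid and kernel width).

import numpy as np

# 40 equally spaced observation locations between -5 and 5, as in the plots above.
x = np.linspace(-5, 5, 40)

# Illustrative (made-up) centers c_i and weights w_i.
centers = np.array([-2.0, 0.5, 3.0])
weights = np.array([1.0, 0.7, 1.3])

# o_hat(x) = sum_i w_i * exp(-(c_i - x)^2), observed with additive noise.
signal = sum(w * np.exp(-(c - x)**2) for w, c in zip(weights, centers))
observations = signal + 0.05 * np.random.randn(len(x))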
1D Gaussian deconvolution as a sparse inverse problem

Recall the definition of the sparse inverse problem:

\[ \hbox{minimize} \quad \ell(\Phi \mu - y) \quad\hbox{subject to}\quad \mu(\Theta) \le \tau. \]

We need to specify \(\ell\), \(y\), \(\psi\), \(d\), and \(\Theta\). We'll denote the \(x\)-locations at which we observe \(o(x)\) as \(x_1, \ldots, x_d\) (in the plots above these would be the 40 equally spaced points between \(-5\) and \(5\)) and the observed values as \(y_1, \ldots, y_d\). Then we can take \(d = 40\), \(\Theta = \mathbb{R}\), and \(\ell = \|\cdot\|_2^2\). The last object we have to specify is \(\psi\):

\[ \psi(\theta) = \left( \exp\left(-\frac{(x_1 - \theta)^2}{2}\right), \ldots, \exp\left(-\frac{(x_d - \theta)^2}{2}\right) \right). \]
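ADCG maintains a measure supported on finitely many points, \(\mu = \sum_{i=1}^k w_i \delta_{\theta_i}\). For such a measure the problem above reduces to the finite-dimensional objective that the code in the next section optimizes; writing this out explicitly (a brief aside for clarity):

\[ \Phi\mu = \sum_{i=1}^k w_i \, \psi(\theta_i), \qquad \ell(\Phi\mu - y) = \Big\| \sum_{i=1}^k w_i \, \psi(\theta_i) - y \Big\|_2^2. \]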
Using ADCG

Here's an example using the excellent autograd package for Python.

# In this file we'll go over a simple example of 1D deconvolution using ADCG.

import autograd.numpy as np
from autograd import value_and_grad # We'll use autograd to compute gradients of the forward operator
import scipy.optimize # And scipy.optimize for nonconvex optimization and nonnegative least squares
import pylab
pylab.ion() # And pylab for plotting.

### The optimization problem we'll be solving is
#
#   \min_{μ ≥ 0} ‖ ∫ ψ(θ) dμ(θ) - y ‖_2^2.
#
# In this example, Θ will be [0,1] and ψ(θ) is a gaussian centered at
# θ evaluated at a series of points.

evaluation_points = np.linspace(0, 1, 50)
sigma = 0.1

def ψ(θ):
    return np.exp(-((evaluation_points - θ)/sigma)**2)

pylab.plot(evaluation_points, ψ(0.3))

def Ψ(ws, θs):
    return np.sum(np.array([w*ψ(θ) for (w, θ) in zip(ws, θs)]), 0)

def ℓ(ws, θs):
    return ((Ψ(ws, θs) - y)**2).sum()

# Now we'll generate the ground truth θs and weights.
num_true_θs = 4
noise_level = 0.1
true_θs = np.random.rand(num_true_θs)
true_weights = np.random.rand(num_true_θs) + 1
y = Ψ(true_weights, true_θs) + np.random.randn(len(evaluation_points))*noise_level

pylab.figure()
pylab.plot(evaluation_points, y)

# In each iteration of ADCG the first step is the "linear minimization oracle",
# which returns \argmin_{θ ∈ Θ} <ψ(θ), v>.
# For this example we'll grid Θ to approximately solve this problem.
grid_points = np.linspace(0, 1, 30)
grid_psi = np.stack([ψ(θ) for θ in grid_points])

def lmo(v):
    scores = grid_psi @ v
    return grid_points[scores.argmin()]

### The second step of ADCG requires us to (attempt to) solve
#
#   \min_{w_i ≥ 0, θ_i} ‖ ∑_i w_i ψ(θ_i) - y ‖_2^2.
#
# We'll try two different methods: first, block coordinate descent over w and θ,
# then joint optimization over w and θ.

def coordinate_descent(θs, iters=35, min_drop=1E-5):
    def min_ws():
        # nonnegative least squares for the weights with the locations held fixed
        return scipy.optimize.nnls(np.stack([ψ(θ) for θ in θs]).T, y)[0]
    def min_θs():
        # local descent on the locations with the weights held fixed
        res = scipy.optimize.minimize(value_and_grad(lambda θs: ℓ(ws, θs)), θs,
                                      jac=True, method='L-BFGS-B',
                                      bounds=[(0.0, 1.0)]*len(θs))
        return res['x'], res['fun']
    old_f_val = np.inf
    for iter in range(iters):
        ws = min_ws()
        θs, f_val = min_θs()
        # check if the loss is stationary
        if old_f_val - f_val < min_drop:
            break
        old_f_val = f_val
    return ws, θs

### Because we don't have a constraint on the total variation, it's very easy to
# do joint optimization. Let's try it.

def local_search(θs):
    n = len(θs)
    # stack ws and θs into a single vector for the optimizer
    def f(x):
        return ℓ(x[:n], x[n:])
    x_init = np.concatenate((np.zeros(n), θs))
    res = scipy.optimize.minimize(value_and_grad(f), x_init,
                                  jac=True, method='L-BFGS-B',
                                  bounds=([(0.0, None)]*n) + ([(0.0, 1.0)]*n))
    x = res['x']
    ws = x[:n]
    θs = x[n:]
    return ws, θs

### Let's define ADCG

def ADCG(local_update, max_iters):
    θs = np.zeros(0)
    output = np.zeros(len(evaluation_points))
    history = []
    for iter in range(max_iters):
        residual = output - y
        loss = (residual**2).sum()
        print(iter, loss)
        history.append((loss, θs))
        # add the location suggested by the linear minimization oracle...
        θ = lmo(residual)
        # ...then locally improve all weights and locations
        ws, θs = local_update(np.append(θs, θ))
        output = Ψ(ws, θs)
    return history

### And a heuristic for selecting the number of true sources:
# stop as soon as adding another source decreases the loss by less than 0.1.

def select_k(history):
    drop = np.array([history[i][0] - history[i+1][0] for i in range(len(history)-1)])
    k_hat = np.argmax(drop < 0.1)
    return history[k_hat][1]

### Let's run it!

theta_cd = select_k(ADCG(coordinate_descent, 10))
theta_nl = select_k(ADCG(local_search, 10))

print("True θs:", true_θs)
print("Coordinate descent estimate:", theta_cd)
print("Joint optimization estimate:", theta_nl)
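As an optional extra, we can visualize one of the fits. select_k returns only the locations, so the sketch below (just one way to do this) re-fits the weights with coordinate_descent before plotting:

# Optional visualization: re-fit weights for the selected locations and overlay
# the reconstruction on the noisy observations.
if len(theta_cd) > 0:
    ws_cd, θs_cd = coordinate_descent(theta_cd)
    pylab.figure()
    pylab.plot(evaluation_points, y, label="observations")
    pylab.plot(evaluation_points, Ψ(ws_cd, θs_cd), label="ADCG reconstruction")
    pylab.legend()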
To play around a bit more, try making the following modifications to the script and checking how they affect the output: