\[ \newcommand{\mvec}{\mathbf{m}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\xvec}{\mathbf{x}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\qvec}{\mathbf{q}} \newcommand{\dm}{\delta\mvec} \newcommand{\dd}{\delta\dvec} \newcommand{\ddp}{\delta\tilde{\dvec}} \newcommand{\nsrc}{n_{\text{s}}} \newcommand{\nrec}{n_{\text{r}}} \newcommand{\niter}{n_{\text{it}}} \newcommand{\ngrid}{n_{\text{g}}} \newcommand{\ntime}{n_{\text{t}}} \newcommand{\DF}{\nabla{\mathbf{F}}} \newcommand{\BR}{\mathbb{R}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Jmat}{\mathbf{J}} \newcommand{\Amat}{\mathbf{A}} \newcommand{\Cmat}{\mathbf{C}} \def\argmin{\mathop{\rm arg\,min}} \def\vec{\mbox{$\mathrm{vec}$}} \def\ivec{\mbox{$\mathrm{vec}^{-1}$}} %\newcommand{\Id}{\mbox{$\tensor{I}\,$}} \def\pector#1{\mathrm{\mathbf{#1}}} \def\cector#1{#1} \def\censor#1{#1} \def\vector#1{\mathbf{#1}} \def\fvector#1{{\widehat{\vector{#1}}}} \def\evector#1{{\widetilde{\vector{#1}}}} \def\pvector#1{{\breve{\vector{#1}}}} \def\pector#1{\mathrm{#1}} \def\ctensor#1{\bm{\mathcal{#1}}} %\def\tensorm#1{{#1}} \def\tensorm#1{\bm{#1}} %\def\tensor#1{\bm{#1}}7 \def\tensor#1{\vector{#1}} %\def\cector#1{{\vector{\underline{#1}}}} \def\hensor#1{\tensor{#1}} \def\uensor#1{\underline{\bm{#1}}} %\def\hensor#1{{\boldsymbol{\mathsf{#1}}}} \def\hector#1{\vector{#1}} %\def\hector#1{{\boldsymbol{\mathsf{#1}}}} % \def\hfector#1{\hat{\boldsymbol{\mathsf{#1}}}} % \def\ctensor#1{{\tensor{\underline{\underline{\mathsf{#1}}}}}} \newcommand{\minim}{\mathop{\mathrm{minimize}}} \def\ftensor#1{{\widehat{\tensor{#1}}}} \def\calsor#1{{\boldsymbol{\mathcal{#1}}}} \def\optensor#1{{\boldsymbol{\mathcal{#1}}}} \def\hvector#1{\hat{\boldsymbol{\mathbf{#1}}}} \def\uvector#1{{\underline{\vector{#1}}}} \def\utensor#1{{\underline{\tensor{#1}}}} \]

Time-domain sparsity promoting least-squares reverse time migration with source estimation

Mengmeng Yang^1*#, Zhilong Fang^2*, Philipp Witte³ and Felix J. Herrmann^1,3
¹ School of Earth and Atmospheric Sciences, Georgia Institute of Technology
² Department of Mathematics, Massachusetts Institute of Technology
³ School of Computational Science and Engineering, Georgia Institute of Technology
* Equally contributed
# To whom correspondence should be addressed: myang64@gatech.edu

Abstract

Least-squares reverse time migration is well-known for its capability to generate artifact-free true-amplitude subsurface images through fitting observed data in the least-squares sense. However, when applied to realistic imaging problems, this approach is faced with issues related to overfitting and excessive computational costs induced by many wave-equation solves. The fact that the source function is unknown complicates this situation even further. Motivated by recent results in stochastic optimization and transform-domain sparsity-promotion, we demonstrate that the computational costs of inversion can be reduced significantly while avoiding imaging artifacts and restoring amplitudes. While powerfull, these new approaches do require accurate information on the source-time function, which is often lacking. Without this information, the imaging quality deteriorates rapidly. We address this issue by presenting an approach where the source-time function is estimated on the fly through a technique known as variable projection. Aside from introducing negligible computational overhead, the proposed method is shown to perform well on imaging problems with noisy data and problems that involve complex settings such as salt. In either case, the presented method produces high resolution high-amplitude fidelity images including an estimates for the source-time function. In addition, due to its use of stochastic optimization, we arrive at these images at roughly one to two times the cost of conventional reverse time migration involving all data.

Introduction

Reverse-time migration (RTM) is a popular wave equation-based seismic imaging methodology where the inverse of the linearized Born scattering operator is approximated by applying its adjoint directly to the observed reflection data (Baysal et al., 1983; Whitmore, 1983). Because the adjoint does not equal the pseudo inverse, conventional RTM produces images with incorrect amplitudes. Among the factors that contribute to low-fidelity amplitudes, the imprint of the temporal bandwidth limitation of the typically unknown source wavelet features prominently and so does the fact that the Born scattering operator is not inverted. To overcome these issues, we formulate our imaging problem as a linear least-squares inversion problem where the difference between observed and predicted data is minimized in the $\ell_2$ -norm (G. Schuster, 1983; Nemeth et al., 1999; S. Dong et al., 2012; Zeng et al., 2014). While least-squares migration is a powerful technique, its succesful application to industry-scale problems is hampered by three key issues. First, iterative demigrations (i.e. Born modeling) and migrations become computationally prohibitively expensive when carried out over all shots. Second, we run the risk of overfitting the data when minimizing the $\ell_2$ -norm of the data residual. This overfitting may introduce noise-related artifacts in inverted images. Third, while the source location is generally well known, the temporal source function is often not known accurately. Because imaging relies on knowing the source function, this may have a detrimental effect on the image and makes it necessary to come up with source estimation methodology. Since we carry out our imaging iteratively, we propose to estimate the wavelet on the fly as we build up the image.

We address the issue of computational feasibility by combining techniques from stochastic optimization (Leeuwen et al., 2011; Haber et al., 2012; Powell, 2014), curvelet-domain sparsity-promotion (F. J. Herrmann and Li, 2012), and online convex optimization (Lorenz et al., 2014) with linearized Bregman. Stochastic optimization allows us to work with small random subsets of shots, which limits the number of wave equation solves—i.e., passes through the data. Convergence is guaranteed (Herrmann et al., 2015; Yang et al., 2016 ; Philipp A Witte et al., 2019) by replacing the $\ell_1$ -norm, by an elastic net consisting of a strongly convex combination of $\ell_1$ - and $\ell_2$ -norm objectives. Inclusion of the $\ell_2$ -norm results in a greatly simplified algorithm involving linearized Bregman iterations, which corresponds to gradient descent on the dual variable supplemented by a simple soft thresholding operation (J.-F. Cai et al., 2009; Yin, 2010) with a threshold that is fixed. We refer to this method as sparsity-promoting least-squares reverse-time migration (SPLS-RTM).

In addition to the high computational cost, the lack of accurate knowledge on the unknown temporal source signature may also adversely affect the performance of the inversion. Errors in the source signature lead to erroneous residuals, which in turn result in inaccurately imaged reflectors, which now may be positioned wrongly or may have the wrong amplitude or phase. To mitigate these errors, we need an embedded procedure where the source signature is updated along with the image during the inversion (Pratt, 1999; Aravkin et al., 2012, 2013; Fang et al., 2018) using a technique known as variable projection (Rickett, 2013; Leeuwen et al., 2014). For time-harmonic imaging, variable projection involves the estimation of the source function by solving a least-squares problem for each frequency separately. Since the unknown for each frequency in that case is a single complex-valued variable, this process is simple and has resulted in an accurate estimation and compensation for the source-time function (Tu et al., 2013 ; Fang et al., 2018). Unfortunately, the situation is more complicated during imaging in the time domain, where we have to estimate the complete source signature during each iteration. For this purpose, we integrate early work by Yang et al. (2016) and inverse-scattering method (Philipp A Witte et al., 2019) and achieve an approach that is suitable for realistic imaging scenarios that may include salt.

Our work is outlined as follows. First, we introduce the basic equations for time-domain reverse time migration and least-squares reverse time migration. To overcome the computational cost associated with the latter, we introduce a stochastic optimization method with sparsity promotion. This method is designed to provide an image at a fraction of the cost. Next, we extend this approach so it includes on-the-fly source estimation. This allows us to remove the requirement of the source function. We conclude by presenting a number of synthetic case studies designed to demonstrate robustness with respect to noisy data and to complex imaging scenarios that include salt.

Methodology

Since our approach hinges on cost-effective least-squares imaging, we first introduce our formulation of sparsity-promoting least-squares migration with stochastic optimization, followed by our approach to on-the-fly source estimation during the iterations.

From RTM to LS-RTM

Reverse time migration derives from a linearization (see e.g. Mulder and Plessix (2004)) with respect to the background squared slowness. For the $i^\mathrm{th}$ source this linearization reads $\begin{equation} \dd_{i} = F_{i}(\mvec_0 + \dm, \qvec) - F_{i}(\mvec_0, \qvec) \approx \Jmat_{i}(\mvec_0, \qvec) \dm, \label{BornModeling} \end{equation}$ where the vectors $\dm$ , $\qvec$ , and $\dd$ denote the model perturbation, the source-time function, and the corresponding data perturbation, respectively. We model the data for $$n_t$$ time samples over a time interval of $T\,\mathrm{s}$ . The number of receivers is $$n_r$$ so a single shot record is of size $n_t\times n_r$ . The nonlinear forward modeling operator $F_{i}(\mvec, \qvec)$ for the $i^\mathrm{th}$ source location involves the solution of the discretized acoustic wave equation $\begin{equation} \begin{aligned} \left (\mvec\odot\frac{\partial^2}{\partial t^{2}} - \Delta\right)\uvec_{i}&=\Pmat_{\text{s},i}^{\top}\qvec, \\ \Pmat_{\text{r},i} \uvec_{i} &= \dvec_{i}, \end{aligned} \label{AcousticWave} \end{equation}$ parameterized by the squared slowness collected in the vector $\mvec$ (for simplicity, we kept the density constant and we used the symbol $\odot$ to denote elementwise multiplication.) The symbol $\Delta$ represents the discretized Laplacian and the linear operator $\Pmat_{\text{r},i}$ restricts the wavefield for the $i^\mathrm{th}$ source to the corresponding receiver locations, while the linear operator $\Pmat_{\text{s},i}^{\top}$ injects the source time function at the location of the $i^\mathrm{th}$ source in the computational grid. The Jacobian $\Jmat_{i}(\mvec_0, \qvec)$ is known as the Born modeling operator and is given by the derivative of $F_i(\mvec, \qvec)$ at the point of $\mvec_0$ . Applying the Jacobian $\Jmat_{i}(\mvec_0, \qvec)$ to the model perturbation $\dm$ requires the solution of the following linearized equation: $\begin{equation} \begin{aligned} \left(\mvec_0\odot\frac{\partial^2}{\partial t^{2}} - \Delta\right)\delta\uvec_{i}& = -\frac{\partial^2}{\partial t^2} \big( \delta\mvec\odot\uvec_{i} \big), \\ \Pmat_{\text{r},i} \delta\uvec_{i} &= \dd_{i}, \end{aligned} \label{BornWave} \end{equation}$ where the vector $\delta\uvec_{i}$ corresponds to the wavefield perturbation for the $i^\mathrm{th}$ source.

The goal of seismic imaging is to estimate the model perturbations from observed data. We can expect this reconstruction process to be successful in situations where the above linear approximation is accurate—i.e., the background velocity model needs to be sufficiently accurate, which we assume it is. We also need accurate knowledge on the source function, an important aspect we will address below.

While the above linearization allows us to create an image via $\begin{equation} \dm_{\text{RTM}}=\sum_{i=1}^{n_s} \Jmat^\top_{i}\dd_i \label{migration} \end{equation}$ with $$n_s$$ the number of shots, the adjoint of the Jacobian (denoted by the symbol $^\top$ ) does not correspond to its inverse and $\dm_{\text{RTM}}$ will suffer from wavelet side lobes and inaccurate and unbalanced amplitudes (Mulder and Plessix, 2004; J. B. Bednar et al., 2006; Hou and Symes, 2016). Unlike RTM (Equation $\ref{migration}$ ), LS-RTM (Aoki and Schuster, 2009; F. J. Herrmann and Li, 2012; Tu and Herrmann, 2015b) reconstructs the model perturbation by computing the pseudo-inverse of the Born modeling operator, which can significantly mitigate these defects. LS-RTM iteratively solves the following least squares data-fitting problem: $\begin{equation} \minim_{\dm}\frac{1}{2}\sum_{i=1}^{\nsrc}\|\Jmat_{i}(\mvec_0,\qvec)\dm-\dd_{i}\|^2. \label{Lineardata} \end{equation}$ Compared to Equation $\ref{migration}$ , the above minimization requires multiple evaluations of the Jacobian and its adjoint, which becomes rapidly computationally prohibitive for large 2D and 3D imaging problems as the number of sources $$n_s$$ grows. This in part explains the relatively slow adaptation of least-squares reverse time migration (cf. Equation $\ref{Lineardata}$ ) by industry. As we show below, we overcome this problem by combining ideas from stochastic optimization and sparsity promotion (Herrmann et al., 2015; Yang et al., 2016; Philipp A Witte et al., 2019), which allow us to obtain artifact-free images at the cost of two to three passes through the data.

Stochastic optimization with sparsity promotion

As we mentioned above, the minimization of Equation $\ref{Lineardata}$ over all $$n_s$$ shots is computationally prohibitively expensive. In addition, the minimization is unconstrained and misses regularization to battle the adverse effects of noise and the null space (missing frequencies and finite aperture) associated with solving the least-squares imaging problems of the type listed in Equation $\ref{Lineardata}$ . To address these two problems, we combine ideas from stochastic optimization, during which we only work on randomized subsets of shots during each iteration, and ideas from sparsity-promoting optimization designed to remove the imprint of the null space and source subsampling related artifacts. As we have learned from the field of compressive sensing (Candès and others, 2006; Donoho, 2006; Candès and Wakin, 2008), transform-domain sparsity promotion is a viable technique to remove subsample related noise in imaging via $\begin{equation} \begin{aligned} \minim_{\xvec} \quad &\|\xvec \|_{1}, \\ \text{subject to } \, &\sum_{i=1}^{n_s} \|\Jmat_{i}( \mvec_0, \mathbf{q}) \Cmat^{\top} \xvec - \delta \dvec_{i}\|_{2} \leq \sigma. \end{aligned} \label{BPDN} \end{equation}$ In this formulation, known as the Basis Pursuit Denoise (BPDN, Chen et al. (2001)) problem, we include the sparsity-promoting $\ell_1$ -norm as the objective on the curvelet coefficients $\xvec$ of the image. These coefficients are related to the linearized data via the adjoint of the curvelet transform $(\Cmat^{\top})$ and the above program seeks to find the sparsest curvelet coefficient vector that matches the data within the noise level $\sigma$ . While the above problem is known to produce high-fidelity results, its solution relies on iterations that involve a loop over all $$n_s$$ shots.

Stochastic gradient descent (e.g. Haber et al., 2012 in the context of seismic inversion) is a widely used tool to make unconstrained optimization problems of the type included in Equation $\ref{Lineardata}$ computationally feasible by computing the gradient of Equation $\ref{Lineardata}$ for randomized subsets of shots, using a given a batch size that corresponds to the number of shots used per iteration. This approach allows to minimize Equation $\ref{Lineardata}$ in very few epochs, using only few passes through data consisting of $$n_s$$ shot records, as long as the step lengths adhere to certain conditions to guarantee convergence. Unfortunately, this complicates the solution of BPDN. To avoid this complication, we reformulate, following J.-F. Cai et al. (2009), Equation $\ref{BPDN}$ by replacing its convex $\ell_1$ -norm objective by the strongly convex objective involving $\begin{equation} \begin{aligned} \minim_{\mathbf{x}} \ & \lambda_{1} \Vert\mathbf{x} \Vert_1 + \frac{1}{2} \Vert\mathbf{x} \Vert^{2}_{2} \\ \text{subject to} \ &\sum_{i=1}^{n_s} \Vert \Jmat_{i}( \mathbf{m}_0, \mathbf{q}) \mathbf{C}^{\top} \mathbf{x} - \delta \mathbf{d}_{i}\Vert_{2} \leq \sigma \end{aligned} \label{PlainLB} \end{equation}$ with the estimate for the image given by $\delta\mathbf{\hat{m}}=\mathbf{C}^{\top} \mathbf{\hat{x}}$ where $\mathbf{\hat{x}}$ is the minimizer of the above optimization problem. The mixed objective in this problem is known as an elastic net in machine learning, which offers convergence guarantees (see Lorenz et al. (2014)) in situations where we work during each iteration with different randomized subsets of shots indexed by $\mathcal{I}_k\subset [1\cdots n_s]$ with cardinality $|\mathcal{I}|=n_s^\prime\ll n_s$ . We choose these subsets without replacement.

For $\lambda\rightarrow\infty$ , which in practice means $\lambda$ large enough, iterative solutions of Equation $\ref{PlainLB}$ as summarized in Algorithm 1 converge to the solution of Equation $\ref{BPDN}$ , even in situations where we work with randomized subsets of shots. Compared to iterative solutions of Equation $\ref{BPDN}$ , the iterations (lines 7–8 in Algorithm 1) correspond to iterative thresholding with a fixed threshold $\lambda$ on the dual variable ( $\mathbf{z_k}$ ) with a dynamic step length given by $t_k = \Vert \mathbf{A}_k \mathbf{x}_k - \mathbf{b}_k\Vert^{2}_{2} / \Vert \mathbf{A}_k^{\top} (\mathbf{A}_k \mathbf{x}_k - \mathbf{b}_k)\Vert^{2}_{2}$ (Lorenz et al., 2014). During each iteration, known as linerarized Bregman iterations, the residual is projected onto an $\ell_2$ -norm ball with the radius $\sigma$ through a projection operator $\mathcal{P}_\sigma$ . To avoid too many iterations, we set the threshold $\lambda$ , related to the the tradeoff between the $\ell_1$ and $\ell_2$ -norm objectives in Equation $\ref{PlainLB}$ , to a value that is not too large—i.e., typically proportional to the maximum of $|\mathbf{z}_k|$ at the first iteration ( $$k=1$$ ). As reported by Yang et al. (2016) and Philipp A Witte et al. (2019), high quality images can be obtained running Algorithm 1 for a few epochs as long as the source time function $\mathbf{q}$ and background velocity model are sufficiently accurate. As we will show below, the background velocity model also needs to be smooth so the tomography-related imaging is avoided.

   1. Initialize $\mathbf{x}_0 = \mathbf{0}$ , $\mathbf{z}_0 = \mathbf{0}$ , $\mathbf{q}$ , $\lambda_{1}$ , batchsize $n^\prime_{s} \ll n_s$
   2. for   $k=0,1, \cdots$
   3.     Randomly choose shot subsets $\mathcal{I}_k \subset [1 \cdots n_s],\, \vert \mathcal{I} \vert = n^\prime_{s}$
   4.      $\mathbf{A}_k = \{\Jmat_i ( \mathbf{m}_0,\mathbf{q} ) \mathbf{C}^{\top}\}_{i\in\mathcal{I}_k}$
   5.      $\mathbf{b}_k = \{\mathbf{\delta d}_i\}_{i\in\mathcal{I}_k}$
   6.      $t_k = \Vert \mathbf{A}_k \mathbf{x}_k - \mathbf{b}_k\Vert^{2}_{2} / \Vert \mathbf{A}_k^{\top} (\mathbf{A}_k \mathbf{x}_k - \mathbf{b}_k)\Vert^{2}_{2}$
   7.      $\mathbf{z}_{k+1} = \mathbf{z}_k - t_{k} \mathbf{A}^{\top}_k \mathcal{P}_\sigma (\mathbf{A}_k\mathbf{x}_k - \mathbf{b}_k)$
   8.      $\mathbf{x}_{k+1}=S_{\lambda_{1}}(\mathbf{z}_{k+1})$
   9. end
  10. Output: $\hat{\dm}=\mathbf{C}^\top\mathbf{x}_{k+1}$
   note: $S_{\lambda_{1}}(\mathbf{z}_{k+1})=\mathrm{sign}(\mathbf{z}_{k+1})\max\{ 0, \vert \mathbf{z}_{k+1} \vert - \lambda_{1} \}$
              $\mathcal{P}_{\sigma}(\mathbf{A}_k \mathbf{x}_k - \mathbf{b}_k) = \max\{ 0,1-\frac{\sigma}{\Vert \mathbf{A}_k \mathbf{x}_k -\mathbf{b}_k \Vert}\} \cdot (\mathbf{A}_k \mathbf{x}_k -\mathbf{b}_k)$

Algorithm1Linearized Bregman for SPLS-RTM

On-the-fly source estimation

In practice, we unfortunately do not have access to the source time function $\mathbf{q}$ required by Algorithm 1. Following our earlier work on source estimation in time-harmonic imaging and full-waveform inversion (Leeuwen et al., 2011; Tu and Herrmann, 2015a), we propose an approach during which we estimate the source-time signature after each model update by solving a least-squares problem that matches predicted and observed data via a time-domain filter.

To keep our time-domain wave equation solvers with finite differences numerically stable (In our implementation, we used Devito (https://www.devitoproject.org) for our time-domain finite difference simulations and gradient computations (Luporini et al., 2018; M. Louboutin et al., 2019) and JUDI (https://github.com/slimgroup/JUDI.jl) as an abstract linear algebra interface to our Algorithms (Philipp A. Witte et al., 2019b)), we introduce an initial guess for the source time function $\mathbf{q}_0$ with a bandwidth limited spectrum that is flat over the frequency range of interest. Under some assumptions on the source time function, we can write the true source time function as the convolution between the initial guess and the unknown filter $\mathbf{w}$ —i.e., we have $\mathbf{q}=\mathbf{w}\ast\mathbf{q}_0$ where the symbol $\ast$ denotes the temporal convolution. Because we assume one and the same source time function for all shots, we can write $\begin{equation} \Jmat_{i}(\mathbf{m}_0, \mathbf{w} \ast \mathbf{q}_{0}) = \mathbf{w} \ast \Jmat_{i}(\mathbf{m}_0, \mathbf{q}_{0}) \label{SrcApp} \end{equation}$ for all sources $i=1\cdots n_s$ . In this expression, we make use of linearity of the wave equation with respect to its source. To simplify the notation, we also overload the temporal convolution (denoted by the symbol $\ast$ ) to apply to all data—i.e. all traces in the shot records.

Based on the above relationship, we propose to solve for $\mathbf{w}$ after each linearized Bregman iteration (line 10 of Algorithm 2) via $\begin{equation} \begin{aligned} \min_{\mathbf{w}} \ & \sum_{i\in\mathcal{I}_k} \Vert \mathbf{w} \ast \Jmat_{i}(\mathbf{m}_0, \mathbf{q}_{0}) \mathbf{C}^T \mathbf{x} - \delta \mathbf{d}_i\Vert ^2_{2} + \Vert \mathbf{r}\odot (\mathbf{w} \ast \mathbf{q}_{0}) \Vert^2_{2} \end{aligned} \label{sub22} \end{equation}$ with $\mathcal{I}_k \subset [1 \cdots n_s], \vert \mathcal{I} \vert = n_{s}^\prime$ a randomly chosen shot subset of shot records.

To prevent overfitting while fitting the generated data $\mathbf{\tilde{b}}_k$ at the $k\text{th}$ iteration to the observed data $\mathbf{b}_k$ , we include a penalty term $\mathbf{r}$ consisting of an exponential weighting vector as given by: $\begin{equation} \mathbf{r}(t)= \nu + \log (1+e^{\alpha (t-t_0)}). \label{Weight} \end{equation}$ In this expression, the scalar $\alpha$ determines the rate of growth after $$t=t_0$$ . We choose $$t_0$$ such that oscillations related to overfitting are suppressed after this time. This prevents overfitting and ensures the filters $\mathbf{w}_k$ to be short such that the estimated source time function $\mathbf{q}=\mathbf{w}_k\ast\mathbf{q}_0$ remains short as well. The weight parameter $\nu$ penalizes the energy of the estimated source $\mathbf{q}$ , which also helps to alleviate the ill-conditioning of this sub-problem.

We summarize the different steps of our approach in Algorithm 2 below. As earlier, we solve the sparsity-promoting optimization problem via linearized Bregman iterations, which now include a correlation (correlation denoted by the symbol $\star$ is the adjoint of convolution) with the current estimate for source time function correction ( $\mathbf{w}_k$ ) in line 8. We initialize the source time function correction with a discrete Delta distribution ( $\mathbf{w}_0=\mathbf{\delta}$ ). We refer to this method with on-the-fly source estimation as sparsity-promoting LS-RTM with source estimation (SPLS-RTM-SE).

   1. Initialize $\mathbf{x}_0 = \mathbf{0}$ , $\mathbf{w}_0=\mathbf{\delta}$ , $\mathbf{z}_0 = \mathbf{0}$ , $\mathbf{q}_0$ , $\mathbf{r}(t)$ , batchsize $n_{s}^\prime\ll n_s$ ,   $\mathbf{r}$
   2. for   $k=0,1, \cdots$
   3.     Randomly choose shot subsets   $\mathcal{I}_k \subset [1 \cdots n_s], \vert \mathcal{I} \vert = n_{s}^\prime$
   4.      $\mathbf{A}_k = \{\Jmat_i ( \mathbf{m}_0,\mathbf{q}_{0} ) \mathbf{C}^{\top}\}_{i\in\mathcal{I}_k}$
   5.      $\mathbf{b}_k = \{\mathbf{\delta d}_i\}_{i\in\mathcal{I}_k}$
   6.      $\mathbf{\tilde{b}}_k = \mathbf{A}_k \mathbf{x}_k$
   7.      $t_k = \Vert \mathbf{\tilde{b}}_k - \mathbf{b}_k\Vert^{2}_{2} / \Vert \mathbf{A}_k^{\top} (\mathbf{\tilde{b}}_k - \mathbf{b}_k)\Vert^{2}_{2}$
   8.      $\mathbf{z}_{k+1} = \mathbf{z}_k - t_{k} \mathbf{A}^\top_{k} \Big( \mathbf{w}_k {\star} \mathcal{P}_\sigma (\mathbf{w}_k \ast \mathbf{\tilde{b}}_k - \mathbf{b}_k) \Big)$
   9.      $\mathbf{x}_{k+1}=S_{\lambda}(\mathbf{z}_{k+1})$
   10.     $\mathbf{w}_{k+1} = \argmin_{\mathbf{w}} \Vert \mathbf{w} \ast \tilde{\mathbf{b}}_{k} - \mathbf{b}_{k}\Vert ^2_{2} + \Vert \mathbf{r}\odot (\mathbf{w} \ast \mathbf{q}_0) \Vert^2_{2}$
  11. end
  12. Output: $\hat{\mathbf{q}}=\mathbf{w}_{k+1}\ast \mathbf{q}_0$ and $\hat{\dm}=\mathbf{C}^\top\mathbf{x}_{k+1}$

Algorithm2LB for LS-RTM with source estimation

Numerical experiments

In this experiment section, we demonstrate the viability of our approach by means of carefully designed synthetic examples. We start by showing that linearized Bregman iterations with on-the-fly source estimation are indeed able to jointly estimate the source and the sparse vector of image curvelet coefficients. Next, we consider an imaging experiment on the Marmousi model emphasizing the importance of including the source function and the influence of noise. We conclude by introducing a practical workflow that is capable of handling salt-related imaging problems.

Stylized example

While imaging with linearized Bregman (LB) iterations has resulted in high-fidelity true-amplitude images in complex models (Philipp A Witte et al., 2019), the viability of this alternative sparsity-promoting approach has not yet been verified in combination with on-the-fly source estimation. For this purpose, we examine the performance of LB with source estimation on a simplified stylized example. As we can see, Equation $\ref{SrcApp}$ implies a bilinear dependence of the reflected data on both the unknown filter $\mathbf{w}$ and the curvelet coefficient vector $\mathbf{x}$ . It is well known that this sort of bilinear dependence can give rise to ambiguities even though the vector $\mathbf{x}$ is sparse.

We exemplify this seismic bilinear relationship be defining $\mathbf{W}\mathbf{A}\mathbf{x}=\mathbf{b}$ , where the matrix $\mathbf{A} \in \mathcal{R}^{20000 \times 10000}$ is ill-conditioned, with $\text{rank}(\mathbf{A})=500$ . The sparse vector $\mathbf{x}\in \mathcal{R}^{10000\times 1}$ has only $$20$$ random non-zero elements. A block of the tall matrix, $\mathbf{A}_i \in \mathcal{R}^{500\times 10000}, i\in [1\dots 40]$ serves as a proxy for the LB modeling operator $\mathbf{J}_i$ for the $i^\mathrm{th}$ shot with only one single trace. We implement the trace-by-trace convolution via a Toeplitz matrix defined in terms of the filter $\mathbf{w}\in \mathcal{R}^{500\times 1}$ acting on each $\mathbf{A}_i\mathbf{x}$ . The multiplication of the convolution matrix $\mathbf{W}\in \mathcal{R}^{20000 \times 20000}$ with $\mathbf{Ax}$ compactly represents the repeated convolutions of the filter with all traces.

This example, designed to jointly invert for $\mathbf{x}$ and $\mathbf{w}$ , aims to exhibit the capability of our Algorithm 2 to carry out seismic imaging and on-the-fly source estimation. To demonstrate the effect of the penalty term in line 10 of Algorithm 2, we compare sparsity-promoting solutions for the fixed true wavelet to solutions with on-the-fly source estimation with and without the additional penalty. During each iteration, we randomly choose $10\%$ of the blocks of the tall matrix $\mathbf{A}$ and we run five passes through the data in total. After some parameter testing, we choose the following values for the penalty parameters: $\lambda=1,\nu=1$ and $\alpha=8$ . We find that different choices for these penalty parameters have little effect on our inversion results. Finally, the time parameter $$t_0$$ is set according to the approximate duration of the filter $\mathbf{w}$ , which in this case corresponds to a Ricker wavelet since we choose $\mathbf{q}_0$ to be a delta Dirac. We also initialize the filter $\mathbf{w}$ with a normalized Dirac. Because of the well-known amplitude ambiguity inherent to blind deconvolution problems, we normalize the $\ell_2$ -norms of the estimated source and reflectivity.

Pairs of estimated sparse “reflectivities” ( $\hat{\dm}$ ) and source functions ( $\hat{\mathbf{q}}=\mathbf{w}_{final}\ast\mathbf{q}_0$ ) after normalization are included in Figure 1 . We can draw the following conclusions from these results. First, for the noise-free data, the LB iterations are able to recover the sparse “reflectivity” and source function well up to a constant single amplitude factor, which we correct by normalizing its $\ell_2$ -norm. Second, the estimated source function and reflectivity become noisy (cf. the dotted line in Figure 1a and the dot line in Figure 1b ) when we do not include a penalty enforcing the estimated filter to be short in time. Finally, the method is robust with respect to noise as we can see from Figures 1c and 1d where $10\%$ Gaussian noise is added. This result stresses the importance of including the penalty.

Experiments on the modified Marmousi model

To illustrate the performance and robustness with respect to noise of the proposed SPLS-RTM-SE method, we consider a model with complex layered stratigraphy. We derive this imaging example from the well-known synthetic Marmousi model (Brougois et al., 1990), which is $$3.2$$ km deep and $$8.0$$ km wide, with a grid size of $5\times 5$ m. To avoid imaging artifacts, we use a background velocity that is kinematically correct. We simulate $$320$$ equally spaced sources positioned at a depth of $$25$$ m. We use a minimum phase source time function with its significant spectrum ranging from $$10$$ to $$40$$ Hz as shown in Figure 2. We use this type of source to generate linear data by applying the demigration operator ( $\Jmat_i ( \mathbf{m}_0,\mathbf{q} ),\, i=1\cdots n_s$ ) to a bandwidth limited medium perturbation $\dm$ given by the difference between two smoothings of the true medium (Huang et al., 2016). We record data at $$320$$ equally spaced co-located receivers. To assess the sensitivity to noise, we create two additional data sets by adding zero-centered Gaussian noise whose energies are $50\%$ and $200\%$ of the simulated linear data respectively.

Contrary to source estimation in the frequency domain, we need an initial source function $\mathbf{q}_0$ for the source time function (see Figure 2a and 2b where the initial source time function and its amplitude spectrum are depicted by dashed black lines). We need this initial source function to make sure that the finite-difference propagators remain stable. To make sure we do not exceed the valid frequency range of our simulations, we choose the frequency band of the initial source time function broader than the true one. To circumvent bias, we initialize the time function with a flat amplitude spectrum between $$20-50$$ Hz. To allow for a realistic scenario, we apply a phase shift to this initial guess making it mixed phase and non symmetric .

To carry out the alternating inversion for the reflectivity and unknown filter $\mathbf{w}$ , we run Algorithm 2 for $$40$$ iterations with a batch size of $$8$$ —i.e., we use $$8$$ randomly selected sources per iteration without replacement. The total number of wave equation solves is equivalent to touching each shot only once—i.e., we make one pass through the data. To improve the convergence of the inversion, we employ preconditioners in both the data and model domains (see Herrmann et al., 2009 for detail). To remove the imprint of the sources/receivers on the image, we also include a top mute to our operators. Also, we apply a mute to the data to suppress the dominating water bottom reflection and long offsets. Finally, we choose the thresholding parameter $\lambda$ to be $10\%$ of the maximum value of the first gradient to avoid unnecessary extra iterations resulting from a threshold value that is too large or small.

The estimated source functions $\hat{\mathbf{q}}=\mathbf{w}_{final}\ast\mathbf{q}_0$ and their amplitude spectra after applying an $\ell_2$ -norm normalization are included in Figure 2. Overall we can see that the source functions are well recovered despite the presence of noise. For a small amount of noise, the estimated spectrum is the same as the one obtained from the noise-free data while the source function obtained from data with a high noise level is less smooth, but closer to the true source function. Other than the fact that we are dealing with nonlinear blind deconvolution, we do not have an explanation for this behavior. While the noise dependence of the estimated source functions behaves somewhat aberrant, the recovered reflectivity behaves as expected (cf. Figures 3a and 3b for images obtained with the true source and with the initial guess and images 4a – 4c obtained with on-the-fly source estimation for noise-free and noisy data.) We can make the following observations from these experiments: first, it is important to image with the correct source even when the data is noise free. While our sparsity-promoting scheme is able to recover a high-resolution image (see Figure 3a) when the source function corresponds to the true source, the image quality deteriorates rapidly if the amplitude and phase spectra of the wavelet are wrong (see Figure 3b). Energy is no longer focussed and the shape and locations of the imaged reflectors are off. However, the results included in Figure 4 demonstrate that good results can be obtained when estimating the source function on the fly. The estimated reflectivity depicted in Figure 4a is close to the reflectivity obtained when we image with the true source function (cf. Figures 3a and 4). Moreover, the estimated images are, as expected, relatively insensitive to noise in the data albeit the imaged reflectivity for the high noise case somewhat deteriorated (cf. Figures 4a – 4c). Contrary to the imaging result for the wrong initial source function, the reflectors are positioned correctly and have the correct phase, shape, and amplitude, even in situations of substantial noise although at the expense of some remaining low- and high-frequency artifacts. The latter are related to the use of the curvelet transform and are to be expected. Overall, these results confirm the robustness of our imaging algorithm in the situation where there is significant noise in the data.

To arrive at the estimated images in Figure 4, we set the penalty parameters $\nu=1$ and $\alpha=8$ in Algorithm 2. After the first source estimation in the second iteration, we reset the coefficients $\mathbf{z}$ and $\mathbf{x}$ to zero to avoid spending too many iterations on correcting wrongly located reflectors from the first iteration in which the initial guess of the source wavelet is used. In addition to the visual quality of the estimated images, convergence plots for the relative error for the data residual (the relative $\ell_2$ -norm error between the observed data and the demigrated data for estimated reflectivity $\hat{\dm}$ convolved with the estimated filter, $\frac{\|\mathbf{w}_k\ast\tilde{\mathbf{b}}_k-\mathbf{b}_k \|_2}{\|\mathbf{b}_k\|_2}$ ) and the relative model error (the $\ell_2$ -norm error between the true reflectivity and the recovered reflectivity, $\frac{\|\hat{\delta\mathbf{m}}_k -\delta\mathbf{m}\|_2}{\|\delta\mathbf{m}\|_2}$ ) confirm our observation that Algorithm 2 is capable of providing high quality images in the absence of precise knowledge on the source function and in the presence of substantial noise. Our approach arrives at these least-squares images at the cost of a single data pass. Understandably, the algorithm starts off with a large relative residual and model error due to the wrong initial guess for the source function. As Algorithm 2 progresses, these relative errors continue to decay and are comparable to the convergence plots for the true source function. Because on-the-fly source estimation improves our ability to adapt to the data, the relative data residual for the noise-free case (dashed line) is even better then the relative error in case the source function is known (solid line). While encouraging, these results are obtained for a relatively simple imaging experiment and for data that is obtained with linearized modeling via demigration. In other words, we commit an inversion crime. In the next section, we will show that the proposed method also performs well in more complicated settings with nonlinear data.

Experiments on the Sigsbee model

Sparsity-promoting imaging algorithms such as SPLS-RTM (Algoritm 2) are designed to handle complex imaging scenarios with strong velocity contrasts and strong lateral velocity variations. Examples of such scenarios are salt plays where reflections underneath the salt are of interest. To demonstrate the viability of our imaging approach with on-the-fly source estimation in this scenario, we consider the challenging Sigsbee2A model of size $24.4\times 9.2$ km. This model contains a large salt body and a number of faults and point diffractors. To demonstrate the capability of our approach to handel this challenging situation, we simulate nonlinear data for a marine acquisition without a free surface. We model $$960$$ sources in total, with each shot record being recorded by an array of $$320$$ receivers with $$25$$ m receiver spacing, a maximum offset of $$8$$ km and a towing depth of $$15$$ m. We use a source wavelet with a peak frequency at $15\text{Hz}$ (see Figure 6) and we record for $$10$$ s.

As is customary during imaging under salt, we use a background velocity model that features salt with relatively strong and therefore reflecting boundaries. We approximate linear data by using this background velocity model to generate data, which we subtract from the simulated data in the true Sigsbee2A model (i.e. from the observed data). Due to the presence of salt in the background model, the incident wavefield contains reflections that give rise to unwanted tomographic low-frequency artifacts in the image. This problem is widely reported in the literature (e.g. Yoon and Marfurt (2006); Guitton et al. (2007)). To remove these imaging artifacts, we replace the conventional imaging condition for RTM by the inverse-scattering imaging condition (Stolk et al., 2012; Whitmore and Crawley, 2012; Philipp A. Witte et al., 2017). While this condition has been proven capable of removing tomographic artifacts during RTM (Whitmore and Crawley, 2012; Philipp A. Witte et al., 2017) and sparsity-promoting least-squares RTM (Philipp A. Witte et al., 2017), it changes the linearized forward operator (the Jacobian $\mathbf{J}_i$ ), resulting in an inconsistent system. Contrary to RTM with the conventional imaging condition, imaging with the inverse scattering imaging condition corresponds to estimating perturbations in the impedance, rather than in the velocity.

Unfortunately, the difference in which quantity is being imaged, is problematic for our proposed on-the-fly source estimation, which tries to correct for inconsistencies between observed and predicted data. Contrary to the situations where we use the conventional imaging condition, the data residual now contains contributions from the wrong wavelet and the linearized imaging condition, which leads to wrong estimates for the unknown source function. We overcome this problem via a hybrid iterative algorithm where we switch imaging conditions during the iterations as outlined in Algorithm 2. To estimate the source function, we first iterate with the conventional imaging condition. Since the convergence to the source function is fast, we switch after five iterations to the scattering imaging, but keep the estimated source function fixed. Basically, we jump from Algorithm 2 to Algorithm 1.

Results of this hybrid approach are summarized in Figures 6 – 8. As before, we compare our results with on-the-fly source estimation to SPLS-RTM for the true source function. The initial guess and estimated wavelets in Figure 6 again confirm the validity of our approach, yielding a reasonably accurate estimate for the source after only five iterations and subsequent normalization of the $\ell_2$ -norm. Imaging results obtained after twenty iterations with $10\%$ of the shots, which amounts to two data passes in total, are included in Figures 7 and 8. Unlike a typical RTM image, images obtained by SPLS-RTM are well resolved and contain true amplitude. This is because we invert the linearized modeling operator, which compensates for the source, finite aperture, and propagation effects. As before, we include preconditioners and mutes to improve the conditioning number of the linear system. Comparisons of Figure 7a, obtained with Algoritm 1 with the true source function, and Figure 7b, which we compute with our hybrid method switching from Algoritm 2 to Algoritm 1 after five iterations, show near identical results, thus confirming the validity of the proposed approach. These observations are confirmed by trace-by-trace comparisons in Figure 8. Similar to the Marmousi experiments, we set the penalty parameters $\alpha=8$ , and $\nu=1$ , and the thresholding parameter $\lambda$ is set according to $10\%$ of the maximum absolute amplitude level of $\mathbf{z}_1$ .

Discussion

Due to its computational costs, sparsity-promoting LS-RTM has been an expensive proposition and is for this reason not yet widely adapted while this method is capable of achieving images with high-amplitude fidelity and fewer artifacts. With the proposed work, we are able to come up with an alternative low-cost approach combining techniques from stochastic optimization and sparsity-promotion with on-the-fly source estimation using the technique of variable projection. Compared to earlier work, addressing the memory demands of (LS-)RTM via on-fly-Fourier transforms (Philipp A Witte et al., 2019), we tackle the problem of on-the-fly source estimation in the time domain. Because we use industry-strength time-domain finite-difference propagators provided by Devito (Luporini et al., 2018; M. Louboutin et al., 2019) and exposed in the Julia programming language by JUDI (Philipp A. Witte et al., 2019b), our approach scales in principle to large 3D industrial problems. While we address the importance of estimating source function, we believe that the sensitivity of LS-RTM to errors in the background velocity model needs to be studied as well albeit early work on time-harmonic LS-RTM showing some robustness with respect to these errors (Tu and Herrmann, 2015a). Combining our approach with the method of on-the-fly Fourier transforms is also a topic that needs further study.

While carrying out full scale 3D (LS-)RTM experiments is generally out of reach for academia where access to large high-performance compute is often limited, recent work on a serverless implementation of RTM (Philipp A. Witte et al., 2019a, 2019c) has shown that industry-scale workloads can be run in the cloud leveraging the power of Devito. In the not too distant future, we plan to demonstrate the presented method on an industry-scale imaging problem using tilted transversely-isotropic propagators. This would truly exemplify the power of modern code bases and linear algebra abstractions as utilized by JUDI (Philipp A. Witte et al., 2019b). This framework gives us flexibility for instance to switch to more involved 3D propagators or to estimating source-time functions that are allowed to vary along the survey.

Finally, since sparsity-promoting LS-RTM carries out inversions, we expect to be able to obtain images from sparsely sampled data, e.g. data collected with sparse ocean bottom nodes and (multi-)source vessel simultaneous recordings. We plan to report on these aspects in the not too distant future.

Conclusion

We proposed a scalable time-domain approach to sparsity-promoting least-squared reverse time migration with on-the-fly source estimation in principle suitable for industrial 3D imaging problems. The presented approach leverages recently developed techniques from convex optimization and variable projection that greatly reduce costs and the necessity to provide an estimate for the source function. As a result, our approach is capable of generating high-fidelity true-amplitude images including source estimates at the cost of roughly one to two migrations involving all data.

By means of carefully designed experiments in 2D, we were able to demonstrate that our method is capable of handling noisy data and complex imaging settings such as salt. We were able to image under salt, which is often plagued by low-frequency tomographic artifacts, by switching between applying the conventional imaging condition initially, followed by iterations that apply the inverse-scattering condition. In this way, we estimated the source function first while creating an artifact-free image with later iterations during which the imaging condition was switched while keeping the source function fixed.

Because the presented method relies on time-domain propagators, we anticipate it will be able to scale to large 3D industrial imaging problems. Because 3D imaging with full-azimuthal sparse data typically provided good illumination of the reservoir, we expect the proposed methodology to produce high fidelity results at a cost of roughly one to two reverse time migrations involving all shots.

Acknowledgements

This research was funded by the Georgia Research Alliance and the Georgia Institute of Technology. And this work is a collaborative effort of all the co-authors. We confirm that there is no conflict of interest to declare.

Aoki, N., and Schuster, G. T., 2009, Fast least-squares migration with a deblurring filter: Geophysics, 74, WCA83–WCA93.

Aravkin, A. Y., Leeuwen, T. van, and Tu, N., 2013, Sparse seismic imaging using variable projection: In 2013 iEEE international conference on acoustics, speech and signal processing (pp. 2065–2069). IEEE.

Aravkin, A. Y., Leeuwen, T. van, Calandra, H., and Herrmann, F. J., 2012, Source estimation for frequency-domain fWI with robust penalties: In 74th eAGE conference and exhibition incorporating eUROPEC 2012.

Baysal, E., Kosloff, D. D., and Sherwood, J. W., 1983, Reverse time migration: Geophysics, 48, 1514–1524.

Bednar, J. B., Bednar, C. J., and Shin, C., 2006, Two-way vs one-way: A case study style comparison: In SEG technical program expanded abstracts 2006 (pp. 2343–2347). Society of Exploration Geophysicists.

Brougois, A., Bourget, M., Lailly, P., Poulet, M., Ricarte, P., and Versteeg, R., 1990, Marmousi, model and data: In EAGE workshop-practical aspects of seismic data inversion (pp. cp–108). European Association of Geoscientists & Engineers.

Cai, J.-F., Osher, S., and Shen, Z., 2009, Convergence of the linearized Bregman iteration for L1-norm minimization: Mathematics of Computation, 78, 2127–2136.

Candès, E. J., and others, 2006, Compressive sampling: In Proceedings of the international congress of mathematicians (Vol. 3, pp. 1433–1452). Madrid, Spain.

Candès, E. J., and Wakin, M. B., 2008, An introduction to compressive sampling: IEEE Signal Processing Magazine, 25, 21–30.

Chen, S. S., Donoho, D. L., and Saunders, M. A., 2001, Atomic decomposition by basis pursuit: SIAM Review, 43, 129–159.

Dong, S., Cai, J., Guo, M., Suh, S., Zhang, Z., Wang, B., and Li, e Z., 2012, Least-squares reverse time migration: Towards true amplitude imaging and improving the resolution: In SEG technical program expanded abstracts 2012 (pp. 1–5). Society of Exploration Geophysicists.

Donoho, D. L., 2006, Compressed sensing: IEEE Transactions on Information Theory, 52, 1289–1306.

Fang, Z., Wang, R., and Herrmann, F. J., 2018, Source estimation for wavefield-reconstruction inversion: Geophysics, 83, R345–R359.

Guitton, A., Kaelin, B., and Biondi, B., 2007, Least-squares attenuation of reverse-time-migration artifacts: Geophysics, 72, S19–S23.

Haber, E., Chung, M., and Herrmann, F., 2012, An effective method for parameter estimation with PDE constraints with multiple right-hand sides: SIAM Journal on Optimization, 22, 739–757.

Herrmann, F. J., and Li, X., 2012, Efficient least-squares imaging with sparsity promotion and compressive sensing: Geophysical Prospecting, 60, 696–712.

Herrmann, F. J., Brown, C. R., Erlangga, Y. A., and Moghaddam, P. P., 2009, Curvelet-based migration preconditioning and scaling: Geophysics, 74, A41–A46.

Herrmann, F. J., Tu, N., and Esser, E., 2015, Fast “online” migration with compressive sensing: In Proceedings of the 77th eAGE conference and exhibition.

Hou, J., and Symes, W. W., 2016, Accelerating extended least-squares migration with weighted conjugate gradient iteration: Geophysics, 81, S165–S179.

Huang, Y., Nammour, R., and Symes, W., 2016, Flexibly preconditioned extended least-squares migration in shot-record domain: Geophysics, 81, S299–S315.

Leeuwen, T. van, Aravkin, A. Y., and Herrmann, F. J., 2011, Seismic waveform inversion by stochastic optimization: International Journal of Geophysics, 2011.

Leeuwen, T. van, Aravkin, A. Y., Herrmann, F. J., Li, M., Rickett, J., and Abubakar, A., 2014, Reply to the discussion: Geophysics, 79, X11–X15.

Lorenz, D. A., Schopfer, F., and Wenger, S., 2014, The linearized bregman method via split feasibility problems: Analysis and generalizations: SIAM Journal on Imaging Sciences, 7, 1237–1262.

Louboutin, M., Lange, M., Luporini, F., Kukreja, N., Witte, P. A., Herrmann, F. J., … Gorman, G. J., 2019, Devito (v3.1.0): An embedded domain-specific language for finite differences and geophysical exploration: Geoscientific Model Development, 12, 1165–1187.

Luporini, F., Lange, M., Louboutin, M., Kukreja, N., Hückelheim, J., Yount, C., … Gorman, G. J., 2018, Architecture and performance of devito, a system for automated stencil computation: CoRR, abs/1807.03032. Retrieved from http://arxiv.org/abs/1807.03032

Mulder, W. A., and Plessix, R.-E., 2004, A comparison between one-way and two-way wave-equation migration: Geophysics, 69, 1491–1504.

Nemeth, T., Wu, C., and Schuster, G. T., 1999, Least-squares migration of incomplete reflection data: Geophysics, 64, 208–221.

Powell, W. B., 2014, Clearing the jungle of stochastic optimization: In Bridging data and decisions (pp. 109–137). Informs.

Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model: Geophysics, 64, 888–901.

Rickett, J., 2013, The variable projection method for waveform inversion with an unknown source function: Geophysical Prospecting, 61, 874–881.

Schuster, G., 1983, Least-squares cross-well migration: In SEG technical program expanded abstracts 1993 (pp. 110–113). Society of Exploration Geophysicists.

Stolk, C., De Hoop, M., and others, 2012, Linearized inverse scattering based on seismic reverse time migration: Journal de Mathématiques Pures et Appliquées, 98.

Tu, N., and Herrmann, F. J., 2015a, Fast imaging with surface-related multiples by sparse inversion: Geophysical Journal International, 201, 304–317. Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/Journals/GeophysicalJournalInternational/2014/tu2014fis/tu2014fis.pdf

Tu, N., and Herrmann, F. J., 2015b, Fast least-squares imaging with surface-related multiples: Application to a north sea data set: The Leading Edge, 34, 788–794.

Tu, N., Aravkin, A. Y., Leeuwen, T. van, and Herrmann, F. J., 2013, Fast least-squares migration with multiples and source estimation: Proceedings of the 75th eAGE conference and exhibition.

Whitmore, N., 1983, Iterative depth migration by backward time propagation: In SEG technical program expanded abstracts 1983 (pp. 382–385). Society of Exploration Geophysicists.

Whitmore, N., and Crawley, S., 2012, Applications of rTM inverse scattering imaging conditions: In SEG technical program expanded abstracts 2012 (pp. 1–6). Society of Exploration Geophysicists.

Witte, P. A., Louboutin, M., Jones, C., and Herrmann, F. J., 2019a, Serverless seismic imaging in the cloud:

Witte, P. A., Louboutin, M., Kukreja, N., Luporini, F., Lange, M., Gorman, G. J., and Herrmann, F. J., 2019b, A large-scale framework for symbolic implementations of seismic inversion algorithms in julia: Geophysics, 84, F57–F71.

Witte, P. A., Louboutin, M., Luporini, F., Gorman, G. J., and Herrmann, F. J., 2019, Compressive least-squares migration with on-the-fly fourier transforms: Geophysics, 84, 1–76.

Witte, P. A., Louboutin, M., Modzelewski, H., Jones, C., Selvage, J., and Herrmann, F. J., 2019c, An event-driven approach to serverless seismic imaging in the cloud: Retrieved from https://slim.gatech.edu/Publications/Public/Submitted/2019/witte2019TPDedas/witte2019TPDedas.html

Witte, P. A., Yang, M., and Herrmann, F. J., 2017, Sparsity-promoting least-squares migration with the linearized inverse scattering imaging condition: Proceedings of the 79th eAGE conference and exhibition.

Yang, M., Witte, P., Fang, Z., and Herrmann, F., 2016, Time-domain sparsity-promoting least-squares migration with source estimation: In SEG technical program expanded abstracts 2016 (pp. 4225–4229). Society of Exploration Geophysicists.

Yin, W., 2010, Analysis and generalizations of the linearized bregman method: SIAM Journal on Imaging Sciences, 3, 856–877.

Yoon, K., and Marfurt, K. J., 2006, Reverse-time migration using the Poynting vector: Exploration Geophysics, 37, 102–107.

Zeng, C., Dong, S., and Wang, B., 2014, Least-squares reverse time migration: Inversion-based imaging toward true reflectivity: The Leading Edge, 33, 962–968.