\[ \usepackage{acronym} \acrodef{RTM}{Reverse-Time Migration} \def\op#1{\mathbf{#1}} \def\monoop#1{\mathbf{#1}_i} \def\undop#1{\underline{\mathbf{#1}}} \def\vector#1{\mathbf{#1}} \def\monovec#1{\mathbf{#1}_i} \def\undvec#1{\underline{\mathbf{#1}}} \def\argmin{\mathop{\rm argmin}} \def\min{\mathop{\rm minimize}} \def\infim{\mathop{\rm inf}} \]

Source estimation with surface-related multiples—fast ambiguity-resolved seismic imaging

Ning Tu¹, Aleksandr Aravkin², Tristan van Leeuwen³, Tim Lin¹, and Felix J. Herrmann¹
¹ Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver, British Columbia, Canada; ² Department of Applied Mathematics, University of Washington, Seattle, WA; ³ Mathematical Institute, Utrecht University, P.O. Box 80010, 3508 TA Utrecht, The Netherlands

Abstract

We address the problem of obtaining a reliable seismic image without prior knowledge of the source wavelet, especially from data that contain strong surface-related multiples. Conventional reverse-time migration requires prior knowledge of the source wavelet, which is either technically or computationally challenging to accurately determine; inaccurate estimates of the source wavelet can result in seriously degraded reverse-time migrated images, and therefore wrong geological interpretations. To solve this problem, we present a “wavelet-free” imaging procedure that simultaneously inverts for the source wavelet and the seismic image, by tightly integrating source estimation into a fast least-squares imaging framework, namely compressive imaging, given a reasonably accurate background velocity model. However, this joint inversion problem is difficult to solve as it is plagued with local minima and the ambiguity with respect to amplitude scalings, because of the multiplicative, and therefore nonlinear, appearance of the source wavelet in the otherwise linear formalism. We have found a way to solve this nonlinear joint-inversion problem using a technique called variable projection, and a way to overcome the scaling ambiguity by including surface-related multiples in our imaging procedure following recent developments in surface-related multiple prediction by sparse inversion. As a result, we obtain without prior knowledge of the source wavelet high-resolution seismic images, comparable in quality to images obtained assuming the true source wavelet is known. By leveraging the computationally efficient compressive-imaging methodology, these results are obtained at affordable computational costs compared with conventional processing work flows that include surface-related multiple removal and reverse-time migration.

Introduction

Conventional reverse-time migration (RTM, Baysal et al., 1983) requires the knowledge of the source wavelet as prior information, which is used during the forward propagation of the source wavefield. Unfortunately, for field seismic data, this knowledge is either technically (e.g., relying on statistical assumptions, Ulrych et al., 1995) or computationally (e.g., as used in full-waveform inversion, Virieux and Operto, 2009) challenging to accurately determine during the pre-processing procedure. Inaccurate estimates of the source wavelet will introduce errors to the forward propagated source wavefield, and later to the seismic image during the cross-correlation of the source and receiver wavefields. For example, wrong estimates in the phase of the source wavelet can result in misplaced structures in the seismic image, and therefore lead to wrong geological interpretations.

To eliminate the dependence of seismic imaging on the knowledge of the source wavelet, we are motivated to incorporate source estimation into seismic imaging with minimal computational overhead. We achieve this objective via the joint inversion of the seismic image and the source wavelet, by extending the least-squares-migration formalism (Nemeth et al., 1999). Because the forward modelling of the seismic data is linear with respect to the source wavelet, this type of joint inversion problem is known as a separable least-squares problem (Golub and Pereyra, 1973, 2003; Kaufman, 1975; Aleksandr Y. Aravkin and van Leeuwen, 2012). To reduce the excessive simulation cost of conventional least-squares migrations, we leverage curvelet-domain sparsity of the seismic image (Herrmann et al., 2008a), and tightly integrate source estimation into a computationally efficient least-squares imaging formalism (Herrmann and Li, 2012), also known as the compressive imaging method. As a result, we obtain high-resolution least-squares migrated images with simulation costs (in terms of wave-equation solves, not the overall computing time) that roughly equal those of conventional reverse-time migration, whereas the computational overhead introduced by source estimation is negligible compared to the wave-equation solves.

Although a separable least-squares problem can usually be effectively solved using variable projection (Golub and Pereyra, 2003), applying variable projection to the proposed joint inversion procedure is complicated by the following two issues. First, the compressive imaging formalism adopts a sparsity-promoting objective function, which is different from typical separable least-squares problems (Golub and Pereyra, 2003). We follow van den Berg and Friedlander (2008) and A. Y. Aravkin et al. (2013), and derive an alternative problem formulation that turns the sparsity-promoting objective into a sparsity constraint. Second, there is an ambiguity in the amplitude scaling between the model perturbations and the estimated source wavelet, which cannot be resolved using variable projection alone, as in any blind-deconvolution type of problem (Stockham Jr et al., 1975). Inspired by recent developments in Estimation of Primaries by Sparse Inversion (EPSI, G. J. A. van Groenestijn and Verschuur, 2009; Lin and Herrmann, 2013), we propose to resolve the ambiguity by incorporating surface-related multiples in the inversion. In this way we leverage the self-consistency between the primary events and their corresponding surface-related multiples, and obtain a properly scaled source wavelet that leads to the separation of primaries and the surface-related multiples. As with any other imaging algorithm, our proposed method relies on the knowledge of a reasonably accurate background model.

The variable projection method has found its applications in a variety of geophysical problems in recent years, for example, in characterization of P-S wave conversion (Fomel et al., 2003), in velocity model building (T. van Leeuwen and Mulder, 2009; Peters et al., 2014; Peters and Herrmann, 2014), in impedance reconstruction (Métivier et al., 2011; Métivier, 2011), and notably in source estimation during full-waveform inversion (Aleksandr Y. Aravkin et al., 2012; Rickett, 2013; M. Li et al., 2013). In all these applications, an objective function that measures the data misfit is used for the variable projection technique to be applicable. The misfit is usually measured using the $\ell_2$ -norm (i.e., a least-squares formulation), but other differentiable misfit functions can also be used (Aleksandr Y. Aravkin et al., 2012). In this paper, however, we minimize the $\ell_1$ -norm of the solution vector to promote sparsity, with the least-squares data-fitting term acting as a constraint. To the authors’ knowledge, this work is the first instance of applying the variable projection technique to an optimization problem with a sparsity-promoting objective.

Regarding the use of multiples in source estimation, successful data-space applications have been demonstrated in the literature of Surface-Related Multiple Elimination (SRME, Verschuur et al., 1992), and EPSI (G. J. A. van Groenestijn and Verschuur, 2009; Lin and Herrmann, 2013; Esser et al., 2015). Based on the same physical principles as described by the SRME relation (Berkhout, 1993; Guitton, 2002; Muijs et al., 2007; N. D. Whitmore et al., 2010; Verschuur, 2011; Liu et al., 2011; Wong et al., 2014; Zhang and Schuster, 2014; Tu and Herrmann, 2015a; Davydenko et al., 2015), we incorporate surface-related multiples in sparsity-promoting seismic imaging including source estimation, and demonstrate why incorporation of the surface-related multiples mitigates the scaling ambiguity that plagues source estimation in seismic imaging. By optimizing in the model space rather than in the data space, we gain the following benefits: (i) we have access to the physical locations of the entire subsurface model, and can therefore overcome some limitations that SRME/EPSI imposes on data acquisition, such as (approximate) source/receiver co-location; (ii) the unknown model parameters are of much smaller dimensions compared to seismic data, which enables us to subsample the source wavefields to relax the dense data-acquisition requirement or the computation cost (Tu and Herrmann, 2015a). Aside from mitigating the scaling ambiguity, multiples also provide extra illumination coverage that complements primaries, especially along the cross-line direction in 3D seismic surveys (Long et al., 2013; S. Lu et al., 2014). However, properly imaging the surface-related multiples that contain many different orders of surface reflections calls for an inversion approach to suppress spurious artifacts from the multiples (Verschuur, 2011; Wong et al., 2014; Zhang and Schuster, 2014; Tu and Herrmann, 2015a), which can be computationally challenging. 
Tu and Herrmann (2015a) extended the compressive imaging approach by Herrmann and Li (2012) to account for surface-related multiples in a computationally efficient way, assuming prior knowledge of a properly scaled source wavelet. Verschuur (2011) also showed that simultaneous imaging of the primary and multiple wavefields requires the exact knowledge of the source wavelet. Davydenko et al. (2015) proposed a two-step approach to first image the multiples and then the primaries by source estimation. In this work, we are motivated to obviate the need for the exact knowledge of the source wavelet, and jointly image the primary and multiple wavefields, by integrating source estimation into the method by Tu and Herrmann (2015a) using variable projection (Golub and Pereyra, 2003; Aleksandr Y. Aravkin and van Leeuwen, 2012).

Paper outline

The paper is organized as follows. First, we formulate the joint sparsity-promoting source- and image-estimation problem, followed by a technical discussion on how to solve $\ell_1$ -norm minimization problems with source estimation via variable projection. Next, we explain why including surface-related multiples into the formulation helps mitigate the scaling ambiguity during imaging and source estimation. We conclude by demonstrating the advantages of the proposed method over conventional RTM, using carefully elaborated synthetic examples.

Problem formulation and method

As we propose the joint inversion approach in the compressive-imaging framework, we will first briefly review the compressive-imaging formalism and formulate the source estimation problem. Afterwards, we will explain how to solve the problem using variable projection.

Compressive imaging

Given the knowledge of the source wavelet, compressive imaging aims to obtain least-squares migrated images in a computationally efficient manner, by solving the following Basis Pursuit De-Noise (BPDN) problem (Herrmann and Li, 2012): $\begin{equation} \begin{aligned} \mathrm{BPDN}(\vector{w},\sigma): \quad & \min_{\vector{x}} \|\vector{x}\|_1 \\ & \mathrm{subject\ to} \quad \sum_{i\in\Omega} \sum_{j\in\Sigma} \|\undvec{d}_{i,j}-\nabla\op{F}[\vector{m}_{0},{w}_i\undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}}\leq\sigma^2. \end{aligned} \label{eq:bpw} \end{equation}$ In this formulation, vector $\vector{w}$ represents the Fourier spectrum of the source wavelet, and the tolerance parameter $\sigma$ allows for data mismatch due to noise in the data and modelling errors. Vector $\vector{x}$ contains the curvelet coefficients of the image $\delta\vector{m}$, i.e., $\delta\vector{m}=\op{C}^*\vector{x}$ with $\op{C}^*$ the curvelet synthesis operator (Candès and Donoho, 2004), and $\delta\vector{m}$ itself being the perturbations over a given gridded background velocity model $\vector{m}_0$ parameterized by the square of the slowness. In the data-fitting constraint of the above optimization program, $i$ indexes the discretized frequencies, and $j$ indexes the different source experiments. Instead of using all $n_f$ discretized frequencies and $n_s$ source experiments, Herrmann and Li (2012) proposed to subsample the monochromatic source experiments to reduce the simulation cost in forward modelling: notations $\Omega$ and $\Sigma$ denote the randomly selected $n_f^{\prime} \ll n_f$ frequencies and $n_s^{\prime} \ll n_s$ source experiments. By subsampling, the number of wave-equation solves in each iteration is reduced by a factor of $(n_f n_s)/(n_f^{\prime} n_s^{\prime})$.
In brief, the simultaneous source experiments are drawn by multiplying the source matrix on the right by a tall source-encoding matrix (i.e., one with more rows than columns: the number of rows is the number of sequential sources and the number of columns is the number of simultaneous sources); we refer to Herrmann and Li (2012) and Tu and Herrmann (2015a) for the details of the subsampling procedure. The underlined variables $\undvec{d}_{i,j}$ and ${w}_i\undvec{s}_{j}$ represent the $j^{\mathrm{th}}$ column of the subsampled primary-only receiver wavefields (i.e., simultaneous data) and point-source wavefields (i.e., simultaneous sources) respectively. We assume that all sources share the same source signature $\vector{w}$, while it is possible to extend our work to allow for source estimation for each shot separately, by leveraging the fact that the unknown source wavelets still have a much smaller dimension compared to the model parameters and the seismic data. The operator $\nabla\op{F}$ represents the linearized Born scattering operator. Note that the source weight $w_i$ is separable from the source vector $\undvec{s}_j$, i.e., $\nabla\op{F}[\vector{m}_{0},{w}_i\undvec{s}_{j}]\delta\vector{m}={w}_i\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\delta\vector{m}$. Throughout this paper, we form the operator $\nabla\op{F}$ based on the two-way wave equation; therefore, the application of $\nabla\op{F}^*$ to data residuals produces reverse-time migrated images (Lailly, 1983). The above formulation means that among all possible solutions, the optimization program looks for the sparsest curvelet coefficients that, after curvelet synthesis and linearized modelling with the simultaneous sources, predict the simultaneous observed data up to a noise threshold specified by $\sigma$. With the above compressive-imaging formalism in place, we now continue to formulate the source estimation problem.
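The subsampling step can be illustrated with a small NumPy sketch. It is not the authors' implementation: the Gaussian encoding matrix, its normalization, the array sizes, and the names `draw_subsets`, `S`, `D` are all assumptions made for illustration; the text only specifies a tall encoding matrix applied on the right of the source matrix and a random subset of frequencies.

```python
import numpy as np

def draw_subsets(n_f, n_s, n_f_sub, n_s_sub, rng):
    """Draw a random frequency subset and a tall source-encoding matrix."""
    # Omega: n_f_sub distinct frequency indices out of n_f
    omega = np.sort(rng.choice(n_f, size=n_f_sub, replace=False))
    # E is tall: n_s rows (sequential sources), n_s_sub columns (simultaneous sources).
    # Gaussian entries are an assumed choice of random encoding.
    E = rng.standard_normal((n_s, n_s_sub)) / np.sqrt(n_s_sub)
    return omega, E

rng = np.random.default_rng(0)
n_f, n_s, n_r = 64, 32, 40      # frequencies, sequential sources, receivers
n_f_sub, n_s_sub = 8, 4         # subsampled counts

# Hypothetical stand-ins: point sources as columns of S, and one data
# matrix per frequency with one column per sequential shot.
S = np.eye(n_s)
D = rng.standard_normal((n_f, n_r, n_s)) + 1j * rng.standard_normal((n_f, n_r, n_s))

omega, E = draw_subsets(n_f, n_s, n_f_sub, n_s_sub, rng)
S_sim = S @ E                   # simultaneous sources, shape (n_s, n_s_sub)
D_sim = D[omega] @ E            # simultaneous data, shape (n_f_sub, n_r, n_s_sub)

# Reduction in wave-equation solves per iteration
reduction = (n_f * n_s) / (n_f_sub * n_s_sub)
print(S_sim.shape, D_sim.shape, reduction)
```

Because the modelling is linear in the source, encoding the sources and the data with the same matrix keeps the two consistent, which is what makes the subsampled data-fitting constraint meaningful.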

Source estimation

Now we identify the source wavelet $w_i, i\in \Omega$ (collected in the vector $\vector{w}$) as an additional unknown in the compressive imaging formulation, i.e., we have $\begin{equation} \begin{aligned} \mathrm{BPDN}(\sigma): \quad & \min_{\vector{x},\vector{w}} \|\vector{x}\|_1 \\ & \mathrm{subject\ to} \quad \sum_{i\in\Omega} \sum_{j\in\Sigma} \|\undvec{d}_{i,j}-{w}_i\nabla\op{F}[\vector{m}_{0}, \undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}} \leq \sigma^2. \end{aligned} \label{eq:bp} \end{equation}$ While conceptually attractive, $\mathrm{BPDN}(\sigma)$ does not lend itself to the development of tractable algorithms. Following early work by van den Berg and Friedlander (2008), we interchange the objective and constraint to arrive at an extension of the Least Absolute Shrinkage and Selection Operator (LASSO, Tibshirani, 1996) formulation (recall that the formulation now includes the source as an unknown, which appears nonlinearly in the $\mathrm{BPDN}(\sigma)$ program): $\begin{equation} \begin{aligned} \mathrm{LASSO}(\tau): \quad & \min_{\vector{x},\vector{w}} f(\vector{x},\vector{w})\doteq\sum_{i\in\Omega} \sum_{j\in\Sigma} \|\undvec{d}_{i,j}-{w}_i\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}} \\ & \mathrm{subject\ to} \quad \|\vector{x}\|_1\leq\tau. \end{aligned} \label{eq:lso} \end{equation}$ As shown by A. Y. Aravkin et al. (2013), the sets of minimizers of problems $\mathrm{BPDN}(\sigma)$ and $\mathrm{LASSO}(\tau)$ coincide, provided that each minimizer of problem $\mathrm{BPDN}(\sigma)$ satisfies its constraint with equality (i.e., $f(\vector{x},\vector{w}) = \sigma^2$). Not coincidentally, when this condition is met, each minimizer also satisfies the constraint of the $\mathrm{LASSO}(\tau)$ problem with equality, meaning that we can look for a $\tau$ such that the infimum of $f(\vector{x},\vector{w})$ subject to $\|\vector{x}\|_1\leq\tau$ equals $\sigma^2$.
Because the objective function of problem $\mathrm{LASSO}(\tau)$ adopts the canonical form of a separable least-squares problem (Golub and Pereyra, 2003), we can now solve the problem using variable projection combined with projection onto a convex set. In the next few sections, we will first discuss how we solve problem $\mathrm{LASSO}(\tau)$ with a given $\tau$ , and then discuss how we compute the right $\tau$ so that problem $\mathrm{LASSO}(\tau)$ converges to the same solution as problem $\mathrm{BPDN}(\sigma)$ .

Variable projection

To incorporate variable projections for the source wavelet into $\ell_1$-norm sparsity-promoting imaging, let us first consider the iterations involving soft thresholding that underlie this type of $\ell_1$-norm optimization. Model parameters at the $k^{\mathrm{th}}$ iteration are in this case projected onto the $\ell_1$-norm ball $\|\vector{x}\|_1\leq \tau$, and the involved projected-gradient step is given by $\begin{equation} \vector{x}^{k+1}=\mathrm{P}_\mathcal{X}\left[\vector{x}^k-\lambda\nabla_{\vector{x}} f(\vector{x},\vector{w})\bigg|_{\vector{x}=\vector{x}^k,\vector{w}=\vector{w}^k}\right]. \label{eq:pg} \end{equation}$ In this expression, $\lambda$ is a positive step length determined by a line search, $\nabla_{\vector{x}} f(\vector{x},\vector{w})\bigg|_{\vector{x}=\vector{x}^k,\vector{w}=\vector{w}^k}$ is the gradient of the objective function in Equation $\eqref{eq:lso}$ with respect to $\vector{x}$ at the $k^{\mathrm{th}}$ iteration, and $\mathrm{P}_\mathcal{X}$ denotes the projection onto the feasible set $\mathcal{X}=\{\vector{x}:\|\vector{x}\|_1\leq\tau\}$.
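A minimal sketch of one projected-gradient iteration onto the $\ell_1$-norm ball is given below, with the step taken against the gradient (the usual descent direction). Several things here are assumptions rather than the paper's implementation: the coefficients are real-valued, the projection uses the standard sort-based thresholding construction, a small dense matrix `A` stands in for the modelling operator, and the fixed step `1/L` replaces the line search. The names `project_l1_ball` and `projected_gradient_step` are hypothetical.

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of a real vector v onto {x : ||x||_1 <= tau}."""
    if tau <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= tau:
        return v.copy()
    # Find the soft-threshold level theta with sum(max(a - theta, 0)) == tau
    s = np.sort(a)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, a.size + 1)
    rho = np.nonzero(s * k > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(a - theta, 0.0)

def projected_gradient_step(x, grad, step, tau):
    """One iteration x <- P_X[x - step * grad]."""
    return project_l1_ball(x - step * grad, tau)

# Toy least-squares misfit f(x) = ||A x - b||^2 standing in for the imaging misfit
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 6]] = [2.0, -1.5]
b = A @ x_true
tau = np.abs(x_true).sum()
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1/L for this quadratic

x = np.zeros(10)
misfits = []
for _ in range(200):
    grad = 2.0 * A.T @ (A @ x - b)
    x = projected_gradient_step(x, grad, step, tau)
    misfits.append(float(np.sum((A @ x - b) ** 2)))
print(misfits[0], misfits[-1], np.abs(x).sum())
```

The projection reduces to soft thresholding with a data-dependent threshold, which is why iterations of this kind are described in terms of soft thresholding.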

Contrary to conventional $\ell_1$ -norm promoting imaging where $\vector{w}$ is assumed to be known, we need to evaluate this gradient $\nabla_{\vector{x}} f(\vector{x},\vector{w})\bigg|_{\vector{x}=\vector{x}^k,\vector{w}=\vector{w}^k}$ with variable projection at each iteration for unknown $\vector{w}^k$ ’s. During each iteration, we accomplish this by solving the following optimization problem for each frequency: $\begin{equation} \min_{{w}_i}\quad\sum_{j\in\Sigma}\|\undvec{d}_{i,j}-{w}_i\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}}, \label{eq:vp} \end{equation}$ which permits the following closed-form solution for the source wavelet (Pratt, 1999): $\begin{equation} \widetilde{w}_i (\vector{x}) = \frac{\sum_{j\in\Sigma}\langle\undvec{d}_{i,j},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}{\sum_{j\in\Sigma}\langle\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}. \label{eq:src} \end{equation}$ Here, the angular brackets $\langle\cdot,\cdot\rangle$ denote the inner product of two vectors, and $\widetilde{w}_i (\vector{x}), i\in\Omega$ denote complex-valued estimates of the source wavelet at different angular frequencies. The above solution enables us to eliminate the unknown source wavelet term in problem $\mathrm{LASSO}(\tau)$ , rendering the problem to solely and nonlinearly depend on $\vector{x}$ : $\begin{equation} \begin{aligned} & \min_{\vector{x}} \overline{f}(\vector{x})\doteq\sum_{i\in\Omega} \sum_{j\in\Sigma} \|\undvec{d}_{i,j}-\widetilde{w}_i(\vector{x})\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}} \\ & \mathrm{subject\ to} \quad \|\vector{x}\|_1\leq\tau. 
\end{aligned} \label{eq:llso} \end{equation}$ At first inspection, the above problem becomes more complicated than $\mathrm{LASSO}(\tau)$ as we need to evaluate the derivative of $\widetilde{w}_i(\vector{x}), i\in\Omega$ with respect to $\vector{x}$ . However, Aleksandr Y. Aravkin and van Leeuwen (2012) (Corollary 2.3) have proved that a stationary point of the above problem remains a stationary point of problem $\mathrm{LASSO}(\tau)$ . Furthermore, we have (Aleksandr Y. Aravkin and van Leeuwen, 2012, Theorem 2.1) $\begin{equation} \nabla_{\vector{x}}\overline{f}(\vector{x})=\nabla_{\vector{x}}f(\vector{x},\widetilde{\vector{w}}(\vector{x})). \label{eq:grad} \end{equation}$ The above equation means that in each iteration, after vector $\vector{x}$ is modified according to Equation $\eqref{eq:pg}$ , we can compute $\nabla_{\vector{x}}\overline{f}(\vector{x})$ by evaluating $\nabla_{\vector{x}}f(\vector{x},\vector{w})$ of problem $\mathrm{LASSO}(\tau)$ at $\vector{w}=\widetilde{\vector{w}}(\vector{x})$ via Equation $\eqref{eq:src}$ .
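The closed-form source estimate and the gradient identity can be checked numerically on a toy problem. The following is a sketch under stated assumptions: small random complex arrays `A` stand in for $\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*$ (no wave equation is solved), $\vector{x}$ is taken real-valued, and the inner product is taken conjugate-linear in the predicted data. The function names are hypothetical.

```python
import numpy as np

def estimate_source(d, a):
    """Per-frequency least-squares source weight.

    d, a: complex arrays of shape (n_freq, n_src, n_rec) with observed data
    and unit-source predictions. Returns w of shape (n_freq,).
    """
    num = np.sum(np.conj(a) * d, axis=(1, 2))
    den = np.sum(np.abs(a) ** 2, axis=(1, 2))
    return num / den

def reduced_misfit(d, A, x):
    """Misfit with the source eliminated by its closed-form estimate."""
    a = A @ x
    w = estimate_source(d, a)
    return float(np.sum(np.abs(d - w[:, None, None] * a) ** 2))

def reduced_gradient(d, A, x):
    """Gradient of f(x, w) with respect to x, evaluated at w = w_tilde(x)."""
    a = A @ x
    w = estimate_source(d, a)
    r = w[:, None, None] * a - d
    wA = w[:, None, None, None] * A
    return 2.0 * np.real(np.einsum("fsrn,fsr->n", np.conj(wA), r))

rng = np.random.default_rng(1)
F, S, R, N = 3, 4, 5, 6
A = rng.standard_normal((F, S, R, N)) + 1j * rng.standard_normal((F, S, R, N))
x_true = rng.standard_normal(N)
w_true = rng.standard_normal(F) + 1j * rng.standard_normal(F)
d = w_true[:, None, None] * (A @ x_true)

# At the true model the projection recovers the true source spectrum
w_est = estimate_source(d, A @ x_true)

# At another model, compare against central finite differences of the reduced misfit
x = rng.standard_normal(N)
g = reduced_gradient(d, A, x)
h = 1e-6
g_fd = np.array([(reduced_misfit(d, A, x + h * e) - reduced_misfit(d, A, x - h * e)) / (2 * h)
                 for e in np.eye(N)])
print(np.max(np.abs(w_est - w_true)), np.max(np.abs(g - g_fd)))
```

The finite-difference comparison never differentiates the source estimate explicitly, yet the two gradients agree, which is the practical content of Equation $\eqref{eq:grad}$.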

While there is no guarantee that the variable projection approach will yield the globally optimal solution, as the problem remains non-convex, numerical experiments show that in most cases it enables the optimization program to converge in fewer iterations (Golub and Pereyra, 2003). Rickett (2013) also showed its superior performance over the simultaneous descent method in source estimation for full-waveform inversion applications. Furthermore, the projections themselves (cf. Equation $\eqref{eq:src}$) are computationally affordable as they do not involve additional expensive operations such as wave-equation solves.

Relaxing the $\ell_1$ -norm constraint

In this section, we describe our strategy to select the right $\tau$ so that we actually solve problem $\mathrm{BPDN}(\sigma)$ by solving problem $\mathrm{LASSO}(\tau)$. We follow the methodology proposed by van den Berg and Friedlander (2008) in $\mathrm{SPG}\ell_1$, where a series of $\tau$’s are found by solving a root-finding problem on the Pareto trade-off curve. With the unknown source wavelet, we find the $\tau$’s by solving the value-function equation $\nu(\tau)\doteq\infim f(\vector{x},\vector{w})|_{\|\vector{x}\|_1\leq\tau}=\sigma^2$, which yields the following $\ell_1$-norm constraint for the $(l+1)^{\mathrm{th}}$ LASSO subproblem using Newton’s method with an initial guess of $\tau^0=0$: $\begin{equation} \tau^{l+1} = \tau^{l}-\frac{\nu(\tau^l)-\sigma^2}{\nu^\prime(\tau^l)}. \label{eq:nu} \end{equation}$ Using this approach, we arrive at the solution of $\mathrm{BPDN}(\sigma)$ by solving a series of $\mathrm{LASSO}(\tau)$ subproblems for gradually increasing $\tau$’s. While $\nu(\tau^l)$ in the above equation can be evaluated straightforwardly with the previous $\tau$ estimate, the evaluation of $\nu^\prime(\tau^l)$ is more difficult, because the nonlinearity introduced by the unknown source wavelet violates the linearity assumption on the forward model, which is needed to compute this value (A. Y. Aravkin et al., 2013). However, by treating $\vector{w}$ as fixed, we approximate $\nu^\prime(\tau^l)$ by (van den Berg and Friedlander, 2008; A. Y. Aravkin et al., 2013): $\begin{equation} \nu^\prime(\tau^l)\approx-\|\sum_{i\in\Omega}\sum_{j\in\Sigma}\op{C}\nabla\op{F}^*[\vector{m}_0,{w}_i\undvec{s}_j]\, (\undvec{d}_{i,j}-\nabla\op{F}[\vector{m}_0,{w}_i\undvec{s}_j]\op{C}^*\vector{x})\|_{\infty}. \label{eq:nuprime} \end{equation}$ With numerical examples, we will show that this approximation works reasonably well.
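The Newton root-finding on the Pareto curve can be sketched on a deliberately simple toy in which the forward operator is the identity and the source is known, so that each LASSO subproblem is solved exactly by projecting the data `b` onto the $\ell_1$-norm ball. These simplifications, and the names `value_and_slope` and `pareto_root`, are assumptions for illustration only. In this toy the slope of the squared misfit is available in closed form as $-2\|\vector{r}\|_\infty$ for the residual $\vector{r}$ (the factor 2 comes from differentiating the squared norm), and it plays the role of the approximate derivative used in the Newton update.

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of a real vector v onto {x : ||x||_1 <= tau}."""
    if tau <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= tau:
        return v.copy()
    s = np.sort(a)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, a.size + 1)
    rho = np.nonzero(s * k > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(a - theta, 0.0)

def value_and_slope(b, tau):
    """nu(tau) = min ||b - x||^2 s.t. ||x||_1 <= tau, and its slope, for A = I."""
    r = b - project_l1_ball(b, tau)
    return float(r @ r), -2.0 * float(np.max(np.abs(r)))

def pareto_root(b, sigma, n_iter=20, tol=1e-12):
    """Newton iterations tau <- tau - (nu(tau) - sigma^2) / nu'(tau), from tau = 0."""
    tau, history = 0.0, [0.0]
    for _ in range(n_iter):
        nu, dnu = value_and_slope(b, tau)
        if abs(nu - sigma ** 2) <= tol or dnu == 0.0:
            break
        tau = tau - (nu - sigma ** 2) / dnu
        history.append(tau)
    return tau, history

b = np.array([3.0, 1.0])
tau, history = pareto_root(b, sigma=np.sqrt(0.5))
print(tau, history)
```

For this `b`, the squared misfit equals $(4-\tau)^2/2$ once both entries are active, so the target $\sigma^2=0.5$ is reached at $\tau=3$; the iterates increase towards that value, mirroring the gradually increasing $\tau$’s described above.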

Acceleration by redrawing random subsets

Except for the extra variable projection step, solving $\mathrm{LASSO}(\tau)$ is similar to compressive imaging (Herrmann and Li, 2012), during which computational costs can be reduced by working with randomized subsets of data. As shown by Herrmann and Li (2012) and Tu and Herrmann (2015a), solutions of $\mathrm{BPDN}(\vector{w},\sigma)$ for a given source wavelet can be accelerated significantly by selecting new independent randomized subsets of frequencies and simultaneous sources, after each corresponding LASSO subproblem is solved. We found empirically that working with these new independent subsets leads to faster decay of the model error and to improved robustness with respect to possible linearization errors (Tu et al., 2013), i.e., errors in the approximation $\undvec{d}_{i,j}-\op{F}[\vector{m}_0,{w}_i\undvec{s}_j]\approx\nabla\op{F}[\vector{m}_0,{w}_i\undvec{s}_j]\delta\vector{m}$. Both findings can be explained by the fact that independent subsets tend to break correlations between errors in the solution and the randomly selected subsets of data (Herrmann, 2012).

While these empirical findings lead to a computationally feasible and robust compressive imaging scheme, we need to justify that working with different randomized subsets does not interact adversely with the proposed source estimation approach. Aside from additional empirical evidence from various examples included below, which suggests that there is no measurable adverse effect, we argue that (i) numerical evaluations of the value function, $\nu(\tau)$, and its derivative, $\nu^\prime(\tau)$, remain similar as long as we keep the number of frequencies and simultaneous sources the same (evidenced by Herrmann and Li, 2012, Figure 2(c)); and (ii) we only draw new subsets after each $\mathrm{LASSO}(\tau^{l})$ is solved. The latter argument assumes that estimates of the source at the $l^{\mathrm{th}}$ subproblem have converged, which is a reasonable assumption because the number of knowns in Equation $\eqref{eq:src}$ for each frequency far exceeds the dimensionality of the single unknown complex value for the source.

Resolving the scaling ambiguity with multiples

Although there are indications that we have arrived at a practical algorithm to estimate the source wavelet as well as high-resolution least-squares images, fundamental issues related to the ambiguity with respect to amplitude scalings remain. These issues are intrinsic to any blind-deconvolution type of problem where both the source wavelet and the seismic image are unknown (Stockham Jr et al., 1975). However, compared to source estimation in data space using the convolution model (Ulrych et al., 1995), source estimation in the image space does not suffer from ambiguities with respect to global phase shifts, which can be explained by the multiplicity of data and move-out characteristics that are specific to the background-velocity model being used. As stated in the introduction, we assume this background model to be given and to be relatively accurate.

Unfortunately, the same observation does not apply to the ambiguity with respect to amplitude scalings, which correspond mathematically to an invariance of our objective functions with respect to amplitude scalings by $\alpha\in\mathbb{R}^+$ —i.e., we have $\begin{equation} f(\vector{x},\vector{w})=f(\frac{1}{\alpha}\vector{x},\alpha\vector{w}). \label{eq:amab} \end{equation}$ This invariance results in amplitude ambiguity in the source estimation, which originates from the fact that the forward model in $\mathrm{LASSO}(\tau)$ , i.e., ${w}_i\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}$ , is bi-linear (i.e., when one variable is fixed, the forward model is linear w.r.t. the second variable) w.r.t. the source wavelet $\vector{w}$ and the solution vector $\vector{x}$ .
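The invariance in Equation $\eqref{eq:amab}$ is easy to confirm numerically. In the sketch below, random complex arrays `A` are hypothetical stand-ins for $\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*$, and the sizes, noise level, and the name `misfit` are assumptions for illustration.

```python
import numpy as np

def misfit(d, A, x, w):
    """f(x, w) = sum_{i,j} ||d_ij - w_i A_ij x||^2 for stand-in operators A_ij."""
    r = d - w[:, None, None] * (A @ x)
    return float(np.sum(np.abs(r) ** 2))

rng = np.random.default_rng(2)
F, S, R, N = 3, 4, 5, 6
A = rng.standard_normal((F, S, R, N)) + 1j * rng.standard_normal((F, S, R, N))
x = rng.standard_normal(N)
w = rng.standard_normal(F) + 1j * rng.standard_normal(F)
# Primary-only data with a little noise so that the misfit is not trivially zero
d = w[:, None, None] * (A @ x) + 0.1 * rng.standard_normal((F, S, R))

f_ref = misfit(d, A, x, w)
for alpha in (0.5, 2.0, 10.0):
    f_scaled = misfit(d, A, x / alpha, alpha * w)
    # The misfit is unchanged, while the l1 norm of x scales by 1/alpha
    print(alpha, f_scaled - f_ref, np.abs(x / alpha).sum())
```

The data misfit cannot distinguish between these pairs, even though the $\ell_1$ norm of the coefficient vector differs for every $\alpha$.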

Unfortunately, including sparsity constraints via the $\ell_1$-norm, or even via the non-convex $\ell_1/\ell_2$-norm (Esser et al., 2015), is inadequate to resolve the above amplitude ambiguity. We therefore resort to the physics of the free surface instead. By incorporating the free surface, our problem becomes nonlinear because the forward model now includes a “feedback loop” (Verschuur et al., 1992), which generates surface-related multiples from primaries scaled by the (unknown) source wavelet. Extending the physics in this way allows us to mitigate the scaling ambiguity by using observed surface-related multiples to estimate the source wavelet, as proposed in EPSI (G. J. A. van Groenestijn and Verschuur, 2009; Lin and Herrmann, 2013). Following recent work by Tu and Herrmann (2015a), we propose to do this by incorporating predictions of surface-related multiples into our formulation for compressive imaging.

Compressive imaging with total upgoing wavefields

If we ignore internal multiples, the total upgoing wavefield, including primaries and surface-related multiples, can be modelled by including the observed upgoing wavefields $\vector{u}_{i,j}$ at the water surface as areal sources in the linearized Born scattering operator (Tu and Herrmann, 2015a), yielding $\begin{equation} \vector{u}_{i,j} \approx \nabla\op{F}[\vector{m}_0, w_i\vector{s}_j-\vector{u}_{i,j}]\delta\vector{m}. \label{eq:lfmm} \end{equation}$ As the water surface has a reflection coefficient of approximately $-1$, we replace the downgoing spatially impulsive source wavefields $w_i\vector{s}_j$ by “areal” sources $w_i\vector{s}_j-\vector{u}_{i,j}$, i.e., we include the downgoing receiver wavefields at the water surface $-\vector{u}_{i,j}$ into the source wavefields. We obtain $\vector{u}_{i,j}$ after careful pre-processing of the observed seismic data, such as applying source-receiver reciprocity (to fill in missing traces), up-down decompositions (for receiver-side deghosting), and extrapolation of the upgoing wavefield from the receiver level to the free surface (see e.g., Verschuur et al., 1992). Note that source ghosts should be kept intact in the data (Verschuur et al., 1992). As a result, we arrive at a formulation that remains conducive to sparsity-promoting imaging with source estimation via variable projection. Because predictions of the multiples are carried out by the wave-equation solver, this forward model is also computationally viable compared to processing work flows that involve separate multiple-prediction/separation and RTM-imaging procedures.

To arrive at our final formulation, we first replace our objective by $\begin{equation} f(\vector{x},\vector{w})=\sum_{i\in\Omega} \sum_{j\in\Sigma}\|\undvec{u}_{i,j}-\nabla\op{F}[\vector{m}_{0},{w}_i\undvec{s}_{j}-\undvec{u}_{i,j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}}, \label{eq:fmul} \end{equation}$ which we obtain by substituting the primary wavefield with the total upgoing wavefield and the spatially impulsive sources with areal sources containing the total downgoing wavefield at the surface. Given this new definition for the objective, we proceed by solving the $\mathrm{LASSO}(\tau)$ subproblems with variable projections. To this end, we solve for all (simultaneous) sources and for each frequency the following problem: $\begin{equation*} \min_{{w}_i}\quad\sum_{j\in\Sigma}\|\undvec{u}_{i,j}-\nabla\op{F}[\vector{m}_{0},-\undvec{u}_{i,j}]\op{C}^*\vector{x}-{w}_i\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\|_{\mathrm{2}}^{\mathrm{2}},\quad i\in\Omega, \end{equation*}$ which in turn permits the following analytic solution: $\begin{equation} \tilde{{w}}_i (\vector{x}) = \frac{\sum_{j\in\Sigma}\langle\undvec{u}_{i,j}-\nabla\op{F}[\vector{m}_{0},-\undvec{u}_{i,j}]\op{C}^*\vector{x},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}{\sum_{j\in\Sigma}\langle\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}. \label{eq:srcmul} \end{equation}$ To arrive at the modified solution for the source wavelet above, we replace the primary-only wavefield by an estimate of the primaries, given by the total upgoing wavefield minus the current prediction of the surface-related multiples, i.e., $\undvec{d}_{i,j}\mapsto \undvec{u}_{i,j}-\nabla\op{F}[\vector{m}_{0},-\undvec{u}_{i,j}]\op{C}^*\vector{x}$.
The multiple predictions do not require information on the source wavelet; rather, they are carried out by solving wave equations with the areal sources given by $-\undvec{u}_{i,j}$. With these substitutions, we are able to estimate the source wavelet by variable projection at the additional cost of a single linearized forward modelling operation to predict the multiples.
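The closed-form source update can be illustrated with a minimal NumPy sketch for a single frequency. The function name `estimate_source` and the array names are hypothetical, and the two predicted wavefields are random stand-ins rather than outputs of a wave-equation solver. We also assume the inner-product convention $\langle\vector{a},\vector{b}\rangle=\vector{b}^H\vector{a}$, which is the one that makes the quotient the least-squares minimizer for complex data.

```python
import numpy as np

def estimate_source(u, multiples, primaries_unit):
    """Least-squares source weight w for one frequency (illustrative only).

    u              : (n_sources, n_receivers) total upgoing data
    multiples      : prediction from the areal source -u (needs no wavelet)
    primaries_unit : prediction from the impulsive source with w = 1
    """
    # Estimate of the primaries: total data minus predicted multiples.
    r = u - multiples
    # np.vdot conjugates its first argument and flattens both inputs,
    # so the sums over sources j and receivers are taken in one call.
    num = np.vdot(primaries_unit, r)
    den = np.vdot(primaries_unit, primaries_unit).real
    return num / den

# Synthetic check: build data with a known complex source weight.
rng = np.random.default_rng(0)
shape = (4, 50)
p = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
m = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
w_true = 0.7 - 1.3j
u = w_true * p + m
w_est = estimate_source(u, m, p)
```

Because the data in this toy check are exactly consistent with the model, `w_est` agrees with `w_true` to machine precision; with real data the quotient returns the best fit in the least-squares sense.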

In return for the increased computational cost, including surface-related multiples during imaging with source estimation has two key advantages: (i) the multiples mitigate the scaling ambiguity in the source estimation, because only a correctly scaled source wavelet (and image) yields a consistent prediction and separation of the primary and multiple wavefields, which in turn makes it possible to map the multiple energy onto the true image; (ii) multiples provide illumination coverage beyond that of primaries (as each receiver acts as a virtual source), especially along the cross-line direction in 3D seismic data (Long et al., 2013; Tu and Herrmann, 2015b).
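The scaling argument in (i) can be made explicit with a short sketch that uses only the linearity of the Born operator in its source argument. Here $\delta\vector{m}=\op{C}^*\vector{x}$, and $\alpha$ is an arbitrary nonzero scalar introduced purely for illustration; the sparsity constraint is ignored. With primaries only, the rescaled pair $(\alpha w_i,\,\delta\vector{m}/\alpha)$ predicts exactly the same data: $\begin{equation*} \alpha w_i\,\nabla\op{F}[\vector{m}_0,\vector{s}_j]\frac{\delta\vector{m}}{\alpha} = w_i\,\nabla\op{F}[\vector{m}_0,\vector{s}_j]\,\delta\vector{m}. \end{equation*}$ With the areal source, the same rescaling gives $\begin{equation*} \nabla\op{F}[\vector{m}_0,\alpha w_i\vector{s}_j-\vector{u}_{i,j}]\frac{\delta\vector{m}}{\alpha} = w_i\,\nabla\op{F}[\vector{m}_0,\vector{s}_j]\,\delta\vector{m}-\frac{1}{\alpha}\nabla\op{F}[\vector{m}_0,\vector{u}_{i,j}]\,\delta\vector{m}, \end{equation*}$ so the primary term is unchanged while the multiple term, which does not involve $w_i$, is scaled by $1/\alpha$. Whenever the predicted multiples are nonzero, the data misfit therefore changes unless $\alpha=1$.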

Putting it all together

With the building blocks of our sparsity-promoting imaging with source estimation and multiples in place, we summarize our proposed imaging scheme in Algorithm 1.

Input and initialization [Lines 1-6]

As discussed above, our proposed inversion approach requires a reasonably accurate background velocity model $\vector{m}_0$. As discussed in Tu et al. (2013), we choose a zero $\sigma$ to prevent the optimization algorithm from being terminated prematurely (Line 3). The initial guess of the source wavelet is simply an impulse at zero time, which corresponds to a flat Fourier spectrum with unit amplitude and zero phase (Line 6). Both the solution vector $\vector{x}$ and the sparsity level $\tau$ are initialized to zero (Line 5).

Main loop [Lines 7-16]

In the main loop we solve a series of $\mathrm{LASSO}(\tau^l)$ subproblems. For each subproblem, we draw new independent subsets of randomized frequencies and simultaneous sources (Line 8). We update the sparsity parameter $\tau^l$ using Newton’s method (Line 9), and solve for the model parameters collected in vector $\vector{x}$ (Line 10) as well as the source wavelet collected in vector $\vector{w}$ (Line 13) by variable projection. Note that we do not impose any assumption on the phase of the source wavelet (e.g., minimum phase assumption as used in Robinson, 1967) during source estimation. We terminate the optimization program when the pre-specified maximal number of iterations $k_{\mathrm{max}}$ is reached.

1. Input:
2. total upgoing wavefield $\vector{u}$ , background velocity model $\vector{m}_0$ ,
3. tolerance $\sigma=0$ , iteration limit $k_{\mathrm{max}}$
4. Initialization:
5. iteration index $k\leftarrow 0$ , LASSO subproblem index $l\leftarrow 0$ , $\tau^0\leftarrow 0$ , $\vector{x}^0\leftarrow\mathbf{0}$
6. $w_i\leftarrow 1$ for all $i\in\{1,\cdots,n_f\}$
7. while $k<k_{\mathrm{max}}$ do
8.      $\Omega_l$ , $\Sigma_l$ , $\undvec{u}_{i,j}$ , $\undvec{s}_j\leftarrow$ new independent draw
9.      $\tau^l\leftarrow$ determined from $\tau^{l-1}$ and $\sigma$ using Newton’s method
10.      $\vector{x}^l\leftarrow\begin{cases}\mathop{\mathrm{argmin}}_{\vector{x}}\sum_{i\in\Omega_l, j\in\Sigma_l}\|\undvec{u}_{i,j}-\nabla\monoop{F}[\vector{m}_0,w_i(\vector{x})\undvec{s}_{j}-\undvec{u}_{i,j}]\op{C}^*\vector{x}\|_2^2 \\ \mathrm{subject\ to\ } \|\vector{x}\|_1\leq \tau^l \end{cases}$
11.           //warm start with $\vector{x}^{l-1}$ , solved in $k^l$ iterations
12.           //in each iteration, update the source by
13.            ${w}_i(\vector{x}) = \frac{\sum_{j\in\Sigma}\langle\undvec{u}_{i,j}-\nabla\op{F}[\vector{m}_{0},-\undvec{u}_{i,j}]\op{C}^*\vector{x},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}{\sum_{j\in\Sigma}\langle\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x},\nabla\op{F}[\vector{m}_{0},\undvec{s}_{j}]\op{C}^*\vector{x}\rangle}$
14.      $k\leftarrow k+k^l, l\leftarrow l+1$
15. end while
16. Output: model perturbations $\delta\vector{x}=\op{C}^*\vector{x}$

Algorithm 1. Fast imaging with source estimation and multiples
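The redraw in Line 8 of Algorithm 1 can be sketched as follows. This is a hedged illustration rather than our implementation: the function `draw_subset`, the array layout, and the choice of Gaussian mixing weights shared across frequencies are all assumptions made for the example. The key point is that, by linearity, the same mixing weights are applied to the sources and to the data.

```python
import numpy as np

def draw_subset(U, S, n_freq_sub, n_sim, rng):
    """Draw a random frequency subset and form simultaneous sources.

    U : (n_freq, n_rec, n_src) complex array, total upgoing data
    S : (n_grid, n_src) array whose columns are sequential sources
    """
    n_freq, _, n_src = U.shape
    # Frequencies are drawn without replacement.
    omega = np.sort(rng.choice(n_freq, size=n_freq_sub, replace=False))
    # Assumed: Gaussian weights mix sequential shots into simultaneous shots.
    E = rng.standard_normal((n_src, n_sim))
    U_sim = U[omega] @ E   # matmul broadcasts over the frequency axis
    S_sim = S @ E          # identical mixing applied to the sources
    return omega, U_sim, S_sim

# Toy dimensions, much smaller than in the experiments.
rng = np.random.default_rng(1)
n_freq, n_rec, n_src = 12, 20, 20
U = rng.standard_normal((n_freq, n_rec, n_src)) + 0j
S = np.eye(n_src)
omega, U_sim, S_sim = draw_subset(U, S, n_freq_sub=4, n_sim=3, rng=rng)
```

Calling `draw_subset` again with the same generator yields an independent draw, which is what each new $\mathrm{LASSO}(\tau^l)$ subproblem requires.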

Examples

To validate the efficacy of the proposed imaging scheme with source estimation, we conduct a series of synthetic experiments designed to illustrate key aspects of our algorithm, including robustness to noise and modelling errors, resolution of the scaling ambiguity with multiples, and sensitivity to the initial guess of the source wavelet.

Imaging with primaries only

Most conventional seismic data processing work flows use the primary wavefield as the input for migration, i.e., the observed data after a de-multiple procedure, for example SRME (Verschuur et al., 1992). We therefore first apply our method to primary-only data to demonstrate its advantages over RTM when used in conventional processing work flows.

Experiment setup

We use a 2D slice of the SEG/EAGE salt model. We pad 10 grid points at the top of the model so that the water layer becomes slightly thicker, to better retain the water velocity near the surface when we smooth the true model to get the background model. In this way we can model and therefore remove the direct waves more accurately. The true and the background velocity models are shown in Figure 1. The model is 3.9 km deep and 15.7 km long, with a grid spacing of 24.38 m (80 ft). We use a fixed-spread configuration, and put 323 co-located sources and receivers with a 48.77 m lateral spacing (i.e., every other grid point) at a depth of 24.38 m (i.e., the second vertical grid point). We use a Ricker wavelet with a peak frequency of 5 Hz, and the peak of the wavelet is at 0.25 s. We record for 8 seconds, which in the frequency domain yields 96 discretized frequencies up to 12 Hz (roughly the highest frequency that can be used for an 80-ft grid spacing to avoid numerical dispersion).
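The numbers in this setup can be cross-checked with a few lines of NumPy. The sketch below is illustrative only: the water velocity of 1500 m/s, the 5 ms time sampling, the exclusion of the zero frequency, and the standard Ricker formula are assumptions that are not stated in the text.

```python
import numpy as np

T = 8.0                       # recording length in s (from the text)
f_max = 12.0                  # highest frequency in Hz (from the text)
dx = 24.38                    # grid spacing in m (from the text)
f_peak, t_peak = 5.0, 0.25    # Ricker parameters (from the text)
v_water = 1500.0              # ASSUMED water velocity in m/s
dt = 0.005                    # ASSUMED time sampling in s

# An 8 s record gives a frequency spacing of 1/8 Hz.
df = 1.0 / T
# ASSUMED: the zero frequency is excluded, so the grid runs 0.125..12 Hz.
freqs = np.arange(1, int(round(f_max / df)) + 1) * df
n_freq = freqs.size

# Grid points per shortest wavelength in water at the highest frequency.
points_per_wavelength = v_water / f_max / dx

def ricker(t, f_peak, t_peak):
    """Standard Ricker wavelet with unit peak amplitude at t_peak."""
    a = (np.pi * f_peak * (t - t_peak)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

t = np.arange(int(round(T / dt))) * dt
wavelet = ricker(t, f_peak, t_peak)
```

Under these assumptions `n_freq` equals the 96 frequencies quoted above, and `points_per_wavelength` is slightly above five, which is consistent with 12 Hz being close to the dispersion limit for this grid.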

Ideal case

To rule out the influence of any imperfection in the data, we first test our method with an idealized setup, i.e., we synthesize and invert the primary-only data using the same linearized modelling engine. To obtain a baseline image, we run a conventional RTM assuming that we know the true source wavelet (Figure 2a). To remove the low-wavenumber artifacts in the RTM image, we apply a high-pass filter along the depth dimension (Mulder and Plessix, 2003). Using the true source wavelet, we also obtain a least-squares migrated image using the compressive-imaging approach (Herrmann and Li, 2012), shown in Figure 2b. To demonstrate the importance of using an accurate source wavelet, we obtain a second least-squares migrated image using compressive imaging with a wrong source wavelet, shown in Figure 2c. The wrong source wavelet has a time advance of 0.1 s (i.e., a linear phase shift) compared with the true source wavelet. We obtain a third least-squares migrated image using the proposed method, shown in Figure 2d.
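The wrong wavelet used for Figure 2c can be mimicked with a phase ramp in the frequency domain. The following sketch is not the code used for the experiments; the helper `advance`, the 5 ms sampling, and the standard Ricker formula are assumptions. The sign of the ramp follows NumPy's FFT convention and would flip under the opposite convention.

```python
import numpy as np

def advance(trace, dt, tau):
    """Advance a real trace by tau seconds using a linear phase ramp."""
    n = trace.size
    f = np.fft.rfftfreq(n, d=dt)
    spec = np.fft.rfft(trace)
    # NumPy's forward FFT uses exp(-2j*pi*f*t), hence
    # x(t + tau) <-> X(f) * exp(+2j*pi*f*tau).
    return np.fft.irfft(spec * np.exp(2j * np.pi * f * tau), n=n)

dt = 0.005                         # ASSUMED time sampling in s
t = np.arange(1600) * dt           # 8 s record
a = (np.pi * 5.0 * (t - 0.25)) ** 2
true_wavelet = (1.0 - 2.0 * a) * np.exp(-a)   # 5 Hz Ricker, peak at 0.25 s
wrong_wavelet = advance(true_wavelet, dt, 0.1)
```

Because 0.1 s is an integer number of samples here, the result is a circular shift of the true wavelet by 20 samples, with its peak moved from 0.25 s to 0.15 s.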

We can see that compared with conventional RTM (Figure 2a), the proposed approach produces an image of higher spatial resolution given the true source wavelet (Figure 2b). The higher spatial resolution is achieved because the least-squares migration, through many iterations, gradually removes the effect of the Gauss-Newton Hessian on the seismic image (we refer to the “basic equations” section of Nemeth et al., 2000 for more details). However, the image quality can be significantly compromised when the wavelet is wrong. In this case the shifted phase of the wavelet not only produces an image contaminated with subsampling-related noise, but also leads to a shift of the subsurface structures in depth (Figure 2c), which can result in erroneous interpretations. With the proposed source-estimation approach, we obtain a faithful image (Figure 2d), which is comparable to the image obtained using the true source wavelet (Figure 2b).

Remarks on the experiment: In terms of computational cost, we use 16 randomly selected frequencies, 16 simultaneous sources, and 60 iterations for all inversion results; as a result, the four images in Figure 2 involve roughly the same number of wave-equation solves. Note that source estimation by evaluating Equation $\eqref{eq:src}$ does not involve extra wave-equation solves other than the ones needed for the linearized forward modelling required by the imaging step. For all inversion results in this and later sections, after the inversion is finished, we apply curvelet thresholding to remove residual incoherent noise in the image (Herrmann et al., 2008b). The threshold is chosen in such a way that the removed noise does not contain noticeable coherent energy. As our solution vector $\vector{x}$ is already in the curvelet domain, the extra computation incurred in this thresholding step is minimal.

As a quality-control (QC) procedure, we also plot in Figures 3a and 3b the spectra of the estimated source wavelet. For an intuitive interpretation, we also obtain the waveform of the wavelet and show it in Figure 3c. Because reliable source estimates output by the algorithm only exist at frequencies that are sampled in the last $\mathrm{LASSO}(\tau)$ subproblem, we obtain the waveform of the source wavelet by first computing the spectrum of the source wavelet at all frequencies using Equation $\eqref{eq:src}$ with the inverted seismic image, and then applying the inverse Fourier transform to the spectrum (this procedure is also applied to all later examples). Because of the scaling ambiguity (of the image as well as the source wavelet) that we discussed above, we normalize the amplitude (i.e., the $\ell_{\infty}$ norm) of both the true and the estimated source wavelets to compare them. The source estimates are highly accurate except for the absolute amplitude scaling.
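The QC comparison described above can be sketched in a few lines. The helpers `waveform_from_spectrum` and `normalize_linf`, the 5 ms sampling, and the factor 3.7 standing in for the unknown amplitude scaling are hypothetical; the "estimated" spectrum is simply a scaled copy of the true one, not an inversion result.

```python
import numpy as np

def waveform_from_spectrum(w, n_t):
    """Inverse FFT of a one-sided spectrum.

    w[k-1] holds the estimate at frequency k*df for k = 1..len(w);
    the zero frequency is set to zero and higher bins are zero-padded.
    """
    spec = np.concatenate(([0.0], w))
    return np.fft.irfft(spec, n=n_t)

def normalize_linf(x):
    """Scale a trace to unit l-infinity norm (maximum absolute value)."""
    return x / np.max(np.abs(x))

dt, n_t = 0.005, 1600              # ASSUMED sampling; 8 s gives df = 0.125 Hz
t = np.arange(n_t) * dt
a = (np.pi * 5.0 * (t - 0.25)) ** 2
true_spec = np.fft.rfft((1.0 - 2.0 * a) * np.exp(-a))[1:97]  # 0.125..12 Hz

# Stand-in for an estimate that is correct up to a positive scale factor.
est_spec = 3.7 * true_spec

true_wave = normalize_linf(waveform_from_spectrum(true_spec, n_t))
est_wave = normalize_linf(waveform_from_spectrum(est_spec, n_t))
```

After normalization the two waveforms coincide, which is the sense in which a source estimate can be accurate except for its absolute amplitude. Note that this normalization only removes a positive real scale factor; a sign flip or phase rotation would remain visible.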

More realistic case

Compared with data synthesized by linearized modelling, field seismic data are contaminated with various types of incoherent and coherent noise. While most incoherent noise can be removed in data pre-processing, coherent noise, such as internal multiples, is difficult to identify and remove and can therefore degrade the quality of the final seismic image. In least-squares migration, as the objective is to match the predicted linearized data and the observed data, other sources of error include linearization errors (i.e., as the name suggests, the errors incurred during linearization of the wave equation) and errors from approximating earth physics using, for example, the acoustic wave equation. To mimic some of these errors, which we will generally refer to as “modelling errors” hereafter, we choose to use iWave, a 2D time-domain finite-difference acoustic modelling engine (Terentyev et al., 2014), to simulate the observed data, and use our in-house frequency-domain finite-difference acoustic modelling engine to invert the data.

To test the robustness of our method to these “modelling errors” as defined above, we compare the following examples. We again get the baseline image using conventional RTM with the true wavelet (Figure 4a). Next we obtain two least-squares migrated images: one with the true source wavelet, shown in Figure 4b; the second one with source estimation, shown in Figure 4c.

We observe that, in the presence of these modelling errors in the data, (i) the proposed source-estimation method still yields an image that is comparable to the one obtained with the true source wavelet; (ii) both images are still of higher spatial resolution than the conventional RTM image. The modelling errors (especially internal multiples in this case because of the high velocity contrast around the salt boundary) do result in some imaging artifacts beneath the salt structure, but they do not particularly cause problems for the source-estimation approach. The above observations lead to the conclusion that the proposed source-estimation approach is relatively robust with respect to coherent modelling errors in the primary-only data.

Remarks on the experiment: In generating the data with iWave, we use an absorbing boundary condition at the surface to avoid generating surface-related multiples. The observed data are the difference between the data modelled with the true model and the background model, and therefore contain internal multiples. We use the same inversion parameters as the above set of examples, and as a result, the computational costs of the three images in Figure 4 remain roughly the same.

In Figure 5 we again compare the true wavelet and the estimated wavelet as a QC procedure. We can see that in this case the accuracy of the estimated source wavelet degrades gracefully compared with the idealized case; the remaining difference between the two wavelets is mainly in the phase (see Figures 5b and 5c). To understand the cause of this difference, we compare in Figure 5d traces that contain strong top-salt reflections from two data sets: one synthesized using our linearized frequency-domain modelling and another one using iWave. We can see that the slight phase shift can also be observed in the data, most likely due to linearization errors, especially those caused by smoothing the salt structure, and to differences between iWave and our in-house modelling engine. As the wavelet is estimated to fit the observed data, it is understandable that it contains such a phase error.

As demonstrated by the two sets of examples above, our proposed source-estimation method, when applied to conventional primary-only data, produces seismic images that are comparable to images obtained using true source wavelets. The method is also relatively robust to modelling errors such as linearization errors and differences between modelling engines. However, as we discussed above, the scaling ambiguity cannot be resolved by imaging with primaries alone because of the blind-deconvolution nature of the problem. Next we will demonstrate how we resolve the ambiguity using surface-related multiples.

Scaling images with multiples

To validate our proposal to incorporate surface-related multiples in the imaging procedure to resolve the amplitude ambiguity, we apply the proposed method to data that contain surface-related multiples. As above, we study the performance of the proposed approach both with an ideal data set and with a more realistic data set.