--- title: Wavefield recovery with limited-subspace weighted matrix factorizations author: | Yijun Zhang^1^, Shashin Sharan^2^, Oscar Lopez^4^, Felix J. Herrmann^1,2,3^ \ ^1^ Department of Electrical & Computer Engineering, Georgia Institute of Technology \ ^2^ Department of Earth & Atmospheric Sciences, Georgia Institute of Technology\ ^3^ School of Computational Science and Engineering, Georgia Institute of Technology \ ^4^ Optimization and Uncertainty Quantification, Sandia National Laboratories bibliography: - abstract.bib --- ### Abstract: Modern-day seismic imaging and monitoring technology increasingly rely on dense full-azimuth sampling. Unfortunately, the costs of acquiring densely sampled data rapidly become prohibitive and we need to look for ways to sparsely collect data, e.g. from sparsely distributed ocean bottom nodes, from which we then derive densely sampled surveys through the method of wavefield reconstruction. Because of their relatively cheap and simple calculations, wavefield reconstruction via matrix factorizations has proven to be a viable and scalable alternative to the more generally used transform-based methods. While this method is capable of processing all full azimuth data frequency by frequency slice, its performance degrades at higher frequencies because monochromatic data at these frequencies is not as well approximated by low-rank factorizations. We address this problem by proposing a recursive recovery technique, which involves weighted matrix factorizations where recovered wavefields at the lower frequencies serve as prior information for the recovery of the higher frequencies. To limit the adverse effects of potential overfitting, we propose a limited-subspace recursively weighted matrix factorization approach where the size of the row and column subspaces to construct the weight matrices is constrained. We apply our method to data collected from the Gulf of Suez, and our results show that our limited-subspace weighted recovery method significantly improves the recovery quality. \vspace*{-0.45cm} ## Introduction Seismic data acquisition plays a key role in the initial phase of oil & gas exploration. It also represents a significant budget item for monitoring of carbon sequestration. For these reasons, it is a challenge to come up with new acquisition methodologies that improve acquisition productivity [@mosher2014increasing] without sacrificing data quality. Randomized acquisition according to the principles of compressive sensing [@herrmann2012fighting] in combination with large-scale wavefield reconstruction algorithms [@kumar2015efficient] has proven a viable tool to improve the acquisition productivity both in marine and land seismic settings. So far, many of the employed approached of wavefield reconstruction are based transform-domain sparsity, which is deigned to explore local smoothness typically in small windows in up to five dimensions. While these approaches have been applied successfully on production data, they do no exploit redundancies present in the data over long distances. Recovery techniques based on low-rank matrix factorizations [@kumar2015efficient] do not suffer from this shortcoming because this method works with monochromatic frequency slices that contain data from the complete survey instead of working within small windows limiting the apperture. By organizing the data in the appropriate domain, e.g. midpoint-offset domain for seismic lines, monochromatic frequency slices permit approximations in low-rank form, which can be used to recover fully sample wavefields from subsampled data. While low-rank factorizations have been employed successfully for low and midrange frequencies, their performance deteriorates at high frequencies because monochromatic frequency slices can no longer be approximated accurately by low-rank factorizations. In this work, we overcome this problem by using the fact that factorizations at neighboring frequencies live in close-by subspaces. As described in early work by @aravkin2013robust; @eftekhari2018weighted\, this property can be exploited by introducing matrix weights defined in terms of factorizations of near-by frequency slices. Recent work by @zhang2019high took this initial a step further by proposing a recursive approach where factorizations of frequency slices at lower frequencies are used as weight for factorizations at the higher frequencies starting at the low frequencies and working its way up. While this approach has had some success (see e.g. @zhang2019high), there is challenge related to the fact that high frequencies require higher rank factorizations and this can lead to overfitting when using this higher rank throughout. We avoid this overfitting, by adapting the rank of the weighting matrices such that overfitting is avoided. We do this by actively limiting the row and column subspaces of the weight matrices. Because we avoid overfitting, we are able to further improve the wavefield recovery. We also introduce an alternative formulation where the weight matrices are moved from the constraint, as in @kumar2015efficient, to the data misfit objective, which leads to a significant improvement (``20`` to ``25`` times speedup) computational efficiency. We organize our paper as follows. First, we review the recursively weighted wavefield recovery via matrix factorization including the new formulation where the weight appear in the data misfit term. Next, we discuss how to limit the subspace of our weighted matrix factorizations. We conclude by demonstrating our approach on a field data example from the Gulf of Suez, which shows improved recovery quality compared to conventional recursively weighted matrix completion. \vspace*{-0.45cm} ## Methodology We start by introducing wavefield reconstruction via weighted matrix factorization. To improve computational efficiency, we move the weight matrices to the data misfit term so we no longer have to carry out numerically expensive weighted projections as in [@aravkin2013robust]. Aside from allowing for a much more computationally efficient implementation, this alternative formulation also forms the basis for our limited-subspace approach designed to prevent overfitting at the low frequencies. ###Weighted low-rank matrix factorization Our proposed extension to wavefield reconstruction via recursively weighted matrix factorization derives from earlier work by @kumar2015efficient, @aravkin2013robust, and @zhang2019high\, where we solve ```math #eqWlrmf & \minim_{\vector{X}_i} \quad \|\vector{Q}\vector{X}_i\vector{W}\|_*\\ & \text{subject to} \quad \|\mathcal{A}(\vector{X}_i) - \vector{b}_i\|_2 \leq \tau ``` to within a noise-level dependent data misfit tolerance ``\tau``\. In this expression, the matrix ``\vector{X}_i`` corresponds to a monochromatic frequency slice in the midpoint/offset domain (in case of ``2``D) at the ``i\mathrm{th}`` frequency (``i \in [1, \cdots , n_f]`` with with ``n_f`` the number of frequencies). During the wavefield recovery, fully sampled frequency slices are represented by the complex valued matrix, ``\vector{X} \in \mathbb{C}^{n_f \times n_m \times n_h}`` where ``n_m`` is the number of midpoints and ``n_h`` the number of offsets. The symbol ``\mathcal{A}(\cdot)`` stands for the subsampling operator, which collects monochromatic data at the observed source/receiver combinations into the vector ``\vector{b}_i``. Given these observations, we solve for the fully sampled ``\mathbf{X}_i`` for each frequency by minimizing equation [#eqWlrmf] with weight matrices ``\vector{Q}`` and ``\vector{W}`` given by ```math #eqprojl \vector{Q} = {w}_{1}\vector{U}\vector{U}^H + \vector{U}^\perp \vector{U}^{{\perp}{H}} ``` and ```math #eqprojr \vector{W} = {w}_{2}\vector{V}\vector{V}^H + \vector{V}^\perp \vector{V}^{{\perp}{H}}. ``` In these expressions for the weight matrices, the ``\vector{U} \in \mathbb{C}^{n_m \times r}`` and ``\vector{V}\in \mathbb{C}^{n_h \times r}`` are the column and row subspaces that derive from the low-rank factorization of the nearby frequency slice. ``\vector{U}`` and ``\vector{V}`` have orthonormal columns that span top column and row subspaces of nearby frequency slice. Because these weight matrices include information on the subspaces of the current factorization, they serves as prior information aiding the wavefield recovery via the weighted nuclear norm minimization (denoted by ``\|\mathbf{QXW}\|_\ast=\sum_{j=1}^{r}\sigma_j`` with ``\sigma_j`` the ``j^{\text{th}}`` singular value). Depending on whether we have confidence in the fact that the neighboring frequency slice has an overlapping subspace, we chose the weights ``w_1`` and ``w_2`` close to ``0`` if we have confidence and close to ``1`` if we do not. While the above weighted formulation has resulted in major improvements in the recovery when reliable information on a neighboring frequency slice is available [@kumar2015efficient, @aravkin2013robust, and @zhang2019high], the minimization in equation #eqWlrmf is complicated by the presence of the weighting matrices in the nuclear norm objective. As a result, the minimization becomes computationally expensive. To avoid this complication, we replace the optimization variable by ``\vector{\bar{X}}_{i} = \vector{Q}\vector{X}_i\vector{W}``, and rewrite equation #eqWlrmf as ```math #eqWlrmfv2 & \minim_{\vector{\bar{X}}_i} \quad \|\vector{\bar{X}}_i\|_* \\ & \text{subject to} \quad \|\mathcal{A}({\vector{Q}^{-1}}\vector{\bar{X}}_i{\vector{W}^{-1}}) - \vector{b}_i\|_2 \leq \tau ``` where the modified weighting matrices ```math #eqWrow {\vector{Q}^{-1}} = \frac{1}{w_1} \vector{U}\vector{U}^H + \vector{U}^\perp \vector{U}^{{\perp}{H}} ``` and ```math #eqWcol {\vector{W}^{-1}} = \frac{1}{w_2} \vector{V}\vector{V}^H + \vector{V}^\perp \vector{V}^{{\perp}{H}} ``` are moved from the objective to the data misfit constraint. To reflect that we changed the problem, we introduced barred quantities from which the solution original solution can be readily computed---i.e., we recover the solution ``\vector{X}_i=\vector{Q}^{-1}\vector{\bar{X}}_i\vector{W}^{-1}`` since ``\vector{\bar{X}}_i=\vector{Q}\vector{X}_i\vector{W}`` solves the above optimization problem. Compared to equation #eqWlrmf\, this new formulation does not require nuclear norm projections onto weighted matrices while its solution is equivalent to equation #eqWlrmf\. Like the original formulation, our new formulation lends also itself to be cast into a low-rank (``r\ll \max(n_m,n_h)``) factorized form so that expensive SVDs are avoided in the nuclear norm. After factorization our wavefield reconstruction involves ```math #eqwlrf & \minim_{\vector{\bar L}_i, \vector{\bar R}_i} \quad \frac{1}{2} {\left\| \begin{bmatrix} \vector{\bar L}_i \\ \vector{\bar R}_i \end{bmatrix} \right\|}_F^2 \\ & \text{subject to} \quad \|\mathcal{A}{({\vector{Q}^{-1}} \vector{\bar L}_i \vector{\bar R}_i^{H} {\vector{W}^{-1}})} - \vector{b}_i\|_{2} \leq \epsilon, ``` where the symbol ``^{H}`` denotes the Hermitian transpose and ``\|\cdot\|_F`` is the Frobenius norm (2-norm of the vectorized matrix) [@kumar2015efficient; @aravkin2013robust; @zhang2019high]. Compared to the original representation for frequency slices, the above factored form is compressed since it entails the low-rank pair ``\{\mathbf{\bar L}_i,\,\mathbf{\bar R}_i\}`` ,where ``\vector{\bar{X}}_i=\vector{\bar L}_i \vector{\bar R}_i^{H}``, and does not rely on storage and manipulation of the original and dense optimization variable ``\mathbf{X}_i`` or ``\mathbf{\bar X}_i``. Despite gains in computation, because of the factored form and redefined data misfit term, challenges remain with recursive weighted matrix factorizations [@zhang2019high] at the high frequencies and as we will show these have to do with overfitting. ###Limited subspace weighted implementation To reduce approximation errors at the high frequencies, we can increase the rank of the factorization throughout. While increasing the rank leads to better approximations at the high frequencies adapting this higher rank at the lower frequencies can lead to overfitting. The resulting poor reconstructions at the lower frequencies can in turn have a detrimental effect on the reconstruction at higher frequencies, which information from the lower frequencies as the recursive algorithm sweeps from the low to the high frequencies. By choosing the rank for the limited subspace, we reduce the size of the subspaces of the weight matrices to prevent overfitting at the lower frequencies. In equations [#eqprojl], [#eqprojr], [#eqWrow] and [#eqWcol], we notice that the size of the weight matrices ``\vector{Q}`` and ``\vector{W}`` are independent of rank ``r``. Therefore, we can use a limited subspace to remove the influence of overfitting and get better results. By limited subspace, we mean that at a given frequency slice, instead of using a rank ``r`` for row and column subspaces ``\vector{U}`` and ``\vector{V}`` respectively, we can use a lower rank ``r_s``. In this way, we can choose higher rank ``r`` to reconstruct each frequency but use lower rank ``r_s`` to construct the weight matrices (``\vector{Q}`` and ``\vector{W}``). By choosing smaller rank for the subspaces, we mitigate the negative influence of overfitting. Therefore, in the limited-subspace method, we are free to choose smaller values for the ``r_s`` for each frequency slice and higher values for the rank ``r`` for the factorization itself (not for the weights) for each frequency. \vspace*{-0.45cm} ## Numerical Experiments To demonstrate the advocacy of the proposed method, we use ``2``D field seismic data acquired in the Gulf of Suez with number of sources, ``{N}_{s}=355``, and number of receivers, ``{N}_{r}=355``. The total number of time samples in this dataset is ``{N}_{t}=1024`` and the sampling interval is ``0.004\, \mathrm{s}``. We use a jittered subsampling [@herrmann2008non] mask to remove ``75 \%`` of the sources to obtain the subsampled data. When data is organized in the midpoint-offset domain, we know that randomized jittered subsampling method breaks the inherent low-rank property of seismic data while controlling the largest gap size of the subsampled data [@herrmann2008non]. Controlling largest gap is important because very large gaps are not suitable for wavefield reconstruction using sparsity-promotion or low-rank matrix completion. We use the weighted method as described by @zhang2019high to reconstruct frequency slices starting at ``10\, \mathrm{Hz}`` and working our way up to ``70\, \mathrm{Hz}``. We use constant rank across all the frequencies for weight matrices and matrix factorization. We base these choices for ``r_s