---
title: Large scale high-frequency wavefield reconstruction with recursively weighted matrix factorizations
runninghead: High-frequency wavefield recovery
author: |
  Shashin Sharan^1^, Yijun Zhang^2^, Oscar Lopez^4^ and Felix J. Herrmann^1,2,3^ \
  ^1^ School of Earth and Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA \
  ^2^ School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA \
  ^3^ School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA \
  ^4^ Optimization and Uncertainty Quantification, Sandia National Laboratories, Albuquerque, NM
bibliography:
  - abstract.bib
---

## Abstract:
Acquiring seismic data on a regular periodic fine grid is challenging. By exploiting the low-rank approximation property of fully sampled seismic data in some transform domain, low-rank matrix completion offers a scalable way to reconstruct seismic data on a regular periodic fine grid from coarsely and randomly sampled data acquired in the field. While wavefield reconstruction has been applied successfully at the lower end of the spectrum, its performance deteriorates at the higher frequencies where the low-rank assumption no longer holds, rendering this type of wavefield reconstruction ineffective in situations where high-resolution images are desired. We overcome this shortcoming by explicitly exploiting similarities between adjacent frequency slices. During low-rank matrix factorization, these similarities translate to alignment of subspaces of the factors, a notion we propose to employ as we reconstruct monochromatic frequency slices recursively starting at the low frequencies. While this idea is relatively simple at its core, turning this recent insight into a successful scalable wavefield reconstruction scheme for 3D seismic requires a number of important steps. First, we need to move the weighting matrices, which encapsulate the prior information from adjacent frequency slices, from the objective to the data-misfit constraint. This move considerably improves the performance of the weighted low-rank matrix factorization on which our wavefield reconstruction is based. Secondly, we introduce approximations that allow us to decouple computations on a row-by-row and column-by-column basis, which in turn allows us to parallelize the alternating optimization on which our low-rank factorization relies. The combination of weighting and decoupling leads to a computationally feasible full-azimuth wavefield reconstruction scheme that scales to industry-scale problem sizes. We demonstrate the performance of the proposed parallel algorithm on a 2D field dataset and on a 3D synthetic dataset. In both cases our approach produces high-fidelity broadband wavefield reconstructions from severely subsampled data.

## Introduction
For economical extraction of hydrocarbon resources from the subsurface and to prevent hazardous situations, oil and gas companies often rely on imaging and accurate estimates of the earth's subsurface physical parameters such as wavespeed, density, etc. To obtain subsurface images and to recover these physical parameters, practitioners apply a series of processing steps to the raw seismic data collected in the field during seismic data acquisition. Some of these processing steps, such as migration and demultiple, require seismic data to be finely sampled and preferably on a regular grid. Unfortunately, seismic data acquisition on a fine regular grid is often prohibitively expensive and operationally complex.
Therefore, a general practice adopted by the oil and gas industry is to acquire seismic data on a coarse irregular grid, followed by wavefield reconstruction onto a finer grid. In this work, we consider wavefield reconstruction from randomized samples taken from a periodic grid. The reader is referred to [@lopez2016off] for an off-the-grid extension of the presented wavefield reconstruction methodology. In recent years, several methods for wavefield reconstruction have been developed. Many of these methods perform wavefield reconstruction in a transform domain, such as the Fourier [@xu2005anti], Radon [@bardan1987trace], wavelet [@villasenor1996seismic], or curvelet [@herrmann2008non] domain. To a varying degree, these transforms promote sparsity in seismic data, which is a key component of wavefield reconstruction based on compressive sensing (CS) [@candes2006stable; @donoho2006cs]. While powerful, sparsity-based wavefield reconstruction does not scale well to 3D seismic where the data volumes become prohibitively large when structure is promoted along more than three dimensions, e.g. along all four source and receiver coordinates. By exploiting low-rank properties of matrices and tensors [@kumar2014GEOPemc; @oropeza2011simultaneous], some of these high-dimensional challenges have been overcome by building on early work of @recht2010guaranteed\, who extended some of the ideas of compressed sensing to matrices. Similar to CS, matrix completion relies on the fact that the underlying fully sampled data organized in a matrix permits a low-rank approximation. As in CS, randomized sampling breaks the low-rank structure, which informs (convex) optimization techniques that recover wavefields by minimizing the rank. @kumar2014GEOPemc and @dasilva2015optimization exploited this property and formalized matrix-/tensor-based wavefield reconstructions that are practical for large-scale seismic datasets [@kumar2017highly].

As demonstrated in the work by @kumar2014GEOPemc, low-rank matrix completion methods work well when reconstructing seismic data at the lower angular frequencies, but the recovery quality degrades when we move to the higher frequencies (``>15\, \mathrm{Hz}``). This degradation in recovery quality is caused by the fact that frequency slices at high frequencies are not well approximated by low-rank matrix factorizations [@kumar2014GEOPemc; @aravkin2014fast]. Unfortunately, techniques such as multiple elimination and migration need access to high-frequency data to create high-fidelity artefact-free high-resolution images. This is especially true when physical properties are of interest in areas of complex geology. To meet the challenges of recovering seismic data at high frequencies, we build on earlier work by @aravkin2014fast and @eftekhari2018weighted\, who discussed how to improve the performance of low-rank matrix completion by including prior information in the form of weighting matrices. The weighting matrices are projections spanned by the row and column subspaces (and their complements) of a low-rank matrix factorization of a matrix that is close to the to-be-recovered matrix. As with weighted ``\ell_1``-norm minimization, these weighting matrices improve the wavefield recovery if the principal angle between the subspaces of the weighting matrices and the to-be-recovered matrix is small. Conceptually, this is the matrix counterpart of weighted ``\ell_1``-norm minimization proposed by @mansour2012improved\.
@aravkin2014fast and @eftekhari2018weighted showed that wavefield recovery via matrix completion can be improved when low-rank factorizations from adjacent frequency slices are used to define these weighting matrices. @aravkin2014fast used this principle assuming access to the low-rank factorization of an adjacent frequency slice using a modified version of the spectral projected gradient algorithm [@vandenBerg2008probing]. Also, @eftekhari2018weighted showed that for small principal angles, these weighting matrices reduce the sampling requirement for successful data reconstruction by a logarithmic factor in comparison to the sampling requirement of the conventional matrix completion method. While the initial results on wavefield reconstruction via weighted matrix completion were encouraging, the presented approach was not very practical because it relied on having access to the weights. In addition, it relied on a computationally expensive optimization algorithm. We overcome these shortcomings by proposing a parallelizable recursive method that uses a recently developed alternating minimization procedure [@xu2013block; @jain2013low] proposed by @lopez2015EAGErma\. Thanks to the recursive reconstruction, as first proposed by @zhang2019SEGhigh\, and the improved optimization, we will demonstrate that we are able to improve the performance of our wavefield reconstruction algorithm.

The outline of our paper is as follows. We first provide a short overview of the principles of wavefield reconstruction via matrix completion. We follow this brief exposition by describing the challenges of high-frequency wavefield reconstruction and how these challenges can be addressed through weighted matrix completion. After this introduction, we describe how to derive a formulation in factored form, which allows us to drastically reduce the problem size, rendering our approach practical for 3D seismic data. In particular, we describe how our algorithm can be parallelized and applied to a large-scale high-frequency seismic wavefield reconstruction problem.

## Wavefield reconstruction via weighted matrix completion
According to the seminal work of @recht2010guaranteed, matrices that exhibit low-rank structure can be recovered from random missing entries through a nuclear-norm minimization procedure. During the optimization, the sum of the singular values is minimized. As long as the randomized subsampling decreases the rate of decay of the singular values, this type of minimization allows for the recovery of matrices that are well approximated by low-rank matrices when fully sampled. @kumar2014GEOPemc used this principle to recover frequency slices from seismic lines in the midpoint-offset domain or from 3D seismic data permuted in non-canonical form [@dasilva2015optimization]. In either case, the resulting frequency slice can be approximated accurately by a low-rank matrix factorization. To illustrate the underlying principle of wavefield reconstruction via matrix completion, we consider a ``12.0\, \mathrm{Hz}`` monochromatic frequency slice assembled from a 2D line acquired in the Gulf of Suez. Figure #SRMH includes the real part of this frequency slice in the source-receiver and midpoint-offset domain after removing ``75\%`` of the sources via jittered subsampling [@hennenfent2008simply]. Compared to uniform random subsampling, jittered subsampling controls the maximal spatial gap between sources, which favors wavefield reconstruction.
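To make the sampling and the change of domain concrete, the sketch below generates a jittered source mask and maps a monochromatic source-receiver slice to the midpoint-offset domain. It is a minimal NumPy illustration under simplifying assumptions (sources and receivers share the same grid, and a random matrix stands in for a real frequency slice); the function names, grid sizes, and random seeds are ours and not part of the processing flow used for the figures.

```python
import numpy as np

def jittered_mask(n_src, factor=4, seed=0):
    """Keep one randomly placed source per bin of `factor` grid points.

    This retains roughly 1/factor of the sources while bounding the largest
    gap by 2*factor - 1 grid points, unlike uniform random subsampling."""
    rng = np.random.default_rng(seed)
    kept = [rng.integers(start, min(start + factor, n_src))
            for start in range(0, n_src, factor)]
    mask = np.zeros(n_src, dtype=bool)
    mask[kept] = True
    return mask

def to_midpoint_offset(X):
    """Map a square source-receiver slice X[s, r] to the midpoint-offset domain.

    Midpoint index = s + r and offset index = r - s (shifted to be non-negative),
    so an n x n slice becomes a zero-padded (2n - 1) x (2n - 1) matrix."""
    n = X.shape[0]
    M = np.zeros((2 * n - 1, 2 * n - 1), dtype=X.dtype)
    s, r = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    M[r - s + n - 1, s + r] = X
    return M

n = 354                                    # sources/receivers on the 2D line
mask = jittered_mask(n, factor=4)          # roughly 75% of the sources removed
X_sub = np.random.default_rng(1).standard_normal((n, n)) * mask[None, :]   # random stand-in slice
print(mask.mean(), to_midpoint_offset(X_sub).shape)
```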
While the monochromatic data contained in these frequency slices is comparable, the behavior of the singular values before and after subsampling is very different in the source-receiver and the midpoint-offset domain (juxtapose Figures #SingulSRMHa and #SingulSRMHb). The singular values for the matricization in the shot domain (denoted by the dashed lines) decay slowly when fully sampled and fast when subsampled, which can be understood since removing rows or columns from a matrix reduces the rank. The converse is true for data in the midpoint-offset domain, which shows a fast decay of the singular values for the fully sampled data and a slow decay after randomized subsampling. The latter creates favorable conditions for recovery via matrix completion [@kumar2014GEOPemc], carried out by solving

```math #eqNucmin
\vector{X} := \argmin_{\vector{Y}} \quad \|\vector{Y}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{Y}) - \vector{B}\|_{F} \leq \epsilon,
```

which promotes low-rank matrices.

#### Figure: {#SRMH}
![](GoS_Figures/Obs_12Hz_SR.png){#SRMHc width=50%} ![](GoS_Figures/Obs_12Hz_MH.png){#SRMHd width=50%}
: ``12.0\, \mathrm{Hz}`` frequency slice extracted from 2D seismic data acquired in the Gulf of Suez. Data with ``75 \%`` missing sources (jittered subsampling) in the *(a)* source-receiver and *(b)* midpoint-offset domain.

#### Figure: {#SingulSRMH}
![](GoS_Figures/Singular_12Hz_full_SR_MH.png){#SingulSRMHa width=50%} ![](GoS_Figures/Singular_12Hz_obs_SR_MH.png){#SingulSRMHb width=50%}
: Decay of the singular values for the ``12.0\, \mathrm{Hz}`` frequency slice in the source-receiver and midpoint-offset domains for *(a)* the fully sampled data and *(b)* the subsampled data with ``75 \%`` missing sources.

By solving this minimization problem, we aim to recover the complex-valued data matrix ``\vector{X} \in \mathbb{C}^{m \times n}``, with ``m`` offsets and ``n`` midpoints, that has the minimum nuclear norm (``\|\vector{X}\|_* = \sum_i\sigma_i``, with the sum running over the singular values of ``\vector{X}``). Aside from minimizing the nuclear-norm objective, the minimizer fits the observed data ``\vector{B} \in \mathbb{C}^{m \times n}`` at the sampling locations to within some tolerance ``\epsilon`` measured by the Frobenius norm---i.e., ``\|\vector{D}\|_{F} = \sqrt{\sum_j \sum_k |{D}_{jk}|^2}`` for a matrix ``\vector{D}.`` In this expression, the linear operator ``\mathcal{A}`` implements the sampling mask, putting zeros at source (and possibly receiver) locations that are not collected in the field. Here, the matrix ``\vector{Y}`` is the optimization variable. Equation [#eqNucmin] is similar to the classic Basis Pursuit DeNoising problem [BPDN, @vandenBerg2008probing] and can be solved with a modified version of the SPG``\ell_1`` algorithm adapted for nuclear-norm minimization [@aravkin2014fast]. To solve problem [#eqNucmin], SPG``\ell_1`` solves a series of constrained subproblems during which the nuclear-norm constraint is relaxed to fit the observed data.

### The challenge of high-frequency wavefield recovery
Wavefield reconstruction via matrix completion (cf. problem #eqNucmin) relies on the assumption that the singular values of monochromatic data organized in a matrix decay rapidly. For the lower frequencies (``< 15.0\, \mathrm{Hz}``) this is indeed the case, but unfortunately this assumption no longer holds for the higher frequencies.
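A crude numerical illustration of why this assumption breaks down is sketched below: the effective rank of an oscillatory kernel grows with its oscillation rate, much as the singular values of high-frequency slices decay more slowly than those of low-frequency slices. The kernels, tolerance, and sizes are illustrative stand-ins and are not derived from the field data.

```python
import numpy as np

def effective_rank(X, tol=1e-2):
    """Number of singular values above `tol` times the largest one: a simple
    proxy for how well X is approximated by a low-rank factorization."""
    sv = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(sv > tol * sv[0]))

# Oscillatory kernels standing in for low- and high-frequency monochromatic slices.
n = 200
x = np.linspace(0.0, 1.0, n)
h = np.abs(x[:, None] - x[None, :])        # |offset| between grid positions
slow = np.exp(2j * np.pi * 5.0 * h)        # slowly oscillating ("low frequency")
fast = np.exp(2j * np.pi * 60.0 * h)       # rapidly oscillating ("high frequency")
print(effective_rank(slow), effective_rank(fast))   # the second is noticeably larger
```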
To illustrate this phenomenon, we compare in Figure #SingulMH the decay of the singular values for the two matricizations of Figure #SingulSRMH at ``12.0\, \mathrm{Hz}`` and ``60.0\, \mathrm{Hz}``. While the singular values at ``12.0\, \mathrm{Hz}`` indeed decay quickly, this is clearly no longer the case at ``60.0\, \mathrm{Hz}`` (juxtapose the solid lines in Figures #SingulMHa and #SingulMHb), where the singular values of the fully sampled data decay more slowly. This slower decay at the high frequencies is caused by the increased complexity and oscillatory behavior exhibited by data at higher temporal frequencies. Despite the fact that the randomized source subsampling slows the decay down, the slower decay of the fully sampled data leads to a poor wavefield reconstruction (Figure #Res60c) and unacceptably large residuals (Figure #Res60d) at ``60.0\, \mathrm{Hz}``.

### Weighted matrix completion
As Figures #SingulMH\, #Res60c\, and #Res60d illustrate, the success of wavefield reconstruction by minimizing the nuclear norm (cf. equation #eqNucmin) hinges on a rapid decay of the singular values, an assumption that is violated at the higher frequencies. This shortcoming can, at least in part, be overcome by using prior information from a related problem in the form of weights, an approach initially put forward by @aravkin2014fast and further theoretically analyzed by @eftekhari2018weighted\. In its original form, the weights were derived from the reconstruction of the wavefield at a neighboring temporal frequency, which leads to a significant improvement in the reconstruction and the residual plotted in Figures #Res60e and #Res60f\, respectively. By applying this approach recursively from low to high frequencies, @zhang2019SEGhigh improved the reconstruction even further, as judged by the quality of Figure #Res60g and the size of the residual plotted in Figure #Res60h\. In this work, we further extend this result by reformulating the optimization problem and by introducing a parallel algorithm that limits communication.

#### Figure: {#SingulMH}
![](GoS_Figures/Singular_12Hz_obs_full_MH.png){#SingulMHa width=50%} ![](GoS_Figures/Singular_60Hz_obs_full_MH.png){#SingulMHb width=50%}
: Singular value decay for fully sampled and subsampled data (``75 \%`` missing sources) in the midpoint-offset domain for the *(a)* ``12.0\, \mathrm{Hz}`` and *(b)* ``60.0\, \mathrm{Hz}`` frequency slice.

#### Figure: {#Res60}
![](GoS_Figures/Interp_conv_60Hz_SR.png){#Res60c width=50%} ![](GoS_Figures/Diff_conv_60Hz_SR.png){#Res60d width=50%}\
![](GoS_Figures/Interp_pair_wt_60Hz_SR.png){#Res60e width=50%} ![](GoS_Figures/Diff_pair_wt_60Hz_SR.png){#Res60f width=50%}\
![](GoS_Figures/Interp_wt_60Hz_SR.png){#Res60g width=50%} ![](GoS_Figures/Diff_wt_60Hz_SR.png){#Res60h width=50%}
: Wavefield reconstruction comparison for a ``60.0\, \mathrm{Hz}`` frequency slice. *(a)* Reconstructed wavefield from ``75\%`` subsampled data and *(b)* its residual with a poor S/R = ``2.83`` dB. *(c)* Reconstructed wavefield using the recovery at the adjacent lower frequency as weights and *(d)* the improved residual with S/R = ``5.08`` dB. *(e)* and *(f)* the same but now with the weighting scheme applied recursively, with a significantly improved S/R = ``8.72`` dB.
We obtained the above weighted wavefield reconstructions by minimizing [@aravkin2014fast; @eftekhari2018weighted]

```math #eqWeighted
\vector{X} := \argmin_{\vector{Y}} \quad \|\vector{Q}\vector{Y}\vector{W}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{Y}) - \vector{B}\|_F \leq \epsilon,
```

where the weighting matrices ``\vector{Q} \in \mathbb{C}^{m \times m}`` and ``\vector{W} \in \mathbb{C}^{n \times n}`` are projections given by

```math #eqQ
\vector{Q} = {w}_{1}\vector{U}\vector{U}^H + \vector{U}^\perp \vector{U}^{{\perp}^{H}}
```

and

```math #eqV
\vector{W} = {w}_{2}\vector{V}\vector{V}^H + \vector{V}^\perp \vector{V}^{{\perp}^{H}},
```

where the symbol ``^H`` denotes the complex-conjugate (Hermitian) transpose. These projections are spanned by the column and row subspaces ``\vector{U}``, ``\vector{V}`` and their orthogonal complements ``\vector{U}^\perp`` and ``\vector{V}^\perp``. Also, these subspaces ``\vector{U}``, ``\vector{V}`` have orthonormal columns, making ``\vector{U}\vector{U}^H`` and ``\vector{V}\vector{V}^H`` orthogonal projections. The pair of matrices ``\{\mathbf{U},\mathbf{V}\}`` can be obtained from a low-rank factorization of the adjacent lower frequency slice. The choice of the weights ``{w}_1`` and ``{w}_2`` in equations [#eqQ] and [#eqV] depends on the similarity between the corresponding row and column subspaces of the two adjacent frequency slices. We follow @eftekhari2018weighted and quantify this similarity by the largest principal angle between these subspaces. The smaller this angle, the more similar the subspaces of the two adjacent frequency slices will be. In situations where the subspaces of the adjacent frequency slices are nearly orthogonal---i.e., have a near ``90^\circ`` angle---we choose ``{w}_{1}\uparrow 1`` and ``{w}_{2} \uparrow 1`` so that the weighting matrices ``\vector{Q}`` and ``\vector{W}`` become identity matrices. In that case, the weighting matrices should not add information---i.e., the solution of problem [#eqWeighted] should become equivalent to solving the original problem in equation [#eqNucmin]. Conversely, when the subspaces are similar---i.e., they have an angle ``\ll 90^\circ``---``{w}_{1}`` and ``{w}_{2}`` should be chosen small such that we penalize solutions more in the orthogonal complement space. Depending on our confidence in the given factorization, we choose these weights close to one when we have little confidence and close to zero when we have more confidence.

While replacing the nuclear-norm objective in equation [#eqNucmin] by its weighted counterpart in equation #eqWeighted is a valid approach responsible for the improvements reported in Figure #Res60\, its solution involves non-trivial weighted projections [see equation ``7.3`` in @aravkin2014fast]. These computationally costly operations can be avoided by rewriting optimization problem #eqWeighted in a slightly different form where the weights are moved from the objective to the data constraint---i.e., we have

```math #eqWeightednew
\vector{\bar X} := \argmin_{\vector{\bar Y}} \quad \|\vector{\bar Y}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{Q}^{-1}\vector{\bar Y}\vector{W}^{-1}) - \vector{B}\|_F \leq \epsilon.
```

In this formulation, the optimization is carried out over the new variable ``\vector{\bar Y} = \vector{Q}\vector{Y}\vector{W}``. After solving for this variable, we recover the solution of the original problem ``\vector{X}`` from ``\vector{\bar X}`` as follows: ``\vector{X} = \vector{Q}^{-1}\vector{\bar X}\vector{W}^{-1}``.
We arrived at this formulation by using the fact that the matrices ``\vector{Q}`` and ``\vector{W}`` are invertible (for non-zero weights ``{w}_{1}`` and ``{w}_{2}``) with inverses given by

```math #eqQinv
\vector{Q}^{-1} = \frac{1}{{w}_{1}}\vector{U}\vector{U}^H + \vector{U}^\perp \vector{U}^{{\perp}^{H}}
```

and

```math #eqWinv
\vector{W}^{-1} = \frac{1}{{w}_{2}}\vector{V}\vector{V}^H + \vector{V}^\perp \vector{V}^{{\perp}^{H}}.
```

Because we moved the weighting matrices to the data constraint, we no longer have to project onto a more complicated constraint as in @aravkin2014fast\, which means that solutions of equation [#eqWeightednew] come at almost the same computational cost as in the original formulation (equation #eqNucmin). This formulation forms the basis for our approach to wavefield reconstruction that is capable of handling the large data volumes of 3D seismic.

## Scalable multi-frequency seismic wavefield reconstruction
So far, our minimization problems relied on the explicit formation of the data matrix and on the singular-value decomposition (SVD, @aravkin2014fast), both of which are infeasible for industry-scale 3D wavefield reconstruction problems. To address this issue, we discuss how to recast the above weighted matrix completion approach into factored form, which has computational benefits and, as we will show below, can still be parallelized.

### Weighted low-rank matrix factorization
To avoid computing costly SVDs, we first cast the solution of equation [#eqWeightednew] into factored form:

```math #eqWeighteLRopt
\vector{\bar L}, \vector{\bar R} := \argmin_{\vector{\bar L_\#}, \vector{\bar R_\#}} \quad \frac{1}{2} {\left\| \begin{bmatrix} \vector{\bar L_\#} \\ \vector{\bar R_\#} \end{bmatrix} \right\|}_F^2 \quad \text{subject to} \quad \|\mathcal{A}(\vector{Q}^{-1}\vector{\bar L_\#}\vector{\bar R_\#}^{H}{\vector{W}^{-H}}) - \vector{B}\|_{F} \leq \epsilon,
```

where ``\vector{\bar L} = \vector{Q}\vector{L}`` and ``\vector{\bar R} = \vector{W}\vector{R}``. Under certain technical conditions [@candes2009exact], which include choosing the proper rank ``r``, the factored solution, ``\vector{X}=\vector{L}\vector{R}^H`` with ``\vector{L} = \vector{Q}^{-1}\vector{\bar L}`` and ``\vector{R} = \vector{W}^{-1}\vector{\bar R}``, corresponds to the solution of the weighted problem included in equation [#eqWeighted]. Here, the matrices ``\vector{L} \in \mathbb{C}^{m \times r}`` and ``\vector{R} \in \mathbb{C}^{n \times r}`` are the low-rank factors of ``\vector{X}``. Using the property that the matrices ``\vector{Q}`` and ``\vector{W}`` are Hermitian---i.e., ``\vector{Q}^{H} = \vector{Q}`` and ``\vector{W}^{H} = \vector{W}``---we replace ``\vector{W}^{-H}`` in equation [#eqWeighteLRopt] by ``\vector{W}^{-1}`` to avoid extra computation. Compared to the original convex formulation, equation #eqWeighteLRopt can be solved with alternating optimization, which is computationally efficient as evidenced by the runtimes plotted in Figure #Runtime as a function of temporal frequency. Of course, this approach only holds as long as the monochromatic data matrices can be well approximated by low-rank matrices---i.e., ``r\ll\min(m,n)``.

#### Figure: {#Runtime}
![](Parallel_Analysis_Figures/Runtime_Comparison_bnw.png){#Runtimea width=50%}
: Runtime comparison: the solid black line shows the runtime of the original weighted formulation and the dashed black line shows the runtime of the new weighted formulation for the same number of iterations and the same data residual at the end.
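To make the construction of the weights concrete, the following NumPy sketch builds ``\vector{Q}`` and its inverse from an orthonormal basis (equations #eqQ and #eqQinv; ``\vector{W}`` and ``\vector{W}^{-1}`` follow the same recipe) and computes the largest principal angle used to judge subspace similarity. The sizes, rank, and weight value are illustrative, and the random basis merely stands in for the orthogonalized factors of the adjacent lower frequency slice.

```python
import numpy as np

def weight_matrix(U, w):
    """Q = w*U U^H + U_perp U_perp^H, written as I + (w - 1) U U^H (equation #eqQ)."""
    return np.eye(U.shape[0], dtype=U.dtype) + (w - 1.0) * (U @ U.conj().T)

def weight_matrix_inv(U, w):
    """Q^{-1} = (1/w)*U U^H + U_perp U_perp^H (equation #eqQinv); requires w > 0."""
    return np.eye(U.shape[0], dtype=U.dtype) + (1.0 / w - 1.0) * (U @ U.conj().T)

def largest_principal_angle(U1, U2):
    """Largest principal angle (radians) between the column spaces of U1 and U2."""
    s = np.linalg.svd(U1.conj().T @ U2, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))

m, r, w1 = 200, 20, 0.75                     # illustrative sizes and weight
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r)))

Q, Qinv = weight_matrix(U, w1), weight_matrix_inv(U, w1)
print(np.allclose(Q @ Qinv, np.eye(m)))      # True: Q is invertible for w1 > 0
print(largest_principal_angle(U, U))         # 0.0 for identical subspaces
```

Note that only ``\vector{Q}^{-1}`` and ``\vector{W}^{-1}`` enter the data-misfit constraint of equation #eqWeighteLRopt, so the weighted projections of the original objective never need to be applied inside the solver.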
While the above weighted formulation allows us to solve the problem in factored form, it needs access to the subspaces ``\{\mathbf{U},\mathbf{V}\}``, which in principle requires computing the full SVD of the adjacent frequency slice [@eftekhari2018weighted]. Since we cannot afford to compute this full SVD, we instead orthogonalize the low-rank factors of the adjacent frequency slice themselves by carrying out computationally cheap SVDs on the factors, rather than on the full data matrix, and keeping the top ``r`` left singular vectors. This approach is justified because the orthogonalized low-rank factors approximate the orthonormal subspaces spanned by the full frequency slice. The results presented in Figure #Res60 were obtained in factored form and clearly demonstrate how the incorporation of weight matrices improves the recovery, especially when these weight matrices are calculated recursively from low to high frequencies (juxtapose Figures #Res60e, #Res60f and #Res60g, #Res60h). This improvement is due to the fact that low-frequency data matrices can be better approximated by low-rank matrices, which improves their recovery and therefore the weighted reconstruction.

### Weighted parallel recovery
The example in Figure #Res60 made it clear that wavefield reconstruction via matrix factorization improves when including weight matrices that carry information on the row and column subspaces. However, the inclusion of these weight matrices makes it more difficult to parallelize the algorithm because the parallelized alternating optimization approach by @recht2013parallel and @lopez2015EAGErma no longer applies straightforwardly. That approach relies on decoupled computations on a row-by-row and column-by-column basis (see Figure [#Parallel]), in which case one alternates between minimizing the rows of the right factor via

```math #eqpalunweightedL
\vector{R}(l_1,:)^{H} := \argmin_{\vector{v}} \quad \frac{1}{2} {\left\| \vector{v} \right\|}^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_1}(\vector{L}\vector{v}) - \vector{B}(:,l_1)\| \leq \gamma
```

for ``l_1 = 1\cdots n`` and the rows of the left factor via

```math #eqpalunweightedR
\vector{L}(l_2,:)^{H} := \argmin_{\vector{u}} \quad \frac{1}{2} {\left\| \vector{u} \right\|}^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_2}((\vector{R}\vector{u})^{H}) - \vector{B}(l_2,:)\| \leq \gamma
```

for ``l_2 = 1\cdots m``. With this approach, the rows of the right factor ``\mathbf{R}`` are updated first by iterating over the index ``l_1=1\cdots n``. These updates are followed by updates on the rows of the left factor ``\mathbf{L}`` by iterating over the index ``l_2=1\cdots m``. Contrary to the serial problem, these optimizations are conducted on the individual vectors ``\vector{v} \in \mathbb{C}^{r}`` and ``\vector{u} \in \mathbb{C}^{r}`` in parallel because they decouple---i.e., the ``l_1,l_2``th rows of ``\vector{R}, \vector{L}`` only involve the ``l_1, l_2``th column/row of the observed data matrix ``\vector{B}`` and the submatrices ``\mathcal{A}_{l_1}, \mathcal{A}_{l_2}`` that act on these columns/rows. To simplify notation, we introduced the symbol ``:`` to extract the ``l_1``th column, ``\vector{B}(:,l_1)``, or the ``l_2``th row, ``\vector{B}(l_2,:)``. As before, we allow for the presence of noise by solving the optimizations to within a user-specified ``\ell_2``-norm tolerance ``\gamma``.

#### Figure: {#Parallel}
![](GoS_Figures/parallel_rank1_new.png){#Paral1 width=100%}\
![](GoS_Figures/parallel_rank2_new.png){#Paral2 width=100%}
: Alternating minimization and decoupling.
*(a)* Solving for the low-rank factor ``\vector{R}`` by using the fixed factor ``\vector{L}`` and the observed data ``\vector{B}``. *(b)* Solving for the ``l_1^{th}`` row of the low-rank factor ``\vector{R}`` by using the rows (in black) of the fixed factor ``\vector{L}`` corresponding to the non-zero entries (in black) of the ``l_1^{th}`` column of the observed data ``\vector{B}``.

Because the operations in equations #eqpalunweightedL and #eqpalunweightedR decouple, they allow for a parallel implementation that scales well to large industrial 3D seismic problems. However, the decoupled formulation does not include weighting matrices, limiting its usefulness for recovery problems at higher frequencies that require weighting matrices. Below, we present a novel approach to ameliorate this problem in which we take equations #eqpalunweightedL and #eqpalunweightedR\ as a starting point and, after including the weighting matrices as in equation #eqWeighteLRopt\, pre- and post-multiply the data-misfit terms by ``\mathbf{Q}`` and ``\mathbf{W}``. Next, we use the property that for large weights the matrices ``\mathbf{Q}`` and ``\mathbf{W}`` nearly commute with the measurement operator ``\mathcal{A}``---i.e., we have ``\mathbf{Q}\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}W}^{-1})\approx\mathcal{A}(\mathbf{\bar{X}W}^{-1})`` and ``\mathcal{A}(\vector{Q}^{-1}\mathbf{\bar{X}W}^{-1})\vector{W}\approx \mathcal{A}(\vector{Q}^{-1}\mathbf{\bar{X}})``, where ``\mathbf{\bar{X}}`` represents the fully sampled data matrix or its factored form. With these approximations, we arrive at the following weighted iterations:

```math #eqwtRpar
\vector{\bar R}(l_1,:)^{H} := \argmin_{\vector{\bar v}} \quad \frac{1}{2} \left\|\vector{\bar v}\right\|_2^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_1}(\vector{Q}^{-1}\vector{\bar L}\vector{\bar v}) - \vector{B}_R(:,l_1)\| \leq \gamma
```

for ``l_1 = 1\cdots n`` and

```math #eqwtLpar
\vector{\bar L}(l_2,:)^{H} := \argmin_{\vector{\bar u}} \quad \frac{1}{2} \left\|\vector{\bar u}\right\|^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_2}((\vector{\bar R}\vector{\bar u})^{H}\vector{W}^{-1}) - \vector{B}_L(l_2,:)\| \leq \gamma
```

for ``l_2 = 1\cdots m``. In these expressions, we replaced the incomplete data matrix by ``\mathbf{B}_R=\mathbf{B}\mathbf{W}`` and ``\mathbf{B}_L=\mathbf{Q}\mathbf{B}``, respectively. This means that we pre- and post-multiply the observed monochromatic data matrix ``\mathbf{B}`` with ``\mathbf{Q}`` and ``\mathbf{W}`` before extracting its columns or rows. The above derivation is only valid if the approximations involving commutation of the weight matrices with the sampling operator ``\mathcal{A}`` are sufficiently accurate. To verify that these approximations are indeed justified, we assess their accuracy in Figure #commute by comparing plots of ``\mathbf{Q}\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}W}^{-1})`` and ``\mathcal{A}(\mathbf{\bar{X}W}^{-1})`` for two different values of the weights in equations #eqQinv and #eqWinv\. As expected, for the small value ``w_{1,2}=0.25`` the weighting matrix ``\mathbf{Q}`` does not commute with the sampling operator (see Figures #commuted -- #commutef). However, for ``w_{1,2}=0.75`` the approximation is reasonably accurate (see Figures #commutea -- #commutec). Similarly, in Figure [#commuteW] we compare plots of ``\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}W}^{-1})\mathbf{W}`` and ``\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}})`` for small and large weights.
As before, for the smaller weights ``w_{1,2} = 0.25``, the weighting matrix ``\mathbf{W}`` does not commute with the sampling operator (see Figures #commuteWd -- #commuteWf). However, for ``w_{1,2}=0.75`` the approximation is again reasonably accurate (see Figures #commuteWa -- #commuteWc). Remember, the weights ``w_{1,2}`` reflect the confidence we have in the weight matrices and are chosen small when we are confident that the weighting matrices ``\mathbf{Q}`` and ``\mathbf{W}`` add information to the recovery. This means we need to select a value for the weights ``w_{1,2}`` that balances how much prior information we want to invoke against how accurate the commutation relations need to be. Choosing small weights comes at the expense of large "commutation" errors, while large weights lead to small "commutation" errors but limit the inclusion of the prior information via the weights.

#### Figure: {#commute}
![](Parallel_Analysis_Figures/QUout_p25.png){#commuted width=33%} ![](Parallel_Analysis_Figures/QUin_p25.png){#commutee width=33%} ![](Parallel_Analysis_Figures/diff_QUinout_p25.png){#commutef width=33%}\
![](Parallel_Analysis_Figures/QUout_p75.png){#commutea width=33%} ![](Parallel_Analysis_Figures/QUin_p75.png){#commuteb width=33%} ![](Parallel_Analysis_Figures/diff_QUinout_p75.png){#commutec width=33%}
: Commutation test for small and large weights. *(a)* Subset of a 3D frequency slice for ``\mathbf{Q}\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}W}^{-1})`` for ``w_{1,2}=0.25``; *(b)* the same but now for ``\mathcal{A}(\mathbf{\bar{X}W}^{-1})``; *(c)* difference plot between *(a)* and *(b)*; *(d)-(f)* the same as *(a)*-*(c)* but now for ``w_{1,2}=0.75``.

#### Figure: {#commuteW}
![](Parallel_Analysis_Figures/QVout_p25.png){#commuteWd width=33%} ![](Parallel_Analysis_Figures/QVin_p25.png){#commuteWe width=33%} ![](Parallel_Analysis_Figures/diff_QVinout_p25.png){#commuteWf width=33%}\
![](Parallel_Analysis_Figures/QVout_p75.png){#commuteWa width=33%} ![](Parallel_Analysis_Figures/QVin_p75.png){#commuteWb width=33%} ![](Parallel_Analysis_Figures/diff_QVinout_p75.png){#commuteWc width=33%}
: Commutation test for small and large weights. *(a)* Subset of a 3D frequency slice for ``\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}W}^{-1})\mathbf{W}`` for ``w_{1,2}=0.25``; *(b)* the same but now for ``\mathcal{A}(\mathbf{Q}^{-1}\mathbf{\bar{X}})``; *(c)* difference plot between *(a)* and *(b)*; *(d)-(f)* the same as *(a)*-*(c)* but now for ``w_{1,2}=0.75``.

Although the decoupled equations [#eqwtRpar] and [#eqwtLpar] can now be parallelized over the rows of the low-rank factors ``\vector{\bar R}`` and ``\vector{\bar L}``, they come at an additional computational cost. Unlike the sparse observed data collected in the matrix ``\mathbf{B}``, the data matrices ``\mathbf{B}_R`` and ``\mathbf{B}_L`` are dense (virtually all of their entries are non-zero) because of the multiplications by ``\mathbf{W}`` and ``\mathbf{Q}``. However, when the weights ``w_{1,2}`` are relatively large, we observe that both dense matrices ``\mathbf{B}_L`` and ``\mathbf{B}_R`` (Figures [#obstestb] and [#obstestd]) can be well approximated by the sparse observed data matrix ``\vector{B}``, as judged by the difference plots in Figure #obstest\. With this approximation, equations [#eqwtRpar] and [#eqwtLpar] can be solved computationally efficiently.
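To sketch how these decoupled updates can be organized, the code below performs one alternation of equations #eqwtRpar and #eqwtLpar on a single machine, using the sparse observed matrix ``\vector{B}`` in place of ``\vector{B}_R`` and ``\vector{B}_L`` as argued above. It replaces each inequality-constrained row subproblem with a damped least-squares proxy; this stand-in solver, the variable names, and the serial loops (which in practice would be distributed over workers) are our simplifications and not the SPG-``\ell_2`` solver used for the reported results.

```python
import numpy as np

def damped_lstsq(M, b, lam=1e-6):
    """Cheap proxy for the per-row inequality-constrained subproblems:
    minimize ||M x - b||^2 + lam * ||x||^2."""
    r = M.shape[1]
    return np.linalg.solve(M.conj().T @ M + lam * np.eye(r, dtype=M.dtype),
                           M.conj().T @ b)

def one_alternation(Lbar, Rbar, B, mask, Qinv, Winv, lam=1e-6):
    """One sweep over the rows of Rbar and Lbar (equations #eqwtRpar and #eqwtLpar),
    with the sparse observed matrix B approximating B_R and B_L.
    Every loop iteration is independent and could run on a separate worker."""
    m, n = B.shape
    QinvL = Qinv @ Lbar                        # applied once, shared by all columns
    for l1 in range(n):                        # rows of Rbar (parallelizable)
        rows = np.flatnonzero(mask[:, l1])     # observed entries in column l1
        if rows.size:
            Rbar[l1, :] = damped_lstsq(QinvL[rows, :], B[rows, l1], lam).conj()
    WinvR = Winv @ Rbar                        # applied once, shared by all rows
    for l2 in range(m):                        # rows of Lbar (parallelizable)
        cols = np.flatnonzero(mask[l2, :])     # observed entries in row l2
        if cols.size:
            Lbar[l2, :] = damped_lstsq(WinvR[cols, :], B[l2, cols].conj(), lam).conj()
    return Lbar, Rbar
```

In a recursive sweep over temporal frequencies, ``\vector{Q}^{-1}`` and ``\vector{W}^{-1}`` would be rebuilt from the orthogonalized factors recovered at the adjacent lower frequency before these updates are run at the next frequency.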
#### Figure: {#obstest}
![](Parallel_Analysis_Figures/obs_data.png){#obstesta width=33%} ![](Parallel_Analysis_Figures/obsQU_data.png){#obstestb width=33%} ![](Parallel_Analysis_Figures/diff_obsQU_data.png){#obstestc width=33%}\
![](Parallel_Analysis_Figures/obsQV_data.png){#obstestd width=33%} ![](Parallel_Analysis_Figures/diff_obsQV_data.png){#obsteste width=33%}
: Accuracy of the sparse approximation for weights ``w_{1,2} = 0.75``. *(a)* Subset of a 3D frequency slice for the sparse observed data ``\mathbf{B}``; *(b)* the same but now for the dense matrix ``\vector{B}_L``; *(c)* difference plot between *(a)* and *(b)*; *(d)* subset of a 3D frequency slice for the dense matrix ``\vector{B}_R``; *(e)* difference plot between *(a)* and *(d)*.

While the above formulation allows us to carry out weighted factored wavefield recovery in parallelized form, we observed that applying the inverses of ``\mathbf{Q}`` and ``\mathbf{W}`` in the data-misfit constraint (see equation #eqWeighteLRopt) leads to inferior recovery because these inverses involve reciprocals of the weights (see equations #eqQinv and #eqWinv). The value range of these reciprocals is no longer confined to the interval ``(0, 1]``, which can lead to numerical problems during the recovery. To circumvent this issue, we propose an alternative but equivalent form of the weighted formulation with weights defined as

```math #eqparQ
\vector{\widehat{Q}} = \vector{U}\vector{U}^H + {w}_{1}\vector{U}^\perp \vector{U}^{{\perp}^{H}} = {w}_{1}\vector{Q}^{-1},
```

```math #eqparW
\vector{\widehat{W}} = \vector{V}\vector{V}^H + {w}_{2}\vector{V}^\perp \vector{V}^{{\perp}^{H}} = {w}_{2}\vector{W}^{-1}.
```

With these alternative definitions, we can as before approximate ``\mathbf{\widehat{Q}}^{-1}\mathcal{A}(\mathbf{\widehat{Q}}\mathbf{\bar{X}}\mathbf{\widehat{W}})`` by ``\mathcal{A}(\mathbf{\bar{X}\widehat{W}})`` and ``\mathcal{A}(\vector{\widehat{Q}\bar{X}\widehat{W}})\mathbf{\widehat{W}}^{-1}`` by ``\mathcal{A}(\vector{\widehat{Q}\bar{X}})``, yielding the following decoupled parallelizable equations for the factors:

```math #eqwtRparv2
\vector{\bar R}(l_1,:)^{H} := \quad & \argmin_{\vector{\bar v}} \quad \frac{1}{2} \left\|\vector{\bar v}\right\|^2 \\
& \text{subject to} \\
& \|\mathcal{A}_{l_1}(\widehat{\vector{Q}}\vector{\bar L}\vector{\bar v}) - {w_1}{w_2}\vector{B}(:,l_1)\| \leq {w_1}{w_2}\gamma
```

for ``l_1=1\cdots n`` and

```math #eqwtLparv2
\vector{\bar L}(l_2,:)^{H} := \quad & \argmin_{\vector{\bar u}} \quad \frac{1}{2} \left\|\vector{\bar u}\right\|^2 \\
& \text{subject to} \\
& \|\mathcal{A}_{l_2}((\vector{\bar R}\vector{\bar u})^{H}\vector{\widehat{W}}) - {w_1}{w_2}\vector{B}(l_2,:)\| \leq {w_1}{w_2}\gamma
```

for ``l_2=1\cdots m``. Equations #eqwtRparv2 and #eqwtLparv2 form the basis for our recovery approach summarized in Algorithm #alg:Altmin below, which corresponds to

```math #eqWeightednew2
\minim_{\vector{\bar X}} \quad \|\vector{\bar X}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{\widehat{Q}}\vector{\bar X}\vector{\widehat{W}}) - w_1w_2\vector{B}\|_F \leq w_1w_2\epsilon,
```

which is equivalent to equation #eqWeightednew as we show in Appendix A\.

### Algorithm: Alternating minimization {#alg:Altmin}
| 0. **Input:** `Observed Data` ``\vector{B},`` `rank parameter` ``{r},`` `acquisition mask` ``\mathcal{A},`` `priors` ``\vector{\widehat{Q}}, \vector{\widehat{W}} `` `& initial guess` ``\vector{\bar L}^\left(0 \right)``
| 1. **for** ``k=0,\, 1,\, 2,\cdots,\,N-1`` `//solve for rows of` ``\vector{\bar R}`` `&` ``\vector{\bar L}`` `in parallel`
| 2. ``\vector{\bar R}^\left(k+1 \right)(l_1,:)^{H} := \underset{\vector{\bar v}}{\arg \min} \quad \frac{1}{2} \left\|\vector{\bar v}\right\|^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_1}(\widehat{\vector{Q}}\vector{\bar L}^\left(k \right)\vector{\bar v}) - {w_1}{w_2}\vector{B}(:,l_1)\| \leq {w_1}{w_2}\gamma``
| 3. ``\vector{\bar L}^\left(k+1 \right)(l_2,:)^{H} := \underset{\vector{\bar u}}{\arg \min} \quad \frac{1}{2} \left\|\vector{\bar u}\right\|^2 \quad \text{subject to} \quad \|\mathcal{A}_{l_2}((\vector{\bar R}^\left(k+1 \right)\vector{\bar u})^{H}\vector{\widehat{W}}) - {w_1}{w_2}\vector{B}(l_2,:)\| \leq {w_1}{w_2}\gamma``
| 4. **end for**
| 5. ``\vector{L} = \frac{1}{w_1}\vector{\widehat{Q}}\vector{\bar L}``
| 6. ``\vector{R} = \frac{1}{w_2}\vector{\widehat{W}}\vector{\bar R}``
| **Output:** Recovered wavefield in factored form ``\{\mathbf{L},\, \mathbf{R}\}``.

Caption: Weighted minimization via alternating minimization.

In Algorithm #alg:Altmin, **Line 2** corresponds to solving for each row of the low-rank factor ``\vector{\bar R}^\left(k+1 \right)`` at the ``\left(k+1 \right)^{th}`` iteration using the estimate of the low-rank factor ``\vector{\bar L}^\left(k \right)`` from the ``k^{th}`` iteration. Similarly, **Line 3** corresponds to solving for each row of the low-rank factor ``\vector{\bar L}^\left(k+1 \right)`` at the ``\left(k+1 \right)^{th}`` iteration using the estimate of the low-rank factor ``\vector{\bar R}^\left(k+1 \right)``. Finally, **Lines 5** and **6** correspond to retrieving the low-rank factors ``\vector{L}`` and ``\vector{R}`` from ``\vector{\bar L}`` and ``\vector{\bar R}``, respectively.

## Case studies
We now conduct a series of experiments to evaluate the accuracy of the proposed weighted wavefield reconstruction methodology. In all cases, we have access to the ground-truth fully sampled data. This allows us to assess the accuracy by means of visual inspection and signal-to-noise ratios (S/Rs). Our examples include the seismic line from the Gulf of Suez we discussed earlier and a complex full-azimuth synthetic 3D dataset.

### Gulf of Suez field data: 2D example
To evaluate the performance of our recursively weighted wavefield recovery method on field data, we conduct an experiment on a 2D line from the Gulf of Suez. The fully sampled split-spread dataset consists of ``1024`` time samples, acquired with ``354`` sources and ``354`` receivers. Time is sampled at ``0.004\, \mathrm{s}`` and the source and receiver spacing is ``25\, \mathrm{m}``. To test our algorithm, we reconstruct this 2D line from randomly subsampled traces, which we obtain by removing ``75\%`` of the sources via optimally jittered subsampling [@herrmann2008non]. We assess the performance of the recursively weighted matrix factorization by comparing wavefield recovery with and without weighting as a function of the angular frequency. To avoid the impact of noise at the low frequencies, we start the recovery at ``7.0\, \mathrm{Hz}``. Since this is a small problem, we reconstruct the frequency slices by performing ``150`` iterations of the SPG-LR algorithm [@vandenBerg2008probing; @aravkin2014fast] for each frequency. We compare wavefield reconstructions with and without weights, the results of which are summarized in Figure [#GoS_SNR].
From these results, we can see that above ``17\, \mathrm{Hz}`` the wavefield reconstruction clearly benefits from including weights compared to reconstructions carried out with the same number of iterations but without weighting. For comparison purposes, we also reconstruct the missing data using the conventional matrix factorization method. For a fair comparison, we once again use ``150`` iterations of the SPG-LR algorithm for each frequency slice. As expected, apart from a few lower frequency slices where the conventional method performs well, we obtain improvements in signal-to-noise ratio (Figure [#GoS_SNR]) across all other frequency slices with our recursively weighted approach.

#### Figure: {#GoS_SNR}
![](GoS_Figures/SNR_comparison_unwt_wt.png){#GoS_SNRa width=50%}
: *(a)* Signal-to-noise ratio comparison of the conventional (solid black line) and recursively weighted (dashed black line, ``w_{1,2} = 0.75``) method for all frequencies.

Figures [#GoS_crgc] and [#GoS_crg_deepc] include the shallow and deeper parts of a reconstructed common receiver gather extracted from results based on the conventional method, yielding ``S/R = 6.9\, \mathrm{dB}``. In the data residual plots (Figures [#GoS_crgd] and [#GoS_crg_deepd]), we observe signal leakage and noise due to reconstruction artifacts in both the shallow and the deeper parts. By signal leakage we mean that there are coherent events in the data residual plot, indicating an incomplete reconstruction of the data. Also, at far offsets we observe signal leakage in the difference plot. Far-offset data is important for full-waveform inversion (FWI) since it contains turning waves. On the other hand, we observe better reconstructed data in the common receiver gather (Figures [#GoS_crge] and [#GoS_crg_deepe]) extracted from the data reconstructed with the recursively weighted approach, with an improved ``S/R`` of ``11.7\, \mathrm{dB}``. Its corresponding data residual plot (Figures [#GoS_crgf] and [#GoS_crg_deepf]) shows less signal leakage than its conventional counterpart. Even at far offsets, we observe a better reconstruction of the signal.

#### Figure: {#GoS_crg}
![](GoS_Figures/True_Shall_CRG_Rcvr25.png){#GoS_crga width=50%} ![](GoS_Figures/Obs_Shall_CRG_Rcvr25.png){#GoS_crgb width=50%}\
![](GoS_Figures/Unwt_Shall_CRG_Rcvr25.png){#GoS_crgc width=50%} ![](GoS_Figures/Diff_unwt_Shall_CRG_Rcvr25.png){#GoS_crgd width=50%}\
![](GoS_Figures/Wtp75_Shall_CRG_Rcvr25.png){#GoS_crge width=50%} ![](GoS_Figures/Diff_wtp75_Shall_CRG_Rcvr25.png){#GoS_crgf width=50%}
: Wavefield reconstruction in the common receiver gather domain in the shallow part. *(a)* True data and *(b)* observed data with ``75\%`` missing sources. *(c)* Reconstructed data using the conventional method with ``S/R = 6.9\, \mathrm{dB}`` and *(d)* corresponding difference with respect to the true data. *(e)* Reconstructed data using the recursively weighted method with ``S/R = 11.7\, \mathrm{dB}`` and *(f)* corresponding difference with respect to the true data.

#### Figure: {#GoS_crg_deep}
![](GoS_Figures/True_deep_CRG_Rcvr25.png){#GoS_crg_deepa width=50%} ![](GoS_Figures/Obs_deep_CRG_Rcvr25.png){#GoS_crg_deepb width=50%}\
![](GoS_Figures/Unwt_deep_CRG_Rcvr25.png){#GoS_crg_deepc width=50%} ![](GoS_Figures/Diff_unwt_deep_CRG_Rcvr25.png){#GoS_crg_deepd width=50%}\
![](GoS_Figures/Wtp75_deep_CRG_Rcvr25.png){#GoS_crg_deepe width=50%} ![](GoS_Figures/Diff_wtp75_deep_CRG_Rcvr25.png){#GoS_crg_deepf width=50%}
: Wavefield reconstruction in the common receiver gather domain in the deeper part.
*(a)* True data and *(b)* observed data with ``75\%`` missing sources. *(c)* Reconstructed data using the conventional method and *(d)* corresponding difference with respect to the true data. *(e)* Reconstructed data using the recursively weighted method and *(f)* corresponding difference with respect to the true data.

### Synthetic Compass model data: 3D example
In 2D seismic surveys, receivers only measure wavefields traveling in the vertical plane that contains the sources and receivers. Therefore, we fail to capture reflections from outside the 2D source-receiver plane. This lack of recorded out-of-plane scattering ultimately affects the quality of the subsurface image, especially in regions with strong lateral heterogeneity. To capture these 3D effects, most seismic exploration surveys are nowadays 3D, with sources and receivers spread across the surface rather than confined to a single line. To evaluate the performance of our recursively weighted low-rank matrix factorization methodology in this more challenging 3D setting, we consider synthetic 3D data simulated on the Compass model [@jones2012EAGEbcs]. We choose this model because it contains velocity kickbacks, strong reflectors, and small-wavelength details constrained by real well-log data collected in the North Sea. Because of the complexity of this dataset, which mimics marine acquisition with a towed array, we face challenges in wavefield reconstruction similar to those we would face with real 3D field data. @dasilva2015optimization also used this 3D dataset to evaluate their tensor-based wavefield reconstruction algorithm based on the Hierarchical Tucker decomposition. For this experiment, we use a subset of the total data volume of ``501 \times 201 \times 201 \times 41 \times 41`` gridpoints---i.e., ``n_t \times n_{rx} \times n_{ry} \times n_{sx} \times n_{sy}`` along the time, receiver ``x``, receiver ``y``, source ``x``, and source ``y`` directions. Here, ``n_t`` is the number of samples along time, ``n_{rx}`` and ``n_{ry}`` are the numbers of receivers along the ``x`` and ``y`` directions, and ``n_{sx}`` and ``n_{sy}`` are the numbers of sources along the ``x`` and ``y`` directions. In both spatial directions, the spacing between adjacent sources is ``150.0\, \mathrm{m}`` and the spacing between adjacent receivers is ``25.0\, \mathrm{m}``. The sampling interval along time is ``0.01\, \mathrm{s}``. To obtain the subsampled data, we remove ``75\%`` of the receivers at jittered locations [@herrmann2008non]. We use this incomplete data as input to our recursively weighted wavefield reconstruction scheme.

Before proceeding further, let us first briefly discuss the organization of the data in which we carry out the wavefield reconstructions. While we could in principle transform the data into the midpoint-offset domain as in the 2D case, we follow @dasilva2015optimization and @demanet2006THcwa and exploit the fact that monochromatic 3D frequency slices rearranged along the ``x``- and ``y``-coordinates of sources and receivers can be well approximated by a low-rank factorization. In this rearrangement, shown in the sketch below, the data is organized as a matrix with ``{S}_{x}, {R}_{x}`` coupled along the columns and ``S_y, R_y`` coupled along the rows, i.e., unfolded along its coordinate directions. Here, ``S_{x,y}`` and ``R_{x,y}`` are the source and receiver coordinates along the ``x`` and ``y`` directions.
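A minimal NumPy sketch of this non-canonical rearrangement is given below, assuming the monochromatic slice is stored as a 4D array indexed by (source ``x``, source ``y``, receiver ``x``, receiver ``y``); the array name, axis order, and the small illustrative sizes are ours.

```python
import numpy as np

def noncanonical_matricization(F):
    """Rearrange a monochromatic 4D slice F[sx, sy, rx, ry] into a matrix whose
    rows couple (sy, ry) and whose columns couple (sx, rx); the transpose gives
    the opposite convention and exposes the same rank."""
    n_sx, n_sy, n_rx, n_ry = F.shape
    return F.transpose(1, 3, 0, 2).reshape(n_sy * n_ry, n_sx * n_rx)

# Illustrative sizes only (the full survey has 41 x 41 sources and 201 x 201 receivers).
n_sx = n_sy = 5
n_rx = n_ry = 21
rng = np.random.default_rng(0)
F = rng.standard_normal((n_sx, n_sy, n_rx, n_ry)) \
    + 1j * rng.standard_normal((n_sx, n_sy, n_rx, n_ry))
X = noncanonical_matricization(F)
print(X.shape)   # (n_sy * n_ry, n_sx * n_rx) = (105, 105)
```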
After rearrangement in this non-canonical form, the frequency slices are low rank while data with randomly missing receivers is not (juxtapose the ``10.0\, \mathrm{Hz}`` frequency slices in Figures #Synth3Dc and #Synth3Dd and the singular value plots in Figures #SynthSing_3Da and #SynthSing_3Db). We choose the ``10.0\, \mathrm{Hz}`` frequency slice because the change in the rate of decay of the singular values upon subsampling is more prominent at the lower frequencies than at the higher frequencies. This frequency choice therefore allows us to better demonstrate the reasoning behind choosing the ``{S}_{x}, {R}_{x}`` and ``S_y, R_y`` domain for reconstruction. In the canonical organization, missing receivers lead to missing rows, which decreases the rank (cf. solid lines in Figure #SynthSing_3D), whereas in the non-canonical rearrangement the rank increases (cf. dashed lines in Figure #SynthSing_3D). The sudden drop in the singular values in the canonical arrangement is a direct consequence of the fact that removing complete rows or columns decreases the rank. From the behavior of the singular values before and after removal of the receivers, it is clear that this simple rearrangement of the data into the non-canonical organization can serve as the transform domain in which to recover the data via weighted low-rank factorization.

#### Figure: {#Synth3D}
![](BGsynth3D_Figures/True_freq_full_zoom_Sx_Sy_10Hz.png){#Synth3Da width=100%}\
![](BGsynth3D_Figures/True_freq_full_zoom_Sx_Rx_10Hz.png){#Synth3Db width=100%}\
![](BGsynth3D_Figures/Obs_freq_full_zoom_Sx_Sy_10Hz.png){#Synth3Dc width=100%}\
![](BGsynth3D_Figures/Obs_freq_full_zoom_Sx_Rx_10Hz.png){#Synth3Dd width=100%}
: ``10.0\, \mathrm{Hz}`` frequency slice from the 3D data: *(a)* true and *(c)* observed data in the ``S_x, S_y`` domain with ``75\%`` missing receivers; *(b)* true and *(d)* observed data in the ``S_x, R_x`` domain with ``75\%`` missing receivers. The left column shows the full data and the right column shows the data zoomed into the small black box.

#### Figure: {#SynthSing_3D}
![](BGsynth3D_Figures/True_sing_10Hz.png){#SynthSing_3Da width=50%} ![](BGsynth3D_Figures/Obs_sing_10Hz.png){#SynthSing_3Db width=50%}
: Comparison of the singular value decay for *(a)* fully sampled and *(b)* subsampled data with ``75\%`` missing receivers in the ``S_x, S_y`` domain (solid black line) and the ``S_x, R_x`` domain (dashed black line) for the ``10.0\, \mathrm{Hz}`` frequency slice.

As before, we now perform the full-azimuth 3D wavefield reconstruction for each frequency slice using our proposed recursively weighted low-rank matrix factorization approach. Since this is a relatively large problem, we employ the parallel framework presented in the previous section (Algorithm #alg:Altmin) for ``4`` alternations with ``40`` iterations of SPG-``\ell_2`` per frequency slice. We choose these values because, for a given rank parameter, they yield better continuity of the signal and less noise in the reconstructed data. In addition to setting the number of alternations, i.e. the number of switches between equations [#eqwtRparv2] and [#eqwtLparv2], the algorithm requires us to specify the rank of the factorization and the weights. Based on tests performed with different rank values and weights, we selected a rank of ``r=228`` and a value for the weights of ``w_{1,2}=0.75``, because they provide a good balance between the quality of the reconstructed data (in terms of continuity of events and noise level) and computational time.
To avoid noise at the very low frequencies, caused by simulation artifacts, we start our recursively weighted reconstruction at ``4.4\, \mathrm{Hz}``. For comparison, we also use conventional matrix completion for wavefield reconstruction, with the same number of alternations and SPG-``\ell_2`` iterations as before and the same rank of ``228``. For visualization purposes, we show the results in a common shot gather (Figure [#freq15_csga]) extracted from the ``15.0\, \mathrm{Hz}`` frequency slice. Here, we choose the higher frequency of ``15.0\, \mathrm{Hz}`` instead of ``10.0\, \mathrm{Hz}`` to show how the recursively weighted method gives a better reconstruction at high frequencies than the conventional method. In Figure [#freq15_csgb], we show the subsampled shot gather with ``75 \%`` missing receivers. Using the conventional method, we obtain an S/R of ``17.7\, \mathrm{dB}`` for the reconstructed data at ``15.0\, \mathrm{Hz}`` (Figure [#freq15_csgc]), whereas with the recursively weighted method we obtain an improved S/R of ``19.9\, \mathrm{dB}`` (Figure [#freq15_csge]). We also observe less leakage of signal and less noise in the residual plots for the data reconstructed using the recursively weighted method (Figure [#freq15_csgf]) in comparison to the data reconstructed using the conventional method (Figure [#freq15_csgd]). From Figure [#BGsynth_SNR3Da], we also observe an improvement in the S/R of the reconstructed data for all frequencies with the recursively weighted method (dashed black line) in comparison to its conventional counterpart (solid black line). In Figures [#csg] and [#csgdeep], we compare the recursively weighted and the conventional method on time-domain common shot gathers at earlier and later arrivals, respectively. In Figure [#csg], we also show a comparison of a time slice at ``1.6\, \mathrm{s}`` extracted from a 3D common shot gather. Figure [#csga] shows two common shot gathers extracted from the true data along the ``x`` and ``y`` directions, along with a time slice in the top-left corner. Figure [#csgb] shows the corresponding observed data with missing receivers. We observe an improved reconstruction of the signal in the common shot gather reconstructed with the recursively weighted method (Figure [#csge], ``S/R = 17.8\, \mathrm{dB}``) in comparison to the data reconstructed with the conventional method (Figure [#csgc], ``S/R = 15.3\, \mathrm{dB}``). Also in the residual plots, we observe less leakage of signal with the recursively weighted method (Figure [#csgf]) than with its conventional counterpart (Figure [#csgd]). In Figures [#csgdeepa] and [#csgdeepb], we show the same common shot gather at later times along the ``x`` and ``y`` directions extracted from the true and observed data, respectively. We observe noise in the data reconstructed with the conventional method and in the corresponding residual (Figures [#csgdeepc] and [#csgdeepd]), whereas we observe a better reconstruction and less noise with the recursively weighted method (Figures [#csgdeepe] and [#csgdeepf]).
#### Figure: {#freq15_csg .wide} ![](BGsynth3D_Figures/True_freq_csg_15Hz.png){#freq15_csga width=25%} ![](BGsynth3D_Figures/Obs_freq_csg_15Hz.png){#freq15_csgb width=25%} ![](BGsynth3D_Figures/Figures_90p/Obs_freq_csg_90p_15Hz.png){#freq15_csg90pb width=25%}\ ![](BGsynth3D_Figures/Rec_unwt_freq_csg_15Hz.png){#freq15_csgc width=25%} ![](BGsynth3D_Figures/Diff_unwt_freq_csg_15Hz.png){#freq15_csgd width=25%} ![](BGsynth3D_Figures/Rec_wtp75_freq_csg_15Hz.png){#freq15_csge width=25%} ![](BGsynth3D_Figures/Diff_wtp75_freq_csg_15Hz.png){#freq15_csgf width=25%}\ ![](BGsynth3D_Figures/Figures_90p/Rec_unwt_freq_csg_90p_15Hz.png){#freq15_csg90pc width=25%} ![](BGsynth3D_Figures/Figures_90p/Diff_unwt_freq_csg_90p_15Hz.png){#freq15_csg90pd width=25%} ![](BGsynth3D_Figures/Figures_90p/Rec_wtp75_freq_csg_90p_15Hz.png){#freq15_csg90pe width=25%} ![](BGsynth3D_Figures/Figures_90p/Diff_wtp75_freq_csg_90p_15Hz.png){#freq15_csg90pf width=25%} : Full azimuth wavefield reconstruction comparison for ``15.0\, \mathrm{Hz}`` frequency slice in common shot domain. *(a)* True frequency slice. Subsampled frequency slice with *(b)* ``75\%`` missing receivers and *(c)* ``90\%`` missing receivers. Middle row represents reconstruction using observed data with ``75\%`` missing receivers. *(d)* Reconstructed data using conventional method with ``S/R = 17.7\, \mathrm{dB}`` and *(e)* corresponding data residual with respect to true data. *(f)* Reconstructed data using recursively weighted method with ``S/R = 19.9\, \mathrm{dB}`` and *(g)* corresponding data residual with respect to true data. Last row represents reconstruction using observed data with ``90\%`` missing receivers. *(h)* Reconstructed data using conventional method with ``S/R = 3.7\, \mathrm{dB}`` and *(i)* corresponding data residual with respect to true data. *(j)* Reconstructed data using recursively weighted method with ``S/R = 12.5\, \mathrm{dB}`` and *(k)* corresponding data residual with respect to true data. #### Figure: {#csg .wide} ![](BGsynth3D_Figures/True_Sx21Sy21T160_Rx101_Ry101_shall_mute.png){#csga width=25%} ![](BGsynth3D_Figures/Obs_Sx21Sy21T160_Rx101_Ry101_shall_mute.png){#csgb width=25%} ![](BGsynth3D_Figures/Figures_90p/Obs_Sx21Sy21T160_Rx101_Ry101_shall_mute.png){#csg_90pb width=25%}\ ![](BGsynth3D_Figures/Rec_unwt_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csgc width=25%} ![](BGsynth3D_Figures/Diff_unwt_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csgd width=25%} ![](BGsynth3D_Figures/Rec_wtp75_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csge width=25%} ![](BGsynth3D_Figures/Diff_wtp75_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csgf width=25%}\ ![](BGsynth3D_Figures/Figures_90p/Rec_unwt_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csg_90pc width=25%} ![](BGsynth3D_Figures/Figures_90p/Diff_unwt_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csg_90pd width=25%} ![](BGsynth3D_Figures/Figures_90p/Rec_wtp75_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csg_90pe width=25%} ![](BGsynth3D_Figures/Figures_90p/Diff_wtp75_Sx21Sy21T160_Rx101_Ry101_nosvd_shall_mute.png){#csg_90pf width=25%} : Full azimuth wavefield reconstruction in time domain for a common shot gather along with time slice at ``1.6\, \mathrm{s}``. *(a)* True data. Subsampled data with *(b)* ``75\%`` missing receivers and *(c)* ``90\%`` missing receivers. Middle row represents reconstruction using observed data with ``75\%`` missing receivers. 
*(d)* Reconstructed data using the conventional method with ``S/R = 15.3\, \mathrm{dB}`` and *(e)* corresponding data residual with respect to the true data. *(f)* Reconstructed data using the recursively weighted method with ``S/R = 17.8\, \mathrm{dB}`` and *(g)* corresponding data residual with respect to the true data. Last row represents reconstruction using observed data with ``90\%`` missing receivers. *(h)* Reconstructed data using the conventional method with ``S/R = 3\, \mathrm{dB}`` and *(i)* corresponding data residual with respect to the true data. *(j)* Reconstructed data using the recursively weighted method with ``S/R = 10.2\, \mathrm{dB}`` and *(k)* corresponding data residual with respect to the true data.

#### Figure: {#csgdeep}
![](BGsynth3D_Figures/True_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepa width=33%}
![](BGsynth3D_Figures/Obs_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepb width=33%}\
![](BGsynth3D_Figures/Rec_unwt_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepc width=33%}
![](BGsynth3D_Figures/Diff_unwt_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepd width=33%}\
![](BGsynth3D_Figures/Rec_wtp75_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepe width=33%}
![](BGsynth3D_Figures/Diff_wtp75_Sx21Sy21_Rx101_Ry101_deep.png){#csgdeepf width=33%}
: Full azimuth wavefield reconstruction in time domain for a common shot gather (deeper section). *(a)* True data. *(b)* Subsampled data with ``75\%`` missing receivers. *(c)* Reconstructed data using the conventional method and *(d)* corresponding difference with respect to the true data. *(e)* Reconstructed data using the recursively weighted method and *(f)* corresponding difference with respect to the true data.

#### Figure: {#BGsynth_SNR3D}
![](BGsynth3D_Figures/SNR_unwt_wtp75.png){#BGsynth_SNR3Da width=50%}
![](BGsynth3D_Figures/Figures_90p/SNR_unwt_wtp75_90p.png){#BGsynth_SNR3D_90pa width=50%}
: Signal-to-noise ratio comparison of the conventional (solid black line) and recursively weighted (``w_{1,2} = 0.75``) method (dashed black line) across all frequencies for *(a)* ``75\%`` and *(b)* ``90\%`` missing receiver scenarios.

### BG synthetic 3D data with ``90\%`` missing receivers

Next, we test the recursively weighted method with a reduced number of samples. We subsample the BG synthetic 3D data by ``90\%`` using jittered subsampling, i.e., we use only ``10\%`` of the receivers for wavefield reconstruction. We use ``4`` alternations and ``40`` inner SPG-``\ell_2`` iterations per alternation for each frequency slice, for both the conventional and the recursively weighted method, and a rank parameter of ``228`` for all frequency slices. As before, we arrive at these values by inspecting the quality of the reconstructed data, judged by the continuity of the signal and the level of residual noise. To avoid noise at the lower frequencies, we start the recursively weighted method at ``4.4\, \mathrm{Hz}``. As evident from the signal-to-noise ratio plot (Figure [#BGsynth_SNR3D_90pa]), the recursively weighted method (dashed black line) improves the reconstruction quality across all frequency slices compared to its conventional counterpart (solid black line). In a common shot gather extracted from the ``15\, \mathrm{Hz}`` frequency slice, the wavefield reconstructed with the recursively weighted method (Figure [#freq15_csg90pe]) shows better continuity and less noise than the reconstruction obtained with its conventional counterpart (Figure [#freq15_csg90pc]).
We observe more signal leakage in the data residual of the conventional method (Figure [#freq15_csg90pd]) than in the residual obtained with the recursively weighted method (Figure [#freq15_csg90pf]). In Figure [#csg] we compare the time-domain reconstructions of the conventional method (Figures [#csg_90pc] and [#csg_90pd]) and the recursively weighted method (Figures [#csg_90pe] and [#csg_90pf]). Again, the recursively weighted method yields a better reconstruction and a smaller data residual than the conventional method.

## Discussion

The above case studies show that recursively weighted low-rank matrix completion offers several benefits over the conventional method. The reconstructed data preserves the signal and the continuity of events even at high frequencies and in deeper sections where amplitudes are weak. We arrive at these results by exploiting the fact that fully sampled seismic data can be approximated by a low-rank matrix in some transform domain and that randomized/jittered sampling degrades this low-rank property. We also exploit the fact that the subspaces of adjacent frequency slices exhibit a degree of similarity. Our recursively weighted low-rank matrix completion outperforms its conventional counterpart in terms of reconstruction quality, especially at the higher frequencies, where conventional low-rank matrix completion performs poorly because the increasingly complex frequency slices eventually violate the low-rank assumption. As mentioned earlier, good quality high-frequency content is important for high-resolution imaging of the earth's subsurface and for detailed inversion of the earth's physical parameters. Weighted matrix completion was first introduced by @aravkin2014fast to improve the reconstruction quality of the conventional matrix completion framework. Here, we exploit the potential of the weighted method by recursively reconstructing data from low to high frequencies. We also make the original weighted formulation of @aravkin2014fast computationally efficient by moving the weights from the objective to the data misfit constraint. The similarity between adjacent frequency slices and an appropriate choice of weights play a key role in the success of the recursively weighted method. The conventional method is easily parallelized over frequencies, which makes it computationally very efficient, whereas the interdependence between frequency slices prevents the recursively weighted method from being parallelized over frequencies. This poses a computational challenge, especially for large-scale 3D datasets. By using alternating minimization and decoupling, we make the recursively weighted method computationally efficient for higher weights. Depending on the available computational resources, the recursively weighted method can therefore be applied efficiently to large-scale 3D datasets. Our parallel weighted framework only partially exploits the benefits of weighted low-rank matrix factorization, since it can be parallelized only for higher weights. Despite this, our numerical experiments demonstrate improvements in reconstruction quality across all frequencies for 3D seismic data generated on a geologically complex velocity model resembling part of the true Earth's subsurface.
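To make the recursion concrete, the schematic Python sketch below outlines the workflow described above: frequency slices are reconstructed from low to high frequency, and the factor subspaces of each recovered slice weight the reconstruction of the next one. This is our own illustration under stated assumptions; `reconstruct_slice` is a hypothetical placeholder for the weighted factorization solver derived in Appendix A, and the actual implementation operates on the factors in a decoupled, parallel fashion.

```python
import numpy as np

def orthonormal_basis(factor):
    """Orthonormal basis for the column span of a low-rank factor (thin QR)."""
    q, _ = np.linalg.qr(factor)
    return q

def recursively_weighted_reconstruction(slices_obs, masks, rank, reconstruct_slice,
                                        w1=0.75, w2=0.75):
    """Reconstruct monochromatic slices ordered from low to high frequency.

    `reconstruct_slice(obs, mask, rank, subspaces, w1, w2)` stands in for the
    weighted low-rank factorization solver; with `subspaces=None` it reduces to
    conventional (unweighted) matrix completion for the first slice.
    """
    recovered, subspaces = [], None
    for obs, mask in zip(slices_obs, masks):      # low -> high frequency
        L, R = reconstruct_slice(obs, mask, rank, subspaces, w1, w2)
        recovered.append(L @ R.conj().T)
        # Row and column subspaces of the current slice weight the next slice.
        subspaces = (orthonormal_basis(L), orthonormal_basis(R))
    return recovered
```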
To exploit the full benefits of the weighted method for large 3D datasets, our future work will focus on extending this methodology to exploit parallelism even for smaller weight values. By directly using the low-rank factors from the previously reconstructed frequency slice to calculate the weight matrices, our recursively weighted framework avoids taking SVDs of the complete dataset to compute its row and column subspaces. This SVD-free parallel weighted framework can be applied to industry-scale seismic datasets. With the advent of cloud computing, ample computational resources are available; the main issue is to use these resources while optimizing both turnaround time and budget. Therefore, the next step will be to re-engineer the weighted framework to use cloud-based computational resources efficiently through ideas from serverless computing. For example, @witte2019SEGevent designed a serverless computing architecture to perform large-scale 3D seismic imaging. Both datasets used in our experiments have sources and receivers on a uniform grid, but in reality this is rarely the case. Because of environmental and operational constraints, sources and receivers are often shifted away from the uniform grid. If we do not account for these shifts in our reconstruction framework, its performance may degrade. By incorporating an extra operator [@lopez2016off] that accounts for these shifts from the uniform grid, we can apply our weighted framework to field data recorded on a non-uniform grid.

## Conclusions

While successful at the low to midrange frequencies, wavefield reconstruction based on matrix factorization fails at the higher frequencies where seismic data is no longer low rank. We overcome this problem by exploiting similarities between low-rank factorizations of adjacent monochromatic frequency slices, organized in a form that reveals the underlying low-rank structure of the fully sampled data, which for budgetary and physical reasons is not accessible. During matrix factorization, these similarities take the form of an alignment of the subspaces in which the low-rank factors live. By introducing weight matrices that project these factors onto the nearby subspaces of the adjacent frequency, the performance of the low-rank matrix factorization improves when this beneficial feature is used recursively, starting at the lower frequencies. However, turning this approach into an algorithm that scales to industry-scale wavefield reconstruction problems for full-azimuth data requires a number of additional important steps. First, we need to avoid costly projections onto weighted constraints. We accomplish this by moving the weights to the data misfit, a simple reformulation that is equivalent yet computationally significantly faster. Secondly, while the recursively applied weighting matrices improve performance at the high frequencies, their introduction does not allow for a row-by-row and column-by-column parallelization of the alternating minimization procedure we employ to carry out the matrix factorizations on which our low-rank wavefield reconstruction is based. We overcome this problem by balancing the emphasis we put on information from adjacent frequency slices against our ability to decouple the operations so that the algorithm can be parallelized.
By means of carefully selected examples on a 2D field dataset and on a full-azimuth 3D dataset, we demonstrate the ability of the proposed algorithm to handle high frequencies. We also show that the proposed algorithm scales well to 3D problems with large percentages of traces missing. From these results, we argue that the proposed approach could be a valuable alternative to transform-based methods that are forced to work on small multidimensional patches.

## Acknowledgment

The first and second authors contributed equally to this paper. This research was funded by the Georgia Research Alliance and the Georgia Institute of Technology.

## Appendix A

In this section, we justify our parallel implementation of the weighted matrix completion problem. Beginning at equation [#eqWeighted], our original weighted program, we arrive at equations [#eqwtRparv2] and [#eqwtLparv2], which specify our implemented parallel counterpart. Recall equation [#eqWeighted]:
```math
\vector{X} := \argmin_{\vector{Y}} \quad \|\vector{Q}\vector{Y}\vector{W}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{Y}) - \vector{B}\|_F \leq \epsilon.
```
Because this is a convex program and ``\vector{Q}``, ``\vector{W}`` are invertible when ``w_1,w_2>0``, we can show that
```math #WNN2
\begin{split}
\vector{Q}\vector{X}\vector{W} := \argmin_{\vector{Y}} \quad \|\vector{Y}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{Q}^{-1}\vector{Y}\vector{W}^{-1}) - \vector{B}\|_F \leq \epsilon,
\end{split}
\tag{A$1$}
```
where
```math
\vector{Q}^{-1} = \frac{1}{{w}_{1}}\vector{U}\vector{U}^H + \vector{U}^\perp \vector{U}^{{\perp}^{H}}
```
and
```math
\vector{W}^{-1} = \frac{1}{{w}_{2}}\vector{V}\vector{V}^H + \vector{V}^\perp \vector{V}^{{\perp}^{H}}.
```
From a numerical perspective, we wish to avoid implementing the operators ``\vector{Q}^{-1}``, ``\vector{W}^{-1}`` due to the factors ``w_1^{-1}``, ``w_2^{-1}``, which may be large and cause algorithmic instability. Instead, by multiplying both sides of the constraint of equation [#WNN2] by ``w_1w_2`` we obtain the equivalent program
```math #WNN3
\begin{split}
\vector{Q}\vector{X}\vector{W} := \argmin_{\vector{Y}} \quad \|\vector{Y}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\vector{\widehat{Q}}\vector{Y}\vector{\widehat{W}}) - w_1w_2\vector{B}\|_F \leq w_1w_2\epsilon,
\end{split}
\tag{A$2$}
```
where we have defined
```math
\vector{\widehat{Q}} = \vector{U}\vector{U}^H + w_1\vector{U}^\perp \vector{U}^{{\perp}^{H}}
```
and
```math
\vector{\widehat{W}} = \vector{V}\vector{V}^H + w_2\vector{V}^\perp \vector{V}^{{\perp}^{H}}.
```
Choosing a rank parameter ``r``, we now apply a factorization approach and solve
```math #eqApp1
\begin{split}
\vector{\bar L}, \vector{\bar R} := \quad & \argmin_{\vector{\bar L_\#}, \vector{\bar R_\#}} \quad \frac{1}{2} {\left\| \begin{bmatrix} \vector{\bar L_\#} \\ \vector{\bar R_\#} \end{bmatrix} \right\|}_F^2 \\
& \text{subject to} \\
& \|\mathcal{A}(\vector{\widehat{Q}}\vector{\bar L_\#}\vector{\bar R_\#}^{H}{\vector{\widehat{W}}}) - {w}_{1}{w}_{2}\vector{B}\|_{F} \leq {w}_{1}{w}_{2}\epsilon,
\end{split}
\tag{A$3$}
```
which gives the approximation ``\vector{\bar L}{\vector{\bar R}}^H \approx \vector{Q}\vector{X}\vector{W}``.
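For readers who wish to experiment with these operators, the small NumPy sketch below applies ``\vector{\widehat{Q}}`` and ``\vector{\widehat{W}}`` matrix-free by using the identity ``\vector{U}^\perp \vector{U}^{{\perp}^{H}} = \vector{I} - \vector{U}\vector{U}^H`` (and similarly for ``\vector{V}``), so the orthogonal complements never need to be formed explicitly. This is our own illustration; the function names and the toy check are assumptions, not the paper's implementation.

```python
import numpy as np

def apply_Q_hat(U, w1, Y):
    """Apply Q_hat = U U^H + w1 * (I - U U^H) to Y without forming U_perp."""
    proj = U @ (U.conj().T @ Y)          # component of Y in span(U)
    return proj + w1 * (Y - proj)        # keep span(U), damp its complement by w1

def apply_W_hat(V, w2, Y):
    """Apply W_hat = V V^H + w2 * (I - V V^H) from the right: Y W_hat."""
    proj = (Y @ V) @ V.conj().T
    return proj + w2 * (Y - proj)

# Quick check against the explicit matrix on a small random example.
rng = np.random.default_rng(1)
n, r, w1 = 8, 3, 0.75
U, _ = np.linalg.qr(rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r)))
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q_hat = U @ U.conj().T + w1 * (np.eye(n) - U @ U.conj().T)
assert np.allclose(apply_Q_hat(U, w1, Y), Q_hat @ Y)
```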
Given an initial left factor estimate, ``{\vector{\bar L}}^0``, we proceed with a block coordinate descent [@xu2013block] approach, which at the ``k``-th iteration solves
```math #WNNaltR
\begin{split}
{\vector{\bar R}}^k := \argmin_{\vector{\bar R_\#}} \quad \|\vector{\bar R_\#}\|_F^2 \quad \text{subject to} \quad \|\mathcal{A}(\vector{\widehat{Q}}{\vector{\bar L}}^{k-1}\vector{\bar R_\#}^H\vector{\widehat{W}}) - w_1w_2\vector{B}\|_F \leq w_1w_2\epsilon,
\end{split}
\tag{A$4$}
```
and then switches to optimizing over the left factor
```math #WNNaltL
\begin{split}
{\vector{\bar L}}^k := \argmin_{\vector{\bar L_\#}} \quad \|\vector{\bar L_\#}\|_F^2 \quad \text{subject to} \quad \|\mathcal{A}(\vector{\widehat{Q}}{\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{B}\|_F \leq w_1w_2\epsilon.
\end{split}
\tag{A$5$}
```
After ``k`` iterations, we obtain the estimate ``{\vector{\bar L}}^k({\vector{\bar R}}^k)^H \approx \vector{Q}\vector{X}\vector{W}``. Our next goal is to approximately solve problems [#WNNaltL] and [#WNNaltR] in a distributed manner, to be implemented in a parallel computing architecture. To this end, we apply our approximate commutative property, i.e., ``\mathcal{A}(\vector{\widehat{Q}}\vector{Y}\vector{\widehat{W}}) \approx \mathcal{A}(\vector{\widehat{Q}}\vector{Y})\vector{\widehat{W}}`` and ``\mathcal{A}(\vector{\widehat{Q}}\vector{Y}\vector{\widehat{W}}) \approx \vector{\widehat{Q}}\mathcal{A}(\vector{Y}\vector{\widehat{W}})`` for large values of ``w_1`` and ``w_2``. Using these approximations, we obtain
```math #WNN31
\begin{split}
{\vector{\bar L}}^k \approx \argmin_{\vector{\bar L_\#}} \quad \|\vector{\bar L_\#}\|_F^2 \quad \text{subject to} \quad \|\vector{\widehat{Q}}\mathcal{A}({\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{\widehat{Q}}\vector{\widehat{Q}}^{-1}\vector{B}\|_F \leq w_1w_2\epsilon.
\end{split}
\tag{A$6$}
```
Define ``\vector{\widehat B}_L = \vector{\widehat{Q}}^{-1}\vector{B}``. Applying the inequality ``\|\vector{A}\vector{B}\|_F \leq \|\vector{A}\|\,\|\vector{B}\|_F``, where ``\|\circ\|`` denotes the spectral norm, to the constraint, we see that
```math
\begin{split}
\|\vector{\widehat{Q}}\left(\mathcal{A}({\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{\widehat B}_L\right)\|_F \leq \|\vector{\widehat{Q}}\|\|\mathcal{A}({\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{\widehat B}_L\|_F \\
= \|\mathcal{A}({\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{\widehat B}_L\|_F.
\end{split}
```
The last equality holds since ``\|\vector{\widehat{Q}}\| = \max\{1,w_1\} = 1`` for weights ``0 < w_1 \leq 1``. Therefore, if we instead solve
```math #WNN3L
\begin{split}
{\vector{\tilde L}}^k := \argmin_{\vector{\bar L_\#}} \quad \|\vector{\bar L_\#}\|_F^2 \quad \text{subject to} \quad \|\mathcal{A}({\vector{\bar L_\#}}(\vector{\bar R}^k)^H\vector{\widehat{W}}) - w_1w_2\vector{\widehat B}_L\|_F \leq w_1w_2\epsilon,
\end{split}
\tag{A$7$}
```
we expect ``{\vector{\tilde L}}^k \approx {\vector{\bar L}}^k`` due to approximate commutativity and therefore ``{\vector{\tilde L}}^k`` is feasible for [#WNN31].
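The quality of this approximation depends on how close the weights are to one. As a quick, self-contained check, the sketch below (our own toy experiment, with the sampling operator modeled as a simple entry mask rather than the paper's acquisition operator) measures the relative commutator error ``\|\mathcal{A}(\vector{Y}\vector{\widehat{W}}) - \mathcal{A}(\vector{Y})\vector{\widehat{W}}\|_F / \|\mathcal{A}(\vector{Y}\vector{\widehat{W}})\|_F`` for several weights; the left-weight version with ``\vector{\widehat{Q}}`` behaves analogously.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 60, 5

# Toy ingredients: a random "frequency slice" Y, a column subspace V, and a
# receiver mask M standing in for the sampling operator A (an assumption).
Y = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = (rng.random((n, n)) < 0.25).astype(float)   # roughly 75% missing entries

def W_hat(w2):
    """Explicit weighting matrix V V^T + w2 * (I - V V^T)."""
    return V @ V.T + w2 * (np.eye(n) - V @ V.T)

for w2 in [0.25, 0.5, 0.75, 0.95]:
    lhs = M * (Y @ W_hat(w2))      # A(Y W_hat)
    rhs = (M * Y) @ W_hat(w2)      # A(Y) W_hat
    rel_err = np.linalg.norm(lhs - rhs) / np.linalg.norm(lhs)
    print(f"w2 = {w2:4.2f}: relative commutator error = {rel_err:.3f}")
```

The error shrinks as ``w_2`` approaches one, which is why the decoupled subproblems are reliable only for the larger weight values discussed in the text.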
A similar argument can be established for the right factor, where we solve ```math #WNN3R \begin{split} {\vector{\tilde R}}^k := \argmin_{\vector{\bar R_\#}} \quad \|\vector{\bar R_\#}\|_F^2 \quad \text{subject to} \quad \|\mathcal{A}(\vector{\widehat{Q}}{\vector{\tilde L}^{k-1}}\vector{\bar R_\#}^H) - w_1w_2\vector{\widehat B}_R\|_F \leq w_1w_2\epsilon, \end{split} \tag{A$8$} ``` with ``\vector{\widehat B}_R = \vector{B}\vector{\widehat W}^{-1}``. The main advantage in computing iterates ``{\vector{\tilde R}}^k,{\vector{\tilde L}}^k``, rather than ``{\vector{\bar R}}^k,{\vector{\bar L}}^k``, is that these programs allow for a distributed implementation. The data matrices ``\mathbf{\widehat{B}}_R`` and ``\mathbf{\widehat{B}}_L`` in equations [#WNN3R] and [#WNN3L] are dense (have all non-zero entries) making computation expensive. However, when the weights ``w_{1,2}`` are relatively large we observe that both dense matrices ``\mathbf{\widehat{B}}_R``, ``\mathbf{\widehat{B}}_L`` can be well approximated by the sparse observed data matrix ``\vector{B}``. This leads to subproblems [#eqwtRparv2] and [#eqwtLparv2] and concludes our derivation. ```math_def \def\argmin{\mathop{\rm arg\,min}} \def\vec{\mbox{$\mathrm{vec}$}} \def\ivec{\mbox{$\mathrm{vec}^{-1}$}} %\newcommand{\rr}{{{r}}} \newcommand{\m}{{\mathsf{m}}} \newcommand{\PsDO}{\mbox{PsDO\,}} \newcommand{\Id}{\mbox{$\tensor{I}\,$}} \newcommand{\R}{\mbox{$\mathbb{R}$}} \renewcommand{\C}{\mbox{$\mathbb{C}$}} \newcommand{\E}{\mbox{$\mathbb{E}$}} \newcommand{\Z}{\mbox{$\mathbb{Z}$}} \newcommand{\DE}{:=} \newcommand{\Order}{\mbox{${\cal O}$}} \def\bindex#1{{\mathcal{#1}}} %\newcommand{\Id}{\mbox{$\tensor{I}\vector{d}\,$}} \def\pector#1{\mathrm{\mathbf{#1}}} \def\cector#1{#1} \def\censor#1{#1} \def\vector#1{\mathbf{#1}} \def\fvector#1{{\widehat{\vector{#1}}}} \def\evector#1{{\widetilde{\vector{#1}}}} \def\pvector#1{{\breve{\vector{#1}}}} \def\pector#1{\mathrm{#1}} \def\ctensor#1{\bm{\mathcal{#1}}} %\def\tensorm#1{{#1}} \def\tensorm#1{\bm{#1}} %\def\tensor#1{\bm{#1}}7 \def\tensor#1{\vector{#1}} %\def\cector#1{{\vector{\underline{#1}}}} \def\hensor#1{\tensor{#1}} \def\uensor#1{\underline{\bm{#1}}} %\def\hensor#1{{\boldsymbol{\mathsf{#1}}}} \def\hector#1{\vector{#1}} %\def\hector#1{{\boldsymbol{\mathsf{#1}}}} % \def\hfector#1{\hat{\boldsymbol{\mathsf{#1}}}} % \def\ctensor#1{{\tensor{\underline{\underline{\mathsf{#1}}}}}} \DeclareMathOperator\minim{\hbox{minimize}} \def\ftensor#1{{\widehat{\tensor{#1}}}} \def\calsor#1{{\boldsymbol{\mathcal{#1}}}} \def\optensor#1{{\boldsymbol{\mathcal{#1}}}} \def\hvector#1{\hat{\boldsymbol{\mathbf{#1}}}} \def\uvector#1{{\underline{\vector{#1}}}} \def\utensor#1{{\underline{\tensor{#1}}}} %\ \newcommand{\dd}{\ensuremath{\mathrm{d}}} \newcommand{\blockstack}{\operatorname{blockstack}} \newcommand{\diag}{\operatorname{diag}} \newcommand{\rs}[1]{\mathstrut\mbox{\scriptsize\rm #1}} \newcommand{\rr}[1]{\mbox{\rm #1}} \usepackage{algorithm2e} \def\minim{\mathop{\hbox{minimize}}} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \usepackage{mathtools} ```