Source separation via SVD-free rank minimization in the hierarchical semi-separable representation

Haneet Wason^*, Rajiv Kumar^*, Aleksandr Y. Aravkin^†, and Felix J. Herrmann^*
^*Seismic Laboratory for Imaging and Modeling (SLIM), University of British Columbia, ^†IBM T.J. Watson Research Center, New York, USA

Abstract

Recent developments in matrix rank optimization have allowed for new computational approaches in the field of source separation or deblending. In this paper, we propose a source separation algorithm for blended marine acquisition, where two sources are deployed at different depths (over/under acquisition). The separation method incorporates the Hierarchical Semi-Separable structure (HSS) inside rank-regularized least-squares formulations. The proposed approach is suitable for large scale problems, since it avoids SVD computations and uses a low-rank factorized formulation instead. We illustrate the performance of the new HSS-based deblending approach by simulating an over/under blended acquisition, wherein uniformly random time delays (of $<$ 1 second) are applied to the one of the sources.

Introduction

The benefits of acquiring and processing over/under data are clear; the recorded bandwidth is extended at both low and high ends of the spectrum. The over/under acquisition method allows separation of the up- and downgoing wavefields at the source (or receiver) using a vertical pair of sources (or receivers) to determine wave direction (Moldoveanu et al., 2007). Acquiring data with over/under sources is an instance of simultaneous acquisition, which generates blended datasets that need to be separated into the individual, unblended datasets for further processing.

The challenge of source separation or deblending has been addressed by many researchers (Akerberg, Hampson, Rickett, Martin, & Cole, 2008; Huo, Luo, & Kelamis, 2009; Moore et al., 2008; Stefani, Hampson, & Herkenhoff, 2007), wherein the key observation has been that as long as the sources are fired at suitably randomly dithered times, the resulting interferences (or source crosstalk) will appear noise-like in specific gather domains such as common-offset and common-receiver, turning the separation problem into a (random) noise removal procedure. Inversion-type algorithms (Abma, Manning, Tanis, Yu, & Foster, 2010; Baardman & Borselen, 2013; Doulgeris, Bube, Hampson, & Blacquiere, 2012; Mahdad, Doulgeris, & Blacquiere, 2011; Moore, 2010) take advantage of sparse representations of coherent seismic signals. (Wason & Herrmann, 2013a, 2013b) proposed an alternate sampling strategy for simultaneous acquisition (time-jittered marine) that leverages ideas from compressed sensing (CS), addressing the deblending problem through a combination of tailored (blended) acquisition design and sparsity-promoting recovery via convex optimization using $\ell_1$ constraints.

More recently, rank-minimization based techniques have been used for source separation by Maraschini, Dyer, Stevens, & Bird (2012) and Cheng & Sacchi (2013). The general idea is to exploit the low-rank structure of seismic data when it is organized in a matrix. Low-rank structure refers to the small number of nonzero singular values, or quickly decaying singular values. Maraschini et al. (2012) followed the rank-minimization approach proposed by Oropeza & Sacchi (2011), who identified that seismic temporal frequency slices organized into a block Hankel matrix, in ideal conditions, is a matrix of rank $k$ , where $k$ is the number of different plane waves in the window of analysis. These authors showed that additive random noise increase the rank of the block Hankel matrix, and the authors presented an iterative algorithm that resembles seismic data reconstruction with the method of projection onto convex sets, where they use a low-rank approximation of the Hankel matrix via the randomized singular value decomposition (Halko, Martinsson, & Tropp, 2011; Liberty, Woolfe, Martinsson, Rokhlin, & Tygert, 2007) to interpolate seismic temporal frequency slices. While this technique may be effective interpolating data with a limited number of distinct dips, the approach requires embedding the data into an even larger space where each dimension of size $n$ is mapped to a matrix of size $n \times n$ and that forces them to work on small windows and now choosing the size of these windows becomes a factor.

In this paper, we avoid the direct approach to rank-minimization that involves computing singular value decompositions (SVD) of the matrices and follow instead the approach proposed by Aravkin, Kumar, Mansour, Recht, & Herrmann (2013); Kumar, Mansour, Aravkin, & Herrmann (2013) and the references therein. The key idea here is that the monochromatic frequency slices of the fully sampled data matrix have low-rank structure in the midpoint-offset (m-h) domain, whereas, data with random noise increases the rank of the resulting frequency slice in the m-h domain. Seismic frequency slices exhibit low-rank structure in the m-h domain at low-frequencies, but not at high-frequencies. This behaviour is due to the increase in oscillations as we move from low to high-frequency slices in the m-h domain, even though the energy remains focused around the diagonal (Kumar et al., 2013).

Rank-minimization in the high-frequency range requires extended formulations that incorporate low-rank structure. While Engquist & Ying (2010) propose to overcome this difficulty by proposing directional low-rank approximations for Green’s functions of the acoustic wave equation in 3D, we rely instead on the Hierarchically Semi-Separable matrix representation (HSS) method proposed by Chandrasekaran, Dewilde, Gu, Lyons, & Pals (2006) to represent frequency slices. The key idea in the HSS representation is that certain full-rank matrices, e.g., matrices that are diagonally dominant with energy decaying along the off-diagonals, can be represented by a collection of low-rank sub-matrices.

Following the same ideas, in an over/under simultaneous acquisition, we observe that monochromatic frequency slices of the fully sampled blended data matrix, with periodic firing times, have low-rank structure in the m-h domain, whereas, uniformly random time delays applied to one or both sources, increase the rank of the resulting frequency slice in the m-h domain. Hence, the blended data in over/under randomized acquisition is separated into its constituent source components using a fast optimization approach that combines the (SVD-free) matrix factorization approach recently developed by Lee, Recht, Salakhutdinov, Srebro, & Tropp (2010) with the Pareto curve approach proposed by Berg & Friedlander (2008).

The paper is organized as follows. We start by explaining the methodology to perform source separation by combining the HSS structure with factorization-based rank-regularized optimization formulations (Lee et al., 2010). Next, we demonstrate the successful implementation of the proposed source separation strategy on a simulated dataset. In addition, we also make comparisons with the sparsity-promoting deblending techniques.

Theory

The success of rank-minimization hinges on the fact that regularly sampled target dataset should exhibit a low-rank structure in rank-revealing “transform domain”. Kumar et al. (2013) showed that the frequency slices of unblended seismic data do not exhibit a low-rank structure in the source-receiver (s-r) domain since strong wavefronts extend diagonally across the s-r plane. However, transforming the data into the m-h domain results in a vertical alignment of the wavefronts, thereby reducing the rank of the frequency slice matrix. In the low frequency slices, this vertical alignment can be accurately approximated by a low-rank representation. On the other hand, high frequency slices include a variety of wave oscillations that increase the rank. However, Kumar et al. (2013) mentioned that it is possible to find accurate low-rank approximations of sub-matrices of the high-frequency slices by partitioning the data into a Hierarchical Semi-Separable (HSS) structure. The HSS structure first partitions a matrix into diagonal and off-diagonal submatrices. The same partitioning structure is then applied recursively to the diagonal sub-matrices only, since these sub-matrices still possess high rank structure.

In this paper, we consider an over/under blended acquisition scenario with two sources placed at different depths with an interval of 5 m. We observe that the monochromatic frequency slices of the blended data, with periodic firing times of sources, have low-rank structure in the m-h domain, while the frequency slices of the blended data with uniformly random firing time delays applied to the second source, do not. Note, that we have fully sampled data in both cases. Figures 1(a,b) show the monochromatic frequency slice at 25 Hz in the m-h domain of the blended data with periodic firing times of both sources, and with uniformly random firing time delays of the second source, respectively.

Randomization of the time delays increases the rank or makes the singular values decay less quickly in the m-h domain, an essential feature for rank-minimization techniques to be effective. To illustrate this behaviour, we plot the decay of the singular values of blended data in Figure 2. Note that uniformly random firing time delays does not noticeably change the decay of the singular values in the s-r domain, as expected, but significantly slow down the decay rate in the m-h domain as shown in Figures 2(a,b). Therefore, low-frequency slices of unblended datasets can be reconstructed by solving the rank-minimization problem in the m-h domain, while high-frequency slices can be reconstructed by first performing an HSS partitioning and then solving the rank-minimization problem on each partition in its respective m-h domain (Kumar et al., 2013).

Source separation

Let $X_0$ be a matrix in $\mathbb{C}^{n\times m}$ and let $\mathcal{A}$ be a linear measurement operator that maps from $\mathbb{C}^{n\times m} \rightarrow \mathbb{C}^p$ with $p \ll {n\times m}$ . Recht, Fazel, & Parrilo (2010) showed that under certain general conditions on the operator $\mathcal{A}$ , the solution to the rank-minimization problem can be found by solving the following nuclear-norm minimization problem: $min X | | X | | * s.t. ‖ A (X) - b ‖ 2 \leq ϵ,$

$\begin{equation} \label{BPDN} \tag{BPDN$_\epsilon$} \min_{X} ||X||_* \quad \text{s.t.} \quad \|\mathcal{A} (X) - b\|_2 \leq \epsilon, \end{equation}$ where

b

is a set of measurements,

\left\| X\right\|_* = \|\sigma\|_1

, and

\sigma

is the vector of singular values. In the case of over/under simultaneous acquisition, given two unblended data matrices

{X_1}

and

{X_2}

and blended data measurements

b

, we can redefine our systems of equations as

\begin{equation*} \begin{aligned} \overbrace{ \begin{bmatrix} {{M}}{\bf{\mathcal{S}}}^H & {{MT}}{\bf{\mathcal{S}}}^H \end{bmatrix} }^{\mathcal{A}} \overbrace{ \begin{bmatrix} {X}_1 \\ {X}_2 \end{bmatrix} }^{X} = {b} , \end{aligned} \end{equation*}

where

{M}

is the sampling operator,

\mathcal{S}

is the transformation operator from the s-r domain to the m-h domain,

^H

denotes the Hermitian transpose, and

T

is defined as a time delay operator which applies the uniformly random time delays to the second source.

In order to efficiently solve $\ref{BPDN}$ , we use an extension of the SPG $\ell_1$ solver (Berg & Friedlander, 2008) developed for the $\ref{BPDN}$ problem in Aravkin, Burke, & Friedlander (2012). The SPG $\ell_1$ algorithm finds the solution to the $\ref{BPDN}$ by solving a sequence of LASSO subproblems $min X ‖ A (X) - b ‖ 2 s.t. | | X | | * \leq τ,$

$\begin{equation} \label{LASSO} \tag{LASSO$_\tau$} \min_{X} \|\mathcal{A} (X) - b\|_2 \quad \text{s.t.} \quad ||X||_* \leq \tau, \end{equation}$ where

\tau

is updated by traversing the Pareto curve. Solving each LASSO subproblem requires a projection onto the nuclear norm ball

\left\| X\right\|_* \leq \tau

in every iteration by performing a singular value decomposition and then thresholding the singular values. In the case of large scale seismic problems, it becomes prohibitively expensive to carry out such a large number of SVDs. Instead, we adopt a recent factorization-based approach to nuclear-norm minimization (Lee et al., 2010; Recht & Ré, 2011; Rennie & Srebro, 2005). The factorization approach parametrizes the matrix (

X_1

X_2

)

\in \mathbb{C}^{n\times m}

as the product of two low rank factors (

L_1

L_2

)

\in \mathbb{C}^{n\times k}

and (

R_1

R_2

)

\in \mathbb{C}^{m\times k}

, such that,

\begin{equation} \label{product} \begin{split} X = \begin{bmatrix} {L_1R_1^H} \\ {L_2R_2^H} \\ \end{bmatrix} \end{split} \end{equation}

The optimization scheme can then be carried out using the factors (

L_1, L_2

) and (

R_1, R_2

) instead of (

X_1, X_2

), thereby significantly reducing the size of the decision variable from

2nm

2k(n+m)

when

k \ll m,n

. Rennie & Srebro (2005) showed that the nuclear norm obeys the relationship

\begin{equation} \label{nucInequality} \begin{split} \|X\|_* \leq \frac{1}{2}\left\|\begin{bmatrix}L_1 \\ R_1 \end{bmatrix}\right\|_F^2 + \frac{1}{2}\left\|\begin{bmatrix}L_2 \\ R_2 \end{bmatrix}\right\|_F^2 =: \Phi(L_1,R_1,L_2,R_2), \end{split} \end{equation}

where

\|\cdot\|_F^2

is the Frobenius norm of the matrix (sum of the squared entires).

Consequently, the LASSO subproblem can be replaced by $min L 1, R 1, L 2, R 2 ‖ A (X) - b ‖ 2 s.t. Φ (L 1, R 1, L 2, R 2) \leq τ,$

$\begin{equation} \label{subproblem} \min_{L_1,R_1,L_2,R_2} \|\mathcal{A}(X) - b\|_2 \quad \text{s.t.} \quad \Phi(L_1,R_1,L_2,R_2) \leq \tau\;, \end{equation}$ where the projection onto

\Phi(L_1,R_1,L_2,R_2) \leq \tau

is easily achieved by multiplying each factor (

L_1, L_2

) and (

R_1, R_2

) by the scalar

\sqrt{2\tau/\Phi(L,R)}

. By equation

\ref{nucInequality}

for each HSS sub-matrix in the m-h domain, we are guaranteed that

\|X\|_* \leq \tau

for any solution of

\ref{subproblem}

. Once the optimization problem is solved, we transform each sub-matrix back from the m-h to the s-r domain, where we concatenate all the sub-matrices to get the deblended monochromatic frequency data matrices. One of the advantages of HSS representation is that it works with the recursive partitioning of a matrix, and sub-matrices can be solved in parallel, speeding up the optimization formulation.

Numerical results

We illustrate the performance of our proposed algorithm on data generated from the BG compass model using the IWAVE software. With a source (and receiver) sampling of 25.0 m, one dataset is generated with a source-depth of 10.0 m (Figure 3(a)), while the other dataset has the source at 15.0 m depth. Each dataset has $N_t = 1024$ time samples, $N_r = 129$ receivers and $N_s = 129$ shots. To simulate the over/under blended acquisition scenario, the two datasets are (simply) summed for the periodic case, while uniformly random time delays between 0-1 second are applied to the dataset of the second source (Figure 3(b)) for the randomized case. Figure 3(c) shows the resulting blended shot gather. Since the time delays are less than a second, we call this scenario the low variability randomized acquisition. Using the proposed rank-minimization technique, we are able to successfully deblend the blended shot gathers into its constituent source components as illustrated in Figures 4(a,b). Figures 4(e,f) show the difference between the unblended shot gathers and the (recovered) deblended shot gathers of source 2 and source 1, respectively.

It is well known that seismic data admit sparse representations by curvelets that capture “wavefront sets” efficiently (E. J. Candès & Demanet, 2005; E. Candès, Demanet, Donoho, & Ying, 2006; Herrmann, Moghaddam, & Stolk, 2008; Smith, 1998). Therefore, we also compare the proposed rank-minimization based deblending algorithm with the sparsity-promoting deblending technique. Here, we use $\ref{BPDN}$ formulation to minimize the $\ell_1$ norm instead of the nuclear norm, and the transformation operator $\mathcal{S}$ is the 2-D curvelet operator. Figures 4(c,b) show the deblended shot gathers using sparsity-promoting based techniques and Figures 4(g,h) show the corresponding difference plots between the unblended shot gathers and the (recovered) deblended shot gathers of source 2 and source 1, respectively. It can be clearly seen that source separation via sparsity-promotion does not help to deblend the blended dataset especially in this low variability scenario. To understand this behaviour, we plot the decay of the curvelet coefficients (Figure 2c) of the monochromatic frequency slice at 25 Hz in the s-r domain, for the periodic and the randomized blended acquisition. For this low variability scenario, the randomization does not destroy the sparse structure of the data, i.e., does not decrease the decay of the curvelet coefficients significantly, which is an essential requirement for sparsity-promoting deblending techniques as shown in Wason & Herrmann (2013a); Wason & Herrmann (2013b), which can be termed as the high variability randomized acquisition scenario.

Conclusions

We have presented a new method for source separation in an over/under acquisition scenario, where we incorporate the HSS structure with factorization-based rank-regularized optimization formulations. We combine the Pareto curve approach for optimizing $\ref{BPDN}$ formulations with the SVD-free matrix factorization methods to solve the nuclear-norm optimization formulation. The proposed rank-minimization source separation technique helps in deblending the blended datasets in the low variability randomized acquisition scenarios as observed here. The experimental results demonstrate the potential benefit of this methodology. Incorporating HSS into this scheme is very promising, since we can attack high-rank structures using the partitioning, and in addition gain a computational advantage by optimizing each sub-block in parallel. In reality, seismic data are typically irregularly sampled along spatial axes, therefore, future work includes working with non-uniform sampling grids for 2-D and 3-D seismic.

Acknowledgments

We would like to thank the authors of IWAVE (a framework for wave simulation). This work was financially supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN 261641-06) and the Collaborative Research and Development Grant DNOISE II (CDRP J 375142-08). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BGP, BP, Chevron, ConocoPhillips, CGG, ION GXT, Petrobras, PGS, Statoil, Total SA, WesternGeco, Woodside.

Figure1Monochromatic frequency slice at 25 Hz in the m-h domain for blended data (a) with periodic firing times for both sources, and (b) with uniformly random firing times of the second source.

Figure2Decay of singular values for a frequency slice (at 25 Hz) of blended shot gather, with (a) periodic, and (b) uniformly random firing time delays of the second source. (c) Decay of the curvelet coefficients in s-r domain for case (a,b).

Figure3Unblended shot gather of source 1, (b) Unblended, uniformly random time-delayed shot gather of source 2, (c) Blended shot gather.

Figure4Deblended shot gathers of source 2 and source 1 using HSS based rank-minimization (a,b) and using curvelet based sparsity-promoting technique (c,d). (e,f,g,h) Difference of deblended shot gathers from unblended shot gathers (Figure 3(a,b)).

References

Abma, R., Manning, T., Tanis, M., Yu, J., & Foster, M. (2010). High quality separation of simultaneous sources by sparse inversion. In 72nd eAGE conference and exhibition.

Akerberg, P., Hampson, G., Rickett, J., Martin, H., & Cole, J. (2008). Simultaneous source separation by sparse radon transform. SEG Technical Program Expanded Abstracts, 27(1), 2801–2805. doi:10.1190/1.3063927

Aravkin, A. Y., Burke, J. V., & Friedlander, M. P. (2012). Variational properties of value functions. submitted to SIAM J. Opt., ArXiv:1211.3724.

Aravkin, A. Y., Kumar, R., Mansour, H., Recht, B., & Herrmann, F. J. (2013, May). Fast methods for denoising matrix completion formulations, with application to robust seismic data interpolation. Retrieved from http://arxiv.org/pdf/1302.4886v3.pdf

Baardman, R. H., & Borselen, R. G. van. (2013). Method and system for separating seismic sources in marine simultaneous shooting acquisition. Patent Application, EP 2592439 A2.

Berg, E. van den, & Friedlander, M. P. (2008). Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2), 890–912.

Candès, E. J., & Demanet, L. (2005). The curvelet representation of wave propagators is optimally sparse. Communications on Pure and Applied Mathematics, 58(11), 1472–1528. doi:10.1002/cpa.20078

Candès, E., Demanet, L., Donoho, D., & Ying, L. (2006). Fast discrete curvelet transforms. Multiscale Modeling & Simulation, 5(3), 861–899. doi:DOI:10.1137/05064182X

Chandrasekaran, S., Dewilde, P., Gu, M., Lyons, W., & Pals, T. (2006). A fast solver for HSS representations via sparse matrices. SIAM J. Matrix Analysis Applications, 29(1), 67–81. doi:http://dx.doi.org/10.1137/050639028

Cheng, J., & Sacchi, M. D. (2013). Separation of simultaneous source data via iterative rank reduction. In SEG technical program expanded abstracts (pp. 88–93). Retrieved from http://dx.doi.org/10.1190/segam2013-1313.1

Doulgeris, P., Bube, K., Hampson, G., & Blacquiere, G. (2012). Convergence analysis of a coherency-constrained inversion for the separation of blended data. Geophysical Prospecting, 60(4), 769–781. doi:10.1111/j.1365-2478.2012.01088.x

Engquist, B., & Ying, L. (2010). Fast directional algorithms for the Helmholtz kernel. J. Comput. Appl. Math., 234(6), 1851–1859. doi:10.1016/j.cam.2009.08.036

Halko, N., Martinsson, P.-G., & Tropp, J. A. (2011). Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2), 217–288.

Herrmann, F. J., Moghaddam, P. P., & Stolk, C. C. (2008). Sparsity- and continuity-promoting seismic imaging with curvelet frames. Journal of Applied and Computational Harmonic Analysis, 24(2), 150–173.

Huo, S., Luo, Y., & Kelamis, P. (2009). Simultaneous sources separation via multi-directional vector-median filter. SEG Technical Program Expanded Abstracts, 28(1), 31–35. doi:10.1190/1.3255522

Kumar, R., Mansour, H., Aravkin, A. Y., & Herrmann, F. J. (2013, September). Reconstruction of seismic wavefields via low-rank matrix factorization in the hierarchical-separable matrix representation. SEG. Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/Conferences/SEG/2013/kumar2013SEGHSS/kumar2013SEGHSS.pdf

Lee, J., Recht, B., Salakhutdinov, R., Srebro, N., & Tropp, J. (2010). Practical large-scale optimization for max-norm regularization. In Advances in neural information processing systems, 2010.

Liberty, E., Woolfe, F., Martinsson, P., Rokhlin, V., & Tygert, M. (2007). Randomized algorithms for the low-rank approximation of matrices, 104(51), 20167–20172. doi:10.1073/pnas.0709640104

Mahdad, A., Doulgeris, P., & Blacquiere, G. (2011). Separation of blended data by iterative estimation and subtraction of blending interference noise. Geophysics, 76(3), Q9–Q17. doi:10.1190/1.3556597

Maraschini, M., Dyer, R., Stevens, K., & Bird, D. (2012). Source separation by iterative rank reduction - theory and applications. In 74th eAGE conference and exhibition.

Moldoveanu, N., Combee, L., Egan, M., Hampson, G., Sydora, L., & Abriel, W. (2007). Over/under towed-streamer acquisition: A method to extend seismic bandwidth to both higher and lower frequencies. The Leading Edge, 26, 41–58. Retrieved from http://www.slb.com/~/media/Files/westerngeco/resources/articles/2007/jan07_tle_overunder.pdf

Moore, I. (2010). Simultaneous sources - processing and applications. In 72nd eAGE conference and exhibition.

Moore, I., Dragoset, B., Ommundsen, T., Wilson, D., Ward, C., & Eke, D. (2008). Simultaneous source separation using dithered sources. SEG Technical Program Expanded Abstracts, 27(1), 2806–2810. doi:10.1190/1.3063928

Oropeza, V., & Sacchi, M. (2011). Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis. Geophysics, 76(3), V25–V32.

Recht, B., & Ré, C. (2011). Parallel stochastic gradient algorithms for large-scale matrix completion. In Optimization online.

Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

Rennie, J. D. M., & Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on machine learning (pp. 713–719). New York, NY, USA: ACM.

Smith, H. F. (1998). A Hardy space for Fourier integral operators. Journal of Geometric Analysis, 8(4), 629–653.

Stefani, J., Hampson, G., & Herkenhoff, F. (2007). Acquisition using simultaneous sources. In 69th eAGE conference and exhibition.

Wason, H., & Herrmann, F. J. (2013a, June). Ocean bottom seismic acquisition via jittered sampling. EAGE. Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2013/wason2013EAGEobs/wason2013EAGEobs.pdf

Wason, H., & Herrmann, F. J. (2013b, September). Time-jittered ocean bottom seismic acquisition. SEG. Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/Conferences/SEG/2013/wason2013SEGtjo/wason2013SEGtjo.pdf