
A scaled gradient projection method for total variation regularized full waveform inversion

Ernie Esser*, Tristan van Leeuwen, Aleksandr Y. Aravkin, and Felix J. Herrmann*
*University of British Columbia Dept. of Earth and Ocean Sciences, Centrum Wiskunde & Informatica, IBM T.J. Watson Research Center

Released to public domain under Creative Commons license type BY.
Copyright (c) 2014 SLIM group @ The University of British Columbia.

Abstract

We propose an extended full waveform inversion formulation that includes convex constraints on the model. In particular, we show how to simultaneously constrain the total variation of the slowness squared while enforcing bound constraints to keep it within a physically realistic range. Synthetic experiments show that including total variation regularization can improve the recovery of a high velocity perturbation to a smooth background model.

Introduction

Acoustic full waveform inversion in the frequency domain can be written as the following PDE constrained optimization problem (F. J. Herrmann et al., 2013; Tarantola, 1984; Virieux & Operto, 2009)
$$\min_{m,u} \sum_{sv} \frac{1}{2}\|Pu_{sv} - d_{sv}\|^2 \quad \text{such that} \quad A_v(m)u_{sv} = q_{sv}, \tag{1}$$
where $A_v(m)u_{sv} = q_{sv}$ denotes the discretized Helmholtz equation. Let $s = 1,\dots,N_s$ index the sources and $v = 1,\dots,N_v$ index frequencies. We consider the model, $m$, which corresponds to the reciprocal of the velocity squared, to be a real vector $m \in \mathbb{R}^N$, where $N$ is the number of points in the spatial discretization. For each source and frequency the wavefields, sources and observed data are denoted by $u_{sv} \in \mathbb{C}^N$, $q_{sv} \in \mathbb{C}^N$ and $d_{sv} \in \mathbb{C}^{N_r}$ respectively, where $N_r$ is the number of receivers. $P$ is the operator that projects the wavefields onto the receiver locations. The Helmholtz operator has the form
$$A_v(m) = \omega_v^2 \operatorname{diag}(m) + L, \tag{2}$$
where $\omega_v$ is the angular frequency and $L$ is a discrete Laplacian.
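For concreteness, the discrete operator $A_v(m)$ can be assembled in a few lines of Python/SciPy. This is an illustrative sketch only: it uses a plain 5-point Laplacian and omits the absorbing boundary conditions a practical implementation would need (the paper notes that boundary treatment modifies the formulation); the function name and arguments are our own.

```python
import numpy as np
import scipy.sparse as sp

def helmholtz_operator(m, omega, h, shape):
    """Assemble A(m) = omega^2 * diag(m) + L on an n1-by-n2 grid,
    where L is a 5-point finite-difference Laplacian with mesh width h
    and m is the slowness squared (1/velocity^2), flattened over the grid."""
    n1, n2 = shape

    def lap1d(n):
        # 1D second-difference matrix, scaled by 1/h^2
        e = np.ones(n)
        return sp.diags([e[:-1], -2.0 * e, e[:-1]], [-1, 0, 1]) / h**2

    # 2D Laplacian via Kronecker products of 1D operators
    L = sp.kron(sp.eye(n2), lap1d(n1)) + sp.kron(lap1d(n2), sp.eye(n1))
    return omega**2 * sp.diags(m) + L
```

Given a source vector `q`, a wavefield could then be computed with `scipy.sparse.linalg.spsolve(A.tocsc(), q)`.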

The nonconvex constraint and large number of unknowns make (1) a very challenging inverse problem. Since it is not practical to store all the wavefields, and it is also not always desirable to enforce the PDE constraint exactly, it was proposed in (T. van Leeuwen & Herrmann, 2013b, 2013a) to work with a quadratic penalty formulation of (1), formally written as
$$\min_{m,u} \sum_{sv} \frac{1}{2}\|Pu_{sv} - d_{sv}\|^2 + \frac{\lambda^2}{2}\|A_v(m)u_{sv} - q_{sv}\|^2. \tag{3}$$
This is formal in the sense that a slight modification is needed to properly incorporate boundary conditions, and this modification also depends on the particular discretization used for $L$. As discussed in (T. van Leeuwen & Herrmann, 2013b), methods for solving the penalty formulation seem less prone to getting stuck in local minima than methods that require the PDE constraint to be satisfied exactly. The unconstrained problem is easier to solve numerically, with alternating minimization as well as Newton-like strategies being directly applicable. Moreover, since the wavefields are decoupled, it is not necessary to store them all simultaneously when using alternating minimization approaches.

The most natural alternating minimization strategy is to iteratively solve the data augmented wave equation
$$\bar{u}_{sv}(m^n) = \arg\min_{u_{sv}} \frac{1}{2}\|Pu_{sv} - d_{sv}\|^2 + \frac{\lambda^2}{2}\|A_v(m^n)u_{sv} - q_{sv}\|^2 \tag{4}$$
and then compute $m^{n+1}$ according to
$$m^{n+1} = \arg\min_m \sum_{sv} \frac{\lambda^2}{2}\left\|L\bar{u}_{sv}(m^n) + \omega_v^2\operatorname{diag}(\bar{u}_{sv}(m^n))\,m - q_{sv}\right\|^2. \tag{5}$$
This can be interpreted as a Gauss Newton method for minimizing $G(m) = \sum_{sv} G_{sv}(m)$, where
$$G_{sv}(m) = \frac{1}{2}\|P\bar{u}_{sv}(m) - d_{sv}\|^2 + \frac{\lambda^2}{2}\|A_v(m)\bar{u}_{sv}(m) - q_{sv}\|^2. \tag{6}$$
Using a variable projection argument (Aravkin & Leeuwen, 2012), the gradient of $G$ at $m^n$ can be computed by
$$\nabla G(m^n) = \sum_{sv} \operatorname{Re}\!\left(\lambda^2\omega_v^2 \operatorname{diag}(\bar{u}_{sv}(m^n))^*\left(\omega_v^2\operatorname{diag}(\bar{u}_{sv}(m^n))\,m^n + L\bar{u}_{sv}(m^n) - q_{sv}\right)\right). \tag{7}$$
A scaled gradient descent approach (Bertsekas, 1999) for minimizing $G$ can be written as
$$\begin{aligned} \Delta m &= \arg\min_{\Delta m \in \mathbb{R}^N} \sum_{sv}\left(\Delta m^T \nabla G_{sv}(m^n) + \frac{1}{2}\Delta m^T H^n_{sv} \Delta m\right) + c_n \Delta m^T \Delta m \\ m^{n+1} &= m^n + \Delta m, \end{aligned} \tag{8}$$
where $H^n_{sv}$ should be an approximation to the Hessian of $G_{sv}$ and $c_n \geq 0$. Note that this general form includes gradient descent in the case when $H = 0$ and Newton's method when $H$ is the true Hessian and $c = 0$. In (T. van Leeuwen & Herrmann, 2013b), a Gauss Newton approximation is used, where $c_n = 0$ and
$$H^n_{sv} = \operatorname{Re}\!\left(\lambda^2\omega_v^4\operatorname{diag}(\bar{u}_{sv}(m^n))^*\operatorname{diag}(\bar{u}_{sv}(m^n))\right). \tag{9}$$
Since the Gauss Newton Hessian approximation is diagonal, it can be incorporated into (8) with essentially no additional computational expense. This corresponds to the alternating procedure of iterating (4) and (5), at least for the formal objective that is linear in $m$, which it may not be in practice depending on how the boundary conditions are implemented.
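A minimal sketch of one such unconstrained step, assuming the wavefields $\bar{u}_{sv}(m^n)$ have already been computed by solving (4); the function and its arguments are our own names, and the small constant `c` regularizes the diagonal inverse.

```python
import numpy as np

def gn_scaled_gradient_step(m, ubar, q, L, omegas, lam=1.0, c=1e-8):
    """One scaled gradient step with the diagonal Gauss Newton Hessian.
    ubar, q: dicts keyed by (source, frequency index) holding complex
    wavefields and sources; L: Laplacian (dense or sparse); omegas:
    angular frequencies indexed by the frequency index."""
    g = np.zeros_like(m)
    H = np.zeros_like(m)
    for (s, v), u in ubar.items():
        w2 = omegas[v] ** 2
        # PDE residual of the penalty term, linear in m
        r = w2 * u * m + L @ u - q[(s, v)]
        g += np.real(lam**2 * w2 * np.conj(u) * r)      # gradient term
        H += np.real(lam**2 * w2**2 * np.conj(u) * u)   # diagonal GN Hessian
    dm = -g / (H + c)   # minimizer of the diagonal quadratic model
    return m + dm, g, H
```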

Including Convex Constraints

To make the inverse problem more well posed we can add the constraint $m \in C$, where $C$ is a convex set. For example, a box constraint on the elements of $m$ could be imposed by setting $C = \{m : m \in [B_1, B_2]\}$. The only modification of (8) is to replace the $\Delta m$ update by
$$\Delta m = \arg\min_{\Delta m \in \mathbb{R}^N} \sum_{sv}\left(\Delta m^T \nabla G_{sv}(m^n) + \frac{1}{2}\Delta m^T H^n_{sv} \Delta m\right) + c_n \Delta m^T \Delta m \quad \text{such that} \quad m^n + \Delta m \in C. \tag{10}$$
This ensures $m^{n+1} \in C$ but makes $\Delta m$ more difficult to compute. The problem is still tractable if $C$ is easy to project onto or can be written as an intersection of convex constraints that are each easy to project onto. This convex subproblem is unlikely to be a computational bottleneck relative to the expense of solving (4) for $\bar{u}(m^n)$, and in fact it could even speed up the overall method if it leads to fewer required iterations.

If the eigenvalues of the Hessian of $G(m)$ are bounded for $m \in C$, then $c_n$ in (10) can be adaptively updated so that it is as small as possible while still guaranteeing that $G(m^n)$ decreases monotonically and that limit points of $\{m^n\}$ are stationary points (Esser, Lou, & Xin, 2013). This approach works for fairly general choices of $H^n$. It also avoids doing an additional line search at each iteration. A line search can potentially be expensive because evaluating $G(m)$ requires first computing $\bar{u}(m)$. For the Gauss Newton approximation in (9), where $H^n = \sum_{sv} H^n_{sv}$, adaptivity in the choice of $c_n$ is not needed, and we can take $c_n$ to be a fixed small value. This is the version we consider in the remainder of the paper.

Since $H^n$ is diagonal and positive, it is straightforward to add the constraint $m \in [B_1, B_2]$. In fact,
$$\Delta m = \arg\min_{\Delta m} \Delta m^T \nabla G(m^n) + \frac{1}{2}\Delta m^T H^n \Delta m + c_n \Delta m^T \Delta m \quad \text{such that} \quad m^n + \Delta m \in [B_1, B_2] \tag{11}$$
has a closed form solution in this case. Even with these bound constraints, the model recovered using the penalty formulation can still contain artifacts and spurious oscillations, as shown in Figure 2. A simple and effective way to reduce oscillations in $m$ via a convex constraint is to constrain its total variation (TV) to be less than some positive parameter $\tau$. TV penalties are widely used in image processing to remove noise while preserving discontinuities (Rudin, Osher, & Fatemi, 1992). TV is also a useful regularizer in a wide variety of other inverse problems, especially when solving for piecewise constant or piecewise smooth unknowns. For example, TV regularization has been successfully used for electrical impedance tomography (Chung, Chan, & Tai, 2005) and inverse wave propagation (Akcelik, Biros, & Ghattas, 2002). Although the problem is similar, the formulation in (Akcelik et al., 2002) is different from what we are considering here.
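Because $H^n$ is diagonal, the quadratic in (11) separates componentwise, so its closed form solution is simply the unconstrained minimizer clipped to the shifted box. A sketch (our own names, NumPy):

```python
import numpy as np

def box_constrained_step(m, grad, Hdiag, c, B1, B2):
    """Closed-form solution of
        min_dm  dm^T grad + 0.5 dm^T diag(Hdiag) dm + c dm^T dm
        s.t.    m + dm in [B1, B2]
    when the Hessian approximation is diagonal: the problem separates
    per component, so clip -grad/(H + 2c) to [B1 - m, B2 - m]."""
    dm = -grad / (Hdiag + 2.0 * c)
    return np.clip(dm, B1 - m, B2 - m)
```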

Total Variation Regularization

If we represent $m$ as an $N_1$ by $N_2$ image, we can define
$$\|m\|_{TV} = \frac{1}{h}\sum_{ij}\sqrt{(m_{i+1,j} - m_{i,j})^2 + (m_{i,j+1} - m_{i,j})^2} = \sum_{ij}\frac{1}{h}\left\|\begin{bmatrix} m_{i,j+1} - m_{i,j} \\ m_{i+1,j} - m_{i,j} \end{bmatrix}\right\|, \tag{12}$$
which is a sum of the $l_2$ norms of the discrete gradient at each point in the discretized model. We assume Neumann boundary conditions, so that these differences are zero at the boundary. We can represent $\|m\|_{TV}$ more compactly by defining a difference operator $D$ such that $Dm$ is a concatenation of the discrete gradients and $(Dm)_n$ denotes the vector corresponding to the discrete gradient at the location indexed by $n$, $n = 1,\dots,N_1N_2$. Then we can define
$$\|m\|_{TV} = \|Dm\|_{1,2} := \sum_{n=1}^{N_1N_2}\|(Dm)_n\|. \tag{13}$$
Returning to (8), if we add the constraints $m \in [B_1, B_2]$ and $\|m\|_{TV} \leq \tau$, then the overall iterations for solving
$$\min_m G(m) \quad \text{such that} \quad m \in [B_1, B_2] \text{ and } \|m\|_{TV} \leq \tau \tag{14}$$
have the form
$$\begin{aligned} \Delta m &= \arg\min_{\Delta m} \Delta m^T \nabla G(m^n) + \frac{1}{2}\Delta m^T H^n \Delta m + c_n \Delta m^T \Delta m \\ &\qquad \text{such that} \quad m^n + \Delta m \in [B_1, B_2] \text{ and } \|m^n + \Delta m\|_{TV} \leq \tau \\ m^{n+1} &= m^n + \Delta m. \end{aligned} \tag{15}$$
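The TV seminorm in (12) can be evaluated directly with forward differences and Neumann boundaries (differences set to zero at the edge); this small sketch, with our own function name, is only meant to make the definition concrete.

```python
import numpy as np

def tv_norm(m, h=1.0):
    """||m||_TV = (1/h) * sum over grid points of the l2 norm of the
    forward-difference gradient, with Neumann boundary conditions."""
    dx = np.zeros_like(m)
    dy = np.zeros_like(m)
    dx[:-1, :] = m[1:, :] - m[:-1, :]   # vertical differences
    dy[:, :-1] = m[:, 1:] - m[:, :-1]   # horizontal differences
    return np.sqrt(dx**2 + dy**2).sum() / h
```

For a piecewise constant image, this returns the jump height times the length of the discontinuity (in grid units), which is why the constraint $\|m\|_{TV} \leq \tau$ penalizes oscillations while tolerating a few sharp edges.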

Solving the Convex Subproblems

An effective approach for solving the convex subproblems in (15) for $\Delta m$ is to use a modification of the primal dual hybrid gradient (PDHG) method (Zhu & Chan, 2008) discussed in (Chambolle & Pock, 2011; Esser, Zhang, & Chan, 2010; He & Yuan, 2012; Zhang, Burger, & Osher, 2010) that finds a saddle point of
$$L(\Delta m, p) = \Delta m^T \nabla G(m^n) + \frac{1}{2}\Delta m^T(H^n + 2c_nI)\Delta m + p^TD(m^n + \Delta m) - \tau\|p\|_{\infty,2} \tag{16}$$
for $m^n + \Delta m \in [B_1, B_2]$. Here, $\|\cdot\|_{\infty,2}$ denotes the dual norm of $\|\cdot\|_{1,2}$ and takes the max instead of the sum of the $l_2$ norms. The modified PDHG method requires iterating
$$\begin{aligned} p^{k+1} &= \arg\min_p \tau\|p\|_{\infty,2} - p^TD(m^n + \Delta m^k) + \frac{1}{2\delta}\|p - p^k\|^2 \\ \Delta m^{k+1} &= \arg\min_{\Delta m} \Delta m^T\nabla G(m^n) + \frac{1}{2}\Delta m^T(H^n + 2c_nI)\Delta m + \Delta m^TD^T(2p^{k+1} - p^k) + \frac{1}{2\alpha}\|\Delta m - \Delta m^k\|^2 \\ &\qquad \text{such that} \quad m^n + \Delta m \in [B_1, B_2]. \end{aligned} \tag{17}$$
These iterations can be written more explicitly as
$$\begin{aligned} p^{k+1} &= p^k + \delta D(m^n + \Delta m^k) - \Pi^{1,2}_{\tau\delta}\!\left(p^k + \delta D(m^n + \Delta m^k)\right) \\ \Delta m^{k+1} &= (H^n + \xi_nI)^{-1}\max\!\left((H^n + \xi_nI)(B_1 - m^n),\ \min\!\left((H^n + \xi_nI)(B_2 - m^n),\ -\nabla G(m^n) + \frac{\Delta m^k}{\alpha} - D^T(2p^{k+1} - p^k)\right)\right), \end{aligned} \tag{18}$$
where $\xi_n = 2c_n + \frac{1}{\alpha}$ and $\Pi^{1,2}_{\tau\delta}(z)$ denotes the orthogonal projection of $z$ onto the ball of radius $\tau\delta$ in the $\|\cdot\|_{1,2}$ norm. Computing this projection is as difficult as projecting onto a simplex, which can be done efficiently. The step size restriction required for convergence is $\alpha\delta \leq \frac{1}{\|D^TD\|}$. If $h$ is the mesh width, then it suffices to choose positive $\alpha$ and $\delta$ such that $\alpha\delta \leq \frac{h^2}{8}$.
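The projection $\Pi^{1,2}_{\tau\delta}$ reduces to a simplex-type problem: project the vector of groupwise $l_2$ norms onto an $l_1$ ball, then rescale each group. A sketch under that standard reduction (function names are ours):

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of v onto {x : ||x||_1 <= tau} by
    sorting and soft-thresholding (a simplex-type projection)."""
    if np.abs(v).sum() <= tau:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    # largest k with u_k - (css_k - tau)/(k+1) > 0
    k = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_l12_ball(p, tau):
    """Projection onto {z : ||z||_{1,2} <= tau} for p of shape (N, 2):
    project the rowwise l2 norms onto the l1 ball, then rescale rows."""
    norms = np.linalg.norm(p, axis=1)
    new_norms = project_l1_ball(norms, tau)
    scale = np.where(norms > 0, new_norms / np.maximum(norms, 1e-30), 0.0)
    return p * scale[:, None]
```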

Numerical Experiments

We consider a 2D synthetic experiment with a model of roughly 200 by 200 grid points and a mesh width $h$ of 10 meters. The synthetic velocity model shown in Figure 1a has a constant high velocity region surrounded by a slower smooth background. We use an estimate of the smooth background as our initial guess $m_0$. Similar to Example 1 in (T. van Leeuwen & Herrmann, 2013b), we place $N_s = 34$ sources on the left and $N_r = 81$ receivers on the right, as shown in Figure 1b. The sources $q_{sv}$ correspond to a Ricker wavelet with a peak frequency of 30 Hz.

(a)
(b)
Figure 1: Synthetic velocity model (a) and source and receiver locations (b).

Data is synthesized at 18 different frequencies ranging from 3 to 20 Hz. We consider both noise free and slightly noisy data. In the noisy case, random Gaussian noise was added to the data $d_v$ independently for each frequency index $v$, with standard deviation $0.05\|d_v\|/\sqrt{N_sN_r}$. This may not be a realistic noise model, but it at least indicates that the method is robust to a small amount of noise in the data.

Three different choices for the regularization parameter $\tau$ are considered: $\tau_{opt}$, chosen to be the total variation of the true slowness squared; $\tau_{large} = 1000\,\tau_{opt}$, which is large enough that the total variation constraint has no effect; and $\tau_{small} = 0.875\,\tau_{opt}$, slightly smaller than what would seem to be the optimal choice. Note that by using the Gauss Newton step from (11) as an initial guess, the convex subproblem in (15) converges immediately in the $\tau_{large}$ case. The parameter $\lambda$ for the PDE penalty is fixed at $1$ for all experiments.

We loop through the frequencies from low to high in overlapping batches of two, starting with the 3 and 4 Hz data, using the computed $m$ as an initial guess for inverting the 4 and 5 Hz data, and so on. For each frequency batch, we compute 50 outer iterations, each time solving the convex subproblem to convergence, stopping when
$$\max\left(\frac{\|p^{k+1} - p^k\|}{\|p^{k+1}\|},\ \frac{\|\Delta m^{k+1} - \Delta m^k\|}{\|\Delta m^{k+1}\|}\right) \leq 10^{-5}.$$
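The frequency continuation and the inner stopping test above can be sketched as two small helpers (our own names; the actual inversion per batch is elided):

```python
import numpy as np

def overlapping_batches(freqs, size=2):
    """Overlapping frequency batches [3,4], [4,5], ..., so each inversion
    warm-starts from the model recovered for the previous batch."""
    return [freqs[i:i + size] for i in range(len(freqs) - size + 1)]

def converged(p_new, p_old, dm_new, dm_old, tol=1e-5):
    """Relative-change stopping test for the inner PDHG iterations."""
    rp = np.linalg.norm(p_new - p_old) / np.linalg.norm(p_new)
    rm = np.linalg.norm(dm_new - dm_old) / np.linalg.norm(dm_new)
    return max(rp, rm) <= tol
```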

Results of the six experiments are shown in Figure 2. In both the noise free and noisy cases, including TV regularization reduced oscillations in the recovered model and led to better estimates of the high velocity region. Counterintuitively, the locations of the discontinuities were better estimated from noisy data than from noise free data. This likely happened because the noise produced larger discontinuities at the receivers, which then strengthened the effect of the TV regularization elsewhere. This is supported by the experiments with $\tau_{small}$, which show that increasing the amount of TV regularization (decreasing $\tau$) leads to better resolution of the large discontinuities, but at the cost of losing contrast and staircasing the smooth background. There is also a risk of completely missing small discontinuities when $\tau$ is chosen too small.

Figure 2: Recovered velocity from noise free data (first row) and noisy data (second row), for $\tau = \tau_{large}$, $\tau = \tau_{opt}$, and $\tau = \tau_{small}$ (left to right).

We also consider a synthetic experiment with simultaneous shots. The true velocity model is a 170 by 676 2D slice from the SEG/EAGE salt model shown in Figure 3a. A total of 116 sources and 676 receivers were placed near the surface. The problem size is reduced by considering $N_{ss} < N_s$ random mixtures of the sources $q_{sv}$, defined by
$$\bar{q}_{jv} = \sum_{s=1}^{N_s} w_{js}q_{sv}, \quad j = 1,\dots,N_{ss}, \tag{19}$$
where the weights $w_{js} \sim N(0,1)$ are drawn from a standard normal distribution. We modify the synthetic data according to $\bar{d}_{jv} = PA_v^{-1}(m)\bar{q}_{jv}$ and use the same strategy to solve the smaller problem
$$\min_{m,u} \sum_{jv} \frac{1}{2}\|Pu_{jv} - \bar{d}_{jv}\|^2 + \frac{\lambda^2}{2}\|A_v(m)u_{jv} - \bar{q}_{jv}\|^2 \quad \text{such that} \quad m \in [B_1, B_2] \text{ and } \|m\|_{TV} \leq \tau. \tag{20}$$
Starting with a good smooth initial model, results using two simultaneous shots with no added noise and two values of $\tau$ are shown in Figure 3. With only two simultaneous shots, the number of PDE solves is reduced by almost a factor of 60. TV regularization helps remove some of the artifacts caused by using so few simultaneous shots and in this case mainly reduces noise under the salt.
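Forming the random source mixtures in (19) amounts to multiplying the source matrix by a Gaussian weight matrix; a sketch with our own names, taking one column of `Q` per original source:

```python
import numpy as np

def mix_sources(Q, n_sim, seed=None):
    """Form n_sim random simultaneous shots q_bar_j = sum_s w_js q_s,
    with weights w_js drawn from N(0,1). Q has shape (N, Ns); the
    result has shape (N, n_sim)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Q.shape[1], n_sim))
    return Q @ W
```

The same weights are applied to the data, so each simultaneous shot requires only one PDE solve per frequency instead of $N_s$.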

(a)
(b)
(c)
(d)
Figure 3: True velocity (a), initial velocity (b), and recovered velocity from noise free data consisting of two simultaneous shots, with $\tau = \tau_{large}$ (c) and $\tau = \tau_{small}$ (d).

Conclusions and Future Work

We presented a computationally feasible scaled gradient projection algorithm for minimizing the quadratic penalty formulation for full waveform inversion proposed in (T. van Leeuwen & Herrmann, 2013b) subject to additional convex constraints. We showed in particular how to solve the convex subproblems that arise when adding total variation and bound constraints on the model. Synthetic experiments suggest that when there is noisy or limited data TV regularization can improve the recovery by eliminating spurious artifacts and by more precisely identifying the locations of discontinuities.

In future work, we still need to find a practical way of selecting the regularization parameter τ. The fact that significant discontinuities can be resolved using small τ suggests that a continuation strategy that gradually increases τ could be effective. It will be interesting to investigate to what extent this strategy can avoid undesirable local minima. A possible numerical framework for this could be along the lines of the SPOR-SPG method in (Berg & Friedlander, 2011). We also intend to study more realistic numerical experiments.

Acknowledgements

Thanks to Bas Peters for insights about the formulation and implementation of the penalty method for full waveform inversion, and thanks also to Polina Zheglova for helpful discussions about the boundary conditions. This work was financially supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN 261641-06) and the Collaborative Research and Development Grant DNOISE II (CDRP J 375142-08). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BGP, BP, Chevron, ConocoPhillips, CGG, ION GXT, Petrobras, PGS, Statoil, Total SA, WesternGeco, Woodside.

References

Akcelik, V., Biros, G., & Ghattas, O. (2002). Parallel multiscale Gauss-Newton-Krylov methods for inverse wave propagation. Supercomputing, ACM/IEEE …, 00(c), 1–15. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1592877

Aravkin, A. Y., & Leeuwen, T. van. (2012). Estimating nuisance parameters in inverse problems. Inverse Problems, 28(11), 115016. Retrieved from http://arxiv.org/abs/1206.6532

Berg, E. van den, & Friedlander, M. P. (2011). Sparse Optimization with Least-Squares Constraints. SIAM Journal on Optimization, 21(4), 1201–1229. doi:10.1137/100785028

Bertsekas, D. P. (1999). Nonlinear Programming (2nd ed.). Athena Scientific.

Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision. Retrieved from http://link.springer.com/article/10.1007/s10851-010-0251-1

Chung, E. T., Chan, T. F., & Tai, X.-C. (2005). Electrical impedance tomography using level set representation and total variational regularization. Journal of Computational Physics, 205(1), 357–372. doi:10.1016/j.jcp.2004.11.022

Esser, E., Lou, Y., & Xin, J. (2013). A Method for Finding Structured Sparse Solutions to Nonnegative Least Squares Problems with Applications. SIAM Journal on Imaging Sciences, 6(4), 2010–2046. doi:10.1137/13090540X

Esser, E., Zhang, X., & Chan, T. F. (2010). A General Framework for a Class of First Order Primal-Dual Algorithms for Convex Optimization in Imaging Science. SIAM Journal on Imaging Sciences, 3(4), 1015–1046. doi:10.1137/09076934X

He, B., & Yuan, X. (2012). Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences, 1–35. Retrieved from http://epubs.siam.org/doi/abs/10.1137/100814494

Herrmann, F. J., Hanlon, I., Kumar, R., Leeuwen, T. van, Li, X., Smithyman, B., … Takougang, E. T. (2013). Frugal full-waveform inversion: From theory to a practical algorithm. The Leading Edge, 32(9), 1082–1092. doi:10.1190/tle32091082.1

Leeuwen, T. van, & Herrmann, F. J. (2013a). A penalty method for PDE-constrained optimization.

Leeuwen, T. van, & Herrmann, F. J. (2013b). Mitigating local minima in full-waveform inversion by expanding the search space. Geophysical Journal International, 195(1), 661–667. Retrieved from http://gji.oxfordjournals.org/cgi/doi/10.1093/gji/ggt258

Rudin, L., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60, 259–268. Retrieved from http://www.sciencedirect.com/science/article/pii/016727899290242F

Tarantola, A. (1984). Inversion of seismic reflection data in the acoustic approximation. Geophysics, 49(8), 1259–1266. Retrieved from http://library.seg.org/doi/abs/10.1190/1.1441754

Virieux, J., & Operto, S. (2009). An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6), WCC1–WCC26. doi:10.1190/1.3238367

Zhang, X., Burger, M., & Osher, S. (2010). A Unified Primal-Dual Algorithm Framework Based on Bregman Iteration. Journal of Scientific Computing, 46(1), 20–46. doi:10.1007/s10915-010-9408-8

Zhu, M., & Chan, T. (2008). An Efficient Primal-Dual Hybrid Gradient Algorithm For Total Variation Image Restoration. UCLA CAM Report [08-34], (1), 1–29.