Multi-Parameter full-waveform inversion is a challenging problem, because the unknown parameters appear in the same wave equation and the magnitude of the parameters can vary many orders of magnitude. This makes accurate estimation of multiple-parameters very difficult. To mitigate the problems, sequential strategies, regularization methods and scalings of gradients and quasi-Newton Hessians have been proposed. All of these require design, fine-tuning and adaptation to different waveform inversion problems. We propose to use a sparse approximation to the Hessian derived from a penalty-formulation of the objective function. Sparseness allows to have the Hessian in memory and compute update directions at very low cost. This results in decent reconstruction of the multiple parameters at very low additional memory and computational expense.
Multi-parameter waveform inversion is not a new topic, (Tarantola, 1986) proposes a strategy of inverting for different parameters sequentially. Other research in multi-parameter inversion relates to multiple parameters in multiple PDE’s, such as joint inversion of electromagnetic and seismic data. In this area of research a regularization approach was used to promote structurally similar models (Gallardo & Meju, 2004). Some more recent examples in multi-parameter full waveform inversion are Lavoué, Brossier, Métivier, Garambois, & Virieux (2014) and Prieux, Brossier, Operto, & Virieux (2013).
Because regularization which promotes structural similarity relies on a priori information on the geology and sequential inversion strategies may have to be redesigned and fine-tuned for different parameters or different datasets. We will not use any of those ideas in this abstract. Although these sequential inversion and regularization concepts can definitely be very successful, we focus our efforts on fully automatic inversion with minimal use of a priori information and a minimum amount of user intervention to asses if it is a viable alternative to the methods mentioned above.
In this abstract we seek improvement on the gradient based methods (including quasi-Newton methods) by using Hessian information. Our formulation is based on the penalty formulation of the waveform inversion problem as introduced for waveform inversion by Leeuwen & Herrmann (2013). We extend their formulation to the multi-parameter case and derive a sparse Gauss-Newton type approximation to the Hessian. This type of approximation will not be possible when using the adjoint-state method to solve the Lagrangian formulation of a PDE-constrained optimization problem.
Using simple numerical experiments we illustrate the effect of our Hessian approximation and apply it to a multi-parameter waveform inversion problem.
We start by defining the equations we wish to use to invert the measured waveforms. The second order partial differential equation (PDE) that describes the acoustic wave motion is
∂rb∂kp+ω2κp=q.
The general waveform inversion problem is following PDE-constrained optimization problem (in discrete form):
minb,κ,u12‖Pu−d‖22s.t.A(b,κ)u=q.
To compute gradients and Hessians we need the partial derivatives of the PDE w.r.t the two parameters which are
Gb=∂A(b,κ)ˉu/∂bGκ=∂A(b,κ)ˉu/∂κ.
The gradients (eq.7) of the reduced objective (eq.5) can be used for gradient descent-type methods. We will not study these, because the linear convergence rate will make this method infeasible for difficult problems with an ill-conditioned Hessian. Quasi-Newton methods can provide a faster superlinear rate of convergence. The Hessian, of which L-BFGS tries to iteratively obtain an approximation with low-rank updates, has a block structure, where the blocks corresponding to different unknowns can have widely varying Frobenius norms. This results in strong ill-conditioning (can be many orders of magnitude above machine precision). It is therefore questionable how much sense the L-BFGS approximation will make. It is possible to initialize the L-BFGS Hessian with a diagonal to correct for scale differences between Hessian blocks corresponding to the different parameters. This initialization remains a far from trivial choice that has to be made and although sensible and physically inspired choices can be made, the information is not provided by the problem at hand directly. Diagonal scaling can improve compared to the case where just the gradient is used however. See Lavoué et al. (2014) for a nice discussion about the difficulties of multi-parameter inversion and the L-BFGS Hessian. These disappointing results are the motivation to use Hessian information. We begin by deriving the Hessian matrix corresponding to the reduced objective (eq.5). Note that the Hessian and Newton system for the full objective (eq.3) are given by
(∇2u,uϕλ∇2u,κϕλ∇2u,bϕλ∇2κ,uϕλ∇2κ,κϕλ∇2κ,bϕλ∇2b,uϕλ∇2b,κϕλ∇2b,bϕλ)(δuδκδb)=(∇uϕλ∇κϕλ∇bϕλ).
The key contribution of this abstract follows here. We will make two approximations to the reduced Hessian. This Hessian originating from the penalty objective formulation can be split in a dense (CA−1B) and a sparse part (D). As the first approximation, we choose to approximate the reduced Hessian by only its sparse part, D. The accuracy of this approximation depends on the choice of the scalar λ, it is more accurate for a small λ. The penalty objective is equivalent to the constrained Lagrangian formulation for λ↑∞. The use of the sparse part enables us to have it in memory and compute exact Newton-type directions at very low cost, as explained next. Thus, after the first approximation we have the following Hessian:
(∇2κ,κϕλ∇2κ,bϕλ∇2b,κϕλ∇2b,bϕλ).
while Not converged do
1. ˉu=argminu‖(λA(b,κ)P)u−(λqd)‖2 // Solve
2. Gκ,Gb,∇bˉϕλ,∇κˉϕλ // Form
3. pgn=˜H−1g // Solve
4. find steplength α // Linesearch
5. m=m+αpgn // update model
end
We can naively use just the gradient in the L-BFGS method. The result is shown in figure 2 where the true models are shown in figure 1. The initial guess was a smoothed version of the true models. The results show that the buoyancy is not updated very much. Nearly all of the updates is in the compressibility model. The velocity derived from the compressibility and buoyancy is actually reasonable.
To analyse the effect of our proposed Hessian approximation, we construct a model with a point scatterer. It consists of both a compressibility and buoyancy perturbation. The background model is homogeneous. We look at the gradient of the objective w.r.t both the buoyancy and compressibility and compare it to the corresponding Gauss-Newton directions to visually illustrate the effect of the Hessian approximation. The sources and receivers are located at the top of the domain. We see that the Gauss-Newton direction improves the focussing of the buoyancy direction, but it does not do much for the compressibility.
In this waveform inversion example we try to invert just pressure wavefield data recorded near the surface of the model. There are 77 sources and receivers and the data contains frequencies from 2 to 17 Hertz. We use a frequency continuation approach to invert the data. We invert one batch of just 2 frequencies at a time and use the result as an initial guess for the next batch. The batches are {2,3},{3,4},...,{16,17} Hertz. Algorithm 1 was used to solve this inverse problem. The results are shown in figure 4. The reconstructed buoyancy and compressibility show the main structures of the true models (figure 1). The derived velocity model is also quite decent. This synthetic example indicates that our approximate reduced Hessian allows simultaneous inversion of multiple-parameters without the introduction of explicit scalings of gradients or quasi-Newton Hessians. In a way, these type of scalings are implicit in the true Hessian which naturally provides this information based on the actual problem.
The adjoint-state method is based on the Lagrangian formulation to solve eq.2. One proceeds by solving for the field variable and using the result to solve for the Lagrangian multiplier (adjoint-wavefield). This can be used to compute the gradient of the resulting reduced Lagrangian objective w.r.t the medium parameters. This effectively eliminated the PDE-constraint and reduced the constrained problem to an unconstrained one. The resulting reduced Hessian (see for example Haber et al., 2000) contains only one sparse part which is not Hermitian positive definite for all parameters. This makes a similar approach to the one proposed in this abstract impossible. To use Hessian information in this case, the dense Hessian needs to be stored or matrix-vector products have to be computed though, accurate and expensive, extra PDE-solves.
We derived a sparse approximation to the reduced Hessian based on a penalty formulation of the waveform inversion problem. This approximate Hessian enables multi-parameter waveform inversion without the need for incorporating a priori information about the coupling between the parameters by regularization, scaling of gradients or quasi-Newton Hessians or inverting for different parameters sequentially. The derived approximation to the reduced Hessian is Hermitian positive definite by construction and easy to invert when computing the update direction. Further research will focus on including more sparse parts into the approximation as well as the analysis of the practically achievable rate of convergence.
This work was financially supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN 261641-06) and the Collaborative Research and Development Grant DNOISE II (CDRP J 375142-08). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BGP, BP, Chevron, ConocoPhillips, CGG, ION GXT, Petrobras, PGS, Statoil, Total SA, WesternGeco, Woodside.
Aravkin, A. Y., & Leeuwen, T. van. (2012). Estimating nuisance parameters in inverse problems. Inverse Problems, 28(11), 115016.
Gallardo, L. A., & Meju, M. A. (2004). Joint two-dimensional dC resistivity and seismic travel time inversion with cross-gradients constraints. Journal of Geophysical Research: Solid Earth, 109(B3), n/a–n/a. doi:10.1029/2003JB002716
Haber, E., Ascher, U. M., & Oldenburg, D. (2000). On optimization techniques for solving nonlinear inverse problems. Inverse Problems, 16(5), 1263–1280. doi:10.1088/0266-5611/16/5/309
Lavoué, F., Brossier, R., Métivier, L., Garambois, S., & Virieux, J. (2014). Two-dimensional permittivity and conductivity imaging by full waveform inversion of multioffset gPR data: a frequency-domain quasi-newton approach. Geophysical Journal International, 197(1), 248–268. doi:10.1093/gji/ggt528
Leeuwen, T. van, & Herrmann, F. J. (2013). Mitigating local minima in full-waveform inversion by expanding the search space. Geophysical Journal International, 195, 661–667. doi:10.1093/gji/ggt258
Métivier, L., Brossier, R., Virieux, J., & Operto, S. (2013). Full waveform inversion and the truncated newton method. SIAM Journal on Scientific Computing, 35(2), B401–B437. doi:10.1137/120877854
Prieux, V., Brossier, R., Operto, S., & Virieux, J. (2013). Multiparameter full waveform inversion of multicomponent ocean-bottom-cable data from the valhall field. part 1: imaging compressional wave speed, density and attenuation. Geophysical Journal International, 194(3), 1640–1664. doi:10.1093/gji/ggt177
Tarantola, A. (1986). A strategy for nonlinear elastic inversion of seismic reflection data. GEOPHYSICS, 51(10), 1893–1903. doi:10.1190/1.1442046