---
title: Uncertainty quantification for inverse problems with a weak wave-equation constraint
author: |
  Zhilong Fang^1^\*, Curt Da Silva^1^, Rachel Kuske^2^, and Felix J. Herrmann^1^\
  ^1^Seismic Laboratory for Imaging and Modeling (SLIM), University of British Columbia\
  ^2^Department of Mathematics, University of British Columbia\
bibliography:
  - zfang.bib
---

## Abstract

In this work, we present a new posterior distribution to quantify uncertainties in solutions of wave-equation based inverse problems. By introducing an auxiliary variable for the wavefields, we weaken the strict wave-equation constraint used by conventional Bayesian approaches. With this weak constraint, the new posterior distribution is bi-Gaussian with respect to both the model parameters and the wavefields, and can be sampled directly with the Gibbs sampling method.

\vspace{-0.1cm}

## Introduction

\vspace{-0.2cm}

In wave-equation based inverse problems, the goal is to infer the unknown model parameters from the observed data using the wave equation as a constraint. Conventionally, the wave equation is treated as a strict constraint in a Bayesian inverse problem. After eliminating this constraint, the problem involves the following forward modeling operator, which maps the model to the predicted data:
```math #forward
f(\mvec) = \Pmat\A(\mvec)^{-1}\q,
```
where the vectors ``\mvec \in \mathbb{R}^{\ngrid}`` and ``\q \in \mathbb{C}^{\ngrid}`` represent the discretized ``\ngrid``-dimensional unknown model parameters and the known source term, respectively. The matrix ``\A \in \mathbb{C}^{\ngrid \times \ngrid}`` represents the discretized wave-equation operator and the operator ``\Pmat \in \mathbb{R}^{\nrcv \times \ngrid}`` projects the solution of the wave equation ``\uvec = \A(\mvec)^{-1}\q`` onto the ``\nrcv`` receivers. In the Bayesian framework, the solution of an inverse problem given observed data ``\dvec`` is a posterior probability density function (PDF) ``\rhopost(\mvec|\dvec)`` expressed as [@Tarantola1982Bayesian]:
```math #PostD
\rhopost(\mvec|\dvec) \propto \rholike(\dvec|\mvec)\rhoprior(\mvec),
```
where the likelihood PDF ``\rholike(\dvec|\mvec)`` describes the probability of observing the data ``\dvec`` given model parameters ``\mvec``, and the prior PDF ``\rhoprior(\mvec)`` describes one's prior knowledge of the model parameters. Under the assumption that the noise in the data is Gaussian with zero mean and covariance matrix ``\SIGMAN``, and that the prior distribution is also Gaussian with mean model ``\mp`` and covariance matrix ``\SIGMAP``, the posterior PDF can be written as:
```math #PostD2
\rhopost(\mvec|\dvec) \propto \exp(-\frac{1}{2}(\|f(\mvec)-\dvec\|^{2}_{\SIGMAN^{-1}} + \|\mvec-\mp\|^{2}_{\SIGMAP^{-1}})).
```
Due to the nonlinear map ``f(\mvec)`` and the high dimensionality of the model parameters (``\ngrid \geq 10^5``), applying Markov chain Monte Carlo (McMC) methods to sample the posterior PDF (#PostD2) faces the challenge of constructing a proposal PDF that approximates the target density well at an acceptable computational cost [@MartinMcMC2012].
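For concreteness, the following is a minimal numerical sketch of the forward map (#forward) and the negative log of the posterior PDF (#PostD2), assuming a 1D Helmholtz discretization ``\A(\mvec) = \omega^{2}\,\mathrm{diag}(\mvec) + \mathbf{L}`` with a second-order finite-difference Laplacian ``\mathbf{L}``; the function names and the discretization are illustrative assumptions, not the implementation used in this work.

```python
# Sketch of f(m) = P A(m)^{-1} q and -log rho(m | d), assuming a 1D
# Helmholtz operator A(m) = omega^2 diag(m) + L (an assumption for
# illustration; names are hypothetical).
import numpy as np

def helmholtz(m, omega, dz):
    """Assemble A(m) = omega^2 diag(m) + Laplacian (1D, Dirichlet ends)."""
    n = m.size
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / dz**2
    return omega**2 * np.diag(m) + L

def forward(m, P, q, omega, dz):
    """f(m) = P A(m)^{-1} q: solve the wave equation, restrict to receivers."""
    u = np.linalg.solve(helmholtz(m, omega, dz), q)
    return P @ u

def neg_log_posterior(m, d, P, q, omega, dz, Cn_inv, Cp_inv, m_prior):
    """-log rho(m | d) up to a constant, for Gaussian noise and prior."""
    r = forward(m, P, q, omega, dz) - d        # data residual
    dm = m - m_prior                           # deviation from the prior mean
    return 0.5 * (r.conj() @ (Cn_inv @ r)).real + 0.5 * dm @ (Cp_inv @ dm)
```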
\vspace{-0.1cm}

## Posterior PDF with weak constraint

\vspace{-0.2cm}

As the exact constraint ``\A(\mvec)\uvec = \q`` makes the corresponding posterior PDF difficult to study, we weaken the constraint and arrive at a more generic posterior PDF with an auxiliary variable -- the wavefields ``\uvec`` -- as follows:
```math #PostDP
\rhopost(\mvec,\uvec|\dvec) &\propto \rho_{1}(\dvec|\uvec)\rho_{2}(\uvec|\mvec)\rhoprior(\mvec), \text{ with } \\
\rho_{1}(\dvec|\uvec) &\propto \exp(-\frac{1}{2}\|\Pmat\uvec-\dvec\|^{2}_{\SIGMAN^{-1}}), \text{ and} \\
\rho_{2}(\uvec|\mvec) &\propto \exp(-\frac{\lambda^{2}}{2}\|\A(\mvec)\uvec-\q\|^2).
```
Here the penalty parameter ``\lambda`` controls the trade-off between the wave-equation and data-fitting terms. As ``\lambda`` grows, the wavefields are more tightly constrained by the wave equation. It is readily observed that the posterior PDF (#PostD) is a special case of the posterior PDF (#PostDP) with ``\rho_{2}(\uvec|\mvec) = 1`` and ``\A(\mvec)\uvec = \q``. The new posterior PDF (#PostDP) has two important properties. First, the conditional PDF ``\rhopost(\uvec|\mvec,\dvec)`` of ``\uvec`` given ``\mvec`` is Gaussian. Second, if the matrix ``\A(\mvec)`` depends linearly on ``\mvec``, the conditional PDF ``\rhopost(\mvec|\uvec,\dvec)`` of ``\mvec`` given ``\uvec`` is also Gaussian. Therefore, the posterior PDF (#PostDP) is bi-Gaussian with respect to ``\uvec`` and ``\mvec``.

\vspace{-0.1cm}

## Gibbs sampling

\vspace{-0.2cm}

The bi-Gaussian property of the posterior PDF (#PostDP) suggests a straightforward application of the Gibbs sampling method: we alternately draw samples ``\uvec`` and ``\mvec`` from the conditional PDFs ``\rhopost(\uvec|\mvec,\dvec)`` and ``\rhopost(\mvec|\uvec,\dvec)``. At the ``k^{th}`` iteration of the Gibbs sampling, starting from the point ``\mvec_{k}``, the conditional PDF is ``\rhopost(\uvec|\mvec_{k},\dvec) = \mathcal{N}(\overline{\uvec},\Hmat_{u}^{-1})`` with the Hessian matrix ``\Hmat_{u}`` and the mean wavefields ``\overline{\uvec}`` given by:
```math #OptU
\Hmat_{u} &= \Pmat^{\top}\SIGMAN^{-1}\Pmat + \lambda^2\A(\mvec_{k})^{\top}\A(\mvec_{k}), \\
\overline{\uvec} &= \Hmat_{u}^{-1}(\lambda^{2}\A(\mvec_{k})^{\top}\q+\Pmat^{\top}\SIGMAN^{-1}\dvec).
```
To draw a sample ``\uvec_{k+1} \sim \mathcal{N}(\overline{\uvec},\Hmat_{u}^{-1})``, we first compute the Cholesky factorization ``\Hmat_{u} = \Lmat_{u}^{\top}\Lmat_{u}``, where ``\Lmat_{u}`` is an upper triangular matrix. We then apply ``\Lmat_{u}`` to compute ``\overline{\uvec}`` and ``\uvec_{k+1}`` by:
```math #Compuk1
\overline{\uvec} &= \Lmat_{u}^{-1}\Lmat_{u}^{-\top}(\lambda^{2}\A(\mvec_{k})^{\top}\q+\Pmat^{\top}\SIGMAN^{-1}\dvec), \\
\uvec_{k+1} &= \overline{\uvec} + \Lmat_{u}^{-1}\rvec_{u}, \quad \rvec_{u} \sim \mathcal{N}(0,\mathbf{I}).
```
With ``\uvec_{k+1}``, the conditional PDF is ``\rhopost(\mvec|\uvec_{k+1},\dvec)=\mathcal{N}(\overline{\mvec},\Hmat_{m}^{-1})``. The mean is ``\overline{\mvec} = \mvec_{k}-\Hmat_{m}^{-1}\g_{m}``, with the gradient and Hessian expressed as:
```math #GandH
\g_{m} &= \lambda^{2}\Gmat^{\top}(\A(\mvec_{k})\uvec_{k+1} - \q) + \SIGMAP^{-1}(\mvec_{k}-\mp), \\
\Hmat_{m} &= \lambda^{2}\Gmat^{\top}\Gmat + \SIGMAP^{-1},
```
where ``\Gmat = \frac{\partial \A(\mvec)\uvec_{k+1}}{\partial{\mvec}}`` is the sparse Jacobian matrix. The new sample ``\mvec_{k+1}`` is computed by:
```math #Sampm
\mvec_{k+1} = \overline{\mvec} + \Lmat_{m}^{-1}\rvec_{m}, \quad \rvec_{m} \sim \mathcal{N}(0,\mathbf{I}),
```
where ``\Lmat_{m}`` is the upper triangular Cholesky factor of ``\Hmat_{m} = \Lmat_{m}^{\top}\Lmat_{m}``.
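The following is a minimal sketch of one Gibbs iteration, the ``\uvec``-update (#OptU)--(#Compuk1) followed by the ``\mvec``-update (#GandH)--(#Sampm), assuming real arithmetic and the linear parameterization ``\A(\mvec) = \omega^{2}\,\mathrm{diag}(\mvec) + \mathbf{L}``, for which ``\Gmat = \omega^{2}\,\mathrm{diag}(\uvec_{k+1})``; the names and dense factorizations are illustrative, not the authors' implementation.

```python
# One Gibbs iteration for the weak-constraint posterior (illustrative sketch).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gibbs_step(m_k, d, P, q, A_of_m, omega, lam, Cn_inv, Cp_inv, m_prior, rng):
    """Draw u_{k+1} | m_k, d, then m_{k+1} | u_{k+1}, d.

    A_of_m(m) assembles A(m) = omega^2 diag(m) + L (see the sketch above),
    so the Jacobian of A(m) u with respect to m is G = omega^2 diag(u).
    """
    A = A_of_m(m_k)
    # --- u-step (eqs. #OptU, #Compuk1): u_{k+1} ~ N(u_bar, H_u^{-1}) ---
    H_u = P.T @ Cn_inv @ P + lam**2 * A.T @ A
    L_u = cholesky(H_u)                      # upper triangular: H_u = L_u.T @ L_u
    rhs = lam**2 * A.T @ q + P.T @ Cn_inv @ d
    u_bar = solve_triangular(L_u, solve_triangular(L_u, rhs, trans='T'))
    u = u_bar + solve_triangular(L_u, rng.standard_normal(u_bar.size))
    # --- m-step (eqs. #GandH, #Sampm): m_{k+1} ~ N(m_bar, H_m^{-1}) ---
    G = omega**2 * np.diag(u)                # Jacobian of A(m) u in m
    g_m = lam**2 * G.T @ (A @ u - q) + Cp_inv @ (m_k - m_prior)
    H_m = lam**2 * G.T @ G + Cp_inv
    L_m = cholesky(H_m)                      # H_m = L_m.T @ L_m
    m_bar = m_k - solve_triangular(L_m, solve_triangular(L_m, g_m, trans='T'))
    m = m_bar + solve_triangular(L_m, rng.standard_normal(m_bar.size))
    return m, u
```

The dense factorizations above are for readability only; in practice one would exploit the sparsity of ``\A`` and ``\Gmat`` with sparse Cholesky factorizations of ``\Hmat_{u}`` and ``\Hmat_{m}``.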
\vspace{-0.1cm}

## Numerical experiment

\vspace{-0.2cm}

We apply the proposed method to an inverse problem constrained by the Helmholtz equation. We use single-source, single-frequency data to invert a 1D gridded squared-slowness profile ``m(z) = 1/(v_0 + \alpha z)^2`` with ``v_0 \, = \, 2000 \, \mathrm{m/s}`` and ``\alpha \, = \, 0.75``. We use a frequency of ``1 \, \mathrm{Hz}``, a grid spacing of ``50 \, \mathrm{m}``, a maximum offset of ``10000 \, \mathrm{m}``, and a maximum depth of ``5000 \, \mathrm{m}``. Both the source and the receivers are located at the surface of the model. Gaussian noise with covariance matrix ``\SIGMAN \, = \, \mathbf{I}`` is added to the data. The mean model of the prior distribution is selected to have ``v_0 \, = \, 2000 \, \mathrm{m/s}`` and ``\alpha \, = \, 0.65``, and the prior covariance matrix is set to ``\SIGMAP = 2\times10^{-8}\mathbf{I}``. We use the Gibbs sampling method to generate ``10^6`` samples with ``\lambda \, = \, 10^5``; a sketch of this configuration is given after the acknowledgements. We compare the true model, the prior mean model, and the posterior mean model in Figure [#fig:Model], and the prior and posterior standard deviations (STD) in Figure [#fig:STD]. The ``90\%`` confidence interval of the posterior distribution is also shown in Figure [#fig:Model] (gray background). Compared to the prior distribution, the posterior mean model has a smaller error in the shallow part but a larger error in the deep part. Meanwhile, the posterior STD decreases more from the prior STD in the shallow part than in the deep part. Both facts imply that the data have a larger influence on the squared slowness in the shallow part than in the deep part.

### Figure: ModelSTD{#fig:ModelSTD}
![Model comparison](./Fig/ModelBar){#fig:Model width=50%}
![STD comparison](./Fig/STD){#fig:STD width=same}
(a) Comparison of the true model (solid) and the mean models of the prior (dotted) and posterior (dashed) PDFs; (b) comparison of the STDs of the prior (dotted) and posterior (dashed) PDFs.

\vspace{-0.1cm}

## Conclusion

\vspace{-0.2cm}

By weakening the wave-equation constraint in a controlled manner, we arrive at a novel formulation of the posterior PDF for wave-equation based inversion. This new posterior PDF is bi-Gaussian with respect to the model parameters and the wavefields. A numerical experiment demonstrates that this posterior PDF can be sampled efficiently with the Gibbs sampling method.

\vspace{-0.1cm}

## Acknowledgements

\vspace{-0.2cm}

This research was carried out as part of the SINBAD project with the support of the member organizations of the SINBAD Consortium.
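For reference, the following is a minimal sketch of the experiment configuration described above. The grid values, model profiles, covariances, ``\lambda``, and sample count are taken from the text; the receiver count and discretization conventions are assumptions. The Helmholtz assembly and sampling loop follow the sketches in the previous sections.

```python
# Configuration sketch for the 1D Helmholtz experiment (values from the text;
# n_rcv and the discretization conventions are assumptions).
import numpy as np

dz = 50.0                                  # grid spacing [m]
z = np.arange(0.0, 5000.0 + dz, dz)        # depth axis down to 5000 m

def sq_slowness(v0, alpha):
    """Squared slowness m(z) = 1 / (v0 + alpha z)^2 in s^2/m^2."""
    return 1.0 / (v0 + alpha * z) ** 2

m_true = sq_slowness(2000.0, 0.75)         # true model
m_prior = sq_slowness(2000.0, 0.65)        # prior mean model
Cp_inv = (1.0 / 2e-8) * np.eye(z.size)     # inverse of prior covariance 2e-8 I
n_rcv = 201                                # assumed: receivers every 50 m over 10 km
Cn_inv = np.eye(n_rcv)                     # inverse of noise covariance I
lam = 1e5                                  # penalty parameter lambda
n_samples = 10**6                          # number of Gibbs samples to draw
```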
```math_def
\newcommand{\Pmat}{\mathbf{P}}
\newcommand{\A}{\mathbf{A}}
\newcommand{\q}{\mathbf{q}}
\newcommand{\uvec}{\mathbf{u}}
\newcommand{\post}{\text{post}}
\newcommand{\rhopost}{\rho}
\newcommand{\prior}{\text{prior}}
\newcommand{\noise}{\text{noise}}
\newcommand{\mis}{\text{mis}}
\newcommand{\like}{\text{like}}
\newcommand{\rholike}{\rho}
\newcommand{\rholikepen}{{\rho}_{\text{like}}^{\text{pen}}}
\newcommand{\overrholikepen}{\overline{\rho}_{\text{like}}^{\text{pen}}}
\newcommand{\rhoprior}{\rho}
\newcommand{\vvec}{\mathbf{v}}
\newcommand{\obs}{\text{obs}}
\newcommand{\dvec}{\mathbf{d}}
\newcommand{\xvec}{\mathbf{x}}
\newcommand{\Hmat}{\mathbf{H}}
\newcommand{\Cmat}{\mathbf{C}}
\newcommand{\g}{\mathbf{g}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\dobs}{\mathbf{d}_{\text{obs}}}
\newcommand{\F}{\mathbf{F}}
\newcommand{\SIGMAN}{{\Gamma}_{\text{n}}}
\newcommand{\SIGMANij}{_{i,j}{\Gamma}_{\text{noise}}}
\newcommand{\SIGMAPDE}{{\Gamma}_{\text{pde}}}
\newcommand{\SIGMAP}{{\Gamma}_{\text{p}}}
\newcommand{\SIGMAU}{{\Gamma}_{\uvec}}
\newcommand{\SIGMAS}{{\Gamma}_{\text{src}}}
\newcommand{\SIGMAR}{{\Gamma}_{\text{rcv}}}
\newcommand{\mvec}{\mathbf{m}}
\newcommand{\Gmat}{\mathbf{G}}
\newcommand{\Wmat}{\mathbf{W}}
\newcommand{\Imat}{\mathbf{I}}
\newcommand{\Lmat}{\mathbf{L}}
\newcommand{\Llike}{\mathbf{L}_{\like}}
\newcommand{\Lprior}{\mathbf{L}_{\text{prior}}}
\newcommand{\Lpost}{\mathbf{L}_{\text{post}}}
\newcommand{\Smat}{\mathbf{S}}
\newcommand{\Rmat}{\mathbf{R}}
\newcommand{\Mmat}{\mathbf{M}}
\newcommand{\Jmat}{\mathbf{J}}
\newcommand{\rvec}{\mathbf{r}}
\newcommand{\nsrc}{n_{\text{src}}}
\newcommand{\nrcv}{n_{\text{rcv}}}
\newcommand{\nfreq}{n_{\text{freq}}}
\newcommand{\ngrid}{n_{\text{grid}}}
\newcommand{\ndata}{n_{\text{data}}}
\newcommand{\nsmp}{n_{\text{smp}}}
\newcommand{\mMAP}{\mathbf{m}_{\text{map}}}
\newcommand{\Lmis}{\Lmat_{\text{mis}}}
\newcommand{\nd}{n_{\text{data}}}
\newcommand{\Hp}{\Hmat_{\text{prior}}}
\newcommand{\fred}{f_\text{red}}
\newcommand{\fpen}{f_\text{pen}}
\newcommand{\Hlike}{\Hmat_{\text{like}}}
\newcommand{\Hprior}{\Hmat_{\text{prior}}}
\newcommand{\Hpost}{\Hmat_{\text{post}}}
\renewcommand{\mp}{\tilde{\mvec}}
\renewcommand{\SIGMAPDE}{\Gamma_{\text{PDE}}}
\newcommand{\epssrc}{\epsilon_{\text{src}}}
\newcommand{\epsrcv}{\epsilon_{\text{rcv}}}
```