---
title: Wave-equation based inversion with amortized variational Bayesian inference
author: |
    Ali Siahkoohi^1^,
    Rafael Orozco^1^,
    Gabrio Rizzuti^2^,
    and Felix J. Herrmann^1^ \
    ^1^School of Computational Science and Engineering,
    Georgia Institute of Technology\
    ^2^Department of Mathematics, Utrecht University
bibliography:
    - abstract.bib
---

## Abstract:

Solving inverse problems involving measurement noise and modeling errors
requires regularization in order to avoid data overfit. Geophysical
inverse problems, in which the Earth's highly heterogeneous structure is
unknown, present a challenge in encoding prior knowledge through
analytical expressions. Our main contribution is a
generative-model-based regularization approach, robust to
out-of-distribution data, which exploits the prior knowledge embedded in
existing data and model pairs. Utilizing an amortized variational
inference objective, a conditional normalizing flow (NF) is pretrained
on pairs of low- and high-fidelity migrated images in order to achieve a
low-fidelity approximation to the seismic imaging posterior distribution
for previously unseen data. The NF is used after pretraining to
reparameterize the unknown seismic image in an inversion scheme
involving physics-guided data misfit and a Gaussian prior on the NF
latent variable. Solving this optimization problem with respect to the
latent variable enables us to leverage the benefits of data-driven
conditional priors whilst being informed by physics and data. The
numerical experiments demonstrate that the proposed inversion scheme
produces seismic images with limited artifacts when dealing with noisy
and out-of-distribution data.

## Introduction

An inverse problem involves reliably estimating an unknown quantity from
noisy indirect observations. This problem is commonly solved using
optimization techniques to minimize the difference between predicted and
observed data. Solely minimizing the data misfit negatively impacts the
quality of the obtained solution due to noise in the data, modeling
errors, and a nontrivial null-space of the forward operator
[@tarantola2005inverse]. To prevent this, it is crucial to capture and
incorporate prior knowledge into the inverse problem
[@tarantola2005inverse], e.g., Gaussian or Laplace distribution priors
[@malinverno2004expanded; @MartinMcMC2012; @herrmann11GPelsqIm]. While
theoretically understood, these type of priors may lead to undesirable
biases in the outcome of inversion.

The purpose of our contribution is to address this challenge by
utilizing a formulation that exploits a data-driven conditional prior.
To achieve this, following @orozco2021photoacoustic\, we train a
conditional normalizing flow [NF, @kruse2021hint] to capture the
conditional distribution of the unknown, given data, i.e., the posterior
distribution. The training involves minimizing an amortized variational
inference objective [@jordan1999introduction; @siahkoohi2020ABIpto;
@siahkoohi2021Seglbe; @kruse2021hint; @kovachki2021conditional] using
existing training pairs in the form of low-fidelity data and model
pairs. After training, we are able to capture the low-fidelity posterior
distribution for previously unseen seismic data. We use the this network
to reparameterize the unknown in an inversion scheme, involving
physics-guided data misfit and a Gaussian prior on the NF latent
variable. Due to the inherent invertibility of NFs, they can represent
any model in the unknown space, which allows them to be be used as
priors when dealing with out-of-distribution data [@pmlr-v119-asim20a;
@orozco2021photoacoustic].

There are three key advantages to our proposed method: (1) Data-driven
priors make use of available data, such as high-resolution seismic
images to capture prior knowledge about the Earth's subsurface; (2) The
use of a conditional prior favors solutions that are consistent with the
data, which provides more specific knowledge about the unknown; (3) With
the help of our formulation, we combine data-driven priors with
conventional physics-based inversion methods, which offers the
advantages of data-driven priors without relying solely on them as a
black box.

In the following sections, we discuss conditional NFs, trained using an
amortized variational inference procedure. Next, we present an inversion
scheme for seismic imaging that incorporates conditional priors. We
conclude by demonstrating this technique on a realistic seismic imaging
problem involving noisy and out-of-distribution data.

## Amortized variational inference

The problem setup entails applying variational inference
[@jordan1999introduction] to approximate the posterior distribution
[@tarantola2005inverse] associated with the inverse problem ``\B{y} = F
(\B{x}) + \boldsymbol{\epsilon}``\, where ``\B{y} \in Y`` represents the
observed data, ``\B{x} \in X`` unknown model, ``\boldsymbol{\epsilon}``
possibly non-Gaussian noise, and ``F: X \rightarrow Y`` the possibly
nonlinear forward operator. In the context of amortized variational
inference [@kruse2021hint], we wish to approximate the posterior
distribution associated with this inverse problem for previously unseen
data. This method has computational advantages as it does not require
solving an additional instance of variational inference for new data. In
this work, we choose NFs [@rezende2015variational] that due to their
invertibility (up to numerical precision) can be used to approximate a
target distribution, of which we have only samples. NFs can be adapted
to sample from the conditional distribution ``p(\B{x} \mid \B{y})`` by
using a block-triangular construction [@marzouk2016sampling],
``T_{\R{w}} (\B{y}, \B{x}) =(T_{\R{w}_1} (\B{y}), T_{\R{w}_2} (\B{y},
\B{x}))``  with ``\B{w} = (\B{w}_1, \B{w}_2)``. The conditional NF
``T_{\R{w}} : Y \times X \rightarrow Z \times Z``\, which takes as input
data and model pairs ``(\B{y}, \B{x})``\, aims to yield two normally
distributed outputs in the latent space ``Z \times Z``. Training
objective is based on minimizing the Kullback-Leibler divergence between
the NF output distribution and the Gaussian latent distribution
[@kruse2021hint]:

```math #NF-training-cond
\mathop{\rm arg\,min}_{\B{w}}\, \frac{1}{n}
    \sum_{i=1}^n \Bigg[ \frac{1}{2}\big\|
    T_{\R{w}} (\B{y}^{(i)}, \B{x}^{(i)}) \big\|^2_2 -
    \log \Big | \det \nabla_{(\R{y}, \R{x})} T_{\R{w}} (\B{y}^{(i)},
    \B{x}^{(i)}) \Big |\Bigg].
```

In the above objective, the ``\ell_2``-norm follows from a Gaussian
assumption on the latent variables and the second term is a
regularization term that avoids ``T_{\R{w}}`` from converging to trivial
solutions---e.g., ``T_{\R{w}} := \B{0}``. Computing ``\det
\nabla_{(\R{y}, \R{x})} T_{\R{w}} (\B{y}, \B{x})`` and its gradient adds
almost no extra cost because of the particular design of invertible
networks [@dinh2016density]. Following training, we can obtain samples
from conditional distribution ``p(\B{x} \mid \B{y})`` via
``T_{\R{w}_2}^{-1} \big( T_{\R{w}_1} (\B{y}), \B{z} \big) \sim p(\B{x}
\mid \B{y}), \ \B{z} \sim p_{\R{z}}(\B{z})`` [@marzouk2016sampling;
@kruse2021hint]. This amounts to feeding the latent code associated with
observed data, i.e.,  ``T_{\R{w}_1} (\B{y})``,  and Gaussian samples
``\B{z} \sim p_{\R{z}}(\B{z})`` into the inverse network,
``T_{\R{w}_2}^{-1}``. These samples may be used for Bayesian inference
if we have an ideal training dataset [@herrmann2019NIPSliwcuc;
@siahkoohi2021Seglbe; @kumar2021enabling; @kothari2021trumpets].
However, such an assumption is rarely correct in geophysical
applications due to Earth's strong heterogeneity
[@siahkoohi2019transfer; @sun2020elastic; @siahkoohi2021deep], which
highlights the importance of devising formulations that are robust to
changes in data distribution during inference.

## Seismic imaging with data-driven conditional priors

Using multiple processed shot records, ``\left \{\B{d}_{i}\right
\}_{i=1}^{n_s}``, seismic imaging aims to estimate the short-wavelength
component of the Earth's subsurface squared-slowness model, denoted by
``\delta \B{m}``. The linearized Born scattering operator
``\B{J}(\B{m}_0, \B{q}_i)`` relates the unknown seismic image ``\delta
\B{m}^{\ast}``, to data, the ``i\text{th}`` source signature,
``\B{q}_{i}``, and the background squared-slowness model ``\B{m}_0``.
This relation can be written as

```math #linear-fwd-op
\B{d}_i = \B{J}(\B{m}_0, \B{q}_i)
    \delta \B{m}^{\ast} + \boldsymbol{\epsilon}_i, \quad
    \boldsymbol{\epsilon}_i \sim \mathrm{N} (\B{0}, \sigma^2
    \B{I}).
```

Noise is denoted by ``\boldsymbol{\epsilon}_i`` and represents
measurement noise and linearization errors, which for simplicity is
assumed to be distributed according to a zero-centered Gaussian
distribution with known covariance ``\sigma^2 \B{I}``. We train a NF on
pairs of low- and high-fidelity seismic images via the amortized
variational inference objective by choosing ``\B{x}`` to represent
high-fidelity migrated images and ``\B{y} := \delta\B{m}_{\text{RTM}}``
corresponding to low-fidelity reverse-time migrated images obtained by
the process of demigration, followed by adding noise and migration.
After training, the conditional NF captures the low-fidelity seismic
imaging posterior distribution. In order to obtain a high-fidelity
seismic image maximum a posteriori (MAP) estimate, we propose to
reparameterize ``\delta \B{m}`` vis the pretrained NF and solve the
optimization problem

```math #imaging-opt-nf
\widehat{\B{z}} = \argmin_{\B{z}}\,  \frac{1}{2 \sigma^2}
    \Bigg[ \sum_{i=1}^{n_s}\big \|
    \B{d}_i- \B{J}(\B{m}_0, \B{q}_i)
    T_{\R{w}_2}^{-1} \big( T_{\R{w}_1} (\delta\B{m}_{\text{RTM}}), \B{z} \big) \big \|_2^2 \Bigg] + \frac{1}{2} \big \|
    \B{z} \big \|_2^2,
```

followed by mapping ``\delta\B{m}_{\text{MAP}} := T_{\R{w}_2}^{-1} \big(
T_{\R{w}_1} (\delta\B{m}_{\text{RTM}}), \widehat{\B{z}} \big)``. We
initialize optimization problem #imaging-opt-nf with ``\B{z}_0 =
\B{0}``\. This initialization and a Gaussian prior on ``\B{z}``
regularize the inversion by favoring solutions that are likely samples
of the low-fidelity posterior distribution [@pmlr-v119-asim20a]. NFs'
inherent invertibility allows them to represent any image ``\delta
\B{m}`` in the solution space. This limits the potentially negative bias
of the conditional prior in domains where access to high-fidelity
training data is limited. We demonstrate this through a numerical
experiment in the next section.

<!-- Thus, in case of out-of-distribution
data, the optimization in principle can yield a solution that fits the
data due to the invertibility of ``T_{\R{w}}``\ -->

## Numerical experiments

We propose a realistic example in which we create ``4750`` 2D training
pairs of low- and high-fidelity seismic images, which the latter are
``3075\, \mathrm{m} \times 5120\, \mathrm{m}`` sections extracted from
the shallow part of [Parihaka](https://wiki.seg.org/wiki/Parihaka-3D)
[@WesternGeco2012] prestrack Kirchhoff migration dataset. The
low-fidelity images are obtain by migrating noisy synthetic data
obtained from the high-fidelity images according to Equation
#linear-fwd-op\. The acquisition geometry involves ``102`` shot records,
``204`` fixed receivers, Ricker wavelet with a central frequency of
``30\, \mathrm{Hz}``, and band-limited noise. We augment a ``125\,
\mathrm{m}`` water column on top of these models to limit the near
source imaging artifacts. We train ``T_{\R{w}}`` according to the
objective function in Equation #NF-training-cond with the Adam
optimization algorithm [@kingma2014adam]. In order to evaluate the
effectiveness of our inversion scheme when applied to
out-of-distribution data, we select a 2D section from the deeper
portions of the Parihaka dataset (see Figure #true-2\). As compared to
training images, this image includes more noncontinuous reflectors, due
to low signal-to-noise ratio in the deeper parts of the Parihaka
dataset. We simulated linearized data with the same acquisition geometry
described above to obtain a low-fidelity image (Figure #rtm-2\). We
solve optimization problem #imaging-opt-nf for ``5`` passes over the
shot records, i.e., approximately the same cost as 5 reverse-time
migrations. Figure #sample-2-4 shows the initial guess of the
optimization, i.e., ``T_{\R{w}_2}^{-1} \big( T_{\R{w}_1}
(\delta\B{m}_{\text{RTM}}), \B{z}_0 \big)``. We can see that this image
is not correctly recovering the reflectors, however, it can be
considered a better starting guess than the reverse-time migrated image
(Figure #rtm-2\) due to corrected amplitudes. Finally, Figure #cm-2
shows the MAP estimate, obtained via solving the optimization problem
#imaging-opt-nf\, which successfully reconstructs most of the reflectors
with limited artifacts.

Our example uses [JUDI](https://github.com/slimgroup/JUDI.jl)
[@witteJUDI2019] to construct wave-equation solvers, which utilizes
[Devito](https://www.devitoproject.org/) [@devito-compiler; @devito-api]
as a highly optimized finite difference solver under the hood. The
network architectures are implemented using
[InvertibleNetworks.jl](https://github.com/slimgroup/InvertibleNetworks.jl)
[@invnet], a memory-efficient framework for training invertible nets in
Julia. Sample code to reproduce the results is provided on
[GitHub](https://github.com/slimgroup/ConditionalNFs4Imaging.jl).


### Figure: {#example .wide}
![](figs/MAP/true_model.png){width=45% #true-2}
![](figs/MAP/observed_data.png){width=45% #rtm-2} \
![](figs/MAP/X0.png){width=45% #sample-2-4}
![](figs/MAP/MAP.png){width=45% #cm-2}
:Imaging with conditional NF priors.
(a) High-fidelity image. (b) Reverse-time migrated image. (c) Initial
guess, ``T_{\R{w}_2}^{-1} \big( T_{\R{w}_1} (\delta\B{m}_{\text{RTM}}),
\B{z}_0 \big)``. (d) MAP estimation with conditional NF prior (Equation
#imaging-opt-nf\).

## Conclusions

Considering the Earth's strong heterogeneity, designing regularization
schemes to incorporate prior knowledge for solving ill-posed geophysical
inverse problems is challenging. To address this challenge, we proposed
a regularization scheme that takes advantage of existing data in the
form of low- and high-fidelity seismic images to train a conditional
normalizing flow (NF). This conditional NF approximates the imaging
posterior distribution for previously unseen data. In order to minimize
the impact of data distribution shifts during inference, we
reparameterized the unknown image with the conditional NF and inverted
for a Gaussian latent variable that fits the data. The resulting maximum
a posteriori estimate takes advantage of the data-driven conditional
prior while remaining bound to data and physics. Using numerical
experiments, we demonstrated that this approach yields seismic images
with limited imaging artifacts in the absence of high-fidelity training
data. Further research on quantifying the uncertainty through this
regularization technique is required.

## Acknowledgment

This research was carried out with the support of Georgia Research
Alliance and partners of the ML4Seismic Center.

```math_def
\newcommand{\R}{\mathrm}
\newcommand{\B}{\mathbf}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand*\mystrut[1]{\vrule width0pt height0pt depth#1\relax}
```