Velocity Model Building with Jacobian-Informed Neural Operators

Jeongjin (Jayjay) Park, Huseyin Tuna Erdinc, Felix Herrmann

Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0)
Copyright (c) 2024, Felix J. Herrmann (Georgia Tech)

Motivation: Why Neural Operators for LS Inversion?¹


Forward Map

Classical workflows (MVA, FWI)

  • repeated PDE solves, adjoint-state gradients
  • becomes very expensive when exploring multiple background models
  • \(\mathbf{m}\): ground-truth velocity model
  • \(\mathbf{m}_0\): background velocity model
  • \(\delta \mathbf{m}_{RTM}\): RTM at the background model
  • \(\mathcal{G}(\mathbf{m}, \mathbf{m}_0) = \delta \mathbf{m}_{RTM}\): RTM (Migration) operator
  1. Ma, Xiao, and Tariq Alkhalifah. “Velocity model building from seismic images using a Convolutional Neural Operator.” arXiv preprint arXiv:2509.20238 (2025).

Motivation: Neural Operator as an amortized neural surrogate


LS Inversion with \(\mathcal{G}_{nn}\)

Definition 1. (Neural operator for RTM imaging)

A learned operator \[\mathcal{G}_{nn}(\mathbf{m},\mathbf{m}_0) \approx \mathcal{G}(\mathbf{m}, \mathbf{m}_0)\]

that predicts the RTM image for any background model \(\mathbf{m}_0\)

Effect

  • Near-zero cost per forward RTM prediction
  • Fast gradients through \(\mathcal{G}_{nn}\) via AD
  • Together, these enable fast, scalable inversion
  • Our training goal: amortization across background models

Least-Squares Inverse Problem: Setup

Schematic plot for LS

Equation 1. (Inversion objective)

\[\mathcal{L}(\mathbf{m},\mathbf{m}_0) = \|\mathcal{G}_{nn}(\mathbf{m}, \mathbf{m}_0) - \delta \mathbf{m}_{RTM}\|^2_2\]

Equation 2. (Inversion result as a minimizer)

\[\hat{\mathbf{m}} = \arg\min_\mathbf{m} \mathcal{L}(\mathbf{m}, \mathbf{m}_0)\]

Equation 3. (Gradient Descent / Optimization update)

\[\mathbf{m}^{k+1} = \mathbf{m}^k - \eta \nabla_\mathbf{m} \mathcal{L}(\mathbf{m}^k)\]
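As a concrete illustration of Equations 1 to 3, below is a minimal PyTorch sketch of gradient-descent inversion through a trained surrogate. The names (`g_nn`, `d_rtm`, `invert`) are ours, not part of the method; `g_nn` is assumed to be any differentiable module mapping \((\mathbf{m}, \mathbf{m}_0)\) to a predicted RTM image.

```python
import torch

def invert(g_nn, d_rtm, m0, m_init, eta=1e-3, n_iters=150):
    """Gradient-descent inversion through the surrogate (Equations 1-3).

    g_nn   : trained neural operator, maps (m, m0) -> predicted RTM image
    d_rtm  : observed RTM image (torch.Tensor)
    m0     : fixed background model (torch.Tensor)
    m_init : starting guess for the velocity model
    """
    m = m_init.clone().requires_grad_(True)        # model iterate m^k
    opt = torch.optim.SGD([m], lr=eta)             # plain gradient descent, Eq. 3
    for _ in range(n_iters):
        opt.zero_grad()
        loss = ((g_nn(m, m0) - d_rtm) ** 2).sum()  # LS objective, Eq. 1
        loss.backward()                            # gradient via AD through G_nn
        opt.step()                                 # m^{k+1} = m^k - eta * grad
    return m.detach()                              # minimizer estimate, Eq. 2
```

Because the PDE solver never appears in this loop, each iteration costs one surrogate forward and backward pass rather than repeated wave-equation solves.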

Neural Operator Training

  • Training \(\mathcal{G}_{nn}\) as an “amortized” RTM operator

Input function space and Training \(\mathcal{G}_{\text{nn}}\)

To learn a two-argument RTM operator
\[ \mathcal{G}_{\text{nn}} : \mathcal{X} \times \mathcal{B} \rightarrow \mathcal{Y}, \] we require training data that samples the input function space.


Probability model for training pairs

In practice, drawing training instances \((\mathbf{m}, \mathbf{m}_0) \sim \mu\) means

  • \(\mathbf{m} \in \mathcal{X}\): samples variability in true geologic models
  • \(\mathbf{m}_0 \in \mathcal{B}\): samples variability in background (kinematic) models

This matches operator-learning theory, which guarantees generalization across the input function space when the expected risk below is minimized.

Definition 4. (Amortized RTM Operator)

\[ \min_{\mathcal{G}_{\text{nn}}} \; \mathbb{E}_{(\mathbf{m},\mathbf{m}_0)\sim\mu} \Big[ \|\mathcal{G}_{\text{nn}}(\mathbf{m},\mathbf{m}_0) - \delta \mathbf{m}_{\mathrm{RTM}} \| \Big] \]
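In practice, this expectation is estimated over minibatches. A minimal sketch, assuming a data loader that yields \((\mathbf{m}, \mathbf{m}_0, \delta\mathbf{m}_{\mathrm{RTM}})\) triplets drawn from \(\mu\); the MSE stands in for the norm in Definition 4, consistent with the "MSE-FNO" named later in this deck:

```python
import torch

def train_amortized(g_nn, loader, n_epochs=50, lr=1e-3):
    """Minimize the empirical counterpart of the expected risk (Definition 4).

    loader yields minibatches (m, m0, d_rtm) with (m, m0) drawn from mu
    and d_rtm = G(m, m0) the corresponding RTM image.
    """
    opt = torch.optim.Adam(g_nn.parameters(), lr=lr)
    for _ in range(n_epochs):
        for m, m0, d_rtm in loader:
            opt.zero_grad()
            pred = g_nn(m, m0)                      # amortized forward prediction
            loss = torch.mean((pred - d_rtm) ** 2)  # MSE estimate of the risk
            loss.backward()
            opt.step()
    return g_nn
```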


Example list of \(\mathbf{m}_0\) for a given \(\mathbf{m}\)

\(\mathcal{G}_{\text{nn}}\): an Amortized RTM Operator

Trained on \((\mathbf{m},\mathbf{m}_0)\sim\mu\), the neural operator learns


  • how the pair \((\mathbf{m}, \mathbf{m}_0)\) determines the RTM image
  • not to rely on a single fixed background model
  • how traveltimes change with low-wavenumber variations in \(\mathbf{m}_0\)

This makes \(\mathcal{G}_{\text{nn}}\) an amortized RTM operator, meaning it generalizes across the space of backgrounds \(\mathcal{B}\).

Result: Strong generalization to unseen background models during inversion!

Dataset Creation

  • To fulfill our training objective, we need to carefully design the dataset

Creating Dataset for Amortized Neural Operator

Definition 5. (Dataset for Amortized Neural Operator)

\[ \mathcal{D} = \big\{ \big(\mathbf{m}^{(i)},\, \mathbf{m}_{0}^{(i,s)},\, \delta \mathbf{m}_{\mathrm{RTM}}^{(i,s)}\big) \big\}_{i=1,\dots,N;\; s=1,\dots,10} \]

  • \(i\): index for ground-truth velocity model
  • \(s\): index for background model
  • \(\mathbf{m}^{(i)}\): ground-truth velocity model
  • \(\mathbf{m}_{0}^{(i,s)}\): background model samples
  • \(\delta \mathbf{m}_{\mathrm{RTM}}^{(i,s)} = \mathcal{G}(\mathbf{m}^{(i)}, \mathbf{m}_{0}^{(i,s)})\): RTM image at that background


For each true model \(\mathbf{m}^{(i)}(x,z)\), we construct multiple background models by smoothing the slowness field in the depth and time domains.
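The assembly of \(\mathcal{D}\) is a double loop over true models and random backgrounds. A sketch, where `rtm` (the migration operator \(\mathcal{G}\)) and `make_background` (one random application of the smoothing building blocks defined below) are assumed helpers, not actual names from this work:

```python
import numpy as np

def build_dataset(true_models, rtm, make_background, n_backgrounds=10, seed=0):
    """Assemble D = {(m_i, m0_(i,s), dm_RTM_(i,s))} from Definition 5.

    rtm(m, m0)              : the RTM/migration operator G
    make_background(m, rng) : one randomly smoothed background for m
    """
    rng = np.random.default_rng(seed)
    dataset = []
    for m in true_models:                        # index i over true models
        for _ in range(n_backgrounds):           # index s = 1, ..., 10
            m0 = make_background(m, rng)         # smoothed-slowness background
            dataset.append((m, m0, rtm(m, m0)))  # RTM image at that background
    return dataset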

Example list of \(\mathbf{m}_0\) for a given \(\mathbf{m}\)

Algorithm for Dataset Creation



Inducing Variability in the \(\mathbf{m}_0\)

  • Two-way traveltime variability (induced by smoother/faster backgrounds)
  • Two building blocks:
    • T: smoothing in time
    • D: smoothing in depth
  • We randomly compose T and D multiple times
  • As a result, RTM events are shifted, stretched, and focused differently



Variability in the \(\mathbf{m}_0\) in a vertical trace

Building Block 1: Smoothing in Depth


Equation 4. (Depth-domain smoothing)

\[ s_0(x,z)= \left(S_{\sigma_x,\sigma_z} * s\right)(x,z) \]


  • \(\mathbf{m}(x,z)\): true velocity model in km/s
  • \(\mathbf{s}(x,z) = 1 / \mathbf{m}(x,z)\): slowness
  • \(S_{\sigma_x,\sigma_z}\): Gaussian smoothing operator with kernel widths \(\sigma_x\) and \(\sigma_z\)
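Equation 4 amounts to one Gaussian filter applied to the slowness. A minimal sketch with SciPy, assuming \(\mathbf{m}\) is stored as an array indexed (x, z); the function name is ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_depth(m, sigma_x, sigma_z):
    """Building block D: Gaussian smoothing of slowness in depth (Equation 4).

    m : true velocity model in km/s, array of shape (nx, nz)
    """
    s = 1.0 / m                                        # slowness s = 1/m
    s0 = gaussian_filter(s, sigma=(sigma_x, sigma_z))  # s0 = S_{sx,sz} * s
    return 1.0 / s0                                    # background model m0
```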

Smoothing in time requires more steps: we first need to convert from the depth to the time coordinate.

Building Block 2-1: Mapping between Depth and Time

  • The slowness model, \(\mathbf{s}(x, z)\), is defined on a depth grid.
  • Using traveltime samples \((z_j, t_j)\), we obtain by linear interpolation (LinearInterpolation):
    1. \(t(z): z \mapsto t\)
    2. \(z(t): t \mapsto z\)


Equation 5. (Two-way travel time)

Given depth samples \(z_j = jh\) \[t_j(x)=2\sum_{k=1}^{j} \frac{h}{1000}\, s(x,z_k)\]

This defines discrete pairs \((z_j, t_j)\) that can be interpolated.



  • \(h\): depth grid spacing in meters
  • \(j\): index of depth
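Equation 5 is a cumulative sum along each trace, and the two maps follow by linear interpolation. A sketch for one lateral position, using NumPy's `np.interp` as a stand-in for LinearInterpolation; the function name is ours:

```python
import numpy as np

def depth_time_maps(s_trace, h):
    """Discrete pairs (z_j, t_j) from Equation 5 for one lateral position x.

    s_trace : slowness s(x, z_j) in s/km along depth, shape (nz,)
    h       : depth grid spacing in meters
    Returns linear-interpolation maps t(z) and z(t).
    """
    z = h * np.arange(1, len(s_trace) + 1)     # z_j = j h, in meters
    t = 2.0 * np.cumsum(h / 1000.0 * s_trace)  # two-way traveltime, Eq. 5
    t_of_z = lambda zq: np.interp(zq, z, t)    # t(z): z -> t
    z_of_t = lambda tq: np.interp(tq, t, z)    # z(t): t -> z
    return t_of_z, z_of_t
```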

Building Block 2-2: Smoothing in Time


Equation 6. (Depth-to-time conversion)

\[\text{Using} \: t(z): z \mapsto t, \: \text{obtain} \: \mathbf{s}^{\text{time}}(x,t_n).\]

Equation 7. (Time-domain smoothing)

\[ s_0^\text{time}(x,t)= \left(S_{\sigma_x,\sigma_t} * s^\text{time}\right)(x,t) \]

Equation 8. (Time-to-depth conversion)

\[\text{Using} \: z(t): t \mapsto z, \: \text{obtain the smoothed slowness} \: \mathbf{s}_0(x,z_j).\]
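Putting Equations 5 to 8 together gives the full round trip: resample each slowness trace to a regular two-way-time grid, smooth there, and map back to depth. A sketch under the assumption that the time-to-depth step samples the smoothed traces at the original traveltimes \(t_j(x)\); the function name and default parameters are ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_time(s, h, nt=1000, sigma_x=2.0, sigma_t=10.0):
    """Building block T: smooth slowness in the time domain (Equations 6-8).

    s : slowness s(x, z) in s/km, shape (nx, nz); h : depth spacing in meters.
    Returns the smoothed slowness back on the depth grid.
    """
    nx = s.shape[0]
    t_all = 2.0 * np.cumsum(h / 1000.0 * s, axis=1)  # t_j(x), Eq. 5, per trace
    t_grid = np.linspace(0.0, t_all.max(), nt)       # regular two-way-time grid

    # Eq. 6: depth -> time resampling, trace by trace
    s_time = np.stack([np.interp(t_grid, t_all[i], s[i]) for i in range(nx)])

    # Eq. 7: Gaussian smoothing in (x, t)
    s_time = gaussian_filter(s_time, sigma=(sigma_x, sigma_t))

    # Eq. 8: time -> depth, sampling smoothed traces at t_j = t(z_j)
    return np.stack([np.interp(t_all[i], t_grid, s_time[i]) for i in range(nx)])
```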

Background models: diverse variations in travel time

Variability in the \(\mathbf{m}_0\)

Background models: \(\{\mathbf{m}_0^{(i,s)}\}_{s=1}^{10}\) for a given \(\mathbf{m}^{(i)}\)

Velocity model and Migrated models

RTM variation: \(\delta \mathbf{m}_{\mathrm{RTM}}^{(i,s)}\)

RTM variations

Numerical Experiment (Preliminary Result)

Can we replace \(\mathcal{G}\) with \(\mathcal{G}_{nn}\) in the least-squares inversion?

Fourier Neural Operator (FNO)¹

Schematic Plot for architecture of Fourier Neural Operator
  • Convolution operator (see the code sketch below)
    • Efficient: convolution in space becomes multiplication in the frequency domain
    • Learns weights, \(R\), in the frequency domain
  • Additional linear transformation of the input
    • Keeps track of positional and boundary information
  1. Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. "Fourier neural operator for parametric partial differential equations." In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
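A minimal sketch of one such Fourier layer in PyTorch, simplified relative to the reference implementation of Li et al. (it keeps only the lowest `modes1` frequencies along the first axis, whereas the original also keeps the corresponding negative frequencies); the class and attribute names are ours:

```python
import torch

class SpectralConv2d(torch.nn.Module):
    """One FNO layer: learned multiplication R on low Fourier modes,
    plus a pointwise linear path for positional/boundary information."""

    def __init__(self, channels, modes1, modes2):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2
        scale = 1.0 / channels**2
        self.R = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes1, modes2,
                                dtype=torch.cfloat))
        self.w = torch.nn.Conv2d(channels, channels, 1)  # local linear transform

    def forward(self, x):                # x: (batch, channels, nx, nz)
        x_ft = torch.fft.rfft2(x)        # convolution -> multiplication
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[:, :, :self.modes1, :self.modes2], self.R)
        x_spec = torch.fft.irfft2(out_ft, s=x.shape[-2:])
        return torch.relu(x_spec + self.w(x))
```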

Testing Accuracy in Forward Prediction

Sample ground-truth velocity and background velocity (\(\mathbf{m}_0\): TTD \(\sigma_x = 122.8\))

Forward prediction result

Forward Prediction Trace Plot

RTM Vertical Trace

Inversion Setup

Unseen background model
  • We test a very simple case: the background is obtained by smoothing the ground-truth model

Ground-truth model
  • Iterations: 150
  • Sample: unseen background model

Inversion Setup

Loss objective decay plot
  • To better understand how the optimization is working, we evaluate how the model iterate evolves
    • at 1st iteration
    • at 80th iteration

Inversion: RTM Prediction

RTM prediction at iteration 1

Residual at iteration 1

RTM prediction at iteration 80

Residual at iteration 80

Inversion: RTM Trace Comparison

Trace of Model iterate at the 1st iteration

Trace of model iterate at the 80th iteration

Inversion: Model iterate and its gradient

Model after 1st iteration

Gradient evaluated at 1st model iterate

Model after 80th iteration

Gradient evaluated at 80th iteration

Inversion: Recovered Model


Ground-truth Model


Model after 150th iteration

Next step: Fisher-Informed Neural Operator (FINO)

When MSE-FNO fails in inversion

  • When we create \(\mathbf{m}_0\) by smoothing the ground truth \(\mathbf{m}\), the inversion works well
  • When the smoothing happens in slowness, \(\mathbf{s}\), the relationship between the ground truth and the background model becomes highly nonlinear, and learning the gradient correctly becomes challenging
  • The FINO framework can overcome this limitation by explicitly teaching the derivative information that matters for inversion, as sketched below
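The exact FINO formulation is left as future work in this deck; one plausible realization of "teaching the derivative" is to add a Jacobian-vector-product matching term to the MSE loss. A heavily hedged sketch, where `fino_loss`, `dm0` (a probing perturbation), `d_jvp` (the true operator's directional derivative, assumed precomputed alongside the dataset), and `lam` are all our own hypothetical names, not the authors' method:

```python
import torch
from torch.func import jvp  # forward-mode AD, PyTorch >= 2.0

def fino_loss(g_nn, m, m0, d_rtm, dm0, d_jvp, lam=1.0):
    """MSE plus a derivative-matching penalty, sketching the FINO idea.

    dm0   : a probing perturbation of the background model
    d_jvp : directional derivative of the true operator G along dm0,
            assumed precomputed when building the dataset
    """
    f = lambda b: g_nn(m, b)                   # forward map as a function of m0
    pred, pred_jvp = jvp(f, (m0,), (dm0,))     # value and JVP through G_nn
    mse = torch.mean((pred - d_rtm) ** 2)      # standard amortized MSE term
    jac = torch.mean((pred_jvp - d_jvp) ** 2)  # match the derivative used in inversion
    return mse + lam * jac
```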

Acknowledgement

This research was carried out with the support of the Georgia Research Alliance and partners of the ML4Seismic Center.