Fisher-Informed Training of Neural Operators for Reliable PDE Inversion

Jeongjin (Jayjay) Park Grant Bruer Huseyin Tuna Erdinc Richard Rex Nisha Chandramoorthy Felix Herrmann

Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0)
Copyright (c) 2025, Felix J. Herrmann (Georgia Tech)

Motivation: Why do we need Fisher-Informed Training?

  • Especially when solving inverse problems?

Preliminary: Neural Operators as surrogates for PDEs

Benchmark PDE, Laminar Flow



  • \(\mathcal{F}\): Solution Operator of the PDE
  • \(\mathcal{F}_{nn}\): Neural Operator (NO)
  • \(\mathbf{a}(x) \sim \pi\): input field drawn from distribution \(\pi\)
  • \(\mathbf{u}(x)\): output solution field

Surrogate offers faster & cheaper optimization

Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\)

  • Training requires many PDE solves
  • NOs approximate the operator \(\mathcal{F}\), not just a single solution
  • Automatic differentiation (AD) gives fast gradients via backpropagation
  • Once trained, \(\mathcal{F}_{nn}\) supports efficient gradient-based optimization in place of expensive simulations.

Preliminary: Surrogate-based inversion

Schematic Plot for Least Squares Inversion

Equation 1. (Inversion Objective: Maximum Likelihood Estimate)

\[ \mathcal{L}(\mathbf{a}) = \| \mathcal{R}\mathcal{F}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]

Equation 2. (Surrogate-based Inversion)

\[ \mathcal{L}_{nn}(\mathbf{a}) = \| \mathcal{R}\underbrace{\mathcal{F}_{nn}}_{surrogate}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]

  • \(\mathbf{d}_{obs}\): observation (sparse, noisy)
  • \(\mathbf{a}^k\): model iterate at \(k\)th iteration
  • \(\mathcal{R}\): observation operator
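
Below is a minimal sketch (not the authors' code) of how Equation 2 can be evaluated when \(\mathcal{R}\) simply samples the solution at sparse observation points; the toy identity surrogate, the obs_idx index set, and the function names are illustrative placeholders.

# Minimal sketch of Equation 2 with a masking observation operator R.
# The "surrogate" passed in stands in for a trained neural operator F_nn.
import numpy as np

def observation_operator(u, obs_idx):
    """R: restrict the full solution field u to sparse observation points."""
    return u.ravel()[obs_idx]

def surrogate_misfit(a, F_nn, d_obs, obs_idx):
    """L_nn(a) = || R F_nn(a) - d_obs ||_2^2 (Equation 2)."""
    residual = observation_operator(F_nn(a), obs_idx) - d_obs
    return float(residual @ residual)

# toy usage on a 64 x 64 field with an identity stand-in for F_nn
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
obs_idx = rng.choice(64 * 64, size=400, replace=False)         # sparse observations
d_obs = a.ravel()[obs_idx] + 0.01 * rng.standard_normal(400)   # noisy data
print(surrogate_misfit(a, lambda x: x, d_obs, obs_idx))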

Motivation: Problem of Neural Operators

Errors in the gradient lead to incorrect inversion results, even when starting in-distribution


  • \(\mathbf{a}^\ast\): the true parameter field
  • \(\mathcal{F}(\mathbf{a}^{k-1})\): the predicted observation using the current model iterate
  • \(\mathbf{g}(\mathbf{a}^{k-1})\): the gradient direction used for the update.

So, why does it fail in inversion?

  • In the Figure, we observe \(\mathcal{F}_{nn}(\mathbf{a}^{k-1}) \approx \mathcal{F}(\mathbf{a}^{k-1})\)
  • But the gradients at \(\mathbf{a}^{k-1}\) look very different
    • This is due to \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)
  • As a result, we see a very different model update \(\mathbf{a}^{k}\).

Motivation: Analyzing the Problem of Neural Operators

Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\) but \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)

Equation 3. (Gradient-descent update with surrogate)

\[\begin{align} \mathbf{a}^{k} &= \mathbf{a}^{k-1} - \eta \: \mathbf{g}_{nn}(\mathbf{a}^{k-1}) \\ \mathbf{g}_{nn}(\mathbf{a}^{k-1}) &= \mathbf{J}_{\mathcal{F}_{nn}}^\top \mathcal{R}^\top \left( \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}^{k-1}) - \mathbf{d}_{obs} \right) \end{align}\]
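
A hedged sketch of one such update step: the adjoint action \(\mathbf{J}_{\mathcal{F}_{nn}}^\top \mathcal{R}^\top \mathbf{r}\) is obtained by backpropagation through the surrogate. The tiny stand-in surrogate and all names below are placeholders, not the actual FNO code.

import torch

def surrogate_step(F_nn, a, d_obs, obs_idx, eta):
    """One gradient-descent update through the surrogate (Equation 3)."""
    a = a.detach().requires_grad_(True)
    residual = F_nn(a).flatten()[obs_idx] - d_obs     # R F_nn(a) - d_obs
    loss = (residual ** 2).sum()                      # Equation 2
    g_nn, = torch.autograd.grad(loss, a)              # backprop = VJP, i.e. J^T R^T r
    return (a - eta * g_nn).detach()                  # (factor 2 absorbed into eta)

# toy usage with a stand-in surrogate on a 64 x 64 field
torch.manual_seed(0)
F_nn = lambda x: torch.tanh(x)                        # placeholder for a trained FNO
a = torch.randn(64, 64)
obs_idx = torch.randperm(64 * 64)[:400]
d_obs = torch.randn(400)
a_next = surrogate_step(F_nn, a, d_obs, obs_idx, eta=0.05)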


Consequence in surrogate-assisted inversion

  • If \(\mathbf{g}_{nn}\) is wrong,
    • the optimization path drifts or diverges,
    • or converges to physically meaningless minima
  • MSE-FNO: FNO trained with MSE loss function
  • FINO: FNO trained with Fisher-informed training algorithm


Consequence in surrogate-based inversion without learning \(\mathbf{J}_{\mathcal{F}}\)

Motivation: The role of training algorithm



To build a NO for reliable inversion 1 2

For a reliable surrogate-based inversion, we need Neural Operator to be

  • \(\mathcal{F}_{nn} \approx \mathcal{F}\)
  • \(\mathbf{J}_{\mathcal{F}_{nn}} \approx \mathbf{J}_{\mathcal{F}}\)



Bottleneck: Standard training algorithm only teaches \(\mathcal{F}_{nn} \approx \mathcal{F}\)

\[ \min_\theta \| \mathcal{F}(\mathbf{a}) - \mathcal{F}_{nn}(\mathbf{a}; \theta)\|^2 \]

  • Optimizes only forward accuracy
  • The minimizer \(\mathcal{F}_{nn}^\ast\) may satisfy \(\mathcal{F}_{nn}(\mathbf{a}) \approx \mathcal{F}(\mathbf{a})\)
  • while still having \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)
  1. O’Leary-Roseberry, Thomas, et al. “Derivative-informed neural operator: an efficient framework for high-dimensional parametric derivative learning.” Journal of Computational Physics 496 (2024): 112555.

  2. Park, Jeongjin, Nicole Yang, and Nisha Chandramoorthy. “When are dynamical systems learned from time series data statistically accurate?.” Advances in Neural Information Processing Systems 37 (2024): 43975-44008.

Objective: Neural Operator for Reliable Inversion



Our Solution

  1. To reduce the computational cost, we regularize training with \(\| \mathbf{J}_{\mathcal{F}_{nn}} \mathbf{v}_i - \mathbf{J}_{\mathcal{F}} \mathbf{v}_i \|\) instead of \(\| \mathbf{J}_{\mathcal{F}_{nn}} - \mathbf{J}_{\mathcal{F}}\|\)
  2. For \(\mathbf{v}_i\), we select the directions along which the observations are most informative about the parameter \(\mathbf{a}\).



But which direction to align with?

We want to encourage the surrogate’s Jacobian, \(\mathbf{J}_{\mathcal{F}_{nn}}\), to align with the true PDE’s Jacobian \(\mathbf{J}_{\mathcal{F}}\),

  • in directions that matter most for inference.
  • That is, directions where the data are most informative about the parameters.

Objective: When prior information is poor, rely on the likelihood

During inversion, we don’t need the full \(\mathbf{J}_\mathcal{F}\). We only need to know \(\mathbf{J}_{\mathcal{F}}\) in certain directions.

Fisher Information 1

For parameters \(\mathbf{a}\) and observation \(\mathbf{y}\) in a probabilistic model \(p(\mathbf{y} \mid \mathbf{a})\), the Fisher information matrix is

\[\mathcal{I}(\mathbf{a}) = \mathbf{E}_{\mathbf{y} \mid \mathbf{a}} [ \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a})^\top]\]
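
For the additive Gaussian observation model used throughout this work, the Fisher matrix has a closed form (a standard derivation, included here because it is quoted later in the Method section):

\[\mathbf{y} = \mathcal{R}\,\mathcal{F}(\mathbf{a}) + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \Sigma_y) \;\;\Rightarrow\;\; \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1} \big( \mathbf{y} - \mathcal{R}\,\mathcal{F}(\mathbf{a}) \big),\]

\[\mathcal{I}(\mathbf{a}) = \mathbf{E}_{\mathbf{y}\mid\mathbf{a}} \big[ \nabla_\mathbf{a} \log p \, \nabla_\mathbf{a} \log p^\top \big] = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1}\, \mathbf{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^\top]\, \Sigma_y^{-1}\, \mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}) = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1}\, \mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}).\]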

When is it useful?

  • Particularly useful when prior information is poor
  • By training \(\mathcal{F}_{nn}\) along Fisher directions, the eigenvectors of the Fisher information matrix, we tell \(\mathcal{F}_{nn}\) to
    • learn gradients correctly,
    • in the directions where the observations change the most as the parameters change

Schematic Plot of Fisher Information
  1. Fisher, Ronald A. “On the mathematical foundations of theoretical statistics.” Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 222.594-604 (1922): 309-368.

Objective: Neural Operator for Reliable Inversion

The limitation of standard training is the lack of gradient alignment. Our algorithm fixes that.

Our Solution

Equation 4. (Loss Function in Fisher-Informed Training)

\[\mathcal{L}_{\text{FINO}} = \underbrace{\| \mathcal{F} - \mathcal{F}_{nn}\|^2}_{\text{misfit in solution}} + \lambda \underbrace{ \| \mathbf{J}_{\mathcal{F}_{nn}} \mathbf{v}_i - \mathbf{J}_{\mathcal{F}} \mathbf{v}_i \|^2 }_\text{gradient alignment}\]

where \(\mathbf{v}_i\) are eigenvectors of the Fisher information matrix, i.e., the principal directions of the likelihood curvature.
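
A minimal sketch of how this composite loss can be written with automatic differentiation. It assumes each training sample carries the simulator output \(\mathcal{F}(\mathbf{a})\) and precomputed Jacobian actions \(\mathbf{J}_{\mathcal{F}}\mathbf{v}_i\) along the Fisher directions; the stand-in linear model and all names are illustrative, not the released implementation.

import torch
from torch.autograd.functional import jvp

def fino_loss(model, a, u_true, v, Jv_true, lam):
    """|| F_nn(a) - F(a) ||^2 + lam * || J_{F_nn}(a) v - J_F(a) v ||^2 (Equation 4)."""
    misfit = torch.mean((model(a) - u_true) ** 2)
    # Jacobian-vector product of the surrogate along the Fisher direction v;
    # create_graph=True keeps it differentiable w.r.t. the network weights.
    _, Jv_pred = jvp(model, a, v, create_graph=True)
    alignment = torch.mean((Jv_pred - Jv_true) ** 2)
    return misfit + lam * alignment

# toy usage with a stand-in linear "operator" on a flattened 64 x 64 field
model = torch.nn.Linear(64 * 64, 64 * 64)
a, v = torch.randn(64 * 64), torch.randn(64 * 64)             # input field, Fisher direction
u_true, Jv_true = torch.randn(64 * 64), torch.randn(64 * 64)  # simulator output and JVP target
fino_loss(model, a, u_true, v, Jv_true, lam=0.1).backward()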



Sensitivity alignment (cosine similarity)

Values near 1 indicate the surrogate remains in-distribution and produces physically meaningful updates.

Method

Dataset Creation: Sketching Fisher Direction

  • Bottleneck: Computing \(\mathcal{I}(\mathbf{a})\in\mathbb{R}^{p\times p}\) is infeasible in high dimensions.
    • \(p = d^2\), where \(d \times d\) is the grid resolution of our PDE problem
  • Eigenvectors of the FIM: Under a Gaussian likelihood, \[\mathcal{I}(\mathbf{a})=\mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\,\Sigma_y^{-1}\,\mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}),\]
    • The left singular vectors of \(\mathcal{I}^{1/2}(\mathbf{a})\) are the eigenvectors of \(\mathcal{I}(\mathbf{a})\).
    • The relevant subspace for inversion is the range of \(\mathcal{I}^{1/2}(\mathbf{a}) = \mathbf{J}^\top_{R \circ \mathcal{F}}(\mathbf{a}) \Sigma_y^{-1/2}\), not the full \(\mathcal{I}(\mathbf{a})\).

Step 1 — Sketch local Fisher subspace 1

Pseudocode

for j = 1..M:                  # draw M parameter samples from the prior
  sample a^(j) ~ N(a0, Σ_pr)
  for k = 1..r:                # r adjoint probes per sample, r << p
    ε ~ N(0, Σ_y)              # observation-space noise draw
    v = R^T Σ_y^{-1/2} ε       # lift whitened noise back to the solution grid
    q = J_F(a^(j))^T v         # VJP via AD pullback (one adjoint computation)
    append q to Q̃              # columns of Q̃ sketch the range of I^{1/2}
return Q̃
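
A runnable counterpart of the pseudocode above, under simplifying assumptions: the simulator adjoint is exposed as a user-supplied function vjp(a, w) = \(\mathbf{J}_{\mathcal{F}}(\mathbf{a})^\top \mathbf{w}\), the noise covariance is diagonal (\(\Sigma_y = \sigma^2 I\)), and \(\mathcal{R}\) selects the entries in obs_idx. The toy identity "simulator" exists only so the sketch runs; none of the names come from the actual code.

import numpy as np

def sketch_fisher_subspace(vjp, prior_sample, obs_idx, sigma, p, M, r, seed=0):
    """Collect q = J_F(a)^T R^T Σ_y^{-1/2} ε over prior samples and random probes."""
    rng = np.random.default_rng(seed)
    cols = []
    for _ in range(M):                                  # prior samples a^(j) ~ N(a0, Σ_pr)
        a = prior_sample(rng)
        for _ in range(r):                              # r adjoint probes per sample, r << p
            eps = sigma * rng.standard_normal(obs_idx.size)   # ε ~ N(0, Σ_y)
            v = np.zeros(p)
            v[obs_idx] = eps / sigma                    # v = R^T Σ_y^{-1/2} ε
            cols.append(vjp(a, v))                      # q = J_F(a)^T v via AD pullback
    Q = np.column_stack(cols)                           # sketch of range(I^{1/2})
    return np.linalg.qr(Q)[0]                           # orthonormal basis (Halko et al.)

# toy usage on a 32 x 32 grid with an identity stand-in for the simulator adjoint
p = 32 * 32
obs_idx = np.random.default_rng(1).choice(p, size=20, replace=False)
basis = sketch_fisher_subspace(vjp=lambda a, v: v,
                               prior_sample=lambda rng: rng.standard_normal(p),
                               obs_idx=obs_idx, sigma=0.01, p=p, M=2, r=5)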

Cost Reduction

  • Only \(r\) adjoint computations; in practice, \(r \ll p\)
  • Each adjoint computation uses only a subset of the data misfit, 1-2 % of the field.
  • Overall complexity is greatly reduced
  1. Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.” SIAM review 53.2 (2011): 217-288.

Dataset Creation: Global Fisher Direction

Step 2 — Obtaining one global Fisher Direction

To reduce computation time even more,

\[\bar{\mathcal{I}} \approx \mathbb{E}_{a \sim \pi}[\mathcal{I}(a)],\]

and then compute the left singular vectors of \(\bar{\mathcal{I}}^{1/2}\)


Outcome

A single low-rank basis capturing globally informative directions for training & inversion.


The \(\mathbf{v}_i\) are the computed global Fisher directions
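
One plausible way to realize this step in code (an illustration, not the released implementation): pool the per-sample sketches from Step 1 and take their leading left singular vectors as the global basis.

import numpy as np

def global_fisher_directions(sketches, n_directions):
    """sketches: list of (p, r) arrays from Step 1; returns a p x n_directions basis."""
    Q_all = np.hstack(sketches)                         # pool local sketches over a ~ π
    U, _, _ = np.linalg.svd(Q_all, full_matrices=False)
    return U[:, :n_directions]                          # global Fisher directions v_i

# toy usage: three local sketches of rank 5 on a 1024-dimensional parameter space
V = global_fisher_directions([np.random.default_rng(j).standard_normal((1024, 5))
                              for j in range(3)], n_directions=4)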

Numerical Experiment

Experiment Setup (inversion)

  • Neural Network: Fourier Neural Operator (FNO) 1
                                       Laminar Flow   Darcy
  Resolution                           64 × 64        128 × 128
  # of Fisher Eigenvectors             \(r = 400\)    \(r = 200\)
  Fraction of Obs. Space               9.7 %          1.9 %
  # of Iterations (during inversion)   2500           100
  Step Size                            0.05           0.8
  • MSE-FNO: FNO trained with standard training algorithm, MSE loss function
  • FINO: FNO trained with Fisher-Informed training

For all inversion experiments,

  • we want to assess how much the inversion improves with Fisher-informed training.
  • Thus, we train MSE-FNO and FINO to the same test loss (MSE metric).

\[\nabla_\mathbf{a} \mathcal{L}_{nn}(\mathbf{a}) = \underbrace{\mathbf{J}_{\mathcal{F}_{nn}}(\mathbf{a})^\top \mathcal{R}^\top}_\text{the part we want to evaluate} \underbrace{(\mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs})}_\text{similar for both surrogates}\]

  • MSE-FNO and FINO’s Test Loss (in relative L2)
    1. Laminar Flow: \(0.05\)
    2. Darcy: \(0.009\)

This way, we isolate the impact of Fisher-informed training.

  1. Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. Fourier neural operator for parametric partial differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

Objective: Physically-Correct Surrogate-assisted Inversion

Objective 1: Replace \(\mathcal{F}\) with \(\mathcal{F}_{nn}\) in the least-squares objective

\[\hat{\mathbf{a}} = \arg\min_\mathbf{a} \| \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2\]

Objective 2: Validating the physical accuracy of the inversion result

Inversion Scenario 1: Laminar Flow

Experiment Setup

Objective 1: Inversion result

Inversion Result, \(\mathbf{a}\) after 2500 iterations
  • Note that the forward predictions are almost the same.
  • Error in the gradient causes the inaccuracy in MSE-FNO's recovered \(\mathbf{a}\).

Objective 1: Model Error in Relative H1


Model Error in Sobolev norm

Relative \(H^1\): \[ \frac{\sqrt{ \| \mathbf{a} - \mathbf{a}^\ast \|_{L^2}^2 + \|\nabla \mathbf{a} - \nabla \mathbf{a}^\ast \|_{L^2}^2 }}{\sqrt{ \| \mathbf{a}^\ast\|_{L^2}^2 + \| \nabla \mathbf{a}^\ast \|_{L^2}^2 }}. \]

  • measures error in values and spatial gradients of the recovered field
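
A small numerical sketch of this metric, using finite-difference gradients on a uniform grid (grid spacing and quadrature weights are simplified assumptions):

import numpy as np

def relative_h1(a, a_star, dx=1.0):
    """Relative H^1 error between a recovered 2-D field a and the true field a_star."""
    ga, gs = np.gradient(a, dx), np.gradient(a_star, dx)        # finite-difference gradients
    num = np.sum((a - a_star) ** 2) + sum(np.sum((u - w) ** 2) for u, w in zip(ga, gs))
    den = np.sum(a_star ** 2) + sum(np.sum(w ** 2) for w in gs)
    return float(np.sqrt(num / den))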

Objective 2: Validating the physical accuracy of inversion results

By assessing the accuracy of the gradient \(\nabla_\mathbf{a} \mathcal{L}\) at each iteration, we can evaluate whether \(\mathcal{F}_{nn}\) is suitable for optimization/inversion

  • \(\mathbf{g}_{ns}\) (numerical simulator) vs. \(\mathbf{g}_{nn}\) (surrogate) at the surrogate's model iterate
    • in-distribution vs out-of-distribution

Objective 2: Error of \(\mathbf{g}_{nn}\) at surrogate’s model iterate

A high sensitivity alignment (\(\approx 1\)) means the surrogate is operating in-distribution. Its gradients agree with the true simulator, and inversion remains physically consistent.

Cosine similarity of gradient

Relative L2 of gradient error
  • The break-even point is reached after approximately 70 inversion experiments.
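
The two diagnostics plotted above reduce to simple vector comparisons between the surrogate gradient \(\mathbf{g}_{nn}\) and the simulator gradient \(\mathbf{g}_{ns}\) at the same iterate; a small sketch (names illustrative):

import numpy as np

def gradient_agreement(g_nn, g_ns):
    """Cosine similarity and relative L2 error between surrogate and simulator gradients."""
    g_nn, g_ns = np.ravel(g_nn), np.ravel(g_ns)
    cos = float(g_nn @ g_ns / (np.linalg.norm(g_nn) * np.linalg.norm(g_ns)))
    rel_l2 = float(np.linalg.norm(g_nn - g_ns) / np.linalg.norm(g_ns))
    return cos, rel_l2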

Objective 2: Optimization Trajectory 1



  • How the model iterates \(\{\mathbf{a}^k\}_{k=0}^{K}\) evolve in the loss landscape.
    • That is, we project the iterates onto the directions in which the numerical simulator makes the largest parameter updates

Observation:

  1. MSE-FNO barely updates the parameters along the directions in which the numerical simulator makes its largest updates.
  2. After some iterations, MSE-FNO's update direction starts to diverge due to the inaccurate gradient.
  1. Li, Hao, et al. “Visualizing the loss landscape of neural nets.” Advances in Neural Information Processing Systems 31 (2018).

Objective 2: Inversion Result’s Physical-accuracy

Power Spectrum Density of \(\mathbf{a}\)
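
One common way to compute such a spectrum for a square 2-D field (an assumption about the exact analysis, not the authors' script) is a radially averaged periodogram:

import numpy as np

def radial_psd(a):
    """Radially averaged power spectral density of a square 2-D field a."""
    n = a.shape[0]
    psd2d = np.abs(np.fft.fftshift(np.fft.fft2(a))) ** 2 / a.size
    ky, kx = np.indices(a.shape)
    k = np.hypot(kx - n // 2, ky - n // 2).astype(int)      # radial wavenumber bins
    return np.bincount(k.ravel(), weights=psd2d.ravel()) / np.bincount(k.ravel())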

Inversion Scenario 2: Darcy



Experiment Setup

Dataset

  • Input: log permeability represented in an 8 × 8 Karhunen-Loève (K-L) basis of a Gaussian random field
    • K-L basis: Principal Component Analysis (PCA) for random fields
    • In this setting, it coincides with the Fourier basis
  • Observation: pressure field, 2 % of the data points

Optimization

  • Update the coefficients of the Fourier modes, not the log-permeability field itself (see the sketch below)
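
A hedged sketch of this parameterization: the log-permeability field is synthesized from an 8 × 8 grid of low-order mode coefficients, and gradients flow back to those coefficients during inversion. The cosine tensor-product modes and scaling below are illustrative assumptions, not the exact K-L construction.

import torch

def coeffs_to_field(c, n=128):
    """Synthesize an n x n field from an 8 x 8 grid of low-order mode coefficients."""
    x = torch.linspace(0.0, 1.0, n)
    k = torch.arange(c.shape[0], dtype=x.dtype)
    phi = torch.cos(torch.pi * k[:, None] * x[None, :])     # (8, n) 1-D modes
    return torch.einsum('kl,kx,ly->xy', c, phi, phi)        # tensor-product basis

# optimization updates the 8 x 8 coefficients, not the 128 x 128 field itself
c = torch.zeros(8, 8, requires_grad=True)
a = coeffs_to_field(c)        # gradients of any misfit in a flow back to c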

Recovered Physical Parameter, Permeability

Darcy inversion Result

Conclusion

Takeaway

With Fisher-informed training, we turn the Neural Operator into a reliable tool for gradient-based inversion.

Key Insights

  • Fisher-informed training adds a gradient-alignment term along the directions in which the data are most informative about the parameters.

  • We’ve answered two questions through numerical experiments:

    1. FINO can reliably replace \(\mathcal{F}\) in the least-squares inversion
       • by comparing recovered models
       • model error in \(H^1\)
    2. FINO yields recovered models that are more physically accurate
       • cosine similarity of gradients
       • optimization trajectory
       • spectral analysis of the inversion result

Acknowledgement

This research was carried out with the support of Georgia Research Alliance and partners of the ML4Seismic Center.