Fisher-Informed Training of Neural Operators for Reliable PDE Inversion

Jeongjin (Jayjay) Park Grant Bruer Huseyin Tuna Erdinc Richard Rex Nisha Chandramoorthy Felix Herrmann

Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0)
Copyright (c) 2025, Felix J. Herrmann (Georgia Tech)

Motivation: Why do we need Fisher-Informed Training?

  • Especially when solving inverse problems?

Preliminary: Neural Operators as surrogates for PDEs

Benchmark PDE, Laminar Flow



  • \(\mathcal{F}\): Solution Operator of the PDE
  • \(\mathcal{F}_{nn}\): Neural Operator (NO)
  • \(\mathbf{a}(x) \sim \pi\): input field drawn from distribution \(\pi\)
  • \(\mathbf{u}(x)\): output solution field

Surrogate offers faster & cheaper optimization

Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\)

  • Training requires many PDE solves
  • NOs approximate the operator \(\mathcal{F}\), not just a single solution
  • Automatic differentiation (AD) gives fast gradients via backpropagation
  • Once trained, \(\mathcal{F}_{nn}\) supports efficient gradient-based optimization in place of expensive simulations.

Preliminary: Surrogate-based inversion

Schematic Plot for Least Squares Inversion

Equation 1. (Inversion Objective: Maximum Likelihood Estimate)

\[ \mathcal{L}(\mathbf{a}) = \| \mathcal{R}\mathcal{F}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]

Equation 2. (Surrogate-based Inversion)

\[ \mathcal{L}_{nn}(\mathbf{a}) = \| \mathcal{R}\underbrace{\mathcal{F}_{nn}}_{surrogate}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]

  • \(\mathbf{d}_{obs}\): observation (sparse, noisy)
  • \(\mathbf{a}^k\): model iterate at \(k\)th iteration
  • \(\mathcal{R}\): observation operator
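
Below is a minimal sketch (not the authors' code) of how Equation 2 can be evaluated when \(\mathcal{R}\) simply samples the solution at sparse observation points; the toy identity surrogate, the obs_idx index set, and the function names are illustrative placeholders.

# Minimal sketch of Equation 2 with a masking observation operator R.
# The "surrogate" passed in stands in for a trained neural operator F_nn.
import numpy as np

def observation_operator(u, obs_idx):
    """R: restrict the full solution field u to sparse observation points."""
    return u.ravel()[obs_idx]

def surrogate_misfit(a, F_nn, d_obs, obs_idx):
    """L_nn(a) = || R F_nn(a) - d_obs ||_2^2 (Equation 2)."""
    residual = observation_operator(F_nn(a), obs_idx) - d_obs
    return float(residual @ residual)

# toy usage on a 64 x 64 field with an identity stand-in for F_nn
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
obs_idx = rng.choice(64 * 64, size=400, replace=False)         # sparse observations
d_obs = a.ravel()[obs_idx] + 0.01 * rng.standard_normal(400)   # noisy data
print(surrogate_misfit(a, lambda x: x, d_obs, obs_idx))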

Motivation: Problem of Neural Operators

Errors in the gradient lead to incorrect inversion results, even when starting in-distribution


  • \(\mathbf{a}^\ast\): the true parameter field
  • \(\mathcal{F}(\mathbf{a}^{k-1})\): the predicted observation using the current model iterate
  • \(\mathbf{g}(\mathbf{a}^{k-1})\): the gradient direction used for the update.

So, why does it fail in inversion?

  • In the Figure, we observe \(\mathcal{F}_{nn}(\mathbf{a}^{k-1}) \approx \mathcal{F}(\mathbf{a}^{k-1})\)
  • But the gradients at \(\mathbf{a}^{k-1}\) look very different
    • This is due to \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)
  • As a result, we see a very different model update \(\mathbf{a}^{k}\).

Motivation: Analyzing the Problem of Neural Operators

Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\) but \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)

Equation 3. (Gradient-descent update with surrogate)

\[\begin{align} \mathbf{a}^{k} &= \mathbf{a}^{k-1} - \eta \: \mathbf{g}_{nn}(\mathbf{a}^{k-1}) \\ \mathbf{g}_{nn}(\mathbf{a}^{k-1}) &= \mathbf{J}_{\mathcal{F}_{nn}}^\top \mathcal{R}^\top \left( \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}^{k-1}) - \mathbf{d}_{obs} \right) \end{align}\]
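
A hedged sketch of one such update step: the adjoint action \(\mathbf{J}_{\mathcal{F}_{nn}}^\top \mathcal{R}^\top \mathbf{r}\) is obtained by backpropagation through the surrogate. The tiny stand-in surrogate and all names below are placeholders, not the actual FNO code.

import torch

def surrogate_step(F_nn, a, d_obs, obs_idx, eta):
    """One gradient-descent update through the surrogate (Equation 3)."""
    a = a.detach().requires_grad_(True)
    residual = F_nn(a).flatten()[obs_idx] - d_obs     # R F_nn(a) - d_obs
    loss = (residual ** 2).sum()                      # Equation 2
    g_nn, = torch.autograd.grad(loss, a)              # backprop = VJP, i.e. J^T R^T r
    return (a - eta * g_nn).detach()                  # (factor 2 absorbed into eta)

# toy usage with a stand-in surrogate on a 64 x 64 field
torch.manual_seed(0)
F_nn = lambda x: torch.tanh(x)                        # placeholder for a trained FNO
a = torch.randn(64, 64)
obs_idx = torch.randperm(64 * 64)[:400]
d_obs = torch.randn(400)
a_next = surrogate_step(F_nn, a, d_obs, obs_idx, eta=0.05)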


Consequence in surrogate-assisted inversion

  • If \(\mathbf{g}_{nn}\) is wrong,
    • the optimization path drifts or diverges,
    • or converges to physically meaningless minima
  • MSE-FNO: FNO trained with MSE loss function
  • FINO: FNO trained with Fisher-informed training algorithm


Consequence in surrogate-based inversion without learning \(\mathbf{J}_{\mathcal{F}}\)

Motivation: The role of training algorithm



To build a NO for reliable inversion 1 2

For a reliable surrogate-based inversion, we need Neural Operator to be

  • \(\mathcal{F}_{nn} \approx \mathcal{F}\)
  • \(\mathbf{J}_{\mathcal{F}_{nn}} \approx \mathbf{J}_{\mathcal{F}}\)



Bottleneck: Standard training algorithm only teaches \(\mathcal{F}_{nn} \approx \mathcal{F}\)

\[ \min_\theta \| \mathcal{F}(\mathbf{a}) - \mathcal{F}_{nn}(\mathbf{a}; \theta)\|^2 \]

  • Optimizes only forward accuracy
  • The minimizer \(\mathcal{F}_{nn}^\ast\) may satisfy \(\mathcal{F}_{nn}(\mathbf{a}) \approx \mathcal{F}(\mathbf{a})\)
  • while still having \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)
  1. O’Leary-Roseberry, Thomas, et al. “Derivative-informed neural operator: an efficient framework for high-dimensional parametric derivative learning.” Journal of Computational Physics 496 (2024): 112555.

  2. Park, Jeongjin, Nicole Yang, and Nisha Chandramoorthy. “When are dynamical systems learned from time series data statistically accurate?.” Advances in Neural Information Processing Systems 37 (2024): 43975-44008.

Objective: Neural Operator for Reliable Inversion



Our Solution

  1. To reduce the computational cost, we regularize training with \(\| \mathbf{J}_{\mathcal{F}_{nn}} \mathbf{v}_i - \mathbf{J}_{\mathcal{F}} \mathbf{v}_i \|\) instead of \(\| \mathbf{J}_{\mathcal{F}_{nn}} - \mathbf{J}_{\mathcal{F}}\|\)
  2. For \(\mathbf{v}_i\), we select the directions along which the observations are most informative about the parameter \(\mathbf{a}\).



But which direction to align with?

We want to encourage the surrogate’s Jacobian, \(\mathbf{J}_{\mathcal{F}_{nn}}\), to align with the true PDE’s Jacobian \(\mathbf{J}_{\mathcal{F}}\),

  • in directions that matter most for inference.
  • That is, directions where the data are most informative about the parameters.

Objective: When prior information is poor, rely on the likelihood

During inversion, we don’t need the full \(\mathbf{J}_\mathcal{F}\). We only need to know \(\mathbf{J}_{\mathcal{F}}\) in certain directions.

Fisher Information 1

For parameters \(\mathbf{a}\) and observation \(\mathbf{y}\) in a probabilistic model \(p(\mathbf{y} \mid \mathbf{a})\), the Fisher information matrix is

\[\mathcal{I}(\mathbf{a}) = \mathbf{E}_{\mathbf{y} \mid \mathbf{a}} [ \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a})^\top]\]
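
For the additive Gaussian observation model used throughout this work, the Fisher matrix has a closed form (a standard derivation, included here because it is quoted later in the Method section):

\[\mathbf{y} = \mathcal{R}\,\mathcal{F}(\mathbf{a}) + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \Sigma_y) \;\;\Rightarrow\;\; \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1} \big( \mathbf{y} - \mathcal{R}\,\mathcal{F}(\mathbf{a}) \big),\]

\[\mathcal{I}(\mathbf{a}) = \mathbf{E}_{\mathbf{y}\mid\mathbf{a}} \big[ \nabla_\mathbf{a} \log p \, \nabla_\mathbf{a} \log p^\top \big] = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1}\, \mathbf{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^\top]\, \Sigma_y^{-1}\, \mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}) = \mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\, \Sigma_y^{-1}\, \mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}).\]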

When is it useful?

  • Particularly useful when prior information is poor
  • By training \(\mathcal{F}_{nn}\) along Fisher directions, the eigenvectors of the Fisher information matrix, we tell \(\mathcal{F}_{nn}\) to
    • learn gradients correctly,
    • in the directions where the observations change the most as the parameters change

Schematic Plot of Fisher Information
  1. Fisher, Ronald A. “On the mathematical foundations of theoretical statistics.” Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 222.594-604 (1922): 309-368.

Objective: Neural Operator for Reliable Inversion

The limitation of standard training is the lack of gradient alignment. Our algorithm fixes that.

Our Solution

Equation 4. (Loss Function in Fisher-Informed Training)

\[\mathcal{L}_{\text{FINO}} = \underbrace{\| \mathcal{F} - \mathcal{F}_{nn}\|^2}_{\text{misfit in solution}} + \lambda \underbrace{ \| \mathbf{J}_{\mathcal{F}_{nn}} \mathbf{v}_i - \mathbf{J}_{\mathcal{F}} \mathbf{v}_i \|^2 }_\text{gradient alignment}\]

where \(\mathbf{v}_i\) are eigenvectors of the Fisher information matrix, i.e., the principal directions of the likelihood curvature.
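
A minimal sketch of how this composite loss can be written with automatic differentiation. It assumes each training sample carries the simulator output \(\mathcal{F}(\mathbf{a})\) and precomputed Jacobian actions \(\mathbf{J}_{\mathcal{F}}\mathbf{v}_i\) along the Fisher directions; the stand-in linear model and all names are illustrative, not the released implementation.

import torch
from torch.autograd.functional import jvp

def fino_loss(model, a, u_true, v, Jv_true, lam):
    """|| F_nn(a) - F(a) ||^2 + lam * || J_{F_nn}(a) v - J_F(a) v ||^2 (Equation 4)."""
    misfit = torch.mean((model(a) - u_true) ** 2)
    # Jacobian-vector product of the surrogate along the Fisher direction v;
    # create_graph=True keeps it differentiable w.r.t. the network weights.
    _, Jv_pred = jvp(model, a, v, create_graph=True)
    alignment = torch.mean((Jv_pred - Jv_true) ** 2)
    return misfit + lam * alignment

# toy usage with a stand-in linear "operator" on a flattened 64 x 64 field
model = torch.nn.Linear(64 * 64, 64 * 64)
a, v = torch.randn(64 * 64), torch.randn(64 * 64)             # input field, Fisher direction
u_true, Jv_true = torch.randn(64 * 64), torch.randn(64 * 64)  # simulator output and JVP target
fino_loss(model, a, u_true, v, Jv_true, lam=0.1).backward()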



Sensitivity alignment (cosine similarity)

Values near 1 indicate the surrogate remains in-distribution and produces physically meaningful updates.

Method

Dataset Creation: Sketching Fisher Direction

  • Bottleneck: Computing \(\mathcal{I}(\mathbf{a})\in\mathbb{R}^{p\times p}\) is infeasible in high dimensions.
    • \(p = d^2\), where \(d \times d\) is the grid resolution of our PDE problem
  • Eigenvectors of the FIM: Under a Gaussian likelihood, \[\mathcal{I}(\mathbf{a})=\mathbf{J}^\top_{R\circ\mathcal{F}}(\mathbf{a})\,\Sigma_y^{-1}\,\mathbf{J}_{R\circ\mathcal{F}}(\mathbf{a}),\]
    • The left singular vectors of \(\mathcal{I}^{1/2}(\mathbf{a})\) are the eigenvectors of \(\mathcal{I}(\mathbf{a})\).
    • The relevant subspace for inversion is the range of \(\mathcal{I}^{1/2}(\mathbf{a}) = \mathbf{J}^\top_{R \circ \mathcal{F}}(\mathbf{a}) \Sigma_y^{-1/2}\), not the full \(\mathcal{I}(\mathbf{a})\).

Step 1 — Sketch local Fisher subspace 1

Pseudocode

for j = 1..M:                  # draw M parameter samples from the prior
  sample a^(j) ~ N(a0, Σ_pr)
  for k = 1..r:                # r adjoint probes per sample, r << p
    ε ~ N(0, Σ_y)              # observation-space noise draw
    v = R^T Σ_y^{-1/2} ε       # lift whitened noise back to the solution grid
    q = J_F(a^(j))^T v         # VJP via AD pullback (one adjoint computation)
    append q to Q̃              # columns of Q̃ sketch the range of I^{1/2}
return Q̃
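
A runnable counterpart of the pseudocode above, under simplifying assumptions: the simulator adjoint is exposed as a user-supplied function vjp(a, w) = \(\mathbf{J}_{\mathcal{F}}(\mathbf{a})^\top \mathbf{w}\), the noise covariance is diagonal (\(\Sigma_y = \sigma^2 I\)), and \(\mathcal{R}\) selects the entries in obs_idx. The toy identity "simulator" exists only so the sketch runs; none of the names come from the actual code.

import numpy as np

def sketch_fisher_subspace(vjp, prior_sample, obs_idx, sigma, p, M, r, seed=0):
    """Collect q = J_F(a)^T R^T Σ_y^{-1/2} ε over prior samples and random probes."""
    rng = np.random.default_rng(seed)
    cols = []
    for _ in range(M):                                  # prior samples a^(j) ~ N(a0, Σ_pr)
        a = prior_sample(rng)
        for _ in range(r):                              # r adjoint probes per sample, r << p
            eps = sigma * rng.standard_normal(obs_idx.size)   # ε ~ N(0, Σ_y)
            v = np.zeros(p)
            v[obs_idx] = eps / sigma                    # v = R^T Σ_y^{-1/2} ε
            cols.append(vjp(a, v))                      # q = J_F(a)^T v via AD pullback
    Q = np.column_stack(cols)                           # sketch of range(I^{1/2})
    return np.linalg.qr(Q)[0]                           # orthonormal basis (Halko et al.)

# toy usage on a 32 x 32 grid with an identity stand-in for the simulator adjoint
p = 32 * 32
obs_idx = np.random.default_rng(1).choice(p, size=20, replace=False)
basis = sketch_fisher_subspace(vjp=lambda a, v: v,
                               prior_sample=lambda rng: rng.standard_normal(p),
                               obs_idx=obs_idx, sigma=0.01, p=p, M=2, r=5)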

Cost Reduction

  • Only \(r\) adjoint computations; in practice, \(r \ll p\)
  • Each adjoint computation uses only a subset of the data misfit, 1-2 % of the field.
  • Overall complexity is greatly reduced
  1. Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.” SIAM review 53.2 (2011): 217-288.

Dataset Creation: Global Fisher Direction

Step 2 — Obtaining one global Fisher Direction

To reduce computation time even more,

\[\bar{\mathcal{I}} \approx \mathbb{E}_{a \sim \pi}[\mathcal{I}(a)],\]

and then compute the left singular vectors of \(\bar{\mathcal{I}}^{1/2}\)


Outcome

A single low-rank basis capturing globally informative directions for training & inversion.


The \(\mathbf{v}_i\) are the computed global Fisher directions
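
One plausible way to realize this step in code (an illustration, not the released implementation): pool the per-sample sketches from Step 1 and take their leading left singular vectors as the global basis.

import numpy as np

def global_fisher_directions(sketches, n_directions):
    """sketches: list of (p, r) arrays from Step 1; returns a p x n_directions basis."""
    Q_all = np.hstack(sketches)                         # pool local sketches over a ~ π
    U, _, _ = np.linalg.svd(Q_all, full_matrices=False)
    return U[:, :n_directions]                          # global Fisher directions v_i

# toy usage: three local sketches of rank 5 on a 1024-dimensional parameter space
V = global_fisher_directions([np.random.default_rng(j).standard_normal((1024, 5))
                              for j in range(3)], n_directions=4)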

Numerical Experiment

Experiment Setup (inversion)

  • Neural Network: Fourier Neural Operator (FNO) 1
                                       Laminar Flow   Darcy
  Resolution                           64 × 64        128 × 128
  # of Fisher Eigenvectors             \(r = 400\)    \(r = 200\)
  Fraction of Obs. Space               9.7 %          1.9 %
  # of Iterations (during inversion)   2500           100
  Step Size                            0.05           0.8
  • MSE-FNO: FNO trained with standard training algorithm, MSE loss function
  • FINO: FNO trained with Fisher-Informed training

For all inversion experiments,

  • we want to assess how much the inversion improves with Fisher-informed training.
  • Thus, we train MSE-FNO and FINO to the same test loss (MSE metric).

\[\nabla_\mathbf{a} \mathcal{L}_{nn}(\mathbf{a}) = \underbrace{\mathbf{J}_{\mathcal{F}_{nn}}(\mathbf{a})^\top \mathcal{R}^\top}_\text{the part we want to evaluate} \underbrace{(\mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs})}_\text{similar for both surrogates}\]

  • MSE-FNO and FINO’s Test Loss (in relative L2)
    1. Laminar Flow: \(0.05\)
    2. Darcy: \(0.009\)

This way, we isolate the impact of Fisher-informed training.

  1. Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. Fourier neural operator for parametric partial differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

Objective: Physically-Correct Surrogate-assisted Inversion

Objective 1: Replace \(\mathcal{F}\) with \(\mathcal{F}_{nn}\) in the least-squares objective

\[\hat{\mathbf{a}} = \arg\min_\mathbf{a} \| \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2\]

Objective 2: Validating the physical accuracy of the inversion result

Inversion Scenario 1: Laminar Flow

Experiment Setup

Objective 1: Inversion result

Inversion Result, \(\mathbf{a}\) after 2500 iterations
  • Note that the forward predictions are almost the same.
  • Error in the gradient causes the inaccuracy in MSE-FNO's recovered \(\mathbf{a}\).

Objective 1: Model Error in Relative H1


Model Error in Sobolev norm

Relative \(H^1\): \[ \frac{\sqrt{ \| \mathbf{a} - \mathbf{a}^\ast \|_{L^2}^2 + \|\nabla \mathbf{a} - \nabla \mathbf{a}^\ast \|_{L^2}^2 }}{\sqrt{ \| \mathbf{a}^\ast\|_{L^2}^2 + \| \nabla \mathbf{a}^\ast \|_{L^2}^2 }}. \]

  • measures error in values and spatial gradients of the recovered field
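
A small numerical sketch of this metric, using finite-difference gradients on a uniform grid (grid spacing and quadrature weights are simplified assumptions):

import numpy as np

def relative_h1(a, a_star, dx=1.0):
    """Relative H^1 error between a recovered 2-D field a and the true field a_star."""
    ga, gs = np.gradient(a, dx), np.gradient(a_star, dx)        # finite-difference gradients
    num = np.sum((a - a_star) ** 2) + sum(np.sum((u - w) ** 2) for u, w in zip(ga, gs))
    den = np.sum(a_star ** 2) + sum(np.sum(w ** 2) for w in gs)
    return float(np.sqrt(num / den))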

Objective 2: Validating the physical accuracy of inversion results

By assessing the accuracy of the gradient \(\nabla_\mathbf{a} \mathcal{L}\) at each iteration, we can evaluate whether \(\mathcal{F}_{nn}\) is suitable for optimization/inversion

  • \(\mathbf{g}_{ns}\) (numerical simulator) vs. \(\mathbf{g}_{nn}\) (surrogate) at the surrogate's model iterate
    • in-distribution vs out-of-distribution

Objective 2: Error of \(\mathbf{g}_{nn}\) at surrogate’s model iterate

A high sensitivity alignment (\(\approx 1\)) means the surrogate is operating in-distribution. Its gradients agree with the true simulator, and inversion remains physically consistent.

Cosine similarity of gradient

Relative L2 of gradient error
  • The break-even point is reached after approximately 70 inversion experiments.
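
The two diagnostics plotted above reduce to simple vector comparisons between the surrogate gradient \(\mathbf{g}_{nn}\) and the simulator gradient \(\mathbf{g}_{ns}\) at the same iterate; a small sketch (names illustrative):

import numpy as np

def gradient_agreement(g_nn, g_ns):
    """Cosine similarity and relative L2 error between surrogate and simulator gradients."""
    g_nn, g_ns = np.ravel(g_nn), np.ravel(g_ns)
    cos = float(g_nn @ g_ns / (np.linalg.norm(g_nn) * np.linalg.norm(g_ns)))
    rel_l2 = float(np.linalg.norm(g_nn - g_ns) / np.linalg.norm(g_ns))
    return cos, rel_l2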

Objective 2: Optimization Trajectory 1



  • How the model iterates \(\{\mathbf{a}^k\}_{k=0}^{K}\) evolve in the loss landscape.
    • That is, we project the iterates onto the directions in which the numerical simulator makes the largest parameter updates

Observation:

  1. MSE-FNO barely updates the parameters along the directions in which the numerical simulator makes its largest updates.
  2. After some iterations, MSE-FNO's update direction starts to diverge due to the inaccurate gradient.
  1. Li, Hao, et al. “Visualizing the loss landscape of neural nets.” Advances in Neural Information Processing Systems 31 (2018).

Objective 2: Inversion Result’s Physical-accuracy

Power Spectrum Density of \(\mathbf{a}\)
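
One common way to compute such a spectrum for a square 2-D field (an assumption about the exact analysis, not the authors' script) is a radially averaged periodogram:

import numpy as np

def radial_psd(a):
    """Radially averaged power spectral density of a square 2-D field a."""
    n = a.shape[0]
    psd2d = np.abs(np.fft.fftshift(np.fft.fft2(a))) ** 2 / a.size
    ky, kx = np.indices(a.shape)
    k = np.hypot(kx - n // 2, ky - n // 2).astype(int)      # radial wavenumber bins
    return np.bincount(k.ravel(), weights=psd2d.ravel()) / np.bincount(k.ravel())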

Inversion Scenario 2: Darcy



Experiment Setup

Dataset

  • Input: log permeability represented in an 8 × 8 Karhunen-Loève (K-L) basis of a Gaussian random field
    • K-L basis: Principal Component Analysis (PCA) for random fields
    • In this setting, it coincides with the Fourier basis
  • Observation: pressure field, 2 % of the data points

Optimization

  • Update the coefficients of the Fourier modes, not the log-permeability field itself (see the sketch below)
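
A hedged sketch of this parameterization: the log-permeability field is synthesized from an 8 × 8 grid of low-order mode coefficients, and gradients flow back to those coefficients during inversion. The cosine tensor-product modes and scaling below are illustrative assumptions, not the exact K-L construction.

import torch

def coeffs_to_field(c, n=128):
    """Synthesize an n x n field from an 8 x 8 grid of low-order mode coefficients."""
    x = torch.linspace(0.0, 1.0, n)
    k = torch.arange(c.shape[0], dtype=x.dtype)
    phi = torch.cos(torch.pi * k[:, None] * x[None, :])     # (8, n) 1-D modes
    return torch.einsum('kl,kx,ly->xy', c, phi, phi)        # tensor-product basis

# optimization updates the 8 x 8 coefficients, not the 128 x 128 field itself
c = torch.zeros(8, 8, requires_grad=True)
a = coeffs_to_field(c)        # gradients of any misfit in a flow back to c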

Recovered Physical Parameter, Permeability

Darcy inversion Result

Conclusion

Takeaway

With Fisher-informed training, we turn the Neural Operator into a reliable tool for gradient-based inversion.

Key Insights

  • Fisher-informed training adds a gradient-alignment term along the directions in which the data are most informative about the parameters.

  • We’ve answered two questions through numerical experiments:

    1. FINO can reliably replace \(\mathcal{F}\) in the least-squares inversion
       • by comparing recovered models
       • model error in \(H^1\)
    2. FINO yields recovered models that are more physically accurate
       • cosine similarity of gradients
       • optimization trajectory
       • spectral analysis of the inversion result

Acknowledgement

This research was carried out with the support of Georgia Research Alliance and partners of the ML4Seismic Center.