| Jeongjin (Jayjay) Park | Grant Bruer | Huseyin Tuna Erdinc | Richard Rex | Nisha Chandramoorthy | Felix Herrmann |
Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0)
Copyright (c) 2025, Felix J. Herrmann (Georgia Tech)
Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\)
\[ \mathcal{L}(\mathbf{a}) = \| \mathcal{R}\mathcal{F}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]
\[ \mathcal{L}_{nn}(\mathbf{a}) = \| \mathcal{R}\underbrace{\mathcal{F}_{nn}}_{surrogate}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2 \]
Key Insight: \(\mathcal{F}_{nn} \approx \mathcal{F}\) but \(\mathbf{J}_{\mathcal{F}_{nn}} \neq \mathbf{J}_{\mathcal{F}}\)
\[\begin{align} \mathbf{a}^{k} &= \mathbf{a}^{k-1} - \eta \: \mathbf{g}_{nn}(\mathbf{a}^{k-1}) \\ \mathbf{g}_{nn}(\mathbf{a}^{k-1}) &= \mathbf{J}_{\mathcal{F}_{nn}}^\top \mathcal{R}^\top \left( \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}^{k-1}) - \mathbf{d}_{obs} \right) \end{align}\]
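A minimal sketch of this loop in PyTorch; `F_nn`, `R`, `d_obs`, and `a0` are placeholders for the trained surrogate, observation operator, observed data, and starting model, and the defaults echo the laminar-flow settings in the experiment table below:

```python
import torch

def surrogate_inversion(F_nn, R, d_obs, a0, eta=0.05, n_iters=2500):
    """Gradient descent on the surrogate data misfit.

    F_nn: trained neural operator mapping parameters a to the PDE solution.
    R:    observation operator restricting the solution to measurements.
    """
    a = a0.clone().requires_grad_(True)
    for _ in range(n_iters):
        residual = R(F_nn(a)) - d_obs              # R F_nn(a) - d_obs
        loss = 0.5 * residual.pow(2).sum()         # surrogate data misfit
        (g_nn,) = torch.autograd.grad(loss, a)     # g_nn = J^T R^T residual
        with torch.no_grad():
            a -= eta * g_nn                        # a^k = a^{k-1} - eta g_nn
    return a.detach()
```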
MSE-FNO: FNO trained with the MSE loss function. FINO: FNO trained with our Fisher-informed training algorithm. For reliable surrogate-based inversion, the neural operator must be accurate not only in its solutions but also in its derivatives. Standard training only minimizes the solution misfit:
\[ \min_\theta \| \mathcal{F}(\mathbf{a}) - \mathcal{F}_{nn}(\mathbf{a}; \theta)\|^2 \]
We want to encourage the surrogate's Jacobian, \(\mathbf{J}_{\mathcal{F}_{nn}}\), to align with the true PDE's Jacobian \(\mathbf{J}_{\mathcal{F}}\). During inversion, however, we do not need the full \(\mathbf{J}_\mathcal{F}\); we only need its action along certain directions.
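Concretely, the Jacobian's action on a single direction \(\mathbf{v}\) is a directional derivative, available at the cost of one linearized solve (or one forward-mode AD pass) without ever forming \(\mathbf{J}_\mathcal{F}\):

\[ \mathbf{J}_{\mathcal{F}}(\mathbf{a})\,\mathbf{v} = \lim_{h \to 0} \frac{\mathcal{F}(\mathbf{a} + h\,\mathbf{v}) - \mathcal{F}(\mathbf{a})}{h}. \]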
For parameters \(\mathbf{a}\) and observations \(\mathbf{y}\) in a probabilistic model \(p(\mathbf{y} \mid \mathbf{a})\), the Fisher information matrix is
\[\mathcal{I}(\mathbf{a}) = \mathbf{E}_{\mathbf{y} \mid \mathbf{a}} [ \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a})^\top]\]
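For the Gaussian observation model implied by the sampling scheme below, \(\mathbf{y} = \mathcal{R}\,\mathcal{F}(\mathbf{a}) + \boldsymbol{\varepsilon}\) with \(\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \Sigma_y)\), the score and the Fisher information take closed forms:

\[ \nabla_\mathbf{a} \log p(\mathbf{y} \mid \mathbf{a}) = \mathbf{J}_{\mathcal{F}}^\top \mathcal{R}^\top \Sigma_y^{-1} \left( \mathbf{y} - \mathcal{R}\,\mathcal{F}(\mathbf{a}) \right), \qquad \mathcal{I}(\mathbf{a}) = \mathbf{J}_{\mathcal{F}}^\top \mathcal{R}^\top \Sigma_y^{-1} \mathcal{R}\, \mathbf{J}_{\mathcal{F}}. \]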
The limitation of standard training is the lack of gradient alignment. Our algorithm fixes that.
\[\mathcal{L}_{\text{FINO}} = \underbrace{\| \mathcal{F} - \mathcal{F}_{nn}\|^2}_{\text{misfit in solution}} + \lambda \sum_{i=1}^{r} \underbrace{ \| \mathbf{J}_{\mathcal{F}_{nn}} \mathbf{v}_i - \mathbf{J}_{\mathcal{F}} \mathbf{v}_i \|^2 }_\text{gradient alignment}\]
where \(\mathbf{v}_i\) are the eigenvectors of the Fisher information matrix, i.e., the principal directions of the likelihood curvature.
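A sketch of this objective in PyTorch, assuming the simulator solutions `u_true` and directional derivatives `Jv_true` (one per eigenvector \(\mathbf{v}_i\)) are precomputed, and that `F_nn` supports forward-mode AD via `torch.func.jvp`:

```python
import torch
from torch.func import jvp

def fino_loss(F_nn, a, u_true, V, Jv_true, lam=1.0):
    """FINO objective: solution misfit plus gradient alignment.

    V:       Fisher eigenvectors v_i (stacked along dim 0).
    Jv_true: precomputed simulator directional derivatives J_F(a) v_i.
    """
    misfit = (F_nn(a) - u_true).pow(2).mean()
    align = 0.0
    for v_i, Jv_i in zip(V, Jv_true):
        # forward-mode directional derivative J_{F_nn}(a) v_i
        _, Jv_pred = jvp(F_nn, (a,), (v_i,))
        align = align + (Jv_pred - Jv_i).pow(2).mean()
    return misfit + lam * align
```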
Sensitivity-alignment values near 1 indicate the surrogate remains in-distribution and produces physically meaningful updates.
Pseudocode
Q̃ ← [ ]
for j = 1..M:
    sample a^(j) ~ N(a0, Σ_pr)       # draw a parameter model from the prior
    for k = 1..r:
        ε ~ N(0, I)                  # so Σ_y^{-1/2} ε has covariance Σ_y^{-1}
        v = R^T Σ_y^{-1/2} ε         # pull noise back through the observation operator
        q = J_F(a^(j))^T v           # VJP via AD pullback
        append q to Q̃
return Q̃
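The same loop in runnable PyTorch, under the assumptions that the simulator `F` is differentiable, parameters and solutions are flattened vectors, and `R_T`, `Sigma_pr_chol`, and `Sigma_y_invhalf` are explicit matrices (all names hypothetical):

```python
import torch
from torch.func import vjp

def sample_fisher_vjps(F, R_T, a0, Sigma_pr_chol, Sigma_y_invhalf, M, r):
    """Columns q = J_F(a)^T R^T Sigma_y^{-1/2} eps with eps ~ N(0, I);
    their outer products average to the prior-sampled Fisher information."""
    cols = []
    for _ in range(M):
        a_j = a0 + Sigma_pr_chol @ torch.randn_like(a0)  # a^(j) ~ N(a0, Σ_pr)
        _, pullback = vjp(F, a_j)                        # AD pullback at a^(j)
        for _ in range(r):
            eps = torch.randn(Sigma_y_invhalf.shape[0])
            v = R_T @ (Sigma_y_invhalf @ eps)            # data-space direction
            (q,) = pullback(v)                           # q = J_F(a^(j))^T v
            cols.append(q.flatten())
    return torch.stack(cols, dim=1)                      # Q-tilde
```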
Cost Reduction

To reduce computation time further, we average the Fisher information over the prior \(\pi\),
\[\bar{\mathcal{I}} \approx \mathbb{E}_{a \sim \pi}[\mathcal{I}(a)],\]
and then compute the left singular vectors of \(\bar{\mathcal{I}}^{1/2}\); in practice, these are the left singular vectors of the sample matrix \(\tilde{Q}\), since \(\tilde{Q}\tilde{Q}^\top \propto \bar{\mathcal{I}}\). The result: a single low-rank basis capturing globally informative directions, shared by training and inversion.
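The basis itself is one truncated SVD of the stacked samples; a minimal sketch, with the rank chosen as in the table below:

```python
import torch

def fisher_eigenbasis(Q_tilde, rank):
    """Left singular vectors of Q-tilde are the eigenvectors of
    Q-tilde Q-tilde^T, i.e. of the sampled average Fisher matrix."""
    U, _, _ = torch.linalg.svd(Q_tilde, full_matrices=False)
    return U[:, :rank]  # shared low-rank basis for training and inversion
```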
| | Laminar Flow | Darcy |
|---|---|---|
| Resolution | 64 × 64 | 128 × 128 |
| # of Fisher Eigenvectors | \(r = 400\) | \(r = 200\) |
| Fraction of Obs. Space | 9.7 % | 1.9 % |
| # of Iterations (during inversion) | 2500 | 100 |
| Step Size | 0.05 | 0.8 |
MSE-FNO: FNO trained with the standard algorithm (MSE loss function). FINO: FNO trained with Fisher-informed training. For all inversion experiments, we train MSE-FNO and FINO to the same test loss (in the MSE metric), so that the residual factor of the surrogate gradient is comparable across models and any difference comes from the Jacobian factor:

\[\nabla_\mathbf{a} \mathcal{L}_{nn}(\mathbf{a}) = \underbrace{\mathbf{J}_{\mathcal{F}_{nn}}(\mathbf{a})^\top \mathcal{R}^\top}_\text{the part we want to evaluate} \underbrace{(\mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs})}_\text{similar for both models}\]
MSE-FNO's and FINO's test loss (in relative \(L^2\))
In this way, we isolate the impact of Fisher-informed training.
\[\hat{\mathbf{a}} = \arg\min_\mathbf{a} \| \mathcal{R}\,\mathcal{F}_{nn}(\mathbf{a}) - \mathbf{d}_{obs} \|^2_2\]
We compare the recovered parameters from FINO and MSE-FNO using the relative \(H^1\) error: \[ \frac{\sqrt{ \| \mathbf{a} - \mathbf{a}^\ast \|_{L^2}^2 + \|\nabla \mathbf{a} - \nabla \mathbf{a}^\ast \|_{L^2}^2 }}{\sqrt{ \| \mathbf{a}^\ast\|_{L^2}^2 + \| \nabla \mathbf{a}^\ast \|_{L^2}^2 }}. \]
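A sketch of this metric with finite-difference gradients via `torch.gradient`, for 2-D parameter fields:

```python
import torch

def relative_h1_error(a, a_star):
    """Relative H^1 error between recovered and ground-truth 2-D fields."""
    def h1_norm_sq(x):
        gx, gy = torch.gradient(x)       # finite-difference gradients
        return x.pow(2).sum() + gx.pow(2).sum() + gy.pow(2).sum()
    # the gradient is linear, so ||∇a - ∇a*||^2 = ||∇(a - a*)||^2
    return torch.sqrt(h1_norm_sq(a - a_star) / h1_norm_sq(a_star))
```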
By assessing the accuracy of the gradient \(\nabla_\mathbf{a} \mathcal{L}\) at each iteration, we can evaluate whether \(\mathcal{F}_{nn}\) is suitable for optimization/inversion.
[Figure: gradient accuracy evaluated along the surrogate's model iterates]
A high sensitivity alignment (\(\approx 1\)) means the surrogate is operating in-distribution: its gradients agree with the true simulator's, and the inversion remains physically consistent.
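A minimal sketch, assuming sensitivity alignment is measured as the cosine similarity between the surrogate's and the simulator's gradients at the current iterate:

```python
import torch

def sensitivity_alignment(g_surrogate, g_simulator):
    """Cosine similarity between surrogate and simulator gradients;
    values near 1 mean the surrogate's descent direction is trustworthy."""
    num = torch.dot(g_surrogate.flatten(), g_simulator.flatten())
    return num / (g_surrogate.norm() * g_simulator.norm())
```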
Observation:

- MSE-FNO barely updates the parameters along the directions in which the numerical simulator makes its largest updates.
- MSE-FNO's update direction starts to diverge due to inaccurate gradients.

Power spectral density of \(\mathbf{a}\)
Darcy inversion results
With Fisher-informed training, we have turned the neural operator into a reliable tool for gradient-based inversion. Fisher-informed training adds a gradient-alignment term along the directions in which the data are most informative about the parameters.
Through numerical experiments, we answered two questions: whether Fisher-informed training yields more accurate gradients along the optimization path, and whether that accuracy translates into more accurate inversions.
This research was carried out with the support of Georgia Research Alliance and partners of the ML4Seismic Center.