Solving multiphysics-based inverse problems with learned surrogates and constraints

Authors

Affiliation

Ziyi Yin*

Georgia Institute of Technology

Rafael Orozco

Georgia Institute of Technology

Mathias Louboutin

Georgia Institute of Technology

Felix J. Herrmann

Georgia Institute of Technology

Affiliation: Georgia Institute of Technology

Address: 756 West Peachtree Street NW, Atlanta, Georgia, USA 30308

Emails:

Ziyi Yin (corresponding author): ziyi.yin@gatech.edu
Rafael Orozco: rorozco@gatech.edu
Mathias Louboutin: mlouboutin3@gatech.edu
Felix J. Herrmann: felix.herrmann@gatech.edu

Abstract

Solving multiphysics-based inverse problems for geological carbon storage monitoring can be challenging when multimodal time-lapse data are expensive to collect and costly to simulate numerically. We overcome these challenges by combining computationally cheap learned surrogates with learned constraints. Not only does this combination lead to vastly improved inversions for the important fluid-flow property, permeability, it also provides a natural platform for inverting multimodal data including well measurements and active-source time-lapse seismic data. By adding a learned constraint, we arrive at a computationally feasible inversion approach that remains accurate. This is accomplished by including a trained deep neural network, known as a normalizing flow, which forces the model iterates to remain in-distribution, thereby safeguarding the accuracy of trained Fourier neural operators that act as surrogates for the computationally expensive multiphase flow simulations involving partial differential equation solves. By means of carefully selected experiments, centered around the problem of geological carbon storage, we demonstrate the efficacy of the proposed constrained optimization method on two different data modalities, namely time-lapse well and time-lapse seismic data. While permeability inversions from both these two modalities have their pluses and minuses, their joint inversion benefits from either, yielding valuable superior permeability inversions and CO₂ plume predictions near, and far away, from the monitoring wells.

Keywords: Fourier neural operators, normalizing flows, multiphysics, deep learning, learned surrogates, learned constraints, inverse problems

Introduction

In this paper, we introduce a novel learned inversion algorithm designed to address inverse problems based on partial differential equations (PDEs). These problems can be represented using the following general form:

\[ \mathbf{d} = \mathcal{H}\circ\mathcal{S}(\mathbf{K}) + \boldsymbol \epsilon. \tag{1}\]

In this expression, the nonlinear operator \(\mathcal{S}\) represents the solution operator of a nonlinear parametric PDE mapping the coefficients \(\mathbf{K}\) to the solution. Given numerical solutions of the PDE, partially observed data, collected in the vector \(\mathbf{d}\), are modeled by compounding the solution operator with the measurement operator, \(\mathcal{H}\), followed by adding the noise term \(\boldsymbol{\epsilon}\) with noise level of \(\sigma\)—i.e., \(\boldsymbol{\epsilon}\sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})\). This problem is quite general and pertinent to various physical applications, including geophysical exploration (Tarantola 1984, 2005), medical imaging (Arridge 1999), and experimental design (Alexanderian 2021).

Without loss of generality, we focus on time-lapse seismic monitoring of geological carbon storage (GCS), which involves underground storage of supercritical CO₂ captured from the atmosphere or from industrial smoke stacks (Furre et al. 2017). We consider GCS in saline aquifers, which involves multiphase flow physics where CO₂ replaces brine in the porous rocks (Nordbotten and Celia 2011). In this context, the PDE solution operator, \(\mathcal{S}\), serves as the multiphase flow simulator, which takes the gridded spatially varying permeability in the reservoir, \(\mathbf{K}\), as input and produces \(n_t\) time-varying CO₂ saturation snapshots, \(\mathbf{c}=[\mathbf{c}_1, \mathbf{c}_2, \cdots,\mathbf{c}_{n_t}]\), as output. The governing equations for the multiphase flow involve Darcy’s and the mass conservation law. Detailed information on the governing equations, initial and boundary conditions, and numerical solution schemes can be found in (Rasmussen et al. 2021) and the references therein. To ensure safety, conformance, and containment of GCS projects, various kinds of time-lapse data are collected to monitor CO₂ plumes. These different data modalities include measurements in wells (Freifeld et al. 2009; Nogues, Nordbotten, and Celia 2011), and the collection of gravity (Nooner et al. 2007; Alnes et al. 2011), electromagnetic (Carcione et al. 2012; Zhdanov et al. 2013), and seismic time-lapse data (Arts et al. 2004; Lumley 2010; Yin et al. 2023) that can be used to follow the plume and invert for reservoir properties such as the permeability, \(\mathbf{K}\). The latter is the property of interest in this exposition.

Overall, solving for the reservoir model parameter, \(\mathbf{K}\), poses significant challenges for two primary reasons:

the forward modeling operator, \(\mathcal{H}\circ\mathcal{S}\), can be ill-posed, resulting in multiple model parameters that fit the observed data equally well. This necessitates the use of regularizers (Golub, Hansen, and O’Leary 1999; Tarantola 2005) in the form of penalties or constraints (Peters, Smithyman, and Herrmann 2019).
The PDE modeling operator \(\mathcal{S}\), and the sensitivity calculations with respect to the model parameters can be computationally expensive for large-scale problems, limiting the efficacy of iterative methods such as gradient-based (D. C. Liu and Nocedal 1989) or Markov chain Monte Carlo (Cowles and Carlin 1996) methods.

To overcome the second challenge, numerous attempts have been made to replace computationally expensive PDE solves with more affordable approximate alternatives (Razavi, Tolson, and Burn 2012; Asher et al. 2015), which include the use of radial basis functions to learn the complex models from few sample points (Powell 1985) or reduced-order modeling where the dimension of the model space is reduced (Schilders, Van der Vorst, and Rommes 2008; K. Lu et al. 2019). More recently, various deep learning techniques have emerged as cheap alternatives to numerical PDE solves (L. Lu, Jin, and Karniadakis 2019; Pestourie et al. 2020; Qian et al. 2020; Karniadakis et al. 2021; Kovachki et al. 2021; Rahman, Ross, and Azizzadenesheli 2022; Hijazi, Freitag, and Landwehr 2023). After incurring initial training costs, these neural operators lead to vastly improved computation of PDE solves. Data-driven methods have also been used successfully to learn coarse-to-fine grid mappings of PDEs solves. Because of their advertised performance on approximating solution operators of the multiphase flow in porous media (G. Wen et al. 2022, 2023; Grady et al. 2023; Philipp A. Witte, Redmond, et al. 2022; Philipp A. Witte, Hewett, et al. 2022; Philipp A. Witte et al. 2023), we will consider Fourier neural operators (FNOs, Z. Li et al. 2020; Kovachki, Lanthaler, and Mishra 2021) in this work even though alternative choices can be made. Once trained, FNOs produce approximate PDE solutions orders of magnitude faster than traditional solvers (Z. Li et al. 2020, 2021; Grady et al. 2023; De Hoop et al. 2022). In addition, Yin et al. (2022), Louboutin et al. (2022) and Louboutin, Yin, et al. (2023) demonstrated that trained FNOs can replace PDE solution operators during inversion. This latest development is especially beneficial to applications such as GCS where trained FNOs can be used in lieu of numerically costly flow simulators (Lie 2019; Rasmussen et al. 2021; Gross and Mazuyer 2021). However, despite their promising results, unconstrained inversion formulations offer little to no guarantees that the model iterates remain within the statistical distribution on which the FNO was trained initially during inversion. As a consequence, FNOs may no longer produce accurate fluid-flow simulations throughout the iterations, which can lead to erroneous inversion results when the errors become too large, possibly rendering surrogate modeling by FNOs ineffective. To avoid this situation, we propose a constrained formulation where a trained normalizing flow (NF, Rezende and Mohamed (2015)) is included as a learned constraint. This added constraint guarantees that the model iterates remain within the desired statistical distribution. Because our approach safeguards the FNO’s accuracy, it allows FNOs to act as reliable low-cost neural surrogates replacing costly fluid-flow simulations and gradient calculations that rely on numerically expensive PDE solves during inversion.

The organization of this paper is as follows: First, we introduce FNOs and explore the possibility of replacing the forward modeling operator with a trained FNO surrogate. Next, NFs are introduced. By means of a motivational example, we demonstrate how these learned generative networks control the prediction error of FNOs by ensuring that the model iterates remain in distribution. Based on this motivational example, we propose our novel method for using trained NFs as a learned constraint to guarantee performance of FNO surrogates during inversion. Through four synthetic experiments related to GCS monitoring, the efficacy of our method will be demonstrated.

Fourier neural operators

There is an extensive literature on training deep neural networks to serve as affordable alternatives to computationally expensive numerical simulators (L. Lu, Jin, and Karniadakis 2019; Karniadakis et al. 2021; Kovachki et al. 2021; Kontolati et al. 2023; Benitez et al. 2023). Without loss of generality, we limit ourselves in this exposition to the training of a special class of neural operators known as Fourier neural operators (FNOs). These FNOs are designed to approximate numerical solution operators of the PDE solution operator, \(\mathcal{S}\), by minimizing the following objective: \[ \underset{\boldsymbol{\theta}}{\operatorname{minimize}} \quad \, \frac{1}{N}\sum_{j=1}^{N}\|\mathcal{S}_{\boldsymbol{\theta}}(\mathbf{K}^{(j)})-\mathbf{c}^{(j)}\|_2^2\quad\text{where}\quad\mathbf{c}^{(j)}=\mathcal{S}(\mathbf{K}^{(j)}). \tag{2}\]

Here, \(\mathcal{S}_{\boldsymbol{\theta}}\) denotes the FNO with network weights \({\boldsymbol{\theta}}\). The optimization aims to minimize the \(\ell_2\) misfit between numerically simulated PDE solutions, \(\mathbf{c}^{(j)}\), and solutions approximated by the FNO, across \(N\) training samples (permeability models), \(\{\mathbf{K}^{(j)}\}_{j=1}^N\) compiled by domain experts. Once trained, FNOs can generate approximate PDE solutions for unseen model parameters orders of magnitude faster than numerical simulations (Grady et al. 2023; De Hoop et al. 2022). For model parameters that fall within the distribution used to train, approximation by FNOs are reasonably accurate—i.e., \(\mathcal{S}_{\boldsymbol\theta^\ast}(\mathbf{K})\approx\mathcal{S}(\mathbf{K})\), with \(\boldsymbol\theta^\ast\) being the minimizer of Equation 2. We refer to the numerical examples section for details calculating these weights. Before studying the impact of applying these surrogates on samples for the permeability that are out of distribution, let us first consider an example where data is inverted using surrogate modeling.

Inversion with learned surrogates

Replacing PDE solutions by approximate solutions yielded by trained FNO surrogates has two main advantages when solving inverse problems. First, as mentioned earlier, FNOs are orders of magnitude faster than numerical PDE solves, which allows for many simulations at negligible costs (Chandra et al. 2020; Lan, Li, and Shahbaba 2022). Second, existing softwate for multiphase flow simulations may not always support computationally efficient calculations of sensitivity, e.g. via adjoint-state calculations (Cao et al. 2003; Plessix 2006; Jansen 2011) of the simulations with respect to their input. In such cases, FNO surrogates are favorable because automatic differentiation on the trained network (Griewank et al. 1989; Louboutin et al. 2022; Yin et al. 2022; Yang et al. 2023; Louboutin, Yin, et al. 2023) readily provides access to gradients with respect to model parameters. As a result, the PDE solver, \(\mathcal{S}\), in Equation 1 can be replaced by trained surrogate, \(\mathcal{S}_{\boldsymbol\theta^\ast}\)—i.e., we have

\[ \underset{\mathbf{K}}{\operatorname{minimize}} \quad \|\mathbf{d} - \mathcal{H}\circ\mathcal{S}_{\boldsymbol\theta^\ast}(\mathbf{K})\|_2^2 \tag{3}\]

where \(\boldsymbol\theta^\ast\) represent the optimized weights minimizing Equation 2. While the above formulation in terms of trained surrogates has been applied successfully during permeability inversion from time-lapse seismic data (D. Li et al. 2020; Yin et al. 2022; Louboutin, Yin, et al. 2023), this type of inversion is only valid as long as the (intermediate) permeabilities remain within distribution during the inversion. Practically, this means two things. First, the data need to be in the range of permeability models that are in distribution. This means that there can not be too much noise neither can the observed data be the result of an out-of-distribution permeability. Second, there are no guarantees that the permeability model iterates remain in distribution during inversion even though some bias of the gradients of the surrogate towards in-distribution permeabilities may be expected. To overcome this challenge, we propose to add a learned constraint to Equation 3 that offers guarantees that the model iterates remain in distribution.

Learned constraints with normalizing flows

As demonstrated by Peters and Herrmann (2017), Esser et al. (2018), Peters, Smithyman, and Herrmann (2019) regularization of non-linear inverse problems, such as full-waveform inversion, with constraints, e.g., total-variation (Esser et al. 2018) or transform-domain sparsity with \(\ell_1\)-norms (X. Li et al. 2012), offers distinct advantages over regularizations based adding these norms as penalties. Even though constraint and penalty formulations are equivalent for linear inverse problems for the appropriate choice of the Lagrange multiplier, minimizing the constraint formulation leads to completely different solution paths compared to adding a penalty term to the data misfit objective (Hennenfent et al. 2008). In the constrained formulation, the model iterates remain at all times within the constraint set while model iterates yielded by the penalty formulation does not offer these guarantees. Peters and Herrmann (2017) demonstrated this importance difference for the non-convex problem of full-waveform inversion. For this problem, it proved essential to work with a homotopy where the intersection of multiple handcrafted constraints (intersection of box and size of total-variation-norm ball constraints) are relaxed slowly during the inversion, so the model iterates remain physically feasible and local minima are avoided.

Motivated by these results, we propose a similar approach but now for “data-driven” learned constraints based on normalizing flows (NFs, Rezende and Mohamed (2015)). NFs are powerful deep generative neural networks capable of learning to generate samples from complex distributions (Dinh, Sohl-Dickstein, and Bengio 2016; Siahkoohi et al. 2023b; Orozco, Louboutin, et al. 2023; Louboutin, Yin, et al. 2023). Designed to be invertible, these NFs require the latent and model spaces to share identical dimensions, which confers several advantages:

unlike variational autoencoders (Kingma and Welling 2013) or generative adversarial networks (GANs, Goodfellow et al. 2014), which both have a lower-dimensional latent space, NFs do not impose any intrinsic dimensionality constraints. This flexibility lets NFs capture model space characteristics across high dimensions (Kobyzev, Prince, and Brubaker 2020). Relevantly, concurrent literature has delved into the intrinsic dimensionality of NFs, indicating the potential to using NFs to generate models with inherently lower dimensions (Horvat and Pfister 2022).
NFs’ inherent invertibility negates the need to store state variables during gradient calculations, enabling memory-efficient training and inversion in large-scale 3D applications, such as in geophysics (Zhang and Curtis 2020, 2021; Siahkoohi et al. 2021, 2023b; Siahkoohi and Herrmann 2021; Zhao, Curtis, and Zhang 2021; Lensink, Peters, and Haber 2022) and ultrasound imaging (Orozco et al. 2021, 2023; Orozco, Louboutin, and Herrmann 2022; Orozco, Louboutin, et al. 2023; Orozco, Siahkoohi, Louboutin, et al. 2023; J. Wen, Ahmad, and Schniter 2023).
because of their invertibility NFs guarantee unique latent codes for all model space samples, including out-of-distribution ones. Therefore, they can still be used to invert for out-of-distribution model parameters, while other methods like GANs may introduce bias (Asim et al. 2020).

Aside from being invertible, NFs are trained to map samples from a target distribution in the physical space to samples from the standard zero-mean Gaussian distribution noise in the latent space. After training is completed, samples from the target distribution are generated by running the NF in reverse on samples in the latent space from the standard Gaussian distribution. Below, we will demonstrate how NFs can be used to guarantee that the permeability remains in distribution during the inversion.

Training normalizing flows

Given samples from the permeability distribution, \(\{\mathbf{K}^{(j)}\}_{j=1}^N\), training NFs entails minimizing the Kullback-Leibler divergence between the base and target distributions (Ardizzone et al. 2018). This involves solving the following variational problem: \[ \underset{\mathbf{w}}{\operatorname{minimize}} \quad \frac{1}{N}\sum_{i=1}^{N} \left(\frac{1}{2}\|\mathcal{G}^{-1}_{\mathbf{w}}(\mathbf{K}_i)\|_2^2-\log\left|\det\mathrm{J}_{\mathcal{G}^{-1}_{\mathbf{w}}}(\mathbf{K}_i)\right|\right). \tag{4}\]

In this optimization problem, \(\mathcal{G}^{-1}_{\mathbf{w}}\) represents the NF, which is parameterized by its network weights \(\mathbf{w}\), while \(\mathrm{J}_{\mathcal{G}^{-1}_{\mathbf{w}}}\) denotes its Jacobian. By minimizing the \(\ell_2\)-norm, the objective imposes a Gaussian distribution on the network’s output and the second \(\log\det\) term prevents trivial solutions, i.e., cases where \(\mathcal{G}^{-1}_{\mathbf{w}}\) produces zeros. To ensure alignment between the permeability distributions, Equation 2 and Equation 4 are trained on the same dataset consisting of 2000 permeability models examples of which are included in Figure 1. Each \(64\times64\) permeability model of consists of a randomly generated highly permeable channels (\(120\,\mathrm{mD}\)) in a low-permeable background of \(20\,\mathrm{mD}\), where \(\,\mathrm{mD}\) denotes millidarcy. Generative examples produced by the trained NF are included in the second row of Figure 1, which confirm the NF’s ability to learn distributions from examples. Aside from generating samples from the learned distribution, trained NFs are also capable of carrying out density calculations, an ability we will exploit below.

Figure 1: Permeability models. First row shows the realistic permeability samples for FNO and NF training. Second row shows the generative samples from the trained NF.

Trained normalizing flows as constraints

As we mentioned before, adding constraints to the solution of non-convex optimization problems offers guarantees that model iterates remain within constrained sets. When solving inverse problems with learned surrogates, it is important that model iterates remain “in distribution”, which can be achieved by recasting the optimization problem in Equation 3 into the following constrained form:

\[ \underset{\mathbf{z}}{\operatorname{minimize}} \quad \|\mathbf{d} - \mathcal{H}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2\quad\text{subject to}\quad\|\mathbf{z}\|_2\leq \tau. \tag{5}\]

To arrive at this constrained optimization problem, two important changes were made. First, the permeability \(\mathbf{K}\) is replaced by the output of a trained NF with trained weights \(\mathbf{w}^\ast\) obtained by minimizing Equation 4. This reparameterization in terms of the latent variable, \(\mathbf{z}\), produces permeabilities that are in distribution as long as \(\mathbf{z}\) remains distributed according to the standard normal distribution. Second, we added a constraint on this latent space variable in Equation 5, which ensures that the latent variable \(\mathbf{z}\) remains within an \(\ell_2\)-norm ball of size \(\tau\).

To better understand the behavior of a trained normalizing flow in conjunction with the \(\ell_2\)-norm constraint for in- and out-of-distribution examples, we include Figure 2 and Figure 3. In the latter Figure, nonlinear projections (via latent space shrinkage), \[ \widetilde{\mathbf{K}} = \mathcal{G}_{\mathbf{w}^\ast}\left(\alpha\mathbf{z}\right)\quad\text{where}\quad \mathbf{z}=\mathcal{G}_{\mathbf{w}^\ast}^{-1}(\mathbf{K}) \tag{6}\]

are plotted as a function of increasing \(\alpha\). We also plot in Figure 4 the NF’s relative nonlinear approximation error, \(\|\mathbf{\widetilde{K}}-\mathbf{K}\|_2/\|\mathbf{K}\|_2\), and the corresponding relative FNO prediction error,\(\|\mathcal{S}_{\boldsymbol\theta^\ast}(\widetilde{\mathbf{K}})-\mathcal{S}(\widetilde{\mathbf{K}})\|_2/\|\mathcal{S}(\widetilde{\mathbf{K}})\|_2\) as a function of increasing \(0\leq\alpha\leq 1\). From these plots, we can make the following observations. First, the latent representations (Figure 2 (c) and Figure 2 (d)) of the in- and out-of-distribution samples (Figure 2 (a) and Figure 2 (b) ) clearly show that NF applied to out-of-distribution samples produces a latent variable far from the standard normal distribution, while the latent variable corresponding to the in-distribution example is close to being white Gaussian noise. Quantitatively, the \(\ell_2\) norm of the latent variables are \(0.99\|\mathcal{N}(0,\mathbf{I})\|_2\) and \(3.11\|\mathcal{N}(0,\mathbf{I})\|_2\), respectively, where \(\|\mathcal{N}(0,\mathbf{I})\|_2\) corresponds to the \(\ell_2\)-norm of the standard normal distribution. Second, we observe from Figure 3 that for small \(\ell_2\)-norm balls (\(\alpha\ll 1\)) the projected solutions tend to be close to the most probable sample, which is a flat permeability channel in the middle. This is true for both the in- and out-of-distribution example. As \(\alpha\) increases, the in-distribution example is reconstructed accurately when the \(\ell_2\) norm of the scaled latent variable, \(\|\alpha\mathbf{z}\|_2\), is close to the \(\|\mathcal{N}(0,\mathbf{I})\|_2\). Clearly, this is not the case for the out-of-distribution example. When \(\|\alpha\mathbf{z}\|_2\approx\|\mathcal{N}(0,\mathbf{I})\|_2\), the reconstruction still looks like an in-distribution permeability sample and is not close to the out-of-distribution sample. However, if \(\alpha=1\), which makes \(\|\alpha\mathbf{z}\|_2\) well beyond the norm of the standard normal distribution, the out-of-distribution example is recovered accurately by virtue of the invertibility of NFs, irrespective on their input and what they have been trained on. Third, the relative FNO prediction error for the in-distribution example (Figure 4 (a)) remains flat while the error of the FNO surrogate increases as soon as \(\alpha\approx 0.25\). At that value for \(\alpha\), the projection, \(\widetilde{\mathbf{K}}\), is gradually transitioning from being in-distribution to out-of-distribution, which occurs at a non-linear approximation error of about 45%. As expected the plots in Figure 4 also show a monotonous decay of the nonlinear approximation error as a function of increasing \(\alpha\). To further analyze the effects of the nonlinear projections in Equation 6, we draw 50 random realizations from the standard normal distribution, scale each of them by \(0\leq\alpha\leq 2\), and calculate the FNO prediction errors on these samples. Figure 5 includes the results of this excercise where each column represents the FNO prediction error calculated for \(0\leq \alpha \leq 2\). From these experiments, we make the following two observations. First, when \(\alpha<0.8\), the FNO consistently makes accurate predictions for all projected samples. Second, as expected, the FNO starts to make less accurate predictions for \(\alpha>1\) with errors that increase as the size of the \(\ell_2\)-norm ball of the latent space expands, demarcating the transition from being in distribution to being out-of-distribution.

Figure 5: FNO prediction errors for the latent space shrinkage experiment in Equation 6 for 50 random realizations of standard Gaussian noise.

In summary, the experiments of Figure 2 to Figure 4 indicate that FNO errors remain small and relatively constant for the in-distribution example. Irrespective of the value of \(\alpha\), the generated samples remain in distribution while moving from the most likely—i.e., a flat high-permeability channel in the middle, to the in-distribution sample as \(\alpha\) increases. Conversely, the projection of the out-of-distribution example morphs from being in distribution to being out-of-distribution for \(\alpha \geq 0.25\). The FNO prediction errors also increase during this transition from an in-distribution sample to an out-of-distribution sample. Therefore, shrinkage in the latent space by multiplying with a small \(\alpha\) can serve as an effective projection that ensures relatively low FNO prediction errors. We will use this unique ability to control the distribution during inversion.

Inversion with progressively relaxed learned constraints

Our main objective is to perform inversions where the multiphase flow equations are replaced with pretrained FNO surrogates. To make sure the learned surrogates remain accurate, we propose working with a continuation scheme where the learned constraint in Equation 5 is steadily relaxed by increasing the size of the \(\ell_2\)-norm ball constraint. Compared to the more common penalty formulation, where regularization entails adding a Lagrange-multiplier weighted \(\ell_2\)-norm squared, constrained formulations offer guarantees that the model iterates for the latent variable, \(\mathbf{z}\), remain within the constraint set—i.e., within the \(\ell_2\)-norm ball of size \(\tau\). Using the argument of the previous section, this implies that permeability distributions generated by the trained NF remain in distribution as long as the size of the initial \(\ell_2\)-norm ball, \(\tau_{\mathrm{init}}\), is small enough (e.g., smaller than \(0.6\|\mathcal{N}(0,\mathbf{I})\|_2\), following the observations from Figure 5). Taking advantage of this trained NF in a homotopy, we propose the following algorithm:

Algorithm 1: Inversion with relaxed learned constraints

   1. Input: initial model parameter \(\mathbf{K}_0\in \mathbb{R}^N\), observed data \(\mathbf{d}\), noise level \(\sigma\)
   2. Input: trained FNO \(\mathcal{S}_{\boldsymbol{\theta}^\ast}\), trained NF \(\mathcal{G}_{\mathbf{w}^\ast}\)
   3. Input: number of inner-loop iterations \(maxiter\)
   4. Input: initial \(\ell_2\) ball size \(\tau_{\mathrm{init}}\), multiplier \(\beta>1\), final \(\ell_2\) ball size \(\tau_{\mathrm{final}}\)
   5. \(\mathbf{z} = \mathcal{G}_{\mathbf{w}^\ast}^{-1}(\mathbf{K}_0)\)
   6. \(\tau = \tau_{\mathrm{init}}\)
   7. while \(\|\mathbf{d} - \mathcal{H}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2 > \sigma\|\mathcal{N}(0,\mathbf{I})\|_2\) and \(\tau\leq\tau_{\mathrm{final}}\)
   8. for \(iter = 1:maxiter\)
   9. \(\displaystyle\mathbf{g}=\nabla_{\mathbf{z}}\|\mathbf{d} - \mathcal{H}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2\)
10. \(\mathbf{z} = \mathcal{P}_{\tau}(\mathbf{z} - \gamma\mathbf{g})\)
11. end
12. \(\tau = \beta\tau\)
13. end
14. Output: inverted model parameter \(\mathbf{K}=\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\)

Given observed data, \(\mathbf{d}\), trained networks, \(\mathcal{S}_{\boldsymbol{\theta}^\ast}\) and \(\mathcal{G}_{\mathbf{w}^\ast}\), the initial guess for the permeability distribution, \(\mathbf{K} _0\), the initial size of the \(\ell_2\)-norm ball, \(\tau_{\mathrm{init}}\), and the final size of the \(\ell_2\)-norm ball, \(\tau_{\mathrm{final}}\), Algorithm 1 proceeds by solving a series of constrained optimization problems where the size of the constraint set is increased by a factor of \(\beta\) after each iteration (cf. line 12 in Algorithm 1). The constrained optimization subproblems themselves (cf. line 8 to 11 of Algorithm 1) are solved with projected gradient descent (Beck 2014). Each iteration of the projected gradient descent method first calculates the gradient (cf. line 9 of Algorithm 1), followed by the much cheaper projection of the updated latent variable back onto the \(\ell_2\)-norm ball of size \(\tau\) via the projection operator \(\mathcal{P}_{\tau}\) (cf. line 10 in Algorithm 1). This projection is a trivial scaling operation if the updated latent variable \(\ell_2\)-norm exceeds the constraint — i.e., \[ \mathcal{P}_{\tau}(\mathbf{z}) = \begin{cases} \mathbf{z} & \text{if } \|\mathbf{z}\|_2 \leq \tau \\ \tau\mathbf{z}/\|\mathbf{z}\|_2 & \text{if } \|\mathbf{z}\|_2 > \tau \end{cases} \tag{7}\]

A line search determines the steplength \(\gamma\) (Stanimirović and Miladinović 2010) for each iteration shown in line 8 to 11. As is common in continuation methods, the relaxed gradient-descent iterations are warm-started with the optimization result from the previous iteration, which at the first iteration is initialized by the latent representation of the initial permeability model, \(\mathbf{K}_0\) (cf. line 5 in Algorithm 1). Practically, each subproblem does not need to be fully solved, but only need a few iterations instead. The number of iterations to solve each subproblem is denoted by \(maxiter\) in line 8 of Algorithm 1. This continuation strategy serves two purposes. First, for small \(\tau\)’s it makes sure the model iterates remain in distribution, so accuracy of the learned surrogate is preserved. Second, by relaxing the constraint slowly, the data residual is gradually allowed to decrease, bringing in more and more features derived from the data. By slowly relaxing the constraint, we find a careful balance between these two purposes as long as progress is made towards the solution when solving the subproblem (cf. line 8 to 11 in Algorithm 1). One notable distinction of the surrogate-assisted inversion, compared to the conventional inversion with relaxed constraints (Esser et al. 2018), is that the size of the \(\ell_2\)-norm projection ball cannot increase far beyond the \(\ell_2\)-norm of the standard Gaussian white noise on which the NFs are trained. Otherwise, there is no guarantee the learned surrogate is accurate because the NF may generate samples that are out-of-distribution (cf. Figure 5). This is explicitly incorporated into the stopping criteria, \(\tau\leq\tau_{\mathrm{final}}\), in line 7 of Algorithm 1.

Numerical Experiments

To showcase the advocasy of the proposed optimization method with relaxed learned constraints, a series of carefully chosen experiments of increasing complexity are conducted. These experiments are designed to be relevant to GCS, which in its ultimate form involves coupling of multiphase flow with the wave equation to perform end-to-end inversion for the permeability given multimodal data. To convince ourselves of the validity of our approach, at all times comparisons will be made between inversion results involving numerical solves of the multiphase equations and inversions yielded by approximations with our learned surrogate.

For all numerical experiments, the “ground-truth” permeability model will be selected from the unseen test set and is shown in Figure 6 (a). The inversions will be initiated with the smooth permeability model depicted in Figure 6 (b). This initial model, \(\mathbf{K}_0\), represents the arithmetic mean of all permeability samples in the training dataset. To ensure that the model iterates remain in distribution, we set the starting \(\ell_2\)-norm ball size to \(\tau_{\mathrm{init}}=0.6 \|\mathcal{N}(0,\mathbf{I})\|_2\)—i.e., \(0.6\times\) the \(\ell_2\)-norm of standard white Gauss noise realizations for the discrete permeability model of 64 by 64 gridpoints. To gradually relax the learned constraint, the multiplier of the projection ball size is taken to be \(\beta=1.2\), and we set the ultimate projection ball size \(\tau_{\mathrm{final}}\) in Algorithm 1 to be \(1.2\) times the norm of standard white noise. To limit computational costs of solving the subproblems, we allow each constrained subproblem (cf. line 8 to 11 in Algorithm 1) to perform 8 iterations of projected gradient descent to solve for the latent variable. From practical experience, we found that the proposed inversions are not very sensitive to the choice of these hyperparameters.

To simulate the evolution of injected CO₂ plumes, we make use of the open-source software package Jutul.jl (Møyner et al. 2023; Møyner, Bruer, and Yin 2023; Yin, Bruer, and Louboutin 2023), which for each permeability model, \(\mathbf{K}^{(j)}\), solves the immiscible and compressible two-phase flow equations for the CO₂ and brine saturation. As shown in Figure 6 (a), an injection well is set up on the left-hand side of the model, which injects supercritical CO₂ with density 700 \(\mathrm{kg}/\mathrm{m}^3\) at a constant rate of 0.005 \(\mathrm{m}^3/\mathrm{s}\). To relieve pressure, a production well is included on the right-hand side of the model, which produces brine with density 1000 \(\mathrm{kg}/\mathrm{m}^3\) with a constant rate of also 0.005 \(\mathrm{m}^3/\mathrm{s}\). This finally results in approximately a \(6\%\) storage capacity after 800 days of CO₂ injection. From these simulations, we collect eight snapshots for the CO₂ concentration, \(\mathbf{c}=[\mathbf{c}_1, \mathbf{c}_2, \cdots,\mathbf{c}_{n_t}]\) with \(n_t=8\) the number of snapshots that cover a total time period of 800 days. The last five snapshots of these simulations are included in the top row of Figure 6 (a). Due to buoyancy effects and well control, the CO₂ plume gradually moves from the left to the right and upwards.

Given these simulated CO₂ concentrations, the optimized weights, \(\mathbf{w}^\ast\), for the FNO surrogate are calculated by minimizing Equation 2 for \(N=1900\) training pairs, \(\{\mathbf{K}^{(j)},\, \mathbf{c}^{(j)}\}_{j=1}^N\). Another 100 training pairs are used for validation. After training with 350 epochs, an average of 7% prediction error is achieved for permeability samples from the unseen test set. As observed from Figure 7, the approximation errors of the FNO are mostly concentrated at the leading edge of the CO₂ plumes. The same permeability models are used to train the NF by minimizing Equation 4 for 245 epochs using the open-source software package InvertibleNetworks.jl (P. Witte et al. 2023). We use the HINT network structure (Kruse et al. 2021) for the NF. Three generative samples are shown in the second row of Figure 1. From these examples, we can see that the trained NF is capable of generating random permeability models that resemble the ones in the training samples closely, despite minor noisy artifacts.

Unconstrained/constrained permeability inversion from CO₂ saturation data

To demonstrate that permeability inversion with surrogates is indeed feasible, we first consider the idealized, impossible in practice, situation where we assume to have access to the time-lapse CO₂ concentration, \(\mathbf{c}=[\mathbf{c}_1, \mathbf{c}_2, \cdots,\mathbf{c}_{n_t}]\), everywhere, and for all \(n_t=8\) timesteps. In that case, the measurement operator, \(\mathcal{H}\) in Equation 1, corresponds to the identity matrix. Given CO₂ concentrations simulated from the “ground-truth” permeability distribution plotted in Figure 6 (a), we invert for the permeability by minimizing the unconstrained formulation (cf. Equation 3) for the correct, yielded by the PDE, and approximate fluid-flow physics, yielded by the trained FNO. The results of these inversions after 100 iterations of gradient descent with back-tracking linesearch (Stanimirović and Miladinović 2010) are plotted in Figure 8 (a) and Figure 8 (b). From these plots, we observe that the inversion results using PDE solvers delineates most of the upper boundary of the channel accurately. Because there is a null space in the fluid-flow modeling—i.e., this null space mostly corresponds to regions of the permeability model that are barely touched by the CO₂ plume (e.g. bottom and right-hand side of the channel) — artifacts are present in the high-permeability channel itself. As expected, the reconstruction of the permeability is also not perfect at the bottom and at the far right of the model. The inversion result with the FNO surrogate is similar but introduces unrealistic artifacts in the high-permeability channel and also outside the channel. These more severe artifacts can be explained by the behavior of the FNO approximation error plotted as the orange curve in Figure 8 (e). The error value increases rapidly to 13%, and finally saturates at 10%. This behavior of the error is a manifestation of out-of-distribution model iterates that explain the erroneous behavior of the surrogate and its gradient with respect to the permeability.

Inversions yielded by the relaxed constrained formulation with the trained NF (see Algorithm 1), on the other hand, show virtually artifact free inversion results (see Figure 8 (c) and Figure 8 (d)) that compare favorably with the “ground-truth” permeability plotted in Figure 6. While adding the NF as a constraint obviously adds information, explaining the improved inversion for the accurate physics (Figure 8 (c)), it also renders the approximate surrogates more accurate, as can be observed from the blue curve in Figure 8 (e), where the FNO approximation error is controlled thanks to adding the constraint to the inversion. This behavior underlines the importance of ensuring model iterates to remain within distribution. It also demonstrates the benefits of a solution strategy where we start with a small \(\tau\), followed by relaxing the constraint slowly by increasing the size of the constraint set gradually.

Unconstrained/constrained permeability inversion from well observations

While the example of the previous section established feasibility of constrained permeability inversion, it relied on having access to the CO₂ saturation everywhere, which is unrealistic in practice. To address this issue, we first consider permeability inversion from CO₂ saturations, collected at three equally spaced monitoring well locations, for only the first 6 timesteps over the period of 600 days (Mosser, Dubrule, and Blunt 2019). In this more realistic setting, the measurement operator, \(\mathcal{H}\) in Equation 1, corresponds to a restriction operator that extracts simulated CO₂ saturations at each well location in first six snapshots. The objective function reads

\[ \underset{\mathbf{z}}{\operatorname{minimize}} \quad \|\mathbf{d}_{\mathrm{w}}-\mathbf{M}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2\quad\text{subject to}\quad\|\mathbf{z}\|_2\leq \tau, \tag{8}\]

where \(\mathbf{d}_{\mathrm{w}}\) represents the well measurements collected at three well locations through the linear restriction operator \(\mathbf{M}\). The goal is to invert for the permeability by minimizing the misfit of the well measurements of the CO₂ saturation without and with constraints on the \(\ell_2\)-norm ball in the latent space. The results of these numerical experiments are included in the first row of Figure 9, where the differences with respect to the ground truth permeability shown in Figure 6 (a) are plotted in the second row. Because the part of the permeability that is not touched by the CO₂ plume lives in the null space, we highlight the CO₂ plume in the difference plots by dark color and focus on analyzing errors within the plume region. As expected, the unconstrained inversions based on PDE solves (Figure 9 (a)) and surrogate approximations (Figure 9 (b)) are both poorly resolved because of the limited spatial information on the saturation. Contrasting these unconstrained inversions with results for the constrained inversions for the PDE (Figure 9 (c)) and surrogate (Figure 9 (d)) again shows the importance of adding constraints to the inversion. Figure 9 (i) clearly demonstrates that the FNO prediction errors remain relatively constant during constrained inversion while the error continues to grow during the unconstrained iterations eventually exceeding 14%. Both constrained results improve significantly, even though they converge to different solutions in the end. This is because history matching is typically an ill-posed problem with many distinctive solutions (Canchumuni, Emerick, and Pacheco 2019). This observation further motivates us to consider the experiment below, where time-lapse seismic data are jointly inverted for the subsurface permeability.

Multiphysics end-to-end inversion

Next, we consider the alternative setting for seismic monitoring of geological carbon storage, where the dynamics of the CO₂ plumes are indirectly observed from time-lapse seismic data. In this case, the measurement operator, \(\mathcal{H}\), involves the composition of the rock physics modeling operator, \(\mathcal{R}\), which converts CO₂ saturations to decreases in the compressional wavespeeds for rocks within the reservoir (Avseth, Mukerji, and Mavko 2010), and the seismic modeling operator, \(\mathcal{F}\), which generates time-lapse seismic data recorded at the receiver locations and based on acoustic wave equation modeling (Sheriff and Geldart 1995). The multiphysics end-to-end inversion process estimates permeability from time-lapse seismic data via inversion of these nested physics operators for the flow, rock physics, and waves (D. Li et al. 2020). Following earlier work by Yin et al. (2022) and Louboutin, Yin, et al. (2023), the fluid-flow PDE modeling is replaced by the trained FNO (cf. Equation 5), resulting in the following optimization problem:

\[ \underset{\mathbf{z}}{\operatorname{minimize}} \quad \|\mathbf{d}_{\mathrm{s}} - \mathcal{F}\circ\mathcal{R}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2 \quad\text{subject to}\quad\|\mathbf{z}\|_2\leq \tau, \tag{9}\]

where \(\mathbf{d}_{\mathrm{s}}\) represents the observed time-lapse seismic data. While this end-to-end inversion problem benefits from having remote access to changes in the compressional wavespeed, it may now suffer from null spaces associated with the flow, \(\mathcal{S}_{\boldsymbol\theta^\ast}\), and the wave/rock physics, \(\mathcal{F}\circ\mathcal{R}\). For instance, the latter suffers from bandwidth limitation of the source function and from limited aperture. Because important components are missing in the observed data, inversion based on the data objective alone in Equation 9 are likely to suffer from artifacts that can easily drive the intermediate permeability model iterates out-of-distribution.

To demonstrate capabilities of the proposed relaxed inversion procedure with surrogates for the fluid flow, we assume the baseline to be known—i.e, we assume the brine-filled reservoir with \(25\%\) porosity to be acoustically homogeneous prior to CO₂ injection with a compressional wavespeed of \(3500 \mathrm{m/s}\). We use the patchy saturation model (Avseth, Mukerji, and Mavko 2010) to convert the time-dependent CO₂ saturation resulting in \(<300\mathrm{m/s}\) decreases in the wavespeed within the CO₂ plumes. We collect six seismic surveys at the first six snapshots for the CO₂ saturation from day 100 to day 600, which are the same snapshots as the ones used in the previous experiment. For each time-lapse seismic survey, 16 active-seismic sources are located within a well on the left-hand side of the model. We also position 16 sources on the top of the model. Each active source uses a Ricker wavelet with a central frequency of \(50\mathrm{Hz}\). The transmitted and reflected wavefields are collected by 480 receivers on the top and 480 receivers on the right-hand side of the model. The seismic acquisition is shown in Figure 10, where the plume at the last seismic vintage (at day 600) is plotted in the middle.

Figure 10: Seismic acquisition. The white \(\boldsymbol{\times}\) represents the acoustic sources, and the red lines represent the dense receivers. The CO₂ saturation snapshot at day 600 is plotted in the middle, which is the last snapshot that is monitored seismically.

To avoid numerical dispersion, the velocity model is upsampled by a factor of two in both the horizontal and vertical directions, which results in a \(7.5\mathrm{m}\) grid spacing. For the simulations, use is made of the open-source software package JUDI.jl (Philipp A. Witte et al. 2019; Louboutin, Witte, et al. 2023) to generate the time-lapse seismic data at the first six snapshots. The fact that this software is based on Devito’s wave propagators (Louboutin et al. 2019; Luporini et al. 2020) allows us to do this quickly. For realism, we add 10 dB Gaussian noise to the time-lapse seismic data. Given these six time-lapse vintages, our goal is to invert for the permeability in the reservoir by minimizing the time-lapse seismic data misfit through the nested physics operators shown in Equation 9.

Inversion results obtained by solving the PDEs for the fluid flow during the inversion are shown in Figure 11 (a) and Figure 11 (c). As before, the inversions benefit majorly from adding the trained NF as a constraint. Remarkably, the end-to-end inversion results shown in Figure 11 (a), Figure 11 (c), and Figure 11 (d) are close to the results plotted in Figure 8 (a), Figure 8 (c), and Figure 8 (d), which was obtained with access to the CO₂ saturation everywhere. This reaffirms the notion that time-lapse seismic can indeed provide useful spatial information away from the monitoring wells to estimate the reservoir permeability, which aligns with earlier observations by D. Li et al. (2020), Yin et al. (2022), and Louboutin, Yin, et al. (2023). Juxtaposing the results for the FNO surrogate without (Figure 11 (b)) and with the constraint (Figure 11 (d)) again underlines the importance of adding constraints especially in situations where the forward (wave) operator has a non-trivial nullspace. The presence of this nullspace has a detrimental affect on the unconstrained result obtained by the FNO. Contrary to solutions yielded by the PDE, trained FNOs offer little to no control on the feasibility of the solution, which explains the strong artifacts in Figure 11 (b). As we can see from Figure 11 (i), these artifacts are mainly due to the FNO-approximation errors that dominate and grow after a few iterations. Conversely, the errors for the constrained case remain more or less flatlined between \(7\%\) and \(8\%\). In contrast, using the trained NF as a learned constraint yields better recovery where the errors are minor within the plume region and mostly live on the edges, shown in the second row of Figure 11.

Jointly inverting time-lapse seismic data and well measurements

Finally, we consider the most preferred scenario for GCS monitoring, where multiple modalities of data are jointly inverted for the reservoir permeability (Huang 2022; M. Liu et al. 2023). In our experiment, we consider to jointly invert time-lapse seismic data and well measurements by minimizing the following objective function: \[ \underset{\mathbf{z}}{\operatorname{minimize}} \quad \|\mathbf{d}_{\mathrm{s}} - \mathcal{F}\circ\mathcal{R}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2 + \lambda \|\mathbf{d}_{\mathrm{w}}-\mathbf{M}\circ\mathcal{S}_{\boldsymbol\theta^\ast}\circ\mathcal{G}_{\mathbf{w}^\ast}(\mathbf{z})\|_2^2 \quad\text{subject to}\quad\|\mathbf{z}\|_2\leq \tau. \tag{10}\]

This objective function includes both the time-lapse seismic data misfit from Equation 9 and the time-lapse well measurement misfit from Equation 8 with a balancing term \(\lambda\). While better choices can be made, we select this \(\lambda\) in our numerical experiment to be \(10\), so that the magnitudes of the two terms are relatively the same. The inversion results and differences from the unseen ground truth permeability are shown in Figure 12, where we again observe large artifacts for the recovery when FNO surrogate is inverted without NF constraints. This behavior is confirmed by the plot for the FNO error curve as a function of the number of iterations. This error finally reaches a value over \(15\%\).

We report quantitative measures for the permeability inversions for all optimization methods and different types of observed data in Table 1 for the signal-to-noise ratios (S/Ns) and the structural similarity index measure (SSIM, Wang et al. 2004). To avoid undue influence of the null space for the permeability, we only calculate the S/N and SSIM values based on the parts of the models that are touched by CO₂ plume. From these values, following observations can be made. First, the NF-constrained permeability inversion are superior in both S/Ns and SSIMs, which demonstrates the efficacy of the learned constraint. Second, by virtue of this NF constraint, the results yielded by either the PDE solver or by the FNO surrogate produce very similar S/Ns and SSIMs. This behavior reaffirms that the trained FNO behavior is similar to the behavior yielded by PDE solver when its inputs remain in-distribution, which is controlled by the NF constraints.

Table 1: S/N (in dB) and SSIM values of permeability recovery.
Inversion method	Well measurement	Time-lapse seismic	Both
Unconstrained inversion with PDE solvers	9.34 dB / 0.67	10.50 dB / 0.73	10.70 dB / 0.73
Unconstrained inversion with FNO surrogates	9.64 dB / 0.68	11.94 dB / 0.72	11.98 dB / 0.72
Constrained inversion with PDE solvers	12.2 dB / 0.77	14.18 dB / 0.80	15.20 dB / 0.85
Constrained inversion with FNO surrogates	11.06 dB / 0.74	14.16 dB / 0.81	14.92 dB / 0.83

CO₂ plume estimation and forecast

While end-to-end permeability inversion from time-lapse data provides novel access to this important fluid-flow property, the real interest in monitoring GCS lies in determining where CO₂ plumes are and will be in the foreseeable future, say of 100 and 200 days ahead. To demonstrate the value of the proposed surrogates and of the use of time-lapse seismic data, as opposed to time-lapse saturation data measured at the wells only, we in Figure 7 juxtapose CO₂ predictions obtained from fluid-flow simulations based on the inverted permeabilities in situations where either well data is available (first row), or where time-lapse seismic data is available (second row), or where both data modalities are available (third row). These results are achieved by first inverting for permeabilities using FNO surrogates and NF constraints, followed by running the fluid-flow simulations for additional time steps given the inverted permeabilities yielded by well-only (Figure 9 (d)), time-lapse data (Figure 11 (d)), and both (Figure 12 (d)). From these plots, we draw the following two conclusion. First, the predicted CO₂ plumes estimated from seismic data are significantly more accurate than those obtained by inverting time-lapse saturations measured at the wells only. As expected, there are large errors in the regions away from the wells for the CO₂ plumes estimated from wells shown in the fourth row of Figure 13. Second, thanks to the NF-constraint, the CO₂ predictions obtained with the computationally beneficial surrogate approximation remain close to the ground truth CO₂ plume plotted in the first row of Figure 7, with only minor artifacts at the edges. Third, using both seismic data and well measurements produces CO₂ plume predictions with the smallest errors, while the uplift of well measurements on top of seismic observations is modest (comparing the second and the third rows of Figure 13). Finally, while the CO₂ plume estimates for the past (monitored) vintages (i.e. first three columns of the third row of Figure 13) are accurate, the near-future forecasts without time-lapse well or seismic data (i.e. last two columns of the third row of Figure 13) could be less accurate. This is because the right-hand side and the bottom of the permeability model are not touched yet by the CO₂ plume during the first 600 days. As a result, the error on the permeability recovery on the right-hand side leads to the slightly larger errors on the CO₂ plume forecast. Overall, these CO₂ forecasts for the future 100 and 200 days match the general trend of the CO₂ plume without any observed data despite minor errors. A continuous monitoring system, where multiple modalities of data are being acquired and inverted throughout the GCS project, could allow for updating the reservoir permeability and forecasting the CO₂ plume consistently.

Analysis of computational gains

FNOs, and deep neural surrogates in general, have the potential to be orders of magnitude faster than conventional PDE solvers (Z. Li et al. 2020), and this speed-up is generally problem-dependent. In our numerical experiments, the PDE solver from Jutul.jl (Møyner et al. 2023; Møyner, Bruer, and Yin 2023; Yin, Bruer, and Louboutin 2023) currently only supports CPUs and we find an average runtime for both the forward and gradient on the \(64 \times 64\) model to be 10.6 seconds on average on an 8-core Intel(R) Xeon(R) W-2245 CPU. The trained FNO, implemented using modules from Flux.jl (Innes 2018), takes 16.4 milliseconds on average for both the forward and gradient. This means that the trained FNO in our case provides \(646\times\) speed up compared to conventional PDE solvers. The training of FNO takes about 4 hours on an NVIDIA T1000 8GB GPU. Given these numbers, we can calculate the break-even point — i.e., the point where using FNO surrogate becomes cheaper in terms of the overall runtime, by the following formula:

\[ \textrm{breakeven} = \frac{\textrm{generating training set time} + \textrm{training time}}{\textrm{PDE solver runtime}-\textrm{FNO runtime}} \approx 3364. \tag{11}\]

This means that after 3364 calls to the forward simulator and its gradients, the computational savings gained from using the FNO surrogate evaluations during the inversion process balances out the initial upfront costs. These upfront costs include the generation of the training dataset and the training of the FNO. Therefore, after this break-even point of 3364 calls, the use of the FNO surrogate becomes more cost-effective compared to the conventional PDE solver. Because the trained FNO has the potential to generalize to different kinds of inversion problems, and potentially also different GCS sites, 3364 calls is justifiable in practice. However, we acknowledge that a more detailed analysis on a more realistic 4D scale problem will be necessary to understand the potential computational gains and tradeoffs of the proposed methodology. For details on a high-performance computing parallel implementation of FNOs, we refer to Grady et al. (2023) who also conducted a realistic performance on large-scale 4D multiphase fluid-flow problems. Even in cases where the computational advances are perhaps challenging to justify, the use of FNOs has the additional benefit by providing access to the gradient with respect to model parameters (i.e. permeability) through automatic differentiation. This feature is important since it is an enabler for inversion problems that involve complex PDE solvers for which gradients are often not readily available, e.g. Gross and Mazuyer (2021). By training FNOs on input-output pairs, “gradient-free” gradient-based inversion is made possible in situations where the simulator does not support gradients.

Discussion and conclusions

Monitoring of geological carbon storage is challenging because of excessive computational needs and demands on data collection by drilling monitor wells or by collecting time-lapse seismic data. To offset the high computational costs of solving multiphase flow equations and to improve permeability inversions from possibly multimodal time-lapse data, we introduce the usage of trained Fourier neural operators (FNOs) that act as surrogates for the fluid-flow simulations. We propose to do this in combination with trained normalizing flows (NFs), which serve as regularizers to keep the inversion and the accuracy of the FNOs in check. Since the computational expense of FNO’s online evaluation is negligible compared to numerically expensive partial differential equation solves, FNOs only incur upfront offline training costs. While this obviously presents a major advantage, the approximation accuracy of FNOs is, unfortunately, only guaranteed when its argument, the permeability, is in distribution—i.e., is drawn from the same distribution as the FNO was trained on. This creates a problem because there is, thanks to the non-trivial null space of permeability inversion, no guarantee the model iterates remain in-distribution. Quite the opposite, our numerical examples show that these iterates typically move out-of-distribution during the (early) iterations. This results in large errors in the FNO and in rather poor inversion results for the permeability.

To overcome this out-of-distribution dillema for the model iterates during permeability inversion with FNOs, we propose adding learned constraints, which ensure that model iterates remain in-distribution during the inversion. We accomplish this by training a NF on the same training set for the permeability used to train the FNO. After training, the NF is capable of generating in-distribution samples for the permeability from random realizations of standard Gaussian noise in the latent space. We employ this learned ability by parameterizing the unknown permeability in the latent space, which offers additional control on whether the model iterates remain in-distribution during the inversion. After establishing that out-of-distribution permeability models can be mapped to in-distribution models by restricting the \(\ell_2\)-norm of their latent representation, we introduce permeability inversion as a constrained optimization problem where the data misfit is minimized subject to a constraint on the \(\ell_2\)-norm of the latent space. Compared to adding pretrained NFs as priors via additative penalty terms, use of constraints ensures that model iterates remain at all times in-distribution. We show that this holds as long as the size of constraint set does not exceed the size of the \(\ell_2\)-norm ball of the standard normal distribution. As a result, we arrive at a computationally efficient continuation scheme, known as a homotopy, during which the \(\ell_2\)-norm constraint is relaxed slowly, so the data misfit objectives can be minimized while the model iterates remain in distribution.

By means of a series of carefully designed numerical experiments, we were able to establish the advocasy of combining learned surrogates and constraints, yielding solutions to permeability inversion problems that are close to solutions yielded by costly PDE-based methods. The examples also clearly show the advantages of working with gradually relaxed constraints where model iterates remain at all times in distribution with the additional joint benefit of slowly building up the model while bringing down the data misfit, an approach known to mitgate the effects of local minima (Peters and Herrmann 2017; Esser et al. 2018; Peters, Smithyman, and Herrmann 2019). Consequently, the quality of all time-lapse inversions improved significantly without requiring information that goes beyond having access to the training set of permeability models.

While we applied the proposed method to gradient-based iterative inversion, a similar approach can be used for other types of inversion methods, including inference with Markov chain Monte Carlo methods for uncertainty quantification (Lan, Li, and Shahbaba 2022). We also envisage extensions of the proposed method to other physics-based inverse problems (Heidenreich et al. 2014; Yang et al. 2023) and simulation-based inference problems (Cranmer, Brehmer, and Louppe 2020), where numerical simulations often form the computational bottleneck.

Despite the encouraging results from the numerical experiments, the presented approach leaves room for improvements, which we will leave for future work. For instance, the gradient with respect to the model parameters (permeability) derived from the neural surrogate is not guaranteed to be accurate—e.g. close to the gradient yielded by the adjoint-state method. As recent work by O’Leary-Roseberry et al. (2022) has shown, this potential source of error can be addressed by training neural surrogates on the simulator’s gradient with respect to the model parameters, provided it is available. Unfortunately, deriving gradients of complex HPC implementatons of numerical PDE solvers is often extremely challenging, explaining why this information is often not available. Because our method solely relies on gradients of the surrogate, which are readily available through algorithmic differentiation, we only need access to numerical PDE solvers available in legacy codes. While this approach may go at the expense of some accuracy, this feature offers a distinct practical advantage. However, as with many other machine learning approaches, our learned methods may also suffer from time-lapse observations that are out-of-distribution—i.e., produced by a permeability model that is out-of-distribution. While this is a common problem in data-driven methods, recent developments (Siahkoohi et al. 2023a) may remedy this problem by applying latent space corrections, a solution that is amenable to our approach. On the other hand, expanding the latent space’s \(\ell_2\) norm ball during inversion would allow NFs to generate any out-of-distribution model parameter. However, in that case the accuracy of the learned surrogate is not guaranteed. For such cases, transitioning from the learned surrogate to the numerical solver during later iterations may be advantageous and merits further study. The choice for the size of the \(\ell_2\)-norm ball at the beginning and at the end can also be further investigated (Aster, Borchers, and Thurber 2018).

While our paper primarily presents a proof of concept through a relatively small 2D experiment, our inversion strategy is designed to scale to large-scale 3D problems. NFs, with their inherent memory efficiency due to invertibility, are already primed for extension to 3D problems. For the learned surrogates, Grady et al. (2023) showcases model-parallel FNOs, demonstrating success in simulating 4D multiphase flow physics of over 2.6 billon variables. By combining these strengths, we are optimistic scaling this inversion strategy to 3D.

To end on a positive note and forward looking note, we argue that the presented approach makes a strong case for the inversion of multimodal data, consisting of time-lapse well and seismic data. While inversions from time-lapse saturation data collected from wells are feasible and fall within the realm of reservoir engineering, their performance, as expected, degrades away from the well. We argue that adding active-source seismic provides essential fill-in away from the wells. As such, it did not come to our surprise that joint inversion of multimodal data resulted in the best permeability estimates. From our perspective, our successfull combination of these often disjoint data modalities holds future promise when addressing challenges that come with monitoring and control of geological carbon storage and enhanced geothermal systems.

Availability of data and materials

The scripts to reproduce the experiments are available on the SLIM GitHub page https://github.com/slimgroup/FNO-NF.jl.

Abbreviations

FNO: Fourier neural operator
GCS: geological carbon storage
NF: normalizing flow
PDE: partial differential equations

Acknowledgements

The authors would like to thank Philipp A. Witte at Microsoft, Ali Siahkoohi at Rice University, and Olav Møyner at SINTEF for constructive discussions. The authors gratefully acknowledge the contribution of OpenAI’s ChatGPT for refining sentence structure and enhancing the overall readability of this manuscript.

Funding

This research was carried out with the support of Georgia Research Alliance and partners of the ML4Seismic Center. This research was also supported in part by the US National Science Foundation grant OAC 2203821 and the Department of Energy grant No. DE-SC0021515.

Author information

Authors and Affiliations

Georgia Institute of Technology, 756 West Peachtree Street NW, Atlanta, GA 30308

Ziyi Yin, Rafael Orozco, Mathias Louboutin, and Felix J. Herrmann

Contributions

ZY concepted the general outline of the project, implemented the algorithm, designed and conducted the numerical experiments, and wrote the manuscript. RO helped concept the general outline of the project, discussed the approach and helped design the numerical experiments. ML helped concept the general outline of the project and discussed the approach. FH concepted the general outline of the project, discussed the approach, helped design the numerical experiments, edited the manuscript, and supervised the project. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Ziyi Yin.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

References

Alexanderian, Alen. 2021. “Optimal Experimental Design for Infinite-Dimensional Bayesian Inverse Problems Governed by PDEs: A Review.” Inverse Problems 37 (4): 043001.

Alnes, Håvard, Ola Eiken, Scott Nooner, Glenn Sasagawa, Torkjell Stenvold, and Mark Zumberge. 2011. “Results from Sleipner Gravity Monitoring: Updated Density and Temperature Distribution of the CO2 Plume.” Energy Procedia 4: 5504–11.

Ardizzone, Lynton, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W Pellegrini, Ralf S Klessen, Lena Maier-Hein, Carsten Rother, and Ullrich Köthe. 2018. “Analyzing Inverse Problems with Invertible Neural Networks.” arXiv Preprint arXiv:1808.04730.

Arridge, Simon R. 1999. “Optical Tomography in Medical Imaging.” Inverse Problems 15 (2): R41.

Arts, Rob, Ola Eiken, Andy Chadwick, Peter Zweigel, L Van der Meer, and B Zinszner. 2004. “Monitoring of CO2 Injected at Sleipner Using Time-Lapse Seismic Data.” Energy 29 (9-10): 1383–92.

Asher, Michael J, Barry FW Croke, Anthony J Jakeman, and Luk JM Peeters. 2015. “A Review of Surrogate Models and Their Application to Groundwater Modeling.” Water Resources Research 51 (8): 5957–73.

Asim, Muhammad, Max Daniels, Oscar Leong, Ali Ahmed, and Paul Hand. 2020. “Invertible Generative Models for Inverse Problems: Mitigating Representation Error and Dataset Bias.” In International Conference on Machine Learning, 399–409. PMLR.

Aster, Richard C, Brian Borchers, and Clifford H Thurber. 2018. Parameter Estimation and Inverse Problems. Elsevier.

Avseth, Per, Tapan Mukerji, and Gary Mavko. 2010. Quantitative Seismic Interpretation: Applying Rock Physics Tools to Reduce Interpretation Risk. Cambridge university press.

Beck, Amir. 2014. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. SIAM.

Benitez, Jose Antonio Lara, Takashi Furuya, Florian Faucher, Xavier Tricoche, and Maarten V de Hoop. 2023. “Fine-Tuning Neural-Operator Architectures for Training and Generalization.” arXiv Preprint arXiv:2301.11509.

Canchumuni, Smith WA, Alexandre A Emerick, and Marco Aurelio C Pacheco. 2019. “History Matching Geological Facies Models Based on Ensemble Smoother and Deep Generative Models.” Journal of Petroleum Science and Engineering 177: 941–58.

Cao, Yang, Shengtai Li, Linda Petzold, and Radu Serban. 2003. “Adjoint Sensitivity Analysis for Differential-Algebraic Equations: The Adjoint DAE System and Its Numerical Solution.” SIAM Journal on Scientific Computing 24 (3): 1076–89.

Carcione, Jose M, Davide Gei, Stefano Picotti, and Alberto Michelini. 2012. “Cross-Hole Electromagnetic and Seismic Modeling for CO2 Detection and Monitoring in a Saline Aquifer.” Journal of Petroleum Science and Engineering 100: 162–72.

Chandra, Rohitash, Danial Azam, Arpit Kapoor, and R Dietmar Müller. 2020. “Surrogate-Assisted Bayesian Inversion for Landscape and Basin Evolution Models.” Geoscientific Model Development 13 (7): 2959–79.

Cowles, Mary Kathryn, and Bradley P Carlin. 1996. “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review.” Journal of the American Statistical Association 91 (434): 883–904.

Cranmer, Kyle, Johann Brehmer, and Gilles Louppe. 2020. “The Frontier of Simulation-Based Inference.” Proceedings of the National Academy of Sciences 117 (48): 30055–62.

De Hoop, Maarten, Daniel Zhengyu Huang, Elizabeth Qian, and Andrew M Stuart. 2022. “The Cost-Accuracy Trade-Off in Operator Learning with Neural Networks.” arXiv Preprint arXiv:2203.13181.

Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. 2016. “Density Estimation Using Real Nvp.” arXiv Preprint arXiv:1605.08803.

Esser, Ernie, Lluis Guasch, Tristan van Leeuwen, Aleksandr Y Aravkin, and Felix J Herrmann. 2018. “Total Variation Regularization Strategies in Full-Waveform Inversion.” SIAM Journal on Imaging Sciences 11 (1): 376–406.

Freifeld, Barry M, Thomas M Daley, Susan D Hovorka, Jan Henninges, Jim Underschultz, and Sandeep Sharma. 2009. “Recent Advances in Well-Based Monitoring of CO2 Sequestration.” Energy Procedia 1 (1): 2277–84.

Furre, Anne-Kari, Ola Eiken, Håvard Alnes, Jonas Nesland Vevatne, and Anders Fredrik Kiær. 2017. “20 Years of Monitoring CO2-Injection at Sleipner.” Energy Procedia 114: 3916–26. https://doi.org/https://doi.org/10.1016/j.egypro.2017.03.1523.

Golub, Gene H, Per Christian Hansen, and Dianne P O’Leary. 1999. “Tikhonov Regularization and Total Least Squares.” SIAM Journal on Matrix Analysis and Applications 21 (1): 185–94.

Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” Advances in Neural Information Processing Systems 27.

Grady, Thomas J, Rishi Khan, Mathias Louboutin, Ziyi Yin, Philipp A Witte, Ranveer Chandra, Russell J Hewett, and Felix J Herrmann. 2023. “Model-Parallel Fourier Neural Operators as Learned Surrogates for Large-Scale Parametric PDEs.” Computers & Geosciences, 105402.

Griewank, Andreas et al. 1989. “On Automatic Differentiation.” Mathematical Programming: Recent Developments and Applications 6 (6): 83–107.

Gross, Herve, and Antoine Mazuyer. 2021. “GEOSX: A Multiphysics, Multilevel Simulator Designed for Exascale Computing.” In SPE Reservoir Simulation Conference. OnePetro.

Heidenreich, S, H Gross, MA Henn, C Elster, and M Bär. 2014. “A Surrogate Model Enables a Bayesian Approach to the Inverse Problem of Scatterometry.” In Journal of Physics: Conference Series, 490:012007. 1. IOP Publishing.

Hennenfent, Gilles, Ewout van den Berg, Michael P Friedlander, and Felix J Herrmann. 2008. “New Insights into One-Norm Solvers from the Pareto Curve.” Geophysics 73 (4): A23–26.

Hijazi, Saddam, Melina Freitag, and Niels Landwehr. 2023. “POD-Galerkin Reduced Order Models and Physics-Informed Neural Networks for Solving Inverse Problems for the Navier–Stokes Equations.” Advanced Modeling and Simulation in Engineering Sciences 10 (1): 1–38.

Horvat, Christian, and Jean-Pascal Pfister. 2022. “Intrinsic Dimensionality Estimation Using Normalizing Flows.” Advances in Neural Information Processing Systems 35: 12225–36.

Huang, Lianjie. 2022. “Geophysical Monitoring for Geologic Carbon Storage.”

Innes, Mike. 2018. “Flux: Elegant Machine Learning with Julia.” Journal of Open Source Software 3 (25): 602.

Jansen, Jan Dirk. 2011. “Adjoint-Based Optimization of Multi-Phase Flow Through Porous Media–a Review.” Computers & Fluids 46 (1): 40–51.

Karniadakis, George Em, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. “Physics-Informed Machine Learning.” Nature Reviews Physics 3 (6): 422–40.

Kingma, Diederik P, and Max Welling. 2013. “Auto-Encoding Variational Bayes.” arXiv Preprint arXiv:1312.6114.

Kobyzev, Ivan, Simon JD Prince, and Marcus A Brubaker. 2020. “Normalizing Flows: An Introduction and Review of Current Methods.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11): 3964–79.

Kontolati, Katiana, Somdatta Goswami, George Em Karniadakis, and Michael D Shields. 2023. “Learning in Latent Spaces Improves the Predictive Accuracy of Deep Neural Operators.” arXiv Preprint arXiv:2304.07599.

Kovachki, Nikola, Samuel Lanthaler, and Siddhartha Mishra. 2021. “On Universal Approximation and Error Bounds for Fourier Neural Operators.” Journal of Machine Learning Research 22: Art–No.

Kovachki, Nikola, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. 2021. “Neural Operator: Learning Maps Between Function Spaces.” arXiv Preprint arXiv:2108.08481.

Kruse, Jakob, Gianluca Detommaso, Ullrich Köthe, and Robert Scheichl. 2021. “HINT: Hierarchical Invertible Neural Transport for Density Estimation and Bayesian Inference.” In Proceedings of the AAAI Conference on Artificial Intelligence, 35:8191–99. 9.

Lan, Shiwei, Shuyi Li, and Babak Shahbaba. 2022. “Scaling up Bayesian Uncertainty Quantification for Inverse Problems Using Deep Neural Networks.” SIAM/ASA Journal on Uncertainty Quantification 10 (4): 1684–1713.

Lensink, Keegan, Bas Peters, and Eldad Haber. 2022. “Fully Hyperbolic Convolutional Neural Networks.” Research in the Mathematical Sciences 9 (4): 60.

Li, Dongzhuo, Kailai Xu, Jerry M Harris, and Eric Darve. 2020. “Coupled Time-Lapse Full-Waveform Inversion for Subsurface Flow Problems Using Intrusive Automatic Differentiation.” Water Resources Research 56 (8): e2019WR027032.

Li, Xiang, Aleksandr Y Aravkin, Tristan van Leeuwen, and Felix J Herrmann. 2012. “Fast Randomized Full-Waveform Inversion with Compressive Sensing.” Geophysics 77 (3): A13–17.

Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. 2020. “Fourier Neural Operator for Parametric Partial Differential Equations.” arXiv Preprint arXiv:2010.08895.

Li, Zongyi, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. 2021. “Physics-Informed Neural Operator for Learning Partial Differential Equations.” arXiv Preprint arXiv:2111.03794.

Lie, Knut-Andreas. 2019. An Introduction to Reservoir Simulation Using MATLAB/GNU Octave: User Guide for the MATLAB Reservoir Simulation Toolbox (MRST). Cambridge University Press.

Liu, Dong C, and Jorge Nocedal. 1989. “On the Limited Memory BFGS Method for Large Scale Optimization.” Mathematical Programming 45 (1-3): 503–28.

Liu, Mingliang, Divakar Vashisth, Dario Grana, and Tapan Mukerji. 2023. “Joint Inversion of Geophysical Data for Geologic Carbon Sequestration Monitoring: A Differentiable Physics-Informed Neural Network Model.” Journal of Geophysical Research: Solid Earth 128 (3): e2022JB025372.

Louboutin, Mathias, Fabio Luporini, Michael Lange, Navjot Kukreja, Philipp A. Witte, Felix J. Herrmann, Paulius Velesko, and Gerard J. Gorman. 2019. “Devito (V3.1.0): An Embedded Domain-Specific Language for Finite Differences and Geophysical Exploration.” Geoscientific Model Development. https://doi.org/10.5194/gmd-12-1165-2019.

Louboutin, Mathias, Philipp A. Witte, Ali Siahkoohi, Gabrio Rizzuti, Ziyi Yin, Rafael Orozco, and Felix J. Herrmann. 2022. “Accelerating Innovation with Software Abstractions for Scalable Computational Geophysics.” https://doi.org/10.1190/image2022-3750561.1.

Louboutin, Mathias, Philipp Witte, Ziyi Yin, Henryk Modzelewski, Kerim, Carlos da Costa, and Peterson Nogueira. 2023. Slimgroup/JUDI.jl: V3.2.3 (version v3.2.3). Zenodo. https://doi.org/10.5281/zenodo.7785440.

Louboutin, Mathias, Ziyi Yin, Rafael Orozco, Thomas J. Grady II, Ali Siahkoohi, Gabrio Rizzuti, Philipp A. Witte, Olav Møyner, Gerard J. Gorman, and Felix J. Herrmann. 2023. “Learned Multiphysics Inversion with Differentiable Programming and Machine Learning.” The Leading Edge 42 (7): 452–516. https://doi.org/10.1190/tle42070474.1.

Lu, Kuan, Yulin Jin, Yushu Chen, Yongfeng Yang, Lei Hou, Zhiyong Zhang, Zhonggang Li, and Chao Fu. 2019. “Review for Order Reduction Based on Proper Orthogonal Decomposition and Outlooks of Applications in Mechanical Systems.” Mechanical Systems and Signal Processing 123: 264–97.

Lu, Lu, Pengzhan Jin, and George Em Karniadakis. 2019. “Deeponet: Learning Nonlinear Operators for Identifying Differential Equations Based on the Universal Approximation Theorem of Operators.” arXiv Preprint arXiv:1910.03193.

Lumley, David. 2010. “4D Seismic Monitoring of CO 2 Sequestration.” The Leading Edge 29 (2): 150–55.

Luporini, Fabio, Mathias Louboutin, Michael Lange, Navjot Kukreja, Philipp Witte, Jan Hückelheim, Charles Yount, Paul HJ Kelly, Felix J Herrmann, and Gerard J Gorman. 2020. “Architecture and Performance of Devito, a System for Automated Stencil Computation.” ACM Transactions on Mathematical Software (TOMS) 46 (1): 1–28.

Mosser, Lukas, Olivier Dubrule, and Martin J Blunt. 2019. “Deepflow: History Matching in the Space of Deep Generative Models.” arXiv Preprint arXiv:1905.05749.

Møyner, Olav, Grant Bruer, and Ziyi Yin. 2023. Sintefmath/JutulDarcy.jl: V0.2.3 (version v0.2.3). Zenodo. https://doi.org/10.5281/zenodo.7855628.

Møyner, Olav, Martin Johnsrud, Halvor Møll Nilsen, Xavier Raynaud, Kjetil Olsen Lye, and Ziyi Yin. 2023. Sintefmath/Jutul.jl: V0.2.6 (version v0.2.6). Zenodo. https://doi.org/10.5281/zenodo.7855605.

Nogues, Juan P, Jan M Nordbotten, and Michael A Celia. 2011. “Detecting Leakage of Brine or CO2 Through Abandoned Wells in a Geological Sequestration Operation Using Pressure Monitoring Wells.” Energy Procedia 4: 3620–27.

Nooner, Scott L, Ola Eiken, Christian Hermanrud, Glenn S Sasagawa, Torkjell Stenvold, and Mark A Zumberge. 2007. “Constraints on the in Situ Density of CO2 Within the Utsira Formation from Time-Lapse Seafloor Gravity Measurements.” International Journal of Greenhouse Gas Control 1 (2): 198–214.

Nordbotten, Jan M, and Michael Anthony Celia. 2011. “Geological Storage of CO2: Modeling Approaches for Large-Scale Simulation.” In Geological Storage of CO 2: Modeling Approaches for Large-Scale Simulation. John Wiley; Sons.

O’Leary-Roseberry, Thomas, Peng Chen, Umberto Villa, and Omar Ghattas. 2022. “Derivate Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning.” arXiv Preprint arXiv:2206.10745.

Orozco, Rafael, Mathias Louboutin, and Felix J Herrmann. 2022. “Memory Efficient Invertible Neural Networks for 3D Photoacoustic Imaging.” arXiv Preprint arXiv:2204.11850.

Orozco, Rafael, Mathias Louboutin, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix J. Herrmann. 2023. “Amortized Normalizing Flows for Transcranial Ultrasound with Uncertainty Quantification,” March. https://openreview.net/forum?id=LoJG-lUIlk.

Orozco, Rafael, Ali Siahkoohi, Mathias Louboutin, and Felix J Herrmann. 2023. “Refining Amortized Posterior Approximations Using Gradient-Based Summary Statistics.” arXiv Preprint arXiv:2305.08733.

Orozco, Rafael, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix J. Herrmann. 2021. “Photoacoustic Imaging with Conditional Priors from Normalizing Flows.” https://openreview.net/forum?id=woi1OTvROO1.

———. 2023. “Adjoint operators enable fast and amortized machine learning based Bayesian uncertainty quantification.” In Medical Imaging 2023: Image Processing, edited by Olivier Colliot and Ivana IÅ¡gum, 12464:124641L. International Society for Optics; Photonics; SPIE. https://doi.org/10.1117/12.2651691.

Pestourie, Raphaël, Youssef Mroueh, Thanh V Nguyen, Payel Das, and Steven G Johnson. 2020. “Active Learning of Deep Surrogates for PDEs: Application to Metasurface Design.” Npj Computational Materials 6 (1): 164.

Peters, Bas, and Felix J Herrmann. 2017. “Constraints Versus Penalties for Edge-Preserving Full-Waveform Inversion.” The Leading Edge 36 (1): 94–100.

Peters, Bas, Brendan R Smithyman, and Felix J Herrmann. 2019. “Projection Methods and Applications for Seismic Nonlinear Inverse Problems with Multiple Constraints.” Geophysics 84 (2): R251–69.

Plessix, R-E. 2006. “A Review of the Adjoint-State Method for Computing the Gradient of a Functional with Geophysical Applications.” Geophysical Journal International 167 (2): 495–503.

Powell, Michael JD. 1985. “Radial Basis Functions for Multivariable Interpolation: A Review.” Algorithms for the Approximation of Functions and Data.

Qian, Elizabeth, Boris Kramer, Benjamin Peherstorfer, and Karen Willcox. 2020. “Lift & Learn: Physics-Informed Machine Learning for Large-Scale Nonlinear Dynamical Systems.” Physica D: Nonlinear Phenomena 406: 132401.

Rahman, Md Ashiqur, Zachary E Ross, and Kamyar Azizzadenesheli. 2022. “U-No: U-Shaped Neural Operators.” arXiv Preprint arXiv:2204.11127.

Rasmussen, Atgeirr Flø, Tor Harald Sandve, Kai Bao, Andreas Lauser, Joakim Hove, Bård Skaflestad, Robert Klöfkorn, et al. 2021. “The Open Porous Media Flow Reservoir Simulator.” Computers & Mathematics with Applications 81: 159–85.

Razavi, Saman, Bryan A Tolson, and Donald H Burn. 2012. “Review of Surrogate Modeling in Water Resources.” Water Resources Research 48 (7).

Rezende, Danilo, and Shakir Mohamed. 2015. “Variational Inference with Normalizing Flows.” In International Conference on Machine Learning, 1530–38. PMLR.

Schilders, Wilhelmus HA, Henk A Van der Vorst, and Joost Rommes. 2008. Model Order Reduction: Theory, Research Aspects and Applications. Vol. 13. Springer.

Sheriff, Robert E, and Lloyd P Geldart. 1995. Exploration Seismology. Cambridge university press.

Siahkoohi, Ali, and Felix J. Herrmann. 2021. “Learning by Example: Fast Reliability-Aware Seismic Imaging with Normalizing Flows.” In First International Meeting for Applied Geoscience & Energy, 1580–85. Society of Exploration Geophysicists; Expanded Abstracts. https://doi.org/10.1190/segam2021-3581836.1.

Siahkoohi, Ali, Gabrio Rizzuti, Mathias Louboutin, Philipp Witte, and Felix J. Herrmann. 2021. “Preconditioned Training of Normalizing Flows for Variational Inference in Inverse Problems.” https://openreview.net/pdf?id=P9m1sMaNQ8T.

Siahkoohi, Ali, Gabrio Rizzuti, Rafael Orozco, and Felix J Herrmann. 2023a. “Reliable Amortized Variational Inference with Physics-Based Latent Distribution Correction.” Geophysics 88 (3): R297–322.

Siahkoohi, Ali, Gabrio Rizzuti, Rafael Orozco, and Felix J. Herrmann. 2023b. “Reliable Amortized Variational Inference with Physics-Based Latent Distribution Correction.” Geophysics 88 (3). https://doi.org/10.1190/geo2022-0472.1.

Stanimirović, Predrag S, and Marko B Miladinović. 2010. “Accelerated Gradient Descent Methods with Line Search.” Numerical Algorithms 54: 503–20.

Tarantola, Albert. 1984. “Inversion of Seismic Reflection Data in the Acoustic Approximation.” Geophysics 49 (8): 1259–66.

———. 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.

Wang, Zhou, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. “Image Quality Assessment: From Error Visibility to Structural Similarity.” IEEE Transactions on Image Processing 13 (4): 600–612.

Wen, Gege, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M Benson. 2022. “U-FNO—an Enhanced Fourier Neural Operator-Based Deep-Learning Model for Multiphase Flow.” Advances in Water Resources 163: 104180.

Wen, Gege, Zongyi Li, Qirui Long, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally Benson. 2023. “Real-Time High-Resolution CO2 Geological Storage Prediction Using Nested Fourier Neural Operators.” Energy & Environmental Science.

Wen, Jeffrey, Rizwan Ahmad, and Philip Schniter. 2023. “A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging.” arXiv Preprint arXiv:2306.01630.

Witte, Philipp A, Russell J Hewett, Kumar Saurabh, AmirHossein Sojoodi, and Ranveer Chandra. 2022. “SciAI4Industry–Solving PDEs for Industry-Scale Problems with Deep Learning.” arXiv Preprint arXiv:2211.12709.

Witte, Philipp A, Tugrul Konuk, Erik Skjetne, and Ranveer Chandra. 2023. “Fast CO2 Saturation Simulations on Large-Scale Geomodels with Artificial Intelligence-Based Wavelet Neural Operators.” International Journal of Greenhouse Gas Control 126: 103880.

Witte, Philipp A., Mathias Louboutin, Navjot Kukreja, Fabio Luporini, Michael Lange, Gerard J. Gorman, and Felix J. Herrmann. 2019. “A Large-Scale Framework for Symbolic Implementations of Seismic Inversion Algorithms in Julia.” Geophysics 84 (3): F57–71. https://doi.org/10.1190/geo2018-0174.1.

Witte, Philipp A, WA Redmond, Russell J Hewett, and Ranveer Chandra. 2022. “Industry-Scale CO2 Flow Simulations with Model-Parallel Fourier Neural Operators.” In NeurIPS 2022 Workshop Tackling Climate Change with Machine Learning.

Witte, Philipp, Mathias Louboutin, Rafael Orozco, grizzuti, Ali Siahkoohi, Felix Herrmann, Bas Peters, Páll Haraldsson, and Ziyi Yin. 2023. Slimgroup/InvertibleNetworks.jl: V2.2.5 (version v2.2.5). Zenodo. https://doi.org/10.5281/zenodo.7850287.

Yang, Yan, Angela F Gao, Kamyar Azizzadenesheli, Robert W Clayton, and Zachary E Ross. 2023. “Rapid Seismic Waveform Modeling and Inversion with Neural Operators.” IEEE Transactions on Geoscience and Remote Sensing 61: 1–12.

Yin, Ziyi, Grant Bruer, and Mathias Louboutin. 2023. Slimgroup/JutulDarcyRules.jl: V0.2.5 (version v0.2.5). Zenodo. https://doi.org/10.5281/zenodo.7863970.

Yin, Ziyi, Huseyin Tuna Erdinc, Abhinav Prakash Gahlot, Mathias Louboutin, and Felix J Herrmann. 2023. “Derisking Geologic Carbon Storage from High-Resolution Time-Lapse Seismic to Explainable Leakage Detection.” The Leading Edge 42 (1): 69–76. https://doi.org/10.1190/tle42010069.1.

Yin, Ziyi, Ali Siahkoohi, Mathias Louboutin, and Felix J. Herrmann. 2022. “Learned Coupled Inversion for Carbon Sequestration Monitoring and Forecasting with Fourier Neural Operators.” https://doi.org/10.1190/image2022-3722848.1.

Zhang, Xin, and Andrew Curtis. 2020. “Seismic Tomography Using Variational Inference Methods.” Journal of Geophysical Research: Solid Earth 125 (4): e2019JB018589.

———. 2021. “Bayesian Geophysical Inversion Using Invertible Neural Networks.” Journal of Geophysical Research: Solid Earth 126 (7): e2021JB022320.

Zhao, Xuebin, Andrew Curtis, and Xin Zhang. 2021. “Bayesian seismic tomography using normalizing flows.” Geophysical Journal International 228 (1): 213–39. https://doi.org/10.1093/gji/ggab298.

Zhdanov, Michael S, Masashi Endo, Noel Black, Lee Spangler, Stacey Fairweather, A Hibbs, GA Eiskamp, and R Will. 2013. “Electromagnetic Monitoring of CO2 Sequestration in Deep Reservoirs.” First Break 31 (2).

Abstract

Introduction

Fourier neural operators

Inversion with learned surrogates

Learned constraints with normalizing flows

Training normalizing flows

Trained normalizing flows as constraints

Inversion with progressively relaxed learned constraints

Algorithm 1: Inversion with relaxed learned constraints

Numerical Experiments

Unconstrained/constrained permeability inversion from CO2 saturation data

Unconstrained/constrained permeability inversion from well observations

Multiphysics end-to-end inversion

Jointly inverting time-lapse seismic data and well measurements

CO2 plume estimation and forecast

Analysis of computational gains

Discussion and conclusions

Availability of data and materials

Abbreviations

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

References

Unconstrained/constrained permeability inversion from CO₂ saturation data

CO₂ plume estimation and forecast