--- title: "Enhancing robustness of Digital Shadow for CO~2~ Storage Monitoring with Augmented Rock Physics Modeling" author: - name: Abhinav Prakash Gahlot affiliations: name: Georgia Institute of Technology - name: Felix J. Herrmann affiliations: name: Georgia Institute of Technology bibliography: abstract.bib crossref: fig-prefix: figure eq-prefix: equation filters: - SLIM - pseudocode format: html: page-layout: full sidebar: false lightbox: true crossrefs-hover: true arxiv-pdf: keep-tex: true linenumbers: false doublespacing: false pdf: template: EAGEabstractTemplate.latex csl: apa.csl keep-tex: true # abstract: --- ::: {.hidden} \newcommand{\argmin}{\mathop{\mathrm{argmin}\,}\limits} \newcommand{\argmax}{\mathop{\mathrm{argmax}\,}\limits} $$ \def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} $$ ::: # Introduction In order to achieve climate targets, the IPCC emphasizes the importance of technologies capable of removing gigatonnes of CO~2~ each year, with Geological Carbon Storage (GCS) being a central component of this strategy [@ipcc2018global; @ringrose2020store]. GCS involves capturing CO~2~ and injecting it into deep geological formations, where it can be safely stored over long periods. To ensure the safety of GCS, accurate monitoring of subsurface CO~2~ behavior is essential, as it guarantees that CO~2~ plumes remain contained and do not leak into surrounding formations [@ringrose2023storage]. Time-lapse seismic imaging has proven vital for tracking the migration of CO~2~ plumes, yet it often fails to fully capture the complexities of multi-phase subsurface flow. Digital Shadows (DS), which utilize machine learning-driven data assimilation techniques such as nonlinear Bayesian filtering and generative AI [@spantini2022coupling; @gahlot2024uads], offer a more detailed and reliable approach for monitoring CO~2~ storage [@herrmann2023president; @gahlot2023NIPSWSifp; @gahlot2024uads]. By incorporating uncertainty in reservoir properties like permeability, this framework improves the accuracy of CO~2~ migration forecasts, including plume pressure and saturation, and thereby reduces risks for GCS projects. However, data assimilation depends on assumptions regarding reservoir properties, rock physics models linking reservoir state to seismic properties, and initial conditions. If these assumptions are inaccurate, predictions may become unreliable, which in turn can jeopardize the safety of GCS operations. One way to mitigate this risk is by augmenting the forecast ensemble used to train the neural networks responsible for the data assimilation process—mapping prior forecast samples to the posterior. In this presentation, we demonstrate that augmenting the forecast ensemble by incorporating various rock physics models mitigates the negative effects of using inaccurate models (e.g., uniform versus patchy saturation models). Additionally, we find that ensemble augmentation can enhance predictive accuracy in certain scenarios. # Methodology Building on the formulation in @gahlot2024uads for CO~2~-plume dynamics and observations, we introduce an uncertainty-aware Digital Shadow framework, described by the following equation: $$ \begin{aligned} \mathbf{x}_k & = \mathcal{M}_k\bigl(\mathbf{x}_{k-1}, \boldsymbol{\kappa}_k; t_{k-1}, t_k\bigr)\\ & = \mathcal{M}_k\bigl(\mathbf{x}_{k-1}, \boldsymbol{\kappa}_k\bigr), \ \boldsymbol{\kappa}_k \sim p(\boldsymbol{\kappa}) \quad \text{for}\quad k=1, \dots, K. \end{aligned} $$ {#eq-dynamics} Here, $\mathbf{x}_k$ represents the time-varying spatial distribution of CO~2~ saturation and pressure perturbations measured at time steps $k = 1, \dots, K$, while $\mathcal{M}_k$ is the multi-phase fluid-flow simulation operator that evolves the state from time $t_{k-1}$ to $t_k$ for each time step. The symbol $\boldsymbol{\kappa}$ denotes the permeability distribution, which is highly heterogeneous [@ringrose2020store; @ringrose2023storage]. Since exact permeability values are unknown, we assume a statistical distribution $p(\boldsymbol{\kappa})$ and sample from it during each timestep. The state of CO~2~ saturation and reservoir pressure perturbations, $\mathbf{x}_{k-1}$, is then propagated forward to $\mathbf{x}_k$. While reservoir simulations provide accurate modeling of CO~2~ plume dynamics, relying solely on these simulations is impractical due to the inherent stochasticity of permeability and the uncertainty in rock physics models, which remain poorly constrained. To improve the accuracy of CO~2~ plume forecasts, monitoring is key. Time-lapse seismic data [@lumley20104d] offer essential insights into plume behavior, and we integrate these data into our framework to enhance the characterization of the CO~2~ plume. The seismic data are modeled as: $$ \mathbf{y}_k = \mathcal{H}_k(\mathbf{x}_k;\mathcal{R}_k) + \boldsymbol{\epsilon}_k, \quad \boldsymbol{\epsilon}_k \sim p(\boldsymbol{\epsilon}), \quad \mathcal{R}_k \sim p(\mathcal{R}) \quad \text{for}\quad k=1, \dots, K $$ {#eq-obs} where $\mathcal{H}_k$ is the observation operator, $\mathbf{y}_k$ is the seismic data observed at timestep $k$, and $\boldsymbol{\epsilon}_k$ represents colored Gaussian noise added to the seismic shot records before reverse-time migration. The symbol $\mathcal{R}_k$ denotes the rock physics model, randomly drawn from the family of Brie Saturation models [@avseth2010quantitative], with the exponent $e$ uniformly sampled from the range $e \sim \mathcal{U}(1,10)$. To combine fluid-flow simulations with time-lapse seismic data, we use a nonlinear Bayesian filtering approach, incorporating neural posterior density estimation to approximate the posterior distribution of the CO~2~ plume state conditioned on seismic observations. This approach merges sequential Bayesian inference with amortized neural posterior density estimation [@nf]. To estimate the posterior distribution of the CO~2~ plume from seismic data, we employ Conditional Normalizing Flows (CNFs) [@gahlot2024uads]. In this simulation-driven framework, we generate simulation pairs consisting of CO~2~ plume dynamics and corresponding seismic observations (cf. equations [-@eq-dynamics] and [-@eq-obs]) and train the CNFs on these pairs. To account for different rock physics models, we create multiple seismic images for each CO~2~ plume sample, resulting in an augmented dataset. # Synthetic Case Study To evaluate our approach, we use a synthetic 2D Earth model derived from the Compass model [@BG], representative of the North Sea region. We focus on a subset of this model, containing subsurface features suitable for CO~2~ injection, which is discretized into a grid of $512 \times 256$ with a grid spacing of $6.25 \ \mathrm{m}$. To initialize the probabilistic digital shadow, we require an ensemble of potential CO~2~ plume scenarios. Given the uncertainty in the permeability distribution of CO~2~ storage sites, we first establish a probabilistic baseline for permeability, which we use to initialize the plume. The permeability distribution is obtained through probabilistic full-waveform inversion [@yin2024wise], assuming that a baseline seismic survey was conducted prior to CO~2~ injection. This permeability distribution is then converted into permeability samples using the empirical relationship from [@gahlot2024uads]. ## Multi-Phase Flow Simulations Flow simulations are carried out using the open-source tool JutulDarcy [JutulDarcy.jl](https://github.com/sintefmath/JutulDarcy.jl) [@jutuldarcy]. Initially, the reservoir is filled with brine, and supercritical CO~2~ is injected at a rate of $0.0500 \ \mathrm{m^3/s}$ over a period of 1920 days. The injection depth is approximately $1200 \ \mathrm{m}$. The simulation is run over four time-lapse intervals, $t_k$, producing predicted CO~2~ saturation for each of the $N=128$ ensemble members at each timestep. ## Augmented Seismic Simulations The outputs from the 128 flow simulations are converted into changes in subsurface acoustic properties using 10 different exponents of the Brie Saturation model [@avseth2010quantitative], thus augmenting the data tenfold by generating different acoustic changes corresponding to each flow simulation. Seismic surveys are conducted with 8 receivers and 200 sources, using a dominant frequency of 15 Hz and a recording duration of 1.8 seconds. 28 dB SNR colored Gaussian noise is added to the shot records. Nonlinear wave simulations and imaging are performed using the open-source package [JUDI.jl](https://github.com/slimgroup/JUDI.jl) [@witte2018alf; @JUDI]. ## CNF Training The $1280$ ensemble members are used as training pairs, consisting of forecasted CO~2~ plumes and corresponding seismic observations. We train a Conditional Normalizing Flow (CNF) using the open-source package [InvertibleNetworks.jl](https://github.com/slimgroup/InvertibleNetworks.jl) [@orozco2023invertiblenetworks]. The network weights $\boldsymbol{\phi}$ are optimized by minimizing the following objective over 120 epochs using the $\textsc{ADAM}$ optimizer [@Kingma2014AdamAM]: $$ \widehat{\boldsymbol{\phi}} = \argmin_{\boldsymbol{\phi}} \frac{1}{M}\sum_{m=1}^M \Biggl(\frac{\Big\|f_{\boldsymbol{\phi}}(\mathbf{x}^{(m)};\mathbf{y}^{(m)})\Big\|_2^2}{2} - \log\Bigl |\det\Bigl(\mathbf{J}^{(m)}_{f_{\boldsymbol{\phi}}}\Bigr)\Bigr |\Biggr). $$ {#eq-loss-CNF} where $\mathbf{J}$ is the Jacobian of the network $f_{\theta}$ with respect to its input, and $M$ is the number of training samples. For further details, we refer to [@gahlot2024uads]. # Results The performance of our enhanced Digital Shadow framework, incorporating rock physics, is shown in @fig-sup. The top row in each figure presents, from left to right, the ground truth (GT) CO~2~ plume, the conditional mean of the posterior samples, and a sample from the posterior. The bottom row displays, from left to right, the seismic observation, the error between the conditional mean and the ground truth, and the uncertainty. As shown in @fig-naug-idy, the non-augmented DS performs well when the seismic observation matches the correct rock physics model, with the conditional mean closely approximating the GT. However, performance degrades when the seismic observation is based on a different, unknown rock physics model, as seen in @fig-naug-oody, where the conditional mean deviates from the GT, leading to increased error and uncertainty. Augmenting the ensemble improves generalization when seismic observations are based on an unknown rock physics model, as demonstrated in @fig-aug-oody. In this case, the conditional mean is closer to the GT, and both error and uncertainty are reduced compared to the non-augmented approach. ::: {#fig-sup layout-nrow=3} ![Non-augmented, in-distribution y](./figs/k1_patchy.png){#fig-naug-idy} ![Non-augmented, out-of-distribution y](./figs/k1_unisat.png){#fig-naug-oody} ![Augmented, out-of-distribution y](./figs/k1_augm.png){#fig-aug-oody} Inference results of digital shadow at $k=1$ *(a)* without data augmentation and known rock physics model for seismic observation *(b)* without data augmentation and unknown rock physics model for seismic observation *(c)* with data augmentation and unknown rock physics model for seismic observation. ::: # Conclusions This study demonstrates that uncertainties in permeability and rock physics models can significantly impact the accuracy of CO~2~ plume predictions. By augmenting the forecast ensemble, DS is able to account for unknown rock physics models (at least within the family of Brie saturation models where the exponent $e$ is not specified) and fluid-flow properties. The enriched dataset improves the fidelity of CO~2~ plume forecasts, thereby enhancing the reliability of GCS monitoring. # Acknowledgement This research was carried out with the support of Georgia Research Alliance, partners of the ML4Seismic Center and in part by the US National Science Foundation grant OAC 2203821. The overall readability is enhanced using ChatGPT 4. # References {.unnumbered} ::: {#refs} :::