CO2 reservoir monitoring through Bayesian data assimilation

Grant Bruer Felix Herrmann Edmond Chow

Motivation for this talk

Premises:

  • We want to use data assimilation to monitor CO2 reservoirs.
  • We want to use ML (normalizing flow) for data assimilation.
  • ML techniques can be difficult to debug, trust, and interpret.


Therefore, we should have:

  • a well-established non-ML assimilator to compare results against
  • a non-ML assimilation method that makes algorithm comparisons straightforward

The ensemble Kalman filter is the solution.

Outline

  1. Motivation for data assimilation
  2. CO2 reservoir fluid flow and seismic imaging
  3. Data assimilation theory
  4. Classic technique: Kalman filter
  5. Data assimilation with normalizing flow
  6. Ensemble Kalman filter

Data assimilation

  • Data assimilation: the process of combining information from multiple sources (here, physical models and noisy observations) to estimate a system's state.
  • We want to monitor CO2 reservoirs with indirect, noisy seismic observations.
  • But observations alone are insufficient.
  • Solution: take advantage of prior knowledge and known physics.
  • Benefits:
    • more accurate state estimates
    • more accurate predictions
    • estimates of uncertainty
  • Relevant field with success:
    • weather forecasting (high-dimensional, highly nonlinear physics)

Two-phase flow equations (highly nonlinear)

  • Two fluids: brine and supercritical CO2.

  • For fluid \(i = 1,2\):

Mass conservation: \(\frac{\partial(\color{red}{\phi}\rho_i \color{teal}{S_i})}{\partial t} + \nabla \cdot (\rho_i v_i) = \rho_i q_i\)
Darcy flow law: \(v_i = - \frac{k_{ri}}{\mu_i} \color{red}{K} \nabla (\color{teal}{P_i} - \rho_i g Z)\)
Modified Brooks-Corey: \(k_{ri} = \text{clamp}(1.25^2(\color{teal}{S_i} - 0.1)^2, 0, 1)\)

Definitions:

Saturation: \(\color{teal}{S_i}\), \(\, \color{teal}{S_1}+\color{teal}{S_2} = 1\)
Pressure: \(\color{teal}{P_i}\), \(\, \color{teal}{P_1}-\color{teal}{P_2} = P_c\)
Porosity: \(\color{red}{\phi}\)
Permeability: \(\color{red}{K}\)
Density: \(\rho_i\)
Viscosity: \(\mu_i\)
Injection: \(q_2\)
Depth: \(Z\)
Gravity: \(g\)
Capillary pressure: \(P_c\)
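
As a quick numerical illustration of the modified Brooks-Corey curve above (a minimal sketch in Python, ours rather than the simulator's code; the constant \(1.25 = 1/0.8\) is consistent with a residual saturation of 0.1 for each phase):

```python
import numpy as np

def rel_perm(saturation):
    """Modified Brooks-Corey: k_r = clamp(1.25^2 (S - 0.1)^2, 0, 1)."""
    return np.clip(1.25**2 * (saturation - 0.1) ** 2, 0.0, 1.0)

print(rel_perm(np.array([0.1, 0.6, 1.0])))  # [0.0, 0.390625, 1.0]
```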

Seismic wave equations (highly nonlinear)

  • The presence of CO2 modifies the squared slowness \(m\).
Wave equation: \(m \frac{\partial^2 \color{#d95f02}{p}}{\partial t^2} - \nabla \cdot \nabla \color{#d95f02}{p} + \eta \frac{\partial \color{#d95f02}{p}}{\partial t} = q\)
Patchy-saturation model: \(m = (\rho_0 + \color{teal}{S_2} \phi (\rho_2 - \rho_1))\left(\lambda_{s1}^{-1} + \color{teal}{S_2}(\lambda_{s2}^{-1}-\lambda_{s1}^{-1}) \right)\)


Definitions:

Saturation: \(\color{teal}{S_i}\), \(\, \color{teal}{S_1}+\color{teal}{S_2} = 1\)
Pressure change: \(\color{#d95f02}{p}\)
Squared slowness: \(m\)
Seismic source: \(q\)
Brine rock density: \(\rho_0\)
Fluid density: \(\rho_1\), \(\rho_2\)
P-wave modulus: \(\lambda_{s1}\), \(\lambda_{s2}\)
PML parameter: \(\eta\)
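
The patchy-saturation map from CO2 saturation to squared slowness is a pointwise formula; a minimal sketch (ours), with arguments named after the definitions above:

```python
def squared_slowness(s2, phi, rho0, rho1, rho2, lam_s1, lam_s2):
    """m = (rho0 + S2*phi*(rho2 - rho1)) * (1/lam_s1 + S2*(1/lam_s2 - 1/lam_s1))."""
    rho_eff = rho0 + s2 * phi * (rho2 - rho1)                        # density with CO2 in the pores
    inv_modulus = 1.0 / lam_s1 + s2 * (1.0 / lam_s2 - 1.0 / lam_s1)  # interpolated inverse P-wave modulus
    return rho_eff * inv_modulus
```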

Our CO2 system

GIF representing fluid flow

CO2 saturation \(x_k\)

GIF representing seismic observation

Seismic image \(y_k\)

 


Notation for data assimilation:

  • timestep \(k\)
  • Fluid flow physics: \(x_k = g(x_{k-1}) + \epsilon_g\)
  • Seismic observation: \(y_k = h(x_k) + \epsilon_h\)

Data assimilation

  • Initialize: Start with a prior distribution at timestep \(k = 0\).
    • \(\color{#d95f02}{\text{initial prior: } p(x_0)}\)

Then for each timestep \(k\), we have a measurement \(y_k\) and a previous estimate \(x_{k-1}\).

  1. Predict: Marginalize over the previous state to get the current state’s probability distribution.
    • \(\color{#1b9e77}{\text{predictive prior: } p(x_k | y_{1:k-1})} = \int \color{#050505}{p(x_k | x_{k-1})} \color{#d95f02}{p(x_{k-1} | y_{1:k-1})} dx_{k-1}\)
  2. Update: Use Bayes’ rule to incorporate measurements.
    • \(\color{#d95f02}{\text{posterior: } p(x_k | y_{1:k})} \propto \color{#050505}{p(y_k | x_k)} \color{#1b9e77}{p(x_k | y_{1:k-1})}\)
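
A minimal sketch (ours, not the talk's code) of this two-step recursion on a discretized 1D state, where the marginalization integral becomes a matrix-vector product and Bayes' rule a pointwise reweighting:

```python
import numpy as np

grid = np.linspace(-5, 5, 201)               # discretized state values

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Initial prior p(x_0), discretized and normalized.
prior = gaussian(grid, 0.0, 1.0)
prior /= prior.sum()

# Transition kernel p(x_k | x_{k-1}): here a random walk; columns sum to 1.
transition = gaussian(grid[:, None], grid[None, :], 0.5)
transition /= transition.sum(axis=0, keepdims=True)

# Predict: p(x_k | y_{1:k-1}) = sum_j p(x_k | x_{k-1}=j) p(x_{k-1}=j | y_{1:k-1})
predictive_prior = transition @ prior

# Update: p(x_k | y_{1:k}) is proportional to p(y_k | x_k) p(x_k | y_{1:k-1}), with y_k = x_k + noise
y_obs = 1.3
likelihood = gaussian(y_obs, grid, 0.8)
posterior = likelihood * predictive_prior
posterior /= posterior.sum()
```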

Classic technique (Kalman filter)

Assume everything is Gaussian.

  • The initial distribution is Gaussian: \(\color{#d95f02}{p(x_0)}\)
  • The transition density is Gaussian: \(\color{#050505}{p(x_k | x_{k-1})}\)
    • Therefore, the predictive prior is Gaussian: \(\color{#1b9e77}{p(x_k | y_{1:k-1})}\)
    • For this to hold, the transition must be linear with Gaussian noise.
  • The data likelihood is Gaussian: \(\color{#050505}{p(y_k | x_k)}\)
    • Therefore, the posterior is Gaussian: \(\color{#d95f02}{p(x_k | y_{1:k})}\)
    • For this to hold, the observation must be linear with Gaussian noise.

Thus, we only need to track a mean and covariance and update them linearly.
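
For example, one linear-Gaussian predict/update cycle reduces to a handful of matrix operations (a minimal sketch; the operators \(A\), \(H\) and covariances \(Q\), \(R\) are hypothetical placeholders):

```python
import numpy as np

def kalman_step(mu, P, y, A, Q, H, R):
    """One predict/update cycle of the classic Kalman filter.

    Model: x_k = A x_{k-1} + eps_g, eps_g ~ N(0, Q);  y_k = H x_k + eps_h, eps_h ~ N(0, R).
    """
    # Predict: propagate the mean and covariance through the linear transition.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update: Bayes' rule for Gaussians reduces to a linear correction.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_post = mu_pred + K @ (y - H @ mu_pred)
    P_post = P_pred - K @ H @ P_pred
    return mu_post, P_post

# Example: 1D random-walk state observed directly.
mu, P = kalman_step(np.array([0.0]), np.eye(1), np.array([1.3]),
                    np.eye(1), 0.1 * np.eye(1), np.eye(1), 0.5 * np.eye(1))
```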


However, to properly handle nonlinearity, we need a different viewpoint.

Sample point of view

Instead of probability densities, let’s work in terms of samples from distributions.

  1. Predict: Marginalize over the previous state to get the current state’s probability distribution. \[\begin{align} \color{#1b9e77}{p(x_k | y_{1:k-1})} = \int \color{#050505}{p(x_k | x_{k-1})} \color{#d95f02}{p(x_{k-1} | y_{1:k-1})} dx_{k-1} \end{align}\]
    • We have samples \(\color{#d95f02}{x^{(i)}_{k-1}}\) that are drawn from \(\color{#d95f02}{p(x_{k-1} | y_{1:k-1})}\).
    • Compute samples of the predictive prior by computing \(\color{#1b9e77}{\hat x^{(i)}_k} = g(\color{#d95f02}{x^{(i)}_{k-1}}) + \epsilon_g\) for each sample.

Sample point of view

  1. Predict: \(\color{#1b9e77}{\hat x^{(i)}_k} = g(\color{#d95f02}{x^{(i)}_{k-1}}) + \epsilon_g\) for each sample.

  2. Update: Use Bayes’ rule to incorporate measurements. \[\begin{align} \color{#d95f02}{p(x_k | y_{1:k})} \propto \color{#050505}{p(y_k | x_k)} \color{#1b9e77}{p(x_k | y_{1:k-1})} \end{align}\]

    • Draw samples from the data likelihood by computing \(\color{#050505}{\hat y^{(i)}_k} = h(\color{#1b9e77}{\hat x^{(i)}_k}) + \epsilon_h\) for each sample.
    • Now we have joint samples \(\color{#1b9e77}{\hat x^{(i)}_k}, \color{#050505}{\hat y^{(i)}_k} \sim p(x_k, y_k | y_{1:k-1})\) but we want to sample from \(\color{#d95f02}{p(x_k | y_{1:k})}\).
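
In code, these two sampling steps are just pushes through \(g\) and \(h\) (a minimal sketch, ours; `sample_eps_g` and `sample_eps_h` stand for draws of the model and observation noise):

```python
import numpy as np

def joint_samples(X_prev, g, h, sample_eps_g, sample_eps_h):
    """Turn posterior samples at step k-1 into joint samples of (x_k, y_k)."""
    X_hat = np.array([g(x) + sample_eps_g() for x in X_prev])  # predictive-prior samples
    Y_hat = np.array([h(x) + sample_eps_h() for x in X_hat])   # predicted observations
    return X_hat, Y_hat
```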

Sample point of view

But how do we sample from the posterior?

  • Keep only samples whose predicted observations are close to \(y_k\).
    • This removes most of our samples.
  • Move the samples so their predicted observations are closer to \(y_k\).
    • How? Normalization.

Tangent: why do we want to normalize?

For a unit normal, the conditional distribution is identical for each conditioned value.

Image showing 2D unit normal heatmap

\(p(x, y)\): 2D unit normal
  • Look at slices of specific \(y\) values.
  • Each \(p(x, y)\) slice has the same mean, covariance, and shape.

Image showing 2D unit normal slices

\(p(x, y)\): 1D slices for fixed \(y\)
  • So after we normalize it using \(\color{#e7298a}{p(x | y) = p(x, y) / p(y)}\), we get identical distributions for any value of \(y\).

Tangent: why do we want to normalize?

For a unit normal, the conditional distribution is identical for each conditioned value.

Image showing 2D conditioned unit normal heatmap

\(p(x | y)\): 2D conditioned unit normal
  • Look at slices of specific \(y\) values.
  • Each \(p(x, y)\) slice has the same mean, covariance, and shape.

Image showing 2D conditioned unit normal slices

\(p(x | y)\): 1D slices for fixed \(y\)
  • So after we normalize it using \(\color{#e7298a}{p(x | y) = p(x, y) / p(y)}\), we get identical distributions for any value of \(y\).

Tangent: why do we want to normalize?

Moving joint samples to the posterior is simple in the normal case.

Image showing 2D conditioned normal heatmap with samples

Joint samples \(x, y\)

Image showing 2D conditioned normal heatmap with samples

Conditioned samples \(p(x | y=1.1)\)
  • Here, I easily transform joint samples of \(p(x, y)\) to samples of \(p(x | y = 1.1)\).

Tangent: why do we want to normalize?

This does not work for a correlated distribution.

Image showing 2D correlated heatmap

\(p(x, y)\): 2D correlated
  • Look at slices of specific \(y\) values.
  • Each \(p(x, y)\) slice is different.

Image showing 2D correlated slices

\(p(x, y)\): 1D slices for fixed \(y\)
  • So after we normalize it using \(\color{#e7298a}{p(x | y) = p(x, y) / p(y)}\), we don’t get identical distributions.

Tangent: why do we want to normalize?

This does not work for a correlated distribution.

Image showing 2D conditioned correlated heatmap

\(p(x | y)\): 2D conditioned correlated
  • Look at slices of specific \(y\) values.
  • Each \(p(x, y)\) slice is different.

Image showing 2D conditioned correlated slices

\(p(x | y)\): 1D slices for fixed \(y\)
  • So after we normalize it using \(\color{#e7298a}{p(x | y) = p(x, y) / p(y)}\), we don’t get identical distributions.

Conditional normalization

  • Conditional normalization removes the correlations between \(x\) and \(y\).
  • Conditional sampling is very easy in the de-correlated space.
  • Invertible normalization lets us convert conditioned samples back to the correlated space.

Conditional normalization

  1. Find an invertible transformation \(N_k\) such that \(z_k = N_k(x_k; y_k)\) is normal, with the same distribution for every \(y_k\).

    • Conditional normalizing function: \(N_k\)
    • Latent variable: \(z_k\)
  2. Sample \(\color{#d95f02}{x^{(i)}_k}\) from posterior \(\color{#d95f02}{p(x_k | y_{1:k})}\)

    • Compute latent space \(\hat z^{(i)}_k = N_k(\hat x^{(i)}_k; \hat y^{(i)}_k)\)
    • Convert back to real space using observed data: \(\color{#d95f02}{x^{(i)}_k} = N_k^{-1}(\hat z^{(i)}_k; y_k)\)

Then \(\color{#d95f02}{x^{(i)}_k}\) are samples of the posterior.
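
A minimal sketch of this recipe in the linear-Gaussian special case, where \(N_k\) and its inverse have closed forms (all names are ours; a trained conditional normalizing flow would replace the two returned functions):

```python
import numpy as np

def conditional_gaussian_normalizer(X, Y):
    """Build N_k and its inverse from joint samples X (n x dx) and Y (n x dy)."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    dX, dY = X - mu_x, Y - mu_y
    n = len(X)
    B_x, B_y, B_xy = dX.T @ dX / (n - 1), dY.T @ dY / (n - 1), dX.T @ dY / (n - 1)
    gain = B_xy @ np.linalg.inv(B_y)                # B_xy B_y^{-1}
    C = np.linalg.cholesky(B_x - gain @ B_xy.T)     # factor of the conditional covariance

    def normalize(x, y):
        return np.linalg.solve(C, x - mu_x - gain @ (y - mu_y))

    def unnormalize(z, y):
        return C @ z + mu_x + gain @ (y - mu_y)

    return normalize, unnormalize
```

Posterior samples are then `unnormalize(normalize(x_hat_i, y_hat_i), y_obs)` for each joint sample, exactly the two latent-space steps above.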



\(\implies\) Let’s see how this relates to classic techniques.

Classic technique (Kalman filter)

Assume the variables are Gaussian.

\[\begin{align} \begin{bmatrix}x \\ y \end{bmatrix}\sim \mathcal{N}\left(\begin{bmatrix}\mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix}B_x & B_{xy} \\ B_{yx} & B_y \end{bmatrix}\right) \end{align}\]

The conditional distribution \(p(x|y)\) is a known Gaussian.

\[\begin{align} p(x|y) = \mathcal{N}\left(\mu_x + B_{xy}B_y^{-1} (y - \mu_y), \,\, B_x - B_{xy}B_y^{-1}B_{yx}\right) \end{align}\]

Gaussians can be normalized with the Cholesky factor \(C\) of the conditional covariance \(B_x - B_{xy}B_y^{-1}B_{yx}\). \[\begin{align} z = C^{-1} \big( (x - \mu_x) + B_{xy}B_y^{-1} (\mu_y - y) \big) \end{align}\]

The Kalman filter uses a linear normalizer!
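
A one-line check: normalizing a joint sample \((\hat x, \hat y)\) and un-normalizing at the observed \(y\) cancels the Cholesky factor and recovers the familiar Gaussian conditional-mean update.

\[\begin{align} C\,C^{-1}\big((\hat x - \mu_x) + B_{xy}B_y^{-1}(\mu_y - \hat y)\big) + \mu_x + B_{xy}B_y^{-1}(y - \mu_y) = \hat x + B_{xy}B_y^{-1}(y - \hat y) \end{align}\]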

Major conclusion: Data assimilation via conditional normalizing flow is well-established.

Classic technique (Kalman filter)

Success: Efficient data assimilation that is correct for Gaussians.


Limitations:

  • Requires linear transition and observation operators
  • Assumes Gaussianity
  • Stores the full covariance matrix


Solution: Ensemble Kalman filter

  • Uses the nonlinear transition and observation operators directly
  • Still assumes Gaussianity in the update
  • Uses a low-rank (ensemble) approximation of the covariance matrix

Ensemble Kalman filter

  1. Initialize ensemble at time step \(k = 0\) with \(N\) members: \(\{\color{#d95f02}{x_0^{(1)}}, \ldots, \color{#d95f02}{x_0^{(N)}}\}\).
  2. Predict: Advance each ensemble member to the next time step.
    • \(\color{#1b9e77}{\hat x_k^{(i)}} = g(\color{#d95f02}{x_{k-1}^{(i)}}) + \epsilon_{g}\) for each sample \(i\)
    • Important! Each ensemble member uses a different realization of the unknown parameters (e.g., permeability).

Ensemble Kalman filter

  3. Update: assimilate observation \(y\) at time step \(k\)
    • \(\color{#050505}{\hat y_k^{(i)}} = h(\color{#1b9e77}{\hat x_k^{(i)}})\) for each sample \(i\), \(\quad \epsilon_h \sim \mathcal{N}(0, R)\)
    • Compute mean and deviations: \(\hat X = \{\color{#1b9e77}{\hat x_k^{(i)}} - \mu_x\} / \sqrt{N - 1}\), \(\hat Y = \{\color{#050505}{\hat y_k^{(i)}} - \mu_y\} / \sqrt{N - 1}\)
    • Linear update: \(\color{#d95f02}{x_k^{(i)}} = \color{#1b9e77}{\hat x_k^{(i)}} + \hat X \hat Y^T(\hat Y \hat Y^T + R)^{-1}(y - (\color{#050505}{\hat y_k^{(i)}} + \epsilon_h))\)
  • The linear update is identical to using a linear normalizer:
    • Sample covariances: \(B_x = \hat X\hat X^T\), \(B_y = \hat Y\hat Y^T\), \(B_{xy} = \hat X\hat Y^T\)
    • Conditionally normalize: \(z_k^{(i)} = C^{-1} (\color{#1b9e77}{\hat x_k^{(i)}} - \mu_x) + C^{-1} B_{xy}B_y^{-1} (\mu_y - \color{#050505}{\hat y_k^{(i)}})\)
    • Conditionally unnormalize: \(\color{#d95f02}{x_k^{(i)}} = C z_k^{(i)} + \mu_x + B_{xy} B_y^{-1} (y_k - \mu_y)\)
    • The Cholesky factor \(C\) cancels with its inverse, thus yielding the linear update formula.
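
Putting the predict and update steps together, a minimal EnKF sketch (ours; `g`, `h`, and `R` are placeholders for the fluid-flow simulator, the seismic observation operator, and the noise covariance, not the talk's actual interfaces):

```python
import numpy as np

def enkf_step(ensemble, y_obs, g, h, R, rng):
    """Advance (N, dx) posterior samples from step k-1 to posterior samples at step k."""
    N, dy = len(ensemble), len(R)
    # Predict: push each member through the nonlinear transition operator
    # (the model-error term eps_g is assumed to be drawn inside g here).
    X = np.array([g(x) for x in ensemble])
    # Predicted observations, plus perturbed copies with noise ~ N(0, R).
    Y = np.array([h(x) for x in X])
    Y_pert = Y + rng.multivariate_normal(np.zeros(dy), R, N)
    # Scaled deviations from the ensemble means (the X-hat, Y-hat of the slide).
    dX = (X - X.mean(axis=0)) / np.sqrt(N - 1)
    dY = (Y - Y.mean(axis=0)) / np.sqrt(N - 1)
    # Linear update with the ensemble Kalman gain X-hat Y-hat^T (Y-hat Y-hat^T + R)^{-1}.
    gain = dX.T @ dY @ np.linalg.inv(dY.T @ dY + R)
    return X + (y_obs - Y_pert) @ gain.T

# Usage: posterior = enkf_step(prior_ensemble, y_obs, g, h, R, np.random.default_rng(0))
```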

Our test system

  • Compass model (realistic synthetic model based on North Sea)

  • 3.2 km by 2.1 km, discretized on 256 by 341 grid, 12.5 m by 6.25 m cell size

  • 8 sources spaced across surface

  • 200 receivers spaced along ocean bottom

  • Shots generated with Born approximation (linear in impedance)

    • Linearized about smoothed version of the true model.
  • 8 dB noise added to shots

  • 1.96 Mtons CO2 injected per year

  • Heterogeneous anisotropic permeability

  • Seismic measurements every 400 days.

Our test system

GIF representing fluid flow

CO2 saturation (80 day intervals)

GIF representing seismic observation

RTM minus baseline (400 day intervals)

 

Simulated observation every 400 days.

Preliminary results

GIF ground-truth

Ground-truth

GIF ensemble Kalman filter

Ensemble Kalman filter

 

256 ensemble members each evolving with different permeability models.

Comparison with concurrent work

  • Ensemble Kalman filter

    • Makes Gaussian assumption (linear conditional normalizer)

  • Conditional normalizing flow

    • Makes no Gaussian or linear assumptions
    • Allows a nonlinear update
    • Requires training a neural network
  • Runtime for both is dominated by simulating fluid flow and seismic measurement.

Conclusion

  • We found a non-ML method for DA, the ensemble Kalman filter, that is mathematically equivalent to DA with a linear conditional normalizer (a special case of normalizing flows).
  • It allows marginalizing over unknown parameters such as permeability.
  • We are currently working to develop this method for seismic monitoring.


Future work

  • Explore where the ensemble Kalman filter can outperform conditional normalizing flow.
  • Estimate saturation, pressure, permeability, porosity, velocity, and density.
  • Use nonlinear modeling instead of Born modeling.
  • Use a worse background model for RTMs.
    • Use offset gathers (described in Francis's talk yesterday).

Time for questions and acknowledgements

  • Many thanks to Abhinav, Francis, and Rafael for help with simulating the plume and seismic measurements.