ML4Seismic Partners Meeting - 2024

sidebar

Date:

November 13-15, 2024

Venue:

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology.

Hotels:

Lodging options recommended by the institute (with Georgia Tech deals) can be found here.

More hotel choices can be found here.

Contact:

Ghassan AlRegib, co-director
alregib@gatech.edu
Felix Herrmann, co-director
felix.herrmann@gatech.edu

overview

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology. The meeting is scheduled for November 13-15, 2024.

Center for Machine Learning for Seismic (ML4Seismic)

A joint initiative at the Georgia Institute of Technology between the Omni Lab for Intelligent Visual Engineering and Science (OLIVES), led by Professor Ghassan AlRegib (ECE), and the Seismic Laboratory for Imaging and Modeling (SLIM), led by Professor Felix J. Herrmann (EAS, CSE, ECE), together with innovators in the energy sector and major cloud providers.

Georgia Tech’s Center for Machine Learning for Seismic (ML4Seismic) is designed to foster research partnerships aimed at driving innovations in artificial-intelligence-assisted seismic imaging, interpretation, analysis, and monitoring in the cloud.

Through training and collaboration, the ML4Seismic Industry Partners Program promotes novel approaches that balance new insights from machine learning with established approaches grounded in physics and geology. Areas of interest include, but are not limited to, low-environmental impact time-lapse acquisition, data-constrained image segmentation, classification, physics-constrained machine learning, and uncertainty quantification. These research areas are well aligned with Georgia Tech’s strengths in computational/data sciences and engineering.

participants

ML4Seismic Industry Partners

This page will be updated as partners register for this event…

program

Program 2024 ML4Seismic Partners Meeting

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology. The meeting is scheduled for November 13-15, 2024.

Wednesday November 13

Program for Wednesday November 13 of the ML4Seismic Partners Meeting
08:00—09:00 AM Everyone Breakfast (provided)
09:00—09:15 AM Felix J. Herrmann, Ghassan AlRegib Introduction
Theme: Fault & Leakage Detection and Segmentation Approaches
(chairs TBD)
09:15—09:40 AM Jorge Quesada Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources
09:40—10:05 AM Chen Zhou Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations
10:05—10:30 AM Shiqin Zeng (new student) Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data
10:30—10:55 AM Ghazal Kaviani ExpertMatch: Consistency-Based Learning for Seismic Fault Detection
10:55—11:10 AM Break
11:10—11:35 AM Prithwijit Chowdhury Aligning Model and Label Uncertainty for Seismic Fault Interpretation
11:35—12:00 PM Yavuz Yarici Explainable AI for Seismic Fault Detection: Assisting Experts with Probability Maps
12:00—12:25 PM Seulgi Kim Predicting Faults
12:25—12:40 PM Discussion
12:40—01:40 PM Lunch (provided)
Theme: Imaging & physics-based machine learning
(chairs TBA)
01:40—01:55 PM Jeongjin Park (new student) Physical Bayesian Inference for Two-Phase Flow Problems
01:55—02:20 PM Richard Rex Hierarchical Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models
02:20—02:45 PM Rafael Orozco The effect of patch-based training on large scale imaging problems
Theme: Generative and Ensemble Learning Techniques
(chairs TBA)
02:45—03:10 PM Chen Zhou Rethinking Generative Modeling for Seismic Interpretation
03:10—03:25 PM Break
03:25—03:50 PM Zoe Fowler Ensemble Learning and Model Aggregation Practices for Seismic Data
04:05—04:30 PM Huseyin Tuna Erdinc SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction
04:30—04:55 PM Mohammad Alotaibi Logs Insights for Salt Segmentation
04:55—05:10 PM Discussion
05:15—08:00 PM Industry-student mixer TBA

Thursday November 14

Program for Thursday November 14 of the ML4Seismic Partners Meeting
08:00—09:00 AM Everyone Breakfast (provided)
Theme: Digital Twins & Uncertainty Quantification
(chairs TBA)
09:00—09:25 AM Grant Bruer Seismic monitoring of CO2 plume dynamics using ensemble Kalman filtering
09:25—09:50 AM Felix J. Herrmann A Digital Shadow for Geological Carbon Storage
09:50—10:15 AM Abhinav Prakash Gahlot A Digital Twin for Geological Carbon Storage with Controlled Injectivity
10:15—10:40 AM Zijun Deng (new student) Probabilistic Joint Recovery Method for CO2 plume monitoring
10:40—10:55 AM Discussion
10:55—11:10 AM Break
Theme: Learning and Domain Generalization in Seismic Interpretation & Processing
(chairs TBA)
11:10—11:35 AM Kiran Kokilepersaud SSL in Seismic Requires Additional Volumetric Spread
11:35—12:00 PM Prithwijit Chowdhury Domain Generalized Semantic Segmentation in Seismic Facies Classification
12:00—12:15 PM Shiqin Zeng (new student) Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes
12:15—12:40 PM Mohammad Alotaibi Feature Transformation to Achieve Synthetic-to-Real Domain Adaptation
12:40—12:50 PM Discussion
12:50—01:50 PM Lunch (provided)
Theme: Wave-equation based inference and monitoring
(chairs TBA)
01:50—02:15 PM Yunlin Zeng (new student) Enhancing Full-Waveform Variational Inference through Stochastic Resampling
02:15—02:40 PM Rafael Orozco End-to-end demonstration of SAGE and WISE workflow with field data case studies
02:40—03:05 PM Richard Rex Inverse Design Framework for Optimizing Velocity Models in Seismic Imaging
03:05—03:30 PM Haoyun Li Assessing increased storage capacity due to CO2-dissolution
03:30—03:45 PM Ipsita Bhar (new student) Sensitivity of SH waves to Geological Carbon Storage
03:45—04:00 PM Discussion
06:00 PM Dinner TBA

Seismic monitoring of CO2 plume dynamics using ensemble Kalman filtering

Grant Bruer, Abhinav P. Gahlot, Edmond Chow, and Felix J. Herrmann, SLIM

Abstract. Monitoring CO2 injected and stored in subsurface reservoirs is critical for avoiding failure scenarios and enables real-time optimization of CO2 injection rates. Sequential Bayesian data assimilation (DA) is a statistical method for combining information over time from multiple sources to estimate a hidden state, such as the spread of the subsurface CO2 plume. An example of scalable and efficient sequential Bayesian DA is the ensemble Kalman filter (EnKF). We improve upon existing DA literature in the seismic-CO2 monitoring domain by applying this scalable DA algorithm to a high-dimensional CO2 reservoir using two-phase flow dynamics and time-lapse full waveform seismic data with a realistic surface-seismic survey design. We show more accurate estimates of the CO2 saturation field using the EnKF compared to using either the seismic data or the fluid physics alone. Furthermore, we test a range of values for the EnKF hyperparameters and give guidance on their selection for seismic CO2 reservoir monitoring.
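The analysis step of the perturbed-observation EnKF described above can be illustrated with a toy numpy sketch (not the authors' implementation; the scalar state, ensemble size, and observation operator are invented for the example):

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """Perturbed-observation EnKF analysis step.
    X: (n, N) forecast ensemble; y: (m,) observation;
    H: (m, n) linear observation operator; R: (m, m) obs-noise covariance."""
    n, N = X.shape
    Y = H @ X                                   # predicted observations
    Xp = X - X.mean(axis=1, keepdims=True)      # state anomalies
    Yp = Y - Y.mean(axis=1, keepdims=True)      # observation anomalies
    Cxy = Xp @ Yp.T / (N - 1)                   # cross-covariance
    Cyy = Yp @ Yp.T / (N - 1)                   # predicted-obs covariance
    K = Cxy @ np.linalg.inv(Cyy + R)            # Kalman gain
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (y[:, None] + noise - Y)     # updated ensemble

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1, 5000))        # scalar state, 5000 members
Xa = enkf_analysis(X, np.array([2.0]), np.eye(1), 0.5 * np.eye(1), rng)
```

With prior variance 1 and observation noise 0.5, the gain is roughly 2/3, so the analysis mean is pulled about two-thirds of the way toward the observation and the ensemble spread shrinks; the hyperparameters studied in the talk (ensemble size, inflation, localization) would enter around this update.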


Physical Bayesian Inference for Two-Phase Flow Problems

Jeongjin Park, Huseyin Tuna Erdinc, Haoyun Li, Richard Rex Arockiasamy, Nisha Chandramoorthy, and Felix J. Herrmann, SLIM

Abstract. Previous research on neural surrogate modeling of multiphase flow systems (Yin et al., 2023) has shown that even models with low generalization error in forward predictions can generate posterior estimates that are out of distribution and physically unrealistic. To address this, we propose a regularization method that leverages the Fisher Information Matrix (FIM) to guide the training process. By integrating the FIM into a differentiable optimization framework, we aim to improve the reliability of surrogate models, such as Fourier Neural Operators (FNO) (Li et al., 2020), for both forward predictions and posterior inference.

Our experiments on benchmark problems, including the Lorenz-63 system and Navier-Stokes equations, demonstrate that our approach significantly enhances physical consistency throughout time evolution, keeping predictions within the correct spatial distribution. Looking ahead, we plan to extend our framework to more complex applications, such as Geological Carbon Storage, with an emphasis on scaling FIM computations for high-dimensional problems.


SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction

Huseyin Tuna Erdinc, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. In this study, we present a novel approach for synthesizing diverse subsurface velocity models using diffusion-based generative models. Traditional methods often depend on large, high-quality datasets of 2D velocity models, which can be difficult to obtain in subsurface applications. In contrast, our method leverages incomplete well and imaged seismic data to generate high-fidelity velocity samples without requiring fully sampled training datasets. The results demonstrate that the generative model accurately captures long-range geological structures and aligns well with unseen “ground-truth” velocity models. Furthermore, it is shown that the diversity of generated velocity models can be increased through prior guidance in the training phase, and model uncertainties can be reduced with well conditioning during inference. Experiments conducted with multiple datasets (Compass model, Synthoseis, and North Sea data) and velocity models featuring various geological structures (e.g., faults, salt bodies) suggest that our approach facilitates realistic subsurface velocity synthesis, providing valuable inputs for full-waveform inversion and enhancing seismic-based subsurface modeling.


A Digital Twin for Geological Carbon Storage with Controlled Injectivity

Abhinav Prakash Gahlot, Haoyun Li, Ziyi Yin, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. We present an uncertainty-aware Digital Twin (DT) for Geologic Carbon Storage (GCS), capable of handling multimodal time-lapse data and controlling CO2 injectivity to mitigate reservoir fracturing risks and optimize operations. In GCS, DTs represent virtual replicas of subsurface systems that incorporate real-time data and advanced generative Artificial Intelligence (genAI) techniques, including neural posterior density estimation via simulation-based inference and sequential Bayesian inference. These methods enable the effective monitoring and control of CO2 storage projects, addressing challenges such as subsurface complexity, operational optimization, and risk mitigation. By integrating diverse monitoring data, e.g., geophysical well observations and imaged seismic, the DT can bridge the gaps between seemingly distinct fields like geophysics and reservoir engineering. In addition, recent advancements in genAI also furnish the DT with principled uncertainty quantification. Through recursive training and inference, the DT utilizes simulated current state samples, e.g., CO2 saturation, paired with corresponding geophysical field observations to train its neural networks and enable posterior sampling upon receiving new field data. However, the DT so far lacks decision-making and control capabilities, which are necessary for full DT functionality. This study aims to demonstrate how a DT can inform decision-making processes to prevent risks such as cap rock fracturing during CO2 storage operations.


Hierarchical Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models

Richard Rex, Srikanth Avasarala, Thomas Grady, and Felix J. Herrmann, SLIM

Abstract. Simulating two-phase flow via PDEs is computationally expensive due to the inversion of large, ill-conditioned matrices. To accelerate these computations, we reformulate Hierarchical Tucker Tensor (HTT) decompositions into Kronecker products, enabling scalable Fourier Neural Operators (FNOs) for CO2 saturation predictions in subsurface environments. This reformulation allows efficient scaling across multiple GPUs while maintaining a large number of modes. Building on our existing matrix-free abstraction library, we extend its capabilities to support distributed tensor operators. The extended library is auto-differentiable, with customized AD rules for training complex networks. We demonstrate the performance and scalability of our approach by evaluating FNO simulations against traditional PDE solvers for predicting time-varying CO2 saturations from permeability models in large-scale subsurface environments.
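The Kronecker reformulation mentioned above rests on the standard vec identity (A ⊗ B) vec(X) = vec(B X A^T), which lets a Kronecker-structured operator be applied without ever materializing it; a minimal numpy check (shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(5, 6))
X = rng.normal(size=(6, 4))

# Column-major (Fortran-order) vec, matching the usual Kronecker convention.
vec = lambda M: M.flatten(order="F")

dense   = np.kron(A, B) @ vec(X)   # materializes a 15x24 operator
matfree = vec(B @ X @ A.T)         # never forms the Kronecker product
assert np.allclose(dense, matfree)
```

The matrix-free form costs two small matrix products instead of one large one, which is what makes distributing the factors across GPUs tractable.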


Probabilistic Joint Recovery Method for CO2 plume monitoring

Zijun Deng, Rafael Orozco, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Accurately predicting fluid flow patterns in Geological Carbon Storage (GCS) is a challenging task, particularly due to uncertainties in CO2 plume dynamics and reservoir properties. While previous deterministic methods such as the Joint Recovery Method (JRM) have provided valuable insights, their effectiveness is limited as tools for decision-making since they do not communicate uncertainty. To address this, we propose a Probabilistic Joint Recovery Method (PJRM) that computes the posterior distribution at each monitoring survey while leveraging the shared structure among surveys through a common generative model. By efficiently computing posterior distributions for each monitoring survey, this method aims to provide valuable uncertainty information to decision-makers in GCS projects, augmenting their workflow with principled risk minimization.


Assessing increased storage capacity due to CO2-dissolution

Haoyun Li, Abhinav Prakash Gahlot, and Felix J. Herrmann, SLIM

Abstract. During this talk, we discuss a reservoir simulation study to assess increased storage capacity due to the dissolution of CO2 into brine. We will also investigate to what extent changes in the density of the brine can be detected seismically using SH-waves.


Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data

Shiqin Zeng, Huseyin Tuna Erdinc, Ziyi (Francis) Yin, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Ensuring CO2 non-leakage is a critical aspect of Geological Carbon Storage (GCS). While previous approaches that develop deep neural networks demonstrate promising automatic leakage detection and potential cost reduction in dataset collection from time-lapse seismic images, they face challenges, such as a limited ability to reduce false alarms in CO2 leakage instances and a lack of uncertainty analysis in detection results. This work introduces a framework aimed at enhancing the deep neural network model’s ability to detect GCS leakage risk through a multi-criteria decision-making (MCDM)-based ensemble algorithm. The proposed method improves the detection of leakage cases while accurately distinguishing them from non-leakage instances. Furthermore, the proposed uncertainty analysis method, utilizing the Monte Carlo (MC) dropout technique, efficiently identifies misclassified non-leakage cases and categorizes them as undetermined for further investigation. This comprehensive approach enhances both the reliability and performance of the model in detecting GCS leakage risks.
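The MC-dropout idea in the abstract can be sketched with an invented one-layer toy classifier (the actual model and thresholds are not from the talk): dropout stays active at test time, repeated stochastic forward passes yield a predictive mean, and their spread flags cases as undetermined:

```python
import numpy as np

def mc_dropout_predict(x, w, n_samples=500, p_drop=0.5, seed=0):
    """Toy MC dropout: keep dropout active at inference, average the
    sigmoid outputs, and use their spread as an uncertainty score."""
    rng = np.random.default_rng(seed)
    probs = np.empty(n_samples)
    for i in range(n_samples):
        mask = rng.random(w.shape) > p_drop        # drop weights stochastically
        logit = x @ (w * mask) / (1 - p_drop)      # inverted-dropout scaling
        probs[i] = 1.0 / (1.0 + np.exp(-logit))
    return probs.mean(), probs.std()

mean, spread = mc_dropout_predict(np.array([1.0, -2.0]), np.array([0.8, 0.3]))
verdict = "undetermined" if spread > 0.15 else ("leak" if mean > 0.5 else "no-leak")
```

Samples whose predictive spread exceeds the (here arbitrary) threshold are routed to human investigation rather than classified, which is the mechanism the abstract describes for reducing false alarms.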


Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes

Shiqin Zeng, Rafael Orozco, Huseyin Tuna Erdinc, and Felix J. Herrmann, SLIM

Abstract. Seismic denoising is essential for enhancing the clarity, accuracy, and reliability of seismic images. Traditional seismic denoising methods, while effective for specific types of noise, often rely on well-established mathematical techniques that can be time-consuming, require manual tuning, and struggle with more complex noise patterns. Leveraging the 500 paired synthetic seismic datasets provided by the Think Onward community, we incorporate a 3D U-Net deep learning model with residual blocks and spatial attention to capture both local and global features for the seismic denoising task. During training, we apply the Laplacian operator to preserve edge details, followed by the Structural Similarity Index Measure (SSIM) loss to fine-tune the model, effectively removing concurrent noise and recovering the original seismic information. The resulting individual model achieves an SSIM of 0.99 compared to the ground-truth imaged seismic data. Additionally, we implement Langevin dynamics and Equivariant Bootstrapping techniques to estimate uncertainty during the training and inference phases, further improving the robustness of the denoising process.
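The Laplacian edge term and SSIM objective can be sketched as follows (a single-window SSIM and a hand-rolled 2-D convolution, purely illustrative; the talk's model is a 3D U-Net with windowed SSIM):

```python
import numpy as np

LAP = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def laplacian(img):
    """Discrete Laplacian via 'valid' 2-D convolution (highlights edges)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += LAP[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """SSIM computed over the whole image (no sliding window), for illustration."""
    ma, mb = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - ma) * (b - mb)).mean()
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / \
           ((ma ** 2 + mb ** 2 + c1) * (va + vb + c2))

def denoise_loss(pred, target, alpha=0.5):
    """Edge-preserving L1 on Laplacians plus an SSIM term."""
    edge = np.abs(laplacian(pred) - laplacian(target)).mean()
    return alpha * edge + (1 - alpha) * (1 - global_ssim(pred, target))
```

The Laplacian term penalizes smearing of sharp reflectors while the SSIM term rewards structural agreement; the relative weight `alpha` is an invented knob for the sketch.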


Sensitivity of SH waves to Geological Carbon Storage

Ipsita Bhar, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. With the growing focus on Geological Carbon Storage (GCS) activities, the significance of SH-wave monitoring of CO2 plumes is becoming increasingly important because of their potential increased sensitivity to density changes. In addition, shear waves can be instrumental in detecting CO2 leakage, evaluating GCS-induced seismicity, and identifying caprock failure. According to Biot-Gassmann rock physics models, changes in seismic velocity are expected during CO2 injection. This work focuses specifically on horizontally polarized (SH) waves, which are more sensitive to density variations than P-waves. While P-wave velocity tends to decrease with supercritical CO2 injection, the investigation of its effects on the density presents a particularly compelling area of study. Therefore, we simulated the wave equation incorporating both P-wave and SH waves to better understand their impacts on imaging and CO2 injection processes.


Enhancing Full-Waveform Variational Inference through Stochastic Resampling

Yunlin Zeng, Ziyi (Francis) Yin, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. Recent developments in simulation-based inference, like the full-waveform variational inference via subsurface extensions (WISE), enable rapid online estimation of subsurface velocities by leveraging pre-trained models. To achieve this, WISE employs subsurface-offset common-image gathers to convert shot data into physics-informed summary statistics. While common-image gathers effectively retain critical information even when initial velocity estimates are inaccurate, WISE relied on a single 1D initial migration-velocity model. To improve inference and generalizability, we develop a stochastic resampling method to generate diverse migration-velocity models. This technique allows us to enhance the posterior sample quality while reducing dependency on the migration-velocity model.


End-to-end demonstration of SAGE and WISE workflow with field data case studies

Rafael Orozco, Huseyin Tuna Erdinc, Thales Souza, Yunlin Zeng, Ziyi (Francis) Yin, and Felix J. Herrmann, SLIM

Abstract. The ill-posed nature of geophysical inverse problems necessitates the inclusion of prior information. While synthetic datasets have shown promise by serving as training examples with prior information, we are primarily interested in matching the true Earth prior as closely as possible. To achieve this, we have developed the SAGE workflow, which derives training samples that, in principle, require only observed data, such as migrated seismic images and borehole well data. In this talk, we demonstrate the full SAGE-WISE workflow, where we first use SAGE to create training pairs and then feed them into the WISE network for generalized uncertainty-aware inference. We begin by presenting synthetic case studies using the Synthoseis and SEAM datasets, where we validate that the posterior samples match the observed seismic waveforms. In cases of misfits, we apply non-amortized WISER updates to the diffusion models. Finally, we present two field data case studies: one from a shallow water survey and another involving complex salt structures from the North Sea. We discuss the impact of prior information on these field datasets and validate the derived velocity models using borehole well data.


The effect of patch-based training on large scale imaging problems

Rafael Orozco, Tristan van Leeuwen, and Felix J. Herrmann, SLIM

Abstract. Uncertainty quantification is crucial for risk-averse imaging applications, where Bayesian methods excel by naturally representing uncertainty through posterior variance. However, scaling these methods to large problems poses challenges due to the curse of dimensionality. In practice, Bayesian methods are often trained on small patches of the input data to avoid GPU memory limitations, but this approach has its pitfalls. In this talk, we discuss these pitfalls of patch-based training and demonstrate the practical use of invertible networks to efficiently train amortized Bayesian methods for large-scale 2D and 3D imaging problems. By leveraging memory-efficient implementations of normalizing flows and diffusion models, we achieve two key objectives: (1) perform high-dimensional inference on large-scale 2D and 3D inverse problems, and (2) expose the limitations of patch-based training in generative models. Through a stylized example, we show that patch-based training fails to capture the full posterior statistics, as revealed by comparing the estimated posterior covariance matrix against the analytical posterior covariance. Finally, we apply our framework to field datasets in computed tomography with grid sizes of 1024x1024, and seismic imaging with grid sizes of 1024x6114, demonstrating the scalability of our approach to practical applications in large-scale inverse problems.


Inverse Design Framework for Optimizing Velocity Models in Seismic Imaging

Richard Rex, Ziyi Yin, and Felix J. Herrmann, SLIM

Abstract. Common-Image Gathers (CIGs) play a crucial role in Migration-Velocity Analysis (MVA), yet traditional methods are hindered by their computational cost. We propose a novel inverse design framework using Fourier Neural Operators (FNOs) to optimize velocity models for seismic imaging. By training FNOs on a range of velocity models and their corresponding CIGs, our surrogate model can rapidly predict desirable subsurface imaging attributes, significantly reducing computational demands. This approach is augmented with jointly training a Conditional Normalizing Flow (CNF) that refines the starting velocity models, enabling sharper velocity realizations. Our method achieves a 3200× speedup over traditional approaches and provides fast uncertainty quantification.

A Digital Shadow for Geological Carbon Storage

Felix J. Herrmann

Abstract. During this talk, the latest developments will be shared on a Digital Twin for Geological Carbon Storage that includes principled uncertainty quantification. It is also shown that the proposed approach can be seen as a nonlinear and non-Gaussian extension of the ensemble Kalman filter. Instead of relying on forming approximations to the covariance from the ensemble for the predicted states and corresponding observations, the ensemble is used to train conditional neural networks. These networks are trained to carry out the Kalman corrections via the latent space of conditional Normalizing Flows.

SSL in Seismic Requires Additional Volumetric Spread

Kiran Kokilepersaud, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Self-supervised learning (SSL) approaches are seeing increased popularity within annotation-scarce domains due to their focus on training without explicit access to labeled data. For this reason, these approaches have received widespread attention within the seismic community, as obtaining quality labeled data is challenging within this application domain. However, self-supervised algorithms were developed and tested on large natural image datasets. Consequently, it is unclear whether conventional self-supervised approaches are appropriately formulated for the unique challenges of the seismic domain. Specifically, traditional self-supervised approaches 1) lack the capability to assess what features a quality seismic representation space should possess and 2) do not specify how to integrate these optimal features appropriately. In this work, we show a quality self-supervised seismic representation space is one that is more distributed across the overall representation space. We then propose a novel volumetric-based loss function that explicitly induces additional spread within the representation space. We show visually and numerically that the resultant model is better able to rectify fine-grained structures within a seismic segmentation task.
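The notion of inducing additional spread can be illustrated with a uniformity-style regularizer (in the spirit of Wang & Isola's uniformity loss, not the paper's exact volumetric formulation) that is lower when normalized embeddings are dispersed over the hypersphere:

```python
import numpy as np

def spread_loss(z, t=2.0):
    """Lower when embeddings are more spread out: log of the mean
    Gaussian-kernel similarity over all distinct pairs of unit rows."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # project to sphere
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    iu = np.triu_indices(len(z), k=1)                    # distinct pairs only
    return np.log(np.exp(-t * d2[iu]).mean())

rng = np.random.default_rng(0)
clustered = 1.0 + 0.01 * rng.normal(size=(32, 16))   # collapsed embeddings
dispersed = rng.normal(size=(32, 16))                # spread embeddings
```

Adding such a term to a contrastive objective penalizes representation collapse, which is the failure mode the abstract argues natural-image SSL recipes do not guard against in seismic volumes.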


Ensemble Learning and Model Aggregation Practices for Seismic Data

Zoe Fowler, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Seismic surveys are crucial for oil and gas exploration, producing large amounts of data that require labeling. Due to this labeling burden, deep learning is considered an alternative to manual labeling. However, when faced with complex data, such as seismic images, the risk of deep learning models overfitting increases. Instead, alternative methods of training machine learning models on seismic data have been considered. Specifically, ensemble learning combines several models to build an overall model, where each smaller model is typically trained on a different data split. However, prior work fails to account for the context of seismic images, which can vary drastically based on their location within a volume. In this work, we incorporate seismic volume information within an ensemble learning framework and explore different model aggregation strategies to form the overall model.


Aligning Model and Label Uncertainty for Seismic Fault Interpretation

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. The annotation of seismic datasets is inherently reliant on expert interpretative judgment by geoscientists. Disagreements among experts frequently arise due to the subjective nature of interpreting complex subsurface features from seismic data. The existence of uncertainty in these interpretations is important because it reflects the inherent limitations of seismic data resolution and the complexity of geological structures. If we want ML models to be trusted in an already uncertain situation, we want their behavior to explicitly display the decision making patterns of their counterpart users. One way this “trust” can be achieved is to include their decisions in the disagreement conversation. Is the model uncertain about the same faults as the user? Does the model look at the same features as the user? Is it closer in attention and performance to an expert or novice? We propose to draw these conclusions by comparing the aleatoric uncertainty with the label disagreement or human uncertainty in the CRACKS dataset.


Domain Generalized Semantic Segmentation in Seismic Facies Classification

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Domain Generalized Semantic Segmentation (DGSS) has been explored in fields like scene understanding, where models are designed to generalize across different environments and datasets. Seismic facies segmentation presents unique challenges, such as variability in geological structures, acquisition methods, and a scarcity of labeled data. Despite the critical importance of generalization in this field, DGSS remains underexplored. In this work, we introduce domain generalization to seismic facies classification by adapting techniques proven successful in scene understanding and augmenting them with domain-specific strategies. In addition to utilizing domain-invariant feature learning, we incorporate spectral decomposition and texture learning to further enhance model performance. Spectral decomposition enables us to capture frequency-based features critical to seismic data, while texture learning helps models identify fine-grained patterns that improve segmentation accuracy. We train a set of Deep Neural Networks and Vision Foundation Models on synthetic data and test them on real-world seismic datasets, assessing how well they generalize to new, unseen domains.


Explainable AI for Seismic Fault Detection: Assisting Experts with Probability Maps

Yavuz Yarici, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Seismic fault detection is a critical task in geophysical exploration, often requiring extensive manual labeling by experts. This process can be labor-intensive and subjective due to the complex nature of seismic data. Various seismic methods exist to detect and segment faults in seismic images; however, both human labeling and machine learning model predictions can result in mislabels. In this work, we present a framework to assist expert annotators by leveraging explainability maps generated by a deep learning model for seismic fault detection. These explainability maps show which pixels or regions in the seismic image contribute the most to the model’s fault prediction, indicating the likelihood of fault occurrences at different locations in the seismic data. By visualizing the model’s decision-making process, the maps provide valuable insights to expert geoscientists, assisting them in refining their labeling tasks and potentially reducing human error and bias.


ExpertMatch: Consistency-Based Learning for Seismic Fault Detection

Ghazal Kaviani, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Instead of relying on a small set of expert-labeled data, our approach leverages annotations from multiple levels of expertise (practitioners and novices), who are more readily available but provide labels of varying quality. By enforcing consistency between the predictions made on novice, practitioner, and expert annotations, our model aligns predictions across different label sources while giving more weight to expert-labeled data. This method allows the model to benefit from both the extensive novice- and practitioner-labeled data and the high-quality expert labels, leading to a more robust fault detection system. Our approach minimizes the reliance on costly expert annotations by incorporating lower-quality labels in a structured, confidence-weighted framework. We demonstrate this approach on the CRACKS dataset.
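The confidence-weighted idea can be sketched as follows (invented weights and a plain cross-entropy; the talk's consistency formulation may differ): each label source contributes to the loss in proportion to its expertise weight:

```python
import numpy as np

def expertise_weighted_loss(pred, labels, weights):
    """pred: (n,) fault probabilities; labels: {source: (n,) 0/1 labels};
    weights: {source: scalar}, with expert > practitioner > novice."""
    total = sum(weights.values())
    loss = 0.0
    for src, y in labels.items():
        bce = -(y * np.log(pred + 1e-9) + (1 - y) * np.log(1 - pred + 1e-9))
        loss += (weights[src] / total) * bce.mean()   # expertise-weighted BCE
    return loss

labels = {"expert": np.array([1.0, 0.0]), "novice": np.array([0.0, 1.0])}
weights = {"expert": 0.8, "novice": 0.2}              # invented for the sketch
agrees_expert = expertise_weighted_loss(np.array([0.9, 0.1]), labels, weights)
agrees_novice = expertise_weighted_loss(np.array([0.1, 0.9]), labels, weights)
```

When the sources disagree, predictions that side with the expert incur the smaller loss, which is the alignment behavior the abstract describes.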


Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources

Jorge Quesada, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Segmenting faults is of paramount importance in the seismic interpretation pipeline, albeit one that involves costly and labor-intensive expert annotation. Alternatives to expertly labeled data often rely on synthetic data or weakly labeled data. In this work, we present the CRACKS dataset, a comprehensive fault segmentation dataset spanning labels across multiple levels of expertise and confidence. We benchmark the effectiveness of this dataset by evaluating different machine learning strategies that exploit its multifaceted structure, and compare the results with those achieved using either synthetic or weak label sources.


Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations

Chen Zhou, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we discuss the potential of leveraging labels from lay annotators to enhance seismic fault interpretation while reducing the need for expert labeling. Interpretations exhibit disagreement within and between different levels of expertise, e.g., an expert geophysicist and less experienced practitioners. Conventionally, this disagreement is viewed as disadvantageous for machine learning models that rely on gold standard labels for training. We show that leveraging practitioner-labeled faults in the seismic sections that exhibit less expertise-based disagreement can reduce the need for expert labeling. Thus, it is important to characterize expertise-based disagreements. We develop a framework that first identifies a small number of seismic sections that entail the highest degree of expertise-based disagreement for expert labeling. The framework then uses practitioner annotations on the large amount of remaining data to augment the training set. We show that 1) augmenting with a large number of faults labeled by lay annotators achieves better fault interpretation than using only a small number of expert labels, and 2) this augmentation is more effective than using synthetic data for pre-training.
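The selection step, routing the most-disagreed-upon sections to experts, can be sketched with synthetic masks (shapes and data invented for illustration):

```python
import numpy as np

def rank_by_disagreement(expert_masks, practitioner_masks):
    """Rank seismic sections by mean per-pixel disagreement between
    expert and practitioner fault masks; most-disagreed sections first."""
    disagreement = np.abs(expert_masks - practitioner_masks).mean(axis=(1, 2))
    return np.argsort(disagreement)[::-1]

# Three toy 4x4 sections: section 1 has total disagreement, section 2 one pixel.
expert = np.zeros((3, 4, 4))
practitioner = np.zeros((3, 4, 4))
practitioner[1] = 1.0
practitioner[2, 0, 0] = 1.0
order = rank_by_disagreement(expert, practitioner)
```

The top-ranked sections go to experts for labeling, while practitioner annotations on the low-disagreement remainder augment the training set, as the abstract describes.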


Rethinking Generative Modeling for Seismic Interpretation

Chen Zhou, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we discuss a complementary view of generative modeling for seismic interpretation. Generative modeling has been exploited for seismic data processing and interpretation purposes. Numerous generative techniques are applied to generate seismic data for augmentation, reconstruct or denoise seismic data for imaging, and infer subsurface properties for inversion. We provide an alternative perspective on generative modeling: generating diverse sets of labels for enhancing interpretation. Generative models can capture the joint distribution of data and labels. Interpretation labels disagree depending on expertise and task granularity. Thus, we frame generative modeling as capturing the dependency between seismic data, expertise, and task granularity. Based on this view, we provide three use cases of generative modeling: 1) assessing the reliability of interpreters, 2) producing a variety of interpretations for different tasks, and 3) synthesizing data.


Predicting Faults

Seulgi Kim, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Fault segmentation in seismic datasets is essential for geological analysis, earthquake prediction, and subsurface fluid flow estimation. Among the three dimensions of seismic data, fault segmentation along the depth axis plays a crucial role in identifying fault structures in each section. However, the labels used for these tasks are often inaccurate or incomplete, negatively affecting the performance of fault segmentation models that require high precision. Additionally, labeling tasks rely heavily on experts, making the process labor-intensive and time-consuming. Non-expert annotations or automatic labeling processes can also introduce incomplete or incorrect labels, further degrading segmentation performance.

To address this issue, we propose to apply semantic segmentation forecasting to seismic datasets to predict continuous fault segmentation along the depth axis. By leveraging the sequential nature of both the temporal and depth axes, we utilize the F2F (feature-to-feature) module to forecast the next depth section's fault patterns. These predictions can be used as pseudo labels, enabling more accurate and efficient labeling. First, we treat the depth axis of seismic datasets similarly to the temporal axis, because the fault patterns in previous depth sections are strongly correlated with those in the next sections. Given this continuity, temporal sequence forecasting models can be directly applied to depth-axis prediction. Second, we aim to predict the fault patterns in the next depth slice from the fault segmentation patterns in the previous slice by applying semantic segmentation forecasting to seismic datasets. Finally, we use the predicted segmentation results for future depth slices as pseudo labels. These pseudo labels can supplement existing incomplete labels to enhance the model's performance and improve the overall fault segmentation accuracy.
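The forecast-then-pseudo-label idea above can be sketched in a few lines. This is a deliberately crude stand-in: a learned F2F network would replace the linear extrapolation below, and a segmentation decoder would replace the threshold.

```python
import numpy as np

def f2f_forecast(features):
    """Toy feature-to-feature forecast: extrapolate the feature map of the
    next depth slice linearly from the two most recent slices.
    features : array of shape (depth, H, W), ordered along the depth axis."""
    f_prev, f_curr = features[-2], features[-1]
    return f_curr + (f_curr - f_prev)

def pseudo_label(features, threshold=0.5):
    """Decode the forecast feature map into a binary fault pseudo label,
    which can then supplement incomplete labels for the next slice."""
    return (f2f_forecast(features) > threshold).astype(np.uint8)
```

The point of the sketch is the interface, not the model: depth slices are consumed exactly like time steps in a video-forecasting pipeline.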


Logs Insights for Salt Segmentation

Mohammad Alotaibi, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. The workflow of subsurface structure characterization involves going back and forth between seismic data and well-log data. The seismic data provide large-scale and more spatial information, while well-log data are more localized and have a higher sampling rate. Both can contribute to structure characterization in different ways. Here we try to mimic this workflow by combining image features of the seismic data with the log features. Our results show that this integration enhances the accuracy of salt segmentation when applied to the SEAM dataset.
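One simple way to combine the two modalities is to broadcast the sparse well-log features onto their trace positions and concatenate them with the dense seismic features. The function below is a hypothetical fusion sketch, not the talk's architecture; shapes and the channel-concatenation choice are assumptions.

```python
import numpy as np

def fuse_features(seismic_feat, log_feat, log_positions, width):
    """Fuse dense seismic image features with sparse well-log features.

    seismic_feat : (C_s, H, W) features from the seismic image
    log_feat     : (n_wells, C_l, H) per-depth features from each well log
    log_positions: trace (column) index of each well
    Returns a (C_s + C_l, H, W) feature volume for a segmentation head."""
    n_wells, c_l, h = log_feat.shape
    log_plane = np.zeros((c_l, h, width))
    for feat, pos in zip(log_feat, log_positions):
        log_plane[:, :, pos] = feat      # place log features at the well trace
    return np.concatenate([seismic_feat, log_plane], axis=0)
```

A learned interpolation or attention mechanism could spread the log information laterally; zero-filling away from the wells is the simplest possible choice.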


Feature Transformation to Achieve Synthetic-to-Real Domain Adaptation

Mohammad Alotaibi, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. There exists a shared feature space into which the feature space of the source domain (synthetic data) and the feature space of the target domain (real seismic data) can both be transformed. By achieving this transformation, our model, trained on synthetic data with very few real samples, can generalize to real data. We will show some initial results.
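A minimal moment-matching sketch illustrates the shared-space idea: each domain is standardized to zero mean and unit variance per feature, so both land in the same statistical coordinates. This is a crude stand-in for the learned transformation in the talk; the feature dimensions and sample data are invented for illustration.

```python
import numpy as np

def to_shared_space(feats):
    """Map one domain's (n_samples, n_features) features into a shared space
    by per-feature standardization (zero mean, unit variance)."""
    mu = feats.mean(axis=0)
    sigma = feats.std(axis=0) + 1e-8       # guard against zero variance
    return (feats - mu) / sigma

# Differently scaled/shifted stand-ins for synthetic and real features.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(100, 8))   # synthetic domain
target = rng.normal(5.0, 3.0, size=(100, 8))   # real domain
shared_src = to_shared_space(source)
shared_tgt = to_shared_space(target)
# After mapping, the two domains share first and second moments, so a model
# trained on shared_src sees target features on the same scale.
```

Richer alignments (e.g., covariance matching or adversarial feature alignment) follow the same pattern: transform both domains, then train on the transformed source.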