ML4Seismic Partners Meeting - 2024 - Program

program

Program 2024 ML4Seismic Partners Meeting

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology. The meeting is scheduled for November 13-15, 2024. On November 13-14, meetings will be held in Room 114 of the CODA Building. On November 15, two tutorial sessions will run in parallel in Rooms 114 and 230 of the CODA Building. The address of CODA is 756 W Peachtree St NW Atlanta, GA 30308.

The 2024 ML4Seismic Partners Meeting can also be joined remotely via this Teams meeting link or http://bit.ly/4eo3eGE.

Wednesday November 13

Program for Wednesday November 13 of the ML4Seismic Partners Meeting
Time	Presenter(s)	Topic
08:00—09:00 AM	Everyone	Breakfast (provided)
09:00—09:15 AM	Felix J. Herrmann, Ghassan AlRegib	Introduction
		Theme: Fault & Leakage Detection and Segmentation Approaches (chairs Prithwijit Chowdhury)
09:15—09:40 AM	Jorge Quesada	Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources
09:40—10:05 AM	Chen Zhou	Expertise-based Label Fusion for Seismic Fault Delineation
10:05—10:30 AM	Shiqin Zeng (new student)	Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data
10:30—10:55 AM	Prithwijit Chowdhury	Leveraging Uncertainty and Disagreement for Enhanced Annotation in Seismic Interpretation
10:55—11:10 AM		Break
11:10—11:35 AM	Chen Zhou	Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations
11:35—12:00 PM	Kiran Kokilepersaud	SSL in Seismic Requires Additional Volumetric Spread
12:00—12:25 PM	Yavuz Yarici (new student)	Assisting Experts with Probability Maps for Seismic Fault Detection
12:25—12:40 PM		Discussion
12:40—01:40 PM		Lunch (provided)
		Theme: Imaging & physics-based machine learning (Chair Yunlin Zeng)
01:40—01:55 PM	Ipsita Bhar (new student)	Sensitivity of SH waves to Geological Carbon Storage
01:55—02:20 PM	Richard Rex	Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models
02:20—02:45 PM	Rafael Orozco	Machine-learning enabled velocity-model building with uncertainty quantification
		Theme: Generative and Ensemble Learning Techniques (Chairs Kiran Kokilepersaud)
02:45—03:10 PM	Ghazal Kaviani (new student)	Learning from Multiview Multimodal Sparse Data
03:10—03:25 PM		Break
03:25—03:50 PM	Seulgi Kim (new student)	Hierarchical Information Encoding for Action Anticipation using Multi-level label
04:05—04:30 PM	Huseyin Tuna Erdinc	SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction
04:30—04:55 PM	Zoe Fowler (new student)	Tackling Generalization and Personalization in Federated Learning
04:55—05:10 PM		Discussion
05:15—08:00 PM	Industry-student mixer	Boho Taco

Thursday November 14

Program for Thursday November 14 of the ML4Seismic Partners Meeting
Time	Presenter(s)	Topic
08:00—09:00 AM	Everyone	Breakfast (provided)
		Theme: Digital Twins & Uncertainty Quantification (chair Grant Bruer)
09:00—09:25 AM	Grant Bruer	Seismic monitoring of CO₂ plume dynamics using ensemble Kalman filtering
09:25—09:50 AM	Felix J. Herrmann	A Digital Twin for Geological Carbon Storage with Controlled Injectivity and optimized well placement
09:50—10:15 AM	Abhinav Prakash Gahlot	Optimizing CO₂ Storage Monitoring with Enhanced Rock Physics Modeling
10:15—10:40 AM	Zijun Deng (new student)	Probabilistic Joint Recovery Method for CO₂ plume monitoring
10:40—10:55 AM		Discussion
10:55—11:10 AM		Break
		Learning and Domain Generalization in Seismic Interpretation & Processing (chairs Jorge Quesada)
11:10—11:35 AM	Jorge Quesada	[Indicative features of prompting performance in non-natural domains]
11:35—12:00 PM	Prithwijit Chowdhury	Optimizing Prompting for Foundation Models in Seismic Image Segmentation
12:00—12:15 PM	Shiqin Zeng (new student)	Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes
12:15—12:40 PM	Mohammad Alotaibi	Redefining Prompting for Seismic
12:40—12:50 PM		Discussion
12:50—01:50 PM		Lunch (provided)
		Wave-equation based inference and monitoring (chair Rafael Orozco)
01:50—02:15 PM	Yunlin Zeng (new student)	Enhancing Full-Waveform Variational Inference through Stochastic Resampling
02:15—02:40 PM	Richard Rex	FNO-charged ASPIRE
02:40—03:05 PM	Jeongjin Park (new student)	DeFINO: Derivative-Based Fisher Score-Informed Neural Operator
03:05—03:30 PM	Haoyun Li	Reconstructing Permeability and Saturation in Reservoir Simulation Using Diffusion PDE Models
03:30—04:00 PM		Discussion
06:00 PM		El Valle

Friday November 15

08:00–09:00 AM

Everyone

Breakfast (provided)

Session 1

Time	Presenter(s)	Topic
09:00–09:45 AM	Abhinav Prakash Gahlot	Introduction to JUDI
09:45–10:30 AM	Rafael Orozco	Wave-equation Based Inference
10:30–11:15 AM	Huseyin Tuna Erdinc, Thales Souza	Generative Modelling with SAGE & North Sea Data Repository Dataset Curation
11:15–12:00 PM	Haoyun Li	Introduction to Jutul
12:00–12:45 PM	Grant Bruer	Digital Twin

Session 2

Time	Presenter(s)	Topic
09:00–09:30 AM	Mohit Prabhushankar	An overview of the 11 tutorials delivered at CVPR, BigData, AAAI, WACV on the topic “Robust Neural Networks: Towards Explainability, Uncertainty, and Intervenability”
09:30–10:15 AM	Quesada Pacora, Jorge Gerardo	Fault Label Annotation and Disagreement Visualization
10:15–11:00 AM	Prithwijit Chowdhury, Mohammad Alotaibi	SAM for Facies Segmentation
11:00–11:45 PM	Chen Zhou	Probabilistic Modelling of Seismic Interpretation
11:45–12:30 PM	Ghassan AlRegib, Mohit Prabhushankar, Greg Krudysz	Georgia Tech’s AI Makerspace

01:00 PM

Everyone

Lunch (provided)

Seismic monitoring of CO₂ plume dynamics using ensemble Kalman filtering

Grant Bruer Abhinav P. Gahlot, Edmond Chow, and Felix J. Herrmann, SLIM

Abstract. Monitoring CO₂ injected and stored in subsurface reservoirs is critical for avoiding failure scenarios and enables real-time optimization of CO₂ injection rates. Sequential Bayesian data assimilation (DA) is a statistical method for combining information over time from multiple sources to estimate a hidden state, such as the spread of the subsurface CO₂ plume. An example of scalable and efficient sequential Bayesian DA is the ensemble Kalman filter (EnKF). We improve upon existing DA literature in the seismic-CO₂ monitoring domain by applying this scalable DA algorithm to a high-dimensional CO₂ reservoir using two-phase flow dynamics and time-lapse full waveform seismic data with a realistic surface-seismic survey design. We show more accurate estimates of the CO₂ saturation field using the EnKF compared to using either the seismic data or the fluid physics alone. Furthermore, we test a range of values for the EnKF hyperparameters and give guidance on their selection for seismic CO₂ reservoir monitoring.

DeFINO: Derivative-Based Fisher Score-Informed Neural Operator

Jeongjin Park, Huseyin Tuna Erdinc, Haoyun Li, Richard Rex Arockiasamy, Nisha Chandramoorthy, and Felix J. Herrmann, SLIM

Abstract. *Previous research on neural surrogate modeling of multiphase flow systems (Yin et al., 2023) has shown that even models with low generalization error in forward predictions can generate posterior estimates that are out of distribution and physically unrealistic. To address this, we propose a regularization method that leverages the Fisher Information Matrix (FIM) to guide the training process. By integrating the FIM into a differentiable optimization framework, we aim to improve the reliability of surrogate models, such as Fourier Neural Operators (FNO) (Li et al., 2020), for both forward predictions and posterior inference.

Our experiments on benchmark problems, including the Lorenz-63 system and Navier-Stokes equations, demonstrate that our approach significantly enhances physical consistency throughout time evolution, keeping predictions within the correct spatial distribution. Looking ahead, we plan to extend our framework to more complex applications, such as Geological Carbon Storage, with an emphasis on scaling FIM computations for high-dimensional problems.*

SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction

Huseyin Tuna Erdinc, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. In this study, we present a novel approach for synthesizing diverse subsurface velocity models using diffusion-based generative models. Traditional methods often depend on large, high-quality datasets of 2D velocity models, which can be difficult to obtain in subsurface applications. In contrast, our method leverages incomplete well and imaged seismic data to generate high-fidelity velocity samples without requiring fully sampled training datasets. The results demonstrate that the generative model accurately captures long-range geological structures and aligns well with unseen “ground-truth” velocity models. Furthermore, it is shown that the diversity of generated velocity models can be increased through prior guidance in the training phase, and model uncertainties can be reduced with well conditioning during inference. Experiments conducted with multiple datasets (Compass model, Synthoseis, and North Sea data) and velocity models featuring various geological structures (e.g., faults, salt bodies) suggest that our approach facilitates realistic subsurface velocity synthesis, providing valuable inputs for full-waveform inversion and enhancing seismic-based subsurface modeling.

A Digital Twin for Geological Carbon Storage with Controlled Injectivity and optimized well placement

Abhinav Prakash Gahlot, Haoyun Li, Ziyi Yin, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. We present an uncertainty-aware Digital Twin (DT) for Geologic Carbon Storage (GCS), capable of handling multimodal time-lapse data and controlling CO₂ injectivity to mitigate reservoir fracturing risks and optimize operations. In GCS, DT represents virtual replicas of subsurface systems that incorporate real-time data and advanced generative Artificial Intelligence (genAI) techniques, including neural posterior density estimation via simulation-based inference and sequential Bayesian inference. These methods enable the effective monitoring and control of CO₂ storage projects, addressing challenges such as subsurface complexity, operational optimization, and risk mitigation. By integrating diverse monitoring data, e.g., geophysical well observations and imaged seismic, DT can bridge the gaps between seemingly distinct fields like geophysics and reservoir engineering. In addition, the recent advancements in genAI also facilitate DT with principled uncertainty quantification. Through recursive training and inference, DT utilizes simulated current state samples, e.g., CO₂ saturation, paired with corresponding geophysical field observations to train its neural networks and enable posterior sampling upon receiving new field data. However, it lacks decision-making and control capabilities, which is necessary for full DT functionality. This study aims to demonstrate how DT can inform decision-making processes to prevent risks such as cap rock fracturing during CO₂ storage operations.

Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models

Richard Rex, Srikanth Avasarala, Thomas Grady, and Felix J. Herrmann, SLIM

Abstract. Simulating two-phase flow via PDEs is computationally expensive due to the inversion of large, ill-conditioned matrices. To accelerate these computations, we reformulate Tucker Tensor (TT) decompositions into Kronecker products, enabling scalable Fourier Neural Operators (FNOs) for CO₂ saturation predictions in subsurface environments. This reformulation allows efficient scaling across multiple GPUs while maintaining a large number of modes. Building on our existing matrix-free abstraction library, we extend its capabilities to support distributed tensor operators. The extended library is auto-differentiable, with customized AD rules for training complex networks. We demonstrate the performance and scalability of our approach by evaluating FNO simulations against traditional PDE solvers for predicting time-varying CO₂ saturations from permeability models in large-scale subsurface environments.

Probabilistic Joint Recovery Method for CO₂ plume monitoring

Zijun Deng, Rafael Orozco, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Accurately predicting fluid flow patterns in Geological Carbon Storage (GCS) is a challenging task, particularly due to uncertainties in CO₂ plume dynamics and reservoir properties. While previous deterministic methods such as the Joint Recovery Method (JRM) have provided valuable insights, their effectiveness is limited as tools for decision-making since they do not communicate uncertainty. To address this, we propose a Probabilistic Joint Recovery Method (PJRM) that computes the posterior distribution at each monitoring survey while leveraging the shared structure among surveys through a common generative model. By efficiently computing posterior distributions for each monitoring survey, this method aims to provide valuable uncertainty information to decision-makers in GCS projects, augmenting their workflow with principled risk minimization.

Reconstructing Permeability and Saturation in Reservoir Simulation Using Diffusion PDE Models

Haoyun Li, Shiqin Zeng, Abhinav Prakash Gahlot, and Felix J. Herrmann

Abstract. This study explores the application of a diffusion partial differential equation (PDE) model for reservoir simulation, particularly aimed at reconstructing permeability and saturation fields within a saline aquifer. Focusing on pairs of input permeability and output saturation, the model is trained to capture the underlying dynamics governing fluid flow in porous media. Post-training, the model is capable of inferring or recovering the complete permeability and saturation distributions when provided with limited vertical pixel data of permeability and saturation. This approach offers a novel pathway for enhancing the resolution of subsurface characteristics, contributing to more accurate predictions in reservoir engineering and carbon storage simulations.

Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data

Shiqin Zeng, Huseyin Tuna Erdinc, Ziyi (Francis) Yin, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Ensuring CO₂ non-leakage is a critical aspect of Geological Carbon Storage (GCS). While previous approaches that develop deep neural networks demonstrate promising automatic leakage detection and potential cost reduction in dataset collection from time-lapse seismic images, they face challenges, such as a limited ability to reduce false alarms in CO₂ leakage instances and a lack of uncertainty analysis in detection results. This paper introduced a framework aimed at enhancing the deep neural network model’s ability to detect GCS leakage risk through a multi-criteria decision-making (MCDM)-based ensemble algorithm. The proposed method can improve the detection ability of leakage cases while accurately distinguishing them from non-leakage instances. Furthermore, the proposed uncertainty analysis method utilizing Monte Carlo (MC) dropout technique efficiently identifies misclassified non-leakage cases and categorizes them as undetermined for further investigation. This comprehensive approach enhances both the reliability and performance of the model in detecting GCS leakage risks.

Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes

Shiqin Zeng, Rafael Orozco, Huseyin Tuna Erdinc, and Felix J. Herrmann, SLIM

Abstract. Seismic denoising is essential for enhancing the clarity, accuracy, and reliability of seismicimages. Traditional seismic denoising methods, while effective for specific types of noise, often rely on well-established mathematical techniques that can be time-consuming, require manual tuning, and struggle with more complex noise patterns. Leveraging the 500 paired synthetic seismic datasets provided by the Think Onward community, we incorporate a 3D U-Net deep learning model with residual blocks and spatial attention to capture both local and global features for the seismic denoising task. During training, we apply the Laplacian operator to preserve edge details, followed by the Structural Similarity Index Measure (SSIM) loss to fine-tune the model, effectively removing concurrent noise and recovering the original seismic information. The resulting individual model achieves an SSIM of 0.99 compared to the ground-truth imaged seismic data. Additionally, we implement Langevin dynamics and Equivariant Bootstrapping techniques to estimate uncertainty during the training and inference phases, further improving the robustness of the denoising process.

Sensitivity of SH waves to Geological Carbon Storage

Ipsita Bhar, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. With the growing focus on Geological Carbon Storage (GCS) activities, the significance of SH-wave monitoring of CO₂ plumes is becoming increasingly important because of their potential increased sensitivity to density changes. In addition, shear waves can be instrumental in detecting CO₂ leakage, evaluating GCS-induced seismicity, and identifying caprock failure. According to Biot-Gassmann rock physics models, changes in seismic velocity are expected during CO₂ injection. This work focuses specifically on horizontally polarized (SH) waves, which are more sensitive to density variations than P-waves. While P-wave velocity tends to decrease with supercritical CO₂ injection, the investigation of its effects on the density presents a particularly compelling area of study. Therefore, we simulated the wave equation incorporating both P-wave and SH waves to better understand their impacts on imaging and CO₂ injection processes.

Enhancing Full-Waveform Variational Inference through Stochastic Resampling

Yunlin Zeng, Rafael Orozco, Ziyi (Francis) Yin, and Felix J. Herrmann, SLIM

Abstract. Recent developments in simulation-based inference, like the full-waveform variational inference via subsurface extensions (WISE), enable rapid online estimation of subsurface velocities by leveraging pre-trained models. To achieve this, WISE employs subsurface-offset common-image gathers to convert shot data into physics-informed summary statistics. While common-image gathers effectively retain critical information even when initial velocity estimates are inaccurate, WISE relied on a single 1D initial migration-velocity model. To improve inference and generalizability, we develop a stochastic resampling method to generate diverse migration-velocity models. This technique allows us to enhance the posterior sample quality while reducing dependency on the migration-velocity model.

Machine-learning enabled velocity-model building with uncertainty quantification

Rafael Orozco, Huseyin Tuna Erdinc, Thales Souza, Yunlin Zeng, Ziyi (Francis) Yin, and Felix J. Herrmann, SLIM

Abstract. Accurately characterizing subsurface properties is crucial for a wide range of geophysical applications, from hydrocarbon exploration to monitoring of CO₂ sequestration projects. Traditional characterization methods such as Full-Waveform Inversion (FWI) represent powerful tools but often struggle with the inherent complexities of the inverse problem, including noise, limited bandwidth and aperture of data, limited azimuth and computational constraints. To address these challenges, we propose a scalable methodology that integrates generative modeling with physics-informed summary statistics, making it suitable for complicated imaging problems potentially including field datasets. Our approach leverages the power of conditional diffusion networks, and methodologically incorporates physics in the form of summary statistics, allowing for the computationally efficient generation of Bayesian posterior samples that offer an useful assessment of uncertainty of the inferred migration-velocity models. To validate our approach, we introduce a battery of tests that measure the quality of the image estimates as well as the quality of the inferred uncertainties. With modern synthetic datasets, we maximally leverage the advantages of using subsurface-offset Common Image Gathers (CIGs) as the conditioning observable. Next, we tackle the challenging SEAM salt model that requires incorporating salt flooding into our approach based on the iterative refinements of ASPIRE — Amortized posteriors with Summaries that are Physics-based and Iteratively REfined.

FNO-charged ASPIRE

Richard Rex, Ziyi Yin, Felix J. Herrmann, SLIM

Abstract. During this talk, we will demonstrate how extended re-migrations—i.e, formation of subsurface-offset Common-Image Gathers (CIGs) for a new velocity model, can be avoided altogether by training Fourier Neural Operators during training of ASPIRE — Amortized posteriors with Summaries that are Physics-based and Iteratively REfined. In this approach, FNOs are trained as surrogates capable of mapping CIGs for one migration-velocity model to the other. The approach is computationally feasible because it uses the same training set as used during ASPIRE. As a result, additional training costs are small and the inference costs are reduced by a factor equal to the number of ASPIRE refinements.

Optimizing CO₂ Storage Monitoring with Enhanced Rock Physics Modeling

Abhinav P. Gahlot

Abstract. Based on the latest data-assimilation and machine-learning techniques, Digital Twins (DTs) have shown promise for high-fidelity monitoring and control of underground CO₂ storage. While the use of these techniques have important advantages, they do rely on certain assumptions. If these assumptions are not met, the DT’s neural networks may no longer infer the state of the CO₂ plume (pressure/saturation) accurately. By augmenting the forecast ensemble, we address this issue.

SSL in Seismic Requires Additional Volumetric Spread

Kiran Kokilepersaud, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Self-supervised learning (SSL) approaches are seeing increased popularity within annotation-scarce domains due to their focus on training without explicit access to labeled data. For this reason, these approaches have received widespread attention within the seismic community as obtaining quality labeled data is challenging within this application domain. However, self-supervised algorithms were trained and tested within the domain of large natural image datasets. Consequently, it is unclear whether conventional self-supervised approaches are appropriately formulated for the unique challenges of the seismic domain. Specifically, traditional self-supervised approaches 1) lack the capability to assess what features a quality seismic representation space should possess and 2) how to integrate these optimal features appropriately. In this work, we show a quality self-supervised seismic representation space is one that is more distributed across the overall representation space. We then propose a novel volumetric-based loss function that explicitly induces additional spread within the representation space. We show visually and numerically that the resultant model is better able to rectify fine-grained structures within a seismic segmentation task.

Hierarchical Information Encoding for Action Anticipation using Multi-level label

Zoe Fowler, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Statistical heterogeneity is a challenge in federated learning algorithms from both a local and global viewpoint, where the global model has difficulties generalizing to a broad variety of data and personalizing to each local client’s data. Furthermore, statistical heterogeneity increases catastrophic forgetting, where test samples previously learned become incorrect after a model update. Prior work tends to focus on the generalization and personalization challenge separately, despite these issues being connected through catastrophic forgetting. In this abstract, we consider both personalization and generalization, establishing how both challenges can be ameliorated through the reduction of catastrophic forgetting. Specifically, this can be accomplished via modifications to the local training stage of each client and the global model aggregation process. We show results on medical and natural images, providing insights on how results can be extended to the seismic domain.

Leveraging Uncertainty and Disagreement for Enhanced Annotation in Seismic Interpretation

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Data selection for deep learning in seismic interpretation is crucial, especially given the challenges of label scarcity and interpreter disagreement. Effective training relies on identifying the most informative samples, yet seismic datasets are often limited and subject to inconsistencies among interpreters. To address these challenges, a novel data selection framework is proposed that incorporates interpretation disagreement as a key factor. By modeling disagreement through representation shifts within neural networks, the approach enhances data selection by focusing on geologically significant regions. Integrated with active learning, this framework offers a comprehensive strategy for training set selection. Experimental results show that our method consistently outperforms traditional active learning methods, achieving up to a 12% improvement in mean intersection-over-union. These findings underscore the potential of incorporating uncertainty and disagreement to improve the generalization of deep learning models in seismic interpretation.

Optimizing Prompting for Foundation Models in Seismic Image Segmentation

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. The advent of large foundation models has transformed artificial intelligence by providing generalized frameworks for large-scale downstream tasks. Segment Anything (SAM) is a vision-based model that performs image segmentation using “inclusion” and “exclusion” point prompts. In geophysics and seismic image analysis, facies and fault segmentation are cost-intensive, and SAM’s prompt-based approach offers fast segmentation without the need for model training on labeled data, thus saving time and resources. However, prompting is intuitive; too few prompts lead to errors, while over-prompting degrades performance. To optimize this process, our work aims to identify an ideal combination of prompts by measuring each prompt’s importance through its impact on segmentation quality, using the Intersection over Union (IoU) as a metric. We employ sufficiency to calculate prompt importance, with our algorithm guiding the user to stop prompting when the optimal level is reached. This approach not only minimizes manual efforts but also enables automated prompting for subsequent slices in facies or fault data without requiring ground truth labels.

Assisting Experts with Probability Maps for Seismic Fault Detection

Yavuz Yarici, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Seismic fault detection is a critical task in geophysical exploration, often requiring extensive manual labeling by experts. This process can be labor-intensive and subjective due to the complex nature of seismic data. Various seismic methods exist to detect and segment faults in seismic images; however, both human labeling and machine learning model predictions can result in mislabels. In this work, we present a framework to assist expert annotators by leveraging probability maps generated by a deep learning model for seismic fault detection. These probability maps indicate the likelihood of fault occurrences at different locations in the seismic data, allowing experts to focus on regions with high predicted fault likelihood. These maps provide valuable insights to expert geoscientists, assisting them in refining their labeling tasks and potentially reducing human error and bias.

Learning from Multiview Multimodal Sparse Data

Ghazal Kaviani, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Human daily activity data is inherently sparse unless trimmed and curated for training machine learning models. Within each activity pattern, certain data segments are more representative. Data collected from different sensors capture these activity patterns with varying levels of detail and specificity, resulting in differing degrees of sparsity across each signal. For instance, a sensor on the hand captures diverse hand interactions, whereas an insole sensor records similar standing or sitting patterns during the same period. A multimodal learning approach is essential for effectively detecting and segmenting these patterns.

Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources

Jorge Quesada, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Segmenting faults is of paramount importance in the seismic interpretation pipeline, albeit involving both costly and labor-intensive expert annotation. Alternatives to expertly labeled data imply often relying on synthetic data or weakly labeled data. In this work, we present the CRACKS dataset, a comprehensive fault segmentation dataset spanning labels across multiple levels of expertise and confidence. We benchmark the effectiveness of this dataset by evaluating different machine learning strategies to exploit its multifaceted structure, as well as comparing it with the results we achieve when using either synthetic or weak labels sources.

Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations

Chen Zhou, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we discuss the potential of leveraging labels from lay annotators to enhance seismic fault interpretation while reducing the need for expert labeling. Interpretations exhibit disagreement within and between different levels of expertise, e.g., a geophysicist expert and less experienced practitioners. Conventionally, this disagreement is viewed as disadvantageous for machine learning models that rely on gold standard labels for training. We show that leveraging practitioner-labeled faults in the seismic sections that exhibit less expertise-based disagreements can reduce the need for expert labeling. Thus, it is important to characterize expertise-based disagreements. We develop a framework that first identifies a small number of seismic sections which entail the highest degree of expertise-based disagreements for expert labeling. The framework then uses practitioner annotations on the large amount of remaining data to augment the training set. We show that 1) augmenting with a large number of faults labeled by lay annotators achieves better fault interpretation than using only a small number of expert labels, and 2) is more effective than using synthetic data for pre-training.

Expertise-based Label Fusion for Seismic Fault Delineation

Chen Zhou, Jorge Quesada, Yavuz Yarici, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we present an effective fusion framework that utilizes annotations across multiple levels of expertise to enhance the ML model’s performance on fault delineation. In another presentation, crowdsourced annotations are shown to be useful. Commonly, crowdsourced labels exhibit expertise-based discrepancies. The question is how to utilize labels from different expertise levels to enhance the ML model’s performance. Our intuition is that the labels from multiple expertise levels contain complementary information, which can be fused during pre-training to effectively approximate expert-level annotations. We validate our intuition on the CRACKS dataset. We pre-train a fault delineation model with fusion labels from two expertise levels, and then fine-tune it with lesser amount of expert-level labels. We then conduct a study on label fusion between multiple practitioners and novices in different weighting configurations. Our results show that 1) label fusion from different expertise during pre-training enhances fault delineation, and 2) better performance can be achieved with larger weight on higher expertise.

Anticipation for sparse dataset

Seulgi Kim, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. We aim to address the challenge of predicting key events in sparse data environments by leveraging hierarchical labeling and temporal pattern learning. Both human daily activity and seismic data share a common trait of sparsity: in human activity datasets, most frames lack meaningful cues related to action transitions, making it difficult to pinpoint crucial moments. For example, in a one-hour video of daily activities, the critical clues required to predict the next action may only span a few seconds of actual behavioral change. Similarly, in seismic datasets, structures of interest such as faults and salt domes appear sporadically. Less informative features like horizons dominate the data, making it challenging to focus on the sparse yet significant events.

To tackle this, we propose a hierarchical labeling with temporal sequence models to accurately capture the essential patterns within sparse data. By refining the granularity of labels and focusing on key temporal points, our method can better anticipate future actions and their precise timing. Our approach not only improves next-action predictions in human anticipation tasks but also provides a robust framework that can be extended to other domains facing sparse data challenges.

Tackling Generalization and Personalization in Federated Learning

Jorge Quesada, Zoe Fowler, Mohammad Alotaibi, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Foundation models constitute a paradigm shift in the way machine learning tasks are approached, moving now to prompting-based approaches. However, there is little understanding of what factors make a prompting strategy effective, particularly in the visual domain. We present the PointPrompt dataset, the first visual segmentation prompting dataset across several image domains. Our benchmark tasks provide an array of opportunities to improve the understanding of the way human prompts differ from automated ones and what underlying factors make for effective visual prompts. Overall, our experiments not only showcase the differences between human prompts and automated methods, but also highlight potential avenues through which these differences can be leveraged to improve effective visual prompt design.

Redefining Prompting for Seismic

Mohammad Alotaibi, Mohit Prabhushankar, Kiran Kokilepersaud and Ghassan AlRegib, OLIVES

Abstract. Machine-learning (ML) algorithms have emerged as a tool for seismic interpretation. However they lack the expertise that human experts bring to the interpretation process. We hypothesize that cooperating ML methods with the domain expertise can address ML limitations. In particular, interactive prompting-based models like ChatGPT and the Segment Anything Model (SAM) enable this interactivity between AI and experts. To verify our hypotheses, we analyze this interaction between the Segment anything model (SAM) and different users tasked with a seismic labeling problem. We show that although users achieved an mIoU of 0.9, they struggled to influence SAM to segment the desired area. Moreover, we found that users approach SAM with presumptions that more prompts lead to better segmentation, and only accurate prompting is adequate for accurate segmentation. However, they tend to modify their approaches upon realizing that these assumptions are not true.

Introduction to JUDI

Abhinav Prakash Gahlot

Problem setup
Data acquisition
Modeling
RTM Imaging
Parallelization – workers and threads

Wave-equation Based Inference

Rafael Orozco, Yunlin Zeng

Conditional normalizing flows
WISE
WISER
Memory efficient training for general neural networks

Generative Modelling with SAGE & North Sea Data Repository Dataset Curation

Huseyin Tuna Erdinc, Thales Souza

North Sea data (NDR) creation
Review on diffusion models
Self-supervised training for generative models & SAGE training framework
Evaluation of SAGE results

Introduction to Jutul

Haoyun Li

Motivation of differentiable simulator
Application on reservoir simulation
Math and Julia implementation
3D visualization of CO₂ injection into saline aquifer
Publicly available corner-point test models

Digital Twin

Grant Bruer

Brief math review
Operator definitions for geologic CO₂ storage
Synthetic system
Ensemble Kalman filter loop
Analyze error and uncertainty

Fault Label Annotation and Disagreement Visualization

Quesada Pacora, Jorge Gerardo

Demonstrating the process of fault annotation and visualizing annotator disagreements.

SAM for Facies Segmentation

Prithwijit Chowdhury, Mohammad Alotaibi

Using seismic horizons as prompts and text inputs for facies segmentation with the Segment Anything Model.

Probabilistic Modelling of Seismic Interpretation

Chen Zhou

Showcasing a generative approach to model the seismic interpretation workflow probabilistically.

Georgia Tech’s AI Makerspace

Ghassan AlRegib

Seismic Laboratory for Imaging and Modeling