ML4Seismic Partners Meeting - 2024

sidebar

Date:

November 13-15, 2024

Venue:

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology.

Hotels:

Lodging options recommended by the institute (with Georgia Tech deals) can be found here.

More hotel choices can be found here.

Contact:

Ghassan AlRegib, co-director
alregib@gatech.edu
Felix Herrmann, co-director
felix.herrmann@gatech.edu

overview

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology. The meeting is scheduled for November 13—15, 2024.

Center for Machine Learning for Seismic (ML4Seismic)

A joint initiative at the Georgia Institute of Technology between the Omni Lab for Intelligent Visual Engineering and Science (OLIVES), led by Professor Ghassan AlRegib (ECE), and the Seismic Laboratory for Imaging and Modeling (SLIM), led by Professor Felix J. Herrmann (EAS, CSE, ECE), together with innovators in the energy sector and major cloud providers.

Georgia Tech’s Center for Machine Learning for Seismic (ML4Seismic) is designed to foster research partnerships aimed at driving innovations in artificial-intelligence-assisted seismic imaging, interpretation, analysis, and monitoring in the cloud.

Through training and collaboration, the ML4Seismic Industry Partners Program promotes novel approaches that balance new insights from machine learning with established approaches grounded in physics and geology. Areas of interest include, but are not limited to, low-environmental impact time-lapse acquisition, data-constrained image segmentation, classification, physics-constrained machine learning, and uncertainty quantification. These research areas are well aligned with Georgia Tech’s strengths in computational/data sciences and engineering.

participants

ML4Seismic Industry Partners

This page will be updated as partners register for this event…

program

Program 2024 ML4Seismic Partners Meeting

The 2024 ML4Seismic Industry Partners Meeting will be held in person at the Georgia Institute of Technology. The meeting is scheduled for November 13-15, 2024.

Wednesday November 13

Program for Wednesday November 13 of the ML4Seismic Partners Meeting
08:00—09:00 AM Everyone Breakfast (provided)
09:00—09:15 AM Felix J. Herrmann, Ghassan AlRegib Introduction
Theme: Fault & Leakage Detection and Segmentation Approaches
(chairs TBD)
09:15—09:40 AM Jorge Quesada Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources
09:40—10:05 AM Chen Zhou Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations
10:05—10:30 AM Shiqin Zeng (new student) Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data
10:30—10:55 AM Ghazal Kaviani Learning from Multiview Multimodal Sparse Data
10:55—11:10 AM Break
11:10—11:35 AM Prithwijit Chowdhury Leveraging Uncertainty and Disagreement for Enhanced Annotation in Seismic Interpretation
11:35—12:00 PM Yavuz Yarici Assisting Experts with Probability Maps for Seismic Fault Detection
12:00—12:25 PM Seulgi Kim Anticipation for Sparse Dataset
12:25—12:40 PM Discussion
12:40—01:40 PM Lunch (provided)
Theme: Imaging & physics-based machine learning
(chairs TBA)
01:40—01:55 PM Jeongjin Park (new student) Physical Bayesian Inference for Two-Phase Flow Problems
01:55—02:20 PM Richard Rex Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models
02:20—02:45 PM Richard Rex FNO-charged ASPIRE
Theme: Generative and Ensemble Learning Techniques
(chairs TBA)
02:45—03:10 PM Chen Zhou Expertise-based Label Fusion for Seismic Fault Delineation
03:10—03:25 PM Break
03:25—03:50 PM Zoe Fowler Tackling Generalization and Personalization in Federated Learning
04:05—04:30 PM Huseyin Tuna Erdinc SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction
04:30—04:55 PM Jorge Quesada Indicative features of prompting performance in non-natural domains
04:55—05:10 PM Discussion
05:15—08:00 PM Industry-student mixer Boho Taco

Thursday November 14

Program for Thursday November 14 of the ML4Seismic Partners Meeting
08:00—09:00 AM Everyone Breakfast (provided)
Theme: Digital Twins & Uncertainty Quantification
(chairs TBA)
09:00—09:25 AM Grant Bruer Seismic monitoring of CO2 plume dynamics using ensemble Kalman filtering
09:25—09:50 AM Felix J. Herrmann A Digital Shadow for Geological Carbon Storage
09:50—10:15 AM Abhinav Prakash Gahlot A Digital Twin for Geological Carbon Storage with Controlled Injectivity
10:15—10:40 AM Zijun Deng (new student) Probabilistic Joint Recovery Method for CO2 plume monitoring
10:40—10:55 AM Discussion
10:55—11:10 AM Break
Theme: Learning and Domain Generalization in Seismic Interpretation & Processing
(chairs TBA)
11:10—11:35 AM Kiran Kokilepersaud SSL in Seismic Requires Additional Volumetric Spread
11:35—12:00 PM Prithwijit Chowdhury Optimizing Prompting for Foundation Models in Seismic Image Segmentation
12:00—12:15 PM Shiqin Zeng (new student) Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes
12:15—12:40 PM Mohammad Alotaibi Visual Prompting: A Hitchhiker’s Guide to Segment Anything
12:40—12:50 PM Discussion
12:50—01:50 PM Lunch (provided)
Theme: Wave-equation Based Inference and Monitoring
(chairs TBA)
01:50—02:15 PM Yunlin Zeng (new student) Enhancing Full-Waveform Variational Inference through Stochastic Resampling
02:15—02:50 PM Rafael Orozco Machine-learning enabled velocity-model building with uncertainty quantification
02:50—03:15 PM Haoyun Li Assessing increased storage capacity due to CO2-dissolution
03:15—03:30 PM Ipsita Bhar (new student) Sensitivity of SH waves to Geological Carbon Storage
03:30—04:00 PM Discussion
06:00 PM El Valle

Friday November 15

08:00—09:00 AM Everyone Breakfast (provided)

Session 1

Ensemble Kalman Filtering

Grant Bruer

  • Brief math review
  • Operator definitions for geologic CO2 storage
  • Synthetic system
  • Ensemble Kalman filter loop (see the sketch after this list)
  • Analyze error and uncertainty
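
To make the analysis step of the loop above concrete, here is a minimal NumPy sketch of one EnKF update. It is an illustrative example, not the session's actual code; the forecast step, i.e., pushing each ensemble member through the two-phase flow simulator, is assumed to have produced the inputs.

```python
import numpy as np

def enkf_analysis(X, Y, d_obs, R, seed=0):
    """One EnKF analysis (update) step: a minimal sketch.

    X     : (n_state, n_ens) forecast ensemble of states (e.g., CO2 saturation)
    Y     : (n_obs, n_ens)   predicted observations per member (e.g., seismic data)
    d_obs : (n_obs,)         observed data
    R     : (n_obs, n_obs)   observation-noise covariance
    """
    n_ens = X.shape[1]
    Xa = X - X.mean(axis=1, keepdims=True)          # state anomalies
    Ya = Y - Y.mean(axis=1, keepdims=True)          # observation anomalies
    Cxy = Xa @ Ya.T / (n_ens - 1)                   # state-observation cross-covariance
    Cyy = Ya @ Ya.T / (n_ens - 1)                   # predicted-observation covariance

    # Kalman gain K = Cxy (Cyy + R)^{-1}, computed with a solve instead of an explicit inverse
    K = np.linalg.solve((Cyy + R).T, Cxy.T).T

    # perturbed-observation update: every member assimilates a noisy copy of the data
    rng = np.random.default_rng(seed)
    D = d_obs[:, None] + rng.multivariate_normal(np.zeros(len(d_obs)), R, size=n_ens).T
    return X + K @ (D - Y)
```

Covariance localization and inflation, the kind of EnKF hyperparameters discussed in the accompanying abstract, are omitted from this sketch.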

Wave-equation Based Inference

Rafael Orozco, Yunlin Zeng

  • Conditional normalizing flows (see the sketch after this list)
  • WISE
  • WISER
  • Memory efficient training for general neural networks
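
As background for the conditional normalizing flows listed above, below is a minimal, hypothetical PyTorch sketch of a single conditional affine-coupling block and the negative log-likelihood objective. It is not the WISE/WISER implementation; all layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn

class CondAffineCoupling(nn.Module):
    """Single conditional affine-coupling block (hypothetical minimal version)."""
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x, y):
        # transform one half of x with scales/shifts predicted from the other half and the condition y
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(torch.cat([x1, y], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                                # keep log-scales bounded for stability
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)  # transformed sample and log|det J|

def nll(flow, x, y):
    """Negative log-likelihood -log p(x | y) under a standard-normal base density (up to a constant)."""
    z, logdet = flow(x, y)
    return (0.5 * (z ** 2).sum(dim=1) - logdet).mean()
```

In the WISE setting, x would be a (flattened) velocity model and y its physics-informed summary statistics, with many coupling blocks composed and trained by minimizing nll over simulated pairs.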

Generative Modelling with SAGE & North Sea Data Repository Dataset Curation

Huseyin Tuna Erdinc, Thales Souza

  • North Sea data (NDR) creation
  • Review on diffusion models
  • Self-supervised training for generative models & SAGE training framework
  • Evaluation of SAGE Results

Introduction to Jutul

Haoyun Li

  • Motivation of differentiable simulator
  • Application on reservoir simulation
  • Math and Julia implementation
  • 3D visualization of CO2 injection into saline aquifer
  • Publicly available corner-point test models

Introduction to JUDI

Abhinav Prakash Gahlot

  • Problem setup
  • Data acquisition
  • Modeling
  • RTM Imaging
  • Parallelization – workers and threads

Session 2

Fault Label Annotation and Disagreement Visualization

Quesada Pacora, Jorge Gerardo

  • Demonstrating the process of fault annotation and visualizing annotator disagreements.

SAM for Facies Segmentation

Prithwijit Chowdhury, Mohammad Alotaibi

  • Using seismic horizons as prompts and text inputs for facies segmentation with the Segment Anything Model.

Probabilistic Modelling of Seismic Interpretation

Chen Zhou

  • Showcasing a generative approach to model the seismic interpretation workflow probabilistically.

Georgia Tech’s AI Makerspace

Ghassan AlRegib


Seismic monitoring of CO2 plume dynamics using ensemble Kalman filtering

Grant Bruer, Abhinav P. Gahlot, Edmond Chow, and Felix J. Herrmann, SLIM

Abstract. Monitoring CO2 injected and stored in subsurface reservoirs is critical for avoiding failure scenarios and enables real-time optimization of CO2 injection rates. Sequential Bayesian data assimilation (DA) is a statistical method for combining information over time from multiple sources to estimate a hidden state, such as the spread of the subsurface CO2 plume. An example of scalable and efficient sequential Bayesian DA is the ensemble Kalman filter (EnKF). We improve upon existing DA literature in the seismic-CO2 monitoring domain by applying this scalable DA algorithm to a high-dimensional CO2 reservoir using two-phase flow dynamics and time-lapse full waveform seismic data with a realistic surface-seismic survey design. We show more accurate estimates of the CO2 saturation field using the EnKF compared to using either the seismic data or the fluid physics alone. Furthermore, we test a range of values for the EnKF hyperparameters and give guidance on their selection for seismic CO2 reservoir monitoring.


Physical Bayesian Inference for Two-Phase Flow Problems

Jeongjin Park, Huseyin Tuna Erdinc, Haoyun Li, Richard Rex Arockiasamy, Nisha Chandramoorthy, and Felix J. Herrmann, SLIM

Abstract. Previous research on neural surrogate modeling of multiphase flow systems (Yin et al., 2023) has shown that even models with low generalization error in forward predictions can generate posterior estimates that are out of distribution and physically unrealistic. To address this, we propose a regularization method that leverages the Fisher Information Matrix (FIM) to guide the training process. By integrating the FIM into a differentiable optimization framework, we aim to improve the reliability of surrogate models, such as Fourier Neural Operators (FNO) (Li et al., 2020), for both forward predictions and posterior inference.

Our experiments on benchmark problems, including the Lorenz-63 system and Navier-Stokes equations, demonstrate that our approach significantly enhances physical consistency throughout time evolution, keeping predictions within the correct spatial distribution. Looking ahead, we plan to extend our framework to more complex applications, such as Geological Carbon Storage, with an emphasis on scaling FIM computations for high-dimensional problems.
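
The abstract does not spell out how the FIM enters the optimization, so the following PyTorch fragment is only one heavily simplified possibility: estimate a diagonal empirical Fisher from per-sample gradients of the data-fit term and penalize its trace alongside the usual surrogate loss. All names and the weighting lam are assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def empirical_fisher_trace(model, xs, ys, noise_std=1.0):
    """Trace of a diagonal empirical Fisher estimate (one illustrative choice only).

    Uses per-sample gradients of a Gaussian log-likelihood with respect to the
    surrogate's parameters; create_graph=True keeps the penalty differentiable.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    trace = 0.0
    for xi, yi in zip(xs, ys):                       # loop over individual samples
        log_lik = -F.mse_loss(model(xi[None]), yi[None], reduction="sum") / (2 * noise_std ** 2)
        grads = torch.autograd.grad(log_lik, params, create_graph=True)
        trace = trace + sum((g ** 2).sum() for g in grads)
    return trace / len(xs)

def regularized_loss(model, xs, ys, lam=1e-3):
    """Surrogate data fit plus a hypothetical FIM-based penalty."""
    return F.mse_loss(model(xs), ys) + lam * empirical_fisher_trace(model, xs, ys)
```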


SAGE – Subsurface foundational model with AI-driven Geostatistical Extraction

Huseyin Tuna Erdinc, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. In this study, we present a novel approach for synthesizing diverse subsurface velocity models using diffusion-based generative models. Traditional methods often depend on large, high-quality datasets of 2D velocity models, which can be difficult to obtain in subsurface applications. In contrast, our method leverages incomplete well and imaged seismic data to generate high-fidelity velocity samples without requiring fully sampled training datasets. The results demonstrate that the generative model accurately captures long-range geological structures and aligns well with unseen “ground-truth” velocity models. Furthermore, it is shown that the diversity of generated velocity models can be increased through prior guidance in the training phase, and model uncertainties can be reduced with well conditioning during inference. Experiments conducted with multiple datasets (Compass model, Synthoseis, and North Sea data) and velocity models featuring various geological structures (e.g., faults, salt bodies) suggest that our approach facilitates realistic subsurface velocity synthesis, providing valuable inputs for full-waveform inversion and enhancing seismic-based subsurface modeling.
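
As background for the diffusion-based generation described above, here is a minimal sketch of a DDPM-style training step (noise prediction). The actual SAGE training, self-supervision on incomplete data, and well/seismic conditioning are more involved; eps_model and the noise schedule are placeholders.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(eps_model, x0, alphas_bar):
    """One DDPM-style training step: predict the noise added to a clean velocity sample x0.

    eps_model  : network eps_theta(x_t, t)           (hypothetical)
    x0         : (batch, channels, H, W) clean velocity models
    alphas_bar : (T,) cumulative products of the noise schedule
    """
    T = len(alphas_bar)
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps     # forward noising
    return F.mse_loss(eps_model(x_t, t), eps)              # noise-prediction loss
```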


A Digital Twin for Geological Carbon Storage with Controlled Injectivity

Abhinav Prakash Gahlot, Haoyun Li, Ziyi Yin, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. We present an uncertainty-aware Digital Twin (DT) for Geologic Carbon Storage (GCS), capable of handling multimodal time-lapse data and controlling CO2 injectivity to mitigate reservoir fracturing risks and optimize operations. In GCS, a DT is a virtual replica of the subsurface system that incorporates real-time data and advanced generative Artificial Intelligence (genAI) techniques, including neural posterior density estimation via simulation-based inference and sequential Bayesian inference. These methods enable the effective monitoring and control of CO2 storage projects, addressing challenges such as subsurface complexity, operational optimization, and risk mitigation. By integrating diverse monitoring data, e.g., geophysical well observations and imaged seismic, the DT can bridge the gaps between seemingly distinct fields like geophysics and reservoir engineering. In addition, recent advancements in genAI also equip the DT with principled uncertainty quantification. Through recursive training and inference, the DT utilizes simulated current state samples, e.g., CO2 saturation, paired with corresponding geophysical field observations to train its neural networks and enable posterior sampling upon receiving new field data. However, such a DT alone lacks decision-making and control capabilities, which are necessary for full DT functionality. This study aims to demonstrate how a DT can inform decision-making processes to prevent risks such as cap rock fracturing during CO2 storage operations.


Tucker Compression for Scalable Operator Learning in Large-Scale Parametric PDE Models

Richard Rex, Srikanth Avasarala, Thomas Grady, and Felix J. Herrmann, SLIM

Abstract. Simulating two-phase flow via PDEs is computationally expensive due to the inversion of large, ill-conditioned matrices. To accelerate these computations, we reformulate Tucker Tensor (TT) decompositions into Kronecker products, enabling scalable Fourier Neural Operators (FNOs) for CO2 saturation predictions in subsurface environments. This reformulation allows efficient scaling across multiple GPUs while maintaining a large number of modes. Building on our existing matrix-free abstraction library, we extend its capabilities to support distributed tensor operators. The extended library is auto-differentiable, with customized AD rules for training complex networks. We demonstrate the performance and scalability of our approach by evaluating FNO simulations against traditional PDE solvers for predicting time-varying CO2 saturations from permeability models in large-scale subsurface environments.
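
To illustrate the idea of applying a factored spectral weight without materializing the dense tensor, here is a toy, single-dimension PyTorch sketch. The distributed, multi-GPU abstraction described in the abstract is not shown, and all shapes and names are assumptions.

```python
import torch

def tucker_spectral_matmul(x_hat, G, U_m, U_in, U_out):
    """Apply a Tucker-factored FNO spectral weight mode-by-mode via einsum contractions.

    x_hat : (batch, c_in, n_modes)  Fourier coefficients of the input
    G     : (r_m, r_in, r_out)      Tucker core
    U_m   : (n_modes, r_m)          factor over retained Fourier modes
    U_in  : (c_in, r_in)            factor over input channels
    U_out : (c_out, r_out)          factor over output channels
    Dense equivalent: W[k, i, o] = sum_{a,p,q} U_m[k, a] U_in[i, p] U_out[o, q] G[a, p, q]
    All tensors are assumed to share one (complex) dtype.
    """
    t = torch.einsum('ip,bik->bpk', U_in, x_hat)        # project input channels onto the rank space
    t = torch.einsum('apq,ka,bpk->bqk', G, U_m, t)      # contract the core with the mode factor
    return torch.einsum('oq,bqk->bok', U_out, t)        # expand back to output channels
```

With this factorization, the per-layer parameter count drops from n_modes * c_in * c_out to a much smaller core plus three factor matrices, which is what makes keeping a large number of modes affordable.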


Probabilistic Joint Recovery Method for CO2 plume monitoring

Zijun Deng, Rafael Orozco, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Accurately predicting fluid flow patterns in Geological Carbon Storage (GCS) is a challenging task, particularly due to uncertainties in CO2 plume dynamics and reservoir properties. While previous deterministic methods such as the Joint Recovery Method (JRM) have provided valuable insights, their effectiveness is limited as tools for decision-making since they do not communicate uncertainty. To address this, we propose a Probabilistic Joint Recovery Method (PJRM) that computes the posterior distribution at each monitoring survey while leveraging the shared structure among surveys through a common generative model. By efficiently computing posterior distributions for each monitoring survey, this method aims to provide valuable uncertainty information to decision-makers in GCS projects, augmenting their workflow with principled risk minimization.


Assessing increased storage capacity due to CO2-dissolution

Haoyun Li, Abhinav Prakash Gahlot, and Felix J. Herrmann, SLIM

Abstract. During this talk, we discuss a reservoir simulation study to assess increased storage capacity due to the dissolution of CO2 into brine. We will also investigate to what extent changes in the density of the brine can be detected seismically using SH-waves.


Enhancing Performance with Uncertainty Estimation in Geological Carbon Storage Leakage Detection from Time-Lapse Seismic Data

Shiqin Zeng, Huseyin Tuna Erdinc, Ziyi (Francis) Yin, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. Ensuring CO2 non-leakage is a critical aspect of Geological Carbon Storage (GCS). While previous approaches that develop deep neural networks demonstrate promising automatic leakage detection and potential cost reduction in dataset collection from time-lapse seismic images, they face challenges, such as a limited ability to reduce false alarms in CO2 leakage instances and a lack of uncertainty analysis in detection results. This work introduces a framework aimed at enhancing the deep neural network model’s ability to detect GCS leakage risk through a multi-criteria decision-making (MCDM)-based ensemble algorithm. The proposed method improves the detection of leakage cases while accurately distinguishing them from non-leakage instances. Furthermore, the proposed uncertainty analysis method, utilizing the Monte Carlo (MC) dropout technique, efficiently identifies misclassified non-leakage cases and categorizes them as undetermined for further investigation. This comprehensive approach enhances both the reliability and performance of the model in detecting GCS leakage risks.
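
The uncertainty part of the abstract can be pictured with a short Monte Carlo dropout sketch: run the trained classifier several times with dropout active and flag high-variance predictions as undetermined. This is a generic illustration (the MCDM-based ensemble itself is not shown), and the threshold is a placeholder.

```python
import torch

def mc_dropout_predict(model, x, n_samples=50, undetermined_std=0.2):
    """Monte Carlo dropout at inference time (generic sketch).

    Runs the leakage classifier n_samples times with dropout active, then flags
    inputs whose predictive standard deviation is high as 'undetermined'.
    """
    model.train()   # keeps dropout stochastic; in practice only dropout modules should be in train mode
    with torch.no_grad():
        probs = torch.stack(
            [torch.sigmoid(model(x)).squeeze(-1) for _ in range(n_samples)]
        )
    mean, std = probs.mean(dim=0), probs.std(dim=0)
    labels = (mean > 0.5).long()                     # 1 = leakage, 0 = no leakage
    labels[std > undetermined_std] = -1              # -1 = undetermined, send for further review
    return labels, mean, std
```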


Image Impeccable Challenge: An Effective Machine Learning Denoising Method for 3D Seismic Volumes

Shiqin Zeng, Rafael Orozco, Huseyin Tuna Erdinc, and Felix J. Herrmann, SLIM

Abstract. Seismic denoising is essential for enhancing the clarity, accuracy, and reliability of seismic images. Traditional seismic denoising methods, while effective for specific types of noise, often rely on well-established mathematical techniques that can be time-consuming, require manual tuning, and struggle with more complex noise patterns. Leveraging the 500 paired synthetic seismic datasets provided by the Think Onward community, we incorporate a 3D U-Net deep learning model with residual blocks and spatial attention to capture both local and global features for the seismic denoising task. During training, we apply the Laplacian operator to preserve edge details, followed by the Structural Similarity Index Measure (SSIM) loss to fine-tune the model, effectively removing concurrent noise and recovering the original seismic information. The resulting individual model achieves an SSIM of 0.99 compared to the ground-truth imaged seismic data. Additionally, we implement Langevin dynamics and Equivariant Bootstrapping techniques to estimate uncertainty during the training and inference phases, further improving the robustness of the denoising process.
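
As an illustration of the edge-preserving idea mentioned above, the sketch below combines a pixel-wise loss with a Laplacian term on 3D volumes. The actual training recipe (residual/attention U-Net, SSIM fine-tuning, Langevin dynamics, Equivariant Bootstrapping) is not reproduced here, and the weighting is a placeholder.

```python
import torch
import torch.nn.functional as F

# 3D Laplacian stencil used as an edge-preserving penalty (illustrative only)
_LAP = torch.zeros(1, 1, 3, 3, 3)
_LAP[0, 0, 1, 1, 1] = -6.0
for dz, dy, dx in [(0, 1, 1), (2, 1, 1), (1, 0, 1), (1, 2, 1), (1, 1, 0), (1, 1, 2)]:
    _LAP[0, 0, dz, dy, dx] = 1.0

def laplacian3d(vol):
    """vol: (batch, 1, D, H, W) seismic volume."""
    return F.conv3d(vol, _LAP.to(vol.device, vol.dtype), padding=1)

def denoising_loss(pred, target, edge_weight=0.1):
    """Pixel loss plus a Laplacian term that encourages matching edge detail."""
    return F.l1_loss(pred, target) + edge_weight * F.l1_loss(laplacian3d(pred), laplacian3d(target))
```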


Sensitivity of SH waves to Geological Carbon Storage

Ipsita Bhar, Abhinav P. Gahlot, and Felix J. Herrmann, SLIM

Abstract. With the growing focus on Geological Carbon Storage (GCS) activities, SH-wave monitoring of CO2 plumes is becoming increasingly important because of its potential increased sensitivity to density changes. In addition, shear waves can be instrumental in detecting CO2 leakage, evaluating GCS-induced seismicity, and identifying caprock failure. According to Biot-Gassmann rock physics models, changes in seismic velocity are expected during CO2 injection. This work focuses specifically on horizontally polarized (SH) waves, which are more sensitive to density variations than P-waves. While P-wave velocity tends to decrease with supercritical CO2 injection, the investigation of its effects on the density presents a particularly compelling area of study. Therefore, we simulate the wave equation incorporating both P- and SH-waves to better understand their impacts on imaging and CO2 injection processes.


Enhancing Full-Waveform Variational Inference through Stochastic Resampling

Yunlin Zeng, Ziyi (Francis) Yin, Rafael Orozco, and Felix J. Herrmann, SLIM

Abstract. Recent developments in simulation-based inference, like the full-waveform variational inference via subsurface extensions (WISE), enable rapid online estimation of subsurface velocities by leveraging pre-trained models. To achieve this, WISE employs subsurface-offset common-image gathers to convert shot data into physics-informed summary statistics. While common-image gathers effectively retain critical information even when initial velocity estimates are inaccurate, WISE relies on a single 1D initial migration-velocity model. To improve inference and generalizability, we develop a stochastic resampling method to generate diverse migration-velocity models. This technique allows us to enhance the posterior sample quality while reducing dependency on the migration-velocity model.


Machine-learning enabled velocity-model building with uncertainty quantification

Rafael Orozco, Huseyin Tuna Erdinc, Thales Souza, Yunlin Zeng, Ziyi (Francis) Yin, and Felix J. Herrmann, SLIM

Abstract. Accurately characterizing subsurface properties is crucial for a wide range of geophysical applications, from hydrocarbon exploration to monitoring of CO2 sequestration projects. Traditional characterization methods such as Full-Waveform Inversion (FWI) represent powerful tools but often struggle with the inherent complexities of the inverse problem, including noise, limited bandwidth and aperture of data, limited azimuth, and computational constraints. To address these challenges, we propose a scalable methodology that integrates generative modeling with physics-informed summary statistics, making it suitable for complicated imaging problems potentially including field datasets. Our approach leverages the power of conditional diffusion networks, and methodologically incorporates physics in the form of summary statistics, allowing for the computationally efficient generation of Bayesian posterior samples that offer a useful assessment of uncertainty of the inferred migration-velocity models. To validate our approach, we introduce a battery of tests that measure the quality of the image estimates as well as the quality of the inferred uncertainties. With modern synthetic datasets, we maximally leverage the advantages of using subsurface-offset Common Image Gathers (CIGs) as the conditioning observable. Next, we tackle the challenging SEAM salt model that requires incorporating salt flooding into our approach based on the iterative refinements of ASPIRE — Amortized posteriors with Summaries that are Physics-based and Iteratively REfined.


FNO-charged ASPIRE

Richard Rex, Ziyi Yin, Felix J. Herrmann, SLIM

Abstract. During this talk, we will demonstrate how extended re-migrations, i.e., the formation of subsurface-offset Common-Image Gathers (CIGs) for a new velocity model, can be avoided altogether by training Fourier Neural Operators during the training of ASPIRE — Amortized posteriors with Summaries that are Physics-based and Iteratively REfined. In this approach, FNOs are trained as surrogates capable of mapping CIGs for one migration-velocity model to another. The approach is computationally feasible because it uses the same training set as ASPIRE. As a result, additional training costs are small and the inference costs are reduced by a factor equal to the number of ASPIRE refinements.


A Digital Shadow for Geological Carbon Storage

Felix J. Herrmann

Abstract. During this talk, the latest developments will be shared on a Digital Twin for Geological Carbon Storage that includes principled uncertainty quantification. It is also shown that the proposed approach can be seen as a nonlinear extension of the ensemble Kalman filter. Instead of relying on forming approximations to the covariance from the ensemble for the predicted states and corresponding observations, the ensemble is used to train conditional neural networks. These networks are trained to carry out the Kalman corrections via the latent space of conditional Normalizing Flows.


SSL in Seismic Requires Additional Volumetric Spread

Kiran Kokilepersaud, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Self-supervised learning (SSL) approaches are seeing increased popularity within annotation-scarce domains due to their focus on training without explicit access to labeled data. For this reason, these approaches have received widespread attention within the seismic community, as obtaining quality labeled data is challenging within this application domain. However, self-supervised algorithms were trained and tested within the domain of large natural image datasets. Consequently, it is unclear whether conventional self-supervised approaches are appropriately formulated for the unique challenges of the seismic domain. Specifically, traditional self-supervised approaches 1) lack the capability to assess what features a quality seismic representation space should possess and 2) lack a mechanism to integrate these optimal features appropriately. In this work, we show that a quality self-supervised seismic representation space is one that is more distributed across the overall representation space. We then propose a novel volumetric-based loss function that explicitly induces additional spread within the representation space. We show visually and numerically that the resultant model is better able to rectify fine-grained structures within a seismic segmentation task.
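
The volumetric loss itself is not given in the abstract. As a generic illustration of a term that induces additional spread in a representation space, here is a uniformity-style penalty from the contrastive-learning literature that could be added to a standard contrastive objective; it is not the proposed loss.

```python
import torch
import torch.nn.functional as F

def uniformity_loss(z, t=2.0):
    """Generic spread-inducing term: penalizes embeddings that cluster together.

    z : (batch, dim) representations of seismic sections; lower values mean the
    batch is spread more evenly over the unit hypersphere.
    """
    z = F.normalize(z, dim=1)
    sq_dists = torch.cdist(z, z).pow(2)                       # pairwise squared distances
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    return torch.exp(-t * sq_dists[off_diag]).mean().log()

# example total objective: contrastive_loss(z1, z2) + lam * uniformity_loss(torch.cat([z1, z2]))
```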


Tackling Generalization and Personalization in Federated Learning

Zoe Fowler, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Statistical heterogeneity is a challenge in federated learning algorithms from both a local and global viewpoint, where the global model has difficulties generalizing to a broad variety of data and personalizing to each local client’s data. Furthermore, statistical heterogeneity increases catastrophic forgetting, where test samples previously learned become incorrect after a model update. Prior work tends to focus on the generalization and personalization challenge separately, despite these issues being connected through catastrophic forgetting. In this abstract, we consider both personalization and generalization, establishing how both challenges can be ameliorated through the reduction of catastrophic forgetting. Specifically, this can be accomplished via modifications to the local training stage of each client and the global model aggregation process. We show results on medical and natural images, providing insights on how results can be extended to the seismic domain.
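
For readers less familiar with the setting, here is a minimal weighted FedAvg aggregation step, the baseline that the proposed local-training and aggregation modifications build on. The sketch does not include those modifications.

```python
import copy
import torch

def fedavg(client_states, client_sizes):
    """Weighted average of client model state dicts (standard FedAvg baseline).

    client_states : list of state_dicts returned by local training rounds
    client_sizes  : number of local samples per client, used as aggregation weights
    """
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            (n / total) * state[key].float() for state, n in zip(client_states, client_sizes)
        )
    return global_state   # load into the global model with model.load_state_dict(...)
```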


Leveraging Uncertainty and Disagreement for Enhanced Annotation in Seismic Interpretation

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Data selection for deep learning in seismic interpretation is crucial, especially given the challenges of label scarcity and interpreter disagreement. Effective training relies on identifying the most informative samples, yet seismic datasets are often limited and subject to inconsistencies among interpreters. To address these challenges, a novel data selection framework is proposed that incorporates interpretation disagreement as a key factor. By modeling disagreement through representation shifts within neural networks, the approach enhances data selection by focusing on geologically significant regions. Integrated with active learning, this framework offers a comprehensive strategy for training set selection. Experimental results show that our method consistently outperforms traditional active learning methods, achieving up to a 12% improvement in mean intersection-over-union. These findings underscore the potential of incorporating uncertainty and disagreement to improve the generalization of deep learning models in seismic interpretation.


Optimizing Prompting for Foundation Models in Seismic Image Segmentation

Prithwijit Chowdhury, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. The advent of large foundation models has transformed artificial intelligence by providing generalized frameworks for large-scale downstream tasks. Segment Anything (SAM) is a vision-based model that performs image segmentation using “inclusion” and “exclusion” point prompts. In geophysics and seismic image analysis, facies and fault segmentation are cost-intensive, and SAM’s prompt-based approach offers fast segmentation without the need for model training on labeled data, thus saving time and resources. However, prompting is intuitive; too few prompts lead to errors, while over-prompting degrades performance. To optimize this process, our work aims to identify an ideal combination of prompts by measuring each prompt’s importance through its impact on segmentation quality, using the Intersection over Union (IoU) as a metric. We employ sufficiency to calculate prompt importance, with our algorithm guiding the user to stop prompting when the optimal level is reached. This approach not only minimizes manual efforts but also enables automated prompting for subsequent slices in facies or fault data without requiring ground truth labels.
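
One way to picture the stopping rule described above is a greedy loop that adds prompts while the marginal IoU gain stays above a threshold. The helper segment_with_prompts and the use of a ground-truth mask are purely illustrative assumptions; the paper's sufficiency-based importance measure and its label-free operation on subsequent slices are not reproduced here.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def select_prompts(segment_with_prompts, candidate_prompts, gt_mask, min_gain=0.01):
    """Greedy prompt selection with an IoU-based stopping rule (illustrative only)."""
    chosen, best_iou = [], 0.0
    for prompt in candidate_prompts:
        trial = segment_with_prompts(chosen + [prompt])    # e.g., a wrapper around SAM
        gain = iou(trial, gt_mask) - best_iou
        if gain < min_gain:            # this prompt no longer helps: stop prompting
            break
        chosen.append(prompt)
        best_iou += gain
    return chosen, best_iou
```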


Assisting Experts with Probability Maps for Seismic Fault Detection

Yavuz Yarici, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Seismic fault detection is a critical task in geophysical exploration, often requiring extensive manual labeling by experts. This process can be labor-intensive and subjective due to the complex nature of seismic data. Various seismic methods exist to detect and segment faults in seismic images; however, both human labeling and machine learning model predictions can result in mislabels. In this work, we present a framework to assist expert annotators by leveraging probability maps generated by a deep learning model for seismic fault detection. These probability maps indicate the likelihood of fault occurrences at different locations in the seismic data, allowing experts to focus on regions with high predicted fault likelihood. These maps provide valuable insights to expert geoscientists, assisting them in refining their labeling tasks and potentially reducing human error and bias.


Learning from Multiview Multimodal Sparse Data

Ghazal Kaviani, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Human daily activity data is inherently sparse unless trimmed and curated for training machine learning models. Within each activity pattern, certain data segments are more representative. Data collected from different sensors capture these activity patterns with varying levels of detail and specificity, resulting in differing degrees of sparsity across each signal. For instance, a sensor on the hand captures diverse hand interactions, whereas an insole sensor records similar standing or sitting patterns during the same period. A multimodal learning approach is essential for effectively detecting and segmenting these patterns.


Crowdsourcing Annotations for Fault Segmentation: Benchmarking Label Sources

Jorge Quesada, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Segmenting faults is of paramount importance in the seismic interpretation pipeline, albeit involving both costly and labor-intensive expert annotation. Alternatives to expertly labeled data often rely on synthetic data or weakly labeled data. In this work, we present the CRACKS dataset, a comprehensive fault segmentation dataset spanning labels across multiple levels of expertise and confidence. We benchmark the effectiveness of this dataset by evaluating different machine learning strategies that exploit its multifaceted structure, as well as comparing it with the results we achieve when using either synthetic or weak label sources.


Disagreement-based Seismic Fault Labeling with Reduced Expert Annotations

Chen Zhou, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we discuss the potential of leveraging labels from lay annotators to enhance seismic fault interpretation while reducing the need for expert labeling. Interpretations exhibit disagreement within and between different levels of expertise, e.g., a geophysicist expert and less experienced practitioners. Conventionally, this disagreement is viewed as disadvantageous for machine learning models that rely on gold standard labels for training. We show that leveraging practitioner-labeled faults in the seismic sections that exhibit less expertise-based disagreements can reduce the need for expert labeling. Thus, it is important to characterize expertise-based disagreements. We develop a framework that first identifies a small number of seismic sections which entail the highest degree of expertise-based disagreements for expert labeling. The framework then uses practitioner annotations on the large amount of remaining data to augment the training set. We show that 1) augmenting with a large number of faults labeled by lay annotators achieves better fault interpretation than using only a small number of expert labels, and 2) is more effective than using synthetic data for pre-training.


Expertise-based Label Fusion for Seismic Fault Delineation

Chen Zhou, Jorge Quesada, Yavuz Yarici, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. In this work, we present an effective fusion framework that utilizes annotations across multiple levels of expertise to enhance the ML model’s performance on fault delineation. In another presentation, crowdsourced annotations are shown to be useful. Commonly, crowdsourced labels exhibit expertise-based discrepancies. The question is how to utilize labels from different expertise levels to enhance the ML model’s performance. Our intuition is that the labels from multiple expertise levels contain complementary information, which can be fused during pre-training to effectively approximate expert-level annotations. We validate our intuition on the CRACKS dataset. We pre-train a fault delineation model with fusion labels from two expertise levels, and then fine-tune it with a smaller amount of expert-level labels. We then conduct a study on label fusion between multiple practitioners and novices in different weighting configurations. Our results show that 1) label fusion from different expertise levels during pre-training enhances fault delineation, and 2) better performance can be achieved with a larger weight on higher expertise.
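
A minimal sketch of weighted label fusion across expertise levels is shown below; the weights, threshold, and data layout are placeholders rather than the configurations studied in the abstract.

```python
import numpy as np

def fuse_labels(masks_by_level, weights, threshold=0.5):
    """Fuse fault annotations from several expertise levels into one training label.

    masks_by_level : dict level -> (H, W) binary fault mask for the same section
    weights        : dict level -> relative weight (e.g., larger for practitioners than novices)
    """
    total = sum(weights.values())
    fused = sum(weights[lvl] * masks_by_level[lvl].astype(float) for lvl in masks_by_level) / total
    return fused, (fused >= threshold).astype(np.uint8)   # soft map and hard pre-training label

# example: fuse_labels({"practitioner": m_p, "novice": m_n}, {"practitioner": 0.7, "novice": 0.3})
```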


Anticipation for sparse dataset

Seulgi Kim, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. We aim to address the challenge of predicting key events in sparse data environments by leveraging hierarchical labeling and temporal pattern learning. Both human daily activity and seismic data share a common trait of sparsity: in human activity datasets, most frames lack meaningful cues related to action transitions, making it difficult to pinpoint crucial moments. For example, in a one-hour video of daily activities, the critical clues required to predict the next action may only span a few seconds of actual behavioral change. Similarly, in seismic datasets, structures of interest such as faults and salt domes appear sporadically. Less informative features like horizons dominate the data, making it challenging to focus on the sparse yet significant events.

To tackle this, we propose a hierarchical labeling with temporal sequence models to accurately capture the essential patterns within sparse data. By refining the granularity of labels and focusing on key temporal points, our method can better anticipate future actions and their precise timing. Our approach not only improves next-action predictions in human anticipation tasks but also provides a robust framework that can be extended to other domains facing sparse data challenges.


Indicative features of prompting performance in non-natural domains

Jorge Quesada, Zoe Fowler, Mohammad Alotaibi, Mohit Prabhushankar, and Ghassan AlRegib, OLIVES

Abstract. Foundation models constitute a paradigm shift in the way machine learning tasks are approached, moving now to prompting-based approaches. However, there is little understanding of what factors make a prompting strategy effective, particularly in the visual domain. We present the PointPrompt dataset, the first visual segmentation prompting dataset across several image domains. Our benchmark tasks provide an array of opportunities to improve the understanding of the way human prompts differ from automated ones and what underlying factors make for effective visual prompts. Overall, our experiments not only showcase the differences between human prompts and automated methods, but also highlight potential avenues through which these differences can be leveraged to improve effective visual prompt design.


Visual Prompting: A Hitchhiker’s Guide to Segment Anything

Mohammad Alotaibi, Mohit Prabhushankar, Kiran Kokilepersaud and Ghassan AlRegib, OLIVES

Abstract. Machine-learning (ML) algorithms have emerged as a tool for seismic interpretation. However, they lack the expertise that human experts bring to the interpretation process. We hypothesize that combining ML methods with domain expertise can address these limitations. In particular, interactive prompting-based models like ChatGPT and the Segment Anything Model (SAM) enable this interactivity between AI and experts. To verify our hypothesis, we analyze the interaction between SAM and different users tasked with a seismic labeling problem. We show that although users achieved an mIoU of 0.9, they struggled to influence SAM to segment the desired area. Moreover, we found that users approach SAM with the presumptions that more prompts lead to better segmentation and that accurate prompting alone is adequate for accurate segmentation. However, they tend to modify their approaches upon realizing that these assumptions are not true.