OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI
| Title | OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI |
| Publication Type | Unpublished |
| Year of Publication | 2026 |
| Authors | Bhar, I, Huseyin Tuna Erdinc, Souza, T, Charles Jones, Felix J. Herrmann |
| Month | 3 |
| Keywords | deep learning, generative model, IMAGE, Imaging, Inverse problems, real seismic data, SEG, time-to-depth conversion, well logs, WISE |
| Abstract | The advent of machine learning (ML) and computer vision has significantly accelerated seismic inversion workflows by reducing the computational cost of traditionally expensive iterative methods. However, the development and evaluation of ML methods remain limited by the scarcity of realistic velocity models, as most high-quality data are privately owned by oil and gas companies. To address this gap, we present *OpenSeisML*, a collection of real seismic datasets designed to support generative AI (Gen-AI) workflows for seismic inversion. The datasets are curated from publicly available surveys in the UK National Data Repository (NDR). When seismic volumes are in the time domain and well logs are in depth, a time-to-depth conversion is required. We use checkshot data to establish the time–depth relationship and construct a velocity model through interpolation for accurate conversion of post-stack seismic data. We also present an automated data curation pipeline that prepares the seismic data while ensuring reproducibility. The objective is to train a generative model that captures the statistical distribution of subsurface properties, enabling the synthesis of multiple statistically consistent realizations for uncertainty quantification, which can act as a prior for seismic inversion. |
| URL | https://slim.gatech.edu/Publications/Public/Submitted/2026/bhar2026IMAGEolr/abstract.html |
| URL2 | |
| Citation Key | bhar2026IMAGEolr |
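
The checkshot-based time-to-depth conversion described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' pipeline: the function names `interval_velocity` and `time_to_depth` are hypothetical, and the sketch assumes checkshots given as two-way travel time in seconds against depth in metres, both strictly increasing, with piecewise-linear interpolation standing in for whatever interpolation scheme the paper uses.

```python
import numpy as np


def interval_velocity(checkshot_twt_s, checkshot_depth_m):
    """Interval velocity (m/s) between consecutive checkshots.

    Assumes two-way travel time, hence the factor of 2.
    """
    dz = np.diff(np.asarray(checkshot_depth_m, dtype=float))
    dt = np.diff(np.asarray(checkshot_twt_s, dtype=float))
    return 2.0 * dz / dt


def time_to_depth(trace, twt_s, checkshot_twt_s, checkshot_depth_m, depth_axis_m):
    """Resample one time-domain trace onto a depth axis (hypothetical helper).

    Step 1: map each target depth to a two-way time by linear
            interpolation of the checkshot time-depth pairs.
    Step 2: sample the trace at those times.
    np.interp clamps to the end values outside the checkshot range,
    so depths beyond the deepest checkshot are not extrapolated.
    """
    twt_at_depth = np.interp(depth_axis_m, checkshot_depth_m, checkshot_twt_s)
    return np.interp(twt_at_depth, twt_s, trace)


# Toy example: constant 2000 m/s medium, so TWT = 2 * z / 2000.
checkshot_depth_m = np.array([0.0, 1000.0, 2000.0])
checkshot_twt_s = np.array([0.0, 1.0, 2.0])

twt_s = np.linspace(0.0, 2.0, 501)   # time axis of the trace
trace = twt_s.copy()                 # synthetic trace whose amplitude equals its time
depth_axis_m = np.array([0.0, 500.0, 1000.0, 1500.0, 2000.0])

depth_trace = time_to_depth(trace, twt_s, checkshot_twt_s,
                            checkshot_depth_m, depth_axis_m)
velocities = interval_velocity(checkshot_twt_s, checkshot_depth_m)
```

For a full post-stack volume the same resampling would be applied trace by trace, with the time-depth relationship interpolated laterally between wells; that step is left out here because the abstract does not specify how it is done.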
