---
authors: |
    Philipp A. Witte`^1`, Mathias Louboutin`^1`, \
    Charles Jones`^2` and Felix J. Herrmann`^1` \
    `^1` Georgia Institute of Technology, School of Computational Science and Engineering \
    `^2` Osokey Ltd. \
    Email: pwitte3@gatech.edu \
    Submission date: November 27, 2019
bibliography:
    - references.bib
title: Serverless seismic imaging in the cloud
...

## Abstract

This talk presents a serverless approach to seismic imaging in the cloud, based on high-throughput containerized batch processing, event-driven computations and a domain-specific language compiler for solving the underlying wave equations. A 3D case study on Azure demonstrates that this approach reduces the operating cost by up to a factor of six, making the cloud a viable alternative to on-premise HPC clusters for seismic imaging.

## Audience

The content of this abstract is intended as a technical talk for an audience with little to no prior knowledge of cloud computing, but basic knowledge of reverse-time migration and HPC. This talk is designed for people who are interested in how the cloud can be adapted for large-scale seismic imaging and inversion in both research and production environments.

## Introduction

Seismic imaging and parameter estimation are among the most computationally challenging problems in scientific computing and thus require access to high-performance computing (HPC) clusters for working on relevant problem sizes as encountered in today's oil and gas (O&G) industry. Some companies, such as BP, PGS and ExxonMobil, operate private HPC clusters with maximum achievable performance on the order of petaflops [@pgs2019; @bp2009], while some companies are even moving towards exascale computing [@dugmccloud2019]. However, the high upfront and maintenance costs of on-premise HPC clusters make this option financially viable only in a production environment where computational resources constantly operate close to maximum capacity. Many small- and medium-sized O&G companies, academic institutions and service companies have a highly varying demand for access to compute and/or are financially not in a position to purchase on-premise HPC resources. Furthermore, researchers in seismic inverse problems and machine learning oftentimes require access to a variety of application-dependent hardware, such as graphics processing units (GPUs) or memory-optimized compute nodes for reverse-time migration (RTM).

Cloud computing thus offers a valuable alternative to on-premise HPC clusters, as it provides a large variety of (theoretically) unlimited computational resources without any upfront cost. Access to resources in the cloud is based on a pay-as-you-go pricing model, making it ideal for providing temporary access to compute or for supplementing on-premise HPC resources to meet short-term increases in computing demands. However, some fundamental differences regarding hardware and how computational resources are exposed to users exist between on-premise HPC clusters and the cloud. While cloud providers are increasingly investing in HPC technology, the majority of existing hardware is not HPC-optimized and networks are conventionally based on Ethernet. Additionally, the pay-as-you-go pricing model is a usage-based system, which means users are constantly charged for running instances. This possibly results in very high operating costs if instances sit idle for extended amounts of time, which is common in standard RTM workflows based on a client-server model, in which a master process distributes the workload to the parallel workers.

In this work, we demonstrate a serverless approach to seismic imaging on Microsoft Azure, which does not rely on a cluster of permanently running virtual machines (VMs). Instead, expensive compute instances are automatically launched and scaled by the cloud environment, thus preventing instances from sitting idle. For solving the underlying forward and adjoint wave equations, we use a domain-specific language compiler called *Devito* [@louboutin2018; @luporini2018], which combines a symbolic user interface with automated performance optimization for generating fast and parallel C code using just-in-time compilation. The separation of concerns between the wave equation solver and the serverless workflow implementation leads to a seismic imaging framework that scales to large problem sizes and reduces the operating cost in the cloud by a factor of 2 to 6, as demonstrated in our subsequent RTM case study.
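Although the case study below relies on TTI equations, a minimal sketch of Devito's symbolic interface for a simple isotropic acoustic wave equation conveys the flavor of this approach; the grid dimensions, velocity and time-stepping parameters are illustrative only, and source/receiver terms are omitted for brevity.

```python
from devito import Grid, TimeFunction, Eq, Operator, solve

# Small illustrative grid with 12.5 m spacing; the case study
# below uses 267 x 801 x 801 grid points
grid = Grid(shape=(101, 101, 101), extent=(1250., 1250., 1250.))

# Wavefield with second-order time and eighth-order space discretization
u = TimeFunction(name='u', grid=grid, time_order=2, space_order=8)

# Isotropic acoustic wave equation m * u_tt - laplace(u) = 0 with a
# constant squared slowness m (water velocity of 1.5 km/s)
m = 1.0 / 1.5**2
pde = m * u.dt2 - u.laplace

# Rearrange the PDE for the forward time step; Devito symbolically derives
# the stencil, then generates and JIT-compiles optimized, parallel C code
stencil = Eq(u.forward, solve(pde, u.forward))
op = Operator([stencil])
op.apply(time_M=500, dt=1.0)  # propagate for 500 time steps of 1 ms
```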
## Current state of the art

The cloud is increasingly adopted by O&G companies for general purpose computing, marketing, data storage and analysis [@customer2019], but utilizing the cloud for HPC applications such as (least-squares) RTM and full-waveform inversion (FWI) remains challenging. A wide range of performance benchmarks on various cloud platforms find that the cloud can generally not provide the same performance in terms of latency, bandwidth and resilience as conventional on-premise HPC clusters [@ramakrishnan2012; @benedict2013; @mehrotra2016], or only at considerable cost. Recently, cloud providers such as Amazon Web Services (AWS) and Azure have increasingly extended their HPC capabilities and improved their networks [@vmpricing2019], but HPC instances are oftentimes considerably more expensive than standard cloud VMs [@vmpricing2019]. On the other hand, the cloud offers a range of novel technologies, such as massively parallel object storage, containerized batch computing and event-driven computations, that allow addressing computational bottlenecks in novel ways. Adopting these technologies requires re-structuring seismic inversion codes, rather than running legacy codes on a virtual cluster of permanently running cloud instances (*lift and shift*). Companies that have taken steps towards the development of cloud-native technology include S-Cube, whose FWI workflow for AWS utilizes object storage, but is still based on a master-worker scheme [@scube2019]. Another example is Osokey [@ashley2019], a company offering fully cloud-native and serverless software for seismic data visualization and interpretation. In a previous publication, we adapted these concepts for seismic imaging and introduced a fully cloud-native workflow for serverless imaging on AWS [@witte2019c]. Here, we describe the implementation of this approach on Azure and present a 3D imaging case study.

## Methods and key results

### Workflow

The key contribution of this talk is a serverless implementation of an RTM workflow on Azure. The two main steps of a (generic) RTM workflow are the parallel computation of individual images for each source location and the subsequent summation of all partial images into a single image, which can be interpreted as an instance of a MapReduce program [@dean2008].
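Conceptually, the map stage produces one partial image per source location and the reduce stage sums them. The following toy sketch illustrates this structure; `migrate_shot` is a hypothetical stand-in that returns random data in place of an actual wave-equation solve.

```python
import numpy as np

# Hypothetical per-shot imaging kernel: in the real workflow, each call
# solves a forward and an adjoint wave equation and returns the partial
# image for one source location. A small random cube stands in here.
def migrate_shot(source_location, rng):
    return rng.standard_normal((16, 16, 16))

rng = np.random.default_rng(seed=0)
source_locations = range(4)  # the case study images 1,500 source locations

# Map: embarrassingly parallel over source locations (one batch job each)
partial_images = [migrate_shot(s, rng) for s in source_locations]

# Reduce: sum all partial images into the final RTM image volume
rtm_image = np.sum(partial_images, axis=0)
```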
Rather than running RTM on a cluster of permanently running VMs, we utilize a combination of high-throughput batch processing and event-driven computations to compute the images for separate source locations as an embarrassingly parallel workflow (Figures #f1 and #f2). The parallel computation of RTM images for separate source locations is implemented with Azure Batch, a service for scheduling and running containerized workloads. The image for each source location is processed by Azure Batch as a separate job, each of which can be executed on a single VM or on multiple VMs (i.e., using MPI-based domain decomposition). Azure Batch draws computational resources from a batch pool and automatically adds and removes VMs from the pool based on the number of pending jobs, thus mitigating idle instances. The software for solving the underlying forward and adjoint wave equations is deployed to the batch workers through Docker containers, and Devito's compiler automatically performs a series of performance optimizations to generate optimized C code for solving the PDEs (Figure #f2). A sketch of how such a workload might be submitted programmatically follows after the figures below.

#### Figure: {#f1}
![](figures/figure1.png){width=70%}
: Azure setup for computing an RTM image as a containerized embarrassingly parallel workload.

#### Figure: {#f2}
![](figures/figure2.png){width=70%}
: Software stack of the Docker container for computing the RTM images.
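As a rough illustration, the sketch below creates an auto-scaling pool and submits containerized imaging tasks with the `azure-batch` Python SDK. The account name, container image, command line and auto-scale formula are all hypothetical, exact signatures differ between SDK versions, and for brevity the sketch uses a single job with one task per source rather than separate jobs with multi-instance (MPI) tasks.

```python
from datetime import timedelta

import azure.batch.models as batchmodels
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

# Placeholder credentials and account URL
credentials = SharedKeyCredentials('mybatchaccount', '<account-key>')
client = BatchServiceClient(
    credentials, batch_url='https://mybatchaccount.region.batch.azure.com')

# Auto-scaling pool of memory-optimized VMs that run Docker containers.
# The (illustrative) formula targets two nodes per pending task, since
# each source location is imaged on two VMs via MPI.
pool = batchmodels.PoolAddParameter(
    id='rtm-pool',
    vm_size='Standard_E64s_v3',
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher='microsoft-azure-batch',
            offer='ubuntu-server-container',
            sku='16-04-lts',
            version='latest'),
        node_agent_sku_id='batch.node.ubuntu 16.04',
        container_configuration=batchmodels.ContainerConfiguration(
            container_image_names=['myregistry.azurecr.io/devito-rtm:latest'])),
    enable_auto_scale=True,
    auto_scale_formula=(
        '$TargetDedicatedNodes = '
        'min(2 * avg($PendingTasks.GetSample(TimeInterval_Minute * 5)), 200);'),
    auto_scale_evaluation_interval=timedelta(minutes=5))
client.pool.add(pool)

# One containerized imaging task per source location
job = batchmodels.JobAddParameter(
    id='rtm-job', pool_info=batchmodels.PoolInformation(pool_id='rtm-pool'))
client.job.add(job)

for shot_id in range(1500):
    task = batchmodels.TaskAddParameter(
        id=f'image-shot-{shot_id:04d}',
        command_line=f'python3 /app/image_shot.py --shot-id {shot_id}',
        container_settings=batchmodels.TaskContainerSettings(
            image_name='myregistry.azurecr.io/devito-rtm:latest'))
    client.task.add(job_id=job.id, task=task)
```

The auto-scale formula is re-evaluated periodically, so the pool grows while imaging tasks are queued and shrinks back to zero once the queue drains, which is what prevents the idle cost discussed in the case study below.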
As communication between individual jobs is not possible, we implement the reduce part of RTM (i.e., the summation of all images into a single data cube) separately, using Azure Functions. These event-driven functions are automatically invoked when a batch worker writes its computed image to the object storage system (blob storage) and sends the corresponding object identifier to a message queue, which collects the IDs of all results (Figure #f3). As soon as object IDs are added to the queue, the cloud environment automatically invokes Azure Functions, each of which sums up to 10 images from the queue. Each function writes its summed image back to blob storage, and the process is repeated recursively until all images have been summed into a single volume. As such, the summation process is both asynchronous and parallel: the summation starts as soon as the first images are available, and multiple Azure Functions can be invoked at the same time. A sketch of this summation logic is given after the figure below.

#### Figure: {#f3}
![](figures/figure3.png){width=70%}
: Event-driven gradient summation on Azure, using Azure Functions and Queue Storage.
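The sketch below outlines one summation round using the `azure-storage-blob` and `azure-storage-queue` SDKs; in the deployed workflow, this logic runs inside a queue-triggered Azure Function. Queue, container and blob names are hypothetical, images are assumed to be stored as raw `float32` arrays, and error handling as well as chunked/streamed reads are omitted.

```python
import os
import uuid

import numpy as np
from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

CONN = os.environ['AzureWebJobsStorage']  # storage connection string
IMAGE_SHAPE = (267, 801, 801)             # grid size from the case study

def sum_images_once():
    """Pop up to 10 image IDs from the queue, sum the corresponding blobs,
    store the partial sum and re-enqueue its ID so that the summation
    recurses until a single volume remains."""
    queue = QueueClient.from_connection_string(CONN, 'image-queue')
    container = BlobServiceClient.from_connection_string(CONN) \
        .get_container_client('images')

    messages = list(queue.receive_messages(max_messages=10))
    if len(messages) < 2:
        return  # nothing to reduce yet

    total = np.zeros(IMAGE_SHAPE, dtype=np.float32)
    for msg in messages:
        data = container.download_blob(msg.content).readall()
        total += np.frombuffer(data, dtype=np.float32).reshape(IMAGE_SHAPE)

    # Store the partial sum and advertise it on the queue
    new_id = f'sum-{uuid.uuid4()}'
    container.upload_blob(new_id, total.tobytes(), overwrite=True)
    queue.send_message(new_id)

    # Delete consumed messages only after the partial sum is safely stored
    for msg in messages:
        queue.delete_message(msg)
```

Since each invocation reduces up to 10 images to one, the 1,500 partial images of the case study collapse to a single volume in about four recursion levels.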
### RTM Case study

For our 3D RTM case study on Azure, we use a synthetic velocity model derived from the 3D SEG Salt and Overthrust models, with dimensions of 3.325 x 10 x 10 km. We discretize the model on a 12.5 m grid, which results in 267 x 801 x 801 grid points. We generate data with a 15 Hz peak frequency for a randomized seismic acquisition geometry, with data being recorded by 1,500 receivers that are randomly distributed along the ocean floor. The source vessel fires the seismic source on a dense regular grid of 799 x 799 source locations (638,401 in total). For imaging, we assume source-receiver reciprocity, which means that sources and receivers are interchangeable and the data can be sorted into 1,500 shot records with 638,401 receivers each. We model wave propagation for generating the seismic data with an anisotropic pseudo-acoustic TTI wave equation and implement discretized versions of the forward and adjoint (linearized) equations with Devito, as presented in @louboutin2018seg.

#### Figure: {#f4}
![#f4a](figures/figure4a.png){width=36.4%}
![#f4b](figures/figure4b.png){width=42.6%}
: (a) Sorted runtimes for computing the RTM image. Each job corresponds to the image of a single source location. (b) Cumulative idle time for computing this workload as a function of the number of parallel workers, on either a fixed cluster of VMs or using Azure Batch. The right-hand y-axis shows the corresponding cost, which is proportional to the idle time.

For the computations, we use Azure's memory-optimized E64 and E64s VMs, which have 432 GB of memory, 64 vCPUs and a 2.3 GHz Intel Xeon E5-2673 processor [@vmpricing2019]. To fit the forward wavefields in memory, we utilize two VMs per source and use MPI-based domain decomposition for solving the wave equations. The time-to-solution for each individual image as a function of the source location is plotted in Figure #f4a, with an average container runtime of 119.28 minutes per image. The on-demand price of the E64/E64s instances is \$3.629 per hour, which results in a cumulative cost of \$10,750 for the full experiment and a total runtime of approximately 30 hours using 100 VMs.

Figure #f4b shows the cumulative idle time for computing the workload from Figure #f4a on a fixed cluster, as a function of the number of parallel VMs. We model the idle time by assuming that all 1,500 jobs (one job per source location) are distributed to the parallel workers on a first-come, first-served basis and that a master worker collects all results. The idle time on a fixed VM cluster results from the fact that all cluster workers have to wait for the final worker to finish its computations, whereas Azure Batch automatically scales down the cluster, thus preventing instances from sitting idle. While VM clusters on Azure in principle support auto-scaling as well, this is not possible if MPI is used to distribute the data/sources to the workers. Thus, performing RTM on a fixed cluster of VMs incurs additional cost from idle time of up to a factor of 2. By utilizing low-priority instances, it is possible to further reduce the operating cost by a factor of 2 to 3 (i.e., by up to a factor of 6 in total).

#### Figure: {#f5}
![](figures/figure5a.png){width=100%}
: Horizontal depth slice through the final 3D image cube at 1,500 m depth.

## Conclusion

Adopting serverless workflows in the cloud, in contrast to lift and shift, allows us to leverage cloud services such as batch computing and reduces the operating cost of RTM by a factor of 2 to 6. This transition is made possible through abstract user interfaces and an automatic code generation framework, which is highly flexible, yet provides the necessary performance to work on industry-scale problems.

## Presenter bio

P. Witte, M. Louboutin and F. J. Herrmann are part of the core Devito development team and have closely collaborated with multiple cloud companies to develop serverless implementations of RTM. They believe that the computational cost and complexity of seismic inversion can only be managed through abstractions, automatic code generation and software based on a separation of concerns. C. Jones is the head of development at Osokey, a company specialized in cloud-native software for seismic data analysis.

## Acknowledgements

We would like to acknowledge S. Brandsberg-Dahl, A. Morris, K. Umay, H. Shel, S. Roach and E. Burness (all with Microsoft Azure) for collaborating with us on this project. Many thanks also to H. Modzelewski (University of British Columbia) and J. Selvage (Osokey). This research was funded by the Georgia Research Alliance.

## References