JUDIAgent: a scientific coding agent for JUDI workflows in wave-equation imaging

Authors

Haoyun Li

Abhinav Prakash Gahlot

Felix J. Herrmann

Abstract

We present JUDIAgent, a scientific coding assistant for JUDI.jl, an open-source Julia framework for wave-equation-based seismic modeling, imaging, and inversion. In seismic experimentation, the practical difficulty is not only writing executable code but also assembling a complete seismic workflow: the required physical model, acquisition geometry, modeling or migration operator, and saved outputs needed for inspection.

JUDIAgent addresses this problem by retrieving JUDI examples, generating Julia code, running the code in the target environment, and checking whether the generated script contains the requested workflow pieces and outputs, such as acquisition geometry and saved figures. We demonstrate the system with validated 2D forward-modeling and reverse-time migration (RTM) examples. These case studies show that, for JUDI-based seismic scripts, domain-aware validation can turn a lightweight user request into a more complete experimental workflow by detecting missing workflow steps that are not captured by runtime correctness alone and by making the resulting scripts easier to inspect and revise.

Introduction

Running a seismic experiment usually requires more than calling a single modeling or imaging operator. The user must still construct a workflow that specifies the physical model, acquisition geometry, source wavelet, operator sequence, and saved outputs needed for inspection and reruns. In this paper, those workflows are constructed in JUDI.jl, a Julia environment for wave-equation-based seismic modeling, imaging, and inversion (Witte et al. 2019; Herrmann, Witte, and Louboutin 2019), with Devito providing the underlying finite-difference wave-equation solves and stencil generation (Louboutin et al. 2019; Luporini et al. 2020).

JUDIAgent targets that workflow-construction step. Given a natural-language request, the system retrieves relevant JUDI examples and documentation, writes Julia code, runs it, and checks whether the generated script contains the workflow pieces and outputs requested in the prompt. The paper describes a domain-specific coding assistant that lowers the barrier to running seismic experiments by making JUDI-based modeling and imaging workflows easier to generate, validate, and inspect.

More broadly, recent coding-agent work has shown that retrieval, tool use, and iterative repair can improve performance on software tasks (Lewis et al. 2020; Yao et al. 2023; Yang et al. 2024). We use those ideas in a geophysical setting. Here retrieval-augmented generation means that code is not produced from the prompt alone: the system first looks up relevant JUDI examples and documentation, passes that material into generation, and then uses failures from execution or workflow review to guide the next repair step. Figure 1 summarizes this loop.
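
The loop described above can be sketched in a few lines of plain Julia. This is a minimal illustration under stated assumptions, not JUDIAgent's implementation: the function names `retrieve` and `repair_loop`, the keyword-overlap ranking, and the stub generator are all hypothetical, and a real system would call a language model and a Julia subprocess where the stubs appear.

```julia
# Minimal sketch of a generate / run / review / repair loop.
# All names here are hypothetical stand-ins, not JUDIAgent's API.

# Toy retrieval: rank example snippets by how many prompt words they contain.
function retrieve(prompt::AbstractString, examples::Vector{String}; k::Int=2)
    words = split(lowercase(prompt))
    score(ex) = count(w -> occursin(w, lowercase(ex)), words)
    ranked = sort(examples; by=score, rev=true)
    return ranked[1:min(k, length(ranked))]
end

# `generate`, `execute`, and `review` are supplied by the caller.
# `execute` and `review` each return a vector of failure messages (empty = pass).
function repair_loop(prompt, examples; generate, execute, review, max_iters::Int=3)
    context = retrieve(prompt, examples)
    feedback = String[]
    for iter in 1:max_iters
        script = generate(prompt, context, feedback)
        feedback = execute(script)                          # layer 1: does it run?
        isempty(feedback) && (feedback = review(script))    # layer 2: is it complete?
        isempty(feedback) && return (script=script, iterations=iter, passed=true)
    end
    return (script="", iterations=max_iters, passed=false)
end

# Stubs standing in for the language model, the runtime, and the reviewer.
examples = ["judiModeling forward modeling example",
            "judiJacobian rtm example",
            "plotting tips"]
gen(prompt, ctx, fb) = isempty(fb) ? "d_obs = F * q" : "d_obs = F * q\nsave(\"rtm.png\")"
run_ok(script) = String[]
rev(script) = occursin("save", script) ? String[] : ["missing saved figure"]

result = repair_loop("basic rtm example", examples;
                     generate=gen, execute=run_ok, review=rev)
```

In this toy run the first draft executes but fails review because no figure is saved, so the failure message is fed back into generation and the second draft passes.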

The implementation was developed for JUDI-based seismic workflows, but it also benefited from the broader idea that agent tooling can be paired with scientific simulators. The open-source JutulGPT project provided a reference point for agent-oriented scientific software design in another physics domain (SINTEF AgentLab 2026), while recent work on scientific workflows, benchmark design, and simulator-centered decision pipelines helped frame the evaluation (Gahlot et al. 2024; Li, Gahlot, and Herrmann 2025; Stodden and Miguez 2014). The paper focuses on forward-modeling and imaging tasks in JUDI. Its main contribution is a workflow-oriented coding assistant that combines code generation with runtime checking and with task-specific review, where the validation checklist changes with the seismic task named in the prompt.

Benchmark specification

The system is easiest to understand through one benchmark task and the rules that define what the generated script must contain. JUDIAgent is paired with a benchmark catalog that covers multiple seismic tasks, including forward modeling and RTM, together with task-specific workspace rules that supply defaults a user would otherwise need to state manually. One RTM prompt can therefore stay short:

Write a basic RTM example using JUDI.jl and save one RTM figure and one migrated image.

The short prompt is possible because the task-specific rules and prompt engineering were prepared in advance. For RTM, the benchmark rules require the generated script to distinguish the model used to generate the synthetic data from the migration-velocity model, construct Jacobian-adjoint imaging, save the requested migrated image, and follow plotting conventions taken from retrieved JUDI examples. The same rules also specify gray colormaps, the plotted spatial extent of the migrated image, symmetric clipping chosen from image magnitude, and muting used to suppress shallow and early-time acquisition artifacts. This design reduces how much domain knowledge the user must spell out while still keeping the generated experiment aligned with expert expectations.
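
Two of the display rules named above, muting and symmetric clipping, can be illustrated without JUDI. The sketch below is an assumption-laden example rather than the benchmark's own code: the helper names `mute_shallow` and `symmetric_clip` and the 98th-percentile default are hypothetical, and the image is assumed to be stored as an `(nx, nz)` array with depth along the second dimension.

```julia
using Statistics  # standard library; provides quantile

# Zero the first `mute_rows` depth samples of an (nx, nz) image; returns a copy.
function mute_shallow(image::AbstractMatrix, mute_rows::Integer)
    muted = copy(image)
    muted[:, 1:min(mute_rows, size(muted, 2))] .= zero(eltype(muted))
    return muted
end

# Symmetric display limits (-c, c), with c a percentile of |image|.
# The default percentile is an illustrative assumption.
function symmetric_clip(image::AbstractMatrix; pct::Real=0.98)
    c = quantile(abs.(vec(image)), pct)
    return (-c, c)
end

# Small synthetic "image" used only to exercise the helpers.
image = Float32[x - 2z for x in 1:4, z in 1:5]
muted = mute_shallow(image, 2)
lo, hi = symmetric_clip(muted)
```

The limits `lo` and `hi` would then be passed to the plotting call as the color range, so positive and negative amplitudes share one gray scale centered on zero.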

JUDIAgent retrieves JUDI examples and then returns Julia code such as:

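using JUDI, PythonPlot
# Excerpt: n, d, o, m_true, m_mig, q, nsrc, mute_rows, the y/z coordinates, and the time axes are assumed to be defined earlier.
# Two models: one generates the synthetic data, the other is used for migration.
model_true = Model(n, d, o, m_true)
model_mig = Model(n, d, o, m_mig)
# Five sources and one hundred receivers along x.
xsrc = 150f0 .+ 300f0 .* (0:4)
xrec = 0f0 .+ 12f0 .* (0:99)
src_geometry = Geometry(xsrc, ysrc, zsrc; dt=dt_src, t=t_src)
rec_geometry = Geometry(xrec, yrec, zrec; dt=dt, t=tn, nsrc=nsrc)
# Observed data from the true model, then the residual against the migration background.
F_true = judiModeling(model_true, src_geometry, rec_geometry)
d_obs = F_true * q
F_mig = judiModeling(model_mig, src_geometry, rec_geometry)
d_residual = d_obs - F_mig * q
# Jacobian-adjoint imaging, followed by muting of the shallowest samples.
J = judiJacobian(F_mig, q)
rtm_image_raw = reshape(adjoint(J) * d_residual, n)
rtm_image_muted = copy(rtm_image_raw)
rtm_image_muted[:, 1:mute_rows] .= 0f0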

This example shows how the user-facing prompt stays lightweight while the benchmark rules carry much of the task specification. The RTM imaging workflow and its saved outputs are constrained before generation starts, so the produced script can be checked against task requirements rather than against runtime success alone.

Validation

Validation in JUDIAgent has two layers. The first establishes whether the generated Julia script runs in the target environment. The second establishes whether the requested seismic workflow was produced. We refer to that second layer as task-specific review because its checklist changes with the seismic task named in the prompt. For forward modeling, the review checks for a model, acquisition geometry, a source wavelet, a forward operator, saved synthetic data, and saved figures. For RTM, it additionally checks for a migration-velocity model, synthetic observed data, Jacobian-adjoint imaging, and a saved migrated image (Baysal, Kosloff, and Sherwood 1983; Virieux and Operto 2009). Figure 1 summarizes this loop from prompt to generated code, execution, review, and repair.

The difference between these two layers matters because passing a runtime check is not the same as producing a complete seismic workflow. A script may execute but fail to save the requested data, omit the migration model, or never produce the image named in the prompt. In those cases JUDIAgent returns a concrete reason for failure and asks the model to repair the script rather than treating runtime success alone as enough.
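
One way to picture the second layer is as a checklist that depends on the task. The sketch below is hypothetical and much simpler than a production reviewer: it assumes workflow pieces can be recognized by text patterns for JUDI calls in the script, the checklist contents and the `review` function are illustrative, and the output file name is invented for the example.

```julia
# Hypothetical checklists: (workflow piece, text pattern, minimum occurrences).
const FORWARD_CHECKS = [
    ("model",            r"\bModel\(",          1),
    ("geometry",         r"\bGeometry\(",       1),
    ("source wavelet",   r"\bricker_wavelet\(", 1),
    ("forward operator", r"\bjudiModeling\(",   1),
]
# RTM adds a second model and Jacobian-adjoint imaging to the forward checklist.
const RTM_CHECKS = vcat(FORWARD_CHECKS, [
    ("migration-velocity model", r"\bModel\(",        2),
    ("Jacobian-adjoint imaging", r"\bjudiJacobian\(", 1),
])

checklist(task::Symbol) = task === :rtm ? RTM_CHECKS : FORWARD_CHECKS

# Return concrete failure reasons; an empty vector means the review passed.
function review(script::AbstractString, task::Symbol; outputs=String[], workdir=".")
    failures = String[]
    for (piece, pattern, mincount) in checklist(task)
        found = length(collect(eachmatch(pattern, script)))
        found >= mincount || push!(failures, "missing $piece")
    end
    for file in outputs
        isfile(joinpath(workdir, file)) || push!(failures, "missing output file $file")
    end
    return failures
end

# A script that would pass a runtime check but is incomplete as an RTM workflow.
script = """
model_true = Model(n, d, o, m_true)
src_geometry = Geometry(xsrc, ysrc, zsrc; dt=dt, t=t)
wavelet = ricker_wavelet(t, dt, f0)
F = judiModeling(model_true, src_geometry, rec_geometry)
"""

forward_failures = review(script, :forward)
rtm_failures = review(script, :rtm; outputs=["rtm_image.png"], workdir=mktempdir())
```

The same script passes the forward-modeling checklist and fails the RTM one, and the returned strings are the kind of concrete repair feedback described above.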

Figure 1. Iterative JUDIAgent validation loop. A user request is matched with retrieved JUDI examples and documentation, translated into Julia code, executed in the target environment, and then checked for task-required workflow pieces and outputs before either passing or returning repair feedback.

The current codebase also supports a command-line interface that exposes the prompt, generated code, validation messages, and saved outputs during a session, as shown in Figure 2.

Figure 2. JUDIAgent Console view showing the user prompt, generated Julia code, validation feedback, and saved outputs exposed during one coding session. The interface illustrates how the system presents its main functions to the user rather than only returning a final script.

Benchmark results

We evaluate JUDIAgent as a scientific coding system rather than as a new numerical imaging method. The question is whether it can generate executable and inspectable JUDI workflows for two representative tasks: 2D forward modeling and RTM. This setup also serves a reproducibility goal in computational science because the system should produce scripts and outputs that another user can inspect and rerun (Stodden and Miguez 2014; Wilson et al. 2017; Li, Gahlot, and Herrmann 2025).

The two benchmark tasks stress different parts of the workflow. Forward modeling tests whether the agent can assemble a script that defines the model, geometry, synthetic data, and requested outputs. RTM is stricter because, in this synthetic benchmark, it requires the agent to keep track of the model used to generate the synthetic data, a separate migration-velocity model, Jacobian-adjoint imaging, and image export.

The forward-modeling case asks for a two-layer acoustic model, five sources, one hundred surface receivers, synthetic data, and two saved figures. The resulting shot gather shows a clear direct arrival and a weaker later reflection that is consistent with the layered model. The setup panel documents the physical model that the prompt requested. Figure 3 shows both outputs side by side.
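
For concreteness, the inputs of such a case can be written down in plain Julia. The grid size, spacing, and layer velocities below are hypothetical placeholders; only the source and receiver counts and their spacings follow the text and the code excerpt in the benchmark specification. The conversion to squared slowness reflects the parameterization that JUDI's `Model` expects.

```julia
# Hypothetical grid and velocities, chosen only for illustration.
n = (301, 151)            # grid points in (x, z)
d = (5f0, 5f0)            # grid spacing in meters
v = fill(1.5f0, n)        # upper-layer velocity in km/s
v[:, 76:end] .= 2.5f0     # lower-layer velocity in km/s

# JUDI parameterizes the model by squared slowness in s^2/km^2.
m = (1f0 ./ v) .^ 2

# Five sources and one hundred surface receivers, spaced as in the excerpt above.
xsrc = 150f0 .+ 300f0 .* (0:4)
xrec = 12f0 .* (0:99)

# Lateral extent of the grid in meters, to confirm the acquisition fits inside it.
x_extent = (n[1] - 1) * d[1]
```

With these placeholder values the grid spans 1500 m laterally, so both the last source at 1350 m and the last receiver at 1188 m lie inside the model.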

The RTM case asks for a migration workflow with two subsurface models: one model used to produce the synthetic data and a smoother migration-velocity model used to form the image. The generated script also has to form residual data against the migration background, apply Jacobian-adjoint imaging, suppress shallow and early-time acquisition artifacts with muting, and save the migrated image to disk. Unlike the single-model forward setup in Figure 3, Figure 4 shows the two-model RTM setup together with the saved image from this benchmark.
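
In symbols, the two imaging steps named above can be summarized as follows. The notation is introduced here for illustration and is an assumption, not taken from the paper: $\mathbf{m}_0$ denotes the migration-velocity model, $\mathbf{q}$ the source, $\mathbf{F}(\mathbf{m}_0)$ the forward-modeling operator, and $\mathbf{J}(\mathbf{m}_0, \mathbf{q})$ the Jacobian of $\mathbf{F}(\mathbf{m})\mathbf{q}$ with respect to $\mathbf{m}$, evaluated at $\mathbf{m}_0$.

```latex
\begin{aligned}
\delta \mathbf{d} &= \mathbf{d}_{\text{obs}} - \mathbf{F}(\mathbf{m}_0)\,\mathbf{q}, \\
\delta \mathbf{m}_{\text{RTM}} &= \mathbf{J}(\mathbf{m}_0, \mathbf{q})^{\top}\,\delta \mathbf{d}.
\end{aligned}
```

The first line is the residual against the migration background and the second is the Jacobian-adjoint image, matching `d_residual` and `adjoint(J) * d_residual` in the code excerpt from the benchmark specification.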

Figure 3. Validated forward-modeling benchmark. The generated workflow produces both the requested two-layer setup panel and a central-shot gather whose direct arrival and later reflection are consistent with the layered model and acquisition geometry.

Figure 4. RTM case study generated by JUDIAgent. The left panel shows the two-model benchmark setup; the right panel shows the saved RTM image after subtracting the migration-background prediction from the observed data, applying muting to suppress shallow and early-time acquisition artifacts, and clipping the display by a robust percentile rule. The resulting image recovers the main horizontal reflector and demonstrates that the requested imaging workflow was generated and executed successfully.

Taken together, these cases show what we can support today: JUDIAgent can generate executable JUDI workflows that save figures and data products a user can inspect once the script has run. That is a practical step toward lowering the effort required to run and inspect seismic workflows in JUDI.

Conclusion

JUDIAgent addresses a practical problem in seismology: translating natural-language requests into JUDI workflows that not only run, but also produce the model setup, operator sequence, saved figures, and saved data products that the user asked for. By combining retrieval from JUDI examples with runtime checking and task-specific review, the system can reject incomplete scripts and repair them with concrete feedback such as missing data files, missing figures, or missing migration steps. The contribution is a lower-barrier interface for generating, validating, and inspecting seismic experiments rather than a new imaging method.

The current forward-modeling and RTM examples are modest, and we do not claim state-of-the-art imaging performance. An important next step is to extend workflow-completeness review with task-aware diagnostics. For migration tasks, these may include image-residual or illumination summaries that can be evaluated in future benchmark design. The codebase is maintained at https://github.com/haoyunl2/JUDIAgent.

Acknowledgment

This work was supported by the Georgia Tech SLIM Group. We thank the developers of JUDI.jl, Devito, LangChain, LangGraph, and JutulGPT for releasing open-source tools that informed this work. During preparation of this manuscript, the authors used generative AI tools in text revision and figure iteration. All content and figures were subsequently reviewed and edited by the authors.

References

Baysal, Edip, Dan D Kosloff, and John WC Sherwood. 1983. “Reverse Time Migration.” Geophysics 48 (11): 1514–24. https://doi.org/10.1190/1.1441434.

Gahlot, Abhinav Prakash, Haoyun Li, Ziyi Yin, Rafael Orozco, and Felix J. Herrmann. 2024. “A Digital Twin for Geological Carbon Storage with Controlled Injectivity.” arXiv Preprint arXiv:2403.19819. https://arxiv.org/abs/2403.19819.

Herrmann, Felix J, Philipp A Witte, and Mathias Louboutin. 2019. “JUDI: An Open-Source Software for Seismic Modeling and Inversion.” The Leading Edge 38 (9): 660–67.

Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al. 2020. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Advances in Neural Information Processing Systems 33: 9459–74.

Li, Haoyun, Abhinav Prakash Gahlot, and Felix J. Herrmann. 2025. “SeisFlowBench: A Benchmark for Seismic Wave Propagation and Flow Simulations.” Zenodo. https://doi.org/10.5281/zenodo.14927938.

Louboutin, Mathias, Michael Lange, Fabio Luporini, Navjot Kukreja, Philipp A Witte, Felix J Herrmann, Paulius Velesko, and Gerard J Gorman. 2019. “Devito (V3.1.0): An Embedded Domain-Specific Language for Finite Differences and Geophysical Exploration.” Geoscientific Model Development 12 (3): 1165–87. https://doi.org/10.5194/gmd-12-1165-2019.

Luporini, Fabio, Mathias Louboutin, Michael Lange, Navjot Kukreja, Philipp A Witte, Felix J Herrmann, Paulius Velesko, and Gerard J Gorman. 2020. “Architecture and Performance of Devito, a System for Automated Stencil Computation.” ACM Transactions on Mathematical Software (TOMS) 46 (1): 1–28. https://doi.org/10.1145/3374916.

SINTEF AgentLab. 2026. “JutulGPT: An AI Assistant for JutulDarcy.” https://github.com/SINTEF-agentlab/JutulGPT.

Stodden, Victoria, and Sheila Miguez. 2014. “Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research.” Journal of Open Research Software 2 (1). https://doi.org/10.5334/jors.ay.

Virieux, Jean, and Stéphane Operto. 2009. “An Overview of Full-Waveform Inversion in Exploration Geophysics.” Geophysics 74 (6): WCC1–26. https://doi.org/10.1190/1.3238367.

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Computational Biology 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.

Witte, Philipp A, Mathias Louboutin, Navjot Kukreja, Fabio Luporini, Michael Lange, Gerard J Gorman, and Felix J Herrmann. 2019. “A Large-Scale Framework for Symbolic Implementations of Seismic Inversion Algorithms in Julia.” Geophysics 84 (3): F57–71. https://doi.org/10.1190/geo2018-0174.1.

Yang, John, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering.” arXiv Preprint arXiv:2405.15793.

Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. “ReAct: Synergizing Reasoning and Acting in Language Models.” In International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2210.03629.
