JUDIAgent: a scientific coding agent for JUDI workflows in wave-equation imaging
We present JUDIAgent, a scientific coding assistant for JUDI.jl, an open-source Julia framework for wave-equation-based seismic modeling, imaging, and inversion. In seismic experimentation, the practical difficulty is not only writing executable code, but assembling a complete seismic workflow with the required physical model, acquisition geometry, modeling or migration operator, and saved outputs needed for inspection.
JUDIAgent addresses this problem by retrieving JUDI examples, generating Julia code, running the code in the target environment, and checking whether the generated script contains the requested workflow pieces and outputs, such as acquisition geometry and saved figures. We demonstrate the system with validated 2D forward-modeling and reverse-time migration (RTM) examples. These case studies show that, for JUDI-based seismic scripts, domain-aware validation can turn a lightweight user request into a more complete experimental workflow. It does so by detecting missing workflow steps that runtime correctness alone does not capture and by making the resulting scripts easier to inspect and revise.
Introduction
Running a seismic experiment usually requires more than calling a single modeling or imaging operator. The user must still construct a workflow that specifies the physical model, acquisition geometry, source wavelet, operator sequence, and saved outputs needed for inspection and reruns. In this paper, those workflows are constructed in JUDI.jl, a Julia environment for wave-equation-based seismic modeling, imaging, and inversion (Witte et al. 2019; Herrmann, Witte, and Louboutin 2019), with Devito providing the underlying finite-difference wave-equation solves and stencil generation (Louboutin et al. 2019; Luporini et al. 2020).
JUDIAgent targets that workflow-construction step. Given a natural-language request, the system retrieves relevant JUDI examples and documentation, writes Julia code, runs it, and checks whether the generated script contains the workflow pieces and outputs requested in the prompt. The paper describes a domain-specific coding assistant that lowers the barrier to running seismic experiments by making JUDI-based modeling and imaging workflows easier to generate, validate, and inspect.
More broadly, recent coding-agent work has shown that retrieval, tool use, and iterative repair can improve performance on software tasks (Lewis et al. 2020; Yao et al. 2023; Yang et al. 2024). We use those ideas in a geophysical setting. Here retrieval-augmented generation means that code is not produced from the prompt alone: the system first looks up relevant JUDI examples and documentation, passes that material into generation, and then uses failures from execution or workflow review to guide the next repair step. Figure 1 summarizes this loop.
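The loop described above can be sketched in a few lines of plain Julia. This is a minimal illustration, not the JUDIAgent implementation: the `Attempt` type, the `repair_loop` function, the `max_rounds` limit, and the four callbacks (`retrieve`, `generate`, `execute`, `review`) are all hypothetical names introduced here, and the stubs stand in for the retrieval backend, the language model, the Julia runtime, and the workflow review.

```julia
# Hypothetical sketch of a retrieve-generate-run-review-repair loop.
# None of these names are taken from the JUDIAgent codebase.
struct Attempt
    code::String
    passed::Bool
    feedback::Vector{String}
end

function repair_loop(prompt::String; retrieve, generate, execute, review, max_rounds::Int=3)
    context = retrieve(prompt)                 # look up examples and documentation once
    feedback = String[]
    code = ""
    for _ in 1:max_rounds
        code = generate(prompt, context, feedback)
        run_ok, run_msg = execute(code)        # first check: does the script run?
        if !run_ok
            feedback = [run_msg]               # feed the runtime failure into the next round
            continue
        end
        gaps = review(prompt, code)            # second check: is the workflow complete?
        isempty(gaps) && return Attempt(code, true, String[])
        feedback = gaps                        # feed the missing pieces into the next round
    end
    return Attempt(code, false, feedback)
end

# Stub callbacks, for illustration only.
stub_retrieve(prompt) = ["retrieved example text"]
function stub_generate(prompt, context, feedback)
    base = "F = judiModeling(model, src_geometry, rec_geometry)"
    return isempty(feedback) ? base : base * "\nsavefig(\"rtm.png\")"
end
stub_execute(code) = (true, "")
stub_review(prompt, code) = occursin("savefig", code) ? String[] : ["missing saved figure"]

result = repair_loop("Write a basic RTM example";
                     retrieve=stub_retrieve, generate=stub_generate,
                     execute=stub_execute, review=stub_review)
```

In this toy run the first round executes but fails review because no figure is saved, so the second round receives that reason as feedback and passes.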
The implementation was developed for JUDI-based seismic workflows, but it also benefited from the broader idea that agent tooling can be paired with scientific simulators. The open-source JutulGPT project provided a reference point for agent-oriented scientific software design in another physics domain (SINTEF AgentLab 2026), while recent work on scientific workflows, benchmark design, and simulator-centered decision pipelines helped frame the evaluation (Gahlot et al. 2024; Li, Gahlot, and Herrmann 2025; Stodden and Miguez 2014). The paper focuses on forward-modeling and imaging tasks in JUDI. Its main contribution is a workflow-oriented coding assistant that combines code generation with runtime checking and with task-specific review, where the validation checklist changes with the seismic task named in the prompt.
Benchmark specification
The system is easiest to understand through one benchmark task and the rules that define what the generated script must contain. JUDIAgent is paired with a benchmark catalog that covers multiple seismic tasks, including forward modeling and RTM, together with task-specific workspace rules that supply defaults a user would otherwise need to state manually. One RTM prompt can therefore stay short:
Write a basic RTM example using JUDI.jl and save one RTM figure and one migrated image.
The short prompt is possible because the task-specific rules and prompt engineering were prepared in advance. For RTM, the benchmark rules require the generated script to distinguish the model used to generate the synthetic data from the migration-velocity model, construct Jacobian-adjoint imaging, save the requested migrated image, and follow plotting conventions taken from retrieved JUDI examples. The same rules also specify gray colormaps, the plotted spatial extent of the migrated image, symmetric clipping chosen from image magnitude, and muting used to suppress shallow and early-time acquisition artifacts. This design reduces how much domain knowledge the user must spell out while still keeping the generated experiment aligned with expert expectations.
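Two of these rules, muting and symmetric clipping, can be illustrated on a plain array without JUDI. The sketch below is an assumption-laden example: the helper names `mute_shallow` and `symmetric_clip` are hypothetical, the image is assumed to be stored with depth along the second axis, and the clip is taken as a fraction of the peak magnitude, which is only one possible way to choose limits from image magnitude.

```julia
# Hypothetical helpers; not part of JUDI.jl or JUDIAgent.

# Zero the first `mute_cols` depth samples, assuming the image is stored as (x, z).
function mute_shallow(image::AbstractMatrix, mute_cols::Integer)
    muted = copy(image)                        # leave the raw image untouched
    muted[:, 1:mute_cols] .= zero(eltype(muted))
    return muted
end

# Symmetric color limits taken as a fraction of the peak magnitude (assumed rule).
function symmetric_clip(image::AbstractMatrix; fraction::Real=0.9)
    peak = maximum(abs, image)
    return (-fraction * peak, fraction * peak)
end

image = Float32[1 -4; 2 3]                     # toy 2x2 "image"
muted = mute_shallow(image, 1)                 # first depth sample set to zero
limits = symmetric_clip(muted; fraction=0.5)   # limits symmetric about zero
```

The returned limits could then be passed as the lower and upper color bounds of a plotting call with a gray colormap, so that positive and negative reflectivity are displayed on the same scale.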
JUDIAgent retrieves JUDI examples and then returns Julia code such as:
using JUDI, PythonPlot
model_true = Model(n, d, o, m_true)
model_mig = Model(n, d, o, m_mig)
xsrc = 150f0 .+ 300f0 .* (0:4); xrec = 0f0 .+ 12f0 .* (0:99)
src_geometry = Geometry(xsrc, ysrc, zsrc; dt=dt_src, t=t_src)
rec_geometry = Geometry(xrec, yrec, zrec; dt=dt, t=tn, nsrc=nsrc)
F_true = judiModeling(model_true, src_geometry, rec_geometry); d_obs = F_true * q
F_mig = judiModeling(model_mig, src_geometry, rec_geometry); d_residual = d_obs - F_mig * q
J = judiJacobian(F_mig, q)
rtm_image_raw = reshape(adjoint(J) * d_residual, n)
rtm_image_muted[:, 1:mute_rows] .= 0f0
This example shows how the user-facing prompt stays lightweight while the benchmark rules carry much of the task specification. The RTM imaging workflow and its saved outputs are constrained before generation starts, so the produced script can be checked against task requirements rather than against runtime success alone.
Validation
Validation in JUDIAgent has two layers. The first establishes whether the generated Julia script runs in the target environment. The second establishes whether the requested seismic workflow was produced. We refer to that second layer as task-specific review because its checklist changes with the seismic task named in the prompt. For forward modeling, the review checks for a model, acquisition geometry, a source wavelet, a forward operator, saved synthetic data, and saved figures. For RTM, it additionally checks for a migration-velocity model, synthetic observed data, Jacobian-adjoint imaging, and a saved migrated image (Baysal, Kosloff, and Sherwood 1983; Virieux and Operto 2009). Figure 1 summarizes this loop from prompt to generated code, execution, review, and repair.
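One way to picture a task-dependent checklist is as a table of named items, each paired with a pattern to look for in the generated script. The sketch below is an assumption: the text does not say how JUDIAgent performs the review, the regular expressions are hypothetical, and only a subset of the listed items is covered. The names `FORWARD_CHECKS`, `RTM_EXTRA_CHECKS`, `checks_for`, and `missing_items` are introduced here for illustration.

```julia
# Hypothetical text-based review; the patterns are assumptions, not JUDIAgent's rules.
const FORWARD_CHECKS = [
    "model"                => r"\bModel\(",
    "acquisition geometry" => r"\bGeometry\(",
    "source wavelet"       => r"\bricker_wavelet\(",
    "forward operator"     => r"\bjudiModeling\(",
    "saved figure"         => r"\bsavefig\(",
]

const RTM_EXTRA_CHECKS = [
    "Jacobian-adjoint imaging" => r"\bjudiJacobian\(",
]

# The checklist grows with the task named in the prompt.
function checks_for(task::Symbol)
    task === :forward && return FORWARD_CHECKS
    task === :rtm && return vcat(FORWARD_CHECKS, RTM_EXTRA_CHECKS)
    error("unknown task: $task")
end

# Return the names of checklist items whose pattern does not occur in the script.
missing_items(script::AbstractString, task::Symbol) =
    [name for (name, pattern) in checks_for(task) if !occursin(pattern, script)]

script = """
model_mig = Model(n, d, o, m_mig)
rec_geometry = Geometry(xrec, yrec, zrec; dt=dt, t=tn, nsrc=nsrc)
F_mig = judiModeling(model_mig, src_geometry, rec_geometry)
"""

forward_gaps = missing_items(script, :forward)
rtm_gaps = missing_items(script, :rtm)
```

For this fragment the forward review reports the missing source wavelet and saved figure, and the RTM review additionally reports the missing Jacobian-adjoint step. Pattern matching of this kind is cheap but shallow, so a real review would likely need stronger evidence than the presence of a function name.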
The difference between these two layers matters because passing a runtime check is not the same as producing a complete seismic workflow. A script may execute but fail to save the requested data, omit the migration model, or never produce the image named in the prompt. In those cases JUDIAgent returns a concrete reason for failure and asks the model to repair the script rather than treating runtime success alone as enough.
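A check of this kind can be sketched as a function that inspects the output directory after the script has run and returns one human-readable reason per problem. This is a minimal example under stated assumptions: `missing_outputs` and the file names are hypothetical, and the actual messages JUDIAgent produces are not given in the text.

```julia
# Hypothetical post-run check: report requested output files that are absent or empty.
function missing_outputs(dir::AbstractString, expected::Vector{String})
    reasons = String[]
    for name in expected
        path = joinpath(dir, name)
        if !isfile(path)
            push!(reasons, "missing output file: $name")
        elseif filesize(path) == 0
            push!(reasons, "empty output file: $name")
        end
    end
    return reasons
end

# Toy run: the figure was written, the migrated image was not.
reasons = mktempdir() do dir
    write(joinpath(dir, "rtm_figure.png"), "placeholder bytes")
    missing_outputs(dir, ["rtm_figure.png", "rtm_image.jld2"])
end
```

Each returned string can be passed back to the model verbatim as repair feedback, which keeps the failure reason concrete.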
The current codebase also provides a command-line interface that exposes the prompt, generated code, validation messages, and saved outputs during a session, as shown in Figure 2.
Benchmark results
We evaluate JUDIAgent as a scientific coding system rather than as a new numerical imaging method. The question is whether it can generate executable and inspectable JUDI workflows for two representative tasks: 2D forward modeling and RTM. This setup also serves a reproducibility goal in computational science because the system should produce scripts and outputs that another user can inspect and rerun (Stodden and Miguez 2014; Wilson et al. 2017; Li, Gahlot, and Herrmann 2025).
The two benchmark tasks stress different parts of the workflow. Forward modeling tests whether the agent can assemble a script that defines the model, geometry, synthetic data, and requested outputs. RTM is stricter because, in this synthetic benchmark, it requires the agent to keep track of the model used to generate the synthetic data, a separate migration-velocity model, Jacobian-adjoint imaging, and image export.
The forward-modeling case asks for a two-layer acoustic model, five sources, one hundred surface receivers, synthetic data, and two saved figures. The resulting shot gather shows a clear direct arrival and a weaker later reflection that is consistent with the layered model. The setup panel documents the physical model that the prompt requested. Figure 3 shows both outputs side by side.
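The inputs to such a forward-modeling script can be set up in plain Julia before any JUDI call. The sketch below makes several assumptions that are not stated in the text: the grid size, grid spacing, interface depth, and layer velocities are illustrative values, and the source and receiver coordinates are borrowed from the RTM excerpt shown earlier. The squared-slowness parameterization in s^2/km^2 is the one JUDI's `Model` constructor takes.

```julia
# Illustrative two-layer setup; all numerical values below are assumptions.
n = (151, 101)                 # grid points in (x, z)
d = (10f0, 10f0)               # grid spacing in meters

v = fill(1.5f0, n)             # upper layer velocity in km/s
v[:, 51:end] .= 2.5f0          # lower layer starts at depth index 51
m = (1f0 ./ v) .^ 2            # squared slowness in s^2/km^2

# Five sources and one hundred receivers along the surface (x coordinates in meters).
xsrc = 150f0 .+ 300f0 .* (0:4)
xrec = 0f0 .+ 12f0 .* (0:99)

# Lateral extent of the grid, used to confirm the geometry fits inside the model.
extent_x = (n[1] - 1) * d[1]
geometry_fits = maximum(xsrc) <= extent_x && maximum(xrec) <= extent_x
```

A sanity check like `geometry_fits` is the kind of condition a workflow review could test before running an expensive simulation.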
The RTM case asks for a migration workflow with two subsurface models: one model used to produce the synthetic data and a smoother migration-velocity model used to form the image. The generated script also has to form residual data against the migration background, apply Jacobian-adjoint imaging, suppress shallow and early-time acquisition artifacts with muting, and save the migrated image to disk. Unlike the single-model forward setup in Figure 3, Figure 4 shows the two-model RTM setup together with the saved image from this benchmark.
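In symbols, and following the code excerpt shown earlier, the two steps of this benchmark can be written as below. The notation is introduced here for illustration: $\mathbf{F}(\mathbf{m})$ denotes the forward modeling operator for model $\mathbf{m}$, $\mathbf{q}$ the source, $\mathbf{J}$ the Jacobian of the modeled data with respect to the model, and $\mathbf{m}_{\mathrm{true}}$ and $\mathbf{m}_{\mathrm{mig}}$ the data-generating and migration-velocity models.

```latex
\begin{align}
\mathbf{d}_{\mathrm{obs}} &= \mathbf{F}(\mathbf{m}_{\mathrm{true}})\,\mathbf{q}, \\
\delta\mathbf{d} &= \mathbf{d}_{\mathrm{obs}} - \mathbf{F}(\mathbf{m}_{\mathrm{mig}})\,\mathbf{q}, \\
\delta\mathbf{m}_{\mathrm{RTM}} &= \mathbf{J}(\mathbf{m}_{\mathrm{mig}}, \mathbf{q})^{\top}\,\delta\mathbf{d}.
\end{align}
```

The muting step then zeroes the shallow part of $\delta\mathbf{m}_{\mathrm{RTM}}$ before the image is saved.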
Taken together, these cases show what the current system supports: JUDIAgent can generate executable JUDI workflows that save figures and data products a user can inspect after the script runs successfully. That is a practical step toward lowering the effort required to run and inspect seismic workflows in JUDI.
Conclusion
JUDIAgent addresses a practical problem in seismic imaging: translating natural-language requests into JUDI workflows that not only run, but also produce the model setup, operator sequence, saved figures, and saved data products that the user asked for. By combining retrieval from JUDI examples with runtime checking and task-specific review, the system can reject incomplete scripts and repair them with concrete feedback such as missing data files, missing figures, or missing migration steps. The contribution is a lower-barrier interface for generating, validating, and inspecting seismic experiments rather than a new imaging method.
The current forward-modeling and RTM examples are modest, and we do not claim state-of-the-art imaging performance. An important next step is to extend workflow-completeness review with task-aware diagnostics. For migration tasks, these may include image-residual or illumination summaries that can be evaluated in future benchmark design. The codebase is maintained at https://github.com/haoyunl2/JUDIAgent.
Acknowledgment
This work was supported by the Georgia Tech SLIM Group. We thank the developers of JUDI.jl, Devito, LangChain, LangGraph, and JutulGPT for releasing open-source tools that informed this work. During preparation of this manuscript, the authors used generative AI tools in text revision and figure iteration. All content and figures were subsequently reviewed and edited by the authors.





