Accurate forward modeling is important for solving inverse problems. An inaccurate wave-equation simulation, as a forward operator, will offset the results obtained via inversion. In this work, we consider the case where we deal with incomplete physics. One proxy of incomplete physics is an inaccurate discretization of Laplacian in simulation of wave equation via finite-difference method. We exploit intrinsic one-to-one similarities between timestepping algorithm with Convolutional Neural Networks (CNNs), and propose to intersperse CNNs between low-fidelity timesteps. Augmenting neural networks with low-fidelity timestepping algorithms may allow us to take large timesteps while limiting the numerical dispersion artifacts. While simulating the wave-equation with low-fidelity timestepping algorithm, by correcting the wavefield several time during propagation, we hope to limit the numerical dispersion artifact introduced by a poor discretization of the Laplacian. As a proof of concept, we demonstrate this principle by correcting for numerical dispersion by keeping the velocity model fixed, and varying the source locations to generate training and testing pairs for our supervised learning algorithm.
In inverse problem, we heavily rely on having an accurate forward modeling operators. Often, we can not afford being physically or numerically accurate. In other words, being numerically inaccurate can be due to computationally complexity of accurate methods, or incomplete knowledge of the underlying data generation process. In either case, motivated by Ruthotto and Haber (2018), we propose to intersperse CNNs between timestepping algorithm for simulating acoustic wave-equation. We mimic incomplete/inaccurate physics by simulating wave equation with finite-difference method, while utilizing a poor (second-order) discretization of Laplacian.
Conventional method for solving partial differential equations (PDEs), e.g., finite-difference and finite-element method, given enough computational recourses, are able to simulate high-fidelity solutions to PDEs. On one hand, as long as the Courant–Friedrichs–Lewy conditions for stability are satisfied, finite-difference methods are able to compute solutions to PDE, regardless of medium parameters, with arbitrary precision. On the other hand, finite-element method requires careful meshing of the medium in order to carry out the simulation. Another motivation behind this work is to exploit the fact that in seismic, wave simulations are usually carried out for specific families of velocity models and source/receiver distributions. We hope our proposed method meets halfway between two mentioned extremes—i.e., being too generic (finite-difference method) and being too problem specific (finite-element method).
There are several attempts to exploit learning methods in wave-equation simulation. Raissi (2018) approximates the solution to a nonlinear PDE with a neural network. The neural network, given points on the computational grid as input, computes the solution of PDE. Training data is obtained by computing the solution of PDE on several points. Moseley et al. (2018) completely ignore the Laplacian and they solely rely on predicting the next timestep from the previous two timesteps by learning the action of the spatially varying velocity and Laplacian. While possible in principle, their approach needs to train for long times to provide reasonable simulations on relatively simple models. Siahkoohi et al. (2019) instead of ignoring the physics, relies on low-fidelity wave-equation simulation, and by exploiting transfer learning, they utilize a single CNN to correct wavefield snapshots simulated on a “nearby” velocity mode for numerical dispersion at any given timestep. In this work, we extend ideas in Siahkoohi et al. (2019) and propose using multiple CNNs, interspersed between low-fidelity timesteps. Finally, Rizzuti et al. (2019) propose interspersing Krylov-subspace iterations and neural nets while inverting the Helmholtz equation. They show improvement in convergence by “propagating” an approximated wavefield, obtained from a limited number of iterations, with the aid of a trained convolutional neural net. This technique can be seen as the frequency-domain counterpart of our proposed method.
Our paper is organized as follows. First, we describe our approach in detail by first, describing our formulation for learned wave simulation. Next, we introduce our training objective function. Due to dependencies of CNN parameters, we devised an training heuristic that we describe. Before explaining our numerical experiments, we state used CNN architecture and training details. Next, we describe our three numerical experiments we conduct and discuss effectiveness of the proposed method.
We describe how we augment low-fidelity physics with learning techniques to handle incomplete and/or inaccurate physics, where the low-fidelity physics is modeled via finite-difference method with a poor discretization of the Laplacian. To ensure accuracy, the temporal and spatial discretization in high-fidelity wave-equation simulations have to be chosen very fine, typically one to two orders of magnitude smaller than Nyquist sampling rate. As mentioned earlier, we will utilize a poor discretization (only second order) for the Laplacian to carry out low-fidelity wave-equation simulations, but the scheme can be extended to other proxies of incomplete or inaccurate physics.
After discretization of the acoustic wave equation, a single timestep of of scalar wavefields, simulated on 0≤t≤T, can be written as below:
uj+1=2uj−uj−1+δt2c2Δuj,j=0,1,…,N−1,
Motivated by Ruthotto and Haber (2018), we consider every timestep as a single layer in a neural network, where the discretized Laplacian is a linear operator, followed by the (nonlinear) action of the spatial varying velocity. Moreover, noticing the additional terms in the Equation 1, each timestep is similar to a residual block introduced by Szegedy et al. (2017). Figures 1a and 1b schematically indicate each timestep as a block, corresponding to high- and low-fidelity discretization of wave equation. respectively. The similarity of high- and low -fidelity timestepping method and CNNs can be perceived from Figures 2a and low-fidelity-step, respectively, where red and yellow blocks correspond to high- and low-fidelity timestepping equations, respectively.
As it can be seen, high-fidelity simulation of the wave equation up to time t=T requires a lot of high-fidelity timesteps. On the other hand, Figure 2b shows that the low-fidelity simulations can be done with much less low-fidelity timesteps, due to course time sampling, which each timestep is cheaper than the high-fidelity timesteps due to the coarse discretization of Laplacian. Although computationally cheap, the low-fidelity wave-equation simulations suffer from numerical dispersion artifacts.
Depending on the domain of application, we can assume wave simulations are typically carried out for specific families of velocity models and source/receiver distributions. This motivates us to deploy a data-driven wave simulation algorithm which is coupled with low-fidelity and cheap physics and hope to recover high-fidelity wave simulations on a family of velocity models. In our method, we propagate the coarse-grained wavefields according to Equation 2 with a coarsened Laplacian. After k timesteps, where k is a hyperparameter, we apply a correction with a CNN, Gθi, parameterized by θi, to the obtained wavefield at jth timestep and proceed with the timestepping. The proposed data-driven timestepping wave simulation method is formalized in Equation 3.
ˉuj+1={Gθi(2ˉuj−ˉuj−1+δT2ˉc2ˉΔˉuj), i=⌊jk⌋ if j≡k−1 (mod k),2ˉuj−ˉuj−1+δT2ˉc2ˉΔˉuj else
The CNNs correct for the effects of inaccurate physics—i.e., numerical dispersion in our experiments, at every kth low-fidelity timestep. In this work, the parameters, θi, are not shared among the CNNs. We have not explored the possibility of shared weights among the CNNs in different stages of wave propagation.
Note that although parameters of the CNNs are not shared, they are not independent—i.e., after jth timestep, CNN Gθi,i=⌊jk⌋, corrects for errors in the wavefield introduced by low-fidelity timestepping and imperfections present in the output of (i−1)th CNN, which have been propagated through timestepping. Therefore, a small perturbation in the parameters of a CNN in the initial stages of neural network augmented timestepping causes noticeable differences in the input of the CNNs in later stages of the wave propagation. The described dependencies among the CNN parameters introduces difficulties in optimizing the parameters of the CNNs. Below we describe our heuristic for training the CNNs.
During training, we train all the CNNs toward the high-fidelity solution of wave-equation, at the corresponding timestep, obtained by solving Equation 1. As it can be seen from Equation 3, after jth timestep, CNN Gθi,i=⌊jk⌋, is tasked to correct the effects of low-fidelity timestepping. During training, ith CNN maps its input, 2ˉuj−ˉuj−1+δT2ˉc2ˉΔˉuj, to uj+1, result of jth timestep using high-fidelity timestepping, obtained by Equation 1.
Define function ˉFk(.) as the action of k low-fidelity timesteps—i.e., ˉFk(.) represents k consecutive low-fidelity time stepping blocks, depicted in Figure 1b. Clearly, ˉFk is only a function of k, δT, ˉc, and ˉΔ. Using the defined notation, we can write the input to ith CNN, ˆui, i=0,1,…,⌊M−1k⌋, as follows:
ˆui=ˉFk(Gθi−1(ˆui−1)),i=1,2,…,⌊M−1k⌋, ˆu0=ˉFk(q),
The input-output pair of the ith CNN is (ˆui,uτi). We can generated multiple training pairs for CNNs by simulating (ˆui,uτi) pairs, for various velocity models and source locations. Assume we have n pairs of of training data for CNNs, namely, (ˆu(p)i,u(p)τi), p=0,1,…,n−1. The objective function for optimizing ith CNN can be written as follows:
Li=1nn−1∑p=0‖Gθi(ˆu(p)i)−u(p)τi‖1,i=0,1,…,⌊M−1k⌋.
To overcome complexities caused by dependencies between parameters of CNNs, we optimize the objective functions Li with a heuristic described below. We minimize Li, i=0,1,…,⌊M−1k⌋, with respect to θi, respectively. In other words, we minimize Li with respect to θi, by keeping the rest of the parameters fixed. We keep updating all the set of parameters, in a cyclic fashion—i.e., once we updated all the parameters, Li, i=0,1,…,⌊M−1k⌋, we start over and update them again, in order, until a stopping criteria is achieved. We will describe the stopping criteria used in our experiments later.
We minimize objective functions the above objectives 5 with a variant of Stochastic Gradient Descent known as the Adam optimizer (Kingma and Ba, 2015) with momentum parameter β=0.9 and a linearly decaying stepsize with initial value μ=2×10−4 for both the generator and discriminator networks. During each iteration of Adam, the gradient Li is approximated by a single randomly selected training pair. These pairs are selected without replacement. Once all the training pairs have been selected, we start over by randomly picking training pairs, without replacement from the entire training set.
The optimization carries out for a predetermined number of total iterations, where each iterations consists of drawing a random training pair, without replacement, and updating parameters of a CNN. Additionally, while optimizing θi by keeping the rest of the parameters fixed, before proceeding to the next set of parameters, we carry out the optimization to update θi for number of iterations, which we refer to it as mini-iterations. Algorithm 1 indicates the steps for optimizing objective functions 5.
1. INPUT:
MaxItr // total number of iterations to carry out the optimization
MaxMiniItr // mini-iterations before proceeding to the next CNN
ˉFk(.) // k consecutive low-fidelity time stepping blocks
q(p) p=0,1,…,n−1 \\ sources corresponding to different training pairs
u(p)τi, p=0,1,…,n−1, i=0,1,…,⌊M−1k⌋ \\ high-fidelity snapshots
θ0i, i=0,1,…,⌊M−1k⌋ // randomly initialized parameters
2. ItrNum←0
3. FOR i=0:⌊M−1k⌋ DO
4. θi=θ0i
5. FOR p=0:n−1 DO
6. ˆu(p)0=ˉFk(q(p))
7. WHILE itrNum<MaxItr DO
8. FOR i=0:⌊M−1k⌋ DO
9. IF i>0 DO
10. FOR p=0:n−1 DO
11. ˆu(p)i=ˉFk(Gθi−1(ˆu(p)i−1))
12. FOR miniItrNum=1:MaxMiniItr DO
13. p←SampleWithoutReplacement({0,1,…,n−1})
14. θi←argminθi‖Gθi(ˆu(p)i)−u(p)τi‖1
15. ItrNum←ItrNum+1
16. RETURN θi, i=0,1,…,⌊M−1k⌋
Motivated by our previous attempts for numerical dispersion removal from wavefield snapshots (Siahkoohi et al., 2018, 2019), we use the exact architecture provided by Johnson et al. (2016), which includes Residual Blocks, the main building block of ResNets, introduced by He et al. (2016), for all the CNNs Gθi, i=0,1,…,⌊M−1k⌋.
While CNNs are known to generalize well—i.e., maintain the quality of performance when applied to unseen data, they can only be successfully applied to a data set drawn from the same distribution as the training data. Because of the Earth’s heterogeneity and complex geological structures present in realistic-looking models, training a neural network that can generalize well when applied to another velocity model can become challenging. While we have successfully demonstrated that transfer learning (Yosinski et al., 2014) can be used in situations where the neural network is initially trained on data from a proximal survey (Siahkoohi et al., 2019), we chose in this contribution, as a proof of concept, to keep the velocity model fixed, and vary the source locations to generate different training/testing pairs.
We use the Marmousi velocity model and out of 401 available shot locations with 7.5 m spacing, we allocate half of the shot locations to training and use the rest of the shot locations to generate testing pairs, for evaluation purposes. The maximum simulation time in our experiments in 1.1 s.
We designed and implemented our deep architectures in TensorFlow1. To carry out our wave-equation simulations with finite differences, we used Devito2 (Louboutin et al., 2018; Luporini et al., 2018). We used the functionality of Operator Discretization Library3 to wrap Devito operators into a TensorFlow layers. Our implementation can be found on GitHub4.
We ran our algorithm on Amazon Web Services’ g3.4xlarge instance, where we optimize the CNN parameters on a NVIDIA Tesla M60 GPU and Devito utilizes 16 CPU cores to perform finite-difference wave-equation simulations. Initially, we simulate the high-fidelity training wavefield snapshots, u(p)τi, p=0,1,…,n−1, i=0,1,…,⌊M−1k⌋, only once, in the beginning, and store them. In order to limit CPU-GPU communication, before utilizing the GPU to to update θi, i=0,1,…,⌊M−1k⌋, we generate the input to ith CNN, ˆu(p)i, p=0,1,…,n−1 all at once, and store them. Afterwards, ith CNN can be (re)trained using the stored input/output wavefield snapshot pairs for several mini-iterations.
We want to indicate that neural networks, when augmented with inaccurate physics, e.g., a poor discretization of Laplacian, are able to approximate the wavefields obtained by an accurate approximation to wave equation. To demonstrate this, we conduct three numerical experiments in which we keep the velocity model fixed, and vary the source locations to generate different training/testing pairs. The experiments differ in the number of CCNs used throughout learned wave propagation. We use three, five, and ten CNNs while keeping the total number of iterations fixed. This implies that an experiment with more CNNs, optimizes each CNN with a smaller number of iterations per CNN, because, iterations per CNN×number of CNNs=total number of iterations.
A neural network augmented wave simulator with n1 CNNs needs more training iterations and possibly more training data to perform equally as well as a neural network augmented wave simulator with n2 CNNs, when n1>n2. For a fixed number of total iterations, iterations per CNN is inversely proportional to number of CNNs utilized. Therefore, the first n2 CNNs in the neural network augmented wave simulator with n1 CNNs will perform worse than the CNNs in the wave propagator with n2 CNNs. Consequently, the error accumulated by the poor performance of first n2 CNNs, combined with artifacts introduced by low-fidelity timestepping makes the matters worse for the later CNNs in the more complex learned wave propagator. Therefore, in our experiments, since the total number of iterations is fixed, we expect to see the quality of dispersion removal degrade as the number of CNNs increase in a learned wave propagator. Table 1 summarizes the total number of iterations, iterations per CNN, training pairs, training time, and number of tunable parameters for the three different experiments.
CNNs | Iterations | Iterations per CNN | Pairs per CNN | Time | Param. count |
---|---|---|---|---|---|
3 | 100500 | 33500 | 201 | 17.99 hours | 34150272 |
5 | 100500 | 20100 | 201 | 19.79 hours | 56917120 |
10 | 100500 | 10050 | 201 | 49.24 hours | 113834240 |
As described earlier and presented in Table 1, the MaxItr
variable used in the While
condition in line 8 of Algorithm 1 is set to 500 for all our experiments. Figures 4 − 6 demonstrate the values of objective function presented in Equation 5 in orange, and the wavefield correction signal-to-noise ratio (SNR) in blue, evaluated on testing data pairs during training, for experiments with three, five, and ten CNNs, respectively. Note that the SNR curves have not been used to determine when to stop training and they are only depicted for demonstration purposes.
Figures 4a, 4c, and 4e show the wavefield correction SNR for first, second, and third CNN, respectively, in the neural network augmented wave simulator that includes three CNNs. Similarly, Figures 4b, 4d, and 4f depict the training objective values throughout training for first, second, and third CNNs. As it can be seen from objective function curves, the raining heuristic has been effective and the objective function values have a decreasing trend. Note that CNNs has been trained for 33500 iterations, on average, with a total of 100500 iterations. Several equispaced spikes can be noticed on the objective function value curves. For instance, see the objective value function curve of the third CNN, in Figure 4f, at 6030,8040,10050,and 12060 iterations. The mini-batch we use in this experiment is 10. Those spikes occur in moments in training when we have started retraining the third CNN, after updating the first and second CNNs. As discussed before, a change in the parameters of the CNNs preceding a CNN causes changes in the input of the later CNN, and consequently the objective function becomes large when starting to retrain the CNNs in later stages again.
Similar objective function value and SNR curves for other two neural network augmented wave propagators, utilizing five and ten CNNs, can be found in Figures 5 and 6, respectively. First column in Figures 5 and 6, from top to bottom, indicate SNR of wavefield correction obtained by the first to the last CNN, evaluated on testing pairs while training, respectively. The second column of Figures 5 and 6 indicate the objective function value curves throughout optimization of Equation 5 for training neural network augmented wave propagators, utilizing five and ten CNNs, respectively. In both columns, from top to bottom, the objective function value curves correspond to CNNs from beginning to the end of the learned wave propagators, in order.
We make two main observations from Figures 5 and 6. First, the objective function values indicate overall decreases, validating effectiveness of the introduced heuristic. Also, the spikes on the objective function value curve can be seen, which are correlated with the stages in training when Algorithm 1 revisits a CNN after updating the rest of the CNN parameters. As explained before, spikes are caused by change in parameters of preceding CNNs to a CNN, which in turn alters the input training wavefields of the CNN. Second, due to decrease in iterations per CNN as the number of CNNs increases, the SNR curves converge to a lower value when the number of CNNs increase.
Next, we will demonstrate the the corrected wavefields in three conducted experiments evaluated over one testing shot location. For each experiment, we show the high-fidelity wavefield snapshots, uτi, i=0,1,…,⌊M−1k⌋, where i iterates over the CNNs, numerically dispersed low-fidelity wavefields, and the corrected wavefield snapshots by the CNNs. To evaluate the performance of each correction, we also depict the correction error—i.e., difference between the high-fidelity and corrected wavefield snapshots. Figure 7 shows the mentioned wavefield snapshots for the neural network augmented wave simulator with three CNNs. First column shows the high-fidelity wavefields by solving Equation 1, second column depicts low-fidelity simulations by solving Equation 2, third column indicates the result of neural network augmented wavefield simulations, and the fourth column is the learned wave simulation error—i.e., difference between the first and last column in Figure 7. Similarly, Figures 8 − 10 show the high- and low- fidelity and learned wave simulation wavefield snapshots in the first three columns, in order, for the neural network augmented wave simulator with five and ten CNNs, respectively.
As expected because of the reasons stated before, we observe that the quality of neural network augmented wave-equation simulation degrades as the number of CNNs increases. On the other hands, the high quality of learned wave simulation with few CNNs (see Figure 7) suggests the quality of the simulation with more CNNs might be improved by increasing the number of iterations. As it can be seen in the last column of Figures 9 and 10, the learned wave simulation with ten CNNs has the lowest quality. It can be seen that the learned wave simulation has the least accuracy in direct wave, which happen to be the events with largest amplitudes. Also, it appears that most of the numerical dispersion has been removed, the phase has been recovered, and residual is mostly amplitude differences.
In order to evaluate the effectiveness of the proposed method we also train a single CNN similar to our previous attempt to remove numerical dispersion from wavefield snapshots (Siahkoohi et al., 2018, 2019) and compare the result of numerical dispersion removal with the proposed method. To be more precise, for each presented neural network augmented wave-equation simulation experiment, where we use three, five, and ten CNNs, we train a single CNN, Gθ, with the same architecture as the architecture used in learned wave propagators, in order to remove numerical dispersion from all the low-fidelity wavefield snapshots simulated by solving Equation 2 for j≡k−1 (mod k), on training shot locations. Likewise to previous examples, here we also use half of the available shot locations to simulate training pairs, and the rest is used to evaluate the performance of the trained CNN. The input to Gθ during training can be written as follows (compare with Equation 4):
˜ui=ˉFk(˜ui−1),i=1,2,…,⌊M−1k⌋, ˜u0=ˉFk(q),
# Timesteps to correct | Iterations | Pairs per CNN | Time | Param. count |
---|---|---|---|---|
3 | 100500 | 603 | 13.85 hours | 11383424 |
5 | 100500 | 1005 | 13.96 hours | 11383424 |
10 | 100500 | 2010 | 14.56 hours | 11383424 |
The slight difference in runtime among three different cases provided in Table 2 is partly due to different number of training pairs needed to be generated. Figure 11 depicts the wavefield snapshot correction SNR curves, evaluated on testing pairs while training, and the value of objective function 7, in single CNN low-to-high-fidelity mapping experiment, as a function of number of iterations. Figures 11a, 11c, and 11e show the SNR curves, when we trained the CNN on wavefield snapshots correspond to three, five, and ten, timesteps, respectively. Similarly, Figures 11b, 11d, and 11f depict the training objective function value (Equation 7), when the CNN is trained on wavefield snapshots correspond to three, five, and ten, timesteps, respectively. SNR curves depicted in first column of Figure 11 show the evolution of wavefield correction SNR evaluated on randomly selected testing wavefield wavefields form the wavefield snapshots combined from different timesteps. Therefore, Figures 11a, 11c, and 11e indicate that the three different CNNs converge to a wavefield correction SNR around 20 dB, regardless of number of timesteps they are correcting for. Although this does not suggest that the performance will stay the same as we increase the number of timesteps needed to be corrected. By comparing Figure 11a with first column of Figure 4 (SNR curves for neural network augmented wave-equation simulation with three CNNs), we observe that, on average the two methods are performing equally well.
Finally, we will show the wavefield corrected by the single CNN low-to-high-fidelity mapping method, for comparison with our proposed method. Figures 12 and 13 indicate the corrected wavefields, for cases where three and five timesteps need to be corrected. Figures 14 and 15 demonstrate the corrected wavefields, for the case where ten timesteps need to be corrected, in two parts. In Figures 12 − 15, the first, second, third, and fourth columns depict the high-fidelity and low fidelity wavefield snapshots, corrected low-fidelity wavefield snapshots, and the error in numerical dispersion removal, respectively. In each columns, from top to bottom, the simulation time increases.
For comparison between our proposed method and single CNN low-to-high-fidelity mapping, compare Figures 7 with 12, 8 with 13, 9 with 14, 10 with 15. As it can be seen, the single CNN low-to-high-fidelity mapping method maintains the quality of its performance when the number of timesteps that need to be corrected increases. On the other hand, as the number of CNNs in neural network augmented wave-equation simulation increases, the performance drops, by keeping the maximum number of iterations fixed. Also, by comparing Tables 1 and 2, we observe that the training time needed for single CNN low-to-high-fidelity mapping, when number of timesteps needed to be corrected increases, for fixed number of maximum iterations, grows very slowly compared to the training time required for neural network augmented wave-equation simulation.
Our numerical experiments demonstrate that, given suitable training data, the well-trained neural network augmented wave-equation simulator is capable of approximating wavefield snapshots simulated by high-fidelity simulation. In this work, as a proxy of inaccurate physics, we simulate wave-equation with finite-difference method, using a poor discretization of Laplacian. Although not computationally favorable to high-fidelity wave simulation, we showed that the learned wave simulator deals with inaccurate physics. An important observation we made is that training time of the proposed method gets quickly very long and to achieve high accuracy, it may not be possible to utilize too many CNNs. On the other hand, training time required for the single CNN low-to-high-fidelity mapping experiments, conducted for the sake of comparison, grows very slowly as the number of timesteps to be corrected increases. In future, we intend to initialize the CNN parameters in the proposed method with the parameters of a CNN trained by the single CNN low-to-high-fidelity mapping algorithm. The initialization may significantly reduce the training time needed for the neural network augmented wave-equation simulation method, and may give the chance to fine-tune the CNNs to the specific timestep that each CNN is assigned to correct.
The authors thank Xiaowei Hu for his open-access repository5 on GitHub. Our software implementation built on this work.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y., 2014, Generative Adversarial Nets: Advances in Neural Information Processing Systems, 2672–2680.
He, K., Zhang, X., Ren, S., and Sun, J., 2016, Deep Residual Learning for Image Recognition: In The iEEE conference on computer vision and pattern recognition (cVPR) (pp. 770–778). doi:10.1109/CVPR.2016.90
Hu, T., Han, Z., Shrivastava, A., and Zwicker, M., 2019, Render4Completion: Synthesizing multi-view depth maps for 3D shape completion: ArXiv Preprint ArXiv:1904.08366.
Johnson, J., Alahi, A., and Fei-Fei, L., 2016, Perceptual Losses for Real-Time Style Transfer and Super-Resolution: In Computer vision – european conference on computer vision (eCCV) 2016 (pp. 694–711). Springer International Publishing. doi:10.1007/978-3-319-46475-6_43
Kingma, D., and Ba, J., 2015, Adam: A method for stochastic optimization: In International conference on learning representations.
Louboutin, M., Lange, M., Luporini, F., Kukreja, N., Witte, P. A., Herrmann, F. J., … Gorman, G. J., 2018, Devito: An embedded domain-specific language for finite differences and geophysical exploration: CoRR, abs/1808.01995. Retrieved from https://arxiv.org/abs/1808.01995
Luporini, F., Lange, M., Louboutin, M., Kukreja, N., Hückelheim, J., Yount, C., … Herrmann, F. J., 2018, Architecture and performance of devito, a system for automated stencil computation: CoRR, abs/1807.03032. Retrieved from http://arxiv.org/abs/1807.03032
Moseley, B., Markham, A., and Nissen-Meyer, T., 2018, Fast approximate simulation of seismic waves with deep learning: ArXiv Preprint ArXiv:1807.06873.
Raissi, M., 2018, Deep hidden physics models: Deep learning of nonlinear partial differential equations: The Journal of Machine Learning Research, 19, 932–955.
Rizzuti, G., Siahkoohi, A., and Herrmann, F. J., 2019, Learned iterative solvers for the helmholtz equation: EAGE annual conference proceedings. doi:10.3997/2214-4609.201901542
Ruthotto, L., and Haber, E., 2018, Deep neural networks motivated by partial differential equations: CoRR, abs/1804.04272. Retrieved from http://arxiv.org/abs/1804.04272
Siahkoohi, A., Louboutin, M., and Herrmann, F. J., 2019, The importance of transfer learning in seismic modeling and imaging:.
Siahkoohi, A., Louboutin, M., Kumar, R., and Herrmann, F. J., 2018, Deep-convolutional neural networks in prestack seismic: Two exploratory examples: SEG Technical Program Expanded Abstracts 2018, 2196–2200. doi:10.1190/segam2018-2998599.1
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A., 2017, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.: In Proceedings of the thirty-first association for the advancement of artificial intelligence conference on artificial intelligence (aAAI-17) (Vol. 4, pp. 4278–4284). Retrieved from http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H., 2014, How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328).