Massively parallel inverse modelling on GPUs with Enzyme

07/28/2023, 3:30 PM — 4:00 PM UTC
26-100

Abstract:

We present an efficient and scalable approach to inverse PDE-based modelling with the adjoint method. We use automatic differentiation (AD) with Enzyme to automaticaly generate the buidling blocks for the inverse solver. We utilize the efficient pseudo-transient iterative method to achieve performance that is close to the hardware limit for both forward and adjont problems. We demonstrate close to optimal parallel efficiency on GPUs in series of benchmarks.

Description:

Massively parallel hardware architectures such as graphics processing units (GPUs) make it possible to run numerical simulations at unprecedented resolution. Matching the results of simulations to the available observational and experimental data requires estimating the sensitivity of the model to changes in its parameters. The adjoint method for computing sensitivities gains more attention in the scientific and engineering communities. This method allows for computing the sensitivities in the entire parameter space using the results of only one forward solve, in contrast to the direct method that would require runninng the forward simulation for each parameter separately.

Recent breakthrough innovations in compiler technologies resulted in wide adoption of the paradigm of differentiable programming, in which the entire computer program could be differentiated via automatic differentiation (AD). Differentiable programming allows for automated generation of adjoints from the forward model. In this work, we demonstrate the applications of the adjoint method to inverse modelling in geosciences, and share our experiences using the Julia ecosystem for high-performance computing. We developed massively parallel 3D forward and inverse solvers with full GPU support. We used Enzyme.jl for the AD implementation, ParallelStencil.jl to generate efficient computational kernels for multiple backends including GPUs, and ImplicitGlobalGrid.jl for distributed parallelism.

Co-authors: Ludovic Räss¹ ², Samuel Omlin ³

¹ ETH Zurich | ² Swiss Federal Institute for Forest, Snow and Landscape Research (WSL) | ³ Swiss National Supercomputing Centre (CSCS )

Platinum sponsors

JuliaHub

Gold sponsors

ASML

Silver sponsors

Pumas AIQuEra Computing Inc.Relational AIJeffrey Sarnoff

Bronze sponsors

Jolin.ioBeacon Biosignals

Academic partners

NAWA

Local partners

Postmates

Fiscal Sponsor

NumFOCUS