Kompetenznetzwerk für wissenschaftliches Höchstleistungsrechnen in Bayern


Efficient checkpointing technique for the unsteady adjoint solver in SU2


Prof. Dr.-Ing. habil. Stefan Becker
Institut für Prozessmaschinen und Anlagetechnik
Friedrich-Alexander-Universität Erlangen-Nürnberg


SU2 (Stanford University Unstructured) is a suite of open-source software tools written in C++ for the numerical solution of partial differential equations (PDEs) and for PDE-constrained optimization. Because of the flexible implementation of its continuous and discrete adjoint solvers, SU2 has become a leading software package for shape optimization in the scientific community. Its major advantage over similar CFD packages is the flexible and efficient implementation of the discrete adjoint solver for gradient computations. This solver uses an algorithmic differentiation (AD) library to compute the exact gradient of the underlying discretized objective functions. Using the AD tool avoids the inaccuracies that stem from unphysical simplifications, such as the frozen turbulence assumption, as well as the forward errors caused by an ill-conditioned Jacobian. As a result, the shape optimization problem can fully converge to its local minimum owing to the accurately calculated gradients. We have used the steady discrete adjoint shape optimizer extensively in our research projects. However, the current implementation of SU2 is not suitable for complex unsteady flow optimizations on HPC clusters, and some modifications are necessary to adapt SU2 to such unsteady problems.

In each optimization cycle, the primal equations (e.g. the governing flow equations) are solved first. Subsequently, the reverse mode of the AD tool is applied to the discretized primal equations to compute the right-hand side of the ensuing iterative adjoint solver. As the adjoint calculation requires the storage of the full flow history, the gradient evaluation for unsteady problems can become prohibitively expensive. Each time-step of the primal solution requires a corresponding adjoint calculation, and the adjoint iterations run in reverse: the first time-step of the adjoint solver is associated with the last time-step of the primal problem. Currently, SU2 stores a native restart file on disk for each time-step of the primal solution in every optimization step and later reads these files from the last time-step to the first in order to compute the adjoint solutions. This causes an enormous I/O overhead for our unsteady optimizations. In addition, a separate restart file is written to disk for each adjoint step. The mesh deformation data and the solutions of the adjoint and primal problems for each time-step are also stored in separate ASCII files. In total, SU2 writes five files per time-step per optimization cycle to disk. For a time-dependent problem such as an aeroacoustic optimization, obtaining a signal of 0.1 seconds at the farfield with a time-step of 10 microseconds requires 10,000 time-steps and therefore 50,000 files per optimization cycle. A typical optimization problem additionally needs between 100 and 200 optimization steps to reach a local minimum. Consequently, the optimization of a one-dimensional objective function produces five to ten million files. The immense number of restart files, together with the ASCII format of the outputs, is problematic for large-scale optimizations on HPC clusters with parallel file systems. Moreover, the I/O overhead of writing and reading the restart files is unnecessary. Therefore, the objective of this work is to enhance the I/O data management of SU2 for such optimization problems.
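
The file counts quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The function name and its parameters are hypothetical and chosen only for illustration; the input values (0.1 s signal, 10 microsecond time-step, five files per time-step, 100 to 200 optimization steps) are the ones stated in the text.

```python
def restart_file_count(signal_length_s, time_step_s, files_per_step, optimization_steps):
    """Estimate how many files are written without checkpointing.

    Hypothetical helper for illustration; it is not part of SU2.
    Returns (time steps per cycle, files per cycle, files per optimization).
    """
    # round() guards against floating-point noise in the division
    n_steps = round(signal_length_s / time_step_s)
    files_per_cycle = n_steps * files_per_step
    return n_steps, files_per_cycle, files_per_cycle * optimization_steps


if __name__ == "__main__":
    for n_opt in (100, 200):
        steps, per_cycle, total = restart_file_count(0.1, 10e-6, 5, n_opt)
        print(f"{n_opt} optimization steps: {steps} time-steps, "
              f"{per_cycle} files per cycle, {total} files in total")
```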

An effective way to mitigate these problems is to implement a checkpointing method for handling the restart files. This approach significantly decreases the number of restart files written to disk at the expense of additional primal evaluations and extra memory usage. Checkpoints are snapshots of the primal solution at chosen time-steps. These snapshots are written to disk during the primal computation, while the intermediate time-steps between two consecutive checkpoints are recomputed and kept in memory during the adjoint calculation. Moreover, the output data for both the adjoint and primal solutions should be written to a single HDF5 file. With these modifications, the data storage overhead can be reduced substantially, so that large-scale unsteady optimizations become viable.
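
The checkpointing idea described above can be illustrated with a minimal Python sketch. It is not the SU2 implementation: it assumes equidistant checkpoints (the text does not specify the checkpoint schedule), uses a toy scalar update in place of a flow time-step, and uses a dictionary as a stand-in for restart files on disk. All function names are hypothetical.

```python
DT = 0.01  # toy time-step size (assumption for this sketch)


def primal_step(u):
    # Toy nonlinear primal update standing in for one flow time-step.
    return u - DT * u * u


def adjoint_step(lam, u):
    # Adjoint of primal_step: multiply by d(primal_step)/du evaluated at u.
    return lam * (1.0 - 2.0 * DT * u)


def run_primal(u0, n_steps, interval):
    """Advance the primal solution, saving a checkpoint every `interval` steps."""
    checkpoints = {}  # stand-in for snapshots written to disk
    u = u0
    for n in range(n_steps):
        if n % interval == 0:
            checkpoints[n] = u
        u = primal_step(u)
    return u, checkpoints


def run_adjoint(checkpoints, n_steps, interval):
    """Reverse sweep: recompute each segment from its checkpoint, then adjoin it."""
    lam = 1.0  # seed for the objective J = u at the final time-step
    max_in_memory = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + interval, n_steps)
        # Recompute the intermediate states of this segment and keep them in memory.
        states = [checkpoints[start]]
        for _ in range(start + 1, end):
            states.append(primal_step(states[-1]))
        max_in_memory = max(max_in_memory, len(states))
        # Adjoint time-steps run from the end of the segment back to its start.
        for u in reversed(states):
            lam = adjoint_step(lam, u)
    return lam, max_in_memory


def adjoint_full_storage(u0, n_steps):
    """Reference sweep that stores every primal state (no checkpointing)."""
    states = [u0]
    for _ in range(n_steps - 1):
        states.append(primal_step(states[-1]))
    lam = 1.0
    for u in reversed(states):
        lam = adjoint_step(lam, u)
    return lam


if __name__ == "__main__":
    _, cps = run_primal(1.0, 100, 10)
    grad, peak = run_adjoint(cps, 100, 10)
    print(len(cps), "checkpoints,", peak, "states in memory at most, gradient", grad)
```

In this sketch, 100 time-steps with a checkpoint interval of 10 require only 10 stored snapshots instead of 100, at the cost of recomputing each segment once and holding at most 10 states in memory. The interval is the tuning parameter that trades disk usage against recomputation and memory.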