CUDA-accelerated Grimme’s D3#

Overview#

We provide a GPU-accelerated implementation of Grimme’s D3 dispersion (van der Waals) correction using CUDA, which can be used with ASE and LAMMPS in conjunction with SevenNet. The implementation follows Grimme’s D3 method and was ported from the original Fortran code. Although the D3 method is significantly faster than DFT, existing CPU implementations were slower than SevenNet. To address this, we adopted CUDA and single-precision (FP32) operations to accelerate the code.

Caution

Currently, the D3 implementation does not support multi-GPU or multi-core parallelism.

Caution

The implementation requires a GPU with a compute capability of at least 6.0. The target compute capabilities follow the LibTorch settings, except that compute capability 5.0 is excluded.

Usage#

ASE#

The SevenNet package provides two ASE calculators: D3Calculator() and SevenNetD3Calculator(). See ASE calculator for their usage.

Installation and usage of GPU-D3 in LAMMPS#

To use the LAMMPS d3 pair style, you need to patch CMakeLists.txt and add the pair-style source code to the LAMMPS source tree. Detailed instructions for this procedure and for usage can be found in LAMMPS: PyTorch or LAMMPS: ML-IAP.

Input parameters for D3 dispersion#

This section explains the input parameters required by the d3 pair style itself. Note that, in practice, the D3 dispersion term is almost always combined with another classical or machine-learning interatomic potential through a LAMMPS hybrid pair style (for example, pair_style hybrid/overlay).

The d3 pair style uses the following syntax:

pair_style d3  {sq_r_cut} {sq_r_cn_cut} {type_of_damping} {name_of_functional}
pair_coeff * * {space_separated_chemical_species}
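As an illustration, the two lines above can be filled in programmatically. The sketch below is a hypothetical helper (`d3_pair_lines` is not part of SevenNet or LAMMPS) that only formats strings according to the syntax above. It uses the default cutoffs and the damping names listed on this page; the species `Hf O` are an arbitrary example.

```python
# Hypothetical helper: formats the d3 pair_style / pair_coeff lines.
# It only builds strings; it does not call LAMMPS or SevenNet.
DAMPING_TYPES = ("damp_zero", "damp_bj", "damp_zerom", "damp_bjm")


def d3_pair_lines(species, sq_r_cut=9000, sq_r_cn_cut=1600,
                  damping="damp_bj", functional="pbe"):
    """Return the two LAMMPS input lines for a standalone d3 pair style."""
    if damping not in DAMPING_TYPES:
        raise ValueError(f"unknown damping type: {damping!r}")
    if not species:
        raise ValueError("at least one chemical species is required")
    style = f"pair_style d3 {sq_r_cut} {sq_r_cn_cut} {damping} {functional}"
    coeff = "pair_coeff * * " + " ".join(species)
    return [style, coeff]


lines = d3_pair_lines(["Hf", "O"])
print("\n".join(lines))
```

With the defaults this prints `pair_style d3 9000 1600 damp_bj pbe` followed by `pair_coeff * * Hf O`. The functional name is passed through unchecked, since the full list of supported functionals is not reproduced here.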

Cutoff radii#

sq_r_cut and sq_r_cn_cut are the squared cutoff radii for the energy/force calculation and the coordination-number calculation, respectively. Units are squared Bohr, where 1 Bohr = 0.52917721 Å. The default values are 9000 and 1600 (Bohr²), which correspond to cutoff radii of 50.2022 Å and 21.1671 Å, respectively. These are also the default values used in VASP.[1]
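Because the inputs are squared radii in Bohr², converting to and from Å is easy to get wrong. The following minimal sketch shows the arithmetic; the function names are illustrative and not part of SevenNet, and the conversion factor is the one quoted above.

```python
import math

BOHR_IN_ANGSTROM = 0.52917721  # value quoted in the text above


def sq_bohr_to_angstrom(sq_cutoff_bohr2):
    """Convert a squared cutoff in Bohr^2 to a plain radius in angstrom."""
    return math.sqrt(sq_cutoff_bohr2) * BOHR_IN_ANGSTROM


def angstrom_to_sq_bohr(radius_angstrom):
    """Convert a cutoff radius in angstrom to the squared value in Bohr^2."""
    return (radius_angstrom / BOHR_IN_ANGSTROM) ** 2


r_cut = sq_bohr_to_angstrom(9000)
r_cn_cut = sq_bohr_to_angstrom(1600)
print(f"{r_cut:.4f} {r_cn_cut:.4f}")
```

This prints `50.2022 21.1671`, matching the default radii in Å. To use a custom cutoff given in Å, pass the result of `angstrom_to_sq_bohr` as `sq_r_cut` or `sq_r_cn_cut`.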

Damping type#

The available values of type_of_damping are as follows:

damp_zero: Zero damping
damp_bj: Becke-Johnson (BJ) damping
damp_zerom: Modified version of zero damping
damp_bjm: Modified version of BJ damping

Available XC functionals#

The available name_of_functional options include all functionals supported by the original Fortran code. SevenNet-0 is trained on the ‘PBE’ functional, so you should specify ‘pbe’ in the script when using it. For other supported functionals, see ‘List of parametrized functionals’ here. We are also actively collecting and updating D3 parameters for other functionals, such as r2scan and wb97m.

Cautions#

Selective (or no) periodic boundary conditions: implemented, but only the fully periodic and fully non-periodic cases can be checked against the original Fortran code; selective PBC cannot.
3-body term and n > 8 terms: not implemented (as in VASP).
For systems with a small number of atoms, it can be slower than a CPU implementation.
The maximum number of atoms that can be calculated is 46,340 (overflow issue).
Small numerical errors can occur:
- The introduction of some FP32 operations can lead to minor numerical errors, particularly in pressure calculations, but these are generally smaller than those seen with SevenNet.
- If the error is too large, ensure that the fmad=false option in patch_lammps.sh is correctly applied during build.
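The source does not say where the overflow behind the 46,340-atom limit occurs. One plausible reading, offered here only as an assumption, is that a quantity proportional to the square of the atom count is held in a signed 32-bit integer, since 46,340 is the largest integer whose square still fits. The check below verifies only that arithmetic, not the implementation.

```python
import math

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer (2147483647)

# Largest n such that n * n does not exceed INT32_MAX.
# Assumption: the limit comes from an N^2-sized index in a 32-bit int.
max_atoms = math.isqrt(INT32_MAX)

print(max_atoms)
print(max_atoms**2 <= INT32_MAX)
print((max_atoms + 1) ** 2 > INT32_MAX)
```

The first line of output is `46340`, and both comparisons print `True`: 46,340² = 2,147,395,600 fits, while 46,341² = 2,147,488,281 does not.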

Contributors#

Hyungmin An: Ported the original Fortran D3 code to C++ with OpenMP and MPI.
Gijin Kim: Accelerated the C++ D3 code with OpenACC[2] and CUDA, and currently maintains it.