This document lists and describes the parallelisation input variables (keywords) to be used in the main input file of the abinit code.
The new user is advised to read the new user's guide first, before reading the present file. It will be easier to discover the present file with the help of the tutorial.
When the user is sufficiently familiar with ABINIT, reading the ~abinit/doc/users/tuning file might be useful. For response-function calculations using abinit, please read the response-function help file.
This input variable is used only when running ABINIT in parallel and for Ground-State calculations.
It controls the automatic determination of parameters related to the parallel work distribution (if they are not imposed in the input file).
Given a total number of processors, ABINIT can find a suitable distribution that fills (when possible)
all the different levels of parallelization. ABINIT can also determine optimal parameters for
the use of parallel linear algebra routines (using ScaLAPACK or CUDA, at present, v7.2).
The different values for autoparal are:
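As an illustration (a minimal sketch, assuming that autoparal=1 requests the automatic determination described above, as in recent ABINIT versions), the input file may simply contain:

    autoparal 1   # let ABINIT choose the work distribution for the available processors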
Controls the size of the block in the LOBPCG algorithm.
This keyword works only with paral_kgb=1 and has to be either 1 or a multiple of 2.
-- With npband=1:
Only relevant if use_gpu_cuda=1, that is, if ABINIT is used with CUDA functionality.
The use of linear and matrix algebra on the GPU is only efficient if the involved matrices are large enough.
The gpu_linalg_limit parameter defines the threshold above which linear (and matrix) algebra operations
are done on the Graphics Processing Unit.
The considered matrix size is equal to:
Only relevant if optdriver=3 or 4, that is, screening or sigma calculations.
gwpara is used to choose between the two different parallelization levels available in the GW code. The available options are:
Additional notes:
In the present status of the code, only the parallelization over bands (gwpara=2)
allows one to reduce the memory allocated by each processor.
Using gwpara=1, indeed, requires the same amount of memory as a sequential run,
irrespective of the number of CPUs used.
A reduction of the required memory can be achieved by opting for an out-of-core solution (mkmem=0, only coded for optdriver=3), at the price of a drastic worsening of the performance.
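As an illustration, a minimal sketch of a screening run parallelized over bands (an illustrative combination of the variables mentioned above):

    optdriver 3   # screening calculation
    gwpara    2   # parallelization over bands; memory is distributed among processors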
This input variable is used only when running abinit in parallel. If localrdwf=1, the input wavefunction disk file (or the KSS/SCR file in the case of GW calculations) is read locally by each processor, while if localrdwf=0, only one processor reads it and broadcasts the data to the other processors.
The option localrdwf=0 is NOT allowed when parallel I/O is activated (MPI-IO access), i.e. when accesswff==1.
The option localrdwf=0 is NOT allowed when mkmem==0 (or, for RF, when mkqmem==0 or mk1mem==0), that is, when the wavefunctions are stored on disk. This is still to be coded ...
In the case of a parallel computer with a unique file system, both options are equally convenient for the user. However, if the I/O is slow compared to the communications between processors (e.g. on CRAY T3E machines), localrdwf=0 should be much more efficient; if you really need temporary disk storage, switch to localrdwf=1.
In the case of a cluster of nodes, with a different file system for each machine, the input wavefunction file must be available on all nodes if localrdwf=1, while it is needed only for the master node if localrdwf=0.
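For instance (a sketch, not a prescription), on a cluster where only the master node holds the input wavefunction file:

    localrdwf 0   # the master node reads the file and broadcasts the data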
Relevant only for the band/FFT parallelisation (see the paral_kgb input variable).
npband gives the number of processors among which the work load over the band level is shared. npband, npfft, npkpt and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation.
See npfft, npkpt, npspinor and paral_kgb for additional information on the use of the band/FFT/k-point parallelisation.
Note: at present, npband has to be a divisor of nband.
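As an illustration (values chosen only for the example):

    paral_kgb 1
    nband    64
    npband    8   # npband=8 divides nband=64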
Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable).
npfft gives the number of processors among which the work load over the FFT level is shared. npfft, npkpt, npband and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation.
See npband, npkpt, npspinor and paral_kgb for additional information on the use of the band/FFT/k-point parallelisation.
Note: ngfft is automatically adjusted to npfft. If the number of processors changes from one calculation to another, npfft may change, and then ngfft as well.
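A minimal sketch (illustrative value); leaving ngfft unset lets ABINIT adjust the FFT grid to the requested npfft:

    paral_kgb 1
    npfft     4   # ngfft is adjusted automatically to be compatible with npfft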
Relevant only when sets of images are activated (see imgmov and nimage).
npimage gives the number of processors among which the work load over the image level is shared. It is compatible with all other parallelization levels available for ground-state calculations.
Note on the npimage default value: this default value is crude. It is set to the number of dynamic images (ndynimage) if the number of available processors allows this choice. If ntimimage=1, npimage is set to min(nproc,nimage).
See paral_kgb, npkpt, npband, npfft and npspinor for additional information on the use of the k-point/band/FFT parallelisation.
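As an illustration (a sketch with made-up values; the imgmov value is an assumption, see its own entry):

    imgmov  2    # image dynamics algorithm (assumed value; see the imgmov entry)
    nimage 12    # 12 images (replicas) of the cell
    npimage 6    # distribute the 12 images over 6 groups of processors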
Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable).
npkpt gives the number of processors among which the work load over the k-point/spin-component level is shared. npkpt, npfft, npband and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation.
See npband, npfft, npspinor and paral_kgb for additional information on the use of the band/FFT/k-point parallelisation.
Note: npkpt should be a divisor of the number of k-point/spin-components (nkpt*nsppol) in order to obtain the best load-balancing and efficiency.
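For example (a sketch with made-up values), a spin-polarized run with 10 k-points exposes nkpt*nsppol=20 units of work:

    nsppol  2
    nkpt   10
    npkpt  10   # 10 divides nkpt*nsppol=20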
This parameter is used in connection with the parallelization over perturbations (see paral_rf) for a linear-response calculation.
nppert gives the number of processors among which the work load over the perturbation level is shared.
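A sketch of a linear-response run distributing the perturbations (values are illustrative; optdriver=1 selects a response-function calculation):

    optdriver 1   # response-function calculation
    paral_rf  1   # activate the parallelization over perturbations (see paral_rf)
    nppert    4   # share the perturbations among 4 groups of processors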
Can be 1 or 2 (2 only when nspinor=2).
Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable).
npspinor gives the number of processors among which the work load over the spinorial components of the wave-functions is shared. npspinor, npfft, npband and npkpt are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation.
See npkpt, npband, npfft and paral_kgb for additional information on the use of the band/FFT/k-point parallelisation.
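As an illustration for a spinorial calculation (values for the example only):

    nspinor   2
    paral_kgb 1
    npspinor  2   # split the two spinorial components between two groups of processors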
Only relevant (for Ground-State calculations) when paral_kgb=1 and the LOBPCG algorithm is used.
When using ScaLAPACK (or any similar matrix algebra library), it is well known that the efficiency of the eigenproblem resolution saturates as the number of CPU cores increases; it is then better to use a smaller number of CPU cores for the LINALG calls. This maximum number of cores can be set with np_slk.
A large value of np_slk (e.g. 1000000) means that all cores are used for the linear algebra calls.
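A sketch capping the number of cores used by the linear algebra calls (the value 16 is purely illustrative; use_slk is described below):

    paral_kgb 1
    use_slk   1   # enable ScaLAPACK within LOBPCG (see use_slk)
    np_slk   16   # at most 16 cores take part in the eigenproblem resolution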
Relevant only for PAW calculations.
This keyword controls the parallel distribution of memory over atomic sites. Calculations are
also distributed using the "kpt-band" communicator.
Warning: use of paral_atom is highly experimental.
Only compatible (for the moment) with ground-state calculations.
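As a sketch (keeping in mind the experimental status noted above):

    paral_atom 1   # distribute the PAW on-site data (memory) over atomic sites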
If paral_kgb is not explicitly set in the input file, ABINIT automatically detects whether the job has been launched in sequential or in parallel. In the latter case, it detects the number of processors on which the job has been launched and computes values of npkpt, npfft, npband, bandpp, npimage and npspinor that are compatible with that number of processors. It then sets paral_kgb to 0 or 1 (see hereunder) and launches the job.
If paral_kgb=0, the parallelization over k-points only is activated. In this case, npkpt, npfft and npband are ignored. Requires the compilation option --enable-mpi="yes".
If paral_kgb=1, the parallelization over bands, FFTs and k-point/spin-components is activated (see npkpt, npfft and npband), as in the sketch below. With this parallelization, the work load is split over three levels, and the communications occur almost exclusively along one dimension at a time. Requires the compilation option --enable-mpi="yes".
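As an illustration (made-up counts), the total number of MPI processes must match the product of the individual levels, nproc = npkpt * npband * npfft * npspinor:

    paral_kgb 1
    npkpt  4
    npband 8
    npfft  2      # to be run with nproc = 4*8*2 = 64 MPI processes (npspinor=1)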
HOWTO fix the number of processors along one level of parallelisation: for additional information, download F. Bottin's presentation at the ABINIT workshop 2007.
Suggested acknowledgments:
F. Bottin, S. Leroux, A. Knyazev and G. Zerah,
"Large scale ab initio calculations based on three levels of parallelization",
Comput. Mater. Sci. 42, 329 (2008),
available on arXiv, http://arxiv.org/abs/0707.3405.
If the total number of processors used is compatible with the three levels of parallelization, the values for npkpt, npband, npfft and bandpp will be filled automatically, although the repartition may not be optimal. To optimize the repartition use:
If paral_kgb=-n, ABINIT will automatically test whether each processor number between 2 and n is convenient for a parallel calculation and print the possible values in the log file. A weight is attributed to each possible processor repartition; it is advised to select a repartition for which the weight is high (as close to the number of processors as possible). The code will then stop after the printing. This test can be done with a sequential as well as a parallel version of the code. The user can then choose the adequate number of processors on which to run the job, set paral_kgb=1 again in the input file, and put the corresponding values for npkpt, npband, npfft and bandpp in the input file.
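For example, to let ABINIT test all processor counts up to 64 (the bound is arbitrary):

    paral_kgb -64   # test processor numbers from 2 to 64, print their weights, then stop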
This parameter activates the parallelization over perturbations, which can be used during response-function (RF) calculations. It is possible to use this type of parallelization in combination with the parallelization over k-points.
Currently, total energies calculated by groups that do not include the master process are saved in .status_LOGxxxx files.
Only available if the ABINIT executable has been compiled with the CUDA nvcc compiler.
This parameter activates the use of NVidia graphics accelerators (GPUs) if present.
If use_gpu_cuda=1, some parts of the computation are transmitted to the GPUs.
If use_gpu_cuda=0, no computation is done on GPUs, even if present.
Note that, when running ABINIT on GPUs, it is recommended to use the MAGMA external library (i.e. LAPACK on GPUs). The latter is activated during the compilation stage (see the "configure" step of the ABINIT compilation process). If MAGMA is not used, ABINIT performance on GPUs can be poor.
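A sketch combining this flag with the gpu_linalg_limit threshold described earlier (the threshold value is purely illustrative):

    use_gpu_cuda     1         # transmit suitable parts of the computation to the GPU
    gpu_linalg_limit 2000000   # matrix-size threshold above which linear algebra runs on the GPU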
If set to 1, enables the use of ScaLAPACK within LOBPCG.