TABLE OF CONTENTS


ABINIT/m_gwr [ Modules ]

[ Top ] [ Modules ]

NAME

  m_gwr

FUNCTION

  Objects and procedures implementing the GW method in real-space and imaginary time.

NOTES

  Memory and workload are distributed using a 4D cartesian grid: (g/r, tau, k-points, spin).

  Inside the g/r communicator, we use PBLAS matrices to store G, tchi and W
  using a 1D processor grid and block distribution along columns.
  A 2D grid, indeed, would require MPI-FFT or some communication before performing the FFTs along columns.

  Let's assume for simplicity that we have only two MPI procs in the g/r communicator.
  Matrices in (g,g') space are distributed along columns so that the g-index is local
  and we can use sequential zero-padded FFTs to transform from g to r in the unit cell:

                      g'-axis
               |--------------------
               |         |         |
      g-axis   |   P0    |   P1    |
               |         |         |
               |--------------------

  The results of the FFT transform along g are stored in another PBLAS matrix with the same layout:

                      g'-axis
               |--------------------
               |         |         |
      r-axis   |   P0    |   P1    |
               |         |         |
               |--------------------

  At this point, we call ptrans to MPI transpose the (r, g') matrix, and we end up with:

                      r-axis
               |--------------------
               |         |         |
      g'-axis  |   P0    |   P1    |
               |         |         |
               |--------------------

  Differences with respect to the quartic GW code formulated in frequency-domain (real axis)

   - in GWR, the k-mesh must be Gamma-centered.
   - All the two-point functions are defined on k/q-centered g-spheres while GW uses a single Gamma-centered sphere.
   - The frequency/tau meshes are automatically defined by ntau and the KS spectrum (minimax meshes)

  Technical problems:

    - it's not clear to me that one can use vc(Sq, SG) when a cutoff is used as the cutoff breaks
      the spherical symmetry of vc(r). Besides, when symmetries are used to reconstruct the term for q in the BZ,
      one might have to take into account umklapps. Use cache?

    - Treatment of the anisotropic behaviour of Wc. This part is badly coded in GW, in the sense that
      we use a finite small q when computing Wc for q --> 0. This breaks the symmetry of the system
      and QP degeneracies. The equations needed to express the angular dependency of W(q) for q --> 0
      are well known but one has to pass through the Adler-Wiser expression.
      Possible solution: Compute heads and wings using a WFK_fine wavefunction file with dense k-mesh and less bands.
      The dipole matrix elements are computed with the DFPT routines, still we need to
      recode a lot of stuff that is already done in cchi0q0, especially symmetries.
      Note, however, that tchi is Hermitian along the imaginary axis, expect for omega = 0 in metals
      but I don't think the minmax grids contain omega = 0.

   - In principle, it's possible to compute QP correction along a k-path if a new WFK file is provided.
     The correlated part is evaluated in real-space in the super-cell.
     For Sigma_x, we need a specialized routine that can handle arbitrary q, especially at the level of v(q, G)
     but I don't know if this approach will give smooth bands
     as we don't have q --> 0 when k does not belong to the k-mesh.

   - New routine to compute oscillator matrix elements with NC/PAW and PBLAS matrices.
     It can be used to compute tchi head/wings as well as Sigma_x + interface with coupled-cluster codes.

   - Decide whether we should use VASP conventions for G and the analytic continuation or the "standard" ones by Godby.
     The standard ones are consistent with Hedin's notations and correspond to the ones used in the legacy GW code.
     On the other hand, VASP notations make life easier if one has to implement PAW as all the equations
     have been already derived.

   - Address nspinor = 2 and PBLAS distribution as MPI proc can have both spinors in memory
     In other words, we should store the first/last index in gvec for each spinor

   - Optimization for Gamma-only. Memory and c -> r FFTs

   - Need to extend FFT API to avoid scaling if isign = -1. Also fft_ug and fft_ur should accept isign
     optional argument. Refactoring of all the FFT routines used in the GW code is needed
     in order to exploit R2C, C2R (e.g. chi0(q=0) and GPU version.

   - Use round-robin distribution instead of blocked-distribution to improve load balance?

   - Memory peaks:

       (env3.9) [magianto@uan01 /scratch/project_465000061/magianto/DDIAGO_ZnO]
       $~/git_repos/abinit/tests/Scripts/abimem.py peaks abimem_rank0.mocc
       [0] <var=gt_scbox, A@m_gwr.F90:3395, addr=0x14aa53673010, size_mb=379.688>
       [1] <var=xsum, A@xmpi_sum.finc:2551, addr=0x14aa2fce9010, size_mb=379.688>
       [2] <var=gt_scbox, A@m_gwr.F90:4338, addr=0x14aa4f64f010, size_mb=379.688>
       [3] <var=allcg_k, A@m_wfd.F90:4631, addr=0x14aa56b57010, size_mb=217.865>
       [5] <var=wct_scbox, A@m_gwr.F90:4339, addr=0x14aa43876010, size_mb=189.844>
       [6] <var=xsum, A@xmpi_sum.finc:2476, addr=0x14aa31bb0010, size_mb=189.844>
       [7] <var=cg_k, A@m_wfd.F90:4623, addr=0x14aa64535010, size_mb=108.932>

TODO

  - Remove cryst%timrev, use kptopt and qptopt
  - Sig_c breaks QP degeneracies due to fixed q0.

 NOTES:

  1) _slk_mat_t is a CPP macro defined in abi_common.h that allows us to use PBLAS in single/double precision
     Be careful when using c_f_pointer because there's no type checking.

COPYRIGHT

 Copyright (C) 1999-2024 ABINIT group (MG)
 This file is distributed under the terms of the
 GNU General Public License, see ~abinit/COPYING
 or http://www.gnu.org/copyleft/gpl.txt .

SOURCE

120 #if defined HAVE_CONFIG_H
121 #include "config.h"
122 #endif
123 
124 #include "abi_common.h"
125 
126 module m_gwr
127 
128  use defs_basis
129  use m_abicore
130  use m_errors
131  !use mpi
132  use m_xmpi
133  use m_xomp
134  use m_hdr
135  use m_ebands
136  use netcdf
137  use m_nctk
138  use m_dtfil
139  use m_yaml
140  use m_sigtk
141  use, intrinsic :: iso_c_binding
142  use m_hide_blas
143 
144  use defs_datatypes,  only : pseudopotential_type, ebands_t
145  use defs_abitypes,   only : mpi_type
146  use m_gwdefs,        only : GW_TOL_DOCC, GW_TOLQ0, GW_TOL_W0, GW_Q0_DEFAULT, cone_gw, czero_gw, j_gw, sigijtab_t, &
147                              sigijtab_free, g0g0w
148  use m_time,          only : cwtime, cwtime_report, sec2str, timab
149  use m_io_tools,      only : iomode_from_fname, get_unit, file_exists, open_file, write_units
150  use m_pstat,         only : pstat_t
151  use m_numeric_tools, only : blocked_loop, get_diag, isdiagmat, arth, print_arr, imin_loc, imax_loc, &
152                              c2r, linfit, bisect, hermitianize
153  use m_copy,          only : alloc_copy
154  use m_geometry,      only : normv, vdotw
155  use m_fstrings,      only : sjoin, itoa, strcat, ktoa, ltoa, ftoa, string_in, yesno
156  use m_sort,          only : sort_dp, sort_rvals
157  use m_krank,         only : krank_t, krank_new, krank_from_kptrlatt, get_ibz2bz, star_from_ibz_idx
158  use m_crystal,       only : crystal_t
159  use m_dtset,         only : dataset_type
160  use m_fftcore,       only : get_kg, sphereboundary, getng, print_ngfft, fftcore_set_mixprec, ngfft_seq
161  use m_cgtk,          only : cgtk_rotate
162  use m_mpinfo,        only : initmpi_seq, destroy_mpi_enreg
163  use m_distribfft,    only : init_distribfft_seq
164  use m_kg,            only : getcut
165  use m_fft,           only : fftbox_plan3_t, uplan_t, fft_ug, fft_ur, fourdp
166  use m_fft_mesh,      only : calc_ceikr, calc_ceigr, ctimes_eikr
167  use m_kpts,          only : kpts_ibz_from_kptrlatt, kpts_timrev_from_kptopt, kpts_map, kpts_map_print, kpts_pack_in_stars
168  use m_bz_mesh,       only : littlegroup_t, findqg0
169  use m_gsphere,       only : kg_map, gsphere_t
170  use m_melemts,       only : melements_t
171  use m_ioarr,         only : fftdatar_write, read_rhor
172  use m_slk,           only : matrix_scalapack, slkmat_sp_t, processor_scalapack, slk_array_free, slk_array_set, &
173                              slk_array_locmem_mb, block_dist_1d, slk_pgemm
174  use m_wfk,           only : wfk_read_ebands, wfk_t, wfk_open_read
175  use m_wfd,           only : wfd_init, wfd_t, wfdgw_t
176  use m_ddk,           only : ddkop_t, ddkop_new
177  use m_pawtab,        only : pawtab_type
178  use m_pawcprj,       only : pawcprj_type
179  use m_vcoul,         only : vcgen_t
180  use m_vkbr,          only : vkbr_t, vkbr_free, vkbr_init, nc_ihr_comm
181  use m_chi0tk,        only : chi0_bbp_mask, accumulate_head_wings_imagw, symmetrize_afm_chi0
182  use m_sigx,          only : sigx_symmetrize
183  use m_dyson_solver,  only : sigma_pade_t
184 !#ifdef __HAVE_GREENX
185  use minimax_grids,   only : gx_minimax_grid !, gx_get_error_message
186 !#endif
187 
188  implicit none
189 
190  private

m_gwr/box2gsph [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 box2gsph

FUNCTION

 Extract cg_k array defined on the k-centered g-sphere with npw vectors from the FFT box.

INPUTS

 ngfft:
   n1,n2,n3=physical dimension of the FFT box
   n4,n5,n6=memory dimension of cfft
 npw=number of G vectors in basis at this k point
 ndat=number of items to process
 kg_k(3,npw)=integer coordinates of G vectors in basis sphere
 cfft(n4,n5,n6, ndat) = array on FFT box

OUTPUT

 cg(npw*ndat)= contains values for npw G vectors in basis sphere

SOURCE

6803 subroutine box2gsph(ngfft, npw, ndat, kg_k, cfft, cg)
6804 
6805 !Arguments ------------------------------------
6806 !scalars
6807  integer,intent(in) :: ngfft(6), npw, ndat
6808 !arrays
6809  integer,intent(in) :: kg_k(3, npw)
6810  complex(gwpc),target,intent(in) :: cfft(ngfft(4)*ngfft(5)*ngfft(6)*ndat)
6811  complex(gwpc),intent(out) :: cg(npw*ndat)
6812 
6813 !Local variables-------------------------------
6814  integer :: n1, n2, n3, n4, n5, n6, i1, i2, i3, idat, ipw, icg
6815  complex(gwpc),contiguous,pointer :: cfft_ptr(:,:,:,:)
6816  !character(len=500) :: msg
6817 
6818 ! *************************************************************************
6819 
6820  n1 = ngfft(1); n2 = ngfft(2); n3 = ngfft(3)
6821  n4 = ngfft(4); n5 = ngfft(5); n6 = ngfft(6)
6822  call c_f_pointer(c_loc(cfft), cfft_ptr, shape=[n4, n5, n6, ndat])
6823 
6824  ! Extract cg from cfft, ignoring components outside range of cg sphere
6825  !$OMP PARALLEL DO PRIVATE(i1, i2, i3, icg) IF (ndat > 1)
6826  do idat=1,ndat
6827    do ipw=1,npw
6828      i1 = modulo(kg_k(1, ipw), n1) + 1
6829      i2 = modulo(kg_k(2, ipw), n2) + 1
6830      i3 = modulo(kg_k(3, ipw), n3) + 1
6831      !if (any(kg_k(:,ipw) > ngfft(1:3)/2) .or. any(kg_k(:,ipw) < -(ngfft(1:3)-1)/2) ) then
6832      !  write(msg,'(a,3(i0,1x),a)')" The G-vector: ",kg_k(:, ipw)," falls outside the FFT box. Increase boxcutmin (?)"
6833      !  ABI_ERROR(msg)
6834      !end if
6835      icg = ipw + (idat - 1) * npw
6836      cg(icg) = cfft_ptr(i1, i2, i3, idat)
6837    end do
6838  end do
6839 
6840 end subroutine box2gsph

m_gwr/check_scf_cyle [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  check_scf_cycle

FUNCTION

INPUTS

OUTPUT

SOURCE

6507 subroutine gwr_check_scf_cycle(gwr, converged)
6508 
6509 !Arguments ------------------------------------
6510  class(gwr_t),intent(in) :: gwr
6511  logical,intent(out) :: converged
6512 
6513 !Local variables-------------------------------
6514  integer,parameter :: master = 0
6515  integer :: spin, ikcalc, ik_ibz, band, ib, jb
6516  character(len=500) :: msg
6517  real(dp) :: max_adiff, adiff(gwr%qp_ebands%mband)
6518  integer :: units(2)
6519 
6520 ! *************************************************************************
6521 
6522  max_adiff = -one; converged = .True.; units = [std_out, ab_out]
6523 
6524  if (gwr%comm%me == master) then
6525    call wrtout(units, sjoin(" Checking for convergence at iteration:", itoa(gwr%scf_iteration)))
6526  end if
6527 
6528  associate (now => gwr%qp_ebands, prev => gwr%qp_ebands_prev)
6529  do spin=1,gwr%nsppol
6530    do ikcalc=1,gwr%nkcalc ! TODO: Should be spin dependent!
6531      ! Compute max abs difference between QP at iteration i and i-1.
6532      ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
6533      ib = gwr%bstart_ks(ikcalc, spin); jb = gwr%bstop_ks(ikcalc, spin)
6534      adiff = zero; adiff(ib:jb) = abs(now%eig(ib:jb, ik_ibz, spin) - prev%eig(ib:jb, ik_ibz, spin))
6535      band = maxloc(adiff, dim=1)
6536      max_adiff = max(max_adiff, adiff(band))
6537      if (adiff(band) > gwr%dtset%gwr_tolqpe) converged = .False.
6538      if (gwr%comm%me == master) then
6539        ! Write info
6540        write(msg, "(a,i0,1x,2a,i0)") " For k-point: ", ik_ibz, trim(ktoa(now%kptns(:,ik_ibz))),", spin: ", spin
6541        call wrtout(units, msg)
6542        write(msg, "(4x,a,es12.5,a,i0)")"max(abs(E_i - E_{i-1})): ", adiff(band) * Ha_meV, " (meV) for band: ", band
6543        call wrtout(units, msg)
6544      end if
6545    end do
6546  end do
6547  end associate
6548 
6549  ! Make sure that all MPI procs agree on this!
6550  call xmpi_land(converged, gwr%comm%value)
6551 
6552  if (gwr%comm%me == master) then
6553    write(msg, "(a,i0,a)") "QP gaps at iteration: ",gwr%scf_iteration," (Fermi energy set to zero)"
6554    call ebands_print_gaps(gwr%qp_ebands, std_out, header=msg)
6555    call ebands_print_gaps(gwr%qp_ebands, ab_out, header=msg)
6556    if (.not. converged) then
6557      call wrtout(units," Not converged --> start new iteration ...")
6558    !else
6559    !  call wrtout(units, sjoin(" Convergence achieved at iteration", itoa(gwr%scf_iteration)))
6560    end if
6561    ! TODO: Incremente scf_interation in GWR.nc
6562  end if
6563 
6564 end subroutine gwr_check_scf_cycle

m_gwr/desc_copy [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  desc_copy

FUNCTION

  Copy object
  NB: cannot use obj1 = obj2 syntax because ABINIT memory-leak detector
  won't see the allocation automatically performed by the compiler.

SOURCE

3764 subroutine desc_copy(in_desc, new_desc)
3765 
3766 !Arguments ------------------------------------
3767  class(desc_t),intent(in) :: in_desc
3768  class(desc_t),intent(out) :: new_desc
3769 
3770 ! *************************************************************************
3771 
3772  call new_desc%free()
3773 
3774  new_desc%istwfk = in_desc%istwfk
3775  new_desc%npw = in_desc%npw
3776  new_desc%ig0 = in_desc%ig0
3777  new_desc%kin_sorted = in_desc%kin_sorted
3778 
3779  call alloc_copy(in_desc%gvec, new_desc%gvec)
3780  call alloc_copy(in_desc%gbound, new_desc%gbound)
3781  if (allocated(in_desc%vc_sqrt)) call alloc_copy(in_desc%vc_sqrt, new_desc%vc_sqrt)
3782 
3783  if (allocated(in_desc%g2box)) then
3784    call alloc_copy(in_desc%g2box, new_desc%g2box)
3785    new_desc%cached_sc_ngfft = in_desc%cached_sc_ngfft
3786  end if
3787 
3788 end subroutine desc_copy

m_gwr/desc_free [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  desc_free

FUNCTION

  Free memory

SOURCE

3885 subroutine desc_free(desc)
3886 
3887 !Arguments ------------------------------------
3888  class(desc_t),intent(inout) :: desc
3889 ! *************************************************************************
3890 
3891  ABI_SFREE(desc%gvec)
3892  ABI_SFREE(desc%gbound)
3893  ABI_SFREE(desc%vc_sqrt)
3894  ABI_SFREE(desc%g2box)
3895  desc%cached_sc_ngfft = -1
3896 
3897 end subroutine desc_free

m_gwr/desc_get_vc_sqrt [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  desc_get_vc_sqrt

FUNCTION

  Compute square root of the Coulomb interaction vc(q,g).

SOURCE

3732 subroutine desc_get_vc_sqrt(desc, qpt, q_is_gamma, gwr, comm)
3733 
3734 !Arguments ------------------------------------
3735  class(desc_t),intent(inout) :: desc
3736  real(dp),intent(in) :: qpt(3)
3737  logical, intent(in) :: q_is_gamma
3738  class(gwr_t),intent(in) :: gwr
3739  integer,intent(in) :: comm
3740 
3741 ! *************************************************************************
3742 
3743  ABI_UNUSED([q_is_gamma])
3744  if (allocated(desc%vc_sqrt)) return
3745  ABI_MALLOC(desc%vc_sqrt, (desc%npw))
3746  call gwr%vcgen%get_vc_sqrt(qpt, desc%npw, desc%gvec, gwr%q0, gwr%cryst, desc%vc_sqrt, comm)
3747 
3748 end subroutine desc_get_vc_sqrt

m_gwr/desc_init [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  desc_init

FUNCTION

 Initialize the descriptor.

INPUTS

OUTPUT

SOURCE

3688 subroutine desc_init(desc, kk, istwfk, ecut, gwr, kin_sorted)
3689 
3690 !Arguments ------------------------------------
3691  class(desc_t),intent(inout) :: desc
3692  real(dp),intent(in) :: kk(3)
3693  integer,intent(in) :: istwfk
3694  real(dp),intent(in) :: ecut
3695  class(gwr_t),intent(in) :: gwr
3696  logical,optional,intent(in) :: kin_sorted
3697 
3698 !Local variables-------------------------------
3699  integer :: ig
3700 
3701 ! *************************************************************************
3702 
3703  desc%kin_sorted = .False.; if (present(kin_sorted)) desc%kin_sorted = kin_sorted
3704  desc%istwfk = istwfk
3705  call get_kg(kk, desc%istwfk, ecut, gwr%cryst%gmet, desc%npw, desc%gvec, kin_sorted=desc%kin_sorted)
3706 
3707  ABI_MALLOC(desc%gbound, (2 * gwr%g_mgfft + 8, 2))
3708  call sphereboundary(desc%gbound, desc%istwfk, desc%gvec, gwr%g_mgfft, desc%npw)
3709 
3710  ! Find the index of g = 0.
3711  desc%ig0 = -1
3712  do ig=1,desc%npw
3713    if (all(desc%gvec(:,ig) == 0)) then
3714      desc%ig0 = ig; exit
3715    end if
3716  end do
3717 
3718 end subroutine desc_init

m_gwr/desc_t [ Types ]

[ Top ] [ m_gwr ] [ Types ]

NAME

 desc_t

FUNCTION

  Parameters related to a two-point function such as
  gvectors, tables used for zero padded FFTs and matrix elements of the Coulomb interaction.

SOURCE

204  type,public :: desc_t
205 
206    integer :: istwfk = 1
207    ! Storage mode for this k/q point.
208 
209    integer :: npw = -1
210    ! Total number of plane-waves for this k/q-point.
211 
212    integer :: ig0 = -1
213    ! Index of g=0 in gvec.
214 
215    logical :: kin_sorted
216    ! True if gvec are sorted by |q+g|^2/2
217 
218    integer,allocatable :: gvec(:,:)
219    ! (3, npw)
220    ! G-vectors in reduced coordinates.
221    ! Note that this array is global i.e. it is not MPI-distributed inside the PBLAS communicator.
222 
223    integer,allocatable :: gbound(:,:)
224    ! (2*mgfft+8, 2)
225    ! sphere boundary info for zero-padded FFT
226 
227    integer,allocatable :: g2box(:)
228    ! (npw)
229    ! Index of gvec in the supercell FFT box.
230 
231    integer :: cached_sc_ngfft(6) = -1
232 
233    complex(gwpc),allocatable :: vc_sqrt(:)
234    ! (npw)
235    ! Square root of the Coulomb interaction in reciprocal space.
236    ! Allocated and computed for tchi/W descriptors.
237    ! A cutoff might be applied.
238 
239  contains
240 
241    procedure :: init => desc_init
242    ! Initialize the object
243 
244    procedure :: copy => desc_copy
245    ! Copy object.
246 
247    procedure :: to_scbox => desc_to_scbox
248    ! Copy object.
249 
250    procedure :: get_vc_sqrt => desc_get_vc_sqrt
251    ! Compute square root of vc(q,g).
252 
253    procedure :: free => desc_free
254    ! Free memory.
255 
256  end type desc_t
257 
258  interface desc_array_free
259    module procedure desc_array1_free
260  end interface desc_array_free

m_gwr/desc_to_scbox [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 desc_to_scbox

FUNCTION

 Insert cg_k array defined on the k-centered g-sphere with npw vectors inside the FFT box.
 The main difference wrt to sphere is that cfft is not initialized to zero. See notes below.

INPUTS

 sc_ngfft:
   n1,n2,n3=physical dimension of the FFT box
   n4,n5,n6=memory dimension of cfft
 npw=number of G vectors in basis at this k point
 ndat=number of items to process
 cg(npw*ndat)= contains values for npw G vectors in basis sphere

OUTPUT

 cfft(n4,n5,n6*ndat) = array on FFT box filled with cg data
      Note that cfft is intent(inout) so that we can add contributions from different k-points.

SOURCE

3815 subroutine desc_to_scbox(desc, kk, ngkpt, sc_ngfft, ndat, cg, cfft)
3816 
3817 !Arguments ------------------------------------
3818 !scalars
3819  class(desc_t),intent(inout) :: desc
3820  real(dp),intent(in) :: kk(3)
3821  integer,intent(in) :: ngkpt(3)
3822  integer,intent(in) :: sc_ngfft(6), ndat
3823 !arrays
3824  complex(gwpc),intent(in) :: cg(desc%npw, ndat)
3825  complex(gwpc),intent(inout) :: cfft(sc_ngfft(4)*sc_ngfft(5)*sc_ngfft(6),ndat)
3826 
3827 !Local variables-------------------------------
3828 integer :: n1, n2, n3, n4, n5, n6, i1, i2, i3, idat, ipw, kg(3), gg(3), ifft
3829  logical :: compute_mapping
3830  !real(dp) :: tsec(2) !, cpu, wall, gflops
3831  !character(len=500) :: msg
3832 
3833 ! *************************************************************************
3834 
3835  !call timab(1931, 1, tsec)
3836 
3837  n1 = sc_ngfft(1); n2 = sc_ngfft(2); n3 = sc_ngfft(3)
3838  n4 = sc_ngfft(4); n5 = sc_ngfft(5); n6 = sc_ngfft(6)
3839  gg = nint(kk * ngkpt)
3840 
3841  compute_mapping = .not. allocated(desc%g2box) .or. any(desc%cached_sc_ngfft /= sc_ngfft(1:6))
3842 
3843  ! FIXME This is not thread safe
3844  if (compute_mapping) then
3845    ABI_REMALLOC(desc%g2box, (desc%npw))
3846    desc%cached_sc_ngfft = sc_ngfft(1:6)
3847    do ipw=1,desc%npw
3848      kg = gg + ngkpt * desc%gvec(:,ipw)  ! k+g
3849      i1 = modulo(kg(1), n1) !+ 1
3850      i2 = modulo(kg(2), n2) !+ 1
3851      i3 = modulo(kg(3), n3) !+ 1
3852      desc%g2box(ipw) = 1 + i1 + n4*(i2+i3*n5)
3853    end do
3854  end if
3855 
3856  ! Insert cg into cfft
3857 !$OMP PARALLEL DO PRIVATE(i1, i2, i3) IF (ndat > 1)
3858  do idat=1,ndat
3859    do ipw=1,desc%npw
3860      !if (any(kg_k(:,ipw) > sc_ngfft(1:3)/2) .or. any(kg_k(:,ipw) < -(sc_ngfft(1:3)-1)/2) ) then
3861      !  write(msg,'(a,3(i0,1x),a)')" The G-vector: ",kg_k(:, ipw)," falls outside the FFT box. Increase boxcutmin (?)"
3862      !  ABI_ERROR(msg)
3863      !end if
3864      ifft = desc%g2box(ipw)
3865      cfft(ifft,idat) = cg(ipw,idat)
3866    end do
3867  end do
3868 
3869  !call timab(1931, 2, tsec)
3870 
3871 end subroutine desc_to_scbox

m_gwr/est_print [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 est_print

FUNCTION

SOURCE

1760 subroutine est_print(est, units)
1761 
1762 !Arguments ------------------------------------
1763  class(est_t), intent(in) :: est
1764  integer,intent(in) :: units(:)
1765 
1766 !Local variables-------------------------------
1767  character(len=4),parameter :: fmt = "f8.1"
1768 
1769 ! *************************************************************************
1770 
1771  call wrtout(units, "- Resident memory in Mb for G(g,g',+/-tau) and chi(g,g',tau):")
1772  call wrtout(units, sjoin("- G_k(g,g,tau): ", ftoa(est%mem_green_gg, fmt=fmt)))
1773  call wrtout(units, sjoin("- Chi_q(g,g,tau): ", ftoa(est%mem_chi_gg, fmt=fmt)))
1774  call wrtout(units, sjoin("- u_k(g,b): ", ftoa(est%mem_ugb, fmt=fmt)))
1775  call wrtout(units, "- Temporary memory allocated inside the tau loops:")
1776  call wrtout(units, sjoin("- G_k(r,g): ", ftoa(est%mem_green_rg, fmt=fmt)))
1777  call wrtout(units, sjoin("- chi_q(r,g): ", ftoa(est%mem_chi_rg, fmt=fmt)))
1778 
1779 end subroutine est_print

m_gwr/est_t [ Types ]

[ Top ] [ m_gwr ] [ Types ]

NAME

 est_t

FUNCTION

 Memory is given in Mb

SOURCE

274  type, public :: est_t
275 
276    real(dp) :: mem_green_gg = zero
277    real(dp) :: mem_green_rg = zero
278    real(dp) :: mem_chi_gg = zero
279    real(dp) :: mem_chi_rg = zero
280    real(dp) :: mem_ugb = zero
281    real(dp) :: mem_total = zero
282    real(dp) :: efficiency = zero
283    real(dp) :: speedup = zero
284 
285  contains
286    procedure :: print => est_print
287  end type est_t

m_gwr/estimate [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 estimate

FUNCTION

  Estimate memory requirements and the parallel speedup of a given `np_kgts` configuration.

SOURCE

1695 type(est_t) pure function estimate(gwr, np_kgts) result(est)
1696 
1697 !Arguments ------------------------------------
1698  class(gwr_t),intent(in) :: gwr
1699  integer,intent(in) :: np_kgts(4)
1700 
1701 !Local variables-------------------------------
1702  real(dp) :: np_k, np_g, np_t, np_s, w_k, w_g, w_t, w_s, np_tot
1703 
1704 ! *************************************************************************
1705 
1706  ! Use real quantities to avoid integer division
1707  np_k = np_kgts(1); np_g = np_kgts(2); np_t = np_kgts(3); np_s = np_kgts(4)
1708  np_tot = product(real(np_kgts))
1709 
1710  ! NB: array dimensioned with nkibz and nqibz do not scale as 1/np_k as we distribute the BZ, IBZ points might be replicated.
1711 
1712  ! Resident memory in Mb for G(g,g',+/-tau) and chi(g,g',tau)
1713  est%mem_green_gg = two * two * (one*gwr%nspinor*gwr%green_mpw)**2 * two*gwr%ntau * gwr%nkibz * gwr%nsppol * gwp*b2Mb / np_tot
1714  est%mem_chi_gg = two * (one*gwr%tchi_mpw)**2 * gwr%ntau * gwr%nqibz * gwp*b2Mb / (np_g * np_t * np_k)
1715  est%mem_ugb = two * gwr%green_mpw * gwr%nspinor * gwr%dtset%nband(1) * gwr%nkibz * gwr%nsppol * gwp*b2Mb / np_tot
1716 
1717  ! Temporary memory allocated inside the tau loops.
1718  ! This is the chunck we have to minimize by increasing np_g and/or np_k to avoid going OOM.
1719  est%mem_green_rg = two * two * gwr%nspinor**2 * gwr%green_mpw * gwr%g_nfft * gwr%nkbz * gwr%nsppol * gwp*b2Mb / (np_g * np_k)
1720  est%mem_chi_rg = two * gwr%tchi_mpw * gwr%g_nfft * gwr%nqbz * gwp*b2Mb / (np_g * np_k)
1721 
1722  est%mem_total = est%mem_green_gg + est%mem_chi_gg + est%mem_ugb + est%mem_green_rg + est%mem_chi_rg
1723 
1724  ! Estimate speedup and parallel efficiency using heuristic weights. Note g_nfft instead of green_mpw.
1725  w_k = 0.799_dp; w_g = 0.899_dp; w_t = 1.1_dp; w_s = 1.2_dp
1726 
1727  ! Promote kpt parallelism under particular circumstances.
1728  if (gwr%nkbz > 4**3) w_k = w_g + tol2 * merge(+1, -5, np_k < 5)
1729 
1730  est%speedup = speedup(gwr%nkbz, nint(np_k), w_k) * speedup(gwr%g_nfft, nint(np_g), w_g) * &
1731                speedup(gwr%ntau, nint(np_t), w_t) * speedup(gwr%nsppol, nint(np_s), w_s)
1732  est%efficiency = est%speedup / np_tot
1733 
1734 contains
1735 
1736 real(dp) pure function speedup(size, np, weight)
1737  ! Expected speedup for a `size` problem and `np` processes
1738  integer,intent(in) :: size, np
1739  real(dp),intent(in) :: weight
1740  if (np == 1) then
1741    speedup = one
1742  else
1743    speedup = (weight*size) / (one* ((size / np) + merge(0, 1, mod(size, np) == 0)))
1744  end if
1745 end function speedup
1746 
1747 end function estimate

m_gwr/get_1d_sc_phases [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  get_1d_sc_phases

FUNCTION

  Compute one-dimensional factors in the supercell.

INPUTS

OUTPUT

SOURCE

7913 subroutine get_1d_sc_phases(sc_shape, nkpt, kpts, ph1d)
7914 
7915 !Arguments ------------------------------------
7916  integer,intent(in) :: sc_shape(3), nkpt
7917  real(dp),intent(in) :: kpts(3, nkpt)
7918  complex(gwpc),allocatable,intent(out) :: ph1d(:,:,:)
7919 
7920 !Local variables-------------------------------
7921  integer :: ikpt, ix, iy, iz
7922  real(dp) :: arg, fact, kk(3)
7923 
7924 ! *************************************************************************
7925 
7926  ABI_MALLOC(ph1d, (maxval(sc_shape), 3, nkpt))
7927 
7928  do ikpt=1,nkpt
7929    kk = kpts(:, ikpt)
7930    fact = two_pi * kk(1)
7931    do ix=0,sc_shape(1) - 1
7932      arg = fact * ix
7933      ph1d(ix + 1, 1, ikpt) = cmplx(cos(arg), sin(arg), kind=gwpc)
7934    end do
7935    fact = two_pi * kk(2)
7936    do iy=0,sc_shape(2) - 1
7937      arg = fact * iy
7938      ph1d(iy + 1, 2, ikpt) = cmplx(cos(arg), sin(arg), kind=gwpc)
7939    end do
7940    fact = two_pi * kk(3)
7941    do iz=0,sc_shape(3) - 1
7942      arg = fact * iz
7943      ph1d(iz + 1, 3, ikpt) = cmplx(cos(arg), sin(arg), kind=gwpc)
7944    end do
7945  end do ! ikpt
7946 
7947 end subroutine get_1d_sc_phases

m_gwr/gsph2box [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 gsph2box

FUNCTION

 Insert cg_k array defined on the k-centered g-sphere with npw vectors inside the FFT box.
 The main difference wrt to sphere is that cfft is not initialized to zero. See notes below.

INPUTS

 ngfft:
   n1,n2,n3=physical dimension of the FFT box
   n4,n5,n6=memory dimension of cfft
 npw=number of G vectors in basis at this k point
 ndat=number of items to process
 kg_k(3,npw)=integer coordinates of G vectors in basis sphere
 cg(npw*ndat)= contains values for npw G vectors in basis sphere

OUTPUT

 cfft(n4,n5,n6*ndat) = array on FFT box filled with cg data
      Note that cfft is intent(inout) so that we can add contributions from different k-points.

SOURCE

6738 subroutine gsph2box(ngfft, npw, ndat, kg_k, cg, cfft)
6739 
6740 !Arguments ------------------------------------
6741 !scalars
6742  integer,intent(in) :: ngfft(6), npw, ndat
6743 !arrays
6744  integer,intent(in) :: kg_k(3, npw)
6745  complex(gwpc),intent(in) :: cg(npw * ndat)
6746  complex(gwpc),target,intent(inout) :: cfft(ngfft(4)*ngfft(5)*ngfft(6)*ndat)
6747 
6748 !Local variables-------------------------------
6749  integer :: n1, n2, n3, n4, n5, n6, i1, i2, i3, idat, ipw
6750  complex(gwpc),contiguous,pointer :: cfft_ptr(:,:,:,:)
6751  !real(dp) :: tsec(2) !, cpu, wall, gflops
6752  !character(len=500) :: msg
6753 
6754 ! *************************************************************************
6755 
6756  !call timab(1931, 1, tsec)
6757  n1 = ngfft(1); n2 = ngfft(2); n3 = ngfft(3)
6758  n4 = ngfft(4); n5 = ngfft(5); n6 = ngfft(6)
6759  call c_f_pointer(c_loc(cfft), cfft_ptr, shape=[n4, n5, n6, ndat])
6760 
6761  ! Insert cg into cfft
6762 !$OMP PARALLEL DO PRIVATE(i1, i2, i3) IF (ndat > 1)
6763  do idat=1,ndat
6764    do ipw=1,npw
6765      i1 = modulo(kg_k(1, ipw), n1) + 1
6766      i2 = modulo(kg_k(2, ipw), n2) + 1
6767      i3 = modulo(kg_k(3, ipw), n3) + 1
6768      !if (any(kg_k(:,ipw) > ngfft(1:3)/2) .or. any(kg_k(:,ipw) < -(ngfft(1:3)-1)/2) ) then
6769      !  write(msg,'(a,3(i0,1x),a)')" The G-vector: ",kg_k(:, ipw)," falls outside the FFT box. Increase boxcutmin (?)"
6770      !  ABI_ERROR(msg)
6771      !end if
6772      cfft_ptr(i1,i2,i3,idat) = cg(ipw+npw*(idat-1))
6773    end do
6774  end do
6775  !call timab(1931, 2, tsec)
6776 
6777 end subroutine gsph2box

m_gwr/gwr_build_chi0_head_and_wings [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_chi0_head_and_wings

FUNCTION

  Compute head and wings of chi0 on the minimax frequency grid.

SOURCE

6854 subroutine gwr_build_chi0_head_and_wings(gwr)
6855 
6856 !Arguments ------------------------------------
6857  class(gwr_t),target,intent(inout) :: gwr
6858 
6859 !Local variables-------------------------------
6860 !scalars
6861  integer,parameter :: two_poles = 2, one_pole = 1, gwcomp0 = 0, spmeth0 = 0
6862  integer :: nsppol, nspinor, ierr, my_is, spin, my_ikf, itau, my_it
6863  integer :: ik_bz, ik_ibz, isym_k, trev_k, g0_k(3)
6864  !integer :: iq_bz, iq_ibz, isym_q, trev_q, g0_q(3)
6865  integer :: nkpt_summed, use_umklp, band1, band2, band1_start, band1_stop, band1_max
6866  integer :: ib, il_b1, il_b2, nb, block_size, ii, mband, block_counter
6867  integer :: istwf_ki, npw_ki, istwf_kf, nI, nJ, nomega, io, iq, nq, dim_rtwg !ig,
6868  integer :: npwe, u_nfft, u_mgfft, u_mpw
6869  logical :: isirr_k, use_tr, is_metallic, print_time
6870  real(dp) :: spin_fact, weight, deltaf_b1b2, deltaeGW_b1b2, gwr_boxcutmin_c, zcut, qlen, eig_nk, e0
6871  real(dp) :: cpu_all, wall_all, gflops_all, cpu_k, wall_k, gflops_k
6872  complex(dpc) :: deltaeKS_b1b2
6873  type(__slkmat_t),pointer :: ugb_kibz
6874  character(len=5000) :: msg
6875  type(crystal_t),pointer :: cryst
6876  type(dataset_type),pointer :: dtset
6877  type(ebands_t),pointer :: now_ebands
6878  type(littlegroup_t) :: ltg_q
6879  type(desc_t),pointer :: desc_ki
6880 !arrays
6881  integer :: gmax(3), u_ngfft(18), work_ngfft(18), units(2) ! spinor_padx(2,4), g0(3),
6882  integer,contiguous, pointer :: kg_ki(:,:)
6883  integer,allocatable :: gvec_q0(:,:), gbound_q0(:,:), u_gbound(:,:)
6884  real(dp) :: kk_ibz(3), kk_bz(3), tsec(2)
6885  real(dp),contiguous, pointer :: qp_eig(:,:,:), qp_occ(:,:,:), ks_eig(:,:,:) !, cwave(:,:)
6886  real(dp),allocatable :: work(:,:,:,:), qdirs(:,:)
6887  logical :: gradk_not_done(gwr%nkibz)
6888  logical,allocatable :: bbp_mask(:,:)
6889  complex(dpc) :: chq(3) !, wng(3)
6890  !complex(dp),allocatable :: ug1_block(:,:)
6891  complex(gwpc) :: rhotwx(3, gwr%nspinor**2) !, new_rhotwx(3, gwr%nspinor**2)
6892  complex(gwpc),allocatable :: ug2(:), ur1_kibz(:), ur2_kibz(:), ur_prod(:), rhotwg(:), ug1_block(:,:), ug1(:)
6893  complex(dpc) :: green_w(gwr%ntau), omega(gwr%ntau)
6894  complex(dpc),allocatable :: chi0_lwing(:,:,:), chi0_uwing(:,:,:), chi0_head(:,:,:), head_qvals(:)
6895  real(dp), allocatable :: gh1c_block(:,:,:,:)
6896  type(vkbr_t),allocatable :: vkbr(:)
6897  type(gsphere_t) :: gsph
6898  type(ddkop_t) :: ddkop
6899  !type(pawcprj_type),allocatable :: cwaveprj(:,:)
6900 
6901 ! *************************************************************************
6902 
6903  call timab(1927, 1, tsec)
6904  call cwtime(cpu_all, wall_all, gflops_all, "start")
6905  units = [std_out, ab_out]
6906  call wrtout(units, sjoin(" Computing chi0 head and wings with inclvkb:", itoa(gwr%dtset%inclvkb)), pre_newlines=1)
6907 
6908  nspinor = gwr%nspinor; nsppol = gwr%nsppol; dtset => gwr%dtset; cryst => gwr%cryst
6909  use_tr = gwr%dtset%awtr == 1; zcut = gwr%dtset%zcut ! well, it's not used in g0w0 when omega is complex.
6910 
6911  ! Use KS or QP energies depending on the iteration state.
6912  if (gwr%scf_iteration == 1) then
6913    call wrtout(units, " Using KS orbitals and KS energies...", newlines=1, do_flush=.True.)
6914    qp_eig => gwr%ks_ebands%eig; qp_occ => gwr%ks_ebands%occ
6915    now_ebands => gwr%ks_ebands
6916  else
6917    call wrtout(units, " Using KS orbitals and QP energies...", newlines=1, do_flush=.True.)
6918    qp_eig => gwr%qp_ebands%eig; qp_occ => gwr%qp_ebands%occ
6919    now_ebands => gwr%qp_ebands
6920  end if
6921 
6922  ks_eig => gwr%ks_ebands%eig
6923  mband = gwr%ks_ebands%mband
6924 
6925  is_metallic = ebands_has_metal_scheme(now_ebands)
6926 
6927  ! Setup weight (2 for spin unpolarized systems, 1 for polarized).
6928  ! spin_fact is used to normalize the occupation factors to one.
6929  ! Consider also the AFM case.
6930  select case (nsppol)
6931  case (1)
6932    weight = two / gwr%nkbz; spin_fact = half
6933    if (gwr%nspden == 2) then
6934      weight = one / gwr%nkbz; spin_fact = half
6935    end if
6936    if (nspinor == 2) then
6937      weight = one / gwr%nkbz; spin_fact = one
6938    end if
6939  case (2)
6940    weight = one / gwr%nkbz; spin_fact = one
6941  case default
6942    ABI_BUG(sjoin("Wrong nsppol:", itoa(nsppol)))
6943  end select
6944 
6945  ! TODO: Replace vkbr with ddk and factorize calls to DDK |bra>
6946  ABI_MALLOC(vkbr, (gwr%nkibz))
6947  gradk_not_done = .TRUE.
6948 
6949  ! TODO: Might become 1b
6950  ABI_MALLOC(bbp_mask, (mband, mband))
6951 
6952  ! =========================================
6953  ! Find FFT mesh and max number of g-vectors
6954  ! =========================================
6955  ! TODO: Can be decreased. Consider also fftgw
6956  gwr_boxcutmin_c = two
6957  !gwr_boxcutmin_c = one
6958  call gwr%get_u_ngfft(gwr_boxcutmin_c, u_ngfft, u_nfft, u_mgfft, u_mpw, gmax)
6959 
6960  ! Init work_ngfft
6961  gmax = gmax + 4 ! FIXME: this is to account for umklapp, should also consider Gamma-only and istwfk
6962  gmax = 2 * gmax + 1
6963  call ngfft_seq(work_ngfft, gmax)
6964  !write(std_out,*)"work_ngfft(1:3): ",work_ngfft(1:3)
6965  ABI_MALLOC(work, (2, work_ngfft(4), work_ngfft(5), work_ngfft(6)))
6966 
6967  if (gwr%comm%me == 0) then
6968    call print_ngfft(u_ngfft, header="FFT mesh for chi0 head/wings computation", unit=std_out)
6969    !call print_ngfft(u_ngfft, header="FFT mesh for chi0 head/wings computation", unit=ab_out)
6970  endif
6971 
6972  ! Need to broadcast G-vectors at q = 0 if k/q-point parallelism is activated.
6973  if (gwr%kpt_comm%me == 0) then
6974    npwe = gwr%tchi_desc_qibz(1)%npw
6975    ABI_CHECK(gwr%tchi_desc_qibz(1)%kin_sorted, "g-vectors are not sorted by |q+g|^2/2 !")
6976  end if
6977  call xmpi_bcast(npwe, 0, gwr%kpt_comm%value, ierr)
6978  ABI_MALLOC(gvec_q0, (3, npwe))
6979  if (gwr%kpt_comm%me == 0) gvec_q0 = gwr%tchi_desc_qibz(1)%gvec
6980  call xmpi_bcast(gvec_q0, 0, gwr%kpt_comm%value, ierr)
6981 
6982  ! This is needed to call accumulate_head_wings_imagw
6983  call gsph%init(cryst, npwe, gvec_q0)
6984 
6985  ABI_MALLOC(gbound_q0, (2 * u_mgfft + 8, 2))
6986  call sphereboundary(gbound_q0, istwfk1, gvec_q0, u_mgfft, npwe)
6987 
6988  ! Init little group to find IBZ_q
6989  use_umklp = 0
6990  call ltg_q%init([zero, zero, zero], gwr%nkbz, gwr%kbz, cryst, use_umklp, npwe) !, gvec=gvec_kss)
6991 
6992  nkpt_summed = gwr%nkbz
6993  if (dtset%symchi /= 0) then
6994    nkpt_summed = ltg_q%nibz_ltg
6995    call ltg_q%print(std_out, dtset%prtvol)
6996  end if
6997  !call wrtout(std_out, sjoin(' Calculation status: ', itoa(nkpt_summed), ' k-points to be completed'))
6998 
6999  ! ============================================
7000  ! === Begin big fat loop over transitions ====
7001  ! ============================================
7002 
7003  ! NB: One might reduce the number of bands as head and wings converge fast wrt nband and slow wrt k-mesh.
7004  ! Should introduce a tolerance on the frequency part computed at the first minimax frequency and
7005  ! compute max_nband from this.
7006 
7007  ! Find band1_max from gwr_max_hwtene
7008  band1_max = gwr%ugb_nband
7009 
7010  if (gwr%dtset%gwr_max_hwtene > zero) then
7011    ! Set e0 to top of valence band if semiconductor else Fermi level
7012    e0 = now_ebands%fermie
7013    if (all(gwr%ks_gaps%ierr == 0)) e0 = minval(gwr%ks_gaps%vb_max)
7014    do band1_start=1, gwr%ugb_nband
7015      if (all(qp_eig(band1_start,:,:) - e0 > gwr%dtset%gwr_max_hwtene)) then
7016        band1_max = band1_start; exit
7017      end if
7018    end do
7019  !else if (gwr%dtset%gwr_max_hwtene < zero) then
7020  !  band1_max = min(nint(-gwr%dtset%gwr_max_hwtene) gwr%ugb_nband)
7021  end if
7022 
7023  call wrtout(std_out, sjoin(" gwr_max_hwtene:", ftoa(gwr%dtset%gwr_max_hwtene * Ha_eV), " (eV)"))
7024  call wrtout(std_out, sjoin(" Using: ", itoa(band1_max), "/", itoa(gwr%ugb_nband), "bands for chi0 head and wings."))
7025 
7026  ! Loop on spin to calculate $\chi_{\up,\up} + \chi_{\down,\down}$
7027  ! TODO: nspinor 2
7028  nI = 1; nJ = 1; nomega = gwr%ntau
7029  omega(:) = j_dpc * gwr%iw_mesh(:)
7030  ABI_CALLOC(chi0_lwing, (npwe*nI, nomega, 3))
7031  ABI_CALLOC(chi0_uwing, (npwe*nJ, nomega, 3))
7032  ABI_CALLOC(chi0_head, (3, 3, nomega))
7033 
7034  ABI_MALLOC(u_gbound, (2 * u_mgfft + 8, 2))
7035  ABI_MALLOC(ur1_kibz, (u_nfft * nspinor))
7036  ABI_MALLOC(ur2_kibz, (u_nfft * nspinor))
7037  ABI_MALLOC(ur_prod, (u_nfft * nspinor))
7038  dim_rtwg = 1 !; if (nspinor==2) dim_rtwg=2 ! Can reduce size depending on Ep%nI and Ep%nj
7039  ABI_MALLOC(rhotwg, (npwe * dim_rtwg))
7040 
7041  ! TODO:
7042  ddkop = ddkop_new(dtset, gwr%cryst, gwr%pawtab, gwr%psps, gwr%mpi_enreg, u_mpw, u_ngfft)
7043 
7044  do my_is=1,gwr%my_nspins
7045    spin = gwr%my_spins(my_is)
7046 
7047    ! Loop over my k-points in the BZ.
7048    do my_ikf=1,gwr%my_nkbz
7049      ik_bz = gwr%my_kbz_inds(my_ikf); kk_bz = gwr%kbz(:, ik_bz)
7050      istwf_kf = 1
7051      !istwf_kf = gwt% ???
7052 
7053      if (dtset%symchi == 1 .and. ltg_q%ibzq(ik_bz) /= 1) CYCLE ! Only IBZ_q
7054      print_time = gwr%comm%me == 0 .and. (my_ikf <= LOG_MODK .or. mod(my_ikf, LOG_MODK) == 0)
7055      if (print_time) call cwtime(cpu_k, wall_k, gflops_k, "start")
7056 
7057      ! FIXME: Be careful with the symmetry conventions here!
7058      ! and the interplay between umklapp in q and FFT
7059      ! Also, the assembly_chi0 routines assume symrec and trev_k in [1, 2]
7060      ik_ibz = gwr%kbz2ibz_symrel(1, ik_bz); isym_k = gwr%kbz2ibz_symrel(2, ik_bz)
7061      trev_k = gwr%kbz2ibz_symrel(6, ik_bz); g0_k = gwr%kbz2ibz_symrel(3:5, ik_bz)
7062      isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
7063      kk_ibz = gwr%kibz(:, ik_ibz)
7064 
7065      ugb_kibz => gwr%ugb(ik_ibz, spin)
7066      desc_ki => gwr%green_desc_kibz(ik_ibz)
7067      npw_ki   =  desc_ki%npw
7068      istwf_ki =  desc_ki%istwfk
7069      kg_ki    => desc_ki%gvec
7070 
7071      ABI_MALLOC(ug1, (npw_ki * nspinor))
7072      ABI_MALLOC(ug2, (npw_ki * nspinor))
7073 
7074      call sphereboundary(u_gbound, istwf_ki, kg_ki, u_mgfft, npw_ki)
7075 
7076      if (gwr%usepaw == 0 .and. dtset%inclvkb /= 0 .and. gradk_not_done(ik_ibz)) then
7077        ! Include term <n,k|[Vnl,iqr]|n"k>' for q -> 0.
7078        call vkbr_init(vkbr(ik_ibz), cryst, gwr%psps, dtset%inclvkb, istwf_ki, npw_ki, kk_ibz, kg_ki)
7079        gradk_not_done(ik_ibz) = .FALSE.
7080      end if
7081 
7082      !call ddkop%setup_spin_kpoint(gwr%dtset, gwr%cryst, gwr%psps, spin, kk_bz, istwf_kk, npw_ki, kg_ki)
7083 
7084      !call wfd%copy_cg(ib_v, ik, spin, cg_v)
7085      !call ddkop%apply(ebands%eig(ib_v, ik, spin), npw_k, wfd%nspinor, cg_v, cwaveprj)
7086 
7087      !call wfd%copy_cg(ib_c, ik, spin, cg_c)
7088      !vv = ddkop%get_braket(ebands%eig(ib_c, ik, spin), istwf_k, npw_k, nspinor, cg_c, mode=ds%mode)
7089 
7090      ! HM: 24/07/2018
7091      ! Transform dipoles to be consistent with results from DFPT
7092      ! Perturbations with DFPT are along the reciprocal lattice vectors
7093      ! Perturbations with Commutator are along real space lattice vectors
7094      ! dot(A, DFPT) = X
7095      ! dot(B, COMM) = X
7096      ! B = 2 pi (A^{-1})^T =>
7097      ! dot(B^T B,COMM) = 2 pi DFPT
7098      !vr = (2*pi)*(2*pi)*sum(ihrc(:,:),dim=2)
7099      !vg(1) = dot_product(Cryst%gmet(1,:), vr)
7100      !vg(2) = dot_product(Cryst%gmet(2,:), vr)
7101      !vg(3) = dot_product(Cryst%gmet(3,:), vr)
7102 
7103      call chi0_bbp_mask(ik_ibz, ik_ibz, spin, spin_fact, use_tr, &
7104                         gwcomp0, spmeth0, gwr%ugb_nband, mband, now_ebands, bbp_mask)
7105      !bbp_mask = .True.
7106 
7107      ! FIXME: This part should be tested with tau/g-para
7108      ! TODO:
7109      !  1) Logic to determine block_size from memory.
7110      !  2) Add support for symchi = 0
7111      !  3) Invert the loops
7112 
7113      block_size = min(48, gwr%ugb_nband)
7114      !block_size = min(200, gwr%ugb_nband)
7115      !block_size = 1
7116 
7117      block_counter = 0
7118      do band1_start=1, gwr%ugb_nband, block_size
7119        block_counter = block_counter + 1
7120        ! Distribute blocks inside tau_comm as wavefunctions are replicated
7121        if (gwr%tau_comm%skip(block_counter)) cycle
7122 
7123        if (all(.not. bbp_mask(band1_start:, :))) then
7124          !print *, "exiting band1_start loop"
7125          exit
7126        end if
7127 
7128        !print *, "band1_start, gwr%ugb_nband, block_size", band1_start, gwr%ugb_nband, block_size
7129        nb = blocked_loop(band1_start, gwr%ugb_nband, block_size)
7130        band1_stop = band1_start + nb - 1
7131        if (band1_stop > band1_max) exit
7132 
7133        ! Collect nb bands starting from band1_start on each proc.
7134        call ugb_kibz%collect_cplx(npw_ki * nspinor, nb, [1, band1_start], ug1_block)
7135 
7136        ABI_MALLOC(gh1c_block, (2, npw_ki*nspinor, 3, nb))
7137        do il_b1=1, ugb_kibz%sizeb_local(2)
7138          band1 = ugb_kibz%loc2gcol(il_b1)
7139          eig_nk = gwr%ks_ebands%eig(band1, ik_ibz, spin)
7140 
7141          ! FIXME: This is wrong if spc
7142          !call c_f_pointer(c_loc(ugb_kibz%buffer_cplx(:,il_b1)), cwave, shape=[2, npw_ki*nspinor])
7143          !call ddkop%apply(eig_nk, npw_ki, nspinor, cwave, cwaveprj)
7144          !gh1c_block(:,:,:,xx_ib) = ddkop%gh1c(:, 1:npw_ki*nspinor,:)
7145        end do
7146 
7147        ! Loop over "conduction" states.
7148        !do band1=band1_start, band1_stop
7149        do ib=1,nb
7150          band1 = band1_start + ib - 1
7151          ug1 = ug1_block(:, ib)
7152          call fft_ug(npw_ki, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwf_ki, kg_ki, u_gbound, ug1, ur1_kibz)
7153          !call fft_ug(npw_ki, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwf_ki, kg_ki, u_gbound, ug1_block(:,ib), ur1_kibz)
7154 
7155          ! Loop over "valence" states.
7156          !do band2=1,gwr%ugb_nband
7157          do il_b2=1, ugb_kibz%sizeb_local(2)
7158            band2 = ugb_kibz%loc2gcol(il_b2)
7159 
7160            deltaeKS_b1b2 = ks_eig(band1, ik_ibz, spin) - ks_eig(band2, ik_ibz, spin)
7161            deltaf_b1b2  = spin_fact * (qp_occ(band1, ik_ibz, spin) - qp_occ(band2, ik_ibz, spin))
7162            deltaeGW_b1b2 = qp_eig(band1, ik_ibz, spin) - qp_eig(band2, ik_ibz, spin)
7163 
7164            ! Skip negligible transitions.
7165            if (abs(deltaf_b1b2) < GW_TOL_DOCC) CYCLE
7166            ! Adler-Wiser expression.
7167            ! Add small imaginary of the Time-Ordered response function but only for non-zero real omega
7168            ! FIXME What about metals?
7169            if (.not. use_tr) then
7170              ! Adler-Wiser without time-reversal.
7171              do io=1,nomega
7172                green_w(io) = g0g0w(omega(io), deltaf_b1b2, deltaeGW_b1b2, zcut, GW_TOL_W0, one_pole)
7173              end do
7174 
7175            else
7176              if (band1 < band2) CYCLE ! Here we GAIN a factor ~2
7177 
7178              do io=1,nomega
7179                ! Rangel: In metals, the intra-band transitions term does not contain the antiresonant part
7180                ! if(abs(deltaeGW_b1b2)>GW_TOL_W0) green_w(io) = g0g0w(omega(io),deltaf_b1b2,deltaeGW_b1b2,zcut,GW_TOL_W0)
7181                if (band1 == band2) green_w(io) = g0g0w(omega(io), deltaf_b1b2, deltaeGW_b1b2, zcut, GW_TOL_W0, one_pole)
7182                if (band1 /= band2) green_w(io) = g0g0w(omega(io), deltaf_b1b2, deltaeGW_b1b2, zcut, GW_TOL_W0, two_poles)
7183              end do
7184            end if
7185 
7186            ug2 = ugb_kibz%buffer_cplx(:, il_b2)
7187            call fft_ug(npw_ki, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwf_ki, kg_ki, u_gbound, ug2, ur2_kibz)
7188 
7189            ! FIXME: nspinor 2 is wrong as we have a 2x2 matrix
7190            ur_prod(:) = conjg(ur1_kibz(:)) * ur2_kibz
7191            call fft_ur(npwe, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwfk1, gvec_q0, gbound_q0, ur_prod, rhotwg)
7192 
7193            if (gwr%usepaw == 0) then
7194              ! Matrix elements of i[H,r] for NC pseudopotentials.
7195              ! NB ug1 and ug2 are kind=gwpc
7196              rhotwx = nc_ihr_comm(vkbr(ik_ibz), cryst, gwr%psps, npw_ki, nspinor, istwf_ki, gwr%dtset%inclvkb, &
7197                                   kk_ibz, ug1, ug2, kg_ki)
7198            end if
7199 
7200            ! Treat a possible degeneracy between v and c.
7201            ! Adler-Wiser expression, to be consistent here we use the KS eigenvalues (?)
7202            if (abs(deltaeKS_b1b2) > GW_TOL_W0) then
7203              rhotwx = -rhotwx / deltaeKS_b1b2
7204            else
7205              rhotwx = czero_gw
7206            end if
7207 
7208            !new_rhotwx = zero
7209            !gh1c_block(:,:,:,ib) = ddkop%gh1c(:, 1:npw_ki*nspinor,:)
7210 
7211            ! NB: Using symrec conventions here
7212            ik_ibz = gwr%kbz2ibz(1, ik_bz); isym_k = gwr%kbz2ibz(2, ik_bz)
7213            trev_k = gwr%kbz2ibz(6, ik_bz); g0_k = gwr%kbz2ibz(3:5, ik_bz)
7214            trev_k = trev_k + 1  ! NB: GW routines assume trev in [1, 2]
7215 
7216            ! TODO: Metals
7217            call accumulate_head_wings_imagw( &
7218                                         npwe, nomega, nI, nJ, dtset%symchi, &
7219                                         is_metallic, ik_bz, isym_k, trev_k, nspinor, cryst, ltg_q, gsph, &
7220                                         rhotwx, rhotwg, green_w, chi0_head, chi0_lwing, chi0_uwing)
7221          end do ! band2
7222        end do ! band1
7223 
7224        ABI_FREE(ug1_block)
7225        ABI_SFREE(gh1c_block)
7226      end do ! band1_start
7227 
7228      !if (gwr%usepaw == 0 .and. dtset%inclvkb /= 0 .and. dtset%symchi == 1) then
7229      !  call vkbr_free(vkbr(ik_ibz)) ! Not need anymore as we loop only over IBZ.
7230      !end if
7231 
7232      ABI_FREE(ug1)
7233      ABI_FREE(ug2)
7234      if (print_time) then
7235        write(msg,'(4x,3(a,i0),a)')"my_ikf [", my_ikf, "/", gwr%my_nkbz, "] (tot: ", gwr%nkbz, ")"
7236        call cwtime_report(msg, cpu_k, wall_k, gflops_k); if (my_ikf == LOG_MODK) call wrtout(std_out, " ...")
7237      end if
7238    end do ! my_ikf
7239  end do ! my_is
7240 
7241  call ddkop%free()
7242  ABI_FREE(bbp_mask)
7243  ABI_FREE(gvec_q0)
7244  ABI_FREE(gbound_q0)
7245  ABI_FREE(work)
7246  ABI_FREE(ur1_kibz)
7247  ABI_FREE(ur2_kibz)
7248  ABI_FREE(ur_prod)
7249  ABI_FREE(rhotwg)
7250  ABI_FREE(u_gbound)
7251  call vkbr_free(vkbr)
7252  ABI_FREE(vkbr)
7253 
7254  ! Collect head and wings.
7255  call xmpi_sum(chi0_head, gwr%comm%value, ierr)
7256  call xmpi_sum(chi0_lwing, gwr%comm%value, ierr)
7257  call xmpi_sum(chi0_uwing, gwr%comm%value, ierr)
7258 
7259  chi0_head = chi0_head * weight / cryst%ucvol
7260  ! Tensor in terms of reciprocal lattice vectors.
7261  do io=1,nomega
7262    chi0_head(:,:,io) = matmul(chi0_head(:,:,io), cryst%gmet) * (two_pi**2)
7263  end do
7264  chi0_lwing = chi0_lwing * weight / cryst%ucvol
7265  chi0_uwing = chi0_uwing * weight / cryst%ucvol
7266 
7267  ! ===============================================
7268  ! ==== Symmetrize chi0 in case of AFM system ====
7269  ! ===============================================
7270  ! Reconstruct $chi0{\down,\down}$ from $chi0{\up,\up}$.
7271  ! Works only in the case of magnetic group Shubnikov type IV.
7272  if (cryst%use_antiferro) then
7273    call symmetrize_afm_chi0(Cryst, gsph, ltg_q, npwe, nomega, &
7274      chi0_head=chi0_head, chi0_lwing=chi0_lwing, chi0_uwing=chi0_uwing)
7275  end if
7276 
7277  if (gwr%comm%me == 0 .and. gwr%dtset%prtvol >= 1) then
7278    ! Construct head and wings from the tensor and output results.
7279    qlen = tol3
7280    call cryst%get_redcart_qdirs(nq, qdirs, qlen=qlen)
7281    ABI_MALLOC(head_qvals, (nq))
7282    call wrtout(units, " Head of the irreducible polarizability for q --> 0", pre_newlines=1)
7283    call wrtout(units, sjoin(" q0_len:", ftoa(qlen), "(Bohr^-1)"))
7284    write(msg, "(*(a14))") "iomega (eV)", "[100]", "[010]", "[001]", "x", "y", "z"
7285    call wrtout(units, msg)
7286    do io=1,nomega
7287      do iq=1,nq
7288        chq = matmul(chi0_head(:,:,io), qdirs(:,iq))
7289        head_qvals(iq) = vdotw(qdirs(:, iq), chq, cryst%gmet, "G")
7290      end do
7291      write(msg, "(*(es12.5,2x))") gwr%iw_mesh(io) * Ha_eV, real(head_qvals(:))
7292      call wrtout(units, msg)
7293      ! Write imag part to std_out only
7294      write(msg, "(*(es12.5,2x))") gwr%iw_mesh(io) * Ha_eV, aimag(head_qvals(:))
7295      call wrtout(std_out, msg)
7296    end do
7297    call wrtout(units, " ")
7298    ABI_FREE(qdirs)
7299    ABI_FREE(head_qvals)
7300  end if
7301 
7302  ! Save quantities for later use as this routine must be called before build_tchi.
7303  if (gwr%kpt_comm%me == 0) then
7304    ABI_REMALLOC(gwr%chi0_head_myw, (3, 3, gwr%my_ntau) )
7305    ABI_REMALLOC(gwr%chi0_uwing_myw, (3, npwe, gwr%my_ntau) )
7306    ABI_REMALLOC(gwr%chi0_lwing_myw, (3, npwe, gwr%my_ntau) )
7307 
7308    do my_it=1,gwr%my_ntau
7309      itau = gwr%my_itaus(my_it)
7310      gwr%chi0_head_myw(:,:,my_it) = chi0_head(:,:,itau)
7311      do ii=1,3
7312        gwr%chi0_uwing_myw(ii,:,my_it) = chi0_uwing(:,itau,ii)
7313        gwr%chi0_lwing_myw(ii,:,my_it) = chi0_lwing(:,itau,ii)
7314      end do
7315    end do
7316  end if
7317 
7318  ABI_FREE(chi0_lwing)
7319  ABI_FREE(chi0_uwing)
7320  ABI_FREE(chi0_head)
7321  call ltg_q%free()
7322  call gsph%free()
7323 
7324  call cwtime_report(" gwr_build_chi0_head_and_wings:", cpu_all, wall_all, gflops_all)
7325  call timab(1927, 2, tsec)
7326 
7327 end subroutine gwr_build_chi0_head_and_wings

m_gwr/gwr_build_green [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_green

FUNCTION

  Build Green's functions in imaginary time from the gwr%ugb matrices stored in memory.
  Store only G_k for the IBZ k-points treated by this MPI proc.

INPUTS

  free_ugb: True if the gwr%ugb wavefunctions should be deallocated before returning.

SOURCE

2383 subroutine gwr_build_green(gwr, free_ugb)
2384 
2385 !Arguments ------------------------------------
2386  class(gwr_t),target,intent(inout) :: gwr
2387  logical,intent(in) :: free_ugb
2388 
2389 !Local variables-------------------------------
2390 !scalars
2391  integer :: my_is, my_iki, spin, ik_ibz, band, itau, ipm, il_b, npwsp, isgn, my_it, nb_occ
2392  real(dp) :: f_nk, eig_nk, cpu, wall, gflops, cpu_k, wall_k, gflops_k
2393  logical :: print_time
2394  character(len=500) :: msg
2395  real(dp) :: gt_rfact
2396  type(__slkmat_t), target :: work_gb, green
2397 !arrays
2398  integer :: mask_kibz(gwr%nkibz), units(2), ija(2), ijb(2)
2399  real(dp) :: tsec(2)
2400  real(dp),contiguous, pointer :: qp_eig(:,:,:), qp_occ(:,:,:)
2401 
2402 ! *************************************************************************
2403 
2404  call cwtime(cpu, wall, gflops, "start")
2405  call timab(1922, 1, tsec)
2406  units = [std_out, ab_out]
2407 
2408  ! Use KS or QP energies depending on the iteration state.
2409  if (gwr%scf_iteration == 1) then
2410    call wrtout(units, " Building Green's functions from KS orbitals and KS energies...", &
2411                pre_newlines=2, newlines=1, do_flush=.True.)
2412    qp_eig => gwr%ks_ebands%eig; qp_occ => gwr%ks_ebands%occ
2413    msg = sjoin("Fermi energy is not set to zero! fermie:", ftoa(gwr%ks_ebands%fermie))
2414    ABI_CHECK(abs(gwr%ks_ebands%fermie) < tol12, msg)
2415 
2416    ! Allocate my Green's functions in IBZ if this is the first iteration.
2417    mask_kibz = 0; mask_kibz(gwr%my_kibz_inds(:)) = 1
2418    call gwr%malloc_free_mats(mask_kibz, "green", "malloc")
2419 
2420  else
2421    call wrtout(units, " Building Green's functions from KS orbitals and QP energies...", &
2422                pre_newlines=2, newlines=1, do_flush=.True.)
2423    qp_eig => gwr%qp_ebands%eig; qp_occ => gwr%qp_ebands%occ
2424    msg = sjoin("Fermi energy is not set to zero! fermie:", ftoa(gwr%qp_ebands%fermie))
2425    ABI_CHECK(abs(gwr%qp_ebands%fermie) < tol12, msg)
2426  end if
2427 
2428  ABI_CHECK(allocated(gwr%ugb), "gwr%ugb array should be allocated!")
2429 
2430  do my_is=1,gwr%my_nspins
2431    spin = gwr%my_spins(my_is)
2432    ! Loop over my k-points in the IBZ
2433    do my_iki=1,gwr%my_nkibz
2434      print_time = gwr%comm%me == 0 .and. (my_iki < LOG_MODK .or. mod(my_iki, LOG_MODK) == 0)
2435      if (print_time) call cwtime(cpu_k, wall_k, gflops_k, "start")
2436      ik_ibz = gwr%my_kibz_inds(my_iki)
2437      associate (ugb_ks => gwr%ugb(ik_ibz, spin), desc_k => gwr%green_desc_kibz(ik_ibz))
2438      npwsp = desc_k%npw * gwr%nspinor
2439 
2440      call ugb_ks%copy(work_gb)
2441      !call ugb_ks%change_size_blocs(work_gb, size_blocs=, processor=)
2442      !call work_gb%copy(green, empty=.True.)
2443 
2444      ! Init output of pzgemm in g-communicator
2445      call green%init(npwsp, npwsp, gwr%g_slkproc, istwfk1) ! size_blocs=[-1, col_bsize])
2446 
2447      ! Loop over my_ntau as pzgemm is MPI-parallelized inside g_comm.
2448      do my_it=1,gwr%my_ntau
2449        itau = gwr%my_itaus(my_it)
2450        do ipm=1,2
2451          ! Multiply my columns by exponentials in imaginary time.
2452          work_gb%buffer_cplx = ugb_ks%buffer_cplx
2453 
2454          !$OMP PARALLEL DO PRIVATE(band, f_nk, eig_nk, gt_rfact)
2455          do il_b=1, work_gb%sizeb_local(2)
2456            band = work_gb%loc2gcol(il_b)
2457            f_nk = qp_occ(band, ik_ibz, spin)
2458            eig_nk = qp_eig(band, ik_ibz, spin)
2459            gt_rfact = zero
2460            if (ipm == 2) then
2461              if (eig_nk < -tol6) gt_rfact = exp(gwr%tau_mesh(itau) * eig_nk)
2462            else
2463              if (eig_nk > tol6) gt_rfact = exp(-gwr%tau_mesh(itau) * eig_nk)
2464            end if
2465 
2466            !work_gb%buffer_cplx(:,il_b) = work_gb%buffer_cplx(:,il_b) * sqrt(gt_rfact)
2467            call xscal(npwsp, real(sqrt(gt_rfact), kind=gwpc), work_gb%buffer_cplx(:,il_b), 1)
2468          end do ! il_b
2469 
2470          ! Now build G(g,g',ipm) with PZGEMM.
2471          isgn = merge(1, -1, ipm == 2)
2472          ija = [1, 1]; ijb = [1, 1]
2473          ! TODO: optimize
2474          nb_occ = -1
2475          !if (ipm == 1) then
2476          !  ija = [1, nb_occ]; ijb = ija
2477          !else
2478          !  ija = [nb_occ+1, gwr%ugb_nband]; ijb = ija
2479          !end if
2480          call slk_pgemm("N", "C", work_gb, isgn * cone_gw, work_gb, czero_gw, green, ija=ija, ijb=ijb)
2481 
2482          ! Redistribute data.
2483          call gwr%gt_kibz(ipm, ik_ibz, itau, spin)%take_from(green)
2484        end do ! ipm
2485      end do ! itau
2486 
2487      call work_gb%free(); call green%free()
2488      ! Free wavefunctions if asked for.
2489      if (free_ugb) call ugb_ks%free()
2490 
2491      if (print_time) then
2492        write(msg,'(4x,3(a,i0),a)')"G_ikbz [", my_iki, "/", gwr%my_nkibz, "] (tot: ", gwr%nkibz, ")"
2493        call cwtime_report(msg, cpu_k, wall_k, gflops_k); if (my_iki == LOG_MODK) call wrtout(std_out, " ...")
2494      end if
2495      end associate
2496    end do ! my_iki
2497  end do ! my_is
2498 
2499  if (gwr%dtset%prtvol > 0) call gwr_print_trace(gwr, "gt_kibz")
2500  call gwr%print_mem(unit=std_out)
2501 
2502  call cwtime_report(" gwr_build_green:", cpu, wall, gflops)
2503  call timab(1922, 2, tsec)
2504 
2505 end subroutine gwr_build_green

m_gwr/gwr_build_sigmac [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_sigmac

FUNCTION

  Build Sigma_c(i tau) and compute matrix elements in the KS basis set.

INPUTS

OUTPUT

SOURCE

5103 subroutine gwr_build_sigmac(gwr)
5104 
5105 !Arguments ------------------------------------
5106  class(gwr_t),target,intent(inout) :: gwr
5107 
5108 #ifndef FC_CRAY
5109 !Local variables-------------------------------
5110 !scalars
5111  integer,parameter :: master = 0
5112  integer :: my_is, my_it, spin, ikcalc_ibz, ik_ibz, sc_nfft, my_ir, my_nr, iw, idat, max_ndat, ndat, ii, jj, irow
5113  integer :: iq_ibz, iq_bz, itau, ierr, ibc, bmin, bmax, band, band1
5114  integer :: band2, band2_start, band2_stop, nbc, ib1, ib2, pade_npts
5115  integer :: my_ikf, ipm, ik_bz, ikcalc, uc_ir, ir, ncid, col_bsize, nrsp, sc_nfftsp
5116  integer :: isym_k, trev_k, g0_k(3), tsign_k !, b1gw, b2gw, ! npwsp, my_iqi, sc_ir, ig, my_iqf,
5117  integer(kind=XMPI_ADDRESS_KIND) :: buf_count
5118  integer :: gt_scbox_win, wct_scbox_win, use_umklp, ideg, nstates
5119  real(dp) :: cpu_tau, wall_tau, gflops_tau, cpu_all, wall_all, gflops_all !, cpu, wall, gflops
5120  real(dp) :: mem_mb, cpu_ir, wall_ir, gflops_ir, cpu_ikf, wall_ikf, gflops_ikf
5121  real(dp) :: max_abs_imag_wct, max_abs_re_wct, sck_ucvol, scq_ucvol, wtqm, wtqp
5122  logical :: k_is_gamma, use_shmem_for_k, use_mpi_for_k, isirr_k
5123  logical :: compute_this_kbz, print_time, define, sigc_is_herm, band_inversion
5124  character(len=500) :: msg
5125  !type(desc_t), pointer :: desc_q !, desc_k
5126  type(yamldoc_t) :: ydoc
5127  type(c_ptr) :: void_ptr
5128 !arrays
5129  integer :: sc_ngfft(18), need_qibz(gwr%nqibz), got_qibz(gwr%nqibz), units(2), dat_units(3), g0_q(3) ! gg(3),
5130  integer,allocatable :: green_scgvec(:,:), wc_scgvec(:,:)
5131  real(dp) :: kk_bz(3), kcalc_bz(3), qq_bz(3), tsec(2)  !, qq_ibz(3)
5132  complex(gwpc) :: cpsi_r, sigc_pm(2)
5133  complex(dp) :: odd_t(gwr%ntau), even_t(gwr%ntau), avg_2ntau(2,gwr%ntau)
5134  complex(dp),target,allocatable :: sigc_it_mat(:,:,:,:,:,:)
5135  complex(gwpc) ABI_ASYNC, contiguous, pointer :: gt_scbox(:,:,:), wct_scbox(:,:)
5136  complex(gwpc),allocatable :: uc_psir_bk(:,:,:), scph1d_kcalc(:,:,:), uc_ceikr(:), ur(:)
5137  type(__slkmat_t) :: gt_gpr(2, gwr%my_nkbz), gk_rpr_pm(2), sigc_rpr(2,2,gwr%nkcalc), wc_rpr, wc_gpr(gwr%my_nqbz)
5138  type(desc_t), target :: desc_mykbz(gwr%my_nkbz), desc_myqbz(gwr%my_nqbz)
5139  type(fftbox_plan3_t) :: green_plan, wt_plan
5140  type(littlegroup_t) :: ltg_kcalc(gwr%nkcalc)
5141  type(gaps_t) :: new_gaps
5142  integer :: band_val, ibv, ncerr, unt_it, unt_iw, unt_rw
5143  real(dp) :: e0, ks_gap, qp_gap, qp_pade_gap, sigx, vxc_val, vu, v_meanf, eshift, sigma_fact
5144  complex(dp) :: zz, zsc, sigc_e0__, dsigc_de0, z_e0, sig_xc, hhartree_bk, qp_ene, qp_ene_prev
5145  integer,allocatable :: iperm(:)
5146  integer :: gt_request, wct_request
5147  real(dp),allocatable :: sorted_qpe(:)
5148  real(dp) :: e0_kcalc(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol), rw_mesh(gwr%nwr)
5149  real(dp) :: spfunc_diag(gwr%nwr, gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5150  integer :: pade_solver_ierr(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5151  real(dp) :: ks_gaps(gwr%nkcalc, gwr%nsppol), qpz_gaps(gwr%nkcalc, gwr%nsppol) !, qp_pade_gaps(gwr%nkcalc, gwr%nsppol)
5152  complex(dp) :: ze0_kcalc(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5153  complex(dp) :: sigc_e0(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5154  complex(dp) :: qpz_ene(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol), imag_zmesh(gwr%ntau)
5155  complex(dp) :: qp_pade(gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5156  complex(dp) :: sigxc_rw_diag(gwr%nwr, gwr%b1gw:gwr%b2gw, gwr%nkcalc, gwr%nsppol)
5157  type(sigma_pade_t) :: spade
5158  type(sigijtab_t),allocatable :: Sigxij_tab(:,:), Sigcij_tab(:,:)
5159 
5160 ! *************************************************************************
5161  call cwtime(cpu_all, wall_all, gflops_all, "start")
5162  call timab(1925, 1, tsec)
5163 
5164  ABI_CHECK(gwr%wc_space == "itau", sjoin("wc_space: ", gwr%wc_space, " != itau"))
5165 
5166  !mask_kibz = 0; mask_kibz(gwr%my_kibz_inds(:)) = 1
5167  !call gwr%malloc_free_mats(mask_kibz, "sigma" "malloc")
5168 
5169  !if (gwr%scf_iteration == 1) then
5170  !else
5171  !end if
5172 
5173  ! Set FFT mesh in the supercell.
5174  ! Be careful when using the FFT plan as ndat can change inside the loop if we start to block.
5175  ! Perhaps the safest approach would be to generate the plan on the fly.
5176 
5177  sc_ngfft = gwr%g_ngfft
5178  sc_ngfft(1:3) = gwr%ngkpt * gwr%g_ngfft(1:3)
5179  sc_ngfft(4:6) = sc_ngfft(1:3)
5180  sc_nfft = product(sc_ngfft(1:3)); sc_nfftsp = sc_nfft * gwr%nspinor
5181  !sc_mgfft = maxval(sc_ngfft(1:3))
5182  sck_ucvol = gwr%cryst%ucvol * product(gwr%ngkpt)
5183  scq_ucvol = gwr%cryst%ucvol * product(gwr%ngqpt)
5184 
5185  ! Set FFT mesh used to compute u(r) in the unit cell.
5186  call gwr%kcalc_wfd%change_ngfft(gwr%cryst, gwr%psps, gwr%g_ngfft)
5187 
5188  ! Table for \Sigmac_ij matrix elements.
5189  sigc_is_herm = .False.
5190  call sigtk_sigma_tables(gwr%nkcalc, gwr%nkibz, gwr%nsppol, gwr%bstart_ks, gwr%bstop_ks, gwr%kcalc2ibz(:,1), &
5191                          gwr%sig_diago, sigc_is_herm, sigxij_tab, sigcij_tab)
5192 
5193  call sigijtab_free(Sigxij_tab)
5194  ABI_FREE(Sigxij_tab)
5195 
5196  units = [std_out, ab_out]
5197  !if (gwr%sig_diago) then
5198  !  call wrtout(units, " Computing diagonal matrix elements of Sigma_c", pre_newlines=1)
5199  !else
5200  !  call wrtout(units, " Computing diagonal + off-diagonal matrix elements of Sigma_c", pre_newlines=1)
5201  !end if
5202 
5203  ! Allocate matrix elements Sigmac_(itau) in the KS basis set.
5204  ii = gwr%b1gw; jj = gwr%b2gw
5205  if (gwr%sig_diago) then
5206    ii = 1; jj = 1
5207  end if
5208  ABI_CALLOC(sigc_it_mat, (2, gwr%ntau, gwr%b1gw:gwr%b2gw, ii:jj, gwr%nkcalc, gwr%nsppol))
5209  ABI_RECALLOC(gwr%sigc_iw_mat, (gwr%ntau, gwr%b1gw:gwr%b2gw, ii:jj, gwr%nkcalc, gwr%nsppol))
5210 
5211  max_abs_imag_wct = zero; max_abs_re_wct = zero
5212  call gwr%print_mem(unit=std_out)
5213 
5214 if (gwr%use_supercell_for_sigma) then
5215 
5216  ! NOTE:
5217  ! There are two possibilities here:
5218  !
5219  ! 1) Compute the matrix elements of Sigma_c in the KS basis set by integrating over the real-space supercell.
5220  !
5221  ! 2) Compute and store Sigma_c^k(g,g',iomega) and then compute the matrix elements in g-space.
5222  !
5223  ! The first option requires less memory provided we are interested in a small set of KS states.
5224  ! The second option is interesting if we need to compute several matrix elements, including off-diagonal terms.
5225  call print_sigma_header()
5226 
5227  max_ndat = gwr%sc_batch_size
5228  use_mpi_for_k = gwr%sc_batch_size > 1 .and. gwr%sc_batch_size == gwr%kpt_comm%nproc
5229  use_mpi_for_k = .False.
5230 
5231  use_shmem_for_k = gwr%sc_batch_size == gwr%kpt_comm%nproc .and. gwr%kpt_comm%nproc > 1
5232  use_shmem_for_k = use_shmem_for_k .and. gwr%kpt_comm%can_use_shmem()
5233  !use_shmem_for_k = .False.
5234 
5235  if (use_shmem_for_k) then
5236    buf_count = 2 * (sc_nfftsp * max_ndat * 2)
5237    call gwr%kpt_comm%allocate_shared_master(buf_count, gwpc, xmpi_info_null, void_ptr, gt_scbox_win)
5238    call c_f_pointer(void_ptr, gt_scbox, shape=[sc_nfftsp, max_ndat, 2])
5239    buf_count = 2 * (sc_nfftsp * max_ndat)
5240    call gwr%kpt_comm%allocate_shared_master(buf_count, gwpc, xmpi_info_null, void_ptr, wct_scbox_win)
5241    call c_f_pointer(void_ptr, wct_scbox, shape=[sc_nfftsp, max_ndat])
5242  end if
5243 
5244  call wrtout(std_out, sjoin(" use_mpi_for_k:", yesno(use_mpi_for_k)))
5245  call wrtout(std_out, sjoin(" use_shmem_for_k:", yesno(use_shmem_for_k)))
5246  mem_mb = 3 * (sc_nfftsp * max_ndat * gwpc) * b2Mb
5247  call wrtout(std_out, sjoin(" Memory for gt_scbox/wct_scbox arrays:", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
5248 
5249  if (.not. use_shmem_for_k) then
5250    ABI_CALLOC(gt_scbox, (sc_nfft * gwr%nspinor, max_ndat, 2))
5251    ABI_CALLOC(wct_scbox, (sc_nfft * gwr%nspinor, max_ndat))
5252  end if
5253 
5254  ! Build plans for dense FFTs.
5255  call green_plan%from_ngfft(sc_ngfft, gwr%nspinor*max_ndat*2, gwr%dtset%gpu_option)
5256  call wt_plan%from_ngfft(sc_ngfft, gwr%nspinor*max_ndat, gwr%dtset%gpu_option)
5257 
5258  sigma_fact = one / (sck_ucvol * scq_ucvol)
5259 
5260  ! The g-vectors in the supercell for G and tchi.
5261  ABI_MALLOC(green_scgvec, (3, gwr%green_mpw))
5262  ABI_MALLOC(wc_scgvec, (3, gwr%tchi_mpw))
5263 
5264  do my_is=1,gwr%my_nspins
5265    spin = gwr%my_spins(my_is)
5266 
5267    ! Load wavefunctions for GW corrections in the unit cell.
5268    ! TODO: MPI distribute or use MPI shared memory
5269    bmin = minval(gwr%bstart_ks(:, spin)); bmax = maxval(gwr%bstop_ks(:, spin))
5270    ABI_MALLOC_OR_DIE(uc_psir_bk, (gwr%g_nfft * gwr%nspinor, bmin:bmax, gwr%nkcalc), ierr)
5271    ABI_MALLOC(ur, (gwr%g_nfft * gwr%nspinor))
5272    ABI_MALLOC(uc_ceikr, (gwr%g_nfft * gwr%nspinor))
5273 
5274    do ikcalc=1,gwr%nkcalc
5275      kcalc_bz = gwr%kcalc(:, ikcalc); ikcalc_ibz = gwr%kcalc2ibz(ikcalc, 1)  ! NB: Assuming wfs in the IBZ.
5276      ! Compute e^{ik.r} phases in the unit cell.
5277      call calc_ceikr(kcalc_bz, gwr%g_ngfft, gwr%g_nfft, gwr%nspinor, uc_ceikr)
5278 
5279      do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5280        call gwr%kcalc_wfd%get_ur(band, ikcalc_ibz, spin, ur)
5281        uc_psir_bk(:, band, ikcalc) = ur * uc_ceikr
5282      end do
5283    end do ! ikcalc
5284 
5285    ABI_FREE(ur)
5286    ABI_FREE(uc_ceikr)
5287 
5288    ! Pre-compute one-dimensional factors to get 3d e^{ik.L}
5289    call get_1d_sc_phases(gwr%ngkpt, gwr%nkcalc, gwr%kcalc, scph1d_kcalc)
5290 
5291    ! Construct Sigma(itau) in the supercell.
5292    do my_it=1,gwr%my_ntau
5293      call cwtime(cpu_tau, wall_tau, gflops_tau, "start")
5294      itau = gwr%my_itaus(my_it)
5295      !if (my_it == 1 .and. gwr%comm%me == 0) call gwr%pstat%print([std_out], reload=.True.)
5296 
5297      ! G_k(g,g') --> G_k(g',r) e^{ik.r} for each k in the BZ treated by me.
5298      call gwr%get_myk_green_gpr(itau, spin, desc_mykbz, gt_gpr)
5299 
5300      ! Wc_q(g,g') --> Wc_q(g',r) e^{iq.r} for each q in the BZ treated by me.
5301      call gwr%get_myq_wc_gpr(itau, spin, desc_myqbz, wc_gpr)
5302      !if (my_it == 1 .and. gwr%comm%me == 0) call gwr%pstat%print([std_out], reload=.True.)
5303 
5304      my_nr = gt_gpr(1,1)%sizeb_local(2)
5305      ABI_CHECK(my_nr == wc_gpr(1)%sizeb_local(2), "my_nr != wc_gpr(1)%sizeb_local(2)")
5306 
5307      ! Loop over r in the unit cell that is now MPI-distributed inside g_comm.
5308      do my_ir=1, my_nr, gwr%sc_batch_size
5309        print_time = (gwr%comm%me == 0 .and. (my_ir <= 3 * gwr%sc_batch_size .or. mod(my_ir, LOG_MODR) == 0))
5310        if (print_time) call cwtime(cpu_ir, wall_ir, gflops_ir, "start")
5311        ndat = blocked_loop(my_ir, my_nr, gwr%sc_batch_size)
5312        uc_ir = gt_gpr(1,1)%loc2gcol(my_ir)  ! FIXME: This won't work if nspinor 2
5313 
5314        ! TODO: Should block using nproc in kpt_comm, scatter data and perform multiple FFTs in parallel.
5315 if (.not. use_shmem_for_k) then
5316 
5317        ! Insert G_k(g',r) in G'-space in the supercell FFT box (ndat vectors starting at my_ir).
5318        call gwr%gk_to_scbox(sc_ngfft, desc_mykbz, green_scgvec, my_ir, ndat, gt_gpr, gt_scbox)
5319        if (gwr%kpt_comm%nproc > 1) call xmpi_isum_ip(gt_scbox, gwr%kpt_comm%value, gt_request, ierr)
5320 
5321        ! Insert Wc_q(g',r) in G'-space in the supercell FFT box (ndat vectors starting at my_ir)
5322        call gwr%wcq_to_scbox(sc_ngfft, desc_myqbz, wc_scgvec, my_ir, ndat, wc_gpr, wct_scbox)
5323        if (gwr%kpt_comm%nproc > 1) call xmpi_isum_ip(wct_scbox, gwr%kpt_comm%value, wct_request, ierr)
5324 
5325        ! G(G',r) --> G(R',r)
5326        if (gwr%kpt_comm%nproc > 1) call xmpi_wait(gt_request, ierr)
5327        call green_plan%execute(gt_scbox(:,1,1), -1, iscale=0)
5328 
5329        ! Wc(G',r) --> Wc(R',r)
5330        if (gwr%kpt_comm%nproc > 1) call xmpi_wait(wct_request, ierr)
5331        call wt_plan%execute(wct_scbox(:,1), -1, iscale=0)
5332 
5333        ! Use gt_scbox to store GW (R',r, +/- i tau) for this set of ndat r-point
5334        gt_scbox(:,:,1) = gt_scbox(:,:,1) * wct_scbox(:,:) * sigma_fact
5335        gt_scbox(:,:,2) = gt_scbox(:,:,2) * wct_scbox(:,:) * sigma_fact
5336        !print *, "Maxval abs imag G:", maxval(abs(aimag(gt_scbox)))
5337 
5338 else
5339        ! use_shmem_for_k --> MPI shared window version. Only gt_scbox are wct_scbox are shared.
5340        call gwr%gk_to_scbox(sc_ngfft, desc_mykbz, green_scgvec, my_ir, ndat, gt_gpr, gt_scbox, &
5341                             gt_scbox_win=gt_scbox_win)
5342 
5343        call gwr%wcq_to_scbox(sc_ngfft, desc_myqbz, wc_scgvec, my_ir, ndat, wc_gpr, wct_scbox, &
5344                              wct_scbox_win=wct_scbox_win)
5345 
5346        ! Now each MPI proc operates on different idat entries.
5347        call xmpi_win_fence(gt_scbox_win)
5348        idat = gwr%kpt_comm%me + 1
5349        if (idat <= ndat) then
5350          call wt_plan%execute(wct_scbox(:,idat), -1, ndat=gwr%nspinor, iscale=0)
5351          do ipm=1,2
5352            call green_plan%execute(gt_scbox(:,idat,ipm), -1, ndat=gwr%nspinor, iscale=0)
5353            gt_scbox(:,idat,ipm) = gt_scbox(:,idat,ipm) * wct_scbox(:,idat) * sigma_fact
5354          end do
5355        end if
5356        !IF (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) CALL MPI_F_SYNC_REG(gt_scbox)
5357        !call xmpi_barrier(gwr%kpt_comm%value)
5358        call xmpi_win_fence(gt_scbox_win)
5359 end if
5360 
5361        ! Integrate Sigma matrix elements in the R-supercell for ndat r-points and accumulate.
5362        ! possibly including off-diagonal terms.
5363        do ikcalc=1,gwr%nkcalc
5364          if (gwr%kpt_comm%skip(ikcalc)) cycle ! FIXME: Temporary hack till I find a better MPI algo for k-points.
5365          k_is_gamma = normv(gwr%kcalc(:,ikcalc), gwr%cryst%gmet, "G") < GW_TOLQ0
5366 
5367          do band2=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5368          do irow=1,Sigcij_tab(ikcalc, spin)%col(band2)%size1
5369            band1 = Sigcij_tab(ikcalc, spin)%col(band2)%bidx(irow)
5370            do idat=1,ndat
5371              !if (use_shmem_for_k .and. idat /= gwr%kpt_comm + 1) cycle
5372              ir = uc_ir + idat - 1
5373              cpsi_r = conjg(uc_psir_bk(ir, band1, ikcalc))
5374              do ipm=1,2
5375                call sc_sum(gwr%ngkpt, gwr%g_ngfft, gwr%nspinor, scph1d_kcalc(:,:,ikcalc), k_is_gamma, &
5376                            cpsi_r, gt_scbox(:,idat,ipm), uc_psir_bk(:, band2, ikcalc), sigc_pm(ipm))
5377              end do
5378              if (gwr%sig_diago) then
5379                sigc_it_mat(:, itau,band1,1,ikcalc,spin) = sigc_it_mat(:,itau,band1,1,ikcalc,spin) + sigc_pm(:)
5380              else
5381                sigc_it_mat(:,itau,band1,band2,ikcalc,spin) = sigc_it_mat(:,itau,band1,band2,ikcalc,spin) + sigc_pm(:)
5382              end if
5383            end do ! idat
5384          end do
5385          end do ! band2
5386        end do ! ikcalc
5387 
5388        !if (use_shmem_for_k) call xmpi_sum
5389 
5390        if (print_time) then
5391          write(msg,'(4x,3(a,i0),a)')"Sigma_c my_ir [", my_ir, "/", my_nr, "] (tot: ", gwr%g_nfft, ")"
5392          call cwtime_report(msg, cpu_ir, wall_ir, gflops_ir)
5393        end if
5394      end do ! my_ir
5395 
5396      ! Free descriptors and PBLAS matrices in kBZ and qBZ.
5397      call desc_array_free(desc_mykbz); call desc_array_free(desc_myqbz)
5398      call slk_array_free(gt_gpr); call slk_array_free(wc_gpr)
5399 
5400      write(msg,'(1x,3(a,i0),a)')"Sigma_c my_itau [", my_it, "/", gwr%my_ntau, "] (tot: ", gwr%ntau, ")"
5401      call cwtime_report(msg, cpu_tau, wall_tau, gflops_tau, end_str=ch10)
5402    end do ! my_it
5403 
5404    ABI_FREE(scph1d_kcalc)
5405    ABI_FREE(uc_psir_bk)
5406  end do ! my_is
5407 
5408  sigc_it_mat = -sigc_it_mat * (gwr%cryst%ucvol / gwr%g_nfft) ** 2
5409 
5410  !call wrtout(std_out, sjoin(" Maxval abs re W:", ftoa(max_abs_re_wct)))
5411  !call wrtout(std_out, sjoin(" Maxval abs imag W:", ftoa(max_abs_imag_wct)))
5412  if (.not. use_shmem_for_k) then
5413    ABI_FREE(gt_scbox)
5414    ABI_FREE(wct_scbox)
5415  else
5416    call xmpi_win_free(gt_scbox_win)
5417    call xmpi_win_free(wct_scbox_win)
5418  end if
5419 
5420  call green_plan%free()
5421  call wt_plan%free()
5422 
5423  ABI_FREE(green_scgvec)
5424  ABI_FREE(wc_scgvec)
5425 
5426 else
5427  ! ===================================================================
5428  ! Mixed-space algorithm in the unit cell with convolutions in k-space
5429  ! ===================================================================
5430  call print_sigma_header()
5431 
5432  ! Define tables to account for symmetries:
5433  !  - when looping over the BZ, we only need to include the union of IBZ_x for x in kcalc.
5434  !  - when accumulating the self-energy, we have to use weights that depend on x.
5435 
5436  ! * The little group is needed when symsigma == 1
5437  ! * If use_umklp == 1 then symmetries requiring an umklapp to preserve k_gw are included as well.
5438  ! * Note that TR is not yet supported so timrev is set to 1 even if TR has been used to generate the GS IBZ.
5439  use_umklp = 1
5440  do ikcalc=1,gwr%nkcalc
5441    call ltg_kcalc(ikcalc)%init(gwr%kcalc(:,ikcalc), gwr%nkbz, gwr%kbz, gwr%cryst, use_umklp, npwe=0, timrev=1)
5442    call ltg_kcalc(ikcalc)%print(unit=std_out, prtvol=gwr%dtset%prtvol)
5443  end do
5444 
5445  ! Allocate PBLAS matrices to store Wc_q(r',r,tau), and Sigma_kcalc(r',r,+/-tau) in the unit cell.
5446  nrsp = gwr%g_nfft * gwr%nspinor
5447  col_bsize = nrsp / gwr%g_comm%nproc; if (mod(nrsp, gwr%g_comm%nproc) /= 0) col_bsize = col_bsize + 1
5448 
5449  call wc_rpr%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
5450  do ipm=1,2
5451    call gk_rpr_pm(ipm)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
5452    do ikcalc=1,gwr%nkcalc
5453      call sigc_rpr(1,ipm,ikcalc)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
5454      ! For sigma we have to decompose it in hermitian/anti-hermitian part.
5455      !call sigc_rpr(2,ipm,ikcalc)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
5456    end do
5457  end do
5458 
5459  mem_mb = slk_array_locmem_mb(wc_rpr) + sum(slk_array_locmem_mb(gk_rpr_pm)) + sum(slk_array_locmem_mb(sigc_rpr))
5460  call wrtout(std_out, sjoin(" Local memory for PBLAS (r,r') matrices: ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
5461 
5462  do my_is=1,gwr%my_nspins
5463    spin = gwr%my_spins(my_is)
5464 
5465    ! Load wavefunctions for GW corrections in the real-space unit cell.
5466    ! TODO: MPI distribute or use MPI shared memory
5467    bmin = minval(gwr%bstart_ks(:, spin)); bmax = maxval(gwr%bstop_ks(:, spin))
5468    ABI_MALLOC_OR_DIE(uc_psir_bk, (nrsp, bmin:bmax, gwr%nkcalc), ierr)
5469    ABI_MALLOC(ur, (nrsp))
5470 
5471    do ikcalc=1,gwr%nkcalc
5472      kcalc_bz = gwr%kcalc(:, ikcalc); ikcalc_ibz = gwr%kcalc2ibz(ikcalc, 1)  ! NB: Assuming wfs in IBZ
5473      do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5474        call gwr%kcalc_wfd%get_ur(band, ikcalc_ibz, spin, ur)
5475        uc_psir_bk(:, band, ikcalc) = ur
5476      end do
5477    end do
5478    ABI_FREE(ur)
5479 
5480    need_qibz = 0
5481    do my_ikf=1,gwr%my_nkbz
5482      ik_bz = gwr%my_kbz_inds(my_ikf); kk_bz = gwr%kbz(:, ik_bz)
5483      do ikcalc=1,gwr%nkcalc
5484        qq_bz = gwr%kcalc(:,ikcalc) - kk_bz
5485        !qq_bz = -qq_bz
5486        ! TODO: here I may need to take into account the umklapp
5487        call findqg0(iq_bz, g0_q, qq_bz, gwr%nqbz, gwr%qbz, gwr%mG0)
5488        !ABI_CHECK(all(g0_q == 0), sjoin("g0_q != 0, kcalc", ktoa(gwr%kcalc(:,ikcalc)), "kk_bz:", ktoa(kk_bz)))
5489        iq_ibz = gwr%qbz2ibz(1, iq_bz)
5490        need_qibz(iq_ibz) = 1
5491      end do
5492    end do
5493 
5494    ! Construct Sigma(itau) using convolutions in k-space and real-space representation in the unit cell.
5495    do my_it=1,gwr%my_ntau
5496      call cwtime(cpu_tau, wall_tau, gflops_tau, "start")
5497      itau = gwr%my_itaus(my_it)
5498 
5499      ! Redistribute W_q(g,g') in the IBZ so that each MPI proc can reconstruct Wc_q in the BZ inside the loops
5500      call gwr%redistrib_mats_qibz("wc", itau, spin, need_qibz, got_qibz, "communicate")
5501      call slk_array_set(sigc_rpr, czero)
5502 
5503      ! Sum over my k-points in the BZ.
5504      do my_ikf=1,gwr%my_nkbz
5505        print_time = (gwr%comm%me == 0 .and. (my_ikf <= LOG_MODK .or. mod(my_ikf, LOG_MODK) == 0))
5506        if (print_time) call cwtime(cpu_ikf, wall_ikf, gflops_ikf, "start")
5507        ik_bz = gwr%my_kbz_inds(my_ikf); kk_bz = gwr%kbz(:,ik_bz)
5508 
5509        ik_ibz = gwr%kbz2ibz(1, ik_bz); isym_k = gwr%kbz2ibz(2, ik_bz)
5510        trev_k = gwr%kbz2ibz(6, ik_bz); g0_k = gwr%kbz2ibz(3:5, ik_bz)
5511        isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
5512        tsign_k = merge(1, -1, trev_k == 0)
5513        !if (.not. isirr_k) cycle
5514 
5515        ! Skip this BZ k-point if it's not in the IBZ(ikcalc) of some ikcalc.
5516        compute_this_kbz = .True.
5517        if (gwr%dtset%symsigma /= 0) then
5518          compute_this_kbz = .False.
5519          do ikcalc=1,gwr%nkcalc
5520            if (ltg_kcalc(ikcalc)%ibzq(ik_bz) == 1) then
5521              compute_this_kbz = .True.; exit
5522            end if
5523          end do
5524        end if
5525        if (.not. compute_this_kbz) cycle ! my_ikf loop
5526 
5527        ! Use symmetries to get G_kbz from the IBZ then G_k(g,g') --> G_k(r',r)
5528        call gwr%get_gkbz_rpr_pm(ik_bz, itau, spin, gk_rpr_pm)
5529 
5530        do ikcalc=1,gwr%nkcalc
5531          if (gwr%dtset%symsigma /= 0 .and. ltg_kcalc(ikcalc)%ibzq(ik_bz) == 0) cycle ! FIXME: iq_bz or ikq?
5532          qq_bz = gwr%kcalc(:, ikcalc) - kk_bz
5533          !qq_bz = -qq_bz
5534          ! TODO: here I may need to take into account the umklapp
5535          call findqg0(iq_bz, g0_q, qq_bz, gwr%nqbz, gwr%qbz, gwr%mG0)
5536          !ABI_CHECK(all(g0_q == 0), sjoin("g0_q != 0", ktoa(gwr%kcalc(:,ikcalc)), "kk_bz", ktoa(kk_bz)))
5537          !iq_ibz = gwr%qbz2ibz(1, iq_bz)
5538          call gwr%get_wc_rpr_qbz(g0_q, iq_bz, itau, spin, wc_rpr)
5539 
5540          ! The integration weight depends on ikcalc
5541          wtqp = one / gwr%nkbz; wtqm = zero
5542          if (gwr%dtset%symsigma /= 0) then
5543            ! If symsigma, symmetrize the matrix elements.
5544            ! Sum only q"s in IBZ_k. In this case elements are weighted
5545            ! according to wtqp and wtqm. wtqm is for time-reversal.
5546            !call ltg_kcalc(ikcalc)%get_weigts_ibz(ik_bz, wkbz_pm)
5547            associate (ltg_k => ltg_kcalc(ikcalc))
5548            !if (can_symmetrize(spin)) then
5549            wtqp = (one * sum(ltg_k%wtksym(1,:,ik_bz))) / gwr%nkbz   ! FIXME: iq_bz or ik_bz?
5550            wtqm = (one * sum(ltg_k%wtksym(2,:,ik_bz))) / gwr%nkbz
5551            end associate
5552          end if
5553 
5554          do ipm=1,2
5555            if (abs(wtqm) < tol12) then
5556              sigc_rpr(1,ipm,ikcalc)%buffer_cplx = sigc_rpr(1,ipm,ikcalc)%buffer_cplx + &
5557                 wtqp * gk_rpr_pm(ipm)%buffer_cplx * wc_rpr%buffer_cplx
5558            else
5559              ABI_ERROR(sjoin("TR is not yet implemented:, wqtm:", ftoa(wtqm)))
5560              sigc_rpr(1,ipm,ikcalc)%buffer_cplx = sigc_rpr(1,ipm,ikcalc)%buffer_cplx + &
5561                wtqp * (gk_rpr_pm(ipm)%buffer_cplx * wc_rpr%buffer_cplx)
5562 
5563              sigc_rpr(2,ipm,ikcalc)%buffer_cplx = sigc_rpr(2,ipm,ikcalc)%buffer_cplx + &
5564                wtqm * conjg(gk_rpr_pm(ipm)%buffer_cplx * wc_rpr%buffer_cplx)
5565 
5566              !sigc_rpr(1, ipm, ikcalc)%buffer_cplx = sigc_rpr(1, ipm, ikcalc)%buffer_cplx + &
5567              !    (wtqp + wtqm) * real(gk_rpr_pm(ipm)%buffer_cplx * wc_rpr%buffer_cplx, kind=gwpc) &
5568              !  + (wtqp - wtqm) * j_gw * aimag(gk_rpr_pm(ipm)%buffer_cplx * wc_rpr%buffer_cplx)
5569            end if
5570          end do ! ipm
5571 
5572        end do ! ikcalc
5573 
5574        if (print_time) then
5575          write(msg,'(4x,3(a,i0),a)')"Sigma_c my_ikf [", my_ikf, "/", gwr%my_nkbz, "] (tot: ", gwr%nkbz, ")"
5576          call cwtime_report(msg, cpu_ikf, wall_ikf, gflops_ikf)
5577        end if
5578      end do ! my_ikf
5579 
5580      ! Deallocate extra Wc matrices defined by got_qibz
5581      call gwr%redistrib_mats_qibz("wc", itau, spin, need_qibz, got_qibz, "free")
5582 
5583      ! Integrate self-energy matrix elements in the unit cell.
5584      ! Remember that Sigma is stored as (r',r) and that the second dimension is MPI-distributed.
5585      ! In case of k or g distribution, sigc_pm is a partial 6d integral that will be ALL_REDUCED in gwr%comm afterwards.
5586      ! TODO: Off-diagonal terms although this is not the most efficient algorithm
5587      do ikcalc=1,gwr%nkcalc
5588        do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5589          call sig_braket_ur(sigc_rpr(:,:,ikcalc), gwr%g_nfft*gwr%nspinor, uc_psir_bk(:,band,ikcalc), sigc_pm)
5590          if (gwr%sig_diago) then
5591            sigc_it_mat(:, itau, band, 1, ikcalc, spin) = sigc_pm
5592          end if
5593         end do
5594      end do ! ikcalc
5595      !ABI_FREE(loc_cwork)
5596 
5597      write(msg,'(3(a,i0),a)')" Sigma_c my_itau [", my_it, "/", gwr%my_ntau, "] (tot: ", gwr%ntau, ")"
5598      call cwtime_report(msg, cpu_tau, wall_tau, gflops_tau)
5599    end do ! my_it
5600 
5601    ABI_FREE(uc_psir_bk)
5602  end do ! my_is
5603 
5604  sigc_it_mat = -sigc_it_mat * (one/gwr%g_nfft) ** 2
5605 
5606  call wc_rpr%free(); call slk_array_free(sigc_rpr); call slk_array_free(gk_rpr_pm)
5607  do ikcalc=1,gwr%nkcalc
5608    call ltg_kcalc(ikcalc)%free()
5609  end do
5610  call wrtout(std_out, " Mixed space algorithm for sigma completed")
5611 end if
5612 
5613  call sigijtab_free(Sigcij_tab)
5614  ABI_FREE(Sigcij_tab)
5615 
5616  ! Collect results and average
5617  call xmpi_sum(sigc_it_mat, gwr%comm%value, ierr)
5618 
5619  if (gwr%dtset%symsigma == +1 .and. .not. gwr%use_supercell_for_sigma) then
5620    call wrtout(std_out, " Averaging Sig_c matrix elements within degenerate subspaces.")
5621    ABI_CHECK(gwr%sig_diago, "symsigma = 1 requires diagonal Sigma_c")
5622    do spin=1,gwr%nsppol
5623    do ikcalc=1,gwr%nkcalc
5624      do ideg=1,size(gwr%degtab(ikcalc, spin)%bids)
5625        associate (bids => gwr%degtab(ikcalc, spin)%bids(ideg)%vals)
5626        nstates = size(bids)
5627        avg_2ntau = sum(sigc_it_mat(:,:,bids(:), 1,ikcalc, spin), dim=3) / nstates
5628        do ii=1,nstates
5629          sigc_it_mat(:,:,bids(ii), 1,ikcalc, spin) = avg_2ntau
5630        end do
5631        end associate
5632      end do ! ideg
5633    end do
5634    end do
5635  end if ! symsigma == +1
5636 
5637  ! Store matrix elements of Sigma_c(it), separate even and odd part
5638  ! then use sine/cosine transform to get Sigma_c(i omega).
5639  ! Finally, perform analytic continuation with Pade' to go to the real-axis
5640  ! and compute QP corrections and spectral functions. All procs execute this part as it's very cheap.
5641 
5642  imag_zmesh(:) = j_dpc * gwr%iw_mesh
5643 
5644  ! Save previous QP bands in qp_ebands_prev (needed for self-consistency)
5645  ! In the loop below, we also update gwr%qp_ebands%eig with the QP results and recompute occ/fermie.
5646  gwr%qp_ebands_prev%eig = gwr%qp_ebands%eig
5647  gwr%qp_ebands_prev%occ = gwr%qp_ebands%occ
5648 
5649  e0_kcalc = zero; spfunc_diag = zero; pade_solver_ierr = 0; ze0_kcalc = zero; sigc_e0 = zero
5650  qpz_ene = zero; qp_pade = zero; sigxc_rw_diag = zero
5651  ks_gaps = -one; qpz_gaps = -one !; qp_pade_gaps = -one
5652 
5653  do spin=1,gwr%nsppol
5654  do ikcalc=1,gwr%nkcalc
5655    ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
5656    do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5657      ibc = band - gwr%bstart_ks(ikcalc, spin) + 1
5658 
5659      ! FT Sigma(itau) --> Sigma(iw)
5660      band2_start = 1; band2_stop = 1
5661      if (.not. gwr%sig_diago) then
5662        band2_start = gwr%bstart_ks(ikcalc, spin); band2_stop = gwr%bstop_ks(ikcalc, spin)
5663      end if
5664      do band2=band2_start, band2_stop
5665        ! f(t) = E(t) + O(t) = (f(t) + f(-t)) / 2  + (f(t) - f(-t)) / 2
5666        associate (vals_pmt => sigc_it_mat(:,:, band, band2 ,ikcalc, spin))
5667        even_t = (vals_pmt(1,:) + vals_pmt(2,:)) / two; odd_t = (vals_pmt(1,:) - vals_pmt(2,:)) / two
5668        gwr%sigc_iw_mat(:, band, band2, ikcalc, spin) = matmul(gwr%cosft_wt, even_t) + j_dpc * matmul(gwr%sinft_wt, odd_t)
5669        end associate
5670      end do
5671 
5672      ! NB: e0 is always set to the KS energy even in case of self-consistency.
5673      e0 = gwr%ks_ebands%eig(band, ik_ibz, spin)
5674      if (      gwr%sig_diago) sigx = gwr%sigx_mat(band, 1, ikcalc, spin)
5675      if (.not. gwr%sig_diago) sigx = gwr%sigx_mat(band, band, ikcalc, spin)
5676 
5677      ! Note vxc[n_val] instead of vxc[n_val + n_nlcc] with the model core charge.
5678      vxc_val = gwr%ks_me%vxcval(band, band, ik_ibz, spin)
5679      vu = zero; if (gwr%dtset%usepawu /= 0) vu = gwr%ks_me%vu(band, band, ik_ibz, spin)
5680      v_meanf = vxc_val + vu
5681 
5682      band2 = merge(1, band, gwr%sig_diago)
5683      pade_npts = gwr%ntau
5684      if (gwr%dtset%userie > 0 .and. pade_npts > gwr%dtset%userie) then
5685         pade_npts = min(gwr%ntau, gwr%dtset%userie)
5686         call wrtout(std_out, sjoin("Limiting the number of points for pade to:", itoa(pade_npts)))
5687      end if
5688      call spade%init(pade_npts, imag_zmesh, gwr%sigc_iw_mat(:, band, band2, ikcalc, spin), branch_cut=">")
5689 
5690      ! Solve the QP equation with Newton-Rapson starting from e0
5691      zz = cmplx(e0, zero)
5692      call spade%qp_solve(e0, v_meanf, sigx, zz, zsc, msg, ierr)
5693      qp_pade(band, ikcalc, spin) = zsc
5694      pade_solver_ierr(band, ikcalc, spin) = ierr
5695      ABI_WARNING_IF(ierr /= 0, msg)
5696 
5697      call spade%eval(zz, sigc_e0__, dzdval=dsigc_de0)
5698      ! Z = (1 - dSigma / domega(E0))^{-1}
5699      z_e0 = one / (one - dsigc_de0)
5700 
5701      ! Compute linearized QP solution and store results
5702      qp_ene = e0 + z_e0 * (sigc_e0__ + sigx - v_meanf)
5703      qpz_ene(band, ikcalc, spin) = qp_ene
5704      e0_kcalc(band, ikcalc, spin) = e0
5705      sigc_e0(band, ikcalc, spin) = sigc_e0__
5706      ze0_kcalc(band, ikcalc, spin) = z_e0
5707 
5708      ! IMPORTANT: Here we update qp_ebands%eig with the new enes obtained with the linearized QP equation
5709      gwr%qp_ebands%eig(band, ik_ibz, spin) = real(qp_ene)
5710 
5711      ! Compute Spectral function using linear mesh **centered** around KS e0.
5712      rw_mesh = arth(e0 - gwr%wr_step * (gwr%nwr / 2), gwr%wr_step, gwr%nwr)
5713      hhartree_bk = gwr%ks_ebands%eig(band, ik_ibz, spin) - v_meanf
5714      do iw=1,gwr%nwr
5715        zz = rw_mesh(iw)
5716        call spade%eval(zz, sigc_e0__)
5717        sig_xc = sigx + sigc_e0__
5718        sigxc_rw_diag(iw, band, ikcalc, spin) = sig_xc
5719 
5720        spfunc_diag(iw, band, ikcalc, spin) = one / pi * abs(aimag(sigc_e0__)) &
5721          / ( (real(rw_mesh(iw) - hhartree_bk - sig_xc)) ** 2 + (aimag(sigc_e0__)) ** 2)  ! / Ha_eV
5722 
5723        !Sr%hhartree = hdft - KS_me%vxcval
5724        !spfunc_diag(iw, band, ikcalc, spin) = &
5725        !  one / pi * abs(aimag(sigc_e0__)) &
5726        !  /( (real(rw_mesh(iw) - Sr%hhartree(ib, ib, ik_ibz, spin) - sigx_xc)) ** 2 &
5727        !    +(aimag(sigc_e0__)) ** 2) / Ha_eV
5728      end do ! iw
5729 
5730    end do ! band
5731  end do ! ikcalc
5732  end do ! spin
5733 
5734  if (gwr%nkcalc == gwr%nkibz) then
5735    ! Shift the bands that are not explicitly included in the SCF calculation.
5736    ! using the correction evaluated at bstop_ks/bstart_ks to accelerate self-consistent calculations.
5737    do spin=1,gwr%nsppol
5738      do ikcalc=1,gwr%nkcalc
5739        ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
5740        band = gwr%bstop_ks(ikcalc, spin)
5741        if (band + 1 <= size(gwr%qp_ebands%eig, dim=1)) then
5742          eshift = gwr%qp_ebands%eig(band, ik_ibz, spin) - gwr%qp_ebands_prev%eig(band, ik_ibz, spin)
5743          call wrtout(std_out, sjoin(" Correcting bands >= ", itoa(band+1), "with eshift:", ftoa(eshift * Ha_meV), "(meV)"))
5744          gwr%qp_ebands%eig(band + 1:, ik_ibz, spin) = gwr%qp_ebands%eig(band + 1:, ik_ibz, spin) + eshift
5745        end if
5746        band = gwr%bstart_ks(ikcalc, spin)
5747        if (band > 1) then ! unlikely
5748          eshift = gwr%qp_ebands%eig(band, ik_ibz, spin) - gwr%qp_ebands_prev%eig(band, ik_ibz, spin)
5749          call wrtout(std_out, sjoin(" Correcting bands < ", itoa(band), "with eshift:", ftoa(eshift * Ha_meV), "(meV)"))
5750          gwr%qp_ebands%eig(:band - 1, ik_ibz, spin) = gwr%qp_ebands%eig(:band - 1, ik_ibz, spin) + eshift
5751        end if
5752      end do
5753    end do
5754 
5755    ! Recompute occupancies and set fermie to zero.
5756    ! FIXME: Possible problem here if the QP energies are not ordered!
5757    call ebands_update_occ(gwr%qp_ebands, gwr%dtset%spinmagntarget, prtvol=gwr%dtset%prtvol, fermie_to_zero=.True.)
5758  end if
5759 
5760  if (gwr%comm%me == 0) then
5761    ! Master writes results to ab_out, std_out and GWR.nc
5762    if (any(pade_solver_ierr /= 0)) then
5763      ! Write warning if QP solver failed.
5764      ierr = count(pade_solver_ierr /= 0)
5765      call wrtout([ab_out, std_out], sjoin("QP solver failed for:", itoa(ierr), "states"))
5766    end if
5767 
5768    call write_notations([std_out, ab_out])
5769    do spin=1,gwr%nsppol
5770      do ikcalc=1,gwr%nkcalc
5771        ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
5772 
5773        ydoc = yamldoc_open('GWR_SelfEnergy_ee', width=11, real_fmt='(3f8.3)')
5774        call ydoc%add_real1d('kpoint', gwr%kcalc(:, ikcalc))
5775        call ydoc%add_int('spin', spin, int_fmt="(i1)")
5776        call ydoc%add_int('gwr_scf_iteration', gwr%scf_iteration)
5777        call ydoc%add_string('gwr_task', gwr%dtset%gwr_task)
5778 
5779        ! Compute gaps assumim KS band indices.
5780        band_val = gwr%ks_vbik(ik_ibz, spin)
5781        nbc = gwr%bstop_ks(ikcalc, spin) - gwr%bstart_ks(ikcalc, spin) + 1
5782        ib1 = gwr%bstart_ks(ikcalc, spin); ib2 = gwr%bstop_ks(ikcalc, spin)
5783 
5784        if (band_val >= gwr%bstart_ks(ikcalc, spin) .and. band_val + 1 <= gwr%bstop_ks(ikcalc, spin)) then
5785          ibv = band_val - gwr%bstart_ks(ikcalc, spin) + 1
5786          ks_gap = gwr%ks_ebands%eig(band_val+1, ik_ibz, spin) - gwr%ks_ebands%eig(band_val, ik_ibz, spin)
5787 
5788          ! This to detect a possible band inversion and compute qp_gaps accordingly.
5789          band_inversion = .False.
5790          call sort_rvals(nbc, real(qpz_ene(ib1:, ikcalc, spin)), iperm, sorted_qpe, tol=tol12)
5791 
5792          if (iperm(ibv) /= ibv .or. iperm(ibv + 1) /= ibv + 1) then
5793            band_inversion = .True.
5794            call ydoc%add_int('QP_VBM_band', iperm(ibv) + gwr%bstart_ks(ikcalc, spin) - 1)
5795            call ydoc%add_int('QP_CBM_band', iperm(ibv+1) + gwr%bstart_ks(ikcalc, spin) - 1)
5796            qp_gap = sorted_qpe(ibv+1) - sorted_qpe(ibv)
5797            !qp_pade_gap = qp_pade(band_val+1, ikcalc, spin) - qp_pade(band_val, ikcalc, spin)
5798          else
5799            call ydoc%add_int('QP_VBM_band', ibv + gwr%bstart_ks(ikcalc, spin) - 1)
5800            call ydoc%add_int('QP_CBM_band', ibv+1 + gwr%bstart_ks(ikcalc, spin) - 1)
5801            qp_gap = gwr%qp_ebands%eig(band_val+1, ik_ibz, spin) - gwr%qp_ebands%eig(band_val, ik_ibz, spin)
5802            qp_pade_gap = qp_pade(band_val+1, ikcalc, spin) - qp_pade(band_val, ikcalc, spin)
5803          end if
5804          ABI_FREE(iperm)
5805          ABI_FREE(sorted_qpe)
5806 
5807          call ydoc%add_real('KS_gap', ks_gap * Ha_eV)
5808          call ydoc%add_real('QP_gap', qp_gap * Ha_eV)
5809          call ydoc%add_real('Delta_QP_KS', (qp_gap - ks_gap) * Ha_eV)
5810          ks_gaps(ikcalc, spin)= ks_gap
5811          qpz_gaps(ikcalc, spin) = qp_gap
5812          !qp_pade_gaps(ikcalc, spin) = qp_pade_gap
5813        end if
5814 
5815        call ydoc%open_tabular('data') !, tag='SigmaeeData')
5816        write(msg, "(a5, *(a9))") "Band", "E0", "<VxcDFT>", "SigX", "SigC(E0)", "Z", "E-E0", "E-Eprev", "E", "Occ(E)"
5817        call ydoc%add_tabular_line(msg)
5818 
5819        do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5820          ibc = band - gwr%bstart_ks(ikcalc, spin) + 1
5821          e0 = gwr%ks_ebands%eig(band, ik_ibz, spin)
5822          qp_ene = gwr%qp_ebands%eig(band, ik_ibz, spin)
5823          qp_ene_prev = gwr%qp_ebands_prev%eig(band, ik_ibz, spin)
5824          if (      gwr%sig_diago) sigx = gwr%sigx_mat(band, 1, ikcalc, spin)
5825          if (.not. gwr%sig_diago) sigx = gwr%sigx_mat(band, band, ikcalc, spin)
5826 
5827          write(msg,'(i5, *(f9.3))') &
5828            band, &                                                        ! Band
5829            e0 * Ha_eV, &                                                  ! E0
5830            real(gwr%ks_me%vxcval(band, band, ik_ibz, spin)) * Ha_eV, &    ! <VxcDFT>
5831            real(sigx) * Ha_eV, &                                          ! SigX
5832            real(sigc_e0(band, ikcalc, spin)) * Ha_eV, &                   ! SigC(E0)
5833            real(ze0_kcalc(band, ikcalc, spin)), &                         ! Z
5834            (real(qp_ene - e0)) * Ha_eV, &                                 ! E-E0
5835            real(qp_ene - qp_ene_prev) * Ha_eV, &                          ! E-Eprev
5836            real(qp_ene) * Ha_eV, &                                        ! E
5837            gwr%qp_ebands%occ(band, ik_ibz, spin)                          ! Occ(E)
5838          call ydoc%add_tabular_line(msg)
5839        end do
5840 
5841        call ydoc%write_units_and_free([std_out, ab_out])
5842      end do ! ikcalc
5843    end do ! spin
5844 
5845    ! Print KS and QP gaps
5846    msg = "Kohn-Sham gaps and band edges from IBZ mesh"
5847    call gwr%ks_gaps%print(unit=std_out, header=msg)
5848    call gwr%ks_gaps%print(unit=ab_out, header=msg)
5849 
5850    new_gaps = ebands_get_gaps(gwr%qp_ebands, ierr)
5851    write(msg,"(a,i0,a)")" QP gaps and band edges taking into account Sigma_nk corrections for ",gwr%nkcalc," k-points"
5852    call new_gaps%print(unit=std_out, header=msg)
5853    call new_gaps%print(unit=ab_out, header=msg)
5854    if (ierr /= 0) then
5855      ABI_WARNING("Cannot compute QP fundamental and direct gap (likely metal)")
5856    end if
5857    call new_gaps%free()
5858 
5859    ! Write results to txt files.
5860    if (open_file(strcat(gwr%dtfil%filnam_ds(4), '_SIGC_IT'), msg, newunit=unt_it, action="write") /= 0) then
5861      ABI_ERROR(msg)
5862    end if
5863    write(unt_it, "(a)")"# Diagonal elements of Sigma_c(i tau, +/-) in atomic units"
5864    write(unt_it, "(a)")"# tau Re/Im Sigma_c(+itau) Re/Im Sigma_c(-itau)"
5865 
5866    if (open_file(strcat(gwr%dtfil%filnam_ds(4), '_SIGXC_IW'), msg, newunit=unt_iw, action="write") /= 0) then
5867      ABI_ERROR(msg)
5868    end if
5869    write(unt_iw, "(a)")"# Diagonal elements of Sigma_xc(i omega) in eV units"
5870    write(unt_iw, "(a)")"# omega Re/Im Sigma_c(i omega)"
5871 
5872    if (open_file(strcat(gwr%dtfil%filnam_ds(4), '_SIGXC_RW'), msg, newunit=unt_rw, action="write") /= 0) then
5873      ABI_ERROR(msg)
5874    end if
5875    write(unt_rw, "(a)")"# Diagonal elements of Sigma_xc(omega) in eV units and spectral function A(omega)"
5876    write(unt_rw, "(a)")"# omega Re/Im Sigma_xc(omega), A(omega)"
5877 
5878    dat_units = [unt_it, unt_iw, unt_rw]
5879    call write_units(dat_units, "# Fermi energy set to zero. Energies in eV")
5880    call write_units(dat_units, sjoin("# nkcalc:", itoa(gwr%nkcalc), ", nsppol:", itoa(gwr%nsppol)))
5881 
5882    ! TODO: Improve file format. Add compatibility with gnuplot format for datasets?
5883    do spin=1,gwr%nsppol
5884      do ikcalc=1,gwr%nkcalc
5885        ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
5886        call write_units(dat_units, sjoin("# kpt:", ktoa(gwr%kcalc(:, ikcalc)), "spin:", itoa(spin)))
5887        do band=gwr%bstart_ks(ikcalc, spin), gwr%bstop_ks(ikcalc, spin)
5888          ibc = band - gwr%bstart_ks(ikcalc, spin) + 1
5889          e0 = gwr%ks_ebands%eig(band, ik_ibz, spin)
5890          band2 = merge(1, band, gwr%sig_diago)
5891          sigx = gwr%sigx_mat(band, band2, ikcalc, spin)
5892 
5893          call write_units(dat_units, sjoin("# band:", itoa(band), ", spin:", itoa(spin)))
5894          call write_units(dat_units, sjoin("# sigx_ev:", ftoa(sigx * Ha_eV)))
5895 
5896          do itau=1,gwr%ntau
5897            ! FIXME itau is not ordered
5898            write(unt_it, "(*(es16.8))") &
5899              gwr%tau_mesh(itau), &
5900              c2r(sigc_it_mat(1, itau, band, band2, ikcalc, spin)), &
5901              c2r(sigc_it_mat(2, itau, band, band2, ikcalc, spin))
5902            write(unt_iw, "(*(es16.8))") &
5903              gwr%iw_mesh(itau) * Ha_eV, &
5904              (c2r(gwr%sigc_iw_mat(itau, band, band2, ikcalc, spin) + sigx)) * Ha_eV
5905          end do
5906 
5907          ! Write Sigma_xc(omega) and A(omega)
5908          rw_mesh = arth(e0 - gwr%wr_step * (gwr%nwr / 2), gwr%wr_step, gwr%nwr) * Ha_eV
5909          do iw=1,gwr%nwr
5910            write(unt_rw, "(*(es16.8))") &
5911              rw_mesh(iw), &
5912              c2r(sigxc_rw_diag(iw, band, ikcalc, spin)) * Ha_eV, &
5913              spfunc_diag(iw, band, ikcalc, spin) / Ha_eV
5914          end do
5915        end do
5916      end do
5917    end do
5918 
5919    close(unt_it); close(unt_iw); close(unt_rw)
5920 
5921    ! ======================
5922    ! Add results to GWR.nc
5923    ! ======================
5924    NCF_CHECK(nctk_open_modify(ncid, gwr%gwrnc_path, xmpi_comm_self))
5925 
5926    ! Define arrays with results.
5927    define = .True.
5928    if (define) then
5929      ncerr = nctk_def_arrays(ncid, [ &
5930        nctkarr_t("e0_kcalc", "dp", "smat_bsize1, nkcalc, nsppol"), &
5931        nctkarr_t("ze0_kcalc", "dp", "two, smat_bsize1, nkcalc, nsppol"), &
5932        nctkarr_t("qpz_ene", "dp", "two, smat_bsize1, nkcalc, nsppol"), &
5933        nctkarr_t("qp_pade", "dp", "two, smat_bsize1, nkcalc, nsppol"), &
5934        nctkarr_t("pade_solver_ierr", "int", "smat_bsize1, nkcalc, nsppol"), &
5935        nctkarr_t("ks_gaps", "dp", "nkcalc, nsppol"), &
5936        nctkarr_t("qpz_gaps", "dp", "nkcalc, nsppol"), &
5937        !nctkarr_t("qp_pade_gaps", "dp", "nkcalc, nsppol"), &
5938        nctkarr_t("sigx_mat", "dp", "two, smat_bsize1, smat_bsize2, nkcalc, nsppol"), &
5939        nctkarr_t("sigc_it_mat", "dp", "two, two, ntau, smat_bsize1, smat_bsize2, nkcalc, nsppol"), &
5940        nctkarr_t("sigc_iw_mat", "dp", "two, ntau, smat_bsize1, smat_bsize2, nkcalc, nsppol"), &
5941        nctkarr_t("sigxc_rw_diag", "dp", "two, nwr, smat_bsize1, nkcalc, nsppol"), &
5942        nctkarr_t("spfunc_diag", "dp", "nwr, smat_bsize1, nkcalc, nsppol") &
5943      ])
5944      NCF_CHECK(ncerr)
5945    end if
5946 
5947    ! Write data.
5948    NCF_CHECK(nctk_set_datamode(ncid))
5949    NCF_CHECK(nf90_put_var(ncid, vid("e0_kcalc"), e0_kcalc))
5950    NCF_CHECK(nf90_put_var(ncid, vid("ze0_kcalc"), c2r(ze0_kcalc)))
5951    NCF_CHECK(nf90_put_var(ncid, vid("sigx_mat"), c2r(gwr%sigx_mat)))
5952    NCF_CHECK(nf90_put_var(ncid, vid("qpz_ene"), c2r(qpz_ene)))
5953    NCF_CHECK(nf90_put_var(ncid, vid("qp_pade"), c2r(qp_pade)))
5954    NCF_CHECK(nf90_put_var(ncid, vid("pade_solver_ierr"), pade_solver_ierr))
5955    NCF_CHECK(nf90_put_var(ncid, vid("ks_gaps"), ks_gaps))
5956    NCF_CHECK(nf90_put_var(ncid, vid("qpz_gaps"), qpz_gaps))
5957    !NCF_CHECK(nf90_put_var(ncid, vid("qp_pade_gaps"), qp_pade_gaps))
5958    NCF_CHECK(nf90_put_var(ncid, vid("sigc_it_mat"), c2r(sigc_it_mat)))
5959    NCF_CHECK(nf90_put_var(ncid, vid("sigc_iw_mat"), c2r(gwr%sigc_iw_mat)))
5960    NCF_CHECK(nf90_put_var(ncid, vid("sigxc_rw_diag"), c2r(sigxc_rw_diag)))
5961    NCF_CHECK(nf90_put_var(ncid, vid("spfunc_diag"), spfunc_diag))
5962    NCF_CHECK(nf90_close(ncid))
5963  end if ! master
5964 
5965  ABI_FREE(sigc_it_mat)
5966  !ABI_FREE(sigc_iw_mat)
5967 
5968  call cwtime_report(" gwr_build_sigmac:", cpu_all, wall_all, gflops_all)
5969  call timab(1925, 2, tsec)
5970 
5971 contains
5972 integer function vid(vname)
5973  character(len=*),intent(in) :: vname
5974  vid = nctk_idname(ncid, vname)
5975 end function vid
5976 
5977 subroutine print_sigma_header()
5978 
5979  if (gwr%comm%me /= 0) return
5980  if (gwr%use_supercell_for_sigma) then
5981    call wrtout(std_out, sjoin(" Building Sigma_c in the supercell with FFT mesh:", ltoa(sc_ngfft(1:3))), pre_newlines=2)
5982  else
5983    call wrtout([std_out,ab_out], " Building Sigma_c with convolutions in k-space:", pre_newlines=2)
5984  end if
5985  call wrtout(std_out, sjoin(" gwr_np_kgts:", ltoa(gwr%dtset%gwr_np_kgts)))
5986  call wrtout(std_out, sjoin(" ngkpt:", ltoa(gwr%ngkpt), " ngqpt:", ltoa(gwr%ngqpt)))
5987  call wrtout(std_out, sjoin(" gwr_boxcutmin:", ftoa(gwr%dtset%gwr_boxcutmin)))
5988  call wrtout(std_out, sjoin(" my_ntau:", itoa(gwr%my_ntau), "ntau:", itoa(gwr%ntau)))
5989  call wrtout(std_out, sjoin(" my_nkbz:", itoa(gwr%my_nkbz), "nkibz:", itoa(gwr%nkibz)))
5990  call wrtout(std_out, sjoin("- FFT uc_batch_size:", itoa(gwr%uc_batch_size)))
5991  call wrtout(std_out, sjoin("- FFT sc_batch_size:", itoa(gwr%sc_batch_size)), do_flush=.True.)
5992 
5993 end subroutine print_sigma_header
5994 
5995 #endif
5996 end subroutine gwr_build_sigmac

m_gwr/gwr_build_sigxme [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_sigxme

FUNCTION

 Compute matrix elements of the exchange part.

INPUTS

OUTPUT

SOURCE

7345 subroutine gwr_build_sigxme(gwr, compute_qp)
7346 
7347 !Arguments ------------------------------------
7348  class(gwr_t),target,intent(inout) :: gwr
7349  logical,optional,intent(in) :: compute_qp
7350 
7351 !Local variables-------------------------------
7352 !scalars
7353  integer :: nsppol, nspinor, ierr, my_ikf, band_sum, ii, jj, kb, il_b, iab !ig_start, ig,
7354  integer :: my_is, ikcalc, ikcalc_ibz, bmin, bmax, band, istwf_k, npw_k
7355  integer :: spin, jb, is_idx, use_umklp
7356  integer :: spad, wtqm, wtqp, irow, spadx1, spadx2
7357  integer :: npwx, u_nfft, u_mgfft, u_mpw
7358  integer :: ik_bz, ik_ibz, isym_k, trev_k, g0_k(3)
7359  integer :: iq_bz, iq_ibz, isym_q, trev_q, g0_q(3)
7360  logical :: isirr_k, isirr_q, sigc_is_herm, compute_qp__
7361  real(dp) :: fact_spin, theta_mu_minus_esum, theta_mu_minus_esum2, tol_empty, tol_empty_in, gwr_boxcutmin_x
7362  real(dp) :: cpu_k, wall_k, gflops_k, cpu_all, wall_all, gflops_all
7363  character(len=5000) :: msg
7364  logical :: q_is_gamma
7365  type(__slkmat_t),pointer :: ugb_kibz
7366  type(crystal_t),pointer :: cryst
7367  type(dataset_type),pointer :: dtset
7368  type(littlegroup_t) :: ltg_k
7369  type(desc_t),pointer :: desc_ki
7370 !arrays
7371  integer :: g0(3), gmax(3), spinor_padx(2,4), u_ngfft(18), work_ngfft(18), units(2)
7372  integer,allocatable :: gbound_kcalc(:,:), gvec_x(:,:), gbound_x(:,:), kg_k(:,:), gbound_ksum(:,:)
7373  real(dp) :: ksum(3), kk_ibz(3), kgw(3), kgw_m_ksum(3), qq_bz(3), tsec(2) !, kk_bz(3), q0(3) !, spinrot_kbz(4), spinrot_kgw(4)
7374  real(dp),contiguous, pointer :: ks_eig(:,:,:), qp_eig(:,:,:), qp_occ(:,:,:), cg2_ptr(:,:) ! cg1_ptr(:,:),
7375  real(dp),allocatable :: work(:,:,:,:), cg1_ibz(:,:) !, cg2_bz(:,:)
7376  complex(gwpc),allocatable :: vc_sqrt_qbz(:)
7377  complex(dp),allocatable :: rhotwg(:), rhotwgp(:), rhotwg_ki(:,:)
7378  complex(gwpc),allocatable :: ur_bdgw(:,:)
7379  complex(dp),allocatable :: ur_ksum(:), ur_prod(:), eig0r(:)
7380  complex(dp),target,allocatable :: ug_ksum(:)
7381  complex(dp),allocatable  :: sigxcme_tmp(:,:), sigxme_tmp(:,:,:), sigx(:,:,:,:)
7382  complex(dp) :: gwpc_sigxme, gwpc_sigxme2, xdot_tmp
7383  type(sigijtab_t),allocatable :: Sigxij_tab(:,:), Sigcij_tab(:,:)
7384 
7385 ! *************************************************************************
7386 
7387  call timab(1920, 1, tsec)
7388  call cwtime(cpu_all, wall_all, gflops_all, "start")
7389  units = [std_out, ab_out]
7390 
7391  nsppol = gwr%nsppol; nspinor = gwr%nspinor; cryst => gwr%cryst; dtset => gwr%dtset
7392 
7393  ! Table for \Sigmax_ij matrix elements.
7394  sigc_is_herm = .False.
7395  call sigtk_sigma_tables(gwr%nkcalc, gwr%nkibz, gwr%nsppol, gwr%bstart_ks, gwr%bstop_ks, gwr%kcalc2ibz(:,1), &
7396                          gwr%sig_diago, sigc_is_herm, sigxij_tab, sigcij_tab)
7397 
7398  call sigijtab_free(Sigcij_tab)
7399  ABI_FREE(Sigcij_tab)
7400 
7401  if (gwr%sig_diago) then
7402    call wrtout(units, " Computing diagonal matrix elements of Sigma_x", pre_newlines=1)
7403  else
7404    call wrtout(units, " Computing diagonal + off-diagonal matrix elements of Sigma_x", pre_newlines=1)
7405  end if
7406 
7407  ! Allocate array with Sigma_x matrix elements depending on sig_diago
7408  ii = gwr%b1gw; jj = gwr%b2gw
7409  if (gwr%sig_diago) then
7410    ii = 1; jj = 1
7411  end if
7412  ABI_RECALLOC(gwr%sigx_mat, (gwr%b1gw:gwr%b2gw, ii:jj, gwr%nkcalc, gwr%nsppol*gwr%nsig_ab))
7413 
7414  ks_eig => gwr%ks_ebands%eig
7415  if (gwr%scf_iteration == 1) then
7416    call wrtout(units, " Using KS orbitals and KS energies...", newlines=1, do_flush=.True.)
7417    qp_eig => gwr%ks_ebands%eig; qp_occ => gwr%ks_ebands%occ
7418  else
7419    call wrtout(units, " Using KS orbitals and QP energies...", newlines=1, do_flush=.True.)
7420    qp_eig => gwr%qp_ebands%eig; qp_occ => gwr%qp_ebands%occ
7421  end if
7422 
7423  ! MRM allow lower occ numbers
7424  ! Normalization of theta_mu_minus_esum. If nsppol==2, qp_occ $\in [0,1]$
7425  tol_empty_in = 0.01                            ! Initialize the tolerance used to decide if a band is empty (passed to m_sigx.F90)
7426  select case (nsppol)
7427  case (1)
7428    fact_spin = half; tol_empty = tol_empty_in          ! below this value the state is assumed empty
7429    if (nspinor == 2) then
7430      fact_spin = one; tol_empty = half * tol_empty_in  ! below this value the state is assumed empty
7431    end if
7432  case (2)
7433    fact_spin = one; tol_empty = half * tol_empty_in  ! to be consistent and obtain similar results if a metallic
7434  case default                                        ! spin unpolarized system is treated using nsppol==2
7435    ABI_BUG(sjoin('Wrong nsppol:', itoa(nsppol)))
7436  end select
7437 
7438  ! =========================================
7439  ! Find FFT mesh and max number of g-vectors
7440  ! =========================================
7441  gwr_boxcutmin_x = two
7442  call gwr%get_u_ngfft(gwr_boxcutmin_x, u_ngfft, u_nfft, u_mgfft, u_mpw, gmax)
7443 
7444  if (gwr%comm%me == 0) then
7445    call print_ngfft(u_ngfft, header="FFT mesh for Sigma_x", unit=std_out)
7446    !call print_ngfft(u_ngfft, header="FFT mesh for Sigma_x", unit=ab_out)
7447  end if
7448 
7449  ! Init work_ngfft
7450  gmax = gmax + 4 ! FIXME: this is to account for umklapp, should also consider Gamma-only and istwfk
7451  gmax = 2 * gmax + 1
7452  call ngfft_seq(work_ngfft, gmax)
7453  !write(std_out,*)"work_ngfft(1:3): ",work_ngfft(1:3)
7454  ABI_MALLOC(work, (2, work_ngfft(4), work_ngfft(5), work_ngfft(6)))
7455 
7456  do my_is=1,gwr%my_nspins
7457  spin = gwr%my_spins(my_is)
7458  do ikcalc=1,gwr%nkcalc ! TODO: Should be spin dependent!
7459    call cwtime(cpu_k, wall_k, gflops_k, "start")
7460    ikcalc_ibz = gwr%kcalc2ibz(ikcalc, 1)
7461    kgw = gwr%kcalc(:, ikcalc)
7462    bmin = gwr%bstart_ks(ikcalc, spin); bmax = gwr%bstop_ks(ikcalc, spin)
7463 
7464    ! ==============================================================
7465    ! ==== Find little group of the k-points for GW corrections ====
7466    ! ==============================================================
7467    ! * The little group is used only if symsigma == 1
7468    ! * If use_umklp == 1 then symmetries requiring an umklapp to preserve k_gw are included as well.
7469    use_umklp = 1
7470    call ltg_k%init(kgw, gwr%nqbz, gwr%qbz, cryst, use_umklp, npwe=0)
7471 
7472    write(msg,'(5a)') ch10, &
7473     ' Calculating <nk|Sigma_x|nk> at k: ',trim(ktoa(kgw)), ", for band range: ", trim(ltoa([bmin, bmax]))
7474    call wrtout(std_out, msg)
7475 
7476    ! ===============================================
7477    ! Load wavefunctions for Sigma_x matrix elements
7478    ! ===============================================
7479    ! All procs need ur_bdgw but the IBZ is distributed and, possibly, replicated in gwr%kpt_comm.
7480    ! Here we select the right procs, fill the buffer with the FFT results and then use
7481    ! a dumb xmpi_sum + rescaling to gather the results.
7482    ! FIXME: g-vectors from Green's descriptor or use another array to be able to deal with istwfk == 2?
7483 
7484    ABI_MALLOC_OR_DIE(ur_bdgw, (u_nfft * nspinor, bmin:bmax), ierr)
7485    ur_bdgw = czero_gw
7486 
7487    if (any(ikcalc_ibz == gwr%my_kibz_inds)) then
7488      associate (desc_kcalc => gwr%green_desc_kibz(ikcalc_ibz), ugb_kcalc => gwr%ugb(ikcalc_ibz, spin))
7489      ABI_MALLOC(gbound_kcalc, (2 * u_mgfft + 8, 2))
7490      call sphereboundary(gbound_kcalc, desc_kcalc%istwfk, desc_kcalc%gvec, u_mgfft, desc_kcalc%npw)
7491 
7492      do il_b=1,ugb_kcalc%sizeb_local(2)
7493        band = ugb_kcalc%loc2gcol(il_b); if (band < bmin .or. band > bmax) CYCLE
7494        call fft_ug(desc_kcalc%npw, u_nfft, nspinor, ndat1, &
7495                    u_mgfft, u_ngfft, desc_kcalc%istwfk, desc_kcalc%gvec, gbound_kcalc, &
7496                    gwr%ugb(ikcalc_ibz, spin)%buffer_cplx(:, il_b), &  ! in
7497                    ur_bdgw(:, band))                                  ! out
7498      end do
7499      ABI_FREE(gbound_kcalc)
7500      end associate
7501    end if
7502 
7503    ! Collect and rescale
7504    !call xmpi_sum(ur_bdgw, gwr%kgt_comm%value, ierr)
7505    call xmpi_sum(ur_bdgw, gwr%kg_comm%value, ierr)
7506    ur_bdgw = ur_bdgw / gwr%np_kibz(ikcalc_ibz)
7507 
7508    ABI_MALLOC(ur_prod, (u_nfft * nspinor))
7509    ABI_MALLOC(ur_ksum, (u_nfft * nspinor))
7510    ABI_MALLOC(eig0r, (u_nfft * nspinor))
7511 
7512    ABI_CALLOC(sigxme_tmp, (bmin:bmax, bmin:bmax, nsppol * gwr%nsig_ab))
7513    ABI_CALLOC(sigxcme_tmp, (bmin:bmax, nsppol * gwr%nsig_ab))
7514    ABI_CALLOC(sigx, (2, bmin:bmax, bmin:bmax, nsppol * gwr%nsig_ab))
7515 
7516    ! ========================================
7517    ! ==== Sum over my k-points in the BZ ====
7518    ! ========================================
7519 
7520    do my_ikf=1,gwr%my_nkbz
7521      ik_bz = gwr%my_kbz_inds(my_ikf)
7522      ksum = gwr%kbz(:, ik_bz)
7523 
7524      ! Find the symmetrical image of ksum in the IBZ
7525      !call kmesh%get_BZ_item(ik_bz, ksum, ik_ibz, isym_ki, iik, ph_mkt)
7526 
7527      ! FIXME: Be careful with the symmetry conventions here and the interplay between umklapp in q and FFT
7528      ik_ibz = gwr%kbz2ibz_symrel(1, ik_bz); isym_k = gwr%kbz2ibz_symrel(2, ik_bz)
7529      trev_k = gwr%kbz2ibz_symrel(6, ik_bz); g0_k = gwr%kbz2ibz_symrel(3:5, ik_bz)
7530      isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
7531      kk_ibz = gwr%kibz(:, ik_ibz)
7532 
7533      ! Identify q and G0 where q + G0 = k_GW - ksum
7534      kgw_m_ksum = kgw - ksum
7535      call findqg0(iq_bz, g0, kgw_m_ksum, gwr%nqbz, gwr%qbz, gwr%mG0)
7536      !ABI_CHECK(all(g0 == 0), sjoin("g0 = ", ltoa(g0)))
7537 
7538      call calc_ceigr(g0, u_nfft, nspinor, u_ngfft, eig0r)
7539 
7540      ! If symmetries are exploited, only q-points in the IBZ_k are computed.
7541      ! In this case elements are weighted according to wtqp and wtqm. wtqm is for time-reversal.
7542      wtqp = 1; wtqm = 0
7543      !if (can_symmetrize(spin)) then
7544      if (gwr%dtset%symsigma == 1) then
7545        if (ltg_k%ibzq(iq_bz) /= 1) CYCLE
7546        wtqp = sum(ltg_k%wtksym(1,:,iq_bz))
7547        wtqm = sum(ltg_k%wtksym(2,:,iq_bz))
7548      end if
7549 
7550      qq_bz = gwr%qbz(:, iq_bz)
7551      iq_ibz = gwr%qbz2ibz(1, iq_bz); isym_q = gwr%qbz2ibz(2, iq_bz)
7552      trev_q = gwr%qbz2ibz(6, iq_bz); g0_q = gwr%qbz2ibz(3:5, iq_bz)
7553      isirr_q = (isym_q == 1 .and. trev_q == 0 .and. all(g0_q == 0))
7554 
7555      ! Find the corresponding irreducible q-point.
7556      ! NB: non-zero umklapp G_o is not allowed. There's a check in setup_sigma
7557      !call qmesh%get_BZ_item(iq_bz, qbz, iq_ibz, isym_q, itim_q)
7558      q_is_gamma = normv(qq_bz, cryst%gmet, "G") < GW_TOLQ0
7559      call get_kg(qq_bz, istwfk1, dtset%ecutsigx, cryst%gmet, npwx, gvec_x)
7560 
7561      ABI_MALLOC(gbound_x, (2*u_mgfft + 8, 2))
7562      call sphereboundary(gbound_x, istwfk1, gvec_x, u_mgfft, npwx)
7563 
7564      ! Tables for the FFT of the oscillators.
7565      !  a) FFT index of G-G0.
7566      !  b) x_gbound table for the zero-padded FFT performed in rhotwg.
7567      !ABI_MALLOC(x_gbound, (2*u_mgfft+8, 2))
7568      !call Gsph_x%fft_tabs(g0, u_mgfft, u_ngfft, use_padfft, x_gbound, igfftxg0)
7569 
7570      ABI_MALLOC(rhotwg_ki, (npwx * nspinor, bmin:bmax))
7571      ABI_MALLOC(rhotwg, (npwx * nspinor))
7572      ABI_MALLOC(rhotwgp, (npwx * nspinor))
7573      ABI_MALLOC(vc_sqrt_qbz, (npwx))
7574      spinor_padx = reshape([0, 0, npwx, npwx, 0, npwx, npwx, 0], [2, 4])
7575 
7576      ! Get Fourier components of the Coulomb interaction in the BZ
7577      ! In 3D systems, neglecting umklapp,  vc(Sq,sG)=vc(q,G)=4pi/|q+G|
7578      ! The same relation holds for 0-D systems, but not in 1-D or 2D systems. It depends on S.
7579      call gwr%vcgen%get_vc_sqrt(qq_bz, npwx, gvec_x, gwr%q0, gwr%cryst, vc_sqrt_qbz, gwr%gtau_comm%value)
7580 
7581      desc_ki => gwr%green_desc_kibz(ik_ibz)
7582 
7583      ! Get npw_k and kg_k for this k.
7584      if (isirr_k) then
7585        istwf_k = desc_ki%istwfk; npw_k = desc_ki%npw
7586        ABI_MALLOC(kg_k, (3, npw_k))
7587        kg_k(:,:) = desc_ki%gvec
7588      else
7589        istwf_k = 1
7590        call get_kg(ksum, istwf_k, dtset%ecut, cryst%gmet, npw_k, kg_k)
7591      end if
7592 
7593      ABI_MALLOC(ug_ksum, (npw_k * nspinor))
7594      ABI_MALLOC(cg1_ibz, (2, desc_ki%npw * nspinor))
7595      !ABI_MALLOC(cg2_bz, (2, npw_k * nspinor))
7596 
7597      ABI_MALLOC(gbound_ksum, (2*u_mgfft+8, 2))
7598      call sphereboundary(gbound_ksum, istwf_k, kg_k, u_mgfft, npw_k)
7599 
7600      ! ==========================
7601      ! Sum over (occupied) bands
7602      ! ==========================
7603      ugb_kibz => gwr%ugb(ik_ibz, spin)
7604 
7605      do il_b=1,ugb_kibz%sizeb_local(2)
7606        ! Distribute bands inside tau_comm as wavefunctions are replicated
7607        if (gwr%tau_comm%skip(il_b)) cycle
7608        band_sum = ugb_kibz%loc2gcol(il_b)
7609 
7610        ! Skip empty states. MRM: allow negative occ numbers.
7611        if (abs(qp_occ(band_sum, ik_ibz, spin)) < tol_empty) CYCLE
7612 
7613        !call wfd%get_ur(band_sum, ik_ibz, spin, ur_ibz)
7614 
7615        ! Compute ur_ksum(r) from the symmetrical image.
7616        ! I should rotate the g-vectors outside the loop and rotate ug here
7617        ! but at present I cannot use cgtk_rotate due to the symrel^T convention.
7618 
7619        if (isirr_k) then
7620          !call wfd%copy_cg(ibsum_kq, ikq_ibz, spin, bra_kq)
7621          ug_ksum(:) = ugb_kibz%buffer_cplx(:, il_b)
7622        else
7623          ! Reconstruct u_kq(G) from the IBZ image.
7624          !call wfd%copy_cg(ibsum_kq, ikq_ibz, spin, cgwork)
7625 
7626          ! FIXME: This is wrong if spc
7627          call c_f_pointer(c_loc(ug_ksum), cg2_ptr, shape=[2, npw_k * nspinor])
7628 
7629          !call c_f_pointer(c_loc(ugb_kibz%buffer_cplx(:, il_b)), cg1_ptr, shape=[2, desc_ki%npw * nspinor])
7630          !call cgtk_rotate(cryst, kk_ibz, isym_k, trev_k, g0_k, nspinor, ndat1, &
7631          !                 desc_ki%npw, desc_ki%gvec, &
7632          !                 npw_k, kg_k, desc_ki%istwfk, istwf_k, cg1_ptr, cg2_ptr, work_ngfft, work)
7633 
7634          cg1_ibz(1,:) = real(ugb_kibz%buffer_cplx(:, il_b))
7635          cg1_ibz(2,:) = aimag(ugb_kibz%buffer_cplx(:, il_b))
7636          call cgtk_rotate(cryst, kk_ibz, isym_k, trev_k, g0_k, nspinor, ndat1, &
7637                           desc_ki%npw, desc_ki%gvec, &
7638                           npw_k, kg_k, desc_ki%istwfk, istwf_k, cg1_ibz, cg2_ptr, work_ngfft, work)
7639        end if
7640 
7641        call fft_ug(npw_k, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwf_k, kg_k, gbound_ksum, &
7642                    ug_ksum, ur_ksum)
7643 
7644        if (any(g0 /= 0)) ur_ksum = ur_ksum * conjg(eig0r)
7645 
7646        ! Get all <k-q,band_sum,s|e^{-i(q+G).r}|s,jb,k>
7647        do jb=bmin,bmax
7648 
7649          ! FIXME: nspinor 2 is wrong as we have a 2x2 matrix
7650          ur_prod(:) = conjg(ur_ksum(:)) * ur_bdgw(:,jb)
7651          call fft_ur(npwx, u_nfft, nspinor, ndat1, u_mgfft, u_ngfft, istwfk1, gvec_x, gbound_x, &
7652                      ur_prod, rhotwg_ki(:,jb))
7653 
7654          ! Multiply by the square root of the Coulomb term
7655          ! In 3-D systems, the factor sqrt(4pi) is included
7656          do ii=1,nspinor
7657            spad = (ii-1) * npwx
7658            rhotwg_ki(spad+1:spad+npwx,jb) = rhotwg_ki(spad+1:spad + npwx,jb) * vc_sqrt_qbz(1:npwx)
7659          end do
7660 
7661          if (q_is_gamma) then
7662          !if (ik_bz == jk_bz) then
7663            ! Treat analytically the case q --> 0:
7664            !
7665            !   * The oscillator is evaluated at q = 0 as it is considered constant in the small cube around Gamma
7666            !     while the Colulomb term is integrated out.
7667            !   * If nspinor == 1, we have nonzero contribution only if band_sum == jb
7668            !   * If nspinor == 2, we evaluate <band_sum,up|jb,up> and <band_sum,dwn|jb,dwn>,
7669            !     and impose orthonormalization since npwwfn might be < npwvec.
7670            !   * Note the use of i_sz_resid and not i_sz, to account for the possibility
7671            !     to have generalized KS basis set from hybrid
7672 
7673            if (nspinor == 1) then
7674              rhotwg_ki(1, jb) = czero_gw
7675              if (band_sum == jb) rhotwg_ki(1,jb) = cmplx(sqrt(gwr%vcgen%i_sz), 0.0_gwp)
7676              !rhotwg_ki(1,jb) = czero_gw ! DEBUG
7677 
7678            else
7679              !ABI_ERROR("Not implemented Error")
7680              rhotwg_ki(1, jb) = zero; rhotwg_ki(npwx+1, jb) = zero
7681              if (band_sum == jb) then
7682                !ABI_CHECK(wfd%get_wave_ptr(band_sum, ik_ibz, spin, wave_sum, msg) == 0, msg)
7683                !cg_sum => wave_sum%ug
7684                !ABI_CHECK(wfd%get_wave_ptr(jb, jk_ibz, spin, wave_jb, msg) == 0, msg)
7685                !cg_jb  => wave_jb%ug
7686                !ctmp = xdotc(npw_k, cg_sum(1:), 1, cg_jb(1:), 1)
7687                rhotwg_ki(1, jb) = cmplx(sqrt(gwr%vcgen%i_sz), 0.0_gwp)  !* real(ctmp)
7688                !ctmp = xdotc(npw_k, cg_sum(npw_k+1:), 1, cg_jb(npw_k+1:), 1)
7689                rhotwg_ki(npwx+1, jb) = cmplx(sqrt(gwr%vcgen%i_sz), 0.0_gwp) ! * real(ctmp)
7690              end if
7691              !!!rhotwg_ki(1, jb) = zero; rhotwg_ki(npwx+1, jb) = zero
7692              !!! PAW is missing
7693            end if
7694          end if
7695 
7696        end do ! jb Got all matrix elements from bmin up to bmax.
7697 
7698        theta_mu_minus_esum  = fact_spin * qp_occ(band_sum, ik_ibz, spin)
7699        theta_mu_minus_esum2 = sqrt(abs(fact_spin * qp_occ(band_sum, ik_ibz, spin))) ! MBB Nat. orb. funct. approx. sqrt(occ)
7700 
7701        if (abs(theta_mu_minus_esum / fact_spin) >= tol_empty) then     ! MRM: allow negative occ numbers
7702          do kb=bmin,bmax
7703 
7704            ! Copy the ket Sigma_x |phi_{k,kb}>.
7705            rhotwgp(:) = rhotwg_ki(:, kb)
7706 
7707            ! Loop over the non-zero row elements of this column.
7708            ! If gwcalctyp <  20: only diagonal elements since QP == KS.
7709            ! If gwcalctyp >= 20:
7710            !      * Only off-diagonal elements connecting states with same character.
7711            !      * Only the upper triangle if HF, SEX, or COHSEX.
7712 
7713            do irow=1,Sigxij_tab(ikcalc, spin)%col(kb)%size1
7714              jb = Sigxij_tab(ikcalc, spin)%col(kb)%bidx(irow)
7715              rhotwg(:) = rhotwg_ki(:,jb)
7716 
7717              ! Calculate bare exchange <phi_jb|Sigma_x|phi_kb>.
7718              ! Do the scalar product only if band_sum is occupied.
7719              do iab=1,gwr%nsig_ab
7720                spadx1 = spinor_padx(1, iab); spadx2 = spinor_padx(2, iab)
7721                xdot_tmp = -XDOTC(npwx, rhotwg(spadx1+1:), 1, rhotwgp(spadx2+1:), 1)
7722                gwpc_sigxme  = xdot_tmp * theta_mu_minus_esum
7723                gwpc_sigxme2 = xdot_tmp * theta_mu_minus_esum2
7724 
7725                ! Accumulate and symmetrize Sigma_x matrix elements.
7726                ! -wtqm comes from time-reversal (exchange of band indices)
7727                is_idx = spin; if (nspinor == 2) is_idx = iab
7728                sigxme_tmp(jb, kb, is_idx) = sigxme_tmp(jb, kb, is_idx) + &
7729                   (wtqp + wtqm) * DBLE(gwpc_sigxme) + (wtqp - wtqm) * j_dpc * AIMAG(gwpc_sigxme)
7730                if (jb == kb) then
7731                  sigxcme_tmp(jb, is_idx) = sigxcme_tmp(jb, is_idx) + &
7732                    (wtqp + wtqm) * DBLE(gwpc_sigxme2) + (wtqp - wtqm) *j_dpc * AIMAG(gwpc_sigxme2)
7733                end if
7734 
7735                sigx(1, jb, kb, is_idx) = sigx(1, jb, kb, is_idx) + wtqp *      gwpc_sigxme
7736                sigx(2, jb, kb, is_idx) = sigx(2, jb, kb, is_idx) + wtqm *CONJG(gwpc_sigxme)
7737              end do
7738            end do ! irow
7739 
7740          end do ! kb
7741        end if
7742      end do ! band_sum
7743 
7744      ABI_FREE(gbound_x)
7745      ABI_FREE(kg_k)
7746      ABI_FREE(ug_ksum)
7747      ABI_FREE(cg1_ibz)
7748      !ABI_FREE(cg2_bz)
7749      ABI_FREE(gbound_ksum)
7750      ABI_FREE(gvec_x)
7751      ABI_FREE(rhotwg_ki)
7752      ABI_FREE(rhotwg)
7753      ABI_FREE(rhotwgp)
7754      ABI_FREE(vc_sqrt_qbz)
7755    end do ! my_ikf Got all diagonal (off-diagonal) matrix elements.
7756 
7757    ! Gather contributions from all the CPUs.
7758    call xmpi_sum(sigxme_tmp, gwr%kgt_comm%value, ierr)
7759    call xmpi_sum(sigxcme_tmp, gwr%kgt_comm%value, ierr)
7760    call xmpi_sum(sigx, gwr%kgt_comm%value, ierr)
7761 
7762    ! Multiply by constants. For 3D systems sqrt(4pi) is included in vc_sqrt_qbz.
7763    sigxme_tmp  = (one / (cryst%ucvol * gwr%nkbz)) * sigxme_tmp  ! * Sigp%sigma_mixing
7764    sigxcme_tmp = (one / (cryst%ucvol * gwr%nkbz)) * sigxcme_tmp ! * Sigp%sigma_mixing
7765    sigx        = (one / (cryst%ucvol * gwr%nkbz)) * sigx        ! * Sigp%sigma_mixing
7766 
7767    ! If we have summed over the IBZ_q, we have to average over degenerate states.
7768    ! Presently only diagonal terms are considered
7769    ! Note that here we pass ks_eig to sigx_symmetrize instead of qp_eig.
7770    ! The reason is that we use the eigenvalues to detect degeneracies before averaging
7771    ! and qp_eig may break degeneracies while ks_eig are much more accurate.
7772    ! Most of the breaking comes from the correlated part, likey due to the treatment of q --> 0.
7773 
7774    ! TODO QP-SCGW required a more involved approach, there is a check in sigma
7775    ! TODO it does not work if nspinor == 2.
7776 
7777    if (gwr%dtset%symsigma == 1) then
7778      call sigx_symmetrize(ikcalc_ibz, spin, bmin, bmax, nsppol, nspinor, gwr%nsig_ab, ks_eig, sigx, sigxme_tmp)
7779      !do ii=bmin, bmax; print *, "qp_eig:", ii, qp_eig(ii, ikcalc_ibz, spin) * Ha_eV; end do
7780      !call sigx_symmetrize(ikcalc_ibz, spin, bmin, bmax, nsppol, nspinor, gwr%nsig_ab, qp_eig, sigx, sigxme_tmp)
7781    end if
7782 
7783    ! Reconstruct the full sigma_x matrix from the upper triangle.
7784    if (gwr%nsig_ab == 1) then
7785      call hermitianize(sigxme_tmp(:,:,spin), "Upper")
7786    else
7787      ABI_WARNING("Should hermitianize non-collinear sigma!")
7788    end if
7789 
7790    ! Save exchange matrix in gwr%sigx_mat taking into account sig_diago.
7791    if (gwr%nsig_ab == 1) then
7792      if (gwr%sig_diago) then
7793        do jb=bmin,bmax
7794          gwr%sigx_mat(jb, 1, ikcalc, spin) = sigxme_tmp(jb,jb,spin)
7795        end do
7796      else
7797          gwr%sigx_mat(bmin:bmax, bmin:bmax, ikcalc, spin) = sigxme_tmp(bmin:bmax, bmin:bmax, spin)
7798      end if
7799    else
7800      if (gwr%sig_diago) then
7801        do iab=1,gwr%nsig_ab
7802          do jb=bmin,bmax
7803            gwr%sigx_mat(jb, 1, ikcalc, iab) = sigxme_tmp(jb,jb,iab)
7804          end do
7805        end do
7806      else
7807        gwr%sigx_mat(bmin:bmax, bmin:bmax, ikcalc, :) = sigxme_tmp(bmin:bmax, bmin:bmax, :)
7808      end if
7809    end if
7810 
7811    ABI_FREE(ur_bdgw)
7812    ABI_FREE(ur_prod)
7813    ABI_FREE(ur_ksum)
7814    ABI_FREE(eig0r)
7815    ABI_FREE(sigxme_tmp)
7816    ABI_FREE(sigxcme_tmp)
7817    ABI_FREE(sigx)
7818    call ltg_k%free()
7819    call cwtime_report(" Sigx_nk:", cpu_k, wall_k, gflops_k)
7820  end do ! ikcalc
7821  end do ! my_is
7822 
7823  if (gwr%spin_comm%nproc > 1) call xmpi_sum(gwr%sigx_mat, gwr%spin_comm%value, ierr)
7824 
7825  ABI_FREE(work)
7826  call sigijtab_free(Sigxij_tab)
7827  ABI_FREE(Sigxij_tab)
7828 
7829  ! Compute QP results. Done usually when gwr_task == G0v i.e. Hartree-Fock with KS states.
7830  compute_qp__ = .False.; if (present(compute_qp)) compute_qp__ = compute_qp
7831  if (compute_qp__ .and. gwr%comm%me == 0) then
7832    call write_notations(units)
7833    ! TODO
7834  end if
7835 
7836  call cwtime_report(" gwr_build_sigxme:", cpu_all, wall_all, gflops_all)
7837  call timab(1920, 2, tsec)
7838 
7839 end subroutine gwr_build_sigxme

m_gwr/gwr_build_tchi [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_tchi

FUNCTION

  High-level routine to compute the irreducible polarizability.

SOURCE

4043 subroutine gwr_build_tchi(gwr)
4044 
4045 !Arguments ------------------------------------
4046  class(gwr_t),target,intent(inout) :: gwr
4047 
4048 !Local variables-------------------------------
4049 !scalars
4050  integer :: my_is, my_it, my_ikf, ig, my_ir, my_nr, nrsp, npwsp, ncol_glob, col_bsize, my_iqi, gt_scbox_win
4051  integer :: idat, ndat, max_ndat, sc_nfft, sc_nfftsp, spin, ik_bz, iq_ibz, ikq_ibz, ikq_bz, ierr, ipm, itau, ig2 !, ii
4052  integer :: use_umklp ! ik_ibz, isym_k, trev_k, tsign_k, ! g0_k(3),
4053  integer(kind=XMPI_ADDRESS_KIND) :: buf_count
4054  real(dp) :: cpu_tau, wall_tau, gflops_tau, cpu_all, wall_all, gflops_all, cpu_ir, wall_ir, gflops_ir
4055  real(dp) :: cpu_ikf, wall_ikf, gflops_ikf
4056  real(dp) :: tchi_rfact, mem_mb, local_max, max_abs_imag_chit, wtqp, wtqm
4057  complex(gwpc) :: head_q
4058  complex(dp) :: chq(3), wng(3)
4059  logical :: q_is_gamma, use_shmem_for_k, use_mpi_for_k, print_time ! isirr_k,
4060  character(len=5000) :: msg
4061  type(desc_t),pointer :: desc_q ! desc_k,
4062  type(__slkmat_t) :: chi_rgp
4063  type(c_ptr) :: void_ptr
4064 !arrays
4065  integer :: sc_ngfft(18), gg(3), g0_kq(3), mask_qibz(gwr%nqibz), need_kibz(gwr%nkibz), got_kibz(gwr%nkibz)
4066  integer,allocatable :: green_scgvec(:,:), chi_scgvec(:,:)
4067  real(dp) :: kk_bz(3), kpq_bz(3), qq_ibz(3), tsec(2)
4068  complex(gwpc) ABI_ASYNC, contiguous, pointer :: gt_scbox(:,:,:)
4069  complex(gwpc),allocatable :: low_wing_q(:), up_wing_q(:), cemiqr(:)
4070  type(__slkmat_t) :: gkq_rpr_pm(2), gk_rpr_pm(2)
4071  type(__slkmat_t),allocatable :: gt_gpr(:,:), chiq_gpr(:), chiq_rpr(:)
4072  type(desc_t),target,allocatable :: desc_mykbz(:)
4073  type(littlegroup_t),allocatable :: ltg_qibz(:)
4074  type(fftbox_plan3_t) :: green_plan
4075  type(uplan_t) :: uplan_q
4076 
4077 ! *************************************************************************
4078 
4079  call cwtime(cpu_all, wall_all, gflops_all, "start")
4080  call timab(1923, 1, tsec)
4081 
4082  ABI_CHECK(gwr%tchi_space == "none", sjoin("tchi_space: ", gwr%tchi_space, " != none"))
4083  gwr%tchi_space = "itau"
4084 
4085  ! Allocate tchi_q(g,g') matrices
4086  mask_qibz = 0; mask_qibz(gwr%my_qibz_inds(:)) = 1
4087  call gwr%print_mem(unit=std_out)
4088  call gwr%malloc_free_mats(mask_qibz, "tchi", "malloc")
4089 
4090  max_abs_imag_chit = zero
4091 
4092  ! Setup FFT mesh in the supercell.
4093  sc_ngfft = gwr%g_ngfft
4094  sc_ngfft(1:3) = gwr%ngkpt * gwr%g_ngfft(1:3); sc_ngfft(4:6) = sc_ngfft(1:3)
4095  sc_nfft = product(sc_ngfft(1:3)); sc_nfftsp = sc_nfft * gwr%nspinor
4096 
4097  if (gwr%use_supercell_for_tchi) then
4098    ! ============================
4099    ! Chi algorithm with supercell
4100    ! ============================
4101    call print_chi_header()
4102 
4103    ! Be careful when using the FFT plan with ndat as ndat can change inside the loop if we start to block.
4104    ! Perhaps the safest approach would be to generate the plan on the fly.
4105    max_ndat = gwr%sc_batch_size
4106    use_mpi_for_k = gwr%sc_batch_size == gwr%kpt_comm%nproc .and. gwr%kpt_comm%nproc > 1
4107    use_mpi_for_k = .False.
4108 
4109    use_shmem_for_k = gwr%sc_batch_size == gwr%kpt_comm%nproc .and. gwr%kpt_comm%nproc > 1
4110    use_shmem_for_k = use_shmem_for_k .and. gwr%kpt_comm%can_use_shmem()
4111    !use_shmem_for_k = .False.
4112 
4113    if (use_shmem_for_k) then
4114      buf_count = 2 * (sc_nfftsp * max_ndat * 2)
4115      call gwr%kpt_comm%allocate_shared_master(buf_count, gwpc, xmpi_info_null, void_ptr, gt_scbox_win)
4116      call c_f_pointer(void_ptr, gt_scbox, shape=[sc_nfftsp, max_ndat, 2])
4117    end if
4118 
4119    call wrtout(std_out, sjoin(" use_mpi_for_k:", yesno(use_mpi_for_k)))
4120    call wrtout(std_out, sjoin(" use_shmem_for_k:", yesno(use_shmem_for_k)))
4121    mem_mb = (sc_nfftsp * max_ndat * 2 * gwpc) * b2Mb
4122    call wrtout(std_out, sjoin(" Memory for gt_scbox array:", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4123 
4124    if (.not. use_shmem_for_k) then
4125      ABI_MALLOC(gt_scbox, (sc_nfftsp, max_ndat, 2))
4126    end if
4127 
4128    ! Build plan for dense FFTs.
4129    call green_plan%from_ngfft(sc_ngfft, gwr%nspinor*max_ndat*2, gwr%dtset%gpu_option)
4130 
4131    ! The g-vectors in the supercell for G and tchi.
4132    ABI_MALLOC(green_scgvec, (3, gwr%green_mpw))
4133    ABI_MALLOC(chi_scgvec, (3, gwr%tchi_mpw))
4134    ABI_MALLOC(cemiqr, (gwr%g_nfft * gwr%nspinor)) ! The phase e^{-iq.r} in the unit cell.
4135    ABI_MALLOC(gt_gpr, (2, gwr%my_nkbz))
4136    ABI_MALLOC(chiq_gpr, (gwr%my_nqibz))
4137    ABI_MALLOC(desc_mykbz, (gwr%my_nkbz))
4138 
4139    ! Allocate PBLAS arrays for tchi_q(g',r) for all q in the IBZ treated by this MPI rank.
4140    ! Here we're gonna have a big allocation peak.
4141    do my_iqi=1,gwr%my_nqibz
4142      iq_ibz = gwr%my_qibz_inds(my_iqi)
4143      npwsp = gwr%tchi_desc_qibz(iq_ibz)%npw * gwr%nspinor
4144      ncol_glob = gwr%g_nfft * gwr%nspinor
4145      ABI_CHECK(block_dist_1d(ncol_glob, gwr%g_comm%nproc, col_bsize, msg), msg)
4146      call chiq_gpr(my_iqi)%init(npwsp, gwr%g_nfft * gwr%nspinor, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
4147    end do
4148    mem_mb = sum(slk_array_locmem_mb(chiq_gpr))
4149    call wrtout(std_out, sjoin(" Local memory for chi_q(g',r) matrices: ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4150 
4151    ! Loop over my spins and my taus.
4152    do my_is=1,gwr%my_nspins
4153      spin = gwr%my_spins(my_is)
4154      do my_it=1,gwr%my_ntau
4155        call cwtime(cpu_tau, wall_tau, gflops_tau, "start")
4156        itau = gwr%my_itaus(my_it)
4157        !if (my_it == 1 .and. gwr%comm%me == 0) call gwr%pstat%print([std_out], reload=.True.)
4158 
4159        ! G_k(g,g') --> G_k(g',r) e^{ik.r} for each k in the BZ treated by me.
4160        call gwr%get_myk_green_gpr(itau, spin, desc_mykbz, gt_gpr)
4161 
4162        !if (my_it == 1 .and. gwr%comm%me == 0) call gwr%pstat%print([std_out], reload=.True.)
4163 
4164        ! Loop over r in the unit cell that is now MPI-distributed inside g_comm.
4165        ! This is a bottleneck but perhaps one can take advantage of localization.
4166        ! Also, one can save all the FFTs in a matrix G(mnfft * ndat, my_nkbz) multiply by the e^{-ikr} phase
4167        ! and then use zgemm to compute Out(r,L) = [e^{-ikr}G_k(r)] e^{-ikL} with precomputed e^{-iLk} phases.
4168        my_nr = gt_gpr(1,1)%sizeb_local(2)
4169 
4170        do my_ir=1, my_nr, gwr%sc_batch_size
4171          ndat = blocked_loop(my_ir, my_nr, gwr%sc_batch_size)
4172          print_time = (gwr%comm%me == 0 .and. (my_ir <= 6 * gwr%sc_batch_size .or. mod(my_ir, LOG_MODR) == 0))
4173          if (print_time) call cwtime(cpu_ir, wall_ir, gflops_ir, "start")
4174 
4175          ! TODO: GPU version
4176 
4177 if (.not. use_shmem_for_k) then
4178 
4179          ! Insert G_k(g',r) in G'-space in the supercell FFT box (ndat vectors starting at my_ir).
4180          call gwr%gk_to_scbox(sc_ngfft, desc_mykbz, green_scgvec, my_ir, ndat, gt_gpr, gt_scbox)
4181 
4182          if (.not. use_mpi_for_k) then
4183            ! G(G',r) --> G(R',r) = sum_{k,g'} e^{-i(k+g').R'} G_k(g',r)
4184            if (gwr%kpt_comm%nproc > 1) call xmpi_sum(gt_scbox, gwr%kpt_comm%value, ierr)
4185            call green_plan%execute(gt_scbox(:,1,1), -1, iscale=0)
4186 
4187            ! Compute tchi(R',r) for this r and store it in (:,:,1). Note that results are real so one might use r2c FFT.
4188            ! Then back to tchi(G'=q+g',r) immediately with isign + 1.
4189            gt_scbox(:,:,1) = gt_scbox(:,:,1) * conjg(gt_scbox(:,:,2))
4190            !max_abs_imag_chit = max(max_abs_imag_chit, maxval(abs(aimag(gt_scbox(:,:,1)))))
4191            call green_plan%execute(gt_scbox(:,1,1), +1)
4192 
4193          else
4194            ! Reduce one G_k(tau) on the idat-1 proc and perform ndat FFTs in parallel.
4195            ! Finally, broadcast from the (idat-1) proc inside gwr%kpt_comm.
4196            do ipm=1,2
4197              do idat=1,ndat
4198                call xmpi_sum_master(gt_scbox(:,idat,ipm), idat-1, gwr%kpt_comm%value, ierr)
4199              end do
4200            end do
4201            idat = gwr%kpt_comm%me + 1
4202            do ipm=1,2
4203              call green_plan%execute(gt_scbox(:,idat,ipm), -1, ndat=gwr%nspinor, iscale=0)
4204            end do
4205            gt_scbox(:,idat,1) = gt_scbox(:,idat,1) * conjg(gt_scbox(:,idat,2))
4206            call green_plan%execute(gt_scbox(:,idat,1), +1, ndat=gwr%nspinor)
4207            do idat=1,ndat
4208              call xmpi_bcast(gt_scbox(:,idat,1), idat-1, gwr%kpt_comm%value, ierr)
4209            end do
4210          end if
4211 
4212 else
4213          ! use_shmem_for_k --> MPI shared window version. Only gt_scbox is shared.
4214          call gwr%gk_to_scbox(sc_ngfft, desc_mykbz, green_scgvec, my_ir, ndat, gt_gpr, gt_scbox, &
4215                               gt_scbox_win=gt_scbox_win)
4216 
4217          ! Now each MPI proc operates on different idat entries.
4218          !IF (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) CALL MPI_F_SYNC_REG(gt_scbox)
4219          call xmpi_win_fence(gt_scbox_win)
4220          idat = gwr%kpt_comm%me + 1
4221          if (idat <= ndat) then
4222            do ipm=1,2
4223              call green_plan%execute(gt_scbox(:,idat,ipm), -1, ndat=gwr%nspinor, iscale=0)
4224            end do
4225            gt_scbox(:,idat,1) = gt_scbox(:,idat,1) * conjg(gt_scbox(:,idat,2))
4226            call green_plan%execute(gt_scbox(:,idat,1), +1, ndat=gwr%nspinor)
4227          end if
4228          !IF (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) CALL MPI_F_SYNC_REG(gt_scbox)
4229          !call xmpi_barrier(gwr%kpt_comm%value)
4230          call xmpi_win_fence(gt_scbox_win)
4231 end if
4232 
4233          ! Now extract tchi_q(g',r) on the ecuteps (q+g)-sphere from the FFT box in the supercell
4234          ! and save data in chiq_gpr PBLAS matrix. Only my q-points in the IBZ are considered.
4235          ! Alternatively, one can avoid the above FFT, use zero-padded to go from the supercell
4236          ! to the ecuteps g-sphere inside the my_iqi loop. This approach should play well with k-point parallelism.
4237          do my_iqi=1,gwr%my_nqibz
4238            iq_ibz = gwr%my_qibz_inds(my_iqi); qq_ibz = gwr%qibz(:, iq_ibz); desc_q => gwr%tchi_desc_qibz(iq_ibz)
4239            gg = nint(qq_ibz * gwr%ngqpt)
4240            do ig=1,desc_q%npw
4241              chi_scgvec(:,ig) = gg + gwr%ngqpt(:) * desc_q%gvec(:,ig) ! q+g
4242            end do
4243            call box2gsph(sc_ngfft, desc_q%npw, gwr%nspinor * ndat, chi_scgvec, &
4244                          gt_scbox(:,1,1), chiq_gpr(my_iqi)%buffer_cplx(:,my_ir))
4245            ! TODO:
4246            !call desc_q%box2gsph(qq_ibz, gwr%ngqpt, sc_ngfft, gwr%nspinor * ndat, &
4247            !                     gt_scbox(:,1,1), chiq_gpr(my_iqi)%buffer_cplx(:,my_ir))
4248          end do ! my_iqi
4249 
4250          if (print_time) then
4251            write(msg,'(4x,3(a,i0),a)')"Chi my_ir [", my_ir, "/", my_nr, "] (tot: ", gwr%g_nfft, ")"
4252            call cwtime_report(msg, cpu_ir, wall_ir, gflops_ir)
4253          end if
4254       end do ! my_ir (end cpu intensive loop)
4255 
4256        ! Free descriptors and PBLAS matrices in kBZ.
4257        call desc_array_free(desc_mykbz)
4258        call slk_array_free(gt_gpr)
4259 
4260        ! Now we have tchi_q(g',r).
4261        ! For each IBZ q-point treated by this MPI proc, do:
4262        !
4263        !     1) MPI transpose to have tchi_q(r,g')
4264        !     2) FFT along the first dimension to get tchi_q(g,g') and store it in gwr%tchi_qibz
4265        !
4266        tchi_rfact = one / gwr%g_nfft / gwr%cryst%ucvol / (gwr%nkbz * gwr%nqbz)
4267        do my_iqi=1,gwr%my_nqibz
4268          iq_ibz = gwr%my_qibz_inds(my_iqi)
4269          q_is_gamma = normv(gwr%qibz(:,iq_ibz), gwr%cryst%gmet, "G") < GW_TOLQ0
4270          desc_q => gwr%tchi_desc_qibz(iq_ibz)
4271 
4272          ! Note the minus sign in q.
4273          if (.not. q_is_gamma) call calc_ceikr(-gwr%qibz(:,iq_ibz), gwr%g_ngfft, gwr%g_nfft, gwr%nspinor, cemiqr)
4274 
4275          ! MPI-transposition: tchi_q(g',r) => tchi_q(r,g')
4276          call chiq_gpr(my_iqi)%ptrans("N", chi_rgp)
4277 
4278          ! FFT tchi_q(r,g') --> tchi_q(g,g'). Results stored in gwr%tchi_qibz.
4279          call uplan_q%init(desc_q%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, istwfk1, &
4280                            desc_q%gvec, gwpc, gwr%dtset%gpu_option)
4281 
4282          do ig2=1, chi_rgp%sizeb_local(2), gwr%uc_batch_size
4283            ndat = blocked_loop(ig2, chi_rgp%sizeb_local(2), gwr%uc_batch_size)
4284 
4285            if (.not. q_is_gamma) then
4286              !$OMP PARALLEL DO
4287              do idat=0,ndat-1
4288                chi_rgp%buffer_cplx(:,ig2+idat) = cemiqr(:) * chi_rgp%buffer_cplx(:,ig2+idat)
4289              end do
4290            end if
4291 
4292            call uplan_q%execute_rg(ndat, chi_rgp%buffer_cplx(:, ig2), &
4293                                    gwr%tchi_qibz(iq_ibz, itau, spin)%buffer_cplx(:, ig2))
4294 
4295            !$OMP PARALLEL DO
4296            do idat=0,ndat-1
4297              gwr%tchi_qibz(iq_ibz, itau, spin)%buffer_cplx(:, ig2 + idat) = &
4298              gwr%tchi_qibz(iq_ibz, itau, spin)%buffer_cplx(:, ig2 + idat) * tchi_rfact
4299            end do
4300            !call gwr%tchi_qibz(iq_ibz, itau, spin)%scale_rows(ig2, ndat, tchi_rfact)
4301            !call xscal(npwsp, real(sqrt(gt_rfact), kind=gwpc), work_gb%buffer_cplx(:,il_b), 1)
4302          end do ! ig2
4303 
4304          call uplan_q%free()
4305          call chi_rgp%free()
4306 
4307          call gwr%tchi_qibz(iq_ibz, itau, spin)%set_imag_diago_to_zero(local_max)
4308        end do ! my_iqi
4309 
4310        write(msg,'(3(a,i0),a)')" My itau [", my_it, "/", gwr%my_ntau, "] (tot: ", gwr%ntau, ")"
4311        call cwtime_report(msg, cpu_tau, wall_tau, gflops_tau, end_str=ch10)
4312      end do ! my_it
4313    end do ! my_is
4314 
4315    if (use_shmem_for_k) then
4316      call xmpi_win_free(gt_scbox_win)
4317    else
4318      ABI_FREE(gt_scbox)
4319    end if
4320 
4321    ABI_FREE(green_scgvec)
4322    ABI_FREE(chi_scgvec)
4323    ABI_FREE(cemiqr)
4324    ABI_FREE(gt_gpr)
4325    ABI_FREE(desc_mykbz)
4326    call slk_array_free(chiq_gpr)
4327    ABI_FREE(chiq_gpr)
4328    call green_plan%free()
4329 
4330   else  ! not gwr%use_supercell_for_tchi
4331     ! ===================================================================
4332     ! Mixed-space algorithm in the unit cell with convolutions in k-space
4333     ! ===================================================================
4334     call print_chi_header()
4335 
4336     ! Allocate memory for G_k(r',r) and chi_q(r',r)
4337     ! Need all nqibz matrices here as the iq_ibz loop is the innermost one unlike in the legacy GW code.
4338     nrsp = gwr%g_nfft * gwr%nspinor
4339     col_bsize = nrsp / gwr%g_comm%nproc; if (mod(nrsp, gwr%g_comm%nproc) /= 0) col_bsize = col_bsize + 1
4340     ABI_MALLOC(chiq_rpr, (gwr%nqibz))
4341     do iq_ibz=1,gwr%nqibz
4342       call chiq_rpr(iq_ibz)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
4343     end do
4344     do ipm=1,2
4345       ! TODO: Can save memory here as we don't need +/- tau for each k/k+q
4346       call gk_rpr_pm(ipm)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
4347       call gkq_rpr_pm(ipm)%init(nrsp, nrsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
4348     end do
4349 
4350     mem_mb = sum(slk_array_locmem_mb(chiq_rpr)) + sum(slk_array_locmem_mb(gk_rpr_pm)) + sum(slk_array_locmem_mb(gkq_rpr_pm))
4351     call wrtout(std_out, sjoin(" Local memory for chi_q(r',r) (gt_gpr): ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4352 
4353     ! * The little group is needed when symchi == 1
4354     ! * If use_umklp == 1 then symmetries requiring an umklapp to preserve qibz are included as well.
4355     ! * Note that TR is not yet supported so timrev is set to 1 even if TR has been used to generate the GS IBZ.
4356     ABI_MALLOC(ltg_qibz, (gwr%nqibz))
4357     use_umklp = 1
4358     do iq_ibz=1,gwr%nqibz
4359       call ltg_qibz(iq_ibz)%init(gwr%qibz(:,iq_ibz), gwr%nkbz, gwr%kbz, gwr%cryst, use_umklp, npwe=0, timrev=1)
4360       !call ltg_qibz(iq_ibz)%print(unit=std_out, prtvol=gwr%dtset%prtvol)
4361     end do
4362 
4363     need_kibz = 0
4364     do my_ikf=1,gwr%my_nkbz
4365       ik_bz = gwr%my_kbz_inds(my_ikf); kk_bz = gwr%kbz(:, ik_bz)
4366       do iq_ibz=1,gwr%nqibz
4367         qq_ibz = gwr%qibz(:, iq_ibz)
4368         kpq_bz = kk_bz + qq_ibz
4369         !kpq_bz = qq_ibz - kk_bz
4370         !kpq_bz = kk_bz - qq_ibz
4371         ! TODO: here I may need to take into account the umklapp
4372         call findqg0(ikq_bz, g0_kq, kpq_bz, gwr%nkbz, gwr%kbz, gwr%mG0)
4373         !ABI_CHECK(all(g0_kq == 0), sjoin("g0_kq != 0, kk_bz", ktoa(kpq_bz), "qq_ibz:", ktoa(qq_ibz)))
4374         ikq_ibz = gwr%kbz2ibz(1,ikq_bz)
4375         need_kibz(ikq_ibz) = 1
4376       end do
4377     end do
4378 
4379     do my_is=1,gwr%my_nspins
4380     spin = gwr%my_spins(my_is)
4381     do my_it=1,gwr%my_ntau
4382       call cwtime(cpu_tau, wall_tau, gflops_tau, "start")
4383       itau = gwr%my_itaus(my_it)
4384 
4385       ! Redistribute G_k(g,g') with k in the IBZ so that each MPI proc
4386       ! can reconstruct G_{k+q} in the BZ inside the MPI-distributed loops.
4387       ! TODO: support for ipm_list else we have a memory leak.
4388       call gwr%redistrib_gt_kibz(itau, spin, need_kibz, got_kibz, "communicate") !ipm_list=
4389 
4390       ! Sum over my k-points in the BZ.
4391       call slk_array_set(chiq_rpr, czero)
4392 
4393       do my_ikf=1,gwr%my_nkbz
4394         print_time = gwr%comm%me == 0 .and. (my_ikf <= LOG_MODK .or. mod(my_ikf, LOG_MODK) == 0)
4395         if (print_time) call cwtime(cpu_ikf, wall_ikf, gflops_ikf, "start")
4396         ik_bz = gwr%my_kbz_inds(my_ikf); kk_bz = gwr%kbz(:, ik_bz)
4397 
4398         !ik_ibz = gwr%kbz2ibz(1, ik_bz); isym_k = gwr%kbz2ibz(2, ik_bz)
4399         !trev_k = gwr%kbz2ibz(6, ik_bz); g0_k = gwr%kbz2ibz(3:5, ik_bz)
4400         !isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
4401         !tsign_k = merge(1, -1, trev_k == 0)
4402         !if (.not. isirr_k) cycle
4403 
4404         ! Use symmetries to get G_kbz(g,g') from the IBZ, then G_kbz(g,g') -> G_kbz(r',r).
4405         ! TODO: here I may need to take into account the umklapp
4406         call gwr%get_gkbz_rpr_pm(ik_bz, itau, spin, gk_rpr_pm, ipm_list=[1]) ! g0=??
4407 
4408         do iq_ibz=1,gwr%nqibz
4409           if (gwr%dtset%symchi /= 0 .and. ltg_qibz(iq_ibz)%ibzq(ik_bz) == 0) cycle ! FIXME: iq_bz or ikq?
4410           qq_ibz = gwr%qibz(:,iq_ibz)
4411           kpq_bz = kk_bz + qq_ibz
4412           !kpq_bz = qq_ibz - kk_bz
4413 
4414           ! TODO: here I may need to take into account the umklapp if k+q is outside the BZ.
4415           call findqg0(ikq_bz, g0_kq, kpq_bz, gwr%nkbz, gwr%kbz, gwr%mG0)
4416           !ABI_CHECK(all(g0_kq == 0), sjoin("g0_kq != 0, kk_bz", ktoa(kpq_bz), "qq_ibz:", ktoa(qq_ibz)))
4417 
4418           ! Use symmetries to get G_kqbz(g,g') from the IBZ, then G_kqbz(g,g') -> G_kqbz(r',r).
4419           ! Also, we don't need G(+/-t) for both k, k+q wavevectors.
4420           call gwr%get_gkbz_rpr_pm(ikq_bz, itau, spin, gkq_rpr_pm, g0=g0_kq, ipm_list=[2])
4421 
4422           ! The weight depends on q_ibz and the symmetries of the little group of qq_ibz.
4423           wtqp = one / gwr%nkbz; wtqm = zero
4424           if (gwr%dtset%symchi /= 0) then
4425             wtqp = (one * sum(ltg_qibz(iq_ibz)%wtksym(1,:,ik_bz))) / gwr%nkbz
4426             wtqm = (one * sum(ltg_qibz(iq_ibz)%wtksym(2,:,ik_bz))) / gwr%nkbz
4427             ABI_CHECK(wtqm == zero, sjoin("TR is not yet implemented:, wqtm:", ftoa(wtqm)))
4428           end if
4429 
4430           chiq_rpr(iq_ibz)%buffer_cplx = chiq_rpr(iq_ibz)%buffer_cplx + &
4431              !wtqp * gkq_rpr_pm(1)%buffer_cplx * conjg(gk_rpr_pm(2)%buffer_cplx)  ! This should be OK
4432              wtqp * gk_rpr_pm(1)%buffer_cplx * conjg(gkq_rpr_pm(2)%buffer_cplx)   ! RECHECK EQ. This one works
4433                                                                                   ! but requires ptrans with C
4434         end do ! my_iqi
4435 
4436         if (print_time) then
4437           write(msg,'(4x,3(a,i0),a)')"Chi my_ikf [", my_ikf, "/", gwr%my_nkbz, "] (tot: ", gwr%nkbz, ")"
4438           call cwtime_report(msg, cpu_ikf, wall_ikf, gflops_ikf)
4439         end if
4440       end do ! my_ikf
4441 
4442       ! Deallocate got_kibz Green's functions.
4443       call gwr%redistrib_gt_kibz(itau, spin, need_kibz, got_kibz, "free")
4444 
4445       ! From chi_q(r',r) to chi_q(g,g') for each q in the IBZ.
4446       do iq_ibz=1,gwr%nqibz
4447         call xmpi_sum(chiq_rpr(iq_ibz)%buffer_cplx, gwr%kpt_comm%value, ierr)
4448       end do
4449 
4450       do iq_ibz=1,gwr%nqibz
4451         if (.not. any(iq_ibz == gwr%my_qibz_inds)) cycle
4452         ! TODO: Recheck API and scaling factor.
4453         call gwr_rpr_to_ggp(gwr, gwr%tchi_desc_qibz(iq_ibz), chiq_rpr(iq_ibz), gwr%tchi_qibz(iq_ibz,itau,spin))
4454         tchi_rfact = one / gwr%cryst%ucvol  !/ (gwr%nkbz * gwr%nqbz)
4455         gwr%tchi_qibz(iq_ibz,itau,spin)%buffer_cplx = gwr%tchi_qibz(iq_ibz,itau,spin)%buffer_cplx * tchi_rfact
4456       end do ! my_iqi
4457 
4458       write(msg,'(3(a,i0),a)')" My itau [", my_it, "/", gwr%my_ntau, "] (tot: ", gwr%ntau, ")"
4459       call cwtime_report(msg, cpu_tau, wall_tau, gflops_tau)
4460    end do ! my_it
4461    end do ! spin
4462 
4463    ! Free memory
4464    call slk_array_free(gk_rpr_pm); call slk_array_free(gkq_rpr_pm); call slk_array_free(chiq_rpr)
4465    ABI_FREE(chiq_rpr)
4466    do iq_ibz=1,gwr%nqibz
4467      call ltg_qibz(iq_ibz)%free()
4468    end do
4469    ABI_FREE(ltg_qibz)
4470    !call wrtout(std_out, " Mixed space algorithm for chi completed")
4471  end if
4472 
4473  !call wrtout(std_out, sjoin(" max_abs_imag_chit", ftoa(max_abs_imag_chit)))
4474 
4475  ! Print trace of chi_q(i tau) matrices for testing purposes.
4476  if (gwr%dtset%prtvol > 0) call gwr%print_trace("tchi_qibz")
4477 
4478  ! Transform irreducible tchi from imaginary tau to imaginary omega.
4479  ! Also sum over spins to get total tchi if collinear spin.
4480  call gwr%cos_transform("tchi", "it2w", sum_spins=.True.)
4481 
4482  if (gwr%kpt_comm%me == 0) then
4483    ! ===================================================
4484    ! ==== Construct head and wings from the tensor =====
4485    ! ===================================================
4486    associate (desc_q0 => gwr%tchi_desc_qibz(1), mat_ts => gwr%tchi_qibz(1,:,:))
4487    ABI_CHECK_IEQ(desc_q0%ig0, 1, "ig0 should be 1")
4488    ABI_MALLOC(up_wing_q, (desc_q0%npw))
4489    ABI_MALLOC(low_wing_q, (desc_q0%npw))
4490 
4491    do my_is=1,gwr%my_nspins
4492      spin = gwr%my_spins(my_is)
4493      do my_it=1,gwr%my_ntau
4494        itau = gwr%my_itaus(my_it)
4495 
4496        do ig=2,desc_q0%npw
4497          wng = gwr%chi0_uwing_myw(:,ig, my_it)
4498          up_wing_q(ig) = vdotw(gwr%q0, wng, gwr%cryst%gmet, "G")
4499          wng = gwr%chi0_lwing_myw(:,ig,my_it)
4500          low_wing_q(ig) = vdotw(gwr%q0, wng, gwr%cryst%gmet, "G")
4501        end do
4502        chq = matmul(gwr%chi0_head_myw(:,:,my_it), gwr%q0)
4503        head_q = vdotw(gwr%q0, chq, gwr%cryst%gmet, "G")
4504 
4505        call mat_ts(itau, spin)%set_head_and_wings(head_q, low_wing_q, up_wing_q)
4506      end do ! my_it
4507    end do ! my_is
4508    end associate
4509    ABI_FREE(up_wing_q)
4510    ABI_FREE(low_wing_q)
4511  end if
4512 
4513  ! Print trace of chi_q(i omega) matrices for testing purposes.
4514  if (gwr%dtset%prtvol > 0) call gwr%print_trace("tchi_qibz")
4515 
4516  ! Write file with chi0(i omega) if asked by user.
4517  if (gwr%dtset%prtsuscep > 0) call gwr%ncwrite_tchi_wc("tchi", trim(gwr%dtfil%filnam_ds(4))//'_TCHIM.nc')
4518 
4519  call cwtime_report(" gwr_build_tchi:", cpu_all, wall_all, gflops_all)
4520  call timab(1923, 2, tsec)
4521 
4522 contains
4523 
4524 subroutine print_chi_header()
4525  if (gwr%comm%me /= 0) return
4526  if (gwr%use_supercell_for_tchi) then
4527    call wrtout(std_out, " Building chi0 in the supercell with FFTs ", pre_newlines=2)
4528  else
4529    call wrtout(std_out, " Building chi_q(r,r') with convolutions in k-space:", pre_newlines=2)
4530  end if
4531  call wrtout(std_out, sjoin(" gwr_np_kgts:", ltoa(gwr%dtset%gwr_np_kgts)))
4532  call wrtout(std_out, sjoin(" ngkpt:", ltoa(gwr%ngkpt), ", ngqpt:", ltoa(gwr%ngqpt)))
4533  call wrtout(std_out, sjoin(" gwr_boxcutmin:", ftoa(gwr%dtset%gwr_boxcutmin)))
4534  call wrtout(std_out, sjoin(" sc_ngfft:", ltoa(sc_ngfft(1:8))))
4535  call wrtout(std_out, sjoin(" my_ntau:", itoa(gwr%my_ntau), ", ntau:", itoa(gwr%ntau)))
4536  call wrtout(std_out, sjoin(" my_nkbz:", itoa(gwr%my_nkbz), ", nkbz:", itoa(gwr%nkbz)))
4537  call wrtout(std_out, sjoin(" my_nkibz:", itoa(gwr%my_nkibz), ", nkibz:", itoa(gwr%nkibz)))
4538  call wrtout(std_out, sjoin("- FFT uc_batch_size:", itoa(gwr%uc_batch_size)))
4539  call wrtout(std_out, sjoin("- FFT sc_batch_size:", itoa(gwr%sc_batch_size)), do_flush=.True.)
4540 end subroutine print_chi_header
4541 
4542 end subroutine gwr_build_tchi

m_gwr/gwr_build_wc [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_build_wc

FUNCTION

  Compute Wc(i tau,g,g') from tchi(i omega,g,g')

INPUTS

OUTPUT

SOURCE

4892 subroutine gwr_build_wc(gwr)
4893 
4894 !Arguments ------------------------------------
4895  class(gwr_t),target,intent(inout) :: gwr
4896 
4897 #ifndef FC_CRAY
4898 !Local variables-------------------------------
4899 !scalars
4900  integer,parameter :: master = 0
4901  integer :: my_iqi, my_it, my_is, iq_ibz, spin, itau, iw, ierr
4902  integer :: il_g1, il_g2, ig1, ig2, iglob1, iglob2, ig0
4903  real(dp) :: cpu_all, wall_all, gflops_all, cpu_q, wall_q, gflops_q
4904  logical :: q_is_gamma, free_tchi, print_time
4905  character(len=5000) :: msg
4906  complex(dpc) :: vcs_g1, vcs_g2
4907  type(__slkmat_t) :: em1
4908  type(yamldoc_t) :: ydoc
4909 !arrays
4910  real(dp) :: qq_ibz(3), tsec(2)
4911  complex(dpc) :: em1_wq(gwr%ntau, gwr%nqibz), eps_wq(gwr%ntau, gwr%nqibz)
4912 
4913 ! *************************************************************************
4914 
4915  call cwtime(cpu_all, wall_all, gflops_all, "start")
4916  call timab(1924, 1, tsec)
4917  call wrtout([std_out, ab_out], " Building correlated screening Wc ...", pre_newlines=2)
4918  ABI_CHECK(gwr%tchi_space == "iomega", sjoin("tchi_space: ", gwr%tchi_space, " != iomega"))
4919 
4920  if (allocated(gwr%wc_qibz)) then
4921    call slk_array_free(gwr%wc_qibz)
4922    ABI_FREE(gwr%wc_qibz)
4923    gwr%wc_space = "none"
4924  end if
4925 
4926  ABI_CHECK(gwr%wc_space == "none", sjoin("wc_space: ", gwr%wc_space, " != none"))
4927  gwr%wc_space = "iomega"
4928 
4929  ! =======================================
4930  ! Allocate PBLAS arrays for wc_qibz(g,g')
4931  ! =======================================
4932  ! Note that we have already summed tchi over spin.
4933  ! Also, G=0 corresponds to iglob = 1 as only q-points in the IBZ are treated here.
4934  ! This is not true for the other q-points in the full BZ as we may have a non-zero umklapp g0_q.
4935  ABI_MALLOC(gwr%wc_qibz, (gwr%nqibz, gwr%ntau, gwr%nsppol))
4936 
4937  free_tchi = .True.; if (free_tchi) gwr%tchi_space = "none"
4938  em1_wq = zero; eps_wq = zero
4939 
4940  ! If possible, use 2d rectangular grid of processors for diagonalization.
4941  !call slkproc_4diag%init(gwr%g_comm%value)
4942 
4943  do my_iqi=1,gwr%my_nqibz
4944    print_time = gwr%comm%me == 0 .and. (my_iqi <= LOG_MODK .or. mod(my_iqi, LOG_MODK) == 0)
4945    if (print_time) call cwtime(cpu_q, wall_q, gflops_q, "start")
4946    iq_ibz = gwr%my_qibz_inds(my_iqi); qq_ibz = gwr%qibz(:, iq_ibz)
4947    q_is_gamma = normv(qq_ibz, gwr%cryst%gmet, "G") < GW_TOLQ0
4948 
4949    associate (desc_q => gwr%tchi_desc_qibz(iq_ibz))
4950    ig0 = desc_q%ig0
4951 
4952    ! The spin loop is needed so that procs in different pools can operate
4953    ! on their own matrix that has been already summed over (collinear) spins.
4954    do my_is=1,gwr%my_nspins
4955      spin = gwr%my_spins(my_is)
4956      do my_it=1,gwr%my_ntau
4957        itau = gwr%my_itaus(my_it)
4958 
4959        ! Build symmetrized RPA epsilon: 1 - Vc^{1/2} chi0 Vc^{1/2}
4960        associate (wc => gwr%wc_qibz(iq_ibz, itau, spin))
4961        call gwr%tchi_qibz(iq_ibz, itau, spin)%copy(wc)
4962        if (free_tchi) call gwr%tchi_qibz(iq_ibz, itau, spin)%free()
4963 
4964        do il_g2=1,wc%sizeb_local(2)
4965          iglob2 = wc%loc2gcol(il_g2)
4966          ig2 = mod(iglob2 - 1, desc_q%npw) + 1
4967          vcs_g2 = desc_q%vc_sqrt(ig2)
4968          do il_g1=1,wc%sizeb_local(1)
4969            iglob1 = wc%loc2grow(il_g1)
4970            ig1 = mod(iglob1 - 1, desc_q%npw) + 1
4971            vcs_g1 = desc_q%vc_sqrt(ig1)
4972            wc%buffer_cplx(il_g1, il_g2) = -wc%buffer_cplx(il_g1, il_g2) * vcs_g1 * vcs_g2
4973            if (iglob1 == iglob2) then
4974              wc%buffer_cplx(il_g1, il_g2) = one + wc%buffer_cplx(il_g1, il_g2)
4975              if (iglob1 == ig0 .and. iglob2 == ig0) then
4976                ! Store epsilon_{iw, iq_ibz}(0, 0)
4977                ! Rescale by np_qibz because we will MPI reduce this array.
4978                eps_wq(itau, iq_ibz) = wc%buffer_cplx(il_g1, il_g2) / gwr%np_qibz(iq_ibz)
4979              end if
4980            end if
4981          end do ! il_g1
4982        end do ! il_g2
4983 
4984        ! Invert symmetrized epsilon.
4985        ! NB: PZGETRF requires square block cyclic decomposition along the two axes
4986        ! hence we have to redistribute the data before calling invert.
4987 
4988        call wc%change_size_blocs(em1) ! processor=slkproc_4diag
4989        !call em1%invert()
4990        call em1%hpd_invert("U") ! TODO: Can use hpd_invert
4991        call wc%take_from(em1, free=.True.)  ! processor=wc%processor)
4992 
4993        !call wrtout(std_out, sjoin(" e-1 at q:", ktoa(qq_ibz), "i omega:", ftoa(gwr%iw_mesh(itau) * Ha_eV), "eV"))
4994        !call print_arr(wc%buffer_cplx, unit=std_out)
4995 
4996        ! Build Wc(q, iw) = e^{-1}_q(g,g',iw) - delta_{gg'} v_q(g,g') by removing bare vc
4997        do il_g2=1,wc%sizeb_local(2)
4998          iglob2 = wc%loc2gcol(il_g2)
4999          ig2 = mod(iglob2 - 1, desc_q%npw) + 1
5000          vcs_g2 = desc_q%vc_sqrt(ig2)
5001          do il_g1=1,wc%sizeb_local(1)
5002            iglob1 = wc%loc2grow(il_g1)
5003            ig1 = mod(iglob1 - 1, desc_q%npw) + 1
5004            vcs_g1 = desc_q%vc_sqrt(ig1)
5005 
5006            if (iglob1 == ig0 .and. iglob2 == ig0) then
5007              ! Store epsilon^{-1}_{iw, iq_ibz}(0, 0). Rescale by np_qibz because we will MPI reduce this array.
5008              em1_wq(itau, iq_ibz) = wc%buffer_cplx(il_g1, il_g2) / gwr%np_qibz(iq_ibz)
5009            end if
5010 
5011            ! Subtract exchange part.
5012            if (iglob1 == iglob2) wc%buffer_cplx(il_g1, il_g2) = wc%buffer_cplx(il_g1, il_g2) - one
5013 
5014            ! Handle divergence in Wc for q --> 0
5015            if (q_is_gamma .and. (iglob1 == ig0 .or. iglob2 == ig0)) then
5016              if (iglob1 == ig0 .and. iglob2 == ig0) then
5017                vcs_g1 = sqrt(gwr%vcgen%i_sz); vcs_g2 = sqrt(gwr%vcgen%i_sz)
5018              else if (iglob1 == ig0) then
5019                !vcs_g1 = (four_pi) ** (three/two) * q0sph ** 2 / two
5020                vcs_g1 = sqrt(gwr%vcgen%i_sz)
5021              else if (iglob2 == ig0) then
5022                !vcs_g2 = (four_pi) ** (three/two) * q0sph ** 2 / two
5023                vcs_g2 = sqrt(gwr%vcgen%i_sz)
5024              end if
5025            end if
5026 
5027            wc%buffer_cplx(il_g1, il_g2) = wc%buffer_cplx(il_g1, il_g2) * vcs_g1 * vcs_g2 / gwr%cryst%ucvol
5028          end do ! il_g1
5029        end do ! il_g2
5030        end associate
5031 
5032      end do  ! my_it
5033    end do ! my_is
5034    end associate
5035 
5036    if (print_time) then
5037      write(msg,'(4x,2(a,i0),a)')"My iqi [", my_iqi, "/", gwr%my_nqibz, "]"
5038      call cwtime_report(msg, cpu_q, wall_q, gflops_q)
5039    end if
5040  end do ! my_iqi
5041 
5042  !call slkproc_4diag%free()
5043 
5044  call xmpi_sum_master(em1_wq, master, gwr%kgt_comm%value, ierr)
5045  call xmpi_sum_master(eps_wq, master, gwr%kgt_comm%value, ierr)
5046 
5047  if (gwr%comm%me == master) then
5048    ! Print results to ab_out for testing purposes.
5049    ydoc = yamldoc_open('EMACRO_WITHOUT_LOCAL_FIELDS') !, width=11, real_fmt='(3f8.3)')
5050    call ydoc%open_tabular("epsilon_{iw, q -> Gamma}(0,0)") ! comment="(iomega, iq_ibz)")
5051    do iw=1,gwr%ntau
5052      write(msg, "(3(es16.8,2x))") gwr%iw_mesh(iw), real(eps_wq(iw, 1)), aimag(eps_wq(iw, 1))
5053      call ydoc%add_tabular_line(msg)
5054    end do
5055    call ydoc%write_units_and_free([ab_out, std_out])
5056 
5057    ydoc = yamldoc_open('EMACRO_WITH_LOCAL_FIELDS') !, width=11, real_fmt='(3f8.3)')
5058    call ydoc%open_tabular("epsilon_{iw, q -> Gamma}(0,0)") !, comment="(iomega, iq_ibz)")
5059    do iw=1,gwr%ntau
5060      write(msg, "(3(es16.8,2x))") gwr%iw_mesh(iw), real(em1_wq(iw, 1)), aimag(em1_wq(iw, 1))
5061      call ydoc%add_tabular_line(msg)
5062    end do
5063    call ydoc%write_units_and_free([ab_out, std_out])
5064  end if
5065 
5066  ! Print trace of wc_q(itau) matrices for testing purposes.
5067  if (gwr%dtset%prtvol > 0) call gwr%print_trace("wc_qibz")
5068 
5069  ! Write file with Wc(i omega)
5070  !if (gwr%dtset%prtsuscep > 0) call gwr%ncwrite_tchi_wc("wc", trim(gwr%dtfil%filnam_ds(4))//'_WCIMW.nc')
5071 
5072  ! Cosine transform from iomega to itau to get Wc(i tau)
5073  call gwr%cos_transform("wc", "iw2t")
5074 
5075  ! Write file with Wc(i tau)
5076  !if (gwr%dtset%prtsuscep > 0) call gwr%ncwrite_tchi_wc("wc", trim(gwr%dtfil%filnam_ds(4))//'_WCIMT.nc')
5077 
5078  ! Print trace of wc_q(iomega) matrices for testing purposes.
5079  !if (gwr%dtset%prtvol > 0) call gwr%print_trace("wc_qibz")
5080 
5081  call cwtime_report(" gwr_build_wc:", cpu_all, wall_all, gflops_all)
5082  call timab(1924, 2, tsec)
5083 
5084 #endif
5085 end subroutine gwr_build_wc

m_gwr/gwr_cos_transform [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_cos_transform

FUNCTION

  Perform cosine transform.

INPUTS

SOURCE

3496 subroutine gwr_cos_transform(gwr, what, mode, sum_spins)
3497 
3498 !Arguments ------------------------------------
3499  class(gwr_t),target,intent(inout) :: gwr
3500  character(len=*),intent(in) :: what, mode
3501  logical,optional,intent(in) :: sum_spins
3502 
3503 !Local variables-------------------------------
3504 !scalars
3505  integer :: my_iqi, my_is, ig1, ig2, my_it, ierr, iq_ibz, itau, spin, it0, iw
3506  integer :: ndat, idat, loc1_size, loc2_size, batch_size
3507  real(dp) :: cpu, wall, gflops
3508  logical :: sum_spins_
3509 !arrays
3510  integer,allocatable :: requests(:)
3511  real(dp), contiguous, pointer :: weights_ptr(:,:)
3512  complex(dp) :: wgt_globmy(gwr%ntau, gwr%my_ntau)  ! Complex instead of real to be able to call ZGEMM.
3513  complex(dp),allocatable :: cwork_myit(:,:,:), glob_cwork(:,:,:)
3514  type(__slkmat_t), pointer :: mats(:)
3515 
3516 ! *************************************************************************
3517 
3518  call cwtime(cpu, wall, gflops, "start")
3519  sum_spins_ = .False.; if (present(sum_spins)) sum_spins_ = sum_spins
3520 
3521  call wrtout(std_out, sjoin(" Performing cosine transform. what:", what, ", mode:", mode))
3522 
3523  ! Target weights depending on mode.
3524  select case(mode)
3525  case ("iw2t")
3526    ! From omega to tau
3527    if (what == "tchi") then
3528      ABI_CHECK(gwr%tchi_space == "iomega", sjoin("mode:", mode, "with what:", what, "and tchi_space:", gwr%tchi_space))
3529      gwr%tchi_space = "itau"
3530    end if
3531    if (what == "wc") then
3532      ABI_CHECK(gwr%wc_space == "iomega", sjoin("mode:", mode, "with what:", what, "and wc_space:", gwr%wc_space))
3533      gwr%wc_space = "itau"
3534    end if
3535    weights_ptr => gwr%cosft_tw
3536 
3537  case ("it2w")
3538    ! From tau to omega
3539    if (what == "tchi") then
3540      ABI_CHECK(gwr%tchi_space == "itau", sjoin("mode:", mode, " with what:", what, "and tchi_space:", gwr%tchi_space))
3541      gwr%tchi_space = "iomega"
3542    end if
3543    if (what == "wc") then
3544      ABI_CHECK(gwr%wc_space == "itau", sjoin("mode:", mode, " with what:", what, "and wc_space:", gwr%wc_space))
3545      gwr%wc_space = "iomega"
3546    end if
3547    weights_ptr => gwr%cosft_wt
3548 
3549  case default
3550    ABI_ERROR(sjoin("Wrong mode:", mode))
3551  end select
3552 
3553  ! Extract my weights from global array.
3554  do my_it=1,gwr%my_ntau
3555    itau = gwr%my_itaus(my_it)
3556    do iw=1,gwr%ntau
3557      wgt_globmy(iw, my_it) = weights_ptr(iw, itau)
3558    end do
3559  end do
3560 
3561  ! Perform inhomogeneous FT in parallel.
3562  do my_is=1,gwr%my_nspins
3563    spin = gwr%my_spins(my_is)
3564    do my_iqi=1,gwr%my_nqibz
3565      iq_ibz = gwr%my_qibz_inds(my_iqi)
3566      associate (desc_q => gwr%tchi_desc_qibz(iq_ibz))
3567 
3568      mats => null()
3569      if (what == "tchi") mats => gwr%tchi_qibz(iq_ibz, :, spin)
3570      if (what =="wc")   mats => gwr%wc_qibz(iq_ibz, :, spin)
3571      ABI_CHECK(associated(mats), sjoin("Invalid value for what:", what))
3572 
3573      ! Use the first itau index to get the size of the local buffer.
3574      ! Block over ig2 to reduce the number of MPI calls and take advantage of ZGEMM.
3575      it0 = gwr%my_itaus(1)
3576      loc1_size = mats(it0)%sizeb_local(1)
3577      loc2_size = mats(it0)%sizeb_local(2)
3578 
3579      ! batch_size in terms of columns
3580      ! TODO: Determine batch_size automatically to avoid going OOM
3581      !batch_size = 1
3582      !batch_size = 24
3583      batch_size = 48
3584      !batch_size = loc2_size
3585 
3586      ABI_MALLOC(cwork_myit, (gwr%my_ntau, loc1_size, batch_size))
3587      ABI_MALLOC(glob_cwork, (gwr%ntau, loc1_size, batch_size))
3588      ABI_MALLOC(requests, (batch_size))
3589 
3590      do ig2=1,mats(it0)%sizeb_local(2), batch_size
3591        ndat = blocked_loop(ig2, mats(it0)%sizeb_local(2), batch_size)
3592 
3593        ! Extract matrix elements as a function of tau.
3594        do idat=1,ndat
3595          do my_it=1,gwr%my_ntau
3596            itau = gwr%my_itaus(my_it)
3597            do ig1=1,mats(it0)%sizeb_local(1)
3598              cwork_myit(my_it, ig1, idat) = mats(itau)%buffer_cplx(ig1, ig2+idat-1)
3599            end do
3600          end do
3601        end do
3602 
3603        ! Compute contribution to itau matrix
3604        do idat=1,ndat
3605          call ZGEMM("N", "N", gwr%ntau, loc1_size, gwr%my_ntau, cone, &
3606                     wgt_globmy, gwr%ntau, cwork_myit(1,1,idat), gwr%my_ntau, czero, glob_cwork(1,1,idat), gwr%ntau)
3607          !call xmpi_isum_ip(glob_cwork(:,:,idat), gwr%tau_comm%value, requests(idat), ierr)
3608        end do
3609 
3610        !call xmpi_waitall_1d(requests(1:ndat), ierr)
3611        call xmpi_sum(glob_cwork, gwr%tau_comm%value, ierr)
3612 
3613        ! Update my local (g1, g2) entry to have it in imaginary-frequency.
3614        !!!$OMP PARALLEL DO PRIVATE(itau)
3615        do idat=1,ndat
3616          do my_it=1,gwr%my_ntau
3617            itau = gwr%my_itaus(my_it)
3618            do ig1=1,mats(it0)%sizeb_local(1)
3619              mats(itau)%buffer_cplx(ig1, ig2+idat-1) = glob_cwork(itau, ig1, idat)
3620            end do
3621          end do
3622        end do
3623 
3624      end do ! ig2
3625 
3626      ABI_FREE(cwork_myit)
3627      ABI_FREE(glob_cwork)
3628      ABI_FREE(requests)
3629      end associate
3630    end do ! my_iqi
3631  end do ! my_is
3632 
3633  if (sum_spins_ .and. gwr%nspinor /= 2) then  ! gwr%nsppol == 2 .and.
3634    ! Sum over spins
3635    do my_iqi=1,gwr%my_nqibz
3636       iq_ibz = gwr%my_qibz_inds(my_iqi)
3637       do my_is=1,gwr%my_nspins
3638         spin = gwr%my_spins(my_is)
3639         mats => null()
3640         if (what == "tchi") mats => gwr%tchi_qibz(iq_ibz,:,spin)
3641         !if (what =="wc")   mats => gwr%wc_qibz(iq_ibz, :, spin)
3642         ABI_CHECK(associated(mats), sjoin("Invalid value for what:", what))
3643 
3644         do my_it=1,gwr%my_ntau
3645           itau = gwr%my_itaus(my_it)
3646 
3647           if (gwr%nsppol == 1) then
3648             mats(itau)%buffer_cplx = two * mats(itau)%buffer_cplx
3649 
3650           else if (gwr%nsppol == 2) then
3651             if (gwr%spin_comm%nproc > 1) then
3652               ! Spins are distributed thus we have to sum them.
3653               call xmpi_sum(mats(itau)%buffer_cplx, gwr%spin_comm%value, ierr)
3654             else
3655               ! Spins are not distributed. This should happen only in sequential.
3656               if (spin == 1) then
3657                 mats(itau)%buffer_cplx = mats(itau)%buffer_cplx + gwr%tchi_qibz(iq_ibz,itau,spin+1)%buffer_cplx
3658                 gwr%tchi_qibz(iq_ibz,itau,spin+1)%buffer_cplx = mats(itau)%buffer_cplx
3659               end if
3660             end if
3661           end if
3662 
3663         end do ! my_it
3664       end do ! my_is
3665    end do ! my_iqi
3666  end if
3667 
3668  call cwtime_report(" gwr_cos_transform:", cpu, wall, gflops)
3669 
3670 end subroutine gwr_cos_transform

m_gwr/gwr_free [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 gwr_free

FUNCTION

  Free dynamic memory

SOURCE

1899 subroutine gwr_free(gwr)
1900 
1901 !Arguments ------------------------------------
1902  class(gwr_t), intent(inout) :: gwr
1903 
1904 ! *************************************************************************
1905 
1906  ABI_SFREE(gwr%ks_vbik)
1907  ABI_SFREE(gwr%kbz)
1908  ABI_SFREE(gwr%kbz2ibz)
1909  ABI_SFREE(gwr%kbz2ibz_symrel)
1910  ABI_SFREE(gwr%qbz2ibz)
1911  ABI_SFREE(gwr%my_kbz_inds)
1912  ABI_SFREE(gwr%my_kibz_inds)
1913  ABI_SFREE(gwr%my_qbz_inds)
1914  ABI_SFREE(gwr%my_qibz_inds)
1915  ABI_SFREE(gwr%qbz)
1916  ABI_SFREE(gwr%qibz)
1917  ABI_SFREE(gwr%wtq)
1918  ABI_SFREE(gwr%chi0_head_myw)
1919  ABI_SFREE(gwr%chi0_uwing_myw)
1920  ABI_SFREE(gwr%chi0_lwing_myw)
1921  ABI_SFREE(gwr%qbz2ibz)
1922  ABI_SFREE(gwr%my_spins)
1923  ABI_SFREE(gwr%my_itaus)
1924  ABI_SFREE(gwr%tau_master)
1925  ABI_SFREE(gwr%np_kibz)
1926  ABI_SFREE(gwr%itreat_ikibz)
1927  ABI_SFREE(gwr%np_qibz)
1928  ABI_SFREE(gwr%itreat_iqibz)
1929 !#ifdef __HAVE_GREENX
1930  ABI_SFREE_NOCOUNT(gwr%tau_mesh)
1931  ABI_SFREE_NOCOUNT(gwr%tau_wgs)
1932  ABI_SFREE_NOCOUNT(gwr%iw_mesh)
1933  ABI_SFREE_NOCOUNT(gwr%iw_wgs)
1934  ABI_SFREE_NOCOUNT(gwr%cosft_tw)
1935  ABI_SFREE_NOCOUNT(gwr%cosft_wt)
1936  ABI_SFREE_NOCOUNT(gwr%sinft_wt)
1937 !#endif
1938  ABI_SFREE(gwr%kcalc)
1939  ABI_SFREE(gwr%bstart_ks)
1940  ABI_SFREE(gwr%bstop_ks)
1941  ABI_SFREE(gwr%nbcalc_ks)
1942  ABI_SFREE(gwr%kcalc2ibz)
1943  ABI_SFREE(gwr%sigx_mat)
1944  ABI_SFREE(gwr%sigc_iw_mat)
1945  ABI_SFREE(gwr%chinpw_qibz)
1946 
1947  call gwr%ks_gaps%free()
1948  call ebands_free(gwr%qp_ebands)
1949  call ebands_free(gwr%qp_ebands_prev)
1950  call gwr%kcalc_wfd%free()
1951  call gwr%wfk_hdr%free()
1952 
1953  ! Free descriptors
1954  if (allocated(gwr%green_desc_kibz)) then
1955    call desc_array_free(gwr%green_desc_kibz)
1956    ABI_FREE(gwr%green_desc_kibz)
1957  end if
1958  if (allocated(gwr%tchi_desc_qibz)) then
1959    call desc_array_free(gwr%tchi_desc_qibz)
1960    ABI_FREE(gwr%tchi_desc_qibz)
1961  end if
1962 
1963  ! Free PBLAS matrices
1964  if (allocated(gwr%gt_kibz)) then
1965    call slk_array_free(gwr%gt_kibz)
1966    ABI_FREE(gwr%gt_kibz)
1967  end if
1968  if (allocated(gwr%tchi_qibz)) then
1969    call slk_array_free(gwr%tchi_qibz)
1970    ABI_FREE(gwr%tchi_qibz)
1971  end if
1972  if (allocated(gwr%wc_qibz)) then
1973    call slk_array_free(gwr%wc_qibz)
1974    ABI_FREE(gwr%wc_qibz)
1975  end if
1976  if (allocated(gwr%sigc_kibz)) then
1977    call slk_array_free(gwr%sigc_kibz)
1978    ABI_FREE(gwr%sigc_kibz)
1979  end if
1980  ! Release the scalapack pressor.
1981  call gwr%g_slkproc%free()
1982 
1983  if (allocated(gwr%ugb)) then
1984    call slk_array_free(gwr%ugb)
1985    ABI_FREE(gwr%ugb)
1986  end if
1987  !if (allocated(gwr%nato_ugb)) then
1988  !  call slk_array_free(gwr%nato_ugb)
1989  !  ABI_FREE(gwr%nato_ugb)
1990  !end if
1991  call gwr%gtau_slkproc%free()
1992 
1993  ! datatypes.
1994  call gwr%ks_me%free()
1995  call gwr%vcgen%free()
1996 
1997  if (allocated(gwr%degtab)) then
1998    call degtab_array_free(gwr%degtab)
1999    ABI_FREE(gwr%degtab)
2000  end if
2001 
2002  ! Free MPI communicators
2003  call gwr%spin_comm%free(); call gwr%g_comm%free(); call gwr%tau_comm%free()
2004  call gwr%kpt_comm%free(); call gwr%gtau_comm%free(); call gwr%kg_comm%free()
2005  call gwr%kgt_comm%free(); call gwr%kts_comm%free(); call gwr%comm%free()
2006 
2007 end subroutine gwr_free

m_gwr/gwr_gamma_gw [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_gamma_gw

FUNCTION

INPUTS

  vpsp(nfft)=local psp (Hartree)

OUTPUT

SOURCE

8061 subroutine gwr_gamma_gw(gwr, nfftf, ngfftf, vpsp)
8062 
8063  use m_gwrdm,         only : calc_rdmx,calc_rdmc,natoccs,update_hdr_bst,print_tot_occ,get_chkprdm,&
8064                              print_chkprdm,change_matrix,print_total_energy,print_band_energies
8065  use m_spacepar,      only : hartre
8066 
8067 !Arguments ------------------------------------
8068  class(gwr_t),target,intent(inout) :: gwr
8069  integer,intent(in) :: nfftf
8070 !arrays
8071  integer,intent(in) :: ngfftf(18)
8072  real(dp),intent(in) :: vpsp(nfftf)
8073 
8074 !Local variables-------------------------------
8075 !scalars
8076  integer,parameter :: master = 0, cplex1 = 1, tim_fourdp5 = 5
8077  integer :: spin, ikcalc, ik_ibz, ib, ib1, ib2, nkibz, nsppol, mband, ierr, b1gw, b2gw
8078  !real(dp) :: cpu, wall, gflops
8079  real(dp) :: evext_energy,den_int !,coef_hyb,exc_mbb_energy eh_energy, ekin_energy,
8080  real(dp) :: gsqcut,boxcut,ecutf
8081  character(len=500) :: msg
8082  type(hdr_type) :: Hdr_sigma
8083 !arrays
8084  integer :: units(2)
8085  real(dp),parameter ::  k0(3) = zero
8086  !real(dp) :: kgw(3) ! kk_ibz(3),
8087  real(dp),allocatable :: nat_occs(:,:), gw_rhor(:,:), gw_rhog(:,:), gw_vhartr(:)
8088  complex(dpc),allocatable :: xrdm_k_full(:,:,:), rdm_k(:,:), pot_k(:,:), nateigv(:,:,:,:), old_ks_purex(:,:), new_hartr(:,:)
8089  complex(dp) :: omega_i(gwr%ntau)
8090  complex(dpc),allocatable :: sigcme_k(:,:,:,:)
8091 
8092 ! *************************************************************************
8093 
8094  call gwr%run_g0w0(free_ugb=.False.)
8095  ! TODO: Might release some PBLAS memory for W at this point
8096 
8097  ! This section is copied from m_sigma_driver with small changes in order to intergrate it with the gwr% object.
8098  associate (dtset => gwr%dtset, qp_ebands => gwr%qp_ebands, ks_me => gwr%ks_me, psps => gwr%psps, &
8099             Wfd_nato_master => gwr%kcalc_wfd, dtfil => gwr%dtfil, cryst => gwr%cryst)
8100 
8101  units = [std_out, ab_out]
8102  nkibz = gwr%nkibz; nsppol = gwr%nsppol; b1gw = gwr%b1gw; b2gw = gwr%b2gw
8103  ! Don't take mband from ks_ebands but compute it from gwr%bstop_ks
8104  mband = maxval(gwr%bstop_ks) !; mband = gwr%ks_ebands%mband
8105 
8106  ! Note: all subroutines of 70_gw/m_gwrdm.F90 are implemented assuming nsppol == 1
8107  ABI_CHECK(dtset%nsppol == 1, "1-RDM GW correction only implemented for restricted closed-shell calculations!")
8108  ABI_CHECK(.not. gwr%sig_diago, "sig_diago should be false")
8109 
8110  ABI_CALLOC(nateigv, (mband, mband, nkibz, nsppol))
8111  ABI_CALLOC(nat_occs, (mband, nkibz))
8112  ABI_CALLOC(xrdm_k_full, (b1gw:b2gw, b1gw:b2gw, nkibz))
8113 
8114  write(msg,'(a34,2i9)')' Bands used for the GW 1RDM arrays',b1gw,b2gw
8115  call wrtout(units, msg)
8116 
8117  do ik_ibz=1,nkibz
8118    do ib=b1gw,b2gw
8119      xrdm_k_full(ib,ib,ik_ibz) = qp_ebands%occ(ib,ik_ibz,1)
8120    end do
8121    do ib=1,mband
8122      ! Copy initial occ numbers (in principle 2 or 0 from KS-DFT)
8123      nat_occs(ib,ik_ibz) = qp_ebands%occ(ib,ik_ibz,1)
8124      ! Set to identity matrix
8125      nateigv(ib,ib,ik_ibz,1) = cone
8126    end do
8127  end do
8128 
8129  omega_i = j_dpc * gwr%iw_mesh
8130 
8131  do spin=1,gwr%nsppol
8132  do ikcalc=1,gwr%nkcalc ! TODO: Should be spin dependent!
8133    ! Index of the irred k-point
8134    ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
8135    !kgw = gwr%kcalc(:, ikcalc)
8136    ! min and max band indices for GW corrections (for this k-point)
8137    ib1 = gwr%bstart_ks(ikcalc, spin); ib2 = gwr%bstop_ks(ikcalc, spin)
8138 
8139    ! Compute Sigma_x - Vxc or DELTA Sigma_x - Vxc
8140    ! where DELTA Sigma_x = Sigma_x - hyb_parameter Vx^exact for hyb Functionals.
8141    ! NB: Only restricted closed-shell calcs are implemented here
8142    ABI_CALLOC(pot_k, (ib1:ib2, ib1:ib2))
8143    ABI_CALLOC(rdm_k, (ib1:ib2, ib1:ib2))
8144    pot_k(ib1:ib2,ib1:ib2) = gwr%sigx_mat(ib1:ib2,ib1:ib2,ik_ibz,spin) - ks_me%vxcval(ib1:ib2,ib1:ib2,ik_ibz,spin)
8145    call calc_rdmx(ib1, ib2, ik_ibz, pot_k, rdm_k, qp_ebands)
8146 
8147    ! Update the full 1RDM with the exchange corrected one for this k-point
8148    xrdm_k_full(ib1:ib2,ib1:ib2,ik_ibz) = xrdm_k_full(ib1:ib2,ib1:ib2,ik_ibz) + rdm_k(ib1:ib2,ib1:ib2)
8149 
8150    ! Compute NAT ORBS for exchange corrected 1-RDM
8151    ! Only restricted closed-shell calcs
8152    do ib=ib1,ib2
8153      rdm_k(ib,ib) = rdm_k(ib,ib) + qp_ebands%occ(ib,ik_ibz,1)
8154    end do
8155    call natoccs(ib1, ib2, rdm_k, nateigv, nat_occs, qp_ebands, ik_ibz, iinfo=0)
8156 
8157    ! ================
8158    ! Correlation part
8159    ! ================
8160    ! TODO
8161    !ABI_CALLOC(sigcme_k, (gwr%ntau, ib2-ib1+1, ib2-ib1+1, nsppol*gwr%nsig_ab))
8162    !gwr%sigc_iw_mat((gwr%ntau, ib1:, ib1:, nsppol*gwr%nsig_ab))
8163    call calc_rdmc(ib1, ib2, ik_ibz, omega_i, gwr%iw_wgs, sigcme_k, qp_ebands, rdm_k)
8164    !ABI_FREE(sigcme_k)
8165 
8166    ! Update the full 1RDM with the GW corrected one for this k-point
8167    ! Only restricted closed-shell calcs
8168    rdm_k(ib1:ib2,ib1:ib2) = xrdm_k_full(ib1:ib2,ib1:ib2,ik_ibz) + rdm_k(ib1:ib2,ib1:ib2)
8169    ! Compute nat orbs and occ numbers at k-point ik_ibz
8170    call natoccs(ib1, ib2, rdm_k, nateigv, nat_occs, qp_ebands, ik_ibz, iinfo=1)
8171 
8172    ABI_FREE(pot_k)
8173    ABI_FREE(rdm_k)
8174  end do ! ikcalc
8175  end do ! spin
8176 
8177  ABI_CALLOC(gw_rhor, (nfftf, dtset%nspden))
8178  call hdr_copy(gwr%wfk_hdr, hdr_sigma)
8179 
8180  ! NRM WARNING: only the master has bands on Wfd_nato_master so it prints everything and computes gw_rhor
8181  !
8182  ! All procs. update the qp_ebands and the Hdr_sigma
8183  call update_hdr_bst(Wfd_nato_master, nat_occs, b1gw, b2gw, qp_ebands, Hdr_sigma, Dtset%ngfft(1:3))
8184 
8185  ! Compute unit cell (averaged) occ = \sum _k weight_k occ_k
8186  call print_tot_occ(qp_ebands)
8187 
8188  if (gwr%comm%me == master) then
8189    call Wfd_nato_master%rotate(cryst, nateigv) !, bmask=bdm_mask)               ! Let it use bdm_mask and build NOs
8190    call Wfd_nato_master%mkrho(cryst, psps, qp_ebands, ngfftf, nfftf, gw_rhor)   ! Construct the density
8191    if (dtset%prtwf == 1) then
8192      ! Print WFK file, here qp_ebands contains nat. orb. occs.
8193      call Wfd_nato_master%write_wfk(Hdr_sigma, qp_ebands, dtfil%fnameabo_wfk, wfknocheck=.True.)
8194    end if
8195    if (dtset%prtden == 1) then
8196       ! Print DEN file
8197      call fftdatar_write("density",dtfil%fnameabo_den,dtset%iomode,Hdr_sigma,&
8198                          Cryst,ngfftf,cplex1,nfftf,dtset%nspden,gw_rhor,gwr%mpi_enreg,ebands=qp_ebands)
8199    end if
8200  end if
8201  call xmpi_bcast(gw_rhor, master, gwr%comm%value, ierr)
8202  call hdr_sigma%free()
8203 
8204  ! Compute energies only if all k-points are available
8205  ! We need the hole 1-RDM to build Fock[GW.1RDM]!
8206  ABI_CALLOC(old_ks_purex, (b1gw:b2gw, gwr%nkcalc))
8207  ABI_CALLOC(new_hartr, (b1gw:b2gw, gwr%nkcalc))
8208  ABI_CALLOC(gw_rhog, (2, nfftf))
8209  ABI_CALLOC(gw_vhartr, (nfftf))
8210  !
8211  ! A) Compute Evext = int rho(r) vext(r) dr -> simply dot product on the FFT grid
8212  ! Only restricted closed-shell calcs
8213  !
8214  den_int = sum(gw_rhor(:,1)) * cryst%ucvol / nfftf
8215  evext_energy = sum(gw_rhor(:,1) * vpsp(:)) * cryst%ucvol / nfftf
8216  !
8217  ! B) Coulomb <KS_i|Vh[NO]|KS_j>
8218  !
8219  ! FFT to build gw_rhog
8220  call fourdp(1, gw_rhog, gw_rhor(:,1), -1, gwr%mpi_enreg, nfftf, ndat1, ngfftf, tim_fourdp5)
8221 
8222  ecutf = dtset%ecut
8223  if (psps%usepaw == 1) then
8224    ecutf = dtset%pawecutdg
8225    call wrtout(std_out, ch10//' FFT (fine) grid used in PAW GW update:')
8226  end if
8227 
8228  call getcut(boxcut, ecutf, cryst%gmet, gsqcut, dtset%iboxcut, std_out, k0, ngfftf)
8229  call hartre(1, gsqcut, dtset%icutcoul, psps%usepaw, gwr%mpi_enreg, nfftf, ngfftf, dtset%nkpt, dtset%rcut, &
8230              gw_rhog, cryst%rprimd, dtset%vcutgeo, gw_vhartr)
8231 
8232  ! TODO
8233 
8234  ABI_FREE(nateigv)
8235  ABI_FREE(nat_occs)
8236  ABI_FREE(xrdm_k_full)
8237  ABI_FREE(gw_rhor)
8238  ABI_FREE(old_ks_purex)
8239  ABI_FREE(new_hartr)
8240  ABI_FREE(gw_rhog)
8241  ABI_FREE(gw_vhartr)
8242  end associate
8243 
8244 end subroutine gwr_gamma_gw

m_gwr/gwr_get_gkbz_rpr_pm [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_get_gkbz_rpr_pm

FUNCTION

  Compute G_k(r',r) from G_k(g,g') for k in the BZ and given spin and tau.
  Note that output matrix `gk_rpr_pm` is transposed i.e. (r',r) instead of (r,r').

INPUTS

OUTPUT

SOURCE

2978 subroutine gwr_get_gkbz_rpr_pm(gwr, ik_bz, itau, spin, gk_rpr_pm, g0, ipm_list)
2979 
2980 !Arguments ------------------------------------
2981  class(gwr_t),intent(in) :: gwr
2982  integer,intent(in) :: ik_bz, itau, spin
2983  type(__slkmat_t),intent(inout) :: gk_rpr_pm(2)
2984  integer,optional,intent(in) :: g0(3), ipm_list(:)
2985 
2986 !Local variables-------------------------------
2987 !scalars
2988  integer :: ig2, ipm, npwsp, col_bsize, ir1, idat, ndat, ii, num_pm, ipm_list__(2)
2989  logical :: have_g0
2990  !real(dp) :: cpu, wall, gflops
2991  type(__slkmat_t) :: rgp, gt_pm(2), gpr
2992  type(desc_t) :: desc_kbz
2993  type(uplan_t) :: uplan_k
2994  complex(gwpc),allocatable :: ceig0r(:)
2995  character(len=500) :: msg
2996 
2997 ! *************************************************************************
2998 
2999  !call cwtime(cpu, wall, gflops, "start")
3000 
3001  num_pm = 2; ipm_list__ = [1, 2]
3002  if (present(ipm_list)) then
3003    num_pm = size(ipm_list)
3004    ABI_CHECK_IRANGE(num_pm, 1, 2, "num_pm not in [1, 2]")
3005    ipm_list__(1:num_pm) = ipm_list
3006  end if
3007 
3008  have_g0 = .False.
3009  if (present(g0)) then
3010    ! NB: Non-zero g0, requires the application of the phase.
3011    if (any(g0 /= 0)) then
3012      have_g0 = .True.
3013      ABI_MALLOC(ceig0r, (gwr%g_nfft * gwr%nspinor))
3014      call calc_ceigr(-g0, gwr%g_nfft, gwr%nspinor, gwr%g_ngfft, ceig0r)
3015    end if
3016  end if
3017 
3018  ! Get G_k(g,g', +/- itau) in the BZ.
3019  call gwr%rotate_gpm(ik_bz, itau, spin, desc_kbz, gt_pm, ipm_list=ipm_list__)
3020 
3021  call uplan_k%init(desc_kbz%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, desc_kbz%istwfk, &
3022                    desc_kbz%gvec, gwpc, gwr%dtset%gpu_option)
3023 
3024  ! For each tau in imp_list__
3025  do ii=1,num_pm
3026    ipm = ipm_list__(ii)
3027    ! Allocate temporary rgp PBLAS matrix to store G(r,g')
3028    npwsp = desc_kbz%npw * gwr%nspinor
3029    ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
3030    call rgp%init(gwr%g_nfft * gwr%nspinor, npwsp, gwr%g_slkproc, desc_kbz%istwfk, size_blocs=[-1, col_bsize])
3031 
3032    associate (g_gp => gt_pm(ipm))
3033    do ig2=1, g_gp%sizeb_local(2), gwr%uc_batch_size
3034      ! G_k(g,g') -> G_k(r,g') and store results in rgp.
3035      ndat = blocked_loop(ig2, g_gp%sizeb_local(2), gwr%uc_batch_size)
3036      call uplan_k%execute_gr(ndat, g_gp%buffer_cplx(:,ig2), rgp%buffer_cplx(:,ig2))
3037 
3038      ! Multiply by e^{ig0.r}
3039      if (have_g0) then
3040        do idat=0,ndat-1
3041          rgp%buffer_cplx(:,ig2+idat) = ceig0r(:) * rgp%buffer_cplx(:,ig2+idat)
3042        end do
3043      end if
3044    end do ! ig2
3045    end associate
3046 
3047    ! MPI transpose: G_k(r,g') -> G_k(g',r) and transform g' index.
3048    call rgp%ptrans("N", gpr, free=.True.)
3049 
3050    do ir1=1, gpr%sizeb_local(2), gwr%uc_batch_size
3051      ! G_k(g',r) -> G_k(r',r) and store results in rgp.
3052      ndat = blocked_loop(ir1, gpr%sizeb_local(2), gwr%uc_batch_size)
3053      call uplan_k%execute_gr(ndat, gpr%buffer_cplx(:,ir1), gk_rpr_pm(ipm)%buffer_cplx(:,ir1), isign=-1, iscale=0)
3054 
3055      ! Multiply by e^{ig0.r}
3056      if (have_g0) then
3057        do idat=0,ndat-1
3058          gk_rpr_pm(ipm)%buffer_cplx(:, ir1+idat) = conjg(ceig0r) * gk_rpr_pm(ipm)%buffer_cplx(:, ir1+idat)
3059        end do
3060      end if
3061    end do ! ir1
3062    call gpr%free()
3063 
3064    ! Rescale?
3065    !gk_rpr_pm(ipm)%buffer_cplx = gk_rpr_pm(ipm)%buffer_cplx * gwr%g_nfft
3066  end do ! ipm
3067 
3068  call slk_array_free(gt_pm); call desc_kbz%free(); call uplan_k%free()
3069 
3070  ABI_SFREE(ceig0r)
3071  !call cwtime_report(" gwr_get_gkbz_rpr_pm:", cpu, wall, gflops)
3072 
3073 end subroutine gwr_get_gkbz_rpr_pm

m_gwr/gwr_get_myk_green_gpr [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_get_myk_green_gpr

FUNCTION

  Use FFTs to compute G_k(g,g') --> G_k(g',r) for each k in the BZ treated by this MPI proc for given spin and tau.

  1) FFT Transform the first index and multiply by e^{ik.r}:

          G_k(g,g') --> G_k(r,g') = e^{ik.r} \sum_g e^{ig.r} G_k(g,g')

     NB: This is a local operation.

  2) MPI transpose the matrix to go from (r,g') to (g',r) distribution.

INPUTS

OUTPUT

SOURCE

2882 subroutine gwr_get_myk_green_gpr(gwr, itau, spin, desc_mykbz, gt_gpr)
2883 
2884 !Arguments ------------------------------------
2885  class(gwr_t),intent(in) :: gwr
2886  integer,intent(in) :: itau, spin
2887  type(desc_t),intent(out) :: desc_mykbz(gwr%my_nkbz)
2888  type(__slkmat_t),intent(inout) :: gt_gpr(2, gwr%my_nkbz)
2889 
2890 !Local variables-------------------------------
2891 !scalars
2892  integer :: my_ikf, ik_bz, ig2, ipm, npwsp, col_bsize, idat, ndat
2893  logical :: k_is_gamma
2894  real(dp) :: kk_bz(3), cpu, wall, gflops, mem_mb
2895  complex(gwpc),allocatable :: ceikr(:)
2896  character(len=500) :: msg
2897  type(__slkmat_t) :: rgp, gt_pm(2)
2898  type(uplan_t) :: uplan_k
2899 
2900 ! *************************************************************************
2901 
2902  call cwtime(cpu, wall, gflops, "start")
2903 
2904  !mem_mb = two * gwr%my_nkbz * two * gwpc * gwr%g_nfft * gwr%gree_mpw * b2Mb /  gwr%g_slkproc%nbprocs
2905  !call wrtout(std_out, sjoin("Local memory for Green's functions: ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
2906 
2907  ABI_MALLOC(ceikr, (gwr%g_nfft * gwr%nspinor))
2908 
2909  do my_ikf=1,gwr%my_nkbz
2910    ik_bz = gwr%my_kbz_inds(my_ikf)
2911    kk_bz = gwr%kbz(:, ik_bz)
2912    k_is_gamma = normv(kk_bz, gwr%cryst%gmet, "G") < GW_TOLQ0
2913    if (.not. k_is_gamma) call calc_ceikr(kk_bz, gwr%g_ngfft, gwr%g_nfft, gwr%nspinor, ceikr)
2914 
2915    ! Get G_kbz(+/- itau) in the BZ.
2916    call gwr%rotate_gpm(ik_bz, itau, spin, desc_mykbz(my_ikf), gt_pm)
2917 
2918    associate (desc_k => desc_mykbz(my_ikf))
2919    call uplan_k%init(desc_k%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, desc_k%istwfk, &
2920                      desc_k%gvec, gwpc, gwr%dtset%gpu_option)
2921 
2922    do ipm=1,2
2923      ! Allocate rgp PBLAS matrix to store G_kbz(r,g')
2924      associate (g_gp => gt_pm(ipm))
2925      npwsp = desc_k%npw * gwr%nspinor
2926      ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
2927      call rgp%init(gwr%g_nfft * gwr%nspinor, npwsp, gwr%g_slkproc, desc_k%istwfk, size_blocs=[-1, col_bsize])
2928      !ABI_CHECK_IEQ(size(g_gp%buffer_cplx, dim=2), size(rgp%buffer_cplx, dim=2), "len2")
2929 
2930      ! Perform FFT G_k(g,g') -> G_k(r,g') and store results in rgp.
2931      do ig2=1, g_gp%sizeb_local(2), gwr%uc_batch_size
2932        ndat = blocked_loop(ig2, g_gp%sizeb_local(2), gwr%uc_batch_size)
2933        call uplan_k%execute_gr(ndat, g_gp%buffer_cplx(:, ig2), rgp%buffer_cplx(:, ig2))
2934 
2935        if (.not. k_is_gamma) then
2936          ! Multiply by e^{ik.r}
2937          !$OMP PARALLEL DO
2938          do idat=0,ndat-1
2939            rgp%buffer_cplx(:, ig2 + idat) = ceikr(:) * rgp%buffer_cplx(:, ig2 + idat)
2940          end do
2941        end if
2942      end do ! ig2
2943 
2944      ! MPI transpose: G_k(r,g') -> G_k(g',r)
2945      call rgp%ptrans("N", gt_gpr(ipm, my_ikf), free=.True.)
2946      end associate
2947    end do ! ipm
2948 
2949    call uplan_k%free(); call slk_array_free(gt_pm)
2950    end associate
2951  end do ! my_ikf
2952 
2953  mem_mb = sum(slk_array_locmem_mb(gt_gpr))
2954  call wrtout(std_out, sjoin(" Local memory for G_kbz(g',r,itau): ", ftoa(mem_mb, fmt="f8.1"), "[Mb] <<< MEM"))
2955 
2956  ABI_FREE(ceikr)
2957  call cwtime_report(" gwr_get_myk_green_gpr:", cpu, wall, gflops)
2958 
2959 end subroutine gwr_get_myk_green_gpr

m_gwr/gwr_get_myq_wc_gpr [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_get_myq_wc_gpr

FUNCTION

  Use FFTs to compute: Wc_q(g,g') --> Wc_q(g',r)
  for each q in the BZ treated by this MPI proc for given `spin` and `itau` index:

      1) FFT Transform the first index: Wc(g,g',it) --> Wc(r,g',it)  (local operation)
      2) MPI transposition: Wc(r,g',it) --> Wc(g',r,it)

INPUTS

OUTPUT

SOURCE

3326 subroutine gwr_get_myq_wc_gpr(gwr, itau, spin, desc_myqbz, wc_gpr)
3327 
3328 !Arguments ------------------------------------
3329  class(gwr_t),intent(inout) :: gwr
3330  integer,intent(in) :: itau, spin
3331  type(desc_t),target,intent(out) :: desc_myqbz(gwr%my_nqbz)
3332  type(__slkmat_t),intent(inout) :: wc_gpr(gwr%my_nqbz)
3333 
3334 !Local variables-------------------------------
3335 !scalars
3336  integer :: my_iqf, iq_bz, ig2, npwsp, col_bsize, idat, ndat
3337  real(dp) :: cpu, wall, gflops, mem_mb, qq_bz(3)
3338  logical :: q_is_gamma
3339  character(len=500) :: msg
3340  type(__slkmat_t) :: rgp, wc_qbz
3341  type(uplan_t) :: uplan_q
3342  complex(gwpc),allocatable :: ceiqr(:)
3343 
3344 ! *************************************************************************
3345 
3346  call cwtime(cpu, wall, gflops, "start")
3347  ABI_MALLOC(ceiqr, (gwr%g_nfft * gwr%nspinor))
3348 
3349  do my_iqf=1,gwr%my_nqbz
3350    iq_bz = gwr%my_qbz_inds(my_iqf); qq_bz = gwr%qbz(:, iq_bz)
3351    q_is_gamma = normv(qq_bz, gwr%cryst%gmet, "G") < GW_TOLQ0
3352    if (.not. q_is_gamma) call calc_ceikr(qq_bz, gwr%g_ngfft, gwr%g_nfft, gwr%nspinor, ceiqr)
3353 
3354    ! Get Wc_q in the BZ.
3355    call gwr%rotate_wc(iq_bz, itau, spin, desc_myqbz(my_iqf), wc_qbz)
3356    associate (desc_q => desc_myqbz(my_iqf))
3357 
3358    ! Allocate rgp PBLAS matrix to store Wc_q(r, g')
3359    npwsp = desc_q%npw * gwr%nspinor
3360    ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
3361    call rgp%init(gwr%g_nfft * gwr%nspinor, npwsp, gwr%g_slkproc, desc_q%istwfk, size_blocs=[-1, col_bsize])
3362 
3363    call uplan_q%init(desc_q%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, desc_q%istwfk, &
3364                      desc_q%gvec, gwpc, gwr%dtset%gpu_option)
3365 
3366    ! FFT and store results in rgp
3367    do ig2=1,wc_qbz%sizeb_local(2), gwr%uc_batch_size
3368      ndat = blocked_loop(ig2, wc_qbz%sizeb_local(2), gwr%uc_batch_size)
3369      call uplan_q%execute_gr(ndat, wc_qbz%buffer_cplx(:, ig2), rgp%buffer_cplx(:, ig2))
3370 
3371      ! Multiply by e^{iq.r}
3372      if (.not. q_is_gamma) then
3373        !$OMP PARALLEL DO
3374        do idat=0,ndat-1
3375          rgp%buffer_cplx(:, ig2+idat) = ceiqr(:) * rgp%buffer_cplx(:, ig2+idat)
3376        end do
3377      end if
3378    end do ! ig2
3379 
3380    call uplan_q%free()
3381 
3382    ! MPI transposition: Wc(r,g') -> Wc(g',r)
3383    call rgp%ptrans("N", wc_gpr(my_iqf), free=.True.)
3384    end associate
3385    call wc_qbz%free()
3386  end do ! my_iqf
3387  ABI_FREE(ceiqr)
3388 
3389  mem_mb = sum(slk_array_locmem_mb(wc_gpr))
3390  call wrtout(std_out, sjoin(" Local memory for Wc(g',r):", ftoa(mem_mb, fmt="f8.1"), "[Mb] <<< MEM"))
3391  call cwtime_report(" gwr_get_myq_wc_gpr:", cpu, wall, gflops)
3392 
3393 end subroutine gwr_get_myq_wc_gpr

m_gwr/gwr_get_u_ngfft [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_get_u_ngfft

FUNCTION

  Compute FFT mesh from boxcutmin.

INPUTS

OUTPUT

SOURCE

7857 subroutine gwr_get_u_ngfft(gwr, boxcutmin, u_ngfft, u_nfft, u_mgfft, u_mpw, gmax)
7858 
7859 !Arguments ------------------------------------
7860  class(gwr_t),intent(in) :: gwr
7861  real(dp),intent(in) :: boxcutmin
7862  integer,intent(out) :: u_ngfft(18), u_nfft, u_mgfft, u_mpw, gmax(3)
7863 
7864 !Local variables-------------------------------
7865  integer :: ik_bz, npw_, ig, ii
7866  real(dp) :: kk_bz(3)
7867  integer,allocatable :: gvec_(:,:)
7868 
7869 ! *************************************************************************
7870 
7871  ! All MPI procs in gwr%comm execute this part.
7872  ! Note the loops over the full BZ to compute u_mpw
7873  ! FIXME: umklapp, ecutsigx and q-centered G-sphere
7874  ! TODO: Write new routine to compute best FFT mesh for ecut1 + ecut1. See set_mesh from GW code.
7875 
7876  u_ngfft = gwr%dtset%ngfft ! This to allow users to specify fftalg
7877 
7878  u_mpw = -1; gmax = 0
7879  do ik_bz=1,gwr%nkbz
7880    kk_bz = gwr%kbz(:, ik_bz)
7881    call get_kg(kk_bz, istwfk1, gwr%dtset%ecut, gwr%cryst%gmet, npw_, gvec_)
7882    u_mpw = max(u_mpw, npw_)
7883    ! TODO: g0 umklapp here can enter into play gmax may not be large enough!
7884    do ig=1,npw_
7885      do ii=1,3
7886        gmax(ii) = max(gmax(ii), abs(gvec_(ii, ig)))
7887      end do
7888    end do
7889    ABI_FREE(gvec_)
7890    call getng(boxcutmin, gwr%dtset%chksymtnons, gwr%dtset%ecut, gwr%cryst%gmet, &
7891               kk_bz, me_fft0, u_mgfft, u_nfft, u_ngfft, nproc_fft1, gwr%cryst%nsym, paral_fft0, &
7892               gwr%cryst%symrel, gwr%cryst%tnons, gpu_option=gwr%dtset%gpu_option, unit=dev_null)
7893  end do
7894 
7895 end subroutine gwr_get_u_ngfft

m_gwr/gwr_get_wc_rpr_qbz [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_get_wc_rpr_qbz

FUNCTION

  Compute Wc_q(r',r') for q in the BZ

INPUTS

OUTPUT

SOURCE

3411 subroutine gwr_get_wc_rpr_qbz(gwr, g0_q, iq_bz, itau, spin, wc_rpr)
3412 
3413 !Arguments ------------------------------------
3414  class(gwr_t),intent(inout) :: gwr
3415  integer,intent(in) :: g0_q(3)
3416  integer,intent(in) :: iq_bz, itau, spin
3417  type(__slkmat_t),intent(inout) :: wc_rpr
3418 
3419 !Local variables-------------------------------
3420 !scalars
3421  integer :: ig2, npwsp, nrsp, col_bsize, ir1, ndat, idat
3422  character(len=500) :: msg
3423  type(desc_t) :: desc_qbz
3424  type(__slkmat_t) :: wc_ggp, rgp, gpr
3425  type(uplan_t) :: uplan_k
3426  complex(gwpc),allocatable :: ceig0r(:)
3427 ! *************************************************************************
3428 
3429  ! NB: Non-zero g0, requires the application of the phase.
3430  if (any(g0_q /= 0)) then
3431    ABI_MALLOC(ceig0r, (gwr%g_nfft * gwr%nspinor))
3432    call calc_ceigr(-g0_q, gwr%g_nfft, gwr%nspinor, gwr%g_ngfft, ceig0r)
3433  end if
3434 
3435  ! Get W_q(g,g') in the BZ.
3436  call gwr%rotate_wc(iq_bz, itau, spin, desc_qbz, wc_ggp)
3437 
3438  ! Allocate rgp PBLAS matrix to store Wc(r,g')
3439  nrsp = gwr%g_nfft * gwr%nspinor
3440  npwsp = desc_qbz%npw * gwr%nspinor
3441  ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
3442  call rgp%init(nrsp, npwsp, gwr%g_slkproc, desc_qbz%istwfk, size_blocs=[-1, col_bsize])
3443 
3444  call uplan_k%init(desc_qbz%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, desc_qbz%istwfk, &
3445                    desc_qbz%gvec, gwpc, gwr%dtset%gpu_option)
3446 
3447  ! FFT Wc(g,g') -> Wc(r,g') and store results in rgp
3448  do ig2=1,wc_ggp%sizeb_local(2), gwr%uc_batch_size
3449    ndat = blocked_loop(ig2, wc_ggp%sizeb_local(2), gwr%uc_batch_size)
3450    call uplan_k%execute_gr(ndat, wc_ggp%buffer_cplx(:,ig2), rgp%buffer_cplx(:,ig2))
3451 
3452    ! Multiply by e^{ig0.r}
3453    if (any(g0_q /= 0)) then
3454      do idat=0,ndat-1
3455        rgp%buffer_cplx(:, ig2+idat) = ceig0r(:) * rgp%buffer_cplx(:, ig2+idat)
3456      end do
3457    end if
3458  end do ! ig2
3459 
3460  ! MPI transpose: Wc(r,g') -> Wc(g',r)
3461  call rgp%ptrans("N", gpr, free=.True.)
3462 
3463  ! Wc_q(g',r) -> Wc_q(r',r) and store results in wc_rgp.
3464  do ir1=1,gpr%sizeb_local(2), gwr%uc_batch_size
3465    ndat = blocked_loop(ir1, gpr%sizeb_local(2), gwr%uc_batch_size)
3466    call uplan_k%execute_gr(ndat, gpr%buffer_cplx(:, ir1), wc_rpr%buffer_cplx(:, ir1), isign=-1, iscale=0)
3467 
3468    ! Multiply by e^{ig0.r}
3469    if (any(g0_q /= 0)) then
3470      do idat=0,ndat-1
3471        wc_rpr%buffer_cplx(:, ir1+idat) = conjg(ceig0r) * wc_rpr%buffer_cplx(:, ir1+idat)
3472      end do
3473    end if
3474  end do ! ir1
3475 
3476  call uplan_k%free(); call gpr%free(); call desc_qbz%free(); call wc_ggp%free()
3477 
3478  ABI_SFREE(ceig0r)
3479 
3480 end subroutine gwr_get_wc_rpr_qbz

m_gwr/gwr_gk_to_scbox [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_gk_to_scbox

FUNCTION

INPUTS

OUTPUT

SOURCE

2522 subroutine gwr_gk_to_scbox(gwr, sc_ngfft, desc_mykbz, green_scgvec, my_ir, ndat, gt_gpr, gt_scbox, gt_scbox_win)
2523 
2524 !Arguments ------------------------------------
2525  class(gwr_t),target,intent(in) :: gwr
2526  integer,intent(in) :: sc_ngfft(18)
2527  integer,intent(out) :: green_scgvec(3, gwr%green_mpw)
2528  type(desc_t),intent(inout) :: desc_mykbz(gwr%my_nkbz)
2529  type(__slkmat_t),intent(in) :: gt_gpr(2, gwr%my_nkbz)
2530  integer,intent(in) :: my_ir, ndat
2531  complex(gwpc),intent(out) :: gt_scbox(product(sc_ngfft(4:6))*gwr%nspinor, gwr%sc_batch_size, 2)
2532  !complex(gwpc),intent(out) :: gt_scbox(:,:,:)
2533  integer,optional,intent(inout) :: gt_scbox_win
2534 
2535 !Local variables-------------------------------
2536  integer :: my_ikf, ik_bz, ipm, gg(3), idat, iepoch, ii, idat_list(gwr%kpt_comm%nproc) ! ig,
2537  !real(dp) :: tsec(2) !, cpu, wall, gflops
2538 
2539 ! *************************************************************************
2540 
2541  !call cwtime(cpu, wall, gflops, "start")
2542  !call timab(1929, 1, tsec)
2543 
2544  ! Take the union of (k,g') for k in the BZ.
2545  ! Note gwr%ngkpt instead of gwr%ngqpt.
2546  if (.not. present(gt_scbox_win)) then
2547 
2548    gt_scbox = czero_gw
2549    do my_ikf=1,gwr%my_nkbz
2550      ik_bz = gwr%my_kbz_inds(my_ikf); gg = nint(gwr%kbz(:,ik_bz) * gwr%ngkpt)
2551 #if 1
2552      do ipm=1,2
2553        call desc_mykbz(my_ikf)%to_scbox(gwr%kbz(:,ik_bz), gwr%ngkpt, sc_ngfft, gwr%nspinor*ndat, &
2554                                         gt_gpr(ipm, my_ikf)%buffer_cplx(1,my_ir), gt_scbox(:,:,ipm))
2555      end do
2556 #else
2557      associate (desc_k => desc_mykbz(my_ikf))
2558      do ig=1,desc_k%npw
2559        green_scgvec(:,ig) = gg + gwr%ngkpt * desc_k%gvec(:,ig)  ! k+g
2560      end do
2561      do ipm=1,2
2562        call gsph2box(sc_ngfft, desc_k%npw, gwr%nspinor*ndat, green_scgvec, &
2563                      gt_gpr(ipm, my_ikf)%buffer_cplx(1,my_ir), gt_scbox(:,:,ipm))
2564      end do
2565      end associate
2566 #endif
2567    end do ! my_ikf
2568 
2569  else
2570    ! Each MPI proc operates on a different idat vector at each epoch
2571    idat_list = cshift([(ii, ii=1,gwr%kpt_comm%nproc)], shift=-gwr%kpt_comm%me)
2572 
2573    do iepoch=1,gwr%kpt_comm%nproc
2574      call xmpi_win_fence(gt_scbox_win)
2575      idat = idat_list(iepoch)
2576      if (idat > ndat) goto 10
2577      if (iepoch == 1) then
2578        do ipm=1,2
2579          gt_scbox(:,idat,ipm) = czero_gw
2580        end do
2581      end if
2582 
2583      do my_ikf=1,gwr%my_nkbz
2584        ik_bz = gwr%my_kbz_inds(my_ikf); gg = nint(gwr%kbz(:, ik_bz) * gwr%ngkpt)
2585 #if 1
2586        do ipm=1,2
2587          call desc_mykbz(my_ikf)%to_scbox(gwr%kbz(:,ik_bz), gwr%ngkpt, sc_ngfft, gwr%nspinor * ndat1, &
2588                                           gt_gpr(ipm, my_ikf)%buffer_cplx(1,my_ir+idat-1), gt_scbox(:,idat,ipm))
2589        end do
2590 #else
2591        associate (desc_k => desc_mykbz(my_ikf))
2592        do ig=1,desc_k%npw
2593          green_scgvec(:,ig) = gg + gwr%ngkpt * desc_k%gvec(:,ig) ! k+g
2594        end do
2595        do ipm=1,2
2596          call gsph2box(sc_ngfft, desc_k%npw, gwr%nspinor * ndat1, green_scgvec, &
2597                        gt_gpr(ipm, my_ikf)%buffer_cplx(1,my_ir+idat-1), gt_scbox(:,idat,ipm))
2598        end do
2599        end associate
2600 #endif
2601      end do ! my_ikf
2602      10 continue
2603      !call xmpi_barrier(gwr%kpt_comm%value)
2604      !IF (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) CALL MPI_F_SYNC_REG(gt_scbox)
2605      call xmpi_win_fence(gt_scbox_win)
2606    end do ! iepoch
2607  end if
2608 
2609  !call cwtime_report(" gwr_gk_to_scbox:", cpu, wall, gflops)
2610  !call timab(1929, 2, tsec)
2611 
2612 end subroutine gwr_gk_to_scbox

m_gwr/gwr_init [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 gwr_init

FUNCTION

  Initialize the gwr object.

INPUTS

OUTPUT

SOURCE

 835 subroutine gwr_init(gwr, dtset, dtfil, cryst, psps, pawtab, ks_ebands, mpi_enreg, input_comm)
 836 
 837 !Arguments ------------------------------------
 838 !scalars
 839  class(gwr_t),target,intent(out) :: gwr
 840  type(dataset_type),target,intent(in) :: dtset
 841  type(datafiles_type),target,intent(in) :: dtfil
 842  type(crystal_t),target,intent(in) :: cryst
 843  type(pseudopotential_type),target,intent(in) :: psps
 844  type(pawtab_type),target,intent(in) :: pawtab(psps%ntypat*psps%usepaw)
 845  type(ebands_t),target,intent(in) :: ks_ebands
 846  type(mpi_type),target,intent(in) :: mpi_enreg
 847  integer,intent(in) :: input_comm
 848 
 849 !Local variables-------------------------------
 850 !scalars
 851  integer,parameter :: qptopt1 = 1, qtimrev1 = 1, master = 0, ndims = 4
 852  integer :: my_it, my_ikf, ii, ebands_timrev, my_iki, my_iqi, itau, spin, my_iqf
 853  integer :: my_nshiftq, iq_bz, iq_ibz, npw_, ncid, smat_bsize1, smat_bsize2
 854  integer :: comm_cart, me_cart, ierr, all_nproc, my_rank, qprange_, gap_err, ncerr, omp_nt
 855  integer :: cnt, ikcalc, ndeg, mband, bstop, nbsum, jj
 856  integer :: ik_ibz, ik_bz, isym_k, trev_k, g0_k(3)
 857  integer :: ip_g, ip_k, ip_t, ip_s, np_g, np_k, np_t, np_s
 858  real(dp) :: cpu, wall, gflops, wmax, vc_ecut, delta, abs_rerr, exact_int, eval_int
 859  real(dp) :: prev_efficiency, prev_speedup, regterm, prev_dual_error
 860  logical :: isirr_k, changed, q_is_gamma, reorder
 861  character(len=5000) :: msg
 862  type(krank_t) :: qrank, krank_ibz
 863  type(est_t) :: est
 864 !arrays
 865  integer :: qptrlatt(3,3), dims_kgts(ndims), try_dims_kgts(ndims), indkk_k(6,1), units(2)
 866  integer,allocatable :: gvec_(:,:),degblock(:,:), degblock_all(:,:,:,:), ndeg_all(:,:), iwork(:,:), got(:)
 867  real(dp) :: my_shiftq(3,1), kk_ibz(3), kk_bz(3), qq_bz(3), qq_ibz(3), kk(3), tsec(2)
 868  real(dp),allocatable :: wtk(:), kibz(:,:)
 869  logical :: periods(ndims), keepdim(ndims)
 870 
 871 ! *************************************************************************
 872 
 873  call cwtime(cpu, wall, gflops, "start")
 874  call timab(1920, 1, tsec)
 875 
 876  all_nproc = xmpi_comm_size(input_comm); my_rank = xmpi_comm_rank(input_comm)
 877  units = [std_out, ab_out]
 878 
 879  ! Keep a reference to other objects to simplify the internal API.
 880  gwr%dtset => dtset; gwr%dtfil => dtfil; gwr%cryst => cryst; gwr%psps => psps; gwr%pawtab => pawtab
 881  gwr%ks_ebands => ks_ebands; gwr%kibz => ks_ebands%kptns; gwr%wtk => ks_ebands%wtk
 882  gwr%mpi_enreg => mpi_enreg
 883 
 884  ! Initialize qp_ebands with KS values.
 885  call ebands_copy(ks_ebands, gwr%qp_ebands)
 886  call ebands_copy(ks_ebands, gwr%qp_ebands_prev)
 887 
 888  ABI_MALLOC(gwr%ks_vbik, (gwr%ks_ebands%nkpt, gwr%ks_ebands%nsppol))
 889  gwr%ks_vbik(:,:) = ebands_get_valence_idx(gwr%ks_ebands)
 890 
 891  gwr%nspinor = dtset%nspinor; gwr%nsppol = dtset%nsppol; gwr%nspden = dtset%nspden; gwr%nsig_ab = gwr%nspinor ** 2
 892  gwr%natom = dtset%natom; gwr%usepaw = dtset%usepaw
 893 
 894  gwr%sig_diago = .True.
 895  if (string_in(gwr%dtset%gwr_task, "GAMMA_GW")) gwr%sig_diago = .False.
 896 
 897  ! Decide whether one should use supercells or convolutions in the BZ.
 898  gwr%use_supercell_for_tchi = .True.
 899  if (gwr%dtset%gwr_chi_algo == 0) then
 900    ! Automatic selection
 901    ABI_ERROR("Not implemented Error")
 902  else
 903    gwr%use_supercell_for_tchi = gwr%dtset%gwr_chi_algo == 1
 904  end if
 905 
 906  if (gwr%dtset%gwr_sigma_algo == 0) then
 907    ! Automatic selection
 908    ABI_ERROR("Not implemented Error")
 909  else
 910    gwr%use_supercell_for_sigma = gwr%dtset%gwr_sigma_algo == 1
 911  end if
 912 
 913  ! Set q0
 914  if (dtset%gw_nqlwl /= 0) gwr%q0 = dtset%gw_qlwl(:, 1)
 915 
 916  mband = ks_ebands%mband; nbsum = dtset%nband(1)
 917  ABI_CHECK_IRANGE(nbsum, 1, mband, "Invalid nbsum")
 918 
 919  !call gwr%pstat%from_pid()
 920 
 921  ! Define frequency mesh for sigma(w_real) and spectral functions.
 922  ! Note that in GWR computing quantities on the real-axis is really cheap
 923  ! so we can use very dense meshes without affecting performance.
 924  ! The default for nfresp and freqspmax is zero.
 925  ! TODO: Perhaps we can make it optional as in legacy-GW.
 926  wmax = dtset%freqspmax; if (abs(wmax) < tol6) wmax = 100 * eV_Ha
 927  gwr%nwr = dtset%nfreqsp
 928  if (gwr%nwr ==  0) gwr%nwr = nint(wmax / (0.05 * eV_Ha))
 929  if (mod(gwr%nwr, 2) == 0) gwr%nwr = gwr%nwr + 1
 930  gwr%wr_step = wmax / (gwr%nwr - 1)
 931 
 932  ! =======================
 933  ! Setup k-mesh and q-mesh
 934  ! =======================
 935 
 936  ! Get full kBZ associated to ks_ebands
 937  call kpts_ibz_from_kptrlatt(cryst, ks_ebands%kptrlatt, ks_ebands%kptopt, ks_ebands%nshiftk, ks_ebands%shiftk, &
 938                              gwr%nkibz, kibz, wtk, gwr%nkbz, gwr%kbz) !, bz2ibz=bz2ibz)
 939                              !new_kptrlatt=gwr%kptrlatt, new_shiftk=gwr%kshift,
 940                              !bz2ibz=new%ind_qbz2ibz)  # FIXME
 941  ABI_FREE(wtk)
 942 
 943  ! In principle kibz should be equal to ks_ebands%kptns.
 944  ABI_CHECK_IEQ(gwr%nkibz, ks_ebands%nkpt, "nkibz != ks_ebands%nkpt")
 945  ABI_CHECK(all(abs(ks_ebands%kptns - kibz) < tol12), "ks_ebands%kibz != kibz")
 946 
 947  if (.not. (isdiagmat(ks_ebands%kptrlatt) .and. ks_ebands%nshiftk == 1)) then
 948    ABI_ERROR("GWR code requires ngkpt with one shift!")
 949  end if
 950  gwr%ngkpt = get_diag(ks_ebands%kptrlatt)
 951 
 952  ! Note symrec convention.
 953  ebands_timrev = kpts_timrev_from_kptopt(ks_ebands%kptopt)
 954  krank_ibz = krank_from_kptrlatt(gwr%nkibz, kibz, ks_ebands%kptrlatt, compute_invrank=.False.)
 955 
 956  ABI_MALLOC(gwr%kbz2ibz, (6, gwr%nkbz))
 957  if (kpts_map("symrec", ebands_timrev, cryst, krank_ibz, gwr%nkbz, gwr%kbz, gwr%kbz2ibz) /= 0) then
 958    ABI_ERROR("Cannot map kBZ to IBZ!")
 959  end if
 960 
 961  ! Order kbz by stars and rearrange entries in kbz2ibz table.
 962  call kpts_pack_in_stars(gwr%nkbz, gwr%kbz, gwr%kbz2ibz)
 963 
 964  if (my_rank == master) then
 965    call kpts_map_print(units, " Mapping kBZ --> kIBZ", "symrec", gwr%kbz, kibz, gwr%kbz2ibz, gwr%dtset%prtvol)
 966  end if
 967 
 968  !call get_ibz2bz(gwr%nkibz, gwr%nkbz, gwr%kbz2ibz, kibz2bz, ierr)
 969  !ABI_CHECK(ierr == 0, "Something wrong in symmetry tables for k-points")
 970 
 971  ! Table with symrel conventions for the symmetrization of the wfs.
 972  ABI_MALLOC(gwr%kbz2ibz_symrel, (6, gwr%nkbz))
 973  if (kpts_map("symrel", ebands_timrev, cryst, krank_ibz, gwr%nkbz, gwr%kbz, gwr%kbz2ibz_symrel) /= 0) then
 974    ABI_ERROR("Cannot map kBZ to IBZ!")
 975  end if
 976 
 977  ! Setup qIBZ, weights and BZ.
 978  ! Always use q --> -q symmetry even in systems without inversion
 979  ! TODO: Might add input variable to rescale the q-mesh.
 980  my_nshiftq = 1; my_shiftq = zero; qptrlatt = ks_ebands%kptrlatt
 981  call kpts_ibz_from_kptrlatt(cryst, qptrlatt, qptopt1, my_nshiftq, my_shiftq, &
 982                              gwr%nqibz, gwr%qibz, gwr%wtq, gwr%nqbz, gwr%qbz)
 983                              !new_kptrlatt=gwr%qptrlatt, new_shiftk=gwr%qshift,
 984                              !bz2ibz=new%ind_qbz2ibz)  # FIXME
 985 
 986  ABI_CHECK(all(abs(gwr%qibz(:,1)) < tol16), "First qpoint in qibz should be Gamma!")
 987  gwr%ngqpt = get_diag(qptrlatt)
 988 
 989  ! HM: the bz2ibz produced above is incomplete, I do it here using listkk
 990  ABI_MALLOC(gwr%qbz2ibz, (6, gwr%nqbz))
 991 
 992  qrank = krank_from_kptrlatt(gwr%nqibz, gwr%qibz, qptrlatt, compute_invrank=.False.)
 993 
 994  if (kpts_map("symrec", qtimrev1, cryst, qrank, gwr%nqbz, gwr%qbz, gwr%qbz2ibz) /= 0) then
 995    ABI_ERROR("Cannot map qBZ to IBZ!")
 996  end if
 997  call qrank%free()
 998 
 999  ! Order qbz by stars and rearrange entries in qbz2ibz table.
1000  call kpts_pack_in_stars(gwr%nqbz, gwr%qbz, gwr%qbz2ibz)
1001  if (my_rank == master) then
1002    call kpts_map_print(units, " Mapping qBZ --> qIBZ", "symrec", gwr%qbz, gwr%qibz, gwr%qbz2ibz, gwr%dtset%prtvol)
1003  end if
1004 
1005  ! ==========================
1006  ! Setup k-points in Sigma_nk
1007  ! ==========================
1008  gwr%ks_gaps = ebands_get_gaps(ks_ebands, gap_err)
1009  if (my_rank == master) then
1010    !call ebands_print(ks_ebands, header="KS band structure", unit=std_out, prtvol=gwr%dtset%prtvol)
1011    !call ebands_print_gaps(ks_ebands, ab_out, header="KS gaps (Fermi energy set to zero)")
1012    msg = "Kohn-Sham gaps and band edges from IBZ mesh"
1013    call gwr%ks_gaps%print(unit=std_out, header=msg)
1014    call gwr%ks_gaps%print(unit=ab_out, header=msg)
1015  end if
1016 
1017  ! TODO: nkcalc should be spin dependent.
1018  ! This piece of code is taken from m_sigmaph.
1019  ! In principle one should use the same algorithm in setup_sigma (legacy GW code).
1020  if (dtset%nkptgw /= 0) then
1021    ! Treat the k-points and bands specified in the input file via kptgw and bdgw.
1022    call sigtk_kcalc_from_nkptgw(dtset, mband, gwr%nkcalc, gwr%kcalc, gwr%bstart_ks, gwr%nbcalc_ks)
1023 
1024  else
1025    if (any(abs(dtset%sigma_erange) > zero)) then
1026      ! Use sigma_erange and (optionally) sigma_ngkpt
1027      call sigtk_kcalc_from_erange(dtset, cryst, ks_ebands, gwr%ks_gaps, &
1028                                   gwr%nkcalc, gwr%kcalc, gwr%bstart_ks, gwr%nbcalc_ks, input_comm)
1029 
1030    else
1031      ! Use qp_range to select the interesting k-points and the corresponding bands.
1032      !
1033      !    0 --> Compute the QP corrections only for the fundamental and the direct gap.
1034      ! +num --> Compute the QP corrections for all the k-points in the irreducible zone and include `num`
1035      !          bands above and below the Fermi level.
1036      ! -num --> Compute the QP corrections for all the k-points in the irreducible zone.
1037      !          Include all occupied states and `num` empty states.
1038 
1039      qprange_ = dtset%gw_qprange
1040      if (gap_err /= 0 .and. qprange_ == 0) then
1041        ABI_WARNING("Cannot compute fundamental and direct gap (likely metal). Will replace qprange 0 with qprange 1")
1042        qprange_ = 1
1043      end if
1044 
1045      if (qprange_ /= 0) then
1046        call sigtk_kcalc_from_qprange(dtset, cryst, ks_ebands, qprange_, gwr%nkcalc, gwr%kcalc, gwr%bstart_ks, gwr%nbcalc_ks)
1047      else
1048        ! qprange is not specified in the input.
1049        ! Include direct and fundamental KS gap or include states depending on the position wrt band edges.
1050        call sigtk_kcalc_from_gaps(dtset, ks_ebands, gwr%ks_gaps, gwr%nkcalc, gwr%kcalc, gwr%bstart_ks, gwr%nbcalc_ks)
1051      end if
1052    end if
1053 
1054  end if ! nkptgw /= 0
1055 
1056  ! Include all degenerate states and map kcalc to the IBZ. NB: This part is copied from sigmaph.
1057 
1058  ! The k-point and the symmetries connecting the BZ k-point to the IBZ.
1059  ABI_MALLOC(gwr%kcalc2ibz, (gwr%nkcalc, 6))
1060 
1061  ! Workspace arrays used to compute degeneracy tables.
1062  ABI_ICALLOC(degblock_all, (2, mband, gwr%nkcalc, gwr%nsppol))
1063  ABI_ICALLOC(ndeg_all, (gwr%nkcalc, gwr%nsppol))
1064 
1065  ierr = 0
1066  do ikcalc=1,gwr%nkcalc
1067    ! Note symrel and use_symrel.
1068    ! These are the conventions for the symmetrization of the wavefunctions used in cgtk_rotate.
1069    kk = gwr%kcalc(:, ikcalc)
1070 
1071    if (kpts_map("symrel", ebands_timrev, cryst, krank_ibz, 1, kk, indkk_k) /= 0) then
1072       write(msg, '(5a)' ) &
1073        "The WFK file cannot be used to compute self-energy corrections at k-point: ",trim(ktoa(kk)),ch10,&
1074        "The k-point cannot be generated from a symmetrical one.", ch10
1075       ABI_ERROR(msg)
1076    end if
1077 
1078    ! TODO: Invert dims and update abipy
1079    gwr%kcalc2ibz(ikcalc, :) = indkk_k(:, 1)
1080 
1081    ik_ibz = indkk_k(1,1); isym_k = indkk_k(2,1)
1082    trev_k = indkk_k(6,1); g0_k = indkk_k(3:5,1)
1083    isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
1084    !kk_ibz = ks_ebands%kptns(:,ik_ibz)
1085    if (.not. isirr_k) then
1086      ABI_WARNING(sjoin("The k-point in Sigma_{nk} must be in the IBZ but got:", ktoa(kk)))
1087      ierr = ierr + 1
1088    end if
1089 
1090    ! We will have to average the QP corrections over degenerate states if symsigma=1 is used.
1091    ! Here we make sure that all the degenerate states are included.
1092    ! Store also band indices of the degenerate sets, used to average final results.
1093    if (abs(gwr%dtset%symsigma) == 1) then
1094      cnt = 0
1095      do spin=1,gwr%nsppol
1096        bstop = gwr%bstart_ks(ikcalc, spin) + gwr%nbcalc_ks(ikcalc, spin) - 1
1097        call ebands_enclose_degbands(ks_ebands, ik_ibz, spin, gwr%bstart_ks(ikcalc, spin), bstop, changed, TOL_EDIFF, &
1098                                     degblock=degblock)
1099        if (changed) then
1100          gwr%nbcalc_ks(ikcalc, spin) = bstop - gwr%bstart_ks(ikcalc, spin) + 1
1101          cnt = cnt + 1
1102          if (cnt < 5) then
1103            write(msg,'(2(a,i0),2a,2(1x,i0))') &
1104              "Not all the degenerate states for ikcalc: ",ikcalc,", spin: ",spin,ch10, &
1105              "were included in the bdgw set. bdgw has been automatically changed to: ",gwr%bstart_ks(ikcalc, spin), bstop
1106            ABI_COMMENT(msg)
1107          end if
1108          write(msg,'(2(a,i0),2a)') &
1109            "The number of included states: ", bstop, &
1110            " is larger than the number of bands in the input ",dtset%nband(ik_ibz + (spin-1)*ks_ebands%nkpt),ch10,&
1111            "Action: Increase nband."
1112          ABI_CHECK(bstop <= dtset%nband(ik_ibz + (spin-1)*ks_ebands%nkpt), msg)
1113        end if
1114 
1115        ! Store band indices used for averaging (shifted by bstart_ks)
1116        ndeg = size(degblock, dim=2)
1117        ndeg_all(ikcalc, spin) = ndeg
1118        degblock_all(:, 1:ndeg, ikcalc, spin) = degblock(:, 1:ndeg)
1119 
1120        ABI_FREE(degblock)
1121      end do
1122    end if ! symsigma
1123  end do ! ikcalc
1124 
1125  ABI_CHECK(ierr == 0, "kptgw wavevectors must be in the IBZ read from the WFK file.")
1126 
1127  ! Build degtab tables to average self-energy matrix element if symsigma /= 0
1128  if (abs(gwr%dtset%symsigma) == 1) then
1129    ABI_MALLOC(gwr%degtab, (gwr%nkcalc, gwr%nsppol))
1130    do ikcalc=1,gwr%nkcalc
1131      do spin=1,gwr%nsppol
1132        ndeg = ndeg_all(ikcalc, spin)
1133        ABI_MALLOC(gwr%degtab(ikcalc, spin)%bids, (ndeg))
1134        do ii=1,ndeg
1135          cnt = degblock_all(2, ii, ikcalc, spin) - degblock_all(1, ii, ikcalc, spin) + 1
1136          ABI_MALLOC(gwr%degtab(ikcalc, spin)%bids(ii)%vals, (cnt))
1137          ! Here we start to count bands from bstart_ks(ikcalc, spin)
1138          !gwr%degtab(ikcalc, spin)%bids(ii)%vals = [(jj, jj= &
1139          !  degblock_all(1, ii, ikcalc, spin) - gwr%bstart_ks(ikcalc, spin) + 1, &
1140          !  degblock_all(2, ii, ikcalc, spin) - gwr%bstart_ks(ikcalc, spin) + 1)]
1141          ! Note that we start to count bands from bstart_ks(ikcalc, spin)
1142          gwr%degtab(ikcalc, spin)%bids(ii)%vals = [(jj, jj= &
1143            degblock_all(1, ii, ikcalc, spin), degblock_all(2, ii, ikcalc, spin))]
1144        end do
1145      end do
1146    end do
1147  end if
1148 
1149  ABI_FREE(degblock_all)
1150  ABI_FREE(ndeg_all)
1151 
1152  ! Now we can finally compute max_nbcalc
1153  gwr%max_nbcalc = maxval(gwr%nbcalc_ks)
1154  ABI_MALLOC(gwr%bstop_ks, (gwr%nkcalc, gwr%nsppol))
1155  gwr%bstop_ks = gwr%bstart_ks + gwr%nbcalc_ks - 1
1156  gwr%b1gw = minval(gwr%bstart_ks); gwr%b2gw = maxval(gwr%bstop_ks)
1157 
1158  call krank_ibz%free()
1159  ABI_FREE(kibz) ! Deallocate kibz here because krank_ibz keeps a reference to this array.
1160 
1161  ! ================================
1162  ! Setup tau/omega mesh and weights
1163  ! ================================
1164  ! Compute min/max transition energy taking into account nsppol if any.
1165  gwr%te_min = minval(gwr%ks_gaps%cb_min - gwr%ks_gaps%vb_max)
1166  gwr%te_max = maxval(ks_ebands%eig(nbsum,:,:) - ks_ebands%eig(1,:,:))
1167  if (gwr%te_min <= tol6) then
1168    gwr%te_min = tol6
1169    ABI_ERROR("System is metallic or with a very small fundamental gap!")
1170  end if
1171  gwr%ntau = dtset%gwr_ntau
1172 
1173 !#ifdef __HAVE_GREENX
1174  regterm = dtset%gwr_regterm
1175  if (regterm > -tol16) then
1176      call wrtout(std_out, sjoin("Computing minimax grid with user-provided regterm:", ftoa(regterm)))
1177      call gx_minimax_grid(gwr%ntau, gwr%te_min, gwr%te_max, &  ! in
1178                           gwr%tau_mesh, gwr%tau_wgs, gwr%iw_mesh, gwr%iw_wgs, & ! out args allocated by the routine.
1179                           gwr%cosft_wt, gwr%cosft_tw, gwr%sinft_wt, &
1180                           gwr%ft_max_error, gwr%cosft_duality_error, ierr, regterm=regterm)
1181      ABI_CHECK(ierr == 0, "Error in gx_minimax_grid")
1182  else
1183      regterm = zero
1184      call wrtout(std_out, sjoin("Computing minimax grid with user-provided regterm:", ftoa(regterm)))
1185      call gx_minimax_grid(gwr%ntau, gwr%te_min, gwr%te_max, &  ! in
1186                           gwr%tau_mesh, gwr%tau_wgs, gwr%iw_mesh, gwr%iw_wgs, & ! out args allocated by the routine.
1187                           gwr%cosft_wt, gwr%cosft_tw, gwr%sinft_wt, &
1188                           gwr%ft_max_error, gwr%cosft_duality_error, ierr, regterm=regterm)
1189      ABI_CHECK(ierr == 0, "Error in gx_minimax_grid")
1190 
1191     ! If duality error is big, use regterm = 1e-6
1192     if (gwr%cosft_duality_error > half) then
1193        ABI_SFREE(gwr%tau_mesh)
1194        ABI_SFREE(gwr%tau_wgs)
1195        ABI_SFREE(gwr%iw_mesh)
1196        ABI_SFREE(gwr%iw_wgs)
1197        ABI_SFREE(gwr%cosft_wt)
1198        ABI_SFREE(gwr%cosft_tw)
1199        ABI_SFREE(gwr%sinft_wt)
1200        regterm = tol6
1201        call wrtout(std_out, sjoin("LARGE duality error -> recomputing minimax grid with regterm:", ftoa(regterm)))
1202        prev_dual_error = gwr%cosft_duality_error
1203        call gx_minimax_grid(gwr%ntau, gwr%te_min, gwr%te_max, &  ! in
1204                             gwr%tau_mesh, gwr%tau_wgs, gwr%iw_mesh, gwr%iw_wgs, & ! out args allocated by the routine.
1205                             gwr%cosft_wt, gwr%cosft_tw, gwr%sinft_wt, &
1206                             gwr%ft_max_error, gwr%cosft_duality_error, ierr, regterm=regterm)
1207        ABI_CHECK(ierr == 0, "Error in gx_minimax_grid")
1208 
1209        if (gwr%cosft_duality_error > prev_dual_error) then
1210          ABI_WARNING("Using regterm didn't decrease the duality error")
1211        end if
1212     end if
1213  end if
1214 
1215  ! FIXME: Here we need to rescale the weights because greenx convention is not what we expect!
1216  !gwr%iw_wgs(:) = gwr%iw_wgs(:) / four
1217 
1218  if (gwr%comm%me == 0) then
1219    write(std_out, "(3a)")ch10, " Computing F(delta) = \int_0^{\infty} dw / (w^2 + delta^2) = pi/2/delta ", ch10
1220    write(std_out, "(*(a12,2x))")"delta", "numeric", "exact", "abs_rerr (%)"
1221    do ii=1,10
1222      delta = (ii * gwr%te_min)
1223      eval_int = sum(gwr%iw_wgs(:) / (gwr%iw_mesh(:)**2 + delta**2))
1224      exact_int = pi / (two * delta)
1225      abs_rerr = 100 * abs(eval_int - exact_int) / exact_int
1226      write(std_out, "(*(es12.5,2x))") delta, eval_int, exact_int, abs_rerr
1227    end do
1228 
1229    write(std_out, "(3a)")ch10," Computing F(w) = \int_0^{\infty} e^{-w tau} dtau", ch10
1230    write(std_out, "(*(a12,2x))")"w", "numeric", "exact", "abs_rerr (%)"
1231    do itau=1,gwr%ntau
1232      eval_int = sum(gwr%tau_wgs(:) * exp(-gwr%tau_mesh(:) * gwr%iw_mesh(itau)))
1233      exact_int = one / gwr%iw_mesh(itau)
1234      abs_rerr = 100 * abs(eval_int - exact_int) / exact_int
1235      write(std_out, "(*(es12.5,2x))") gwr%iw_mesh(itau), eval_int, exact_int, abs_rerr
1236    end do
1237    write(std_out, "(a)")
1238  end if
1239 
1240  ! =========================================
1241  ! Find FFT mesh and max number of g-vectors
1242  ! =========================================
1243  ! Note the usage of gwr_boxcutmin and the loops over the full BZ. All the procs execute this part.
1244  gwr%g_ngfft = gwr%dtset%ngfft; gwr%g_ngfft(1:6) = 0  ! Allow user to specify fftalg
1245 
1246  gwr%green_mpw = -1
1247  do ik_bz=1,gwr%nkbz
1248    kk_bz = gwr%kbz(:, ik_bz)
1249    call get_kg(kk_bz, istwfk1, dtset%ecut, gwr%cryst%gmet, npw_, gvec_)
1250    ABI_FREE(gvec_)
1251    call getng(dtset%gwr_boxcutmin, dtset%chksymtnons, dtset%ecut, cryst%gmet, &
1252               kk_bz, me_fft0, gwr%g_mgfft, gwr%g_nfft, gwr%g_ngfft, nproc_fft1, cryst%nsym, paral_fft0, &
1253               cryst%symrel, cryst%tnons, gpu_option=gwr%dtset%gpu_option, unit=dev_null)
1254    gwr%green_mpw = max(gwr%green_mpw, npw_)
1255  end do
1256 
1257  gwr%tchi_mpw = -1
1258  do iq_bz=1,gwr%nqbz
1259    qq_bz = gwr%qbz(:, iq_bz)
1260    call get_kg(qq_bz, istwfk1, dtset%ecuteps, gwr%cryst%gmet, npw_, gvec_)
1261    ABI_FREE(gvec_)
1262    call getng(dtset%gwr_boxcutmin, dtset%chksymtnons, dtset%ecuteps, cryst%gmet, &
1263               qq_bz, me_fft0, gwr%g_mgfft, gwr%g_nfft, gwr%g_ngfft, nproc_fft1, cryst%nsym, &
1264               paral_fft0, cryst%symrel, cryst%tnons, gpu_option=gwr%dtset%gpu_option, unit=dev_null)
1265    gwr%tchi_mpw = max(gwr%tchi_mpw, npw_)
1266    if (iq_bz == 1) then
1267      ABI_CHECK(all(abs(qq_bz) < tol16), "First qpoint in the qbz should be Gamma!")
1268    end if
1269  end do
1270 
1271  ! For the time being no augmentation
1272  gwr%g_ngfft(4:6) = gwr%g_ngfft(1:3)
1273 
1274  ! ========================
1275  ! === MPI DISTRIBUTION ===
1276  ! ========================
1277  !
1278  ! Here we define the following quantities:
1279  !  - np_k, np_g, np_t, np_s
1280  !  - gwr%comm and gwr%idle_proc
1281  !
1282  ! NB: Do not use input_comm after this section as idle processors return immediately.
1283 
1284  if (any(dtset%gwr_np_kgts /= 0)) then
1285    ! Use grid from input file.
1286    np_k = dtset%gwr_np_kgts(1); np_g = dtset%gwr_np_kgts(2); np_t = dtset%gwr_np_kgts(3); np_s = dtset%gwr_np_kgts(4)
1287    !call xmpi_comm_multiple_of(product(dtset%gwr_np_kgts), input_comm, gwr%idle_proc, gwr%comm)
1288    !if (gwr%idle_proc) return
1289    gwr%comm = xcomm_from_mpi_int(input_comm)
1290    all_nproc = gwr%comm%nproc
1291 
1292  else
1293    ! Automatic grid generation.
1294    !
1295    !   Priorities        |  MPI Scalability                | Memory
1296    ! ==================================================================================================
1297    !   spin (if any)     |  excellent                      | scales
1298    !   g/r (PBLAS)       |  network-intensive              ! scales
1299    !   tau               |  excellent                      | scales
1300    !   kbz               |  newtwork-intensive             | scales (depends on the BZ -> IBZ mapping)
1301 
1302    gwr%comm = xcomm_from_mpi_int(input_comm)
1303    all_nproc = gwr%comm%nproc
1304    !call xmpi_comm_multiple_of(gwr%ntau * gwr%dtset%nsppol, input_comm, gwr%idle_proc, gwr%comm)
1305    !if (gwr%idle_proc) return
1306    !all_nproc = xmpi_comm_size(gwr%comm)
1307 
1308    ! Start from a configuration that minimizes memory i.e use all procs for g-parallelism,
1309    ! then check whether it's possible to move some procs to the other levels
1310    ! without spoiling parallel efficiency and/or increasing memory per MPI proc.
1311    ! Only master rank works here for consistency reasons.
1312    if (my_rank == master) then
1313      dims_kgts = [1, all_nproc, 1, 1]
1314      est = estimate(gwr, dims_kgts)
1315      prev_efficiency = est%efficiency; prev_speedup = est%speedup
1316      call wrtout(units, sjoin("- Optimizing MPI grid with mem_per_cpu_mb:", ftoa(mem_per_cpu_mb), "[Mb]"), pre_newlines=1)
1317      call wrtout(units, "- Use `abinit run.abi --mem-per-cpu=4G` to set mem_per_cpu_mb in the submission script")
1318      write(msg, "(a,4(a4,2x),3(a12,2x))") "- ", "np_k", "np_g", "np_t", "np_s", "memb_per_cpu", "efficiency", "speedup"
1319      call wrtout(units, msg)
1320      ip_k = dims_kgts(1); ip_g = dims_kgts(2); ip_t = dims_kgts(3); ip_s = dims_kgts(4)
1321      write(msg, "(a,4(i4,2x),3(es12.5,2x))") "- ", ip_k, ip_g, ip_t, ip_s, est%mem_total, est%efficiency, est%speedup
1322      call wrtout(units, msg)
1323 
1324      do ip_s=1,gwr%nsppol
1325        do ip_t=1,gwr%ntau
1326          if (mod(gwr%ntau, ip_t) /= 0) cycle ! ip_t should divide gwr%ntau.
1327          do ip_k=1,gwr%nkbz
1328            if (mod(gwr%nkbz, ip_k) /= 0) cycle ! ip_k is should divide gwr%nkbz.
1329            do ip_g=1,gwr%green_mpw
1330              try_dims_kgts = [ip_k, ip_g, ip_t, ip_s]
1331              if (product(try_dims_kgts) /= all_nproc .or. all(try_dims_kgts == dims_kgts)) cycle
1332              !ABI_CHECK(block_dist_1d(npwsp, ip_g, col_bsize, msg), msg)
1333              est = estimate(gwr, try_dims_kgts)
1334              !if (est%mem_total < mem_per_cpu_mb * 0.8_dp .and. est%efficiency > prev_efficiency) then
1335              if (est%mem_total < mem_per_cpu_mb * 0.8_dp .and. est%speedup > prev_speedup) then
1336                prev_efficiency = est%efficiency; prev_speedup = est%speedup; dims_kgts = try_dims_kgts
1337              end if
1338              write(msg,"(a,4(i4,2x),3(es12.5,2x))")"- ", ip_k, ip_g, ip_t, ip_s, est%mem_total, est%efficiency, est%speedup
1339              call wrtout(units, msg)
1340            end do
1341          end do
1342        end do
1343      end do
1344    end if ! master
1345 
1346    call xmpi_bcast(dims_kgts, master, gwr%comm%value, ierr)
1347    np_k = dims_kgts(1); np_g = dims_kgts(2); np_t = dims_kgts(3); np_s = dims_kgts(4)
1348 
1349    if (my_rank == master) then
1350      est = estimate(gwr, dims_kgts)
1351      call wrtout(units, "-")
1352      call wrtout(units, "- Selected MPI grid:")
1353      ip_k = dims_kgts(1); ip_g = dims_kgts(2); ip_t = dims_kgts(3); ip_s = dims_kgts(4)
1354      write(msg, "(a,4(a4,2x),3(a12,2x))") "- ", "np_k", "np_g", "np_t", "np_s", "memb_per_cpu", "efficiency", "speedup"
1355      call wrtout(units, msg)
1356      write(msg, "(a,4(i4,2x),3(es12.5,2x))")"- ", ip_k, ip_g, ip_t, ip_s, est%mem_total, est%efficiency, est%speedup
1357      call wrtout(units, msg, newlines=1)
1358      call est%print(units)
1359      !call gwr%ps%print([std_out])
1360    end if
1361  end if
1362 
1363  ! ================================
1364  ! Build MPI grid and communicators
1365  ! ================================
1366  dims_kgts = [np_k, np_g, np_t, np_s]
1367  gwr%dtset%gwr_np_kgts = dims_kgts
1368  periods(:) = .False.; reorder = .False.
1369 
1370  ! Consistency check.
1371  if (product(dims_kgts) /= all_nproc) then
1372    write(msg, "(a,i0,3a, 5(a,1x,i0))") &
1373      "Cannot create 4D Cartesian grid with total nproc: ", all_nproc, ch10, &
1374      "Idle MPI processes are not supported. The product of the `nproc_*` vars should be equal to nproc while is it:", ch10, &
1375      "k_nproc (", np_k, ") x g_nproc (", np_g, ") x tau_nproc (", np_t,") x spin_nproc (", np_s, ") == ", product(dims_kgts)
1376    ABI_ERROR(msg)
1377  end if
1378 
1379 #ifdef HAVE_MPI
1380 block
1381  !integer,parameter :: k=1, g=2, t=3, s=4  ! Bad placement
1382  integer,parameter :: k=4, g=3, t=2, s=1   ! Much better placement
1383  dims_kgts = dims_kgts(4:1:-1)
1384  call MPI_CART_CREATE(gwr%comm%value, ndims, dims_kgts, periods, reorder, comm_cart, ierr)
1385 
1386  ! Find the index and coordinates of the current processor
1387  call MPI_COMM_RANK(comm_cart, me_cart, ierr)
1388  call MPI_CART_COORDS(comm_cart, me_cart, ndims, gwr%coords_stgk, ierr)
1389 
1390  ! k-point communicator
1391  keepdim = .False.; keepdim(k) = .True.; call gwr%kpt_comm%from_cart_sub(comm_cart, keepdim)
1392  ! g-communicator
1393  keepdim = .False.; keepdim(g) = .True.; call gwr%g_comm%from_cart_sub(comm_cart, keepdim)
1394  ! tau-communicator
1395  keepdim = .False.; keepdim(t) = .True.; call gwr%tau_comm%from_cart_sub(comm_cart, keepdim)
1396  ! spin-communicator
1397  keepdim = .False.; keepdim(s) = .True.; call gwr%spin_comm%from_cart_sub(comm_cart, keepdim)
1398  ! Communicator for the g-tau 2D grid.
1399  keepdim = .False.; keepdim(g) = .True.; keepdim(t) = .True.; call gwr%gtau_comm%from_cart_sub(comm_cart, keepdim)
1400  ! Communicator for the k-g 2D grid.
1401  keepdim = .False.; keepdim(k) = .True.; keepdim(g) = .True.; call gwr%kg_comm%from_cart_sub(comm_cart, keepdim)
1402  ! Communicator for the k-g-tau 3D subgrid.
1403  keepdim = .True.; keepdim(s) = .False.; call gwr%kgt_comm%from_cart_sub(comm_cart, keepdim)
1404  ! Communicator for the k-tau-spin 3D subgrid.
1405  keepdim = .True.; keepdim(g) = .False.; call gwr%kts_comm%from_cart_sub(comm_cart, keepdim)
1406  call xmpi_comm_free(comm_cart)
1407 end block
1408 #endif
1409 
1410  !call gwr%kpt_comm%print_names(); call gwr%g_comm%print_names()
1411 
1412  ! Define batch sizes for FFT transforms taking into account k-point parallelism, OpenMP threads and GPUs.
1413  omp_nt = xomp_get_num_threads(open_parallel=.True.)
1414 
1415  if (gwr%dtset%gwr_ucsc_batch(1) > 0) then
1416    ! Take it from input file (user is always right)
1417    gwr%uc_batch_size = gwr%dtset%gwr_ucsc_batch(1) * omp_nt
1418  else
1419    ! Automatic detection
1420    gwr%uc_batch_size = 1 * omp_nt
1421    if (gwr%dtset%gpu_option /= ABI_GPU_DISABLED) gwr%uc_batch_size = 4 * omp_nt
1422  end if
1423 
1424  if (gwr%dtset%gwr_ucsc_batch(2) > 0) then
1425    ! Take it from input file (user is always right)
1426    gwr%sc_batch_size = gwr%dtset%gwr_ucsc_batch(2) * omp_nt
1427  else
1428    ! Automatic detection
1429    gwr%sc_batch_size = 1 * omp_nt
1430    if (gwr%dtset%gpu_option /= ABI_GPU_DISABLED) gwr%sc_batch_size = 4 * omp_nt
1431  end if
1432 
1433  ! Make sure all procs agree.
1434  !call xmpi_min_ip(gwr%sc_batch_size, gwr%comm%value, ierr)
1435  !call xmpi_min_ip(gwr%uc_batch_size, gwr%comm%value, ierr)
1436 
1437  if (my_rank == master) then
1438    call print_ngfft(gwr%g_ngfft, header="FFT mesh for Green's function", unit=std_out)
1439    !call print_ngfft(gwr%g_ngfft, header="FFT mesh for Green's function", unit=ab_out)
1440    call wrtout(units, sjoin("- FFT uc_batch_size:", itoa(gwr%uc_batch_size)))
1441    call wrtout(units, sjoin("- FFT sc_batch_size:", itoa(gwr%sc_batch_size)))
1442  end if
1443 
1444  ! Block-distribute dimensions and allocate redirection table local index --> global index.
1445  call xmpi_split_block(gwr%ntau, gwr%tau_comm%value, gwr%my_ntau, gwr%my_itaus)
1446  ABI_CHECK(gwr%my_ntau > 0, "my_ntau == 0, decrease number of procs for tau level")
1447 
1448  ! Store the rank of the MPI proc in tau_comm treating itau index.
1449  ABI_MALLOC(gwr%tau_master, (gwr%ntau))
1450  gwr%tau_master = -1
1451  do my_it=1,gwr%my_ntau
1452    itau = gwr%my_itaus(my_it); gwr%tau_master(itau) = gwr%tau_comm%me
1453  end do
1454  call xmpi_max_ip(gwr%tau_master, gwr%tau_comm%value, ierr)
1455  ABI_CHECK(all(gwr%tau_master > -1), "tau_master!")
1456 
1457  call xmpi_split_block(gwr%nsppol, gwr%spin_comm%value, gwr%my_nspins, gwr%my_spins)
1458  ABI_CHECK(gwr%my_nspins > 0, "my_nspins == 0, decrease number of MPI procs for spin level")
1459 
1460  ! Distribute k-points in the full BZ and build redirection tables.
1461  ! Finally, find the number of IBZ k-points treated by this MPI rank.
1462  call xmpi_split_block(gwr%nkbz, gwr%kpt_comm%value, gwr%my_nkbz, gwr%my_kbz_inds)
1463  ABI_CHECK(gwr%my_nkbz > 0, "my_nkbz == 0, decrease number of MPI procs for k-point level")
1464 
1465  ! Compute np_kibz
1466  ABI_ICALLOC(gwr%np_kibz, (gwr%nkibz))
1467  do my_ikf=1,gwr%my_nkbz
1468    ik_bz = gwr%my_kbz_inds(my_ikf); ik_ibz = gwr%kbz2ibz(1, ik_bz)
1469    gwr%np_kibz(ik_ibz) = 1
1470  end do
1471 
1472  gwr%my_nkibz = count(gwr%np_kibz > 0)
1473  ABI_MALLOC(gwr%my_kibz_inds, (gwr%my_nkibz))
1474  ii = 0
1475  do ik_ibz=1,gwr%nkibz
1476    if (gwr%np_kibz(ik_ibz) > 0) then
1477      ii = ii + 1; gwr%my_kibz_inds(ii) = ik_ibz
1478    end if
1479  end do
1480 
1481  call xmpi_sum(gwr%np_kibz, gwr%kpt_comm%value, ierr)
1482 
1483  ! Build table to distribute iterations over ik_ibz as kIBZ might be replicated across MPI procs.
1484  ABI_ICALLOC(iwork, (gwr%kpt_comm%nproc, gwr%nkibz))
1485  ABI_ICALLOC(got, (gwr%kpt_comm%nproc))
1486  do my_iki=1,gwr%my_nkibz
1487    ik_ibz = gwr%my_kibz_inds(my_iki)
1488    iwork(gwr%kpt_comm%me + 1, ik_ibz) = 1
1489  end do
1490  call xmpi_sum(iwork, gwr%kpt_comm%value, ierr)
1491  ABI_MALLOC(gwr%itreat_ikibz, (gwr%nkibz))
1492  gwr%itreat_ikibz = .False.
1493  do ik_ibz=1,gwr%nkibz
1494    ii = imin_loc(got, mask=iwork(:, ik_ibz) /= 0); got(ii) = got(ii) + 1
1495    if (ii == gwr%kpt_comm%me + 1) gwr%itreat_ikibz(ik_ibz) = .True.
1496  end do
1497  ABI_FREE(got)
1498  ABI_FREE(iwork)
1499 
1500  ! Distribute q-points in full BZ, transfer symmetry tables.
1501  ! Finally find the number of my IBZ q-points that should be stored in memory.
1502  call xmpi_split_block(gwr%nqbz, gwr%kpt_comm%value, gwr%my_nqbz, gwr%my_qbz_inds)
1503 
1504  ! Compute np_qibz
1505  ABI_ICALLOC(gwr%np_qibz, (gwr%nqibz))
1506  do my_iqf=1,gwr%my_nqbz
1507    iq_bz = gwr%my_qbz_inds(my_iqf); iq_ibz = gwr%qbz2ibz(1, iq_bz)
1508    gwr%np_qibz(iq_ibz) = 1
1509  end do
1510 
1511  gwr%my_nqibz = count(gwr%np_qibz > 0)
1512  ABI_MALLOC(gwr%my_qibz_inds, (gwr%my_nqibz))
1513  ii = 0
1514  do iq_ibz=1,gwr%nqibz
1515    if (gwr%np_qibz(iq_ibz) > 0) then
1516      ii = ii + 1; gwr%my_qibz_inds(ii) = iq_ibz
1517    end if
1518  end do
1519 
1520  call xmpi_sum(gwr%np_qibz, gwr%kpt_comm%value, ierr)
1521 
1522  ! Build table to distribute iterations over iq_ibz as qIBZ might be replicated.
1523  ABI_ICALLOC(iwork, (gwr%kpt_comm%nproc, gwr%nqibz))
1524  ABI_ICALLOC(got, (gwr%kpt_comm%nproc))
1525  do my_iqi=1,gwr%my_nqibz
1526    iq_ibz = gwr%my_qibz_inds(my_iqi)
1527    iwork(gwr%kpt_comm%me + 1, iq_ibz) = 1
1528  end do
1529  call xmpi_sum(iwork, gwr%kpt_comm%value, ierr)
1530 
1531  ABI_MALLOC(gwr%itreat_iqibz, (gwr%nqibz))
1532  gwr%itreat_iqibz = .False.
1533  do iq_ibz=1,gwr%nqibz
1534    ii = imin_loc(got, mask=iwork(:, iq_ibz) /= 0); got(ii) = got(ii) + 1
1535    if (ii == gwr%kpt_comm%me + 1) gwr%itreat_iqibz(iq_ibz) = .True.
1536  end do
1537  ABI_FREE(got)
1538  ABI_FREE(iwork)
1539 
1540  ! TODO: MC technique does not seem to work as expected, even in the legacy code.
1541  vc_ecut = max(dtset%ecutsigx, dtset%ecuteps)
1542  call gwr%vcgen%init(cryst, ks_ebands%kptrlatt, gwr%nkbz, gwr%nqibz, gwr%nqbz, gwr%qbz, &
1543                      dtset%rcut, dtset%gw_icutcoul, dtset%vcutgeo, vc_ecut, gwr%comm%value)
1544 
1545  ! Now we know the value of g_ngfft. Setup tables for zero-padded FFTs.
1546  ! Build descriptors for Green's functions and tchi and setup tables for zero-padded FFTs.
1547  ABI_MALLOC(gwr%green_desc_kibz, (gwr%nkibz))
1548 
1549  do my_iki=1,gwr%my_nkibz
1550    ik_ibz = gwr%my_kibz_inds(my_iki); kk_ibz = gwr%kibz(:, ik_ibz)
1551    call gwr%green_desc_kibz(ik_ibz)%init(kk_ibz, istwfk1, dtset%ecut, gwr)
1552  end do
1553 
1554  ABI_MALLOC(gwr%tchi_desc_qibz, (gwr%nqibz))
1555  ABI_ICALLOC(gwr%chinpw_qibz, (gwr%nqibz))
1556 
1557  do my_iqi=1,gwr%my_nqibz
1558    iq_ibz = gwr%my_qibz_inds(my_iqi); qq_ibz = gwr%qibz(:, iq_ibz)
1559    ! Note ecuteps instead of ecut. Also, sort the g-vectors by |q+g|^2/2 when q is in the IBZ to facilitate
1560    ! the extrapolation of the RPA energy as a function of ecut_chi
1561    call gwr%tchi_desc_qibz(iq_ibz)%init(qq_ibz, istwfk1, dtset%ecuteps, gwr, kin_sorted=.True.)
1562 
1563    ! Compute sqrt(vc(q,G))
1564    associate (desc_q => gwr%tchi_desc_qibz(iq_ibz))
1565    if (gwr%itreat_iqibz(iq_ibz)) gwr%chinpw_qibz(iq_ibz) = desc_q%npw
1566    q_is_gamma = (normv(qq_ibz, gwr%cryst%gmet, "G") < GW_TOLQ0)
1567    call desc_q%get_vc_sqrt(qq_ibz, q_is_gamma, gwr, gwr%gtau_comm%value)
1568    end associate
1569  end do
1570 
1571  ! Collect npwq on all procs
1572  call xmpi_sum(gwr%chinpw_qibz, gwr%comm%value, ierr)
1573 
1574  ! Init 1D PBLAS grid to block-distribute matrices along columns.
1575  call gwr%g_slkproc%init(gwr%g_comm%value, grid_dims=[1, gwr%g_comm%nproc])
1576  call gwr%gtau_slkproc%init(gwr%gtau_comm%value, grid_dims=[1, gwr%gtau_comm%nproc])
1577 
1578  ! ==================================
1579  ! Allocate arrays of PBLAS matrices
1580  ! ==================================
1581  ABI_MALLOC(gwr%gt_kibz, (2, gwr%nkibz, gwr%ntau, gwr%nsppol))
1582  ABI_MALLOC(gwr%tchi_qibz, (gwr%nqibz, gwr%ntau, gwr%nsppol))
1583  ABI_MALLOC(gwr%sigc_kibz, (2, gwr%nkibz, gwr%ntau, gwr%nsppol))
1584 
1585  ! ====================================
1586  ! Create netcdf file to store results
1587  ! ====================================
1588  gwr%gwrnc_path = strcat(dtfil%filnam_ds(4), "_GWR.nc")
1589 
1590  if (my_rank == master) then
1591    call gwr%print(units)
1592    NCF_CHECK(nctk_open_create(ncid, gwr%gwrnc_path, xmpi_comm_self))
1593    ! Write structure and ebands
1594    NCF_CHECK(cryst%ncwrite(ncid))
1595    NCF_CHECK(ebands_ncwrite(ks_ebands, ncid))
1596 
1597    ! Add GWR dimensions.
1598    smat_bsize1 = gwr%b2gw - gwr%b1gw + 1
1599    smat_bsize2 = merge(1, gwr%b2gw - gwr%b1gw + 1, gwr%sig_diago)
1600    ncerr = nctk_def_dims(ncid, [ &
1601      nctkdim_t("nsppol", gwr%nsppol), nctkdim_t("ntau", gwr%ntau), nctkdim_t("nwr", gwr%nwr), &
1602      nctkdim_t("chi_mpw", gwr%tchi_mpw), nctkdim_t("nqibz", gwr%nqibz), nctkdim_t("nqbz", gwr%nqbz), &
1603      nctkdim_t("nkcalc", gwr%nkcalc), nctkdim_t("max_nbcalc", gwr%max_nbcalc), &
1604      nctkdim_t("smat_bsize1", smat_bsize1), nctkdim_t("smat_bsize2", smat_bsize2) &
1605      ], defmode=.True.)
1606    NCF_CHECK(ncerr)
1607 
1608    ncerr = nctk_def_iscalars(ncid, [character(len=nctk_slen) :: &
1609      "sig_diago", "b1gw", "b2gw", "symsigma", "symchi", "scf_iteration" &
1610    ])
1611    NCF_CHECK(ncerr)
1612 
1613    ncerr = nctk_def_dpscalars(ncid, [character(len=nctk_slen) :: &
1614      "wr_step", "ecuteps", "ecut", "ecutsigx", "gwr_boxcutmin", &
1615      "min_transition_energy_eV", "max_transition_energy_eV", "eratio", &
1616      "ft_max_err_t2w_cos", "ft_max_err_w2t_cos", "ft_max_err_t2w_sin", "cosft_duality_error", "regterm" &
1617    ])
1618    NCF_CHECK(ncerr)
1619 
1620    ! Define arrays with results.
1621    ncerr = nctk_def_arrays(ncid, [ &
1622      nctkarr_t("gwr_task", "char", "character_string_length"), &
1623      nctkarr_t("tau_mesh", "dp", "ntau"), &
1624      nctkarr_t("tau_wgs", "dp", "ntau"), &
1625      nctkarr_t("iw_mesh", "dp", "ntau"), &
1626      nctkarr_t("iw_wgs", "dp", "ntau"), &
1627      nctkarr_t("cosft_wt", "dp", "ntau, ntau"), &
1628      nctkarr_t("cosft_tw", "dp", "ntau, ntau"), &
1629      nctkarr_t("sinft_wt", "dp", "ntau, ntau"), &
1630      !nctkarr_t("ngqpt", "int", "three"), &
1631      nctkarr_t("bstart_ks", "int", "nkcalc, nsppol"), &
1632      nctkarr_t("bstop_ks", "int", "nkcalc, nsppol"), &
1633      nctkarr_t("kcalc", "dp", "three, nkcalc"), &
1634      nctkarr_t("kcalc2ibz", "int", "nkcalc, six") &
1635    ])
1636    NCF_CHECK(ncerr)
1637 
1638    ! ======================================================
1639    ! Write data that do not depend on the (kpt, spin) loop.
1640    ! ======================================================
1641    NCF_CHECK(nctk_set_datamode(ncid))
1642    ncerr = nctk_write_iscalars(ncid, [character(len=nctk_slen) :: &
1643      "sig_diago", "b1gw", "b2gw", "symsigma", "symchi", "scf_iteration"], &
1644      [merge(1, 0, gwr%sig_diago), gwr%b1gw, gwr%b2gw, gwr%dtset%symsigma, dtset%symchi, gwr%scf_iteration])
1645    NCF_CHECK(ncerr)
1646 
1647    ncerr = nctk_write_dpscalars(ncid, [character(len=nctk_slen) :: &
1648      "wr_step", "ecuteps", "ecut", "ecutsigx", "gwr_boxcutmin", &
1649      "min_transition_energy_eV", "max_transition_energy_eV", "eratio", &
1650      "ft_max_err_t2w_cos", "ft_max_err_w2t_cos", "ft_max_err_t2w_sin", "cosft_duality_error", "regterm"], &
1651      [gwr%wr_step, dtset%ecuteps, dtset%ecut, dtset%ecutsigx, dtset%gwr_boxcutmin, &
1652       gwr%te_min, gwr%te_max, gwr%te_max / gwr%te_min, &
1653       gwr%ft_max_error(1), gwr%ft_max_error(2), gwr%ft_max_error(3), gwr%cosft_duality_error, regterm &
1654      ])
1655    NCF_CHECK(ncerr)
1656 
1657    NCF_CHECK(nf90_put_var(ncid, vid("gwr_task"), trim(dtset%gwr_task)))
1658    NCF_CHECK(nf90_put_var(ncid, vid("tau_mesh"), gwr%tau_mesh))
1659    NCF_CHECK(nf90_put_var(ncid, vid("tau_wgs"), gwr%tau_wgs))
1660    NCF_CHECK(nf90_put_var(ncid, vid("iw_mesh"), gwr%iw_mesh))
1661    NCF_CHECK(nf90_put_var(ncid, vid("iw_wgs"), gwr%iw_wgs))
1662    NCF_CHECK(nf90_put_var(ncid, vid("cosft_wt"), gwr%cosft_wt))
1663    NCF_CHECK(nf90_put_var(ncid, vid("cosft_tw"), gwr%cosft_tw))
1664    NCF_CHECK(nf90_put_var(ncid, vid("sinft_wt"), gwr%sinft_wt))
1665    NCF_CHECK(nf90_put_var(ncid, vid("bstart_ks"), gwr%bstart_ks))
1666    NCF_CHECK(nf90_put_var(ncid, vid("bstop_ks"), gwr%bstop_ks))
1667    NCF_CHECK(nf90_put_var(ncid, vid("kcalc"), gwr%kcalc))
1668    NCF_CHECK(nf90_put_var(ncid, vid("kcalc2ibz"), gwr%kcalc2ibz))
1669    NCF_CHECK(nf90_close(ncid))
1670  end if ! master
1671 
1672  call cwtime_report(" gwr_init:", cpu, wall, gflops)
1673  call timab(1920, 2, tsec)
1674 
1675 contains
1676 integer function vid(vname)
1677   character(len=*),intent(in) :: vname
1678   vid = nctk_idname(ncid, vname)
1679 end function vid
1680 
1681 end subroutine gwr_init

m_gwr/gwr_load_kcalc_wfd [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 gwr_load_kcalc_wfd

FUNCTION

  Load the KS states to compute Sigma_nk from the WFK file

INPUTS

OUTPUT

SOURCE

2034 subroutine gwr_load_kcalc_wfd(gwr, wfk_path, tmp_kstab)
2035 
2036 !Arguments ------------------------------------
2037  class(gwr_t),intent(inout) :: gwr
2038  character(len=*),intent(in) :: wfk_path
2039  integer,allocatable,intent(out) :: tmp_kstab(:,:,:)
2040 
2041 !Local variables-------------------------------
2042 !scalars
2043  integer :: mband, nkibz, nsppol, spin, ik_ibz, ikcalc
2044  real(dp) :: cpu, wall, gflops
2045  !character(len=5000) :: msg
2046  type(ebands_t) :: ks_ebands
2047  type(hdr_type) :: wfk_hdr
2048 !arrays
2049  integer,allocatable :: nband(:,:), wfd_istwfk(:)
2050  logical,allocatable :: bks_mask(:,:,:), keep_ur(:,:,:)
2051 
2052 ! *************************************************************************
2053 
2054  call cwtime(cpu, wall, gflops, "start")
2055 
2056  associate (wfd => gwr%kcalc_wfd, dtset => gwr%dtset)
2057 
2058  ks_ebands = wfk_read_ebands(wfk_path, gwr%comm%value, out_hdr=wfk_hdr)
2059  call wfk_hdr%vs_dtset(dtset)
2060 
2061  ! TODO: Add more consistency checks e.g. nkibz,...
2062  !cryst = wfk_hdr%get_crystal()
2063  !call cryst%print(header="crystal structure from WFK file")
2064 
2065  nkibz = ks_ebands%nkpt; nsppol = ks_ebands%nsppol
2066 
2067  ! Don't take mband from ks_ebands but compute it from gwr%bstop_ks
2068  mband = maxval(gwr%bstop_ks) !; mband = ks_ebands%mband
2069 
2070  ! Initialize the wave function descriptor.
2071  ! Only wavefunctions for the symmetrical imagine of the k wavevectors
2072  ! treated by this MPI rank are stored.
2073  ABI_MALLOC(nband, (nkibz, nsppol))
2074  ABI_MALLOC(bks_mask, (mband, nkibz, nsppol))
2075  ABI_MALLOC(keep_ur, (mband, nkibz, nsppol))
2076  nband = mband; bks_mask = .False.; keep_ur = .False.
2077 
2078  ABI_ICALLOC(tmp_kstab, (2, nkibz, nsppol))
2079 
2080  do spin=1,gwr%nsppol
2081    do ikcalc=1,gwr%nkcalc ! TODO: Should be spin dependent!
2082      ik_ibz = gwr%kcalc2ibz(ikcalc, 1)
2083      associate (b1 => gwr%bstart_ks(ikcalc, spin), b2 => gwr%bstop_ks(ikcalc, spin))
2084      tmp_kstab(:, ik_ibz, spin) = [b1, b2]
2085      bks_mask(b1:b2, ik_ibz, spin) = .True.
2086      end associate
2087    end do
2088  end do
2089 
2090  ! Impose istwfk = 1 for all k-points.
2091  ! wfd_read_wfk will handle a possible conversion if the WFK contains istwfk /= 1.
2092  ABI_MALLOC(wfd_istwfk, (nkibz))
2093  wfd_istwfk = 1
2094 
2095  call wfd_init(wfd, gwr%cryst, gwr%pawtab, gwr%psps, keep_ur, mband, nband, nkibz, dtset%nsppol, bks_mask, &
2096    dtset%nspden, dtset%nspinor, dtset%ecut, dtset%ecutsm, dtset%dilatmx, wfd_istwfk, ks_ebands%kptns, gwr%g_ngfft, &
2097    dtset%nloalg, dtset%prtvol, dtset%pawprtvol, gwr%comm%value)
2098 
2099  call wfd%print(header="Wavefunctions for GWR calculation")
2100 
2101  ABI_FREE(nband)
2102  ABI_FREE(keep_ur)
2103  ABI_FREE(wfd_istwfk)
2104  ABI_FREE(bks_mask)
2105 
2106  call ebands_free(ks_ebands)
2107  call wfk_hdr%free()
2108 
2109  ! Read KS wavefunctions.
2110  call wfd%read_wfk(wfk_path, iomode_from_fname(wfk_path))
2111  end associate
2112 
2113  call cwtime_report(" gwr_load_kcalc_from_wfk:", cpu, wall, gflops)
2114  !call gwr%pstat%print([std_out], reload=.True.)
2115 
2116 end subroutine gwr_load_kcalc_wfd

m_gwr/gwr_malloc_free_mats [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

 gwr_malloc_free_mats

FUNCTION

 Allocate/Free PBLAS matrices according to `what` for the set of k/q-points selected by `mask_ibz`.

SOURCE

1793 subroutine gwr_malloc_free_mats(gwr, mask_ibz, what, action)
1794 
1795 !Arguments ------------------------------------
1796  class(gwr_t), target, intent(inout) :: gwr
1797  integer,intent(in) :: mask_ibz(:)
1798  character(len=*),intent(in) :: what, action
1799 
1800 !Local variables-------------------------------
1801  integer :: my_is, my_it, ipm, npwsp, col_bsize, itau, spin, ik_ibz, iq_ibz
1802  !integer :: ii, num_pm, ipm_list__(2)
1803  type(__slkmat_t), pointer :: mat
1804  character(len=500) :: msg
1805 
1806 ! *************************************************************************
1807 
1808  ABI_CHECK(string_in(action, "malloc, free"), sjoin("Invalid action:", action))
1809 
1810  !num_pm = 2; ipm_list__ = [1, 2]
1811  !if (present(ipm_list)) then
1812  !  num_pm = size(ipm_list)
1813  !  ABI_CHECK_IRANGE(num_pm, 1, 2, "num_pm not in [1, 2]")
1814  !  ipm_list__(1:num_pm) = ipm_list(:)
1815  !end if
1816 
1817  do my_is=1,gwr%my_nspins
1818    spin = gwr%my_spins(my_is)
1819    do my_it=1,gwr%my_ntau
1820      itau = gwr%my_itaus(my_it)
1821      ! All the PBLAS matrices are MPI distributed over g' in blocks
1822 
1823      select case (what)
1824      case ("green")
1825        ! ========================
1826        ! Allocate/free G_k(g,g')
1827        ! ========================
1828        ABI_CHECK_IEQ(size(mask_ibz), gwr%nkibz, "wrong mask size")
1829 
1830        do ik_ibz=1,gwr%nkibz
1831          if (mask_ibz(ik_ibz) == 0) cycle
1832          npwsp = gwr%green_desc_kibz(ik_ibz)%npw * gwr%nspinor
1833          ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
1834          associate (gt => gwr%gt_kibz(:, ik_ibz, itau, spin))
1835          do ipm=1,2
1836            if (action == "malloc") call gt(ipm)%init(npwsp, npwsp, gwr%g_slkproc, istwfk1, size_blocs=[-1, col_bsize])
1837            if (action == "free") call gt(ipm)%free()
1838          end do
1839          end associate
1840        end do
1841 
1842      case ("tchi", "wc")
1843        ! ===========================
1844        ! Allocate/free tchi_q(g,g')
1845        ! ===========================
1846        ABI_CHECK_IEQ(size(mask_ibz), gwr%nqibz, "wrong mask size")
1847 
1848        do iq_ibz=1,gwr%nqibz
1849          if (mask_ibz(iq_ibz) == 0) cycle
1850          npwsp = gwr%tchi_desc_qibz(iq_ibz)%npw * gwr%nspinor
1851          ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
1852          if (what == "tchi") mat => gwr%tchi_qibz(iq_ibz, itau, spin)
1853          if (what == "wc") mat => gwr%wc_qibz(iq_ibz, itau, spin)
1854          if (action == "malloc") call mat%init(npwsp, npwsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
1855          if (action == "free") call mat%free()
1856        end do
1857 
1858     case ("sigma")
1859        ! ================================
1860        ! Allocate/free sigmac_kibz(g,g')
1861        ! ================================
1862        ABI_CHECK_IEQ(size(mask_ibz), gwr%nkibz, "wrong mask size")
1863        do ik_ibz=1,gwr%nkibz
1864          if (mask_ibz(ik_ibz) == 0) cycle
1865          npwsp = gwr%tchi_desc_qibz(iq_ibz)%npw * gwr%nspinor
1866          ABI_CHECK(block_dist_1d(npwsp, gwr%g_comm%nproc, col_bsize, msg), msg)
1867          associate (sigc => gwr%sigc_kibz(:, ik_ibz, itau, spin))
1868          do ipm=1,2
1869            if (action == "malloc") call sigc(ipm)%init(npwsp, npwsp, gwr%g_slkproc, 1, size_blocs=[-1, col_bsize])
1870            if (action == "free") call sigc(ipm)%free()
1871          end do
1872          end associate
1873        end do
1874 
1875      case default
1876        ABI_ERROR(sjoin("Invalid what:", what))
1877      end select
1878 
1879    end do ! my_it
1880  end do ! my_is
1881 
1882  call wrtout(std_out, "")
1883  call gwr%print_mem(unit=std_out)
1884 
1885 end subroutine gwr_malloc_free_mats

m_gwr/gwr_ncwrite_tchi_wc [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_ncwrite_tchi_wc

FUNCTION

  Write tchi or wc to netcdf file

INPUTS

OUTPUT

SOURCE

6582 subroutine gwr_ncwrite_tchi_wc(gwr, what, filepath)
6583 
6584 !Arguments ------------------------------------
6585  class(gwr_t),target,intent(in) :: gwr
6586  character(len=*),intent(in) :: what, filepath
6587 
6588 !Local variables-------------------------------
6589 !scalars
6590  integer,parameter :: master = 0
6591  integer :: my_is, my_iqi, my_it, spin, iq_ibz, itau, npwtot_q, my_ncols, my_gcol_start, ncid, ncerr !, ierr
6592  real(dp) :: cpu, wall, gflops
6593 !arrays
6594  real(dp), ABI_CONTIGUOUS pointer :: fptr(:,:,:)
6595  type(__slkmat_t), pointer :: mats(:)
6596 
6597 ! *************************************************************************
6598 
6599  ! Cannot reuse SCR.nc/SUSC.nc fileformat as:
6600  !  - hscr_new requires ep%
6601  !  - old file formats assume Gamma-centered g vectors.
6602 
6603  call cwtime(cpu, wall, gflops, "start")
6604 
6605  if (gwr%comm%me == master) then
6606    call wrtout(std_out, sjoin(" Writing", what, "to:", filepath))
6607    NCF_CHECK(nctk_open_create(ncid, filepath, xmpi_comm_self))
6608    NCF_CHECK(gwr%cryst%ncwrite(ncid))
6609 
6610    ! Add dimensions.
6611    ncerr = nctk_def_dims(ncid, [ &
6612      nctkdim_t("nsppol", gwr%nsppol), nctkdim_t("ntau", gwr%ntau), nctkdim_t("mpw", gwr%tchi_mpw), &
6613      nctkdim_t("nqibz", gwr%nqibz), nctkdim_t("nqbz", gwr%nqbz)], &
6614      defmode=.True.)
6615    NCF_CHECK(ncerr)
6616 
6617    ! Define arrays with results.
6618    ! TODO: Add metadata for mats: spin sum, vc cutoff, t/w mesh, handle nspinor 2
6619    ncerr = nctk_def_arrays(ncid, [ &
6620      nctkarr_t("ngkpt", "int", "three"), &
6621      nctkarr_t("ngqpt", "int", "three"), &
6622      nctkarr_t("qibz", "dp", "three, nqibz"), &
6623      nctkarr_t("wtq", "dp", "nqibz"), &
6624      nctkarr_t("tau_mesh", "dp", "ntau"), &
6625      nctkarr_t("tau_wgs", "dp", "ntau"), &
6626      nctkarr_t("iw_mesh", "dp", "ntau"), &
6627      nctkarr_t("iw_wgs", "dp", "ntau"), &
6628      nctkarr_t("gvecs", "int", "three, mpw, nqibz"), &
6629      nctkarr_t("chinpw_qibz", "int", "nqibz"), &
6630      nctkarr_t("mats", "dp", "two, mpw, mpw, ntau, nqibz, nsppol") &
6631    ])
6632    NCF_CHECK(ncerr)
6633 
6634    ! Write global arrays.
6635    NCF_CHECK(nctk_set_datamode(ncid))
6636    NCF_CHECK(nf90_put_var(ncid, vid("ngkpt"), gwr%ngkpt))
6637    NCF_CHECK(nf90_put_var(ncid, vid("ngqpt"), gwr%ngqpt))
6638    NCF_CHECK(nf90_put_var(ncid, vid("qibz"), gwr%qibz))
6639    NCF_CHECK(nf90_put_var(ncid, vid("wtq"), gwr%wtq))
6640    NCF_CHECK(nf90_put_var(ncid, vid("tau_mesh"), gwr%tau_mesh))
6641    NCF_CHECK(nf90_put_var(ncid, vid("tau_wgs"), gwr%tau_wgs))
6642    NCF_CHECK(nf90_put_var(ncid, vid("iw_mesh"), gwr%iw_mesh))
6643    NCF_CHECK(nf90_put_var(ncid, vid("iw_wgs"), gwr%iw_wgs))
6644    NCF_CHECK(nf90_put_var(ncid, vid("chinpw_qibz"), gwr%chinpw_qibz))
6645    NCF_CHECK(nf90_close(ncid))
6646  end if
6647 
6648  call xmpi_barrier(gwr%comm%value)
6649 
6650  ! Reopen the file in gwr%comm.
6651  NCF_CHECK(nctk_open_modify(ncid, filepath, gwr%comm%value))
6652 
6653  do my_is=1,gwr%my_nspins
6654    spin = gwr%my_spins(my_is)
6655    do my_iqi=1,gwr%my_nqibz
6656      iq_ibz = gwr%my_qibz_inds(my_iqi)
6657 
6658      ! The same q-point in the IBZ might be stored on different pools.
6659      ! To avoid writing the same array multiple times, we use itreat_qibz
6660      ! to select the procs inside gwr%kpt_comm who are gonna write this iq_ibz q-point.
6661      if (.not. gwr%itreat_iqibz(iq_ibz)) cycle
6662 
6663      associate (desc_q => gwr%tchi_desc_qibz(iq_ibz))
6664      npwtot_q = desc_q%npw
6665 
6666      if (spin == 1 .and. gwr%gtau_comm%me == 0) then
6667        ! Write all G-vectors for this q
6668        NCF_CHECK(nf90_put_var(ncid, vid("gvecs"), desc_q%gvec, start=[1,1,iq_ibz], count=[3,npwtot_q,1]))
6669      end if
6670 
6671      mats => null()
6672      if (what == "tchi") mats => gwr%tchi_qibz(iq_ibz, :, spin)
6673      if (what == "wc")   mats => gwr%wc_qibz(iq_ibz, :, spin)
6674      ABI_CHECK(associated(mats), sjoin("Invalid value for what:", what))
6675 
6676      do my_it=1,gwr%my_ntau
6677        itau = gwr%my_itaus(my_it)
6678 
6679        ! FIXME: Assuming PBLAS matrix distributed in contiguous blocks along the column index.
6680        ! This part must be changed if we use round robin distribution.
6681        my_ncols = mats(itau)%sizeb_local(2)
6682        my_gcol_start = mats(itau)%loc2gcol(1)
6683 
6684        ! FIXME: This is wrong if spc
6685        !call c_f_pointer(c_loc(mats(itau)%buffer_cplx), fptr, shape=[2, npwtot_q, my_ncols])
6686        ABI_MALLOC(fptr, (2, npwtot_q, my_ncols))
6687        fptr(1,:,:) = dble(mats(itau)%buffer_cplx)
6688        fptr(2,:,:) = aimag(mats(itau)%buffer_cplx)
6689 
6690        ncerr = nf90_put_var(ncid, vid("mats"), fptr, &
6691                             start=[1, 1, my_gcol_start, itau, iq_ibz, spin], &
6692                             count=[2, npwtot_q, my_ncols, 1, 1, 1])
6693                             !stride=[1, gwr%g_comm%nproc, 1, 1, 1])
6694        ABI_FREE(fptr)
6695        NCF_CHECK(ncerr)
6696      end do
6697      end associate
6698    end do ! my_iqi
6699  end do ! my_is
6700 
6701  NCF_CHECK(nf90_close(ncid))
6702  call cwtime_report(" gwr_ncwrite_tchi_wc:", cpu, wall, gflops)
6703 
6704 contains
6705 integer function vid(vname)
6706   character(len=*),intent(in) :: vname
6707   vid = nctk_idname(ncid, vname)
6708 end function vid
6709 
6710 end subroutine gwr_ncwrite_tchi_wc

m_gwr/gwr_print [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_print

FUNCTION

  Print info on the gwr object.

INPUTS

SOURCE

3913 subroutine gwr_print(gwr, units, header)
3914 
3915 !Arguments ------------------------------------
3916  class(gwr_t),intent(in) :: gwr
3917  integer,intent(in) :: units(:)
3918  character(len=*),optional,intent(in) :: header
3919 
3920 !Local variables-------------------------------
3921  integer :: ii
3922  character(len=500) :: msg
3923  type(yamldoc_t) :: ydoc
3924 
3925 ! *********************************************************************
3926 
3927  msg = ' ==== Info on the gwr_t object ==== '; if (present(header)) msg=' ==== '//trim(adjustl(header))//' ==== '
3928  call wrtout(units, msg)
3929 
3930  ydoc = yamldoc_open('GWR_params') !, width=11, real_fmt='(3f8.3)')
3931  call ydoc%add_string("gwr_task", gwr%dtset%gwr_task)
3932  call ydoc%add_int("nband", gwr%dtset%nband(1))
3933  call ydoc%add_int("ntau", gwr%ntau)
3934  call ydoc%add_int1d("ngkpt", gwr%ngkpt)
3935  call ydoc%add_int1d("ngqpt", gwr%ngqpt)
3936  msg = "supercell"; if (.not. gwr%use_supercell_for_tchi) msg = "BZ-convolutions"
3937  call ydoc%add_string("chi_algo", msg)
3938  msg = "supercell"; if (.not. gwr%use_supercell_for_sigma) msg = "BZ-convolutions"
3939  call ydoc%add_string("sigma_algo", msg)
3940  call ydoc%add_int("nkibz", gwr%nkibz)
3941  call ydoc%add_int("nqibz", gwr%nqibz)
3942  call ydoc%add_int("inclvkb", gwr%dtset%inclvkb)
3943  call ydoc%add_real1d("q0", gwr%q0)  ! "for long-wavelenght limit"))
3944  call ydoc%add_int("gw_icutcoul", gwr%dtset%gw_icutcoul)
3945  call ydoc%add_int("green_mpw", gwr%green_mpw)
3946  call ydoc%add_int("tchi_mpw", gwr%tchi_mpw)
3947  call ydoc%add_int1d("g_ngfft", gwr%g_ngfft(1:6))
3948  call ydoc%add_real("gwr_boxcutmin", gwr%dtset%gwr_boxcutmin)
3949  call ydoc%add_int1d("P gwr_np_kgts", gwr%dtset%gwr_np_kgts)
3950  call ydoc%add_int1d("P np_kibz", gwr%np_kibz)
3951  call ydoc%add_int1d("P np_qibz", gwr%np_qibz)
3952  ! Print Max error due to the inhomogeneous FT.
3953  call ydoc%add_real("min_transition_energy_eV", gwr%te_min)
3954  call ydoc%add_real("max_transition_energy_eV", gwr%te_max)
3955  call ydoc%add_real("eratio", gwr%te_max / gwr%te_min)
3956  call ydoc%add_real("ft_max_err_t2w_cos", gwr%ft_max_error(1))
3957  call ydoc%add_real("ft_max_err_w2t_cos", gwr%ft_max_error(2))
3958  call ydoc%add_real("ft_max_err_t2w_sin", gwr%ft_max_error(3))
3959  call ydoc%add_real("cosft_duality_error", gwr%cosft_duality_error)
3960  ! Print imaginary time/frequency mesh with weights.
3961  call ydoc%open_tabular("Minimax imaginary tau/omega mesh", comment="tau, weight(tau), omega, weight(omega)")
3962  do ii=1,gwr%ntau
3963    write(msg, "(i0, 4(es12.5,2x))")ii, gwr%tau_mesh(ii), gwr%tau_wgs(ii), gwr%iw_mesh(ii), gwr%iw_wgs(ii)
3964    call ydoc%add_tabular_line(msg)
3965  end do
3966 
3967  call ydoc%write_units_and_free(units)
3968 
3969 end subroutine gwr_print

m_gwr/gwr_print_mem [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_print_mem

FUNCTION

  Print memory allocated for matrices.

SOURCE

3981 subroutine gwr_print_mem(gwr, unit)
3982 
3983 !Arguments ------------------------------------
3984  class(gwr_t),intent(in) :: gwr
3985  integer,optional,intent(in) :: unit
3986 
3987 !Local variables-------------------------------
3988 !scalars
3989  integer :: unt
3990  real(dp) :: mem_mb
3991  !character(len=500) :: msg
3992 
3993 ! *********************************************************************
3994 
3995  unt = std_out; if (present(unit)) unt =unit
3996 
3997  if (allocated(gwr%gt_kibz)) then
3998    mem_mb = sum(slk_array_locmem_mb(gwr%gt_kibz))
3999    if (mem_mb > zero) then
4000      call wrtout(std_out, sjoin("- Local memory for G(g,g',kibz,itau): ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4001    end if
4002  end if
4003  if (allocated(gwr%tchi_qibz)) then
4004    mem_mb = sum(slk_array_locmem_mb(gwr%tchi_qibz))
4005    if (mem_mb > zero) then
4006      call wrtout(std_out, sjoin("- Local memory for Chi(g,g',qibz,itau): ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4007    end if
4008  end if
4009  if (allocated(gwr%wc_qibz)) then
4010    mem_mb = sum(slk_array_locmem_mb(gwr%wc_qibz))
4011    if (mem_mb > zero) then
4012      call wrtout(std_out, sjoin("- Local memory for Wc(g,g,qibz,itau): ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4013    end if
4014  end if
4015  if (allocated(gwr%sigc_kibz)) then
4016    mem_mb = sum(slk_array_locmem_mb(gwr%sigc_kibz))
4017    if (mem_mb > zero) then
4018      call wrtout(std_out, sjoin("- Local memory for Sigma_c(g,g',kibz,itau): ", ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4019    end if
4020  end if
4021  if (allocated(gwr%ugb)) then
4022    mem_mb = sum(slk_array_locmem_mb(gwr%ugb))
4023    if (mem_mb > zero) then
4024      call wrtout(std_out, sjoin('- Local memory for u_gb wavefunctions: ', ftoa(mem_mb, fmt="f8.1"), ' [Mb] <<< MEM'))
4025    end if
4026  end if
4027  call wrtout(std_out, " ")
4028 
4029 end subroutine gwr_print_mem

m_gwr/gwr_print_trace [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_print_trace

FUNCTION

  Print traces of PBLAS matrices to std_out and ab_out.
  NB: This is a global routine that should be called by all procs inside gwr%comm.

INPUTS

OUTPUT

SOURCE

4780 subroutine gwr_print_trace(gwr, what)
4781 
4782 !Arguments ------------------------------------
4783  class(gwr_t),target,intent(inout) :: gwr
4784  character(len=*),intent(in) :: what
4785 
4786 !Local variables-------------------------------
4787  integer,parameter :: master = 0
4788  integer :: my_is, spin, my_it, itau, iq_ibz, ierr, my_iqi, my_iki, ik_ibz, ipm
4789  character(len=5000) :: comment
4790  integer :: units(2)
4791  complex(dp),allocatable :: ctrace3(:,:,:), ctrace4(:,:,:,:)
4792  type(__slkmat_t),contiguous, pointer :: mats(:,:,:)
4793 
4794 ! *************************************************************************
4795 
4796  ! NB: The same q/k point in the IBZ might be available on different procs in kpt_comm
4797  ! hence we have to rescale the trace before summing the results in gwr%comm.
4798  comment = "Invalid space!"; units = [std_out, ab_out]
4799 
4800  select case (what)
4801  case ("tchi_qibz", "wc_qibz")
4802    ! Trace of tchi or Wc
4803    ABI_CALLOC(ctrace3, (gwr%nqibz, gwr%ntau, gwr%nsppol))
4804 
4805    if (what == "tchi_qibz") then
4806      mats => gwr%tchi_qibz
4807      if (gwr%tchi_space == "iomega") comment = " (iq_ibz, iomega) table"
4808      if (gwr%tchi_space == "itau") comment = " (iq_ibz, itau) table"
4809    else if (what == "wc_qibz") then
4810      mats => gwr%wc_qibz
4811      if (gwr%wc_space == "iomega") comment = " (iq_ibz, iomega) table"
4812      if (gwr%wc_space == "itau") comment = " (iq_ibz, itau) table"
4813    end if
4814 
4815    do my_is=1,gwr%my_nspins
4816      spin = gwr%my_spins(my_is)
4817      do my_it=1,gwr%my_ntau
4818        itau = gwr%my_itaus(my_it)
4819        do my_iqi=1,gwr%my_nqibz
4820          iq_ibz = gwr%my_qibz_inds(my_iqi)
4821          ctrace3(iq_ibz, itau, spin) = mats(iq_ibz, itau, spin)%get_trace() / gwr%np_qibz(iq_ibz)
4822        end do
4823      end do
4824    end do
4825 
4826    call xmpi_sum_master(ctrace3, 0, gwr%kts_comm%value, ierr)
4827 
4828    if (gwr%comm%me == master) then
4829      do spin=1,gwr%nsppol
4830        call wrtout(units, sjoin(" Trace of:", what, "for spin:", itoa(spin), "for testing purposes:"))
4831        call wrtout(units, comment, pre_newlines=2)
4832        call print_arr(ctrace3(:,:,spin), unit=ab_out)
4833        call print_arr(ctrace3(:,:,spin), unit=std_out)
4834      end do
4835    end if
4836    ABI_FREE(ctrace3)
4837 
4838  case ("gt_kibz")
4839    ! Trace of Green's functions.
4840    ABI_CALLOC(ctrace4, (gwr%nkibz, gwr%ntau, 2, gwr%nsppol))
4841 
4842    do my_is=1,gwr%my_nspins
4843      spin = gwr%my_spins(my_is)
4844      do my_it=1,gwr%my_ntau
4845        itau = gwr%my_itaus(my_it)
4846        do my_iki=1,gwr%my_nkibz
4847          ik_ibz = gwr%my_kibz_inds(my_iki)
4848          do ipm=1,2
4849            ctrace4(ik_ibz, itau, ipm, spin) = gwr%gt_kibz(ipm, ik_ibz, itau, spin)%get_trace() / gwr%np_kibz(ik_ibz)
4850          end do
4851        end do
4852      end do
4853    end do
4854    comment = " (ik_ibz, itau) table"
4855 
4856    call xmpi_sum_master(ctrace4, master, gwr%kts_comm%value, ierr)
4857 
4858    if (gwr%comm%me == master) then
4859      do spin=1,gwr%nsppol
4860        do ipm=1,2
4861          call wrtout(units, sjoin(" Trace of:", what, "for ipm:", itoa(ipm), ", spin:", itoa(spin), "for testing purposes:"))
4862          call wrtout(units, comment, newlines=1)
4863          call print_arr(ctrace4(:,:, ipm, spin), unit=ab_out)
4864          call print_arr(ctrace4(:,:, ipm, spin), unit=std_out)
4865        end do
4866      end do
4867    end if
4868    ABI_FREE(ctrace4)
4869 
4870  case default
4871    ABI_ERROR(sjoin("Invalid value of what:", what))
4872  end select
4873 
4874 end subroutine gwr_print_trace

m_gwr/gwr_read_ugb_from_wfk [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_read_ugb_from_wfk

FUNCTION

  Read wavefunctions from the WFK file wfk_path and store them in gwr%ugb (MPI distributed).

SOURCE

2130 subroutine gwr_read_ugb_from_wfk(gwr, wfk_path)
2131 
2132 !Arguments ------------------------------------
2133  class(gwr_t),target,intent(inout) :: gwr
2134  character(len=*),intent(in) :: wfk_path
2135 
2136 !Local variables-------------------------------
2137 !scalars
2138  integer,parameter :: formeig0 = 0, master = 0
2139  integer :: mband, min_nband, nkibz, nsppol, my_is, my_iki, spin, ik_ibz, ierr, io_algo, bcast_comm, color
2140  integer :: npw_k, mpw, istwf_k, il_b, ib, band, iloc !, itau
2141  integer :: nbsum, npwsp, bstart, bstop, band_step, nb !my_nband ! nband_k,
2142  logical :: print_time
2143  real(dp) :: cpu, wall, gflops, cpu_green, wall_green, gflops_green
2144  character(len=5000) :: msg
2145  logical :: have_band, need_block_ks, io_in_kcomm
2146  type(ebands_t) :: wfk_ebands
2147  type(wfk_t) :: wfk
2148  type(dataset_type),pointer :: dtset
2149  type(xcomm_t), pointer :: io_comm
2150 !arrays
2151  integer,allocatable :: kg_k(:,:)
2152  logical,allocatable :: bmask(:)
2153  real(dp) :: kk_ibz(3), tsec(2)
2154  real(dp),target,allocatable :: cg_work(:,:,:)
2155  real(dp),ABI_CONTIGUOUS pointer :: cg_k(:,:)
2156 
2157 ! *************************************************************************
2158 
2159  call cwtime(cpu, wall, gflops, "start")
2160  call timab(1921, 1, tsec)
2161 
2162  dtset => gwr%dtset
2163  wfk_ebands = wfk_read_ebands(wfk_path, gwr%comm%value, out_hdr=gwr%wfk_hdr)
2164  call gwr%wfk_hdr%vs_dtset(dtset)
2165 
2166  ! TODO: Add more consistency checks e.g. nkibz,...
2167  !cryst = gwr%wfk_hdr%get_crystal()
2168  !call cryst%print(header="crystal structure from WFK file")
2169 
2170  nkibz = wfk_ebands%nkpt; nsppol = wfk_ebands%nsppol; mband = wfk_ebands%mband
2171  min_nband = minval(wfk_ebands%nband)
2172 
2173  nbsum = dtset%nband(1)
2174  if (nbsum > min_nband) then
2175    ABI_WARNING(sjoin("WFK file contains", itoa(min_nband), "states while you're asking for:", itoa(nbsum)))
2176    nbsum = min_nband
2177  end if
2178  call ebands_free(wfk_ebands)
2179 
2180  ! ==============================================
2181  ! Build Green's functions in g-space for given k
2182  ! ==============================================
2183 
2184  ! for tau > 0:
2185  !
2186  !      G_k(r,r',itau) = i \sum_b^{occ} psi_b(r) \psi_b^*(r') exp(e_b tau)
2187  !
2188  ! for tau < 0:
2189  !
2190  !      G_k(r,r',itau) = -i \sum_b^{empty} psi_b(r) \psi_b^*(r') exp(e_b tau)
2191  !
2192  ! NB: G_k is constructed for k in the IBZ, then we rotate the k-point to obtain G_k in the BZ.
2193  !
2194  ! TODO:
2195  !     1) Make sure that gvec in gwr and wfd agree with each other.
2196  !     2) May implement trick used in gwst to add empty states approximated with LC of PWs.
2197 
2198  ! Select occupied or empty G.
2199  ! if (eig_nk < -tol6) then
2200  !   !ipm = 1
2201  !   !gt_cfact = j_dpc * exp(gwr%tau_mesh(itau) * eig_nk)
2202  !   ! Vasp convention
2203  !   ipm = 2
2204  !   gt_cfact = exp(gwr%tau_mesh(itau) * eig_nk)
2205  ! else if (eig_nk > tol6) then
2206  !   !ipm = 2
2207  !   !gt_cfact = -j_dpc * exp(-gwr%tau_mesh(itau) * eig_nk)
2208  !   ! Vasp convention
2209  !   ipm = 1
2210  !   gt_cfact = -exp(-gwr%tau_mesh(itau) * eig_nk)
2211  ! else
2212  !   ABI_WARNING("Metallic system of semiconductor with Fermi level inside bands!!!!")
2213  ! end if
2214 
2215  call wrtout(std_out, sjoin(" Reading KS states with nbsum:", itoa(nbsum), "..."), do_flush=.True.)
2216 
2217  ! Init set of (npwsp, nbsum) PBLAS matrix distributed within the g_comm communicator.
2218  ! and distribute it over bands so that each proc reads a subset of bands in read_band_block
2219  ! Note size_blocs below that corresponds to a round-robin distribution along the band axis.
2220 
2221  ABI_MALLOC(gwr%ugb, (gwr%nkibz, gwr%nsppol))
2222  gwr%ugb_nband = nbsum
2223 
2224  do my_is=1,gwr%my_nspins
2225    spin = gwr%my_spins(my_is)
2226    do my_iki=1,gwr%my_nkibz
2227      ik_ibz = gwr%my_kibz_inds(my_iki)
2228      npw_k = gwr%green_desc_kibz(ik_ibz)%npw; npwsp = npw_k * gwr%nspinor
2229      call gwr%ugb(ik_ibz, spin)%init(npwsp, gwr%ugb_nband, gwr%g_slkproc, istwfk1, size_blocs=[-1, 1])
2230    end do
2231  end do
2232  call gwr%print_mem(unit=std_out)
2233 
2234  mpw = maxval(gwr%wfk_hdr%npwarr)
2235  ABI_MALLOC(kg_k, (3, mpw))
2236  ABI_MALLOC(bmask, (mband))
2237 
2238  io_algo = 2
2239 
2240  if (io_algo == 1) then
2241    ! This version is very bad on LUMI
2242    call wrtout(std_out, " Using collective MPI-IO with wfk%read_bmask ...")
2243    call wfk_open_read(wfk, wfk_path, formeig0, iomode_from_fname(wfk_path), get_unit(), gwr%gtau_comm%value)
2244    !call wfk_open_read(wfk, wfk_path, formeig0, iomode_from_fname(wfk_path), get_unit(), gwr%gt_comm%value)
2245 
2246    do my_is=1,gwr%my_nspins
2247      spin = gwr%my_spins(my_is)
2248      do my_iki=1,gwr%my_nkibz
2249        print_time = gwr%comm%me == 0 .and. (my_iki <= LOG_MODK .or. mod(my_iki, LOG_MODK) == 0)
2250        if (print_time) call cwtime(cpu_green, wall_green, gflops_green, "start")
2251        ik_ibz = gwr%my_kibz_inds(my_iki); kk_ibz = gwr%kibz(:, ik_ibz)
2252        npw_k = gwr%wfk_hdr%npwarr(ik_ibz); istwf_k = gwr%wfk_hdr%istwfk(ik_ibz)
2253        npwsp = npw_k * gwr%nspinor
2254        ! TODO
2255        ABI_CHECK_IEQ(istwf_k, 1, "istwfk_k should be 1")
2256 
2257        associate (ugb => gwr%ugb(ik_ibz, spin), desc_k => gwr%green_desc_kibz(ik_ibz))
2258        ABI_CHECK_IEQ(npw_k, desc_k%npw, "npw_k != desc_k%npw")
2259 
2260        ! use round-robin distribution inside gtau_comm% for IO.
2261        ! TODO: Optimize wfk_read_bmask and/or read WFK with all procs and master broadcasting.
2262        bmask = .False.
2263        do il_b=1, ugb%sizeb_local(2)
2264          band = ugb%loc2gcol(il_b); bmask(band) = .True.
2265        end do
2266        ! FIXME: This is wrong if spc
2267        call c_f_pointer(c_loc(ugb%buffer_cplx), cg_k, shape=[2, npwsp * ugb%sizeb_local(2)])
2268        call wfk%read_bmask(bmask, ik_ibz, spin, &
2269                            !xmpio_single, &
2270                            xmpio_collective, &
2271                            kg_k=kg_k, cg_k=cg_k)
2272 
2273        ABI_CHECK(all(kg_k(:,1:npw_k) == desc_k%gvec), "kg_k != desc_k%gvec")
2274 
2275        if (print_time) then
2276          write(msg,'(4x,3(a,i0),a)')"Read ugb_k: my_iki [", my_iki, "/", gwr%my_nkibz, "] (tot: ", gwr%nkibz, ")"
2277          call cwtime_report(msg, cpu_green, wall_green, gflops_green); if (my_iki == LOG_MODK) call wrtout(std_out, " ...")
2278        end if
2279        end associate
2280      end do ! my_iki
2281    end do ! my_is
2282 
2283    call wfk%close()
2284 
2285  else
2286    ! Master reads and broadcasts. Much faster on lumi
2287    call wrtout(std_out, " Using IO version based on master reads and brodcasts ...")
2288    io_comm => gwr%comm; io_in_kcomm = .False.
2289    io_comm => gwr%kpt_comm; io_in_kcomm = .True.
2290 
2291    if (io_comm%me == master) then
2292      call wfk_open_read(wfk, wfk_path, formeig0, iomode_from_fname(wfk_path), get_unit(), xmpi_comm_self)
2293    end if
2294 
2295    ! TODO This to be able to maximize the size of cg_work
2296    !call gwr%pstat%mpi_max(vmrss_mb, gwr%comm%value)
2297 
2298    do spin=1,gwr%nsppol
2299      if (io_in_kcomm .and. .not. any(gwr%my_spins == spin)) cycle
2300 
2301      do ik_ibz=1,gwr%nkibz
2302        print_time = gwr%comm%me == 0 .and. (ik_ibz < LOG_MODK .or. mod(ik_ibz, LOG_MODK) == 0)
2303        if (print_time) call cwtime(cpu_green, wall_green, gflops_green, "start")
2304        kk_ibz = gwr%kibz(:, ik_ibz)
2305        npw_k = gwr%wfk_hdr%npwarr(ik_ibz); istwf_k = gwr%wfk_hdr%istwfk(ik_ibz); npwsp = npw_k * gwr%nspinor
2306        ABI_CHECK_IEQ(istwf_k, 1, "istwfk_k should be 1")
2307 
2308        ! Create communicator with master and all procs requiring this (k,s) block (color == 1)
2309        need_block_ks = any(gwr%my_spins == spin) .and. any(gwr%my_kibz_inds == ik_ibz)
2310        color = merge(1, 0, (need_block_ks .or. io_comm%me == master))
2311        call xmpi_comm_split(io_comm%value, color, io_comm%me, bcast_comm, ierr)
2312 
2313        ! TODO: Optimize this part
2314        ! Find band_step that gives good compromise between memory and efficiency.
2315        band_step = memb_limited_step(1, nbsum, 2*npwsp, xmpi_bsize_dp, 1024.0_dp)
2316        band_step = 200
2317        !band_step = 100
2318        do bstart=1, nbsum, band_step
2319          bstop = min(bstart + band_step - 1, nbsum); nb = bstop - bstart + 1
2320 
2321          ABI_MALLOC(cg_work, (2, npwsp, nb)) ! This array is always dp
2322          if (io_comm%me == master) then
2323            call c_f_pointer(c_loc(cg_work), cg_k, shape=[2, npwsp * nb])
2324            call wfk%read_band_block([bstart, bstop], ik_ibz, spin, xmpio_single, kg_k=kg_k, cg_k=cg_k)
2325          end if
2326 
2327          if (color == 1) then
2328            call xmpi_bcast(kg_k, master, bcast_comm, ierr)
2329            call xmpi_bcast(cg_work, master, bcast_comm, ierr)
2330          endif
2331 
2332          ! Copy my portion of cg_work to buffer_cplx (here we have dp --> sp conversion).
2333          if (need_block_ks) then
2334            associate (ugb => gwr%ugb(ik_ibz, spin), desc_k => gwr%green_desc_kibz(ik_ibz))
2335            ABI_CHECK(all(kg_k(:,1:npw_k) == desc_k%gvec), "kg_k != desc_k%gvec")
2336            do band=bstart, bstop
2337              ib = band - bstart + 1
2338              call ugb%glob2loc(1, band, iloc, il_b, have_band); if (.not. have_band) cycle
2339              ugb%buffer_cplx(:, il_b) = cmplx(cg_work(1,:,ib), cg_work(2,:,ib), kind=gwpc)
2340            end do
2341            end associate
2342          end if
2343 
2344          ABI_FREE(cg_work)
2345        end do ! bstart
2346 
2347        call xmpi_comm_free(bcast_comm)
2348 
2349        if (print_time) then
2350          write(msg,'(4x,2(a,i0),a)')"Read ugb_k: ik_ibz [", ik_ibz, "/", gwr%nkibz, "]"
2351          call cwtime_report(msg, cpu_green, wall_green, gflops_green); if (ik_ibz == LOG_MODK) call wrtout(std_out, " ...")
2352        end if
2353      end do ! ik_ibz
2354    end do ! spin
2355    if (io_comm%me == master) call wfk%close()
2356  end if ! io_algo
2357 
2358  ABI_FREE(kg_k)
2359  ABI_FREE(bmask)
2360  call gwr%print_mem(unit=std_out)
2361 
2362  call cwtime_report(" gwr_read_ugb_from_wfk:", cpu, wall, gflops)
2363  call timab(1921, 2, tsec)
2364 
2365 end subroutine gwr_read_ugb_from_wfk

m_gwr/gwr_redistrib_gt_kibz [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_redistrib_gt_kibz

FUNCTION

  If action == "communicate":
      Redistribute G_k for fixed (itau, spin) according to `need_kibz` table.
      Also, set got_kibz to 1 for each IBZ k-point that has been received.
  If action == "free":
      Use input `got_kibz` array to deallocate matrices.

INPUTS

OUTPUT

SOURCE

4564 subroutine gwr_redistrib_gt_kibz(gwr, itau, spin, need_kibz, got_kibz, action)
4565 
4566 !Arguments ------------------------------------
4567  class(gwr_t),target,intent(inout) :: gwr
4568  integer,intent(in) :: itau, spin, need_kibz(gwr%nkibz)
4569  integer,intent(inout) :: got_kibz(gwr%nkibz)
4570  character(len=*),intent(in) :: action
4571  !integer,optional,intent(in) :: ipm_list(:)
4572 
4573 !Local variables-------------------------------
4574  integer :: ik_ibz, ipm, ierr, do_mpi_kibz(gwr%nkibz), sender_kibz(gwr%nkibz)
4575  integer :: bcast_comm, sender_in_bcast_comm, color
4576  logical :: im_sender
4577  !integer :: num_pm, ipm_list__(2)
4578  real(dp) :: kk_ibz(3), cpu, wall, gflops
4579  complex(gwpc),contiguous, pointer :: ck_ptr(:,:)
4580 
4581 ! *************************************************************************
4582 
4583  call cwtime(cpu, wall, gflops, "start")
4584  !num_pm = 2; ipm_list__ = [1, 2]
4585  !if (present(ipm_list)) then
4586  !  num_pm = size(ipm_list)
4587  !  ABI_CHECK_IRANGE(num_pm, 1, 2, "num_pm not in [1, 2]")
4588  !  ipm_list__(1:num_pm) = ipm_list(:)
4589  !end if
4590 
4591  select case (action)
4592  case ("communicate")
4593    do_mpi_kibz = need_kibz
4594    do ik_ibz=1,gwr%nkibz
4595      if (allocated(gwr%green_desc_kibz(ik_ibz)%gvec)) do_mpi_kibz(ik_ibz) = 0
4596    end do
4597    call xmpi_sum(do_mpi_kibz, gwr%kpt_comm%value, ierr)
4598    !do_mpi_kibz = 1
4599 
4600    ! All procs enter the loop. Sender_kibz stores the rank of the sender in gwr%kpt_comm
4601    got_kibz = 0; sender_kibz(:) = huge(1)
4602    do ik_ibz=1,gwr%nkibz
4603      if (do_mpi_kibz(ik_ibz) == 0) cycle
4604      kk_ibz = gwr%kibz(:, ik_ibz)
4605      if (allocated(gwr%green_desc_kibz(ik_ibz)%gvec)) sender_kibz(ik_ibz) = gwr%kpt_comm%me
4606      if (need_kibz(ik_ibz) /= 0 .and. .not. allocated(gwr%green_desc_kibz(ik_ibz)%gvec)) then
4607        ! NB: Use same args as those used to init the descriptors in gwr_init
4608        ! so that gvec ordering is consistent across MPI procs.
4609        got_kibz(ik_ibz) = 1
4610        call gwr%green_desc_kibz(ik_ibz)%init(kk_ibz, istwfk1, gwr%dtset%ecut, gwr)
4611      end if
4612    end do
4613 
4614    ! Define the sender for each kibz in do_mpi_kibz
4615    call xmpi_min_ip(sender_kibz, gwr%kpt_comm%value, ierr)
4616 
4617    ! Allocate memory
4618    call gwr%malloc_free_mats(got_kibz, "green", "malloc")
4619 
4620    ! MPI communication
4621    do ik_ibz=1,gwr%nkibz
4622      if (do_mpi_kibz(ik_ibz) == 0) cycle
4623 
4624      ! Create subcommunicators with color and bcast only inside subcomm.
4625      im_sender = gwr%kpt_comm%me == sender_kibz(ik_ibz)
4626      color = merge(1, 0, im_sender .or. need_kibz(ik_ibz) /= 0)
4627      call xmpi_comm_split(gwr%kpt_comm%value, color, gwr%kpt_comm%me, bcast_comm, ierr)
4628 
4629      if (color == 1) then
4630        sender_in_bcast_comm = xmpi_comm_translate_rank(gwr%kpt_comm%value, sender_kibz(ik_ibz), bcast_comm)
4631        do ipm=1,2
4632          ck_ptr => gwr%gt_kibz(ipm, ik_ibz, itau, spin)%buffer_cplx
4633          call xmpi_bcast(ck_ptr, sender_in_bcast_comm, bcast_comm, ierr)
4634        end do
4635      end if
4636      call xmpi_comm_free(bcast_comm)
4637    end do
4638 
4639  case ("free")
4640    ! Use got_kibz to free previously allocated memory.
4641    do ik_ibz=1,gwr%nkibz
4642      if (got_kibz(ik_ibz) == 1) call gwr%green_desc_kibz(ik_ibz)%free()
4643    end do
4644    call gwr%malloc_free_mats(got_kibz, "green", "free")
4645 
4646  case default
4647    ABI_ERROR(sjoin("Invalid action:", action))
4648  end select
4649 
4650  if (action == "communicate") call cwtime_report(" gwr_redistrib_gt_kibz:", cpu, wall, gflops)
4651 
4652 end subroutine gwr_redistrib_gt_kibz

m_gwr/gwr_redistrib_mats_qibz [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_redistrib_mats_qibz

FUNCTION

  If action == "communicate":
      Redistribute chi_q (wc_q) for fixed (itau, spin) according to `need_qibz` table.
      Also, set `got_qibz` to 1 for each IBZ q-point that has been received.
  If action == "free":
      Use `got_qibz` to deallocate matrices received in a previous call with "communicate".

INPUTS

OUTPUT

SOURCE

4672 subroutine gwr_redistrib_mats_qibz(gwr, what, itau, spin, need_qibz, got_qibz, action)
4673 
4674 !Arguments ------------------------------------
4675  class(gwr_t),target,intent(inout) :: gwr
4676  character(len=*),intent(in) :: what
4677  integer,intent(in) :: itau, spin, need_qibz(gwr%nqibz)
4678  integer,intent(inout) :: got_qibz(gwr%nqibz)
4679  character(len=*),intent(in) :: action
4680 
4681 !Local variables-------------------------------
4682  integer :: iq_ibz, ierr, bcast_comm, color, do_mpi_qibz(gwr%nqibz), sender_qibz(gwr%nqibz), sender_in_bcast_comm
4683  logical :: im_sender
4684  logical, parameter :: timeit = .False.
4685  real(dp) :: qq_ibz(3), cpu, wall, gflops
4686  complex(gwpc),contiguous, pointer :: cq_ptr(:,:)
4687 
4688 ! *************************************************************************
4689 
4690  ABI_CHECK(what == "tchi" .or. what == "wc", sjoin("Invalid what:", what))
4691  if (timeit) call cwtime(cpu, wall, gflops, "start")
4692 
4693  select case (action)
4694  case ("communicate")
4695    do_mpi_qibz = need_qibz
4696    do iq_ibz=1,gwr%nqibz
4697      select case (what)
4698      case ("tchi")
4699        if (allocated(gwr%tchi_qibz(iq_ibz, itau, spin)%buffer_cplx)) do_mpi_qibz(iq_ibz) = 0
4700      case ("wc")
4701        if (allocated(gwr%wc_qibz(iq_ibz, itau, spin)%buffer_cplx)) do_mpi_qibz(iq_ibz) = 0
4702      case default
4703        ABI_ERROR(sjoin("Invalid what:", what))
4704      end select
4705    end do
4706 
4707    call xmpi_sum(do_mpi_qibz, gwr%kpt_comm%value, ierr)
4708    !do_mpi_qibz = 1
4709 
4710    ! All procs enter the loop. Sender_qibz stores the rank of the sender in gwr%kpt_comm
4711    got_qibz = 0; sender_qibz(:) = huge(1)
4712    do iq_ibz=1,gwr%nqibz
4713      if (do_mpi_qibz(iq_ibz) == 0) cycle
4714      qq_ibz = gwr%qibz(:, iq_ibz)
4715      if (allocated(gwr%tchi_desc_qibz(iq_ibz)%gvec)) sender_qibz(iq_ibz) = gwr%kpt_comm%me
4716      if (need_qibz(iq_ibz) /= 0 .and. .not. allocated(gwr%tchi_desc_qibz(iq_ibz)%gvec)) then
4717        ! NB: Use same args as those used to init the descriptors in gwr_init
4718        ! so that gvec ordering is consistent across MPI procs.
4719        got_qibz(iq_ibz) = 1
4720        call gwr%tchi_desc_qibz(iq_ibz)%init(qq_ibz, istwfk1, gwr%dtset%ecuteps, gwr, kin_sorted=.True.)
4721      end if
4722    end do
4723 
4724    ! Define the sender for each qibz in do_mpi_qibz
4725    call xmpi_min_ip(sender_qibz, gwr%kpt_comm%value, ierr)
4726 
4727    ! Allocate memory
4728    call gwr%malloc_free_mats(got_qibz, what, "malloc")
4729 
4730    ! MPI communication
4731    do iq_ibz=1,gwr%nqibz
4732      if (do_mpi_qibz(iq_ibz) == 0) cycle
4733 
4734      ! Create subcommunicators with color and bcast only inside subcomm.
4735      im_sender = gwr%kpt_comm%me == sender_qibz(iq_ibz)
4736      color = merge(1, 0, im_sender .or. need_qibz(iq_ibz) /= 0)
4737      call xmpi_comm_split(gwr%kpt_comm%value, color, gwr%kpt_comm%me, bcast_comm, ierr)
4738 
4739      if (color == 1) then
4740        if (what == "tchi") cq_ptr => gwr%tchi_qibz(iq_ibz, itau, spin)%buffer_cplx
4741        if (what == "wc")   cq_ptr => gwr%wc_qibz(iq_ibz, itau, spin)%buffer_cplx
4742        sender_in_bcast_comm = xmpi_comm_translate_rank(gwr%kpt_comm%value, sender_qibz(iq_ibz), bcast_comm)
4743        call xmpi_bcast(cq_ptr, sender_in_bcast_comm, bcast_comm, ierr)
4744      end if
4745      call xmpi_comm_free(bcast_comm)
4746    end do ! iq_ibz
4747 
4748  case ("free")
4749    ! Use got_qibz table to free previously allocated memory
4750    do iq_ibz=1,gwr%nqibz
4751      if (got_qibz(iq_ibz) /= 0) call gwr%tchi_desc_qibz(iq_ibz)%free()
4752    end do
4753    call gwr%malloc_free_mats(got_qibz, what, "free")
4754 
4755  case default
4756    ABI_ERROR(sjoin("Invalid action:", action))
4757  end select
4758 
4759  if (timeit) call cwtime_report(" gwr_redistrib_mats_qibz:", cpu, wall, gflops)
4760 
4761 end subroutine gwr_redistrib_mats_qibz

m_gwr/gwr_rotate_gpm [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_rotate_gpm

FUNCTION

  Reconstruct the Green's functions in the BZ from the IBZ.

INPUTS

   ik_bz = Index of the k-point in the BZ
   itau = tau index (global index)
   spin = spin index (global index)
   [ipm_list]=Optional list of ipm indices to be considered, e.g. ipm_list=[2] to compute the -tau component.

OUTPUT

  desc_kbz = Descriptor in the BZ
  gt_pm(2) = Gk(+/-tau)

NOTES

  * Remember the symmetry properties of \tilde\espilon^{-1}

    If q_bz = S q_ibz + G0:

      $\epsilon^{-1}_{SG1-G0, SG2-G0}(q_bz) = e^{+iS(G2-G1).\tau} \epsilon^{-1}_{G1, G2)}(q)

    If time-reversal symmetry can be used then:

      $\epsilon^{-1}_{G1,G2}(-q_bz) = e^{+i(G1-G2).\tau} \epsilon^{-1}_{-S^{-1}(G1+Go), -S^{-1}(G2+G0)}^*(q)

  In the present implementation we are not considering a possible umklapp vector G0 in the
  expression Sq = q+G0. Treating this case would require some changes in the G-sphere
  since we have to consider G - G0. The code however stops in sigma if a nonzero G0 is required
  to reconstruct the BZ.

SOURCE

2749 subroutine gwr_rotate_gpm(gwr, ik_bz, itau, spin, desc_kbz, gt_pm, ipm_list)
2750 
2751 !Arguments ------------------------------------
2752  class(gwr_t),intent(in) :: gwr
2753  integer,intent(in) :: ik_bz, spin, itau
2754  type(desc_t),intent(out) :: desc_kbz
2755  type(__slkmat_t),intent(out) :: gt_pm(2)
2756  integer,optional,intent(in) :: ipm_list(:)
2757 
2758 !Local variables-------------------------------
2759 !scalars
2760  integer :: ig1, ig2, il_g1, il_g2, ipm, ik_ibz, isym_k, trev_k, g0_k(3), tsign_k, ii, num_pm, ipm_list__(2)
2761  logical :: isirr_k
2762 !arrays
2763  integer :: g1(3), g2(3)
2764  real(dp) :: tnon(3) !, cpu, wall, gflops
2765  complex(dp) :: ph2, ph1
2766 
2767 ! *************************************************************************
2768 
2769  !call cwtime(cpu, wall, gflops, "start")
2770  num_pm = 2; ipm_list__ = [1, 2]
2771  if (present(ipm_list)) then
2772    num_pm = size(ipm_list)
2773    ABI_CHECK_IRANGE(num_pm, 1, 2, "num_pm not in [1, 2]")
2774    ipm_list__(1:num_pm) = ipm_list(:)
2775  end if
2776 
2777  ik_ibz = gwr%kbz2ibz(1, ik_bz); isym_k = gwr%kbz2ibz(2, ik_bz)
2778  trev_k = gwr%kbz2ibz(6, ik_bz); g0_k = gwr%kbz2ibz(3:5, ik_bz)
2779  isirr_k = (isym_k == 1 .and. trev_k == 0 .and. all(g0_k == 0))
2780  tsign_k = merge(1, -1, trev_k == 0)
2781  !ABI_CHECK(all(g0_k == 0), sjoin("For kbz:", ktoa(gwr%kbz(:, ik_bz)), "g0_k:", ltoa(g0_k), " != 0"))
2782 
2783  ! Copy descriptor from IBZ
2784  associate (desc_kibz => gwr%green_desc_kibz(ik_ibz))
2785  call desc_kibz%copy(desc_kbz)
2786 
2787  if (isirr_k) then
2788    ! Copy the PBLAS matrices with the two Green's functions and we are done.
2789    do ii=1,num_pm
2790      ipm = ipm_list__(ii)
2791      call gwr%gt_kibz(ipm, ik_ibz, itau, spin)%copy(gt_pm(ipm))
2792    end do
2793    goto 10
2794  end if
2795 
2796  ! From:
2797  !
2798  !      u_{Sk}(Sg) = e^{-i(Sk+g).tnon} u_k(g)
2799  !
2800  ! and
2801  !
2802  !      u_{k+g0}(g-g0) = u_k(g)
2803  !
2804  ! one obtains:
2805  !
2806  !      G_{Sk+g0}(Sg-g0,Sg'-g0) = e^{-i tnon.S(g-g')} G_k{g,g'}
2807  !
2808  ! For time-reversal, we have u_{-k}(g) = u_{k}{-g}^*
2809  !
2810  !      G_{-k}(-g,-g') = [G_k(g,g')]*
2811 
2812  !ABI_WARNING_IF(trev_k == 0, "green: trev_k /= 0 should be tested")
2813 
2814  ! Rotate gvec, recompute gbound and rotate vc_sqrt
2815  ! TODO: 1) Handle TR and routine to rotate tchi/W including vc_sqrt
2816  !       2) Make sure that the FFT box is large enough to accommodate umklapps
2817 
2818  desc_kbz%ig0 = -1
2819  do ig1=1,desc_kbz%npw
2820    desc_kbz%gvec(:,ig1) = tsign_k * matmul(gwr%cryst%symrec(:,:,isym_k), desc_kibz%gvec(:,ig1)) - g0_k
2821    if (all(desc_kbz%gvec(:,ig1) == 0)) desc_kbz%ig0 = ig1
2822  end do
2823  desc_kbz%kin_sorted = .False.
2824  ABI_CHECK(desc_kbz%ig0 /= -1, "Cannot find g=0 after rotation!")
2825 
2826  call sphereboundary(desc_kbz%gbound, desc_kbz%istwfk, desc_kbz%gvec, gwr%g_mgfft, desc_kbz%npw)
2827 
2828  ! Get G_k with k in the BZ.
2829  tnon = gwr%cryst%tnons(:, isym_k)
2830  do ii=1,num_pm
2831    ipm = ipm_list__(ii)
2832    associate (gk_i => gwr%gt_kibz(ipm, ik_ibz, itau, spin), gk_f => gt_pm(ipm))
2833    call gk_i%copy(gk_f)
2834    !!$OMP PARALLEL DO PRIVATE(ig1, g2, ph2, ig1, g2, ph1)
2835    do il_g2=1, gk_f%sizeb_local(2)
2836      ig2 = mod(gk_f%loc2gcol(il_g2) - 1, desc_kbz%npw) + 1
2837      g2 = desc_kbz%gvec(:,ig2)
2838      !g2 = desc_kibz%gvec(:,ig2)
2839      ph2 = exp(+j_dpc * two_pi * dot_product(g2, tnon))
2840      do il_g1=1, gk_f%sizeb_local(1)
2841        ig1 = mod(gk_f%loc2grow(il_g1) - 1, desc_kbz%npw) + 1
2842        g1 = desc_kbz%gvec(:,ig1)
2843        !g1 = desc_kibz%gvec(:,ig1)
2844        ph1 = exp(-j_dpc * two_pi * dot_product(g1, tnon))
2845        gk_f%buffer_cplx(il_g1, il_g2) = gk_i%buffer_cplx(il_g1, il_g2) * ph1 * ph2
2846        if (trev_k == 1) gk_f%buffer_cplx(il_g1, il_g2) = conjg(gk_f%buffer_cplx(il_g1, il_g2))
2847      end do
2848    end do
2849    end associate
2850  end do
2851  end associate
2852 
2853 10 continue
2854  !call cwtime_report(" gwr_rotate_gpm:", cpu, wall, gflops)
2855 
2856 end subroutine gwr_rotate_gpm

m_gwr/gwr_rotate_wc [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_rotate_wc

FUNCTION

  Reconstruct Wc(q,g,g') in the BZ from the IBZ.

INPUTS

OUTPUT

SOURCE

3219 subroutine gwr_rotate_wc(gwr, iq_bz, itau, spin, desc_qbz, wc_qbz)
3220 
3221 !Arguments ------------------------------------
3222  class(gwr_t),intent(inout) :: gwr
3223  integer,intent(in) :: iq_bz, itau, spin
3224  type(desc_t),intent(out) :: desc_qbz
3225  type(__slkmat_t),intent(inout) :: wc_qbz
3226 
3227 !Local variables-------------------------------
3228 !scalars
3229  integer :: ig1, ig2, il_g1, il_g2, iq_ibz, isym_q, trev_q, tsign_q
3230  logical :: isirr_q, q_is_gamma
3231 !arrays
3232  integer :: g1(3), g2(3), g0_q(3)
3233  real(dp) :: tnon(3), qq_bz(3)
3234  complex(dp) :: ph2, ph1
3235 
3236 ! *************************************************************************
3237 
3238  ABI_CHECK(gwr%wc_space == "itau", sjoin("wc_space:", gwr%wc_space, " != itau"))
3239 
3240  qq_bz = gwr%qbz(:, iq_bz)
3241  q_is_gamma = normv(qq_bz, gwr%cryst%gmet, "G") < GW_TOLQ0
3242 
3243  iq_ibz = gwr%qbz2ibz(1, iq_bz); isym_q = gwr%qbz2ibz(2, iq_bz)
3244  trev_q = gwr%qbz2ibz(6, iq_bz); g0_q = gwr%qbz2ibz(3:5, iq_bz)
3245  isirr_q = (isym_q == 1 .and. trev_q == 0 .and. all(g0_q == 0))
3246  tsign_q = merge(1, -1, trev_q == 0)
3247  ! TODO: Understand why legacy GW does not need umklapp
3248  !ABI_CHECK(all(g0_q == 0), sjoin("For qbz:", ktoa(gwr%qbz(:, iq_bz)), "g0_q:", ltoa(g0_q), " != 0"))
3249 
3250  ! Copy descriptor from IBZ
3251  associate (desc_qibz => gwr%tchi_desc_qibz(iq_ibz))
3252  call desc_qibz%copy(desc_qbz)
3253 
3254  if (isirr_q) then
3255    ! Copy the PBLAS matrix in wc_qbz and we are done.
3256    call gwr%wc_qibz(iq_ibz, itau, spin)%copy(wc_qbz); return
3257  end if
3258 
3259  !ABI_WARNING_IF(trev_q == 0, "trev_q should be tested")
3260  ! rotate gvec, recompute gbound and rotate vc_sqrt.
3261  ! TODO: 1) Handle TR and routine to rotate tchi/W including vc_sqrt
3262  !       2) Make sure that FFT box is large enough to accomodate umklapps
3263  desc_qbz%ig0 = -1
3264  do ig1=1,desc_qbz%npw
3265    desc_qbz%gvec(:,ig1) = tsign_q * matmul(gwr%cryst%symrec(:,:,isym_q), desc_qibz%gvec(:,ig1)) - g0_q
3266    if (all(desc_qbz%gvec(:,ig1) == 0)) desc_qbz%ig0 = ig1
3267  end do
3268  desc_qbz%kin_sorted = .False.
3269  ABI_CHECK(desc_qbz%ig0 /= -1, "Cannot find g = 0 after g-vector rotation!")
3270 
3271  call sphereboundary(desc_qbz%gbound, desc_qbz%istwfk, desc_qbz%gvec, gwr%g_mgfft, desc_qbz%npw)
3272 
3273  ! Compute sqrt(vc(q,G))
3274  ! TODO: rotate vc_sqrt
3275  ! vc(Sq, Sg) = vc(q, g)
3276  ! vc(-q, -g) = vc(q, g)
3277  call desc_qbz%get_vc_sqrt(qq_bz, q_is_gamma, gwr, gwr%gtau_comm%value)
3278 
3279  ! Get Wc_q with q in the BZ.
3280  tnon = gwr%cryst%tnons(:, isym_q)
3281  associate (wq_i => gwr%wc_qibz(iq_ibz, itau, spin), wq_f => wc_qbz)
3282  call wq_i%copy(wc_qbz)
3283 
3284  !!!$OMP PARALLEL DO PRIVATE(ig2, g2, phs2, ig1, g2, ph1)
3285  do il_g2=1, wq_f%sizeb_local(2)
3286    ig2 = mod(wq_f%loc2gcol(il_g2) - 1, desc_qbz%npw) + 1
3287    g2 = desc_qbz%gvec(:,ig2)
3288    !g2 = desc_qibz%gvec(:,ig2)
3289    !ph2 = exp(-j_dpc * two_pi * dot_product(g2, tnon))
3290    ph2 = exp(+j_dpc * two_pi * dot_product(g2, tnon))
3291    do il_g1=1, wq_f%sizeb_local(1)
3292      ig1 = mod(wq_f%loc2grow(il_g1) - 1, desc_qbz%npw) + 1
3293      g1 = desc_qbz%gvec(:,ig1)
3294      !g1 = desc_qibz%gvec(:,ig1)
3295      !ph1 = exp(+j_dpc * two_pi * dot_product(g1, tnon))
3296      ph1 = exp(-j_dpc * two_pi * dot_product(g1, tnon))
3297      wq_f%buffer_cplx(il_g1, il_g2) = wq_i%buffer_cplx(il_g1, il_g2) * ph1 * ph2
3298      if (trev_q == 1) wq_f%buffer_cplx(il_g1, il_g2) = conjg(wq_f%buffer_cplx(il_g1, il_g2))
3299    end do
3300  end do
3301  end associate
3302  end associate
3303 
3304 end subroutine gwr_rotate_wc

m_gwr/gwr_rpa_energy [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_rpa_energy

FUNCTION

  Compute the correlated part of the total energy within ACFDT.

INPUTS

OUTPUT

SOURCE

6100 subroutine gwr_rpa_energy(gwr)
6101 
6102 !Arguments ------------------------------------
6103  class(gwr_t),target,intent(inout) :: gwr
6104 
6105 !Local variables-------------------------------
6106 !scalars
6107  integer,parameter :: master = 0
6108  integer :: my_is, my_iqi, my_it, itau, spin, iq_ibz, ii, ierr, ig, ncut, icut, mat_size
6109  integer :: il_g1, il_g2, ig1, ig2, npw_q, ig0, ncid, ncerr
6110  logical :: q_is_gamma, print_time
6111  real(dp) :: weight, qq_ibz(3), estep, aa, bb, rmsq, ecut_soft, damp, tsec(2)
6112  real(dp) :: cpu_all, wall_all, gflops_all, cpu_q, wall_q, gflops_q, cpu_cut, wall_cut, gflops_cut
6113  complex(dpc) :: vcs_g1, vcs_g2
6114  type(desc_t),pointer :: desc_q
6115  character(len=500) :: msg
6116 !arrays
6117  type(__slkmat_t) :: chi_tmp, dummy_vec, chi_4diag
6118  type(processor_scalapack) :: proc_4diag
6119  real(gwp),allocatable :: eig(:)
6120  real(dp),allocatable :: kin_qg(:), ec_rpa(:), ec_mp2(:), ecut_chi(:)
6121 
6122 ! *************************************************************************
6123 
6124  call gwr%build_chi0_head_and_wings()
6125  call gwr%build_green(free_ugb=.True.)
6126  call gwr%build_tchi()
6127 
6128  ! Compute RPA energy for ncut cutoff energies in order to extrapolate for ecuteps --> oo
6129  ! See also calc_rpa_functional in m_screening_driver
6130  ncut = max(1, gwr%dtset%gwr_rpa_ncut)  ! Usually 5
6131  estep = -gwr%dtset%ecuteps * 0.05_dp
6132 
6133  call cwtime(cpu_all, wall_all, gflops_all, "start")
6134  call timab(1928, 1, tsec)
6135  call wrtout(std_out, sjoin(" Begin computation of RPA energy with gwr_rpa_ncut:", itoa(ncut), " ..."))
6136  ABI_CHECK(gwr%tchi_space == "iomega", sjoin("tchi_space:", gwr%tchi_space, "!= iomega"))
6137 
6138  ABI_CALLOC(ec_rpa, (ncut))
6139  ABI_CALLOC(ec_mp2, (ncut))
6140  ABI_MALLOC(ecut_chi, (ncut))
6141  ecut_chi = arth(gwr%dtset%ecuteps + tol12, estep, ncut)
6142 
6143  ! Polarizability has been summed over spins inside build_tchi.
6144  ! The loop over spins is needed to parallelize the loop over my_iqi if nsppol == 2.
6145  do my_is=1,gwr%my_nspins
6146    spin = gwr%my_spins(my_is)
6147    if (gwr%spin_comm%nproc == 1 .and. spin == 2) cycle
6148 
6149    do my_iqi=1,gwr%my_nqibz
6150      if (gwr%spin_comm%skip(my_iqi)) cycle
6151      print_time = gwr%comm%me == 0 .and. (my_iqi < LOG_MODK .or. mod(my_iqi, LOG_MODK) == 0)
6152      if (print_time) call cwtime(cpu_q, wall_q, gflops_q, "start")
6153 
6154      iq_ibz = gwr%my_qibz_inds(my_iqi); qq_ibz = gwr%qibz(:, iq_ibz)
6155      q_is_gamma = normv(qq_ibz, gwr%cryst%gmet, "G") < GW_TOLQ0
6156      !if (q_is_gamma) then
6157      !  call wrtout([std_out, ab_out], "RPA: Ignoring q==0"); cycle
6158      !end if
6159 
6160      ! iq_ibz might be replicated inside gwr%kpt_comm.
6161      if (.not. gwr%itreat_iqibz(iq_ibz)) cycle
6162 
6163      desc_q => gwr%tchi_desc_qibz(iq_ibz)
6164      ABI_CHECK(desc_q%kin_sorted, "g-vectors are not sorted by |q+g|^2/2 !")
6165      npw_q = desc_q%npw; ig0 = desc_q%ig0
6166 
6167      ABI_MALLOC(kin_qg, (npw_q))
6168      do ig=1,npw_q
6169        kin_qg(ig) = half * normv(qq_ibz + desc_q%gvec(:,ig), gwr%cryst%gmet, "G") ** 2
6170      end do
6171 
6172      do my_it=1,gwr%my_ntau
6173        itau = gwr%my_itaus(my_it)
6174        associate (tchi => gwr%tchi_qibz(iq_ibz, itau, spin))
6175        if (my_it == 1) then
6176          ! Allocate workspace. NB: npw_q is the total number of PWs for this q.
6177          call tchi%copy(chi_tmp)
6178          !ABI_CHECK_IEQ(npw_q, tchi%sizeb_global(1), "npw_q")
6179          ABI_MALLOC(eig, (npw_q))
6180        end if
6181 
6182        do icut=1,ncut
6183          call cwtime(cpu_cut, wall_cut, gflops_cut, "start")
6184 
6185          ! Damp Coulomb kernel in order to have smooth E(V).
6186          ! See also https://www.vasp.at/wiki/index.php/ENCUTGWSOFT
6187          ! and Harl's PhD thesis available at: https://utheses.univie.ac.at/detail/2259
6188          ecut_soft = 0.8_dp * ecut_chi(icut)
6189 
6190          ! TODO: Contribution due to the head for q --> 0 is ignored.
6191          ! This is not optimal but consistent with calc_rpa_functional
6192          do il_g2=1,tchi%sizeb_local(2)
6193            !ig2 = mod(tchi%loc2gcol(il_g2) - 1, desc_q%npw) + 1
6194            ig2 = tchi%loc2gcol(il_g2)
6195            damp = one
6196            !if (kin_qg(ig2) > ecut_soft) then
6197            !  damp = sqrt(half * (one + cos(pi * (kin_qg(ig2) - ecut_soft) / (ecut_chi(icut) - ecut_soft))))
6198            !end if
6199            vcs_g2 = desc_q%vc_sqrt(ig2) * damp
6200            if (q_is_gamma .and. ig2 == ig0) vcs_g2 = zero
6201 
6202            do il_g1=1,tchi%sizeb_local(1)
6203              !ig1 = mod(tchi%loc2grow(il_g1) - 1, desc_q%npw) + 1
6204              ig1 = tchi%loc2grow(il_g1)
6205              damp = one
6206              !if (kin_qg(ig1) > ecut_soft) then
6207              !  damp = sqrt(half * (one + cos(pi * (kin_qg(ig1) - ecut_soft) / (ecut_chi(icut) - ecut_soft))))
6208              !end if
6209              vcs_g1 = desc_q%vc_sqrt(ig1) * damp
6210              if (q_is_gamma .and. ig1 == ig0) vcs_g1 = zero
6211 
6212              chi_tmp%buffer_cplx(il_g1, il_g2) = tchi%buffer_cplx(il_g1, il_g2) * vcs_g1 * vcs_g2
6213            end do
6214          end do
6215 
6216          ! Diagonalize sub-matrix and perform integration in imaginary frequency.
6217          ! Eq (6) in 10.1103/PhysRevB.81.115126
6218          ! NB: have to build chi_tmp inside loop over icut as matrix is destroyed by pzheev.
6219          mat_size = bisect(kin_qg, ecut_chi(icut))
6220 
6221          ! Change size block and, if possible, use 2D rectangular grid of processors for diagonalization
6222          call proc_4diag%init(chi_tmp%processor%comm)
6223          call chi_tmp%change_size_blocs(chi_4diag, processor=proc_4diag)
6224          call chi_4diag%heev("N", "U", dummy_vec, eig, mat_size=mat_size)
6225          call chi_4diag%free()
6226          call proc_4diag%free()
6227 
6228          ! TODO: ELPA
6229          !call compute_eigen_problem(processor, matrix, results, eigen, comm, istwf_k, nev)
6230 
6231          if (xmpi_comm_rank(chi_tmp%processor%comm) == 0) then
6232            weight = gwr%wtq(iq_ibz) * gwr%iw_wgs(itau) / two_pi
6233            do ii=1,mat_size
6234              ec_rpa(icut) = ec_rpa(icut) + weight * (log(one - eig(ii)) + eig(ii))
6235              ! second order Moeller Plesset.
6236              ec_mp2(icut) = ec_mp2(icut) - weight * eig(ii) ** 2 / two
6237              !if (eig(ii) > zero) then
6238              !  write(msg, "(a, es16.8)")"Positive eigenvalue:", eig(ii)
6239              !  ABI_ERROR(msg)
6240              !end if
6241            end do
6242          end if
6243 
6244          write(msg,'(4x,2(a,i0),a)')"icut [", icut, "/", ncut, "]"
6245          call cwtime_report(msg, cpu_cut, wall_cut, gflops_cut)
6246        end do ! icut
6247 
6248        if (my_it == gwr%my_ntau) then
6249          ! Free workspace
6250          call chi_tmp%free()
6251          ABI_FREE(eig)
6252        end if
6253        end associate
6254      end do ! my_it
6255 
6256      ABI_FREE(kin_qg)
6257      if (print_time) then
6258        write(msg,'(4x,2(a,i0),a)')"My iqi [", my_iqi, "/", gwr%my_nqibz, "]"
6259        call cwtime_report(msg, cpu_q, wall_q, gflops_q)
6260      end if
6261    end do ! my_iqi
6262  end do ! my_is
6263 
6264  ! Collect results on the master node.
6265  call xmpi_sum_master(ec_rpa, master, gwr%comm%value, ierr)
6266  call xmpi_sum_master(ec_mp2, master, gwr%comm%value, ierr)
6267 
6268  if (gwr%comm%me == master) then
6269    ! Print results to ab_out.
6270    ! TODO: Add metadata: nband, nqbz...
6271    write(ab_out, "(4a16)")"ecut_chi", "ecut_chi^(-3/2)", "RPA Ec (eV)", "RPA Ec (Ha)"
6272    do icut=ncut,1,-1
6273      write(ab_out, "(*(es16.8))") ecut_chi(icut), ecut_chi(icut) ** (-three/two), ec_rpa(icut) * Ha_eV, ec_rpa(icut)
6274    end do
6275    if (ncut > 1) then
6276      ! Add last line with extrapolated value.
6277      rmsq = linfit(ncut, ecut_chi(:) ** (-three/two), ec_rpa, aa, bb)
6278      write(ab_out, "(2a16,*(es16.8))") "oo", "0", bb * Ha_eV, bb
6279    end if
6280 
6281    ! ======================
6282    ! Add results to GWR.nc
6283    ! ======================
6284    NCF_CHECK(nctk_open_modify(ncid, gwr%gwrnc_path, xmpi_comm_self))
6285    ncerr = nctk_def_dims(ncid, [nctkdim_t("ncut", ncut)], defmode=.True.)
6286    NCF_CHECK(ncerr)
6287 
6288    ncerr = nctk_def_arrays(ncid, [ &
6289      nctkarr_t("ecut_chi", "dp", "ncut"), &
6290      nctkarr_t("ec_rpa_ecut", "dp", "ncut"), &
6291      nctkarr_t("ec_mp2_ecut", "dp", "ncut") &
6292    ])
6293    NCF_CHECK(ncerr)
6294 
6295    ! Write data.
6296    NCF_CHECK(nctk_set_datamode(ncid))
6297    NCF_CHECK(nf90_put_var(ncid, vid("ecut_chi"), ecut_chi))
6298    NCF_CHECK(nf90_put_var(ncid, vid("ec_rpa_ecut"), ec_rpa))
6299    NCF_CHECK(nf90_put_var(ncid, vid("ec_mp2_ecut"), ec_mp2))
6300  end if ! master
6301 
6302  ABI_FREE(ec_rpa)
6303  ABI_FREE(ec_mp2)
6304  ABI_FREE(ecut_chi)
6305 
6306  call cwtime_report(" gwr_rpa_energy:", cpu_all, wall_all, gflops_all)
6307  call timab(1928, 2, tsec)
6308 
6309 contains
6310 integer function vid(vname)
6311   character(len=*),intent(in) :: vname
6312   vid = nctk_idname(ncid, vname)
6313 end function vid
6314 
6315 end subroutine gwr_rpa_energy

m_gwr/gwr_rpr_to_ggp [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_rpr_to_ggp

FUNCTION

  Helper function to FFT a two-point function: F_{r',r} --> F_{g,g'}
  Note that rp_r is destroyed in output

INPUTS

OUTPUT

SOURCE

3155 subroutine gwr_rpr_to_ggp(gwr, desc, rp_r, g_gp)
3156 
3157 !Arguments ------------------------------------
3158  class(gwr_t),intent(in) :: gwr
3159  type(desc_t),intent(in) :: desc
3160  class(__slkmat_t),intent(inout) :: rp_r, g_gp
3161 
3162 !Local variables-------------------------------
3163  integer :: ig2, npwsp, nrsp, col_bsize, ir2, ndat, isign
3164  type(__slkmat_t) :: r_gp, gp_r
3165  character(len=500) :: msg
3166  type(uplan_t) :: uplan_k
3167 
3168 ! *************************************************************************
3169 
3170  ! Allocate intermediate gp_r PBLAS matrix to store F(g',r)
3171  npwsp = desc%npw * gwr%nspinor; nrsp = gwr%g_nfft * gwr%nspinor
3172  ABI_CHECK(block_dist_1d(nrsp, gwr%g_comm%nproc, col_bsize, msg), msg)
3173 
3174  call gp_r%init(npwsp, nrsp, gwr%g_slkproc, desc%istwfk, size_blocs=[-1, col_bsize])
3175 
3176  call uplan_k%init(desc%npw, gwr%nspinor, gwr%uc_batch_size, gwr%g_ngfft, desc%istwfk, &
3177                    desc%gvec, gwpc, gwr%dtset%gpu_option)
3178 
3179  isign = +1 ! This should be ok
3180  !isign = -1
3181 
3182  ! F(r',r) --> F(g',r) and store results in gp_r.
3183  do ir2=1, rp_r%sizeb_local(2), gwr%uc_batch_size
3184    ndat = blocked_loop(ir2, rp_r%sizeb_local(2), gwr%uc_batch_size)
3185    call uplan_k%execute_rg(ndat, rp_r%buffer_cplx(:,ir2), gp_r%buffer_cplx(:,ir2), isign=isign, iscale=0) ! this should be OK
3186  end do
3187 
3188  ! F(g',r) --> F(r,g')
3189  !call gp_r%ptrans("N", r_gp, free=.True.)
3190  ! FIXME: I don't know why by C is needed here.
3191  call gp_r%ptrans("C", r_gp, free=.True.)
3192 
3193  ! F(r,g') --> F(g,g') and store results in g_gp.
3194  do ig2=1, g_gp%sizeb_local(2), gwr%uc_batch_size
3195    ndat = blocked_loop(ig2, g_gp%sizeb_local(2), gwr%uc_batch_size)
3196    call uplan_k%execute_rg(ndat, r_gp%buffer_cplx(:,ig2), g_gp%buffer_cplx(:,ig2), isign=-isign, iscale=0) ! this should be OK
3197  end do
3198 
3199  call uplan_k%free(); call r_gp%free()
3200 
3201 end subroutine gwr_rpr_to_ggp

m_gwr/gwr_run_chi0 [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_run_chi0

FUNCTION

  Driver to compute CHI0 along the imaginary axis.

INPUTS

  [free_ugb]: True if array with empty KS states should freed as soon as possibile. Default: True

OUTPUT

SOURCE

6373 subroutine gwr_run_chi0(gwr, free_ugb)
6374 
6375 !Arguments ------------------------------------
6376  class(gwr_t),intent(inout) :: gwr
6377  logical,optional,intent(in) :: free_ugb
6378 
6379 !Local variables-------------------------------
6380  logical :: free_ugb__
6381 
6382 ! *************************************************************************
6383 
6384  ! Use ugb wavefunctions and the Lehmann representation to compute head/wings and Sigma_x matrix elements.
6385  call gwr%build_chi0_head_and_wings()
6386 
6387  ! Now compute G(itau) from ugb and start the GWR algorithm.
6388  free_ugb__ = .True.; if (present(free_ugb)) free_ugb__ = free_ugb
6389  call gwr%build_green(free_ugb=free_ugb__)
6390  call gwr%build_tchi()
6391 
6392 end subroutine gwr_run_chi0

m_gwr/gwr_run_energy_scf [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_run_energy_scf

FUNCTION

  Compute QP energies within energy-only self-consistent GW approximation
  and minimax meshes along the imaginary axis.

INPUTS

OUTPUT

SOURCE

6411 subroutine gwr_run_energy_scf(gwr)
6412 
6413 !Arguments ------------------------------------
6414  class(gwr_t),intent(inout) :: gwr
6415 
6416 !Local variables-------------------------------
6417  integer,parameter :: master = 0
6418  integer :: units(2)
6419  logical :: converged
6420  character(len=500) :: msg
6421 
6422 ! *************************************************************************
6423 
6424  ! TODO:
6425  ! To implement restart capabilities we need to read scf_iteration, qp_ebands and gwr_task from GWR.nc
6426  ! build_sigmac should be responsible for writing checkpoint data with qp_ebands at each iteration.
6427  units = [std_out, ab_out]
6428 
6429  ABI_CHECK_IEQ(gwr%nkcalc, gwr%nkibz, "For energy-only GW, one should include all k-points in the IBZ")
6430 
6431  select case (gwr%dtset%gwr_task)
6432  case ("EGEW")
6433    converged = .False.
6434    call wrtout(units, " Begin energy-only self-consistency in both G and W (EGEW)")
6435    do while (.not. converged .and. gwr%scf_iteration <= gwr%dtset%gwr_nstep)
6436      call gwr%run_g0w0(free_ugb=.False.)
6437      gwr%scf_iteration = gwr%scf_iteration + 1
6438      call gwr%check_scf_cycle(converged)
6439    end do
6440 
6441  case ("EGW0")
6442    call wrtout(units, " Begin energy-only self-consistency in G (EGW0)")
6443    call gwr%run_g0w0(free_ugb=.False.)
6444    converged = .False.
6445    do while (.not. converged .and. gwr%scf_iteration <= gwr%dtset%gwr_nstep)
6446      gwr%scf_iteration = gwr%scf_iteration + 1
6447      call gwr%build_green(free_ugb=.False.)
6448      call gwr%build_sigxme()  ! NB: This should not change in semiconductors
6449      call gwr%build_sigmac()
6450      call gwr%check_scf_cycle(converged)
6451    end do
6452 
6453  case ("G0EW")
6454    ! This is more difficult to implement as we need to store G0 and eG
6455    ! and then use G only for chi and not in Sigma
6456    call wrtout(units, " Begin energy-only self-consistency in W (G0EW)")
6457    ABI_ERROR("G0WE is not yet implemented")
6458    call gwr%run_g0w0(free_ugb=.False.)
6459    converged = .False.
6460    do while (.not. converged .and. gwr%scf_iteration <= gwr%dtset%gwr_nstep)
6461      gwr%scf_iteration = gwr%scf_iteration + 1
6462      !call gwr%build_green(free_ugb=.False.)
6463      call gwr%build_chi0_head_and_wings()
6464      call gwr%build_tchi()
6465      call gwr%build_wc()
6466      call gwr%build_sigmac()
6467      call gwr%check_scf_cycle(converged)
6468    end do
6469 
6470  case default
6471    ABI_ERROR(sjoin("Invalid gwr_task:", gwr%dtset%gwr_task))
6472  end select
6473 
6474  if (gwr%comm%me == master) then
6475    if (converged) then
6476      write(msg, "(1x,4a,i0,a,f8.3,a)") &
6477        trim(gwr%dtset%gwr_task), " self-consistent loop:", ch10, &
6478        " Convergence achieved at iteration: ", gwr%scf_iteration, &
6479        " with gwr_tolqpe: ",gwr%dtset%gwr_tolqpe * Ha_meV, " (meV)"
6480      call wrtout(units, msg)
6481    else
6482      write(msg, "(1x,4a,f8.3,3a,i0,a)") &
6483        trim(gwr%dtset%gwr_task), " self-consistent loop:", ch10, &
6484        " WARNING: Could not converge with gwr_tolqpe: ",gwr%dtset%gwr_tolqpe * Ha_meV, " (meV)", ch10, &
6485        " after: ", gwr%dtset%gwr_nstep, " steps"
6486      call wrtout(units, msg)
6487    end if
6488  end if
6489 
6490 end subroutine gwr_run_energy_scf

m_gwr/gwr_run_g0w0 [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_run_g0w0

FUNCTION

  Driver to compute QP energies within the G0W0 approximation and minimax meshes along the imaginary axis.

INPUTS

  [free_ugb]: True if array with empty KS states should freed as soon as possibile. Default: True

OUTPUT

SOURCE

6334 subroutine gwr_run_g0w0(gwr, free_ugb)
6335 
6336 !Arguments ------------------------------------
6337  class(gwr_t),intent(inout) :: gwr
6338  logical,optional,intent(in) :: free_ugb
6339 
6340 !Local variables-------------------------------
6341  logical :: free_ugb__
6342 
6343 ! *************************************************************************
6344 
6345  ! Use ugb wavefunctions and the Lehmann representation to compute head/wings and Sigma_x matrix elements.
6346  call gwr%build_chi0_head_and_wings()
6347  call gwr%build_sigxme()
6348 
6349  ! Now compute G(itau) from ugb and start the GWR algorithm.
6350  free_ugb__ = .True.; if (present(free_ugb)) free_ugb__ = free_ugb
6351  call gwr%build_green(free_ugb=free_ugb__)
6352  call gwr%build_tchi()
6353  call gwr%build_wc()
6354  call gwr%build_sigmac()
6355 
6356 end subroutine gwr_run_g0w0

m_gwr/gwr_t [ Types ]

[ Top ] [ m_gwr ] [ Types ]

NAME

 gwr_t

FUNCTION

  This object provides the high-level API used to perform the different steps of the GWR algorithm.

SOURCE

301  type, public :: gwr_t
302 
303   integer :: nsppol = 1, nspinor = -1, nsig_ab = -1, nspden = -1
304   ! Number of independent spin polarizations, number of spinor components and spin densities.
305 
306   integer :: natom = -1
307    ! Number of atoms
308 
309   integer :: usepaw = -1
310    ! 0 if NC pseudos. 1 if PAW is used (not yet supported).
311 
312   integer :: my_nspins = -1
313   ! Number of independent spin polarizations treated by this MPI proc
314 
315   integer :: nkbz = -1, nkibz = -1
316   ! Number of k-points in the BZ/IBZ
317 
318   integer :: my_nkibz = -1, my_nkbz = -1
319   ! Number of k-points in the IBZ/BZ stored by this MPI proc.
320 
321   integer :: uc_batch_size = -1
322   ! Max number of unit cell FFT-transforms done in batch mode.
323 
324   integer :: sc_batch_size = -1
325   ! Max number of supercell-cell FFT-transforms done in batch mode.
326 
327   integer,allocatable :: my_kbz_inds(:)
328   ! (my_nkbz)
329   ! List of k-BZ indices treated by this proc.
330 
331   integer,allocatable :: my_kibz_inds(:)
332   ! (my_nkibz)
333   ! List of k-IBZ indices treated by this proc.
334 
335   integer :: nqbz = -1, nqibz = -1
336   ! Number of q-points in the BZ/IBZ
337 
338   integer :: my_nqibz = -1, my_nqbz = -1
339   ! Number of q-points in the IBZ/BZ stored by this MPI proc.
340 
341   integer,allocatable :: my_qibz_inds(:)
342   ! (my_nqibz)
343   ! List of q-IBZ indices treated by this proc.
344 
345   integer,allocatable :: my_qbz_inds(:)
346   ! (my_nqbz)
347   ! List of q-IBZ indices treated by this proc.
348 
349   integer :: ntau = -1
350   ! Total number of imaginary time points.
351 
352   integer :: my_ntau = -1
353   ! Number of imaginary time/frequency points treated by this MPI rank.
354 
355   integer :: nkcalc
356    ! Number of Sigma_nk k-points computed
357    ! TODO: Should be spin dependent + max_nkcalc
358 
359   integer :: max_nbcalc
360    ! Maximum number of bands computed (max over nkcalc and spin).
361 
362   integer :: nwr = -1
363    ! Number of frequency points along the real axis for Sigma(w) and spectral function A(w)
364    ! Odd number so that the mesh is centered on the KS energy.
365 
366   !real(dp) :: i_sz = huge(one)
367    ! Value of the integration of the Coulomb singularity 4\pi/V_BZ \int_BZ d^3q 1/q^2
368 
369   real(dp) :: wr_step = -one
370    ! Step of the linear mesh along the real axis (Ha units).
371 
372   real(dp) :: q0(3) = GW_Q0_DEFAULT
373    ! The small q for the treatment of q --> 0
374 
375   real(dp),allocatable :: kcalc(:,:)
376    ! kcalc(3, nkcalc)
377    ! List of k-points where the self-energy is computed.
378 
379   logical :: idle_proc = .False.
380   ! True if there are idle procs i.e. if processes in the input_comm have been excluded.
381 
382   !logical :: use_shmem_for_k = .False.
383   !logical :: use_mpi_for_k = .False.
384 
385   integer,allocatable :: bstart_ks(:,:)
386    ! bstart_ks(nkcalc, nsppol)
387    ! Initial KS band index included in self-energy matrix elements for each k-point in kcalc.
388    ! Depends on spin because all degenerate states should be included when symmetries are used.
389 
390   integer,allocatable :: bstop_ks(:,:)
391    ! bstop_ks(nkcalc, nsppol)
392 
393   integer,allocatable :: nbcalc_ks(:,:)
394    ! nbcalc_ks(nkcalc, nsppol)
395    ! Number of bands included in self-energy matrix elements for each k-point in kcalc.
396    ! Depends on spin because all degenerate states should be included when symmetries are used.
397 
398    integer,allocatable :: kcalc2ibz(:,:)
399    !kcalc2ibz(nkcalc, 6))
400    ! Mapping ikcalc --> IBZ as reported by listkk.
401 
402    logical :: use_supercell_for_tchi = .True.
403    ! True if we are using the supercell formalism for tchi
404    ! False if we are using the mixed-space approach with convolutions in k-space.
405 
406    logical :: use_supercell_for_sigma = .True.
407    ! True if we are using the supercell formalism for sigma
408    ! False if we are using the mixed-space approach with convolutions in k-space.
409 
410    integer :: ngkpt(3) = -1, ngqpt(3) = -1
411    ! Number of divisions in k/q meshes.
412 
413    integer,allocatable :: my_spins(:)
414    ! (my_nspins)
415    ! Indirect table giving the spin indices treated by this MPI rank.
416    ! Used only in the collinear case with nsppol = 2.
417 
418    integer,allocatable :: my_itaus(:)
419    ! (my_ntau)
420    ! Indirect table giving the tau indices treated by this MPI rank.
421 
422    integer,allocatable :: tau_master(:)
423    ! (ntau)
424    ! The rank of the MPI proc in tau_comm treating itau.
425 
426    integer, allocatable :: np_qibz(:)
427    ! (nqibz)
428    ! Number of processors in kpt_comm treating iq_ibz
429 
430    integer, allocatable :: np_kibz(:)
431    ! (nkibz)
432    ! Number of processors in kpt_comm treating ik_ibz
433 
434    logical, allocatable :: itreat_ikibz(:)
435    ! (nkibz)
436    ! True if this MPI rank treat ik_ibz
437 
438    logical, allocatable :: itreat_iqibz(:)
439    ! (nqibz)
440    ! True if this MPI rank treats iq_ibz
441 
442    real(dp),allocatable :: tau_mesh(:), tau_wgs(:)
443    ! (ntau)
444    ! Imaginary tau mesh and integration weights.
445 
446    real(dp),allocatable :: iw_mesh(:), iw_wgs(:)
447    ! (ntau)
448    ! Imaginary frequency mesh and integration weights
449 
450    real(dp),allocatable :: cosft_wt(:,:)
451    ! (ntau, ntau)
452    ! weights for cosine transform. (i tau --> i omega)
453 
454    real(dp),allocatable :: cosft_tw(:,:)
455    ! (ntau, ntau)
456    ! weights for sine transform (i iomega --> i tau)
457 
458    real(dp),allocatable :: sinft_wt(:,:)
459    ! (ntau, ntau)
460    ! weights for sine transform (i tau --> i omega)
461 
462    real(dp) :: te_min = -one, te_max = one
463    ! min and Max transition energy in Ha.
464 
465    real(dp) :: ft_max_error(3) = -one
466    ! Max error due to inhomogenous FT.
467 
468    real(dp) :: cosft_duality_error = -one
469    ! Max_{ij} |CT CT^{-1} - I|
470 
471    integer :: green_mpw = -1
472    ! Max number of g-vectors for Green's function over k-points.
473 
474    integer :: tchi_mpw = -1
475    ! Max number of g-vectors for tchi over q-points.
476 
477    !integer :: sigma_mpw = -1
478    ! Max number of g-vectors for Sigma over q-points.
479 
480    integer :: g_ngfft(18) = -1, g_mgfft = -1, g_nfft = -1
481    ! FFT mesh for the Green's function.
482 
483    !integer :: chi_ngfft(18) = -1, chi_mgfft = -1, chi_nfft = -1
484    !integer :: sig_ngfft(18) = -1, sig_mgfft = -1, sig_nfft = -1
485 
486    integer :: mg0(3) = [2, 2, 2]
487    ! Max shifts to account for umklapps.
488 
489    type(desc_t),allocatable :: green_desc_kibz(:)
490    ! (nkibz)
491    ! Descriptor for Green's functions
492 
493    type(desc_t),allocatable :: tchi_desc_qibz(:)
494    ! (nqibz)
495    ! Descriptor for tchi. NB: The g-vectors are sorted by |q+g|^2/2
496 
497    integer,allocatable :: chinpw_qibz(:)
498    ! Number of PWs in tchi for each q-point in the IBZ (available on all procs)
499 
500    !type(desc_t),allocatable :: sigma_desc_kibz(:)
501    ! (nkibz)
502    ! Descriptor for self-energy
503 
504    integer :: coords_stgk(4) = 0
505    ! Cartesian coordinates of this processor in the Cartesian grid.
506 
507    type(xcomm_t) :: comm
508    ! Communicator with all MPI procs involved in the computation
509    ! NB: gwr%comm%value is not necessarly the same as the input_comm
510    ! we may decide to remove some procs from input_comm before createring the Cartesian grid.
511 
512    type(xcomm_t) :: spin_comm
513    ! MPI communicator over spins.
514 
515    type(xcomm_t) :: kpt_comm
516    ! MPI communicator for k/q-point distribution.
517 
518    type(xcomm_t) :: g_comm
519    ! MPI communicator for g/r distribution
520 
521    type(xcomm_t) :: tau_comm
522    ! MPI communicator for imag time distribution
523 
524    type(xcomm_t) :: gtau_comm
525    ! MPI communicator for g/tau 2D subgrid.
526 
527    type(xcomm_t) :: kg_comm
528    ! MPI communicator for g/g 2D subgrid.
529 
530    type(xcomm_t) :: kts_comm
531    ! MPI communicator for tau/kpoint/spin 3D grid
532 
533    type(xcomm_t) :: kgt_comm
534    ! MPI communicator for g/tau/kpoint 3D grid
535 
536    type(dataset_type), pointer :: dtset => null()
537    ! Input variables.
538 
539    type(datafiles_type), pointer :: dtfil => null()
540    ! Names of input/output files and prefixes.
541 
542    type(crystal_t), pointer :: cryst => null()
543    ! Crystal structure.
544 
545    integer :: scf_iteration = 1
546    ! Internal counter used to implement self-consistency
547    ! For the time being, only self-consistency in energies is supported.
548 
549    integer,allocatable :: ks_vbik(:,:)
550    ! (gwr%ks_ebands%nkpt, gwr%ks_ebands%nsppol)
551    ! KS valence band indices.
552 
553    type(ebands_t), pointer :: ks_ebands => null()
554    ! initial KS energies
555 
556    type(gaps_t) :: ks_gaps
557    ! Info on the KS gaps.
558 
559    type(ebands_t) :: qp_ebands
560    ! QP energies
561 
562    type(ebands_t) :: qp_ebands_prev
563    ! QP energies of the previous iteration. Used if self-consistency.
564 
565    type(pseudopotential_type), pointer :: psps => null()
566    ! NC Pseudos data
567 
568    type(pawtab_type), pointer :: pawtab(:) => null()
569    ! PAW data
570 
571    type(mpi_type),pointer :: mpi_enreg => null()
572    ! Sequential mpi_type needed to invoke ABINIT routines requiring it.
573 
574    type(processor_scalapack) :: g_slkproc
575    ! 1D PBLAS grid to block-distribute matrices along columns inside gcomm.
576 
577    type(__slkmat_t),allocatable :: gt_kibz(:,:,:,:)
578    ! (2, nkibz, ntau, nsppol)
579    ! Occupied/Empty Green's function G_k(g,g')
580 
581    type(__slkmat_t),allocatable :: tchi_qibz(:,:,:)
582    ! (nqibz, ntau, nsppol)
583    ! Irreducible polarizability tchi_q(g,g')
584 
585    character(len=10) :: tchi_space = "none"
586    ! "none", "itau", "iomega"
587 
588    type(__slkmat_t),allocatable :: wc_qibz(:,:,:)
589    ! (nqibz, ntau, nsppol)
590    ! Correlated screened Coulomb interaction summed over collinear spins
591    ! Replicated across spin_comm if nsppol == 2.
592 
593    character(len=10) :: wc_space = "none"
594    ! "none", "itau", "iomega"
595 
596    !type(__slkmat_t),allocatable :: em1_qibz(:,:,:)
597    ! Inverse dielectric matrix at omega = 0
598    ! (nqibz, nsppol)
599    ! Replicated across the tau comm and the spin comm if nsppol == 2.
600 
601    type(__slkmat_t),allocatable :: sigc_kibz(:,:,:,:)
602    ! (2, nkibz, ntau, nsppol)
603 
604    character(len=10) :: sigc_space = "none"
605    ! "none", "itau", "iomega"
606 
607    type(__slkmat_t),allocatable :: ugb(:,:) !, nato_ugb(:,:)
608    ! (nkibz, nsppol)
609    ! Fourier components of the KS wavefunctions stored in a PBLAS matrix
610    ! Bands are distributed in the g_comm communicator in a round-robin fashion.
611    ! hence they are REPLICATED over tau_comm as this leads to better scalability in terms of flops.
612    ! Distributing bands inside the 2D gtau_comm, indeed, allows one to reduce memory further
613    ! but then the pzgemm used to build G explodes. Also. tau parallelism is high-level in GWR so it's not a good idea
614    ! to mix it with low-level just to make memory for ugb scale better.
615    ! The size of ugb is negligible when compared to G and Chi.
616 
617    type(processor_scalapack) :: gtau_slkproc
618    ! Scalapack grid with (g,tau) processors
619 
620    integer :: ugb_nband = -1
621    ! Number of bands in ugb.
622 
623    type(vcgen_t) :: vcgen
624    ! Object used to compute Coulomb term vc(q,g)
625 
626    character(len=fnlen) :: gwrnc_path = ABI_NOFILE
627    ! Path to the GWR.nc file with output results.
628 
629    real(dp),allocatable :: kbz(:,:)
630    ! (3, nkbz)
631    ! Reduced coordinates of the k-points in the full BZ.
632 
633    real(dp), contiguous, pointer :: kibz(:,:) => null()
634     ! (3, nkibz)
635     ! Reduced coordinates of the k-points in the IBZ
636 
637    integer,allocatable :: kbz2ibz(:,:)
638     ! (6, nkbz)
639     ! Mapping kBZ to IBZ (symrec conventions)
640 
641    integer,allocatable :: kbz2ibz_symrel(:,:)
642     ! (6, nkbz)
643     ! Mapping kBZ to IBZ (symrel conventions) TODO: To be removed
644 
645    real(dp), contiguous, pointer :: wtk(:) => null()
646     ! (nkibz)
647     ! Weights of the k-points in the IBZ (normalized to one).
648 
649    real(dp),allocatable :: qbz(:,:)
650     ! (3, nqbz)
651     ! Reduced coordinates of the q-points in the full BZ.
652 
653    integer,allocatable :: qbz2ibz(:,:)
654    ! (6, nqbz)
655    ! Mapping qBZ to IBZ (symrec conventions)
656 
657    real(dp),allocatable :: qibz(:,:)
658    ! (3, nqibz)
659    ! Reduced coordinates of the q-points in the IBZ (full symmetry of the system).
660 
661    real(dp),allocatable :: wtq(:)
662    ! (nqibz)
663    ! Weights of the q-points in the IBZ (normalized to one).
664 
665    complex(dp),allocatable :: chi0_head_myw(:,:,:)
666    ! (3,3,my_ntau)
667    ! Head of the irred. polarizability in i.omega space.
668    ! Note that spins have been summed over.
669 
670    complex(dp),allocatable :: chi0_uwing_myw(:,:,:), chi0_lwing_myw(:,:,:)
671    ! (3, npw_chi_gamma, my_ntau)
672    ! Upper wings of the irred. polarizability in i omega space.
673    ! Note that spins have been summed over.
674 
675    type(wfdgw_t) :: kcalc_wfd
676    ! wavefunction descriptor with the KS states where QP corrections are wanted.
677 
678    type(hdr_type) :: wfk_hdr
679    ! header of the WFK file
680 
681    type(melements_t) :: ks_me !, qp_me
682    ! Matrix elements of the different potentials in the KS basis set.
683 
684    type(degtab_t),allocatable :: degtab(:,:)
685    ! (nkcalc, nsppol)
686    ! Table used to average QP results in the degenerate subspace if symsigma == 1
687 
688    integer :: b1gw = -1, b2gw = -1
689    ! b1gw = minval(gwr%bstart_ks); b2gw = maxval(gwr%bstop_ks)
690 
691    logical :: sig_diago
692    ! True if Sigma_ matrices are diagonal in the band indices
693 
694    complex(dp),allocatable :: sigx_mat(:,:,:,:)
695    ! (b1gw:b2gw, ?, nkcalc, nsppol*nsig_ab)
696    ! Matrix elements of <i|\Sigma_x|j>. The second dimension depends on sig_diago
697 
698    !complex(dp),allocatable :: sigc_it_mat(:,:,:,:,:)
699    ! (2, ntau, max_nbcalc, nkcalc, nsppol*nsig_ab))
700    ! Matrix elements of <i|\Sigma_c(itau)|j>. The second dimension depends on sig_diago
701 
702    complex(dp),allocatable :: sigc_iw_mat(:,:,:,:,:)
703    ! Matrix elements of <i|\Sigma_c(i omega)|j>
704    ! (ntau, b1gw:b2gw, ?, nkcalc, nsppol*nsig_ab). The second dimension depends on sig_diago
705 
706    type(pstat_t) :: pstat
707    ! Interface to the /proc/{pid}/status file.
708 
709  contains
710 
711    procedure :: init => gwr_init
712    ! Initialize the object.
713 
714    procedure :: rotate_gpm => gwr_rotate_gpm
715    ! Reconstruct the Green's functions in the BZ from the IBZ.
716 
717    procedure :: gk_to_scbox => gwr_gk_to_scbox
718 
719    procedure :: wcq_to_scbox => gwr_wcq_to_scbox
720 
721    procedure :: get_myk_green_gpr => gwr_get_myk_green_gpr
722     ! G_k(g,g') --> G_k(g',r) for each k in the BZ treated by this MPI proc for given spin and tau.
723 
724    procedure :: get_gkbz_rpr_pm => gwr_get_gkbz_rpr_pm
725    ! Compute G_k(r',r) with (r, r') in the unit cell and k in the full BZ.
726 
727    procedure :: rotate_wc => gwr_rotate_wc
728    ! Reconstruct Wc(q) in the BZ from the IBZ.
729 
730    procedure :: get_myq_wc_gpr => gwr_get_myq_wc_gpr
731    ! W_q(g,g') --> W_q(g',r) for each q in the BZ treated by this MPI procs for given spin and tau.
732 
733    procedure :: get_wc_rpr_qbz => gwr_get_wc_rpr_qbz
734    ! Compute Wc_q(r',r') with q in the BZ
735 
736    procedure :: cos_transform  => gwr_cos_transform
737    ! Inhomogeneous cosine transform.
738 
739    procedure :: malloc_free_mats => gwr_malloc_free_mats
740    ! Allocate/Deallocate matrices for G/tchi/Sigma
741 
742    procedure :: free => gwr_free
743    ! Free memory.
744 
745    procedure :: print => gwr_print
746    ! Print info on the object.
747 
748    procedure :: print_mem => gwr_print_mem
749    ! Print memory required by PBLAS matrices.
750 
751    procedure :: print_trace => gwr_print_trace
752    ! Print trace of matrices for testing purposes.
753 
754    procedure :: load_kcalc_wfd => gwr_load_kcalc_wfd
755    ! Load the KS states for Sigma_nk from the WFK file
756 
757    procedure :: read_ugb_from_wfk => gwr_read_ugb_from_wfk
758    ! Read wavefunctions from WFK file.
759 
760    procedure :: build_green => gwr_build_green
761    ! Build Green's functions in imaginary time from the gwr%ugb matrices stored in memory.
762 
763    procedure :: build_tchi => gwr_build_tchi
764    ! Build the irreducible polarizability
765 
766    procedure :: redistrib_gt_kibz => gwr_redistrib_gt_kibz
767    ! Redistribute/deallocate G_k
768 
769    procedure :: redistrib_mats_qibz => gwr_redistrib_mats_qibz
770    ! Redistribute/deallocate tchi_q or Wc_q
771 
772    procedure :: build_wc => gwr_build_wc
773    ! Build the correlated part of the screened interaction.
774 
775    procedure :: build_sigmac => gwr_build_sigmac
776    ! Build the correlated part of the self-energy GWc
777    ! and compute matrix elements in the KS representation.
778 
779    procedure :: rpa_energy => gwr_rpa_energy
780    ! Compute RPA energy.
781 
782    procedure :: gamma_gw => gwr_gamma_gw
783 
784    procedure :: build_chi0_head_and_wings => gwr_build_chi0_head_and_wings
785    ! Compute head and wings of chi0
786 
787    procedure :: build_sigxme => gwr_build_sigxme
788    ! Compute matrix elements of the exchange part.
789 
790    procedure :: get_u_ngfft => gwr_get_u_ngfft
791    ! Compute FFT mesh from boxcutmin
792 
793    procedure :: run_g0w0 => gwr_run_g0w0
794    ! Compute QP corrections with one-shot G0W0.
795 
796    procedure :: run_chi0 => gwr_run_chi0
797    ! Compute CHI0 only.
798 
799    procedure :: run_energy_scf => gwr_run_energy_scf
800    ! Compute QP corrections with energy-only self-consistent GW
801 
802    procedure :: check_scf_cycle => gwr_check_scf_cycle
803    ! Check SCF cycle for convergence.
804 
805    procedure :: ncwrite_tchi_wc => gwr_ncwrite_tchi_wc
806    ! Write tchi or wc to netcdf file
807 
808  end type gwr_t

m_gwr/gwr_wcq_to_scbox [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  gwr_wcq_to_scbox

FUNCTION

INPUTS

OUTPUT

SOURCE

2629 subroutine gwr_wcq_to_scbox(gwr, sc_ngfft, desc_myqbz, wc_scgvec, my_ir, ndat, &
2630                             wc_gpr, wct_scbox, wct_scbox_win)
2631 
2632 !Arguments ------------------------------------
2633  class(gwr_t),target,intent(in) :: gwr
2634  integer,intent(in) :: sc_ngfft(18)
2635  integer,intent(out) :: wc_scgvec(3, gwr%tchi_mpw)
2636  type(desc_t),intent(inout) :: desc_myqbz(gwr%my_nqbz)
2637  type(__slkmat_t),intent(in) :: wc_gpr(gwr%my_nqbz)
2638  integer,intent(in) :: my_ir, ndat
2639  complex(gwpc),intent(out) :: wct_scbox(product(sc_ngfft(4:6))*gwr%nspinor, gwr%sc_batch_size)
2640  !complex(gwpc),intent(out) :: wct_scbox(:,:)
2641  integer,optional,intent(inout) :: wct_scbox_win
2642 
2643 !Local variables-------------------------------
2644  integer :: my_iqf, iq_bz, idat, iepoch, ii, idat_list(gwr%kpt_comm%nproc) ! gg(3), ig,
2645  !real(dp) :: tsec(2) !, cpu, wall, gflops
2646 
2647 ! *************************************************************************
2648 
2649  !call timab(1930, 1, tsec)
2650 
2651  ! Take the union of (q,g') for q in the BZ. Note gwr%ngqpt instead of gwr%ngkpt.
2652 
2653  if (.not. present(wct_scbox_win)) then
2654    wct_scbox = czero_gw
2655    do my_iqf=1,gwr%my_nqbz
2656      iq_bz = gwr%my_qbz_inds(my_iqf)
2657 
2658 #if 1
2659      call desc_myqbz(my_iqf)%to_scbox(gwr%qbz(:,iq_bz), gwr%ngqpt, sc_ngfft, gwr%nspinor*ndat, &
2660                                       wc_gpr(my_iqf)%buffer_cplx(1,my_ir), wct_scbox)
2661 #else
2662      gg = nint(gwr%qbz(:, iq_bz) * gwr%ngqpt)
2663      associate (desc_q => desc_myqbz(my_iqf))
2664      do ig=1,desc_q%npw
2665        wc_scgvec(:,ig) = gg + gwr%ngqpt * desc_q%gvec(:,ig) ! q + g'
2666      end do
2667      call gsph2box(sc_ngfft, desc_q%npw, gwr%nspinor * ndat, wc_scgvec, &
2668                    wc_gpr(my_iqf)%buffer_cplx(1,my_ir), wct_scbox)
2669      end associate
2670 #endif
2671    end do ! my_iqf
2672 
2673  else
2674    ! Each MPI proc operates on a different idat vector at each epoch
2675    idat_list = cshift([(ii, ii=1,gwr%kpt_comm%nproc)], shift=-gwr%kpt_comm%me)
2676 
2677    do iepoch=1,gwr%kpt_comm%nproc
2678      call xmpi_win_fence(wct_scbox_win)
2679      idat = idat_list(iepoch)
2680      if (idat > ndat) goto 10
2681      if (iepoch == 1) wct_scbox(:,idat) = czero_gw
2682 
2683      do my_iqf=1,gwr%my_nkbz
2684        iq_bz = gwr%my_qbz_inds(my_iqf)
2685 #if 1
2686        call desc_myqbz(my_iqf)%to_scbox(gwr%qbz(:,iq_bz), gwr%ngqpt, sc_ngfft, gwr%nspinor * ndat1, &
2687                                         wc_gpr(my_iqf)%buffer_cplx(1,my_ir+idat-1), wct_scbox(:,idat))
2688 #else
2689        gg = nint(gwr%qbz(:, iq_bz) * gwr%ngqpt)
2690        associate (desc_q => desc_myqbz(my_iqf))
2691        do ig=1,desc_q%npw
2692          wc_scgvec(:,ig) = gg + gwr%ngqpt * desc_q%gvec(:,ig) ! q + g'
2693        end do
2694        call gsph2box(sc_ngfft, desc_q%npw, gwr%nspinor * ndat1, wc_scgvec, &
2695                      wc_gpr(my_iqf)%buffer_cplx(1,my_ir+idat-1), wct_scbox(:,idat))
2696        end associate
2697 #endif
2698      end do ! my_iqf
2699      10 continue
2700      !call xmpi_barrier(gwr%kpt_comm%value)
2701      !IF (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) CALL MPI_F_SYNC_REG(wct_scbox)
2702      call xmpi_win_fence(wct_scbox_win)
2703    end do ! iepoch
2704  end if
2705 
2706  !call timab(1930, 2, tsec)
2707 
2708 end subroutine gwr_wcq_to_scbox

m_gwr/sc_sum [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  sc_sum

FUNCTION

INPUTS

OUTPUT

SOURCE

7962 subroutine sc_sum(sc_shape, uc_ngfft, nspinor, ph1d, k_is_gamma, alpha, sc_data, uc_psi, cout)
7963 
7964 !Arguments ------------------------------------
7965  integer,intent(in) :: sc_shape(3), uc_ngfft(18), nspinor
7966  complex(gwpc),intent(in) :: ph1d(maxval(sc_shape), 3)
7967  logical,intent(in) :: k_is_gamma
7968  complex(gwpc),target,intent(in) :: alpha, uc_psi(uc_ngfft(1)*uc_ngfft(2)*uc_ngfft(3)*nspinor)
7969  complex(gwpc),target,intent(in) :: &
7970     sc_data(uc_ngfft(1)*sc_shape(1)*uc_ngfft(2)*sc_shape(2)*uc_ngfft(3)*sc_shape(3)*nspinor)
7971  complex(gwpc),intent(out) :: cout
7972 
7973 !Local variables-------------------------------
7974  integer :: il1, il2, il3, spinor, uc_n1, uc_n2, uc_n3, ix, iy, iz !, idat
7975  complex(gwpc) :: cphase, phl32, phl3
7976  complex(gwpc),contiguous,pointer :: uc_psi_ptr(:,:,:,:), sc_data_ptr(:,:,:,:,:,:,:)
7977 
7978 ! *************************************************************************
7979 
7980  uc_n1 = uc_ngfft(1); uc_n2 = uc_ngfft(2); uc_n3 = uc_ngfft(3)
7981 
7982  call c_f_pointer(c_loc(uc_psi), uc_psi_ptr, shape=[uc_n1, uc_n2, uc_n3, nspinor])
7983  call c_f_pointer(c_loc(sc_data), sc_data_ptr, &
7984                   shape=[uc_n1, sc_shape(1), uc_n2, sc_shape(2), uc_n3, sc_shape(3), nspinor])
7985 
7986  ABI_CHECK(nspinor == 1, "nspinor 2 not coded")
7987  spinor = 1
7988  cout = zero
7989 
7990  if (k_is_gamma) then
7991    ! Don't need to multiply by e^{ik.L}
7992    do il3=1,sc_shape(3)
7993      do iz=1,uc_n3
7994        do il2=1,sc_shape(2)
7995          do iy=1,uc_n2
7996            do il1=1,sc_shape(1)
7997              do ix=1,uc_n1
7998                cout = cout + uc_psi_ptr(ix, iy, iz, spinor) * sc_data_ptr(ix, il1, iy, il2, iz, il3, spinor)
7999              end do
8000            end do
8001          end do
8002        end do
8003      end do
8004    end do
8005 
8006  else
8007    ! Need to multiply by e^{ik.L}
8008    do il3=1,sc_shape(3)
8009      phl3 = ph1d(il3, 3)
8010      do iz=1,uc_n3
8011        do il2=1,sc_shape(2)
8012          phl32 = phl3 * ph1d(il2, 2)
8013          do iy=1,uc_n2
8014            do il1=1,sc_shape(1)
8015              cphase = phl32 * ph1d(il1, 1)  ! e^{ik.L}
8016              do ix=1,uc_n1
8017                cout = cout + cphase * uc_psi_ptr(ix, iy, iz, spinor) * sc_data_ptr(ix, il1, iy, il2, iz, il3, spinor)
8018              end do
8019            end do
8020          end do
8021        end do
8022      end do
8023    end do
8024  end if
8025 
8026  cout = alpha * cout
8027 
8028 end subroutine sc_sum

m_gwr/sig_braket_ur [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  sig_braket_ur

FUNCTION

    Integrate self-energy matrix elements in the unit cell.

INPUTS

OUTPUT

SOURCE

6048 subroutine sig_braket_ur(sig_rpr, nfftsp, ur_glob, sigm_pm)
6049 
6050 !Arguments ------------------------------------
6051  integer,intent(in) :: nfftsp
6052  type(__slkmat_t),intent(in) :: sig_rpr(2,2)
6053  complex(gwpc),intent(in) :: ur_glob(nfftsp)
6054  complex(gwpc),intent(out) :: sigm_pm(2)
6055 
6056 !Local variables-------------------------------
6057  integer :: ipm, ir1, il_r1 !, ierr
6058  complex(gwpc),allocatable :: loc_cwork(:)
6059 
6060 ! *************************************************************************
6061 
6062  ! (r',r) with r' local and r-index PBLAS-distributed.
6063 
6064  sigm_pm = czero_gw
6065  do ipm=1,2
6066    associate (rp_r => sig_rpr(1,ipm))
6067    ! Integrate over r'
6068    !ABI_CHECK_IEQ(nfftsp, rp_r%sizeb_local(1), "First dimension should be local to each MPI proc!")
6069    ABI_MALLOC(loc_cwork, (rp_r%sizeb_local(2)))
6070    loc_cwork(:) = matmul(transpose(rp_r%buffer_cplx), ur_glob)
6071    ! TODO
6072    !call xgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc )
6073    ! Integrate over r. Note complex conjugate.
6074    do il_r1=1,rp_r%sizeb_local(2)
6075      ir1 = rp_r%loc2gcol(il_r1)
6076      sigm_pm(ipm) = sigm_pm(ipm) + conjg(ur_glob(ir1)) * loc_cwork(il_r1)
6077    end do
6078    ABI_FREE(loc_cwork)
6079    end associate
6080  end do
6081 
6082 end subroutine sig_braket_ur

m_gwr/write_notations [ Functions ]

[ Top ] [ m_gwr ] [ Functions ]

NAME

  write_notations

FUNCTION

  Write the meaning of the different columns.

SOURCE

6008 subroutine write_notations(units)
6009  integer,intent(in) :: units(:)
6010  integer :: ii, unt
6011 
6012  do ii=1,size(units)
6013    unt = units(ii)
6014    write(unt,"(a)")repeat("=", 80)
6015    write(unt,"(a)")" QP results (energies in eV)"
6016    write(unt,"(a)")" Notations:"
6017    write(unt,"(a)")"     E0: Kohn-Sham energy"
6018    write(unt,"(a)")"     <VxcDFT>: Matrix elements of Vxc[n_val] without non-linear core correction (if any)"
6019    write(unt,"(a)")"     SigX: Matrix elements of Sigma_x"
6020    write(unt,"(a)")"     SigC(E0): Matrix elements of Sigma_c at E0"
6021    write(unt,"(a)")"     Z: Renormalization factor"
6022    write(unt,"(a)")"     E-E0: Difference between the QP and the KS energy."
6023    write(unt,"(a)")"     E-Eprev: Difference between QP energy at iteration i and i-1"
6024    write(unt,"(a)")"     E: Quasi-particle energy"
6025    write(unt,"(a)")"     Occ(E): Occupancy of QP state"
6026    !write(unt,"(a)")"     SE1(eKS): Real part of the self-energy computed at the KS energy, SE2 for imaginary part."
6027    write(unt,"(a)")" "
6028    write(unt,"(a)")" "
6029  end do
6030 end subroutine write_notations