layouts_config_type Derived Type

type, public, extends(abstract_config_type) :: layouts_config_type

Used to represent the input configuration of layouts


Contents

Source Code


Components

Type Visibility Attributes Name Initial
logical, public :: exist = .false.

Does the related namelist exist in the target input file?

integer, public :: index = 0

Used to hold the specific index of numbered namelists

logical, public :: skip_read = .false.

Do we want to skip the read step in init?

logical, public :: skip_broadcast = .false.

Do we want to skip the broadcast step in init?

logical, public :: fft_measure_plan = .false.

If false then fftw will use heuristics to determine the best fft plan. If true then timing measurements will be made to determine an optimal plan. When true it can take somewhat longer to initialise the fft plans.

logical, public :: fft_use_wisdom = .true.

Try to load and save wisdom about fftw plans to fft_wisdom_file. This can speed up the fft initialisation when running new cases with the same grid sizes as previous runs.

character(len=run_name_size), public :: fft_wisdom_file = 'default'

Location of fftw wisdom file, if left as default, this is set to run_name//'.fftw_wisdom', unless overriden by the environment variable GK_FFTW3_WISDOM. If set to anything other than default, overrides GK_FFTW3_WISDOM.

logical, public :: gf_local_fields = .false.

If true then perform initial decomposition setup related to the gf_local field approach, setting up the gf_lo decomposition. This is forced to false if the number of processors is less than naky * ntheta0. See also the field_option input of fields_knobs.

logical, public :: intmom_sub = .false.

When set to true use sub-communicators in the velocity space integration routines associated with taking moments of the distribution function. The sub-communicator involves all processors with a given xys part of the domain (i.e. the same range in theta0, ky and species dimensions). As such this is forced to false if one or more of these three dimensions are not split "nicely" (typically meaning if we're not using an appropriate sweetspot). Can provide a small performance improvement when true in certain cases.

logical, public :: intspec_sub = .false.

When set to true use sub-communicators in the velocity space integration routines associated with taking species summed moments of the distribution function. The sub-communicator involves all processors with a given xy part of the domain (i.e. the same range in theta0 and ky dimensions). As such this is forced to false if one or more of these two dimensions are not split "nicely" (typically meaning if we're not using an appropriate sweetspot). Can provide a small performance improvement when true in certain cases.

character(len=5), public :: layout = 'lxyes'

This string determines how the distributed dimensions (k)x, (k)y, l(ambda), e(nergy) and s(pecies) are laid out in (global) memory. The rightmost dimensions are parallelised first, with the leftmost dimension being most local. This can strongly impact performance and the sweetspots suggested by ingen.

Valid options are:

  • 'lxyes'
  • 'xyles'
  • 'yxles'
  • 'lexys'
  • 'lyxes'

The optimal choice depends on the type of simulation being run. It is typically expensive to parallelise x in simulations using the box kt_grids type with linked boundary conditions, including all nonlinear simulations, so xyles is a good choice for these cases. Furthermore for nonlinear cases we must Fourier transform in x and y, so again xyles of yxles are good options. For collisional cases, especially those using the le_lo layout, can benefit from using lexys. Collisional nonlinear simulations therefore have two/three competing choices and it is advisable to test both (remembering that the most suitable number of processors may also change when the layout is changed).

logical, public :: local_field_solve = .false.

Can strongly affect initialization time on some parallel computers. Recommendation: Set true on computers with slow communication networks. It's probably worth trying changing this on your favourite machine to see how much difference it makes for you.

real, public :: max_unbalanced_xxf = 0.0

Sets maximum allowable fractional imbalance between the two different blocksizes used in the xxf_lo decomposition used in the nonlinear term calculation if unbalanced_xxf is true. See Adrian Jackson's DCSE report for more details.

real, public :: max_unbalanced_yxf = 0.0

Sets maximum allowable fractional imbalance between the two different blocksizes used in the yxf_lo decomposition used in the nonlinear term calculation if unbalanced_yxf is true. See Adrian Jackson's DCSE report for more details.

integer, public :: nproc_e_lo = 0

The number of processors to use in e_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

integer, public :: nproc_g_lo = 0

The number of processors to use in g_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

integer, public :: nproc_le_lo = 0

The number of processors to use in le_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

integer, public :: nproc_lz_lo = 0

The number of processors to use in lz_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

integer, public :: nproc_xxf_lo = 0

The number of processors to use in xxf_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

integer, public :: nproc_yxf_lo = 0

The number of processors to use in yxf_lo layout. Capped to number of processors in comm world. If not set (<=0) defaults to global nproc.

logical, public :: opt_local_copy = .false.

Setting to true enables optimising redistribute code, used in FFTs for evaluating nonlinear terms, that avoids indirect addressing. This can introduces worthwhile savings in nonlinear GS2 simulations at lower core counts. See Adrian Jackson's DCSE report for more details.

logical, public :: opt_redist_nbk = .true.

Set to true to use non-blocking communications in the redistribute routines. This is generally more performant but has been observed to be slower in one or two rare cases.

logical, public :: opt_redist_persist = .false.

Set to true to use persistent (non-blocking) communications in the redistribute routines. Requires opt_redist_nbk to be true. Can help improve scaling efficiency at large core counts.

logical, public :: opt_redist_persist_overlap = .false.

Set to true to try to overlap the mpi and local parts of the gather/scatter routines. Should only be used with opt_redist_persist. This is typically not seen to have any impact on performance. See optimising your runs for more details.

logical, public :: simple_gf_decomposition = .true.

When in gf_lo, if there are fewer points than processors , then assign the points to the first and leave the rest of the processors empty

logical, public :: unbalanced_xxf = .false.

Setting to true allows GS2 to adopt a more flexible domain decomposition of the xxf data decomposition (used in nonlinear FFTs). By default GS2 allocates each 1 task with the same uniform blocksize in xxf_lo, one task may have a smaller block of data, and other tasks may be empty. There is no attempt to keep both x and y as local as possible, and sometimes large 1 data transfers are required to map from xxf to yxf and vice-versa during FFTs. With unbalanced_xxf = .true., two slightly different blocksizes are chosen in order to keep both x and y as local as possible, and avoid this potentially large 1 communication overhead. The level of imbalance is limited by max_unbalanced_xxf.

Note ingen can provide data on the imbalance and communication required.

See Adrian Jackson's DCSE report for more details.

logical, public :: unbalanced_yxf = .false.

Setting to true allows GS2 to adopt a more flexible domain decomposition of the yxf data decomposition (used in nonlinear FFTs). By default GS2 allocates each 1 task with the same uniform blocksize in yxf_lo, one task may have a smaller block of data, and other tasks may be empty. There is no attempt to keep both x and y as local as possible, and sometimes large 1 data transfers are required to map from xxf to yxf and vice-versa during FFTs. With unbalanced_yxf = .true., two slightly different blocksizes are chosen in order to keep both x and y as local as possible, and avoid this potentially large 1 communication overhead. The level of imbalance is limited by max_unbalanced_yxf.

Note ingen can provide data on the imbalance and communication required.

See Adrian Jackson's DCSE report for more details.


Type-Bound Procedures

procedure, public, :: is_initialised => is_initialised_generic

procedure, public, :: init => init_generic

  • private subroutine init_generic(self, name, requires_index, index)

    Fully initialise the config object

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(inout) :: self
    character(len=*), intent(in), optional :: name
    logical, intent(in), optional :: requires_index
    integer, intent(in), optional :: index

procedure, public, :: setup => setup_generic

  • private subroutine setup_generic(self, name, requires_index, index)

    Do some standard setup/checking

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(inout) :: self
    character(len=*), intent(in), optional :: name
    logical, intent(in), optional :: requires_index
    integer, intent(in), optional :: index

procedure, public, :: write_namelist_header

  • private subroutine write_namelist_header(self, unit)

    Write the namelist header for this instance

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self
    integer, intent(in) :: unit

procedure, public, :: get_name => get_name_generic

  • private function get_name_generic(self)

    Returns the namelist name. Not very useful at the moment but may want to do more interesting things in the future

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self

    Return Value character(len=CONFIG_MAX_NAME_LEN)

procedure, public, :: get_requires_index => get_requires_index_generic

  • private function get_requires_index_generic(self)

    Returns the requires_index value. Allows access whilst keeping the variable private

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self

    Return Value logical

procedure, public, nopass :: write_namelist_footer

  • private subroutine write_namelist_footer(unit)

    Write the namelist footer

    Arguments

    Type IntentOptional Attributes Name
    integer, intent(in) :: unit
  • private subroutine write_key_val_string(key, val, unit)

    Writes a {key,val} pair where the value is of type character

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: key
    character(len=*), intent(in) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_real(key, val, unit)

    Writes a {key,val} pair where the value is of type real

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: key
    real, intent(in) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_complex(key, val, unit)

    Writes a {key,val} pair where the value is of type complex

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: key
    complex, intent(in) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_integer(key, val, unit)

    Writes a {key,val} pair where the value is of type integer

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: key
    integer, intent(in) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_logical(key, val, unit)

    Writes a {key,val} pair where the value is of type logical

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: key
    logical, intent(in) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_real_array(self, key, val, unit)

    Writes a {key,val} pair where the value is of type real array

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self
    character(len=*), intent(in) :: key
    real, intent(in), dimension(:) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_complex_array(self, key, val, unit)

    Writes a {key,val} pair where the value is of type complex array

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self
    character(len=*), intent(in) :: key
    complex, intent(in), dimension(:) :: val
    integer, intent(in) :: unit
  • private subroutine write_key_val_integer_array(self, key, val, unit)

    Writes a {key,val} pair where the value is of type integer array

    Arguments

    Type IntentOptional Attributes Name
    class(abstract_config_type), intent(in) :: self
    character(len=*), intent(in) :: key
    integer, intent(in), dimension(:) :: val
    integer, intent(in) :: unit

procedure, public :: read => read_layouts_config

  • private subroutine read_layouts_config(self)

    Reads in the layouts_knobs namelist and populates the member variables

    Arguments

    Type IntentOptional Attributes Name
    class(layouts_config_type), intent(inout) :: self

procedure, public :: write => write_layouts_config

  • private subroutine write_layouts_config(self, unit)

    Writes out a namelist representing the current state of the config object

    Arguments

    Type IntentOptional Attributes Name
    class(layouts_config_type), intent(in) :: self
    integer, intent(in), optional :: unit

procedure, public :: reset => reset_layouts_config

  • private subroutine reset_layouts_config(self)

    Resets the config object to the initial empty state

    Arguments

    Type IntentOptional Attributes Name
    class(layouts_config_type), intent(inout) :: self

procedure, public :: broadcast => broadcast_layouts_config

  • private subroutine broadcast_layouts_config(self)

    Broadcasts all config parameters so object is populated identically on all processors

    Arguments

    Type IntentOptional Attributes Name
    class(layouts_config_type), intent(inout) :: self

procedure, public, nopass :: get_default_name => get_default_name_layouts_config

  • private function get_default_name_layouts_config()

    Gets the default name for this namelist

    Arguments

    None

    Return Value character(len=CONFIG_MAX_NAME_LEN)

procedure, public, nopass :: get_default_requires_index => get_default_requires_index_layouts_config

Source Code

  type, extends(abstract_config_type) :: layouts_config_type
     ! namelist : layouts_knobs
     ! indexed : false
     !> If false then fftw will use heuristics to determine the best
     !> fft plan. If true then timing measurements will be made to
     !> determine an optimal plan. When true it can take somewhat
     !> longer to initialise the fft plans.
     !>
     !> @warning If true then results will not be exactly reproducible
     !> as the choice of the optimal plan can vary from run to run
     !> when timing different approaches. This is the main reason why
     !> the default is false here.
     logical :: fft_measure_plan = .false.
     !> Try to load and save wisdom about fftw plans to
     !> `fft_wisdom_file`. This can speed up the fft initialisation
     !> when running new cases with the same grid sizes as previous
     !> runs.
     logical :: fft_use_wisdom = .true.
     !> Location of fftw wisdom file, if left as default, this is set to
     !> run_name//'.fftw_wisdom', unless overriden by the environment
     !> variable `GK_FFTW3_WISDOM`. If set to anything other than default,
     !> overrides `GK_FFTW3_WISDOM`.
     character(len = run_name_size) :: fft_wisdom_file = 'default'
     !> If true then perform initial decomposition setup related to
     !> the `gf_local` field approach, setting up the `gf_lo`
     !> decomposition. This is forced to false if the number of
     !> processors is less than `naky * ntheta0`. See also the
     !> `field_option` input of [[fields_knobs]].
     logical :: gf_local_fields = .false.
     !> When set to true use sub-communicators in the velocity space
     !> integration routines associated with taking moments of the
     !> distribution function. The sub-communicator involves all
     !> processors with a given `xys` part of the domain (i.e. the
     !> same range in theta0, ky and species dimensions). As such this
     !> is forced to false if one or more of these three dimensions
     !> are not split "nicely" (typically meaning if we're not using
     !> an appropriate sweetspot). Can provide a small performance
     !> improvement when true in certain cases.
     !>
     !> @note These sub-communicators affect calls to
     !> [[integrate_moment]] with complex variables. By default there
     !> is no gather of data from other procs so the integration
     !> result may only be known for the local `xys` block. This could
     !> cause a problem for diagnostics which want to write the full
     !> array.  The optional argument `full_arr` can override this,
     !> forcing the full array to be known. This is only a concern if
     !> the optional argument `all` is also passed.
     logical :: intmom_sub = .false.
     !> When set to true use sub-communicators in the velocity space
     !> integration routines associated with taking species summed
     !> moments of the distribution function. The sub-communicator
     !> involves all processors with a given `xy` part of the domain
     !> (i.e. the same range in theta0 and ky dimensions). As such
     !> this is forced to false if one or more of these two dimensions
     !> are not split "nicely" (typically meaning if we're not using
     !> an appropriate sweetspot). Can provide a small performance
     !> improvement when true in certain cases.
     logical :: intspec_sub = .false.
     !> This string determines how the distributed dimensions (k)`x`,
     !> (k)`y`, `l`(ambda), `e`(nergy) and `s`(pecies) are laid out in
     !> (global) memory. The rightmost dimensions are parallelised
     !> first, with the leftmost dimension being most local. This can
     !> strongly impact performance and the sweetspots suggested by
     !> [[ingen]].
     !>
     !> Valid options are:
     !>
     !> - 'lxyes'
     !> - 'xyles'
     !> - 'yxles'
     !> - 'lexys'
     !> - 'lyxes'
     !>
     !> The optimal choice depends on the type of simulation being
     !> run.  It is typically expensive to parallelise `x` in
     !> simulations using the `box` kt_grids type with linked boundary
     !> conditions, including all nonlinear simulations, so `xyles` is
     !> a good choice for these cases.  Furthermore for nonlinear
     !> cases we must Fourier transform in `x` and `y`, so again
     !> `xyles` of `yxles` are good options. For collisional cases,
     !> especially those using the `le_lo` layout, can benefit from
     !> using `lexys`. Collisional nonlinear simulations therefore
     !> have two/three competing choices and it is advisable to test both
     !> (remembering that the most suitable number of processors may
     !> also change when the layout is changed).
     !>
     !> @todo Consider changing the default to either `xyles` or `lexys`.
     character(len = 5) :: layout = 'lxyes'
     !> Can strongly affect initialization time on some parallel
     !> computers.  Recommendation: Set true on computers with slow
     !> communication networks.  It's probably worth trying changing
     !> this on your favourite machine to see how much difference it
     !> makes for you.
     !>
     !> @note This only impacts simulations with a `field_option` of
     !> `implicit`.
     !>
     !> @todo investigate if this setting is still helpful on current
     !> machines and if we can determine heuristics for when to enable
     !> it.
     logical :: local_field_solve = .false.
     !> Sets maximum allowable fractional imbalance between the two
     !> different blocksizes used in the `xxf_lo` decomposition used
     !> in the nonlinear term calculation if `unbalanced_xxf` is true.
     !> See [Adrian Jackson's DCSE
     !> report](https://bitbucket.org/gyrokinetics/wikifiles/raw/HEAD/CMR/GS2_Final_report_NAG_Version_v1.0.pdf)
     !> for more details.
     real :: max_unbalanced_xxf = 0.0
     !> Sets maximum allowable fractional imbalance between the two
     !> different blocksizes used in the `yxf_lo` decomposition used
     !> in the nonlinear term calculation if `unbalanced_yxf` is true.
     !> See [Adrian Jackson's DCSE
     !> report](https://bitbucket.org/gyrokinetics/wikifiles/raw/HEAD/CMR/GS2_Final_report_NAG_Version_v1.0.pdf)
     !> for more details.
     real :: max_unbalanced_yxf = 0.0
     !> The number of processors to use in e_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_e_lo = 0
     !> The number of processors to use in g_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_g_lo = 0
     !> The number of processors to use in le_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_le_lo = 0
     !> The number of processors to use in lz_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_lz_lo = 0
     !> The number of processors to use in xxf_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_xxf_lo = 0
     !> The number of processors to use in yxf_lo layout. Capped to number of processors
     !> in comm world. If not set (<=0) defaults to global nproc.
     integer :: nproc_yxf_lo = 0
     !> Setting to true enables optimising redistribute code, used in
     !> FFTs for evaluating nonlinear terms, that avoids indirect
     !> addressing. This can introduces worthwhile savings in
     !> nonlinear GS2 simulations at lower core counts.  See [Adrian
     !> Jackson's DCSE
     !> report](https://bitbucket.org/gyrokinetics/wikifiles/raw/HEAD/CMR/GS2_Final_report_NAG_Version_v1.0.pdf)
     !> for more details.
     logical :: opt_local_copy = .false.
     !> Set to true to use non-blocking communications in the
     !> redistribute routines. This is generally more performant but
     !> has been observed to be slower in one or two rare cases.
     logical :: opt_redist_nbk = .true.
     !> Set to true to use persistent (non-blocking) communications in
     !> the redistribute routines. Requires `opt_redist_nbk` to be
     !> true. Can help improve scaling efficiency at large core
     !> counts.
     logical :: opt_redist_persist = .false.
     !> Set to true to try to overlap the mpi and local parts of the
     !> gather/scatter routines. Should only be used with
     !> `opt_redist_persist`. This is typically not seen to have any
     !> impact on performance. See [optimising your
     !> runs](https://bitbucket.org/gyrokinetics/gs2/wiki/Optimising_your_runs)
     !> for more details.
     logical :: opt_redist_persist_overlap = .false.
     !> When in `gf_lo`, if there are fewer points \(n\) than processors \(P\),
     !> then assign the points to the first \(n\) and leave the rest of the
     !> processors empty
     logical :: simple_gf_decomposition = .true.
     !> Setting to true allows GS2 to adopt a more flexible domain
     !> decomposition of the xxf data decomposition (used in nonlinear
     !> FFTs). By default GS2 allocates each MPI task with the same
     !> uniform blocksize in `xxf_lo`, one task may have a smaller
     !> block of data, and other tasks may be empty. There is no
     !> attempt to keep both x and y as local as possible, and
     !> sometimes large MPI data transfers are required to map from
     !> xxf to yxf and vice-versa during FFTs. With `unbalanced_xxf =
     !> .true.`, two slightly different blocksizes are chosen in order
     !> to keep both x and y as local as possible, and avoid this
     !> potentially large MPI communication overhead. The level of
     !> imbalance is limited by `max_unbalanced_xxf`.
     !>
     !> Note [[ingen]] can provide data on the imbalance and
     !> communication required.
     !>
     !> See [Adrian Jackson's DCSE
     !> report](https://bitbucket.org/gyrokinetics/wikifiles/raw/HEAD/CMR/GS2_Final_report_NAG_Version_v1.0.pdf)
     !> for more details.
     logical :: unbalanced_xxf = .false.
     !> Setting to true allows GS2 to adopt a more flexible domain
     !> decomposition of the yxf data decomposition (used in nonlinear
     !> FFTs). By default GS2 allocates each MPI task with the same
     !> uniform blocksize in yxf_lo, one task may have a smaller block
     !> of data, and other tasks may be empty. There is no attempt to
     !> keep both x and y as local as possible, and sometimes large
     !> MPI data transfers are required to map from xxf to yxf and
     !> vice-versa during FFTs. With `unbalanced_yxf = .true.`, two
     !> slightly different blocksizes are chosen in order to keep both
     !> x and y as local as possible, and avoid this potentially large
     !> MPI communication overhead. The level of imbalance is limited
     !> by `max_unbalanced_yxf`.
     !>
     !> Note [[ingen]] can provide data on the imbalance and
     !> communication required.
     !>
     !> See [Adrian Jackson's DCSE
     !> report](https://bitbucket.org/gyrokinetics/wikifiles/raw/HEAD/CMR/GS2_Final_report_NAG_Version_v1.0.pdf)
     !> for more details.
     logical :: unbalanced_yxf = .false.
   contains
     procedure, public :: read => read_layouts_config
     procedure, public :: write => write_layouts_config
     procedure, public :: reset => reset_layouts_config
     procedure, public :: broadcast => broadcast_layouts_config
     procedure, public, nopass :: get_default_name => get_default_name_layouts_config
     procedure, public, nopass :: get_default_requires_index => get_default_requires_index_layouts_config     
  end type layouts_config_type