ParallelMapConfigurations

ParallelMapConfigurations(task_function, configurations, processes_per_configuration, log_filenames=None, scf_restart_step_length=PhysicalQuantity(0.1, Ang), interleaved_split=False, master_only=True)

Applies a function to a list of configurations in parallel.

Parameters:
  • task_function (function) – The function to apply to each configuration. It takes an AtomicConfiguration and the index of the configuration in the configurations list as its arguments. The function can, optionally, return data that will be saved and sent to all processes upon return from ParallelMapConfigurations. Note: The return type needs to be able to be sent over MPI. This currently includes most basic datatypes (e.g. numbers, lists, dicts, numpy arrays) as well as ATK configurations. A pickling error will be raised if the type is not picklable.
  • configurations (list of configurations (MoleculeConfiguration | BulkConfiguration | SurfaceConfiguration | DeviceConfiguration)) – The list of configurations, each with a calculator attached, whose elements are passed one at a time to the task_function.
  • processes_per_configuration (int) – The number of MPI processes that will be used to call task_function for each configuration.
  • log_filenames (list of strings) – A list of filenames to log each calculation to. It must be the same length as the configurations argument. If None is given, then all logging will be performed to stdout.
    Default: All calculations are logged to stdout
  • interleaved_split (bool) – When True, the tasks are distributed among the process groups in an interleaved way; when False, they are distributed in contiguous blocks (see the illustrative sketch below).
    Default: False
  • master_only (bool) – Controls if the master process should be the only rank allowed to write to the log.
    Default: True
Returns:

A list containing the results of applying the task_function to each configuration.

Return type:

list
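
The difference between interleaved and contiguous task distribution can be pictured with a short sketch. The assignment of task indices to process groups shown here is purely illustrative of the two schemes; it is not taken from the QuantumATK implementation.

# Illustrative only: one possible way 8 tasks could be assigned to 3 process groups.
n_tasks, n_groups = 8, 3

# Contiguous split: each group receives one consecutive block of task indices.
contiguous = [list(range(g * n_tasks // n_groups, (g + 1) * n_tasks // n_groups))
              for g in range(n_groups)]

# Interleaved split: task indices are dealt out to the groups in round-robin order.
interleaved = [list(range(g, n_tasks, n_groups)) for g in range(n_groups)]

print(contiguous)   # [[0, 1], [2, 3, 4], [5, 6, 7]]
print(interleaved)  # [[0, 3, 6], [1, 4, 7], [2, 5]]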

Usage Examples

Calculate the potential energy curve for a hydrogen molecule in parallel. This example uses TotalEnergy to calculate the potential energy. The task_function function is passed to ParallelMapConfigurations to perform each individual energy calculation. Each time task_function is called, it calculates the total energy, saves the TotalEnergy analysis object to a file, and then returns the total energy. ParallelMapConfigurations collects the returned energies into an array that is returned when the calculations complete. At the end of the script, a table of internuclear distances and energies is printed to the screen.

# Make a list to hold the configurations.
configurations = []

# Loop over a list of distances between 0.3 and 5.0 Angstrom.
distances = numpy.linspace(0.3, 5.0, 20)
for distance in distances:
    # Define elements
    elements = [Hydrogen, Hydrogen]

    # Define coordinates
    cartesian_coordinates = [[ 0.0, 0.0, 0.0 ],
                             [ distance, 0.0, 0.0 ]]*Angstrom

    # Set up configuration
    molecule_configuration = MoleculeConfiguration(
        elements=elements,
        cartesian_coordinates=cartesian_coordinates
        )

    # Define a calculator
    molecule_configuration.setCalculator(LCAOCalculator())

    # Add the configuration to the list of configurations.
    configurations.append(molecule_configuration)

def task_function(configuration, index):
    # Compute the total energy.
    total_energy = TotalEnergy(configuration)
    # Save the result to a file.
    nlsave('total_energy_%i.hdf5' % index, total_energy)
    # Return the calculated total energy.
    return total_energy.evaluate()

# Define a list of filenames to save the logging output from each calculation to.
filenames = [ 'total_energy_%i.log' % i for i in range(len(configurations)) ]

# Calculate the energy of each configuration. Each calculation will use 2 MPI processes.
energies = ParallelMapConfigurations(
    task_function,
    configurations,
    processes_per_configuration=2,
    log_filenames=filenames,
)

# Only print on the master process. This prevents the table from being printed multiple times.
if processIsMaster():
    print('%10s %12s' % ('distance', 'energy'))
    for i in range(len(configurations)):
        print('%10.4f %12.3e' % (distances[i], energies[i].inUnitsOf(eV)))
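
The task function is not limited to returning a single number. As a variation on the example above (the dictionary keys are arbitrary and chosen only for illustration), it could return several basic, picklable values at once:

def task_function(configuration, index):
    # Compute the total energy.
    total_energy = TotalEnergy(configuration)
    # Save the analysis object to a file.
    nlsave('total_energy_%i.hdf5' % index, total_energy)
    # Return a dictionary of basic, picklable values.
    return {'index': index,
            'energy_in_eV': total_energy.evaluate().inUnitsOf(eV)}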



Notes

It is important to properly coordinate the total number of MPI processes, the processes_per_configuration argument, and the number of configurations. When ParallelMapConfigurations is called, the MPI processes are divided into groups no larger than processes_per_configuration. For example, if there are 8 MPI processes and processes_per_configuration=2, then 4 groups will be made. If there are 8 MPI processes and processes_per_configuration=3, then two groups of 3 processes and one group of 2 processes will be made.
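
The grouping arithmetic can be sketched in a few lines. The helper below is not part of the QuantumATK API; it merely reproduces the group sizes for the two cases above.

import math

def group_sizes(total_processes, processes_per_configuration):
    # Split the processes into the smallest number of groups of at most
    # processes_per_configuration processes, as evenly as possible.
    n_groups = math.ceil(total_processes / processes_per_configuration)
    base, extra = divmod(total_processes, n_groups)
    return [base + 1] * extra + [base] * (n_groups - extra)

print(group_sizes(8, 2))  # [2, 2, 2, 2] -> four groups of 2 processes
print(group_sizes(8, 3))  # [3, 3, 2]    -> two groups of 3 and one group of 2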

The caller is responsible for choosing settings that guarantee that all groups of MPI processes handle approximately the same number of configurations. Otherwise, the parallel performance will not be ideal due to load balancing issues. This typically means that the total number of groups should divide evenly into the total number of configurations.

Ideally, one would pick processes_per_configuration to be the largest number of processes that a single DFT calculation runs efficiently on. This generally depends on a number of variables including the number of atoms, basis set size, computer hardware, etc. Then, the total number of MPI processes should be an integer multiple of processes_per_configuration.
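
As a concrete, purely hypothetical sizing example: suppose a single DFT calculation runs efficiently on 4 MPI processes and there are 12 configurations to treat.

processes_per_configuration = 4   # hypothetical sweet spot for one calculation
n_configurations = 12
total_processes = 24              # an integer multiple of processes_per_configuration

# 24 processes split into 24 / 4 = 6 groups, and each group then handles
# exactly 12 / 6 = 2 configurations, so no group sits idle.
n_groups = total_processes // processes_per_configuration
print(n_groups)                        # 6
print(n_configurations / n_groups)     # 2.0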

This function can be used with ATK-ForceField calculators as well. However, ATK-ForceField does not currently make use of MPI. This means that processes_per_configuration should always be set to 1 in order not to have idle processes.
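
For such a force-field workflow the call looks the same as in the example above, only with one process per group. This assumes each configuration in configurations has an ATK-ForceField calculator attached and that task_function is defined as before.

results = ParallelMapConfigurations(
    task_function,
    configurations,
    processes_per_configuration=1,
)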

See also: Notes.