ParallelMapConfigurations(task_function, configurations, processes_per_configuration, log_filenames=None, scf_restart_step_length=PhysicalQuantity(0.1, Ang), interleaved_split=False, master_only=True)
Applies a function on a list of configurations in parallel.
- task_function (function) – The function to apply to each configuration. It takes an AtomicConfiguration and the index of the configuration in the configurations array as its arguments. The function can, optionally, return data that will be saved and sent to all processes upon return from ParallelMapConfigurations. Note: The return type must be sendable over MPI. This currently includes most basic datatypes (e.g. numbers, lists, dicts, numpy arrays) as well as ATK configurations. A pickling error will be raised if the type is not picklable.
- configurations (list of configurations, e.g. MoleculeConfiguration or DeviceConfiguration) – The list of configurations with attached calculators that is passed to the task_function.
- processes_per_configuration (int) – The number of MPI processes that will be used to call task_function for each configuration.
- log_filenames (list of strings) – A list of filenames to log each calculation to. It must be the same length as the configurations argument. If None is given, all logging is performed to stdout.
- interleaved_split (bool) – When True, the tasks are distributed amongst the process groups in an interleaved way. If False, the tasks are distributed in a contiguous manner.
- master_only (bool) – Controls whether the master process is the only rank allowed to write to the log.
Returns: A list containing the results of applying the task_function to each configuration.
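To illustrate the effect of interleaved_split, the following is a minimal plain-Python sketch of how 6 tasks could be assigned to 3 process groups; assign_tasks is a hypothetical helper written for illustration only, not part of the QuantumATK API.

```python
def assign_tasks(n_tasks, n_groups, interleaved):
    """Illustrative sketch of task-to-group assignment.
    Hypothetical helper; not the actual QuantumATK implementation."""
    if interleaved:
        # Interleaved: task i goes to group i % n_groups.
        return [[t for t in range(n_tasks) if t % n_groups == g]
                for g in range(n_groups)]
    # Contiguous: split the task list into consecutive chunks.
    chunk = -(-n_tasks // n_groups)  # ceiling division
    return [list(range(g * chunk, min((g + 1) * chunk, n_tasks)))
            for g in range(n_groups)]

print(assign_tasks(6, 3, interleaved=True))   # [[0, 3], [1, 4], [2, 5]]
print(assign_tasks(6, 3, interleaved=False))  # [[0, 1], [2, 3], [4, 5]]
```

Interleaving can help when the cost of the tasks varies systematically along the configuration list (e.g. increasing system size), since each group then receives a mix of cheap and expensive tasks.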
Calculate the potential energy curve for a hydrogen molecule in parallel. This example uses TotalEnergy to calculate the potential energy. The
task_function function is passed to
ParallelMapConfigurations to perform each individual energy calculation. Each time
task_function is called, it calculates the total energy, saves the TotalEnergy analysis object to a file, and then returns the total energy.
ParallelMapConfigurations collects the returned energies into an array that is returned when the calculations complete. At the end of the script, a table of internuclear distances and energies is printed to the screen.
# Make a list to hold the configurations.
configurations = []

# Loop over a list of distances between 0.3 and 5.0 Angstrom.
distances = numpy.linspace(0.3, 5.0, 20)
for distance in distances:
    # Define elements
    elements = [Hydrogen, Hydrogen]

    # Define coordinates
    cartesian_coordinates = [[0.0,      0.0, 0.0],
                             [distance, 0.0, 0.0]]*Angstrom

    # Set up configuration
    molecule_configuration = MoleculeConfiguration(
        elements=elements,
        cartesian_coordinates=cartesian_coordinates
    )

    # Define a calculator
    molecule_configuration.setCalculator(LCAOCalculator())

    # Add the configuration to the list of configurations.
    configurations.append(molecule_configuration)

def task_function(configuration, index):
    # Compute the total energy.
    total_energy = TotalEnergy(configuration)

    # Save the result to a file.
    nlsave('total_energy_%i.hdf5' % index, total_energy)

    # Return the calculated total energy.
    return total_energy.evaluate()

# Define a list of filenames to save the logging output from each calculation to.
filenames = ['total_energy_%i.log' % i for i in range(len(configurations))]

# Calculate the energy of each configuration. Each calculation will use 2 MPI processes.
energies = ParallelMapConfigurations(
    task_function,
    configurations,
    processes_per_configuration=2,
    log_filenames=filenames,
)

# Only print on the master process. This prevents the table from being printed multiple times.
if processIsMaster():
    print('%10s %12s' % ('distance', 'energy'))
    for i in range(len(configurations)):
        print('%10.4f %12.3e' % (distances[i], energies[i].inUnitsOf(eV)))
It is important to properly coordinate the total number of MPI processes, the
processes_per_configuration argument, and the number of configurations. When
ParallelMapConfigurations is called, the MPI processes are divided into groups no larger than
processes_per_configuration. For example, if there are 8 MPI processes and
processes_per_configuration=2, then 4 groups will be made. If there are 8 MPI processes and
processes_per_configuration=3, then two groups of 3 processes and one group of 2 processes will be made.
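The grouping arithmetic above can be sketched in plain Python; group_sizes is a hypothetical helper written to reproduce the examples in the text, not the actual QuantumATK implementation.

```python
def group_sizes(n_processes, processes_per_configuration):
    """Illustrative sketch: divide MPI ranks into groups no larger than
    processes_per_configuration, with sizes differing by at most one.
    Hypothetical helper; not the actual QuantumATK implementation."""
    # Number of groups needed (ceiling division).
    n_groups = -(-n_processes // processes_per_configuration)
    base, extra = divmod(n_processes, n_groups)
    # Spread the remainder over the first `extra` groups.
    return [base + 1 if g < extra else base for g in range(n_groups)]

print(group_sizes(8, 2))  # [2, 2, 2, 2] -> 4 groups
print(group_sizes(8, 3))  # [3, 3, 2]   -> two groups of 3, one group of 2
```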
The caller is responsible for choosing settings that guarantee that all groups of MPI processes handle approximately the same number of configurations. Otherwise, the parallel performance will not be ideal due to load balancing issues. This typically means that the total number of groups should divide evenly into the total number of configurations.
Ideally, one would pick
processes_per_configuration to be the largest number of processes that a single DFT calculation runs efficiently on. This generally depends on a number of variables including the number of atoms, basis set size, computer hardware, etc. The total number of MPI processes should then be an integer multiple of
processes_per_configuration.
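As a worked example of this sizing advice, assuming the 20-configuration setup from the example above: with 8 MPI processes and processes_per_configuration=2, there are 4 groups and each handles exactly 5 configurations, so no group sits idle. The helper below (configurations_per_group, a hypothetical name introduced here for illustration) computes the load on the busiest group.

```python
def configurations_per_group(n_configurations, n_processes,
                             processes_per_configuration):
    """Illustrative sketch: configurations handled by the busiest group.
    Hypothetical helper, not part of the QuantumATK API."""
    # Number of process groups formed (ceiling division).
    n_groups = -(-n_processes // processes_per_configuration)
    # Configurations per group, rounded up for the busiest group.
    return -(-n_configurations // n_groups)

# 20 configurations, 8 processes, 2 per configuration -> 4 groups, 5 tasks each.
print(configurations_per_group(20, 8, 2))  # 5
```

When this number times the number of groups exceeds the configuration count, some groups finish early and wait, which is the load-balancing issue mentioned above.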
This function can be used with ATK-ForceField calculators as well.
However, ATK-ForceField does not currently make use of MPI, so
processes_per_configuration should always be set to 1 in order not
to have idle processes.
See also, Notes.