CrystalStructurePrediction

class CrystalStructurePrediction(initial_population, number_of_generations, selection_pressure=2.0, number_of_elites=None, number_to_promote=1, heredity_probability=0.6, permutation_probability=0.0, mutation_probability=0.4, sigma_lattice=0.7, max_forces=PhysicalQuantity(0.01, eV/Ang), max_stress=PhysicalQuantity(0.1, GPa), max_steps=1000, max_step_length=PhysicalQuantity(0.2, Ang), external_pressure=None, rng=None, write_population=True, population_format=None, log_filename_prefix='generation_', permutation_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.PermutationOperator'>, heredity_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.HeredityOperator'>, mutation_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.MutationOperator'>)

Function to perform a crystal sturcture search by applying a genetic algorithm on a given initial population.

Parameters:
  • initial_population (list of individuals) – A list of initial BulkConfigurations
  • number_of_generations (int) – The number of times the population should be evolved.
  • selection_pressure (float) – The pressure applied during the selection of the parents from the population. The higher the pressure the less likely it is to pick an individual with lower fitness.
    Default: 2.0
  • number_of_elites (int) – Only consider the number_of_elites fittest individuals when making the selection.
    Default: Consider all individuals during selection.
  • number_to_promote (int) – The number of most fittest individuals to promote to the next generation.
    Default: 1
  • heredity_probability (float) – The unnormalized probabilty of using the heredity operator.
    Default: 0.6
  • permutation_probability (float) – The unnormalized probabilty of using the permutation operator. The permutation operator will only be used when there is more than one type of an element in the system.
    Default: 0.0
  • mutation_probability (float) – The probabilty of using the mutation operator.
    Default: 0.4
  • sigma_lattice (float) – The standard deviation of the random strain applied by the mutation operator.
    Default: 0.7
  • max_forces (PhysicalQuantity in units of force) – The maximum forces determining when the optimization should stop.
    Default: 0.01*eV/Ang
  • max_stress (PhysicalQuantity in units of pressure) – The maximum stress determining when the optimization should stop.
    Default: 0.1*GPa
  • max_steps (uint) – The maximum number of steps for local optimization.
    Default: 1000
  • max_step_length (PhysicalQuantity in units of length) – The maximum step length the local optimizer may take.
    Default: 0.2*Ang
  • external_pressure (PhysicalQuantity in units of pressure) – The target external pressure of the system.
    Default: 0*GPa
  • rng (numpy.random.RandomState) – The random number generated used to generate the random number stream.
    Default: A new random number generator (seeded from OS entropy pool)
  • write_population (bool) – Enables saving the configuration of each individual configuration during the crystal structure prediction.
    Default: True
  • population_format (str) – Sets the storage format for populations.
    Default: hdf5
  • log_filename_prefix (str) – The filename prefix to use for logging the individual optimizations.
    Default: generation_
evolvePopulation(number_of_generations)

Evolves the current population for number_of_generations generations.

Parameters:number_of_generations (int) – The number of times the genetic algorithm is applied to the population.
fitnesses()

Return the fitnesses of the current population. Sorted by descending fitness.

Returns:a list of fitnesses
Return type:list
population()

Return the current population.

Returns:a population
Return type:list of individuals
symmetries()

Return the international symbol for each individual of the current population. Sorted by descending fitness.

Returns:a list of international symbols
Return type:list of str

Usage Examples

Determine the global minimum energy crystal structure for TiO2.

# -------------------------------------------------------------
# Initial Population
# -------------------------------------------------------------

initial_population = generateInitialPopulation(
    elements=elements,
    population_size=10,
    calculator=calculator,
    volume=206.29*Angstrom**3,
    max_forces=0.01*eV/Angstrom,
    max_stress=0.1*GPa,
    max_steps=1000,
    max_step_length=0.2*Angstrom,
    external_pressure=0.0*GPa,
    rng=rng,
    log_filename_prefix="initial_population_",
    )

# -------------------------------------------------------------
# Crystal Structure Prediction
# -------------------------------------------------------------

global_optimization = CrystalStructurePrediction(
    initial_population,
    number_of_generations=16,
    heredity_probability=0.85,
    mutation_probability=0.1,
    permutation_probability=0.05,
    number_to_promote=2,
    number_of_elites=int(0.6*population_size),
    selection_pressure=3.0,
    population_format='hdf5',
    max_forces=0.005 * eV/Angstrom,
    )

csp_TiO2.py

Notes

The goal of crystal structure prediction is to explore the state space of a compound and locate the structure with the maximum fitness for a given chemical composition. In QuantumATK, the fitness function is set to the negative enthalpy, i.e. the algorithm converges to the structure with the minimal enthalpy (or energy if no external pressure is applied). The primary input to the calculation is the chemical formula, which determines the number of atoms in the unit cell. It is important that enough atoms are present in the unit cell to represent the crystal structure. Just because the correct stoichiometric ratio is used, does not ensure that it is possible to represent the minimum energy structure.

Algorithm Details

The crystal structure prediction algorithm in QuantumATK uses a genetic algorithm. The genetic algorithm begins with an initial population of structures provided by the user. The initial population can be generated using the generateInitialPopulation function. Each step of the algorithm involves generating the next population of structures from the previous one. This evolution process is done in the following manner:

  1. Promote the number_to_promote fittest individuals to the next generation.
  2. Loop until there are population_size individuals in the new generation:
  1. Pick which genetic operation should be performed. Each operation is given a user defined probability of being selected.
  2. Select one (mutation and permutation operator) or two parent structures (heredity operator). The selection probability is weighted by the fitness of the parental structures.
  3. Execute the genetic operation and create a new structure.
  1. Perform a local optimization of each individual of the population.

There are four types of genetic operators available:

  • Promotion: this operation is always performed at the beginning of the evolution process to ensure that the best number_to_promote individuals are kept.
  • Mutation: perturb the lattice vectors and atomic positions of the chosen configuration randomly.
  • Permutation: swap the position of atoms of differing elements.
  • Heredity: combine two parental structures to a new one. A random cut plane is used to divide the atoms of the two parent structures; all atoms below the plane in one structure are combined with the atoms above the plane in the other structure to create a new “child” structure. Care is taken to ensure that the total number of each element is not changed.

The algorithm stops after evolving the population number_of_generations times. Given the stochastic nature of the algorithm, this stopping criterion does not ensure that one has found the global solution. The stability of the search should be checked at least with respect to the chosen seed of the pseudo random number generator (rng), the initial_population and number_of_generations.

After generating a new population, each individual of the population is away from a local minimum in enthalpy, hence a geometry optimization must be performed to converge to a close local minimum. Since the global minimum is also a local minimum, one is guaranteed to find it for sufficiently large numbers of number_of_generations. These geometry (local) optimizations comprise nearly all of the computational work during crystal structure prediction.

Note

One key difference between a genetic algorithm and a random search is that the selection of the parent structure(s), on which the genetic operations are performed, is biased by the fitness of that structure. The higher the fitness (i.e. the lower the calculated enthalpy in this implementation), the more likely it is to pick this structure. The skewness of the probability distribution is altered by selection_pressure. A large value of selection_pressure makes it less likely to select a configuration with a lower fitness.

Restart support

A crystal structure prediction run can be restarted / resumed from a previous point in the following manner:

CSP = CrystalStructurePrediction(
    ...
    )

# Save the current state of the CSP object.
nlsave('my_file.hdf5', CSP, object_id='my_run')

...

# Read back a stored CSP object. The last population is restored
# and all given properties / attributes are set up.
CSP = nlread('my_file.hdf5', object_id='my_run')

# Run 10 more iterations of the genetic algorithm.
CSP.evolvePopulation(10)

# Run 5 more iterations of the genetic algorithm.
CSP.evolvePopulation(5)

Analyzing Results

If write_population is True then after each step of the genetic algorithm the current population will be written to disk. The filename will start with log_filename_prefix and include the current generation number. For example, the initial population will be saved as generation_0.hdf5 and after the first step a new file generation_1.hdf5 will be saved. The file format can be either nc for NetCDF or hdf5 for HDF5, by default HDF5 is used. The crystal structures contained in these files can be viewed in QuantumATK using the Viewer.