LBFGS

class LBFGS(memory_size=20, finite_difference=PhysicalQuantity(0.01, Ang), line_search=True)

Constructor for the L-BFGS optimizer method.

Parameters:
  • memory_size (int) – The maximum number of previous steps that will be stored in memory in order to approximate the Hessian matrix.
    Default: 20
  • finite_difference (PhysicalQuantity with length units) – The finite-difference step length used to approximate the curvature along the initial descent direction.
    Default: 0.01*Angstrom
  • line_search (bool) – Controls if a line search is used.
    Default: True if the objective function supports line searches

Usage Examples

Optimize the geometry of a water molecule using the LBFGS algorithm.

# Define elements
elements = [Oxygen, Hydrogen, Hydrogen]

# Define coordinates
cartesian_coordinates = [[  0.0,  -1.70000000e-05,   1.20198000e-01],
                         [  0.0,   7.59572000e-01,  -4.86714000e-01],
                         [  0.0,  -7.59606000e-01,  -4.86721000e-01]]*Angstrom

# Set up configuration
molecule_configuration = MoleculeConfiguration(
    elements=elements,
    cartesian_coordinates=cartesian_coordinates
    )

# Define a calculator
molecule_configuration.setCalculator(LCAOCalculator())

# Perform optimization using the LBFGS algorithm.
OptimizeGeometry(molecule_configuration, optimizer_method=LBFGS())


Notes

LBFGS is the recommended optimizer in QuantumATK. It typically outperforms the FIRE optimizer on nearly every optimization problem.

Although QuantumATK does not ship a dedicated steepest-descent optimizer, the LBFGS optimizer behaves as one when memory_size is set to zero.
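To see why zero memory gives steepest descent, consider the L-BFGS two-loop recursion that builds the search direction from a history of stored steps. The sketch below is illustrative, not the QuantumATK implementation: with an empty history both loops are skipped and the returned direction is simply the negative gradient, i.e. a steepest-descent step.

```python
def lbfgs_direction(grad, history):
    """Return the L-BFGS search direction via the two-loop recursion.

    grad    -- gradient as a list of floats
    history -- list of (s, y) pairs, where s is the coordinate change
               and y is the gradient change of a previous step
    """
    q = list(grad)
    stack = []
    # First loop: newest to oldest stored pair.
    for s, y in reversed(history):
        rho = 1.0 / sum(si * yi for si, yi in zip(s, y))
        alpha = rho * sum(si * qi for si, qi in zip(s, q))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        stack.append((rho, alpha, s, y))
    # Scale by a diagonal estimate of the inverse Hessian
    # (the identity when the history is empty).
    if history:
        s, y = history[-1]
        gamma = (sum(si * yi for si, yi in zip(s, y))
                 / sum(yi * yi for yi in y))
        q = [gamma * qi for qi in q]
    # Second loop: oldest to newest.
    for rho, alpha, s, y in reversed(stack):
        beta = rho * sum(yi * qi for yi, qi in zip(y, q))
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]

# With an empty history (memory_size=0) the direction is exactly
# the negative gradient, i.e. pure steepest descent.
print(lbfgs_direction([1.0, -2.0], []))  # [-1.0, 2.0]
```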

Algorithm Details

The value of the objective function is not explicitly minimized (unless a line search is used); instead, the optimizer minimizes the forces in the system to find a stationary point. This is important because it allows geometry optimizations, minimizations of nudged elastic bands (using the projected forces), and saddle-point optimizations (using min-mode projected forces) to be performed by one general optimizer.

When the forces are minimized directly (e.g. in a nudged elastic band optimization), it is not possible to use a line-search algorithm to determine the step size. Instead, the curvature along the current step direction is estimated, and the step length is chosen to minimize a one-dimensional quadratic model.

Without the aid of a line search, special care must be taken to keep the approximate Hessian positive definite (all eigenvalues positive); otherwise the optimization will not converge to a minimum. If a step would introduce negative curvature into the memory, the entire history is reset and a gradient-descent step is taken along the force.
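The reset rule above can be sketched as a curvature check on each new (s, y) pair before it enters the memory, where s is the coordinate change and y the gradient change of the last step. This is a hedged illustration of the idea, not the QuantumATK implementation:

```python
def update_history(history, s, y, memory_size=20):
    """Store (s, y) only if the curvature condition s.y > 0 holds;
    otherwise clear the whole history so the next step falls back
    to gradient descent."""
    curvature = sum(si * yi for si, yi in zip(s, y))
    if curvature <= 0.0:
        return []                      # reset: Hessian would go indefinite
    history = history + [(s, y)]
    return history[-memory_size:]      # keep only the most recent pairs

history = update_history([], [0.1, 0.0], [0.2, 0.0])        # positive curvature
print(len(history))                                          # 1
history = update_history(history, [0.1, 0.0], [-0.3, 0.0])  # negative curvature
print(len(history))                                          # 0 (history reset)
```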

The memory is also reset in the case that a step size larger than the maximum step length is proposed. This criterion indicates that the current coordinates are far away from a stationary point. Due to this, the algorithm behaves as the gradient descent method in the non-quadratic region of the potential energy surface (i.e. far from stationary points). This is the ideal behavior because attempting to minimize the approximate quadratic problem when the underlying true Hessian matrix is rapidly changing gives poor results.

Some objective functions support line searches. When possible, a back-tracking line search is used to ensure a reduction in the function value at each step. The estimated step size typically already reduces the function value, so the back-tracking search is only invoked when a “bad” step (one that increases the function value) is taken. A hard cap on the number of line-search iterations prevents the algorithm from getting “stuck”; if no decrease in the function value can be found within that limit, the line search gives up, takes a steepest-descent step, and resets the L-BFGS memory.
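A back-tracking line search with a hard iteration cap can be sketched as below. This is a generic Armijo-style sketch, not the QuantumATK routine; the failure flag stands in for the "give up, step along the force, and reset the memory" behavior described above.

```python
def backtracking_line_search(f, grad, x, d, alpha=1.0, c=1e-4,
                             shrink=0.5, max_steps=10):
    """Shrink the trial step until f decreases sufficiently along d.

    Returns (alpha, ok). ok=False means no decrease was found within
    max_steps, signalling the caller to reset the L-BFGS memory and
    take a steepest-descent step instead.
    """
    f0 = f(x)
    slope = sum(gi * di for gi, di in zip(grad, d))  # directional derivative
    for _ in range(max_steps):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= f0 + c * alpha * slope:       # sufficient decrease
            return alpha, True
        alpha *= shrink                              # back-track
    return alpha, False                              # hard cap reached

# Simple quadratic f(x) = x0^2 + x1^2; gradient at (1, 1) is (2, 2).
f = lambda x: x[0] ** 2 + x[1] ** 2
alpha, ok = backtracking_line_search(f, [2.0, 2.0], [1.0, 1.0], [-2.0, -2.0])
print(alpha, ok)  # 0.5 True
```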

Details about this algorithm can be found in [LN89].

[LN89] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989. doi:10.1007/BF01589116.