
lbfgsState

State of limited-memory BFGS (L-BFGS) solver

Since R2023a

    Description

    An lbfgsState object stores information about steps in the L-BFGS algorithm.

    The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Use the L-BFGS algorithm for small networks and data sets that you can process in a single batch.

    Use lbfgsState objects in conjunction with the lbfgsupdate function to train a neural network using the L-BFGS algorithm.

    Creation

    Description

    solverState = lbfgsState creates an L-BFGS state object with a history size of 10 and an initial inverse Hessian factor of 1.


    solverState = lbfgsState(Name=Value) sets the HistorySize and InitialInverseHessianFactor properties using one or more name-value arguments.


    Properties


    L-BFGS State

    HistorySize

    Number of state updates to store, specified as a positive integer. The default value is 10. Values between 3 and 20 suit most tasks.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    After creating the lbfgsState object, this property is read-only.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
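    The memory saving that motivates a small history size can be checked with simple arithmetic. The following MATLAB sketch uses a hypothetical parameter count n and the default history size of 10. It assumes that each stored step and each stored gradient difference holds n numbers, which is the standard L-BFGS bookkeeping rather than a documented detail of lbfgsState.

```matlab
% Hypothetical problem size, chosen only for illustration.
n = 1e6;   % number of learnable parameters (assumption)
m = 10;    % history size (the lbfgsState default)

% A dense Hessian or inverse Hessian needs n-by-n numbers.
denseCount = n^2;

% L-BFGS keeps m steps and m gradient differences of n numbers each,
% plus one scalar inverse Hessian factor (assumed bookkeeping).
lbfgsCount = 2*m*n + 1;

fprintf("Dense: %g numbers, L-BFGS: %g numbers\n", denseCount, lbfgsCount);
```

    Doubling the history size doubles the L-BFGS storage, but it stays linear in n rather than quadratic.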

    InitialInverseHessianFactor

    Initial value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar. The default value is 1.

    To save memory, the L-BFGS algorithm does not store or invert the dense Hessian matrix $B$. Instead, the algorithm uses the approximation $B_{k-m}^{-1} \approx \lambda_k I$, where $m$ is the history size, the inverse Hessian factor $\lambda_k$ is a scalar, and $I$ is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

    The initial inverse Hessian factor is the value of $\lambda_0$.

    For more information, see Limited-Memory BFGS.
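    Because the initial approximation is a multiple of the identity matrix, applying it to a gradient reduces to scaling that gradient. This MATLAB sketch compares the dense product with the scalar form for a hypothetical gradient vector g and a factor of 1.1. It assumes that, before any history exists, the search direction is the usual quasi-Newton direction $-\lambda_0 g$.

```matlab
% Hypothetical gradient and initial inverse Hessian factor.
g = [0.5; -2; 1.5];
lambda0 = 1.1;

% Dense form: build the n-by-n matrix lambda0*I explicitly.
dDense = -(lambda0*eye(numel(g)))*g;

% Scalar form: the same direction, but only one number is stored.
dScalar = -lambda0*g;

fprintf("Largest difference: %g\n", max(abs(dDense - dScalar)));
```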

    After creating the lbfgsState object, this property is read-only.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    InverseHessianFactor

    Value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

    To save memory, the L-BFGS algorithm does not store or invert the dense Hessian matrix $B$. Instead, the algorithm uses the approximation $B_{k-m}^{-1} \approx \lambda_k I$, where $m$ is the history size, the inverse Hessian factor $\lambda_k$ is a scalar, and $I$ is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

    For more information, see Limited-Memory BFGS.
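    The description above does not give the rule that updates the factor. A common choice in L-BFGS implementations (see, for example, Nocedal and Wright, Numerical Optimization) is $\lambda_k = s^\top y / y^\top y$, using the most recent step $s$ and gradient difference $y$. This MATLAB sketch assumes that rule and a hypothetical quadratic loss with Hessian A, for which $y = A s$. The resulting factor lies between the reciprocals of the largest and smallest eigenvalues of A, so $\lambda_k I$ acts as a rough inverse Hessian.

```matlab
% Hypothetical quadratic loss with Hessian A, so that y = A*s exactly.
A = diag([2 4 8]);
s = [1; 1; 1];   % most recent step (hypothetical)
y = A*s;         % matching gradient difference

% Assumed update rule (common L-BFGS scaling, not confirmed by this page).
lambda = (s'*y)/(y'*y);

% For a quadratic, lambda lies between 1/max(eig(A)) and 1/min(eig(A)).
fprintf("lambda = %.4f, range = [%.4f, %.4f]\n", ...
    lambda, 1/max(eig(A)), 1/min(eig(A)));
```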

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    InitialGradientsNorm

    Since R2023b

    This property is read-only.

    Norm of the initial gradients, specified as a dlarray scalar or [].

    If the state object is the output of the lbfgsupdate function, then InitialGradientsNorm is the first value that the GradientsNorm property takes. Otherwise, InitialGradientsNorm is [].

    InitialStepSize

    Since R2024b

    Initial step size, specified as one of these values:

    • [] — Do not use an initial step size to determine the initial Hessian approximation.

    • "auto" — Determine the initial step size automatically. The software uses an initial step size of $s_0 = \frac{1}{2}\|W_0\|_\infty + 0.1$, where $W_0$ are the initial learnable parameters of the network.

    • Positive real scalar — Use the specified value as the initial step size $s_0$.

    If InitialStepSize is "auto" or a positive real scalar, then the software approximates the initial inverse Hessian using $\lambda_0 = \frac{s_0}{\|\nabla J(W_0)\|_\infty}$, where $\lambda_0$ is the initial inverse Hessian factor and $\nabla J(W_0)$ denotes the gradients of the loss with respect to the initial learnable parameters. For more information, see Limited-Memory BFGS.
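    The following MATLAB sketch evaluates the "auto" initial step size and the resulting initial inverse Hessian factor for hypothetical parameter and gradient vectors W0 and G0. It assumes that the norms in these formulas are infinity norms (largest absolute entry) taken over all learnable parameters flattened into a single vector. Under that reading, the first step $-\lambda_0 \nabla J(W_0)$ has largest absolute entry exactly $s_0$, which the last line prints.

```matlab
% Hypothetical initial learnables and gradients, flattened into vectors.
W0 = [0.3; -1.2; 0.8];
G0 = [0.05; -0.4; 0.2];

% "auto" initial step size: half the largest absolute weight, plus 0.1.
s0 = 0.5*norm(W0, Inf) + 0.1;

% Initial inverse Hessian factor: step size divided by gradient infinity norm.
lambda0 = s0/norm(G0, Inf);

% The first step -lambda0*G0 then has largest absolute entry s0.
firstStep = -lambda0*G0;
fprintf("s0 = %g, lambda0 = %g, max|step| = %g\n", ...
    s0, lambda0, norm(firstStep, Inf));
```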

    StepHistory

    Step history, specified as a cell array.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    Data Types: cell

    GradientsDifferenceHistory

    Gradients difference history, specified as a cell array.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    Data Types: cell
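    In standard L-BFGS notation, each history entry pairs a step $s_k = W_{k+1} - W_k$ with a gradient difference $y_k = \nabla J(W_{k+1}) - \nabla J(W_k)$. This MATLAB sketch computes one such pair for a hypothetical quadratic loss whose gradient is A*w - b. It assumes that StepHistory and GradientsDifferenceHistory hold these quantities, and it uses plain column vectors for simplicity.

```matlab
% Hypothetical quadratic loss J(w) = 0.5*w'*A*w - b'*w with gradient A*w - b.
A = [3 1; 1 2];
b = [1; 1];
gradJ = @(w) A*w - b;

wOld = [0; 0];      % parameters before the step
wNew = [0.2; 0.1];  % parameters after the step

s = wNew - wOld;                % step
y = gradJ(wNew) - gradJ(wOld);  % gradient difference

% For a quadratic loss, y = A*s, and s'*y > 0 when A is positive definite.
fprintf("s'*y = %g\n", s'*y);
```

    The positive curvature $s^\top y > 0$ is what keeps the L-BFGS inverse Hessian approximation positive definite.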

    HistoryIndices

    History indices, specified as a row vector.

    HistoryIndices is a row vector with at most HistorySize elements, where StepHistory{i} and GradientsDifferenceHistory{i} correspond to iteration HistoryIndices(i).

    For more information, see Limited-Memory BFGS.

    Data Types: double
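    One simple way to keep at most HistorySize pairs is a first-in, first-out buffer that drops the oldest pair once the history is full. This MATLAB sketch uses hypothetical cell arrays, placeholder vectors, and a history size of 3. The actual lbfgsupdate bookkeeping may order or overwrite entries differently. After five iterations, the sketch keeps the pairs from iterations 3, 4, and 5.

```matlab
m = 3;                         % history size
stepHistory = {};              % hypothetical stand-in for StepHistory
gradDiffHistory = {};          % hypothetical stand-in for GradientsDifferenceHistory
historyIndices = zeros(1, 0);  % starts empty (1-by-0)

for k = 1:5
    s = k*ones(2, 1);          % placeholder step for iteration k
    y = -k*ones(2, 1);         % placeholder gradient difference
    stepHistory{end+1} = s;
    gradDiffHistory{end+1} = y;
    historyIndices(end+1) = k;
    if numel(historyIndices) > m
        % Discard the oldest pair so that at most m pairs remain.
        stepHistory(1) = [];
        gradDiffHistory(1) = [];
        historyIndices(1) = [];
    end
end
disp(historyIndices)
```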

    Iteration Information

    Loss

    This property is read-only.

    Loss, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then Loss is the first output of the loss function that you pass to the lbfgsupdate function. Otherwise, Loss is [].

    Gradients

    This property is read-only.

    Gradients, specified as a dlarray object, a numeric array, a cell array, a structure, a table, or [].

    If the state object is the output of the lbfgsupdate function, then Gradients is the second output of the loss function that you pass to the lbfgsupdate function. Otherwise, Gradients is [].

    AdditionalLossFunctionOutputs

    This property is read-only.

    Additional loss function outputs, specified as a cell array.

    If the state object is the output of the lbfgsupdate function, then AdditionalLossFunctionOutputs is a cell array containing additional outputs of the loss function that you pass to the lbfgsupdate function. Otherwise, AdditionalLossFunctionOutputs is a 1-by-0 cell array.

    Data Types: cell

    StepNorm

    This property is read-only.

    Norm of the step, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then StepNorm is the norm of the step that the lbfgsupdate function calculates. Otherwise, StepNorm is [].

    GradientsNorm

    This property is read-only.

    Norm of the gradients, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then GradientsNorm is the norm of the second output of the loss function that you pass to the lbfgsupdate function. Otherwise, GradientsNorm is [].

    LineSearchStatus

    This property is read-only.

    Status of the line search algorithm, specified as "", "completed", or "failed".

    If the state object is the output of the lbfgsupdate function, then LineSearchStatus is one of these values:

    • "completed" — The algorithm finds a learning rate that satisfies the LineSearchMethod and MaxNumLineSearchIterations options that the lbfgsupdate function uses.

    • "failed" — The algorithm fails to find a learning rate that satisfies the LineSearchMethod and MaxNumLineSearchIterations options that the lbfgsupdate function uses.

    Otherwise, LineSearchStatus is "".
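    To illustrate what "completed" and "failed" mean, this MATLAB sketch runs a generic backtracking line search with the Armijo sufficient-decrease condition on a hypothetical quadratic loss. The constant c1, the halving factor, and the cap maxIters are assumptions standing in for the options that lbfgsupdate actually uses. If no learning rate passes within maxIters tries, the status stays "failed".

```matlab
% Hypothetical quadratic loss and its gradient.
A = [3 1; 1 2];
b = [1; 1];
J = @(w) 0.5*w'*A*w - b'*w;
gradJ = @(w) A*w - b;

w = [2; -1];
g = gradJ(w);
d = -g;              % descent direction (here, steepest descent)

eta = 1;             % initial learning rate
c1 = 1e-4;           % sufficient-decrease constant (assumed)
maxIters = 20;       % cap on line search iterations (assumed)
status = "failed";

for iter = 1:maxIters
    % Accept eta if the loss decreases enough relative to the slope g'*d.
    if J(w + eta*d) <= J(w) + c1*eta*(g'*d)
        status = "completed";
        break
    end
    eta = eta/2;     % shrink the learning rate and try again
end
fprintf("Status: %s, learning rate: %g\n", status, eta);
```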

    LineSearchMethod

    This property is read-only.

    Method that the solver uses to find a suitable learning rate, specified as "weak-wolfe", "strong-wolfe", "backtracking", or "".

    If the state object is the output of the lbfgsupdate function, then LineSearchMethod is the line search method that the lbfgsupdate function uses. Otherwise, LineSearchMethod is "".
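    The "weak-wolfe" and "strong-wolfe" names refer to the standard Wolfe conditions on a candidate learning rate $\eta$: a sufficient-decrease (Armijo) condition plus a curvature condition, which the strong variant tightens by bounding the absolute value of the new directional derivative. This MATLAB sketch checks both for a hypothetical quadratic loss, with assumed constants c1 = 1e-4 and c2 = 0.9. It illustrates the textbook definitions, not the internals of lbfgsupdate. For the learning rate 0.8 used here, the weak conditions hold but the strong ones do not, because the step overshoots the minimum along the search direction.

```matlab
% Hypothetical quadratic loss, point, and descent direction.
A = [3 1; 1 2];
b = [1; 1];
J = @(w) 0.5*w'*A*w - b'*w;
gradJ = @(w) A*w - b;

w = [2; -1];
d = -gradJ(w);
c1 = 1e-4;                  % sufficient-decrease constant (assumed)
c2 = 0.9;                   % curvature constant (assumed)

eta = 0.8;                  % candidate learning rate
wNew = w + eta*d;
slope0 = gradJ(w)'*d;       % directional derivative at eta = 0
slope = gradJ(wNew)'*d;     % directional derivative at eta

armijo = J(wNew) <= J(w) + c1*eta*slope0;
weakWolfe = armijo && slope >= c2*slope0;
strongWolfe = armijo && abs(slope) <= c2*abs(slope0);
fprintf("weak: %d, strong: %d\n", weakWolfe, strongWolfe);
```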

    MaxNumLineSearchIterations

    This property is read-only.

    Maximum number of line search iterations, specified as a nonnegative integer.

    If the state object is the output of the lbfgsupdate function, then MaxNumLineSearchIterations is the maximum number of line search iterations that the lbfgsupdate function uses. Otherwise, MaxNumLineSearchIterations is 0.

    Data Types: double

    Examples


    Create an L-BFGS solver state object.

    solverState = lbfgsState
    solverState = 
      LBFGSState with properties:
    
                 InverseHessianFactor: 1
                          StepHistory: {}
           GradientsDifferenceHistory: {}
                       HistoryIndices: [1x0 double]
    
       Iteration Information
                                 Loss: []
                            Gradients: []
        AdditionalLossFunctionOutputs: {1x0 cell}
                        GradientsNorm: []
                             StepNorm: []
                     LineSearchStatus: ""
    
      Show all properties
    
    

    Read the transmission casing data from the CSV file "transmissionCasingData.csv".

    filename = "transmissionCasingData.csv";
    tbl = readtable(filename,TextType="string");

    Convert the labels for prediction to categorical using the convertvars function.

    labelName = "GearToothCondition";
    tbl = convertvars(tbl,labelName,"categorical");

    To train a network using categorical features, first convert the categorical predictors to categorical arrays using the convertvars function, specifying a string array that contains the names of all the categorical input variables.

    categoricalPredictorNames = ["SensorCondition" "ShaftCondition"];
    tbl = convertvars(tbl,categoricalPredictorNames,"categorical");

    Loop over the categorical input variables. For each variable, convert the categorical values to one-hot encoded vectors using the onehotencode function.

    for i = 1:numel(categoricalPredictorNames)
        name = categoricalPredictorNames(i);
        tbl.(name) = onehotencode(tbl.(name),2);
    end

    View the first few rows of the table.

    head(tbl)
        SigMean     SigMedian    SigRMS    SigVar     SigPeak    SigPeak2Peak    SigSkewness    SigKurtosis    SigCrestFactor    SigMAD     SigRangeCumSum    SigCorrDimension    SigApproxEntropy    SigLyapExponent    PeakFreq    HighFreqPower    EnvPower    PeakSpecKurtosis    SensorCondition    ShaftCondition    GearToothCondition
        ________    _________    ______    _______    _______    ____________    ___________    ___________    ______________    _______    ______________    ________________    ________________    _______________    ________    _____________    ________    ________________    _______________    ______________    __________________
    
        -0.94876     -0.9722     1.3726    0.98387    0.81571       3.6314        -0.041525       2.2666           2.0514         0.8081        28562              1.1429             0.031581            79.931            0          6.75e-06       3.23e-07         162.13             0    1             1    0          No Tooth Fault  
        -0.97537    -0.98958     1.3937    0.99105    0.81571       3.6314        -0.023777       2.2598           2.0203        0.81017        29418              1.1362             0.037835            70.325            0          5.08e-08       9.16e-08         226.12             0    1             1    0          No Tooth Fault  
          1.0502      1.0267     1.4449    0.98491     2.8157       3.6314         -0.04162       2.2658           1.9487        0.80853        31710              1.1479             0.031565            125.19            0          6.74e-06       2.85e-07         162.13             0    1             0    1          No Tooth Fault  
          1.0227      1.0045     1.4288    0.99553     2.8157       3.6314        -0.016356       2.2483           1.9707        0.81324        30984              1.1472             0.032088             112.5            0          4.99e-06        2.4e-07         162.13             0    1             0    1          No Tooth Fault  
          1.0123      1.0024     1.4202    0.99233     2.8157       3.6314        -0.014701       2.2542           1.9826        0.81156        30661              1.1469              0.03287            108.86            0          3.62e-06       2.28e-07         230.39             0    1             0    1          No Tooth Fault  
          1.0275      1.0102     1.4338     1.0001     2.8157       3.6314         -0.02659       2.2439           1.9638        0.81589        31102              1.0985             0.033427            64.576            0          2.55e-06       1.65e-07         230.39             0    1             0    1          No Tooth Fault  
          1.0464      1.0275     1.4477     1.0011     2.8157       3.6314        -0.042849       2.2455           1.9449        0.81595        31665              1.1417             0.034159            98.838            0          1.73e-06       1.55e-07         230.39             0    1             0    1          No Tooth Fault  
          1.0459      1.0257     1.4402    0.98047     2.8157       3.6314        -0.035405       2.2757            1.955        0.80583        31554              1.1345               0.0353            44.223            0          1.11e-06       1.39e-07         230.39             0    1             0    1          No Tooth Fault  
    

    Extract the training data.

    predictorNames = ["SigMean" "SigMedian" "SigRMS" "SigVar" "SigPeak" "SigPeak2Peak" ...
        "SigSkewness" "SigKurtosis" "SigCrestFactor" "SigMAD" "SigRangeCumSum" ...
        "SigCorrDimension" "SigApproxEntropy" "SigLyapExponent" "PeakFreq" ...
        "HighFreqPower" "EnvPower" "PeakSpecKurtosis" "SensorCondition" "ShaftCondition"];
    XTrain = table2array(tbl(:,predictorNames));
    numInputFeatures = size(XTrain,2);

    Extract the targets and convert them to one-hot encoded vectors.

    TTrain = tbl.(labelName);
    TTrain = onehotencode(TTrain,2);
    numClasses = size(TTrain,2);

    Convert the predictors and targets to dlarray objects with format "BC" (batch, channel).

    XTrain = dlarray(XTrain,"BC");
    TTrain = dlarray(TTrain,"BC");

    Define the network architecture.

    layers = [
        featureInputLayer(numInputFeatures)
        fullyConnectedLayer(16)
        layerNormalizationLayer
        reluLayer
        fullyConnectedLayer(numClasses)
        softmaxLayer];
    
    net = dlnetwork(layers);

    Define the modelLoss function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

    The lbfgsupdate function requires a loss function with the syntax [loss,gradients] = f(net). Create a function handle that evaluates the modelLoss function using dlfeval, with the training data fixed, so that it takes a single input argument.

    lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);

    Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.

    solverState = lbfgsState( ...
        HistorySize=3, ...
        InitialInverseHessianFactor=1.1);

    Train the network for a maximum of 200 iterations. Stop training early when the norm of the gradients or the norm of the step is smaller than 0.00001, or when the line search fails. Print the training loss at the first iteration and every 10 iterations.

    maxIterations = 200;
    gradientTolerance = 1e-5;
    stepTolerance = 1e-5;
    
    iteration = 0;
    
    while iteration < maxIterations
        iteration = iteration + 1;
        [net, solverState] = lbfgsupdate(net,lossFcn,solverState);
    
        if iteration==1 || mod(iteration,10)==0
            fprintf("Iteration %d: Loss: %e\n",iteration,extractdata(solverState.Loss));
        end
    
        if solverState.GradientsNorm < gradientTolerance || ...
                solverState.StepNorm < stepTolerance || ...
                solverState.LineSearchStatus == "failed"
            break
        end
    end
    Iteration 1: Loss: 9.343236e-01
    Iteration 10: Loss: 4.721475e-01
    Iteration 20: Loss: 4.678575e-01
    Iteration 30: Loss: 4.666964e-01
    Iteration 40: Loss: 4.665921e-01
    Iteration 50: Loss: 4.663871e-01
    Iteration 60: Loss: 4.662519e-01
    Iteration 70: Loss: 4.660451e-01
    Iteration 80: Loss: 4.645303e-01
    Iteration 90: Loss: 4.591753e-01
    Iteration 100: Loss: 4.562556e-01
    Iteration 110: Loss: 4.531167e-01
    Iteration 120: Loss: 4.489444e-01
    Iteration 130: Loss: 4.392228e-01
    Iteration 140: Loss: 4.347853e-01
    Iteration 150: Loss: 4.341757e-01
    Iteration 160: Loss: 4.325102e-01
    Iteration 170: Loss: 4.321948e-01
    Iteration 180: Loss: 4.318990e-01
    Iteration 190: Loss: 4.313784e-01
    Iteration 200: Loss: 4.311314e-01
    

    Model Loss Function

    The modelLoss function takes as input a neural network net, input data X, and targets T. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

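    function [loss, gradients] = modelLoss(net, X, T)
    
    % Pass the input data through the network.
    Y = forward(net,X);
    
    % Compute the cross-entropy loss between the predictions and targets.
    loss = crossentropy(Y,T);
    
    % Compute the gradients of the loss with respect to the learnables.
    gradients = dlgradient(loss,net.Learnables);
    
    end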

    Algorithms
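    The property descriptions on this page point to the Limited-Memory BFGS algorithm, which combines the stored steps, gradient differences, and inverse Hessian factor. As a hedged sketch of the textbook approach (the two-loop recursion described, for example, in Nocedal and Wright's Numerical Optimization), rather than the exact lbfgsupdate implementation, the following MATLAB code computes the search direction $-B_k^{-1}\nabla J(W_k)$ from a hypothetical history of two pairs without forming any matrix. It assumes the common choice $\lambda_k = s^\top y / y^\top y$ for the most recent pair.

```matlab
% Hypothetical history of two pairs (oldest first) from a quadratic
% loss with Hessian A, so that each Y{i} equals A*S{i}.
A = [3 1 0; 1 2 0; 0 0 4];
S = {[1; 0; 0], [0; 1; 1]};    % steps
Y = {A*S{1}, A*S{2}};          % gradient differences
g = [1; -2; 0.5];              % current gradient (hypothetical)

m = numel(S);
% Inverse Hessian factor from the most recent pair (assumed rule).
lambda = (S{m}'*Y{m})/(Y{m}'*Y{m});

% First loop: newest pair to oldest.
q = g;
alpha = zeros(1, m);
rho = zeros(1, m);
for i = m:-1:1
    rho(i) = 1/(Y{i}'*S{i});
    alpha(i) = rho(i)*(S{i}'*q);
    q = q - alpha(i)*Y{i};
end

% Apply the initial approximation lambda*I as a scalar multiply.
r = lambda*q;

% Second loop: oldest pair to newest.
for i = 1:m
    beta = rho(i)*(Y{i}'*r);
    r = r + S{i}*(alpha(i) - beta);
end

% r approximates inv(B)*g, so the search direction is -r.
% A line search would then choose eta for the update W = W + eta*direction.
direction = -r;
fprintf("g'*direction = %g (negative means descent)\n", g'*direction);
```

    The recursion costs a few vector operations per stored pair, which is why memory and work grow with HistorySize rather than with the square of the number of learnable parameters.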


    References

    [1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." Mathematical Programming 45, no. 1 (August 1989): 503–528. https://doi.org/10.1007/BF01589116.

    Version History

    Introduced in R2023a

