
estimateMAP

Class: HamiltonianSampler

Estimate maximum of log probability density

Syntax

xhat = estimateMAP(smp)
[xhat,fitinfo] = estimateMAP(smp)
[xhat,fitinfo] = estimateMAP(___,Name,Value)

Description

xhat = estimateMAP(smp) returns the point xhat that maximizes the log probability density of the Hamiltonian Monte Carlo sampler smp, that is, the maximum-a-posteriori (MAP) point estimate.

[xhat,fitinfo] = estimateMAP(smp) returns additional fitting information in fitinfo.

[xhat,fitinfo] = estimateMAP(___,Name,Value) specifies additional options using one or more name-value pair arguments. Specify name-value pair arguments after all other input arguments.
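
A compact usage sketch of these call forms, assuming smp is an existing HamiltonianSampler object (the 'IterationLimit' value is illustrative):

xhat = estimateMAP(smp);                                 % MAP point only
[xhat,fitinfo] = estimateMAP(smp,'IterationLimit',200);  % also return fit information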

Input Arguments


smp

Hamiltonian Monte Carlo sampler, specified as a HamiltonianSampler object.

estimateMAP estimates the maximum of the log probability density specified in smp.LogPDF.

Use the hmcSampler function to create a sampler.
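
A minimal creation sketch (illustrative only; the 2-D standard normal log density and the use of numerical gradients are assumptions, not requirements):

logpdf = @(theta) -0.5*sum(theta.^2);   % log density, up to an additive constant
smp = hmcSampler(logpdf,zeros(2,1),'UseNumericalGradient',true);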

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'IterationLimit',100,'StepTolerance',1e-5 estimates the MAP point using an iteration limit of 100 and a step size convergence tolerance of 1e-5.

'StartPoint'

Initial point to start optimization from, specified as a numeric column vector with the same number of elements as the StartPoint property of the sampler smp.

Example: 'StartPoint',randn(size(smp.StartPoint))

'IterationLimit'

Maximum number of optimization iterations, specified as a positive integer.

Example: 'IterationLimit',100

'VerbosityLevel'

Verbosity level of Command Window output during function maximization, specified as 0 or a positive integer.

  • With the value set to 0, estimateMAP displays no details on the optimization.

  • With the value set to a positive integer, estimateMAP displays convergence information at each iteration.

Convergence Information

  • FUN VALUE: Objective function value.

  • NORM GRAD: Norm of the gradient of the objective function.

  • NORM STEP: Norm of the iterative step, that is, the distance between the previous point and the current point.

  • CURV: OK means the weak Wolfe condition is satisfied. This condition is a combination of sufficient decrease of the objective function and a curvature condition.

  • GAMMA: Inner product of the step times the gradient difference, divided by the inner product of the gradient difference with itself. The gradient difference is the gradient at the current point minus the gradient at the previous point. Gives diagnostic information on the objective function curvature.

  • ALPHA: Step direction multiplier, which differs from 1 when the algorithm performed a line search.

  • ACCEPT: YES means the algorithm found an acceptable step to take.

Example: 'VerbosityLevel',1

'GradientTolerance'

Relative gradient convergence tolerance, specified as a positive scalar.

Let tau = max(1,min(abs(f),infnormg0)), where f is the current objective function value and infnormg0 is the infinity norm of the initial gradient. If the infinity norm of the objective function gradient is smaller than tau times the 'GradientTolerance' value, then the maximization is considered to have converged to a local optimum.

Example: 'GradientTolerance',1e-4

'StepTolerance'

Step size convergence tolerance, specified as a positive scalar.

If the proposed step size is smaller than the 'StepTolerance' value, then the maximization is considered to have converged to a local optimum.

Example: 'StepTolerance',1e-5
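
The two stopping tests can be restated as in the following sketch. The variable names (f, g, g0, step, gradTol, stepTol) are illustrative placeholders; this is a restatement of the criteria above, not the solver's internal code.

% f: current objective value, g: current gradient, g0: initial gradient,
% step: proposed step, gradTol/stepTol: the tolerance values.
tau = max(1,min(abs(f),norm(g0,Inf)));
gradConverged = norm(g,Inf) < tau*gradTol;   % 'GradientTolerance' test
stepConverged = norm(step) < stepTol;        % 'StepTolerance' test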

Output Arguments


xhat

MAP point estimate, returned as a numeric vector of the same size as smp.StartPoint.

fitinfo

Fitting information for the MAP computation, returned as a structure with these fields:

  • Iteration: Iteration indices from 0 through the final iteration.

  • Objective: Negative log probability density at each iteration. The MAP point is computed by minimizing the negative log probability density. You can check that the final values are all similar, indicating that the function optimization has converged to a local optimum.

  • Gradient: Gradient of the negative log probability density at the final iteration.

Data Types: struct
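
For example, this sketch (assuming smp already exists) inspects fitinfo to confirm that the optimization terminated near a stationary point:

[xhat,fitinfo] = estimateMAP(smp);
finalGradNorm = norm(fitinfo.Gradient,Inf)               % expect a value near zero
lastObjChange = abs(diff(fitinfo.Objective(end-1:end)))  % expect a small change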

Examples


Create a Hamiltonian Monte Carlo sampler for a normal distribution and estimate the maximum-a-posteriori (MAP) point of the log probability density.

First, save a function normalDistGrad on the MATLAB® path that returns the multivariate normal log probability density and its gradient (normalDistGrad is defined at the end of this example). Then, call the function with arguments to define the logpdf input argument to the hmcSampler function.

means = [1;-1];
standevs = [1;0.3];
logpdf = @(theta)normalDistGrad(theta,means,standevs);

Choose a starting point and create the HMC sampler.

startpoint = zeros(2,1);
smp = hmcSampler(logpdf,startpoint);

Estimate the MAP point (the point where the probability density has its maximum). Show more information during optimization by setting the 'VerbosityLevel' value to 1.

[xhat,fitinfo] = estimateMAP(smp,'VerbosityLevel',1);
 o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe

|====================================================================================================|
|   ITER   |   FUN VALUE   |  NORM GRAD  |  NORM STEP  |  CURV  |    GAMMA    |    ALPHA    | ACCEPT |
|====================================================================================================|
|        0 |  6.689460e+00 |   1.111e+01 |   0.000e+00 |        |   9.000e-03 |   0.000e+00 |   YES  |
|        1 |  4.671622e+00 |   8.889e+00 |   2.008e-01 |    OK  |   9.006e-02 |   2.000e+00 |   YES  |
|        2 |  9.759850e-01 |   8.268e-01 |   8.215e-01 |    OK  |   9.027e-02 |   1.000e+00 |   YES  |
|        3 |  9.158025e-01 |   7.496e-01 |   7.748e-02 |    OK  |   5.910e-01 |   1.000e+00 |   YES  |
|        4 |  6.339508e-01 |   3.104e-02 |   7.472e-01 |    OK  |   9.796e-01 |   1.000e+00 |   YES  |
|        5 |  6.339043e-01 |   3.668e-05 |   3.762e-03 |    OK  |   9.599e-02 |   1.000e+00 |   YES  |
|        6 |  6.339043e-01 |   2.488e-08 |   3.333e-06 |    OK  |   9.015e-02 |   1.000e+00 |   YES  |

         Infinity norm of the final gradient = 2.488e-08
              Two norm of the final step     = 3.333e-06, TolX   = 1.000e-06
Relative infinity norm of the final gradient = 2.488e-08, TolFun = 1.000e-06
EXIT: Local minimum found.

To further check that the optimization has converged to a local minimum, plot the fitinfo.Objective field. This field contains the values of the negative log density at each iteration of the function optimization. The final values are all very similar, so the optimization has converged.

fitinfo
fitinfo = struct with fields:
    Iteration: [7x1 double]
    Objective: [7x1 double]
     Gradient: [2x1 double]

plot(fitinfo.Iteration,fitinfo.Objective,'ro-');
xlabel('Iteration');
ylabel('Negative log density');

Display the MAP estimate. It is equal, up to numerical precision, to the means variable, which is the exact maximum.

xhat
xhat = 2×1

    1.0000
   -1.0000

means
means = 2×1

     1
    -1

The normalDistGrad function returns the logarithm of the multivariate normal probability density with means in Mu and standard deviations in Sigma, specified as scalars or column vectors of the same length as startpoint. The second output argument is the corresponding gradient.

function [lpdf,glpdf] = normalDistGrad(X,Mu,Sigma)
% Standardized residuals
Z = (X - Mu)./Sigma;
% Sum of independent univariate normal log densities
lpdf = sum(-log(Sigma) - .5*log(2*pi) - .5*(Z.^2));
% Gradient of the log density with respect to X
glpdf = -Z./Sigma;
end

Tips

  • First create a Hamiltonian Monte Carlo sampler using the hmcSampler function, and then use estimateMAP to estimate the MAP point.

  • After creating an HMC sampler, you can tune the sampler, draw samples, and check convergence diagnostics using the other methods of the HamiltonianSampler class. Using the MAP estimate as a starting point in the tuneSampler and drawSamples methods can lead to more efficient tuning and sampling, as sketched after this list. For an example of this workflow, see Bayesian Linear Regression Using Hamiltonian Monte Carlo.
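
A minimal sketch of that workflow. The 'StartPoint' name-value argument of tuneSampler and drawSamples, and the 'Burnin' and 'NumSamples' arguments of drawSamples, are assumed here to mirror the estimateMAP interface; the burn-in and sample counts are illustrative.

% Use the MAP estimate as the starting point for tuning and sampling.
xhat = estimateMAP(smp);
smp = tuneSampler(smp,'StartPoint',xhat);
chain = drawSamples(smp,'StartPoint',xhat,'Burnin',500,'NumSamples',1000);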

Algorithms

  • estimateMAP uses a limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton optimizer to search for the maximum of the log probability density. See Nocedal and Wright [1].

References

[1] Nocedal, J. and S. J. Wright. Numerical Optimization, Second Edition. Springer Series in Operations Research, Springer Verlag, 2006.

Version History

Introduced in R2017a