## Parallel Bayesian Optimization

### Optimize in Parallel

Running Bayesian optimization in parallel can save time, and requires Parallel Computing Toolbox™. `bayesopt` evaluates the objective function concurrently on parallel workers.

To optimize in parallel:

• `bayesopt` — Set the `UseParallel` name-value pair to `true`. For example,

`results = bayesopt(fun,vars,'UseParallel',true);`
• Fit functions — Set the `UseParallel` field of the `HyperparameterOptimizationOptions` structure to `true`. For example,

```matlab
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true))
```

### Parallel Bayesian Algorithm

The parallel Bayesian optimization algorithm is similar to the serial algorithm, which is described in Bayesian Optimization Algorithm. The differences are:

• `bayesopt` assigns points to the parallel workers for evaluation, generally one point at a time. The calculation that chooses each point runs on the client.

• After `bayesopt` evaluates the initial random points, it chooses points to evaluate by fitting a Gaussian process (GP) model. To fit a GP model while some workers are still evaluating points, `bayesopt` imputes a value to each point that is still on a worker. The imputed value is the mean of the GP model at that point, or another value as specified by the `bayesopt` `'ParallelMethod'` name-value pair. For parallel optimization of fit functions, `bayesopt` uses the default `'ParallelMethod'` value.

• After `bayesopt` assigns a point to evaluate, and before it computes a new point to assign, it checks whether too many workers are idle. The threshold for active workers is determined by the `MinWorkerUtilization` name-value pair. If too many workers are idle, then `bayesopt` assigns random points, chosen uniformly within bounds, to all idle workers. This step causes the workers to be active more quickly, but the workers have random points rather than fitted points. If the number of idle workers does not exceed the threshold, then `bayesopt` chooses a point to evaluate as usual, by fitting a GP model and maximizing the acquisition function.
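Both of these algorithmic controls map to `bayesopt` name-value pairs. The following is a sketch only, assuming `fun` and `vars` are an objective function and variable array defined elsewhere:

```matlab
% Sketch, assuming fun and vars are defined as in the earlier examples.
% 'ParallelMethod' selects the imputed value for points still on workers;
% 'MinWorkerUtilization' sets the active-worker threshold described above.
results = bayesopt(fun,vars, ...
    'UseParallel',true, ...
    'ParallelMethod','clipped-model-prediction', ... % the default imputation
    'MinWorkerUtilization',3); % assign random points when fewer than 3 workers are active
```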

Note

Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results.

### Settings for Best Parallel Performance

Fit functions have no special settings for better parallel performance. In contrast, several `bayesopt` settings can help to speed an optimization.

#### Solver Options

Setting the `GPActiveSetSize` option to a smaller value than the default (`300`) can speed the solution. The cost is potential inaccuracy in the points that `bayesopt` chooses to evaluate, because the GP model of the objective function can be less accurate than with a larger value. Setting the option to a larger value can result in a more accurate GP model, but requires more time to create the model.

Setting the `ParallelMethod` option to `'max-observed'` can lead `bayesopt` to search more widely for a global optimum. This choice can lead to a better solution in less time. However, the default value of `'clipped-model-prediction'` is often best.

Setting the `MinWorkerUtilization` option to a large value can result in higher parallel utilization. However, this setting causes more completely random points to be evaluated, which can lead to less accurate solutions. A large value, in this context, depends on how many workers you have. The default is `floor(0.8*N)`, where `N` is the number of parallel workers. Setting the option to a lower value can give lower parallel utilization, but with the benefit of higher quality points.
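You can combine these solver options in one call. The values below are a sketch, not a recommendation, and `fun` and `vars` are assumed to be defined as in the earlier examples:

```matlab
% Sketch: options that trade speed against accuracy.
results = bayesopt(fun,vars, ...
    'UseParallel',true, ...
    'GPActiveSetSize',100, ...           % below the default 300: faster, possibly less accurate GP
    'ParallelMethod','max-observed', ... % search more widely than the default
    'MinWorkerUtilization',3);           % default is floor(0.8*N) for N workers
```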

#### Placing the Objective Function on Workers

You can place an objective function on the parallel workers in one of three ways. Some methods have better performance but require a more complex setup.

1. Automatic: If you give a function handle as the objective function, `bayesopt` sends the handle to all the parallel workers at the beginning of its run. For example,

```matlab
load ionosphere
splits = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'KFold',5, ...
    'MaxNumSplits',params.splits,'MinLeafSize',params.minleaf));
results = bayesopt(fun,[splits,minleaf],'UseParallel',true);
```

This method is effective if the handle is small, or if you run the optimization only once. However, if you plan to run the optimization several times, you can save time by using one of the other two techniques.

2. Parallel constant: If you plan to run an optimization several times, save time by transferring the objective function to the workers only once. This technique is especially effective when the function handle incorporates a large amount of data. Transfer the objective once by setting the function handle to a `parallel.pool.Constant` (Parallel Computing Toolbox) construct, as in this example.

```matlab
load ionosphere
splits = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'KFold',5, ...
    'MaxNumSplits',params.splits,'MinLeafSize',params.minleaf));
C = parallel.pool.Constant(fun);
results1 = bayesopt(C,[splits,minleaf],'UseParallel',true);
results2 = bayesopt(C,[splits,minleaf],'UseParallel',true, ...
    'MaxObjectiveEvaluations',50);
results3 = bayesopt(C,[splits,minleaf],'UseParallel',true, ...
    'AcquisitionFunctionName','expected-improvement');
```

In this example, the `parallel.pool.Constant` construct sends the function handle to the workers only once, and all three `bayesopt` calls reuse it.

3. Create objective function on workers: If you have a great deal of data to send to the workers, you can avoid loading the data in the client by using `spmd` (Parallel Computing Toolbox) to load the data on the workers. Use a `Composite` (Parallel Computing Toolbox) with `parallel.pool.Constant` to access the distributed objective functions.

```matlab
% makeFun is at the end of this script
spmd
    fun = makeFun();
end
% fun is now a Composite. Get a parallel.pool.Constant
% that refers to it, without copying it to the client:
C = parallel.pool.Constant(fun);
% You could also use the line
%   C = parallel.pool.Constant(@makeFun);
% In this case, you do not use spmd.

% Call bayesopt, passing the Constant
splits = optimizableVariable('splits',[1 100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1 100],'Type','integer');
bo = bayesopt(C,[splits minleaf],'UseParallel',true);

function f = makeFun()
load('ionosphere','X','Y');
f = @fun;
    function L = fun(params)
        L = kfoldLoss(fitctree(X,Y, ...
            'KFold',5, ...
            'MaxNumSplits',params.splits, ...
            'MinLeafSize',params.minleaf));
    end
end
```

In this example, the function handle exists only on the workers. The handle never appears on the client.

### Differences in Parallel Bayesian Optimization Output

When `bayesopt` runs in parallel, the Bayesian optimization output includes these differences.

• Iterative Display — Iterative display includes a column showing the number of active workers. This count is taken immediately after `bayesopt` assigns a job to the next worker.

• Plot Functions

• Objective Function Model plot (`@plotObjectiveModel`) shows the pending points (those points executing on parallel workers). The height of the points depends on the `ParallelMethod` name-value pair.

• Elapsed Time plot (`@plotElapsedTime`) shows the total elapsed time with the label Real time and the total objective function evaluation time, summed over all workers, with the label Objective evaluation time (all workers). Objective evaluation time includes the time to start a worker on a job.
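To see only the two plots discussed above, you can request them explicitly. A minimal sketch, assuming `fun` and `vars` are defined as in the earlier examples:

```matlab
% Sketch: show only the objective model and elapsed time plots.
results = bayesopt(fun,vars,'UseParallel',true, ...
    'PlotFcn',{@plotObjectiveModel,@plotElapsedTime});
```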
