Issues with reproducibility in multistart with parallelization
I am running a large number of model fits (1139) using MultiStart with parallelization. I first ran each fit with 50 start points (including my initial guess), and then wanted to re-run with 150 start points to compare the reduction in fval. For the 50-point run, I batched my fits manually across a handful of nodes and saved the rng state to a MAT-file. For the 150-point run, I loaded that rng state and batched the fits in the same way, except for a set of about 100 fits whose node was unavailable.
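For reference, the save-and-restore pattern described above might look roughly like the sketch below. The seed, the file name `rng_state.mat`, and the variable names are hypothetical and chosen only for illustration; the actual scripts may differ.

```matlab
% Before the first run: fix and capture the generator state.
rng(0, 'twister');                 % hypothetical seed, for illustration only
state = rng;                       % struct describing the current generator state
save('rng_state.mat', 'state');    % hypothetical file name
a = rand(1, 3);                    % stand-in for whatever consumes random numbers

% Before a later run (possibly on another node): restore the saved state.
loaded = load('rng_state.mat');
rng(loaded.state);
b = rand(1, 3);                    % same number and shape of draws as above

sameDraws = isequal(a, b);         % compares the two sets of draws
```

Restoring the state this way reproduces a sequence of draws when the same number of values is requested in the same order, which matches the identical results seen when the same set of fits was repeated on two nodes.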
To test the effect of the node (I think I read that the node influences the random number generator, and was curious what would happen with reproducibility), I ran that set of 100 fits on two different nodes, loading the same state each time. In this case, I got identical outputs and identical function values.
However, I did not get good reproducibility from the 50 start point run to the 150 start point run. 27% of the 1139 fits were worse (had higher function values in a minimization problem) with 150 start points than with 50. I also found that, of the 1139 fits, 17% had an fval more than 1% higher and 3% had an fval more than 10% higher. I thought maybe it's rounding or something, but this seems pretty high.
What am I missing? How can I make these fits reproducible?
Answers (1)
Matt J
on 2 Sep 2025 at 20:53
Edited: Matt J
on 2 Sep 2025 at 20:56
I don't see why you would expect agreement between a 50-point multistart and a 150-point multistart. Only if both versions succeed in finding the global minimum would the results be guaranteed to agree.
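If the goal is for the 150-point run to be at least as good as the 50-point run, one possible approach is to make the 50 start points an explicit subset of the 150, rather than relying on a restored rng state to regenerate them. The following is only a sketch under stated assumptions: the objective, bounds, seed, and file name `startpoints.mat` are hypothetical stand-ins for the real fitting problem, and it assumes the local solver is deterministic for a given start point.

```matlab
% Hypothetical multimodal objective and bounds; substitute the real model fit.
objective = @(p) sum((p - [1 2]).^2) + 3*sin(5*p(1))^2;
lb = [-5 -5];
ub = [5 5];
x0 = [0 0];                                 % initial guess

problem = createOptimProblem('fmincon', 'objective', objective, ...
    'x0', x0, 'lb', lb, 'ub', ub, ...
    'options', optimoptions('fmincon', 'Display', 'off'));

% Draw all random start points once and put the initial guess in the first row.
rng(42, 'twister');                         % hypothetical seed
randomPts = list(RandomStartPointSet('NumStartPoints', 149), problem);
allPts = [x0; randomPts];                   % 150-by-2 matrix of start points
save('startpoints.mat', 'allPts');          % reuse this file on every node

ms = MultiStart('UseParallel', true, 'Display', 'off');

% The 50-point run uses the first 50 rows; the 150-point run uses all rows.
[~, fval50]  = run(ms, problem, CustomStartPointSet(allPts(1:50, :)));
[~, fval150] = run(ms, problem, CustomStartPointSet(allPts));
```

Because every start point of the smaller run is also in the larger run, `fval150` should not exceed `fval50` beyond floating-point noise, as long as each local solve converges the same way both times. Loading the saved matrix on each node also removes any dependence on how the random stream is consumed.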
4 Comments
Matt J
on 3 Sep 2025 at 1:33
You are quite welcome, but when/if you are convinced this is the correct answer please accept-click it.