Can I vectorize fitglm to process many regression models instead of using a for loop?

I am trying to process over a million logistic regressions through fitglm, and I am currently using a for loop to do this, which is taking a very long time.
My data set has 38 explanatory variables, but I only want to fit each model on 6 variables at a time. I would therefore like to fit a model for every possible combination of 6 variables chosen from the 38 (38 choose 6, which is 2,760,681 models, or about 2.76 million).
I am curious if there is a way to vectorize the fitglm function to avoid using a lengthy for loop and iterating over all possible model combinations.
Here is my code. resp contains my responses, and allExplanatoryVars contains all observations (rows) for all 38 explanatory variables (columns). variableChoices is a matrix whose rows list every possible subset of 6 of the 38 column indices: row 1 contains 1,2,3,4,5,6, row 2 contains 1,2,3,4,5,7, and the final row contains 33,34,35,36,37,38. On each iteration i, row i of variableChoices selects the corresponding columns of allExplanatoryVars.
% Fit the logistic regression models
N = size(variableChoices, 1);      % one model per row of variableChoices
modelCoefficients = zeros(N, 7);   % 6 slopes plus an intercept per model
for i = 1:N
    subSelection = allExplanatoryVars(:, variableChoices(i,:));
    mdlLogistic = fitglm(subSelection,resp,'Distribution','binomial','Link','logit');
    % Store the coefficients as row i of the results matrix.
    modelCoefficients(i,:) = mdlLogistic.Coefficients.Estimate.';
end
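For reference, an index matrix with the row ordering described above can be generated with nchoosek. This is a minimal sketch; the names numVars and subsetSize are only illustrative and are not part of my original script.

```matlab
% Build every 6-element subset of the column indices 1..38.
numVars = 38;       % total explanatory variables (illustrative name)
subsetSize = 6;     % variables per model (illustrative name)

% Each row of variableChoices is one combination of column indices.
variableChoices = nchoosek(1:numVars, subsetSize);

% Number of models to fit; equals nchoosek(38, 6) = 2760681.
N = size(variableChoices, 1);
```

The matrix has N rows and 6 columns, so stored as doubles it takes roughly 130 MB.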
Is it possible to perform this task in a quicker way than looping through all i iterations? For example, can you use a vectorized approach on subSelection and resp as they are used within the fitglm function?
Thank you so much for anyone's help on this!

Answers (1)

Anagha Mittal on 11 Sep 2024
Hi,
Unfortunately, "fitglm" cannot be vectorized across models. Each call fits a single model with an iterative algorithm, so every one of the roughly 2.76 million models has to be fitted separately.
However, the fits are independent of one another, so you can run them in parallel by replacing the "for" loop with "parfor" on a pool started with "parpool" (both require Parallel Computing Toolbox). Below is an example:
N = size(variableChoices, 1);
modelCoefficients = zeros(N, 7); % 6 predictors plus an intercept per model (change as needed)
% Start a parallel pool with the default profile only if none is running
if isempty(gcp('nocreate'))
    parpool;
end
% Fit models in parallel
parfor i = 1:N
    subSelection = allExplanatoryVars(:, variableChoices(i,:));
    mdlLogistic = fitglm(subSelection, resp, 'Distribution', 'binomial', 'Link', 'logit');
    % Store the coefficients as a row (you can store other necessary statistics as needed)
    modelCoefficients(i, :) = mdlLogistic.Coefficients.Estimate.';
end
% Shut down the pool when finished
delete(gcp('nocreate'));
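If you only need the coefficient estimates, it may also be worth timing "glmfit", which returns the coefficient vector directly instead of building a full model object. The sketch below is only an illustration on made-up data (the seed, sizes, and true coefficients are assumptions, not your data). It checks that both functions give the same estimates. Whether "glmfit" is actually faster in your case is something to measure.

```matlab
% Illustrative data only: 200 observations, 6 predictors (all values made up).
rng(0);
X = randn(200, 6);
trueProb = 1 ./ (1 + exp(-(0.5 + X(:,1) - X(:,2))));
y = double(rand(200, 1) < trueProb);   % binary response

% glmfit returns the estimates directly, intercept first (7-by-1 here).
b = glmfit(X, y, 'binomial', 'link', 'logit');

% The same model through fitglm, for comparison.
mdl = fitglm(X, y, 'Distribution', 'binomial', 'Link', 'logit');

% Largest absolute difference between the two sets of estimates.
maxDiff = max(abs(b - mdl.Coefficients.Estimate));
```

Inside the "parfor" loop, the "fitglm" call and the line that stores the estimates would then become modelCoefficients(i, :) = glmfit(subSelection, resp, 'binomial', 'link', 'logit').';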
For more information on "parfor" and "parpool", refer to their pages in the MATLAB documentation.
Hope this helps!
