regress and/or fitlm with more than 1000 dummies
I am trying to run a regression model on a dataset with about 600,000 observations and 1008 dummies. I am using fitlm, but MATLAB crashes or runs out of memory. I tried to save memory by defining the dummies as logical, but without success. Do you think I still have some hope, or should I just give up? Thank you for your help.
3 Comments
Brendan Hamm
on 6 Jul 2016
One thing you may consider is using fitlm with a table of predictors. That way you can simply make one of the columns a categorical predictor variable, and MATLAB will handle the dummy variables for you (including the dummy-variable-trap concern). There is no guarantee this will solve your problem, but I would consider it.
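A minimal sketch of that approach, assuming the data lives in vectors x1 (a continuous predictor), groupID (the 1008 group labels), and y (the response); these names are placeholders, not the asker's actual variables:
% Store the group labels as a single categorical column so fitlm
% creates the dummy variables internally and drops one reference level.
tbl = table(x1, categorical(groupID), y, 'VariableNames', {'x1','group','y'});
mdl = fitlm(tbl, 'y ~ x1 + group');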
You are using almost 5 GB just for your data. For this reason, computing things like the hat matrix will be computationally intensive and require even more data to be stored in memory. Furthermore, fitlm itself stores a lot of extra data, so you may want to try another method of regression. polyfit could be helpful in that it does not compute all of the extra statistics for us. Another option is a gradient-based iterative method (likely cgs), since iterative algorithms require less computation at each step. These solvers also accept a sparse matrix, which can further reduce memory requirements.
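A sketch of the sparse/iterative route, under the same hypothetical names (x1, groupID, y). Note that cgs expects a square system, so it is applied here to the normal equations; lsqr on the rectangular sparse matrix would be a common alternative:
n = numel(y);
g = findgroups(groupID);              % integer group indices 1..1008
D = sparse(1:n, g, 1, n, max(g));     % sparse dummy matrix, one column per group
D(:,1) = [];                          % drop one level to avoid the dummy variable trap
X = [ones(n,1), x1(:), D];            % design matrix stays sparse after concatenation
b = cgs(X'*X, X'*y, 1e-8, 500);       % iterative solve of the normal equations X'*X*b = X'*y
The 600,000-by-1008 dummy block then stores only one nonzero per observation, far less than the roughly 5 GB a full double matrix of that size would need.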
Answers (0)