Mapreduce on parallel cluster - "Database or disk is full" - How to control storage of intermediate files?
I wish to calculate several statistics (spectra, correlation functions, etc.) for ~400 files with 6e6 doubles per file, and afterwards average over all files to get average spectra, correlation functions, etc. To speed things up, I use mapreduce on a parallel cluster. This works like a charm as long as there are relatively few files (~100), but with a larger number of files I get this error message:
Error using parallel.mapreduce.KeyValueOutputStore/addmulti (line 63)
Error in adding keys and values.
Error in Analysis20190708>Analysis (line 115)
addmulti(intermKVStore, {'StatNames'}, {Stats});
Error in parallel.internal.pool.deserialize>@(data,info,intermKVStore)Analysis(data,Parameters,info,intermKVStore)
Error in mapreduce (line 116)
outds = execMapReduce(mrcer, ds, mapfun, reducefun, parsedStruct);
Error in Analysis20190708 (line 72)
outDS = mapreduce(ds, mapper, @reduceAnalysis,inpool);
Caused by:
The database /tmp/filename/TaskOutput7.db is full. (database or disk is full)
The message occurs after around 50% of the map phase is done; it occurs later (i.e., after more files have been processed) when I reduce the size of the result vectors (less frequently sampled spectra, for example). I checked with an admin, and /tmp indeed has very little free space.
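For anyone hitting the same error: the space shortage can be confirmed without admin help by checking the mount point named in the error message, e.g.:

```shell
# Show free space on the filesystem that holds /tmp
# (the path comes from the "database or disk is full" message).
df -h /tmp
```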
The question is now: how do I tell MATLAB to store these intermediate(?) files in a different location with more storage?
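A sketch of two things worth trying, assuming the intermediate .db files follow either mapreduce's documented 'OutputFolder' name-value pair or the local cluster profile's JobStorageLocation property (the /scratch paths below are hypothetical; substitute a roomy location on your cluster, and note the exact behavior may depend on your MATLAB release):

```matlab
% Option 1: point mapreduce's output storage at a folder with more space.
% 'OutputFolder' is a documented mapreduce name-value pair; whether the
% intermediate KeyValueStore also lands there is an assumption to verify.
outFolder = '/scratch/bigdisk/mr_output';   % hypothetical path
outDS = mapreduce(ds, mapper, @reduceAnalysis, inpool, ...
                  'OutputFolder', outFolder);

% Option 2: move the local cluster's job storage away from /tmp
% before opening the parallel pool.
c = parcluster('local');
c.JobStorageLocation = '/scratch/bigdisk/jobs';  % hypothetical path
inpool = parpool(c);
```

If neither helps, another avenue is setting the TMPDIR environment variable to a larger filesystem before launching MATLAB, since /tmp is typically the system default temporary location.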