.mat vs .txt or .csv
Hi everyone. It's my first question here, so here I go. I've done a lot of image processing on a very large set of images and saved some numerical results (particles, centroids, axis,...whatever), and I wonder which of the storage options would be best (or just the fastest, simplest, least memory-hungry... I'm not really sure what to optimize) for future post-processing: 1) store the data as .txt/.csv in a single file (appending columns, maybe of different lengths) or in several files; 2) use cell arrays and store them in a .mat file.
In terms of portability I guess the former is best, but is it also the fastest and the lightest on memory?
6 Comments
Stephen23
on 7 Sep 2017
The .mat file is the best: it is designed for storing MATLAB data.
Walter Roberson
on 7 Sep 2017
You could also consider xlsx for portability, but not necessarily for efficiency.
per isakson
on 8 Sep 2017
Edited: per isakson
on 8 Sep 2017
I cannot answer your question with the name of a file format. The best way to store "some numerical results (particles, centroids, axis,...whatever)" depends on many factors.
- "very large set of images" is that more like one thousand or hundred thousand images?
- Your profile says Univ. Madrid. Are you about to start an academic study on "image processing" with the goal to publish the result?
- "future post-processing" Will that require random access to "results" from the entire "set of images"? How do you need to "query/access" the set of "results"?
- Over how long a period will you work on this study? Will you forget details?
Am I totally off in my speculations?
Javier Antonio Soler Romero
on 8 Sep 2017
Edited: Javier Antonio Soler Romero
on 8 Sep 2017
Thank you all.
It's about thousands of images per experiment (cell migration assays, wound-healing, fluorescence microscopy).
I am research staff, so my work will be published someday, I hope. I've been working for 2 years in the field (collective motion, cellular forces, mechanotransduction, regularization of inverse problems), but I am beginning to standardize all the work done so far, using the same algorithms and the same experimental conditions (the experiments were pre-analysed before, but I am looking at more variables now).
Post-processing involves statistics (morphology, density, proliferation/apoptosis rates, ...) and modelling (nonlinear PDEs, continuum and discrete models) and whatever comes to mind in the future.
The problem is that so far it has been a bit chaotic: some experiments were analysed with Fiji/ImageJ, others with home-made scripts in MATLAB... and it's time to unify them, translate scripts into functions...
So I am dealing with several options... not easy to choose the best one, or the fittest.. I need to make up my mind.
Thanks again
By the way, I am the same guy who asked the question originally. I had an account before, but I had to create an institutional one (for licensing).
Walter Roberson
on 8 Sep 2017
For each image you do some processing and save the results. You then do more phases of work, which presumably do not need the original images. For example if you were doing neural network work, then your first phase might have been feature extraction, and your second phase could then involve investigation of feature reduction and various parameters of neural network training.
Now, for the sake of reproduction of your work, you would either be making the original images available to researchers or you would be publishing where to obtain them.
But if the images themselves are not required for the phases after that, then you might consider publishing the results of the first stage processing, so that others could start from there and investigate. As long as the images themselves are available to researchers, this would not strictly be necessary as long as you published exactly how the first stage was done, but one could imagine that it could be convenient.
Now, if that kind of publishing of the intermediate data is of significant interest to you, then you need to consider portability, arguing towards csv or xlsx or netcdf or hdf5. But if you are not planning to do that, then efficiency becomes more important, arguing towards binary files or .mat files or databases.
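To make the trade-off concrete, here is a minimal sketch that writes the same array once as a MAT-file and once as an HDF5 file, which other environments can read without MATLAB. The variable `results`, its layout (one row per particle, columns x, y, area) and the file names are assumptions made for illustration only.

```matlab
% Hypothetical per-image results: one row per particle, columns = [x, y, area]
results = rand(500, 3);

outDir = tempname;          % scratch folder for this sketch
mkdir(outDir);

% Efficiency-oriented option: native MAT-file
matFile = fullfile(outDir, 'results.mat');
save(matFile, 'results');

% Portability-oriented option: HDF5 dataset (double precision by default)
h5File = fullfile(outDir, 'results.h5');
h5create(h5File, '/results', size(results));
h5write(h5File, '/results', results);

% Read both files back
fromMat = load(matFile);
fromH5  = h5read(h5File, '/results');
```

Both round trips are binary, so the values come back bit-for-bit; the difference is mainly in which tools can open the file afterwards.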
per isakson
on 13 Sep 2017
Edited: per isakson
on 13 Sep 2017
The first time I read your comment I understood that "I am beginning to standardize all the work done so far" included not only your own work. However, after a second reading I believe it's your own work.
You'll find a lot if you google for "management scientific data". E.g., this four-minute video on sharing data.
IMO: You should do a small study on "management of scientific data" focused on your scenario: write user stories, run small experiments, find out what the practice is in your field, find good examples from your colleagues (including at other universities), and compile a report.
There are many contributors here who work with "scientific data". Maybe a question with the title "management of scientific data" would catch their eye. But the question itself must be more specific.
writing, reading, sharing, archiving
Answers (1)
Javi
on 20 Sep 2017
0 votes
1 Comment
Guillaume
on 20 Sep 2017
As far as memory is concerned a text format is the worst. Converting text to/from numbers is not particularly fast either.
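A quick way to check the size difference on your own data is to write the same matrix once as text and once as raw binary and compare the file sizes. This is only a sketch: `data` is a made-up 1000-by-3 matrix, and the `%.17g` format is an assumption chosen so that the text keeps enough digits to round-trip a double.

```matlab
data = rand(1000, 3);            % hypothetical results matrix

txtFile = [tempname '.csv'];
binFile = [tempname '.bin'];

% Text: every double is formatted as decimal characters plus a separator
fid = fopen(txtFile, 'w');
fprintf(fid, '%.17g,%.17g,%.17g\n', data.');   % transpose so rows are written in order
fclose(fid);

% Binary: every double takes exactly 8 bytes
fid = fopen(binFile, 'w');
fwrite(fid, data, 'double');
fclose(fid);

txtInfo = dir(txtFile);
binInfo = dir(binFile);
fprintf('text: %d bytes, binary: %d bytes\n', txtInfo.bytes, binInfo.bytes);
```

The binary file is always 8 bytes per value; the text size depends on how many digits you keep, and keeping fewer digits makes the file smaller at the cost of precision.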
While the csv format is portable and easily read by humans, one of its major shortcomings is that it is only suitable for storing one matrix (otherwise, you end up having to define a syntax for the file, and it's no longer portable).
Whenever I process scientific data, I store not only the result of that processing but also all the inputs and configuration switches that were used to obtain that result, the software version of any code that was used, etc., so that when questions are asked about the results a few months or years later I can always go back and reproduce them exactly as they were generated. All of that I store in a .mat file.
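As a rough sketch of what that can look like (not necessarily my exact layout), the snippet below saves ragged per-image results together with a struct describing how they were produced. All names and values here (`centroids`, `meta`, the file names, the parameters, the version string) are hypothetical placeholders.

```matlab
% Hypothetical results: centroids per image, with a different particle count per image
centroids = {rand(12, 2), rand(7, 2), rand(30, 2)};

% Hypothetical record of how the results were produced
meta = struct();
meta.inputFiles    = {'img_0001.tif', 'img_0002.tif', 'img_0003.tif'};
meta.params        = struct('threshold', 0.35, 'minArea', 20);
meta.codeVersion   = 'analysis v1.2.0';      % e.g. a tag or commit hash
meta.matlabVersion = version;                % MATLAB release used for the run
meta.createdOn     = datestr(now, 'yyyy-mm-dd HH:MM:SS');

% One file holds both the results and their provenance
outFile = [tempname '.mat'];
save(outFile, 'centroids', 'meta');

% Later: load everything back as fields of a struct
s = load(outFile);
```

A cell array handles the "columns of different lengths" case directly, which is exactly what a single csv file cannot do without a home-made convention.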