Regarding "Saving the data"?

Dear All,
I am reading data from NetCDF files as follows:
time=double(netcdf.getAtt(ncFileId,netcdf.getConstant('NC_GLOBAL'),'time'));
data1=...
data2=...
...
I then combine the variables like this:
data=[time data1 data2..];
data=double(data);
And then I save it as follows:
save(matfile,'data')
However, the saved data looks incomplete: each MAT-file saved this way is only about 80 KB, whereas I expected roughly 3000-5000 KB. So I think I am missing some data.
Does anyone know how to resolve this problem?
I would greatly appreciate your help, Ara

 Accepted Answer

Walter Roberson on 17 Jan 2017
You cannot rely on file size to tell you whether data is missing. Current MATLAB .mat files may contain compressed data. You should use "whos -file" on the file to look to see what variables are in the file and their sizes.
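As a rough illustration of the point above, the following sketch saves a highly compressible array and compares what whos -file reports with the size of the file on disk. The file name whos_demo.mat and the all-zeros stand-in array are assumptions made only for this example; they are not the poster's data.

```matlab
% Hypothetical demo file in the temporary folder (not one of the real data files)
demoFile = fullfile(tempdir, 'whos_demo.mat');

data = zeros(3555, 8);            % stand-in array; zeros compress very well
save(demoFile, 'data');

info = whos('-file', demoFile);   % one struct per variable stored in the file
onDisk = dir(demoFile);           % file size as seen by the operating system

fprintf('%s is %d-by-%d (%s), file size on disk: %d bytes\n', ...
    info(1).name, info(1).size(1), info(1).size(2), info(1).class, onDisk.bytes);
```

With the default compressed MAT-file format, the size on disk can be far smaller than the 3555-by-8 array would suggest, which is why the variable sizes are the better check.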

20 Comments

I did use "whos":
159984 struct
I also checked the data in the workspace. I have all the variables that I want, but when I plot all 365 days only a little data shows up. So I compared the file size with files for the profile data, which I downloaded and saved in exactly the same way:
p=netcdf.inqVarID(ncFileId,'time');
time=netcdf.getVar(ncFileId,p);
The profile data, saved exactly as above, comes out with much more data. That is why I suspect the way I am saving the data is not suitable for the global data.
I then combine all the data into one MAT-file. Is it possible that the loop I wrote is not correct?
for k = 1:365
    matFile = sprintf('2013.%03d.mat', k);
    if exist( matFile, 'file' )
        data = load( matFile, 'data' ) ;
        fprintf( 'Load data from MAT-File : %s ..\n', matFile ) ;
    else
        fprintf( ' MAT-File does not exist : %s ..\n', matFile ) ;
    end
end
and then save it as follows:
matFile = 'Year-2011';
save(matFile, 'data');
I think I am not reading all the 365 days, right?
Would you please tell me where the problem is?
You are correct: that loop overwrites data on every iteration, so only the last file that was loaded survives. Store each file's data separately instead:
for k = 1:365
    matFile = sprintf('2013.%03d.mat', k);
    if exist( matFile, 'file' )
        data_struct = load( matFile, 'data' );
        all_data(:, k) = data_struct.data;
        fprintf( 'Load data from MAT-File : %s ..\n', matFile ) ;
    else
        fprintf( ' MAT-File does not exist : %s ..\n', matFile ) ;
    end
end
Thank you, Walter. But I am getting this error: Subscripted assignment dimension mismatch. Here is the size of the data for one day:
3555x8 double
So it does not get through the loop to read all 365 days.
Do you have any idea?
Is it possible to read all the MAT-files in one go and then save them in another MAT-file for the whole year? That way I would have one data set containing all 365 days, which I could easily analyse without reading through all the files each time.
all_data(:, :, k) = data_struct.data;
The code already loops over all of the files and stores all 365 days of information into the same array, all_data . I did not know anything about the size of the data in the files and had to guess at the shape.
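A minimal sketch of why the extra index matters, assuming every day really is 3555-by-8 (the random matrix below is only a stand-in for one day's data): stacking along the third dimension works, while assigning a whole matrix into a single column does not.

```matlab
dayData = rand(3555, 8);          % stand-in for one day's 3555-by-8 matrix

% Stacking along the third dimension: page k holds day k
stacked = [];
for k = 1:3
    stacked(:, :, k) = dayData;
end
disp(size(stacked))

% Assigning the same matrix into one column raises a size-mismatch error
failed = false;
try
    flat = zeros(3555, 3);
    flat(:, 1) = dayData;         % left side is 3555-by-1, right side is 3555-by-8
catch
    failed = true;
end
```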
Sorry for providing incomplete information, and thanks for your guess. But it gives the same error. I do not think it gets through the loop, because when I read only one file the output is data = 3555x8 double:
  Name          Size            Bytes  Class     Attributes
  Data          1x1            227696  struct
whos all_data
  Name          Size            Bytes  Class     Attributes
  all_data      3555x8         227520  double
And here is the result for all 365 days, after running the loop over 1:365, so this is only the last day (365):
whos data
  Name          Size            Bytes  Class     Attributes
  data          1x1            159984  struct
Each MAT-file contains "gpsdata", which is a structure. Please see the arrangement below (year, day, time, etc.):
2013 365 0 9.4 0.0 68.3 139.8 762.0
Can we concatenate the files vertically? That is, read the first file, keep it, and then concatenate the rest onto it?
Please let me know if I can provide more information.
I did not pre-allocate the output because I did not know if the input data size was fixed. The code I provided creates 3555 x 8 x number of files read, and if you only read one file that would be 3555 x 8 x 1 which MATLAB would automatically show as 3555 x 8.
for k = 365 : -1 : 1 %this has the effect of pre-allocating
    matFile = sprintf('2013.%03d.mat', k);
    if exist( matFile, 'file' )
        % request gpsdata as well, because it is read from data_struct below
        data_struct = load( matFile, 'data', 'gpsdata' );
        all_data(:, :, k) = data_struct.data;
        all_gpsdata(:, :, k) = data_struct.gpsdata;
        fprintf( 'Loaded data from MAT-File : %s ..\n', matFile ) ;
    else
        fprintf( ' MAT-File does not exist : %s ..\n', matFile ) ;
    end
end
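The two remarks above (a trailing dimension of size 1 is not shown, and a loop that counts down pre-allocates) can be checked with a small sketch. The variable names n and squares are made up for this illustration.

```matlab
% A trailing singleton dimension is dropped: 3555-by-8-by-1 is just 3555-by-8
onePage = zeros(3555, 8, 1);
disp(size(onePage))

% Counting down: the first pass (k = n) creates all n elements at once,
% so the array never has to grow on later passes
n = 5;
for k = n : -1 : 1
    squares(k) = k^2;
end
disp(squares)
```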
I understand, no worries, and thanks, Walter. But it still does not work. Here is the error:
Subscripted assignment dimension mismatch.
Cannot display summaries of variables with more than 524288 elements.
It shows nothing that allows me to use the data.
Is it possible to read each file in turn and, at the end, append all the data into one MAT-file?
for k = 365 : -1 : 1 %this has the effect of pre-allocating
    matFile = sprintf('2013.%03d.mat', k);
    if exist( matFile, 'file' )
        % request gpsdata as well, because it is read from data_struct below
        data_struct = load( matFile, 'data', 'gpsdata' );
        all_data{k} = data_struct.data;
        all_gpsdata{k} = data_struct.gpsdata;
        fprintf( 'Loaded data from MAT-File : %s ..\n', matFile ) ;
    else
        fprintf( ' MAT-File does not exist : %s ..\n', matFile ) ;
    end
end
At the end of this you would have cell arrays all_data and all_gpsdata that contain all of the data read from the files. If all of the entries read are exactly the same length, then you can construct an array of output from them:
nd = ndims(all_data{1});
try
    all_data_matrix = cat(nd+1, all_data{:});
catch ME
    fprintf('Your data entries are not all the same size!\n');
    all_data_matrix = [];
end
nd = ndims(all_gpsdata{1});
try
    all_gpsdata_matrix = cat(nd+1, all_gpsdata{:});
catch ME
    fprintf('Your gpsdata entries are not all the same size!\n');
    all_gpsdata_matrix = [];
end
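A small sketch of what the try/catch above is guarding against, using made-up cell arrays (sameSize and mixed are hypothetical names): concatenating along a new dimension only works when every entry has the same size.

```matlab
% Entries of identical size stack into a 3-D array
sameSize = {ones(2, 3), 2*ones(2, 3), 3*ones(2, 3)};
nd = ndims(sameSize{1});
stack = cat(nd + 1, sameSize{:});
disp(size(stack))

% Entries with different row counts cannot be stacked this way
mixed = {ones(2, 3), ones(4, 3)};
try
    cat(3, mixed{:});
    mixedOk = true;
catch
    mixedOk = false;              % cat raises a dimension error for mixed sizes
end
```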
THANK YOU, Walter. You are the best, and creative too. It works perfectly.
Can I ask another question? I have all the variables in all_gpsdata. I transposed it and it came out like this:
3555x8 double
5403x8 double
5084x8 double
5188x8 double
4563x8 double
4214x8 double
4772x8 double
4590x8 double
Now I want to plot, for example, column 4 against column 3 for all 365 days. My MAT-files do not all have the same length, so the output array could not be constructed.
Sorry to bother you again.
hold on
cellfun(@(data) plot(data(:,3), data(:,4)), all_gpsdata);
That would be for creating 365 lines. If you wanted to create one single line that had all of the information, then it would look more like
data = vertcat(all_gpsdata{:});
plot(data(:,3), data(:,4))
This assumes that all of the entries have the same number of columns.
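As a hedged illustration of the vertical concatenation, the sketch below uses a made-up cell array dayCells in which days have different row counts and one day is missing (an empty cell, as the loading loop leaves for files that do not exist). Empty 0-by-0 entries are skipped by vertcat.

```matlab
% Stand-in cell array: 3 rows, a missing day, then 5 rows, all with 8 columns
dayCells = {ones(3, 8), [], 2*ones(5, 8)};

data = vertcat(dayCells{:});      % rows of all days, one after another
disp(size(data))
```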
Wow, thanks. Just using "cellfun" I got this. Please see the figure. Can we have something like a scatter plot to avoid these lines?
Index exceeds matrix dimensions.
hold on
pointsize = 10;
arrayfun(@(IDX) scatter(all_gpsdata{IDX}(:,3), all_gpsdata{IDX}(:,4), pointsize, IDX*ones(size(all_gpsdata{IDX},1),1)), 1:length(all_gpsdata), 'Uniform', 0);
hold off
This will color-code each day's points according to the day number.
If you ended up with some matrices that do not have enough columns, then you need something like
mask = cellfun(@(C) size(C,2), all_gpsdata) < 4;
agps = all_gpsdata;
agps(mask) = {NaN(1,4)};
hold on
pointsize = 10;
arrayfun(@(IDX) scatter(agps{IDX}(:,3), agps{IDX}(:,4), pointsize, IDX*ones(size(agps{IDX},1),1)), 1:length(agps), 'Uniform', 0);
hold off
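Creating one scatter object per day can be slow for a full year. A possible alternative, sketched here on made-up data, is to concatenate everything first and make a single scatter call, using the day index as the color value. This assumes every non-empty cell has the same number of columns (at least 4); dayIndex and rowsPerDay are names introduced for this example, and the invisible figure is only there so the sketch runs without opening a window.

```matlab
% Stand-in for all_gpsdata: day 1 has 4 rows, day 2 is missing, day 3 has 6 rows
all_gpsdata = {rand(4, 8), [], rand(6, 8)};

rowsPerDay = cellfun(@(C) size(C, 1), all_gpsdata);          % rows in each cell
dayIndex = repelem(1:numel(all_gpsdata), rowsPerDay).';      % day number per row
data = vertcat(all_gpsdata{:});                              % all rows together

fig = figure('Visible', 'off');
pointsize = 10;
scatter(data(:, 3), data(:, 4), pointsize, dayIndex);        % one graphics object
colorbar
close(fig)
```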
Wow, it looks nice! However, my PC and MATLAB both hung and forced me to restart several times, so it could not plot all the days; it again shows "Index exceeds matrix dimensions".
Walter, I believe you can solve this problem of mine as well, so please let me share it with you. I have two types of these MAT-files. I loaded those files as well and saved them in another MAT-file, whose size came out as 1.5 GB. Now it is very difficult to load it and plot things, as MATLAB reports "out of memory". My other problem is that the "time" in these files is in seconds, so the "time" column vector runs from 1 to 9000. I need to match the data so that I can read (let's say) column 7 as a function of time (0-24 hrs). Do you have any suggestions?
You helped me a lot; I really appreciate it.
If you are using a new enough version of MATLAB (might require R2016b) you could look at timetable()
Yes, I am using exactly R2016b. Can you please explain more? What do I need to know about timetable?
Ara on 20 Jan 2017
Edited: Ara on 20 Jan 2017
Thank you for the useful documents and links; they helped me understand. Can I ask one question? I understand that I can use a timetable to arrange time by the hour, which is great. For this, I need to arrange the other data on the same basis as well.
This means that:
Time=seconds(1:3600)'; % this will give me data in one hour. Am I right?
T=timetable(Time, [data in column 1], [data in column 2],...,[data in column 8], 'VariableNames', {'A', 'B',...'H'});
But my problem is how to get the matching 3600 values in column A, for example. I know that the first 3600 time values represent one hour, and I then need the data for that hour too, so that I can do the analysis hourly.
Can you please give me a hint?
Thank you for all of your great help,
retime(T, 'hourly', 'mean') perhaps
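A minimal sketch of that suggestion, assuming R2016b or later and that the time column holds seconds since the start of the day. The data below are made up: two hours of one-per-second samples for day 365 of 2013, with a single hypothetical variable named value. Row times are built as datetime values from the year, the day of year, and the seconds.

```matlab
% Stand-in data: 7200 samples, the first hour all 1, the second hour all 3
t = (0:7199).';                                   % seconds since midnight
value = [ones(3600, 1); 3*ones(3600, 1)];

% Day 365 of 2013 plus the elapsed seconds gives one datetime per sample
rowTimes = datetime(2013, 1, 1) + days(365 - 1) + seconds(t);

T = timetable(rowTimes, value);                   % each row stays tied to its time
hourly = retime(T, 'hourly', 'mean');             % one row per hour, mean of each bin
disp(hourly(1:2, :))
```

Because every row of the timetable carries its own time, the other columns are grouped into hours automatically; there is no need to pick out blocks of 3600 rows by hand.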
