Aggregate multiple CSV files as an average for each cell individually and result in one CSV file with different rows in each source file.
3 views (last 30 days)
Show older comments
sherwan Najm
on 16 Sep 2020
Commented: sherwan Najm
on 23 Sep 2020
Dear all,
Is it possible to take an average based on the rows number of multiple CSV files (3,4,5 ..n number of CSV files) for each cell individually in a new CSV file with the same variables' names and structures. , please?
- The files are in one folder, for instance, E:\test\
- The number of files is changeable, so, maybe 2 or 3 ... n number of csv files.
- The number of columns and rows is also changing but all the files have the same numbers of columns with different number of rows at calculation time, i.e. file test1.csv, test2.csv, and test3.csv, has 3 columns and different rows. When the number of columns is changed, all the columns in the files will be change for example 13 or 15 column. The rows are different in each file.
So, the numbers of columns are changable but it is same numbers for each file and the number of rows are different.
For example, if we have the following data as shown in the pictures:
and the rsult will be as the following picture:
So, in the resulting picture, the average of the first two rows (#0 and #1) is divided by 3 because these two rows are existing in all the source files. The average of the second two rows (#2 and #3) is divided by 2 because these two rows are existing only in two of the source's files. The last two rows (#4 and #5) are divided by 1 because only one source has these two rows.
I have got this Code thankfully from Mr. Asad (Mehrzad) Khoddamfor, which solve my question in the link, which the columns and the rows should be same in the source files.
The code is:
%
% read folder
%
files = ls ('d:\test\*.csv');
for i = 1: size(files, 1)
file = strtrim(files(i,:));
m = readmatrix(['d:\test\' file]);
for r = 1:size(m,1)
if all(isnan(m(r,:)))
m= m(1:r-1,2:end);
break;
end
end
for r = 2:size(m,2)
if all(isnan(m(:,r)))
m= m(1:end,1:r-1);
break;
end
end
if all(isnan(m(:,1)))
m= m(:,2:end);
end
%m
if i==1
mp = zeros(size(m,1), size(m,2), size(files,1));
tb = readtable(['d:\test\' file]);
end
mp(:,:,i) = m;
end
tb(:,2:end) = num2cell(mean(mp,3));
%
% write table, using tblwrite
%
The source files are attached.
Respectfully Yours,
Sherwan
0 Comments
Accepted Answer
Mohith Kulkarni
on 22 Sep 2020
Hi Sherwan,
Here is the solution to your problem. I have made changes to the solution you posted.
files = ls ('E:\test\*.csv');
%part 1
rows_in_each_file = zeros(1,size(files,1));
for i = 1: size(files, 1)
file = files(i,:);
m = readmatrix(['E:\test\' file]);
%storing the number of rows in each file in an array
size(m,1);
for r = 1:size(m,1)
rows = r;
if all(isnan(m(r,:)))
rows = r-1;
break;
end
end
rows_in_each_file(i) = rows;
if i==1
columns = size(m,2)-1;
for c = 2:size(m,2)
if all(isnan(m(:,c)))
columns = c-1;
end
end
end
end
max_rows = max(rows_in_each_file);
%counting how much to divide for each row
div = zeros(1,max_rows);
for i = 1:size(files,1)
div(1:rows_in_each_file(i)) = div(1:rows_in_each_file(i)) + 1;
end
%part2 computing the output
outputmat = zeros(max_rows,columns);
for i = 1: size(files, 1)
file = files(i,:);
m = readmatrix(['E:\test\' file]);
for r = 1:size(m,1)
if all(isnan(m(r,:)))
m= m(1:r-1,2:end);
break;
end
end
for r = 2:size(m,2)
if all(isnan(m(:,r)))
m= m(1:end,1:r-1);
break;
end
end
if all(isnan(m(:,1)))
m= m(:,2:end);
end
%appending zeros at the end to add to output
if size(m,1) < max(rows_in_each_file)
m = [m;zeros(max_rows - rows_in_each_file(i),columns)];
elseif size(m,1) == max_rows
tb = readtable(['E:\test\' file]);
end
outputmat = outputmat + m;
if
end
%final average
outputmat = outputmat./div';
tb(:,2:end) = num2cell(outputmat);
writetable(tb,'result.csv')
The solution is divided into two parts, the first part is to find out the number of rows in each file to figure out which rows to consider for finding the average and the second part is similar to the posted solution but in this case the element wise division is used to find the average.
More Answers (0)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!