Averaging subsets of a large dataset

2 views (last 30 days)
Hello Masters of Matlab!
I have a number of large files with datasets arranged like this:
ID Latitude Longitude Date GTDHZZ01 GWDR WINDXMOD RTpeak ADEPZZ01
1 23.1 59.2 1980-01-01T01:00:00 0.32238 307.827 2.1734 2.1334 1040.99
2 23.1 59.2 1980-01-01T02:00:00 0.34786 316.644 2.1512 2.4206 1040.99
3 23.1 59.2 1980-01-01T03:00:00 0.36029 324.41 2.1287 2.7464 1040.99
4 23.1 59.2 1980-01-01T04:00:00 0.36796 330.54 2.1059 2.7464 1040.99
5 23.1 59.2 1980-01-01T05:00:00 0.3734 334.466 2.0826 2.7464 1040.99
6 23.1 59.2 1980-01-01T06:00:00 0.37837 337.616 2.0587 2.7464 1040.99
7 23.1 59.2 1980-01-01T07:00:00 0.38272 339.767 1.8865 2.7464 1040.99
8 23.1 59.2 1980-01-01T08:00:00 0.38738 341.707 1.7042 3.1161 1040.99
9 23.1 59.2 1980-01-01T09:00:00 0.39092 343.277 1.5135 3.1161 1040.99
10 23.1 59.2 1980-01-01T10:00:00 0.39571 344.519 1.3159 3.1161 1040.99
Each of the files contains data from multiple locations. I'm trying to create a code that would enable me to calculate the average values of columns 5 to 9 (GTDHZZ01 to ADEPZZ01) for each location (whenever latitude, longitude or both change).
Any help would be much appreciated.
Many thanks.

Accepted Answer

Peng Li
Peng Li on 5 Aug 2020
You can load this datasheet to matlab using readtable. after that, you have a table variable that allows easier manipulation.
To achieve your goal, you can use findgroups based on Latitude and Longitude, and use splitapply to calculate the mean of the columns you need based on the groups you determined. see example below:
tbl = readtable('test.txt');
[grp, res] = findgroups(tbl(:, ["Latitude", "Longitude"]));
propNames = tbl.Properties.VariableNames(5:end);
newPropN = propNames + "_avg";
res{:, newPropN} = splitapply(@(x) nanmean(x, 1), tbl{:, 5:end}, grp);
res =
1×7 table
Latitude Longitude GTDHZZ01_avg GWDR_avg WINDXMOD_avg RTpeak_avg ADEPZZ01_avg
________ _________ ____________ ________ ____________ __________ ____________
23.1 59.2 0.3707 332.08 1.9121 2.7634 1041
  1 Comment
Maja Zdulska
Maja Zdulska on 5 Aug 2020
Thank you! It's much faster and simpler than everything I've tried.

Sign in to comment.

More Answers (1)

Sudheer Bhimireddy
Sudheer Bhimireddy on 5 Aug 2020
It can be achieved by getting the indices of unique values for the latitude and longitude once you sort the matrix based on them. From there you can average between the indices.
% sort your matrix based on lat and lon
% this will sort the marix first by lat and then by lon
sorted_matrix = sortrows(your_matrix,[2 3]);
% run the unique for your lat and long rows
% 'rows' tells the code to look for any change in rows
[unique_values, unique_val_indices] = unique(sorted_matrix(:,2:3),'rows');
% get number of unique values to initialize your average array
n_unique_values = size(unique_val_indices);
% initialize average array
GTDHZZ01_Average(n_unique_values) = 0;
% run the loop for except the last unique value
for i=1:n_unique_values-1
avg_start = unique_val_indices(i);
avg_end = unique_val_indices(i+1)-1;
GTDHZZ01_Average(i) = average(your_matrix(avg_start:avg_end,5));
end
% estimate the average for the last unique value
% this way you wont end up with error when the last row itself is unique
GTDHZZ01_Average(n_unique_values) = average(your_matrix(unique_val_indices(end):end,5));
Hope this helps.

Categories

Find more on Data Type Conversion in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!