Find average of multiple lines

Hi All,
Having more trouble than I should finding the average line from 7 records. Each record is basically [T, (a-b)], where T is a common time interval (0:0.1:30) and (a-b) is the difference between two data sets (a-b, a-c, a-d, etc.).
I've named the differences diff, diff2, diff3, diff4, diff5, diff6, diff7. I can easily plot all of these records on the same plot. However, every method I've tried for finding the mean of these lines fails. So far, I have used mean([diff diff2, etc.],7). How do I go about finding the average at each common time point and then plotting that?
Thanks.

7 Comments

I'm a little confused how you have your data organized. Do you have a bunch of single difference/time sets?
[T(1),diff] [T(2),diff2] [T(3),diff3]
Or do you have a set of differences for each time point?
[T, diff diff2 diff3 diff4...]
This may be simpler than what your problem actually is, but if you do have the second layout, what about taking the average at each time point?
Vince Clementi on 1 Feb 2018
Edited: Vince Clementi on 1 Feb 2018
Hi Bob,
To answer your question, it is the latter: A single common time scale for all 7 difference records.
I've made some progress since posting this. I've compiled my data into a 7x301 matrix. There are a fair number of NaNs in it. I can now take the average of the matrix, but it seems to average only the parts of the matrix where there is a value in all 7 rows. Is there a way for the average function to average all of the data, even with NaNs present?
Thanks!
Are you trying to look for a single average value for the entire array? I thought you were trying to find an average line from the seven other lines?
Again, just to make sure I understand your organization correctly, your 7x301 is organized such that each row is a diff, and each column is a unique time?
Apologies for any confusion in my explanation. But, exactly. I am trying to find the average line from the seven lines, and I've organized the data in a 7x301 array.
No worries about any confusion, I'm just making sure I understand so I actually answer your question.
Ok, so what about adding a check to each column to confirm how many values exist?
average = nan(1,301);                  % one mean per time point
for i = 1:301
    col = array(:,i);
    valid = ~isnan(col);               % rows that hold a real value at this time
    if any(valid)
        average(i) = sum(col(valid))/sum(valid);   % divide by the count of real values
    end                                % all-NaN columns stay NaN
end
I can't guarantee it will work, but that should go through each column, leave out the NaN values, and divide the sum by the number of values that actually exist for that time slot. Each result is stored in its own element, so average ends up as a 1x301 line.
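For what it's worth, the same per-column idea can probably be written without a loop. This is only a sketch: the small array below is made-up stand-in data for the 7x301 matrix, and valid, filled and counts are names introduced here for illustration.

```matlab
% Hypothetical stand-in for the 7x301 matrix: 3 lines, 4 time points.
array = [1   2   NaN 4;
         3   NaN NaN 6;
         5   4   NaN 8];

valid  = ~isnan(array);          % true where a real value exists
filled = array;
filled(~valid) = 0;              % zero out NaNs so they do not turn the sum into NaN
counts = sum(valid, 1);          % number of real values in each column
average = sum(filled, 1) ./ counts;   % 0/0 gives NaN for all-NaN columns
```

Columns with no data at all come out as NaN, which seems like the sensible result for a time point where none of the lines has a value.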
I can't say that gives the actual average of the lines, since it only uses whichever values exist at each time step. It might be better to fill in the NaN values by interpolating along each line, so that all seven values are available at every time step.
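The interpolation idea could look roughly like the sketch below. It assumes linear interpolation along each row with interp1 is acceptable for the data, which may not be true for every record. The two-line array is made up for illustration, and filled and ok are hypothetical names.

```matlab
T = 0:0.1:30;                        % common time vector from the question
% Hypothetical data: 2 lines instead of 7, with gaps marked by NaN.
array = [T; 2*T];
array(1, 5:7) = NaN;                 % interior gap in line 1
array(2, 1:3) = NaN;                 % gap at the start of line 2

filled = array;
for r = 1:size(array, 1)
    ok = ~isnan(array(r, :));
    if nnz(ok) >= 2                  % interp1 needs at least two known points
        % Linear interpolation; points outside the known range stay NaN.
        filled(r, :) = interp1(T(ok), array(r, ok), T, 'linear');
    end
end
average = mean(filled, 1);           % still NaN wherever a gap could not be filled
```

Gaps at the start or end of a line are left as NaN here rather than extrapolated, so those time points would still need one of the NaN-aware averages.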
Thanks! I'll give it a shot.
Let me know if it works and I can post it as an official answer, or, if you end up doing some troubleshooting of your own, you can post your own answer.

 Accepted Answer

Regarding the follow-up question found in the comments (NaN affecting the mean):
Use the 'omitnan' flag of the mean function (R2015a and newer):
% I wouldn't use "diff" as a variable name - it's a function
avg = mean([diff1, diff2, diff3, diff4, diff5, diff6, diff7],2,'omitnan');
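A small worked example of the line above, as a sketch only: the three short column vectors are made-up data standing in for the seven records, and D, avgRow and same are names introduced here. It also shows which dimension to pick for each layout, and why a dimension argument of 7 averages nothing.

```matlab
% Hypothetical data: 3 column vectors instead of 7, one value per time point.
diff1 = [1; 2; NaN; 4];
diff2 = [3; NaN; NaN; 6];
diff3 = [5; 4; 7; 8];

D = [diff1, diff2, diff3];           % one column per record, one row per time point
avg = mean(D, 2, 'omitnan');         % average across records at each time point

% With one record per row instead (like the 7x301 matrix), average down the rows:
avgRow = mean(D.', 1, 'omitnan');

% A dimension beyond ndims(D) has length 1, so mean returns D unchanged:
same = mean(D, 7);
```

To overlay the result on the individual lines, something like plot(T, D, ':') followed by hold on and plot(T, avg, 'k', 'LineWidth', 2) should work, where T is the common time vector with one entry per row of D.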

2 Comments

That's pretty good, definitely gets rid of having to do a manual check.
I think there is also a function called nanmean (in the Statistics and Machine Learning Toolbox) that will do the same.
