How to find the difference between two plots when the dimensions of the matrices are different?

Question

Xingjian Zhao on 19 Nov 2020

Edited: Walter Roberson on 25 Nov 2020

I'm currently testing the characteristics of NTC thermistors from -40 to 150 degrees Celsius, so I have two sets of data. The first is the "expected value" from my calculation and simulation: the V-T (voltage-temperature) graph of the NTC thermistor. Its temperature values are integers from -40 to 150, and each temperature has exactly one voltage value.

The second is the data I got from actual testing. Its temperature values contain decimals: within one degree there can be up to 20 different temperature values, and each of those decimal values can have up to 10 readings.

So how can I find the "maximum deviation" (maximum difference) between these two plots when the matrices have different dimensions, [191*1] and [9656*1]?

191 for -40~150 degrees Celsius in integers, 11657 for -40~150 degrees Celsius containing decimal numbers.

T_Cal=NTCVTCVS{:,6};
V_Toff_Estimate=NTCVTCVS{:,7};
T_Test=NTCVTCVS{:,3};
V_Toff_Test=NTCVTCVS{:,4};
figure
hold on
plot(T_Cal, V_Toff_Estimate,'b')
plot(T_Test, V_Toff_Test,'r')
legend('Estimate','Test');

I would like to know at which point the Estimate and Test values have the largest deviation, but these two matrices have different dimensions. Can someone please help me with that?

I've done a lot of research, and some people have suggested ideas, but they do not really work for my case:

Fill most of the data with NaN, to make the two matrices the same size so that MATLAB can do the calculation?

Try using interp1? I'm not familiar with that command and I am getting errors with it.

Answer 1

Walter Roberson on 19 Nov 2020

subplot(1,2,1)
plot(x1, y1, 'b', x2, y2, 'r')
subplot(1,2,2)
minx = max( min(x1), min(x2) );
maxx = min( max(x1), max(x2) );
x1_mask = x1 >= minx & x1 <= maxx;
x1_common = x1(x1_mask);
y1_common = y1(x1_mask);
x2_mask = x2 >= minx & x2 <= maxx;
x2_common = x2(x2_mask);
y2_common = y2(x2_mask);
common_x = union( x1_common, x2_common );   %find all x values that overlap
y1_interp = interp1(x1_common, y1_common, common_x);  %interpolate first to common x
y2_interp = interp1(x2_common, y2_common, common_x);  %interpolate second to common x
y_diff = y1_interp - y2_interp; %calculate the difference
plot(common_x, y_diff)     %plotting the difference
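
To get from the difference curve to the "largest deviation" the question asks about, one option is to take the maximum of the absolute difference. The following is only a sketch on made-up data (x1, y1, x2, y2 and their values are hypothetical stand-ins, not the poster's measurements). It also assumes each x vector has no repeated values, since interp1 requires distinct sample points; data with repeated temperatures would need to be grouped first.

```matlab
% Synthetic stand-ins for the two curves (hypothetical values)
x1 = (0:5).';              % coarse grid, e.g. integer temperatures
y1 = 2*x1;                 % "expected" curve
x2 = (0.5:0.5:4.5).';      % finer grid, strictly increasing, no duplicates
y2 = 2*x2 + 0.1*x2;        % "measured" curve with a growing offset

% Restrict to the overlapping x range and build the union grid
minx = max(min(x1), min(x2));
maxx = min(max(x1), max(x2));
common_x = union(x1(x1 >= minx & x1 <= maxx), x2(x2 >= minx & x2 <= maxx));

% Interpolate both curves onto the common grid and subtract
y_diff = interp1(x1, y1, common_x) - interp1(x2, y2, common_x);

% Largest absolute difference and the x value where it occurs
[max_dev, idx] = max(abs(y_diff));
x_at_max = common_x(idx);
fprintf('largest deviation %.3f at x = %.1f\n', max_dev, x_at_max);
```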

Walter Roberson on 23 Nov 2020

I am having computer problems so I cannot develop full code at the moment.

Call the experimental voltages EV and the experimental temperatures ET.

Use the third output of uniquetol() applied to the experimental temperatures ET to create a grouping variable, G. Use the first output, call it NET, as the representative value for each group: the Nominal Experimental Temperatures.

This does the equivalent of findgroups(), except with floating point tolerance, as you cannot trust that every identical-looking floating point value in your data really is bit-for-bit identical.

Now

mins = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);

You now have a vector NET of Nominal Experimental Temperatures, a vector mins that has the smallest observed voltage for each temperature, and a vector maxes that has the largest observed voltage for each temperature.

Now, with ST the simulated temperatures and SV the simulated voltages,

estimated_voltages = interp1(ST, SV, NET);
min_deviance = abs(estimated_voltages - mins);
max_deviance = abs(maxes - estimated_voltages);
worst_deviance = max(min_deviance, max_deviance);
plot(NET, worst_deviance);

If you need to know which is the worst deviance for each temperature then you should have said so...

Walter Roberson on 23 Nov 2020

Edited: Walter Roberson on 25 Nov 2020

T_Test = NTCVTCVS{:,3};
V_Toff_Test = NTCVTCVS{:,4};
T_Cal = NTCVTCVS{:,6};
V_Toff_Estimate = NTCVTCVS{:,7};
%prepare experimental values, getting rid of nan
ET = T_Test;             %experimental temperatures
EV = V_Toff_Test;        %experimental voltages
%do not want to assume they are the same length
%extend shorter with nan
if length(ET) > length(EV)
    EV(end+1:length(ET)) = nan;   %pad after the last element, do not overwrite it
end
if length(EV) > length(ET)
    ET(end+1:length(EV)) = nan;
end
%get rid of nan
mask = any(isnan([ET, EV]), 2);
ET(mask) = [];
EV(mask) = [];
%prepare simulated values, getting rid of nan
ST = T_Cal;             %simulated temperature
SV = V_Toff_Estimate;   %simulated voltage
%do not want to assume they are the same length
%extend shorter with nan
if length(ST) > length(SV)
    SV(end+1:length(ST)) = nan;
end
if length(SV) > length(ST)
    ST(end+1:length(SV)) = nan;
end
mask = any(isnan([ST, SV]), 2);
ST(mask) = [];
SV(mask) = [];
%now do the work
[NET, ~, G] = uniquetol(ET);
mins = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);
estimated_voltages = interp1(ST, SV, NET);
min_deviance = abs(estimated_voltages - mins);
max_deviance = abs(maxes - estimated_voltages) ;
worst_deviance = max(min_deviance, max_deviance); 
plot(NET, worst_deviance) ;
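
Since the question asks at which point the deviation is largest, the plotted vector can also be reduced to a single location. This is a sketch only: the numbers below are hypothetical stand-ins for the NET and worst_deviance vectors that the code above produces, and the NaN entry imitates an experimental temperature outside the simulated range, where interp1 returns NaN.

```matlab
% Hypothetical stand-ins for the vectors computed above
NET = [-40; -39.5; -39; -38.5];
worst_deviance = [0.02; 0.05; NaN; 0.03];   % NaN where no simulated value exists

% max() ignores NaN by default, so out-of-range temperatures cannot win
[largest, idx] = max(worst_deviance);
fprintf('largest deviation %.3g at temperature %.1f\n', largest, NET(idx));
```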

Walter Roberson on 23 Nov 2020

With regards to findgroups and uniquetol:

format short
rng(655321);
random_integer_data = randi(10, 1, 20);
disp(random_integer_data)
     4     4     2     9     8     4     8    10     7     2     2     8     3     5     1     6    10     2     5     4
[G, ID] = findgroups(random_integer_data);
for idx = 1 : length(ID)
    S = sprintf('integer value %d occurs at index: ', ID(idx));
    S = [S, sprintf('%d, ', find(G==idx))];
    fprintf('%s\n', S);
end
integer value 1 occurs at index: 15, 
integer value 2 occurs at index: 3, 10, 11, 18, 
integer value 3 occurs at index: 13, 
integer value 4 occurs at index: 1, 2, 6, 20, 
integer value 5 occurs at index: 14, 19, 
integer value 6 occurs at index: 16, 
integer value 7 occurs at index: 9, 
integer value 8 occurs at index: 5, 7, 12, 
integer value 9 occurs at index: 4, 
integer value 10 occurs at index: 8, 17, 

In each case, each value (such as 4) exactly matches every other value that looks like a 4.

But

random_float_data = random_integer_data + 1e-14*(rand(size(random_integer_data)) < 1/3);
disp(random_float_data)
  Columns 1 through 18

    4.0000    4.0000    2.0000    9.0000    8.0000    4.0000    8.0000   10.0000    7.0000    2.0000    2.0000    8.0000    3.0000    5.0000    1.0000    6.0000   10.0000    2.0000

  Columns 19 through 20

    5.0000    4.0000
[G, ID] = findgroups(random_float_data);
for idx = 1 : length(ID)
    S = sprintf('floating value %g occurs at index: ', ID(idx));
    S = [S, sprintf('%d, ', find(G==idx))];
    fprintf('%s\n', S);
end
floating value 1 occurs at index: 15, 
floating value 2 occurs at index: 3, 10, 11, 18, 
floating value 3 occurs at index: 13, 
floating value 4 occurs at index: 6, 20, 
floating value 4 occurs at index: 1, 2, 
floating value 5 occurs at index: 14, 
floating value 5 occurs at index: 19, 
floating value 6 occurs at index: 16, 
floating value 7 occurs at index: 9, 
floating value 8 occurs at index: 5, 7, 12, 
floating value 9 occurs at index: 4, 
floating value 10 occurs at index: 17, 
floating value 10 occurs at index: 8, 

The 4.0000 entries at index 1, 2, 6, 20 all look the same, so why were they not all grouped together by findgroups?

Answer: just because they look the same to the display does not mean they are the same:

fprintf('%.16g, %.16g, %.16g, %.16g\n', random_float_data([1,2,6,20]));
4.00000000000001, 4.00000000000001, 4, 4

See, not really equal.

Why does this matter? It matters because those several -39.1 that you have might all look the same, but that does not guarantee that they are the same. Any time you are working with floating point numbers, you should assume that numbers that look the same might actually be different unless you are sure one is a copy of the other.
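
The same effect can be seen without any random data. The sketch below uses the familiar 0.1 + 0.2 case; the tolerance value is chosen purely for illustration and is not a recommendation for the thermistor data.

```matlab
a = 0.1 + 0.2;
b = 0.3;
exact_match = (a == b);             % false: the two differ in the last bits
tol = 1e-12;                        % illustrative tolerance only
close_match = abs(a - b) <= tol;    % true: equal to within tolerance
fprintf('%.17g vs %.17g\n', a, b);  % print enough digits to see the difference
```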

ismembertol() and uniquetol() exist to find approximate equality to within tolerance:

[ID, ~, G] = uniquetol(random_float_data);
for idx = 1 : length(ID)
    S = sprintf('floating value %g approximately occurs at index: ', ID(idx));
    S = [S, sprintf('%d, ', find(G==idx))];
    fprintf('%s\n', S);
end
floating value 1 approximately occurs at index: 15, 
floating value 2 approximately occurs at index: 3, 10, 11, 18, 
floating value 3 approximately occurs at index: 13, 
floating value 4 approximately occurs at index: 1, 2, 6, 20, 
floating value 5 approximately occurs at index: 14, 19, 
floating value 6 approximately occurs at index: 16, 
floating value 7 approximately occurs at index: 9, 
floating value 8 approximately occurs at index: 5, 7, 12, 
floating value 9 approximately occurs at index: 4, 
floating value 10 approximately occurs at index: 8, 17, 

Walter Roberson on 25 Nov 2020

We are doing a comparison between two data sets, including using the temperature entries of one of them as locations to interpolate at. The comparison will not work well if any of the entries are nan, and interpolation will fail if any of the temperatures being interpolated at are nan.

Now, when we use computation to arrive at the numbers we are using, a nan output is not especially likely. It can happen in some cases of overflow. It can happen if you interpolate out of range. So not never, but not common.

But we are not using computation to arrive at the numbers. We are reading the numbers from an Excel file. And when you read from Excel files, you get nan in any entry that was not entered as a proper number, and you get nan where there are columns that are blank when other columns are not blank, and you get nan if you ask to read more data than exists in the file. When you read from an Excel file that a user might have edited by hand, you should assume that the user made a mistake in the file and that you will have nan entries in the data you read in.

Because of that, I take steps to detect and remove rows that have nan in them.

I also do not assume that ET and EV are the same length. In context of the code I posted where they are derived from columns of NTCVTCVS then they would be the same length, but I did not want to assume that you would retain the same way of providing that data.

The code

if length(ET) > length(EV)
    EV(end+1:length(ET)) = nan;
end
if length(EV) > length(ET)
    ET(end+1:length(EV)) = nan;
end

is concerned with ensuring that ET and EV are the same length. If one of the two is shorter, that one is extended with nan to match the length of the other one. Note the end+1: indexing from end would overwrite the last valid entry of the shorter vector. If you were sure, due to the way you get the input, that the two are the same length, then you could skip those lines.

mask = any(isnan([ET, EV]), 2);
ET(mask) = [];
EV(mask) = [];

is detecting nan entries. Those might be due to a shorter variable having been extended to match the size of a longer variable, or they might be due to there having been bad data in the file, or they might be due to empty data in the file, or due to asking for more data than was in the file. For example if the file had

and you ask to read 6 rows, then the data that would be read in would be

1 10
2 nan
3 30
4 nan
nan 50
nan nan

and you would want to get rid of all rows except rows 1 and 3. any(isnan(),2) is true if there is at least one nan in the row -- so in this example it would be

false
true
false
true
true
true

and the entries that are true would correspond to entries you need to remove from the variables, which you do by using the MATLAB idiom of assigning literal [] to the entries.
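
The six-row example can be run directly. This sketch types the values in as a literal matrix (the variable name data is an assumption for illustration) instead of reading them from a file:

```matlab
% The six rows from the example, as they would arrive after reading
data = [1 10; 2 NaN; 3 30; 4 NaN; NaN 50; NaN NaN];
ET = data(:,1);
EV = data(:,2);

mask = any(isnan([ET, EV]), 2);   % true where either column is NaN
ET(mask) = [];                    % assigning [] deletes the masked entries
EV(mask) = [];
disp([ET, EV])                    % only rows 1 and 3 of the original remain
```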

Walter Roberson on 25 Nov 2020

Edited: Walter Roberson on 25 Nov 2020

We now have a vector of temperatures, ET, that might have exact duplicates, and might have almost-exact duplicates that differ in the last few bits. And we want to know which of the entries are exact or near duplicates of each other. We can do that by asking uniquetol(ET).

%{
[NET, ~, G] = uniquetol(ET);
%}

The first output of uniquetol() is an ascending list of values that all of the values in ET are "close enough" to. So if there was a -38.4 visible in the file, then it just might happen to be

format long g

t = sym(-38.4, 'f'), vpa(t, 50), double(t)

t =

ans =

-38.4

which is the closest representable number to decimal -384/10 . Notice that the last of those outputs does not mean that the value is exactly -38.4, just that it rounds to that to display precision; the actual internal value is the one shown on the line above that.

But perhaps instead the actual value is

format long g

t2 = sym(-38.4*(1+eps), 'f'), vpa(t2, 50), double(t2)

t2 =

ans =

-38.4

which you can see displays the same in decimal to the display precision, but the actual internal value is not the same. t and t2 are adjacent representable numbers in IEEE 754 double precision; double precision cannot exactly represent any value between the two (such as -384/10 exactly).

When you look at your display and see -38.4 then you do not know which of these two numbers is the one that is actually stored. unique([t, t2]) would give two values because it knows that the two are not exactly equal. uniquetol() would say that the two are "close enough" to match.

Now, when you do the uniquetol(), it would emit one of those two numbers as the "representative" number for the cluster near -38.4, but it would take a lot more study to be able to figure out ahead of time exactly which number it would choose as representing the cluster. One of them will be put into the NET output in

%{
[NET, ~, G] = uniquetol(ET);
%}

(assuming, that is, that these are exactly what is stored, rather than some other value.)

So NET will be a list of values that each of the other entries clusters "close enough" to.

The third output, G, will be an integer vector with length the same as ET that indicates, for each entry in ET, which of the cluster values in NET that the ET value was "close enough" to. So for example G(2) == 7 would indicate that ET(2) was considered to be "close enough" to NET(7) to be part of the same cluster.

In the context of your data: all of the entries that are "close" to -38.4 would have the same value as each other in their G locations, and all of the entries that are "close" to -38.5 would have the same value as each other in their G locations, and so on. All of the values "close" to -38.4 might show up as group #7 for example.
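
A small sketch of the two outputs, using a made-up four-element ET (hypothetical values, not taken from the file) that contains an exact duplicate, a near duplicate one bit away, and a genuinely different temperature:

```matlab
% Two exact copies of -38.4, one neighbour differing in the last bit, one -38.5
ET = [-38.4; -38.4*(1+eps); -38.5; -38.4];

n_exact = numel(unique(ET));    % unique() sees three distinct bit patterns
[NET, ~, G] = uniquetol(ET);    % uniquetol() merges the two near -38.4

disp(n_exact)
disp(numel(NET))                % number of clusters found
disp(G.')                       % cluster index for each entry of ET
```

NET is sorted ascending, so the cluster near -38.5 is group 1 and the cluster near -38.4 is group 2. Which of the two near-identical values ends up in NET(2) is not something this sketch relies on.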

The tricky part of the code is the two lines after that:

%Note that I corrected these two lines back in the posted code
mins = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);

accumarray() is a very handy function that is worth spending time studying. In this context, for each different G value, it accumulates a list of corresponding EV values, and once each of the lists is built up, it takes min() or (second call) max() of each list individually. So for example the 7th output in the vector would be

mask = G == 7;
mins(7) = min(EV(mask));
mask = G == 8;
mins(8) = min(EV(mask));

and so on. So for all of the temperature values that are "close enough" to -38.4 it would hunt through the corresponding experimental voltage values and find min() of them for the mins assignment, and max() of them for the maxes assignment. You are left with two vectors, each having one entry for each cluster of temperatures, one holding min() of the experimental voltages for each cluster, and the other holding max() of the experimental voltages for each cluster.
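
The grouping behaviour is easy to check on a toy input. In this sketch G and EV are made-up values (three groups, five readings), not the thermistor data:

```matlab
G  = [1; 2; 1; 2; 3];    % group index for each reading (hypothetical)
EV = [5; 7; 3; 9; 4];    % one voltage reading per entry (hypothetical)

% [] lets accumarray size the output from the largest index in G
mins  = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);

% Same result for group 1, computed the long way with a mask
mask = G == 1;
check_min = min(EV(mask));
disp([mins, maxes])
```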

After that the code takes all of the nominal (representative) temperatures and interpolates the simulated values (the ones calculated only at integer temperatures) at each representative temperature. The step after that finds the difference between the interpolated value and the minimum experimental value for that cluster of temperatures, and then the difference between the interpolated value and the maximum experimental value for that cluster. This is because you are interested in the deviance for each cluster of temperatures: the experimental data point furthest from the projected simulated value.
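
These last steps can be traced by hand on a tiny case. All numbers below are invented for illustration; ST, SV, NET, mins and maxes play the same roles as in the posted code.

```matlab
ST = [0; 1; 2];          % simulated (integer) temperatures
SV = [10; 20; 30];       % simulated voltages
NET   = [0.5; 1.5];      % representative experimental temperatures
mins  = [14; 21];        % smallest experimental voltage per cluster
maxes = [17; 26];        % largest experimental voltage per cluster

% Linear interpolation of the simulated curve at the experimental temperatures
estimated_voltages = interp1(ST, SV, NET);

% Distance from the simulated value to each end of the experimental spread
min_deviance = abs(estimated_voltages - mins);
max_deviance = abs(maxes - estimated_voltages);

% Element-wise maximum: the worse of the two for each cluster
worst_deviance = max(min_deviance, max_deviance);
disp([NET, estimated_voltages, worst_deviance])
```

In the first cluster the maximum reading is further from the simulated value, in the second the minimum reading is, which is why both ends have to be checked.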
