# How to find the difference between two plots when the dimensions of the matrices are different?

Xingjian Zhao on 19 Nov 2020
Edited: Walter Roberson on 25 Nov 2020
I'm currently testing the characteristics of NTC thermistors from -40 to 150 degrees Celsius, so I have two sets of data. The first one is the "expected value" from my calculation and simulation, which is the V-T (voltage-temperature) graph of the NTC thermistor; the temperature values are integers from -40 to 150, and each temperature has only one voltage value.
The second one is the data I got from actual testing. Its temperature values contain decimal points: within one degree there can be up to 20 different temperature values, and there can be up to 10 numbers for each of those decimal values.
So how can I find the "maximum deviation" (maximum difference) between these two plots when the matrices have different dimensions, namely [191*1] and [9656*1]?
191 for -40 to 150 degrees Celsius in integers, 11657 for -40 to 150 degrees Celsius containing decimal numbers.
T_Cal=NTCVTCVS{:,6};
V_Toff_Estimate=NTCVTCVS{:,7};
T_Test=NTCVTCVS{:,3};
V_Toff_Test=NTCVTCVS{:,4};
figure
hold on
plot(T_Cal, V_Toff_Estimate,'b')
plot(T_Test, V_Toff_Test,'r')
legend('Estimate','Test');
I would like to know at which point the Estimate and Test values have the largest deviation, but these two matrices have different dimensions, so can someone please help me with that?
I've done a lot of researching, and some people have given ideas, but they do not really work on my data:
Fill most of the data with NaN to make these two matrices the same size so that MATLAB can do the calculation?
Try using interp1? I'm not familiar with that command and I am getting errors with it.

Walter Roberson on 19 Nov 2020
subplot(1,2,1)
plot(x1, y1, 'b', x2, y2, 'r')
subplot(1,2,2)
%restrict both data sets to the x range that they share
minx = max( min(x1), min(x2) );
maxx = min( max(x1), max(x2) );
x1_mask = x1 >= minx & x1 <= maxx;
x2_mask = x2 >= minx & x2 <= maxx;
x1_common = x1(x1_mask); y1_common = y1(x1_mask); %first data set inside the overlap
x2_common = x2(x2_mask); y2_common = y2(x2_mask); %second data set inside the overlap
common_x = union( x1_common, x2_common ); %all x values from either set inside the overlap
y1_interp = interp1(x1_common, y1_common, common_x); %interpolate first to common x
y2_interp = interp1(x2_common, y2_common, common_x); %interpolate second to common x
y_diff = y1_interp - y2_interp; %calculate the difference
plot(common_x, y_diff) %plotting the difference
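Once the difference is available on a common grid, the largest deviation and the x value where it occurs can be read off with max(). The following is only a minimal sketch: the sample values are made up for illustration, and only the variable names x1, y1, x2, y2 follow the code above.

```matlab
% Illustrative data only: a coarse "estimate" and a finer "test" curve
x1 = (0:5).';          y1 = 2*x1;             % estimate at integer x
x2 = (0.5:0.5:4.5).';  y2 = 2*x2 + 0.1*x2;    % test at finer x, drifting upward

% Restrict to the shared x range and build the common grid
minx = max(min(x1), min(x2));
maxx = min(max(x1), max(x2));
common_x = union(x1(x1 >= minx & x1 <= maxx), x2(x2 >= minx & x2 <= maxx));

% Interpolate both curves onto the common grid
y1_interp = interp1(x1, y1, common_x);
y2_interp = interp1(x2, y2, common_x);

% Largest absolute deviation and the x value where it occurs
[max_dev, idx] = max(abs(y1_interp - y2_interp));
x_at_max = common_x(idx);
```

Note that interp1 needs distinct sample points, so measured data with repeated x values has to be reduced to one value per x before this step.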

Xingjian Zhao on 24 Nov 2020
Man, thank you so much! I appreciate it. I finally got what I wanted, but I'm a little confused by the code:
if length(ET) > length(EV)
    EV(end+1:length(ET)) = nan;
end
if length(EV) > length(ET)
    ET(end+1:length(EV)) = nan;
end
%get rid of nan
and
[NET, ~, G] = uniquetol(ET);
Could you please explain how this code works? I don't quite understand why we are using it and how it works, especially this part:
[NET, ~, G] = uniquetol(ET);
Walter Roberson on 25 Nov 2020
We are doing a comparison between two data sets, including using the temperature entries of one of them as locations to interpolate at. The comparison will not work well if any of the entries are nan, and interpolation will fail if any of the temperatures being interpolated at are nan.
Now, when we use computation to arrive at the numbers we are using, a nan output is not especially likely. It can happen in some cases of overflow. It can happen if you interpolate out of range. So not never, but not common.
But we are not using computation to arrive at the numbers. We are reading the numbers from an Excel file. And when you read from Excel files, you get nan in any entry that was not entered as a proper number, and you get nan where there are columns that are blank when other columns are not blank, and you get nan if you ask to read more data than exists in the file. When you read from an Excel file that a user might have edited by hand, you should assume that the user made a mistake in the file and that you will have nan entries in the data you read in.
Because of that, I take steps to detect and remove rows that have nan in them.
I also do not assume that ET and EV are the same length. In context of the code I posted where they are derived from columns of NTCVTCVS then they would be the same length, but I did not want to assume that you would retain the same way of providing that data.
The code
if length(ET) > length(EV)
    EV(end+1:length(ET)) = nan;
end
if length(EV) > length(ET)
    ET(end+1:length(EV)) = nan;
end
is pre-occupied with ensuring that ET and EV are the same length. If one of the two is shorter, that one is extended with nan to match the length of the other one. If you were sure, due to the way you get the input, that the two are the same length, then you could skip those lines.
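As a small illustration of that padding step, here is a sketch with made-up values (the vectors below are hypothetical, not the poster's data). It indexes from end+1 so that the existing last element is left untouched and only new entries are filled with nan.

```matlab
% Hypothetical column vectors of unequal length
ET = [1; 2; 3; 4; 5];
EV = [10; 20; 30];

% Pad whichever vector is shorter with nan up to the longer length
if length(ET) > length(EV)
    EV(end+1:length(ET)) = nan;
end
if length(EV) > length(ET)
    ET(end+1:length(EV)) = nan;
end
% EV is now [10; 20; 30; NaN; NaN], and ET is unchanged
```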
The next piece of code, the part under the comment %get rid of nan, is detecting nan entries. Those might be due to a shorter variable having been extended to match the size of a longer variable, or they might be due to there having been bad data in the file, or they might be due to empty data in the file, or due to asking for more data than was in the file. For example if the file had
1 10
2 N/A
3 30
4 (blank)
(blank) 50
and you ask to read 6 rows, then the data that would be read in would be
1 10
2 nan
3 30
4 nan
nan 50
nan nan
and you would be wanting to get rid of all rows except the 1 and 3 rows. any(isnan(),2) is true if there is at least one nan in the row -- so in this example it would be
false
true
false
true
true
true
and the entries that are true would correspond to entries you need to remove from the variables, which you do by using the MATLAB idiom of assigning literal [] to the entries.
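A minimal sketch of that removal step, using the example values above. The variable name bad and the way the two columns are glued together are assumptions for illustration; the original code is not reproduced here.

```matlab
% The data as it would be read in for the example above
ET = [1; 2; 3; 4; nan; nan];
EV = [10; nan; 30; nan; 50; nan];

% True for every row in which at least one of the two columns is nan
bad = any(isnan([ET, EV]), 2);

% Delete those rows from both vectors by assigning literal []
ET(bad) = [];
EV(bad) = [];
% ET is now [1; 3] and EV is now [10; 30]
```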
Walter Roberson on 25 Nov 2020
We now have a vector of temperatures, ET, that might have exact duplicates, and might have almost-exact duplicates that differ in the last few bits. And we want to know which of the entries are exact or near duplicates of each other. We can do that by asking uniquetol(ET).
%{
[NET, ~, G] = uniquetol(ET);
%}
The first output of uniquetol() is an ascending list of values that all of the values in ET are "close enough" to. So if there was a -38.4 visible in the file, then it just might happen to be
format long g
t = sym(-38.4, 'f'), vpa(t, 50), double(t)
t =
-5404319552844595/140737488355328
ans =
-38.39999999999999857891452847979962825775146484375
ans =
-38.4
which is the closest representable number to decimal -384/10 . Notice that the last of those outputs does not mean that the value is exactly -38.4, just that it rounds to that to display precision; the actual internal value is the one shown on the line above that.
But perhaps instead the actual value is
format long g
t2 = sym(-38.4*(1+eps), 'f'), vpa(t2, 50), double(t2)
t2 =
-1351079888211149/35184372088832
ans =
-38.400000000000005684341886080801486968994140625
ans =
-38.4
which you can see displays the same in decimal to the display precision, but the actual internal value is not the same. t and t2 are adjacent representable numbers in IEEE 754 double precision; double precision cannot exactly represent any value between the two (such as 384/10 exactly).
When you look at your display and see -38.4 then you do not know which of these two numbers is the one that is actually stored. unique([t, t2]) would give two values because it knows that the two are not exactly equal. uniquetol() would say that the two are "close enough" to match.
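The difference between the two functions can be checked directly. This is a small sketch that relies on the default tolerance of uniquetol() for double inputs; the variable names are chosen here for illustration.

```matlab
t  = -38.4;
t2 = -38.4*(1+eps);     % the neighbouring double, slightly further from zero

same_value = (t == t2);                  % false: the stored values differ
n_exact    = numel(unique([t, t2]));     % 2, unique() compares exactly
n_tolerant = numel(uniquetol([t, t2]));  % 1, both fall within the default tolerance
```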
Now, when you do the uniquetol(), it would emit one of those two numbers as the "representative" number for the cluster near -38.4, but it would take a lot more study to be able to figure out ahead of time exactly which number it would choose as representing the cluster. One of them will be put into the NET output in
%{
[NET, ~, G] = uniquetol(ET);
%}
(assuming, that is, that these are exactly what is stored, rather than some other value.)
So NET will be a list of values that each of the other entries clusters "close enough" to.
The third output, G, will be an integer vector with length the same as ET that indicates, for each entry in ET, which of the cluster values in NET that the ET value was "close enough" to. So for example G(2) == 7 would indicate that ET(2) was considered to be "close enough" to NET(7) to be part of the same cluster.
In the context of your data: all of the entries that are "close" to -38.4 would have the same value as each other in their G locations, and all of the entries that are "close" to -38.5 would have the same value as each other in their G locations, and so on. All of the values "close" to -38.4 might show up as group #7 for example.
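To make the roles of NET and G concrete, here is a short sketch with hypothetical temperature readings (the values are invented for illustration only).

```matlab
% Hypothetical readings: two near -38.5, two near -38.4, one at -38.3
ET = [-38.5; -38.4; -38.4*(1+eps); -38.5; -38.3];

[NET, ~, G] = uniquetol(ET);
% NET has three entries in ascending order: one representative each for
% the clusters near -38.5, -38.4 and -38.3
% G is [1; 2; 2; 1; 3]: ET(2) and ET(3) both belong to cluster 2

% Every ET value is "close enough" to its representative NET(G)
largest_gap = max(abs(NET(G) - ET));
```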
The tricky part of the code is the two lines after that:
%Note that I corrected these two lines back in the posted code
mins = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);
accumarray() is a very handy function that is worth spending time studying. In this context, for each different G value, it accumulates a list of corresponding EV values, and once each of the lists is built up, it takes min() or (second call) max() of each list individually. So for example the 7th output in the vector would be min(EV(G == 7)) for the first call and max(EV(G == 7)) for the second call,
and so on. So for all of the temperature values that are "close enough" to -38.4 it would hunt through the corresponding experimental voltage values and find min() of them for the mins assignment, and max() of them for the maxes assignment. You are left with two vectors, each having one entry for each cluster of temperatures, one of the values being min() of the experimental voltages for each temperature, and the other being max() of the experimental voltages for each temperature.
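The grouping behaviour can be seen on a tiny example. The group numbers and voltages below are hypothetical and only serve to illustrate what the two accumarray() calls return.

```matlab
G  = [1; 2; 2; 1; 3];                   % hypothetical cluster number per reading
EV = [0.52; 0.61; 0.64; 0.50; 0.70];    % hypothetical voltage per reading

mins  = accumarray(G(:), EV(:), [], @min);   % [0.50; 0.61; 0.70]
maxes = accumarray(G(:), EV(:), [], @max);   % [0.52; 0.64; 0.70]

% Entry k is the same as applying min() or max() to EV(G == k)
check_min = min(EV(G == 2));   % 0.61
check_max = max(EV(G == 2));   % 0.64
```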
After that the code takes all of the nominal (representative) temperatures and interpolates the simulated values (the ones calculated only at integer temperatures) for each representative temperature. The step after that finds the difference between the interpolated value and the minimum experimental value for that cluster of temperatures, and then the difference between the interpolated value and the maximum experimental value for that cluster of temperatures. This is because you are interested in the deviation for each cluster of temperatures: the furthest experimental data away from the projected simulated value.
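Putting the described steps together, the whole procedure might look roughly like the sketch below. This is not the code from the answer: the data is invented, and the names ST, SV (simulated temperature and voltage), SV_at_NET and dev are assumptions introduced here for illustration.

```matlab
% Hypothetical simulated curve at integer temperatures
ST = (-40:-36).';
SV = [1.00; 0.98; 0.96; 0.94; 0.92];

% Hypothetical experimental readings with repeated fractional temperatures
ET = [-38.5; -38.5; -38.0; -37.5; -37.5];
EV = [0.95; 0.99; 0.96; 0.91; 0.96];

% Cluster the experimental temperatures; min and max voltage per cluster
[NET, ~, G] = uniquetol(ET);
mins  = accumarray(G(:), EV(:), [], @min);
maxes = accumarray(G(:), EV(:), [], @max);

% Simulated voltage interpolated at each representative temperature
SV_at_NET = interp1(ST, SV, NET);

% Furthest experimental value from the simulated value, per cluster
dev = max(abs(SV_at_NET - mins), abs(SV_at_NET - maxes));

% Largest deviation overall and the temperature where it occurs
[max_dev, idx] = max(dev);
T_at_max = NET(idx);
```

With this made-up data the largest deviation is about 0.04 V and occurs at -37.5, where the lowest reading (0.91) sits furthest from the interpolated simulated value (0.95).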

R2019b
