Determine row vector out of matrix with most evenly spaced and distributed values

Question

Ahmad on 5 May 2024

https://ch.mathworks.com/matlabcentral/answers/2115301-determine-row-vector-out-of-matrix-with-most-evenly-spaced-and-distributed-values

Edited: John D'Errico on 5 May 2024

I have calculated results in a 100x20 matrix. Now I want to find the one row whose values in columns 2:19 are most evenly spaced between the two boundary values in columns 2 and 19. The first and last values of each row don't need to be considered. The boundary values differ from row to row, but lie in a similar range. The rows are already sorted.

Example with fewer values:

M(2,2:9) = [2.1 2.5 2.9 3.2 3.5 3.7 5.8 8.9] would be a bad one

M(20,2:9) = [2.0 2.9 3.7 4.4 5.3 6.2 7.0 7.9] would be a better one
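A minimal sketch of how these two example rows could be scored, assuming they are held in plain vectors named `bad` and `good` (hypothetical names). It uses one possible unevenness score, the standard deviation of the strides divided by the median stride; other scores are equally defensible.

```matlab
% The two example rows from the question (inner values only)
bad  = [2.1 2.5 2.9 3.2 3.5 3.7 5.8 8.9];
good = [2.0 2.9 3.7 4.4 5.3 6.2 7.0 7.9];
X = [bad; good];

% Strides between consecutive values, computed along each row
strides = diff(X, 1, 2);

% Unevenness score: std of the strides divided by the median stride.
% A perfectly evenly spaced row scores 0, so smaller is better.
score = std(strides, 0, 2) ./ median(strides, 2)

% Index of the row with the smallest score (expected: 2, the "good" row)
[~, best] = min(score)
```

Here the second row scores far lower (roughly 0.09 versus 2.9), matching the intuition in the question.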

Does anyone have a good idea how to do that?

1 Comment

Torsten on 5 May 2024

Edited: Torsten on 5 May 2024

So you are looking for a mathematical measure of "most evenly spaced"?

Do your numbers have to cover a certain interval "most evenly" ?

And why do you think M(2,2:8) is worse than M(20,2:8) ?

Answer 1

John D'Errico on 5 May 2024

https://ch.mathworks.com/matlabcentral/answers/2115301-determine-row-vector-out-of-matrix-with-most-evenly-spaced-and-distributed-values#answer_1452526

Edited: John D'Errico on 5 May 2024

Simple. Sort of. But you need to define what equal spacing means to you, and how you will measure the deviation from equal spacing. I'll make up a simple array.

A = sort(randn(12,10),2)
A = 12x10
   -1.3531   -0.2857   -0.1567    0.0245    0.2589    0.3649    0.5003    0.6332    0.9667    1.0794
   -1.7414   -1.7294   -1.0535   -0.1502   -0.0227    0.2560    0.6755    0.8847    0.9662    0.9917
   -1.5032   -0.5306   -0.3986   -0.2131    0.0775    0.2356    0.2688    1.0318    1.8646    2.0014
   -1.0865   -0.9003   -0.8249   -0.8086   -0.6752   -0.6246   -0.0647   -0.0051    0.1303    1.4062
   -1.2505   -0.4619   -0.3885    0.3655    0.5612    0.7090    0.9345    1.1863    1.2119    1.6337
   -1.0211   -0.9281   -0.5193    0.5549    1.0672    1.1005    1.9367    1.9414    1.9444    2.0160
   -1.6603   -1.6308   -0.9662   -0.6644    0.0338    0.0644    0.1684    0.3133    0.8591    1.3865
   -1.8653   -0.5368   -0.3114   -0.1596    0.2480    0.4386    0.7890    0.8707    1.0441    1.3907
   -2.2163   -1.9454   -1.1586   -0.6006   -0.1521   -0.0530    1.2626    1.6694    2.1450    2.4063
   -1.0122   -0.8694   -0.7798   -0.6724   -0.4959   -0.2683    0.1921    0.6581    1.3302    1.3377

So each row of A is increasing. But some of those rows are probably more uniformly spaced than others. Start by using diff.

Adiff = diff(A,[],2)
Adiff = 12x9
    1.0674    0.1290    0.1812    0.2344    0.1060    0.1354    0.1329    0.3335    0.1127
    0.0120    0.6758    0.9033    0.1275    0.2787    0.4194    0.2093    0.0815    0.0254
    0.9726    0.1321    0.1854    0.2906    0.1581    0.0333    0.7630    0.8328    0.1368
    0.1863    0.0754    0.0162    0.1334    0.0507    0.5599    0.0596    0.1354    1.2759
    0.7886    0.0734    0.7540    0.1957    0.1478    0.2255    0.2518    0.0256    0.4218
    0.0930    0.4087    1.0742    0.5123    0.0333    0.8362    0.0047    0.0030    0.0716
    0.0295    0.6646    0.3018    0.6982    0.0306    0.1040    0.1449    0.5458    0.5274
    1.3285    0.2254    0.1518    0.4075    0.1906    0.3504    0.0817    0.1734    0.3466
    0.2710    0.7867    0.5581    0.4485    0.0991    1.3155    0.4068    0.4757    0.2613
    0.1428    0.0897    0.1074    0.1765    0.2276    0.4604    0.4660    0.6720    0.0075

We now have a matrix of differences. Each one is the stride between a consecutive pair of numbers in a row. Now it is your turn to decide how to judge those strides.

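% Summarize the strides of each row (one row of T per row of Adiff).
% Note: kurtosis requires the Statistics and Machine Learning Toolbox.
T = table(min(Adiff,[],2), ...                 % smallest stride
   max(Adiff,[],2), ...                        % largest stride
   std(Adiff,[],2)./median(Adiff,2), ...       % std of strides over median stride
   max(Adiff,[],2)./min(Adiff,[],2), ...       % largest over smallest stride
   kurtosis(Adiff,[],2));                      % tail heaviness of the strides
T.Properties.VariableNames = {'Min stride','Max stride','Norm std','Max/Min','Kurtosis'}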
T = 12x5 table
    Min stride    Max stride    Norm std    Max/Min    Kurtosis
    __________    __________    ________    _______    ________

      0.10599       1.0674       2.2729     10.071      6.3726 
     0.012005      0.90333       1.4766     75.245      2.5505 
     0.033257      0.97261       1.9425     29.246       1.696 
      0.01624       1.2759       3.0606     78.564      5.1301 
     0.025609      0.78865       1.2392     30.796      2.2152 
    0.0029964       1.0742        4.289     358.51       2.219 
     0.029534      0.69815      0.90632     23.639      1.3493 
     0.081688       1.3285       1.6771     16.263      6.0903 
     0.099122       1.3155      0.80054     13.272      3.8627 
    0.0074845      0.67203       1.2515      89.79      2.2265 
     0.019656      0.99879       2.9615     50.812      2.6513 
      0.01194      0.89351       1.5652     74.835      3.0905 

Is a row where some of those strides are REALLY tiny a bad thing? Is a row where ONE of those strides is really large bad? Which is worse? Maybe the right measure is the standard deviation of the strides, normalized by the median stride (column 3). Or you might decide to look at the ratio of the largest stride to the smallest stride (column 4).

In this array, the 9th row would seem to be the best in terms of the normalized standard deviation of the strides (0.80054), since it has the smallest value in that column. But if we compare the maximum stride divided by the minimum stride, then row 1 is better (10.071). Or perhaps kurtosis is a good measure here, since it measures how heavy the tails of a distribution are. (And kurtosis is already scale-invariant, so it needs no separate normalization.)
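The measures in the table do not have to agree. A small sketch, assuming two hypothetical rows built from chosen strides so the expected answer is known, shows the normalized standard deviation (column 3) and the max/min ratio (column 4) picking different rows. It uses only base MATLAB functions, so kurtosis is left out.

```matlab
% Two hypothetical rows built from chosen strides, so the answer is known
sA = [1 1 1 1 1 1 1 1 0.5];                 % even, except one short stride
sB = [0.8 1.2 0.8 1.2 0.8 1.2 0.8 1.2 1.0]; % regular small wobble
A = [0 cumsum(sA); 0 cumsum(sB)];

Adiff = diff(A, 1, 2);
normStd  = std(Adiff, 0, 2) ./ median(Adiff, 2);   % like column 3 of T
maxToMin = max(Adiff, [], 2) ./ min(Adiff, [], 2); % like column 4 of T

[~, bestByStd]   = min(normStd)    % row 1: 1/6 versus 0.2
[~, bestByRatio] = min(maxToMin)   % row 2: 1.5 versus 2
```

The single short stride barely moves the standard deviation but doubles the max/min ratio, which is exactly the "tiny stride versus wobble" trade-off you have to decide on.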

Arguably you do want to normalize those strides by their median or average value, since if you doubled all of the numbers in one row, then you might not want it to be measured as worse. That is...

A = sort(rand(1,10));
A = [A;2*A;3*A];
Adiff = diff(A,[],2)
Adiff = 3x9
    0.1191    0.0139    0.0566    0.0480    0.1060    0.3093    0.0800    0.0120    0.0703
    0.2382    0.0278    0.1131    0.0960    0.2121    0.6187    0.1600    0.0241    0.1405
    0.3573    0.0417    0.1697    0.1440    0.3181    0.9280    0.2401    0.0361    0.2108

To me at least, all three rows should be rated as identical in the goodness of their spacing. That suggests you want to normalize things in some way.

Adiffnorm = Adiff./median(Adiff,2)
Adiffnorm = 3x9
    1.6951    0.1980    0.8051    0.6830    1.5092    4.4028    1.1390    0.1713    1.0000
    1.6951    0.1980    0.8051    0.6830    1.5092    4.4028    1.1390    0.1713    1.0000
    1.6951    0.1980    0.8051    0.6830    1.5092    4.4028    1.1390    0.1713    1.0000

Now all three rows are seen to be equivalent, as I think they should be. In the end, though, you need to define what is good or bad. The simple example you gave might seem to say it all, but it does not.
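Putting the pieces together for the setup in the question (a 100x20 matrix `M` with sorted rows, where only columns 2:19 are considered) might look like the sketch below. It assumes the normalized standard deviation of the strides is the chosen measure, and it uses random data with one planted, evenly spaced row (row 37, an arbitrary choice) in place of the real results, so the expected answer is known.

```matlab
% Stand-in data: 100 sorted rows of 20 random values (an assumption;
% replace with the real results matrix M)
rng(0);                                  % reproducible random numbers
M = sort(rand(100, 20), 2);
M(37, :) = linspace(2, 9, 20);           % planted evenly spaced row (hypothetical)

inner   = M(:, 2:19);                    % ignore the first and last columns
strides = diff(inner, 1, 2);             % 100x17 matrix of strides

% Scale-invariant unevenness: std of strides over the median stride
score = std(strides, 0, 2) ./ median(strides, 2);

[bestScore, bestRow] = min(score);
fprintf('Most evenly spaced row: %d (score %.3g)\n', bestRow, bestScore);
```

Swapping in a different measure only means changing the `score` line; the rest of the selection stays the same.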

2 Comments

Ahmad on 5 May 2024

Thanks John, your proposals help a lot to find a good way for my evaluation!

John D'Errico on 5 May 2024

Edited: John D'Errico on 5 May 2024

Excellent. Odds are there are many measures you could use. My suggestion would be to take some of your data and maybe plot it. For example, a set with a constant stride would show up as a perfectly linear plot.

A = sort(randn(12,10),2)

A = 12x10
   -1.6724   -1.0698   -0.9837   -0.9073   -0.4877   -0.3854   -0.2887   -0.1449    0.2297    1.0068
   -0.9884   -0.5906   -0.5593   -0.3321   -0.1977    0.0239    0.4641    0.6662    0.7083    2.1204
   -1.5585   -0.8932   -0.7908   -0.7702   -0.7138   -0.5487   -0.2845    0.2091    0.2583    1.9670
   -2.8522   -0.8535   -0.5428   -0.4398    0.1563    0.1574    0.2039    0.3910    0.9089    1.7343
   -2.1003   -1.4418   -1.1302   -0.9000   -0.8716   -0.5205    0.0245    0.2864    0.3186    2.2926
   -1.5674   -0.9739   -0.8680   -0.6090   -0.5666    0.3659    0.5800    0.7947    1.0988    1.4289
   -1.5809   -1.5215   -0.8630   -0.3208   -0.2255   -0.1635    0.3308    0.9677    1.3600    1.4122
   -2.3213   -1.1710   -0.7835   -0.6771   -0.2087   -0.2076    0.8963    1.4512    1.6049    3.7787
   -1.0800   -0.4816   -0.3866   -0.1896    0.0485    0.0797    0.1745    0.4604    0.5195    1.5534
   -2.2878   -1.1352   -0.6783   -0.1652    0.0592    0.1126    0.8074    1.1636    1.4416    1.9326

plot(1:10,A,'-')

And maybe that does not help any. But it suggests another idea: the correlation coefficient between a row and a perfectly linear sequence such as 1:10 would be as large as possible for an evenly spaced row. Since the stride is always positive, that correlation would be exactly +1.
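A quick check of that claim, as a sketch: an evenly spaced row correlates with 1:10 at +1 up to rounding, while a random sorted row does not. It uses `corrcoef`, which is part of base MATLAB, instead of `corr` from the Statistics and Machine Learning Toolbox; with the variables arranged as columns, both return Pearson correlations.

```matlab
rng(1);                                  % reproducible random row
x = 1:10;
A = [linspace(-2, 3, 10);                % constant stride
     sort(randn(1, 10))];                % sorted random values, uneven strides

% corrcoef treats columns as variables, so put x and each row in columns
C = corrcoef([x; A]');
r = C(1, 2:end)                          % r(1) is +1 up to rounding
```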

C = corr([1:10;A]');
C(1,2:end)
ans = 1x12
    0.9616    0.9351    0.8847    0.9089    0.9329    0.9857    0.9857    0.9537    0.9392    0.9796    0.9379    0.9730

We want the row of A with the highest correlation coefficient, when compared to a perfectly linear sequence.

[cmax,ind] = max(C(1,2:end))
cmax = 0.9857
ind = 6

So in this case, the curve that is most nearly perfectly linear is the 6th one.

plot(diff(A(ind,:)))

What I see there is a fairly uniform set of strides, plus one or two outliers. The problem is that, in this case, there is one large outlier stride near the middle of the row, and an outlier there has only a weak influence on the correlation coefficient. As such, I'd suggest the correlation coefficient is probably a poor measure: had that outlier stride been near the beginning or end of the row, it would have changed the result.
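The position effect can be demonstrated with two hypothetical rows that share exactly the same strides (eight strides of 1 and one stride of 5) and differ only in where the large stride sits. This is a sketch built for illustration, not data from the thread.

```matlab
% Same strides in a different order: eight strides of 1 and one of 5
sMid = [1 1 1 1 5 1 1 1 1];              % large stride in the middle
sEnd = [1 1 1 1 1 1 1 1 5];              % large stride at the end
rows = [0 cumsum(sMid); 0 cumsum(sEnd)];
x = 1:10;

% Correlation of each row with a perfectly linear sequence
C = corrcoef([x; rows]');
r = C(1, 2:end)                          % about 0.978 (middle), 0.960 (end)

% A stride-based score sees no difference between the two rows
d = diff(rows, 1, 2);
normStd = std(d, 0, 2) ./ median(d, 2)   % equal for both rows
```

The stride-based score cannot tell the two rows apart, while the correlation prefers the row with the outlier in the middle, which is the weakness described above.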

As I said, there are many different schemes you could use, and no particular one will be perfect, since we don't have a mathematical definition of what is best. Anyway, I'd look at one of the other schemes I suggested, since the correlation coefficient is not robust.
