How to quickly group numerical data without giving bin sizes
I am trying to find an efficient and quick way to group numerical data. In short, I have several paths towards a particular pixel, and these paths consist of rays of slightly different lengths (as any ray that crosses the pixel anywhere is valid for a path). These paths can therefore be considered groups of rays. I want to differentiate the paths by their (average) length and select the path that contains the largest number of rays, or, in other words, identify the groups and select the largest group.
Importantly though, I do not just need the length, but also an index to identify one ray, e.g. the "middle" one of the group. (Say I have an array of size 10, and the first 7 and last 3 elements form 2 groups. I would like to identify the groups, then, out of the 7 elements of the larger group, I would like to get the index of the 4th element as the "middle".)
My current solution is to round the ray lengths (to the third decimal, as the pixel size is on the millimeter scale) and use the "mode" function. However, this is both inefficient (because I want to do this column-wise for a matrix that also contains NaN values that I would like to ignore) and in some cases inaccurate. For example:
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];
array2 = round(array,2);
mode(array2)
Of course it would be logical to group the first four entries and the last three, but the rounding operation is ill-suited when the values vary around a rounding boundary (the .5). I have used the histogram function to plot examples in my code, and it groups the entries in a satisfactory way. However, I actively do not want the plot itself; I just need the grouping, and the histogram function seems to have a rather large overhead for this purpose (as this operation has to be performed thousands of times for a proper run of the program). The discretize function unfortunately needs me to give it an explicit number of bins, i.e. I would need to have an a priori idea of the groups.
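For reference, the binning without the plot is available through histcounts, which, as far as I know, is the calculation function behind histogram. The snippet below is only a sketch on the example vector. The variable names are made up for illustration, and the automatically chosen edges may still split a group, so the result would need checking against real data.

```matlab
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];

% Automatic binning, no figure is created.
% bin(k) is the bin number of array(k); NaN entries would get bin number 0.
[counts, edges, bin] = histcounts(array);

[~, biggest] = max(counts);                        % most populated bin (first one on ties)
members = find(bin == biggest);                    % indices of the entries in that bin
middleIdx = members(ceil(numel(members) / 2));     % "middle" entry of that bin
```

Because the third output gives the bin number of every entry, the index of a representative ray can be read off directly instead of searching for the mode value afterwards.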
Is there any function that can efficiently do this, or are there suggestions for a better way to do it myself than "mode"?
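One way to do it by hand, sketched here under assumptions rather than offered as a confirmed solution, is to sort the values and start a new group wherever the gap between neighbours exceeds a tolerance. The tolerance tol = 0.005 is an assumed value chosen for this example, not something from the problem above, and all variable names are hypothetical. The sketch handles a single column that has at least one non-NaN entry.

```matlab
% Example column: the vector from above with one NaN inserted.
lengths = [0.2248 0.2249 0.2250 0.2251 NaN 0.2399 0.2400 0.2401].';
tol = 0.005;                                 % assumed gap threshold between groups

valid = find(~isnan(lengths));               % original positions of the non-NaN entries
[sorted, order] = sort(lengths(valid));      % sorting turns groups into contiguous runs
breaks = find(diff(sorted) > tol);           % a gap larger than tol ends a group
starts = [1; breaks + 1];                    % first sorted position of each group
stops  = [breaks; numel(sorted)];            % last sorted position of each group

[groupSize, g] = max(stops - starts + 1);    % largest group (first one on ties)
members = sort(valid(order(starts(g):stops(g))));   % original indices, ascending
middleIdx = members(ceil(numel(members) / 2));      % "middle" ray of that group
groupMean = mean(lengths(members));                 % average length of that group
```

For a matrix, the same steps could be run in a loop over the columns. The group boundaries then depend only on tol and not on where the values fall relative to a rounding boundary.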