I have a 2,000-row dataset of measured data. I would like to create 60 partitions of 30 measurements each (1,800 rows in total, discarding the 200 outliers or "worst contributors"), with the partitions chosen to minimize the differences between the per-partition sums of the measurements. Put more simply, I want the sum of the measurements in each partition to be as close as possible to the sum in every other partition.
My first attempt was based on random sampling, which was inefficient, as expected. I am considering a histogram-based approach for my next attempt, but wanted to sample the community for ideas or best practices first.
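
To make the goal concrete, here is a minimal sketch of a greedy baseline that any better idea could be compared against. It is not my random-sampling attempt and not the histogram idea; it is the common "largest value first, into the partition with the smallest running sum" heuristic, with a cap of 30 items per partition. The function name `balanced_partitions` and its parameters are made up for illustration, and the outlier rule (drop the rows farthest from the median) is only an assumption, since I have not settled on how to define "worst contributors".

```python
import heapq
from statistics import median


def balanced_partitions(values, n_parts=60, part_size=30):
    """Greedy sketch: return n_parts lists of row indices, each of length part_size."""
    values = [float(v) for v in values]
    keep = n_parts * part_size
    if keep > len(values):
        raise ValueError("not enough rows to fill every partition")

    # Assumption: the rows to discard are the ones farthest from the median.
    med = median(values)
    by_deviation = sorted(range(len(values)), key=lambda i: abs(values[i] - med))

    # Place the largest kept values first so later, smaller values can even things out.
    kept = sorted(by_deviation[:keep], key=lambda i: values[i], reverse=True)

    # Min-heap of (running sum, partition id) for partitions that still have room.
    heap = [(0.0, p) for p in range(n_parts)]
    heapq.heapify(heap)
    parts = [[] for _ in range(n_parts)]

    for i in kept:
        running_sum, p = heapq.heappop(heap)
        parts[p].append(i)
        # A full partition is simply not pushed back onto the heap.
        if len(parts[p]) < part_size:
            heapq.heappush(heap, (running_sum + values[i], p))
    return parts


if __name__ == "__main__":
    import random

    random.seed(0)
    data = [random.gauss(100, 15) for _ in range(2000)]  # synthetic stand-in data
    sums = [sum(data[i] for i in part) for part in balanced_partitions(data)]
    print("spread of partition sums:", max(sums) - min(sums))
```

This only gives a starting point: the size cap can force the last few values into whichever partitions still have room, so a follow-up pass that swaps pairs of rows between the highest-sum and lowest-sum partitions would probably tighten the spread further.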