Minimize difference between two populations.

Question

Tony Elias on 2 Oct 2017

0
Link

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/359352-minimize-difference-between-two-populations

Commented: John D'Errico on 2 Oct 2017

Hi, I have this problem: I have a dataset composed by 2 vectors, each composed by 100 elements. I need to find the "best fit" of 50 values for each vector in order to minimize the difference between them.

I think I should set up a minimization problem but i don't know how to achieve it in matlab...

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

John D'Errico on 2 Oct 2017

1
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/359352-minimize-difference-between-two-populations#answer_284003

Edited: John D'Errico on 2 Oct 2017

Open in MATLAB Online

If I understand you correctly, you have two vectors, call them A and B.

N = 20;
A = randn(N,1);
B = randn(N.1);

Now, you want to extract subsets sA and sB of each vector, each of length J, SUCH THAT

norm(A(sA) - B(sB))

is minimized. YIKES!

This is a highly combinatorial problem. Finding a truly optimal solution will be nasty. If N is at all large, the problem will become highly computational.

For example, suppose that N=20 with J = 10. So you want to reduce each set by 50%. Effectively, this requires us to sort through nchoosek(20,10) possible subsets of A, but also nchoosek(20,10) subsets of B.

Worse, we are not just comparing those subsets in any arbitrary order. So we need to consider all permutations of one of those subsets. So the size of the solution space is...

nchoosek(N,J)^2*factorial(J)
N = 20;
J = 10;
nchoosek(N,J)^2*factorial(J)
ans =
      1.23868287980237e+17

So the search space is immense and combinatorial. You cannot use a classical optimization tool. fmincon is out. It cannot be used for combinatorial problems. intlinprog is out, because the objective is not linear. I suppose you could formulate it in the form of a binary nonlinear programming problem. So optimize a binary vector that essentially extracts subsets of each of A and B. Even that fails though, because the order of the subsets are important.

Simple, but sub-optimal would be a greedy algorithm. At each iteration, just find the two closest points of the two sets. Pair them up, and then reduce A and B by those elements. Then repeat.

This would do the work to find the closest pair.

[IDX,D] = knnsearch(A,B);

For example, in the random set I generated...

[dmin,ind] = min(D)
dmin =
       0.00148090843713233
ind =
     7
[B(ind),A(IDX(ind))]
ans =
         0.726885133383238         0.725404224946106

Now just find the next closest pair, after those elements have been removed from A and B. Repeat until you have chosen J elements.

The above is a totally greedy algorithm, that may be reasonable if J is quite small compared to N.

Good luck however, if J is large compared to N, especially if N is large.

2 Comments
Show NoneHide None

Tony Elias on 2 Oct 2017

This is a great idea! Thank you! I didn't know about the existance of that function. I'll try out now

John D'Errico on 2 Oct 2017

I think it is fairly new. Some recent release introduced it, as part of the stats toolbox.

Sign in to comment.

Answer 2

Jan on 2 Oct 2017

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/359352-minimize-difference-between-two-populations#answer_283982

What exactly is "a data set"? Do you mean a [2 x 100] matrix? What is "the difference" you want to minimize? Does the order of the elements matter? Do you mean the Euclidean distance between two subvectors of the size [1, 50]? If so, contemplate. Start with thinking. How many solutions are possible? Would a brute force attack solve the problem or is an exhaustive search impossible in the next billion of years?

There is a huuuge number of ways to choose 50 elements out of 100. So think again: Would an approximation be enough? Perhaps a genetic algorithm or another global optimization technique?

Start with defining the problem exactly in a mathematical sense.
Then think about the magnitude of the problem.
Afterwards it might be important to know why you want to solve this: Is it a homework and you hear a lesson about global optimization or Monte Carlo methods? Did an evil professor ask you to solve this? Did this problem occur in your field of science, you asked your colleagues and they forwarded you to the Matlab forum?

1 Comment
Show -1 older commentsHide -1 older comments

Tony Elias on 2 Oct 2017

Edited: Tony Elias on 2 Oct 2017

Open in MATLAB Online

The problem is simple, I'm sorry for my explanation.

I have 2 vector of length n. I have to find the best j values for each of these 2 vectors in order to minimize the difference of the means of this two j values vectors. Ex:

n=10
j=4
A=[0 1 2 0 1 0 0 0 0 0]'
B=[0 0 0 0 0 0 0 0 0 0]'
Now of course for B the mean is = 0. We take the vector B*=[0 0 0 0]'
For A i find a 4x1 vector A* which minimizes the difference between mean(B*) and mean(A*).
In this case is A*=[0 0 0 0]';

I thougth achieve this with this obj fun:

min A*x/j - B*y/j 
and sum(x) = sum(y) = j
where x and y are vectors composed by 1 and zeros

Sign in to comment.

Minimize difference between two populations.

0 Comments
Show -2 older commentsHide -2 older comments

Answers (2)

2 Comments
Show NoneHide None

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

Minimize difference between two populations.

0 Comments Show -2 older commentsHide -2 older comments

Answers (2)

2 Comments Show NoneHide None

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None

1 Comment
Show -1 older commentsHide -1 older comments