Finding the first 1 in a binary set using optimization

4 views (last 30 days)
Hello everybody,
Assume that a binary set V is given and the task is to find the first 1 in V. Note that
1) V is not sorted (so, we cannot benifit from binary search)
2) Although V is binary it is not known beforehand. It would be computationally expensive to tell
whether an element of V is 0 or 1. If I would know V then I could simple use the command find(V,1)
Therefore, I need to search efficiently. It seems that optimization is one efficient way to solve this problem. But,
unfortunately, it seems that the underlying optimization problem is not that easy to tackle. For instance, In
bellow, you see an example where I used genetic algorithm (I used this algorithm since it can tackle integer optimization problems)
clc;
A=[zeros(1,10^4) 1 zeros(1,100) 1 1 zeros(1,10^3)];
cost=@(i) i+10^8*(1-A(i)); % Here, in theory instead of 10^8, we should use 'infinity'
options = optimoptions('ga','Display','iter');
[x,f] = ga(cost,1,[],[],[],[],1,length(A),[],1,options)
Unfortunately, it often always fails to find the true solution which is x = 10^4+1. Rather it finds x = 1 as the
solution which is clearly wrong.
Any idea?
Thanks for your kind help in advance!
Babak
  3 Comments
Mohammad Shojaei Arani
Mohammad Shojaei Arani on 27 Aug 2023
I apologize to use different letters. I should have used V in my code.
Please note that, the code I wrote was only and only a test. Even for this easy case, when
we know the set V beforehand, the optimization fails to find the correct solution.
Further, with a simple manipulation, it is possible to convert this code to a one where the set V is not known in advance (I could write a more complex objective function (the function 'Cost') ) so that it first finds A(i) and then proceeds. It could be as follows
cost=@(i)Cost(i);
function cost=Cost(i)
% Calculate V(i) % In my actuall problem this is a length code
cost = i+10^8*(1-V(i));
end
Sam Chak
Sam Chak on 27 Aug 2023
Edited: Sam Chak on 27 Aug 2023
I'm attempting to visualize your situation as the quest for searching the first Calculus book relative to the first call number order at the George Peabody Library. Ordinarily, you would peruse the library's catalogue, and after pinpointing the book, you would employ the call number to find it.
Let's assume that the library's homepage is inaccessible, and the digital catalogue is unavailable as well. Consequently, you engage with multiple librarians to aid you in physically locating the book. Is the metaheuristic optimization approach you're utilizing comparable to this scenario?
Image by Matthew Petroff / CC BY-SA.

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 27 Aug 2023
Imagine that you had a case where location P is set, that all positive integer multiples of P were also set. Then suppose you examine location #60. If it is not set then you can rule out locations 1, 2, 3, 4, 5, 6, 10, 15, 20, 30 -- so in this hypothetical situation in which information about the value of one V entry could give you definitive information about the value of an earlier V entry, then there would be the potential for strategies that are better than linear search.
However, in your description about your matrices, it is not obvious to us that information about a particular V entry gives you definite information about any other V entry.
Let us think for the moment about the possibility that knowing (say) V(10) is 1 gives you the probabilistic result that V(1) has only (say) 1e-10 probability of being 1. Can you skip examining V(1) in that case? No! If your accumulated information so far gave V(1) only a 1 chance in 2^64 of being 1, then you would still need to examine V(1), because the task is not to find a "good approximation" or a "bounded cost best-effort" solution, but is instead to find the first entry in V with value 1. Knowing a probability distribution might help guide you in choosing upper bounds, but as long as there is any remaining probability in the first entry not yet examined, you still need to examine it.
Now let us consider the case where examining an entry cannot rule out an earlier entry. Suppose you have examined V(4) and found that it is 1, but you have not examine V(1) yet. Can you stop? There is nothing obvious to us in your description of your task that would allow you to stop without examining V(1): as best the volunteers have been able to make out, even knowing V(2) is 1 would not allow you to rule out V(1) being 1. So you always need to examine V(1) as best we can tell. There is a 100% probability that you will need to examine V(1) . If you were to examine V(2) before V(1) and found the value of V(2), would you be able to skip V(1) ? Not that we can tell: if V(2) == 1 then as far as well can see you would still need to examine V(1) . But examining V(2) first has a cost, and if V(1) turns out to be 1, you would have wasted the effort of examing V(2) first.
Likewise, suppose you examine V(3) first and determine it is 1. You examine V(1) and find it is 0. Can you skip examining V(2) and assume that V(3) is the first? It is not obvious to any of us that you could do that: as far as we can tell, you would still have to examine V(2), and if V(2) is 1 then the cost of examining V(3) first would have been wasted.
Yes, if you first examine V(10) and then examine V(5) and determine V(5) is 1 then you can skip V(6) through V(9) -- but you would have wasted the cost of evaluating V(10) first, and would have been better off examining V(5) first.
In any situation in which examining a latter entry cannot rule out an earlier entry, then by induction you can show that the most cost effective method is to examine V(1) then V(2) then V(3) and so on. A linear search, in other words. There is a 100% chance you will need to evaluate V(1). If V(1) == 0 then there is a 100% chance you need to examine V(2) . If V(2) == 0 there is a 100% chance you will need to evaluate V(3)... and so on.

More Answers (3)

Image Analyst
Image Analyst on 27 Aug 2023
Why not simply do
V=[zeros(1,10^4) 1 zeros(1,100) 1 1 zeros(1,10^3)];
index = find(V == 1, 1, 'first')
index = 10001
???
  6 Comments
Image Analyst
Image Analyst on 27 Aug 2023
Not really. Is there any difference between A and V? And if you want to somehow access A(27), like assign it to a new variable, that is virtually instant. Even generating a logical vector telling if A or V is exactly equal to 1 is virtually instant (unless A or V is hundreds of millions of elements long).
But it looks like your responses to @Bruno Luong that you don't really want find() regardless because you are trying to learn/use optimization, like a genetic algorithm or something: "I want to tackle it (via optimization) " and so are not interested in using standard, traditional functions.
Mohammad Shojaei Arani
Mohammad Shojaei Arani on 27 Aug 2023
Yes, A and V are the same (I should have always used V. Sorry)
Next, you did not get what I meant. Yes, if the set V is known beforehand then an operation like x = V(27) is super fast. But, what if V is not known beforehand? Do you want me to tell you my actual problem? Then it is long and deviates us from the core of the problem. But, to give you an example consider the following example:
Assume that the matric X is given. Assume further that we have a dynamical system and consider an equilibrium of this system. The task is to see whether each column of X being used as initial condition for the dynamical system eventually converges to the equilibrium E or not (for instance, it can converge to another equilibrium or it can diverge to infinity). Now, define the set V as: V(i) = 1, if the i-th column of X (used as initial condition for our dynamical system) ends up into E, otherwise V(i) = 0. How to know that i-th initial consition ends up into the equilibrium E? It either einds up into E or it wont. The only way we can determine V(i) is to simulate our dynamical system and see where it ends up. This is time conssuming. So, here, I gave you asn example where we have a binary set V but determining its elements is expensive.
Was this clear?
You are wroing to think that "I do not want to use the command 'find' because I am trying to learn/use optimization". By the example I gave you you will clearly see that the use of 'find' command is not practical. That is why I intend to use optimization. Is this clear?

Sign in to comment.


Bruno Luong
Bruno Luong on 27 Aug 2023
Edited: Bruno Luong on 27 Aug 2023
You ask the same question several time. Without any other a priori knowledge, scan your array until 1 is meet.
There is no miracle algorithm to find it since there is not other way to guess where 1 first appears.
  6 Comments
Mohammad Shojaei Arani
Mohammad Shojaei Arani on 27 Aug 2023
I did not completely understand your example but I agree with you that if the set V does not have autocorrelation then the optimization often fails (note that V is not random, it is deterministic but inspecting its values is time conssuming).
But, the elements of set V, often, have some rather high degrees of correlation. Therefore, if I can use optimization it should be compoutationally cheaper.

Sign in to comment.


John D'Errico
John D'Errico on 27 Aug 2023
Edited: John D'Errico on 27 Aug 2023
If V is given, then find(V,1) is perfect, and it will not be improved upon. And you say that V is given! So what is your question?
MAYBE, MAYBE, you mean that V is given, but unknown in a sense, that you cannot know the value of V unless you interrogate that element? That seems consistent with your comments.
Now, your goal is to somehow magically use optimization to find that first element. The problem is, suppose you sample element 100, this V(100), and you find that is is a 1. Is that the FIRST element of V that is 1? You can never know. And nothing will tell you that is the case, unless you evaluate EVERY element that precedes that element. This tells me if your goal is to find the FIRST element of this unknown binary vector that is a 1, then you need to simply start at the beginning, and test every element. Optimization cannot help you, since the unit elements of V are presumed to be random as far as we know. Optimization is meaningless in this case. I'm sorry, but it is.
Ok, is there anything we can do? Suppose V has length 1024. Evaluate V at some intermediate position., say V(512). If V(512) is a 1, then you know that the first element that is a 1 must lie no further than element 512. But if V(512)==0, then you know nothing. Again, this suggests your best scheme is to simply start at the very begnning.
It feels like it might help if we know the probability that any element of V is a 1. Now you could use probability theory to know the distribution of the first non-zero element, if any element has probability P of being a 1. That could tell you where to start looking, but does it really help? NO. In the end, you will always need to test the first element of V, and if is zero, then test the second element, and so on, because knowing if a later element of V is 1 does not tell you anything.
I'm sorry, but if you absolutely need to know the first element of V, then you need to start your search at element 1, and then proceed. This is the only way to know if you have found the first 1.
  4 Comments
John D'Errico
John D'Errico on 27 Aug 2023
Edited: John D'Errico on 27 Aug 2023
No. It does not at all justify your claim. Not even remotely.
An optimization will evaluate the vector V at multiple points. That you don't see that happening is irrelevant. You are not a little child, where if you don't see something happen in front of your nose, it does not happen. Function evals called by GA are no less costly than function calls in a loop, but in fact more costly, since you now have the overhead of GA.
A simple loop would have been more efficient, and the ONLY possible way to know that you have found the first true element in V.
An optimizer like GA will not know that no earlier element is 1, UNLESS IT TESTS ALL PREVIOUS VALUES. GA stops sometimes early, because it got lucky. And as you have seen, often GA will get it wrong. If you are looking for a probabilistic scheme, that MIGHT be different. But even then, I see no better scheme than to start at the beginning.
Mohammad Shojaei Arani
Mohammad Shojaei Arani on 27 Aug 2023
Yes. If the optimization does not work well but gives me an answer which is not so far from the first 1 then still I would be happy.
Anyway, thanks for your kind helps.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!