Most efficient way to search in text arrays
Dear All,
I have a large array of textual information (model element IDs). In this array, I frequently have to find the index of particular IDs. What is the computationally most efficient way to do this? Should I store the IDs in a character array, a cell array of character vectors, or a string array? And what is the most efficient way to index when working with text?
Thanks for your help! Uwe
Accepted Answer
Alain Kuchta
on 21 Apr 2017
Edited: Alain Kuchta
on 21 Apr 2017
Here are two possible approaches to accomplish this:
1) Use strcmp and find with a string array
This option is O(n) for each lookup: in the worst case, every string in ids is compared against query. You can also use a cell array of character vectors for ids; in my test, a string array was slightly faster.
>> ids = ["M1","M2", "M3"];
>> query = "M2";
>> index = find(strcmp(ids, query))
index =
2
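If several IDs need to be located at once, the loop over strcmp calls can be replaced by a single call to ismember. This is a different technique from the one above, shown only as a minimal sketch; the queries array and the "M9" entry are hypothetical illustration values.

```matlab
% Sketch: look up several IDs in one call with ismember.
ids = ["M1", "M2", "M3"];
queries = ["M3", "M1", "M9"];    % "M9" is a hypothetical ID that is not in ids

% found(k) is true if queries(k) occurs in ids;
% index(k) is its position in ids, or 0 if it does not occur.
[found, index] = ismember(queries, ids);
```

Here index is a numeric vector with one entry per query, so missing IDs can be detected by checking found (or index == 0) rather than by handling an empty find result.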
2) Use containers.Map with character vectors as keys and indices as values
This option is O(n) to set up, but O(1) for each lookup. In other words, looking up an ID takes roughly the same amount of time regardless of how many IDs are in the map.
>> ids = {'M1', 'M2', 'M3'};
>> indices = 1:length(ids);
>> idMap = containers.Map(ids, indices);
>> query = 'M2';
>> index = idMap(query)
index =
2
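One practical detail when using the map: indexing it with a key that does not exist raises an error, so it can help to test with isKey first. The sketch below assumes a convention of returning 0 for a missing ID, and the query 'M7' is a hypothetical value chosen to be absent; values can also fetch several keys in one call.

```matlab
% Sketch: guard against missing keys and look up several keys at once.
ids = {'M1', 'M2', 'M3'};
idMap = containers.Map(ids, 1:numel(ids));

query = 'M7';                    % hypothetical ID that is not in the map
if isKey(idMap, query)
    index = idMap(query);
else
    index = 0;                   % assumed convention: 0 means "not found"
end

% values() with a cell array of keys returns a cell array of stored values.
% It errors if any requested key is missing, so all keys here exist.
found = values(idMap, {'M3', 'M1'});
indices = cell2mat(found);
```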
Here is a performance comparison. At each size increment, I measured the average time to run 500 random queries with each approach, using the same set of queries for both. In my case, find with strcmp is faster for fewer than ~1000 elements, but as the number of elements grows, containers.Map is the clear winner.
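A comparison along these lines could be reproduced roughly as follows. This is a sketch, not the script used for the measurement above: the array size n, the generated "M1", "M2", ... IDs, and the use of tic/toc are all assumptions, and absolute timings will vary with machine and MATLAB release.

```matlab
% Sketch: time 500 random lookups with each approach for one array size.
n = 1e5;                                  % assumed number of IDs
ids = "M" + (1:n);                        % string array "M1", "M2", ..., "M100000"
queries = ids(randi(n, 1, 500));          % the same 500 random queries for both approaches

% Build the map once; keys are character vectors, values are indices.
idMap = containers.Map(cellstr(ids), 1:n);

% Approach 1: linear search with strcmp and find.
tic
for k = 1:numel(queries)
    idx1 = find(strcmp(ids, queries(k)));
end
tFind = toc / numel(queries);

% Approach 2: lookup in the containers.Map.
tic
for k = 1:numel(queries)
    idx2 = idMap(char(queries(k)));
end
tMap = toc / numel(queries);

fprintf('find/strcmp: %.3g s per query, containers.Map: %.3g s per query\n', tFind, tMap);
```

Repeating the run for a range of n values gives the kind of curve described above; note that the map construction time is deliberately excluded from the per-query figure.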
1 Comment
Walter Roberson
on 21 Apr 2017
The O(1) reference sounds as if containers.Map is using hashing, which is a possibility, but not one I see documented?
True O(1) would tend to imply that it is using a perfect hash, as regular hashes that can have collisions would have an O(n) or O(ln(n)) or O(n*ln(n)) term for the worst case as the table fills up.