Most efficient way to search in text arrays

Dear All,
I have a large array of textual information (model element IDs). In this array, I frequently have to find the index of particular IDs. What is the computationally most efficient way to do this? Store the IDs in a character array, cell array of characters or string array? What is the most efficient way for indexing when working with text?
Thanks for your help! Uwe

Accepted Answer

Alain Kuchta on 21 Apr 2017
Edited: Alain Kuchta on 21 Apr 2017
Here are two possible approaches to accomplish this:
1) Use strcmp and find with a string array
This option is O(n) for each lookup: in the worst case, every string in ids is compared against query before the match is found. You can also use a cell array of character vectors for ids; in my test, a string array was slightly faster.
>> ids = ["M1","M2", "M3"];
>> query = "M2";
>> index = find(strcmp(ids, query))
index =
2
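As a rough sketch of the linear-scan approach above, the variants below should all return the same index. The variable names idxA through idxD and idsCell are made up for illustration, and the `==` form assumes ids is a string array rather than a cell array.

```matlab
% Sketch: linear-scan lookups on a string array and on a cell array.
ids = ["M1", "M2", "M3"];
query = "M2";

% strcmp returns a logical mask; find converts the mask to indices.
idxA = find(strcmp(ids, query));

% For string arrays, == performs the same element-wise comparison.
idxB = find(ids == query);

% If the IDs are unique, request only the first match.
% The comparison itself still touches every element.
idxC = find(ids == query, 1);

% The same strcmp call works on a cell array of character vectors.
idsCell = {'M1', 'M2', 'M3'};
idxD = find(strcmp(idsCell, 'M2'));
```

Note that the `== 1` comparison is not needed, since strcmp already returns a logical array that find accepts directly.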
2) Use containers.Map with char arrays as keys and indices as values
This option is O(n) to set up but O(1) for each lookup, meaning that regardless of how many ids are in your map, looking up each one takes about the same amount of time.
>> ids = {'M1', 'M2', 'M3'};
>> indices = 1:length(ids);
>> idMap = containers.Map(ids, indices);
>> query = 'M2';
>> index = idMap(query)
index =
2
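One thing to watch with the map approach is that indexing with a key that is not present raises an error. A minimal sketch of guarding against that with isKey, and of looking up several IDs in one call with values, might look like this. The empty result for a missing ID is only an assumed convention, and the names queries and batch are illustrative.

```matlab
% Sketch: safe single lookup and batch lookup with containers.Map.
ids = {'M1', 'M2', 'M3'};
idMap = containers.Map(ids, 1:numel(ids));

% Guard against missing keys; idMap('M9') on its own would throw an error.
query = 'M9';
if isKey(idMap, query)
    index = idMap(query);
else
    index = [];   % assumed convention for "not found"
end

% Look up several IDs at once; values returns a cell array
% in the same order as the requested keys.
queries = {'M3', 'M1'};
batch = cell2mat(values(idMap, queries));
```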
Here is a performance comparison. At each size increment, the average time to compute 500 random queries was measured for each approach, and both approaches used the same set of queries. In my case, find with strcmp is faster for fewer than ~1000 elements, but as the number of elements grows, containers.Map is the clear winner.
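A comparison along these lines could be reproduced with a sketch like the one below. This is not the script used for the measurements described above; the array size n, the fixed random seed, and the tic/toc timing loop are all assumptions, and the crossover point will depend on your MATLAB release and machine.

```matlab
% Sketch: time 500 random lookups for each approach at one array size.
n = 1e4;                    % number of IDs (assumed; vary this to find the crossover)
numQueries = 500;           % query count taken from the description above
ids = "M" + (1:n);          % string array "M1" ... "M10000"
idsCell = cellstr(ids);     % same IDs as a cell array of character vectors
idMap = containers.Map(idsCell, 1:n);

rng(0);                     % fixed seed so both approaches see the same queries
queryIdx = randi(n, 1, numQueries);

% Approach 1: linear scan with strcmp and find.
tic;
for k = 1:numQueries
    idx = find(strcmp(ids, ids(queryIdx(k))));
end
tScan = toc / numQueries;

% Approach 2: lookup in containers.Map.
tic;
for k = 1:numQueries
    idx = idMap(idsCell{queryIdx(k)});
end
tMap = toc / numQueries;

fprintf('strcmp + find:  %.3g s per query\n', tScan);
fprintf('containers.Map: %.3g s per query\n', tMap);
```

Repeating the run for several values of n shows how each approach scales with the number of IDs.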
  1 Comment
Walter Roberson on 21 Apr 2017
The O(1) reference sounds as if containers.Map is using hashing -- which is a possibility, but not one I see documented.
True O(1) would tend to imply that it is using a perfect hash, as regular hashes that can have collisions would have an O(n) or O(ln(n)) or O(n*ln(n)) term for the worst case as the table fills up.


More Answers (1)

Uwe Ehret on 22 Apr 2017
Dear Alain,
Thanks for your very helpful reply! The containers.Map option was new to me, but offers many very elegant and time-saving improvements to the program I am working on.
Uwe
