Custom List / Arbitrary Sequence Sort
Version 1.0.3 (69.9 KB) by Stephen23
Sort a text array into the order of custom lists / arbitrary text sequences. Alphabetic sorting for many non-English languages.
Summary
Sort the elements of a text array (string array, cell array of character vectors, ...) into the order of custom/arbitrary text sequences, e.g. non-English alphabets. Inspired by MS Excel's "custom list" sorting feature, but extended with case-insensitive partial text matching using regular expressions, for any number of sequences.
For example:
>> A = ["LargeBurger", "MediumCoffee", "SmallCoffee", "MediumBurger"];
>> sort(A) % for comparison
ans = ["LargeBurger" "MediumBurger" "MediumCoffee" "SmallCoffee"]
>> arbsort(A, ["small","medium","large"])
ans = ["SmallCoffee" "MediumBurger" "MediumCoffee" "LargeBurger"]
The sorting itself can also be controlled:
- ascending/descending sort direction
- character case sensitivity/insensitivity
- diacritic sensitivity/insensitivity
- literal/regular-expression matching
- whole/partial text matching
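For readers without ARBSORT installed, the burger/coffee example above can be approximated with built-in functions. This is only a minimal sketch, not ARBSORT's implementation: it assumes R2016b or later (for CONTAINS on string arrays), and the names seq and rnk, the alphabetical tie-break, and the choice to put unmatched elements last are all illustrative assumptions.

```matlab
A   = ["LargeBurger", "MediumCoffee", "SmallCoffee", "MediumBurger"];
seq = ["small", "medium", "large"];   % the custom order (hypothetical name)

% Start from ordinary character-code order, so that elements matching the
% same sequence entry end up in alphabetical order (assumed tie-break).
S = sort(A);

% Rank each element by the sequence entry it contains.
rnk = inf(size(S));                   % unmatched elements keep Inf and sort last
for k = 1:numel(seq)
    % case-insensitive partial match; a later match overwrites an earlier one
    rnk(contains(S, seq(k), 'IgnoreCase', true)) = k;
end

[~, idx] = sort(rnk);                 % SORT keeps the input order of equal values
B = S(idx);                           % same order as the ARBSORT output shown above
```

Changing 'IgnoreCase' to false, or replacing CONTAINS with a whole-text comparison, would correspond loosely to the case-sensitivity and whole/partial matching options listed above.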
Alphabetic Sorting
ARBSORT is particularly useful for sorting text in languages for which ASCII/Unicode character-code order does not give the correct alphabetic order. ARBSORT does not implement the countless language-specific collation rules, but it does sort text into the order specified by the provided alphabet and equivalent character arrays.
>> Ae = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
>> alfabeto = num2cell(['A':'N','Ñ','O':'Z']); % Spanish alphabet
>> arbsort(Ae, alfabeto)
ans = {'de', 'la', 'ni', 'ña', 'ño', 'os', 'va', 'yo'}
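The idea of sorting by position in a supplied alphabet, rather than by character code, can be sketched with built-in functions. This is a simplified illustration under stated assumptions, not how ARBSORT works internally: the input is assumed to be all lowercase, the alphabet is therefore given in lowercase, and the names keys and pos are hypothetical.

```matlab
Ae       = {'yo', 'os', 'la', 'ño', 'va', 'ni', 'de', 'ña'};
alfabeto = ['a':'n', 'ñ', 'o':'z'];   % lowercase Spanish alphabet (assumption: input is lowercase)

% Build one numeric key row per word: each character is replaced by its
% position in the alphabet. Shorter words are padded with zeros, and
% characters missing from the alphabet also map to zero.
maxLen = max(cellfun(@numel, Ae));
keys   = zeros(numel(Ae), maxLen);
for k = 1:numel(Ae)
    [~, pos] = ismember(Ae{k}, alfabeto);
    keys(k, 1:numel(pos)) = pos;
end

[~, idx] = sortrows(keys);            % compare position by position, left to right
Be = Ae(idx);                         % same order as the ARBSORT output shown above
```

Because 'ñ' sits between 'n' and 'o' in alfabeto, 'ña' and 'ño' land after 'ni' and before 'os', which plain SORT would not produce.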
Replacement Substrings
The sorting rules of some languages require certain characters to be replaced with (or considered equivalent to) other characters. For example, in German the eszett character "ß" is sorted as if it were written "ss", and in some circumstances a vowel with an umlaut is sorted as the same vowel without the umlaut, followed by "e":
>> Ag = ["Goethe", "Goldmann", "Gurke", "Göbel", "Göthe", "Götz"]; % character code order
>> B1 = arbsort(Ag, ["ß";"ss"]) % DIN 5007 Variante 1
B1 = [ "Göbel", "Goethe", "Goldmann", "Göthe", "Götz", "Gurke"]
>> B2 = arbsort(Ag, ["ä","ö","ü","ß"; "ae","oe","ue","ss"]) % DIN 5007 Variante 2
B2 = [ "Göbel", "Goethe", "Göthe", "Götz", "Goldmann", "Gurke"]
>> Bg = arbsort(Ag, ["ß";"ss"], num2cell(['aä','b':'o','ö','p':'u','ü','v':'z'])) % Österreichische Sortierung
Bg = [ "Goethe", "Goldmann", "Göbel", "Göthe", "Götz", "Gurke"]
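The replacement idea behind the DIN 5007 Variante 2 example can be sketched with built-in functions: build temporary sort keys with the substitutions applied, sort the keys, and reorder the original text. This is a hedged illustration only, assuming R2016b or later (for REPLACE) and case-sensitive matching of the lowercase characters; the names fromTxt, toTxt, and keys are hypothetical, and it does not cover Variante 1 or the Austrian ordering.

```matlab
Ag      = ["Goethe", "Goldmann", "Gurke", "Göbel", "Göthe", "Götz"];
fromTxt = ["ä",  "ö",  "ü",  "ß"];    % characters to be replaced (hypothetical name)
toTxt   = ["ae", "oe", "ue", "ss"];   % their sort equivalents (hypothetical name)

% The replacements are applied to temporary keys only, so the original
% text is returned unchanged.
keys = replace(Ag, fromTxt, toTxt);

[~, idx] = sort(keys);                % equal keys keep their input order
B2 = Ag(idx);                         % same order as B2 in the example above
```

Here "Goethe" and "Göthe" produce the identical key "Goethe", so their relative order is decided by their order in the input.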
Examples
>> Ab = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
>> [Bb,Xb] = arbsort(Ab, {'XS','S','M','L','XL'})
Bb = {'XS', 'XS', 'S', 'S', 'M', 'M', 'L', 'L', 'XL', 'XL'}
Xb = [2,9,3,6,4,7,1,10,5,8]
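In this example the second output Xb is consistent with a sort index, i.e. Bb equals Ab(Xb). For whole-text, case-sensitive matching against a single list, the same pair of outputs can be reproduced with ISMEMBER and SORT. This is a minimal sketch under those assumptions, not ARBSORT's own code; lst and rnk are hypothetical names.

```matlab
Ab  = {'L', 'XS', 'S', 'M', 'XL', 'S', 'M', 'XL', 'XS', 'L'};
lst = {'XS', 'S', 'M', 'L', 'XL'};    % the custom list (hypothetical name)

[~, rnk] = ismember(Ab, lst);         % position of each element in the list (0 if absent)
[~, Xb]  = sort(rnk);                 % equal ranks keep their input order
Bb = Ab(Xb);                          % sorted text, so that Bb is Ab(Xb)
```

Elements that do not appear in lst get rank 0 and would therefore sort first in this sketch.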
>> Ac = ["medium_test", "high_train", "low_train", "high_test", "medium_train", "low_test"];
>> arbsort(Ac, ["train","test"], ["low","medium","high"])
ans = ["low_train", "low_test", "medium_train", "medium_test", "high_train", "high_test"]
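In the output above, the low/medium/high sequence acts as the primary key and train/test breaks ties. That priority is read off the example output rather than taken from ARBSORT's documentation. Under that assumption, a two-sequence sort can be sketched with CONTAINS (R2016b or later) and SORTROWS; seq1, seq2, r1, and r2 are hypothetical names.

```matlab
Ac   = ["medium_test", "high_train", "low_train", "high_test", "medium_train", "low_test"];
seq1 = ["train", "test"];             % tie-breaker in the output shown above
seq2 = ["low", "medium", "high"];     % primary key in the output shown above

% One rank column per sequence: the index of the sequence entry that each
% element contains (0 if none matches).
r1 = zeros(numel(Ac), 1);
r2 = zeros(numel(Ac), 1);
for k = 1:numel(seq1)
    r1(contains(Ac, seq1(k))) = k;
end
for k = 1:numel(seq2)
    r2(contains(Ac, seq2(k))) = k;
end

[~, idx] = sortrows([r2, r1]);        % sort by r2 first, then by r1
Bc = Ac(idx);                         % same order as the ARBSORT output shown above
```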
>> Ad = ["test_three", "test_one", "test_ninetynine", "test_two"];
>> arbsort(Ad, @words2num) % download WORDS2NUM from FEX 52925.
ans = ["test_one", "test_two", "test_three", "test_ninetynine"]
Cite As
Stephen23 (2023). Custom List / Arbitrary Sequence Sort (https://www.mathworks.com/matlabcentral/fileexchange/132263-custom-list-arbitrary-sequence-sort), MATLAB Central File Exchange. Retrieved .
MATLAB Release Compatibility
Created with R2010b
Compatible with R2009b and later releases
Platform Compatibility
Windows macOS Linux
Acknowledgements
Inspired by: Natural-Order Row Sort, Natural-Order Filename Sort, Customizable Natural-Order Sort, Interactive Regular Expression Tool
Inspired: Customizable Natural-Order Sort, Natural-Order Row Sort, Natural-Order Filename Sort, Interactive Regular Expression Tool