Find continuous file name jump

1 view (last 30 days)
Tsuwei Tan
Tsuwei Tan on 9 May 2018
Commented: Tsuwei Tan on 9 May 2018
I have dozen thousand files with names like the following:
SHARK_225054651_41_0547_r001
SHARK_225054651_41_0548_r005
SHARK_225054651_41_0548_r009
...
SHARK_225054651_41_0619_r121
SHARK_225054651_41_0620_r125
...
SHARK_225062101_41_0621_r001
SHARK_225062101_41_0621_r005
SHARK_225062101_41_0622_r009
...
SHARK_225062101_41_0653_r121
SHARK_225062101_41_0654_r125
each file's name end up with .....r%%%, the three %%% digits are 001, 005,....up to 121, 125. Total thirty-two with increment equals four and the same SHARK_%%%%%%%%%_%%_%%%%_r%% name before _r%%%. Then another file starts over with the same r%%% iteration.
However, the actual file has a "jump" for instance r005 is missing between r001 and r009.
Is there a way to read out the file name with some logic loop to pick up the missing one?

Accepted Answer

Stephen23
Stephen23 on 9 May 2018
Edited: Stephen23 on 9 May 2018
C = { % fake data:
'SHARK_225054651_41_0548_r001'
'SHARK_225054651_41_0548_r005'
'SHARK_225054651_41_0548_r009'
'SHARK_225054651_41_0548_r021'
'SHARK_225054651_41_0548_r025'
'SHARK_225062101_41_0621_r005'
'SHARK_225062101_41_0621_r009'
'SHARK_225062101_41_0621_r013'
'SHARK_225062101_41_0621_r025'
};
T = regexp(C,'^(\w+)r(\d{3})$','tokens','once'); % split
A = cellfun(@(c)c(1),T); % 1st token
B = cellfun(@(c)c(2),T); % 2nd token
[U,~,X] = unique(A(:)); % 1st token unique only
V = str2double(B(:)); % 2nd token -> numeric values
G = 1:4:25; % the required values (change this to 1:4:125).
C = accumarray(X,V,[],@(v){setdiff(G,v)});
The outputs of interest to you are:
  • U the unique groups, e.g. 'SHARK_225054651_41_0547_' and 'SHARK_225062101_41_0621_' in my example fake data.
  • C the missing rXXX values.
These outputs are shown here:
>> U{1}
ans = SHARK_225054651_41_0548_
>> C{1}
ans =
13 17
>> U{2}
ans = SHARK_225062101_41_0621_
>> C{2}
ans =
1 17 21
You could easily loop over these, or display them in the command windows or your GUI, etc:
>> Z = [U,cellfun(@num2str,C,'uni',0)]';
>> for k = 1:numel(U), fprintf('%s: %s\n',U{k},sprintf('%3d, ',C{k})); end
SHARK_225054651_41_0548_: 13, 17,
SHARK_225062101_41_0621_: 1, 17, 21,

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!