How to parse an Nx1 string array without looping through N
I have an Nx1 string array, and I can't figure out how to extract 6 chunks of text out of it and into an Nx6 cell array. The text elements are numbers, but it's simplest to not treat them as numbers at this juncture.
Here is a toy version of the string array, together with code that correctly parses out the necessary elements of CCYYMMDD and hhmm from the first element of the string array:
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
"nsasondewnpnC1.b1.20020428.220500.cdf"; ...
"nsasondewnpnC1.b1.20020428.235900.cdf"; ...
"nsasondewnpnC1.b1.20020429.013100.cdf"; ...
"nsasondewnpnC1.b1.20020429.182500.cdf"];
charLaunch = textscan(stringFile(1),'%*18c %2c %2c %2c %2c %*c %2c %2c');
charLaunch =
1×6 cell array
{'20'} {'02'} {'04'} {'28'} {'18'} {'48'}
However, both
charLaunchAll = textscan(stringFile,'%*18c %2c %2c %2c %2c %*c %2c %2c');
and
charLaunchAll = cell(5,6);
charLaunchAll = textscan(stringFile(:),'%*18c %2c %2c %2c %2c %*c %2c %2c');
generate the same error message:
Error using textscan
First input must be a valid file-id or non-empty character vector.
Is there a way to extract these pieces of text from every element of the array without writing a loop?
Accepted Answer
Stephen23
on 23 Apr 2020
Edited: Stephen23
on 23 Apr 2020
C = {...
'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
'nsasondewnpnC1.b1.20020429.182500.cdf'};
out = regexp(C,'\d{2}','match');
out = vertcat(out{:})
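I used a cell array of character vectors, but the same code also works for a string array. Note that the pattern matches every pair of adjacent digits, so each row has seven elements (the seventh is the seconds field); use out(:,1:6) to keep only the six requested pieces.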
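If exactly the six fields from the question are wanted (CC, YY, MM, DD, hh, mm), one option is to use capture groups tied to the fixed filename layout. This is only a sketch: it assumes every name ends in CCYYMMDD.hhmmss.cdf, and the variable names tok and parts are illustrative.

```matlab
C = {...
    'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
    'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
    'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
    'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
    'nsasondewnpnC1.b1.20020429.182500.cdf'};
% Six capture groups: CC YY MM DD, a literal dot, then hh mm.
% The trailing \d{2} consumes the seconds without capturing them.
pat = '(\d{2})(\d{2})(\d{2})(\d{2})\.(\d{2})(\d{2})\d{2}\.cdf$';
tok = regexp(C, pat, 'tokens', 'once');  % one 1x6 cell per filename
parts = vertcat(tok{:});                 % Nx6 cell array of char vectors
```

Anchoring the pattern at the end of the name keeps the single digits in "C1" and "b1" from interfering, and no column has to be trimmed afterwards.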
5 Comments
Stephen23
on 23 Apr 2020
Edited: Stephen23
on 23 Apr 2020
"... why textscan will work with a single element of a string array, but not with an entire array of strings?"
Because low-level string parsing functions parse one string element or one character vector, and textscan is ultimately just a fancy wrapper for low-level operations.
You might think of a string array as one thing, but really it is a container array of multiple character vectors, i.e. it contains lots of individual, separate character vectors, which are stored separately. Not so different from a cell array, really (search this forum for more accurate and detailed discussions on how string arrays are actually implemented).
Parsing a string array introduces ambiguities: e.g. what is the end-of-line character? textscan relies on identifying that character... but parsing a string array would (possibly, see below) require having no EOL character at all, and instead treating each string element as being de-facto delimited by some character (in which case you can trivially do this yourself, as I did in my last comment). You might think it is obvious that each string element should be treated as one line, but computers do not understand "obvious", they understand instructions in the form of code. Consider how this 2x1 string array should be parsed:
str = ["1"; "2"+newline+"3"] % the second element contains a newline character
Which of these should textscan(str,'%f') return?
- [1;2;3] all values, identify both newline AND different string elements as having de-facto EOL.
- [1;2] newline causes parsing to finish.
- [1] second element does not parse.
- {[1];[2;3]} the output is not of the class requested, and the cell contents can have an arbitrary size.
- error second element throws an error.
If you say the first is the correct behavior, what about the next user who expects one of the other behaviors?
Note also that a text file is really just one long character vector: people think of it as having "lines", but it is a single sequence of characters interspersed with newline characters, and the low-level file parsing functions parse just that one character vector.
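To make the point concrete, here is a minimal sketch of the first interpretation listed above ([1;2;3]), where the choice is made explicitly by the caller rather than by textscan. It assumes the elements should be joined with newline characters; the names txt, out and vals are illustrative.

```matlab
% Build the 2x1 string array whose second element contains a newline.
str = ["1"; "2" + newline + "3"];
% Decide explicitly that element boundaries count as line breaks:
% join the elements into one character vector separated by newlines.
txt = char(join(str, newline));
% textscan now receives a single character vector, as it requires.
out = textscan(txt, '%f');
vals = out{1};   % column vector of the parsed numbers: [1;2;3]
```

Any of the other interpretations would need a different explicit step (for example, parsing each element separately), which is why textscan does not guess.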
More Answers (1)
Mohammad Sami
on 23 Apr 2020
Since the pattern in your string seems to be the same, you can use the format specification to convert the string directly to datetime as follows.
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
"nsasondewnpnC1.b1.20020428.220500.cdf"; ...
"nsasondewnpnC1.b1.20020428.235900.cdf"; ...
"nsasondewnpnC1.b1.20020429.013100.cdf"; ...
"nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
% the constant portion of your string is enclosed in 'single quotes';
d = datetime(stringFile,'InputFormat',fmt);
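Once the names are datetime values, the individual components can be read off without a loop. This is a sketch, assuming numeric components are acceptable (the question preferred text); the year comes back as one number rather than separate CC and YY pieces, and the name launch is illustrative.

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
    "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
    "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
    "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
    "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
d = datetime(stringFile, 'InputFormat', fmt);
[yr, mo, dy] = ymd(d);       % year, month, day as numeric columns
[hr, mi, ~] = hms(d);        % hour and minute; seconds are ignored
launch = [yr, mo, dy, hr, mi];   % Nx5 numeric matrix, one row per file
```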