How to parse an Nx1 string array without looping through N

Question


I have an Nx1 string array, and I can't figure out how to extract 6 chunks of text out of it and into an Nx6 cell array. The text elements are numbers, but it's simplest to not treat them as numbers at this juncture.

Here is a toy version of the string array, together with code that correctly parses out the necessary elements of CCYYMMDD and hhmm from the first element of the string array:

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
charLaunch = textscan(stringFile(1),'%*18c %2c %2c %2c %2c %*c %2c %2c');

charLaunch =

1×6 cell array

{'20'} {'02'} {'04'} {'28'} {'18'} {'48'}

However, both

charLaunchAll = textscan(stringFile,'%*18c %2c %2c %2c %2c %*c %2c %2c');

and

charLaunchAll = cell(5,6);
charLaunchAll = textscan(stringFile(:),'%*18c %2c %2c %2c %2c %*c %2c %2c');

generate the same error message:

Error using textscan

First input must be a valid file-id or non-empty character vector.

Is there a way to extract these pieces of text from every element of the array without writing a loop?


Answer 1

Stephen23 on 23 Apr 2020

Edited: Stephen23 on 23 Apr 2020


Using one simple regular expression:

C = {...
    'nsasondewnpnC1.b1.20020428.184800.cdf'; ...
    'nsasondewnpnC1.b1.20020428.220500.cdf'; ...
    'nsasondewnpnC1.b1.20020428.235900.cdf'; ...
    'nsasondewnpnC1.b1.20020429.013100.cdf'; ...
    'nsasondewnpnC1.b1.20020429.182500.cdf'};
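out = regexp(C,'\d{2}','match'); % every pair of consecutive digits, one cell per row of C
out = vertcat(out{:})            % N-by-7 cell array: CC YY MM DD hh mm ss (drop the last column if seconds are not wanted)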

I used a cell array of character vectors, but it will also work for a string array.
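If exactly the six chunks from the question are wanted (no seconds), one possible variation is to use capture groups rather than 'match'. This is only a sketch: the expression and the variable names expr, tok, chunks and chunksCell are my own choices, and it assumes every file name contains the ".CCYYMMDD.hhmmss." pattern shown in the toy data.

```matlab
% Toy data from the question (string array).
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];

% Six capture groups: CC YY MM DD, then hh mm.
% The trailing seconds (\d{2}) are matched but not captured.
expr = '\.(\d{2})(\d{2})(\d{2})(\d{2})\.(\d{2})(\d{2})\d{2}\.';

% One set of tokens per element of the input, no explicit loop over N.
tok = regexp(stringFile, expr, 'tokens', 'once');

chunks     = vertcat(tok{:});   % N-by-6 array of text chunks
chunksCell = cellstr(chunks);   % N-by-6 cell array of character vectors
```

The final cellstr call is only there because the question asks for a cell array; it can be dropped if a string array is acceptable.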

5 Comments

Stephen23 on 23 Apr 2020

Edited: Stephen23 on 23 Apr 2020


"... why textscan will work with a single element of a string array, but not with an entire array of strings?"

Because low-level string parsing functions parse one string element or one character vector, and textscan is ultimately just a fancy wrapper for low-level operations.

You might think of a string array as one thing, but really it is a container array of multiple character vectors, i.e. it contains lots of individual, separate character vectors, which are stored separately. Not so different from a cell array, really (search this forum for more accurate and detailed discussions on how string arrays are actually implemented).

Parsing a string array introduces ambiguities: for example, what is the end-of-line character? textscan relies on identifying that character, but parsing a string array would (possibly, see below) require having no EOL character at all, and instead treating each string element as being de facto delimited by some character (in which case you can trivially do this yourself, as I did in my last comment). You might think it is obvious that each string element should be treated as one line, but computers do not understand "obvious"; they understand instructions in the form of code. Consider how this 2x1 string array should be parsed:

str = ["1"; "2" + newline + "3"] % the second element contains a newline character

Which of these should textscan(str,'%f') return?

[1;2;3]: all values; both the newline AND the boundary between string elements are treated as de facto EOL.
[1;2]: the newline causes parsing to finish.
[1]: the second element does not parse.
{[1];[2;3]}: the output is not of the class requested, and the cell contents can have an arbitrary size.
error: the second element throws an error.

If you say the first is the correct behavior, what about the next user who expects one of the other behaviors?
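To make the ambiguity concrete, here is a rough sketch of how two of those behaviors could be written out explicitly by the user, so that the code states which one is meant. It does not use textscan at all; join, split and str2double are swapped in purely for illustration, and the names allValues and perElement are made up for this example.

```matlab
% 2x1 string array whose second element contains a real newline character.
str = ["1"; "2" + newline + "3"];

% Choice 1: treat element boundaries and newlines alike,
% giving one flat numeric column.
allValues = str2double(split(join(str, newline), newline));

% Choice 2: keep one result per string element,
% giving a cell array whose contents vary in size.
perElement = arrayfun(@(s) str2double(split(s, newline)), str, ...
                      'UniformOutput', false);
```

Neither choice is more "correct" than the other; the point is only that the decision has to be written down somewhere.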

Note that text files also consist of one long character vector (people think of them as having "lines", but really each file is one long character vector interspersed with newline characters), and low-level file parsing functions parse just that one character vector.
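A quick way to see this for yourself is to write a small file and read it back in one piece. The following is just a sketch; the scratch file name and its three-line content are invented for the demonstration.

```matlab
% Write a tiny three-line scratch file (hypothetical name and content).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'alpha\nbeta\ngamma\n');
fclose(fid);

% Read the whole file back as a single character vector.
txt = fileread(fname);
delete(fname);

isOneRow    = isrow(txt);           % true: the "lines" are all in one row vector
numNewlines = nnz(txt == newline);  % the line breaks are ordinary characters in that vector
```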

Leslie on 23 Apr 2020

Edited: Leslie on 23 Apr 2020

OK, thanks. I'd noticed that what I was trying to do "all at once" would have worked if I'd been reading a file and could have searched for the newline character, but didn't (or couldn't) carry that all the way forward to understanding how the string array was being stored. It just never occurred to me to do something like "ignore through the 'cdf' at the end of the string", which is an analog to the documentation's example of "ignore the rest of the line".


Answer 2

Mohammad Sami on 23 Apr 2020


Since the pattern in your string seems to be the same, you can use the format specification to convert the string directly to datetime as follows.

stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];
fmt = "'nsasondewnpnC1.b1.'yyyyMMdd'.'HHmmss'.cdf'";
% the constant portion of your string is enclosed in 'single quotes';
d = datetime(stringFile,'InputFormat',fmt);
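If the two-character text chunks themselves are wanted rather than datetime values, the same fixed-layout assumption also allows extraction by character position. This is only a sketch of that alternative, not part of the datetime approach above: the positions in startPos are counted from the toy file names and are an assumption about your real data, and the loop runs over the 6 fields, not over the N file names.

```matlab
stringFile = ["nsasondewnpnC1.b1.20020428.184800.cdf"; ...
              "nsasondewnpnC1.b1.20020428.220500.cdf"; ...
              "nsasondewnpnC1.b1.20020428.235900.cdf"; ...
              "nsasondewnpnC1.b1.20020429.013100.cdf"; ...
              "nsasondewnpnC1.b1.20020429.182500.cdf"];

% Start positions of CC YY MM DD hh mm (assumes an 18-character prefix).
startPos = [19 21 23 25 28 30];

% Preallocate an N-by-6 string array, then fill one column per field.
chunks = strings(numel(stringFile), numel(startPos));
for k = 1:numel(startPos)
    % extractBetween works on the whole string array at once.
    chunks(:,k) = extractBetween(stringFile, startPos(k), startPos(k) + 1);
end
```

Use cellstr(chunks) afterwards if a cell array of character vectors is needed.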

1 Comment

Leslie on 23 Apr 2020

Thanks, interesting usage that I didn't know about.

But I don't really want it in datetime format; I'd like the 2-digit text chunks. If I've got to clutter up my code with sending it to datetime & back, I might as well write the stupid loop. (I'm not meaning to be cranky at you; I'm just cranky that I spent a few hours today poring over documentation and Answers to do something that it seems I ought to be able to do!)
