Extracting numbers after a particular string from a cell array

data={'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03','abvc','400 55 a','AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};
For example, in the cell array above, how can I extract all of the YYYY MM DD HH MM SS fields (2021 03 28 00 05 30.00 and 2021 3 28 0 17 30.0)?
The YYYY MM DD HH MM SS values always come after AS [A-Z][0-9][0-9] (for example, AS C37 and AS G17). Can code be written to extract these values by following this rule? The actual data cell array is 1x400000, so speed is also an important factor.
  6 Comments
dpb on 2 Jul 2021
There may well be (probably is; no, undoubtedly is) code available to read these files -- there might even be a MATLAB routine already. Have you looked for what routines are available?
sermet OGUTCU on 2 Jul 2021
I just want to extract all dates YYYY MM DD HH MM SS (such as 2021 03 28 00 05 30.000000) from this cell array.

Accepted Answer

dpb on 2 Jul 2021
Edited: dpb on 2 Jul 2021
Oh. I see I didn't look far enough down the file -- the header stuff ends at record 170; the other data starts at record 171.
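% Skip the 170 header records; the data records have no variable-name row.
tCOD=readtable('COD0MGXFIN_20210870000_01D_30S_CLK.clk','FileType','text', ...
               'HeaderLines',170,'ReadVariableNames',false);
% Columns 3:8 hold the epoch fields; name them and build a datetime column.
tCOD.Properties.VariableNames(3:8)={'Yr','Mn','Day','Hr','Min','Sec'};
tCOD.DateTime=datetime(tCOD{:,{'Yr','Mn','Day','Hr','Min','Sec'}});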
leaves you with
>> [head(tCOD);tail(tCOD)]
ans =
  16×12 table
    Var1      Var2               Yr    Mn    Day    Hr    Min    Sec    Var9          Var10         Var11    DateTime
    ______    _____________    ____    __    ___    __    ___    ___    ____    ___________    __________    ____________________
    {'AR'}    {'BADG00RUS'}    2021     3     28     0      0      0       2     0.00044149    3.7396e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ABMF00GLP'}    2021     3     28     0      0      0       2    -0.00024309     3.739e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AJAC00FRA'}    2021     3     28     0      0      0       2    -0.00038427    3.7166e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ALIC00AUS'}    2021     3     28     0      0      0       2    -2.4277e-09    3.7381e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AMU200ATA'}    2021     3     28     0      0      0       2    -2.9659e-08    3.7474e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ANKR00TUR'}    2021     3     28     0      0      0       2     1.9425e-08    3.7349e-11    28-Mar-2021 00:00:00
    {'AR'}    {'AREG00PER'}    2021     3     28     0      0      0       2     0.00046999    3.7485e-11    28-Mar-2021 00:00:00
    {'AR'}    {'ASCG00SHN'}    2021     3     28     0      0      0       2    -3.5686e-08    3.7378e-11    28-Mar-2021 00:00:00
    {'AS'}    {'R16'      }    2021     3     28    23     59     30       1    -1.3127e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R17'      }    2021     3     28    23     59     30       1     0.00041179           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R18'      }    2021     3     28    23     59     30       1     7.1344e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R19'      }    2021     3     28    23     59     30       1    -0.00013759           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R20'      }    2021     3     28    23     59     30       1    -4.6221e-05           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R21'      }    2021     3     28    23     59     30       1    -0.00019777           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R22'      }    2021     3     28    23     59     30       1    -0.00010502           NaN    28-Mar-2021 23:59:30
    {'AS'}    {'R24'      }    2021     3     28    23     59     30       1     3.6747e-05           NaN    28-Mar-2021 23:59:30
>>
There are only two (2) variables past the time field at the end of the table instead of three (3), hence the NaN elements for Var11.
You can either scan the file for the location of the "END OF HEADER" record to find the number of header lines to skip, or there is probably sufficient data within the file header to compute where that is -- although if the COMMENTS are freeform, there may not be a fixed number of records there, and so it may just take scanning the file first...
Either way, this is much simpler and more straightforward than trying to parse the cell array...that's fraught with difficulty in comparison.
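A minimal sketch of the scanning option, assuming the header ends with a record containing the literal text END OF HEADER. The demo file and its contents are made up so the snippet runs on its own; demoFile, nHeader, and found are illustrative names, not anything from the attached file.

```matlab
% Write a small stand-in file (made-up contents, NOT the real .clk file).
demoFile = [tempname '.clk'];
fid = fopen(demoFile, 'w');
fprintf(fid, '%s\n', ...
    'some header record', ...
    'a free-form line                                            COMMENT', ...
    '                                                            END OF HEADER', ...
    'AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03');
fclose(fid);

% Count records up to and including the END OF HEADER record.
nHeader = 0;
found   = false;
fid   = fopen(demoFile, 'r');
tline = fgetl(fid);
while ischar(tline)                  % fgetl returns -1 at end of file
    nHeader = nHeader + 1;
    if contains(tline, 'END OF HEADER')
        found = true;
        break
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(demoFile);                    % remove the stand-in file

if ~found
    error('No END OF HEADER record found.');
end
```

With the real file name in place of the stand-in, nHeader could then be passed as the 'HeaderLines' value to readtable instead of a hard-coded count.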
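For comparison, if the records are already in a cell array, the rule stated in the question (the date fields follow AS [A-Z][0-9][0-9]) could be sketched with regexp tokens as below. The pattern is an assumption based only on the two sample records shown; real files may contain variations it does not cover, and it has not been timed on a 1x400000 array.

```matlab
data = {'333', 'AS C37 2021 03 28 00 05 30.000000 1 -0.884071511631E-03', ...
        'abvc', '400 55 a', ...
        'AS G17 2021 3 28 0 17 30.000000 1 0.416843065644E-03'};

% Six captured fields (Y M D h m s) after "AS", one letter, and two digits.
pat = ['^AS [A-Z]\d{2}\s+(\d{4})\s+(\d{1,2})\s+(\d{1,2})' ...
       '\s+(\d{1,2})\s+(\d{1,2})\s+(\d+\.?\d*)'];

tok = regexp(data, pat, 'tokens', 'once');  % one entry per cell; empty if no match
tok = tok(~cellfun(@isempty, tok));         % keep only the matching records
dv  = str2double(vertcat(tok{:}));          % N-by-6 numeric date vectors
dt  = datetime(dv);                         % N-by-1 datetime array
```

Calling regexp once on the whole cell array avoids an explicit loop over the elements, which is likely to matter at this array size.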

More Answers (1)

sermet OGUTCU on 2 Jul 2021
I attached the original file.
