Get variable value from imported text

theo on 20 Mar 2021
Commented: theo on 27 Mar 2021
Hello, I want to build a database from the data inside my files.
Here is an example of the data inside a file:
Say I want to import the year, month, and day (2020 10 3) from the first row of the data into a structure, using the importdata function. The imported result is a 10x1 cell array in which each cell contains the text of one row. How do I take just 2020 from the first row's string and store it in years, 10 in month, and so on in the structure?
Another question: there is a table at the bottom of the data. In the example, its header is on line 7, but the header position may differ between files, so it may be on line 8 or 9. Is there a way to detect the header and import the data below it as a matrix or structure?
Edit: I have attached the input data and an example of the output that I want.
Here is the code that I tried to build:
St = struct('years',0.0,'month',0.0,'day',0.0, ...
            'hours',0.0,'min',0.0,'sec',0.0);
files = dir('*-*');
N = length(files);
for i = 1:N
    file = files(i).name;
    S = importdata(file);
    St.years = S{1,1};
    St.month = S{1,1};
    St.day = S{1,1};
end
  3 Comments
Image Analyst on 20 Mar 2021
No idea without the data. What do you want, and what do you not want (want to ignore)? Anonymize it with generic data if you think it's too secret to post. Otherwise probably no one can help.
theo on 20 Mar 2021
I'm really sorry. Here is an example of the input data and the output data that I want.

Accepted Answer

Seth Furman on 22 Mar 2021
Sounds like you could use readtable.
>> readtable('05-2350-33R.S202010','NumHeaderLines',7,'FileType','text','VariableNamingRule','preserve')
ans =
  3×15 table
    STAT       SP        IPHASW    D      HRMM    SECON    CODA    AMPLIT    PERI    AZIMU    VELO    AIN    AR TRES W    DIS     CAZ7
    _______    ______    ______    ___    ____    _____    ____    ______    ____    _____    ____    ___    _________    ____    ____
    {'JBS'}    {'KZ'}    {'IP'}    NaN    2351    23.45    42      NaN       NaN     NaN      NaN     33     -0.0001      2596    153
    {'MTK'}    {'KZ'}    {'IP'}    NaN    2351    25.93    60      NaN       NaN     NaN      NaN     29     0.0001       2623    154
    {'SLT'}    {'KZ'}    {'IP'}    NaN    2351    28.15    57      NaN       NaN     NaN      NaN     28     -0.0001      2650    152
Normally readtable tries to detect and ignore any header lines, but it looks like readtable has trouble doing this with your files, so you'll probably have to explicitly specify the number of header lines (see 'NumHeaderLines' above).
Consider readcell or textscan if you want to read non-tabular data, e.g. the data in the header lines.
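For the header values asked about in the question, here is a minimal sketch of pulling the year, month, and day out of the first line with sscanf. The sample line is hypothetical (the attached file is not reproduced here), and the sketch assumes the line begins with three whitespace-separated integers in that order; adjust the format if your files differ.

```matlab
% Hypothetical first line of a data file. In practice it could come from
% something like: lines = readlines(file); firstLine = char(lines(1));
firstLine = ' 2020 10  3';

% Read at most three integers from the start of the line.
% Assumed order: year, month, day.
vals = sscanf(firstLine, '%d', 3);
if numel(vals) < 3
    error('Expected three integers at the start of the line.');
end

% Store each value in its own field instead of the whole string.
St = struct('years', vals(1), 'month', vals(2), 'day', vals(3));
disp(St)
```

Because sscanf returns numbers rather than text, each field holds a numeric value that can be used directly in later calculations.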
  4 Comments
Seth Furman on 26 Mar 2021
1) How to specify the number of header lines if not known in advance?
Assuming you know that the line with table variable names is the first line that starts with 'STAT', for instance, you can calculate the number of header lines by reading the file line-by-line, finding the index of the first line that starts with 'STAT', and subtracting one from that number to get the number of header lines. For instance, you can use readlines to read the file as separate lines and then use startsWith and find to get the first line starting with 'STAT'.
>> lines = readlines('05-2350-33R.S202010','WhitespaceRule','trim');
>> startsWithStat = startsWith(lines,'STAT');
>> indexFirstStat = find(startsWithStat,1)
indexFirstStat =
8
>> numHeaderLines = indexFirstStat - 1
numHeaderLines =
7
>> t = readtable('05-2350-33R.S202010','NumHeaderLines',numHeaderLines,'FileType','text','VariableNamingRule','preserve')
t =
  3×15 table
    STAT       SP        IPHASW    D      HRMM    SECON    CODA    AMPLIT    PERI    AZIMU    VELO    AIN    AR TRES W    DIS     CAZ7
    _______    ______    ______    ___    ____    _____    ____    ______    ____    _____    ____    ___    _________    ____    ____
    {'JBS'}    {'KZ'}    {'IP'}    NaN    2351    23.45    42      NaN       NaN     NaN      NaN     33     -0.0001      2596    153
    {'MTK'}    {'KZ'}    {'IP'}    NaN    2351    25.93    60      NaN       NaN     NaN      NaN     29     0.0001       2623    154
    {'SLT'}    {'KZ'}    {'IP'}    NaN    2351    28.15    57      NaN       NaN     NaN      NaN     28     -0.0001      2650    152
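When looping over many files, it may help to guard against a file that has no 'STAT' line at all, since find then returns an empty result. A minimal sketch of the same header detection with that check, run against a small hypothetical comma-separated file written to a temporary location (the file contents and the delimiter are assumptions, not the original data):

```matlab
% Write a small hypothetical file: two free-form lines, then a table.
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', 'free-form line 1', 'free-form line 2', ...
    'STAT,HRMM', 'JBS,2351', 'MTK,2351');
fclose(fid);

% Locate the first line that starts with 'STAT' (leading blanks ignored).
lines = readlines(sampleFile);
idx = find(startsWith(strtrim(lines), 'STAT'), 1);
if isempty(idx)
    error('No line starting with STAT found in %s', sampleFile);
end

% Every line above the 'STAT' line is treated as a header line.
t = readtable(sampleFile, 'FileType', 'text', 'Delimiter', ',', ...
    'NumHeaderLines', idx - 1, 'ReadVariableNames', true);
delete(sampleFile);
```

Inside a loop over dir results, the error call could be replaced with a warning and continue so that one malformed file does not stop the whole import.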
2) How to specify imported data types?
One method is to modify the imported values, e.g. using one of the integer conversion functions.
>> t.SECON = int64(t.SECON)
t =
  3×15 table
    STAT       SP        IPHASW    D      HRMM    SECON    CODA    AMPLIT    PERI    AZIMU    VELO    AIN    AR TRES W    DIS     CAZ7
    _______    ______    ______    ___    ____    _____    ____    ______    ____    _____    ____    ___    _________    ____    ____
    {'JBS'}    {'KZ'}    {'IP'}    NaN    2351    23       42      NaN       NaN     NaN      NaN     33     -0.0001      2596    153
    {'MTK'}    {'KZ'}    {'IP'}    NaN    2351    26       60      NaN       NaN     NaN      NaN     29     0.0001       2623    154
    {'SLT'}    {'KZ'}    {'IP'}    NaN    2351    28       57      NaN       NaN     NaN      NaN     28     -0.0001      2650    152
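Note that int64 rounds to the nearest integer rather than truncating, which is why 25.93 became 26. If you want the fractional part dropped instead, one option is to apply fix before converting. A short sketch using the SECON values from the original table:

```matlab
% SECON values as originally imported (double).
secon = [23.45 25.93 28.15];

% int64 rounds to the nearest integer.
rounded = int64(secon);        % 23 26 28

% fix discards the fractional part before the conversion.
truncated = int64(fix(secon)); % 23 25 28
```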
Another method is to specify import types using import options.
>> opts = detectImportOptions('05-2350-33R.S202010','FileType','text','NumHeaderLines',numHeaderLines,'VariableNamingRule','preserve')
opts =
  FixedWidthImportOptions with properties:

   Format Properties:
        Whitespace: '\b\t '
        LineEnding: {'\n' '\r' '\r\n'}
        CommentStyle: {}
        EmptyLineRule: 'skip'
        Encoding: 'UTF-8'

   Replacement Properties:
        MissingRule: 'fill'
        ImportErrorRule: 'fill'
        ExtraColumnsRule: 'addvars'
        PartialFieldRule: 'keep'

   Variable Import Properties: Set types by name using setvartype
        VariableNames: {'STAT', 'SP', 'IPHASW' ... and 12 more}
        VariableTypes: {'char', 'char', 'char' ... and 12 more}
        VariableWidths: [5 3 7 2 5 6 5 7 5 6 5 4 10 5 5]
        SelectedVariableNames: {'STAT', 'SP', 'IPHASW' ... and 12 more}
        VariableOptions: Show all 15 VariableOptions
            Access VariableOptions sub-properties using setvaropts/getvaropts
        VariableNamingRule: 'preserve'

   Location Properties:
        DataLines: [9 Inf]
        VariableNamesLine: 8
        RowNamesColumn: 0
        VariableUnitsLine: 0
        VariableDescriptionsLine: 0

    To display a preview of the table, use preview
>> opts.VariableTypes{6} = 'int64';
>> t = readtable('05-2350-33R.S202010',opts)
t =
  3×15 table
    STAT       SP        IPHASW    D             HRMM    SECON    CODA    AMPLIT        PERI          AZIMU         VELO          AIN    AR TRES W    DIS     CAZ7
    _______    ______    ______    __________    ____    _____    ____    __________    __________    __________    __________    ___    _________    ____    ____
    {'JBS'}    {'KZ'}    {'IP'}    {0×0 char}    2351    23       42      {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    33     -0.0001      2596    153
    {'MTK'}    {'KZ'}    {'IP'}    {0×0 char}    2351    26       60      {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    29     0.0001       2623    154
    {'SLT'}    {'KZ'}    {'IP'}    {0×0 char}    2351    28       57      {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    28     -0.0001      2650    152
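As the import options display suggests, types can also be set by variable name with setvartype, which avoids counting column positions. A minimal sketch against a small hypothetical comma-separated file (the file contents are an assumption for illustration, not the original fixed-width data):

```matlab
% Write a small hypothetical file with a header row and two data rows.
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', 'STAT,SECON', 'JBS,23', 'MTK,26');
fclose(fid);

% Detect the import options, then change one variable's type by name.
opts = detectImportOptions(sampleFile, 'FileType', 'text');
opts = setvartype(opts, 'SECON', 'int64');

% Import using the modified options.
t = readtable(sampleFile, opts);
delete(sampleFile);
```

The same call accepts a cell array of names, so several columns can be switched to one type in a single step.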
theo on 27 Mar 2021
Thank you for the explanation, it really helps me understand the methods that I can use.
Thanks for your help, I really appreciate it
