How can I skip the first two lines of a text file and then read the file, which has header lines in between the values?

Hello,
I want to read the attached text file, which has header lines between the values. My aim is to just read the 2nd column.
I used the following code:
fidi = fopen('BSH_HBM_tracer_productgrid_fine_20200701.txt','rt');
C = dlmread('BSH_HBM_tracer_productgrid_fine_20200701.txt', ' ', 2, 0);
k1 = 1;
while ~feof(fidi)
C = textscan(fidi, '%*f%f', 'HeanderLines', 1, 'CollectOutput', true, 'CommentStyle',{'Section' });
M = cell2mat(C);
if isempty(M)
break
end
D{k1,:} = M;
fseek(fidi,0);
k1=k1+1
end
fclose(fidi)
And I get the following error message:
Error using dlmread (line 147)
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 1, field number 1) ==> Section 2020-183 01:00:00 54:39:15.1200
11:07:54.8400 Modeled svp from the BSH hydrodynamic model\n
I took the code from this answered question:
I just added the line:
C = dlmread('BSH_HBM_tracer_productgrid_fine_20200701.txt', ' ', 2, 0);
hoping to skip the first two lines.
Because I'm fairly new to reading text files with header lines between the values into MATLAB, I don't know how to fix my code to get the desired outcome.
I work with R2020b.
Thanks in advance for your help.

Accepted Answer

Star Strider on 26 Nov 2021
Try this —
C1 = readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/814274/BSH_HBM_tracer_productgrid_fine_20200701.txt', 'HeaderLines',2)
C1 = 4128×12 cell array
{'Section'} {'2020-183'} {[01:00:00]} {[54:39:15]} {[11:07:54]} {'Modeled'} {'svp'} {'from'} {'the'} {'BSH'} {'hydrodynamic'} {'model'}
{[ 0]} {[1.4897e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 1]} {[1.4897e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 3]} {[1.4899e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 5]} {[1.4899e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 7]} {[1.4900e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 9]} {[1.4900e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[11]} {[1.4901e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[13]} {[1.4901e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[15]} {[1.4881e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[17]} {[1.4840e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{'Section'} {'2020-183'} {[01:59:59]} {[54:39:15]} {[11:07:54]} {'Modeled'} {'svp'} {'from'} {'the'} {'BSH'} {'hydrodynamic'} {'model'}
{[ 0]} {[1.4893e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 1]} {[1.4893e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 3]} {[1.4896e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 5]} {[1.4897e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
C11Idx = find(cellfun(@(x)strcmp(x,'Section'),C1(:,1)));
% dC11Idx = diff(C11Idx) % Information Only
% sdd = [min(dC11Idx); max(dC11Idx); std(dC11Idx)] % Information Only
Sections = cell(size(C11Idx)); % Preallocate
Col_2 = cell(size(C11Idx)); % Preallocate
for k = 1:numel(C11Idx)-1
Section{k} = C1(C11Idx(k):(C11Idx(k+1)-1),:);
Col_2{k} = C1((C11Idx(k)+1):(C11Idx(k+1)-1),2);
Col_2v{k,:} = cell2mat(Col_2{k});
end
Section{1}
ans = 11×12 cell array
{'Section'} {'2020-183'} {[01:00:00]} {[54:39:15]} {[11:07:54]} {'Modeled'} {'svp'} {'from'} {'the'} {'BSH'} {'hydrodynamic'} {'model'}
{[ 0]} {[1.4897e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 1]} {[1.4897e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 3]} {[1.4899e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 5]} {[1.4899e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 7]} {[1.4900e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 9]} {[1.4900e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[11]} {[1.4901e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[13]} {[1.4901e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[15]} {[1.4881e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[17]} {[1.4840e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
Col_2{1}
ans = 10×1 cell array
{[1.4897e+03]}
{[1.4897e+03]}
{[1.4899e+03]}
{[1.4899e+03]}
{[1.4900e+03]}
{[1.4900e+03]}
{[1.4901e+03]}
{[1.4901e+03]}
{[1.4881e+03]}
{[1.4840e+03]}
Section{end-1}
ans = 9×12 cell array
{'Section'} {'2020-183'} {[22:00:00]} {[54:40:45]} {[11:10:24]} {'Modeled'} {'svp'} {'from'} {'the'} {'BSH'} {'hydrodynamic'} {'model'}
{[ 0]} {[1.4891e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 1]} {[1.4892e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 3]} {[1.4891e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 5]} {[1.4883e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 7]} {[1.4878e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[ 9]} {[1.4869e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[11]} {[1.4853e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
{[13]} {[1.4837e+03]} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing} {1×1 missing}
Col_2{end-1}
ans = 8×1 cell array
{[1.4889e+03]}
{[1.4889e+03]}
{[1.4889e+03]}
{[1.4883e+03]}
{[1.4878e+03]}
{[1.4868e+03]}
{[1.4852e+03]}
{[1.4836e+03]}
Use the cell2mat function to convert ‘Col_2v’ to a double vector —
Col_2_v = cell2mat(Col_2v)
Col_2_v = 3736×1
1.0e+03 *
1.4897
1.4897
1.4899
1.4899
1.4900
1.4900
1.4901
1.4901
1.4881
1.4840
This turned out to be something of an adventure!
Experiment if necessary to get different results.
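One thing to be aware of: the loop above runs to numel(C11Idx)-1, so the rows after the final 'Section' header are not collected. That may be why this answer reports 3736 values while the other answers report 3744. Below is a minimal sketch of one way to include the last block, by appending a sentinel index one past the final row. The small cell array is a made-up stand-in for the readcell output (two columns only), and the names isHdr, hdrIdx, edges and Col_2_all are illustrative, not part of the original code.

```matlab
% Toy stand-in for the readcell output (illustrative values, 2 columns only)
C1 = {'Section', '2020-183';
      0,         1489.7;
      1,         1489.7;
      'Section', '2020-183';
      0,         1489.3;
      1,         1489.3;
      3,         1489.6};

% Rows whose first cell holds the text 'Section'
isHdr  = cellfun(@(x) ischar(x) && strcmp(x,'Section'), C1(:,1));
hdrIdx = find(isHdr);

% Sentinel: one past the last row, so the final block also has an end index
edges = [hdrIdx; size(C1,1)+1];

Col_2 = cell(numel(hdrIdx),1);                 % Preallocate
for k = 1:numel(hdrIdx)
    rows     = (edges(k)+1):(edges(k+1)-1);    % Data rows of block k
    Col_2{k} = cell2mat(C1(rows,2));           % Second column as double
end

Col_2_all = vertcat(Col_2{:});                 % All blocks, including the last
```

With the real file, C1 would come from the readcell call above; the loop body is otherwise the same idea as in the original code.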
  4 Comments
Sarah Preuß on 8 Dec 2021
Hi,
I have a follow-up problem. The header has now changed slightly, and when I use readcell I get a cell array in which the complete header sits in a single cell instead of being split into segments.
I really have no idea why, because the header has the same structure, just with slightly different information. I tried defining a delimiter, but it did not help.
Do you have any idea why this is happening?
Star Strider on 8 Dec 2021
The reason is that this file is significantly different from the previous file, and it took a bit of effort to figure out how to parse it correctly. (I decided to stay with readcell and the rest, for consistency.)
I noticed that all the ‘Section’ information in this file shares a common first column, varying only in the second column. I decided to consolidate all that information in the ‘SummaryMatrix’ variable at the end. That should make it easier to use the information. The actual data begins with the second column and then proceeds for all the rest of the columns. Only the first column retains the identifying information for those values.
C1 = readcell('https://www.mathworks.com/matlabcentral/answers/uploaded_files/827335/CARIS_HIPS_SVP.txt', 'HeaderLines',2)
C1 = 12768×2 cell array
{'Section 2020-183 00:00:00 54:42:00 012:42:00 comment: interpoliert (30/29184 0.103% profiles used'} {'Daylight-Saving-Time-flag: 1)'}
{'2.000 1480.411'} {1×1 missing}
{'5.000 1480.538'} {1×1 missing}
{'7.000 1480.565'} {1×1 missing}
{'12.000 1480.300'} {1×1 missing}
{'17.000 1475.200'} {1×1 missing}
{'19.000 1474.328'} {1×1 missing}
{'Section 2020-183 01:00:00 54:42:00 012:42:00 comment: interpoliert (46/29184 0.158% profiles used'} {'Daylight-Saving-Time-flag: 1)'}
{'2.000 1480.412'} {1×1 missing}
{'5.000 1480.539'} {1×1 missing}
{'7.000 1480.570'} {1×1 missing}
{'12.000 1480.330'} {1×1 missing}
{'17.000 1475.244'} {1×1 missing}
{'19.000 1474.325'} {1×1 missing}
{'Section 2020-183 02:00:00 54:42:00 012:42:00 comment: interpoliert (62/29184 0.212% profiles used'} {'Daylight-Saving-Time-flag: 1)'}
{'2.000 1480.421'} {1×1 missing}
C11Idx = find(any(cellfun(@(x)contains(string(x),"Section"),C1),2));
% dC11Idx = diff(C11Idx) % Information Only
% sdd = [min(dC11Idx); max(dC11Idx); std(dC11Idx)] % Information Only
Sections = cell(size(C11Idx)); % Preallocate
Cmtx = cell(size(C11Idx)); % Preallocate
for k = 1:numel(C11Idx)-1
idxrng = C11Idx(k)+1:(C11Idx(k+1)-1);
Cseg = C1(idxrng,1);
Cmtx{k} = reshape(cell2mat(cellfun(@(x)sscanf(x,'%f %f'),Cseg, 'Unif',0)),2,[]).';
% SectionHeader{k} = C1{1,k};
end
Cmtx = Cmtx.'; % Transpose To Row
CmtxCons = cell2mat(Cmtx); % Consolidated Double Array
SummaryMatrix = [CmtxCons(:,[1 2]) CmtxCons(:,4:2:end)] % Second Columns Of All 'Section' Matrices With Respect To Common First Column
SummaryMatrix = 6×1824
1.0e+03 *
0.0020 1.4804 1.4804 1.4804 1.4804 1.4805 1.4805 1.4806 1.4807 1.4808 1.4809 1.4810 1.4811 1.4813 1.4815 1.4818 1.4820 1.4822 1.4824 1.4825 1.4826 1.4826 1.4826 1.4825 1.4824 1.4823 1.4823 1.4824 1.4825 1.4826
0.0050 1.4805 1.4805 1.4805 1.4806 1.4806 1.4806 1.4807 1.4807 1.4808 1.4809 1.4809 1.4811 1.4812 1.4813 1.4815 1.4816 1.4818 1.4819 1.4819 1.4820 1.4820 1.4820 1.4822 1.4823 1.4824 1.4824 1.4825 1.4826 1.4827
0.0070 1.4806 1.4806 1.4806 1.4806 1.4806 1.4807 1.4807 1.4807 1.4808 1.4809 1.4810 1.4811 1.4812 1.4813 1.4814 1.4815 1.4816 1.4817 1.4817 1.4817 1.4818 1.4819 1.4820 1.4822 1.4823 1.4824 1.4825 1.4826 1.4828
0.0120 1.4803 1.4803 1.4805 1.4806 1.4807 1.4808 1.4808 1.4809 1.4809 1.4810 1.4811 1.4812 1.4812 1.4813 1.4814 1.4815 1.4816 1.4817 1.4818 1.4819 1.4820 1.4820 1.4820 1.4820 1.4822 1.4824 1.4826 1.4829 1.4831
0.0170 1.4752 1.4752 1.4755 1.4757 1.4759 1.4761 1.4762 1.4764 1.4764 1.4765 1.4765 1.4765 1.4766 1.4767 1.4769 1.4770 1.4772 1.4773 1.4775 1.4776 1.4777 1.4778 1.4778 1.4779 1.4780 1.4782 1.4785 1.4791 1.4798
0.0190 1.4743 1.4743 1.4743 1.4743 1.4743 1.4743 1.4744 1.4744 1.4744 1.4745 1.4745 1.4745 1.4745 1.4746 1.4746 1.4747 1.4747 1.4748 1.4749 1.4750 1.4750 1.4751 1.4752 1.4753 1.4754 1.4756 1.4758 1.4761 1.4765
figure
surfc(1:numel(C11Idx)-1, SummaryMatrix(:,1).', SummaryMatrix(:,2:end), 'EdgeColor','interp')
grid on
xlabel('Sections')
ylabel('Column 1')
zlabel('Columns 2')
set(gca, 'YTick',SummaryMatrix(:,1))
title('Surface Plot Of Section data')
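To make the column selection behind ‘SummaryMatrix’ easier to follow, here is a small sketch with only two sections. The values are the first three rows of the first two sections shown in the output above; the names S1 and S2 are illustrative. After cell2mat the columns are laid out as [col1 values1 col1 values2 ...], so keeping columns 1 and 2 and then every even column from 4 onward leaves the shared first column followed by one value column per section. This assumes every section has the same first column, which the sketch checks explicitly.

```matlab
% Two small per-section matrices: [first column, second column]
S1 = [2 1480.411; 5 1480.538; 7 1480.565];
S2 = [2 1480.412; 5 1480.539; 7 1480.570];

% The consolidation only makes sense if the first columns agree
assert(isequal(S1(:,1), S2(:,1)), 'Sections do not share a common first column')

Cmtx     = {S1, S2};           % One cell per section, arranged as a row
CmtxCons = cell2mat(Cmtx);     % 3×4: [col1 values1 col1 values2]

% Keep the shared first column once, then the value column of each section
SummaryMatrix = [CmtxCons(:,[1 2]) CmtxCons(:,4:2:end)];   % 3×3
```

If the first columns differ between sections (for example, different depths), the sections would need to be interpolated or padded onto a common grid before they can be combined this way.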


More Answers (2)

Mathieu NOE on 26 Nov 2021
Hello Sarah,
This code retrieves the numeric data from the 2nd column (3744 values):
filename = 'BSH_HBM_tracer_productgrid_fine_20200701.txt';
k = 0;
a = readlines(filename);
[m,~] = size(a);
for ci = 1:m
tmp = split(char(a(ci,:)));
if numel(tmp) == 2
k = k+1;
data(k,1) = str2double(tmp{2});
end
end
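The same idea can also be sketched without a loop. The version below assumes that every non-data line starts with the word "Section" and that every data line holds exactly two numbers; any other non-numeric lines (such as the two leading lines of the real file) would have to be dropped first, for example with a(3:end). The string array stands in for the result of readlines and its contents are illustrative; isData and vals are names introduced here.

```matlab
% Stand-in for: a = readlines(filename);  (illustrative content)
a = ["Section 2020-183 01:00:00 54:39:15.1200 11:07:54.8400 Modeled svp from the BSH hydrodynamic model";
     "0 1489.7";
     "1 1489.7";
     "Section 2020-183 01:59:59 54:39:15.1200 11:07:54.8400 Modeled svp from the BSH hydrodynamic model";
     "0 1489.3";
     ""];

a = strtrim(a);                                          % Remove stray leading/trailing blanks
isData = ~startsWith(a, "Section") & strlength(a) > 0;   % Keep non-empty, non-header lines

% Parse all remaining numbers in one pass: two per line, so fill a 2-by-N array
txt  = char(join(a(isData), newline));
vals = sscanf(txt, '%f', [2 Inf]).';                     % N×2: [column 1, column 2]

data = vals(:,2);                                        % Second column only
```

Compared with the loop, this avoids growing data one element at a time, but it is less forgiving: a data line with a different number of values would silently shift the columns.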

Adam Danz on 26 Nov 2021
Edited: Adam Danz on 26 Nov 2021
You could use readtable and then eliminate the columns and rows associated with the header lines.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/814274/BSH_HBM_tracer_productgrid_fine_20200701.txt');
% Clean up
T(~isnan(T.Var3),:) = [];
T(:,3:end) = []
T = 3744×2 table
Var1 Var2
____ ______
0 1489.7
1 1489.7
3 1489.9
5 1489.9
7 1490
9 1490
11 1490.1
13 1490.1
15 1488.1
17 1484
0 1489.3
1 1489.3
3 1489.6
5 1489.7
7 1489.7
9 1489.8
This produces an n×2 table. You can also eliminate the first column if you're only interested in the second column.
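As a rough illustration of the clean-up and of pulling out only the second column, here is a sketch on a small hand-built table. The table is a made-up stand-in for the readtable result: it assumes, as in the code above, that Var3 is NaN on data rows and not NaN on header rows. With a real file the variable types depend on what readtable detects, so the columns may not all be numeric before the clean-up.

```matlab
% Hand-built stand-in for the imported table (illustrative values)
% Rows 1 and 4 play the role of 'Section' header rows: Var3 is not NaN there
T = table([NaN; 0; 1; NaN; 0], ...
          [NaN; 1489.7; 1489.7; NaN; 1489.3], ...
          [1; NaN; NaN; 2; NaN], ...
          'VariableNames', {'Var1','Var2','Var3'});

% Clean up: drop header rows, then drop everything after the second column
T(~isnan(T.Var3),:) = [];
T(:,3:end) = [];

% Second column as a plain double vector
col2 = T.Var2;
```

Using T.Var2 returns a numeric vector directly, whereas T(:,2) would return a one-column table.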
  2 Comments
Adam Danz on 8 Dec 2021
To address your comment under another answer, this solution is still the simplest IMO and adding one line (detectImportOptions) will allow flexibility between the two files you've shared.
file = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/827335/CARIS_HIPS_SVP.txt';
opts = detectImportOptions(file,'Delimiter',' ');
T = readtable(file,opts);
% Clean up
T(~isnan(T.Var3),:) = [];
T(:,3:end) = []
T = 10944×2 table
Var1 Var2
____ ______
2 1480.4
5 1480.5
7 1480.6
12 1480.3
17 1475.2
19 1474.3
2 1480.4
5 1480.5
7 1480.6
12 1480.3
17 1475.2
19 1474.3
2 1480.4
5 1480.5
7 1480.6
12 1480.5
