The efficient way (in terms of speed and consistency) for parsing a big text file with textscan

I have a text file consisting of approximately 500,000 lines. The header and some of the data records are shown below.
#dP2021 4 3 0 0 0.00000000 288 u+U IGb14 FIT GFZ
## 2151 518400.00000000 300.00000000 59307 0.0000000000000
++ 10 10 10 6 10 6 10 8 8 8 8 8 8 8 8 8 8
++ 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 8 10
%c M cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%i 0 0 0 0 0 0 0 0 0
%i 0 0 0 0 0 0 0 0 0
/* PCV:IGS14_2148 OL/AL:FES2004 NONE YN CLK:CoN ORB:CoN
/* GeoForschungsZentrum Potsdam
/*
/*
* 2021 4 3 0 0 0.00000000
PC01 -34381.586112 24435.438444 69.245923 -596.854622
PE02 4493.250988 41924.015694 -226.819605 790.650809
PG03 -14754.803607 39520.337126 -938.295010 -436.165931
PG04 -39584.473454 14533.059977 -388.137635 370.305833
.
.
.
* 2021 4 3 0 5 0.00000000
PC01 -34381.437242 24436.228124 74.813357 -596.843988
PE02 4493.541869 41922.959643 -254.934261 790.641523
PG03 -14753.360421 39519.882073 -951.586932 -436.156224
PG04 -39584.568840 14533.349312 -380.297839 370.469467
.
.
.
EOF
I need to count, separately, all PC[0-9][0-9], PE[0-9][0-9], and PG[0-9][0-9] strings in the first column of the data section, which follows the header section and the date lines. What is the most efficient way to do this using textscan?
  5 Comments
sermet OGUTCU on 15 May 2021
Dear @Jan, the format of the output is not important, but it will be created as a string array such as:
output=["PC" "120";"PG" "200";"PE" "110"]
Sulaymon Eshkabilov on 15 May 2021
You can also try fscanf(), which works in a similar way to textscan() and uses largely the same format specifiers.


Accepted Answer

Jan on 15 May 2021
Edited: Jan on 15 May 2021
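% Read the whole file into memory and split it into lines
Str = fileread(FileName);
C = strsplit(Str, '\n');
% Count the lines whose first two characters match each prefix
nPC = sum(strncmp(C, 'PC', 2));
nPG = sum(strncmp(C, 'PG', 2));
nPE = sum(strncmp(C, 'PE', 2));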
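The prefix comparison above counts every line that begins with the two letters. If the stricter pattern from the question (two letters followed by two digits at the start of a line) is wanted, a minimal sketch with regexp could look like the following. The sample text is made up for illustration and stands in for the result of fileread; the 'lineanchors' option makes ^ match at the start of every line.

```matlab
% Illustrative sample text, standing in for Str = fileread(FileName)
Str = sprintf('%s\n', ...
    '* 2021 4 3 0 0 0.00000000', ...
    'PC01 -34381.586112 24435.438444', ...
    'PE02 4493.250988 41924.015694', ...
    'PG03 -14754.803607 39520.337126', ...
    'PG04 -39584.473454 14533.059977', ...
    'EOF');

% regexp returns the start index of every match, so numel gives the count.
% '^' is anchored to line starts because of the 'lineanchors' option.
nPC = numel(regexp(Str, '^PC\d{2}', 'lineanchors'));
nPG = numel(regexp(Str, '^PG\d{2}', 'lineanchors'));
nPE = numel(regexp(Str, '^PE\d{2}', 'lineanchors'));
```

This scans the text three times, which is usually acceptable for a file of this size, but it has not been timed against the strncmp version.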
If the file does not fit into your RAM:
fid = fopen(FileName, 'r');
if fid == -1
    error('Cannot open file: %s', FileName);
end
nPC = 0;
nPG = 0;
nPE = 0;
% Read the file line by line so only one line is held in memory
while ~feof(fid)
    s = fgets(fid);
    if strncmp(s, 'PC', 2)
        nPC = nPC + 1;
    elseif strncmp(s, 'PG', 2)
        nPG = nPG + 1;
    elseif strncmp(s, 'PE', 2)
        nPE = nPE + 1;
    end
end
fclose(fid);
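Since the question asks about textscan specifically, the same count can also be sketched with it, ending in the string array format requested in the comments. This is only a sketch under assumptions: the temporary sample file and the names lineList, prefixes and counts are illustrative, and the double-quoted string syntax needs a MATLAB release that supports string arrays.

```matlab
% Write a tiny sample file in the same layout as the question's data
% (hypothetical temporary file, for illustration only)
FileName = [tempname, '.txt'];
sample = {'* 2021 4 3 0 0 0.00000000', ...
          'PC01 -34381.586112 24435.438444 69.245923 -596.854622', ...
          'PE02 4493.250988 41924.015694 -226.819605 790.650809', ...
          'PG03 -14754.803607 39520.337126 -938.295010 -436.165931', ...
          'PG04 -39584.473454 14533.059977 -388.137635 370.305833', ...
          'EOF'};
fid = fopen(FileName, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% Read every line as one string: newline is the only delimiter and no
% characters are treated as whitespace, so lines are not split at blanks
fid = fopen(FileName, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
lineList = C{1};

% Count the lines starting with each prefix
prefixes = ["PC" "PG" "PE"];
counts = zeros(size(prefixes));
for k = 1:numel(prefixes)
    counts(k) = sum(strncmp(lineList, char(prefixes(k)), 2));
end

% Assemble the result as a string array: one row per prefix
output = [prefixes(:), string(counts(:))];

delete(FileName);  % remove the temporary sample file
```

Like the fileread version, this holds all lines in memory at once, so for files that do not fit into RAM the line-by-line loop remains the safer choice.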
