Remove or ignore a certain row while reading from text files?

Hello. I have a number of text files in different subfolders of one main folder. My task was to read all the text files, convert the information into a particular format in a cell array, and then write the cell array to an Excel sheet.
The task is completely done; however, there is a slight change in the data in the text files. The new files that I have received have one extra row. Without that row my script runs fine, but with the new row added I get this error:
Subscript indices must either be real positive integers or logicals.
Error in taskFinal (line 52)
newPDU(i) = newPDU(i-1);
What I need is a little help with how to deal with this useless row.
The row is number 37 in the files. While reading the data from the text files, I would like to either ignore that row or simply remove the line from the cell array once the file has been read into it. The row contains only one word, which is " [7E8] ". The m-file and one text file are attached below.
Thank you for any kind of help.
EDIT: Text file attached.
EDIT: The unwanted row is present in some files but not in others.

4 Comments

"Without that row my script runs totally fine." Thus, I deleted the row that contains [7E8] and tried to read the text file with your code. (First, I converted it to a function.) I wanted to see what you write to the Excel file. However, that didn't work. I received
Index exceeds matrix dimensions.
Error in taskFinal (line 50)
newParameter(i) = tempSplit(2);
K>> tempSplit
tempSplit =
    '1.2 OBD (OBD [8])'
K>> whos tempSplit
  Name           Size    Bytes  Class    Attributes
  tempSplit      1x1       146  cell
BTW: Don't use path as the name of a variable. It's the name of a Matlab function.
Proposal: Upload the resulting Excel-file to show what you want to extract.
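The "Index exceeds matrix dimensions" error above occurs because tempSplit holds only one element while the script asks for the second. A minimal sketch of a guard follows, assuming the script produces tempSplit by splitting a line on some delimiter; the function name second_field and the use of strsplit are illustrative assumptions, not taken from the original script.

```matlab
function second = second_field( line_text, delimiter )
% SECOND_FIELD  Second delimiter-separated field of line_text, or '' if absent.
% Hypothetical helper: the real script may split its lines differently.
    parts = strsplit( line_text, delimiter );
    if numel( parts ) >= 2
        second = parts{2};      % safe: the second element exists
    else
        second = '';            % no delimiter found, so nothing to extract
    end
end
```

With such a guard, a line without the delimiter yields an empty string instead of stopping the script.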
Hello. I have attached the Excel file. You can see what this script is doing with the text file data. And thanks for the tip about not using path as a variable name.
"however there is a slight change in the data in text files" This reminds me of a function I wrote a long time ago to read a huge text file with descriptive information from a building automation system (BAS). With each revision of the BAS there were a number of changes in the text file format. The purpose of many changes was just to make the text more readable on screen. I guess it wasn't intended to be read automatically. Eventually, I gave up maintaining the function.
Question: Do you foresee a need to maintain this script to account for changes in the file format and/or requirements to extract more information? Currently, you only read a fourth of the file.
I probably will not need to extract more information from the text files. I might need it in the near future, depending on the needs of my colleague, but right now I only have to read one fourth of the file, as you noted.


 Accepted Answer

A quick and dirty solution: Delete the row that causes trouble. Try
>> tic, preTaskFinal( 'h:\m\cssm\SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt' ); toc
Elapsed time is 0.580805 seconds.
where
function preTaskFinal( filespec )
    % Read the whole file, one cell per line
    fid = fopen( filespec, 'r' );
    cac = textscan( fid, '%s', 'Delimiter','\n' );
    [~] = fclose( fid );
    cac = cac{1};
    % Drop every line that starts with the spurious marker
    is_spurious_row = strncmp( cac, '[7E8]', 5 );
    cac( is_spurious_row ) = [];
    % Write the remaining lines to a temporary file (name is hard-coded)
    fid = fopen( 'TempTxt4TaskFinal.txt', 'w' );
    for jj = 1 : length( cac )
        fprintf( fid, '%s\r\n', cac{jj} );
    end
    [~] = fclose( fid );
end
Here is a different implementation.
>> source_spec = 'SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt';
>> row_content = '[7E8]';
>> target_spec = 'temp.txt';
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.150321 seconds.
where
function remove_specific_row( source_spec, row_content, target_spec )
    str = fileread( source_spec );
    % Pattern: \< anchor, optional blanks, the escaped row_content,
    % trailing whitespace, then a newline
    xpr = sprintf( '\\<[ ]*%s\\s+?\\n' ...
                 , regexptranslate('escape',row_content) );
    % 'once' removes only the first match
    buf = regexprep( str, xpr, '', 'once' );
    fid = fopen( target_spec, 'w' );
    fprintf( fid, '%s', buf );
    fclose( fid );
end
and a slightly different one, which is faster
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.028050 seconds.
where
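function remove_specific_row( source_spec, row_content, target_spec )
    str = fileread( source_spec );
    % Pattern: preceded by a newline (lookbehind), optional whitespace,
    % the escaped row_content, optional whitespace, then a newline
    xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n' ...
                 , regexptranslate('escape',row_content) );
    % 'once' removes only the first match
    buf = regexprep( str, xpr, '', 'once' );
    fid = fopen( target_spec, 'w' );
    fprintf( fid, '%s', buf );
    fclose( fid );
end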

5 Comments

Thank you for the answer. However, could you please tell me how to make this code general? I have to read a large number of files in one main folder, not just one file. Also, how do I use fid in this case? I did not use fid in my script before.
"make this code general" General in what respect? The function already takes the filespec as an input argument. The name of the "fixed" file is, however, hard-coded.
From what I understand of the code that you have posted, it will work for just the one file that I give as the input argument. I have a number of files and I want to read them all, without that particular line. Please correct me if I am wrong.
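One way to apply a per-file function to every text file under the main folder is to collect the file paths first and then loop over them. The sketch below is only an illustration under assumptions: the helper name list_text_files is hypothetical, it assumes the files end in .txt, and it walks the subfolders with plain dir calls.

```matlab
function files = list_text_files( root_folder )
% LIST_TEXT_FILES  Full paths of all *.txt files in root_folder and below.
% Hypothetical helper, not part of the original answer.
%
% Possible use with the three-argument remove_specific_row from the answer:
%   files = list_text_files( main_folder );
%   for k = 1 : numel( files )
%       remove_specific_row( files{k}, '[7E8]', [ files{k}, '.fixed' ] );
%   end
    files   = {};
    listing = dir( root_folder );
    for k = 1 : numel( listing )
        name = listing(k).name;
        item = fullfile( root_folder, name );
        if listing(k).isdir
            % dir also returns '.' and '..'; skip them, recurse into the rest
            if ~any( strcmp( name, {'.', '..'} ) )
                files = [ files, list_text_files( item ) ]; %#ok<AGROW>
            end
        elseif ~isempty( regexpi( name, '\.txt$', 'once' ) )
            files{end+1} = item; %#ok<AGROW>
        end
    end
end
```

The '.fixed' suffix in the usage comment is an arbitrary choice; it keeps the cleaned copies from being picked up as .txt files on a later run.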
Hello. Thank you for your help. I got it done using a string comparison as you suggested, but in a slightly different way, with strcmp. Here is what I did:
if (strcmp(parameter{1}, '[7E8]')) % look for [7E8] in the first cell
    parameter = parameter(2:end);  % if found, ignore it and start from the next row
end
It looks for "[7E8]" in the first cell of the parameter column and, if it is present there, simply moves on to the second row of the parameter column. I don't know whether it is efficient enough, but it is working for me and that's all I wanted.
Any further input on this issue from your side is appreciated.
Thank you again for your help.
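The check above only looks at the first cell. If the marker could appear at another position in the column, a logical mask removes it wherever it occurs. This is a minimal sketch, assuming the column is a cell array of strings; the function name drop_marker_rows is hypothetical.

```matlab
function rows = drop_marker_rows( rows, marker )
% DROP_MARKER_ROWS  Remove every cell whose trimmed text equals marker.
% Hypothetical helper: rows is assumed to be a cell array of strings.
    is_marker = strcmp( strtrim( rows ), marker );  % logical mask, one entry per cell
    rows( is_marker ) = [];                         % delete all matching cells at once
end
```

For example, parameter = drop_marker_rows( parameter, '[7E8]' ) leaves the column unchanged when a file does not contain the extra row.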
I added a faster (and "better") implementation to the answer.

