Remove or ignore certain row while reading from text files?

Question

0 votes

Hello. I have a number of text files in different subfolders of one main folder. My task was to read all the text files, convert the information into a particular format in a cell array, and then write the cell array to an Excel sheet.

The task is completely done; however, there is a slight change in the data in the text files. The new files that I have received have one extra row. Without that row my script runs fine, but with the new row added I get this error:

Subscript indices must either be real positive
integers or logicals.
Error in taskFinal (line 52)
                newPDU(i) = newPDU(i-1);

What I need is a little help with how to deal with this useless row.

The row is number 37 in the files. While reading the data from the text files, I would like either to ignore that row or to remove the line from the cell array once the file's data has been read into it. There is only one word in that row, which is " [7E8] ". The m-file and one text file are attached below.

Thank you for any kind of help.

EDIT: Text file attached.

EDIT: The unwanted row is present in some files but not in others.

4 Comments

per isakson on 22 Jul 2016

Edited: per isakson on 23 Jul 2016

"however there is a slight change in the data in text files" This reminds me of a function I made a long time ago to read a huge text file with descriptive information from a building automation system, BAS. With each revision of the BAS there were a number of changes in the text file format. The purpose of many changes was just to make the text more readable on screen. I guess it wasn't intended to be read automatically. Eventually, I gave up maintaining the function.

Question: Do you foresee a need to maintain this script to account for changes in the file format and/or requirements to extract more information? Currently, you only read a fourth of the file.

yousaf obaid on 25 Jul 2016

Edited: yousaf obaid on 25 Jul 2016

Probably I will not need to extract more information from the text files. I might need to in the near future, depending on my colleague's needs, but right now I only have to read one fourth of the file, as you noted.


Answer 1

per isakson on 21 Jul 2016

Edited: per isakson on 26 Jul 2016


1 vote

A quick and dirty solution: delete the row that causes trouble and write the remaining rows to a temporary file, which the existing script can then read. Try

>> tic, preTaskFinal( 'h:\m\cssm\SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt' ); toc
Elapsed time is 0.580805 seconds.

where

function    preTaskFinal( filespec )
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%s', 'Delimiter','\n' );
[~] = fclose( fid );
cac = cac{1};
is_spurious_row = strncmp( cac, '[7E8]', 5 );
cac( is_spurious_row ) = [];
fid = fopen( 'TempTxt4TaskFinal.txt', 'w' );
for jj = 1 : length( cac ) 
    fprintf( fid, '%s\r\n', cac{jj} );
end
[~] = fclose( fid );
end
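
If the temporary file is not needed, the same filtering could return the cell array directly. This is only a minimal sketch: the helper name read_lines_without_row is hypothetical (it is not part of the original answer), and it assumes the same textscan call as above reads the file line by line.

```matlab
function cac = read_lines_without_row( filespec, row_content )
% Hypothetical helper, not part of the original answer.
% Reads all lines of a text file into a cell array of strings and
% drops every line that starts with row_content.
fid = fopen( filespec, 'r' );
if fid < 0
    error( 'read_lines_without_row:cannotOpen', 'Cannot open %s', filespec );
end
cac = textscan( fid, '%s', 'Delimiter','\n' );
fclose( fid );
cac = cac{1};
% Logical mask of the rows that begin with the unwanted text
is_spurious_row = strncmp( cac, row_content, length(row_content) );
cac( is_spurious_row ) = [];
end
```

Called as cac = read_lines_without_row( filespec, '[7E8]' ), it leaves files without the extra row unchanged, because the logical mask is then all false.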


Here is a different implementation.

>> source_spec = 'SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt';
>> row_content = '[7E8]';
>> target_spec = 'temp.txt';
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.150321 seconds.

where

function    remove_specific_row( source_spec, row_content, target_spec )
str = fileread( source_spec );
xpr = sprintf(  '\\<[ ]*%s\\s+?\\n'                     ...
            ,   regexptranslate('escape',row_content)   );
buf = regexprep( str, xpr, '', 'once' );
fid = fopen( target_spec, 'w' );
fprintf( fid, '%s', buf );
fclose( fid );
end

and a slightly different one, which is faster. It uses a lookbehind for the preceding newline instead of the word-boundary anchor.

>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.028050 seconds.

where

function    remove_specific_row( source_spec, row_content, target_spec )
str = fileread( source_spec );
xpr = sprintf(  '(?<=\\n)\\s*?%s\\s*?\\n'             ...
            ,   regexptranslate('escape',row_content) );
buf = regexprep( str, xpr, '', 'once' );
fid = fopen( target_spec, 'w' );
fprintf( fid, '%s', buf );
fclose( fid );
end
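
The lookbehind pattern can be tried on a small in-memory string before running it on real files. The sample text below is made up for illustration; only the pattern is taken from the function above.

```matlab
% Made-up sample text: three lines, the middle one is the unwanted row
% padded with blanks, as in " [7E8] ".
str = sprintf( 'a\n [7E8] \nb\n' );
row_content = '[7E8]';

% Same pattern as in remove_specific_row: the row must follow a newline,
% may be surrounded by white space, and is removed together with its
% trailing newline.
xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n'               ...
            ,  regexptranslate('escape',row_content)   );
buf = regexprep( str, xpr, '', 'once' );

% buf now holds the two remaining lines
disp( strcmp( buf, sprintf('a\nb\n') ) )
```

Because of the lookbehind, a matching row on the very first line of the file would not be removed; that does not matter here, since the row is reported to be at line 37.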

5 Comments

yousaf obaid on 26 Jul 2016

Edited: yousaf obaid on 26 Jul 2016

Hello. Thank you for your help. I got it done using a string comparison as you suggested, but in a slightly different way. Here is what I did:

if strcmp(parameter{1}, '[7E8]')     % compare the first cell with [7E8]
    parameter = parameter(2:end);    % if found, drop it and start from the next row
end

It looks for "[7E8]" in the first cell of the parameter column and, if it is present there, simply moves on to the second row of the parameter column. I don't know whether it is efficient enough, but it works for me and that is all I wanted.

Any further input on this issue from your side is appreciated.

Thank you again for your help.
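
The check above only looks at the first cell. If the row could turn up elsewhere in the column, a logical mask removes it wherever it occurs. This is a minimal sketch: the contents of parameter below are hypothetical sample values, not data from the attached file.

```matlab
% Hypothetical sample values standing in for the parameter column
parameter = { '[7E8]'; 'Engine RPM'; 'Vehicle Speed' };

% strtrim removes surrounding blanks, so ' [7E8] ' is matched as well
is_spurious = strcmp( strtrim(parameter), '[7E8]' );
parameter( is_spurious ) = [];
```

When no cell matches, the mask is all false and parameter is left untouched, so no separate if-test is needed for files without the extra row.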

per isakson on 26 Jul 2016

Edited: per isakson on 26 Jul 2016

I added a faster (and "better") implementation to the answer.
