Make this script faster

Question

Dear all,

I have a text file (an eye-tracker log) with 12 columns and 2,398,068 rows. The first line is a header with the variable names; only column 9 contains strings, and the rest are doubles. I import it with the code below.

Is there a way to make this script run faster?

Thanks for the insight

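 filename = 'file.txt' ;
 % - Get structure from first line (count the ';'-separated header fields).
 fid  = fopen( filename, 'r' ) ;
 line = fgetl( fid ) ;
 fclose( fid ) ;
 nCols = numel( strsplit( line, ';' ) ) ;
 % - Build formatSpec for TEXTSCAN: every column is read as a string.
 fmt = repmat( '%s', 1, nCols ) ;
 % - Read full file (header row included).
 fid  = fopen( filename, 'r' ) ;
 data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
 fclose( fid ) ;
 data = [data{:}] ;   % N-by-12 cell array of strings
 % - Column 9: flag the stimulation events.
 data(2:end,9) = num2cell( strcmp( data(2:end,9), 'Event 1 > Stimulation' ) ) ;
 % - Other columns: convert the strings to doubles in place.
 data(2:end,[1:8 10:end]) = cellfun( @str2double, data(2:end,[1:8 10:end]), 'un', 0 ) ;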

5 Comments

jgg on 17 Dec 2015

I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.

Colin Edgar on 17 Dec 2015

I can't make fscanf ignore the first "" string. For example:

 frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
 A = fscanf(fid, frmt, [12, inf]);   % returns A = ''

Unless I do this:

 A = fscanf(fid, '%s', [12, inf]);   % returns A as 12 x 16833 char

What I want is A as a 12 x 16833 double.

Answer 1

Colin Edgar on 17 Dec 2015

Edited: Colin Edgar on 17 Dec 2015

Here is my solution; it takes only ~1 s per file (~2 MB, 12 x 18000). It is written for the example data I posted above, but with the initial "timestamp" removed. I believe it addresses the OP's issue as well, since the data was very similar.

 formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n';
 fid = fopen(flnm,'r');
 for k = 1:4
     fgetl(fid);  % reads past the 4 heading lines, I know it's a hack but...
 end
 mat = fscanf(fid, formatSpec, [12,inf]);
 mat = mat';  % transpose to correct layout
 fclose(fid);

Versus my old version, which took ~15 s (similar to the OP's approach):

 formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s';
 fid = fopen(flnm,'r');
 C = textscan(fid,formatSpec,'HeaderLines',4,'Delimiter',',');
 fclose(fid);
 % every field is read as a string and converted afterwards
 mat = cell2mat(cellfun(@str2double,C,'UniformOutput',false));
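
The OP's file differs from mine in two ways: it is ';'-delimited and column 9 holds strings, so an all-%f fscanf format would not apply directly. A minimal sketch of the same idea (parse numbers as numbers in a single pass instead of converting strings afterwards), using textscan with a mixed format. The sample file name, header names and values below are made up for illustration, and I have not timed this on a file of the OP's size.

```matlab
% Sketch only: write a tiny ';'-delimited sample so the example is self-contained.
% The file name, header names and values are hypothetical.
filename = fullfile(tempdir, 'eyetracker_sample.txt');
fid = fopen(filename, 'w');
fprintf(fid, 'c1;c2;c3;c4;c5;c6;c7;c8;event;c10;c11;c12\n');
fprintf(fid, '1;2;3;4;5;6;7;8;Event 1 > Stimulation;10;11;12\n');
fprintf(fid, '1.5;2;3;4;5;6;7;8;Other;10;11;12.5\n');
fclose(fid);

% One pass: columns 1-8 and 10-12 are parsed as doubles, column 9 as strings.
% 'HeaderLines', 1 skips the line with the variable names.
fmt = '%f%f%f%f%f%f%f%f%s%f%f%f';
fid = fopen(filename, 'r');
C = textscan(fid, fmt, 'Delimiter', ';', 'HeaderLines', 1);
fclose(fid);

% Column 9 becomes a logical flag; the numeric columns become one matrix.
isStim = strcmp(C{9}, 'Event 1 > Stimulation');   % N-by-1 logical
num    = [C{[1:8 10:12]}];                         % N-by-11 double
```

Keeping the flag and the numeric matrix as separate variables avoids building a large cell array, which is where the cellfun/str2double approach spends its effort.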
