count number of rows in csv outside of matlab

Question

Alexandra McClernon Ownbey on 26 Feb 2021

https://ch.mathworks.com/matlabcentral/answers/756529-count-number-of-rows-in-csv-outside-of-matlab

Answered: Walter Roberson on 26 Feb 2021

I have 10000+ csv files I would like to import into matlab. I only need the data from the first and last rows for inlet and exit conditions. Each csv file has a different number of data points, so I do not know the length of the file imported a priori. I am trying to automate the import process. I can automate importing all the data or specific lines, but I do not know how to import the last row. The only way I can think of is to determine the number of rows in the file without importing the data (importing all the data takes a few hours) and import that row specifically. Does anyone know how I can do this? I have tried messing with textscan, but I have not had any luck.

5 Comments

Walter Roberson on 26 Feb 2021

Is there an upper limit on the number of characters per line in your csv files? For example are the lines more than 1 kilobyte each?

Alexandra McClernon Ownbey on 26 Feb 2021

I am using Windows. I am not sure what the per-line limit is. There are 8 columns in all files. The first row holds the titles; the remaining rows are numbers with more than 8 figures in each cell.

Answer 1

Jeremy Hughes on 26 Feb 2021

Turns out this is not as easy a question as you might think, especially if your CSV contains double-quoted fields that include a newline character.

e.g. "This data\nhas a new line","but this doesn't"

The number of lines might not really be that important, depending on what you're trying to do. But I don't know what that is.

If you're trying to avoid having all the data in memory at one time, I suggest reading up on tabularTextDatastore, which helps automate working with large sets of data. There's a rich set of features you can use with datastores to make working with larger datasets easier: tall, transform, combine. None of these needs to know the size of each table in advance. But again, without knowing what you plan to do with those files, it's hard to say whether you can use it.

---- but to answer the original question ----

If you don't have any double-quoted data, it gets a lot easier to count the lines of a CSV file. This code will scan the lines without importing any of the data. (It reads the file internally but doesn't generate any output; that's what the %*... formats are about.)

fid = fopen(filename);
numLines = 0;
while ~feof(fid)
    [~,c]=textscan(fid,'%*[^\r\n]%*[\r\n]',1,'Delimiter','','Whitespace','','EndOfLine','');
    if c > 0 % if c==0, then there wasn't a line there. this may happen at the end of the file.
        numLines = numLines + 1;
    end
end
fclose(fid);
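
As an alternative sketch (not part of the answer above), the line count can also be obtained by counting newline bytes in fixed-size chunks with fread. It assumes LF or CRLF line endings and no newlines inside quoted fields; the demo file and the 1 MiB chunk size are illustrative choices, not something from the original post.

```matlab
% Build a small demo file (hypothetical data) so the sketch runs on its own.
demoFile = [tempname '.csv'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'x,y\n1,2\n3,4\n5,6');   % 4 lines, last one has no trailing newline
fclose(fid);

% Count LF bytes chunk by chunk; nothing is parsed or converted to numbers.
fid = fopen(demoFile, 'r');
numLines = 0;
lastByte = uint8(10);                 % treat an empty file as "ends with newline"
while true
    chunk = fread(fid, 2^20, '*uint8');   % read up to 1 MiB of raw bytes
    if isempty(chunk), break; end
    numLines = numLines + sum(chunk == 10);
    lastByte = chunk(end);
end
fclose(fid);

% A final line without a trailing newline still counts as a line.
if lastByte ~= 10
    numLines = numLines + 1;
end
delete(demoFile);
```

The count includes the header line, so the number of data rows is numLines - 1 for files laid out like the ones in the question.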

If you want the data for the lines, this newer function should help:

https://www.mathworks.com/help/matlab/ref/readlines.html
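
A minimal sketch of how readlines could be used to pull the first and last data rows, assuming a single header line, comma delimiters and purely numeric fields. The demo file is made up for illustration. Note that readlines still reads the whole file as text; it only avoids the numeric conversion of the rows you do not need.

```matlab
% Build a small demo file (hypothetical data) with a trailing empty line.
demoFile = [tempname '.csv'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'x,y\n1,2\n3,4\n5,6\n\n');
fclose(fid);

% Read every line as a string; 'EmptyLineRule','skip' drops blank lines.
lines = readlines(demoFile, 'EmptyLineRule', 'skip');

% lines(1) is the header, so the data rows are lines(2:end).
firstRow = str2double(split(lines(2), ','))';
lastRow  = str2double(split(lines(end), ','))';
numRows  = numel(lines) - 1;          % data rows, header excluded
delete(demoFile);
```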

---- if you have double-quoted strings with new lines (or don't know if you do) ----

If you want to count the actual rows and not the newlines inside double-quoted fields, then you really need to parse each CSV line and find the fields with double quotes. At that point you might as well be importing the data, but if you really just want to count the parsed lines, you can do something with importOptions.

This code will import only the first variable in the table as a string, but to do that, it will still parse the file and consider the quoted data that appears later in the line. It will be slower than the method above, but robust if you have quoted data that contains newlines. It will be faster than bringing in the whole table.

opts = delimitedTextImportOptions('Delimiter',',','ExtraColumnsRule','ignore','VariableTypes',"string");
T = readtable(filename,opts);
numLines = height(T);

Answer 2

KSSV on 26 Feb 2021

csvFiles = dir('*.csv') ; 
N = length(csvFiles) ; 
f = cell(N,1) ; % first row 
l = cell(N,1) ; % last row
for i = 1:N
    data = csvread(csvFiles(i).name, 1, 0) ; % skip the header row (offsets are zero-based)
    f{i} = data(1,:) ;
    l{i} = data(end,:) ; 
end

1 Comment

Alexandra McClernon Ownbey on 26 Feb 2021

Edited: Alexandra McClernon Ownbey on 26 Feb 2021

I am trying to read in the data without reading in the entire table. I already have a script that can read in all the csv files in separate folders. csvread takes too long. I know I can import the data into cells or a 3D matrix and then select the points I want, but this is a very round-about way of doing it.

Answer 3

Walter Roberson on 26 Feb 2021

csvdir = 'appropriate_directory_name';  %use '.' for current directory
csvFiles = dir(fullfile(csvdir, '*.csv'));
filenames = fullfile({csvFiles.folder}, {csvFiles.name});
N = length(csvFiles) ; 
f = cell(N,1) ; % first row 
l = cell(N,1) ; % last row
for K = 1:N
    thisfile = filenames{K};
    [fid, msg] = fopen(thisfile, 'r');
    if fid < 0
        fprintf('failed to open file "%s" because "%s", ignoring it\n', thisfile, msg);
        continue
    end
    fgetl(fid);   %skip header
    f{K} = cell2mat(textscan(fgetl(fid), '', 'Delimiter', ','));  %first data line
    %data is 8 columns. We can be sure that columns are < 25 characters each
    if fseek(fid, -256, 'eof') ~= 0   %move to 256 bytes before end of file
        frewind(fid);   %file is shorter than 256 bytes, so start from the top
    end
    fgetl(fid);   %we positioned to middle of line (or at the header), discard to end of line
    %look for the last non-empty line
    old_line = '';
    while ~feof(fid)
        new_line = fgetl(fid);
        if ~ischar(new_line); break; end  %EOF
        if ~isempty(strtrim(new_line))
            old_line = new_line;
        end
    end
    fclose(fid);
    l{K} = cell2mat(textscan(old_line, '', 'Delimiter', ','));  %last data line
end

What this code is doing is opening each file, skipping a header line, reading the next line and converting it to numeric. Then it seeks to before the end of file and reads lines, discarding empty lines, including empty lines that occur at end of file, keeping the last non-empty line it finds, and converting the last non-empty line to numeric.

The code seeks to 256 characters before the end of file, skipping the rest of the file -- literally not reading it as much as is possible with the operating system. Why 256? Because it is a "nice round number" to computer scientists ;-) If the data was output as double precision, then it could take as many as 25 characters per entry such as '-6.32359246225409463e+110' plus the comma delimiter, maybe a space as well, so possibly 27*8+2 characters = 218 characters for the line. Using 256 gives a bit of slack in case we miscounted or there is something odd in the file.
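
The size estimate above can be written out as a short calculation, followed by a seek-and-read on a generated file. This is only a sketch under the stated worst-case assumptions (8 columns, at most 25 characters per field); the demo file and its contents are invented for illustration.

```matlab
% Worst-case line length, following the estimate in the text.
nCols       = 8;                        % columns per row
maxFieldLen = 25;                       % e.g. '-6.32359246225409463e+110'
perField    = maxFieldLen + 2;          % plus comma and a possible space
lineBytes   = perField*nCols + 2;       % plus line terminator: 218
tailBytes   = 2^nextpow2(lineBytes);    % round up to a power of two: 256

% Demo file (hypothetical data): header plus 100 rows holding 1..800.
demoFile = [tempname '.csv'];
data = reshape(1:800, 8, 100);          % each column becomes one CSV row
fid = fopen(demoFile, 'w');
fprintf(fid, 'a,b,c,d,e,f,g,h\n');
fprintf(fid, [repmat('%.17e,', 1, 7) '%.17e\n'], data);
fclose(fid);

% Jump to tailBytes before the end and keep the last non-empty line.
fid = fopen(demoFile, 'r');
if fseek(fid, -tailBytes, 'eof') ~= 0
    frewind(fid);                       % file shorter than the window
end
fgetl(fid);                             % discard the partial line (or header)
lastLine = '';
while true
    t = fgetl(fid);
    if ~ischar(t), break; end           % end of file
    if ~isempty(strtrim(t)), lastLine = t; end
end
fclose(fid);
lastRow = str2double(strsplit(lastLine, ','));
delete(demoFile);
```

If a line could ever be longer than tailBytes, the seek would land inside the last line and the partial-line discard would throw it away, so the window should be sized generously.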
