Using textscan to skip specific rows

7 views (last 30 days)
Right now I'm interested in using textscan to read specific rows of a .csv file which looks something like this:
type value latency
92 trigger 19593
type value latency
20 trigger 20000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
I want to have textscan read the rows which have the values "92" and "19593", but to ignore the rows that are blank or the rows that have the words "type" "value" "latency". In addition, the documents I am scanning are of varying lengths, and I want textscan to automatically figure out which rows have the values of interest "92, 20, 14" and "19593" (latency values). How would I have textscan automatically scan the numerical values and ignore all other rows? In addition, I am not interested in scanning the column "value" at all. I only need to scan the two columns "type" and "latency".
Thank you in advance.

Accepted Answer

Stephen23
Stephen23 on 8 Jun 2017
Edited: Stephen23 on 8 Jun 2017
Second attempt: this time we will first convert the awful file format into a standard CSV format with one header line, no empty rows, no quoted numbers, and no repeated headers. I used regexprep and simply defined appropriate regular expressions:
fnm = 'events file.csv';
str = fileread(fnm);
rgx = {'\s*\n\s*"type"[a-z",]+','"(\d+)\s*"'};
rep = {'','$1'};
str = regexprep(str,rgx,rep);
You could also save str if you want an improved CSV file. Now we can easily extract the data from this improved CSV string:
fmt = '%f%*q%f%f%f';
opt = {'HeaderLines',1, 'CollectOutput',true, 'Delimiter',','};
C = textscan(str,fmt,opt{:});
M = C{1}
giving:
M =
92 14287 0 1
20 16459 0 2
14 19147 0 3
14 20646 0 4
14 22047 0 5
14 24431 0 6
14 26844 0 7
14 28819 0 8
14 30071 0 9
14 31843 0 10
14 34788 0 11
14 36883 0 12
14 38439 0 13
14 39571 0 14
14 40691 0 15
14 41368 0 16
14 42443 0 17
20 42708 0 18
14 44955 0 19
14 46092 0 20
14 47771 0 21
14 48632 0 22
14 52780 0 23
14 54312 0 24
14 56616 0 25
14 58724 0 26
14 59831 0 27
14 61911 0 28
14 62859 0 29
14 64884 0 30
14 67534 0 31
14 68367 0 32
14 70052 0 33
20 70311 0 34
14 72450 0 35
14 73476 0 36
14 75143 0 37
14 76615 0 38
20 76622 0 39
Tested on this file:

More Answers (1)

Stephen23
Stephen23 on 6 Jun 2017
Edited: Stephen23 on 7 Jun 2017
One approach is to play with the EndOfLine character (a little hacky, I know):
opt = {'MultipleDelimsAsOne',true, 'EndOfLine','p',...
'CollectOutput',true, 'HeaderLines',1};
fmt = '%*s%*s%*s%f%*s%f';
fid = fopen('temp2.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}(1:2:end,:);
giving:
>> M
M =
92 19593
20 20000
14 21000
14 21000
14 21000
14 21000
14 21000
14 21000
14 21000
As long as the file is not huge then this will be a reasonably effective method, although it does create a larger output matrix than required.
Note that your comment that the file "looks something like this" and the misleading "a .csv file" (whereas there is not a single comma anywhere in your example data) means that you are not providing an accurate enough specification. Reading files requires a very precise specification: every newline, every separator, non-printable characters, trailing spaces, trailing delimiters, repeated space characters, etc, etc, makes a huge difference to how a file is read. Because you did not provide us with a sample file I created my own one to test my code on, available here:
  9 Comments
Darren Kenney
Darren Kenney on 7 Jun 2017
Not large enough to crash the program (40 lines for each file), and 2012. I fixed the problem by changing the 'delimiter' from '\n\r\t ,"' to ','
Darren Kenney
Darren Kenney on 7 Jun 2017
Although, now the leftmost column displays all numbers as NaN which is what I assume you were trying to work around with '\n\r\t ,"'?

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!