How can I extract particular columns of a huge text file?
I have 8 hours of data and I need 3 particular columns, at particular time intervals, from this huge text file. So far I have tried: 1. fgetl, which reads line by line, inside a for loop. 2. textscan, converting the result to a matrix (eventually my system crashes due to the size of the data).
Both of them take a very long time to run. Is there any better way to extract data from huge text files?
Thanks, Mitrra
4 Comments
dpb
on 16 Aug 2013
Well, one thing about sequential files is that they are, well, "sequential".
You could possibly do something clever if they are fixed-length records by reading as a stream file w/ fread and positioning the file pointer based on the (assumed known) record length and the spacing of the time differential.
Or, you could try using textscan in a loop w/ a given number of headerlines each pass to skip and then only read the line desired. Not sure it'll help much in speed but it should solve the memory problem.
Just how big is the file?
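A minimal sketch of the fixed-length-record idea above, assuming every record is exactly 27 bytes (three 8-character fields, two spaces, one newline) and that every 3rd record is wanted. The sample file, `recLen`, and `step` are made up for illustration; a real file would need its own values.

```matlab
% Build a small fixed-width sample file (stand-in for the real data).
fname = [tempname '.txt'];
t = 0:9;
fid = fopen(fname, 'w');                  % binary mode, so '\n' is one byte
fprintf(fid, '%8.3f %8.3f %8.3f\n', [t; 10*t; 100*t]);
fclose(fid);

recLen = 27;                              % assumed: 3 x 8 chars + 2 spaces + newline
step   = 3;                               % assumed: keep every 3rd record
d      = dir(fname);
nRec   = d.bytes / recLen;                % number of records in the file

idx = 1:step:nRec;                        % records to read
out = zeros(numel(idx), 3);
fid = fopen(fname, 'r');
for k = 1:numel(idx)
    % Jump straight to the start of the wanted record.
    fseek(fid, (idx(k) - 1) * recLen, 'bof');
    rec = fread(fid, recLen, '*char')';   % one record as a row of chars
    out(k, :) = sscanf(rec, '%f')';       % convert its three fields
end
fclose(fid);
delete(fname);
```

Only the records that are actually needed get read and converted, so memory use stays small. This only works if the record length really is constant.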
Mp897
on 16 Aug 2013
Cedric
on 17 Aug 2013
Could you copy/paste 10 to 20 lines of this file here on the forum? Depending on the format, there are ways to extract the relevant lines/columns before scanning them.
Mp897
on 18 Aug 2013
Answers (2)
Ken Atwell
on 16 Aug 2013
I would use textscan, using "*" to skip the unneeded columns. Say you need columns 1, 3, and 5:
textscan(fpi, '%f %*f %f %*f %f %*[^\n]');
This will only convert the necessary columns to binary, which should save a lot of time.
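To also keep memory in check, the same format can be applied a block of lines at a time. The following is only a sketch: the 7-column sample file, `chunkSize`, and the assumption that column 1 holds the time (with `tMin`/`tMax` as the wanted interval) are illustrative, not taken from the actual data.

```matlab
% Build a small 7-column sample file (stand-in for the real data).
fname = [tempname '.txt'];
t = (1:10)';
M = [t, -t, 2*t, -t, 3*t, -t, -t];
fid = fopen(fname, 'w');
fprintf(fid, '%g %g %g %g %g %g %g\n', M');
fclose(fid);

fmt       = '%f %*f %f %*f %f %*[^\n]';   % keep columns 1, 3, 5; skip the rest
chunkSize = 4;                            % assumed: lines per pass
tMin = 3;  tMax = 6;                      % assumed: time interval of interest
kept = zeros(0, 3);

fid = fopen(fname, 'r');
while ~feof(fid)
    % Read at most chunkSize lines; textscan resumes where it left off.
    C = textscan(fid, fmt, chunkSize, 'CollectOutput', true);
    block = C{1};
    if isempty(block), break; end         % nothing more to read
    % Keep only rows whose time (column 1) falls in the interval.
    inRange = block(:, 1) >= tMin & block(:, 1) <= tMax;
    kept = [kept; block(inRange, :)];     %#ok<AGROW>
end
fclose(fid);
delete(fname);
```

Each pass holds only `chunkSize` rows in memory, and only the rows inside the interval are accumulated.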
2 Comments
Mp897
on 18 Aug 2013
Ken Atwell
on 19 Aug 2013
Good to hear. Don't forget to accept the answer. :)
per isakson
on 16 Aug 2013
Edited: per isakson
on 16 Aug 2013
Reading specific chunks of a huge file is a job for memmapfile. However, char is not in its list of data types; the default type is uint8. Take a chance and try
mmf = memmapfile( 'h:\m\Code2TMW\Path_potential_name_conflict.txt' );
str = char( mmf.Data(1:64) )'
it returns
str =
Warning: Function C:\Program Files\MATLAB\R2013a\toolbox\matlab\
which is indeed the text of the first line. Of course, the encoding of the text file matters.
2 Comments
Mp897
on 16 Aug 2013
per isakson
on 18 Aug 2013
Edited: per isakson
on 18 Aug 2013
"So it gives huge number of rows", which you have to parse with textscan or otherwise. The point is that you can read part of the file with
str = char( mmf.Data( huge_number+1 : huge_number+small_number_of_bytes ) )'
which gives a small number of rows.
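A rough sketch of how the two steps could fit together, assuming a single-byte (ASCII) file with '\n' line endings. The sample file and the byte range `first:last` are made up for illustration. Since an arbitrary byte range usually starts and ends in the middle of a line, the chunk is trimmed to whole lines before parsing.

```matlab
% Build a small 3-column sample file (stand-in for the real data).
fname = [tempname '.txt'];
t = 1:20;
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d\n', [t; 10*t; 100*t]);
fclose(fid);

mmf   = memmapfile(fname);                % default format is uint8
first = 20;  last = 80;                   % assumed byte range of interest
raw   = mmf.Data(first:last)';            % bytes of the chunk, as a row

% Trim to complete lines: drop the partial line at each end.
nl  = find(raw == 10);                    % positions of '\n' in the chunk
str = char(raw(nl(1) + 1 : nl(end)));

% Parse the text chunk, keeping columns 1 and 3.
C    = textscan(str, '%f %*f %f', 'CollectOutput', true);
vals = C{1};

clear mmf                                 % release the mapping before deleting
delete(fname);
```

Only the requested bytes are converted to text, so the cost depends on the size of the chunk rather than the size of the file. Finding the byte offset that corresponds to a given time is a separate problem that this sketch does not address.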