How can I extract particular columns of a huge text file?

I have 8 hours of data in a huge text file, and I need 3 particular columns at particular time intervals. So far I have tried: 1. fgetl, which reads line by line, inside a for loop. 2. textscan, converting the result to a matrix (eventually my system crashes due to the size of the data).
Both of them take a very long time to run. Is there a better way to extract data from huge text files?
Thanks, Mitrra

4 Comments

Well, one thing about sequential files is that they are, well, "sequential".
You could possibly do something clever if they are fixed-length records by reading as a stream file w/ fread and positioning the file pointer based on the (assumed known) record length and the spacing of the time differential.
Or, you could try using textscan in a loop w/ a given number of headerlines each pass to skip and then only read the line desired. Not sure it'll help much in speed but it should solve the memory problem.
Just how big is the file?
Hi. Thanks for the reply. I had tried fread; it doesn't help with speed. I did try textscan, like B = textscan(fpi,'%f %f %f %f %f %f %f %f'). It takes forever to get B into the workspace. But I didn't try it in a loop; I will do that now. My file is 1.9 GB. I appreciate your help. Thanks
Could you copy/paste 10 to 20 lines of this file here on the forum? Depending on the format, there are ways to extract the relevant lines/columns before scanning them.
Hi, it's a text file of about 2 GB. My computer crashes when I try to open it as a text file. It has 16 columns, each holding double values. They contain only numbers, no letters. I was able to extract the columns I needed using textscan (it took 1.5 mins).
Thanks
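
For anyone who wants to try the "textscan in a loop" idea from the comments above, here is a minimal sketch. It assumes whitespace-separated numeric columns (16, as described above); the file name, chunk size and wanted columns are made up for illustration, and a tiny demo file is written first so the snippet runs on its own.

```matlab
% Demo file standing in for the real 2 GB file (hypothetical name and size).
fname = [tempname '.txt'];
M = reshape(1:16*25, 16, 25)';               % 25 rows x 16 columns
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%g ', 1, 15) '%g\n'], M');
fclose(fid);

chunkRows = 10;                              % rows parsed per pass (assumed value)
wanted    = [1 3 5];                         % hypothetical columns of interest
fmt       = repmat('%f ', 1, 16);            % one %f per column

fid = fopen(fname, 'r');
parts = {};
while ~feof(fid)
    % Apply the format at most chunkRows times, so only one chunk is in memory.
    C = textscan(fid, fmt, chunkRows, 'CollectOutput', true);
    block = C{1};
    if isempty(block), break; end
    parts{end+1} = block(:, wanted);         % keep only the wanted columns
end
fclose(fid);
data = vertcat(parts{:});
delete(fname);
```

Only one chunk of all 16 columns is held at a time, which is what should keep memory in check; whether it is faster than a single textscan call will depend on the chunk size.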

Answers (2)

I would use textscan, using "*" to skip the unneeded columns. Say you need columns 1, 3, 5:
textscan(fpi, '%f %*f %f %*f %f %*[^\n]');
This will only convert the necessary columns to binary, which should save a lot of time.
You can read more about using "*" in the documentation for textscan.
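
A complete, runnable version of that call might look like the sketch below. The demo file, its size and the row interval `step` are assumptions for illustration only; the format string is the one from the answer.

```matlab
% Demo file standing in for the real data (hypothetical name and size).
fname = [tempname '.txt'];
M = reshape(1:8*12, 8, 12)';                 % 12 rows x 8 columns
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%g ', 1, 7) '%g\n'], M');
fclose(fid);

% Keep columns 1, 3 and 5: %*f skips a field, %*[^\n] skips the rest of the line.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %*f %f %*f %f %*[^\n]');
fclose(fid);
data = [C{:}];                               % 12 x 3 numeric matrix

% Optional: keep every step-th row (step is a hypothetical sampling interval).
step = 4;
sampled = data(1:step:end, :);
delete(fname);
```

Skipped fields produce no output cell, so `C` has three cells here rather than eight.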

2 Comments

Using * helped me extract the columns I need in 2 mins. Thanks for your help.
Good to hear. Don't forget to accept the answer. :)

Reading specific chunks of a huge file is a job for memmapfile. However, char is not in its list of data types; the default type is uint8. Take a chance and try
mmf = memmapfile( 'h:\m\Code2TMW\Path_potential_name_conflict.txt' );
str = char( mmf.Data(1:64) )'
it returns
str =
Warning: Function C:\Program Files\MATLAB\R2013a\toolbox\matlab\
which is indeed the text of the first line. Surely, the encoding of the text file matters.

2 Comments

Hi, thanks for the reply. This works well, but I have a small doubt. What exactly does str = char(mmf.Data(1:64)) do? In my case, it seems to extract the first 64 characters and put each one in its own row, so it gives a huge number of rows. Thanks
"So it gives huge number of rows", which you have to parse with textscan or otherwise. The point is that you can read part of the file with
str = char( mmf.Data( huge_number+1 : huge_number+small_number_of_bytes ) )'
which gives a small number of rows.
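
Putting the two pieces together, a rough sketch of "map a byte window, then parse it with textscan" could look like this. The demo file, the window position and the window size are made-up values, and the text is assumed to be single-byte (ASCII) with `\n` line endings.

```matlab
% Demo file with fixed-width lines (28 bytes each); hypothetical stand-in data.
fname = [tempname '.txt'];
M = reshape(1:80, 4, 20)';                   % 20 rows x 4 columns
fid = fopen(fname, 'w');
fprintf(fid, '%6.2f %6.2f %6.2f %6.2f\n', M');
fclose(fid);

mmf = memmapfile(fname);                     % default format: uint8 column vector
startByte = 101;                             % hypothetical window start
nBytes    = 200;                             % hypothetical window length
stopByte  = min(startByte + nBytes - 1, numel(mmf.Data));

% Data is a column, so transpose to get a single row of characters.
str = char(mmf.Data(startByte:stopByte))';

% Drop the (possibly partial) first and last lines of the window.
nl  = find(str == char(10));
str = str(nl(1)+1 : nl(end));

C = textscan(str, '%f %f %f %f');            % textscan also accepts text input
chunk = [C{:}];

clear mmf                                    % release the mapping before deleting
delete(fname);
```

Without the transpose, `char(mmf.Data(...))` stays a column vector, which is why it shows up as one character per row. Note that this sketch always discards the first line of the window, even if the window happens to start exactly on a line boundary.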

Asked: on 16 Aug 2013
