reading in text files

I am trying to read in 250 *.txt files. Each file resembles the attached picture. I have tried the following:
for k = 1:250
    textFilename = sprintf('C58501_line_ssF_%04d.txt',k);
    M = dlmread(textFilename,' ',1,0);
end
This reads the files in, but not in a usable format. How do I go about loading these files as 250 separate MATLAB variables, one per *.txt file, without the header?
Thank you in advance.

14 Comments

What does ‘not in a usable format’ mean?

I need 250 separate data sets that are 7909x10 doubles after importing. With the loop above I get one 7909x34 double. I am just looking for how to get these all in, and then create another loop to reduce those data sets to 250 individual data sets containing columns 1, 2, 3, and 6 of each of the 250 files I am importing.

You're only getting one file because you're overwriting M each time you read the next file. You need to index M. I would suggest using cell arrays.

M(k) = {dlmread(textFilename,' ',1,0)};
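A complete version of the loop with that change might look like this (a sketch, assuming the same filename pattern as before):

```matlab
M = cell(1,250);                                  % preallocate the cell array
for k = 1:250
    textFilename = sprintf('C58501_line_ssF_%04d.txt',k);
    M{k} = dlmread(textFilename,' ',1,0);         % skip the single header row
end
```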

That combined them so I can see all 250 files, thank you. However, it is still creating a data set much larger than the .txt file: the .txt file is originally 7909x10, while the code creates a 1x250 cell (which works), but inside the cell the 250 data sets are 7909x34 doubles. Is there any way to keep that 1x250 cell but extract only the columns I need from each individual 7909x34 double into another 1x250 cell?

To the best of my knowledge dlmread(), or any importer command besides xlsread(), cannot import only specific columns. You're going to need to import the entire thing and then redefine the matrix to include only the desired columns. This can be done in a new matrix, or by just redefining the old one:

M{k} = M{k}(:,[1,2,3,6]); % Reshape original, all data outside of columns
                          % 1,2,3, and 6 will be lost
N{k} = M{k}(:,[1,2,3,6]); % Make a new matrix, keep all of the original
                          % data in M for later use

Awesome! Thank you. Once I have done that I need to correct the two middle columns by a set value for each point, essentially normalizing them. I have tried something along the lines of

M(k) = {dlmread(textFilename,' ',1,0)};
N{k} = M{k}(:,[1,2,3,6]);
T{k} = M{k}(:,[1,2+10000,3+10000,6]);

but that just moves those cells 10000 values higher, correct?

Anything contained within the parentheses following an array is indexing. Using [1,2+10000,3+10000,6] still specifies locations, so it will look for columns 1, 10002, 10003, and 6. If you want to add values, then you need to do so outside of the indexing operation.

T{k}(:,[1,4]) = M{k}(:,[1,6]);
T{k}(:,[2,3]) = M{k}(:,[2,3]) + 10000;
"To the best of my knowledge dlmread(), or any importer command besides xlsread(), cannot import only specific columns. "
xlsread() can read ranges, but the columns must be consecutive.
readtable() can read ranges with consecutive columns.
If I recall correctly, when reading a text file you can adjust the options returned by detectImportOptions() so that only certain columns are read. I am not sure of this at the moment, though.
textscan() can be used with formats that deliberately skip data.
Note: if you specify a range for an .xls/.xlsx file and you are not using MS Windows with Excel installed, then readtable() might end up reading all of the data and then throwing away the parts that are not wanted.
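As an illustration of the textscan() route, assuming ten space-separated numeric columns and one header line (as in the files described above), a format string with %*f fields reads and discards the unwanted columns:

```matlab
fid = fopen(textFilename,'r');
C = textscan(fid, '%f %f %f %*f %*f %f %*f %*f %*f %*f', 'HeaderLines', 1);
fclose(fid);
data = [C{:}];    % 7909x4 double holding columns 1, 2, 3, and 6 of the file
```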
Thank you. Stepping back a few steps quickly, when I use
for k = 1:250
    fileID = sprintf('C58501_line_ssF_%04d.txt',k);
    M(k) = {dlmread(fileID,' ',1,0)};
end
it brings in the 250 text files I am looking for. However, each data set is 7909x10 in the file but is being brought in as a 7909x34 double. What do I need to change so that the data comes in directly, exactly as it appears in the text files?
Another question on this... of the 7909 rows, there are approximately 142 rows where the ID (the value in column 6) is 1, another 142 with ID equal to 2, and so on until the data set ends at 55. Can I filter this data based on a set of values in column 6? So every row that has a 4, 9, 12, 15, or 17 in column 6 (ID), for example, would be pulled out and made into a new data set?
Thank you.
The problem is that the *.txt file has various values in each row of certain columns, so MATLAB is separating them and grouping them, which I do not want.

Yes, it is possible to select rows based on specific values. Going back to our previous example:

T{k} = T{k}(T{k}(:,4)==4 | T{k}(:,4)==9 | T{k}(:,4)==12 | T{k}(:,4)==15 | T{k}(:,4)==17,:);
% Note that this removes all other values of T{k}. If you want to keep them
% you need to save them as a different variable. Also note that all of the
% right side of the equation is indexing, there is no value adjustment.

Basically, you can include logical conditions in your indexing. (Note that the element-wise operator | is needed here; the short-circuit operator || only works on scalars and will error on a column of values.)
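Equivalently, ismember() expresses the same row selection without the long chain of comparisons, which is easier to extend if the set of IDs changes (a sketch using the same T{k} as above):

```matlab
wanted = [4 9 12 15 17];                          % IDs to keep
T{k} = T{k}(ismember(T{k}(:,4), wanted), :);      % keep rows whose column 4 is in wanted
```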

Please attach a sample file.
Here is a sample file. It contains the same columns as the other 249; the sixth column will change slightly, as it is the time of the particle flow path in days. I am concerned with columns 1, 2, 3, and 6, where column 1 is the ID, column 2 is the X position in meters, and column 3 is the Y position in meters. I need to import all the files as they are, reduce them to those 4 columns, correct them into a UTM layout by adding 10000 to every value in columns 2 and 3, and pull out the rows where the ID is in a set of numbers like 2, 4, 8, 14, and 42, for example.


Answers (1)

You should try tabularTextDatastore, assuming everything has the same format.
ds = tabularTextDatastore(pathToFiles)
ds.SelectedVariableNames = ds.SelectedVariableNames([1,2,3,6]);
while hasdata(ds)
    T = read(ds)
    % do stuff
end
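If all 250 files fit in memory at once, readall() is an alternative to the while loop that stacks everything into a single table (a sketch, assuming the same ds as above):

```matlab
ds = tabularTextDatastore(pathToFiles);
ds.SelectedVariableNames = ds.SelectedVariableNames([1,2,3,6]);
T = readall(ds);    % one tall table with the rows of all 250 files concatenated
```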

8 Comments

is

(pathToFiles)

the file names or do I enter this as "pathToFiles"?

pathToFiles would be a variable in which you have stored the name of the directory the files are in.
I'm not quite sure what you mean by that. Do I put the files into a new folder and define a path in MATLAB for this to be accomplished? If so, how do I go about that?
All files have a path and a file name; you just need to know what they are. By specifying the full pathToFiles you are entering both the path and the name.
For example you might have a file with the following designation:
C:\Users\Bob\Documents\Matlab\Testcode.m
In this case the file name is "Testcode.m", which is what you have been working with so far. The file path is the rest, showing where the file can be found. You will want to enter the full path and file name to specify the file and its location for the MATLAB script.
How do I do that in a process that reads all 250 files with names that are XYZ_0001 to XYZ_0250?
And then normalize columns 2 and 3 of the new output by the addition factors necessary to look at the data?
dinfo = 'C58501_line_ssF_%04d.txt';
pathToFiles = {dinfo.name};
ds = tabularTextDatastore(pathToFiles)
ds.SelectedVariableNames = ds.SelectedVariableNames([1,2,3,6]);
while hasdata(ds)
  T = read(ds)
  % do stuff
end

This returns the error 'Struct contents reference from a non-struct array object', coming from the line

pathToFiles = {dinfo.name};
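That error suggests dinfo is a character vector rather than the struct array that dir() returns, so dinfo.name fails. A sketch of the intended pattern, using dir() to match the 250 file names:

```matlab
dinfo = dir('C58501_line_ssF_*.txt');                    % struct array, one element per file
pathToFiles = fullfile({dinfo.folder}, {dinfo.name});    % cell array of full file names
ds = tabularTextDatastore(pathToFiles);
ds.SelectedVariableNames = ds.SelectedVariableNames([1,2,3,6]);
while hasdata(ds)
    T = read(ds);
    % do stuff
end
```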


Asked:

on 26 Mar 2018

Commented:

on 27 Mar 2018
