How do I delimit (and replace) multiple CSV files in a loop?
9 views (last 30 days)
Show older comments
Alex Chen
on 4 Oct 2018
Commented: Walter Roberson
on 4 Oct 2018
I have several CSV files that contain data separated by commas (x, y, z) in column A. The headers are strings with parentheses ("XXXXXXXXXX (XX)", "YYYYYYYYYY (YYY)", "ZZZZZZZZZ"), while the data (x, y, z) are numerical values. Usually, I would highlight the A column and delimit the columns in Excel in order to separate my data into three separate columns in order to analyze. However, I have a lot of these files now, and I was wondering if I could write a for loop to read these files, delimit, replace the text in the original file in the correct number of columns, and save the new file again.
I tried csvread, but it's having trouble reading the commas. A sample CSV file is in the attachments. Thanks for any help in advance.
4 Comments
Accepted Answer
Walter Roberson
on 4 Oct 2018
dinfo = dir('*.csv');
filenames = {dinfo.name};
for K = 1 : length(filenames)
thisfile = filenames{K};
[~, basename, ~] = fileparts(thisfile);
newfile = [basename '.txt'];
S = fileread(thisfile);
S(S=='"') = ''; %delete all double-quotes
fid = fopen(newfile, 'w');
fwrite(fid, S);
fclose(fid);
end
This writes to a different file name to avoid processing the same file more than once.
If you really insisted you could write to the same file again. You do run the risk of repeatedly processing the same file if you run the same code again, but in this special case it should be harmless since the code does not actually assume that there are any double-quotes in the file. If you needed the column header to have "" around each of the parts then you would need more work for the case of processing an already-processed file, but it would not be too bad.
2 Comments
Walter Roberson
on 4 Oct 2018
I ran my code on the sample csv. The .txt it produced was identical to your attached desired csv, with the exception that your desired csv has transformed the two 0.0 into 0 .
I said explicitly that the code produces a new file so that the unprocessed files can be distinguished from the processed files. Any time you are experimenting with transforming files, it is always better to output to a different file so that you do not run the risk of a failure leaving your only copy of the file corrupted.
With more complicated transformations it is sometimes important that you can tell an unprocessed file from a processed file because the transformation might destroy information. For example suppose part of the transformation had been to remove an extra column, then if you write the output back to the same file and then ran the code again, then you would end up having deleted two columns.
In this particular case, re-running the transformation is harmless, provided that no corruption occurs during the writing.
You can change the line
fid = fopen(newfile, 'w');
to
fid = fopen(thisfile, 'w');
but if you do then you have those risks.
There are additional steps you can take, such leaving the fid = fopen(newfile, 'w'); as-is but adding after the fclose() code such as
bakfile = [basename '.bak'];
movefile(thisfile, bakfile);
movefile(newfile, thisfile);
then you would get .csv outputs but the original file would be preserved with .bak in case you needed to restore it.
More Answers (1)
KSSV
on 4 Oct 2018
files = dir('*.csv') ;
N = length(files) ;
for i = 1:N
[data,txt,raw] = xlsread(files(i).name) ;
% data are your numbers..do wjat you want
end
See Also
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!