create csv files by extracting data from a starting csv file

Hi everyone, I am writing to you as not being familiar with matlab I do not know the commands and I would need your help to create a script.
I have a large csv file that cannot be opened with excel, within this file there are many lines that are composed of both numbers and letters, within these lines there are codes formed by letters and numbers (eg UPRE 14234) that identify the element that interests me and to which they belong, in total there are 21 different UPRE codes.
What I would like is to divide the initial csv file into 21 csv files each composed with the lines concerning only one of those UPRE codes.
How can I do? Can anyone tell me what functions to use and how to write the script?

 Accepted Answer

I suggest you work with lower level utilities:
  • grep or egrep if you are using Mac or Linux
  • findstr or select-string if you are using MS Windows
These will be much more efficient than the equivalent MATLAB code.
Another option is to make use of the fact that Perl is shipped with MATLAB.

18 Comments

Example:
infile = 'exp_misure_prelievo.txt';
outdir = 'UPRE_files';
UPREs = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};
if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : length(UPREs)
this_UPRE = UPREs{K};
outfile = fullfile(outdir, [this_UPRE '.txt']);
cmd = sprintf('FINDSTR "|\"%s\"|" < "%s" > "%s"', this_UPRE, infile, outfile);
[status, result] = system(cmd);
if status ~= 0
fprintf('Problem processing %s, output was:\n', this_UPRE);
fprintf('%s\n', result);
end
end
fprintf('Done\n');
it seems not to work, this is what it appears to me
I was reading off the file name exp_misure_prelievo from the Excel image you posted, but I did not know the file extension.
When I look at the list of files in your current folder, I see the MATLAB source code file Untitled.m and I see the new directory just created, but the only other thing I see is Name which I cannot tell anything about.
You should change the line
infile = 'exp_misure_prelievo.txt';
to reflect your actual input file name -- fully qualified if it is not in the current directory.
For example if that Name file is your input data, then
infile = 'Name';
ok now it created me two files but they are empty
We will need an actual file to test with.
Note: I have not booted into Windows for a few months, so I am not sure when I would be able to test findstr
The data you posted an image of should be enough for testing.
I don't know why the files it saves me come out empty
sorry I saw that you had defined the input file as a vector, instead of defining it so I imported the data on matlab, now having the data on matlab how can I extract them and create the csv files?
infile needs to hold the name of the file to be processed. Do not use importdata for it unless the file you are importing contains the name of the input file (and you are prepared for the fact that importdata will return a struct or table)
I am still waiting for the file of sample data.
The code I posted creates the csv files.
I don't understand which file you want, the starting file?
in https://www.mathworks.com/matlabcentral/answers/631569-create-csv-files-by-extracting-data-from-a-starting-csv-file#comment_1098684
you posted an image of the first 21 lines of your data file. Please extract those lines and attach them as a file. 21 lines should be enough to create a proof of concept.
That looks like it will be fine.
I have been trying to fall asleep for the last 7 hours so it will be a few hours before I am at my desk to test with (and more hours to get Windows booted... I haven't booted it for a month or more)
I tried it as is but the files it generates are empty.
hello, sorry for the trouble, did you by chance have the opportunity to try to create the csv files?
I managed to upgrade my Windows to the latest release and all the other update housekeeping tasks (except I didn't finish defragmenting the drive) . I managed to install Mac OS Catalina and use that to download all of the Mac and Windows install files and install R2020b on Catalina, which gives me the files I need to flip over to Windows to install R2020b. So I have everything ready.
... but by then it was close to 4am so I started playing a game that I had been waiting on for the last year, as it needed Catalina and I had always been too busy answering questions to install Catalina before.
The below is tested.
Please note that you will need to add all strings that you want searched for into the UPREs cell array, and you must use exact matches.
The command that is created is quite sensitive to which quotes are used and how many are used. The rules for quoting strings in MS Windows are not well documented, and are quite different from Unix. The rules for handling double-quotes are particularly strange.
This code expects a text file, not a .xlsx file.
infile = 'exp_misure_prelievo.txt';
outdir = 'UPRE_files';
UPREs = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};
if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : length(UPREs)
this_UPRE = UPREs{K};
outfile = fullfile(outdir, [this_UPRE '.txt']);
cmd = sprintf('FINDSTR "|"""%s"""|" "%s" > "%s"', this_UPRE, infile, outfile);
[status, result] = system(cmd);
if status ~= 0
fprintf('Problem processing %s, output was:\n', this_UPRE);
fprintf('%s\n', result);
end
end
fprintf('Done\n');
I had to do a lot of work to get to the point of being able to debug this problem. I only boot Windows every few months, and it turned out to be a truly remarkable amount of work to get my Mac to share some files with Windows.

Sign in to comment.

More Answers (2)

Without knowing all of the details of what is in your files it is hard to give a specific answer. However for general advice I would suggest using the matlab function readcell. This will read the entire file into a cell array. You can then work with the data in that cell array relatively easily indexing by rows and columns. Maybe you can do the processing you want on this overall cell array and it isn't even necessary to create many smaller csv files unless you need them for some other task.

1 Comment

the first line looks like this: ANNO|"MESE"|"GIORNO_H"|"ID_ELEMENTO"|"TIPO_ELEMENTO"|"VERSIONE"|"DATA_VAL_SAPR"|"COD_IMPIANTO"|"CODICE_UP"|"EEA"|"EUA"|"EEI"|"EUI"|"EEC"|"EUC"
from the second on they are like this:
2017|1|29-GEN-17 19:45:00|"PVP_S14SPLO_901"|"PVP"|1|19-MAG-17 10:59:40|"S14SPLO"|"UPRE_S14SPLO_901"|0|1476|0|60|0|12
UPRE_S14SPLO_901 this is the code that interests me.
I would need the 21 files to be able to rework them later.
I am attaching an image of the starting csv file.

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!